# rds-performance

Part of **RDS**

# ApsaraDB RDS Performance Optimization Troubleshooting Guide

## Problem Index

| Problem | Symptom | Severity | Solution Summary |
|--------|--------|---------|------------------|
| Slow SQL Queries | Queries exceed expected execution time; visible in SQL Explorer or slow logs | High | Analyze execution plan with EXPLAIN, add missing indexes, optimize query structure |
| Long-Running Queries | Queries run for minutes or hours, consuming CPU/IOPS, blocking other operations | High | Identify via process list, kill if malicious, optimize inefficient DML/DDL, check for metadata locks |
| High I/O Utilization | Elevated disk read/write IOPS, low buffer pool hit ratio, temporary table spills | High | Tune InnoDB buffer pool, avoid large temporary tables, schedule DDL during off-peak, reduce cold data scans |
| Timing Discrepancy Between SQL Explorer and Slow Log | SQL Explorer shows longer duration than slow log entry | Low | Understand that SQL Explorer includes lock wait time; slow log records only execution time |

## Problem Details

### Problem 1: Slow SQL Queries

**Symptoms**
- Error message: Not an error per se, but queries appear in **Slow SQL** or **slow query log** with execution time > `long_query_time` (MySQL) or `log_min_duration_statement` (PostgreSQL)
- Behavior: Application latency increases; specific queries take seconds or minutes to complete
- Context: Occurs during peak load, after schema changes, or when new application features are deployed

**Root Cause**
- Missing or inefficient indexes leading to full table scans (`Seq Scan` in PostgreSQL, `type=ALL` in MySQL)
- Suboptimal query plans due to outdated statistics or complex joins
- Large result sets or unbounded `SELECT *` queries
- Inefficient use of functions on indexed columns

**Solution**
1. Use **SQL Explorer and Audit** (MySQL) or **Autonomy Service > Slow SQL** (PostgreSQL) to identify top slow queries:
   - Console path: `Console > RDS > Instances > [Instance] > SQL Explorer and Audit` (MySQL)
   - Console path: `Console > RDS > Instances > [Instance] > Autonomy Service > Slow SQL` (PostgreSQL)
2. For MySQL, enable and review the **slow query log**; for PostgreSQL, ensure `log_min_duration_statement` is set appropriately (default: 1000 ms)
3. Analyze the execution plan:
   ```sql
   -- For PostgreSQL
   EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM your_table WHERE your_column = value;
   ```
   ```sql
   -- For MySQL (use in session or via performance_schema)
   EXPLAIN FORMAT=JSON SELECT * FROM your_table WHERE your_column = value;
   ```
4. Add missing indexes based on `WHERE`, `JOIN`, and `ORDER BY` clauses:
   ```sql
   CREATE INDEX idx_column ON your_table(your_column);
   ```
5. Rewrite queries to avoid functions on indexed columns, reduce result set size, and use pagination

**Verification**
- Re-run the query and confirm reduced execution time in SQL Explorer
- Check that the execution plan now uses `Index Scan` or `Index Only Scan` instead of `Seq Scan`
- Monitor CPU and IOPS metrics to ensure reduction in resource consumption

### Problem 2: Long-Running Queries

**Symptoms**
- Behavior: Queries remain in `SHOW PROCESSLIST` (MySQL) for extended periods; high CPU or IOPS usage; connection pool exhaustion
- Context: Often triggered by inefficient batch jobs, uncommitted transactions, or SQL injection attacks

**Root Cause**
- Poorly optimized DML statements (e.g., `UPDATE`/`DELETE` without `WHERE` index)
- Uncommitted transactions holding locks
- Metadata lock waits due to concurrent DDL and DML operations
- Malicious or unintended full-table operations

**Solution**
1. Identify long-running queries:
   ```sql
   -- MySQL
   SHOW FULL PROCESSLIST;
   ```
2. For non-critical or malicious queries, terminate them:
   ```sql
   KILL <process_id>;
   ```
3. Optimize batch operations:
   - Break large `UPDATE`/`DELETE` into smaller chunks (e.g., 1,000 rows per transaction)
   - Add `LIMIT` and use indexed columns in `WHERE`
4. Avoid running DDL (e.g., `ALTER TABLE`) during peak hours to prevent metadata lock contention
5. Ensure all application transactions are properly committed or rolled back

**Verification**
- Confirm the query no longer appears in `SHOW PROCESSLIST`
- Monitor connection count and CPU usage to return to baseline
- Validate application functionality post-optimization

### Problem 3: High I/O Utilization

**Symptoms**
- Behavior: High disk read/write IOPS in **Monitoring & Alerts > Standard Monitoring**; low buffer pool hit ratio (< 99%); increased latency
- Context: Occurs after data growth, large report generation, or bulk imports

**Root Cause**
- Frequent reads of cold (non-buffered) data
- Excessive temporary table creation on disk (due to `tmp_table_size` limits or BLOB usage)
- Large DDL operations (e.g., `ALTER TABLE`) causing redo log and data file writes
- High binary log volume from large transactions

**Solution**
1. Navigate to **Performance** page in console:
   - Path: `Console > RDS > Instances > [Instance] > Performance`
2. Check **buffer pool hit ratio**; if low, consider scaling instance or increasing `innodb_buffer_pool_size` (if custom parameters allowed)
3. Reduce disk-based temporary tables:
   - Increase `tmp_table_size` and `max_heap_table_size` (within memory limits)
   - Avoid `SELECT *` with BLOB/TEXT columns in `GROUP BY` or `ORDER BY`
4. Schedule large DDL operations during maintenance windows
5. Optimize queries to reduce full table scans and unnecessary data reads

**Verification**
- Confirm IOPS and disk utilization decrease in **Standard Monitoring**
- Buffer pool hit ratio improves (> 99%)
- Application response time returns to normal

### Problem 4: Timing Discrepancy Between SQL Explorer and Slow Log

**Symptoms**
- Behavior: SQL Explorer shows a query took 5 seconds, but slow log reports 0.2 seconds
- Context: Observed when diagnosing slow queries across both tools

**Root Cause**
- **SQL Explorer** records total time including **lock wait time**, network round-trip, and queueing
- **Slow query log** (MySQL/PostgreSQL) records only **actual SQL execution time** on the database engine

**Solution**
1. Understand that discrepancy is **normal** and not an error
2. Use SQL Explorer to detect **blocking** or **contention** issues (high wait time indicates lock problems)
3. Use slow log to analyze **pure execution efficiency**
4. If lock waits are high, investigate long-running transactions or missing indexes causing row locks

**Verification**
- Cross-reference both tools: high SQL Explorer time + low slow log time → focus on concurrency/locking
- Resolve underlying lock contention (see Problem 2) to reduce total observed latency

## FAQ

**Q: How do I enable slow query logging in ApsaraDB RDS?**  
A: For MySQL, slow query logging is enabled by default. Adjust `long_query_time` via the **Parameters** page in the console. For PostgreSQL, the `log_min_duration_statement` parameter controls slow log threshold (default: 1000 ms). Both logs are accessible via **SQL Explorer** or **Log Management**.

**Q: What is a good buffer pool hit ratio, and how do I check it?**  
A: A healthy InnoDB buffer pool hit ratio is ≥ 99%. You can view it in the **Performance** tab of your RDS instance console under **Engine Metrics**. A low ratio indicates frequent disk reads, suggesting insufficient memory or inefficient queries.

**Q: How can I analyze a query’s execution plan in RDS for PostgreSQL?**  
A: Connect to your instance and run `EXPLAIN (ANALYZE, VERBOSE, BUFFERS)` followed by your query. This shows actual execution time, I/O buffers used, and whether indexes were utilized. Example:  
```sql
EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM orders WHERE customer_id = 123;
```

**Q: Why are my batch UPDATE statements causing high I/O?**  
A: Large single-transaction `UPDATE`/`DELETE` operations generate massive redo logs and dirty pages, overwhelming I/O. Split them into smaller chunks (e.g., 1,000 rows per commit) and ensure the `WHERE` clause uses an indexed column.

**Q: How long are slow query logs retained in ApsaraDB RDS?**  
A: Slow query logs are retained for **7 days** by default. You can access them via **SQL Explorer and Audit** (MySQL) or **Autonomy Service > Slow SQL** (PostgreSQL) within this window.