# rds-monitoring

Part of **RDS**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for the following routing skills. If you're unsure which path to take, check the corresponding routing skill:

> - [Optimize database performance using diagnostic tools](../../intent/rds-optimize-performance/SKILL.md)
> - [Monitor and analyze database performance metrics](../../intent/rds-monitor-performance/SKILL.md)

# ApsaraDB RDS Monitoring and Alerts

## Capabilities Overview

| Sub-capability | Calling Mode | Description |
|--------|----------|------|
| Monitoring Metrics | Synchronous | Query available and enabled enhanced monitoring metrics for instances. |
| Monitoring Frequency | Synchronous | Modify and query monitoring frequency settings for instances. |
| Query Available Metrics | Synchronous | Get a list of metrics available for monitoring RDS instances. |
| Query Instance Performance | Synchronous | Retrieve performance parameters and metrics for RDS instances. |
| Query Monitoring Data | Synchronous | Fetch detailed monitoring data for RDS instances. |
| Query Slow Logs | Synchronous | Retrieve and analyze slow query logs to identify performance issues. |
| Query Error Logs | Synchronous | Retrieve database error logs for troubleshooting. |
| Manage SQL Audit | Synchronous | Enable/disable SQL collection and manage retention policies. |
| Query SQL Logs | Synchronous | Access SQL audit logs and log files. |

## API Calling Mode

### Authentication
The primary authentication method is bearer token authentication using Alibaba Cloud API credentials.

- **Header format**: `Authorization: Bearer <your_api_key>`
- **Environment variable**: `ALIYUN_API_KEY`
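- Other authentication methods exist, such as an AccessKey ID and AccessKey secret with request signing; the Python SDK examples and the signed `curl` request in this document use that method. Bearer token authentication is recommended for simplicity and consistency across endpoints.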

### Service Endpoint
The APIs use region-specific endpoints following this pattern:

`https://rds.{region}.aliyuncs.com`

Common regions include:
- `cn-hangzhou` (China Hangzhou)
- `cn-shanghai` (China Shanghai) 
- `cn-beijing` (China Beijing)

For international regions, use `https://rds.aliyuncs.com` as the base endpoint.
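
As a small illustration of the endpoint pattern above, the following sketch builds the base URL for a region. The helper name `rds_endpoint` is hypothetical, and falling back to the shared `https://rds.aliyuncs.com` endpoint when no region is given is an assumption drawn from the text above, not a documented rule.

```python
from typing import Optional


def rds_endpoint(region: Optional[str] = None) -> str:
    """Return the base URL for a region, or the shared endpoint if none is given."""
    if not region:
        # Assumption: no region means the shared base endpoint is wanted.
        return "https://rds.aliyuncs.com"
    return f"https://rds.{region}.aliyuncs.com"


print(rds_endpoint("cn-hangzhou"))  # https://rds.cn-hangzhou.aliyuncs.com
```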

### Synchronous API Pattern
All monitoring and alerts APIs follow a synchronous calling pattern:

1. Construct a GET or POST request with required parameters
2. Include authentication headers (`Authorization: Bearer <token>`)
3. Send the request to the appropriate regional endpoint
4. Receive an immediate JSON response with the requested data
5. Handle success responses or parse error codes for failures

No polling or async task handling is required since all operations complete immediately.
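
Steps 4 and 5 can be sketched as a small response parser. This is a minimal sketch, assuming that a failed call returns a JSON object with top-level `Code` and `Message` fields next to `RequestId`; that shape is an assumption here and should be checked against real responses. The names `RdsApiError` and `parse_response` are hypothetical helpers, not part of any SDK.

```python
import json


class RdsApiError(Exception):
    """Raised when a response body reports an error code (hypothetical helper)."""

    def __init__(self, code, message, request_id=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.request_id = request_id


def parse_response(body):
    """Decode a JSON response body (str or bytes) and raise on a reported failure."""
    data = json.loads(body)
    # Assumption: failures carry a top-level "Code" field; successes do not.
    if "Code" in data:
        raise RdsApiError(data["Code"], data.get("Message", ""), data.get("RequestId"))
    return data
```

Keeping the `RequestId` on the exception makes it easier to quote the failing request when contacting support.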

## Parameter Reference

### Monitoring Metrics

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| Action | String | true | null | null | The operation that you want to perform. Set the value to ModifyDBInstanceMetrics. |
| DBInstanceName | String | true | null | null | The ID of the instance. You can call the DescribeDBInstances operation to query the IDs of instances. |
| MetricsConfig | String | true | null | max 30 metric keys, comma-separated | The keys of the Enhanced Monitoring metrics that you want to display for the instance. You can enter a maximum of 30 metric keys. If you enter multiple metric keys, you must separate the metric keys with commas (,). |
| Scope | String | true | null | one of: instance, region | The application scope of this modification. Valid values: instance, region. |
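
The constraints in this table can be checked on the client before a request is sent. The sketch below only assembles and validates the parameter dictionary; the function name `build_metrics_params` is hypothetical, and how the dictionary is then signed and sent is left out.

```python
def build_metrics_params(instance_id, metric_keys, scope="instance"):
    """Build ModifyDBInstanceMetrics parameters after checking the documented limits."""
    keys = [k.strip() for k in metric_keys if k.strip()]
    if not keys:
        raise ValueError("at least one metric key is required")
    if len(keys) > 30:
        raise ValueError("MetricsConfig accepts at most 30 metric keys")
    if scope not in ("instance", "region"):
        raise ValueError("Scope must be 'instance' or 'region'")
    return {
        "Action": "ModifyDBInstanceMetrics",
        "DBInstanceName": instance_id,
        # Multiple keys are joined with commas, as the table requires.
        "MetricsConfig": ",".join(keys),
        "Scope": scope,
    }
```

For example, `build_metrics_params("pgm-****", ["os.cpu_usage.sys.avg", "os.cpu_usage.user.avg"])` yields a `MetricsConfig` of `os.cpu_usage.sys.avg,os.cpu_usage.user.avg`.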

### Monitoring Frequency

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| Action | String | true | null | null | The operation that you want to perform. Set the value to ModifyDBInstanceMonitor. |
| DBInstanceId | String | true | null | null | The ID of the instance. You can call the DescribeDBInstances operation to query the IDs of instances. |
| Period | String | true | null | one of: 5, 10, 60, 300 | The monitoring frequency that you want to use. Valid values: 5, 10, 60, 300. Unit: seconds. |
| ClientToken | String | false | null | max length 64 characters, ASCII only | The client token that is used to ensure the idempotence of the request. You can use the client to generate the value, but you must make sure that it is unique among different requests. The token can contain only ASCII characters and cannot exceed 64 characters in length. |
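
One way to satisfy the `ClientToken` constraints is to generate a UUID on the client. The sketch below is an illustration under that assumption; `new_client_token` and `build_monitor_params` are hypothetical names, and the code only prepares parameters without sending them.

```python
import uuid

VALID_PERIODS = ("5", "10", "60", "300")


def new_client_token():
    """Return a unique ASCII token; a UUID4 string is 36 characters long."""
    return str(uuid.uuid4())


def build_monitor_params(instance_id, period, client_token=None):
    """Build ModifyDBInstanceMonitor parameters after checking the documented limits."""
    period = str(period)  # Period is documented as a String
    if period not in VALID_PERIODS:
        raise ValueError("Period must be one of %s" % ", ".join(VALID_PERIODS))
    token = client_token or new_client_token()
    if len(token) > 64 or any(ord(ch) > 127 for ch in token):
        raise ValueError("ClientToken must be ASCII and at most 64 characters")
    return {
        "Action": "ModifyDBInstanceMonitor",
        "DBInstanceId": instance_id,
        "Period": period,
        "ClientToken": token,
    }
```

When retrying the same logical change, pass the original token back in through `client_token` so that the request stays idempotent.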

### Query Slow Logs

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| DBInstanceId | String | true | null | null | The instance ID. You can call the DescribeDBInstances operation to query the instance ID. |
| StartTime | String | true | null | ISO 8601 format, UTC time | The beginning of the time range to query. Specify the time in the ISO 8601 standard in the yyyy-MM-ddZ format. The time must be in UTC. |
| EndTime | String | true | null | ISO 8601 format, UTC time, max 31 days duration | The end of the time range to query. The end time must be later than the start time. The time span between the start time and the end time cannot exceed 31 days. Specify the time in the ISO 8601 standard in the yyyy-MM-ddZ format. The time must be in UTC. |
| DBName | String | false | null | null | The name of the database. |
| SortKey | String | false | null | one of: TotalExecutionCounts, TotalQueryTimes, TotalLogicalReads, TotalPhysicalReads | The dimension based on which the system sorts the entries to return. Valid values: TotalExecutionCounts, TotalQueryTimes, TotalLogicalReads, TotalPhysicalReads. |
| PageSize | Integer | false | 30 | 30 to 100 | The number of entries per page. Valid values: 30 to 100. Default value: 30. |
| PageNumber | Integer | false | 1 | 1 or higher | The page number. Pages start from 1. Default value: 1. |
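
The date format, time span, and paging limits above can be validated locally before calling the API. This is a minimal sketch; the function name `validate_slow_log_query` is hypothetical, and it treats a span of exactly 31 days as allowed, which follows the parameter description rather than any tested behavior.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%dZ"  # the yyyy-MM-ddZ format described above


def validate_slow_log_query(start_time, end_time, page_size=30, page_number=1):
    """Check slow log query parameters and return them as a dictionary."""
    start = datetime.strptime(start_time, DATE_FORMAT)
    end = datetime.strptime(end_time, DATE_FORMAT)
    if end <= start:
        raise ValueError("EndTime must be later than StartTime")
    if (end - start).days > 31:
        raise ValueError("the time span cannot exceed 31 days")
    if not 30 <= page_size <= 100:
        raise ValueError("PageSize must be between 30 and 100")
    if page_number < 1:
        raise ValueError("PageNumber starts from 1")
    return {
        "StartTime": start_time,
        "EndTime": end_time,
        "PageSize": page_size,
        "PageNumber": page_number,
    }
```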

### Manage SQL Audit

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| DBInstanceId | String | true | null | null | The instance ID. You can call the DescribeDBInstances operation to query the instance ID. |
| SQLCollectorStatus | String | true | null | one of: Enable, Disabled | Specifies whether to enable the SQL Explorer (SQL Audit) feature. Valid values: Enable, Disabled. |
| ConfigValue | String | true | null | one of: 30, 180, 365, 1095, 1825 | The log retention period that is allowed by the SQL Explorer feature on the instance. Valid values: 30 (30 days), 180 (180 days), 365 (one year), 1095 (three years), 1825 (five years). |
| ClientToken | String | false | null | max 64 characters, ASCII only | The client token that is used to ensure the idempotence of the request. You can use the client to generate the token, but you must make sure that the token is unique among different requests. The token can contain only ASCII characters and cannot exceed 64 characters in length. |

## Code Examples

### Query Monitoring Frequency - Python - All Regions

```python
from aliyunsdkcore.client import AcsClient
from aliyunsdkrds.request.v20140815.DescribeDBInstanceMonitorRequest import DescribeDBInstanceMonitorRequest

# Initialize the client: AccessKey ID, AccessKey secret, and region are passed positionally
client = AcsClient(
    '<your-access-key-id>',
    '<your-access-key-secret>',
    'cn-hangzhou'  # Replace with your region
)

# Create the request for the instance's current monitoring frequency
request = DescribeDBInstanceMonitorRequest()
request.set_DBInstanceId('rm-uf6wjk5xxxxxxx')
request.set_ClientToken('ETnLKlblzczshOTUbOCzxxxxxxx')

# Send the request
response = client.do_action_with_exception(request)
print(response)
```

### Query Instance Performance - Python - All Regions

```python
from aliyunsdkcore.client import AcsClient
# Import the request class itself, not only its module
from aliyunsdkrds.request.v20140815.DescribeDBInstancePerformanceRequest import DescribeDBInstancePerformanceRequest

# Initialize the client
client = AcsClient('<your-access-key-id>', '<your-access-key-secret>', 'cn-beijing')

# Create the request; times use the yyyy-MM-ddTHH:mmZ format in UTC
request = DescribeDBInstancePerformanceRequest()
request.set_DBInstanceId('rm-1234567890abcdefg')
request.set_StartTime('2023-01-01T00:00Z')
request.set_EndTime('2023-01-01T01:00Z')
request.set_Key('MySQL_NetworkTraffic')

# Send the request
response = client.do_action_with_exception(request)
print(response)
```

### Query Error Logs - Python - All Regions

```python
import json
from aliyunsdkcore.client import AcsClient
from aliyunsdkrds.request.v20140815.DescribeErrorLogsRequest import DescribeErrorLogsRequest

# Initialize client
client = AcsClient(
    '<your-access-key-id>',
    '<your-access-key-secret>',
    'cn-hangzhou'  # Replace with your region
)

# Create request
request = DescribeErrorLogsRequest()
request.set_DBInstanceId('rm-uf6wjk5****')
request.set_StartTime('2011-05-01T20:10Z')
request.set_EndTime('2011-05-30T20:10Z')
request.set_PageSize(30)
request.set_PageNumber(1)

# Send request
response = client.do_action_with_exception(request)
print(json.loads(response))
```

### Query SQL Audit Logs - Python - All Regions

```python
import json
from aliyunsdkcore.client import AcsClient
from aliyunsdkrds.request.v20140815.DescribeSQLLogRecordsRequest import DescribeSQLLogRecordsRequest

# Initialize the client
client = AcsClient(
    '<your-access-key-id>',
    '<your-access-key-secret>',
    'cn-hangzhou'  # Replace with your region
)

# Create the request
request = DescribeSQLLogRecordsRequest()
request.set_DBInstanceId('rm-uf6wjk5****')
request.set_StartTime('2011-06-01T15:00:00Z')
request.set_EndTime('2011-06-06T15:00:00Z')
request.set_PageSize(30)
request.set_PageNumber(1)

# Send the request
response = client.do_action_with_exception(request)
print(json.loads(response))
```

### Query Slow Logs - Python - All Regions

```python
import json
from aliyunsdkcore.client import AcsClient
from aliyunsdkrds.request.v20140815.DescribeSlowLogsRequest import DescribeSlowLogsRequest

# Initialize the client
client = AcsClient(
    '<your-access-key-id>',
    '<your-access-key-secret>',
    'cn-hangzhou'  # Replace with your region
)

# Create the request; dates use the yyyy-MM-ddZ format in UTC
request = DescribeSlowLogsRequest()
request.set_DBInstanceId('rm-uf6wjk5****')
request.set_StartTime('2011-05-01Z')
request.set_EndTime('2011-05-30Z')
request.set_PageSize(30)
request.set_PageNumber(1)

# Send the request and print the parsed response
response = client.do_action_with_exception(request)
print(json.dumps(json.loads(response), indent=2))
```

### Query SQL Log Files - Bash - All Regions

```bash
curl -X GET \
  'https://rds.aliyuncs.com/?Action=DescribeSQLLogFiles&DBInstanceId=rm-uf6wjk5****&PageSize=30&PageNumber=1&SignatureMethod=HMAC-SHA1&SignatureNonce=1234567890&SignatureVersion=1.0&Timestamp=2024-01-01T00%3A00%3A00Z&AccessKeyId=your-access-key-id&Signature=your-signature' \
  -H 'Host: rds.aliyuncs.com'
```

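## Response Format

The following example shows a response for the Query Available Metrics capability (`DescribeAvailableMetrics`). Other operations return different fields but share the top-level `RequestId`.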

```json
{
  "TotalRecordCount": 2,
  "RequestId": "A467D279-68A8-57B3-BDA4-35F8B3DDB1B7",
  "Items": [
    {
      "Description": "OS CPU utilization, equal to the number of OS-consumed CPUs divided by the total number of CPUs",
      "MetricsKey": "os.cpu_usage.sys.avg",
      "GroupKeyType": "CPU utilization",
      "GroupKey": "os.cpu_usage",
      "Method": "avg",
      "Dimension": "os",
      "Unit": "%",
      "SortRule": 1,
      "DbType": "pgsql",
      "MetricsKeyAlias": "os.cpu_usage.sys"
    },
    {
      "Description": "User CPU utilization, equal to the number of user-consumed CPUs divided by the total number of CPUs",
      "MetricsKey": "os.cpu_usage.user.avg",
      "GroupKeyType": "CPU utilization",
      "GroupKey": "os.cpu_usage",
      "Method": "avg",
      "Dimension": "os",
      "Unit": "%",
      "SortRule": 2,
      "DbType": "pgsql",
      "MetricsKeyAlias": "os.cpu_usage.user"
    }
  ]
}
```

**Key Fields**:
- `TotalRecordCount` — Total number of records returned
- `RequestId` — Unique identifier for the API request
- `Items[].MetricsKey` — Unique identifier for each metric
- `Items[].Description` — Human-readable description of the metric
- `Items[].GroupKey` — Category grouping for the metric
- `Items[].GroupKeyType` — Display name for the metric group
- `Items[].Method` — Aggregation method (e.g., avg, max, min)
- `Items[].Dimension` — Scope of the metric (os vs database)
- `Items[].Unit` — Measurement unit for the metric value
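
To show how these fields might be consumed, the sketch below groups metric keys by their `GroupKey`. The function name `metrics_by_group` is hypothetical, and the sample body is a trimmed copy of the response shown above.

```python
import json
from collections import defaultdict


def metrics_by_group(response_body):
    """Map each GroupKey to the list of MetricsKey values that belong to it."""
    data = json.loads(response_body)
    groups = defaultdict(list)
    for item in data.get("Items", []):
        groups[item["GroupKey"]].append(item["MetricsKey"])
    return dict(groups)


sample = """{"TotalRecordCount": 2, "Items": [
  {"MetricsKey": "os.cpu_usage.sys.avg", "GroupKey": "os.cpu_usage"},
  {"MetricsKey": "os.cpu_usage.user.avg", "GroupKey": "os.cpu_usage"}]}"""
print(metrics_by_group(sample))
# {'os.cpu_usage': ['os.cpu_usage.sys.avg', 'os.cpu_usage.user.avg']}
```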

## Error Handling

| HTTP Status Code | Error Code | Description | Recommended Action |
|---------------|---------------|--------------------|-----------------------------|
| 400 | InvalidMetricsConfig | The specified metrics config is invalid. Returned when the Enhanced Monitoring metrics that you specify are invalid. | Verify that the metric keys in MetricsConfig are valid and formatted as comma-separated values. |
| 400 | InvalidScope | The specified scope is invalid. Returned when the application scope that you specify is invalid. | Ensure that the Scope parameter is set to either "instance" or "region". |
| 404 | InvalidDBInstanceName.NotFound | The database instance does not exist. Returned when the instance name that you specify cannot be found. | Verify that the DBInstanceName parameter matches an existing RDS instance ID. |
| 400 | InvalidSearchTimeRange | The interval between the start time (StartTime) and the end time (EndTime) must be less than 31 days. | Ensure that the time range between StartTime and EndTime is less than 31 days for log queries. |
| 408 | RequestTimeout | Query timed out. Please try again or narrow down the query scope. | Reduce the time range or add more specific filters to narrow the query scope. |

### Rate Limits & Retry
- **Query Instance Monitoring Frequency**: 100 QPS per account
- **DescribeSQLLogRecords**: 1,000 calls per minute per account
- **General guidance**: Implement exponential backoff with jitter for retry logic. Start with a 1-second delay and double the delay with each retry up to a maximum of 30 seconds. Respect any `Retry-After` header if present in error responses.
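
The retry guidance above can be sketched as follows. This is a minimal illustration using "full jitter" (a random sleep between zero and the current backoff ceiling); the names `backoff_delays` and `call_with_retry` are hypothetical, and the sketch does not read a `Retry-After` header, which would need access to the raw HTTP response.

```python
import random
import time


def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Yield backoff ceilings that start at base, double each time, and stop at cap."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def call_with_retry(func, max_retries=5, base=1.0, cap=30.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying up to max_retries times with jittered backoff."""
    for ceiling in backoff_delays(max_retries, base, cap):
        try:
            return func()
        except retry_on:
            # Full jitter: wait a random time up to the current ceiling.
            sleep(random.uniform(0, ceiling))
    # Final attempt: any exception now propagates to the caller.
    return func()
```

A call such as `call_with_retry(lambda: client.do_action_with_exception(request))` would wrap one of the SDK examples in this document; in practice `retry_on` should be narrowed to throttling or timeout errors so that invalid-parameter errors fail fast.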

## Environment Requirements

- **Python SDK**: `aliyun-python-sdk-rds>=2.1.0`
- **Environment variable setup**: `export ALIYUN_API_KEY=your_api_key`
- **Python version**: Compatible with Python 3.6+ (as per aliyun-python-sdk-core requirements)

## FAQ

Q: How do I enable SQL Audit (SQL Explorer) for my RDS instance?
A: Use the ModifySQLCollectorPolicy API with SQLCollectorStatus set to "Enable". You can also check the current status using DescribeSQLCollectorPolicy before making changes.

Q: What's the maximum time range I can query for slow logs and error logs?
A: For slow logs and error logs, the maximum time range is 31 days. For SQL audit logs via DescribeSQLLogRecords, the maximum is 15 days, and you can only query data from the last 15 days.
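
When a longer period has to be covered, one approach is to split it into consecutive windows that each stay within the limit and to query them one by one. The sketch below shows only the splitting step; `split_time_range` is a hypothetical helper, and `max_days` would be set to 15 for `DescribeSQLLogRecords`.

```python
from datetime import datetime, timedelta


def split_time_range(start, end, max_days=31):
    """Split [start, end) into consecutive windows of at most max_days days."""
    if end <= start:
        raise ValueError("end must be later than start")
    step = timedelta(days=max_days)
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


for begin, finish in split_time_range(datetime(2024, 1, 1), datetime(2024, 3, 11)):
    print(begin.strftime("%Y-%m-%dZ"), finish.strftime("%Y-%m-%dZ"))
```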

Q: How can I reduce storage usage from accumulated logs?
A: Use the PurgeDBInstanceLog API to upload log backup files to OSS and delete them from the instance. Additionally, configure appropriate retention periods using ModifySQLCollectorRetention (30, 180, 365, 1095, or 1825 days).

Q: What monitoring frequencies are supported for RDS instances?
A: Supported monitoring frequencies are 5, 10, 60, and 300 seconds. Note that higher frequencies (5 and 10 seconds) may incur additional charges.

Q: How do I find which metrics are available for my RDS instance?
A: Use the DescribeAvailableMetrics API to get a complete list of available enhanced monitoring metrics for your specific instance and database engine type.

## Pricing & Billing

### Billing Model
All APIs in this domain follow a per-request billing model, where each API call counts as one request regardless of the amount of data returned.

### Price Reference

| Tier | Input Price | Output Price | Other Fees |
|-----------|---------|---------|---------|
| default | 0.0001 / | 0.0001 / | |
| Standard | 0.0001 / | 0.0001 / | |
| RDS AI Assistant Ultimate Edition | 0.002 / | | |

### Free Tier
- **Monitoring Metrics APIs**: Monthly 1000 free calls
- **Instance Performance APIs**: Monthly 1000 free calls  
- **Some APIs**: No free tier available

### Usage Limits
- **Monitoring Metrics**: 100 QPS
- **Instance Performance**: 100 QPS
- **Slow Logs**: Maximum 10 requests per second
- **SQL Log Reports**: Maximum 7-day query time range, 100 records per page

### Billing Notes
- API calls are billed per request, with each invocation counting as one request
- High-frequency monitoring (5 or 10 seconds) incurs additional fees beyond the base API call cost
- Free tier quotas reset monthly
- Calls initiated by both your Alibaba Cloud account and RAM users within your account are counted toward usage limits