# rds-troubleshoot

Part of **RDS**

# ApsaraDB RDS Troubleshooting Guide

## Problem Index

| Problem | Symptom | Severity | Solution Summary |
|--------|--------|----------|------------------|
| Missing or Invalid Parameters | `MissingParameter` or `InvalidParameter` error | High | Provide all required parameters with valid values |
| Authentication or Authorization Failure | `InvalidAccessKeyId.NotFound`, `Forbidden.RAM`, or `ActionUnauthorized` | High | Verify credentials, RAM permissions, and account status |
| Operation Denied Due to Instance State | `OperationDenied.DBInstanceStatus` or `IncorrectDBInstanceState` | Medium | Wait for the instance to reach a supported state before retrying |
| Quota or Resource Limits Exceeded | `QuotaExceeded.CreateInstance` or `InsufficientBalance` | Medium | Request a quota increase or add funds to your account |
| Network or Region Configuration Mismatch | `InvalidVSwitchId.Mismatch` or `RegionUnauthorized` | Medium | Ensure VPC, VSwitch, zone, and region align with instance requirements |
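
As a rough triage aid, the index above can be expressed as a lookup from error code to problem number. The sketch below is illustrative only: `classify_error` is a hypothetical helper name, and it covers just the codes listed in this guide, not the full RDS error catalog.

```bash
# Map an RDS error code to the matching problem in this guide.
# Specific codes are matched before the broad Invalid* patterns.
classify_error() {
  case "$1" in
    InvalidAccessKeyId.NotFound|Forbidden.*|Forbedden.NotSupportRAM|ActionUnauthorized)
      echo "Problem 2: authentication or authorization" ;;
    InvalidVSwitchId.Mismatch|InvalidPrivateIpAddress.Mismatch|RegionUnauthorized)
      echo "Problem 5: network or region mismatch" ;;
    OperationDenied.*|IncorrectDBInstanceState)
      echo "Problem 3: instance state" ;;
    QuotaExceeded.*|InsufficientBalance)
      echo "Problem 4: quota or resource limits" ;;
    MissingParameter|InvalidParameter*|Invalid*.Malformed)
      echo "Problem 1: missing or invalid parameters" ;;
    *)
      echo "unknown error code: $1" >&2
      return 1 ;;
  esac
}
```

For example, `classify_error QuotaExceeded.DBName` prints `Problem 4: quota or resource limits`, while an unlisted code returns a non-zero exit status.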

## Problem Details

### Problem 1: Missing or Invalid Parameters

**Symptoms**
- Error message: `MissingParameter`
- Error message: `InvalidParameter`
- Error message: `Invalid<parameter name>.Malformed`
- Behavior: API request fails immediately with HTTP 400
- Context: Occurs during instance creation, modification, or database/account management

**Root Cause**
These errors occur when required parameters are omitted, contain syntactically invalid values, or use unsupported options. Common triggers include typos in parameter names, incorrect data formats (e.g., malformed timestamps), or using deprecated parameters.

**Solution**
1. Review the [ApsaraDB RDS API documentation](https://www.alibabacloud.com/help/en/rds) for the specific operation to identify required parameters.
2. Validate all input values against the allowed formats (e.g., ensure `Timestamp` follows ISO 8601 in UTC, such as `2024-05-01T12:00:00Z`).
3. For errors like `InvalidEngineVersion.Malformed`, confirm the engine version is supported in your region:
   ```bash
   aliyun rds DescribeAvailableResource \
     --RegionId <your-region> \
     --Engine <mysql|sqlserver|postgresql|ppas>
   ```
4. Avoid reserved keywords in database or account names; using one triggers errors such as `InvalidParameter.Keyword`.
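
The format check in step 2 can be done locally before a request is sent. The following is a minimal sketch, assuming the timestamp is expected in the UTC form `YYYY-MM-DDThh:mm:ssZ`; `is_valid_timestamp` is a hypothetical helper, and it checks only the shape of the string, not calendar validity (for example, it would accept February 31).

```bash
# Return 0 if the argument looks like an ISO 8601 UTC timestamp
# of the form YYYY-MM-DDThh:mm:ssZ, non-zero otherwise.
is_valid_timestamp() {
  printf '%s\n' "$1" | grep -Eq \
    '^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]Z$'
}

# Example: validate a value before using it in a request.
ts="2024-05-01T12:00:00Z"
if is_valid_timestamp "$ts"; then
  echo "timestamp format looks valid: $ts"
else
  echo "timestamp format is malformed: $ts" >&2
fi
```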

**Verification**
- Resubmit the corrected request.
- Expected behavior: HTTP 200 response with successful operation result.

### Problem 2: Authentication or Authorization Failure

**Symptoms**
- Error message: `InvalidAccessKeyId.NotFound`
- Error message: `Forbidden.RAM`
- Error message: `ActionUnauthorized`
- Error message: `Forbidden.Authentication`
- Behavior: API returns HTTP 403; console shows permission denied
- Context: Occurs when using RAM users, expired credentials, or unverified accounts

**Root Cause**
The request lacks valid authentication credentials or sufficient permissions. Causes include:
- Invalid or deleted AccessKey ID
- RAM user missing required policies (e.g., `AliyunRDSFullAccess`)
- Account not passing real-name verification
- Attempting RAM-incompatible operations (e.g., `Forbedden.NotSupportRAM`)

**Solution**
1. Verify your AccessKey is active and correctly configured:
   ```bash
   aliyun configure list
   ```
2. Attach appropriate RAM policies to the user:
   - For full access: `AliyunRDSFullAccess`
   - For read-only: `AliyunRDSReadOnlyAccess`
3. Ensure your Alibaba Cloud account has passed real-name verification.
4. For operations incompatible with RAM (e.g., certain billing actions), use the Alibaba Cloud account (root account) instead of a RAM user.

**Verification**
- Run a simple describe command:
  ```bash
  aliyun rds DescribeDBInstances --RegionId <your-region>
  ```
- Expected output: JSON list of instances without error.

### Problem 3: Operation Denied Due to Instance State

**Symptoms**
- Error message: `OperationDenied.DBInstanceStatus`
- Error message: `IncorrectDBInstanceState`
- Error message: `OperationDenied.LockMode`
- Behavior: Operation fails even with correct parameters
- Context: Attempting actions like restart, upgrade, or backup on an instance in transitional or locked state

**Root Cause**
ApsaraDB RDS enforces state-based operation constraints. Actions are blocked if the instance is:
- In `Creating`, `Deleting`, or `Rebooting` state
- Locked, for example manually (`LockMode = ManualLock`) or because the instance expired (`LockByExpiration`)
- A read-only or guard (disaster recovery) instance in a replication topology

**Solution**
1. Check current instance status:
   ```bash
   aliyun rds DescribeDBInstanceAttribute \
     --DBInstanceId <your-instance-id>
   ```
2. If locked, unlock via console:
   - Navigate to **RDS Console > Instances > [Instance] > Basic Information**
   - Click **Unlock** if manual lock is applied
3. Wait for ongoing operations (e.g., backup, scaling) to complete before retrying.
4. For read-only instances, perform write operations on the primary instance.
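
Waiting for a transitional state to clear (step 3) is usually done by polling. The sketch below shows one way to structure that loop; `wait_for_status` is a hypothetical helper that takes the status-fetching command as arguments, so it makes no assumption about how the status is obtained. The commented wrapper around `aliyun rds DescribeDBInstanceAttribute` assumes `jq` is installed and that the response exposes the status at `Items.DBInstanceAttribute[0].DBInstanceStatus`; check both against your own output.

```bash
# Poll a command until it prints the wanted status or attempts run out.
# Usage: wait_for_status <wanted> <max_attempts> <sleep_seconds> <command...>
wait_for_status() {
  local wanted="$1" max_attempts="$2" interval="$3"
  shift 3
  local attempt=1 status=""
  while [ "$attempt" -le "$max_attempts" ]; do
    # Treat a failing status command as "no status yet" and keep polling.
    status="$("$@")" || status=""
    if [ "$status" = "$wanted" ]; then
      echo "status is $wanted after $attempt attempt(s)"
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "gave up after $max_attempts attempts (last status: ${status:-unknown})" >&2
  return 1
}

# Hypothetical wrapper (assumption: jq is available and the JSON path is correct):
# get_status() {
#   aliyun rds DescribeDBInstanceAttribute --DBInstanceId "$1" \
#     | jq -r '.Items.DBInstanceAttribute[0].DBInstanceStatus'
# }
# wait_for_status Running 30 10 get_status <your-instance-id>
```

Passing the command as arguments keeps the loop independent of the CLI, which also makes it easy to dry-run with a stub before pointing it at a real instance.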

**Verification**
- Confirm instance status is `Running` and `LockMode` is `Unlock`.
- Retry the original operation successfully.

### Problem 4: Quota or Resource Limits Exceeded

**Symptoms**
- Error message: `QuotaExceeded.CreateInstance`
- Error message: `InsufficientBalance`
- Error message: `QuotaExceeded.DBName`
- Behavior: Creation/modification requests fail despite valid configuration
- Context: New deployments or scaling operations in constrained accounts

**Root Cause**
Your account has hit service quotas (e.g., max instances per region) or financial limits (insufficient balance). Common scenarios:
- Default quota of 10 RDS instances per region
- Database/account count limits per instance
- Unpaid bills triggering service suspension

**Solution**
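1. Check the instance's current resource usage (`DescribeResourceUsage` reports per-instance usage such as storage; account-level quotas are listed in Quota Center):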
   ```bash
   aliyun rds DescribeResourceUsage --DBInstanceId <your-instance-id>
   ```
2. Request a quota increase via **Alibaba Cloud Console > Quota Center**.
3. Add funds to your account if `InsufficientBalance` appears.
4. Delete unused databases/accounts to free quota:
   ```bash
   aliyun rds DeleteDatabase \
     --DBInstanceId <your-instance-id> \
     --DBName <unused-db>
   ```

**Verification**
- After quota adjustment or cleanup, retry the operation.
- Monitor **Billing Management** for payment confirmation.

### Problem 5: Network or Region Configuration Mismatch

**Symptoms**
- Error message: `InvalidVSwitchId.Mismatch`
- Error message: `RegionUnauthorized`
- Error message: `InvalidPrivateIpAddress.Mismatch`
- Behavior: VPC-related operations fail during instance creation or migration
- Context: Deploying across zones/VPCs without proper alignment

**Root Cause**
Network resources must align geographically and topologically:
- VSwitch must reside in the same zone as the RDS instance
- Private IP must belong to the VSwitch CIDR block
- User lacks permissions to create resources in the target region

**Solution**
1. Confirm zone compatibility:
   ```bash
   aliyun vpc DescribeVSwitches \
     --VpcId <your-vpc> \
     --ZoneId <target-zone>
   ```
2. Ensure the private IP is within the VSwitch CIDR block (e.g., VSwitch CIDR `192.168.0.0/24` → IP `192.168.0.10`).
3. Grant region access via **Resource Access Management (RAM)** if `RegionUnauthorized` occurs.
4. For VPC migration, verify that no public connections exist; otherwise the request may fail with `PublicConnectionExists`.
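
The CIDR membership check in step 2 can be verified with plain shell arithmetic. This is a minimal sketch for IPv4 only; `ip_to_int` and `ip_in_cidr` are hypothetical helper names, and the functions assume well-formed input (four decimal octets without leading zeros, prefix length 0 to 32).

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local rest="$1" a b c d
  a=${rest%%.*}; rest=${rest#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}; d=${rest#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if the IP ($1) lies inside the CIDR block ($2), e.g. 192.168.0.0/24.
ip_in_cidr() {
  local ip_int net_int prefix mask
  ip_int=$(ip_to_int "$1")
  net_int=$(ip_to_int "${2%/*}")
  prefix=${2#*/}
  # Build the netmask: the top <prefix> bits set, the rest cleared.
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( ip_int & mask )) -eq $(( net_int & mask )) ]
}

# Example using the values from step 2.
if ip_in_cidr 192.168.0.10 192.168.0.0/24; then
  echo "192.168.0.10 is inside 192.168.0.0/24"
fi
```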

**Verification**
- Successful VPC association or instance creation in target zone.
- Network connectivity test from ECS in same VPC.

## FAQ

**Q: How do I check if my RDS instance is healthy?**  
A: Use the `DescribeDBInstanceAttribute` API to verify `DBInstanceStatus` is `Running` and `LockMode` is `Unlock`. Monitor metrics like CPU, memory, and IOPS in the RDS console.

**Q: What permissions are needed to manage RDS instances via API?**  
A: The RAM user requires policies such as `AliyunRDSFullAccess` for full control or custom policies granting specific actions (e.g., `rds:DescribeDBInstances`, `rds:ModifyDBInstanceSpec`).
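
A custom policy limited to the two actions named above might look like the sketch below. The helper name `write_rds_policy` and the output path are hypothetical, and `"Resource": "*"` is an assumption made for brevity; a real policy would normally be narrowed to specific instances.

```bash
# Write a minimal RAM custom policy document to the given path.
write_rds_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "rds:DescribeDBInstances",
        "rds:ModifyDBInstanceSpec"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# Hypothetical output location; adjust as needed.
policy_file="${TMPDIR:-/tmp}/rds-custom-policy.json"
write_rds_policy "$policy_file"
echo "policy written to $policy_file"
```

The resulting JSON can be used as the policy document when creating a custom policy in RAM and then attached to the RAM user.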

**Q: How do I enable debug logging for API requests?**  
A: Use the Alibaba Cloud CLI with the `--debug` flag:
```bash
aliyun rds DescribeDBInstances --RegionId cn-hangzhou --debug
```
This outputs full request/response headers and bodies for diagnosis.

**Q: What are common causes of timeout errors during large operations?**  
A: Timeouts typically occur during large backups, imports, or storage scaling. Ensure your client timeout setting exceeds the operation’s expected duration (e.g., 30+ minutes for TB-scale data).

**Q: How do I roll back a failed instance modification?**  
A: Most modifications (e.g., spec changes) are atomic and automatically rolled back on failure. For parameter changes, revert to a previous configuration template via **RDS Console > Parameters > Apply Historical Template**.