# bailian-text

Part of **BAILIAN**

# Alibaba Cloud Model Studio Text and Code Generation Troubleshooting Guide

## Problem Index

| Problem | Symptoms | Severity | Solution Summary |
|------|------|---------|------------|
| Authentication and Authorization Failures | `401 Invalid API key`, `403 Access denied` | High | Verify API key format, region, and workspace permissions. |
| Rate Limiting and Throttling | `429 Rate limiting`, `Throttling.RateQuota`, `Throttling.AllocationQuota` | Medium | Implement exponential backoff, use fallback models, or increase quotas. |
| Request Validation and Routing Errors | `400 Bad request`, `404 Resource not found`, `400 Input length exceeds` | Medium | Check request parameters, context window limits, and base URL paths. |
| Server-Side and Availability Issues | `500 Internal server error`, `503 Service unavailable` | High | Retry with backoff, check service status, or switch to backup models. |

## Problem Details

### Problem 1: Authentication and Authorization Failures

**Symptoms**
- Error message: `401 Invalid API key: The provided API key is incorrect, expired, or has mismatched region settings.`
- Error message: `403 Access denied: You lack permission to access the model or workspace.`
- Error message (Coding Plan): `401 Invalid access token or token expired.` or `403 Invalid API key.`
- Behavior: API requests are rejected immediately with 401 or 403 HTTP status codes.

**Root Cause**
- The API key is missing, incorrect, or expired.
- For Coding Plan users, the standard API key is used instead of the plan-specific key (which starts with `sk-sp-`), or the subscription has expired.
- The API key does not have the required RAM permissions or workspace access rights for the requested model.
- Region mismatch between the API key and the endpoint base URL.

**Solution**
1. Confirm that the API key is present and correct. For standard Model Studio access, set it in the `DASHSCOPE_API_KEY` environment variable and check that it was created in the same region as the endpoint you call.
2. For Coding Plan users, use the plan-specific API key (it starts with `sk-sp-`) and confirm that your subscription is active.
3. Check that the base URL matches your region and plan:
   - Standard (China): `https://dashscope.aliyuncs.com/compatible-mode/v1`
   - Coding Plan (China): `https://coding.dashscope.aliyuncs.com/v1`
   - Coding Plan (Global): `https://coding-intl.dashscope.aliyuncs.com/v1`
4. Verify workspace permissions in the Alibaba Cloud console under Model Square > Apply Now > Select Model.
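
Steps 1 to 3 can be checked in code before any request is sent. The sketch below is illustrative and not an official SDK feature: `resolve_base_url` and `check_key` are hypothetical helper names, the URLs are the ones listed above, and the key rules assume (per the FAQ below) that Coding Plan keys start with `sk-sp-` and only work against the Coding Plan base URLs.

```python
import os
from typing import List, Optional

# Base URLs from the list above, keyed by (plan, region). Hypothetical helper;
# only the URL values come from this guide.
BASE_URLS = {
    ("standard", "china"): "https://dashscope.aliyuncs.com/compatible-mode/v1",
    ("coding", "china"): "https://coding.dashscope.aliyuncs.com/v1",
    ("coding", "global"): "https://coding-intl.dashscope.aliyuncs.com/v1",
}

CODING_PLAN_PREFIX = "sk-sp-"


def resolve_base_url(plan: str, region: str) -> str:
    """Return the base URL for a plan/region pair, or fail loudly."""
    try:
        return BASE_URLS[(plan, region)]
    except KeyError:
        raise ValueError(f"No base URL listed for plan={plan!r}, region={region!r}") from None


def check_key(api_key: Optional[str], plan: str) -> List[str]:
    """Return a list of problems with the key; an empty list means none were found."""
    if not api_key:
        return ["API key is missing (is DASHSCOPE_API_KEY set?)"]
    problems = []
    if plan == "coding" and not api_key.startswith(CODING_PLAN_PREFIX):
        problems.append("Coding Plan requires the plan-specific key starting with 'sk-sp-'")
    if plan == "standard" and api_key.startswith(CODING_PLAN_PREFIX):
        problems.append("A Coding Plan key ('sk-sp-') is being sent to the standard endpoint")
    return problems


if __name__ == "__main__":
    plan, region = "standard", "china"
    print(resolve_base_url(plan, region))
    print(check_key(os.environ.get("DASHSCOPE_API_KEY"), plan) or "key looks consistent with the plan")
```

These checks only catch configuration mistakes; a key that passes can still be expired or lack workspace permissions, which only a real request reveals.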

**Verification**
- Send a simple test request using `curl` or the Python SDK. A successful response will return a `200 OK` status and a valid JSON payload with the model's output.
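
A minimal Python version of that test request, using only the standard library so it has no SDK dependency. It assumes the OpenAI-compatible `/chat/completions` route used by the `curl` example under Problem 2 and the `qwen-plus` model; `build_test_request` and `send_test_request` are illustrative names, not part of any SDK.

```python
import json
import os
import urllib.error
import urllib.request

BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"  # Standard (China); change per region/plan


def build_test_request(api_key: str, base_url: str = BASE_URL, model: str = "qwen-plus") -> urllib.request.Request:
    """Build a one-message chat completion request without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": "Hello"}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def send_test_request(req: urllib.request.Request, timeout: float = 30.0):
    """Send the request and return (status, body); body is parsed JSON when possible."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # 401/403/404 responses land here; the body usually says which check failed.
        raw = err.read().decode("utf-8", errors="replace")
        try:
            return err.code, json.loads(raw)
        except json.JSONDecodeError:
            return err.code, raw


if __name__ == "__main__" and os.environ.get("DASHSCOPE_API_KEY"):
    status, payload = send_test_request(build_test_request(os.environ["DASHSCOPE_API_KEY"]))
    print(status, payload)
```

A `200` status with a `choices` entry in the payload means the key, region, and base URL agree; any other status comes back with the error body for inspection instead of raising.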

### Problem 2: Rate Limiting and Throttling

**Symptoms**
- Error message: `429 Rate limiting: Too many requests have been made.`
- Error message: `Throttling.RateQuota: Request rate exceeded (RPM exceeded).`
- Error message: `Throttling.AllocationQuota: Token usage exceeded (TPM exceeded).`
- Error message: `Throttling.BurstRate: Traffic growth rate exceeded (Traffic Burst).`
- Behavior: Requests fail intermittently or consistently during high-traffic periods.

**Root Cause**
- The application exceeds the configured Requests Per Minute (RPM) or Tokens Per Minute (TPM) quotas for the specific model.
- Sudden spikes in traffic trigger the burst rate limiter.
- Coding Plan quotas (5-hour, weekly, or monthly) have been exhausted.

**Solution**
1. Implement exponential backoff and retry logic using libraries like `tenacity` in Python.
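```python
import openai
from openai import OpenAI
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

# Transient failures worth retrying: 429 throttling, 5xx server errors, and network errors.
RETRYABLE_ERRORS = (openai.RateLimitError, openai.InternalServerError, openai.APIConnectionError)


@retry(
    wait=wait_random_exponential(min=1, max=60),      # randomized exponential backoff, capped at 60 s
    stop=stop_after_attempt(6),                       # give up after 6 attempts in total
    retry=retry_if_exception_type(RETRYABLE_ERRORS),  # 400/401/403/404 are not retried
    reraise=True,                                     # re-raise the last API error, not tenacity.RetryError
)
def chat_with_retry(client: OpenAI, model: str, messages: list):
    return client.chat.completions.create(model=model, messages=messages)

# Example client for the standard (China) endpoint:
# client = OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"],
#                 base_url="https://dashscope.aliyuncs.com/compatible-mode/v1")
```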
2. Use server-side queuing by adding the `X-DashScope-Wait-Timeout` header to allow the server to queue requests instead of rejecting them immediately.
```bash
curl -X POST "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions" \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-Wait-Timeout: 30" \
  -d '{"model": "qwen-plus", "messages": [{"role": "user", "content": "Hello"}]}'
```
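
The same header can be sent from Python. This sketch uses the standard library to stay dependency-free. It assumes the header value is a number of seconds (the `curl` example uses `30`) and, as a design choice of this sketch, pads the client-side socket timeout above that value so the client does not give up while the server is still holding the request in its queue. `queued_chat_request` and the 60-second margin are hypothetical.

```python
import json
import urllib.request

URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def queued_chat_request(api_key: str, wait_seconds: int = 30, model: str = "qwen-plus"):
    """Build a request that asks the server to queue it for up to `wait_seconds`.

    Returns (request, client_timeout). client_timeout is the socket timeout to pass
    to urlopen; it exceeds the server-side wait (assumed to be in seconds) by a
    margin that leaves room for the model to generate its answer.
    """
    body = {"model": model, "messages": [{"role": "user", "content": "Hello"}]}
    req = urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # urllib normalizes header-name case; HTTP header names are case-insensitive.
            "X-DashScope-Wait-Timeout": str(wait_seconds),
        },
        method="POST",
    )
    client_timeout = wait_seconds + 60  # hypothetical margin for generation time
    return req, client_timeout


# Usage (sends a real request):
# req, timeout = queued_chat_request(os.environ["DASHSCOPE_API_KEY"])
# with urllib.request.urlopen(req, timeout=timeout) as resp:
#     print(json.loads(resp.read()))
```
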
3. Implement a fallback model strategy. If the primary model returns a 429 error, automatically route the request to a backup model with independent quotas (e.g., fall back from `qwen-plus` to `qwen-flash`).
4. For Coding Plan users, monitor your 5-hour, weekly, and monthly quotas. If exhausted, wait for the quota replenishment cycle or upgrade your plan.
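
Step 3 can be expressed as a small wrapper that walks an ordered list of models and moves on only for errors you choose to treat as quota or availability problems. This is a sketch, not a Model Studio feature: `call_with_fallback` is a hypothetical helper, and the model order mirrors the `qwen-plus` to `qwen-flash` example above. Pass it the OpenAI SDK error types you want to fall back on, such as `openai.RateLimitError`.

```python
from typing import Callable, Optional, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_fallback(
    call: Callable[[str], T],
    models: Sequence[str],
    fallback_on: Tuple[Type[BaseException], ...],
) -> T:
    """Call `call(model)` for each model in order until one succeeds.

    Only exceptions listed in `fallback_on` move on to the next model; anything
    else (for example a 400 validation error) is raised immediately, because a
    different model will not fix a malformed request.
    """
    if not models:
        raise ValueError("at least one model is required")
    last_error: Optional[BaseException] = None
    for model in models:
        try:
            return call(model)
        except fallback_on as exc:
            last_error = exc  # remember the error and try the next model
    assert last_error is not None
    raise last_error  # every model failed with a fallback-eligible error


# Usage with the OpenAI-compatible client (assumes `client` and `messages` exist):
# import openai
# response = call_with_fallback(
#     lambda m: client.chat.completions.create(model=m, messages=messages),
#     ["qwen-plus", "qwen-flash"],
#     (openai.RateLimitError,),
# )
```

If every model is throttled, the last error is re-raised so the caller can report it or schedule a later retry.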

**Verification**
- Monitor the application logs to ensure 429 errors are caught and retried successfully, or that fallback models are invoked without failing the user request.

### Problem 3: Request Validation and Routing Errors

**Symptoms**
- Error message: `400 Bad request: The request body or parameters are invalid.`
- Error message: `400 Input length exceeds the allowed range.`
- Error message: `404 Resource not found: The requested model, workspace, or resource does not exist.`
- Error message (Coding Plan): `404 Base URL path is incorrect.`
- Behavior: The API rejects the request due to malformed payloads, excessive context length, or incorrect endpoint routing.

**Root Cause**
- The `messages` array is malformed, is missing required fields, or contains more than the maximum of 100 messages.
- The input prompt exceeds the model's context window limit.
- The model ID is misspelled or not available in the current workspace.
- For Coding Plan integrations with Anthropic-compatible tools such as Claude Code, the base URL path is incorrect (e.g., `/v1` is used instead of `/apps/anthropic`).

**Solution**
1. Check the `messages` array structure. Ensure it contains valid role and content pairs and does not exceed 100 messages.
2. If encountering context length errors, truncate the conversation history, create a new session, or switch to a model with a larger context window.
3. Verify the `model` parameter matches the exact model ID (e.g., `qwen-plus`, `qwen-max-2025-01-25`).
4. For Coding Plan Claude Code integration, ensure the base URL is set correctly:
   - China: `https://coding.dashscope.aliyuncs.com/apps/anthropic`
   - Global: `https://coding-intl.dashscope.aliyuncs.com/apps/anthropic`
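
Step 1 can be checked before the request leaves your application. This sketch enforces only the rules stated in this guide (each entry needs a role and content, at most 100 messages). The allowed role set and the exemption for assistant messages that carry `tool_calls` are assumptions based on the OpenAI-compatible format, and `validate_messages` is a hypothetical helper.

```python
from typing import Any, List

MAX_MESSAGES = 100  # limit stated in this guide
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}  # assumption: OpenAI-compatible roles


def validate_messages(messages: Any) -> List[str]:
    """Return human-readable problems; an empty list means the array looks well-formed."""
    if not isinstance(messages, list) or not messages:
        return ["messages must be a non-empty list"]
    problems = []
    if len(messages) > MAX_MESSAGES:
        problems.append(f"{len(messages)} messages exceeds the limit of {MAX_MESSAGES}")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"messages[{i}] is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"messages[{i}] has invalid role {msg.get('role')!r}")
        # Assumption: assistant turns that only carry tool calls may omit content.
        if "content" not in msg and "tool_calls" not in msg:
            problems.append(f"messages[{i}] is missing 'content'")
    return problems
```

Logging the returned list before sending turns an opaque `400 Bad request` into a specific message about which entry is wrong.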

**Verification**
- Resend the corrected request. A `200 OK` response confirms the payload and routing are valid.

### Problem 4: Server-Side and Availability Issues

**Symptoms**
- Error message: `500 Internal server error: An unexpected error occurred on the server side.`
- Error message: `503 Service unavailable: The model is temporarily offline due to high load or maintenance.`
- Behavior: Requests fail with 5xx status codes, indicating backend infrastructure issues or model unavailability.

**Root Cause**
- Temporary backend infrastructure failures, network partitions, or ongoing maintenance on the Alibaba Cloud Model Studio platform.
- The specific model instance is overloaded or temporarily taken offline for updates.

**Solution**
1. Implement retry logic with exponential backoff (as shown in Problem 2) to handle transient 500 and 503 errors.
2. If the issue persists, check the Alibaba Cloud Service Health Dashboard for ongoing incidents or maintenance windows.
3. Configure a fallback model in your application architecture to maintain availability when the primary model returns 503 errors.

**Verification**
- Wait for a few minutes and retry the request. If the platform recovers, the request will succeed. If using a fallback, verify that the fallback model successfully processes the request during the primary model's downtime.

## FAQ

**Q: How do I handle rate limiting for high-concurrency applications?**
A: Implement a dual token bucket algorithm to manage both RPM and TPM quotas, use a concurrency semaphore to limit simultaneous requests, and apply traffic shaping to smooth out bursty traffic. Additionally, use the `X-DashScope-Wait-Timeout` header to enable server-side queuing.
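
A minimal sketch of the dual token bucket idea: one bucket refills at the RPM quota, the other at the TPM quota, and a request proceeds only when both can cover it. The quota values, the `DualTokenBucket` class, and the caller-supplied token estimate are assumptions for illustration; take your real limits from the Model Studio rate-limit documentation. A concurrency semaphore (for example `threading.BoundedSemaphore`) would sit alongside this to cap simultaneous requests.

```python
import threading
import time
from typing import Callable


class _Bucket:
    """Token bucket that refills continuously at `per_minute / 60` tokens per second."""

    def __init__(self, per_minute: float, now: float):
        self.capacity = per_minute
        self.tokens = per_minute  # start full
        self.rate = per_minute / 60.0
        self.updated = now

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def wait_time(self, amount: float) -> float:
        """Seconds until `amount` tokens are available (0.0 if available now)."""
        if self.tokens >= amount:
            return 0.0
        return (amount - self.tokens) / self.rate


class DualTokenBucket:
    """Admit a request only if both the RPM and the TPM buckets can cover it."""

    def __init__(self, rpm: float, tpm: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        now = clock()
        self._requests = _Bucket(rpm, now)
        self._tokens = _Bucket(tpm, now)
        self._lock = threading.Lock()

    def try_acquire(self, estimated_tokens: int) -> float:
        """Consume capacity and return 0.0, or return the seconds to wait before retrying."""
        if estimated_tokens > self._tokens.capacity:
            raise ValueError("request is larger than the whole TPM quota")
        with self._lock:
            now = self._clock()
            self._requests.refill(now)
            self._tokens.refill(now)
            wait = max(self._requests.wait_time(1), self._tokens.wait_time(estimated_tokens))
            if wait == 0.0:
                self._requests.tokens -= 1
                self._tokens.tokens -= estimated_tokens
            return wait

    def acquire(self, estimated_tokens: int) -> None:
        """Block until the request fits inside both quotas."""
        while True:
            wait = self.try_acquire(estimated_tokens)
            if wait == 0.0:
                return
            time.sleep(wait)
```

Call `acquire` with an estimate of prompt plus expected output tokens before each request; shaping traffic on the client this way keeps bursts from reaching the server-side limiter in the first place.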

**Q: What is the difference between standard API keys and Coding Plan API keys?**
A: Standard API keys are used for pay-as-you-go billing and general API access. Coding Plan API keys (which start with `sk-sp-`) are tied to a fixed monthly subscription with specific 5-hour, weekly, and monthly quotas, and require specific base URLs like `coding.dashscope.aliyuncs.com`.

**Q: How can I reduce token consumption and avoid `Throttling.AllocationQuota` errors?**
A: Optimize your prompts to be more concise, use context caching for repeated prefixes in multi-turn conversations, and truncate older messages in the `messages` array to keep the input token count within the TPM limits.
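
One way to apply the truncation advice is to keep the system prompt and drop the oldest turns until an estimated token count fits a budget. The four-characters-per-token estimate below is a rough heuristic assumed for illustration, not how Qwen tokenizers count; the `usage` field returned with each response can be used to calibrate it. `estimate_tokens` and `trim_to_budget` are hypothetical helpers.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(message: Message) -> int:
    """Very rough estimate: about 4 characters per token plus a small per-message overhead."""
    return len(message.get("content") or "") // 4 + 4


def trim_to_budget(messages: List[Message], budget: int) -> List[Message]:
    """Keep all system messages (moved to the front) and the newest turns that fit in `budget`."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    used = sum(estimate_tokens(m) for m in system)
    kept: List[Message] = []
    # Walk backwards so the most recent messages are kept first.
    for msg in reversed(rest):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```

If even the latest user message does not fit, the function returns only the system messages, so callers should check that the result still ends with the turn they intend to answer.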

**Q: Why am I getting a 404 error when using the Coding Plan with Claude Code?**
A: This usually occurs because the base URL path is incorrect. For Claude Code integration, you must use the `/apps/anthropic` path (e.g., `https://coding.dashscope.aliyuncs.com/apps/anthropic`) instead of the standard `/v1` path.

**Q: Does the platform support automatic retries for network and server errors?**
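A: No. The service does not retry failed requests on your behalf, so retries have to happen on the client. The OpenAI Python SDK retries some connection errors, 429 responses, and 5xx responses a small number of times by default (controlled by its `max_retries` option); for longer backoff windows, implement your own retry logic, for example with `tenacity` in Python, catching `RateLimitError`, `InternalServerError`, and `APIConnectionError` with exponential backoff.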
