# bailian-text

Part of **BAILIAN**

# Alibaba Cloud Model Studio Text and Code Generation

## Capabilities Overview

| Sub-capability | Models | API Pattern | Description |
|----------------|--------|-------------|-------------|
| Generate Text | qwen-plus, qwen-max, deepseek-v4-pro, glm-5.1, kimi-k2.6, MiniMax-M2.5, etc. | OpenAI Compatible | Generate text, chat completions, and reasoning using various large language models. |
| Deep Research | qwen-deep-research | Streaming | Conduct deep research and complex reasoning tasks with web search and planning. |
| Vision Understanding | qwen-vl-max, qwen-vl-plus, qvq-max, qwen3-vl-plus | OpenAI Compatible | Analyze and understand images and videos using multimodal vision models. |
| Code Generation | qwen3-coder-plus, qwen2.5-coder-32b-instruct | OpenAI Compatible | Generate, complete, and debug software code using specialized coding models. |
| Tool & Function Calling | qwen-plus, qwen-max, deepseek-v4-pro, glm-5.1 | OpenAI Compatible | Invoke external tools, code interpreters, and MCP servers. |
| Responses & Conversations | qwen-plus, qwen3.6-plus | OpenAI Compatible | Manage multi-turn conversations, context, and stateful responses via Responses API. |
| Deep Thinking | qwen-plus, deepseek-v4-pro, glm-5.1, kimi-k2-thinking | Streaming | Enable advanced reasoning and thinking modes for complex tasks. |
| Context Cache | qwen-plus, qwen-max, deepseek-v3.2, kimi-k2.6 | OpenAI Compatible | Cache long contexts to reduce latency and cost for repeated prompts. |
| Anthropic Compatible | qwen3.6-plus, qwen3-max, deepseek-v4-pro | Anthropic Compatible | Migrate Anthropic applications using the Messages API format. |
| Structured Output | qwen-plus, qwen-max, qwen-flash | OpenAI Compatible | Force the model to output valid JSON or structured formats. |
| Role Playing | qwen-plus-character, qwen-flash-character | OpenAI Compatible | Create character-driven conversations and role-playing scenarios. |
| Math Reasoning | qwen-math-plus, qwen2.5-math-72b-instruct | OpenAI Compatible | Solve complex mathematical problems using specialized math models. |
| Long Context | qwen-long | OpenAI Compatible | Process extremely long documents via file-id references. |

## Model Selection Guide

### Generate Text & Chat
| Model ID | API Pattern |
|----------|-------------|
| qwen-plus / qwen-max / qwen-turbo | OpenAI Compatible |
| qwen3.7-max / qwen3.6-plus / qwen3.5-flash | OpenAI Compatible |
| deepseek-v4-pro / deepseek-v4-flash / deepseek-r1 | OpenAI Compatible (Streaming) |
| glm-5.1 / glm-5 / glm-4.7 | OpenAI Compatible (Streaming) |
| kimi-k2.6 / kimi-k2-thinking | OpenAI Compatible (Streaming) |
| MiniMax-M2.5 / MiniMax-M2.1 | OpenAI Compatible (Streaming) |

### Vision & Multimodal
| Model ID | API Pattern |
|----------|-------------|
| qwen-vl-max / qwen-vl-plus | OpenAI Compatible |
| qwen3-vl-plus / qwen3-vl-flash | OpenAI Compatible |
| qvq-max / qvq-plus | OpenAI Compatible |
| qwen-vl-ocr | OpenAI Compatible |

### Code & Math
| Model ID | API Pattern |
|----------|-------------|
| qwen3-coder-plus / qwen3-coder-flash | OpenAI Compatible |
| qwen2.5-coder-32b-instruct / 14b / 7b | OpenAI Compatible |
| qwen-math-plus / qwen-math-turbo | OpenAI Compatible |

### Specialized Tasks
| Model ID | API Pattern |
|----------|-------------|
| qwen-deep-research | Streaming |
| qwen-plus-character / qwen-flash-character | OpenAI Compatible |
| tongyi-xiaomi-analysis-flash / pro | OpenAI Compatible |
| farui-plus (Legal) | Synchronous |

## API Calling Patterns

### Authentication
The recommended authentication method is a **Bearer token** passed in the `Authorization` header.
*   **Header Format**: `Authorization: Bearer $DASHSCOPE_API_KEY`
*   **Environment Variable**: `DASHSCOPE_API_KEY`
*   *Note for Anthropic Compatible API*: You can also use the `x-api-key: $DASHSCOPE_API_KEY` header.
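
The two header styles above can be sketched as a small helper. This is an illustrative sketch only: `build_auth_headers` is a hypothetical name, not part of any SDK, and the official SDKs set these headers for you.

```python
def build_auth_headers(api_key: str, anthropic_style: bool = False) -> dict:
    """Return HTTP headers for a request (hypothetical helper, not an SDK function)."""
    if not api_key:
        raise ValueError("API key is empty; set DASHSCOPE_API_KEY first")
    headers = {"Content-Type": "application/json"}
    if anthropic_style:
        # Alternative header accepted by the Anthropic Compatible API.
        headers["x-api-key"] = api_key
    else:
        # Default Bearer token header.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

In practice the key would come from `os.getenv("DASHSCOPE_API_KEY")` rather than being hard-coded.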

### Service Endpoints
API keys and endpoints are region-specific. Ensure your base URL matches the region where your API key was generated.

*   **China (Beijing)**: `https://dashscope.aliyuncs.com/compatible-mode/v1`
*   **Singapore (International)**: `https://dashscope-intl.aliyuncs.com/compatible-mode/v1`
*   **US (Virginia)**: `https://dashscope-us.aliyuncs.com/compatible-mode/v1`
*   **Anthropic Compatible (China)**: `https://dashscope.aliyuncs.com/apps/anthropic`
*   **DashScope Native (China)**: `https://dashscope.aliyuncs.com/api/v1`
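
Because a key only works against its own region, it can help to resolve the base URL from a single region setting. The sketch below assumes hypothetical region labels (`cn-beijing`, `ap-singapore`, `us-virginia`) chosen for illustration; only the URLs come from the list above.

```python
# OpenAI Compatible base URLs from the list above.
# The dictionary keys are hypothetical labels, not official region IDs.
BASE_URLS = {
    "cn-beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "ap-singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
}


def resolve_base_url(region: str) -> str:
    """Map a region label to its OpenAI Compatible base URL."""
    key = region.strip().lower()
    if key not in BASE_URLS:
        known = ", ".join(sorted(BASE_URLS))
        raise ValueError(f"Unknown region {region!r}; expected one of: {known}")
    return BASE_URLS[key]
```

The returned string can be passed as `base_url` when constructing the OpenAI client.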

### OpenAI Compatible Mode
OpenAI Compatible mode is the standard way to call models: use the official OpenAI SDKs and change only the `base_url` and `api_key`.
1. Initialize the client with the region-specific base URL.
2. Call `client.chat.completions.create()`.
3. Pass `model`, `messages`, and optional parameters like `temperature` or `tools`.

### Streaming Output
Enable real-time token-by-token delivery by setting `stream=True`.
*   **OpenAI SDK**: Iterate over the response generator. Each chunk contains a `delta` object with `content` and optionally `reasoning_content`.
*   **DashScope Native**: Set `stream=True` and `incremental_output=True`. Add the `X-DashScope-SSE: enable` header for HTTP/curl requests.
*   **Usage Tracking**: Pass `stream_options={"include_usage": True}` to receive token usage statistics in the final chunk.

### Deep Thinking Mode
For complex reasoning, models such as Qwen3, DeepSeek, GLM, and Kimi can emit an explicit thinking process before the final answer.
1. Enable via `extra_body={"enable_thinking": True}` (Python SDK) or as a top-level parameter in Node.js/curl.
2. The response will include a `reasoning_content` field in the `delta` (streaming) or `message` (non-streaming) before the final `content` is generated.
3. Thinking tokens are billed as output tokens.

### Tool and Function Calling
1. Define your tools in the `tools` array with `type: "function"` and a JSON schema for `parameters`.
2. The model will return `tool_calls` in the response.
3. Execute the function locally and append the result to the `messages` array with `role: "tool"` and the corresponding `tool_call_id`.
4. Call the API again to get the final synthesized response.
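
Step 3 generalizes to responses that request several tools at once. The following is a minimal offline sketch, assuming the tool calls are available as plain dicts in the shape of the `tool_calls` JSON field; `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names introduced for illustration.

```python
import json


def get_current_weather(arguments: dict) -> str:
    """Stub tool that always reports sunny weather."""
    return f"{arguments['location']} is sunny today."


# Maps tool names (as declared in the `tools` array) to local functions.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_calls(tool_calls: list) -> list:
    """Execute every requested tool and return one `role: "tool"` message per call."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        handler = TOOL_REGISTRY.get(name)
        if handler is None:
            content = f"Unknown tool: {name}"
        else:
            try:
                # Arguments arrive as a JSON-encoded string.
                arguments = json.loads(call["function"]["arguments"] or "{}")
                content = handler(arguments)
            except (json.JSONDecodeError, KeyError) as exc:
                content = f"Tool call failed: {exc!r}"
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": content}
        )
    return results
```

The returned messages are appended to `messages` after the assistant message, before the follow-up API call in step 4. Reporting failures back as tool content, rather than raising, is a design choice that lets the model see and react to the error.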

### Anthropic Compatible Mode
To migrate existing Anthropic applications:
1. Change the SDK base URL to `https://dashscope.aliyuncs.com/apps/anthropic`.
2. Use the `messages.create()` method.
3. Pass the `system` prompt as a top-level parameter, not inside the `messages` array.
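
Step 3 is the change that most often trips up migrations in the other direction, when OpenAI-style message lists are reused with the Messages API. A minimal sketch of the conversion, assuming string-only message content and only `system`, `user`, and `assistant` roles; `to_anthropic_request` is a hypothetical helper name.

```python
def to_anthropic_request(openai_messages: list) -> dict:
    """Split OpenAI-style messages into a top-level `system` and a `messages` list."""
    system_parts = []
    messages = []
    for msg in openai_messages:
        if msg["role"] == "system":
            # System prompts move out of the array into a top-level parameter.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    request = {"messages": messages}
    if system_parts:
        request["system"] = "\n\n".join(system_parts)
    return request
```

The resulting dict can be unpacked into `client.messages.create(model=..., max_tokens=..., **request)`. Tool messages and multimodal content are out of scope for this sketch.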

## Parameter Reference

### Chat & Text Generation (OpenAI Compatible)

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| model | string | Yes | - | See Model Selection Guide | The model ID to use for inference. |
| messages | array | Yes | - | Max length varies by model context window | Conversation history. Roles: system, user, assistant, tool. |
| stream | boolean | No | False | True / False | Enable Server-Sent Events (SSE) streaming. |
| temperature | float | No | 1.0 | Range: [0, 2) | Controls randomness. Higher values increase diversity. |
| top_p | float | No | 1.0 | Range: (0, 1.0] | Nucleus sampling threshold. |
| max_tokens | integer | No | Model max | Model-specific limit | Maximum number of tokens to generate. |
| response_format | object | No | `{"type": "text"}` | `text` / `json_object` | Force JSON output. Prompt must contain the word "json". |
| enable_thinking | boolean | No | False | True / False | Enables reasoning process. Use via `extra_body` in Python. |
| tools | array | No | - | Max 20 tools recommended | List of function definitions or built-in tools (e.g., code_interpreter). |
| tool_choice | string / object | No | auto | `auto` / `none` / `required` / specific function | Controls tool selection behavior. |
| cache_control | object | No | - | `{"type": "ephemeral"}` | Marker for explicit context caching in message content. |
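
The range constraints in the table can be checked client-side before a request is sent, which avoids a round trip that ends in a 400 error. This sketch encodes only the `temperature` and `top_p` ranges listed above; `validate_sampling_params` is a hypothetical helper, and individual models may enforce different limits.

```python
def validate_sampling_params(temperature=None, top_p=None) -> list:
    """Return a list of human-readable problems; an empty list means the values pass."""
    problems = []
    # temperature range from the table: [0, 2), upper bound excluded.
    if temperature is not None and not (0 <= temperature < 2):
        problems.append("temperature must be in [0, 2)")
    # top_p range from the table: (0, 1.0], lower bound excluded.
    if top_p is not None and not (0 < top_p <= 1.0):
        problems.append("top_p must be in (0, 1.0]")
    return problems
```

Parameters left as `None` are skipped, so the server-side defaults still apply.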

### Vision & Multimodal

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| image_url | object | No | - | Max 10 MB per image | URL of the image to analyze. Used in `content` array. |
| video_url | object | No | - | Max 1 GB (Qwen2.5-VL) | URL of the video file. |
| fps | number | No | 2.0 | Range: 0.1 - 10.0 | Frame extraction frequency for video input. |
| vl_high_resolution_images | boolean | No | False | True / False | Enables high-res image processing (up to 16K tokens). |

### Deep Research

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| output_format | string | No | model_detailed_report | `model_detailed_report` / `model_summary_report` | Format and detail level of the research report. |
| enable_feedback | boolean | No | True | True / False | If false, skips the follow-up question phase. |

## Code Examples

### Basic Chat Completion - Python - China Region

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]
)
print(completion.choices[0].message.content)
```

### Streaming with Deep Thinking - Python - China Region

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you"}]

completion = client.chat.completions.create(
    model="qwen-plus",
    messages=messages,
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={"include_usage": True},
)

reasoning_content = ""
answer_content = ""
is_answering = False
print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")

for chunk in completion:
    if not chunk.choices:
        print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
        print(chunk.usage)
        continue

    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content

    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content
```

### Vision Understanding (Image Input) - Python - China Region

```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

completion = client.chat.completions.create(
    model="qwen-vl-max",
    messages=[
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},
            {"type": "text", "text": "What is depicted in the image?"}
        ]}
    ]
)
print(completion.choices[0].message.content)
```

### Tool Calling (Function Calling) - Python - China Region

```python
from openai import OpenAI
import json
import os

client = OpenAI(api_key=os.getenv("DASHSCOPE_API_KEY"), base_url="https://dashscope.aliyuncs.com/compatible-mode/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Check weather in a city.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "City or county."}},
            "required": ["location"]
        }
    }
}]

def get_current_weather(arguments):
    location = arguments["location"]
    return f"{location} is sunny today."

messages = [{"role": "user", "content": "What is the weather in Beijing?"}]
response = client.chat.completions.create(
    model="qwen-plus",
    messages=messages,
    tools=tools
)

assistant_output = response.choices[0].message
messages.append(assistant_output)

if assistant_output.tool_calls:
    tool_call = assistant_output.tool_calls[0]
    arguments = json.loads(tool_call.function.arguments)
    tool_result = get_current_weather(arguments)
    
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": tool_result
    })
    
    final_response = client.chat.completions.create(model="qwen-plus", messages=messages)
    print(final_response.choices[0].message.content)
```

### Anthropic Compatible API - Python - China Region

```python
import anthropic
import os

client = anthropic.Anthropic(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/apps/anthropic",
)

message = client.messages.create(
    model="qwen3.6-plus",
    max_tokens=1024,
    system="You are a helpful assistant",
    messages=[
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    thinking={"type": "disabled"},
)

print(message.content[0].text)
```

### DashScope Native Streaming - curl - China Region

```bash
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen-plus",
    "input": {"messages": [{"role": "user", "content": "Who are you?"}]},
    "parameters": {"enable_thinking": true, "incremental_output": true, "result_format": "message"}
}'
```

## Response Format

### Standard Chat Completion
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1718956423,
  "model": "qwen-plus",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I am Qwen, a large-scale language model developed by Alibaba Cloud."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 10,
    "total_tokens": 22
  }
}
```

**Key Fields**:
*   `choices[].message.content` — The generated text response.
*   `choices[].message.reasoning_content` — The internal thinking process (only present if `enable_thinking` is true).
*   `choices[].message.tool_calls` — Array of tool calls requested by the model.
*   `usage.prompt_tokens` / `usage.completion_tokens` — Token consumption for billing.
*   `usage.prompt_tokens_details.cached_tokens` — Number of tokens served from context cache.
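
As a rough illustration of how these fields fit together, the sketch below pulls them out of a response that has already been parsed into a plain dict (for example with `json.loads` on a raw HTTP body). `summarize_completion` is a hypothetical helper; optional fields are read defensively because they are absent unless the corresponding feature is used.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the key fields from a parsed chat completion response."""
    message = response["choices"][0]["message"]
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return {
        "content": message.get("content"),
        # Only present when thinking is enabled.
        "reasoning_content": message.get("reasoning_content"),
        # Only present when the model requests tools.
        "tool_calls": message.get("tool_calls") or [],
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        # Only present when context cache is involved.
        "cached_tokens": details.get("cached_tokens", 0),
    }
```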

### Streaming Chunk Format
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1726132850,
  "model": "qwen-plus",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "I am",
        "reasoning_content": "Let me introduce myself..."
      },
      "finish_reason": null
    }
  ]
}
```

The chunk above shows both `delta` fields together for illustration. As described under Deep Thinking Mode, `reasoning_content` is streamed before `content`, so a given chunk typically carries only one of the two.

## Error Handling

| Code | Description | Recommended Action |
|------|-------------|--------------------|
| 400 | Bad Request - Invalid parameters, malformed JSON, or missing required fields. | Check request body structure, model ID, and parameter constraints. |
| 401 | Unauthorized - Invalid or missing API key. | Verify `DASHSCOPE_API_KEY` is set correctly, passed in the header, and issued for the same region as the base URL. |
| 403 | Forbidden - Access denied due to region mismatch or insufficient permissions. | Ensure your API key region matches the base URL endpoint. |
| 429 | Too Many Requests - Rate limit exceeded. | Implement exponential backoff and retry logic. Reduce QPS. |
| 500 | Internal Server Error - Unexpected server-side issue. | Retry the request after a short delay. |
| InvalidParameter | Specific parameter validation failed. | Review the error message for the exact invalid field. |

### Rate Limits & Retry
*   **Default Limits**: 100 QPS (Queries Per Second) per model for most standard models.
*   **Concurrency**: Max 10 concurrent WebSocket or streaming connections per client.
*   **Retry Strategy**: For `429` and `5xx` errors, use exponential backoff (e.g., wait 1s, 2s, 4s) before retrying. Check for the `Retry-After` header if provided.
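
The retry strategy above can be sketched as a small wrapper. This is a minimal sketch under stated assumptions: `ApiError` is a hypothetical exception standing in for whatever your HTTP client or SDK raises, and `retry_after` represents the parsed `Retry-After` header in seconds when the server sends one.

```python
import time


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status and optional Retry-After seconds."""

    def __init__(self, status: int, retry_after: float = None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def call_with_backoff(func, max_retries: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call func(), retrying 429 and 5xx errors with delays of 1s, 2s, 4s, ..."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except ApiError as err:
            retryable = err.status == 429 or 500 <= err.status < 600
            if not retryable or attempt == max_retries:
                raise
            # Prefer the server's hint; otherwise double the delay each attempt.
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * (2 ** attempt)
            sleep(delay)
```

The `sleep` parameter is injectable so the wrapper can be exercised in tests without waiting. Production code often adds random jitter to the delay so that many clients do not retry in lockstep.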

## Requirements

*   **OpenAI SDK**: `pip install "openai>=1.0.0"` (quote the requirement so the shell does not treat `>` as a redirect)
*   **DashScope SDK**: `pip install "dashscope>=1.14.0"` (Python) or `dashscope>=2.19.4` (Java)
*   **Anthropic SDK**: `pip install "anthropic>=1.0.0"`
*   **Environment Variable**: `export DASHSCOPE_API_KEY=your_api_key`
*   **Node.js**: Requires v18+ for native `fetch` and ES Module support in OpenAI SDK.

## FAQ

**Q: Why am I getting a 401 Unauthorized error when my API key is correct?**
A: API keys in Alibaba Cloud Model Studio are strictly region-specific. An API key generated for the China (Beijing) region will not work with the Singapore or US (Virginia) endpoints. Ensure your `base_url` matches the region where you created the API key.

**Q: How is Deep Thinking mode billed?**
A: When `enable_thinking` is set to true, the model generates a reasoning process (`reasoning_content`) before the final answer. These reasoning tokens are counted and billed as **output tokens** at the standard output token rate for the selected model.

**Q: How do I use Context Caching to save costs on long prompts?**
A: You can use Explicit Cache by adding a `cache_control: {"type": "ephemeral"}` marker inside the `content` array of your system or user message. The minimum cacheable prompt length is 1,024 tokens. Cached tokens are billed at a significantly discounted rate (e.g., 10% to 20% of the standard input price).
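
A minimal sketch of where the marker goes, assuming the text-part layout used by the content arrays elsewhere in this document; `cached_system_message` is a hypothetical helper name.

```python
def cached_system_message(long_text: str) -> dict:
    """Build a system message whose text part carries the explicit cache marker."""
    return {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": long_text,
                # Marks the prompt up to and including this part as cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }
```

The helper does not count tokens, so it cannot confirm that the prompt reaches the 1,024-token minimum. Check `usage.prompt_tokens_details.cached_tokens` in the response to see whether a later request actually hit the cache.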

**Q: Can I force the model to output strict JSON?**
A: Yes. Set `response_format={"type": "json_object"}` in your request parameters. You must also explicitly include the word "json" in your system or user prompt to instruct the model, otherwise the API will return a 400 Bad Request error.
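
The prompt requirement is easy to forget, so a client-side guard can catch it before the request is sent. In this sketch `build_json_request` is a hypothetical helper, and the case-insensitive match on "json" is an assumption; the rule above only says the word must appear.

```python
def _text_of(content) -> str:
    """Flatten a string or a list of content parts into plain text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return " ".join(p.get("text", "") for p in content if isinstance(p, dict))
    return ""


def build_json_request(model: str, messages: list) -> dict:
    """Return request kwargs with JSON mode enabled, or raise if no prompt mentions json."""
    mentions_json = any(
        "json" in _text_of(m.get("content")).lower()
        for m in messages
        if m.get("role") in ("system", "user")
    )
    if not mentions_json:
        raise ValueError('A system or user message must contain the word "json"')
    return {
        "model": model,
        "messages": messages,
        "response_format": {"type": "json_object"},
    }
```

The returned dict can be unpacked into `client.chat.completions.create(**kwargs)`.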

## Pricing & Billing

### Billing Model
Billing is primarily **per-token** (pay-as-you-go). Input tokens (prompts, system messages, tool definitions, cached context) and output tokens (generated text, reasoning content) are billed separately based on the specific model's tier.

### Price Reference

| Model / Tier | Input Price | Output Price |
|--------------|-------------|--------------|
| qwen-plus | CNY 0.002 / 1K tokens | CNY 0.004 / 1K tokens |
| qwen-max | CNY 0.004 / 1K tokens | CNY 0.008 / 1K tokens |
| qwen-turbo | CNY 0.0005 / 1K tokens | CNY 0.001 / 1K tokens |
| qwen-vl-max | CNY 0.005 / 1K tokens | CNY 0.010 / 1K tokens |
| qwen3-coder-plus | CNY 0.002 / 1K tokens | CNY 0.004 / 1K tokens |
| deepseek-v4-pro | CNY 0.002 / 1K tokens | CNY 0.004 / 1K tokens |
| glm-5.1 | CNY 0.002 / 1K tokens | CNY 0.004 / 1K tokens |
| kimi-k2.6 | CNY 0.002 / 1K tokens | CNY 0.004 / 1K tokens |
| MiniMax-M2.5 | CNY 0.002 / 1K tokens | CNY 0.004 / 1K tokens |

### Free Tier
*   Most standard models include a free quota of **1 million tokens** after you activate Alibaba Cloud Model Studio. Depending on the model, the quota either renews monthly or expires after a 90-day validity period.
*   Third-party models (GLM, Kimi, MiniMax) may have specific free quotas (e.g., 1 million tokens per model).

### Usage Limits
*   **Context Window**: Varies by model (e.g., 8K, 32K, 128K, up to 10M for qwen-long).
*   **Max Output**: Typically 2K to 8K tokens per request.
*   **File Uploads (qwen-long)**: Max 10,000 files or 100 GB per account; max 100 files per API request.

### Billing Notes
*   **Context Cache Discounts**: Explicit cache hits are billed at ~10% of the standard input price. Implicit cache hits are billed at ~20%.
*   **Tool Definitions**: The JSON schema provided in the `tools` array is counted and billed as input tokens.
*   **Streaming**: Billing is identical to non-streaming calls. If a stream is interrupted, you are only billed for the output tokens generated before the stop request.
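
The per-token arithmetic and the cache discount can be combined in a short estimate. This sketch copies three rows from the Price Reference table and uses the approximate 10% explicit-cache rate from the notes above. It assumes cached tokens are a subset of the input tokens; `estimate_cost` is a hypothetical helper, not a billing API.

```python
from decimal import Decimal

# (input, output) prices in CNY per 1K tokens, copied from the Price Reference table.
PRICES_CNY_PER_1K = {
    "qwen-plus": (Decimal("0.002"), Decimal("0.004")),
    "qwen-max": (Decimal("0.004"), Decimal("0.008")),
    "qwen-turbo": (Decimal("0.0005"), Decimal("0.001")),
}


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0,
                  cache_rate=Decimal("0.1")):
    """Estimate the cost of one request in CNY, discounting cached input tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    input_price, output_price = PRICES_CNY_PER_1K[model]
    uncached = input_tokens - cached_tokens
    total = (
        Decimal(uncached) * input_price
        + Decimal(cached_tokens) * input_price * cache_rate
        + Decimal(output_tokens) * output_price
    )
    return total / 1000
```

For example, 12,000 input tokens and 3,000 output tokens on `qwen-plus` come to (12 × 0.002) + (3 × 0.004) = CNY 0.036. If 10,000 of those input tokens are explicit cache hits, the estimate drops to CNY 0.018. Pass `cache_rate=Decimal("0.2")` to model implicit cache hits.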
