# bailian-translation

Part of **BAILIAN**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for the following routing skills. If you're unsure which path to take, check the corresponding routing skill:

> - [Transcribe, recognize, and translate speech audio](../../intent/bailian-transcribe-speech/SKILL.md)
> - [Extract and understand information from documents and images](../../intent/bailian-extract-documents/SKILL.md)

# Bailian Translation and Localization

## Capabilities Overview

| Sub-capability | Models | API Pattern | Description |
|--------|------|----------|------|
| Realtime Speech Translation | qwen3-livetranslate-flash, qwen3.5-livetranslate-flash-realtime, gummy-realtime-v1, + 5 more | WebSocket / Streaming | Translate live audio and video streams into text or speech in another language. |
| File Translation | qwen3-livetranslate-flash, qwen3-livetranslate-flash-2025-12-01 | OpenAI Compatible (Streaming) | Translate audio and video files into different languages. |
| Real-time Translation | qwen3.5-livetranslate-flash-realtime, gummy-realtime-v1, gummy-chat-v1, + 4 more | WebSocket | Translate live audio and video streams in real-time. |
| Speech-to-Speech | qwen3.5-omni-plus-realtime, qwen-omni-turbo, qwen3-omni-flash, + 23 more | WebSocket | Translate spoken audio directly into synthesized speech in another language. |
| Machine Translation | qwen-mt-plus, qwen-mt-flash, qwen-mt-lite, + 2 more | OpenAI Compatible | Translate text accurately using Qwen-MT models. |
| Image Translation | qwen-mt-image | Async Task | Translate text embedded within images while preserving the original layout. |
| Optical Character Recognition | qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20, qwen-vl-ocr, + 3 more | OpenAI Compatible | Extract and recognize text from images using Qwen-OCR models. |
| Conversation Analysis | tongyi-xiaomi-analysis-flash, tongyi-xiaomi-analysis-pro | OpenAI Compatible | Analyze customer service or chat transcripts to extract insights, sentiment, and summaries. |

## Model Selection Guide

### Realtime Speech Translation

| Model ID | API Pattern |
|---------|----------|
| qwen3-livetranslate-flash | Streaming |
| qwen3-livetranslate-flash-2025-12-01 | Streaming |
| qwen3-livetranslate-flash-api | Streaming |
| qwen3-asr-flash-realtime | WebSocket |
| qwen3.5-livetranslate-flash-realtime | WebSocket |
| qwen3-livetranslate-flash-realtime | WebSocket |
| gummy-realtime-v1 | WebSocket |
| gummy-chat-v1 | WebSocket |

### File Translation

| Model ID | API Pattern |
|---------|----------|
| qwen3-livetranslate-flash | OpenAI Compatible (Streaming) |
| qwen3-livetranslate-flash-2025-12-01 | OpenAI Compatible (Streaming) |

### Real-time Translation

| Model ID | API Pattern |
|---------|----------|
| qwen3.5-livetranslate-flash-realtime | WebSocket |
| qwen3.5-livetranslate-flash-realtime-2026-05-19 | WebSocket |
| qwen3-livetranslate-flash-realtime | WebSocket |
| qwen3-livetranslate-flash-realtime-2025-09-22 | WebSocket |
| qwen3-asr-flash-realtime | WebSocket |
| gummy-realtime-v1 | WebSocket |
| gummy-chat-v1 | WebSocket |

### Speech-to-Speech

| Model ID | API Pattern |
|---------|----------|
| qwen3.5-omni-plus-realtime | WebSocket |
| qwen3.5-omni-plus | WebSocket |
| qwen3.5-omni-flash-realtime | WebSocket |
| qwen3.5-omni-flash | WebSocket |
| qwen3-omni-flash-realtime | WebSocket |
| qwen3-omni-flash | WebSocket |
| qwen3.5-livetranslate-flash-realtime | WebSocket |
| qwen3-livetranslate-flash | WebSocket |
| qwen2.5-omni-7b | WebSocket |
| qwen-omni-turbo | WebSocket |
| qwen-omni-turbo-latest | WebSocket |
| qwen-omni-turbo-2025-03-26 | WebSocket |
| qwen-omni-turbo-realtime | WebSocket |
| qwen-omni-turbo-realtime-latest | WebSocket |
| qwen-omni-turbo-realtime-2025-05-08 | WebSocket |
| qwen3-livetranslate-flash-2025-12-01 | WebSocket |
| qwen3-livetranslate-flash-realtime-2025-09-22 | WebSocket |
| qwen3-omni-flash-2025-09-15 | WebSocket |
| qwen3-omni-flash-2025-12-01 | WebSocket |
| qwen3-omni-flash-realtime-2025-09-15 | WebSocket |
| qwen3-omni-flash-realtime-2025-12-01 | WebSocket |
| qwen3.5-livetranslate-flash-realtime-2026-05-19 | WebSocket |
| qwen3.5-omni-flash-2026-03-15 | WebSocket |
| qwen3.5-omni-flash-realtime-2026-03-15 | WebSocket |
| qwen3.5-omni-plus-2026-03-15 | WebSocket |
| qwen3.5-omni-plus-realtime-2026-03-15 | WebSocket |

### Machine Translation

| Model ID | API Pattern |
|---------|----------|
| qwen-mt-plus | OpenAI Compatible |
| qwen-mt-flash | OpenAI Compatible |
| qwen-mt-lite | OpenAI Compatible |
| qwen-mt-lite-us | OpenAI Compatible |
| qwen-mt-turbo | OpenAI Compatible |

### Image Translation

| Model ID | API Pattern |
|---------|----------|
| qwen-mt-image | Async Task |

### Optical Character Recognition

| Model ID | API Pattern |
|---------|----------|
| qwen-vl-ocr-latest | OpenAI Compatible |
| qwen-vl-ocr-2025-11-20 | OpenAI Compatible |
| qwen-vl-ocr-2025-08-28 | OpenAI Compatible |
| qwen-vl-ocr-2025-04-13 | OpenAI Compatible |
| qwen-vl-ocr-2024-10-28 | OpenAI Compatible |
| qwen-vl-ocr | OpenAI Compatible |

### Conversation Analysis

| Model ID | API Pattern |
|---------|----------|
| tongyi-xiaomi-analysis-flash | OpenAI Compatible |
| tongyi-xiaomi-analysis-pro | OpenAI Compatible |

## API Calling Modes

### Authentication
The primary and recommended authentication method is the **Bearer Token**.
- Header format: `Authorization: Bearer $DASHSCOPE_API_KEY`
- Environment variable: `DASHSCOPE_API_KEY`
- Some legacy examples for DashScope native endpoints send `Authorization: $DASHSCOPE_API_KEY` without the `Bearer ` prefix. The Bearer format is the recommended one and is accepted by both the OpenAI-compatible and the current DashScope endpoints.
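
The header format above can be sketched as a small helper. This is a minimal illustration using only the standard library; the function name `build_auth_headers` is hypothetical and not part of any SDK.

```python
import os


def build_auth_headers(api_key=None):
    """Return HTTP headers using the recommended Bearer token format."""
    # Fall back to the DASHSCOPE_API_KEY environment variable when no key is passed.
    key = api_key or os.getenv("DASHSCOPE_API_KEY")
    if not key:
        raise ValueError("Set the DASHSCOPE_API_KEY environment variable.")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early on a missing key keeps the error local instead of surfacing later as a 401 response.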

### Service Endpoints
- **OpenAI Compatible (China / Beijing)**: `https://dashscope.aliyuncs.com/compatible-mode/v1`
- **OpenAI Compatible (International / Singapore)**: `https://dashscope-intl.aliyuncs.com/compatible-mode/v1`
- **DashScope Native (China)**: `https://dashscope.aliyuncs.com/api/v1/services/aigc/...`
- **WebSocket (China)**: `wss://dashscope.aliyuncs.com/api-ws/v1/realtime` or `wss://dashscope.aliyuncs.com/api-ws/v1/inference`
- **WebSocket (International)**: `wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime`
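
Because the HTTP and WebSocket endpoints must belong to the same region, it can help to select them together. The sketch below copies the URLs from the list above; the region keys (`"cn"`, `"intl"`) and the helper name `endpoints_for` are illustrative assumptions.

```python
# Endpoint values copied from the list above; the dictionary layout and
# region keys ("cn", "intl") are illustrative assumptions.
OPENAI_COMPATIBLE_BASE_URLS = {
    "cn": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "intl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
}
REALTIME_WS_URLS = {
    "cn": "wss://dashscope.aliyuncs.com/api-ws/v1/realtime",
    "intl": "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
}


def endpoints_for(region):
    """Return the (HTTP base URL, realtime WebSocket URL) pair for a region."""
    if region not in OPENAI_COMPATIBLE_BASE_URLS:
        raise ValueError(f"Unknown region: {region!r}")
    return OPENAI_COMPATIBLE_BASE_URLS[region], REALTIME_WS_URLS[region]
```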

### OpenAI Compatible Pattern
Used for Machine Translation, File Translation, OCR, and Conversation Analysis.
1. Send a standard POST request to the `/chat/completions` endpoint.
2. With the OpenAI Python SDK, pass non-standard parameters (such as `translation_options`) inside the `extra_body` dictionary. In Node.js or raw HTTP, pass them at the top level of the JSON payload.
3. For streaming, set `stream: true` and iterate over the Server-Sent Events (SSE) chunks.
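
To make the difference between `extra_body` and a raw HTTP payload concrete, here is a minimal sketch that builds (but does not send) a raw request with the standard library. The helper name `build_translation_request` is hypothetical; the model, endpoint, and parameter names come from this document.

```python
import json
import urllib.request

BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"


def build_translation_request(api_key, text, target_lang, source_lang="auto"):
    """Build (but do not send) a raw HTTP chat-completions request."""
    payload = {
        "model": "qwen-mt-plus",
        "messages": [{"role": "user", "content": text}],
        # Raw HTTP: non-standard parameters sit at the top level of the JSON
        # body, not inside an "extra_body" wrapper (that wrapper is SDK-only).
        "translation_options": {
            "source_lang": source_lang,
            "target_lang": target_lang,
        },
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; it is left out here so the sketch runs without network access.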

### WebSocket Pattern
Used for Realtime Speech Translation, Real-time Translation, and Speech-to-Speech.
1. Establish a WebSocket connection to the `/realtime` or `/inference` endpoint, passing the API key in the `Authorization` header.
2. Send a `session.update` JSON event to configure modalities, target language, voice, and hotwords.
3. Stream Base64-encoded audio chunks via `input_audio_buffer.append` events.
4. Listen for server events like `response.audio_transcript.done` (for translated text) and `response.audio.delta` (for synthesized audio).
5. Send `session.finish` or close the connection to end the session.
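
The client events in steps 2 and 3 can be sketched as plain JSON builders. The `session` field layout follows the parameter names listed in the Parameter Reference of this document (`session.modalities`, `session.translation.language`, `session.voice`, `session.translation.corpus.phrases`). The `audio` field name in the append event and both function names are assumptions, so check them against the client events reference before relying on them.

```python
import base64
import json


def session_update_event(target_lang, voice=None, phrases=None, with_audio=True):
    """Build a session.update event as a JSON string."""
    session = {
        "modalities": ["text", "audio"] if with_audio else ["text"],
        "translation": {"language": target_lang},
    }
    if voice:
        session["voice"] = voice
    if phrases:
        # Hotwords: source term -> preferred target translation.
        session["translation"]["corpus"] = {"phrases": phrases}
    return json.dumps({"type": "session.update", "session": session})


def audio_append_event(pcm_chunk):
    """Wrap one raw audio chunk in an input_audio_buffer.append event."""
    # Assumption: the Base64 payload is carried in an "audio" field.
    encoded = base64.b64encode(pcm_chunk).decode("ascii")
    return json.dumps({"type": "input_audio_buffer.append", "audio": encoded})
```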

### Async Task Pattern
Used for Image Translation.
1. Submit the task via POST to the DashScope native endpoint (e.g., `/image2image/image-synthesis`).
2. Include the header `X-DashScope-Async: enable` to trigger asynchronous processing.
3. The response returns a `task_id`.
4. Poll the task status via GET `/api/v1/tasks/{task_id}` until `task_status` is `SUCCEEDED`. Stop polling and report the error if it becomes `FAILED`.
5. Download the resulting image from the provided `image_url`.
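
The polling steps above can be sketched as a loop with a bounded number of attempts. This is a minimal illustration, not an official client: the function names, the `max_polls` cap, and the injectable `fetch` and `sleep` parameters (included so the loop can be exercised without network access) are all assumptions. The status values and the `output.image_url` field are the ones shown in the Response Format section.

```python
import json
import time
import urllib.request

TASKS_URL = "https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}"


def fetch_task(task_id, api_key):
    """GET the task status document and return it as a dict."""
    req = urllib.request.Request(
        TASKS_URL.format(task_id=task_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_image(task_id, api_key, fetch=fetch_task, interval=1.0,
                   max_polls=60, sleep=time.sleep):
    """Poll until the task succeeds and return output.image_url."""
    for _ in range(max_polls):
        output = fetch(task_id, api_key).get("output", {})
        status = output.get("task_status")
        if status == "SUCCEEDED":
            return output["image_url"]
        if status == "FAILED":
            raise RuntimeError(f"Task {task_id} failed: {output}")
        # PENDING / RUNNING: wait before asking again. A 1 s interval stays
        # within the default 1 QPS limit on the task status API.
        sleep(interval)
    raise TimeoutError(f"Task {task_id} did not finish after {max_polls} polls")
```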

## Parameter Reference

### Machine Translation (Qwen-MT)

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| model | string | Yes | - | One of: qwen-mt-plus, qwen-mt-flash, qwen-mt-lite, qwen-mt-lite-us, qwen-mt-turbo | Model name. |
| messages | array | Yes | - | Must contain exactly one user message | Input messages in chat format. |
| translation_options | object | Yes | - | - | Translation parameters including source_lang and target_lang. |
| translation_options.source_lang | string | No | auto | Language code or 'auto' | Source language code. Specifying improves accuracy. |
| translation_options.target_lang | string | Yes | - | Valid language code | Target language code (e.g., 'en', 'zh', 'English'). |
| translation_options.terms | array | No | - | Array of {source, target} objects | Glossary of terms for consistent translation. |
| translation_options.tm_list | array | No | - | Array of {source, target} objects | Translation memory entries for style consistency. |
| translation_options.domains | string | No | - | English text only | Domain-specific prompt to tailor translation style. |
| stream | boolean | No | false | true / false | Enable streaming output. |
| temperature | float | No | 0.65 | Range: [0, 2) | Controls output diversity. |
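
As an illustration of how the optional fields in this table fit together, the sketch below assembles a `translation_options` dictionary from a glossary, translation memory, and a domain prompt. The helper name `build_translation_options` and its tuple-based inputs are assumptions made for this example; only the resulting keys come from the table.

```python
def build_translation_options(target_lang, source_lang="auto", terms=None,
                              tm_list=None, domains=None):
    """Assemble a translation_options dict, omitting unset optional fields."""
    if not target_lang:
        raise ValueError("target_lang is required")
    options = {"source_lang": source_lang, "target_lang": target_lang}
    if terms:
        # Glossary: each (source, target) pair becomes a {source, target} object.
        options["terms"] = [{"source": s, "target": t} for s, t in terms]
    if tm_list:
        # Translation memory uses the same {source, target} shape.
        options["tm_list"] = [{"source": s, "target": t} for s, t in tm_list]
    if domains:
        options["domains"] = domains  # English-only domain description
    return options
```

With the OpenAI Python SDK, the result would be passed as `extra_body={"translation_options": build_translation_options(...)}`.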

### Real-time Speech Translation (Qwen-LiveTranslate & Gummy)

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| model | string | Yes | - | qwen3.5-livetranslate-flash-realtime, gummy-realtime-v1, etc. | Model name. |
| session.modalities | array | No | ["text", "audio"] | ["text"] or ["text", "audio"] | Controls output type. |
| session.translation.language | string | No | en | Valid language code | Target language for translation. |
| session.voice | string | No | default / Tina / Cherry | Supported voices | Voice for synthesized audio output. |
| session.enable_voice_clone | boolean | No | false | true / false | Enables voice cloning from input audio. |
| session.translation.corpus.phrases | object | No | - | Key-value pairs | Hotwords mapping source terms to target translations. |
| format | string | Yes (Gummy) | - | pcm, wav, mp3, opus, etc. | Audio format (Gummy models). |
| sample_rate | integer | Yes (Gummy) | - | 16000 or higher | Audio sampling rate in Hz (Gummy models). |
| transcription_enabled | boolean | No (Gummy) | true | true / false | Enable source language recognition. |
| translation_enabled | boolean | No (Gummy) | false | true / false | Enable translation feature. |

### Image Translation (Qwen-MT-Image)

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| model | string | Yes | - | qwen-mt-image | Model name. |
| input.image_url | string | Yes | - | Public HTTP/HTTPS URL | URL of the image to translate. |
| input.source_lang | string | Yes | - | Language code or 'auto' | Source language. |
| input.target_lang | string | Yes | - | Language code | Target language. |
| input.ext.domainHint | string | No | - | English text, max 200 words | Domain hint to adapt translation style. |
| input.ext.sensitives | array | No | - | Max 50 words | Sensitive words to filter before translation. |
| input.ext.config.imageSegment | boolean | No | false | true / false | Exclude text on image subjects from translation. |

### Optical Character Recognition (Qwen-OCR)

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| model | string | Yes | - | qwen-vl-ocr-latest, etc. | Model name. |
| messages | array | Yes | - | Must include image_url | Input messages containing the image. |
| min_pixels | integer | No | 3072 | - | Minimum pixel threshold. Images below this are upscaled. |
| max_pixels | integer | No | 8388608 | - | Maximum pixel threshold. Images above this are downscaled. |
| enable_rotate | boolean | No | false | true / false | Correct skewed images automatically. |
| temperature | float | No | 0.01 | Range: [0, 2) | Controls output diversity. |
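
The two pixel defaults are products of a 32 x 32 unit: 32 * 32 * 3 = 3072 and 32 * 32 * 8192 = 8388608. The sketch below classifies an image against those thresholds. It assumes the comparison is made on total pixel count (width times height), which this document does not state explicitly, and the function name `resize_action` is hypothetical.

```python
MIN_PIXELS = 32 * 32 * 3      # 3072, the documented default
MAX_PIXELS = 32 * 32 * 8192   # 8388608, the documented default


def resize_action(width, height, min_pixels=MIN_PIXELS, max_pixels=MAX_PIXELS):
    """Classify an image by total pixel count against the two thresholds."""
    pixels = width * height  # assumption: thresholds apply to width * height
    if pixels < min_pixels:
        return "upscale"
    if pixels > max_pixels:
        return "downscale"
    return "unchanged"
```

Under that assumption, a 689 x 487 image (335,543 pixels) falls between the two thresholds and would be left at its original size.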

## Code Examples

### Machine Translation - Python - OpenAI Compatible

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
messages = [
    {
        "role": "user",
        "content": "No me reí después de ver este video"
    }
]
translation_options = {
    "source_lang": "auto",
    "target_lang": "English"
}

completion = client.chat.completions.create(
    model="qwen-mt-plus",
    messages=messages,
    extra_body={
        "translation_options": translation_options
    }
)
print(completion.choices[0].message.content)
```

### File Translation - Python - OpenAI Compatible Streaming

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "input_audio": {
                    "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                    "format": "wav",
                },
            }
        ],
    }
]

completion = client.chat.completions.create(
    model="qwen3-livetranslate-flash",
    messages=messages,
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    stream=True,
    stream_options={"include_usage": True},
    extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}},
)

for chunk in completion:
    print(chunk)
```

### Real-time Speech Translation - Python - WebSocket

```python
import os
import sys
import base64
import signal
import pyaudio
from dashscope.audio.qwen_omni import (
    OmniRealtimeConversation,
    OmniRealtimeCallback,
    MultiModality,
)
from dashscope.audio.qwen_omni.omni_realtime import TranslationParams

class Callback(OmniRealtimeCallback):
    """Callback handler class for real-time translation"""

    def __init__(self, speaker):
        self.speaker = speaker

    def on_open(self):
        print("[Connection established]")

    def on_close(self, code, msg):
        print(f"[Connection closed] code: {code}, msg: {msg}")

    def on_event(self, response):
        event_type = response.get("type", "")
        if event_type == "input_audio_buffer.speech_started":
            print("====== Speech input detected ======")
        elif event_type == "input_audio_buffer.speech_stopped":
            print("====== Speech input ended ======")
        elif event_type == "conversation.item.input_audio_transcription.completed":
            print(f"[Original text] {response.get('transcript', '')}")
        elif event_type == "response.audio_transcript.done":
            print(f"[Translation result] {response.get('transcript', '')}")
        elif event_type == "response.audio.delta":
            audio_b64 = response.get("delta", "")
            if audio_b64:
                self.speaker.write(base64.b64decode(audio_b64))
        elif event_type == "error":
            print(f"[Error] {response.get('error', {}).get('message', '')}")

def main():
    if not os.environ.get("DASHSCOPE_API_KEY"):
        print("Set the DASHSCOPE_API_KEY environment variable.")
        sys.exit(1)

    pya = pyaudio.PyAudio()
    speaker = pya.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True, frames_per_buffer=2400)
    mic = pya.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1600)

    callback = Callback(speaker=speaker)
    conversation = OmniRealtimeConversation(
        model="qwen3.5-livetranslate-flash-realtime",
        url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
        callback=callback
    )
    conversation.connect()

    translation_params = TranslationParams(
        language="en",
        corpus=TranslationParams.Corpus(phrases={"Source Term 1": "Target Translation 1"})
    )

    conversation.update_session(
        output_modalities=[MultiModality.TEXT, MultiModality.AUDIO],
        input_audio_transcription_model="qwen3-asr-flash-realtime",
        voice="Tina",
        translation_params=translation_params,
    )

    def on_exit(sig, frame):
        mic.stop_stream()
        mic.close()
        speaker.stop_stream()
        speaker.close()
        pya.terminate()
        conversation.close()
        sys.exit(0)

    signal.signal(signal.SIGINT, on_exit)
    print("[Starting real-time translation] Speak into the microphone. Press Ctrl+C to exit.")

    while True:
        audio_data = mic.read(1600, exception_on_overflow=False)
        conversation.append_audio(base64.b64encode(audio_data).decode("ascii"))

if __name__ == "__main__":
    main()
```

### Image Translation - curl - Async Task

```bash
# Submit the async task
curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/image2image/image-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-mt-image",
    "input": {
        "image_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250916/ordhsk/1.webp",
        "source_lang": "zh",
        "target_lang": "en",
        "ext": {
            "config": {
                "imageSegment": false
            }
        }
    }
}'

# Poll the task status using the returned task_id
curl -X GET https://dashscope.aliyuncs.com/api/v1/tasks/86ecf553-d340-4e21-xxxxxxxxx \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
```

### Optical Character Recognition - Python - OpenAI Compatible

```python
from openai import OpenAI
import os

PROMPT_TICKET_EXTRACTION = """
Please extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID card number, and passenger name from the train ticket image.
You must accurately extract the key information. Do not omit or fabricate information. Replace any single character that is blurry or obscured by strong light with an English question mark (?).
Return the data in JSON format as follows: {'invoice_number': 'xxx', 'train_number': 'xxx', 'departure_station': 'xxx', 'arrival_station': 'xxx', 'departure_date_and_time':'xxx', 'seat_number': 'xxx', 'seat_class': 'xxx', 'ticket_price':'xxx', 'id_card_number': 'xxx', 'passenger_name': 'xxx'}.
"""

try:
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-vl-ocr-2025-11-20",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url":"https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg"},
                        "min_pixels": 32 * 32 * 3,
                        "max_pixels": 32 * 32 * 8192
                    },
                    {"type": "text", "text": PROMPT_TICKET_EXTRACTION}
                ]
            }
        ])
    print(completion.choices[0].message.content)
except Exception as e:
    print(f"Error message: {e}")
```

### Conversation Analysis - Python - OpenAI Compatible

```python
from openai import OpenAI
import os

dialogue = """
[1] Agent: Hello, welcome to AB E-commerce platform. How can I help you today?
[2] Customer: Hi, I bought a food processor from your store last week, and it’s making a strange noise when running.
[3] Agent: I’m sorry for the inconvenience. Does the noise occur immediately after startup or after it’s been running for a while?
[4] Customer: It starts a few seconds after turning on, and the sound is very sharp.
[5] Agent: Understood. Could you please record a short video of it running and send it to our technical team for verification?
[6] Customer: Sure, I’ll send it shortly.
[7] Agent: Great. We’ll reply with a resolution within two hours of receiving the video.
[8] Customer: If it’s confirmed as a quality issue, can I get a direct replacement?
[9] Agent: Yes, if it’s verified as a quality issue, we’ll replace it free of charge and cover shipping.
[10] Customer: Okay, I’ll wait for your feedback. Thank you for your patient assistance.
"""

analysis_prompt = f"""
Analyze the customer's satisfaction level (Satisfied / Dissatisfied / Neutral) based on the conversation.
Output format: Satisfaction Label#Reasoning

Conversation:
{dialogue}
"""

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="tongyi-xiaomi-analysis-flash",
    messages=[{'role': 'user', 'content': analysis_prompt}],
    temperature=0,
    extra_body={"top_k": 1}
)

print(completion.choices[0].message.content)
```

## Response Format

### OpenAI Compatible (Synchronous)

```json
{
  "id": "chatcmpl-999a5d8a-f646-4039-968a-167743ae0f22",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I didn't laugh after watching this video.",
        "role": "assistant"
      }
    }
  ],
  "created": 1762346157,
  "model": "qwen-mt-plus",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 9,
    "prompt_tokens": 53,
    "total_tokens": 62
  }
}
```

**Key Fields**:
- `choices[].message.content` — The translated text or extracted OCR result.
- `usage.total_tokens` — Total tokens consumed for billing.
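
A minimal sketch of reading the two key fields from a response that has already been decoded into a dictionary. The helper name `summarize_completion` is hypothetical. With the OpenAI Python SDK, the same values are available as attributes (`completion.choices[0].message.content`), as the code examples in this document show.

```python
def summarize_completion(response):
    """Return (text, total_tokens) from a chat.completion response dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("Response contains no choices")
    text = choices[0]["message"]["content"]
    # usage may be absent, in which case total_tokens is reported as None.
    total = response.get("usage", {}).get("total_tokens")
    return text, total
```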

### Async Task (Image Translation)

```json
{
    "request_id": "5fec62eb-bf94-91f8-b9f4-f7f758e4e27e",
    "output": {
        "task_id": "72c52225-8444-4cab-ad0c-xxxxxx",
        "task_status": "SUCCEEDED",
        "submit_time": "2025-08-13 18:11:16.954",
        "end_time": "2025-08-13 18:11:23.860",
        "image_url": "http://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/xxx?Expires=xxx"
    },
    "usage": {
        "image_count": 1
    }
}
```

**Key Fields**:
- `output.task_id` — The unique identifier to poll for task status.
- `output.task_status` — Current state (e.g., PENDING, RUNNING, SUCCEEDED, FAILED).
- `output.image_url` — Temporary URL to download the translated image.

### Streaming Chunk Format (WebSocket / SSE)
For real-time translation, chunks are delivered as JSON events via WebSocket or SSE:
```json
{
  "type": "response.audio_transcript.done",
  "transcript": "Hello, how are you?",
  "event_id": "event_1718456789000"
}
```
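
Events like the one above can be routed by their `type` field. The sketch below is a hedged illustration: the function name `handle_event` and the callback parameters are assumptions, while the event types and the `transcript`, `delta`, and `error.message` fields mirror the WebSocket code example in this document.

```python
import base64
import json


def handle_event(raw, on_text, on_audio=None):
    """Decode one JSON event and route it by its "type" field."""
    event = json.loads(raw)
    kind = event.get("type", "")
    if kind == "response.audio_transcript.done":
        # Final translated text for the current utterance.
        on_text(event.get("transcript", ""))
    elif kind == "response.audio.delta" and on_audio is not None:
        # Synthesized audio arrives as Base64-encoded chunks.
        on_audio(base64.b64decode(event.get("delta", "")))
    elif kind == "error":
        raise RuntimeError(event.get("error", {}).get("message", "unknown error"))
    return kind
```

Unrecognized event types are ignored and simply returned, so new server events do not break the handler.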

## Error Handling

| Code | Description | Recommended Action |
|---------------|--------------------|-----------------------------|
| 400 | Bad Request - Invalid request body, missing required parameters, or invalid parameter values. | Check parameter syntax and ensure required fields such as `model` and `messages` are present. |
| 401 | Unauthorized - Invalid or missing API key. | Verify `DASHSCOPE_API_KEY` is set correctly and matches the region endpoint. |
| 403 | Forbidden - Access denied or region mismatch. | Ensure your account has permissions for the specific model and region. |
| 429 | Too Many Requests - Rate limit exceeded. | Implement exponential backoff and reduce request frequency. |
| 500 | Internal Server Error - Unexpected server error. | Retry after a short delay. Contact support if persistent. |

### Rate Limits & Retry
- **Standard QPS**: 100 QPS per model for most APIs.
- **WebSocket Concurrency**: Max 10 concurrent WebSocket connections for real-time translation models.
- **Image Translation Query**: Default 1 QPS for the task status query API. For high-volume workloads, prefer asynchronous callbacks over polling.
- **Retry Strategy**: For 429 and 500 errors, use exponential backoff starting at 1 second.
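
The retry strategy can be sketched as follows. This is a minimal illustration under stated assumptions: `RetryableError` is a hypothetical exception standing in for however your HTTP client reports status codes, the cap of five attempts is arbitrary, and the small random jitter is an addition that is not part of the guidance above.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical wrapper for a response with an HTTP error status."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on 429/500 with delays of about 1 s, 2 s, 4 s, ..."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RetryableError as err:
            # Give up on non-retryable statuses or when attempts are exhausted.
            if err.status not in (429, 500) or attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            # Jitter (up to 10% extra) spreads out simultaneous retries.
            sleep(delay + random.uniform(0, 0.1 * delay))
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without actually waiting.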

## Pricing & Billing

### Billing Model
Pricing varies by capability:
- **Text / OCR / Conversation Analysis**: Billed per 1,000 tokens (input and output).
- **Audio / Video Translation**: Billed per 1,000 tokens or per second of audio processing. Video tokens are calculated based on frame count and resolution.
- **Image Translation**: Billed per successfully generated image.
- **Real-time Speech (Gummy)**: Billed per second of active connection.

### Price Reference

| Model / Tier | Input Price | Output Price |
|-----------|---------|---------|
| qwen-mt-plus | 0.002 CNY / 1K tokens | 0.004 CNY / 1K tokens |
| qwen-mt-flash | 0.001 CNY / 1K tokens | 0.002 CNY / 1K tokens |
| qwen-mt-lite | 0.0005 CNY / 1K tokens | 0.001 CNY / 1K tokens |
| qwen-vl-ocr-latest | 0.002 CNY / 1K tokens | 0.004 CNY / 1K tokens |
| qwen3.5-livetranslate-flash-realtime | 0.002 CNY / 1K tokens | 0.003 CNY / 1K tokens |
| gummy-realtime-v1 | 0.00015 CNY / second | 0.00015 CNY / second |
| tongyi-xiaomi-analysis-flash | 0.002 CNY / 1K tokens | 0.004 CNY / 1K tokens |

### Free Tier
- Most text and OCR models include **1 million tokens free per month** for new users.
- Real-time translation models often include sufficient free quota for basic debugging.

### Usage Limits
- Max 8,192 tokens per request for Qwen-MT models.
- Max 1 minute audio duration per task for Gummy short-sentence models.
- Image translation: Max 100 MB per image, dimensions between 15x15 and 8192x8192 pixels.

### Billing Notes
- For video input, token consumption includes both audio tokens (12.5 tokens/sec) and visual tokens based on resolution.
- Speech recognition and translation features in Gummy are billed separately but at the same unit price.
- Failed API calls and errors incur no fees.

## FAQ

**Q: How do I pass `translation_options` when using the OpenAI Python SDK?**
A: The `create()` method of the OpenAI Python SDK accepts only its own named parameters, so an unknown keyword argument is rejected. To pass custom parameters like `translation_options`, wrap them in the `extra_body` dictionary: `extra_body={"translation_options": {"source_lang": "auto", "target_lang": "English"}}`. In Node.js or raw HTTP, pass them at the top level of the JSON payload.

**Q: What is the difference between Gummy and Qwen-LiveTranslate models?**
A: Gummy models (`gummy-realtime-v1`, `gummy-chat-v1`) are optimized for long-form and short-sentence speech recognition and translation via WebSocket, billed per second. Qwen-LiveTranslate models (`qwen3.5-livetranslate-flash-realtime`) are vision-enhanced, support voice cloning, and can process both audio and image frames from video streams for higher context accuracy.

**Q: How do I handle asynchronous image translation tasks?**
A: Submit the image URL and language pair to the `/image2image/image-synthesis` endpoint with the `X-DashScope-Async: enable` header. The API returns a `task_id`. Poll the `/api/v1/tasks/{task_id}` endpoint until the status is `SUCCEEDED`, then download the result from the provided `image_url`. If the status becomes `FAILED`, stop polling and handle the error.

**Q: Can I use custom terminology or translation memory with Qwen-MT?**
A: Yes. You can pass a glossary via `translation_options.terms` (array of `{source, target}` objects) and translation memory via `translation_options.tm_list`. You can also use `translation_options.domains` to provide an English prompt describing the specific industry context (e.g., legal, medical).

**Q: Why am I getting a 401 Unauthorized error?**
A: Ensure your `DASHSCOPE_API_KEY` is correctly set. Note that API keys are region-specific: keys generated for the China (Beijing) region will not work on the International (Singapore) endpoint, and vice versa. Ensure your `base_url` matches the region where your API key was provisioned.

## Source Documents

- `Audio and video translation - Qwen API reference_6273891.xdita`
- `Client events_6121319.xdita`
- `Java SDK_6279211.xdita`
- `Python SDK_6279210.xdita`
- `Qwen-LiveTranslate Java SDK_6279211.xdita`
- `Real-time audio and video translation Qwen-Livetranslate-Realtime_6121318.xdita`
- `Server events_6121320.xdita`
- `Java SDK_6224098.xdita`
- `Python SDK_6224100.xdita`
- `WebSocket API_6224101.xdita`
- `Java SDK_6224103.xdita`
- `Python SDK_6224105.xdita`
- `WebSocket API_6224104.xdita`
- `Audio and Video File Translation  Qwen_6273756.xdita`
- `Real-time audio and video translation - Qwen_6117111.xdita`
- `Real-time speech translation - Gummy_5603923.xdita`
- `Speech-to-speech_6488516.xdita`
- `Qwen-MT API reference_6222855.xdita`
- `Qwen-MT_6222855.xdita`
- `Translation capabilities Qwen-MT_5382814.xdita`
- `Machine translation Qwen-MT_5382814.xdita`
- `Qwen - image translation_6001632.xdita`
- `Qwen-MT-Image API reference_6001632.xdita`
- `Qwen-OCR_6224855.xdita`
- `Conversation Analysis Tongyi-Xiaomi-Analysis_6364320.xdita`