# bailian-model

Part of **BAILIAN**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for the following routing skills. If you're unsure which path to take, check the corresponding routing skill:

> - [Fine-tune a large language or multimodal model](../../intent/bailian-fine-model/SKILL.md)
> - [Deploy custom or fine-tuned AI models as endpoints](../../intent/bailian-deploy-model/SKILL.md)

# Bailian Model Training and Data Management

## Capabilities Overview

| Sub-capability | Models | API Pattern | Description |
|----------------|--------|-------------|-------------|
| Model Fine-Tuning | qwen3-14b, wan2.6-i2v, wan2.5-i2v-preview + 6 more | Synchronous / Async Task | Create and manage fine-tuning jobs for text and video generation models. |
| Model Deployment | qwen-plus, emo, qwen3-8b + 2 more | Synchronous | Deploy models as dedicated services for production workloads. |
| Model Import | qwen3-32b + 2 more | Async Task | Import custom models from OSS into the model management platform. |
| File Management | N/A | Synchronous | Upload, list, and manage files used for model training and inference. |
| File and Batch Operations | qwen-plus, qwen-long, qwen-turbo + 2 more | Synchronous / Async Task | Upload files and execute large-scale batch inference tasks asynchronously. |
| Response Management | N/A | Synchronous | Retrieve, list, and delete stored responses and input items. |

## Model Selection Guide

### Model Fine-Tuning

| Model ID | API Pattern |
|----------|-------------|
| qwen3-14b | Synchronous |
| qwen3-32b | Synchronous |
| qwen3-8b | Synchronous |
| qwen3-vl-8b-instruct | Synchronous |
| qwen3-vl-8b-thinking | Synchronous |
| wan2.6-i2v | Async Task |
| wan2.5-i2v-preview | Async Task |
| wan2.2-i2v-flash | Async Task |
| wan2.2-kf2v-flash | Async Task |

### Model Deployment

| Model ID | API Pattern |
|----------|-------------|
| qwen-plus | Synchronous |
| qwen-flash-2025-07-28 | Synchronous |
| qwen3-8b | Synchronous |
| emo | Synchronous |
| animate-anyone-detect | Synchronous |

### Model Import

| Model ID | API Pattern |
|----------|-------------|
| qwen3-32b | Async Task |

### File and Batch Operations

| Model ID | API Pattern |
|----------|-------------|
| qwen-plus | Synchronous |
| qwen-long | Synchronous |
| qwen-doc-turbo | Synchronous |
| qwen-turbo | Async Task |
| qwen-vl-plus | Async Task |

## API Calling Modes

### Authentication
The primary and recommended authentication method is the **Bearer Token**.
- **Header Format**: `Authorization: Bearer $DASHSCOPE_API_KEY`
- **Environment Variable**: `DASHSCOPE_API_KEY`
- Ensure your API key is associated with the correct workspace that has model deployment and training permissions.
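
As a minimal sketch of the header format above, the request headers can be built once from the environment variable. The helper name `build_auth_headers` is hypothetical and not part of any SDK.

```python
import os


def build_auth_headers(api_key=None):
    """Return JSON request headers carrying the Bearer token.

    Illustrative helper only; falls back to DASHSCOPE_API_KEY when no key is passed.
    """
    key = api_key or os.getenv("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```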

### Service Endpoints
Endpoints vary by region and API compatibility mode:
- **China (Standard)**: `https://dashscope.aliyuncs.com/api/v1/...`
- **China (OpenAI Compatible)**: `https://dashscope.aliyuncs.com/compatible-mode/v1/...`
- **International (Standard)**: `https://dashscope-intl.aliyuncs.com/api/v1/...`
- **International (OpenAI Compatible)**: `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/...`
- **Batch Processing (China)**: `https://batch.dashscope.aliyuncs.com/compatible-mode/v1/...`
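
The endpoint list above can be captured in a small lookup so the region and compatibility mode are chosen in one place. This is a sketch: the `base_url` helper and the `cn` / `intl` / `standard` / `openai` / `batch` labels are made up for illustration, while the URLs are the ones listed above.

```python
# Base URLs copied from the endpoint list; the dictionary keys are illustrative labels.
BASE_URLS = {
    ("cn", "standard"): "https://dashscope.aliyuncs.com/api/v1",
    ("cn", "openai"): "https://dashscope.aliyuncs.com/compatible-mode/v1",
    ("intl", "standard"): "https://dashscope-intl.aliyuncs.com/api/v1",
    ("intl", "openai"): "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    ("cn", "batch"): "https://batch.dashscope.aliyuncs.com/compatible-mode/v1",
}


def base_url(region="cn", mode="standard"):
    """Return the base URL for a region/mode pair, or raise if none is listed."""
    try:
        return BASE_URLS[(region, mode)]
    except KeyError:
        raise ValueError(f"no endpoint listed for region={region!r}, mode={mode!r}") from None
```

Only a China batch endpoint appears in the list, so `base_url("intl", "batch")` raises rather than guessing a hostname.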

### Synchronous Pattern
Used for immediate operations like file uploads, model deployment creation, and standard fine-tuning job submissions.
1. Send a `POST` or `GET` request to the respective endpoint.
2. Parse the immediate JSON response for `request_id` and `output` or `data` fields.

### Async Task Pattern
Used for long-running operations like video model fine-tuning, custom model imports, and batch inference.
1. **Submit**: Send a `POST` request to create the job. Extract the `job_id` or `batch_id` from the response.
2. **Poll**: Send periodic `GET` requests to the status endpoint (e.g., `/api/v1/fine-tunes/{job_id}` or `/compatible-mode/v1/batches/{batch_id}`).
3. **Retrieve**: Once the status is `SUCCEEDED` (native job APIs) or `completed` (OpenAI-compatible batches), fetch the results, checkpoints, or output file IDs.
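
The submit / poll / retrieve steps above can be sketched as a generic polling loop. The function `poll_until_done` and its parameters are hypothetical. The caller supplies `fetch_status`, a callable that performs the `GET` and returns the status string, so the loop itself makes no assumption about the response shape. Only `FAILED` is treated as a failure state by default; other terminal states the API may return would need to be added.

```python
import time


def poll_until_done(fetch_status, interval=10.0, timeout=3600.0,
                    ok=("SUCCEEDED", "completed"), failed=("FAILED",),
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a success or failure status.

    Returns the success status, raises RuntimeError on a failure status,
    and raises TimeoutError once `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in ok:
            return status
        if status in failed:
            raise RuntimeError(f"job ended with status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout}s")
        sleep(interval)  # wait before the next status request
```

In practice `fetch_status` would wrap a request to the status endpoint for the job and return its status field. Passing `sleep` and `clock` as arguments keeps the loop easy to test without real waiting.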

### OpenAI Compatible Pattern
Used for Batch Chat and File Management through the standard OpenAI SDKs.
1. Initialize the OpenAI client with your `DASHSCOPE_API_KEY`.
2. Set the `base_url` to the appropriate Bailian compatible endpoint.
3. Use standard OpenAI methods (e.g., `client.files.create`, `client.batches.create`).

## Parameter Reference

### Model Fine-Tuning

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| model | String | Yes | - | - | Base model ID or previously fine-tuned model ID. |
| training_file_ids | Array | Yes | - | - | List of training dataset file IDs. |
| validation_file_ids | Array | No | - | - | List of validation dataset file IDs. |
| training_type | String | No | sft | One of: cpt, sft, efficient_sft, dpo_full, dpo_lora | Fine-tuning method. |
| hyper_parameters | Map | No | Recommended defaults | - | Hyperparameters like n_epochs, batch_size, learning_rate. |
| job_name | String | No | Random UUID | - | Name of the fine-tuning job. |
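
A minimal sketch of assembling a request body from the parameters in this table, with the `training_type` constraint checked locally before sending. The helper `build_fine_tune_payload` is hypothetical. Optional fields are omitted when not supplied so that the service-side defaults apply.

```python
# Allowed values copied from the training_type row of the table.
TRAINING_TYPES = {"cpt", "sft", "efficient_sft", "dpo_full", "dpo_lora"}


def build_fine_tune_payload(model, training_file_ids, training_type="sft",
                            validation_file_ids=None, hyper_parameters=None,
                            job_name=None):
    """Build the JSON body for a fine-tuning job (illustrative helper)."""
    if not training_file_ids:
        raise ValueError("training_file_ids must contain at least one file ID")
    if training_type not in TRAINING_TYPES:
        raise ValueError(f"unsupported training_type: {training_type!r}")
    payload = {
        "model": model,
        "training_file_ids": list(training_file_ids),
        "training_type": training_type,
    }
    # Add optional fields only when the caller provided them.
    if validation_file_ids:
        payload["validation_file_ids"] = list(validation_file_ids)
    if hyper_parameters:
        payload["hyper_parameters"] = dict(hyper_parameters)
    if job_name:
        payload["job_name"] = job_name
    return payload
```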

### Model Deployment

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| model_name | String | Yes | - | - | ID of the model to deploy. |
| plan | String | Yes | - | One of: mu, cu, ptu, lora | Deployment billing plan. |
| name | String | Yes | - | - | Display name for the deployment in the console. |
| capacity | Number | No | - | Integer multiple of base_capacity | Number of resource units. |
| ptu_capacity | Object | No | - | Required if plan is ptu | Contains input_tpm and output_tpm. |
| deploy_spec | String | No | - | e.g., MU1 | Required if plan is mu. |
| suffix | String | No | - | Max 8 chars, globally unique | Suffix for the generated deployed model name. |
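
The plan-specific rules in this table (`ptu_capacity` for `ptu`, `deploy_spec` for `mu`, the 8-character `suffix` limit) are easy to miss, so a local pre-check can help. The sketch below is a hypothetical `validate_deployment` helper that covers only the constraints stated in the table. It does not check global uniqueness of `suffix` or the `capacity` multiple, which depend on server-side state.

```python
# Allowed values copied from the plan row of the table.
PLANS = {"mu", "cu", "ptu", "lora"}


def validate_deployment(request):
    """Return a list of problems found in a deployment request dict."""
    errors = []
    for field in ("model_name", "plan", "name"):
        if not request.get(field):
            errors.append(f"missing required field: {field}")
    plan = request.get("plan")
    if plan and plan not in PLANS:
        errors.append(f"unknown plan: {plan!r}")
    if plan == "ptu":
        capacity = request.get("ptu_capacity") or {}
        for key in ("input_tpm", "output_tpm"):
            if key not in capacity:
                errors.append(f"ptu plan requires ptu_capacity.{key}")
    if plan == "mu" and not request.get("deploy_spec"):
        errors.append("mu plan requires deploy_spec")
    if len(request.get("suffix", "")) > 8:
        errors.append("suffix must be at most 8 characters")
    return errors
```

An empty list means the request passed these local checks; the service may still reject it for other reasons.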

### Model Import

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| model_name | String | Yes | - | - | Base model name (e.g., qwen3-32b). |
| source | String | Yes | - | One of: oss | Import source. |
| weight_type | String | Yes | - | One of: full, lora | Training type of the imported weights. |
| storage_info | Object | Yes | - | - | Contains bucket_name and object_key. |
| storage_info.object_key | String | Yes | - | Must end with / | Path prefix in OSS for model files. |

### File Management

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| files / file | File | Yes | - | Max 1 GB per file | The file to upload. |
| purpose | String | Yes | - | One of: fine-tune, file-extract, batch | Purpose of the uploaded file. |
| descriptions | String | No | - | - | Description of the file. |

### Batch Processing

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| input_file_id | String | Yes | - | Max 500 MB, 50K requests | File ID or OSS URL for batch input. |
| endpoint | String | Yes | - | e.g., /v1/chat/completions | API access path matching the input file. |
| completion_window | String | Yes | - | 24h to 336h | Maximum wait time for batch completion. |
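
The following sketch prepares batch input locally and checks the constraints from this table. It assumes the input file uses the OpenAI Batch JSONL layout (`custom_id`, `method`, `url`, `body` on each line), which this document does not spell out. The helper names are hypothetical, and only the hour form of `completion_window` shown in the table is validated.

```python
import json
import re

MAX_REQUESTS_PER_FILE = 50_000  # "50K requests" limit from the table


def build_batch_lines(prompts, model="qwen-plus", endpoint="/v1/chat/completions"):
    """Return one JSON line per prompt (assumed OpenAI Batch JSONL layout)."""
    if len(prompts) > MAX_REQUESTS_PER_FILE:
        raise ValueError("too many requests for a single batch input file")
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": str(index),   # lets results be matched back to inputs
            "method": "POST",
            "url": endpoint,           # must match the batch `endpoint` parameter
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request, ensure_ascii=False))
    return lines


def check_completion_window(window):
    """Accept values such as '24h' up to '336h'; raise ValueError otherwise."""
    match = re.fullmatch(r"(\d+)h", window)
    if not match or not 24 <= int(match.group(1)) <= 336:
        raise ValueError(f"completion_window out of range: {window!r}")
    return window
```

The lines would then be written to a `.jsonl` file, uploaded with `purpose="batch"`, and the returned file ID passed as `input_file_id`.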

## Code Examples

### Create Fine-Tuning Job - Bash - All Regions

```bash
curl --location "https://dashscope.aliyuncs.com/api/v1/fine-tunes" \
--header "Authorization: Bearer ${DASHSCOPE_API_KEY}" \
--header 'Content-Type: application/json' \
--data '{
    "model":"qwen3-14b",
    "training_file_ids":[
        "86a9fe7f-dd77-43b0-9834-2170e12339ec",
        "03ead352-6190-4328-8016-61821c23d4fc"
    ],
    "hyper_parameters":{
        "n_epochs":1,
        "learning_rate":1.6e-5,
        "batch_size":32,
        "split":0.8  
    },
    "training_type":"sft",
    "finetuned_output_suffix":"suffix"
}'
```
*Note: For international regions, replace the base URL with `https://dashscope-intl.aliyuncs.com/api/v1/fine-tunes`.*

### Deploy Model Service (PTU Plan) - Bash - All Regions

```bash
curl "https://dashscope.aliyuncs.com/api/v1/deployments" \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "name": "my_qwen_flash",
    "model_name": "qwen-flash-2025-07-28",
    "plan": "ptu",
    "ptu_capacity": {
        "input_tpm": 10000,
        "output_tpm": 1000
    }
}'
```

### Upload File for Fine-Tuning (OpenAI Compatible) - Python - China

```python
import os
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# Upload a JSONL file for fine-tuning
file_object = client.files.create(file=Path("test.jsonl"), purpose="fine-tune")

print(file_object.model_dump_json())
```
*Note: For international regions, use `https://dashscope-intl.aliyuncs.com/compatible-mode/v1`.*

### Execute Batch Chat Inference - Python - China

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://batch.dashscope.aliyuncs.com/compatible-mode/v1",
).with_options(timeout=1800.0) # Timeout: 1800s (30 min). Max: 3600s.

completion = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]
)
print(completion.choices[0].message.content)
```

### Import Custom Model from OSS - Bash - All Regions

```bash
curl -X POST "https://dashscope.aliyuncs.com/api/v1/custom_models/import" \
    --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
    --header "Content-Type: application/json" \
    --data '{
        "model_name": "qwen3-32b",
        "display_name": "My LoRA fine-tuned model",
        "source": "oss",
        "weight_type": "lora",
        "storage_info": {
            "bucket_name": "my-model-bucket",
            "object_key": "models/qwen3-32b-lora/"
        }
    }'
```

### Upload Local File for Temporary OSS URL - Python - International

```python
import os
import requests
from pathlib import Path

def get_upload_policy(api_key, model_name):
    """Request a temporary upload policy for the given model."""
    url = "https://dashscope-intl.aliyuncs.com/api/v1/uploads"
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    params = {"action": "getPolicy", "model": model_name}
    response = requests.get(url, headers=headers, params=params, timeout=30)
    response.raise_for_status()  # Fail on 4xx/5xx instead of a KeyError on 'data'
    return response.json()['data']

def upload_file_to_oss(policy_data, file_path):
    """Upload the file using the policy's form fields and return its oss:// URL."""
    file_name = Path(file_path).name
    key = f"{policy_data['upload_dir']}/{file_name}"
    with open(file_path, 'rb') as file:
        # Multipart form fields; the file part comes last.
        files = {
            'OSSAccessKeyId': (None, policy_data['oss_access_key_id']),
            'Signature': (None, policy_data['signature']),
            'policy': (None, policy_data['policy']),
            'x-oss-object-acl': (None, policy_data['x_oss_object_acl']),
            'x-oss-forbid-overwrite': (None, policy_data['x_oss_forbid_overwrite']),
            'key': (None, key),
            'success_action_status': (None, '200'),
            'file': (file_name, file)
        }
        response = requests.post(policy_data['upload_host'], files=files, timeout=300)
        response.raise_for_status()  # Do not return a URL for a failed upload
    return f"oss://{key}"

api_key = os.getenv("DASHSCOPE_API_KEY")
oss_url = upload_file_to_oss(get_upload_policy(api_key, "qwen-vl-plus"), "/tmp/cat.png")
print(f"Temporary URL: {oss_url}")
```

## Response Format

### Fine-Tuning Job Creation
```json
{
    "request_id": "635f7047-003e-4be3-b1db-6f98e239f57b",
    "output": {
        "job_id": "ft-202511272033-8ae7",
        "status": "PENDING",
        "finetuned_output": "qwen3-14b-ft-202511272033-8ae7",
        "model": "qwen3-14b",
        "base_model": "qwen3-14b",
        "training_file_ids": ["9e9ffdfa-c3bf-436e-9613-6f053c66aa6e"],
        "hyper_parameters": {
            "n_epochs": 1,
            "batch_size": 16,
            "learning_rate": "1.6e-5"
        },
        "training_type": "sft",
        "create_time": "2025-11-27 20:33:15"
    }
}
```

**Key Fields**:
- `request_id` — Unique identifier for the API request.
- `output.job_id` — The ID used to poll status, retrieve logs, and manage the job.
- `output.status` — Current state (e.g., PENDING, RUNNING, SUCCEEDED, FAILED).
- `output.finetuned_output` — The model ID generated upon successful completion, used for deployment.
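
As a small sketch of reading the key fields listed above, the helper below pulls them out of a parsed response. `summarize_job` is a hypothetical name, and the sample dictionary is a trimmed copy of the response shown in this section.

```python
def summarize_job(response):
    """Extract the fields needed to track a fine-tuning job."""
    output = response["output"]
    return {
        "request_id": response["request_id"],
        "job_id": output["job_id"],        # used for polling and management
        "status": output["status"],
        # Model ID to deploy once the job has succeeded; may be absent.
        "finetuned_output": output.get("finetuned_output"),
    }


# Trimmed copy of the sample response above.
sample = {
    "request_id": "635f7047-003e-4be3-b1db-6f98e239f57b",
    "output": {
        "job_id": "ft-202511272033-8ae7",
        "status": "PENDING",
        "finetuned_output": "qwen3-14b-ft-202511272033-8ae7",
    },
}
summary = summarize_job(sample)
```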

### Model Deployment
```json
{
  "request_id": "f2ae64f7-83cc-410c-bc0b-840443f7eb86",
  "output": {
    "deployed_model": "emo-35b3f106-sample01",
    "status": "PENDING",
    "model_name": "emo",
    "base_capacity": 1,
    "capacity": 1,
    "ready_capacity": 0,
    "charge_type": "post_paid"
  }
}
```

**Key Fields**:
- `output.deployed_model` — The unique endpoint identifier for inference calls.
- `output.ready_capacity` — Number of instances currently ready to serve traffic.

## Error Handling

| Code | Description | Recommended Action |
|------|-------------|--------------------|
| 400 / InvalidParameter | Bad request or invalid parameter format. | Review the error message and correct parameters (e.g., missing required fields, invalid purpose). |
| 401 / InvalidApiKey | Unauthorized access or invalid API key. | Verify `DASHSCOPE_API_KEY` is correct and properly formatted in the header. |
| 403 / Forbidden | Insufficient permissions. | Ensure the API key's workspace has model deployment or training permissions. |
| 404 / NotFound | Resource does not exist. | Verify the `job_id`, `file_id`, or `deployed_model` is correct and active. |
| 408 | Request timeout. | The request exceeded the maximum wait (e.g., 3600s) and the connection was dropped. Retry the request. |
| 429 / Throttling | Rate limit exceeded. | Implement exponential backoff and check the applicable limits under Rate Limits & Retry. |
| Conflict | Deployed model name/suffix already exists. | Specify a unique `suffix` (max 8 chars) when creating a new deployment. |
| 500 / InternalError | Unexpected server error. | Record the `request_id` and retry after a short delay. Contact support if persistent. |

### Rate Limits & Retry
- **Fine-Tuning**: Maximum 20 concurrent or succeeded fine-tune jobs per user.
- **File Upload**: 3 QPS for uploads; 10 QPS for query/list/delete.
- **Batch Processing**: 1,000 calls/minute for task creation; max 10,000 pending requests per model.
- **Temporary OSS URLs**: 100 QPS per account per model.
- **Retry Strategy**: For 429 and 500 errors, use exponential backoff. Respect `Retry-After` headers if provided.
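
The retry strategy above can be sketched as follows. This is one possible implementation, not the SDK's behavior: `call_with_backoff` is a hypothetical helper, the caller supplies `send` (a callable returning `(status_code, headers, body)`), and the delay values and jitter are illustrative choices. Only the integer-seconds form of `Retry-After` is honored here.

```python
import random
import time

RETRYABLE = {429, 500}  # status codes the retry strategy applies to


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Retry send() on 429/500 with exponential backoff.

    Returns (status_code, body) for the first non-retryable response and
    raises RuntimeError when every attempt was throttled or failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        retry_after = str(headers.get("Retry-After", "")).strip()
        if retry_after.isdigit():
            delay = float(retry_after)  # the server's hint takes precedence
        else:
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, delay / 2)  # jitter spreads out retries
        sleep(delay)
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {status})")
```

Jitter is added so that many clients throttled at the same moment do not all retry in lockstep.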

## Requirements

- **Python SDK (DashScope)**: `pip install "dashscope>=1.24.0"` (Required for native DashScope features and temporary OSS uploads).
- **Python SDK (OpenAI Compatible)**: `pip install "openai>=1.0.0"` (Required for Batch Chat, File Management, and Batch File Input). Quote the version specifier so the shell does not treat `>` as output redirection.
- **Java SDK (OpenAI Compatible)**: Use the official OpenAI Java SDK (`com.openai`).
- **Environment Variable**: `export DASHSCOPE_API_KEY=your_api_key`

## FAQ

**Q: What is the difference between `sft` and `efficient_sft` for fine-tuning?**
A: `sft` performs full-parameter supervised fine-tuning, which requires more compute but yields comprehensive model updates. `efficient_sft` uses LoRA (Low-Rank Adaptation), which trains a small number of parameters, making it faster and more cost-effective. Video models like `wan2.6-i2v` currently only support `efficient_sft`.

**Q: Why should I use the Batch Chat API instead of the standard real-time API?**
A: The Batch Chat API (`batch.dashscope.aliyuncs.com`) is designed for non-real-time, high-throughput scenarios like data annotation. It offers lower costs (often 50% of real-time pricing) and higher concurrency limits, but requests may wait in a queue for up to 3600 seconds before completion.

**Q: How do I use local images with multimodal models like Qwen-VL without hosting them publicly?**
A: Use the temporary file upload API (`/api/v1/uploads`) to get an `oss://` URL valid for 48 hours. When passing this URL to the model, you must include the HTTP header `X-DashScope-OssResourceResolve: enable` so the platform can resolve the internal OSS resource.
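
A minimal sketch of the header rule in the answer above: add `X-DashScope-OssResourceResolve: enable` whenever the image reference is an `oss://` URL. The helper `build_vl_request` is hypothetical, and the payload assumes the OpenAI-compatible chat message layout, which this document does not show for multimodal input.

```python
def build_vl_request(api_key, image_url, prompt, model="qwen-vl-plus"):
    """Return (headers, payload) for a request that references one image."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if image_url.startswith("oss://"):
        # Required so the platform resolves the temporary OSS resource.
        headers["X-DashScope-OssResourceResolve"] = "enable"
    # Assumed OpenAI-compatible multimodal message layout.
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }],
    }
    return headers, payload
```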

**Q: I received a "Conflict" error when deploying a model. How do I fix it?**
A: This occurs when the generated deployment name already exists in your workspace. Provide a unique `suffix` parameter (up to 8 characters) in your deployment request to differentiate the new instance.

**Q: Are file uploads and storage billed separately?**
A: Standard file uploads for `file-extract` or `batch` purposes via the OpenAI-compatible interface are free to upload and store. You are only billed for the input/output tokens when the model processes the file. However, native DashScope file management has a 5 GB total storage quota and a limit of 100 active files.

## Pricing & Billing

### Billing Model
Billing varies by operation:
- **Model Fine-Tuning**: Billed per token consumed during training (Total tokens * epochs * unit price).
- **Model Deployment**: Billed per request/token based on the selected plan (PTU, MU, CU, LoRA). Billing starts immediately upon successful deployment.
- **Batch Processing**: Billed per token at a discounted rate (typically 50% of real-time costs). Only successful requests are billed.
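
The fine-tuning formula above (total tokens × epochs × unit price) can be worked through as a short calculation. The function name `fine_tune_cost` and the token count are illustrative; the unit price in the example is the qwen3-14b training rate from the Price Reference table, quoted per 1K tokens.

```python
from decimal import Decimal


def fine_tune_cost(total_tokens, n_epochs, price_per_1k):
    """Estimate training cost: tokens * epochs * unit price (price is per 1K tokens)."""
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    return Decimal(total_tokens) / 1000 * n_epochs * Decimal(str(price_per_1k))


# 2,000,000 training tokens, 3 epochs, 0.0016 per 1K tokens:
# 2000 * 3 * 0.0016 = 9.6
estimate = fine_tune_cost(2_000_000, 3, "0.0016")
```

This is an estimate only; the billed amount depends on how the service counts training tokens.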

### Price Reference

| Model / Tier | Input Price | Output Price | Other Fees |
|--------------|-------------|--------------|------------|
| qwen3-14b (Training) | $0.0016 / 1K tokens | $0.0016 / 1K tokens | - |
| qwen3-32b (Training) | $0.008 / 1K tokens | $0.008 / 1K tokens | - |
| wan2.6-i2v (Training) | 0.002 CNY / 1K tokens | 0.004 CNY / 1K tokens | - |
| qwen-plus (Batch) | 0.002 CNY / 1K tokens | 0.002 CNY / 1K tokens | - |
| PTU Plan (Deployment) | 0.002 CNY / 1K tokens | 0.002 CNY / 1K tokens | - |
| MU Plan (Deployment) | 0.003 CNY / 1K tokens | 0.003 CNY / 1K tokens | - |
| LoRA Plan (Deployment) | 0.001 CNY / 1K tokens | 0.001 CNY / 1K tokens | - |

### Free Tier
- 1 million tokens free per month for selected models and deployments.
- Temporary OSS storage for multimodal inputs is free for development and testing.

### Usage Limits
- **Fine-Tuning**: Max 20 concurrent/succeeded jobs per user. Max 1 GB per training file.
- **Batch Processing**: Max 500 MB input file, 50,000 requests per file, max 2 parallel tasks.
- **File Storage**: Max 10,000 files, 100 GB total quota for OpenAI-compatible file storage.

### Billing Notes
- Training jobs created via API support only token-based billing. To use training units (subscription), create the job via the console.
- Async tasks are billed upon completion. A minimum 1-hour charge may apply for long-running video generation deployments.

## Source Documents

- `API details_4759678.xdita`
- `Video generation model fine-tuning API reference_6019159.xdita`
- `API details_4759682.xdita`
- `API_4759682.xdita`
- `Model import API reference_6534759.xdita`
- `Alibaba Cloud Model Studio file management API_4759680.xdita`
- `Upload files and get temporary URLs_5501086.xdita`
- `Deploy a model using an API_4759683.xdita`
- `Model lifecycle and updates_6194423.xdita`
- `Fine-tune Qwen_6371852.xdita`
- `Fine-tune models using APIs_4759677.xdita`
- `Tune a model with the API_4759677.xdita`
- `OpenAI compatible - Batch Chat_6233674.xdita`
- `OpenAI compatible - File_5014670.xdita`
- `OpenAI-compatible - Batch Chat_6233674.xdita`
- `OpenAI-compatible - Batch file input_5106149.xdita`
- `Delete a response_6562638.xdita`