# bailian-media

Part of **BAILIAN**

# Alibaba Cloud Model Studio Image, Video, and 3D Generation

## Capabilities Overview

| Function | Models | API Pattern | Description |
|--------|------|----------|------|
| Text to Image | flux-schnell, qwen-image-2.0-pro, wan2.7-image-pro, stable-diffusion-3.5-large + 12 more | Synchronous / Async Task | Generate high-quality images from text descriptions and prompts. |
| Text to Video | wan2.7-t2v, happyhorse-1.0-t2v, kling/kling-v3-video-generation, pixverse/pixverse-c1-t2v + 6 more | Async Task | Generate dynamic video clips from text prompts. |
| Image to Video | wan2.7-i2v-2026-04-25, happyhorse-1.0-i2v, vidu/viduq3-pro_img2video, pixverse/pixverse-c1-it2v + 8 more | Async Task | Animate static images into videos using a starting frame and optional prompt. |
| Keyframe to Video | wan2.2-kf2v-flash, pixverse-c1-kf2v, viduq3-turbo_start-end2video | Async Task | Generate video transitions between specified first and last frame keyframes. |
| Reference to Video | wan2.7-r2v, happyhorse-1.0-r2v, viduq3-mix_reference2video, pixverse-c1-r2v + 4 more | Async Task | Generate videos that maintain subject identity from reference images or videos. |
| Video Editing | wan2.7-videoedit, wanx2.1-vace-plus, happyhorse-1.0-video-edit | Async Task | Edit existing videos using masks, inpainting, and structural controls. |
| Image Editing | qwen-image-2.0-pro, wanx2.1-imageedit, wan2.5-i2i-preview, wan2.7-image-pro + 5 more | Synchronous / Async Task | Edit images, perform inpainting, erase elements, and apply modifications. |
| Portrait and Face | facechain-generation, facechain-finetune, wanx-style-repaint-v1 | Async Task | Generate AI portraits, train character models, and apply style repaints. |
| Virtual Try-On | aitryon, aitryon-plus, shoemodel-v1, aitryon-parsing-v1 | Async Task / Synchronous | Generate virtual try-on images for outfits and footwear. |
| Background & Outpainting | wanx-background-generation-v2, image-out-painting | Async Task | Generate image backgrounds and expand images using outpainting techniques. |
| Graphic Design | wordart-texture, wordart-semantic, wanx-poster-generation-v1 | Async Task | Create artistic text textures, WordArt transformations, and creative posters. |
| 3D Model Generation | Tripo/Tripo-H3.1, Tripo/Tripo-P1.0 | Async Task | Generate 3D models from text descriptions or image inputs. |
| Omni Realtime | qwen3.5-omni-plus-realtime, qwen3-omni-flash-realtime | WebSocket | Conduct real-time audio and video conversations. |
| Audio Understanding | qwen3-omni-30b-a3b-captioner | OpenAI Compatible | Analyze and caption audio files using Omni-Captioner. |

## Model Selection Guide

### Text to Image
| Model ID | API Pattern |
|---------|----------|
| flux-schnell, flux-dev | Async Task |
| qwen-image-2.0-pro, qwen-image-2.0 | Synchronous |
| wan2.7-image-pro, wan2.7-image, wan2.6-image | Synchronous / Async Task |
| stable-diffusion-3.5-large, stable-diffusion-xl | Async Task |
| z-image-turbo | Synchronous |

### Text to Video
| Model ID | API Pattern |
|---------|----------|
| wan2.7-t2v, wan2.6-t2v | Async Task |
| happyhorse-1.0-t2v | Async Task |
| kling/kling-v3-video-generation | Async Task |
| pixverse/pixverse-c1-t2v, pixverse-v6-t2v | Async Task |
| vidu/viduq3-pro_text2video, viduq3-turbo_text2video | Async Task |

### Image to Video
| Model ID | API Pattern |
|---------|----------|
| wan2.7-i2v-2026-04-25, wan2.6-i2v-flash | Async Task |
| happyhorse-1.0-i2v | Async Task |
| pixverse/pixverse-c1-it2v, pixverse-v6-it2v | Async Task |
| vidu/viduq3-pro_img2video, viduq3-turbo_img2video | Async Task |

### 3D Model Generation
| Model ID | API Pattern |
|---------|----------|
| Tripo/Tripo-H3.1 | Async Task |
| Tripo/Tripo-P1.0 | Async Task |

### Omni Realtime
| Model ID | API Pattern |
|---------|----------|
| qwen3.5-omni-plus-realtime | WebSocket |
| qwen3.5-omni-flash-realtime | WebSocket |
| qwen3-omni-flash-realtime | WebSocket |

## API Calling Modes

### Authentication
The primary authentication method for all Model Studio APIs is the Bearer Token.
- Header format: `Authorization: Bearer $DASHSCOPE_API_KEY`
- Environment variable: `DASHSCOPE_API_KEY`
- Obtain your API key from the Alibaba Cloud Model Studio console. Ensure you use the correct key for your target region (China Beijing vs. Singapore International).
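
A minimal sketch of assembling the authentication header from the environment variable. `build_auth_headers` is a hypothetical helper name for illustration, not part of any SDK.

```python
import os


def build_auth_headers(api_key=None):
    """Return HTTP headers carrying the Bearer token (hypothetical helper)."""
    # Fall back to the DASHSCOPE_API_KEY environment variable when no key is passed.
    key = api_key or os.getenv("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early on a missing key tends to give a clearer error than letting the service answer with `InvalidApiKey`.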

### Service Endpoints
Endpoints are region-specific. Choose the base URL that matches your API key's region:
- **China (Beijing)**: `https://dashscope.aliyuncs.com/api/v1`
- **Singapore (International)**: `https://dashscope-intl.aliyuncs.com/api/v1`
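- **OpenAI Compatible Mode**: Replace the `/api/v1` path with `/compatible-mode/v1`, e.g. `https://dashscope.aliyuncs.com/compatible-mode/v1` (China) or `https://dashscope-intl.aliyuncs.com/compatible-mode/v1` (International).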
- **WebSocket Realtime**: `wss://dashscope.aliyuncs.com/api-ws/v1/realtime` (China) or `wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime` (International).
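
One way to keep the region choice in a single place is a small lookup like the sketch below. The region labels (`cn-beijing`, `intl-singapore`) and the `endpoints` function are hypothetical names; the hosts come from the list above, and the sketch assumes the compatible-mode path sits directly on the host rather than under `/api/v1`.

```python
# Hosts taken from the endpoint list; the dictionary keys are hypothetical labels.
HOSTS = {
    "cn-beijing": "dashscope.aliyuncs.com",
    "intl-singapore": "dashscope-intl.aliyuncs.com",
}


def endpoints(region):
    """Return the HTTP, OpenAI-compatible, and WebSocket base URLs for a region."""
    host = HOSTS[region]
    return {
        "http": f"https://{host}/api/v1",
        # Assumption: compatible mode lives at the host root, not under /api/v1.
        "openai_compatible": f"https://{host}/compatible-mode/v1",
        "realtime_ws": f"wss://{host}/api-ws/v1/realtime",
    }
```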

### Async Task Pattern
Most image and video generation models use the Asynchronous Task pattern due to long processing times.
1. **Submit Task**: Send a `POST` request to the generation endpoint. Include the header `X-DashScope-Async: enable`.
2. **Receive Task ID**: The response will immediately return a `task_id` and `task_status: PENDING`.
3. **Poll for Result**: Send a `GET` request to `/api/v1/tasks/{task_id}`. Repeat until `task_status` is `SUCCEEDED` or `FAILED`.
4. **Download Media**: Extract the temporary URL from `output.results[].url` or `output.video_url` and download the file promptly (URLs typically expire in 24 hours).
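
The four steps can be sketched with the standard library alone. The helper names (`_request`, `submit_task`, `wait_for_task`) and the default interval and timeout are illustrative assumptions; only the header, the task path, and the status values come from the steps above.

```python
import json
import time
import urllib.request

BASE_URL = "https://dashscope.aliyuncs.com/api/v1"  # Beijing; swap the host for Singapore


def _request(method, url, api_key, body=None, extra_headers=None):
    """Send a JSON request and return the parsed JSON response."""
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    headers.update(extra_headers or {})
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_task(path, payload, api_key):
    """Steps 1-2: submit the job with the async header and return its task_id."""
    rsp = _request("POST", f"{BASE_URL}{path}", api_key, payload,
                   {"X-DashScope-Async": "enable"})
    return rsp["output"]["task_id"]


def wait_for_task(task_id, api_key, fetch=None, interval=10.0, timeout=1800.0,
                  sleep=time.sleep):
    """Step 3: poll until the task is SUCCEEDED or FAILED, or the timeout passes."""
    fetch = fetch or (lambda tid: _request("GET", f"{BASE_URL}/tasks/{tid}", api_key))
    deadline = time.monotonic() + timeout
    while True:
        rsp = fetch(task_id)
        status = rsp["output"]["task_status"]
        if status == "SUCCEEDED":
            return rsp
        if status == "FAILED":
            raise RuntimeError(f"task {task_id} failed: {rsp['output']}")
        # Any other status is treated as still in progress.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout}s")
        sleep(interval)
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without network access; in normal use they can be left at their defaults.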

### Synchronous Pattern
Used by faster models like Qwen-Image Edit and Omni-Captioner.
- Send a standard `POST` request. The connection remains open until the generation completes, returning the final result directly in the response body.

### OpenAI Compatible Pattern
Used for Omni-Captioner and text-based multimodal models.
- Endpoint: `POST /compatible-mode/v1/chat/completions`
- Supports standard OpenAI SDKs (`openai>=1.0.0`).
- Supports streaming via Server-Sent Events (SSE) when `stream: true` is set.

### WebSocket Pattern
Used for Omni Realtime conversational models.
- Connect to the `wss://` endpoint with your Bearer token in the headers.
- Exchange JSON events for session configuration (`session.update`), audio streaming (`input_audio_buffer.append`), and receiving text/audio deltas (`response.text.delta`, `response.audio.delta`).
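
The event exchange might be sketched as below. The event type names come from the list above; the payload field names (`session`, `modalities`, `audio`, `delta`) are assumptions modeled on common realtime event schemas and should be checked against the Omni Realtime reference before use.

```python
import base64
import json


def session_update_event(modalities=("text", "audio")):
    """Build a `session.update` event; the `session` layout is an assumption."""
    return json.dumps({"type": "session.update",
                       "session": {"modalities": list(modalities)}})


def audio_append_event(pcm_chunk):
    """Build an `input_audio_buffer.append` event with base64-encoded audio bytes."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),  # assumed field name
    })


def handle_event(raw, text_parts, audio_parts):
    """Route one server event: collect text deltas and decoded audio deltas."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "response.text.delta":
        text_parts.append(event.get("delta", ""))
    elif kind == "response.audio.delta":
        audio_parts.append(base64.b64decode(event.get("delta", "")))
    return kind
```

With `websocket-client`, the builders would typically be passed to `ws.send(...)` inside `on_open`, and `handle_event` called from `on_message`.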

## Parameter Reference

### Text to Image (Wan & Qwen Series)
| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| model | string | Yes | - | - | The model ID (e.g., `wan2.7-image-pro`, `qwen-image-2.0-pro`). |
| messages | array | Yes | - | Single-turn only | Array containing one `user` message with `text` and optional `image` content. |
| size | string | No | 2K / 2048*2048 | Model specific | Output resolution. E.g., `1K`, `2K`, `4K`, or `width*height`. |
| n | integer | No | 1 | 1-6 | Number of images to generate. |
| prompt_extend | boolean | No | true | - | Enables intelligent prompt rewriting via LLM. |
| watermark | boolean | No | false | - | Adds an "AI Generated" or model-specific watermark. |

### Video Generation (Wan, Kling, PixVerse, Vidu)
| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| model | string | Yes | - | - | The model ID (e.g., `wan2.7-t2v`, `kling/kling-v3-video-generation`). |
| input.prompt | string | Varies | - | Max 5000 chars | Text description of the video. |
| input.media | array | Varies | - | - | Array of media assets (e.g., `first_frame`, `reference_image`, `video`). |
| parameters.resolution | string | No | 720P / 1080P | 480P, 720P, 1080P | Target resolution tier. |
| parameters.duration | integer | No | 5 | 2-15 | Duration of the generated video in seconds. |
| parameters.ratio | string | No | 16:9 | 16:9, 9:16, 1:1, etc. | Aspect ratio of the output video. |
| parameters.prompt_extend | boolean | No | true | - | Enables intelligent prompt rewriting. |

### 3D Model Generation (Tripo)
| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| model | string | Yes | - | Tripo/Tripo-H3.1, Tripo/Tripo-P1.0 | The 3D generation model ID. |
| input.prompt | string | Mutually Exclusive | - | Max 1024 chars | Text description for text-to-3D. |
| input.image | string | Mutually Exclusive | - | JPEG/PNG, <=20MB | URL for single-image-to-3D. |
| input.images | array | Mutually Exclusive | - | 2-4 images | URLs for multi-image-to-3D. |
| parameters.texture_quality | string | No | standard | standard, detailed | Quality of the texture mapping. |
| parameters.pbr | boolean | No | true | - | Generate Physically Based Rendering materials. |
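
Because `input.prompt`, `input.image`, and `input.images` are mutually exclusive, it can help to enforce that before submitting. A minimal sketch, with `build_3d_input` as a hypothetical helper name:

```python
def build_3d_input(prompt=None, image=None, images=None):
    """Return the `input` object, enforcing exactly one of prompt / image / images."""
    candidates = {"prompt": prompt, "image": image, "images": images}
    given = {k: v for k, v in candidates.items() if v is not None}
    if len(given) != 1:
        raise ValueError("pass exactly one of prompt, image, or images")
    if prompt is not None and len(prompt) > 1024:
        raise ValueError("prompt exceeds 1024 characters")
    if images is not None and not 2 <= len(images) <= 4:
        raise ValueError("images must contain 2 to 4 URLs")
    return given
```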

## Code Examples

### Text to Image - Python - China

```python
import json
import os
import dashscope
from dashscope import MultiModalConversation

# Use this URL for Beijing region. For Singapore region, replace with: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope.aliyuncs.com/api/v1'

messages = [
    {
        "role": "user",
        "content": [
            {"text": "A winter street scene in Beijing featuring two adjacent traditional Chinese shops with gray-tiled roofs and vermilion-red exterior walls standing side by side."}
        ]
    }
]

api_key = os.getenv("DASHSCOPE_API_KEY")

response = MultiModalConversation.call(
    api_key=api_key,
    model="qwen-image-2.0-pro",
    messages=messages,
    result_format='message',
    stream=False,
    watermark=False,
    prompt_extend=True,
    negative_prompt="Low resolution, low quality, distorted limbs.",
    size='2048*2048'
)

if response.status_code == 200:
    print(json.dumps(response, ensure_ascii=False))
else:
    print(f"HTTP status code: {response.status_code}")
    print(f"Error code: {response.code}")
    print(f"Error message: {response.message}")
```

### Text to Video - curl - China

```bash
curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.7-t2v",
    "input": {
        "prompt": "A tense detective story with cinematic storytelling. Shot 1 [0–3 seconds] wide shot: Rainy New York street at night, neon lights flicker, a detective in a black trench coat walks briskly."
    },
    "parameters": {
        "resolution": "720P",
        "ratio": "16:9",
        "prompt_extend": true,
        "watermark": true,
        "duration": 15
    }
}'
```

### Image to Video - Python - China

```python
# -*- coding: utf-8 -*-
from http import HTTPStatus
from dashscope import VideoSynthesis
import dashscope
import os

dashscope.base_http_api_url = 'https://dashscope.aliyuncs.com/api/v1'
api_key = os.getenv("DASHSCOPE_API_KEY")

media = [
    {
        "type": "first_frame",
        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"
    },
    {
        "type": "driving_audio",
        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"
    }
]

def sample_sync_call():
    # VideoSynthesis.call submits the task and blocks until it reaches a final status.
    print('----Making a synchronous call. Please wait.----')
    rsp = VideoSynthesis.call(
        api_key=api_key,
        model="wan2.7-i2v-2026-04-25",
        media=media,
        resolution="720P",
        duration=10,
        watermark=True,
        prompt="A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life on a concrete wall.",
    )
    if rsp.status_code == HTTPStatus.OK:
        print(rsp.output.video_url)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' %
              (rsp.status_code, rsp.code, rsp.message))

if __name__ == '__main__':
    sample_sync_call()
```

### 3D Generation - curl - China

```bash
curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/3d-generation' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "Tripo/Tripo-P1.0",
    "input": {
        "prompt": "a cute cat"
    },
    "parameters": {
        "texture_quality": "standard"
    }
}'
```

### Omni Realtime - Python - WebSocket

```python
# pip install websocket-client
import json
import os

import websocket

# os.environ raises a clear KeyError if the variable is missing.
API_KEY = os.environ["DASHSCOPE_API_KEY"]
# China (Beijing) endpoint; use dashscope-intl.aliyuncs.com for Singapore.
API_URL = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3.5-omni-plus-realtime"
headers = [f"Authorization: Bearer {API_KEY}"]

def on_open(ws):
    print(f"Connected to server: {API_URL}")

def on_message(ws, message):
    # Each server message is a JSON event; pretty-print it for inspection.
    print("Received event:", json.dumps(json.loads(message), indent=2))

def on_error(ws, error):
    print("Error:", error)

ws = websocket.WebSocketApp(
    API_URL,
    header=headers,
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
)
ws.run_forever()
```

## Response Format

### Async Task Success Response
```json
{
    "request_id": "caa62a12-8841-41a6-8af2-xxxxxx",
    "output": {
        "task_id": "eff1443c-ccab-4676-aad3-xxxxxx",
        "task_status": "SUCCEEDED",
        "submit_time": "2025-09-29 14:18:52.331",
        "scheduled_time": "2025-09-29 14:18:59.290",
        "end_time": "2025-09-29 14:23:39.407",
        "video_url": "https://dashscope-result-sh.oss-accelerate.aliyuncs.com/xxx.mp4?Expires=xxx"
    },
    "usage": {
        "duration": 10,
        "size": "1280*720",
        "video_count": 1,
        "SR": 720
    }
}
```

**Key Fields**:
- `output.task_id`: Unique identifier for the asynchronous task.
- `output.task_status`: Current state: `PENDING`, `RUNNING`, `SUCCEEDED`, or `FAILED`.
- `output.video_url` / `output.results[].url`: Temporary URL to download the generated media.
- `usage.duration` / `usage.video_count` / `usage.image_count`: Metrics used for billing and tracking.
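
Since video tasks report `output.video_url` and image tasks report `output.results[].url`, a single accessor can cover both. This is a sketch under that assumption; `extract_media_urls` is a hypothetical helper, and result entries without a `url` field are simply skipped.

```python
def extract_media_urls(response):
    """Collect download URLs from a finished task response."""
    output = response.get("output", {})
    if output.get("task_status") != "SUCCEEDED":
        raise ValueError(f"task not finished: {output.get('task_status')}")
    urls = []
    if output.get("video_url"):
        urls.append(output["video_url"])
    # Image tasks list one entry per result; entries without a url are skipped.
    urls.extend(r["url"] for r in output.get("results", []) if "url" in r)
    return urls
```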

### Streaming Chunk Format (OpenAI Compatible)
```json
{
  "choices": [
    {
      "delta": {
        "audio": {
          "data": "/v8AAAAAAAAAAAAAAA...",
          "expires_at": 1757647879,
          "id": "audio_a68eca3b-c67e-4666-a72f-73c0b4919860"
        }
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "object": "chat.completion.chunk",
  "model": "qwen3.5-omni-plus",
  "id": "chatcmpl-a68eca3b-c67e-4666-a72f-73c0b4919860"
}
```
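
Audio arrives as base64 text in `choices[].delta.audio.data`, so a client has to decode and join it. The sketch below assumes each chunk's `data` is independently decodable base64; `collect_audio` is a hypothetical helper, and it accepts chunks either as parsed dictionaries or as raw JSON strings.

```python
import base64
import json


def collect_audio(chunks):
    """Decode and concatenate the audio carried in `choices[].delta.audio.data`."""
    pieces = []
    for raw in chunks:
        chunk = json.loads(raw) if isinstance(raw, str) else raw
        for choice in chunk.get("choices", []):
            audio = (choice.get("delta") or {}).get("audio") or {}
            if "data" in audio:
                # Assumption: every chunk is a self-contained base64 string.
                pieces.append(base64.b64decode(audio["data"]))
    return b"".join(pieces)
```

Chunks that carry only text or metadata contribute nothing to the result.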

## Error Handling

| Code | Description | Recommended Action |
|---------------|--------------------|-----------------------------|
| InvalidApiKey | The provided API key is invalid or missing. | Verify your `DASHSCOPE_API_KEY` and ensure it matches the target region. |
| InvalidParameter | One or more parameters are invalid or out of range. | Check the request body for correct syntax, valid URLs, and parameter constraints. |
| Throttling | Request rate limit exceeded. | Implement exponential backoff and retry logic. Reduce QPS. |
| DataInspectionFailed | Input or output content failed safety moderation. | Review your prompt and input media for prohibited or sensitive content. |
| IPInfringementSuspect | Content suspected of infringing intellectual property. | Modify the input prompt or reference media to avoid copyrighted material. |
| InternalError.Timeout | Internal server timeout during processing. | Retry the request after a short delay. |

### Rate Limits & Retry
- **General Limit**: Most models allow up to 100 QPS per API key.
- **Specific Limits**: Some legacy or specialized models (e.g., FaceChain, Outpainting) have stricter limits like 2-5 QPS and 1-5 concurrent tasks.
- **Async Task Query Limit**: Polling the `/tasks/{task_id}` endpoint is generally limited to 20 RPS. For high-frequency production environments, configure asynchronous task callbacks via EventBridge instead of aggressive polling.
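
The backoff recommended for `Throttling` could look like the sketch below. `ApiError` is a hypothetical exception standing in for however your client surfaces the service error code, and the delay values are illustrative defaults rather than documented limits.

```python
import random
import time

RETRYABLE = {"Throttling", "InternalError.Timeout"}


class ApiError(Exception):
    """Hypothetical exception carrying the service's error code."""

    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}")
        self.code = code


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(); on a retryable ApiError wait base_delay * 2**attempt plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            # Non-retryable codes and the final attempt propagate to the caller.
            if err.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Errors such as `InvalidParameter` or `DataInspectionFailed` are re-raised immediately, since retrying an unchanged request would not help.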

## Requirements

- **Python SDK**: `pip install "dashscope>=1.25.16"` (for native DashScope APIs) or `pip install "openai>=1.52.0"` (for OpenAI compatible endpoints). Quote the version specifier so the shell does not treat `>` as a redirect. The realtime WebSocket example also needs `pip install websocket-client`.
- **Java SDK**: `dashscope-sdk-java>=2.22.14`
- **Node.js**: `openai>=1.0.0` (for compatible endpoints).
- **Environment Variables**: 
  ```bash
  export DASHSCOPE_API_KEY="your_api_key_here"
  ```

## FAQ

**Q: How do I handle long-running video generation tasks?**
A: Video generation uses the Async Task pattern. Submit your request with the `X-DashScope-Async: enable` header, save the returned `task_id`, and poll the `GET /api/v1/tasks/{task_id}` endpoint every 5-15 seconds until the status is `SUCCEEDED`. Alternatively, configure EventBridge callbacks to receive push notifications upon completion.

**Q: What image formats and sizes are supported for image-to-video?**
A: Most models accept JPEG, PNG, BMP, and WEBP. Resolution typically must fall between 240x240 and 4096x4096 pixels, and the maximum file size ranges from 10 MB to 20 MB depending on the model. Aspect ratios should generally fall between 1:4 and 4:1.

**Q: How does prompt rewriting (`prompt_extend`) work?**
A: When `prompt_extend` is set to `true`, an underlying LLM automatically expands and optimizes your short text prompt into a highly detailed description. This significantly improves visual quality and prompt adherence, especially for brief inputs, but adds 3-5 seconds of latency to the generation process.

**Q: Why did my task fail with `DataInspectionFailed`?**
A: Alibaba Cloud enforces strict content moderation policies. This error occurs if your text prompt, input image, or the generated output contains sensitive, violent, explicit, or politically restricted content. Revise your inputs to comply with safety guidelines and retry.

**Q: Do generated media URLs expire?**
A: Yes. The temporary OSS URLs provided in the `output.video_url` or `output.results[].url` fields are typically valid for 24 hours after task completion. Download and store the files in your own permanent storage (such as Alibaba Cloud OSS) before the URL expires.
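
A small download step run right after a task succeeds avoids losing results to expiry. The sketch below streams a URL to local disk with the standard library; `save_media` and its arguments are hypothetical, and uploading to permanent storage is left to your own tooling.

```python
import shutil
import urllib.request
from pathlib import Path


def save_media(url, dest_dir, filename):
    """Stream a temporary result URL to local disk and return the saved path."""
    dest = Path(dest_dir) / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large videos are not held in memory all at once.
    with urllib.request.urlopen(url, timeout=120) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```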

## Pricing & Billing

### Billing Model
Pricing varies by modality:
- **Images**: Billed per successfully generated image (per request).
- **Videos**: Billed per second or minute of generated video duration.
- **Omni/Audio**: Billed per 1,000 tokens (audio duration is converted to tokens based on specific model rules).

Failed calls and content moderation rejections do not incur charges.

### Price Reference

| Model / Tier | Input Price | Output Price | Other Fees |
|-----------|---------|---------|---------|
| wan2.7-image-pro | 0.002 CNY / image | 0.002 CNY / image | - |
| qwen-image-2.0-pro | 0.002 CNY / image | 0.002 CNY / image | - |
| wan2.7-t2v | 0.002 CNY / second | 0.002 CNY / second | - |
| kling/kling-v3-video-generation | 0.002 CNY / second | 0.002 CNY / second | - |
| Tripo/Tripo-P1.0 | 0.005 CNY / request | 0.005 CNY / request | - |
| qwen3.5-omni-plus | 0.002 CNY / 1K tokens | 0.003 CNY / 1K tokens | Audio: duration × 7 tokens (in), 12.5 tokens (out) |

### Free Tier
- **Images**: Many models (e.g., Qwen-Image, Wan-Image) offer 100 to 1,000 free requests per month.
- **Videos**: Wan and Kling video models often include 1 million free seconds or 1,800 free seconds per month for new users.
- **Omni/Audio**: 1 million free tokens per month for Qwen-Omni models.

### Usage Limits
- Maximum video duration per request: Typically 5 to 15 seconds.
- Maximum image resolution: Up to 4K (4096x4096) for advanced models like Wan2.7-Image-Pro.
- Concurrent tasks: Varies by model, generally up to 10 concurrent async tasks per API key.

### Billing Notes
- Asynchronous tasks are billed only upon successful completion (`SUCCEEDED`).
- For video models, billing is strictly based on the `output_video_duration` in seconds.
- Audio token conversion rule: 1 second of audio = 12.5 tokens (for most Omni models). Durations under 1 second are billed as 1 second.
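
The audio conversion rule can be written out as a short calculation. The sketch uses the 12.5 tokens per second figure from the note above as a default and exposes it as a parameter, since the rate differs between models and between input and output. Rounding is not specified here, so the function returns the unrounded value; `audio_tokens` is a hypothetical helper.

```python
def audio_tokens(seconds, tokens_per_second=12.5):
    """Estimate billed audio tokens; durations under 1 second count as 1 second."""
    billed_seconds = max(1.0, seconds)
    return billed_seconds * tokens_per_second
```

For example, a 10-second clip at the default rate works out to 10 × 12.5 = 125 tokens, and a 0.4-second clip is billed as one full second.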
