# bailian-search

Part of **BAILIAN**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for the following routing skills. If you're unsure which path to take, check the corresponding routing skill:

> - [Build RAG knowledge bases and retrieval pipelines](../../intent/bailian-build-system/SKILL.md)
> - [Integrate external tools, MCP servers, and web search into AI agents](../../intent/bailian-integrate-mcp/SKILL.md)

# Model Studio Search, Retrieval, and Embeddings

## Capabilities Overview

| Sub-capability | Models | API Pattern | Description |
|----------------|--------|-------------|-------------|
| Web Search | qwen-plus, qwen3-max, qwen-turbo, + 45 more | OpenAI Compatible | Augment LLM responses with real-time information retrieved from the internet. |
| Knowledge Retrieval | qwen3.6-plus, qwen3.5-plus, qwen3.6-flash, + 10 more | OpenAI Compatible (Streaming) | Retrieve relevant chunks and answers from custom enterprise knowledge bases. |
| Search by Image | qwen3.6-plus, qwen3.5-plus, qwen3.6-flash, + 4 more | OpenAI Compatible (Streaming) | Perform reverse image search to find visually similar images in a database. |
| Text to Image Search | qwen3.6-plus, qwen3.5-plus, qwen3.6-flash, + 4 more | OpenAI Compatible (Streaming) | Search for images in a database using natural language text descriptions. |
| Multimodal Embedding | qwen3-vl-embedding, tongyi-embedding-vision-plus, multimodal-embedding-v1, + 6 more | Synchronous | Generate vector embeddings for text and image inputs for retrieval tasks. |
| Text Reranking | qwen3-rerank, gte-rerank-v2, qwen3-vl-rerank | Synchronous | Rerank retrieved documents to improve search and RAG accuracy. |
| Intent and NLU | tongyi-intent-detect-v3, opennlu-v1 | OpenAI Compatible | Perform intent recognition and natural language understanding for routing and function calling. |
| Text and Multimodal Embedding | text-embedding-v4, text-embedding-v3, text-embedding-v2, + 12 more | Synchronous | Convert text and images into dense vector representations. |
| Rerank | qwen3-rerank, qwen3-vl-rerank, gte-rerank-v2 | OpenAI Compatible | Rerank retrieved documents based on relevance to a query. |
| Create Embeddings | text-embedding-async-v1, text-embedding-async-v2 | Async Task | Generate vector representations of text for semantic search and clustering. |
| GUI Automation | gui-plus, gui-plus-2026-02-26 | OpenAI Compatible | Generate user interface interactions and automate GUI tasks. |

## Model Selection Guide

### Web Search

| Model ID | API Pattern |
|----------|-------------|
| qwen3.7-max, qwen3.7-max-preview, qwen3.7-max-2026-05-17+, qwen3.7-max-2026-05-20, qwen3.7-max-2026-05-17 | OpenAI Compatible |
| qwen3.6-max-preview, qwen3.6-plus, qwen3.6-plus-2026-04-02+, qwen3.6-plus-2026-04-02 | OpenAI Compatible |
| qwen3.6-flash, qwen3.6-flash-2026-04-16+, qwen3.6-flash-2026-04-16 | OpenAI Compatible |
| qwen3.5-plus, qwen3.5-plus-2026-02-15+, qwen3.5-plus-2026-02-15, qwen35plus-li-001 | OpenAI Compatible |
| qwen3.5-flash, qwen3.5-flash-2026-02-23+, qwen3.5-flash-2026-02-23 | OpenAI Compatible |
| qwen3.5-omni-plus, qwen3.5-omni-plus-2026-03-15, qwen3.5-omni-flash, qwen3.5-omni-flash-2026-03-15 | OpenAI Compatible |
| qwen3.5-omni-plus-realtime, qwen3.5-omni-plus-realtime-2026-03-15, qwen3.5-omni-flash-realtime, qwen3.5-omni-flash-realtime-2026-03-15 | OpenAI Compatible |
| qwen3-max, qwen3-max-2025-09-23+, qwen3-max-2026-01-23, qwen3-max-2025-09-23, qwen3-max-preview | OpenAI Compatible |
| qwen3-coder-next | OpenAI Compatible |
| qwen-max, qwen-max-latest, qwen-max-2024-09-19+ | OpenAI Compatible |
| qwen-plus, qwen-plus-latest, qwen-plus-2025-07-14+, qwen-plus-character | OpenAI Compatible |
| qwen-flash, qwen-flash-2025-07-28+, qwen-flash-character, qwen-flash-character-2026-02-26 | OpenAI Compatible |
| qwen-turbo, qwen-turbo-latest, qwen-turbo-2025-07-15 | OpenAI Compatible |
| qwq-plus | OpenAI Compatible |
| deepseek-v4-pro, deepseek-v4-flash, deepseek-v3.2, deepseek-v3.2-exp, deepseek-v3.1, deepseek-r1-0528, deepseek-r1, deepseek-v3 | OpenAI Compatible |
| Moonshot-Kimi-K2-Instruct, MiniMax-M2.1 | OpenAI Compatible |

### Knowledge Retrieval

| Model ID | API Pattern |
|----------|-------------|
| qwen3.6-plus, qwen3.6-flash, qwen3.6-7b, qwen3.6-14b, qwen3.6-32b, qwen3.6-72b, qwen3.6-27b | OpenAI Compatible (Streaming) |
| qwen3.5-plus, qwen3.5-flash, qwen3.5-7b, qwen3.5-14b, qwen3.5-32b, qwen3.5-72b | OpenAI Compatible (Streaming) |

### Search by Image & Text to Image Search

| Model ID | API Pattern |
|----------|-------------|
| qwen3.6-plus, qwen3.6-flash, qwen3.6-27b, qwen3.6-open-source | OpenAI Compatible (Streaming) |
| qwen3.5-plus, qwen3.5-flash, qwen3.5-open-source | OpenAI Compatible (Streaming) |

### Multimodal Embedding & Text and Multimodal Embedding

| Model ID | API Pattern |
|----------|-------------|
| qwen3-vl-embedding, qwen3-vl-embedding-code-id | Synchronous |
| qwen2.5-vl-embedding, qwen2.5-vl-embedding-code-id | Synchronous |
| tongyi-embedding-vision-plus, tongyi-embedding-vision-plus-2026-03-06 | Synchronous |
| tongyi-embedding-vision-flash, tongyi-embedding-vision-flash-2026-03-06 | Synchronous |
| multimodal-embedding-v1 | Synchronous |
| text-embedding-v4, text-embedding-v3, text-embedding-v2, text-embedding-v1 | Synchronous |

### Text Reranking & Rerank

| Model ID | API Pattern |
|----------|-------------|
| qwen3-rerank | Synchronous / OpenAI Compatible |
| qwen3-vl-rerank | Synchronous / OpenAI Compatible |
| gte-rerank-v2 | Synchronous / OpenAI Compatible |

### Intent and NLU

| Model ID | API Pattern |
|----------|-------------|
| tongyi-intent-detect-v3 | OpenAI Compatible |
| opennlu-v1 | Synchronous |

### Create Embeddings (Batch)

| Model ID | API Pattern |
|----------|-------------|
| text-embedding-async-v1, text-embedding-async-v2 | Async Task |

### GUI Automation

| Model ID | API Pattern |
|----------|-------------|
| gui-plus, gui-plus-2026-02-26 | OpenAI Compatible |

## API Calling Modes

### Authentication
The recommended authentication method is a **Bearer token** built from your API key.
- **Header Format**: `Authorization: Bearer $DASHSCOPE_API_KEY`
- **Environment Variable**: `DASHSCOPE_API_KEY`

### Service Endpoints
- **China (Beijing)**: 
  - OpenAI Compatible: `https://dashscope.aliyuncs.com/compatible-mode/v1`
  - DashScope Native: `https://dashscope.aliyuncs.com/api/v1`
- **International (Singapore)**: 
  - OpenAI Compatible: `https://dashscope-intl.aliyuncs.com/compatible-mode/v1`
  - DashScope Native: `https://dashscope-intl.aliyuncs.com/api/v1`
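As a minimal sketch of how the endpoints and the Bearer header above fit together, the helper below maps a region and API style to a base URL and builds the `Authorization` header. The names `resolve_base_url` and `auth_headers` and the keys `cn` / `intl` / `openai` / `native` are illustrative choices, not part of any Model Studio SDK.

```python
import os
from typing import Dict, Optional

# Base URLs copied from the endpoint list above.
# The keys ("cn" / "intl", "openai" / "native") are illustrative labels.
BASE_URLS = {
    ("cn", "openai"): "https://dashscope.aliyuncs.com/compatible-mode/v1",
    ("cn", "native"): "https://dashscope.aliyuncs.com/api/v1",
    ("intl", "openai"): "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    ("intl", "native"): "https://dashscope-intl.aliyuncs.com/api/v1",
}


def resolve_base_url(region: str = "cn", api: str = "openai") -> str:
    """Return the base URL for a region and API style."""
    try:
        return BASE_URLS[(region, api)]
    except KeyError:
        raise ValueError(f"unknown region/api combination: {region}/{api}") from None


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build headers with the Bearer token from the argument or the environment."""
    key = api_key or os.getenv("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}


print(resolve_base_url("intl", "native"))
# prints "https://dashscope-intl.aliyuncs.com/api/v1"
```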

### OpenAI Compatible
Use the standard OpenAI SDK by changing the `base_url` to the Model Studio endpoint and providing your `DASHSCOPE_API_KEY`. This pattern covers chat completions (synchronous and streaming), embeddings, and the Responses API.

### OpenAI Compatible (Streaming)
Set `stream=True` on the request (in the SDK call or the HTTP body) to receive Server-Sent Events (SSE). With the Responses API, read each event's `type` and `delta` fields to reconstruct the response incrementally.
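The following sketch shows one way to rebuild the answer text from streamed events. It assumes the text chunks arrive as events of type `response.output_text.delta`, the name used by the OpenAI Responses API; check the Model Studio reference for the event names it actually emits. The helper `collect_output_text` is hypothetical, and the events here are offline stand-ins rather than a live stream.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_output_text(events: Iterable) -> str:
    """Concatenate the delta of every text-delta event, ignoring other events."""
    parts = []
    for event in events:
        # Assumed event type name, borrowed from the OpenAI Responses API.
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Stand-in events so the sketch runs without a network call.
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Hello"),
    SimpleNamespace(type="response.output_text.delta", delta=", world"),
    SimpleNamespace(type="response.completed"),
]
print(collect_output_text(fake_events))  # prints "Hello, world"
```

With a real client, the iterable would be the object returned by `client.responses.create(..., stream=True)`.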

### Synchronous
Single POST request with an immediate JSON response. Used primarily for embeddings, reranking, and NLU tasks via the DashScope SDK or native HTTP API.

### Async Task
Used for batch processing (e.g., large-scale text embedding).
1. **Submit**: Send a POST request with the `X-DashScope-Async: enable` header.
2. **Poll**: Check the task status using the returned `task_id` via the GET tasks endpoint.
3. **Retrieve**: Once `task_status` is `SUCCEEDED`, download the results from the provided output URL.
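The submit / poll / retrieve steps above can be sketched as a small polling loop. This is an illustration under stated assumptions: `wait_for_task` and `fetch_status` are hypothetical names, the status lookup is injected so the loop stays independent of any HTTP client, and every status string other than `SUCCEEDED` (`PENDING`, `RUNNING`, `FAILED`, `CANCELED`) is assumed rather than taken from this document.

```python
import time
from typing import Callable, Dict

# "SUCCEEDED" comes from the steps above; these failure states are assumptions.
FAILURE_STATES = {"FAILED", "CANCELED"}


def wait_for_task(
    fetch_status: Callable[[str], Dict],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll fetch_status(task_id) until the task succeeds, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status(task_id)
        status = task.get("task_status")
        if status == "SUCCEEDED":
            return task
        if status in FAILURE_STATES:
            raise RuntimeError(f"task {task_id} ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout}s")
        sleep(interval)


# Offline demonstration: a stand-in for the GET tasks call.
_states = iter(["PENDING", "RUNNING", "SUCCEEDED"])


def fake_fetch(task_id: str) -> Dict:
    return {"task_id": task_id, "task_status": next(_states)}


result = wait_for_task(fake_fetch, "demo-task", sleep=lambda seconds: None)
print(result["task_status"])  # prints "SUCCEEDED"
```

Injecting `sleep` keeps the loop testable; in production the default `time.sleep` and an interval that respects the task submission rate limit would be used.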

## Parameter Reference

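### Web Search

`enable_search` is a top-level field of the request body. The remaining options in this table are set inside the `search_options` object.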

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| enable_search | boolean | false | false | - | Enables web search for the request. |
| search_strategy | string | false | turbo | one of: turbo, max, agent, agent_max | Defines the level of search depth and quality. |
| freshness | integer | false | - | range 7-365 | Limits search results to content published within the specified number of days. |
| assigned_site_list | array | false | [] | max length 25 | Restricts search results to specific domains. |
| forced_search | boolean | false | false | - | Forces the model to perform a web search regardless of input. |
| enable_source | boolean | false | false | - | Includes search result sources in the response. |
| enable_citation | boolean | false | false | - | Adds citation markers to the response content. |
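To make the constraints in the table concrete, here is a hedged sketch of a helper that validates the options and assembles a request body fragment. The function name `build_search_body` is hypothetical, and nesting every option except `enable_search` under `search_options` is an assumption based on the FAQ in this document.

```python
from typing import Any, Dict, List, Optional

SEARCH_STRATEGIES = {"turbo", "max", "agent", "agent_max"}


def build_search_body(
    search_strategy: str = "turbo",
    freshness: Optional[int] = None,
    assigned_site_list: Optional[List[str]] = None,
    forced_search: bool = False,
    enable_source: bool = False,
    enable_citation: bool = False,
) -> Dict[str, Any]:
    """Validate web search options against the table above and build the body."""
    if search_strategy not in SEARCH_STRATEGIES:
        raise ValueError(f"search_strategy must be one of {sorted(SEARCH_STRATEGIES)}")
    if freshness is not None and not 7 <= freshness <= 365:
        raise ValueError("freshness must be between 7 and 365 days")
    sites = list(assigned_site_list or [])
    if len(sites) > 25:
        raise ValueError("assigned_site_list accepts at most 25 domains")

    options: Dict[str, Any] = {
        "search_strategy": search_strategy,
        "forced_search": forced_search,
        "enable_source": enable_source,
        "enable_citation": enable_citation,
    }
    # Optional fields are only sent when the caller sets them.
    if freshness is not None:
        options["freshness"] = freshness
    if sites:
        options["assigned_site_list"] = sites
    # Assumption: all options except enable_search live under search_options.
    return {"enable_search": True, "search_options": options}


body = build_search_body(search_strategy="max", freshness=30, enable_source=True)
print(body)
```

With the OpenAI SDK, the returned dictionary would be passed as `extra_body=body` to `client.chat.completions.create(...)`.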

### Knowledge Retrieval (Responses API)

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| type | string | true | - | Must be "file_search" | Specifies the tool type for knowledge retrieval. |
| vector_store_ids | array | true | - | Only one valid ID supported | A list containing your knowledge base ID. |

### Image Search Tools (Responses API)

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| tools | array | true | - | image_search or web_search_image | List of tools to enable. |
| input | array / string | true | - | - | Text prompt or multimodal array containing text and image URLs. |
| stream | boolean | false | false | true / false | Enables streaming output for intermediate results. |

### Text & Multimodal Embeddings

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| model | string | true | - | See model list | The model ID to use for generating embeddings. |
| input | string / array | true | - | Max 10 items, 8192 tokens each | The text, image URL, or video URL to embed. |
| dimensions | integer | false | 1024 | 2048, 1536, 1024, 768, 512, 256, 128, 64 | Specifies the vector dimension (v3/v4 only). |
| text_type | string | false | document | query / document | Specifies the role of the input text for asymmetric retrieval. |
| output_type | string | false | dense | dense / sparse / dense&sparse | Specifies the type of vector output. |
| enable_fusion | boolean | false | false | true / false | Fuses multiple multimodal inputs into a single embedding. |

### Reranking

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| query | string / object | true | - | Max 4000 tokens | The search query string or multimodal object. |
| documents | array | true | - | Max 500 text, 40 images, 4 videos | Candidate documents to sort. |
| top_n | integer | false | all | 1 to total documents | Number of top-ranked documents to return. |
| return_documents | boolean | false | false | true / false | Whether to return document text in results. |
| instruct | string | false | - | English text | Custom instruction to guide the sorting policy. |

### Intent and NLU

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| messages | array | true | - | First message must be system | Conversation history defining intent and tools. |
| task | string | false | extraction | extraction / classification | Task type for OpenNLU. |
| sentence | string | true | - | Max 1024 tokens | Text content to process for OpenNLU. |
| labels | string | true | - | Comma-separated | Extraction targets or classification categories. |

### Async Batch Embeddings

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| input.url | string | true | - | Max 100,000 lines, 200 MB | HTTP URL of the UTF-8 text file for batch vectorization. |
| parameters.text_type | string | false | document | query / document | Role of the text for downstream tasks. |

## Code Examples

### Web Search - Python - China

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather in Hangzhou tomorrow?"}
    ],
    extra_body={"enable_search": True}
)
print(completion.choices[0].message.content)
```

### Knowledge Retrieval - Python - China

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

response = client.responses.create(
    model="qwen3.6-plus",
    input="Tell me about the Alibaba Cloud Model Studio X1 phone",
    tools=[
        {
            "type": "file_search",
            "vector_store_ids": ["your_knowledge_base_id"]
        }
    ]
)

print("[Model Response]")
print(response.output_text)
print(f"\n[Token Usage] Input: {response.usage.input_tokens}, Output: {response.usage.output_tokens}, Total: {response.usage.total_tokens}")
```

### Text Embedding - Python - China

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

completion = client.embeddings.create(
    model="text-embedding-v4",
    input="The quality of the clothes is excellent",
    dimensions=1024
)

print(completion.model_dump_json())
```

### Rerank - curl - China

```bash
curl --request POST \
  --url https://dashscope.aliyuncs.com/compatible-api/v1/reranks \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "qwen3-rerank",
    "query": "What is a reranking model?",
    "documents": [
      "Reranking models are widely used in search engines and recommendation systems to sort candidate texts by relevance.",
      "Quantum computing is a cutting-edge field in computer science.",
      "The development of pretrained language models has led to new advancements in reranking models."
    ],
    "top_n": 2
}'
```

### Multimodal Embedding - Python - China

```python
import dashscope
import json
import os
from http import HTTPStatus

text = "This is a test text for generating a multimodal fused embedding."
image = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png"
video = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4"

input_data = [
    {"text": text},
    {"image": image},
    {"video": video}
]

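resp = dashscope.MultiModalEmbedding.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-vl-embedding",
    input=input_data,
    enable_fusion=True  # fuse the text, image, and video inputs into one embedding
)

# Print the embeddings on success; otherwise surface the error details.
if resp.status_code == HTTPStatus.OK:
    print(json.dumps(resp.output, ensure_ascii=False, indent=4))
else:
    print(f"Request failed: {resp.code} - {resp.message}")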
```

### Search by Image - Python - China

```python
import os
import json
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

input_content = [
    {"type": "input_text", "text": "Find landscape images with a style similar to this one"},
    {"type": "input_image", "image_url": "https://img.alicdn.com/imgextra/i4/O1CN01YbrnSS1qtmsAkw0Ud_!!6000000005554-2-tps-788-450.png"}
]

response = client.responses.create(
    model="qwen3.6-plus",
    input=[{"role": "user", "content": input_content}],
    tools=[{"type": "image_search"}]
)

# Walk the output items: tool calls carry search results, messages carry the answer.
for item in response.output:
    if item.type == "image_search_call":
        print(f"[Tool Call] Search by image (status: {item.status})")
        if item.output:
            images = json.loads(item.output)  # the tool output is a JSON string
            print(f"  Found {len(images)} images:")
            for img in images[:5]:  # show at most the first five hits
                print(f"  [{img['index']}] {img['title']}")
                print(f"      {img['url']}")
    elif item.type == "message":
        print("\n[Model Response]")
        print(response.output_text)
```

### Intent Recognition - Python - China

```python
import os
import json
from openai import OpenAI

tools = [
    {
        "name": "get_current_weather",
        "description": "This is useful when you want to query the weather of a specified city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "A city or district."}
            },
            "required": ["location"]
        }
    }
]

system_prompt = f"""You are Qwen, created by Alibaba Cloud. You are a helpful assistant. You may call one or more tools to assist with the user query. The tools you can use are as follows:
{json.dumps(tools, ensure_ascii=False)}
Response in INTENT_MODE."""

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

messages = [
    {'role': 'system', 'content': system_prompt},
    {'role': 'user', 'content': "Weather in Hangzhou"}
]

response = client.chat.completions.create(
    model="tongyi-intent-detect-v3",
    messages=messages
)

print(response.choices[0].message.content)
```

### Async Batch Embedding - Python - All

```python
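from dashscope import BatchTextEmbedding

# The SDK reads the API key from the DASHSCOPE_API_KEY environment variable
# when no api_key argument is passed.
result = BatchTextEmbedding.call(
    BatchTextEmbedding.Models.text_embedding_async_v1,
    # Publicly accessible UTF-8 text file to vectorize.
    url="https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241016/nigwvr/text_embedding_file.txt",
    text_type="document"
)
print(result)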
```

## Response Format

### Text Embedding Success Response

```json
{
  "data": [
    {
      "embedding": [-0.0695386752486229, 0.030681096017360687],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-v4",
  "object": "list",
  "usage": {
    "prompt_tokens": 184,
    "total_tokens": 184
  },
  "id": "73591b79-d194-9bca-8bb5-xxxxxxxxxxxx"
}
```

**Key Fields**:
- `data[].embedding` — The dense vector representation of the input text.
- `data[].index` — The index of the input text corresponding to the request array.
- `usage.total_tokens` — Total tokens consumed by the request.
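As an illustration of how these fields are typically consumed, the sketch below orders the vectors by `data[].index` and compares two of them with cosine similarity. The helper names are hypothetical, and the two-dimensional vectors are stand-ins for the much longer vectors a real response contains.

```python
import math
from typing import Dict, List, Sequence


def vectors_in_request_order(response: Dict) -> List[List[float]]:
    """Return embeddings sorted by data[].index so they line up with the inputs."""
    ordered = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in response with tiny vectors, deliberately listed out of order.
sample = {
    "data": [
        {"embedding": [0.0, 1.0], "index": 1, "object": "embedding"},
        {"embedding": [1.0, 0.0], "index": 0, "object": "embedding"},
    ]
}
first, second = vectors_in_request_order(sample)
print(cosine_similarity(first, first))   # prints 1.0
print(cosine_similarity(first, second))  # prints 0.0
```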

### Rerank Success Response

```json
{
  "object": "list",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.9334521178273196
    },
    {
      "index": 2,
      "relevance_score": 0.34100082626411193
    }
  ],
  "model": "qwen3-rerank",
  "usage": {
    "total_tokens": 79
  }
}
```

**Key Fields**:
- `results[].index` — The original index of the document in the input array.
- `results[].relevance_score` — The calculated relevance score (higher is better).
- `usage.total_tokens` — Total tokens processed for query and documents.
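Because the response returns only indices and scores unless `return_documents` is set, a caller usually maps `results[].index` back to the submitted documents. The sketch below does that with the sample response above and the documents from the curl example; `rank_documents` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def rank_documents(documents: List[str], rerank_response: Dict) -> List[Tuple[str, float]]:
    """Pair each result with its source document, highest relevance first."""
    results = sorted(
        rerank_response["results"],
        key=lambda result: result["relevance_score"],
        reverse=True,
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in results]


documents = [
    "Reranking models are widely used in search engines and recommendation systems to sort candidate texts by relevance.",
    "Quantum computing is a cutting-edge field in computer science.",
    "The development of pretrained language models has led to new advancements in reranking models.",
]
sample_response = {
    "object": "list",
    "results": [
        {"index": 0, "relevance_score": 0.9334521178273196},
        {"index": 2, "relevance_score": 0.34100082626411193},
    ],
    "model": "qwen3-rerank",
    "usage": {"total_tokens": 79},
}

ranked = rank_documents(documents, sample_response)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```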

## Error Handling

| Code | Description | Recommended Action |
|------|-------------|--------------------|
| 400 / InvalidParameter | Bad Request or invalid parameters. | Check parameter values, constraints, and JSON structure. |
| 401 / InvalidApiKey | Unauthorized or invalid API key. | Verify `DASHSCOPE_API_KEY` is set correctly in the environment. |
| 429 / Throttling | Rate limit exceeded. | Implement exponential backoff and retry logic. |
| 500 | Internal Server Error. | Retry the request after a short delay. |
| UnsupportedModel | The model does not support the requested feature. | Verify the model ID against the supported models list. |

### Rate Limits & Retry
- **Standard Limit**: 100 QPS per model for most embedding and reranking APIs.
- **Web Search Limit**: 15 RPS per Alibaba Cloud account. Exceeding this silently skips the search without returning an error.
- **Async Tasks**: 1 QPS for task submission, maximum 3 concurrent tasks.
- **Retry Strategy**: Use exponential backoff for 429 and 500 errors. Respect the `Retry-After` header if provided.
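The retry strategy above could look roughly like the following. This is a sketch, not a prescribed implementation: `ApiError` is a hypothetical exception standing in for whatever error type your HTTP client or SDK raises, and the jittered delay formula is one common choice among several.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
RETRYABLE_STATUS = {429, 500}  # the two codes the strategy above retries


class ApiError(Exception):
    """Hypothetical error carrying the HTTP status and Retry-After seconds."""

    def __init__(self, status: int, retry_after: Optional[float] = None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying 429/500 errors with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except ApiError as err:
            is_last = attempt == max_attempts - 1
            if err.status not in RETRYABLE_STATUS or is_last:
                raise
            if err.retry_after is not None:
                delay = err.retry_after  # the server's hint takes precedence
            else:
                # Full jitter: random delay up to the capped exponential bound.
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
    raise AssertionError("unreachable")


# Offline demonstration: fail twice with 429, then succeed.
attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ApiError(429, retry_after=0.0)
    return "ok"


print(call_with_backoff(flaky, sleep=lambda seconds: None))  # prints "ok"
```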

## Pricing & Billing

### Billing Model
Most embedding and NLU models are billed **per token** (input and output). Web search and image search tools incur additional **per request** or **per call** fees on top of the base model token costs.

### Price Reference

| Model / Tier | Input Price | Output Price | Other Fees |
|--------------|-------------|--------------|------------|
| text-embedding-v4 | CNY 0.0005 / 1K tokens | - | - |
| text-embedding-v3 | CNY 0.0005 / 1K tokens | - | - |
| qwen3-vl-embedding | CNY 0.0007 / 1K tokens (Text) | - | CNY 0.0018 / 1K tokens (Image/Video) |
| qwen3-rerank | CNY 0.002 / 1K tokens | CNY 0.002 / 1K tokens | - |
| Web Search (turbo) | - | - | CNY 3 / 1K calls |
| Web Search (agent) | - | - | CNY 4 / 1K calls |
| Image Search Tool | - | - | CNY 48 / 1K calls |
| tongyi-intent-detect-v3 | CNY 0.0004 / 1K tokens | CNY 0.001 / 1K tokens | - |
| opennlu-v1 | CNY 0.00465 / 1K tokens | CNY 0.00465 / 1K tokens | - |
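For a rough budget check, the prices in the table can be turned into a small calculator. The sketch below uses a subset of the rows; the function and dictionary names are illustrative, and the estimate ignores the free tier and the extra input tokens that search results add to a prompt.

```python
from decimal import Decimal

# CNY per 1K input tokens, copied from the table above.
INPUT_PRICE_PER_1K_TOKENS = {
    "text-embedding-v4": Decimal("0.0005"),
    "text-embedding-v3": Decimal("0.0005"),
    "tongyi-intent-detect-v3": Decimal("0.0004"),
}

# CNY per 1K tool calls, copied from the table above (keys are illustrative).
TOOL_PRICE_PER_1K_CALLS = {
    "web_search_turbo": Decimal("3"),
    "web_search_agent": Decimal("4"),
    "image_search": Decimal("48"),
}


def token_cost(model: str, tokens: int) -> Decimal:
    """Estimated input-token cost in CNY."""
    return INPUT_PRICE_PER_1K_TOKENS[model] * Decimal(tokens) / Decimal(1000)


def tool_cost(tool: str, calls: int) -> Decimal:
    """Estimated per-call tool fee in CNY."""
    return TOOL_PRICE_PER_1K_CALLS[tool] * Decimal(calls) / Decimal(1000)


# 2,000,000 tokens at CNY 0.0005 per 1K tokens is 2,000 x 0.0005 = CNY 1.
print(token_cost("text-embedding-v4", 2_000_000))
# 500 turbo web searches at CNY 3 per 1K calls is 0.5 x 3 = CNY 1.5.
print(tool_cost("web_search_turbo", 500))
```

`Decimal` is used instead of `float` so that small per-token prices do not pick up binary rounding error.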

### Free Tier
- **Text Embeddings**: 1 million tokens free (valid for 90 days after activating Model Studio).
- **Multimodal Embeddings & Rerank**: 1 million tokens free per month.
- **Intent Recognition**: 1 million tokens free (valid for 90 days).
- **Knowledge Retrieval**: The knowledge base feature is currently free of charge.

### Usage Limits
- **Batch Size**: Max 10 texts per request for synchronous embeddings; max 100,000 lines for async batch tasks.
- **Token Limits**: Max 8,192 tokens per line for v3/v4 embeddings; max 4,000 tokens per document for reranking.
- **Concurrency**: Max 10 concurrent requests per model for most APIs.
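To stay under the synchronous batch limit listed above, longer input lists can be split into groups of at most 10 before calling the embeddings API. A minimal sketch, with `chunk_inputs` as a hypothetical helper name:

```python
from typing import Iterator, List, Sequence

MAX_SYNC_BATCH = 10  # "Max 10 texts per request" from the limits above


def chunk_inputs(texts: Sequence[str], size: int = MAX_SYNC_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of texts, each with at most size items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


texts = [f"document {i}" for i in range(23)]
batches = list(chunk_inputs(texts))
print([len(batch) for batch in batches])  # prints [10, 10, 3]
```

Each batch would then be sent as the `input` of one synchronous embeddings request; the per-line token limit still has to be checked separately.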

### Billing Notes
- Web search and image search results are appended to the prompt, increasing input token counts.
- Async batch tasks are billed upon completion. Task data and output URLs are retained for only 24 hours.

## FAQ

**Q: How do I enable web search in Qwen models?**
A: Set `enable_search: true` in the request body. If using the OpenAI SDK, pass it via the `extra_body` parameter. To control search behavior further, set fields such as `search_strategy` and `freshness` inside `search_options`.

**Q: Can I use my own enterprise documents for RAG?**
A: Yes. Use the Responses API with the `file_search` tool. Provide your knowledge base ID in the `vector_store_ids` array. The model will automatically retrieve relevant chunks and generate an answer.

**Q: What is the difference between text-embedding-v3 and v4?**
A: Both models accept up to 8,192 tokens per input line (see Usage Limits). `text-embedding-v4` offers a wider range of output dimensions (64 to 2,048) and improved semantic representation compared to v3.

**Q: How do I process large batches of text for embeddings?**
A: Use the Async Task pattern with `text-embedding-async-v1` or `text-embedding-async-v2`. Upload your text file to a publicly accessible URL and submit the URL via the batch processing API. Poll the task status to retrieve the output file.

**Q: Why did my web search request not return any search results?**
A: Web search is limited to 15 RPS per account. If you exceed this limit, the system silently skips the search and falls back to the model's internal knowledge without returning an error. Ensure your request rate is within limits.

## Source Documents

- `Web search_5478079.xdita`
- `Knowledge retrieval_6411138.xdita`
- `Knowledge Retrieval and the Qwen3.5 Release_6411138.xdita`
- `Image-to-image search_6411133.xdita`
- `Text-to-image search_6411125.xdita`
- `Search by image_6411133.xdita`
- `Multimodal embeddings API_4759724.xdita`
- `Multimodal embeddings_4759724.xdita`
- `Rerank_4923321.xdita`
- `Intention recognition_5386285.xdita`
- `OpenNLU_4759729.xdita`
- `Text and multimodal embedding_5111541.xdita`
- `Rerank_6488526.xdita`
- `Batch processing API reference_4759721.xdita`
- `Get started_4759717.xdita`
- `Synchronous API_4759719.xdita`
- `OpenAI compatible - Embedding_5180115.xdita`
- `GUI interaction_6231423.xdita`