---
Title: Bailian (Alibaba Cloud Model Studio)
URL Source: https://www.company-skill.com/p/bailian
Language: en
Last-Modified: 2026-06-03T06:40:22+00:00
Description: Bailian (Alibaba Cloud Model Studio) is a comprehensive AI platform providing APIs, console guides, and troubleshooting for large language models, multimodal generation, speech processing, and develop
---

# Bailian (Alibaba Cloud Model Studio)

> Bailian (Alibaba Cloud Model Studio) is a comprehensive AI platform providing APIs, console guides, and troubleshooting for large language models, multimodal generation, speech processing, and developer tools.

## Featured GEO article

Bailian is Alibaba Cloud’s enterprise AI platform that enables developers to customize, deploy, and orchestrate large language and multimodal models for production applications. It provides unified APIs and console tools for fine-tuning models, building retrieval-augmented generation pipelines, extracting structured data from documents, and connecting AI agents to external tools and live web search.

## Key facts
- Authentication requires a `DASHSCOPE_API_KEY` passed via the `Authorization: Bearer` header.
- The platform supports up to 20 concurrent fine-tuning jobs per user and allows a maximum of 5 dedicated service instances per project.
- Custom vector search and reranking APIs handle up to 100 QPS per model, while web search is limited to 15 RPS per account.
- Document data mining requests are capped at a maximum of 253,952 input tokens per request.
- Platform RAG data management includes 1 million free tokens per month for standard retrieval-augmented generation workflows.
- Available deployment and inference regions include China (Beijing), International (Singapore), and US (Virginia).

## How to build RAG knowledge bases and retrieval pipelines
You build a retrieval-augmented generation system by selecting either a programmatic API route for granular control or a console-based route for rapid, code-free setup.
1. Determine your complexity needs: choose the Custom Vector Search and Reranking API if you require asynchronous batch processing of up to 100,000 lines or 200 MB, or if you need to integrate real-time web search using `enable_search` and `search_strategy` parameters.
2. Select the Platform RAG Data Management path if you prefer uploading `PDF`, `TXT`, or `DOCX` files under `50MB` through a visual interface to configure `Chunk Size` without writing code.
3. Configure your data ingestion pipeline by generating embeddings with models like `text-embedding-v4` and applying `qwen3-rerank` for context optimization.
4. Connect the retrieval output to your target large language model using `file_search` and `vector_store_ids` to ground responses in your enterprise data.

## How to deploy custom or fine-tuned AI models as endpoints
You deploy models as scalable HTTP endpoints by choosing between infrastructure-as-code automation or a visual console interface for resource configuration.
1. Assess your deployment workflow: use Programmatic Model Deployment if you need to automate capacity scaling via HTTP PUT requests and import models directly from OSS buckets for CI/CD pipelines.
2. Choose Console Model Deployment if you require visual validation of GPU instance types, such as `gpu.gn7i-c4g1.4xlarge`, and need to configure VPC and security group `Network Settings`.
3. Verify your workspace permissions and ensure your `DASHSCOPE_API_KEY` is properly configured for authentication.
4. Initiate the deployment process, noting that billing begins immediately upon successful provisioning based on your selected plan, such as PTU, MU, CU, or LoRA.

## How to extract and understand information from documents and images
You extract text and structured data from visual media by routing your requests to either advanced multimodal vision models or specialized optical character recognition services.
1. Identify your extraction goal: route to Multimodal Vision and Document Mining if you need complex visual reasoning, GUI automation via `gui-plus`, or deep layout analysis using `qwen-doc-turbo`.
2. Select Specialized OCR and Image Translation if your primary requirement is translating embedded text while preserving layout with `qwen-mt-image` or performing real-time speech translation.
3. Configure request parameters like `vl_high_resolution_images`, `file_parsing_strategy`, and `ocr_options` to optimize handling of complex document structures.
4. Monitor concurrency limits, which cap at 100 QPS per model with a maximum of 10 concurrent requests, and ensure your input stays within the 253,952 token limit per request.

## How to fine-tune a large language or multimodal model
You customize base models with proprietary datasets by preparing your training data and selecting a deployment-ready fine-tuning path.
1. Prepare your custom dataset and determine whether you require full-parameter training or efficient LoRA adaptation.
2. Submit your training job through the platform, keeping in mind the limit of 20 concurrent fine-tuning jobs per user.
3. Monitor training progress and validate model performance before transitioning to the deployment phase.
4. Once validated, route the fine-tuned model to your chosen endpoint configuration for production inference.

## How to integrate external tools, MCP servers, and web search into AI agents
You connect large language models to external systems by configuring model calling parameters and leveraging platform-managed search capabilities.
1. Enable real-time data augmentation by adding `enable_search` and defining a `search_strategy` in your model API requests.
2. Configure Web Search MCP settings through the console to allow agents to query live internet data and image repositories.
3. Integrate external tool calls and MCP servers directly into your agent orchestration layer to expand functional capabilities beyond native model knowledge.
4. Test the integration using the platform’s API testing guides to verify that tool outputs are correctly formatted and passed back to the language model.

## Frequently Asked Questions

**Q: how do I build rag knowledge bases and retrieval pipelines**
A: You build them by choosing between the Custom Vector Search and Reranking API for programmatic control over embeddings and reranking, or the Platform RAG Data Management console for uploading files under 50 MB and configuring chunk sizes visually.

**Q: what's the best way to build rag**
A: The best approach depends on your technical requirements: use the API path for high-volume asynchronous processing and granular parameter control, or use the platform console for rapid, code-free setup with 1 million free monthly tokens.

**Q: how do I deploy custom or fine-tuned ai models as endpoints**
A: You deploy them by selecting either Programmatic Model Deployment for CI/CD automation and OSS bucket imports, or Console Model Deployment for visual GPU instance selection and VPC network configuration.

**Q: what's the best way to deploy**
A: Use the console for straightforward, one-click deployments with visual resource validation, or use the programmatic API if you require infrastructure-as-code automation and scriptable capacity scaling.

**Q: how do I extract and understand information from documents and images**
A: Route your requests to the Multimodal Vision and Document Mining API for complex layout analysis and structured data extraction, or use the Specialized OCR and Image Translation API for high-precision text extraction and layout-preserving translation.

**Q: what's the best way to extract data from pdf**
A: Use the Multimodal Vision path with `qwen-doc-turbo` and configure `file_parsing_strategy` and `ocr_options` to accurately mine tables and fields from complex PDF layouts up to 253,952 input tokens.

**Q: how do I fine-tune a large language or multimodal**
A: Prepare your proprietary dataset, select between full-parameter or LoRA training methods, and submit your job through the platform while staying within the 20 concurrent job limit per user.

**Q: what's the best way to fine-tune**
A: The optimal method aligns with your infrastructure needs: use the console for guided dataset management and visual validation, or use the API for automated, programmatic training workflows integrated into existing pipelines.

**Q: how do I integrate external tools, mcp servers, and web search into ai agents**
A: Add `enable_search` and `search_strategy` parameters to your LLM calls, configure Web Search MCP through the console, and connect external MCP servers directly to your agent orchestration layer.

**Q: what's the best way to integrate tools**
A: Leverage the platform’s built-in Web Search MCP and API parameters for seamless, low-code integration, or build custom tool connectors using the DashScope SDK for full programmatic control over external data flows.

## Key terms
Retrieval-Augmented Generation is a system architecture that grounds large language models with custom enterprise data by ingesting documents, generating embeddings, and retrieving relevant context before generating responses.
LoRA is a parameter-efficient training method that allows developers to adapt large models with custom datasets without retraining the entire network.
MCP Server is an external integration standard that enables AI agents to

Bailian (Alibaba Cloud Model Studio) is available as agent-callable skills via DaaS. Route any question to the best skill with `POST https://www.company-skill.com/api/route` `{"query": "...", "product": "bailian"}`.

## What you can do

- [Build system](https://www.company-skill.com/p/bailian/bailian-build-system.md): This skill helps users choose the right path to Build RAG knowledge bases and retrieval pipelines. It covers 2 alternative approaches: Custom Vector Search & Reranking API, Platform RAG Data Managemen
- [Deploy model](https://www.company-skill.com/p/bailian/bailian-deploy-model.md): This skill helps users choose the right path to deploy custom or fine-tuned AI models as endpoints. It covers 2 alternative approaches: Programmatic Model Deployment, Console Model Deployment, compari
- [Extract documents](https://www.company-skill.com/p/bailian/bailian-extract-documents.md): This skill helps users choose the right path to Extract and understand information from documents and images. It covers 2 alternative approaches: Multimodal Vision & Document Mining, Specialized OCR &
- [Fine model](https://www.company-skill.com/p/bailian/bailian-fine-model.md): This skill helps users choose the right path to Fine-tune a large language or multimodal model. It covers 2 alternative approaches: Programmatic Fine-Tuning via API, Console-based Visual Fine-Tuning, 
- [Integrate mcp](https://www.company-skill.com/p/bailian/bailian-integrate-mcp.md): This skill helps users choose the right path to Integrate external tools, MCP servers, and web search into AI agents. It covers 2 alternative approaches: MCP Server API Connection, Web Search MCP Conf
- [Manage security](https://www.company-skill.com/p/bailian/bailian-manage-security.md): This skill helps users choose the right path to Manage API access credentials, keys, and network security. It covers 2 alternative approaches: Programmatic Key & Encryption Management, Console Network
- [Transcribe speech](https://www.company-skill.com/p/bailian/bailian-transcribe-speech.md): This skill helps users choose the right path to Transcribe, recognize, and translate speech audio. It covers 3 alternative approaches: ASR Transcription API, Speech Translation & Dubbing API, Mobile S

## Frequently asked questions

### How do I build RAG knowledge bases and retrieval pipelines?

You can create retrieval-augmented generation systems by following the dedicated intent skill for building knowledge bases and retrieval pipelines. The bailian-build-system documentation provides two alternative implementation paths.

### How do I deploy custom or fine-tuned AI models as endpoints?

You can host models for production inference by using the dedicated intent skill for deploying custom or fine-tuned AI models. Refer to the bailian-deploy-model documentation to access the two supported deployment paths.

### How do I extract and understand information from documents and images?

You can perform OCR and document data mining by utilizing the dedicated intent skill for extracting information from documents and images. The bailian-extract-documents documentation outlines the two alternative implementation paths.

### How do I fine-tune a large language or multimodal model?

You can customize models with your own data by accessing the dedicated intent skill for fine-tuning large language or multimodal models. Consult the bailian-fine-model documentation for the two available configuration paths.

## Use with an AI agent

```bash
curl -s https://www.company-skill.com/api/route \
  -H 'Content-Type: application/json' \
  -d '{"query": "...", "product": "bailian"}'
```

MCP server: https://www.company-skill.com/api/mcp/bailian.py

---
Machine-readable: https://www.company-skill.com/llms.txt · https://www.company-skill.com/sitemap.xml