# bailian-llm

Part of **BAILIAN**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for [Build RAG knowledge bases and retrieval pipelines](../../intent/bailian-build-system/SKILL.md). If you're unsure which path to take, check the routing skill first.

# Alibaba Cloud Model Studio Text and Code Generation Console Guide

## Operations Overview

| Operation | Console Entry | Prerequisites | Description |
|-----------|---------------|---------------|-------------|
| Get Started with Text Generation | Console > Model Service > Text Generation | Account registered, API key generated | Initialize and test Qwen models for basic text generation. |
| Optimize Prompts | Console > Prompt Management > Prompts | Access to the console | Use automatic optimization tools to improve text prompts. |
| Upload Data for RAG | Console > RAG > Data Management | Valid account, supported data files | Upload custom datasets to build a knowledge base for Retrieval-Augmented Generation. |

## Steps

### Get Started with Text Generation

**Navigation**: Console > Model Service > Text Generation

**Prerequisites**:
- Account registered on Alibaba Cloud
- API key generated in the console
- Basic understanding of API concepts

1. Navigate to the Qwen model service page.
   - Element: **Model Service** (menu) — left navigation panel

2. Select the Text Generation category.
   - Element: **Text Generation** (tab) — main content area

3. Click on the Get Started button to open the tutorial panel.
   - Element: **Get Started** (button) — top-right corner
   - Notes: This opens a tutorial panel with sample prompts and model selection options.

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| Model Selection | dropdown | Yes | qwen-plus, qwen-max, qwen-turbo, qwen-longcontext | Choose the Qwen model variant to use for text generation. |
| Prompt Input | text_input | Yes | — | Enter the input text or question you want the model to respond to. |
| Max Tokens | number_input | No | — | Set the maximum number of tokens in the output response. Default is 2048. |

### Optimize Prompts

**Navigation**: Console > Prompt Management > Prompts

**Prerequisites**:
- Access to the console
- Basic understanding of LLM capabilities

1. Go to the Prompt Management page.
   - Element: **Prompt** (link) — left navigation panel

2. Click the Automatic Optimization button to improve your input prompts.
   - Element: **Automatic Optimization** (button) — main content area
   - Notes: This tool uses an LLM to expand and improve your input prompts. Note: This consumes tokens and is billed according to model inference pricing.

### Upload Data for RAG

**Navigation**: Console > RAG > Data Management > Upload Data

**Prerequisites**:
- A valid account with access to the RAG feature
- Data files in supported formats (PDF, TXT, DOCX)
- Internet connection for uploading and processing

1. Navigate to the RAG section in the console.
   - Element: **RAG** (menu) — left navigation panel

2. Click on Data Management to manage your knowledge base.
   - Element: **Data Management** (tab) — top navigation bar

3. Click the Upload Data button to start uploading files.
   - Element: **Upload Data** (button) — main content area
   - Notes: Supported file types: PDF, TXT, DOCX. Maximum file size: 50MB.

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| Data Source Name | text_input | Yes | — | Enter a unique name for your data source to identify it later. |
| File Selection | text_input | Yes | — | Browse and select one or more files to upload for RAG processing. |
| Chunk Size | number_input | No | — | Set the size of text chunks during embedding. Default is 1024 characters. |
| Overlap | number_input | No | — | Define the overlap between adjacent text chunks to preserve context. Default is 200. |

## FAQ

Q: Where do I find the API key required for model integration?
A: You can generate and manage your API key in the console under the API Key management section after activating Alibaba Cloud Model Studio.

Q: What happens if I leave the Max Tokens field empty during text generation?
A: If left empty, the model will use its default maximum token limit (e.g., 2048 or up to the model's maximum context window) for the output response.

Q: Can I modify the chunk size and overlap after uploading data for RAG?
A: Chunk size and overlap are applied during the initial data ingestion and embedding process. To change these parameters, you typically need to re-upload or re-process the data source.

Q: What file formats are supported for RAG data uploads?
A: The console supports uploading PDF, TXT, and DOCX files. The maximum file size per upload is 50MB.

Q: Does using the Automatic Optimization tool for prompts incur extra costs?
A: Yes, the Automatic Optimization tool uses an underlying LLM to rewrite your prompts, which consumes tokens and is billed according to standard model inference pricing.

## Pricing & Billing

### Billing Model
Services are primarily billed on a pay-as-you-go basis per token used for input and output. Visual understanding and specific coding plans may operate on a per-request or fixed monthly quota basis.

### Price Reference

| Model / Tier | Input Price | Output Price |
|--------------|-------------|--------------|
| qwen-turbo | 0.0005 CNY / 1K tokens | 0.001 CNY / 1K tokens |
| qwen-plus | 0.002 CNY / 1K tokens | 0.004 CNY / 1K tokens |
| qwen-max | 0.006 CNY / 1K tokens | 0.012 CNY / 1K tokens |
| qwen3.6-plus | 0.002 CNY / 1K tokens | 0.002 CNY / 1K tokens |
| qwen3.5-plus | 0.002 CNY / 1K tokens | 0.002 CNY / 1K tokens |
| kimi-k2.5 | 0.002 CNY / 1K tokens | 0.002 CNY / 1K tokens |
| glm-5 | 0.002 CNY / 1K tokens | 0.002 CNY / 1K tokens |
| MiniMax-M2.5 | 0.002 CNY / 1K tokens | 0.002 CNY / 1K tokens |
| standard (RAG) | 0.002 CNY / 1K tokens | 0.004 CNY / 1K tokens |

### Free Tier
- 1 million tokens free per month for standard text generation and RAG processing.

### Billing Notes
- Output tokens are billed only when the response is fully generated; partial responses are not charged.
- RAG data processing is billed based on token usage during retrieval and generation. Long-running queries may incur higher costs.
- Running an image understanding Skill consumes your Coding Plan quota. No other charges apply besides Coding Plan quota usage.
- Async tasks are billed on completion.

## Source Documents

- LangChain_4546131.xdita
- Get started_6038920.xdita
- Text-to-text prompt guide_4830742.xdita
- Add visual understanding capabilities_6415357.xdita
- Using data for RAG_4926459.xdita