# bailian-asr

Part of **BAILIAN**

# Bailian Speech and Audio Processing Console Guide

## Operations Overview

| Operation | Console Navigation | Prerequisites | Description |
|-----------|--------------------|---------------|-------------|
| Obtain API Key | Console > Model Management > API Key | None | Obtain and configure an API key for speech services. |
| Integrate Android SDK | Console > Bailian > SDK Download and Integration | Android Studio installed | Download and integrate the speech recognition SDK for Android. |
| Integrate iOS SDK | Console > Bailian > Model Experience Center > Speech | Xcode installed | Download and integrate the speech recognition SDK for iOS. |
| Configure Real-time ASR | Console > Speech Recognition > Real-time ASR | API key configured | Set up real-time speech recognition parameters and models. |
| Configure High-Concurrency TTS | Console > Bailian > Speech Synthesis > High-concurrency scenario optimization | DashScope Java SDK 2.16.9 or later | Optimize connection and object pools for high-concurrency speech synthesis. |

## Operations Steps

### Obtain API Key

**Navigation**: Console > Model Management > API Key

1. Navigate to the API key page.
   - Element: **API key** (link) — top navigation panel

2. Obtain and configure the API key.
   - Element: **Get and configure an API key** (link) — top of page
   - Notes: We recommend using a short-lived temporary API key for mobile applications to reduce security risks.
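For server-side code and local testing, keep the long-lived key out of source files and read it from the DASHSCOPE_API_KEY environment variable, which the Real-time ASR section also expects. The following is a minimal Python sketch. `load_api_key` is a hypothetical helper, not a DashScope SDK function.

```python
import os


def load_api_key(var_name: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Hypothetical helper for illustration; it is not part of the DashScope SDK.
    Raises RuntimeError if the variable is unset or blank, so a missing key
    fails fast at startup instead of surfacing later as an authentication error.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key
```

Call `load_api_key()` once at startup and pass the value to whichever client you use. This pattern suits servers. For mobile apps, the note above still applies: ship a short-lived temporary key instead of the long-lived one.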

### Integrate Android SDK

**Navigation**: Console > Bailian > SDK Download and Integration

**Prerequisites**:
- Android Studio installed
- An existing Android project to integrate the SDK into

1. Download the latest SDK package.
   - Element: **Download the latest SDK package** (link) — Getting started section

2. Unzip the package, copy the AAR file into the app/libs folder, and add it to the project dependencies.
   - Element: **app/libs folder** (text_input) — project structure
   - Notes: For C++ integration, use the dynamic libraries in the android_libs folder and the header files in the android_include folder.

3. Open the sample project in Android Studio.
   - Element: **DashGummySpeechRecognizerActivity.java** (link) — sample code location
   - Notes: Replace the API key in this file to test the feature.
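Step 2 can be scripted. The following Python sketch assumes the SDK package is a .zip archive with the AAR file somewhere inside it. `install_aar` is a hypothetical helper, and the archive layout is an assumption, so check it against the package you actually download. Copying the file covers only part of the step: your module's build.gradle must still declare the AAR in app/libs as a dependency (for example, through a `fileTree` dependency), which this sketch does not do.

```python
import shutil
import zipfile
from pathlib import Path


def install_aar(sdk_zip: Path, app_dir: Path) -> list[Path]:
    """Extract every .aar file from the SDK zip into <app_dir>/libs.

    Hypothetical helper. Assumes the SDK ships as a .zip archive and that the
    AAR file(s) may sit in any subfolder. Returns the paths that were written.
    """
    libs_dir = app_dir / "libs"
    libs_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    with zipfile.ZipFile(sdk_zip) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".aar"):
                continue
            # Flatten the archive path: keep only the file name inside libs/.
            target = libs_dir / Path(name).name
            with archive.open(name) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            copied.append(target)
    if not copied:
        raise FileNotFoundError(f"No .aar file found in {sdk_zip}")
    return copied
```

For example, `install_aar(Path("sdk.zip"), Path("MyProject/app"))` would place the AAR in MyProject/app/libs (both paths are placeholders).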

### Integrate iOS SDK

**Navigation**: Console > Bailian > Model Experience Center > Speech

**Prerequisites**:
- Xcode installed and an iOS project to integrate the SDK into

1. Obtain an API key.
   - Element: **Get and configure an API key** (link) — top navigation bar

2. Download the SDK package.
   - Element: **Download the latest SDK package** (link) — main content area

3. Add nuisdk.framework to your project.
   - Element: **Link Binary With Libraries** (menu) — Build Phases section in Xcode

4. Set nuisdk.framework to Embed & Sign.
   - Element: **Embed & Sign** (dropdown) — General → Frameworks, Libraries, and Embedded Content

5. Open the sample project in Xcode.
   - Element: **DashGummySpeechTranscriberViewController** (text_input) — sample code file path
   - Notes: Replace the API key in the sample code to test the feature.

### Configure Real-time ASR

**Navigation**: Console > Speech Recognition > Real-time ASR

**Prerequisites**:
- API key stored in the DASHSCOPE_API_KEY environment variable

1. Navigate to the Speech Recognition section in the console.
   - Element: **Speech Recognition** (menu) — left navigation panel

2. Select Real-time ASR from the available options.
   - Element: **Real-time ASR** (link) — main content area

3. Click Quick Start to view code examples.
   - Element: **Quick Start** (button) — top-right corner

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| Model | dropdown | Yes | qwen3-asr-flash-realtime, fun-asr-realtime, paraformer-realtime-8k-v2, fun-asr-flash-8k-realtime | Select the speech recognition model to use for real-time transcription |
| Language | dropdown | No | en, zh, ja, ko | Set the language of the audio input for better accuracy |
| Sample Rate (Hz) | dropdown | Yes | 8000, 16000, 24000, 48000 | Specify the sample rate of the input audio, in Hz |
| Input Audio Format | dropdown | Yes | pcm, wav, mp3, opus | Choose the format of the audio input stream |
| Enable VAD | checkbox | No | — | Enable server-side voice activity detection to detect speech start/end |
| Turn Detection Mode | radio | No | Server VAD (default), Manual mode | Choose whether the server or client controls when a turn ends |
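Before opening a session, it can help to check a configuration against the options in the table. The Python sketch below copies the table's values into sets. It also computes how many bytes of raw PCM audio cover a given stretch of time, which is useful when streaming audio in fixed-size chunks. Several details are assumptions, not taken from the console: the turn-detection labels `server_vad` and `manual` are hypothetical identifiers, the 8 kHz rule is inferred from the "8k" in the model names, and the chunk calculation assumes 16-bit mono PCM. The service itself remains the authority on which combinations it accepts.

```python
from dataclasses import dataclass
from typing import Optional

# Option values copied from the parameter table above. The service may accept
# other values, so treat these sets as illustrative rather than authoritative.
MODELS = {"qwen3-asr-flash-realtime", "fun-asr-realtime",
          "paraformer-realtime-8k-v2", "fun-asr-flash-8k-realtime"}
LANGUAGES = {"en", "zh", "ja", "ko"}
SAMPLE_RATES = {8000, 16000, 24000, 48000}
FORMATS = {"pcm", "wav", "mp3", "opus"}
TURN_MODES = {"server_vad", "manual"}  # hypothetical labels for the two radio options


@dataclass
class AsrConfig:
    model: str
    sample_rate: int                    # Hz
    audio_format: str
    language: Optional[str] = None      # optional per the table
    turn_detection: str = "server_vad"  # Server VAD is the table's default

    def validate(self) -> None:
        """Raise ValueError if any field falls outside the table's options."""
        if self.model not in MODELS:
            raise ValueError(f"unsupported model: {self.model}")
        if self.sample_rate not in SAMPLE_RATES:
            raise ValueError(f"unsupported sample rate: {self.sample_rate} Hz")
        # Assumption inferred from the model name: "8k" models expect 8 kHz audio.
        if "8k" in self.model and self.sample_rate != 8000:
            raise ValueError(f"{self.model} expects 8000 Hz audio")
        if self.audio_format not in FORMATS:
            raise ValueError(f"unsupported audio format: {self.audio_format}")
        if self.language is not None and self.language not in LANGUAGES:
            raise ValueError(f"unsupported language: {self.language}")
        if self.turn_detection not in TURN_MODES:
            raise ValueError(f"unsupported turn detection mode: {self.turn_detection}")


def pcm_chunk_bytes(sample_rate: int, chunk_ms: int = 100,
                    bits_per_sample: int = 16, channels: int = 1) -> int:
    """Bytes of raw PCM audio that cover chunk_ms milliseconds."""
    samples = sample_rate * chunk_ms // 1000
    return samples * (bits_per_sample // 8) * channels
```

For instance, 100 ms of 16-bit mono PCM at 16000 Hz is 1,600 samples, or 3,200 bytes.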

### Configure High-Concurrency TTS

**Navigation**: Console > Bailian > Speech Synthesis > High-concurrency scenario optimization

**Prerequisites**:
- DashScope Java SDK version 2.16.9 or later
- Apache Commons Pool2 library

1. Configure environment variables for connection and object pool sizes.
   - Element: **Environment Variables** (text_input) — Project configuration settings
   - Notes: Set DASHSCOPE_CONNECTION_POOL_SIZE, DASHSCOPE_MAXIMUM_ASYNC_REQUESTS, DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST, and SAMBERT_OBJECTPOOL_SIZE based on your server specifications and expected peak concurrency. The parameter table below gives the sizing rules.

2. Add the dependencies to your project build file.
   - Element: **pom.xml or build.gradle** (text_input) — Project root directory
   - Notes: Include the dashscope-sdk-java dependency (version 2.16.9 or later) and the latest version of the commons-pool2 dependency.

3. Run a Maven or Gradle command to update the dependencies.
   - Element: **mvn clean install or ./gradlew build --refresh-dependencies** (button) — Terminal/command line
   - Notes: Use the command that matches your build tool. On Windows, run gradlew.bat instead of ./gradlew.

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| DASHSCOPE_CONNECTION_POOL_SIZE | number_input | No | — | Size of the connection pool for reusing WebSocket connections. Recommended to be at least twice the peak concurrency. |
| DASHSCOPE_MAXIMUM_ASYNC_REQUESTS | number_input | No | — | Maximum number of asynchronous requests. Should match connection pool size. |
| DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST | number_input | No | — | Maximum number of asynchronous requests per host. Should match connection pool size. |
| SAMBERT_OBJECTPOOL_SIZE | number_input | No | — | Size of the object pool for SpeechSynthesizer objects. Should be 1.5 to 2 times the peak concurrency and no larger than the connection pool size. |
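The sizing rules in the table reduce to simple arithmetic on your peak concurrency. The following Python sketch applies them and prints `export` lines for a POSIX shell. `recommend_pool_settings` is a hypothetical helper, the 1.5x default for the object pool is one choice within the documented 1.5 to 2x range, and the resulting numbers are starting points to tune under load, not guaranteed optimal values. Set the variables in the environment of the Java service before it starts.

```python
import math


def recommend_pool_settings(peak_concurrency: int, object_pool_factor: float = 1.5) -> dict:
    """Derive the four environment variables from the table's sizing rules.

    Rules applied (from the parameter table):
      - connection pool = 2 x peak concurrency (the documented minimum)
      - both async request limits = connection pool size
      - object pool = 1.5 to 2 x peak concurrency, capped at the connection pool size
    """
    if peak_concurrency < 1:
        raise ValueError("peak_concurrency must be >= 1")
    if not 1.5 <= object_pool_factor <= 2.0:
        raise ValueError("object_pool_factor must be between 1.5 and 2.0")
    pool = 2 * peak_concurrency
    object_pool = min(math.ceil(object_pool_factor * peak_concurrency), pool)
    return {
        "DASHSCOPE_CONNECTION_POOL_SIZE": pool,
        "DASHSCOPE_MAXIMUM_ASYNC_REQUESTS": pool,
        "DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST": pool,
        "SAMBERT_OBJECTPOOL_SIZE": object_pool,
    }


if __name__ == "__main__":
    # Print shell export lines for a service that peaks at 100 concurrent requests.
    for name, value in recommend_pool_settings(100).items():
        print(f"export {name}={value}")
```

With a peak of 100 concurrent requests, this yields a connection pool and both async limits of 200, and an object pool of 150.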

## FAQ

Q: Where do I find the API key for speech recognition?
A: You can obtain the API key from the top navigation panel by clicking the "API key" or "Get and configure an API key" link in the console.

Q: What audio formats are supported for real-time ASR?
A: The supported input audio formats include pcm, wav, mp3, and opus. You can select the format from the "Input Audio Format" dropdown.

Q: How do I integrate the iOS SDK into my Xcode project?
A: After downloading the SDK, add nuisdk.framework via "Link Binary With Libraries" in the Build Phases section, and ensure it is set to "Embed & Sign" in the General settings.

Q: What environment variables should I configure for high-concurrency speech synthesis?
A: You should configure DASHSCOPE_CONNECTION_POOL_SIZE, DASHSCOPE_MAXIMUM_ASYNC_REQUESTS, DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST, and SAMBERT_OBJECTPOOL_SIZE to optimize connection and object pools.

Q: Can I use a temporary API key for mobile SDK integration?
A: Yes, it is highly recommended to use a short-lived temporary API key (valid for 60 seconds) in mobile applications to reduce the risk of long-term key compromise.
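Because a temporary key expires quickly, a client should reuse it while it is valid and fetch a new one shortly before it expires. The following is a minimal Python sketch of that caching logic. It assumes a hypothetical `fetch` callable that you implement, typically a call to your own backend, which holds the long-lived key and returns a temporary key together with its lifetime in seconds. How the temporary key is issued is outside this sketch.

```python
import time
from typing import Callable, Optional, Tuple

# fetch() returns (temporary_key, lifetime_seconds). It is a hypothetical
# callable you supply, for example a request to your own backend.
Fetcher = Callable[[], Tuple[str, float]]


class TemporaryKeyCache:
    """Reuse a short-lived API key and refresh it shortly before it expires."""

    def __init__(self, fetch: Fetcher, margin_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._margin = margin_s  # refresh this many seconds before expiry
        self._clock = clock      # injectable so tests can control time
        self._key: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._key is None or now >= self._expires_at - self._margin:
            key, lifetime = self._fetch()
            self._key = key
            # Measured from before the fetch, so latency only makes refresh earlier.
            self._expires_at = now + lifetime
        return self._key
```

With a 60-second key and the default 10-second margin, `get()` returns the cached key for the first 50 seconds and then fetches a new one. In a multithreaded client, guard `get()` with a lock.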

## Pricing & Billing

### Billing Model
Services are billed either by token usage or by audio duration (per minute), depending on the specific model and feature used.

### Price Reference

| Model / Tier | Input Price | Output Price |
|--------------|-------------|--------------|
| gummy-realtime-v1 | 0.002 CNY/1k tokens | — |
| gummy-chat-v1 | 0.002 CNY/1k tokens | 0.002 CNY/1k tokens |
| paraformer-realtime-v2 / 8k-v2 / v1 / 8k-v1 | 0.002 CNY/1k tokens | 0.002 CNY/1k tokens |
| paraformer-v2 | 0.002 CNY/1k tokens | 0.002 CNY/1k tokens |
| qwen3-asr-flash-realtime | 0.002 CNY/min | 0.002 CNY/min |
| fun-asr-realtime | 0.0015 CNY/min | 0.0015 CNY/min |
| paraformer-realtime-8k-v2 | 0.0022 CNY/min | 0.0022 CNY/min |
| cosyvoice-v3-flash | 0.002 CNY/1k tokens | 0.004 CNY/1k tokens |
| sambert-zhichu-v1 | 0.002 CNY/1k tokens | 0.004 CNY/1k tokens |

### Free Tier
- 1 million tokens free per month for most token-based models.
- 100 minutes free per month for minute-based ASR models.

### Billing Notes
- Recognition and translation features are billed separately.
- Real-time tasks are billed based on duration and token count; minimum charge per request is 1 second or 1 minute depending on the model.
- High-concurrency usage may cause processing delays; ensure account concurrency limits are sufficient.
- In high-concurrency scenarios, connection reuse reduces latency, but billing is still based on actual request count and token usage.
- Contact support to increase QPS limits for high-concurrency requests.
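The billing rules above can be turned into a rough estimator. The following Python sketch uses `Decimal` to avoid floating-point rounding in currency amounts. It is an approximation under stated assumptions: the billing unit (1 s or 60 s) must be looked up for each model, a minimum charge of one unit per request is applied, and free-tier allowances are subtracted per call for simplicity, whereas the actual free tier is a monthly allowance. Prices come from the reference table and may change, so check the console before relying on the numbers.

```python
from decimal import Decimal


def duration_cost(audio_seconds: int, price_per_min: Decimal,
                  free_seconds: int = 0, billing_unit_s: int = 1) -> Decimal:
    """Estimated cost (CNY) of one duration-billed request.

    The duration is rounded up to whole billing units (1 s or 60 s, depending
    on the model), with a minimum of one unit per request. Any free-tier
    seconds are then subtracted before pricing.
    """
    units = max(1, -(-audio_seconds // billing_unit_s))  # ceiling division
    billed_seconds = max(units * billing_unit_s - free_seconds, 0)
    return price_per_min * billed_seconds / 60


def token_cost(tokens: int, price_per_1k: Decimal, free_tokens: int = 0) -> Decimal:
    """Estimated cost (CNY) of a token-billed request after free-tier tokens."""
    return price_per_1k * max(tokens - free_tokens, 0) / 1000
```

For example, a 90-second request on fun-asr-realtime at 0.0015 CNY/min comes to 0.00225 CNY with 1-second billing, but 0.003 CNY if the model rounds up to whole minutes.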
