# bailian-asr

# Bailian Speech and Audio Processing Troubleshooting Guide

## Problem Index

| Problem | Symptom | Severity | Solution Summary |
|---------|---------|----------|------------------|
| SenseVoice Service Discontinuation | Error code `ServiceSunset` | High | Migrate to Paraformer/Fun-ASR or Qwen-based models |
| Paraformer Migration Configuration | API request fails after changing model name | Medium | Update model ID and adjust audio format payloads |
| Qwen Multimodal Migration | Audio processing fails with Qwen endpoints | Medium | Restructure API request to use Qwen multimodal message format |

## Problem Details

### Problem 1: SenseVoice Service Discontinuation

**Symptoms**
- Error code: `ServiceSunset`
- Behavior: API calls to the SenseVoice speech file recognition endpoint fail immediately.
- Context: Occurs when attempting to use the legacy SenseVoice service after its official sunset date.

**Root Cause**
The SenseVoice speech file recognition service has been officially discontinued. The backend infrastructure for this specific model is no longer active, resulting in a hard failure for any incoming requests.

**Solution**
1. Identify your current SenseVoice API integration points in your application code.
2. Select an alternative service based on your use case:
   - Use **Paraformer/Fun-ASR** for dedicated, high-accuracy speech file recognition.
   - Use **Qwen-based speech recognition** if you require multimodal capabilities (e.g., audio understanding combined with text generation).
3. Update your API requests to target the new model endpoints and adjust authentication or payload structures as required by the new service.
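
The selection rule in step 2 can be sketched as a small helper. The function name and the `needs_multimodal` flag are hypothetical, and the returned strings are the service families named in this guide rather than concrete model IDs.

```python
def choose_replacement(needs_multimodal: bool) -> str:
    """Pick a SenseVoice replacement family based on the use case."""
    if needs_multimodal:
        # Audio understanding combined with text generation.
        return "Qwen-based speech recognition"
    # Dedicated speech file recognition.
    return "Paraformer/Fun-ASR"
```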

**Verification**
- Send a test audio file to the new alternative API endpoint.
- Confirm that the response returns a successful transcription without the `ServiceSunset` error.
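
A minimal sketch of the second check, assuming the error code is exposed in a top-level `code` field of the parsed JSON body. That field name is an assumption, so adjust it to match the error bodies you actually receive.

```python
from typing import Any, Mapping

SUNSET_CODE = "ServiceSunset"  # error code named in this guide


def is_service_sunset(body: Mapping[str, Any]) -> bool:
    """Return True if a parsed response body carries the ServiceSunset code.

    Assumption: the code lives in a top-level "code" field.
    """
    return body.get("code") == SUNSET_CODE
```

Running this against the response from the new endpoint should return `False`. A `True` result suggests the request is still reaching the legacy SenseVoice service.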

### Problem 2: Paraformer Migration Configuration

**Symptoms**
- Error message: Model not found or invalid parameter format.
- Behavior: After changing the model name from SenseVoice to Paraformer, the API rejects the request.
- Context: Occurs during the initial migration phase when swapping model identifiers without updating the request payload.

**Root Cause**
Paraformer and Fun-ASR models use different API parameters, audio format requirements, and endpoint structures than the legacy SenseVoice API. Changing only the model name is not enough: the request payload must also match what the new model expects.

**Solution**
1. Update the `model` parameter in your API request to the correct Paraformer/Fun-ASR model ID.
2. Review the audio file format requirements for Paraformer and ensure your input files (e.g., WAV, MP3) and sample rates are supported.
3. Adjust the request payload to match the Paraformer API specification, ensuring file URLs or base64 encoded audio data are passed in the correct fields.
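
Step 2 can be partly automated with a local pre-flight check before any request is sent. The sketch below uses only the Python standard library. The allowed extensions and sample rates are placeholder assumptions, not official limits, so replace them with the values from the target model's documentation.

```python
import wave
from pathlib import Path

# Placeholder values (assumptions): replace with the formats and sample
# rates listed in the documentation of the model you are migrating to.
ALLOWED_EXTENSIONS = {".wav", ".mp3"}
ALLOWED_SAMPLE_RATES = {8000, 16000}


def preflight_check(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file passed."""
    problems = []
    ext = Path(path).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    elif ext == ".wav":
        # The standard-library wave module only reads PCM WAV headers;
        # other containers need a third-party tool to inspect.
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
        if rate not in ALLOWED_SAMPLE_RATES:
            problems.append(f"unsupported sample rate: {rate} Hz")
    return problems
```

Only WAV files get a sample-rate check here. A WAV file that is not PCM-encoded raises `wave.Error`, which is left to the caller to handle.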

**Verification**
- Execute a test request with a supported audio file format.
- Verify that the API returns a `200 OK` status and a valid JSON response containing the transcribed text.
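
The two checks above can be combined into one helper. This guide does not specify where the transcript sits in the response, so this sketch assumes it is stored under a key named `text` somewhere in the JSON body. Both the key name and the function names are hypothetical.

```python
import json
from typing import Any, Iterator


def iter_text_values(node: Any, key: str = "text") -> Iterator[str]:
    """Yield every non-empty string stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str) and v.strip():
                yield v
            else:
                yield from iter_text_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_text_values(item, key)


def looks_like_transcription(status_code: int, raw_body: str) -> bool:
    """Check for HTTP 200, parseable JSON, and at least one transcript string."""
    if status_code != 200:
        return False
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        return False
    return any(iter_text_values(body))
```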

### Problem 3: Qwen Multimodal Migration

**Symptoms**
- Error message: Invalid message format or unsupported input type.
- Behavior: Audio processing fails or returns unexpected text when migrating to Qwen-based speech recognition.
- Context: Occurs when developers attempt to pass raw audio payloads to Qwen models using the legacy single-purpose ASR request format.

**Root Cause**
Qwen-based models are multimodal large language models. They do not use the traditional ASR-specific API structure. Instead, they require audio inputs to be formatted as part of a multimodal message array, similar to how images are passed to vision models.

**Solution**
1. Restructure your API request to use the standard Qwen multimodal chat completion format.
2. Format the audio input within the `messages` array. Pass the audio file URL or base64 data inside the `content` array of the user message.

   Example request structure:
   ```json
   {
     "model": "qwen-audio-turbo",
     "messages": [
       {
         "role": "user",
         "content": [
           {"audio": "https://example.com/audiofile.wav"},
           {"text": "Transcribe this audio file."}
         ]
       }
     ]
   }
   ```
3. Ensure your prompt explicitly instructs the model to perform transcription or the specific audio analysis task you require.
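
The same request body can be assembled in code. The sketch below mirrors the example structure in this section. The helper name and its defaults are illustrative only, and the sketch does not send the request, because the endpoint and authentication details are not covered here.

```python
import json


def build_audio_request(
    audio: str,
    prompt: str = "Transcribe this audio file.",
    model: str = "qwen-audio-turbo",
) -> dict:
    """Build a multimodal request body with one audio part and one text part."""
    if not prompt.strip():
        # Step 3: the prompt must state the task explicitly.
        raise ValueError("prompt must describe the transcription or analysis task")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"audio": audio},   # audio file URL or base64 data
                    {"text": prompt},   # explicit task instruction
                ],
            }
        ],
    }


if __name__ == "__main__":
    body = build_audio_request("https://example.com/audiofile.wav")
    print(json.dumps(body, indent=2))
```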

**Verification**
- Send the restructured multimodal request to the Qwen endpoint.
- Confirm the model successfully processes the audio and returns the expected transcription in the `choices[0].message.content` field.
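
Reading the result out of a parsed response can be sketched as follows. The path `choices[0].message.content` comes from this guide. The handling of `content` as either a plain string or a list of `{"text": ...}` parts is an assumption, included so the helper works with both shapes.

```python
def extract_transcription(response: dict) -> str:
    """Return the text found at choices[0].message.content."""
    content = response["choices"][0]["message"]["content"]
    if isinstance(content, str):
        return content
    # Assumption: content may instead be a list of parts such as {"text": "..."}.
    return "".join(
        part.get("text", "") for part in content if isinstance(part, dict)
    )
```

A `KeyError` or `IndexError` from this helper means the response did not have the expected shape, which usually points to an error body rather than a completion.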

## FAQ

**Q: Why is my SenseVoice API call returning a ServiceSunset error?**
A: The SenseVoice speech file recognition service has been discontinued. You must migrate to an alternative service like Paraformer/Fun-ASR or Qwen-based speech recognition to resume operations.

**Q: What are the recommended alternatives for SenseVoice speech file recognition?**
A: The recommended alternatives are Paraformer/Fun-ASR for dedicated, high-accuracy speech recognition tasks, or Qwen-based speech recognition if you need multimodal audio processing and understanding capabilities.

**Q: Do I need to change my API endpoint when migrating to Paraformer or Qwen?**
A: Yes. Migration requires targeting the new service's endpoint and updating the `model` parameter and request payload to match. For Qwen-based models, you must also restructure the request to use the multimodal message format rather than the legacy ASR-specific payload structure.

**Q: How do I handle audio file formats during migration?**
A: Supported audio formats and sample rates differ between models. When migrating to Paraformer or Qwen, check which formats (e.g., WAV, MP3, FLAC) and sample rates the new model accepts, and ensure your input files meet those requirements to avoid processing errors.

## Source Documents

- `Audio file recognition SenseVoice - To be Unpublished_5075189.xdita`