# opensearch-text

Part of **OPENSEARCH**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for the following routing skills. If you're unsure which path to take, check the corresponding routing skill:

> - [Optimize search relevance and ranking](../../intent/opensearch-optimize-relevance/SKILL.md)
> - [Deploy embedding model for inference](../../intent/opensearch-deploy-model/SKILL.md)
> - [Build a Retrieval-Augmented Generation (RAG) solution](../../intent/opensearch-build-solution/SKILL.md)

# OpenSearch Embedding

## Capabilities Overview

| Sub-capability | Models | Calling Mode | Description |
|----------------|--------|--------------|-------------|
| Create Text Embedding | ops-text-embedding-001, ops-text-embedding-002, ops-text-embedding-zh-001, ops-text-embedding-en-001, ops-gte-sentence-embedding-multilingual-base, ops-qwen3-embedding-0.6b | Synchronous | Generate dense vector embeddings from text input. |
| Reduce Vector Dimensionality | ops-embedding-dim-reduction-001 | Synchronous | Compress high-dimensional vectors into lower dimensions while preserving semantic meaning. |
| Generate Multimodal Embedding | ops-m2-encoder, ops-m2-encoder-large, ops-gme-qwen2-vl-2b-instruct | Synchronous | Create embeddings from combined text and image inputs. |
| Deploy Service | qwen-plus, wan2.7-t2v, wanx2.1-t2v-plus | Synchronous | Deploys a vectorization or generation model as a hosted service for inference. |

## API Calling Patterns

### Authentication
Requests authenticate with a workspace API key sent as a Bearer token in the `Authorization` header. Custom deployment services additionally require a service-specific `Token` header.

- **Header format**:  
  `Authorization: Bearer <your_workspace_api_key>`  
  `Token: <your_service_deployment_token>` (required only for custom deployment services)
- **Environment variable**: `OPENSEARCH_API_KEY` for the workspace API key
- For standard embedding services (e.g., `ops-text-embedding-001`), only the `Authorization` header is needed. For custom-deployed models, both headers are required. Obtain the service deployment token from the OpenSearch console under **Service Deployment > API Message**.
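
The header rules above can be sketched as a small helper. This is a minimal illustration, not part of any SDK: the function name `build_headers` and its parameters are hypothetical.

```python
from typing import Dict, Optional


def build_headers(api_key: str, service_token: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for an OpenSearch embedding call.

    `service_token` is only needed for custom deployment services.
    (Hypothetical helper; names are illustrative.)
    """
    if not api_key:
        raise ValueError("workspace API key is empty; set OPENSEARCH_API_KEY")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if service_token:
        # Custom deployments expect the extra Token header.
        headers["Token"] = service_token
    return headers
```

For a standard service, `build_headers(os.environ["OPENSEARCH_API_KEY"])` would be enough; for a custom deployment, pass the token copied from the console as the second argument.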

### Service Endpoint
APIs use region-specific endpoints with the following pattern:

`http://{instance_id}-{region}.opensearch.aliyuncs.com/v3/openapi/workspaces/{workspace_name}/{service_type}/{service_id}`

Common regions include:
- `cn-hangzhou`
- `cn-shanghai`
- `cn-beijing`

For OpenAI-compatible mode, use:
`https://{instance_id}-{region}.opensearch.aliyuncs.com/compatible-mode/v1/embeddings`
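
As a sketch of how the two URL patterns compose, the helpers below simply fill in the placeholders. The function names are hypothetical, and the default `http` scheme just mirrors the pattern shown above.

```python
def build_service_url(instance_id: str, region: str, workspace_name: str,
                      service_type: str, service_id: str, scheme: str = "http") -> str:
    """Fill in the native endpoint pattern (hypothetical helper)."""
    host = f"{instance_id}-{region}.opensearch.aliyuncs.com"
    return (f"{scheme}://{host}/v3/openapi/workspaces/"
            f"{workspace_name}/{service_type}/{service_id}")


def build_compatible_url(instance_id: str, region: str) -> str:
    """Fill in the OpenAI-compatible embeddings endpoint pattern."""
    host = f"{instance_id}-{region}.opensearch.aliyuncs.com"
    return f"https://{host}/compatible-mode/v1/embeddings"
```

The instance ID comes from your own console; any value used when trying this out is a placeholder.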

### Synchronous Pattern
All OpenSearch embedding APIs use synchronous calling:
1. Send a POST request to the service endpoint with JSON payload
2. Include required headers (`Authorization`, and `Token` if applicable)
3. Receive a complete JSON response immediately
4. Parse the vectors from `result.embeddings` (`result.sparse_embeddings` for sparse models, `data` in OpenAI-compatible mode)

No polling or streaming is used; each response is returned in a single round trip.

## Parameter Reference

### Create Text Embedding

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| input | Array or String | Yes | — | Max 32 strings per request; no empty strings | Text to embed. Can be a single string or array of strings. |
| input_type | String | No | document | One of: query, document | How the input will be used in retrieval context. |
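
Because a request accepts at most 32 non-empty strings, larger corpora need to be split client-side. The sketch below shows one way to do that; `batch_inputs` and `build_payloads` are hypothetical helper names, not API functions.

```python
from typing import Dict, List, Sequence

MAX_INPUTS_PER_REQUEST = 32  # limit from the parameter table


def batch_inputs(texts: Sequence[str],
                 batch_size: int = MAX_INPUTS_PER_REQUEST) -> List[List[str]]:
    """Split `texts` into lists of at most `batch_size` non-empty strings."""
    if not 1 <= batch_size <= MAX_INPUTS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 32")
    for i, text in enumerate(texts):
        if not isinstance(text, str) or not text:
            raise ValueError(f"input {i} is empty or not a string")
    return [list(texts[start:start + batch_size])
            for start in range(0, len(texts), batch_size)]


def build_payloads(texts: Sequence[str], input_type: str = "document") -> List[Dict]:
    """Build one JSON-ready request body per batch."""
    if input_type not in ("query", "document"):
        raise ValueError("input_type must be 'query' or 'document'")
    return [{"input": batch, "input_type": input_type}
            for batch in batch_inputs(texts)]
```

Each returned payload can be posted as its own request; 70 texts, for instance, would produce three payloads.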

### Sparse Text Embedding

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| input | Array or String | Yes | — | Up to 32 entries; max 8,192 tokens per entry | Input texts to convert to sparse vectors. |
| input_type | String | No | document | One of: query, document | Role of input text in retrieval. |
| return_token | Boolean | No | false | — | Whether to include token strings in each embedding result. |

### Generate Multimodal Embedding

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| input | List<ContentObject> | Yes | — | Max 32 entries per request | List of objects containing `text` or `image` fields. |
| text | String | No | — | — | Text content to embed. |
| image | String | No | — | URL or Base64-encoded data (max 8 MB total) | Image to embed, as URL or `data:image/{format};base64,...` |
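
To illustrate the `data:image/{format};base64,...` form, the following sketch encodes raw image bytes and assembles one `input` entry. Both function names are hypothetical, and it assumes at least one of `text` or `image` should be present in each entry.

```python
import base64
from typing import Dict, Optional


def image_to_data_uri(image_bytes: bytes, image_format: str = "jpeg") -> str:
    """Encode raw image bytes as a Base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{image_format};base64,{encoded}"


def build_multimodal_item(text: Optional[str] = None,
                          image: Optional[str] = None) -> Dict[str, str]:
    """Build one entry of the `input` list.

    `image` may be a URL or a data URI produced by image_to_data_uri().
    Assumption: an entry with neither field is not meaningful, so it is rejected.
    """
    if text is None and image is None:
        raise ValueError("provide text, image, or both")
    item = {}
    if text is not None:
        item["text"] = text
    if image is not None:
        item["image"] = image
    return item
```

A local file could be sent with `build_multimodal_item(image=image_to_data_uri(open(path, "rb").read(), "png"))`, where `path` is your own file.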

### Reduce Vector Dimensionality

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| input | List<List<Float>> | Yes | — | Max 8 MB request size | Collection of input vectors to compress. |
| parameters.output_dimension | Integer | No | 512 | Range: 1–4096 | Target output dimension after reduction. |
| parameters.model_name | String | No | — | — | Name of user-trained fine-tuning model (if used). |
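
The constraints in this table can be checked before sending a request. The sketch below is a hypothetical helper; it assumes that all input vectors share one length and that 8 MB means 8 × 1024 × 1024 bytes of serialized JSON, neither of which is stated explicitly here.

```python
import json
from typing import Dict, List, Optional, Sequence

MAX_REQUEST_BYTES = 8 * 1024 * 1024  # assumption: 8 MB counted in bytes of JSON


def build_reduction_payload(vectors: Sequence[Sequence[float]],
                            output_dimension: int = 512,
                            model_name: Optional[str] = None) -> Dict:
    """Validate inputs and build the request body for dimensionality reduction."""
    if not vectors:
        raise ValueError("input must contain at least one vector")
    if not 1 <= output_dimension <= 4096:
        raise ValueError("output_dimension must be in the range 1-4096")
    width = len(vectors[0])
    if any(len(v) != width for v in vectors):
        # Assumption: the service expects vectors of equal length.
        raise ValueError("all input vectors must have the same length")
    parameters = {"output_dimension": output_dimension}
    if model_name:
        parameters["model_name"] = model_name
    payload = {"input": [[float(x) for x in v] for v in vectors],
               "parameters": parameters}
    if len(json.dumps(payload).encode("utf-8")) > MAX_REQUEST_BYTES:
        raise ValueError("request body exceeds 8 MB; send fewer vectors")
    return payload
```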

### Deploy Service (Custom Model Calls)

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| type | String | Yes | — | One of: text, image | Type of input data for multimodal services. |
| data | Array<String> | Yes | — | Max 16 elements; Base64 image format | Image inputs in `data:image/...` format. |
| input | Array<String> | Yes | — | Max 16 strings; non-empty | Text inputs for embedding or reranking. |
| input_type | String | No | document | One of: query, document | Input role for embedding services. |
| dimension | Integer | No | — | ≤ foundation model dimension | Output vector dimension (for models with reduction enabled). |
| query | Array<String> | Yes | — | — | Query text for reranking. |
| docs | Array<String> | Yes | — | Max 16 documents | Documents to rerank. |

## Code Examples

### Text Embedding - Python - cn-hangzhou

```python
import os
import requests

host = "http://****-hangzhou.opensearch.aliyuncs.com"
workspace = "default"
service_id = "ops-text-embedding-001"
api_key = os.environ["OPENSEARCH_API_KEY"]  # raises KeyError if the key is not set

url = f"{host}/v3/openapi/workspaces/{workspace}/text-embedding/{service_id}"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}
payload = {
    "input": [
        "Science and technology are the primary productive forces",
        "opensearch product documentation",
    ],
    "input_type": "query",
}

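response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()  # surface 4xx/5xx errors before parsing the body
data = response.json()
for item in data["result"]["embeddings"]:
    # Print each input's index and the first five vector components.
    print(item["index"], item["embedding"][:5])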
```

### Sparse Text Embedding - Bash - All Regions

```bash
curl -XPOST -H "Content-Type: application/json" \
  "http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-sparse-embedding/ops-text-sparse-embedding-001" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "input": [
      "Science and technology are the primary productive forces",
      "OpenSearch product documentation"
    ],
    "input_type": "query",
    "return_token": false
  }'
```

### Multimodal Embedding - Bash - All Regions

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  "http://<your-endpoint>/v3/openapi/workspaces/default/multi-modal-embedding/ops-m2-encoder" \
  -d '{
    "input": [
      {
        "image": "http://example.com/photo.jpg"
      }
    ]
  }'
```

### OpenAI-Compatible Embedding - Bash - cn-shanghai

```bash
curl http://xxxx-shanghai.opensearch.aliyuncs.com/compatible-mode/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "ops-text-embedding-001",
    "input": "Search development platform"
  }'
```

### Custom Deployment Service - Bash - All Regions

```bash
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Token: YOUR_SERVICE_DEPLOYMENT_TOKEN" \
"http://xxxxxx.opensearch.aliyuncs.com/v3/openapi/xxxxxx" \
-d '{
  "input":[
    "Science and technology are primary productive forces",
    "OpenSearch product documentation"
  ],
  "input_type" : "document",
  "dimension": 567
}'
```

### Vector Dimensionality Reduction - Bash - All Regions

```bash
curl --location 'http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/embedding-tuning/ops-embedding-dim-reduction-001/' \
--header 'Authorization: Bearer Your-API-KEY' \
--header 'Content-Type: application/json' \
--data '{
  "input": [
    [0.111,0.222,0.333],
    [0.121,0.221,0.331]
  ],
  "parameters":{
    "output_dimension": 512,
    "model_name" : "xxxx"
  }
}'
```

### Deployed Service SDK Call - Python - China

```python
from alibabacloud_tea_openapi.models import Config
from alibabacloud_searchplat20240529.client import Client
from alibabacloud_searchplat20240529.models import GetPredictionRequest
from alibabacloud_searchplat20240529.models import GetPredictionHeaders
from alibabacloud_tea_util import models as util_models

config = Config(
    bearer_token="Your API key",
    endpoint="****.opensearch.aliyuncs.com",
    protocol="http"
)
client = Client(config=config)

request = GetPredictionRequest().from_map({"input_type": "query", "input": ["search", "test"]})
headers = GetPredictionHeaders(token="your_service_token")
runtime = util_models.RuntimeOptions()

response = client.get_prediction_with_options("your_deployment_id", request, headers, runtime)
print(response)
```

## Response Format

```json
{
    "request_id": "B4AB89C8-B135-****-A6F8-2BAB801A2CE4",
    "latency": 38,
    "usage": {
        "token_count": 3072
    },
    "result": {
        "embeddings": [
            {
                "index": 0,
                "embedding": [
                    -0.02868066355586052,
                    0.022033605724573135,
                    -0.0417383536696434,
                    ...
                ]
            }
        ]
    }
}
```

**Key Fields**:
- `result.embeddings[].index` — Position of the input that produced this embedding
- `result.embeddings[].embedding` — Dense vector representation of the input
- `usage.token_count` — Number of tokens processed in the request
- `request_id` — Unique identifier for debugging
- `latency` — Processing time in milliseconds

For sparse embeddings, the structure uses `result.sparse_embeddings[].embedding[]` with `tokenId` and `weight` fields.
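
A sparse response can be flattened into one `{tokenId: weight}` mapping per input, which is convenient for scoring. The sketch below assumes each item under `result.sparse_embeddings` carries an `index` field like the dense format does; the function names are hypothetical, and the dot product is only one illustrative way to compare two sparse vectors.

```python
from typing import Dict, List


def sparse_to_dicts(response: dict) -> List[Dict[int, float]]:
    """Convert each sparse embedding into a {tokenId: weight} dict.

    Assumption: items carry an `index` field; if absent, response order is kept.
    """
    items = response["result"]["sparse_embeddings"]
    ordered = sorted(items, key=lambda item: item.get("index", 0))
    return [{entry["tokenId"]: entry["weight"] for entry in item["embedding"]}
            for item in ordered]


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Dot product over the token IDs the two sparse vectors share."""
    return sum(weight * b[token] for token, weight in a.items() if token in b)
```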

For OpenAI-compatible responses:
- `data[].embedding` — Vector array
- `usage.prompt_tokens` — Token count for billing
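
Since the native and OpenAI-compatible responses place vectors under different keys, a small adapter can hide the difference. This is a sketch with a hypothetical function name; it assumes compatible-mode items also expose an `index` field, as the OpenAI embeddings format does.

```python
from typing import List


def extract_vectors(response: dict) -> List[List[float]]:
    """Return dense vectors in input order from either response shape."""
    if "result" in response:
        items = response["result"]["embeddings"]   # native format
    elif "data" in response:
        items = response["data"]                   # OpenAI-compatible format
    else:
        raise KeyError("response has neither 'result' nor 'data'")
    # Sort by index so vectors line up with the submitted inputs;
    # items without an index keep their original order.
    ordered = sorted(items, key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in ordered]
```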

## Error Handling

| Error Code | Description | Recommended Action |
|------------|-------------|-------------------|
| 400 | Bad Request: The request body is malformed or contains invalid parameters. | Validate JSON structure and parameter constraints (e.g., input length, enum values). |
| 401 | Unauthorized: Invalid or missing API key or service deployment token. | Verify `OPENSEARCH_API_KEY` and, if applicable, the service deployment token from the console. |
| 403 | Forbidden: Access denied due to insufficient permissions or invalid workspace API key. | Confirm your account has access to the workspace and service. |
| 429 | Too Many Requests: Rate limit exceeded. Wait before retrying. | Implement exponential backoff; check QPS limits per service. |
| 500 | Internal Server Error: An unexpected error occurred on the server side. | Retry with jittered backoff; contact support if persistent. |
| InvalidParameter | The request parameters are invalid. | Ensure `input_type` is "query" or "document"; check image format for multimodal calls. |
| BadRequest.TaskNotExist | The specified task does not exist. | Verify `deploymentId` or `service_id` is correct. |

### Rate Limits & Retry
- Standard text embedding: 50–100 QPS per API key or workspace
- Multimodal embedding: 10 QPS per Alibaba Cloud account
- Custom deployments: 100 QPS per deployment
- Request body size limit: 8 MB

Implement retry with exponential backoff (e.g., 1s, 2s, 4s delays) for 429 and 500 errors. Respect the `Retry-After` header if provided.
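
One possible shape for that retry loop is sketched below. The names `backoff_delay` and `post_with_retry` are hypothetical, and `send` stands for any zero-argument callable that performs the request and returns an object with `status_code` and `headers` (for example, a lambda wrapping `requests.post`).

```python
import random
import time
from typing import Callable, Optional

RETRYABLE_STATUS = {429, 500}


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, jitter: float = 0.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based): 1s, 2s, 4s, ..."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff
    delay = base * (2 ** attempt)
    if jitter:
        delay += random.uniform(0, jitter)
    return delay


def post_with_retry(send: Callable, max_retries: int = 3,
                    sleep: Callable[[float], None] = time.sleep):
    """Call `send()` until it returns a non-retryable status or retries run out."""
    attempt = 0
    while True:
        response = send()
        if response.status_code not in RETRYABLE_STATUS or attempt >= max_retries:
            return response
        sleep(backoff_delay(attempt, response.headers.get("Retry-After")))
        attempt += 1
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays; in production the default `time.sleep` applies.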

## Environment Requirements

- **Python SDK**: `pip install "alibabacloud_searchplat20240529>=1.0.0"` (quote the requirement so the shell does not treat `>` as a redirect)
- **OpenAI SDK (for compatible mode)**: `pip install "openai>=1.0.0"`
- **Environment variable**: `export OPENSEARCH_API_KEY=your_workspace_api_key`
- **Python version**: 3.6 or higher for the Alibaba Cloud SDK; the OpenAI SDK 1.x requires a newer interpreter (3.7.1 or higher for 1.0.0)

## FAQ

Q: How do I obtain the service deployment token for custom models?
A: Navigate to the OpenSearch console > Service Deployment > API Message tab. Copy the token displayed in the table for your deployed service.

Q: What's the difference between dense and sparse embeddings?
A: Dense embeddings are fixed-length real-valued vectors for semantic similarity. Sparse embeddings are variable-length token-weight pairs ideal for keyword matching and hybrid retrieval.

Q: Can I use the OpenAI SDK with OpenSearch embedding APIs?
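A: Yes. For models such as `ops-text-embedding-001`, create the client with `OpenAI(api_key=..., base_url=...)`, pointing `base_url` at your compatible-mode endpoint up to `/compatible-mode/v1` (the SDK appends `/embeddings`), and call `client.embeddings.create()`. The older `openai.api_base` and `openai.Embedding.create()` names exist only in SDK versions before 1.0.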

Q: Why am I getting a 401 error even with a valid API key?
A: Ensure you're using the workspace API key (not account AccessKey) and that it hasn't expired. For custom deployments, also include the `Token` header.

Q: What is the maximum input length for text embedding?
A: Most models support up to 8,192 tokens per input string. Check the specific model's documentation for exact limits.

## Pricing & Billing

### Billing Model
- Text and multimodal embedding: billed per token (input tokens only)
- Custom deployment services: billed per request
- Vector dimensionality reduction: billed per request

### Price Reference

| Model/Service | Input Price | Output Price |
|---------------|-------------|--------------|
| ops-text-embedding-001 | 0.002 /tokens | 0.002 /tokens |
| ops-text-embedding-002 | 0.002 /tokens | 0.002 /tokens |
| ops-text-embedding-zh-001 | 0.002 /tokens | 0.002 /tokens |
| ops-text-embedding-en-001 | 0.002 /tokens | 0.002 /tokens |
| ops-gte-sentence-embedding-multilingual-base | 0.002 /tokens | 0.002 /tokens |
| ops-qwen3-embedding-0.6b | 0.002 /tokens | 0.002 /tokens |
| ops-m2-encoder | 0.002 /tokens | 0.002 /tokens |
| ops-m2-encoder-large | 0.003 /tokens | 0.003 /tokens |
| ops-gme-qwen2-vl-2b-instruct | 0.004 /tokens | 0.004 /tokens |
| ops-embedding-dim-reduction-001 | 0.002 /tokens | 0.002 /tokens |

### Free Tier
- OpenAI-compatible mode: 1,000,000 tokens free per month
- Custom deployment services: 10,000 free requests per month

### Usage Limits
- Max 32 inputs per request for text embedding
- Max 16 inputs for custom deployment services
- Max 8 MB request body size
- QPS limits range from 10 to 100 depending on service type

### Billing Notes
- Token count is measured from actual processed tokens (not characters)
- Free tier resets monthly
- Charges apply only to successful requests