# es-text-embedding

Part of **ES**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for [Deploy a Retrieval-Augmented Generation (RAG) AI application](../../intent/es-deploy-application/SKILL.md). If you're unsure which path to take, check the routing skill first.

# Elasticsearch Embedding

## Capabilities Overview

| Sub-capability | Models | Calling Mode | Description |
|----------------|--------|--------------|-------------|
| Create Text Embedding | ops-text-embedding-001, ops-text-embedding-zh-001, ops-text-embedding-en-001, +3 more | Synchronous | Generate dense vector embeddings from input text using supported models. |
| Create Sparse Embedding | — | Synchronous | Generate sparse vector representations of text for efficient retrieval. |
| Create Multimodal Embedding | wan2.7-t2v, wanx2.1-t2v-plus, ops-m2-encoder, +2 more | Synchronous | Generate embeddings that combine text and other modalities like images. |

## Model Selection Guide

### Create Text Embedding

| Model ID | Calling Mode |
|----------|--------------|
| ops-text-embedding-001 | Synchronous |
| ops-text-embedding-zh-001 | Synchronous |
| ops-text-embedding-en-001 | Synchronous |
| ops-text-embedding-002 | Synchronous |
| ops-gte-sentence-embedding-multilingual-base | Synchronous |
| ops-qwen3-embedding-0.6b | Synchronous |
| wan2.7-t2v | Synchronous |
| wanx2.1-t2v-plus | Synchronous |

### Create Multimodal Embedding

| Model ID | Calling Mode |
|----------|--------------|
| wan2.7-t2v | Synchronous |
| wanx2.1-t2v-plus | Synchronous |
| ops-m2-encoder | Synchronous |
| ops-m2-encoder-large | Synchronous |
| ops-gme-qwen2-vl-2b-instruct | Synchronous |

## API Calling Patterns

### Authentication
Use **Bearer Token** authentication as the primary method.

- Include the header: `Authorization: Bearer <your_api_key>`
- Set your API key via the environment variable: `OPENSEARCH_API_KEY` or `ALIBABA_CLOUD_API_KEY`
- For custom deployment services, an additional `Token: YOUR_SERVICE_DEPLOYMENT_TOKEN` header is required (obtained from the console under Service Deployment > API Message)
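
The header rules above can be collected in one small helper. This is a minimal sketch, not part of any SDK: `build_headers` is a hypothetical name, and the fallback order between the two environment variables is an assumption.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None,
                  deployment_token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for the embedding APIs (hypothetical helper)."""
    # Fall back to the environment variables named in this section
    # (the lookup order is an assumption of this sketch).
    key = (api_key
           or os.environ.get("OPENSEARCH_API_KEY")
           or os.environ.get("ALIBABA_CLOUD_API_KEY"))
    if not key:
        raise ValueError("Set OPENSEARCH_API_KEY or ALIBABA_CLOUD_API_KEY")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    # Custom deployment services additionally require the Token header.
    if deployment_token:
        headers["Token"] = deployment_token
    return headers
```

Passing `deployment_token` only for custom deployment services keeps the standard calls free of the extra header.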

### Service Endpoint
APIs use region-specific endpoints with the pattern:

`http://{instance}-{region}.opensearch.aliyuncs.com`

Common regions include:
- cn-hangzhou
- cn-shanghai
- cn-beijing

For OpenAI-compatible mode, use:
`https://api.ai-search-platform.com/v1/embeddings`
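
As a rough illustration, the host pattern above can be combined with the request paths used in the code examples of this document. The helper names `build_host` and `build_url` and the `TASK_PATHS` mapping are hypothetical; the path segments are copied from the examples, not from an official route list.

```python
# Path segments as they appear in this document's code examples.
TASK_PATHS = {
    "dense": "text-embedding",
    "sparse": "text-sparse-embedding",
    "multimodal": "multi-modal-embedding",
}


def build_host(instance: str, region: str) -> str:
    """Fill in the region-specific host pattern."""
    return f"http://{instance}-{region}.opensearch.aliyuncs.com"


def build_url(host: str, task: str, service_id: str,
              workspace: str = "default") -> str:
    """Compose the full request URL for one embedding service."""
    if task not in TASK_PATHS:
        raise ValueError(f"unknown task: {task!r}")
    return (f"{host.rstrip('/')}/v3/openapi/workspaces/{workspace}"
            f"/{TASK_PATHS[task]}/{service_id}")
```

For example, `build_url(build_host("my-instance", "cn-hangzhou"), "dense", "ops-text-embedding-001")` yields a URL of the same shape as the dense example in the Code Examples section, where `my-instance` is a placeholder.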

### Synchronous Pattern
All embedding functions use a synchronous request-response flow:

1. Send a POST request to the appropriate endpoint with JSON payload
2. Include required headers (`Content-Type: application/json`, `Authorization: Bearer ...`)
3. Receive a complete JSON response immediately (no polling needed)
4. Parse the embeddings array nested under `result` in the response body (`result.embeddings` for dense and `result.sparse_embeddings` for sparse)

This pattern applies to dense text, sparse text, and multimodal embeddings.
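
The four steps can be sketched with the standard library alone. This is an illustrative outline under stated assumptions: `http_post_json` and `embed_sync` are hypothetical names, and the code reads the array from `result.embeddings`, following the Response Format section of this document.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List


def http_post_json(url: str, headers: Dict[str, str],
                   payload: Dict[str, Any], timeout: float = 30.0) -> Dict[str, Any]:
    """Steps 1-3: POST the JSON payload and read the complete response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def embed_sync(url: str, headers: Dict[str, str], payload: Dict[str, Any],
               send: Callable[..., Dict[str, Any]] = http_post_json) -> List[List[float]]:
    """Step 4: return the dense vectors in the order of the original inputs."""
    body = send(url, headers, payload)
    items = sorted(body["result"]["embeddings"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

The `send` parameter exists only so the transport can be swapped out, for instance in tests; there is no polling loop because the response is complete after one round trip.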

## Parameter Reference

### Create Text Embedding

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| input | Array<String> / String | Yes | — | Max 32 strings per request; non-empty | The text to be vectorized. |
| input_type | String | No | document | One of: query, document | Specifies how the input will be used. |
| dimension | Integer | No | — | Cannot exceed foundation model dimension | Output vector dimension (for custom models with dimensionality reduction). |
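
Checking these constraints on the client side avoids a round trip that would end in a 400 response. The sketch below is one possible validator; `build_text_payload` is a hypothetical helper, and it reads "non-empty" as applying both to the list and to each string, which is an assumption.

```python
from typing import Dict, List, Optional, Sequence, Union

MAX_INPUTS = 32                      # "Max 32 strings per request"
INPUT_TYPES = ("query", "document")  # allowed input_type values


def build_text_payload(texts: Union[str, Sequence[str]],
                       input_type: str = "document",
                       dimension: Optional[int] = None) -> Dict[str, object]:
    """Validate the parameters from the table and build the JSON payload."""
    items: List[str] = [texts] if isinstance(texts, str) else list(texts)
    if not items or len(items) > MAX_INPUTS:
        raise ValueError(f"input must hold 1 to {MAX_INPUTS} strings")
    if any(not isinstance(text, str) or not text for text in items):
        raise ValueError("every input must be a non-empty string")
    if input_type not in INPUT_TYPES:
        raise ValueError(f"input_type must be one of {INPUT_TYPES}")
    payload: Dict[str, object] = {"input": items, "input_type": input_type}
    # dimension is optional; the upper bound depends on the foundation model,
    # so only a basic sanity check is possible here.
    if dimension is not None:
        if dimension <= 0:
            raise ValueError("dimension must be a positive integer")
        payload["dimension"] = dimension
    return payload
```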

### Create Sparse Embedding

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| input | Array<String> / String | Yes | — | Up to 32 entries; max 8,192 tokens per entry | Input texts to embed. |
| input_type | String | No | document | One of: query, document | Role in retrieval: query or document. |
| return_token | Boolean | No | false | — | Whether to include token strings in the response. |

### Create Multimodal Embedding

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| type | String | Yes | — | One of: text, image | Type of input data. |
| data | Array<String> | Yes | — | Max 16 elements; base64-encoded image in `data:image/{format};base64,{base64_image}` format | Text or image data to vectorize. |
| input | Array<ContentObject> | Yes | — | Max 32 entries | List of mixed text/image inputs (for M2 models). |
| image | String | No | — | Max 8 MB request body | Image URL or Base64 string. |
| text | String | No | — | — | Text to embed. |
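
The image format and the per-item input objects can be prepared as follows. This is a sketch only: `image_data_uri` and `build_m2_input` are hypothetical helpers, and whether a given model accepts the data URI form in the `image` field should be checked against that model's documentation.

```python
import base64
from typing import Dict, List, Sequence

MAX_M2_INPUTS = 32  # "Max 32 entries" for the `input` parameter


def image_data_uri(image_bytes: bytes, image_format: str) -> str:
    """Encode raw image bytes as data:image/{format};base64,{base64_image}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{image_format};base64,{encoded}"


def build_m2_input(texts: Sequence[str] = (),
                   images: Sequence[str] = ()) -> List[Dict[str, str]]:
    """Build the `input` list: one object per item, holding text or image."""
    entries = [{"text": text} for text in texts]
    entries += [{"image": image} for image in images]
    if not entries or len(entries) > MAX_M2_INPUTS:
        raise ValueError(f"input must hold 1 to {MAX_M2_INPUTS} entries")
    return entries
```

Each entry carries exactly one of `text` or `image`, so the position of an entry in the list identifies which input an output embedding belongs to.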

## Code Examples

### Dense Text Embedding - Python - cn-hangzhou

```python
import os
import requests

host = "http://****-hangzhou.opensearch.aliyuncs.com"
workspace = "default"
service_id = "ops-text-embedding-001"
api_key = os.environ.get("OPENSEARCH_API_KEY")

url = f"{host}/v3/openapi/workspaces/{workspace}/text-embedding/{service_id}"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}
payload = {
    "input": [
        "Science and technology are the primary productive forces",
        "opensearch product documentation",
    ],
    "input_type": "query",
}

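# Send the request; the timeout guards against stalled connections.
response = requests.post(url, headers=headers, json=payload, timeout=30)
# Raise on 4xx/5xx instead of failing later on a missing "result" key.
response.raise_for_status()
data = response.json()
# Print each input's position and the first five vector components.
for item in data["result"]["embeddings"]:
    print(item["index"], item["embedding"][:5])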
```

### Sparse Text Embedding - Bash - All Regions

```bash
curl -XPOST -H "Content-Type: application/json" \
  "http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-sparse-embedding/ops-text-sparse-embedding-001" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "input": [
      "Science and technology are the primary productive forces",
      "OpenSearch product documentation"
    ],
    "input_type": "query",
    "return_token": false
  }'
```

### Multimodal Embedding (Image) - Bash - All Regions

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  "http://<your-endpoint>/v3/openapi/workspaces/default/multi-modal-embedding/ops-m2-encoder" \
  -d '{
    "input": [
      {
        "image": "http://example.com/photo.jpg"
      }
    ]
  }'
```

### OpenAI-Compatible Embedding - Python - All Regions

```python
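import os

from openai import OpenAI

# openai>=1.0 uses a client object; the module-level openai.api_base and
# openai.Embedding.create from earlier releases no longer work.
client = OpenAI(
    api_key=os.environ.get("OPENSEARCH_API_KEY"),
    base_url="https://api.ai-search-platform.com/v1",
)

response = client.embeddings.create(
    model="ops-text-embedding-002",
    input="This is a test sentence.",
)

# Each item in response.data carries one embedding vector.
print(response.data[0].embedding[:5])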
```

### Custom Deployment Service (Text) - Bash - All Regions

```bash
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Token: YOUR_SERVICE_DEPLOYMENT_TOKEN" \
"http://xxxxxx.opensearch.aliyuncs.com/v3/openapi/xxxxxx" \
-d '{
  "input":[
    "Science and technology are primary productive forces",
    "OpenSearch product documentation"
  ],
  "input_type" : "document",
  "dimension": 567
}'
```

### Sparse Embedding with SDK - Python - All Regions

```python
import os
from alibabacloud_tea_openapi.models import Config
from alibabacloud_searchplat20240529.client import Client
from alibabacloud_searchplat20240529.models import GetTextSparseEmbeddingRequest

config = Config(
    bearer_token=os.environ.get("ALIBABA_CLOUD_API_KEY"),
    endpoint="<your-api-endpoint>",
    protocol="http"
)
client = Client(config=config)

request = GetTextSparseEmbeddingRequest(
    input=["test", "text"],
    input_type="document",
    return_token=True
)

response = client.get_text_sparse_embedding("default", "ops-text-sparse-embedding-001", request)
print(response)
```

### Multimodal Embedding with SDK - Python - All Regions

```python
import os

from alibabacloud_tea_openapi.models import Config
from alibabacloud_searchplat20240529.client import Client
from alibabacloud_searchplat20240529.models import GetMultiModalEmbeddingRequest

# Read the API key from the environment instead of hardcoding it.
config = Config(
    bearer_token=os.environ.get("ALIBABA_CLOUD_API_KEY"),
    endpoint="<your-api-endpoint>",
    protocol="http"
)
client = Client(config=config)

request = GetMultiModalEmbeddingRequest()
request.from_map({
    "input": [
        {"text": "Science and technology are the primary productive forces"}
    ]
})

response = client.get_multi_modal_embedding("default", "ops-m2-encoder", request)
print(response)
```

## Response Format

```json
{
    "request_id": "B4AB89C8-B135-****-A6F8-2BAB801A2CE4",
    "latency": 38,
    "usage": {
        "token_count": 3072
    },
    "result": {
        "embeddings": [
            {
                "index": 0,
                "embedding": [
                    -0.02868066355586052,
                    0.022033605724573135,
                    -0.0417383536696434,
                    ...
                ]
            }
        ]
    }
}
```

**Key Fields**:
- `result.embeddings[].index` — Position of the input in the original request
- `result.embeddings[].embedding` — Dense vector of floats representing the semantic meaning
- `usage.token_count` — Number of tokens processed (used for billing)
- `request_id` — Unique identifier for the request (useful for debugging)
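
Once the dense vectors are extracted, they are typically compared with cosine similarity. The sketch below assumes plain Python lists as returned in `result.embeddings[].embedding`; `cosine_similarity` is a hypothetical helper, and cosine is a common choice rather than something this API mandates.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two dense embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

Vectors must come from the same model and the same `dimension` setting to be comparable.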

For sparse embeddings, the response includes:
- `result.sparse_embeddings[].embedding[].tokenId` — Vocabulary index of non-zero dimensions
- `result.sparse_embeddings[].embedding[].weight` — Relevance score for each token
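
A sparse embedding in this shape can be turned into a `tokenId -> weight` map and scored with a dot product over the shared tokens. This is an illustrative sketch: `to_weight_map` and `sparse_dot` are hypothetical helpers, and the dot product is a common scoring choice, not one prescribed by the API.

```python
from typing import Dict, List, Mapping


def to_weight_map(embedding: List[dict]) -> Dict[int, float]:
    """Convert [{"tokenId": ..., "weight": ...}, ...] into a lookup table."""
    return {entry["tokenId"]: entry["weight"] for entry in embedding}


def sparse_dot(a: Mapping[int, float], b: Mapping[int, float]) -> float:
    """Dot product over the token IDs present in both sparse vectors."""
    # Iterate over the smaller map so the cost tracks the shorter vector.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[token] for token, weight in a.items() if token in b)
```

For example, a query with tokens `{1: 0.5, 7: 2.0}` and a document with tokens `{7: 1.5, 9: 1.0}` share only token 7, so only that pair contributes to the score.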

For multimodal embeddings:
- `usage.image` — Number of images processed
- `usage.token_count` — Total tokens from text inputs

## Error Handling

| Error Code | Description | Recommended Action |
|------------|-------------|-------------------|
| 400 | Bad Request: The request body is malformed or contains invalid parameters. | Validate JSON structure and parameter constraints (e.g., input length, allowed values). |
| 401 | Unauthorized: Invalid or missing API key or service deployment token. | Verify your API key and, if applicable, service deployment token from the console. |
| 403 | Forbidden: Access denied due to insufficient permissions or invalid workspace API key. | Ensure your API key has access to the specified workspace and service. |
| 429 | Too Many Requests: Rate limit exceeded. Wait before retrying. | Implement exponential backoff; check rate limits per model or account. |
| 500 | Internal Server Error: An unexpected error occurred on the server side. | Retry with jittered backoff; contact support if persistent. |
| InvalidParameter | The request contains invalid parameters. Example: JSON parse error when parsing input_type value. | Ensure `input_type` is exactly "query" or "document"; validate all fields. |

### Rate Limits & Retry
- Standard text embedding: 50–100 QPS per API key or workspace
- Sparse embedding: 50 QPS per Alibaba Cloud account
- Multimodal embedding: 10 QPS per account
- Custom deployments: 100 QPS per workspace API key

When encountering 429 errors:
- Use exponential backoff (e.g., start with 1s delay, double on each retry)
- Respect the `Retry-After` header if present
- Distribute load across multiple API keys if available
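
The retry guidance above might look like the following in code. This is a sketch under assumptions: `backoff_delay` and `call_with_retry` are hypothetical names, `send` is any caller-supplied function returning `(status_code, headers, body)`, and retrying on 500 as well as 429 follows the error table in this document.

```python
import random
import time
from typing import Any, Callable, Mapping, Optional, Tuple


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before the next retry: Retry-After if usable, else base * 2^attempt."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff.
    return min(cap, base * (2 ** attempt))


def call_with_retry(send: Callable[[], Tuple[int, Mapping[str, str], Any]],
                    max_attempts: int = 5, jitter: float = 0.0,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, Any]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in (429, 500):
            return status, body
        if attempt < max_attempts - 1:
            delay = backoff_delay(attempt, headers.get("Retry-After"))
            # Optional random jitter spreads out retries from parallel clients.
            sleep(delay + random.uniform(0.0, jitter))
    return status, body
```

With the defaults, the delays are 1 s, 2 s, 4 s, and 8 s before the fifth and final attempt, unless a numeric `Retry-After` header overrides them.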

## Environment Requirements

- Python SDK: `pip install "alibabacloud_searchplat20240529>=1.0.0"` (quote the specifier so the shell does not treat `>` as a redirect)
- OpenAI-compatible SDK: `pip install "openai>=1.0.0"`
- Set environment variable: `export OPENSEARCH_API_KEY=your_key` or `export ALIBABA_CLOUD_API_KEY=your_key`
- Python version: 3.8 or higher (for SDK usage)

## FAQ

Q: What is the difference between dense and sparse embeddings?
A: Dense embeddings represent text as fixed-length vectors of floats (e.g., 768 dimensions), ideal for semantic similarity. Sparse embeddings use variable-length lists of (token_id, weight) pairs, optimized for keyword matching and hybrid search.

Q: How do I choose between `input_type: query` and `input_type: document`?
A: Use `query` for search queries and `document` for indexed content. Using consistent types improves retrieval quality, especially for asymmetric models trained on query-document pairs.

Q: Can I mix text and images in a single multimodal embedding request?
A: Yes. For models like `ops-m2-encoder`, send an array of objects where each object contains either a `text` field or an `image` field (URL or Base64).

Q: Why am I getting a 401 error even with a valid API key?
A: For custom deployment services, you need both an `Authorization: Bearer ...` header (workspace API key) and a `Token: ...` header (service deployment token from the console).

Q: What is the maximum input length for text embedding?
A: Most models support up to 8,192 tokens per input string. The exact limit depends on the model—check the specific service documentation.

## Pricing & Billing

### Billing Model
- Text and sparse embedding: billed per token (input tokens only)
- Multimodal embedding: billed per token and per image
- Custom deployment services: billed per request

### Price Reference

| Model/Service | Input Price | Output Price |
|---------------|-------------|--------------|
| ops-text-embedding-001 | 0.002 /tokens | 0.002 /tokens |
| ops-text-embedding-002 | 0.002 /tokens | 0.002 /tokens |
| ops-text-sparse-embedding-001 | 0.002 /tokens | 0.002 /tokens |
| ops-m2-encoder | 0.002 /tokens | 0.002 /tokens |
| ops-m2-encoder-large | 0.003 /tokens | 0.003 /tokens |
| ops-gme-qwen2-vl-2b-instruct | 0.004 /tokens | 0.004 /tokens |
| Custom deployment (standard) | 0.0001 / request | 0.0001 / request |

### Free Tier
- OpenAI-compatible text embedding: 1 million tokens free per month
- Custom deployment services: 10,000 free requests per month

### Usage Limits
- Max 32 inputs per request (text/sparse)
- Max 16 inputs per request (custom deployment)
- Max 8 MB request body size
- Max 8,192 tokens per input string
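
To stay within the per-request input limit, larger corpora can be split into batches before sending. A minimal sketch, assuming the 32-input limit for text and sparse requests; `batched` is a hypothetical helper, and the token limit per string is not checked here because it depends on the model's tokenizer.

```python
from typing import Iterator, List, Sequence


def batched(texts: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size inputs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])
```

Seventy inputs, for example, become three requests of 32, 32, and 6 strings; pass `batch_size=16` for custom deployments.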

### Billing Notes
- Token count is measured from the `usage.token_count` field in the response
- Free tier resets monthly
- Reranking and embedding requests under custom deployments are billed per call, not per token
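
Token usage can be tracked on the client side by summing `usage.token_count` over responses. The sketch below is only a rough accounting aid: `total_tokens` and `billable_tokens` are hypothetical helpers, and applying the 1 million token free tier as a simple monthly subtraction is an assumption about how the allowance is counted.

```python
from typing import Any, Iterable, Mapping

FREE_TOKENS_PER_MONTH = 1_000_000  # OpenAI-compatible text embedding free tier


def total_tokens(responses: Iterable[Mapping[str, Any]]) -> int:
    """Sum usage.token_count over parsed response bodies."""
    return sum(body.get("usage", {}).get("token_count", 0) for body in responses)


def billable_tokens(tokens_this_month: int,
                    free_tokens: int = FREE_TOKENS_PER_MONTH) -> int:
    """Tokens left to bill after the free allowance is used up."""
    return max(0, tokens_this_month - free_tokens)
```

The console's billing records remain the authoritative source; a local tally like this is mainly useful for spotting unexpected usage early.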