# opensearch-vector

Part of **OPENSEARCH**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for [Build a Retrieval-Augmented Generation (RAG) solution](../../intent/opensearch-build-solution/SKILL.md). If you're unsure which path to take, check the routing skill first.

# OpenSearch Vector Search

## Capabilities Overview

| Sub-capability | Calling Mode | Description |
|----------------|--------------|-------------|
| Vector Query | Synchronous | Retrieves the results of a vector query test in Vector Search Edition by instance ID and query parameters. |
| Calculate Vector Similarity Score | Synchronous | Computes proximity scores using ProximaScore within a custom Cava scorer for ranking logic. |
| Perform Vector Search | Synchronous | Executes similarity searches using vector embeddings with configurable thresholds, top_n, and HNSW parameters. |
| Perform Vector Retrieval | Synchronous | Runs vector-based search queries using HA3 or SQL syntax with filters and custom retrieval parameters. |
| Compute Vector Similarity | Synchronous | Uses built-in l2_distance or ip_distance functions to calculate vector similarity in sort/filter clauses. |
| Build RAG Solution | Synchronous | Implements Retrieval-Augmented Generation solutions using vector search and external LLMs like DeepSeek. |
| Generate Multi-Modal Embedding | Synchronous | Creates vector representations from combined text and image inputs using multimodal models. |
| Reduce Vector Dimension | Synchronous | Applies dimensionality reduction to high-dimensional vectors using fine-tuned models. |
| Text to Embedding | Synchronous | Converts text into dense vector embeddings for semantic retrieval and classification. |
| Text to Sparse Vector | Synchronous | Generates sparse vector representations of text for efficient lexical-semantic hybrid search. |
| Get Multimodal Embedding | Synchronous | Generates embeddings from multimodal inputs via SDK or REST API for semantic search. |
| Get Text Embedding | Synchronous | Produces dense vector embeddings from text input using SDK or REST API. |
| Get Text Sparse Embedding | Synchronous | Produces sparse vector representations of text via SDK or REST API. |

## API Calling Patterns

### Authentication
Use **Bearer Token** authentication as the primary method:
- Include the header: `Authorization: Bearer <your_api_key>`
- Store your credential in the environment variable: `DASHSCOPE_API_KEY`
- This method is used across all embedding, multimodal, and vector query endpoints.
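A minimal sketch of assembling these headers in Python. It assumes the key has been exported as `DASHSCOPE_API_KEY`. `auth_headers` is a hypothetical helper and is not part of any SDK.

```python
import os


def auth_headers(env_var: str = "DASHSCOPE_API_KEY") -> dict:
    """Build Bearer-token request headers from an environment variable (hypothetical helper)."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early rather than send an unauthenticated request that would come back as 401
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

Reading the key from the environment keeps it out of source code and lets the same helper serve every endpoint that uses this scheme.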

### Service Endpoint
APIs use region-specific endpoints with the following pattern:
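- Base URL format: `https://{region}.opensearch.aliyuncs.com`. The examples below use a placeholder host of the form `your-endpoint-hangzhou.opensearch.aliyuncs.com`; substitute the endpoint of your own instance.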
- Common regions: `cn-hangzhou`, `cn-shanghai`, `cn-beijing`
- For embedding services: `{host}/v3/openapi/workspaces/{workspace_name}/{service_type}/{service_id}`
- For OpenAI-compatible mode: `{host}/compatible-mode/v1/embeddings`
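You can fill in the path patterns above with plain string formatting. This is a minimal sketch. `embedding_service_url` and `compatible_mode_url` are hypothetical helpers, and `host` stands for your instance's endpoint including the scheme.

```python
def embedding_service_url(host: str, workspace_name: str, service_type: str, service_id: str) -> str:
    """Expand {host}/v3/openapi/workspaces/{workspace_name}/{service_type}/{service_id}."""
    return f"{host.rstrip('/')}/v3/openapi/workspaces/{workspace_name}/{service_type}/{service_id}"


def compatible_mode_url(host: str) -> str:
    """Expand {host}/compatible-mode/v1/embeddings (OpenAI-compatible mode)."""
    return f"{host.rstrip('/')}/compatible-mode/v1/embeddings"
```

For example, `embedding_service_url("http://your-endpoint-hangzhou.opensearch.aliyuncs.com", "default", "multi-modal-embedding", "ops-m2-encoder")` reproduces the URL used in the multimodal curl example.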

### Synchronous Pattern
All vector and embedding APIs follow a synchronous request-response flow:
1. Send a POST request with JSON body to the service endpoint
2. Include `Authorization: Bearer <API_KEY>` header
3. Receive a complete JSON response with embeddings, scores, or query results
4. No polling or asynchronous handling is required; the result comes back in the same HTTP response
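The four steps above map onto a single blocking HTTP call. Below is a minimal standard-library sketch. It assumes you already have the full service URL and an API key, and `build_request` and `call_service` are hypothetical names. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, which you can handle as described under Error Handling.

```python
import json
import urllib.request


def build_request(url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Steps 1-2: a POST with a JSON body and the Bearer header."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def call_service(url: str, api_key: str, body: dict, timeout: float = 30.0) -> dict:
    """Steps 3-4: block until the complete JSON response arrives; no polling."""
    with urllib.request.urlopen(build_request(url, api_key, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```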

## Parameter Reference

### Text to Embedding

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| input | Array/String | Yes | — | max 32 items, length depends on model | Input text(s) to convert to embeddings. |
| input_type | String | No | document | one of: query, document | Specifies whether input is a search query or indexed document. |
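Below is a minimal sketch of building a request body that respects the constraints in this table. `text_embedding_body` is a hypothetical helper. The per-model length limit is not checked because it depends on the model's tokenizer.

```python
from typing import List, Union

MAX_INPUTS = 32                       # "max 32 items" from the table above
INPUT_TYPES = ("query", "document")


def text_embedding_body(texts: Union[str, List[str]], input_type: str = "document") -> dict:
    """Build a Text to Embedding request body and check the documented constraints."""
    count = 1 if isinstance(texts, str) else len(texts)
    if not 1 <= count <= MAX_INPUTS:
        raise ValueError(f"input must contain 1..{MAX_INPUTS} items, got {count}")
    if input_type not in INPUT_TYPES:
        raise ValueError(f"input_type must be one of {INPUT_TYPES}, got {input_type!r}")
    # Use "query" when embedding a search query and "document" for text you index
    return {"input": texts if isinstance(texts, str) else list(texts), "input_type": input_type}
```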

### Text to Sparse Vector

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| input | Array/String | Yes | — | max 32 items, max 8192 tokens per input | Text inputs for sparse vectorization. |
| input_type | String | No | document | one of: query, document | Input type context for token weighting. |
| return_token | Boolean | No | false | one of: true, false | Whether to return original tokens alongside weights. |
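`return_token` must be serialized as a JSON boolean. The string `"true"` is not a boolean. The following is a minimal sketch of producing the request body. `sparse_embedding_body` is a hypothetical helper, and the 8192-token limit is left to the service because counting tokens needs the model's tokenizer.

```python
import json


def sparse_embedding_body(texts, input_type="document", return_token=False) -> str:
    """Serialize a Text to Sparse Vector request body (hypothetical helper)."""
    if input_type not in ("query", "document"):
        raise ValueError("input_type must be 'query' or 'document'")
    if not isinstance(return_token, bool):
        # A string such as "true" would not serialize as a JSON boolean
        raise TypeError("return_token must be a bool")
    items = [texts] if isinstance(texts, str) else list(texts)
    if not 1 <= len(items) <= 32:
        raise ValueError("input must contain 1..32 items")
    return json.dumps({"input": items, "input_type": input_type, "return_token": return_token})
```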

### Generate Multi-Modal Embedding

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| input | List[ContentObject] | Yes | — | max 32 items, total body ≤ 8MB | List of objects containing text and/or image. |
| text | String | No | — | — | Text content within an input object. |
| image | String | No | — | URL or base64-encoded image | Image as accessible URL or `data:image/...;base64,...` string. |
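A local image can be sent inline as a `data:image/...;base64,...` string. Below is a minimal sketch of encoding one and checking the documented limits. `image_data_uri` and `multimodal_body` are hypothetical helpers. Interpreting 8MB as 8 × 1024 × 1024 bytes is an assumption.

```python
import base64
import json

MAX_BODY_BYTES = 8 * 1024 * 1024  # assumption: "8MB" read as binary megabytes


def image_data_uri(raw: bytes, mime: str = "image/jpeg") -> str:
    """Encode image bytes as a data:image/...;base64,... string."""
    return f"data:{mime};base64,{base64.b64encode(raw).decode('ascii')}"


def multimodal_body(items: list) -> dict:
    """items: dicts carrying a 'text' key, an 'image' key, or both."""
    if not 1 <= len(items) <= 32:
        raise ValueError("input must contain 1..32 objects")
    for item in items:
        if "text" not in item and "image" not in item:
            raise ValueError("each input object needs text and/or image")
    body = {"input": items}
    if len(json.dumps(body).encode("utf-8")) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds 8MB")
    return body
```

For a file on disk, you would pass the bytes read with `open(path, "rb").read()` to `image_data_uri`.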

### Reduce Vector Dimension

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| input | List[List[Float]] | Yes | — | total body ≤ 8MB | Set of high-dimensional vectors to compress. |
| parameters.output_dimension | Integer | No | 512 | — | Target output dimension after reduction. |
| parameters.model_name | String | No | — | — | Name of user-trained compression model (required for custom models). |
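Below is a minimal sketch of building the dimension-reduction body. It sends `output_dimension` as an integer, as the table specifies. `dim_reduction_body` is a hypothetical helper. Requiring all vectors in one request to share a dimension is an assumption.

```python
def dim_reduction_body(vectors, output_dimension=512, model_name=None) -> dict:
    """Build a Reduce Vector Dimension request body (hypothetical helper)."""
    if not vectors:
        raise ValueError("input must contain at least one vector")
    # Assumption: every vector in one request has the same dimension
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all input vectors must have the same dimension")
    parameters = {"output_dimension": int(output_dimension)}  # an integer, not the string "512"
    if model_name is not None:
        parameters["model_name"] = model_name  # needed for user-trained compression models
    return {"input": [[float(x) for x in v] for v in vectors], "parameters": parameters}
```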

### Perform Vector Retrieval

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| query | String | Yes | — | — | Vector retrieval query string with index name and vectors. |
| n | Integer | No | — | max 10000 | Number of top results to return. |
| sf | Number | No | — | range 0.0–10.0 | Similarity score threshold for filtering results. |
| search_params | String | No | — | valid JSON format | Fine-tuning parameters as JSON string (e.g., HNSW ef). |
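Below is a minimal sketch of collecting these parameters while enforcing the documented ranges. `retrieval_params` is a hypothetical helper. How the parameters are transported (query string or JSON body) is not specified here, so the sketch only returns a dict. The `ef` key in the usage example is a hypothetical tuning key, not a confirmed parameter name.

```python
import json


def retrieval_params(query: str, n=None, sf=None, search_params=None) -> dict:
    """Collect Perform Vector Retrieval parameters (hypothetical helper)."""
    params = {"query": query}
    if n is not None:
        if not 1 <= n <= 10000:  # lower bound of 1 is an assumption; max 10000 is documented
            raise ValueError("n must be between 1 and 10000")
        params["n"] = n
    if sf is not None:
        if not 0.0 <= sf <= 10.0:
            raise ValueError("sf must be within 0.0-10.0")
        params["sf"] = sf
    if search_params is not None:
        # search_params is documented as a JSON *string*, so serialize dicts here
        params["search_params"] = json.dumps(search_params, separators=(",", ":"))
    return params
```

Usage: `retrieval_params("...", n=10, sf=0.8, search_params={"ef": 200})`.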

## Code Examples

### Text Embedding - Python - All Regions

```python
import os

from alibabacloud_tea_openapi.models import Config
from alibabacloud_searchplat20240529.client import Client
from alibabacloud_searchplat20240529.models import GetTextEmbeddingRequest

if __name__ == '__main__':
    # endpoint is the host only (no "http://"); the scheme is set by `protocol`
    config = Config(bearer_token=os.environ["DASHSCOPE_API_KEY"],
                    endpoint="your-endpoint-hangzhou.opensearch.aliyuncs.com",
                    protocol="http")
    client = Client(config=config)

    # Inputs: "Science and technology are the primary productive forces", "opensearch product documentation"
    request = GetTextEmbeddingRequest(input=["科学技术是第一生产力", "opensearch产品文档"], input_type="query")
    # Arguments: workspace name, service ID, request
    response = client.get_text_embedding("default", "ops-text-embedding-001", request)
```

### Multimodal Embedding - curl - All Regions

```bash
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
"http://your-endpoint-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/multi-modal-embedding/ops-m2-encoder" \
-d '{
"input":[
  {
    "image":"http://example.com/a.jpg"
  }
]
}'
```

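### Vector Retrieval with Threshold - HA3 Query - All Regions

New-version syntax: a query clause on the vector index followed by a `vector_search` clause that sets a similarity threshold. `vector_index` and `index_name` are placeholders for your own index names.

```text
query=vector_index:'0.1,0.2,0.98,0.6;0.3,0.4,0.98,0.6'&vector_search={"index_name":{"threshold":0.8}}
```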
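Below is a minimal sketch of composing the query string above from Python lists. `threshold_query` is a hypothetical helper. It rests on a few assumptions. First, the same index name is used in the query clause and as the `vector_search` key. Second, `;` separates multiple query vectors, which is how the example appears to use it. Third, you will URL-encode the values yourself if you send them over HTTP.

```python
import json


def threshold_query(index_name: str, vectors, threshold: float) -> str:
    """Compose an HA3 vector query clause plus a per-index similarity threshold."""
    # Each vector is comma-separated floats; multiple query vectors are joined with ';'
    vec_str = ";".join(",".join(str(x) for x in v) for v in vectors)
    # Compact JSON so the clause matches the documented form exactly
    vector_search = json.dumps({index_name: {"threshold": threshold}}, separators=(",", ":"))
    return f"query={index_name}:'{vec_str}'&vector_search={vector_search}"
```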

### Sparse Text Embedding - Python - All Regions

```python
import os

from alibabacloud_tea_openapi.models import Config
from alibabacloud_searchplat20240529.client import Client
from alibabacloud_searchplat20240529.models import GetTextSparseEmbeddingRequest

if __name__ == '__main__':
    # endpoint is the host only (no "http://"); the scheme is set by `protocol`
    config = Config(bearer_token=os.environ["DASHSCOPE_API_KEY"],
                    endpoint="your-endpoint-hangzhou.opensearch.aliyuncs.com",
                    protocol="http")
    client = Client(config=config)

    # return_token=True also returns the original tokens alongside their weights
    request = GetTextSparseEmbeddingRequest(input=["test", "text"], input_type="document", return_token=True)
    response = client.get_text_sparse_embedding("default", "ops-text-sparse-embedding-001", request)
    print(response)
```

### OpenAI-Compatible Text Embedding - Bash - All Regions

```bash
# Input "搜索开发平台" means "search development platform"
curl http://your-endpoint-shanghai.opensearch.aliyuncs.com/compatible-mode/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "ops-text-embedding-001",
    "input": "搜索开发平台"
  }'
```

### Vector Dimension Reduction - Bash - All Regions

```bash
curl --location 'http://your-endpoint-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/embedding-tuning/ops-embedding-dim-reduction-001/' \
--header 'Authorization: Bearer your-api-key' \
--header 'Content-Type: application/json' \
--data '{  
  "input": [
    [0.111,0.222,0.333],
    [0.121,0.221,0.331]
  ],
  "parameters":{
    "output_dimension": 512,
    "model_name" : "your-model-name"
  }
}'
```

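## Response Format

The following is an example response from the text embedding service. The `embedding` array is truncated with `...`, so the block is illustrative rather than valid JSON.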

```json
{
    "request_id": "B4AB89C8-B135-****-A6F8-2BAB801A2CE4",
    "latency": 38,
    "usage": {
        "token_count": 3072
    },
    "result": {
        "embeddings": [
            {
                "index": 0,
                "embedding": [
                    -0.02868066355586052,
                    0.022033605724573135,
                    -0.0417383536696434,
                    ...
                ]
            }
        ]
    }
}
```

**Key Fields**:
- `result.embeddings[].embedding` — The generated dense vector (list of floats)
- `result.embeddings[].index` — Position of the input item in the request
- `usage.token_count` — Total tokens processed (used for billing)
- `request_id` — Unique identifier for debugging and tracing
- `latency` — Processing time in milliseconds
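Below is a minimal sketch of extracting the fields above from a raw response body. `parse_embeddings` is a hypothetical helper. Sorting by `index` realigns vectors with the request inputs and does not rely on the array already being in input order.

```python
import json


def parse_embeddings(raw: str):
    """Return (vectors ordered by input position, token_count, request_id)."""
    data = json.loads(raw)
    items = sorted(data["result"]["embeddings"], key=lambda e: e["index"])
    vectors = [e["embedding"] for e in items]
    return vectors, data.get("usage", {}).get("token_count"), data.get("request_id")
```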

## Error Handling

| Error Code | Description | Recommended Action |
|-------------------|----------------------------|----------------------------------------|
| 400 | Bad Request – The query syntax is invalid or contains malformed parameters. | Validate query structure, vector format, and parameter types. |
| 401 | Unauthorized – The API key or credentials are invalid or not authorized. | Verify `DASHSCOPE_API_KEY` is correct and has required permissions. |
| 403 | Forbidden – The user does not have sufficient permissions. | Check RAM policies and ensure `searchengine:ListVectorQueryResult` or an equivalent permission is granted. |
| 404 | Not Found – The specified instance or index does not exist. | Confirm instance ID, workspace name, and service ID are correct. |
| 429 | Too Many Requests – Rate limit exceeded. | Implement exponential backoff; check QPS limits per service. |
| 500 | Internal Server Error – Unexpected server-side failure. | Retry with backoff; contact support if persistent. |
| 503 | Service Unavailable – Temporary overload or maintenance. | Retry after a short delay. |
| InvalidParameter | Parameter format error (e.g., JSON parse failure). | Ensure request body is valid JSON and matches schema. |
| BadRequest.TaskNotExist | Task or service ID not found. | Verify service ID (e.g., `ops-text-embedding-001`) exists in your workspace. |

### Rate Limits & Retry
- **Text/Multimodal Embedding**: 50 QPS, shared by the main account and its RAM sub-accounts
- **Vector Retrieval**: 100 QPS per index
- **Dimension Reduction**: 50 QPS
- **General Vector Query**: No explicit QPS stated, but subject to account-level quotas
- **Retry Strategy**: Use exponential backoff (e.g., 1s, 2s, 4s) on 429/5xx errors. Respect `Retry-After` if provided.
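Below is a minimal sketch of the retry strategy above, written against the standard library's `urllib.error.HTTPError`. `with_retry` is a hypothetical helper. It honors a `Retry-After` header only when the value is a whole number of seconds; the HTTP-date form is not handled. The `sleep` parameter is injectable so you can test the function without waiting.

```python
import time
import urllib.error


def with_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on HTTP 429 or 5xx wait 1s, 2s, 4s, ... (or Retry-After) and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except urllib.error.HTTPError as err:
            retryable = err.code == 429 or 500 <= err.code < 600
            if not retryable or attempt == max_attempts - 1:
                raise  # 4xx other than 429, or out of attempts
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and str(retry_after).isdigit():
                delay = float(retry_after)  # server-provided wait in seconds
            else:
                delay = base_delay * 2 ** attempt  # exponential backoff
            sleep(delay)
```

In production you may want to add random jitter to `delay` so that many clients do not retry in lockstep.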

## Environment Requirements

- **Python SDK**: `pip install "alibabacloud_searchplat20240529>=2.1.0" "alibabacloud_tea_openapi>=0.3.0"` (quote the version specifiers so the shell does not treat `>` as a redirect)
- **Environment Variable**: `export DASHSCOPE_API_KEY=your_api_key_here`
- **Endpoint Configuration**: Omit the `http://` prefix when setting the SDK `endpoint` parameter; the scheme is selected with the `protocol` parameter

## FAQ

Q: How do I choose between dense and sparse vectors?
A: Use dense vectors for semantic similarity (e.g., "car" ≈ "automobile"). Use sparse vectors for keyword matching and hybrid search (combines BM25-like precision with semantic recall).
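The keyword-matching behaviour of sparse vectors comes from how they are scored. In an inner product over sparse vectors, only the tokens present in both vectors contribute. Below is a minimal sketch that assumes each sparse vector is held as a `{token: weight}` dict; the service's actual response layout may differ. `sparse_dot` is a hypothetical helper.

```python
def sparse_dot(query: dict, doc: dict) -> float:
    """Inner product of two sparse vectors stored as {token_or_id: weight}."""
    if len(query) > len(doc):
        query, doc = doc, query  # iterate over the smaller vector
    # Tokens missing from either side have weight 0 and contribute nothing
    return sum(w * doc[k] for k, w in query.items() if k in doc)
```

For example, `{"car": 0.5, "red": 0.2}` and `{"car": 0.4, "blue": 0.9}` score about 0.2, from the shared token `car` alone. A dense model could additionally relate "car" to "automobile", which a sparse match cannot.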

Q: What distance metrics are supported for vector search?
A: OpenSearch supports Euclidean (l2) and inner product (ip) distance. Use `l2_distance()` or `ip_distance()` in query clauses, or configure the metric during index creation.
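To show what the two metrics compute, below is a minimal sketch in plain Python. These are reference definitions, not the engine's implementation. In particular, the built-in `l2_distance()` may return the squared Euclidean distance, so compare scores only within one metric.

```python
import math


def l2_distance(a, b) -> float:
    """Euclidean distance; smaller values mean more similar vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ip_distance(a, b) -> float:
    """Inner product; larger values mean more similar vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))
```

For unit-normalized vectors, the two metrics give the same ranking because ‖a − b‖² = 2 − 2·(a·b).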

Q: Can I mix vector search with traditional keyword search?
A: Yes. Use multi-channel search or combine vector retrieval with filter/sort clauses that include text matching, category filters, and custom scoring logic.

Q: Are there free tiers available for embedding APIs?
A: Yes. Most embedding services (text, sparse, multimodal) offer 1,000 free requests per month. Dimension reduction and some advanced models may not include free quotas.

Q: How do I handle large documents in vector search?
A: Split long documents into chunks, generate one vector per chunk, and store all vectors in a single field. OpenSearch supports multi-vector fields and will match any chunk during retrieval.
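Below is a minimal chunking sketch using fixed-size character windows with overlap. Character-based splitting and the default sizes are assumptions. Real pipelines often split on tokens or sentences so that each chunk stays under the model's per-input token limit. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50):
    """Split text into windows of `size` characters that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.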

## Pricing & Billing

### Billing Model
- **Embedding APIs (text, sparse, multimodal)**: Billed per token (input only)
- **Vector Query / Retrieval**: Billed per request
- **Dimension Reduction**: Billed per request

### Price Reference

| Model/Specification | Input Price | Output Price |
|---------------------|-------------|--------------|
| ops-text-embedding-001 | 0.002 /tokens | 0.002 /tokens |
| ops-text-sparse-embedding-001 | 0.002 /tokens | 0.002 /tokens |
| ops-m2-encoder | 0.002 /tokens | 0.002 /tokens |
| ops-mm-embedding-v1-7b | 0.010 /tokens | 0.010 /tokens |
| Vector Query (standard) | 0.001 / | 0.001 / |
| Vector Retrieval (standard) | 0.001 / | 0.001 / |

### Free Tier
- 1,000 free requests per month for most embedding and vector query services
- Free tier applies to both main account and RAM sub-accounts combined

### Usage Limits
- Max 32 inputs per embedding request
- Max 8MB request body size
- Max 8192 tokens per text input (varies by model)
- Max 10,000 namespaces in vector indexes
- Max 2 vector indexes per query
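To stay within the per-request limits when embedding a large corpus, you can group inputs into batches. Below is a minimal sketch. It assumes the body is `{"input": [...]}` and measures its size as UTF-8 JSON. `batches` is a hypothetical helper. A single input that alone exceeds the byte limit is still emitted as its own batch, and the service will reject it.

```python
import json

MAX_INPUTS = 32
MAX_BODY_BYTES = 8 * 1024 * 1024  # assumption: "8MB" read as binary megabytes


def batches(texts, max_inputs=MAX_INPUTS, max_bytes=MAX_BODY_BYTES):
    """Yield lists of texts whose request body respects both the item and size limits."""
    batch = []
    for text in texts:
        candidate = batch + [text]
        too_big = len(json.dumps({"input": candidate}).encode("utf-8")) > max_bytes
        if batch and (len(candidate) > max_inputs or too_big):
            yield batch  # close the current batch and start a new one with this text
            batch = [text]
        else:
            batch = candidate
    if batch:
        yield batch
```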

### Billing Notes
- Failed requests (e.g., 4xx/5xx) still count toward quota and billing
- Token count includes all input text; output tokens are not charged separately for embeddings
- Dimension reduction is billed per request regardless of vector count or size (within 8MB limit)