# opensearch-multimodal

# OpenSearch Multimodal Search

## Capabilities Overview

| Sub-capability | Model | Calling Mode | Description |
|----------------|-------|--------------|-------------|
| Score Multimodal Relevance | ops-mm-reranker-001 | Synchronous | Scores the relevance between a query and candidate documents, supporting text and image inputs. Returns documents sorted by relevance score from 0 to 1. |

## Model Selection Guide

### Score Multimodal Relevance

| Model ID | Calling Mode |
|----------|--------------|
| ops-mm-reranker-001 | Synchronous |

## API Calling Patterns

### Authentication
Use **Bearer Token** authentication with your DashScope API key.

- Include the header: `Authorization: Bearer <your_api_key>`
- Store your key in the environment variable: `DASHSCOPE_API_KEY`
- This is the only supported authentication method for this API.
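
The header construction above can be sketched as follows. The helper name `build_headers` is hypothetical and not part of any SDK; it only reads `DASHSCOPE_API_KEY` and fails early when the variable is missing:

```python
import os


def build_headers(env=os.environ):
    """Return request headers using the key stored in DASHSCOPE_API_KEY."""
    api_key = env.get("DASHSCOPE_API_KEY")
    if not api_key:
        # Fail fast instead of sending "Bearer None" to the service.
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Passing the environment as an argument keeps the helper easy to exercise with a plain dictionary.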

### Service Endpoint
The base URL follows this pattern:
```text
http://{host}/v3/openapi/workspaces/{workspace_name}/multi-modal-reranker/{service_id}
```

- Replace `{host}` with your region-specific endpoint (e.g., `****-hangzhou.opensearch.aliyuncs.com`)
- Common regions include: `cn-hangzhou`, `cn-shanghai`, `cn-beijing`
- The full endpoint is constructed at runtime using your workspace and service configuration.
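
A minimal sketch of assembling the endpoint at runtime, assuming only the URL pattern shown above. `build_endpoint` is a hypothetical helper name, and percent-encoding the path segments is a defensive choice rather than a documented requirement:

```python
from urllib.parse import quote


def build_endpoint(host, workspace_name, service_id, scheme="http"):
    """Fill in the documented URL pattern for the multimodal reranker."""
    # Percent-encode the path segments so unusual workspace names stay valid.
    workspace = quote(workspace_name, safe="")
    service = quote(service_id, safe="")
    return (
        f"{scheme}://{host}/v3/openapi/workspaces/{workspace}"
        f"/multi-modal-reranker/{service}"
    )
```

For example, `build_endpoint(host, "default", "ops-mm-reranker-001")` yields the pattern above with your own `host` substituted.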

### Synchronous Request Flow
1. Send a `POST` request to the endpoint with JSON body containing `query`, `docs`, and optional `options`.
2. Include valid `Authorization` and `Content-Type: application/json` headers.
3. Receive an immediate JSON response with relevance scores, usage metrics, and request metadata.
4. Parse the `result.scores` array to get ranked document indices and their scores (0.0–1.0).

No polling or streaming is involved—this is a single-request, single-response interaction.
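
The four steps can be sketched with the Python standard library alone. The function names `build_request` and `score` are hypothetical, and the body follows the step list (`query`, `docs`, optional `options`); whether additional body fields are needed is not something this sketch settles:

```python
import json
import urllib.request


def build_request(url, api_key, query, docs, options=None):
    """Assemble the POST request described in steps 1 and 2."""
    body = {"query": query, "docs": docs}
    if options is not None:
        body["options"] = options
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def score(request, timeout=30):
    """Send the request and return result.scores (steps 3 and 4)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["result"]["scores"]
```

Splitting construction from sending lets the request be inspected or logged before any network call is made.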

## Parameter Reference

### Score Multimodal Relevance

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| host | string | Yes | — | Example: `****-hangzhou.opensearch.aliyuncs.com` | The service endpoint. Call over public network or VPC. |
| workspace_name | string | Yes | — | Example: `default` | The name of the workspace. |
| service_id | string | Yes | — | Must start with `ops-`; e.g., `ops-mm-reranker-001` | The built-in service ID. |
| query | ContentObject | Yes | — | Provide exactly one of `text` or `image` | The query content. |
| docs | List[ContentObject] | Yes | — | Max 100 documents | Candidate documents to rank. Each must contain `text` or `image`. |
| options | OptionObject | No | — | — | Configure image resize settings if input includes images. |

> **Note**: `ContentObject` is a JSON object with either a `text` field (string) or an `image` field (URL string).  
> **Note**: `OptionObject` may include fields like `resize_method` and `resize_options` for image preprocessing.
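
The constraints in the table can be checked client-side before sending, which avoids a round trip that would end in a 400. This is a sketch: `validate_request` and `has_one_modality` are hypothetical names, and treating each document as carrying exactly one of `text` or `image` is a reading of the `ContentObject` note rather than a confirmed server rule:

```python
MAX_DOCS = 100  # documented upper bound on candidate documents


def has_one_modality(content):
    """True if the object carries exactly one of `text` or `image`."""
    return isinstance(content, dict) and ("text" in content) != ("image" in content)


def validate_request(query, docs):
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []
    if not has_one_modality(query):
        problems.append("query must contain exactly one of 'text' or 'image'")
    if len(docs) > MAX_DOCS:
        problems.append(f"docs has {len(docs)} items; the maximum is {MAX_DOCS}")
    for i, doc in enumerate(docs):
        if not has_one_modality(doc):
            problems.append(f"docs[{i}] must contain exactly one of 'text' or 'image'")
    return problems
```

Returning every problem at once, rather than raising on the first, makes it easier to fix a malformed batch in a single pass.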

## Code Examples

### Multimodal Relevance Scoring - Bash - All Regions

```bash
curl --location 'http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/multi-modal-reranker/ops-mm-reranker-001/' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "service_id": "ops-mm-reranker-001",
    "query": {
      "text": "Is there a cake in the picture?"
    },
    "docs": [
      {
        "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250408/syuvxh/%E7%89%A9%E4%BD%93%E5%AE%9A%E4%BD%8D.png"
      }
    ]
  }'
```

### Text-to-Text Relevance Scoring - Python - cn-hangzhou

```python
import os

import requests

# Read the key from the environment; a missing key raises KeyError immediately.
api_key = os.environ["DASHSCOPE_API_KEY"]
host = "****-hangzhou.opensearch.aliyuncs.com"  # replace with your endpoint host
workspace = "default"
service_id = "ops-mm-reranker-001"

url = f"http://{host}/v3/openapi/workspaces/{workspace}/multi-modal-reranker/{service_id}/"

# A text query scored against two text documents.
payload = {
    "service_id": service_id,
    "query": {"text": "What is the capital of France?"},
    "docs": [
        {"text": "Paris is the capital and most populous city of France."},
        {"text": "Berlin is the capital of Germany."}
    ]
}

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()  # surface 4xx/5xx responses as exceptions
print(response.json())
```

### Image-to-Text Relevance - Python - cn-shanghai

```python
import os

import requests

# Read the key from the environment; a missing key raises KeyError immediately.
api_key = os.environ["DASHSCOPE_API_KEY"]
host = "****-shanghai.opensearch.aliyuncs.com"  # replace with your endpoint host
workspace = "default"
service_id = "ops-mm-reranker-001"

url = f"http://{host}/v3/openapi/workspaces/{workspace}/multi-modal-reranker/{service_id}/"

# An image query scored against two text documents.
payload = {
    "service_id": service_id,
    "query": {
        "image": "https://example.com/dog.jpg"
    },
    "docs": [
        {"text": "A photo of a golden retriever playing in the park."},
        {"text": "A black cat sleeping on a windowsill."}
    ]
}

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()  # surface 4xx/5xx responses as exceptions
print(response.json())
```

### Batch Document Reranking - Bash - cn-beijing

```bash
curl --location 'http://****-beijing.opensearch.aliyuncs.com/v3/openapi/workspaces/default/multi-modal-reranker/ops-mm-reranker-001/' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "service_id": "ops-mm-reranker-001",
    "query": {
      "text": "Describe the Eiffel Tower"
    },
    "docs": [
      {"text": "The Eiffel Tower is a wrought-iron lattice tower in Paris."},
      {"text": "Mount Fuji is the highest mountain in Japan."},
      {"text": "The Statue of Liberty is in New York Harbor."}
    ]
  }'
```

## Response Format

```json
{
  "request_id": "1b87e3c462079a32999a7c8bc173ca7a",
  "latency": 1765,
  "usage": {
    "image_token": 1225,
    "text_token": 16
  },
  "result": {
    "scores": [
      {
        "index": 0,
        "score": 0.9183856248855591
      }
    ]
  }
}
```

**Key Fields**:
- `request_id` — Unique identifier for the request (useful for debugging)
- `latency` — Processing time in milliseconds
- `usage.image_token` — Number of image tokens consumed
- `usage.text_token` — Number of text tokens consumed
- `result.scores[].index` — Original index of the document in the input `docs` array
- `result.scores[].score` — Relevance score between 0.0 and 1.0 (higher = more relevant)
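
Because each entry carries the original `index`, scores can be mapped back to the submitted documents. The sketch below uses a hypothetical helper, `rank_documents`, and assumes only the response shape shown above:

```python
def rank_documents(response, docs):
    """Pair each score with its source document, highest score first."""
    scores = response["result"]["scores"]
    ordered = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [(docs[item["index"]], item["score"]) for item in ordered]
```

Sorting explicitly means the helper does not depend on the order in which the service returns the array.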

## Error Handling

| Error Code | Description | Recommended Action |
|------------|-------------|---------------------|
| 400 | Bad request. The request body is malformed or contains invalid parameters. | Validate JSON structure, ensure `query` has exactly one of `text` or `image`, and `docs` has ≤100 items. |
| 401 | Unauthorized. The API key is missing, invalid, or expired. | Verify that `DASHSCOPE_API_KEY` is set correctly and the key is active. |
| 429 | Too many requests. Rate limit exceeded. The QPS limit is 20, shared between an Alibaba Cloud account and its RAM users. | Implement client-side rate limiting or exponential backoff. Consider submitting a ticket to increase quota. |
| 500 | Internal server error. The service encountered an unexpected issue while processing the request. | Retry with exponential backoff. If persistent, contact support with `request_id`. |

### Rate Limits & Retry
- **QPS Limit**: 20 requests per second (shared between an Alibaba Cloud account and its RAM users).
- **Retry Strategy**: Use exponential backoff (e.g., 1s, 2s, 4s delays) on 429 or 500 errors.
- The `Retry-After` header is not currently used; rely on fixed or exponential backoff instead.
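
One way to implement the retry strategy is a small wrapper around whatever function sends the request. This is a sketch under stated assumptions: `call_with_backoff` is a hypothetical name, and `send` stands for any zero-argument callable that returns a `(status_code, body)` pair:

```python
import time

RETRYABLE_STATUS = {429, 500}


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt < max_attempts - 1:
            # Delays double each time: 1s, 2s, 4s with the defaults.
            sleep(base_delay * 2 ** attempt)
    return status, body
```

The `sleep` parameter exists so the delays can be replaced in tests; in normal use the default `time.sleep` applies. Adding random jitter to each delay is a common refinement when many clients share one quota.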

## Environment Requirements

- Set your API key as an environment variable:  
  ```bash
  export DASHSCOPE_API_KEY=your_api_key_here
  ```
- Required dependencies (for Python examples):  
  ```bash
  pip install requests
  ```

## FAQ

Q: Can I send both text and image in the same query?
A: No. The `query` object must contain exactly one of `text` or `image`.

Q: How many documents can I score in one request?
A: Up to 100 candidate documents per request. Exceeding this will result in a 400 error.

Q: What regions are supported for the multimodal reranker?
A: The service is available in major Alibaba Cloud regions including `cn-hangzhou`, `cn-shanghai`, and `cn-beijing`. Use your assigned endpoint host.

Q: Are image URLs required to be publicly accessible?
A: They must be reachable by the service, which fetches each image from the URL you provide. That means reachable over the public internet, or from within your VPC if you use private endpoints.

Q: How are tokens counted for images?
A: Images are converted into visual tokens based on resolution and content. The exact count appears in `usage.image_token` in the response.

## Pricing & Billing

### Billing Model
Per-request billing based on total token usage (sum of `image_token` and `text_token`).

### Price Reference

| Model/Spec | Input Price | Output Price |
|------------|-------------|--------------|
| ops-mm-reranker-001 | 0.002 /tokens | 0.002 /tokens |

### Usage Limits
- 20 QPS (shared between an Alibaba Cloud account and its RAM users)

### Billing Notes
Charges are based on token usage (image and text tokens) per request. There is no free tier.
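
To track the token count that billing is based on, the two usage fields can be summed per response. `billable_tokens` is a hypothetical helper, and it deliberately stops at the token total rather than computing a charge:

```python
def billable_tokens(response):
    """Sum the image and text tokens reported in the response's usage block."""
    usage = response.get("usage", {})
    return usage.get("image_token", 0) + usage.get("text_token", 0)


# Usage figures taken from the sample response in this document.
print(billable_tokens({"usage": {"image_token": 1225, "text_token": 16}}))  # 1241
```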