# es-image

Part of **ES**

# Elasticsearch Multimodal Processing

## Capabilities Overview

| Sub-capability | Models | API Pattern | Description |
|----------------|--------|-------------|-------------|
| Extract Image Content | ops-image-analyze-vlm-001, ops-image-analyze-ocr-001 | Async Task / Synchronous | Perform OCR and visual analysis to extract text and metadata from images. |
| Detect Objects in Image | ops-object-detect-001, ops-object-detect-face-001 | Synchronous | Identify and locate objects within images using computer vision models. |
| Identify Main Subjects in Image | — | Synchronous | Determine primary subjects or focal points in visual content. |
| Transcribe Audio/Video | — | Async Task / Synchronous | Convert spoken content in audio or video files to text. |
| Segment Video | — | Async Task | Divide videos into meaningful temporal segments. |
| Extract Video Keyframes | — | Async Task / Synchronous | Capture representative frames from video content. |
| Create Video Summarization Task | — | Async Task | Generate concise summaries of video content. |

## Model Selection Guide

### Extract Image Content

| Model ID | API Pattern |
|----------|-------------|
| ops-image-analyze-vlm-001 | Async Task / Synchronous |
| ops-image-analyze-ocr-001 | Async Task / Synchronous |

### Detect Objects in Image

| Model ID | API Pattern |
|----------|-------------|
| ops-object-detect-001 | Synchronous |
| ops-object-detect-face-001 | Synchronous |

## API Calling Patterns

### Authentication
The primary authentication method is Bearer Token authentication.

- Include the header: `Authorization: Bearer <your_api_key>`
- Store your API key in the environment variable: `DASHSCOPE_API_KEY`
- Other authentication methods may exist, but Bearer Token is the recommended and most commonly used approach across all multimodal APIs.
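
A minimal sketch of how the header above might be assembled in Python. The helper name `build_auth_headers` is hypothetical and not part of any SDK; it only reads the `DASHSCOPE_API_KEY` variable mentioned above and formats the `Authorization` header.

```python
import os


def build_auth_headers(api_key=None):
    """Return HTTP headers for Bearer Token authentication.

    Falls back to the DASHSCOPE_API_KEY environment variable when no key
    is passed explicitly.
    """
    key = api_key or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise ValueError("No API key given and DASHSCOPE_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early when the key is missing tends to be easier to debug than a 401 returned by the service.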

### Service Endpoint
The base URL pattern for API endpoints is:

```text
http://{region}-hangzhou.opensearch.aliyuncs.com
```

Common regions include:
- cn-hangzhou
- cn-shanghai
- cn-beijing

Note: The actual endpoint includes your specific workspace and service identifiers in the path.
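
As an illustration of how the path might be put together, the sketch below assembles a URL from its parts. The path layout (`/v3/openapi/workspaces/{workspace}/{capability}/{service_id}/{mode}`) is inferred from the curl examples in this document rather than from a formal specification, and `build_service_url` is a hypothetical helper.

```python
def build_service_url(host, workspace, capability, service_id, mode=""):
    """Assemble a request URL from its parts.

    host        hostname without scheme, e.g. "<your-endpoint>"
    capability  path segment such as "image-analyze" or "video-segmentation"
    mode        "sync", "async", or "" for endpoints without a mode suffix
    """
    base = f"http://{host}/v3/openapi/workspaces/{workspace}/{capability}/{service_id}"
    # Endpoints without a mode suffix end in a trailing slash in the examples.
    return f"{base}/{mode}" if mode else f"{base}/"
```

For example, `build_service_url("<your-endpoint>", "default", "image-analyze", "ops-image-analyze-vlm-001", "sync")` yields a URL of the same shape as the synchronous curl example.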

### Async Task Pattern
Used by: Extract Image Content, Transcribe Audio/Video, Segment Video, Extract Video Keyframes, Create Video Summarization Task

1. **Submit Task**: POST to `/.../async` endpoint with input data (URL, Base64, or OSS path)
2. **Receive Task ID**: Response contains a `task_id` for tracking
3. **Poll Status**: Repeatedly GET `/.../async/task-status?task_id={task_id}` until status changes from `PENDING`
4. **Get Results**: Final response includes `status: "SUCCESS"` and `data` with results, or `status: "FAILED"` with error details

Recommended polling interval: 5 seconds between requests.
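
The steps above can be sketched as a small polling loop that is independent of any SDK. Here `fetch_status` is a caller-supplied function (hypothetical) that performs the GET request and returns the decoded `result` object; the `max_wait` cap is an assumption added for safety and is not a documented service limit.

```python
import time


def poll_task(fetch_status, task_id, interval=5.0, max_wait=600.0, sleep=time.sleep):
    """Poll until the task leaves PENDING or max_wait seconds have elapsed.

    fetch_status: function taking a task_id and returning a dict with at
    least a "status" key ("PENDING", "SUCCESS", or "FAILED").
    """
    waited = 0.0
    while True:
        result = fetch_status(task_id)
        if result.get("status") != "PENDING":
            return result  # SUCCESS carries "data"; FAILED carries error details
        if waited >= max_wait:
            raise TimeoutError(f"task {task_id} still PENDING after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop testable without real delays.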

### Synchronous Pattern
Used by: Extract Image Content, Detect Objects in Image, Identify Main Subjects in Image, Transcribe Audio/Video, Extract Video Keyframes

1. **Single Request**: POST to `/.../sync` endpoint with input data
2. **Immediate Response**: Receive complete results in the response body (no polling needed)
3. **Timeout Consideration**: Suitable for smaller files that process quickly (<30 seconds)
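
A rough sketch of the synchronous pattern using only the Python standard library. The function names are hypothetical, and the 30-second default timeout simply mirrors the guidance above; it is a client-side choice, not a documented server limit.

```python
import json
import urllib.request


def build_sync_request(url, api_key, payload):
    """Create (but do not send) a POST request for a /sync endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_sync(url, api_key, payload, timeout=30.0):
    """Send the request and return the decoded JSON response body."""
    request = build_sync_request(url, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

If the call times out on the client side, switching to the async endpoint is likely the better option for that input.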

## Parameter Reference

### Extract Image Content

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| service_id | string | true | — | one of: ops-image-analyze-vlm-001, ops-image-analyze-ocr-001 | The built-in service ID |
| document.url | string | false | — | — | The URL of the image file. Supports HTTP and HTTPS |
| document.content | string | false | — | — | The Base64-encoded content of the image file |
| document.file_name | string | false | — | — | The file name. Required if both url and content are blank |
| document.file_type | string | false | — | one of: jpg, jpeg, png, bmp, tiff | The file type |
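
To illustrate the constraints in this table, the following sketch validates the inputs before building the request body. `build_image_document` is a hypothetical helper; the "exactly one of url or content" check follows the `InvalidParameter` message quoted in the Error Handling section.

```python
ALLOWED_FILE_TYPES = {"jpg", "jpeg", "png", "bmp", "tiff"}


def build_image_document(url=None, content=None, file_name=None, file_type=None):
    """Build the request body for Extract Image Content."""
    if bool(url) == bool(content):
        raise ValueError("provide exactly one of document.url or document.content")
    if file_type is not None and file_type not in ALLOWED_FILE_TYPES:
        raise ValueError(f"unsupported file_type: {file_type!r}")
    document = {
        "url": url,
        "content": content,
        "file_name": file_name,
        "file_type": file_type,
    }
    # Drop unset fields so the body only carries what was provided.
    return {"document": {k: v for k, v in document.items() if v is not None}}
```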

### Detect Objects in Image

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| service_id | string | true | — | must start with ops- | The service ID |
| image.url | string | false | — | must be publicly accessible | A publicly accessible URL of the image |
| image.content | string | false | — | base64-encoded string | The image file content, encoded as a base64 string |
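
A short sketch of preparing `image.content` from raw bytes. The helper name is hypothetical, and the requirement that exactly one of the two fields be set is an assumption made by analogy with Extract Image Content; the table itself only marks both as optional.

```python
import base64


def build_detection_payload(image_bytes=None, url=None):
    """Build the request body for object detection from raw bytes or a URL."""
    # Assumption: exactly one source is expected per request.
    if (image_bytes is None) == (url is None):
        raise ValueError("provide exactly one of image_bytes or url")
    if url is not None:
        return {"image": {"url": url}}
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"image": {"content": encoded}}
```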

### Transcribe Audio/Video

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| input.oss | string | false | — | mutually exclusive with content | The OSS path of the input file |
| input.content | string | false | — | mutually exclusive with oss | Base64-encoded audio or video data |
| input.file_name | string | false | — | — | The name of the audio or video file |
| output.type | string | false | oss | one of: text, oss | The output format |
| output.oss | string | false | — | required when type=oss | The OSS path for output files |
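
The interplay between `input.*` and `output.*` might be enforced client-side as sketched below. `build_transcription_payload` is a hypothetical helper; rejecting a request with no input source at all is an assumption, since the table only states that the two sources are mutually exclusive.

```python
def build_transcription_payload(oss=None, content=None, file_name=None,
                                output_type="oss", output_oss=None):
    """Build a transcription request body following the table above."""
    if oss and content:
        raise ValueError("input.oss and input.content are mutually exclusive")
    if not oss and not content:
        # Assumption: a request needs at least one input source.
        raise ValueError("provide input.oss or input.content")
    if output_type not in ("text", "oss"):
        raise ValueError("output.type must be 'text' or 'oss'")
    if output_type == "oss" and not output_oss:
        raise ValueError("output.oss is required when output.type is 'oss'")

    payload_input = {"oss": oss} if oss else {"content": content}
    if file_name:
        payload_input["file_name"] = file_name
    output = {"type": output_type}
    if output_type == "oss":
        output["oss"] = output_oss
    return {"input": payload_input, "output": output}
```

Because `output.type` defaults to `oss`, callers who want inline text have to pass `output_type="text"` explicitly.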

### Extract Video Keyframes

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| input.oss | string | false | — | mutually exclusive with content | The OSS path of the input video |
| input.content | string | false | — | mutually exclusive with oss | Base64-encoded video content |
| parameters.interval | integer | false | 1 | — | The keyframe extraction interval, in seconds |
| parameters.format | string | false | jpg | one of: jpg, png | The image format for captured frames |
| output.type | string | false | oss | one of: oss, base64 | The output type |
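
A sketch of a request body that applies the defaults from this table. The helper is hypothetical; treating `interval` as a positive integer is an assumption, since the table lists no constraint for it. The table lists no `output.oss` field for this capability, so the sketch omits it.

```python
def build_keyframe_payload(oss=None, content=None, interval=1,
                           image_format="jpg", output_type="oss"):
    """Build a keyframe-extraction request body using the table's defaults."""
    if bool(oss) == bool(content):
        raise ValueError("provide exactly one of input.oss or input.content")
    # Assumption: interval is a whole number of seconds, at least 1.
    if not isinstance(interval, int) or interval < 1:
        raise ValueError("parameters.interval must be a positive integer")
    if image_format not in ("jpg", "png"):
        raise ValueError("parameters.format must be 'jpg' or 'png'")
    if output_type not in ("oss", "base64"):
        raise ValueError("output.type must be 'oss' or 'base64'")
    return {
        "input": {"oss": oss} if oss else {"content": content},
        "parameters": {"interval": interval, "format": image_format},
        "output": {"type": output_type},
    }
```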

## Code Examples

### Extract Image Content from URL - Python - All Regions

```python
import time
from alibabacloud_tea_openapi.models import Config
from alibabacloud_searchplat20240529.client import Client
from alibabacloud_searchplat20240529.models import (
    CreateImageAnalyzeTaskRequestDocument,
    CreateImageAnalyzeTaskRequest,
    CreateImageAnalyzeTaskResponse,
    GetImageAnalyzeTaskStatusRequest,
    GetImageAnalyzeTaskStatusResponse,
)

# Configure the client
config = Config(
    bearer_token="<your-api-key>",
    endpoint="<your-endpoint>",  # Do not include http://
    protocol="http",
)
client = Client(config=config)

# Submit the image by URL
document = CreateImageAnalyzeTaskRequestDocument(
    url="https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/6802494071/p756843.png",
)
request = CreateImageAnalyzeTaskRequest(document=document)
response: CreateImageAnalyzeTaskResponse = client.create_image_analyze_task(
    "<your-workspace>", "<your-service-id>", request
)
task_id = response.body.result.task_id
print("task_id:", task_id)

# Poll until complete
poll_request = GetImageAnalyzeTaskStatusRequest(task_id=task_id)
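while True:
    # Use a separate name so the type annotation does not clash with `response` above.
    status_response: GetImageAnalyzeTaskStatusResponse = client.get_image_analyze_task_status(
        "<your-workspace>", "<your-service-id>", poll_request
    )
    status = status_response.body.result.status
    print("status:", status)

    if status == "PENDING":
        time.sleep(5)  # Recommended polling interval
    elif status == "SUCCESS":
        print("content:\n" + status_response.body.result.data.content)
        print("usage:", status_response.body.usage)
        break
    else:
        # Any other status (for example FAILED) ends the loop with the error details.
        print("error:", status_response.body.result.error)
        break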
```

### Object Detection from URL - Bash - All Regions

```bash
curl --location 'http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/image-object-detection/ops-object-detect-001/' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "image":
    {
      "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250408/syuvxh/%E7%89%A9%E4%BD%93%E5%AE%9A%E4%BD%8D.png"
    }
}'
```

### Video Segmentation from OSS - Bash - All Regions

```bash
curl --location 'http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/video-segmentation/ops-video-segment-001/async' \
--header 'Authorization: Bearer your-api-key' \
--header 'Content-Type: application/json' \
--data '{
  "input": {
    "oss": "oss://my-bucket-name/example/test.mp4"
  },
  "output": {
    "type": "oss",
    "oss": "oss://my-bucket-name/dump/result/path"
  }
}'
```

### Video Summarization with Enhanced Transcript - Bash - All Regions

```bash
curl --location 'http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/video-summarization/ops-video-summarize-001/async' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
  "input": {
    "oss": "oss://my-bucket-name/example/test.mp4"
  },
  "parameters": {
    "rewrite_transcript": true,
    "generate_tags": true,
    "extract_snapshot": true
  },
  "output": {
    "type": "oss",
    "oss": "oss://my-bucket-name/dump/result/path"
  }
}'
```

### Check Async Task Status - Bash - All Regions

```bash
curl -X GET \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Your API key>" \
  "http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/image-analyze/ops-image-analyze-vlm-001/async/task-status?task_id=d9781786-20b8-4fb4-bbb5-38f82e69****"
```

### Extract Image Content Synchronously - Bash - All Regions

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Your API key>" \
  "http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/image-analyze/ops-image-analyze-vlm-001/sync" \
  -d '{
    "document": {
      "url": "https://img01.yzcdn.cn/****/2017/05/11/FoTMgBa0SvUaAeFruY7i7O_EUMhf.jpg%21middle.jpg",
      "file_type": "jpg"
    }
  }'
```

## Response Format

```json
{
    "request_id": "CD4E26F0-23FF-449C-83DC-20CC8FF1****",
    "latency": 8.0,
    "http_code": 200,
    "result": {
        "task_id": "cd4e26f0-23ff-449c-83dc-20cc8ff1****"
    }
}
```

**Key Fields** (the example above is a task-submission response; `result.status` and `result.data` appear in task-status responses):
- `result.task_id`: Unique identifier for tracking async tasks
- `request_id`: Unique request identifier for debugging
- `latency`: Processing time in milliseconds
- `http_code`: HTTP status code of the response
- `result.status`: Current status of the async task (PENDING, SUCCESS, FAILED)
- `result.data`: Actual results when status is SUCCESS
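
As a small illustration, the key fields can be pulled out of a decoded response body like this. `summarize_response` is a hypothetical helper; fields that are absent from a given response (for example `result.status` in a task-submission response) come back as `None`.

```python
def summarize_response(body):
    """Extract the key fields from a decoded response body (a dict)."""
    result = body.get("result") or {}
    return {
        "request_id": body.get("request_id"),
        "http_code": body.get("http_code"),
        "latency": body.get("latency"),
        "task_id": result.get("task_id"),
        "status": result.get("status"),
        "data": result.get("data"),
    }
```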

## Error Handling

| Error Code | Description | Recommended Action |
|------------|-------------|-------------------|
| InvalidParameter | Invalid request. Example: 'document.content or document.url required, and both cannot be present at the same time'. | Ensure only one of url or content is provided |
| BadRequest.TaskNotExist | Task does not exist. This occurs when querying a task status using a non-existent task_id. | Verify the task_id is correct and was returned from a successful create task request |
| InternalServerError | Internal server error. | Retry the request after a short delay. If persistent, contact support |
| 400 | Bad request — invalid input format, missing required fields, or malformed URL. | Validate input parameters and ensure correct syntax |
| 401 | Unauthorized — invalid or missing API key. | Check that your API key is valid and properly included in the Authorization header |
| 429 | Too many requests — rate limit exceeded. | Wait and retry after the specified interval, or implement exponential backoff |
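
One way to act on this table in client code is to separate errors worth retrying from those that need a corrected request. The sketch below encodes only the recommendations listed above; the names `RETRYABLE_CODES` and `should_retry` are hypothetical.

```python
# Codes whose recommended action in the table is to retry; all others call
# for fixing the request or the credentials first.
RETRYABLE_CODES = {"InternalServerError", "429"}


def should_retry(error_code):
    """Return True when the table recommends retrying this error code."""
    return str(error_code) in RETRYABLE_CODES
```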

### Rate Limits & Retry
- Extract Image Content: 10 QPS (shared across the Alibaba Cloud account and its RAM users)
- Detect Objects in Image: 20 QPS
- Transcribe Audio/Video: 5 QPS (shared across the Alibaba Cloud account and its RAM users)
- Segment Video: 5 QPS (shared across the Alibaba Cloud account and its RAM users)
- Extract Video Keyframes: 5 QPS (shared across the Alibaba Cloud account and its RAM users)
- Create Video Summarization Task: 5 QPS

Recommended retry strategy: Implement exponential backoff with jitter. For 429 errors, respect any `Retry-After` header if present, otherwise start with a 1-second delay and double on each retry up to a maximum of 30 seconds.
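
The strategy above might be sketched as follows. This version uses "equal jitter" (half of the delay is fixed, half is random), which is one of several common jitter schemes and is a choice made here, not something the service prescribes. It only handles a numeric `Retry-After` value in seconds, not the HTTP-date form.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    Honors a numeric Retry-After value when given; otherwise doubles from
    `base` up to `cap` and randomizes the upper half of the delay.
    """
    if retry_after is not None:
        return float(retry_after)
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling / 2 + (ceiling / 2) * rng()
```

With the defaults, the upper bound of the delay grows as 1, 2, 4, 8, 16 and then stays at 30 seconds.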

## Environment Requirements

- Python SDK: `pip install "alibabacloud_searchplat20240529>=1.0.0"` (quote the requirement so the shell does not treat `>` as a redirect)
- Environment variable setup: `export DASHSCOPE_API_KEY=your_api_key_here`
- Python version: 3.6 or higher (as required by the SDK)

## FAQ

Q: How do I choose between synchronous and asynchronous processing?
A: Use synchronous for small files that process quickly (<30 seconds). Use asynchronous for larger files, batch processing, or when you need to handle long-running operations without blocking your application.

Q: What file formats are supported for image processing?
A: Supported image formats include jpg, jpeg, png, bmp, and tiff. For video processing, supported formats include mp4, avi, mkv, mov, flv, and webm.

Q: Can I use both URL and Base64 content in the same request?
A: No, you must provide either a URL or Base64-encoded content, but not both. The API will return an InvalidParameter error if both are provided.

Q: How do I handle large video files that exceed size limits?
A: For files larger than 8 MB, use OSS (Object Storage Service) paths instead of Base64 encoding. Upload your file to OSS first, then reference it using the `oss://bucket-name/path/to/file` format.

Q: What happens if my async task fails during processing?
A: When polling the task status, you'll receive a response with `status: "FAILED"` and an error object containing details about what went wrong. Common causes include invalid file formats, inaccessible URLs, or corrupted data.
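
The choice between Base64 and OSS described in the FAQ can be sketched as a size check. `build_input` is a hypothetical helper; whether the 8 MB limit applies to the raw file or to the encoded payload is not stated here, so this sketch conservatively measures the Base64-encoded size.

```python
import base64

MAX_INLINE_BYTES = 8 * 1024 * 1024  # 8 MB, the threshold named in the FAQ


def build_input(data, oss_path=None, limit=MAX_INLINE_BYTES):
    """Return an "input" object: inline Base64 if small enough, else an OSS path."""
    # Base64 produces 4 output characters for every 3 input bytes (padded).
    encoded_size = 4 * ((len(data) + 2) // 3)
    if encoded_size <= limit:
        return {"content": base64.b64encode(data).decode("ascii")}
    if not oss_path:
        raise ValueError("payload exceeds the inline limit; upload to OSS and pass oss_path")
    return {"oss": oss_path}
```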

## Pricing & Billing

### Billing Model
- Extract Image Content: Per request (ops-image-analyze-ocr-001) or per output token (ops-image-analyze-vlm-001)
- Detect Objects in Image: Per request
- Identify Main Subjects in Image: Per request
- Transcribe Audio/Video: Per request with minimum 1-minute charge
- Segment Video: Per request
- Extract Video Keyframes: Per thousand frames
- Create Video Summarization Task: Per input and output token

### Price Reference

| Model/Service | Input Price | Output Price | Other Fees |
|---------------|-------------|--------------|------------|
| ops-image-analyze-vlm-001 | — | 0.002 /tokens | usage.token_count billed per output token |
| ops-image-analyze-ocr-001 | — | — | usage.pv_count fixed at 1 call per request |
| ops-object-detect-001 | 0.002 / | — | — |
| ops-object-detect-face-001 | 0.001 / | — | — |
| default (subject identification) | 0.001 / | 0.001 / | — |
| ops-audio-asr-001 | 0.002 /tokens | 0.002 /tokens | — |
| ops-video-segment-001 | 0.002 / | 0.002 / | — |
| ops-video-snapshot-001 | 0.002 / | 0.002 / | — |
| ops-video-summarize-001 | 0.0001 /tokens | 0.0002 /tokens | — |

### Free Tier
- Extract Image Content: 100 
- Identify Main Subjects in Image: 1000 
- Other services: 

### Usage Limits
- Request size limits: Maximum 8 MB per request
- Rate limits: 5-20 QPS depending on the service (see Rate Limits & Retry)
- Audio/video transcription: minimum charge of 1 minute per request

### Billing Notes
- Async tasks are billed only upon completion (not on submission)
- The ops-image-analyze-vlm-001 service charges based on output tokens
- The ops-image-analyze-ocr-001 service charges based on number of calls (fixed at 1 per request)
- Video summarization charges based on both input and output token counts
- Minimum billing unit varies by service (per request, per token, or per frame)