# opensearch-image

# OpenSearch Multimodal Content Processing

## Capabilities Overview

| Sub-capability | Models | Calling Pattern | Description |
|----------------|--------|-----------------|-------------|
| Extract Image Content | ops-image-analyze-vlm-001, ops-image-analyze-ocr-001 | Async Task | Extract text and analyze visual content from images. |
| Detect Image Objects | ops-object-detect-001, ops-object-detect-face-001 | Synchronous | Identify and locate objects within images using computer vision models. |
| Detect Objects in Image | ops-object-detect-001, ops-object-detect-face-001 | Synchronous | Identify and locate objects within images, including faces. |
| Segment Video | ops-video-segment-001 | Async Task | Break down videos into segments or scenes for analysis. |
| Video Segmentation | ops-video-segment-001 | Async Task | Split videos into segments based on scene changes or other criteria. |
| Extract Video Keyframes | ops-video-snapshot-001 | Async Task | Capture representative frames from video content. |
| Create Video Summarization Task | ops-video-summarize-001 | Async Task | Generate a textual summary of video content. |
| Create Video Summary | ops-video-summarize-001 | Async Task | Generate concise summaries of video content using AI models. |
| Create Video Snapshot Task | ops-video-snapshot-001 | Async Task | Extract frames from videos at specified intervals or timestamps. |
| Transcribe Audio/Video | ops-audio-asr-001 | Async Task | Convert spoken audio from audio or video files into text. |
| Create Audio ASR Task | ops-audio-asr-001 | Async Task | Transcribe audio recordings into text using automatic speech recognition. |

## Model Selection Guide

### Extract Image Content

| Model ID | Calling Pattern |
|----------|-----------------|
| ops-image-analyze-vlm-001 | Async Task |
| ops-image-analyze-ocr-001 | Async Task |

### Detect Image Objects

| Model ID | Calling Pattern |
|----------|-----------------|
| ops-object-detect-001 | Synchronous |
| ops-object-detect-face-001 | Synchronous |

### Detect Objects in Image

| Model ID | Calling Pattern |
|----------|-----------------|
| ops-object-detect-001 | Synchronous |
| ops-object-detect-face-001 | Synchronous |

### Create Video Summary

| Model ID | Calling Pattern |
|----------|-----------------|
| ops-video-summarize-001 | Async Task |

## API Calling Patterns

### Authentication
Use Bearer Token authentication as the primary method.

- Header format: `Authorization: Bearer <your_api_key>`
- Environment variable: `DASHSCOPE_API_KEY`
- Alternative methods exist but Bearer Token is recommended for all API calls
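
As a minimal sketch of the header format above, the helper below builds the two request headers from `DASHSCOPE_API_KEY`. The function name `build_headers` and its `api_key` override are illustrative assumptions, not part of any SDK.

```python
import os


def build_headers(api_key=None):
    """Return the JSON and Bearer Token headers used by the HTTP calls."""
    # Fall back to the DASHSCOPE_API_KEY environment variable when no key is passed.
    key = api_key or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("Set DASHSCOPE_API_KEY or pass api_key explicitly")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```

Reading the key from the environment keeps it out of source files; passing `api_key` explicitly is mainly useful in tests.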

### Service Endpoint
APIs use region-specific endpoints. The curl examples in this document use hosts of this form, where `***` is a masked prefix and `hangzhou` is the region:
`http://***-hangzhou.opensearch.aliyuncs.com`

Common regions include:
- cn-hangzhou
- cn-shanghai  
- cn-beijing

### Async Task Pattern
Used for most multimodal processing functions (image analysis, video processing, audio transcription):

1. **Submit task**: POST to `/async` endpoint with input parameters
2. **Receive task_id**: Response contains a unique task identifier
3. **Poll for status**: GET to `/async/task-status?task_id={task_id}` 
4. **Check status**: Continue polling while status is "PENDING"
5. **Get results**: When status becomes "SUCCESS", parse the result data; on any other status (such as "FAILED"), read the error details instead

Key headers:
- `Content-Type: application/json`
- `Authorization: Bearer <your_api_key>`

Polling interval: Recommended 5-second delays between status checks
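
The polling steps above can be sketched as a small transport-agnostic loop. `wait_for_task` and its `fetch_status` callable are hypothetical names: the callable is assumed to return the parsed JSON body of a task-status response. The 600-second timeout is an assumed default, since the document recommends only the 5-second interval.

```python
import time


def wait_for_task(fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status() until the task leaves PENDING or the timeout expires.

    fetch_status is any zero-argument callable that returns the parsed JSON
    body of a task-status response (a dict containing result.status).
    """
    waited = 0.0
    while True:
        result = fetch_status()["result"]
        if result["status"] != "PENDING":
            # SUCCESS or a failure status: hand the result back to the caller.
            return result
        if waited >= timeout:
            raise TimeoutError(f"task still PENDING after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter lets tests run the loop without real delays; in production code the default `time.sleep` applies.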

### Synchronous Pattern
Used for object detection functions:

1. **Single request**: POST to base endpoint with image data
2. **Immediate response**: Receive results directly in the response body
3. **No polling needed**: Results are returned synchronously

Key headers:
- `Content-Type: application/json`  
- `Authorization: Bearer <your_api_key>`

## Parameter Reference

### Extract Image Content

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| service_id | string | true |  | one of: ops-image-analyze-vlm-001, ops-image-analyze-ocr-001 | The built-in service ID |
| document.url | string | false |  |  | The URL of the image file. Supports HTTP and HTTPS |
| document.content | string | false |  |  | The Base64-encoded content of the image file |
| document.file_name | string | false |  |  | The file name. Required if document.url is blank |
| document.file_type | string | false |  | one of: jpg, jpeg, png, bmp, tiff | The file type |

### Detect Objects in Image

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| service_id | string | true |  | must start with ops- | The service ID |
| image.url | string | false |  | must be publicly accessible | A publicly accessible URL of the image |
| image.content | string | false |  | base64-encoded string | The image file content, encoded as a base64 string |

### Video Segmentation

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| input.oss | string | false |  |  | The OSS path of the input file |
| input.url | string | false |  |  | The URL of the input file |
| output.type | string | false | oss | one of: oss | The output type |
| output.oss | string | true |  |  | The OSS path for the output file |

### Video Snapshot

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| input.content | string | false |  | Supported formats: mp4, avi, mkv, mov, flv, webm | The Base64-encoded video content |
| input.oss | string | false |  |  | The OSS path of the input video |
| parameters.interval | int | false | 1 |  | The keyframe extraction interval, in seconds |
| parameters.format | string | false | jpg | one of: jpg, png | The image format for captured frames |
| output.type | string | false | oss | one of: oss, base64 | The output type |
| output.oss | string | false |  | Required when type is oss | The OSS path for output files |
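
The defaults and constraints in this table can be checked client-side before a task is submitted. The sketch below uses a hypothetical helper, `build_snapshot_body`; requiring at least one input source is an assumption, since the table marks both inputs as optional.

```python
def build_snapshot_body(input_oss=None, input_content=None, interval=1,
                        image_format="jpg", output_type="oss", output_oss=None):
    """Build a Video Snapshot request body, applying the documented defaults."""
    # Assumption: a request needs at least one of input.oss / input.content.
    if input_oss is None and input_content is None:
        raise ValueError("provide input_oss or input_content")
    if image_format not in ("jpg", "png"):
        raise ValueError("parameters.format must be jpg or png")
    if output_type not in ("oss", "base64"):
        raise ValueError("output.type must be oss or base64")
    if output_type == "oss" and not output_oss:
        raise ValueError("output.oss is required when output.type is oss")

    body = {
        "input": {},
        "parameters": {"interval": interval, "format": image_format},
        "output": {"type": output_type},
    }
    if input_oss is not None:
        body["input"]["oss"] = input_oss
    if input_content is not None:
        body["input"]["content"] = input_content
    if output_oss:
        body["output"]["oss"] = output_oss
    return body
```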

### Video Summarization

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| input.oss | string | false |  |  | The OSS path of the input file |
| input.url | string | false |  |  | The HTTP URL of the input file |
| parameters.rewrite_transcript | boolean | false | false | true/false | Specifies whether to rewrite the ASR result |
| parameters.generate_tags | boolean | false | false | true/false | Specifies whether to extract video tags |
| parameters.extract_snapshot | boolean | false | false | true/false | Specifies whether to extract snapshots |
| output.type | string | false | oss |  | The output type |
| output.oss | string | true |  |  | The OSS path for the output file |

### Audio ASR

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| input.content | string | false |  | mutually exclusive with oss | Base64-encoded audio or video data |
| input.oss | string | false |  | mutually exclusive with content | The OSS path of the input file |
| output.type | string | false | oss | one of: text, oss | The output format |
| output.oss | string | false |  |  | The OSS path for output files |

## Code Examples

### Image Content Extraction - Python - All Regions

```python
import time
from alibabacloud_tea_openapi.models import Config
from alibabacloud_searchplat20240529.client import Client
from alibabacloud_searchplat20240529.models import (
    CreateImageAnalyzeTaskRequestDocument,
    CreateImageAnalyzeTaskRequest,
    CreateImageAnalyzeTaskResponse,
    GetImageAnalyzeTaskStatusRequest,
    GetImageAnalyzeTaskStatusResponse,
)

# Configure the client
config = Config(
    bearer_token="<your-api-key>",
    endpoint="<your-endpoint>",  # Do not include http://
    protocol="http",
)
client = Client(config=config)

# Submit the image by URL
document = CreateImageAnalyzeTaskRequestDocument(
    url="https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/6802494071/p756843.png",
)
request = CreateImageAnalyzeTaskRequest(document=document)
response: CreateImageAnalyzeTaskResponse = client.create_image_analyze_task(
    "<your-workspace>", "<your-service-id>", request
)
task_id = response.body.result.task_id
print("task_id:", task_id)

# Poll until complete
poll_request = GetImageAnalyzeTaskStatusRequest(task_id=task_id)
while True:
    response: GetImageAnalyzeTaskStatusResponse = client.get_image_analyze_task_status(
        "<your-workspace>", "<your-service-id>", poll_request
    )
    status = response.body.result.status
    print("status:", status)

    if status == "PENDING":
        time.sleep(5)
    elif status == "SUCCESS":
        print("content:\n" + response.body.result.data.content)
        print("usage:", response.body.usage)
        break
    else:
        print("error:", response.body.result.error)
        break
```

### Object Detection - Python - All Regions

```python
from alibabacloud_tea_openapi.models import Config
from alibabacloud_searchplat20240529.client import Client
from alibabacloud_searchplat20240529.models import GetImageObjectDetectionRequest, GetImageObjectDetectionResponse

if __name__ == "__main__":
    # Configure the token and endpoint.
    config = Config(
        bearer_token="Replace with your API key",
        # The unified request endpoint. Do not include http://.
        endpoint="Replace with the API endpoint",
        # The protocol can be set to HTTPS or HTTP.
        protocol="http",
    )
    client = Client(config=config)

    # Reference the image by a publicly accessible URL.
    request = GetImageObjectDetectionRequest()
    request.from_map({"image": {"url": "https://img.alicdn.com/imgextra/i1/O1CN01WksnF41hlhBFsXDNB_!!6000000004318-0-tps-1000-1400.jpg"}})

    # Synchronous call: the detection results arrive in the response body.
    response: GetImageObjectDetectionResponse = client.get_image_object_detection(
        "default", "ops-object-detect-001", request
    )
    print(response.body.result)
    print(response.body.usage)
```

### Video Snapshot - Bash - All Regions

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  "http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/video-snapshot/ops-video-snapshot-001/async" \
  --data '{
    "input": {
      "oss": "oss://<BUCKET_NAME>/test.mp4"
    },
    "parameters": {},
    "output": {
      "type": "oss",
      "oss": "oss://<BUCKET_NAME>/result/path"
    }
  }'
```

### Audio ASR - Bash - All Regions

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Your API-KEY>" \
  "http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/audio-asr/ops-audio-asr-001/async" \
  --data '{
    "input": {
      "oss": "oss://<BUCKET_NAME>/xxx/xxx.mp3",
      "file_name": "xxx"
    },
    "output": {
      "type": "oss",
      "oss": "oss://<BUCKET_NAME>/result"
    }
  }'
```

### Video Summarization - Curl - All Regions

```bash
curl --location 'http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/video-summarization/ops-video-summarize-001/async' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
  "input": {
    "oss": "oss://my-bucket-name/example/test.mp4"
  },
  "parameters": {
    "rewrite_transcript": true,
    "generate_tags": true,
    "extract_snapshot": true
  },
  "output": {
    "type": "oss",
    "oss": "oss://my-bucket-name/dump/result/path"
  }
}'
```

### Video Segmentation - Curl - All Regions

```bash
curl --location 'http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/video-segmentation/ops-video-segment-001/async' \
--header 'Authorization: Bearer your-api-key' \
--header 'Content-Type: application/json' \
--data '{
  "input": {
    "oss": "oss://my-bucket-name/example/test.mp4"
  },
  "output": {
    "type": "oss",
    "oss": "oss://my-bucket-name/dump/result/path"
  }
}'
```
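
The four curl examples above share one URL shape: `/v3/openapi/workspaces/{workspace}/{capability}/{service_id}/async`. A minimal sketch of assembling it follows. Both helper names are illustrative, and appending `/task-status` to the submission URL is an assumption based on the Async Task Pattern section, not something the examples confirm.

```python
from urllib.parse import urlencode


def build_async_url(endpoint, capability, service_id, workspace="default"):
    """Assemble the task-submission URL in the shape used by the curl examples."""
    base = endpoint.rstrip("/")
    return f"{base}/v3/openapi/workspaces/{workspace}/{capability}/{service_id}/async"


def build_status_url(submit_url, task_id):
    """Assumed status URL: the submission URL plus /task-status?task_id=..."""
    return f"{submit_url}/task-status?{urlencode({'task_id': task_id})}"


if __name__ == "__main__":
    # example.invalid is a placeholder host, not a real endpoint.
    url = build_async_url("http://example.invalid", "video-snapshot", "ops-video-snapshot-001")
    print(url)
    print(build_status_url(url, "test-summary-001"))
```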

## Response Format

```json
{
  "request_id": "1",
  "latency": 0,
  "usage": {
    "audio_token": 200,
    "image_token": 100,
    "input_token": 300,
    "output_token": 300
  },
  "result": {
    "task_id": "test-summary-001",
    "status": "SUCCESS",
    "data": {
      "video_metadata": {
        "title": "Video Title",
        "summary": "This is a summary of a product introduction video.",
        "tags": [
          "Product Introduction",
          "Technology"
        ]
      },
      "chunks": [
        {
          "index": 0,
          "enhanced_transcript": "This is the enhanced transcript text.",
          "metadata": {
            "title": "Title of the First Segment",
            "summary": "Summary of the first segment.",
            "tags": [
              "Segment Tag 1",
              "Segment Tag 2"
            ]
          }
        }
      ]
    }
  }
}
```

**Key Fields**:
- `result.task_id` — Unique identifier for the asynchronous task
- `result.status` — Current status of the task (PENDING, SUCCESS, FAILED)
- `result.data.video_metadata.title` — Generated title for the video content
- `result.data.video_metadata.summary` — Concise summary of the entire video
- `result.data.video_metadata.tags` — Extracted tags describing video content
- `result.data.chunks[].index` — Index of the video segment
- `result.data.chunks[].enhanced_transcript` — Improved transcript with punctuation and noise removal
- `result.data.chunks[].metadata.title` — Title for the specific video segment
- `result.data.chunks[].metadata.summary` — Summary of the specific video segment
- `result.data.chunks[].metadata.tags` — Tags for the specific video segment
- `usage.input_token` — Number of input tokens consumed
- `usage.output_token` — Number of output tokens generated
- `usage.audio_token` — Audio processing tokens used
- `usage.image_token` — Image processing tokens used

## Error Handling

| Error Code | Description | Recommended Action |
|------------|-------------|-------------------|
| InvalidParameter | Invalid request parameters. | Check that required fields are provided and parameter formats are correct. |
| BadRequest.TaskNotExist | The specified task does not exist. | Verify the task_id is correct and was returned from a successful create task request. |
| InternalServerError | An internal server error occurred. | Retry the request after a short delay. If persistent, contact support. |
| 400 | Bad Request: The request body exceeds 8 MB or contains invalid parameters. | Ensure the request body is under 8 MB and that parameters follow the correct format. |
| 401 | Unauthorized: Invalid or missing API key in the Authorization header. | Verify your API key is correct and properly included in the Authorization header. |
| 429 | Too Many Requests: Exceeded the QPS limit. | Reduce request frequency or submit a ticket to increase your limit. |
| 500 | Internal Server Error: An unexpected error occurred during task processing. | Retry after a delay; contact support if the issue persists. |

### Rate Limits & Retry
- **Image Content Extraction**: 10 QPS (Alibaba Cloud account and RAM users combined)
- **Object Detection**: 20 QPS (Alibaba Cloud account and RAM users combined)
- **Video/Audio Processing**: 5 QPS (Alibaba Cloud account and RAM users combined)

Recommended retry strategy:
- Use exponential backoff with jitter
- Start with 1-second delay, double on each retry (1s, 2s, 4s, 8s)
- Maximum retry attempts: 5
- For 429 errors, respect any Retry-After header if present
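
One way to sketch this strategy, reading "5 attempts" as one initial call plus four retries, which matches the 1s, 2s, 4s, 8s schedule. The names `backoff_delays` and `call_with_retry`, and the 50% jitter factor, are assumptions. The sketch does not read a `Retry-After` header, so a real client would still need to prefer that value when the server sends one.

```python
import random
import time


def backoff_delays(max_attempts=5, base=1.0, jitter=0.5, rng=random.random):
    """Yield the waits between attempts: base * 2**n, stretched by up to `jitter`."""
    for retry in range(max_attempts - 1):
        yield base * (2 ** retry) * (1.0 + jitter * rng())


def call_with_retry(call, is_retryable, sleep=time.sleep, **backoff_kwargs):
    """Run call(); on a retryable exception, wait and try again."""
    for delay in backoff_delays(**backoff_kwargs):
        try:
            return call()
        except Exception as exc:
            if not is_retryable(exc):
                raise
            sleep(delay)
    # Final attempt: any error now propagates to the caller.
    return call()
```

The jitter spreads out retries from clients that were throttled at the same moment, so they do not all return in lockstep.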

## Requirements

- **Python SDK**: `alibabacloud_searchplat20240529>=1.0.0`
- **Java SDK**: `aliyun-searchplat20240529 >= 1.0.0`
- **Environment variable**: `export DASHSCOPE_API_KEY=your_api_key`
- **Python version**: Python>=3.8 (for some SDK features)

## FAQ

Q: How do I handle large video files that exceed the 8MB request limit?
A: Use OSS paths instead of Base64 encoding. Store your video files in Alibaba Cloud OSS and reference them using the `input.oss` parameter with format `oss://bucket-name/path/to/file.mp4`.

Q: What's the difference between synchronous and asynchronous API calls?
A: Synchronous calls (used for object detection) return results immediately in the response. Asynchronous calls (used for most other functions) return a task_id that you must poll to get results, which is better for long-running operations like video processing.

Q: How do I choose between the different image analysis services?
A: Use `ops-image-analyze-ocr-001` for pure text extraction from images (OCR). Use `ops-image-analyze-vlm-001` for multimodal understanding that combines visual analysis with text extraction.

Q: Can I process local files without uploading to OSS?
A: Yes, for files under 8MB you can Base64 encode them and send via the `content` parameter. For larger files, you must upload to OSS first and use the OSS path.
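
Base64 inflates data by roughly one third, so a file somewhat under 8 MB on disk can still exceed the limit once encoded. A rough pre-check, assuming the limit means 8 × 1024 × 1024 bytes and reserving an arbitrary 1 KB for the rest of the JSON body (both are assumptions):

```python
MAX_BODY_BYTES = 8 * 1024 * 1024  # assumed interpretation of the 8 MB limit


def base64_size(raw_len):
    """Length in bytes of the padded Base64 encoding of raw_len bytes."""
    return 4 * ((raw_len + 2) // 3)


def fits_inline(raw_len, overhead=1024):
    """True if a file of raw_len bytes can be sent via `content` instead of OSS."""
    return base64_size(raw_len) + overhead <= MAX_BODY_BYTES
```

Under these assumptions the practical ceiling for inline files is just under 6 MB of raw data; anything larger should go through OSS.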

Q: How long do I need to wait for video processing tasks to complete?
A: Processing time varies by video length and complexity. Simple videos may complete in seconds, while longer videos could take minutes. Implement proper polling with 5-second intervals and handle timeout scenarios in your application.

## Pricing & Billing

### Billing Model
- **Image Analysis**: Per request (ops-image-analyze-ocr-001) or per output token (ops-image-analyze-vlm-001)
- **Object Detection**: Per request
- **Video Processing**: Per request or per token depending on service
- **Audio ASR**: Per request or per token

### Price Reference

| Model/Service | Input Price | Output Price | Other Fees |
|---------------|-------------|--------------|------------|
| ops-image-analyze-vlm-001 | 0.002 /tokens | 0.002 /tokens | usage.token_count billed per output token |
| ops-image-analyze-ocr-001 | | 0.001 / | |
| ops-object-detect-001 | 0.002 / | | |
| ops-object-detect-face-001 | 0.001 / | | |
| ops-video-summarize-001 | 0.002 /tokens | 0.004 /tokens | 0.001 / |
| ops-video-snapshot-001 | 0.002 / | 0.002 / | |
| ops-audio-asr-001 | 0.002 /tokens | 0.002 /tokens | |

### Free Tier
- **Image Content Extraction**: 100 free calls per month
- **Video Summarization**: 10,000 free tokens per month
- **Audio ASR**: 100 free minutes per month

### Usage Limits
- **Request size**: Maximum 8MB per request body
- **Image Analysis**: 10 QPS
- **Object Detection**: 20 QPS  
- **Video/Audio Processing**: 5 QPS

### Billing Notes
- Async tasks are generally billed upon successful completion; failed tasks may still incur charges depending on the service
- Minimum 1-minute charge applies for audio processing
- OSS storage fees are billed separately from API processing fees