# opensearch-index

Part of **OPENSEARCH**

# OpenSearch Index and Data Management

## Capabilities Overview

| Sub-capability | Calling Mode | Description |
|--------|----------|------|
| Create Index | Synchronous | Creates an index in an OpenSearch instance, specifying the index name, data source, schema, and configuration details such as partition count and data source information. |
| Delete Index | Synchronous | Deletes an index in an OpenSearch instance. This API allows users to remove an index and optionally delete its associated data source. |
| Modify Index | Synchronous | Modifies an index in an OpenSearch instance. This API allows updating configuration, data source settings, and other properties of an existing index. |
| List Indexes | Synchronous | Retrieves a list of indexes for a specified OpenSearch instance, including details about each index such as version, data source, status, and configuration. |
| Get Index Details | Synchronous | Retrieves the details of an index table version in OpenSearch, including metadata, configuration, and data source information. |
| Manage Index Versions | Synchronous | Handle operations related to index versions including creation, deletion, modification, and publishing. |
| Manage Index Files | Synchronous | Retrieve and modify files associated with index versions. |
| Control Index State | Synchronous | Start, stop, or recover search indexes. |
| Modify Index Partition | Synchronous | Adjust partition settings for a search index. |
| Rebuild Index | Synchronous | Trigger a complete rebuild of a search index. |
| Retrieve Index | Synchronous | Access index data and metadata for retrieval operations. |
| Configure Index Table Schema | Synchronous | Define the schema for an index table via API. |
| Configure Merge Policies | Synchronous | Set merge policies for offline cluster index operations. |
| Configure Index Table | Synchronous | Set up an index table for the online search system. |
| Configure Index Loading Policy | Synchronous | Define how indexes are loaded into memory. |
| Configure Inverted Index | Synchronous | Set up different types of inverted indexes. |
| Manage Data Collection | Synchronous | List or describe data collections. |
| Push Documents | Synchronous | Insert or update documents in an OpenSearch index. |
| Bulk Update Data | Synchronous | Process and update multiple documents at once. |
| Push Document | Synchronous | Add a single document to an index with proper formatting. |
| Stage and Submit Document Changes | Synchronous | Prepare and submit document modifications using the `DocumentClient` class. |
| Push Documents to OpenSearch | Synchronous | Send document data to OpenSearch for indexing. |
| Push Documents Bulk | Synchronous | Perform bulk document ingestion operations. |
| Delete Document | Synchronous | Remove documents from the search index. |
| Push Unstructured Documents | Synchronous | Ingest unstructured document content into the search system. |
| Create Data Collection Task | Synchronous | Set up tasks for collecting data into OpenSearch. |
| Manage Data Collection Tasks | Synchronous | Create, describe, list, and remove data collection tasks. |
| List Schemas | Synchronous | Retrieve available schemas for data synchronization sources. |
| Manage Data Source Tables | Synchronous | List data source tables, fields, and validate data sources. |
| Get Database Schema | Synchronous | Retrieve database schema information. |
| Update SQL Instance | Synchronous | Update SQL instance content or parameters. |

## API Calling Patterns

### Authentication
The primary authentication method is Bearer Token authentication.

- Header format: `Authorization: Bearer <your_api_key>`
- Environment variable: `DASHSCOPE_API_KEY`

### Service Endpoint
OpenSearch APIs use region-specific endpoints:

- China regions: `https://api.aliyun.com/...`
- International regions: `https://api.alibabacloud.com/...`

Common regions include:
- cn-hangzhou
- cn-shanghai
- cn-beijing

### Synchronous Pattern
All OpenSearch Index and Data Management APIs follow a synchronous calling pattern:

1. Construct a request with required parameters in the URL path, query string, or request body
2. Include the `Authorization: Bearer $DASHSCOPE_API_KEY` header
3. Send the HTTP request (GET, POST, PUT, or DELETE depending on the operation)
4. Receive an immediate JSON response with results or error details
5. Parse the response to extract the `requestId` and operation result

For document operations, the request body typically contains JSON-formatted document data with `cmd` and `fields` properties. For index management operations, parameters may be passed in the URL path or as a JSON body.
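The five steps above can be sketched with the Python standard library. This is a minimal illustration, not an official client: `build_request`, `parse_response`, and `call` are hypothetical helper names, `urllib.request` stands in for `requests` so the sketch has no third-party dependency, and you supply the endpoint URL yourself.

```python
import json
import os
import urllib.request


def build_request(url, method="GET", body=None, api_key=None):
    """Steps 1-2: build a request with an optional JSON body and a Bearer token (not sent yet)."""
    # Fall back to the DASHSCOPE_API_KEY environment variable when no key is passed.
    key = api_key if api_key is not None else os.environ.get("DASHSCOPE_API_KEY", "")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Bearer {key}")
    request.add_header("Content-Type", "application/json")
    return request


def parse_response(raw):
    """Step 5: extract requestId and result from a JSON response body."""
    payload = json.loads(raw)
    return payload.get("requestId"), payload.get("result")


def call(url, method="GET", body=None, timeout=10):
    """Steps 3-4: send the request and read the immediate JSON response."""
    with urllib.request.urlopen(build_request(url, method, body), timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))
```

For example, `call(url, "POST", [{"cmd": "ADD", "fields": {"id": 1}}])` returns the `requestId` and `result` fields; error statuses surface as `urllib.error.HTTPError`.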

## Parameter Reference

### Create Index

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| instanceId | string | true | - | - | The instance ID. |
| name | string | false | - | - | The name of the index. |
| dataSource | string | false | - | - | The name of the data source. |
| domain | string | false | - | - | The data center of the data source. |
| content | string | false | - | - | The index schema. |
| partition | integer | false | - | - | The number of data shards. |
| type | string | false | - | one of: odps, swift, saro, oss | The type of the data source. |
| dryRun | boolean | false | - | true or false | Specifies whether to perform a dry run. A dry run only checks whether the data source is valid. |
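A client-side pre-check of these parameters can catch obvious mistakes before a request is sent. The sketch below assumes all parameters travel in one JSON object, as in the curl example that follows; the table does not say which ones belong in the path or query string (for example, `dryRun`), so treat that placement, the hypothetical helper name `build_create_index_body`, and the "partition must be a positive integer" rule as assumptions.

```python
VALID_SOURCE_TYPES = {"odps", "swift", "saro", "oss"}


def build_create_index_body(instance_id, name=None, data_source=None, domain=None,
                            content=None, partition=None, source_type=None, dry_run=None):
    """Validate Create Index parameters against the table and drop the unset ones."""
    if not instance_id:
        raise ValueError("instanceId is required")
    if source_type is not None and source_type not in VALID_SOURCE_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_SOURCE_TYPES)}")
    # Assumption: a shard count below 1 is never meaningful.
    if partition is not None and (not isinstance(partition, int) or partition < 1):
        raise ValueError("partition must be a positive integer")
    if dry_run is not None and not isinstance(dry_run, bool):
        raise ValueError("dryRun must be true or false")
    candidate = {
        "instanceId": instance_id, "name": name, "dataSource": data_source,
        "domain": domain, "content": content, "partition": partition,
        "type": source_type, "dryRun": dry_run,
    }
    # Omit optional parameters that were not supplied.
    return {key: value for key, value in candidate.items() if value is not None}
```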

### Push Documents

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| cmd | string | true | - | one of: ADD, UPDATE, DELETE | The operation to perform on the document. |
| fields | object | true | - | Must include the primary key field | The fields of the document to be pushed. |
| timestamp | integer | false | - | Unix timestamp in milliseconds | Controls the order in which updates are applied when multiple updates exist for the same primary key. |
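A sketch of building push entries that respect these constraints. `make_push_doc` is a hypothetical helper, and `id` as the primary key field name is an assumption borrowed from the Go bulk example; substitute your own table's primary key.

```python
import json

VALID_CMDS = {"ADD", "UPDATE", "DELETE"}


def make_push_doc(cmd, fields, primary_key="id", timestamp_ms=None):
    """Build one push entry from the cmd / fields / timestamp parameters."""
    if cmd not in VALID_CMDS:
        raise ValueError(f"cmd must be one of {sorted(VALID_CMDS)}, got {cmd!r}")
    if primary_key not in fields:
        raise ValueError(f"fields must include the primary key field {primary_key!r}")
    doc = {"cmd": cmd, "fields": dict(fields)}
    if timestamp_ms is not None:
        # Unix time in milliseconds; orders updates that share a primary key.
        doc["timestamp"] = int(timestamp_ms)
    return doc


batch = [
    make_push_doc("ADD", {"id": 1, "describe": "123456"}),
    make_push_doc("ADD", {"id": 2, "describe": "123456"}, timestamp_ms=1401342874778),
]
body = json.dumps(batch)  # JSON array sent as the request body
```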

### Manage Data Collection Tasks

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| appGroupIdentity | string | true | - | - | The name of the application. |
| dataCollectionIdentity | string | true | - | - | The ID of the data collection task. |
| name | string | true | - | - | The name of the data collection task. |
| type | string | true | - | one of: server | The type of the data source. Valid value: server. |

### Configure Inverted Index

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| index_name | string | true | - | Cannot be `summary` | The name of the inverted index, referenced in query statements. |
| index_type | string | true | - | one of: PACK, TEXT, NUMBER, STRING, PRIMARYKEY64, PRIMARYKEY128, DATE, RANGE, SPATIAL | The type of inverted index to create. |
| index_fields | array | true | - | Up to 32 TEXT fields for PACK; exactly one field for all other types | The fields to include in the index. All fields must be of the same type. |
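These rules can be checked locally before a schema is submitted. The validator below is a hypothetical helper that encodes only the constraints listed in this table; the `field_types` mapping (field name to field type) is an assumed input you would take from your own table schema, and the service may enforce further rules.

```python
VALID_INDEX_TYPES = {
    "PACK", "TEXT", "NUMBER", "STRING", "PRIMARYKEY64",
    "PRIMARYKEY128", "DATE", "RANGE", "SPATIAL",
}


def validate_inverted_index(index_name, index_type, index_fields, field_types=None):
    """Return a list of rule violations; an empty list means these checks pass."""
    errors = []
    if index_name == "summary":
        errors.append("index_name cannot be 'summary'")
    if index_type not in VALID_INDEX_TYPES:
        errors.append(f"unknown index_type {index_type!r}")
    if index_type == "PACK":
        if not 1 <= len(index_fields) <= 32:
            errors.append("PACK indexes take 1 to 32 fields")
    elif len(index_fields) != 1:
        errors.append(f"{index_type} indexes take exactly one field")
    if field_types is not None:
        # Fields missing from field_types show up as None and count as a type mismatch.
        types = {field_types.get(name) for name in index_fields}
        if len(types) > 1:
            errors.append("all index_fields must have the same type")
        if index_type == "PACK" and types != {"TEXT"}:
            errors.append("PACK index_fields must all be TEXT fields")
    return errors
```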

## Code Examples

### Create Index - curl - all

```bash
curl -X POST 'https://api.aliyun.com/api/searchengine/2021-10-25/CreateIndex' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "instanceId": "ha-cn-pl32rf0****",
  "name": "test_api",
  "dataSource": "odps_data_source",
  "type": "odps",
  "partition": 2,
  "autoBuildIndex": true
}'
```

### List Indexes - python - all

```python
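import os

import requests

url = "https://api.aliyun.com/openapi/ha3/instances/ose-test1/indexes"
headers = {
    # Python does not expand "$DASHSCOPE_API_KEY" inside a string; read it from the environment.
    "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
    "Content-Type": "application/json"
}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # Raise an exception for 4xx/5xx responses.
print(response.json())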
```

### Push Documents Bulk - go - all

```go
package main

import (
    "fmt"
    "os"

    util "github.com/alibabacloud-go/tea-utils/service"
    "github.com/alibabacloud-go/tea/tea"
    opensearch "main/client"
)

func main() {
    // Create a client instance for sending requests.
    // Endpoint: the endpoint of the OpenSearch API in your region.
    // AccessKeyId and AccessKeySecret: the AccessKey pair used for authentication.
    config := &opensearch.Config{
        Endpoint: tea.String("<Endpoint>"),
        // Read the AccessKey ID and AccessKey secret from environment variables.
        // Set ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET before you run this code.
        AccessKeyId:     tea.String(os.Getenv("ALIBABA_CLOUD_ACCESS_KEY_ID")),
        AccessKeySecret: tea.String(os.Getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET")),
    }

    // Create a client for sending requests.
    client, _clientErr := opensearch.NewClient(config)

    // If an exception occurs when the system creates the client, _clientErr is not nil. In this case, display the error information.
    if _clientErr != nil {
        fmt.Println(_clientErr)
        return
    }

    // Create documents to be pushed.
    // Make sure that a document contains a primary key field. This field will be used as the primary key field of the table to which the document is pushed. If you do not specify a primary key field, you cannot update the document.
    // To update a document, specify the fields to be updated and their values. Use UTF-8 to encode the document.
    // OpenSearch allows you to create, update, or delete a document by using the ADD, UPDATE, or DELETE command.
    
    document1st := map[string]interface{}{
        "cmd": "ADD",
        "fields": map[string]interface{}{
            "id":       1, // The primary key field of the document.
            "describe": "123456",  // A regular field in the document.
        },
    }

    // You can specify the timestamp field to customize the order of document updates. OpenSearch uses the values of this field to update documents with the same primary key field in the specified order.
    // If this parameter is not specified, OpenSearch updates documents based on the order in which OpenSearch receives the documents. 
    document2nd := map[string]interface{}{
        "cmd": "ADD",
        "timestamp": 1401342874778,
        "fields": map[string]interface{}{
            "id":       2,
            "describe": "123456",
        },
    }

    // Add the documents to an array. Each document contains the operation to be performed on the document.
    requestBody := []interface{}{document1st}

    // You can call the append method to add more documents.
    requestBody = append(requestBody, document2nd)

    // Specify the parameters that are used to configure the request and connection pool.
    runTime := &util.RuntimeOptions{
        ConnectTimeout: tea.Int(5000),
        ReadTimeout:    tea.Int(10000),
        Autoretry:      tea.Bool(false),
        IgnoreSSL:      tea.Bool(false),
        MaxIdleConns:   tea.Int(50),
    }

    // To push documents, you must specify the appName and tableName parameters.
    // appName: the name or version information of the application to which you want to push documents.
    // tableName: the name of the table to which you want to push documents. You can view the tables of an application in the OpenSearch console.
    appName := "<appName>"
    tableName := "<tableName>"

    // Call the method for sending a request.
    response, _requestErr := client.Request(
        tea.String("POST"),
        tea.String("/v3/openapi/apps/"+appName+"/"+tableName+"/actions/bulk"),
        nil,
        nil,
        requestBody,
        runTime)

    // If an exception occurs when the system sends the request, _requestErr is not nil. In this case, display the error information.
    if _requestErr != nil {
        fmt.Println(_requestErr)
        return
    }

    // Display the response if no exception occurs.
    fmt.Println(response)
}
```

### Delete Document - java - all

```java
import java.util.HashMap;
import java.util.Map;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;

/**
 * Demo code for deleting documents from an OpenSearch application.
 */
public class TestDeleteDemo {

    private static String appName = "The name of the OpenSearch application for which you want to implement the document deletion feature";
    // Read the AccessKey pair from environment variables instead of hard-coding it.
    private static String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
    private static String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
    private static String host = "The endpoint of the OpenSearch API in your region";
    private static String path = "/apps/%s/actions/knowledge-bulk";

    public static void main(String[] args) {

        String appPath = String.format(path, appName);

        // Create an OpenSearch object.
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        // Use the OpenSearch object as a parameter to create an OpenSearchClient object.
        OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);

        // Create a test document.
        JSONObject oneRequest = new JSONObject();
        oneRequest.put("cmd", "DELETE");
        JSONObject fields = new JSONObject();
        fields.put("id", "The ID of the test document to be deleted");
        oneRequest.put("fields", fields);

        // You can add multiple data entries at a time.
        JSONArray request = new JSONArray();
        request.add(oneRequest);

        Map<String, String> params = new HashMap<>();
        params.put("format", "full_json");
        params.put("_POST_BODY", request.toJSONString());
        try {
            OpenSearchResult openSearchResult = openSearchClient.callAndDecodeResult(appPath, params, "POST");
            // Display the returned result.
            System.out.println(openSearchResult.getResult());
        } catch (OpenSearchException e) {
            e.printStackTrace();
        } catch (OpenSearchClientException e) {
            e.printStackTrace();
        }
    }
}
```

### Manage Data Collection Tasks - python - all

```python
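import os

import requests

url = "https://opensearch.cn-hangzhou.aliyuncs.com/v4/openapi/app-groups/os_function_test_v1/data-collections"
headers = {
    # Python does not expand "$DASHSCOPE_API_KEY" inside a string; read it from the environment.
    "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
    "Content-Type": "application/json"
}
data = {
    "type": "server",
    "name": "os_function_test_v1"
}

response = requests.post(url, headers=headers, json=data, timeout=10)
response.raise_for_status()  # Raise an exception for 4xx/5xx responses.
print(response.json())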
```

### Push Unstructured Documents - python - all

```python
# -*- coding: utf-8 -*-

import base64
import os

from alibabacloud_tea_util import models as util_models
# Config and Client come from the request helper module that accompanies the OpenSearch
# sample code; adjust this import path to wherever you saved that module.
from opensearch.V3_cases.doc_search.BaseRequest111 import Config, Client


class Knowledge:
    def __init__(self, config: Config):
        self.client = Client(config=config)
        self.runtime = util_models.RuntimeOptions(
            connect_timeout=10000,
            read_timeout=10000,
            autoretry=False,
            ignore_ssl=False,
            max_idle_conns=50,
            max_attempts=3
        )
        self.header = {}

    def doc_bulk(self, app_name: str, doc_content: list):
        """Push a list of documents to the knowledge-bulk endpoint of an application."""
        try:
            return self.client._request(method="POST",
                                        pathname=f'/v3/openapi/apps/{app_name}/actions/knowledge-bulk',
                                        query={}, headers=self.header,
                                        body=doc_content, runtime=self.runtime)
        except Exception as e:
            print(e)
            return None


if __name__ == "__main__":
    # Specify the endpoint of the OpenSearch API. The value does not contain the http:// prefix.
    endpoint = "<endpoint>"
    # Specify the request protocol. Valid values: HTTPS and HTTP.
    endpoint_protocol = "HTTPS"
    # Read the AccessKey pair from environment variables.
    # Set ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET before you run this code.
    access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    # Specify the authentication method. Valid values: access_key (default) and sts.
    # sts specifies authentication based on Resource Access Management (RAM) and Security Token Service (STS).
    auth_type = "access_key"
    # Required only for sts authentication. You can call the AssumeRole operation of Alibaba Cloud RAM to obtain an STS token.
    security_token = "<security_token>"
    # Specify common request parameters.
    # Note: The security_token and type parameters are required only if you use the SDK as a RAM user.
    configs = Config(endpoint=endpoint, access_key_id=access_key_id, access_key_secret=access_key_secret,
                     security_token=security_token, type=auth_type, protocol=endpoint_protocol)
    # Create a client for an OpenSearch LLM-Based Conversational Search Edition instance.
    ops = Knowledge(configs)
    app_name = "The name of the OpenSearch LLM-Based Conversational Search Edition instance"

    # ---------------Push unstructured documents to an OpenSearch LLM-Based Conversational Search Edition instance---------------
    # Replace the path with the path of your local file.
    with open('/path/to/test.docx', 'rb') as file:
        # b64encode returns bytes; decode to str so the document can be serialized as JSON.
        data_b64 = base64.b64encode(file.read()).decode("utf-8")

    documents = [
        {
            "fields": {
                "id": "1",
                "title": "test.docx",
                "url": "www.baidu.com",
                "content": data_b64,
                "category": "opensearch",
                "timestamp": 1691722088645,
                "score": 0.8821945219723084
            },
            "cmd": "BASE64"
        }
    ]
    res = ops.doc_bulk(app_name=app_name, doc_content=documents)
    print(res)

    # To delete a document, push a DELETE command that carries its primary key, for example:
    # ops.doc_bulk(app_name=app_name, doc_content=[{"cmd": "DELETE", "fields": {"id": 2}}])
```

### Configure Index Loading Policy - json - all

```json
{
    "load_config": [
        {
            "file_patterns": [
                "_ATTRIBUTE_",
                "/index/title/.*",
                "/index/body/dictionary"
            ],
            "load_strategy": "mmap",
            "lifecycle": "hot",
            "load_strategy_param": {
                "lock": true,
                "partial_lock": true,
                "advise_random": false,
                "slice": 4194304,
                "interval": 2
            },
            "remote": false,
            "deploy": true,
            "warmup_strategy": "sequential"
        },
        {
            "file_patterns": [
                "_SUMMARY_"
            ],
            "load_strategy": "cache",
            "load_strategy_param": {
                "global_cache": false,
                "direct_io": true,
                "cache_size": 4096
            },
            "remote": true,
            "deploy": false
        },
        {
            "file_patterns": [
                ".*"
            ],
            "warmup_strategy": "none",
            "load_strategy": "mmap",
            "load_strategy_param": {
                "lock": false
            }
        }
    ]
}
```

### List Data Source Tables - bash - all

```bash
curl -X GET 'https://api.aliyun.com/v4/openapi/assist/data-sources/rds/tables?params=%7B%22host%22%3A%22example.rds.com%22%2C%22port%22%3A3306%2C%22username%22%3A%22admin%22%2C%22password%22%3A%22secret%22%7D' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json'
```
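The `params` query parameter in the request above is a percent-encoded JSON object holding the connection settings. Assuming the service expects compact JSON (no spaces), the following sketch reproduces the exact query string shown; `data_source_tables_url` is a hypothetical helper name.

```python
import json
import urllib.parse

BASE_URL = "https://api.aliyun.com/v4/openapi/assist/data-sources/rds/tables"


def data_source_tables_url(connection):
    """Serialize connection settings as compact JSON and percent-encode them into ?params=."""
    params = json.dumps(connection, separators=(",", ":"))
    # safe="" makes quote() also encode characters such as "/" and ":".
    return f"{BASE_URL}?params={urllib.parse.quote(params, safe='')}"
```

Because the database password travels in the query string, it can end up in proxy or server access logs.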

## Response Format

```json
{
  "requestId": "407BFD91-DE7D-50BA-8F88-CDE52A3B5E46",
  "result": {}
}
```

**Key Fields**:
- `requestId` — Unique identifier for the API request
- `result` — Operation result object containing specific response data

## Error Handling

| Error Code | Description | Recommended Action |
|---------------|--------------------|-----------------------------|
| 400 | Bad Request - The request is malformed or missing required parameters. | Verify all required parameters are included and properly formatted |
| 401 | Unauthorized - Authentication failed. Check your API key or credentials. | Ensure your DASHSCOPE_API_KEY is valid and properly set in the Authorization header |
| 403 | Forbidden - You do not have permission to perform this operation. | Check your RAM permissions and ensure you have the necessary access rights |
| 404 | Not Found - The specified instance or resource does not exist. | Verify the instanceId, indexName, or other resource identifiers are correct |
| 429 | Too Many Requests - Rate limit exceeded. Wait before retrying. | Implement exponential backoff and respect rate limits |
| 500 | Internal Server Error - An unexpected error occurred on the server. | Retry the request after a short delay; contact support if the issue persists |
| 503 | Service Unavailable - The service is temporarily unavailable. | Wait and retry later; the service may be undergoing maintenance |

### Rate Limits & Retry
- Standard rate limit: 100 QPS per instance/account
- Implement exponential backoff for retries (e.g., 1s, 2s, 4s, 8s)
- Check for Retry-After header in 429 responses for specific wait times
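The retry advice above can be sketched as follows, under stated assumptions: it retries only the 429, 500, and 503 statuses from the error table, honors a numeric `Retry-After` header, and otherwise waits 1 s, 2 s, 4 s, 8 s. The helper names are hypothetical, and `opener` and `sleep` are injectable so the logic can be exercised without network access.

```python
import time
import urllib.error
import urllib.request

# Status codes from the error table that are worth retrying.
RETRYABLE_STATUS = {429, 500, 503}


def backoff_delays(max_retries=4, base=1.0):
    """Exponential schedule: base, 2*base, 4*base, ... (1s, 2s, 4s, 8s by default)."""
    return [base * (2 ** attempt) for attempt in range(max_retries)]


def retry_after_seconds(headers):
    """Read a numeric Retry-After header; HTTP-date values are ignored in this sketch."""
    value = headers.get("Retry-After") if headers is not None else None
    try:
        return float(value) if value is not None else None
    except ValueError:
        return None


def send_with_retry(request, max_retries=4, base=1.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Send a request, retrying retryable statuses with exponential backoff."""
    delays = backoff_delays(max_retries, base)
    for attempt in range(max_retries + 1):
        try:
            with opener(request, timeout=10) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE_STATUS or attempt == max_retries:
                raise
            wait = retry_after_seconds(err.headers)
            sleep(wait if wait is not None else delays[attempt])
```

Adding random jitter to each delay is a common refinement so that many clients do not retry in lockstep.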

## Environment Requirements

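- Set environment variable: `export DASHSCOPE_API_KEY=your_api_key`
- The Go and Python SDK examples authenticate with an AccessKey pair instead; set `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET` before running them.
- Install the packages each example imports (for example, `requests` for the Python HTTP examples).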

## FAQ

Q: How do I authenticate my OpenSearch API requests?
A: Use Bearer Token authentication by setting the Authorization header to "Bearer $DASHSCOPE_API_KEY" where DASHSCOPE_API_KEY is your API key stored as an environment variable.

Q: What's the difference between ADD, UPDATE, and DELETE commands when pushing documents?
A: ADD creates a new document or replaces an existing one, UPDATE modifies specific fields of an existing document, and DELETE removes a document entirely. All operations require the document's primary key field.
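To make these semantics concrete, here is a toy in-memory model that replays push commands against a dict keyed by primary key. It is illustrative only, not how the service stores data: what happens when UPDATE targets a missing document is not specified here, so the model ignores it (an assumption), and it applies commands in arrival order without modeling `timestamp`.

```python
def apply_commands(table, commands, primary_key="id"):
    """Replay ADD/UPDATE/DELETE commands into an in-memory table keyed by primary key."""
    for command in commands:
        fields = command["fields"]
        key = fields[primary_key]  # every command must carry the primary key
        cmd = command["cmd"]
        if cmd == "ADD":
            table[key] = dict(fields)  # create, or fully replace an existing document
        elif cmd == "UPDATE":
            if key in table:
                table[key].update(fields)  # overwrite only the supplied fields
        elif cmd == "DELETE":
            table.pop(key, None)  # remove the whole document if present
        else:
            raise ValueError(f"unsupported cmd: {cmd}")
    return table
```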

Q: How can I handle large document uploads efficiently?
A: Use the bulk push API (/actions/bulk) to send multiple documents in a single request rather than making individual requests for each document. This reduces network overhead and improves performance.

Q: What should I do if I receive a 429 (Too Many Requests) error?
A: Implement rate limiting in your client code with exponential backoff. The standard limit is 100 QPS per instance/account, so space out your requests accordingly.
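One simple way to space out requests is an interval limiter that lets at most one request start every `1/qps` seconds. The `Throttle` class below is a hypothetical sketch with injectable `clock` and `sleep` functions; it is not thread-safe, and a token bucket would be a reasonable swap if you need to allow short bursts.

```python
import time


class Throttle:
    """Allow at most `qps` request starts per second by spacing calls evenly (not thread-safe)."""

    def __init__(self, qps=100, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / qps
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call `throttle.wait()` immediately before each request.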

Q: How do I configure an index for optimal performance?
A: Consider your query patterns when designing your schema. Use appropriate inverted index types, configure loading policies to keep frequently accessed data in memory, and set proper partition counts based on your data volume and query load.

## Pricing & Billing

### Billing Model
Billing is per-request for API calls.

### Price Reference

| Model/Specification | Input Price | Output Price | Other Fees |
|-----------|---------|---------|---------|
| standard | 0.0001 / | 0.0001 / | |
| default | 0.001 / | 0.001 / | |
| bulk_push | 0.0001 / | 0.0001 / | |

### Free Tier
Most operations include a free tier of 1000 requests per month, though some specific operations may have lower limits (100 requests).

### Usage Limits
- Rate limits: 100 QPS per instance/account
- Request body size limit: 2 MB (before encoding) for bulk operations
- Single document field size limit: 1 MB per field
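To stay under the bulk body limit, documents can be packed greedily into batches by serialized size. This sketch assumes the 2 MB limit means 2 * 1024 * 1024 bytes of UTF-8 JSON and that you send exactly the bytes produced by `to_json_bytes`; the "before encoding" wording above is not precise about which encoding is meant, so confirm the exact rule for your edition. The helper names are hypothetical.

```python
import json

# 2 MB body limit from the list above, read as 2 * 1024 * 1024 bytes (an assumption).
MAX_BODY_BYTES = 2 * 1024 * 1024


def to_json_bytes(obj):
    """Serialize compactly; send exactly these bytes so the size check stays valid."""
    return json.dumps(obj, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


def split_into_batches(docs, max_bytes=MAX_BODY_BYTES):
    """Greedily pack documents into JSON arrays whose serialized size stays within max_bytes."""
    batches, current, current_size = [], [], 2  # 2 bytes for the enclosing "[" and "]"
    for doc in docs:
        size = len(to_json_bytes(doc))
        if size + 2 > max_bytes:
            raise ValueError("a single document exceeds the request body limit")
        extra = size + (1 if current else 0)  # one byte for the "," separator
        if current and current_size + extra > max_bytes:
            batches.append(current)
            current, current_size, extra = [], 2, size
        current.append(doc)
        current_size += extra
    if current:
        batches.append(current)
    return batches
```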