# opensearch-data

Part of **OPENSEARCH**

# OpenSearch Data Query

## Capabilities Overview

| Sub-capability | Calling Mode | Description |
|----------------|--------------|-------------|
| Query Merged Table | Synchronous | Returns information about the wide table produced by a JOIN across multiple tables in OpenSearch. |
| Query Cluster Data | Synchronous | Uses the cluster clause, which is required in every query statement, to specify the cluster to search and optionally narrow the query to specific partitions within that cluster. |
| Query Child Table | Synchronous | Queries child tables in OpenSearch Retrieval Engine Edition using the UNNEST function, which enables inner join-like operations between parent and child tables. |
| Lookup Record by Primary Key | Synchronous | Retrieve specific records from a summary table by primary key, bypassing the inverted index for efficient point lookups. |
| Query Data from Table | Synchronous | The SELECT statement is used to query data from tables in the HA3 query syntax, supporting field selection, filtering, grouping, and conditional logic. |
| Combine Result Sets | Synchronous | The UNION operator combines result sets from multiple SELECT statements into a single result set, with UNION removing duplicates and UNION ALL retaining them. |
| Filter Rows by Condition | Synchronous | The WHERE clause filters rows based on boolean conditions, supporting comparison operators, logical operations, IN clauses, and full-text search functions like MATCHINDEX and QUERY. |
| Execute SQL Query | Synchronous | Allows users to write SQL statements to retrieve data from an OpenSearch index using standard SQL syntax. |
| Query Key-Value Table | Synchronous | Queries data from key-value tables and Pkey-Skey-value tables. WHERE clauses must include primary key conditions. |

## API Calling Patterns

### Authentication
The primary authentication method is **Bearer Token** via the `Authorization` header.

- Use the header: `Authorization: Bearer <your-api-key>`
- Store your API key in the environment variable: `OPENSEARCH_API_KEY`
- SQL statements executed inside the OpenSearch engine context do not require explicit authentication, but REST API calls to the `/v4/` or `/_sql` endpoints require the Bearer token.
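
The header described above can be assembled from the environment variable in a few lines. This is a minimal sketch: `build_auth_headers` is a hypothetical helper name, not part of any OpenSearch SDK.

```python
import os


def build_auth_headers(env_var: str = "OPENSEARCH_API_KEY") -> dict:
    """Return request headers carrying the Bearer token read from the environment.

    `build_auth_headers` is an illustrative helper, not an SDK function.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early rather than sending an unauthenticated request.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Reading the key at call time keeps it out of source files and lets each environment supply its own value.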

### Service Endpoint
OpenSearch Data Query APIs use two types of endpoints:

1. **Management/Schema APIs**:  
   Base URL: `https://opensearch.api.aliyun.com/v4/openapi/...`  
   Example: `POST /v4/openapi/assist/schema/actions/merge`

2. **SQL Query APIs**:  
   Base URL: `https://{your-opensearch-domain}/_sql`  
   Replace `{your-opensearch-domain}` with your actual OpenSearch instance endpoint (e.g., `my-search.cn-hangzhou.opensearch.aliyuncs.com`)

Common regions include: `cn-hangzhou`, `cn-shanghai`, `cn-beijing`.
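
As a rough illustration of the two endpoint types, the sketch below composes each URL from its parts. The function names `management_url` and `sql_url` are hypothetical; the host names and paths are the ones listed above.

```python
MANAGEMENT_BASE = "https://opensearch.api.aliyun.com"


def management_url(path: str) -> str:
    """Build a management/schema API URL, e.g. for /v4/openapi/... paths."""
    return f"{MANAGEMENT_BASE}/{path.lstrip('/')}"


def sql_url(instance_domain: str) -> str:
    """Build the SQL query URL for one OpenSearch instance endpoint."""
    return f"https://{instance_domain.rstrip('/')}/_sql"


print(management_url("/v4/openapi/assist/schema/actions/merge"))
print(sql_url("my-search.cn-hangzhou.opensearch.aliyuncs.com"))
```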

### Synchronous Query Pattern
All data query operations in this domain are **synchronous**:
- Send a single HTTP request (usually POST or GET)
- Receive a complete JSON or tabular response immediately
- No polling or async task tracking is required
- For SQL-based queries (e.g., SELECT, WHERE, UNION), the query is embedded directly in the request body or as a URL parameter
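
The two ways of embedding a SQL statement can be sketched with the standard library alone. The parameter name `query` comes from the Parameter Reference; the `{"query": ...}` JSON body shape and the helper names are assumptions made for illustration, and the `Authorization` header is left out for brevity.

```python
import json
import urllib.parse
import urllib.request


def sql_get_request(base_url: str, sql: str) -> urllib.request.Request:
    """Embed the SQL statement as a URL parameter (GET)."""
    query_string = urllib.parse.urlencode({"query": sql})
    return urllib.request.Request(f"{base_url}?{query_string}", method="GET")


def sql_post_request(base_url: str, sql: str) -> urllib.request.Request:
    """Embed the SQL statement in a JSON request body (POST).

    The {"query": ...} body shape is an assumption for illustration.
    """
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either object to `urllib.request.urlopen` performs one blocking call and returns the whole response, which is all the synchronous pattern requires.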

## Parameter Reference

### Query Merged Table

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|--------|--------|------------|-------------|
| spec | string | false | opensearch.share.common | — | The specifications of the OpenSearch instance. Used to check exclusive instance limits. |
| body | Schema | false | null | — | The request body containing `tables` and `indexes` schema definitions. |

### Lookup Record by Primary Key

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|--------|--------|------------|-------------|
| table_name | string | true | — | — | The name of the base table for which the summary table is created. |
| primary_key_field | string | true | — | — | The field used as the primary key in the summary table. |
| value | any | true | — | — | The value(s) to match in the primary key field (supports `IN` or `OR`). |
| filter_condition | string | false | — | — | Additional conditions to filter results after the primary key lookup. |
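
To show how these parameters combine into one statement, here is a minimal sketch of a query builder. It assumes the summary table is named `<table_name>_summary_`, mirroring the `phone_summary_` example in this document. The function name and the `columns` argument are hypothetical, and table and field names are interpolated as-is, so they must come from trusted input.

```python
def build_pk_lookup_sql(table_name, primary_key_field, values,
                        columns=("*",), filter_condition=None):
    """Compose a primary-key lookup against a summary table.

    Assumption: the summary table is named `<table_name>_summary_`,
    mirroring the `phone_summary_` example in this document.
    """
    if not values:
        raise ValueError("at least one primary key value is required")

    def literal(value):
        # Only int and str keys are handled in this sketch.
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise TypeError(f"unsupported key type: {type(value).__name__}")
        if isinstance(value, int):
            return str(value)
        # Standard SQL escaping: double any embedded single quote.
        return "'" + value.replace("'", "''") + "'"

    in_list = ", ".join(literal(v) for v in values)
    sql = (
        f"SELECT {', '.join(columns)} FROM {table_name}_summary_ "
        f"WHERE {primary_key_field} IN ({in_list})"
    )
    if filter_condition:
        # Parenthesize so an OR inside the filter cannot escape the key condition.
        sql += f" AND ({filter_condition})"
    return sql


print(build_pk_lookup_sql("phone", "nid", [7, 8, 9],
                          columns=("brand", "price"),
                          filter_condition="price < 2000"))
```

The primary key condition always comes first, and the optional filter is applied on top of it, matching the order described in the table.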

### Filter Rows by Condition

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|--------|--------|------------|-------------|
| booleanExpression | string | false | — | — | A boolean condition using operators (`>`, `<`, `IN`, `AND`, `OR`) or functions (`MATCHINDEX`, `QUERY`, `UDF`). |

### Execute SQL Query

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|--------|--------|------------|-------------|
| query | string | true | — | Must be valid SQL; max length system-dependent | A complete SQL statement to retrieve data from an OpenSearch index. |

## Code Examples

### Query Merged Table - curl - all

```bash
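# Request the merged (wide) table schema; the API key is read from OPENSEARCH_API_KEY.
curl -X POST "https://opensearch.api.aliyun.com/v4/openapi/assist/schema/actions/merge" \
  -H "Authorization: Bearer ${OPENSEARCH_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "spec": "opensearch.share.common",
    "body": {
      "tables": {},
      "indexes": {}
    }
  }'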
```

### Execute SQL Query - Python - all

```python
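import os

import requests

# Placeholder domain; replace it with your OpenSearch instance endpoint.
url = "https://your-opensearch-domain.com/_sql"
params = {
    # Every non-aggregated column must appear in GROUP BY.
    "query": "SELECT brand, COUNT(*) FROM phone GROUP BY brand"
}
headers = {
    # REST calls to /_sql require the Bearer token (see Authentication).
    "Authorization": f"Bearer {os.environ['OPENSEARCH_API_KEY']}",
    "Content-Type": "application/json"
}

response = requests.get(url, params=params, headers=headers, timeout=10)
response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
print(response.json())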
```

### Lookup Record by Primary Key - SQL - all

```sql
SELECT brand, price FROM phone_summary_ WHERE nid IN (7, 8, 9) AND price < 2000
```

### Query Child Table with UNNEST - SQL - all

```sql
SELECT
    field_int32,
    field_int32 + 1 as output,
    sub_id,
    sub_string
FROM
    simple4,
    UNNEST(simple4.sub_simple4_table)
WHERE
    field_int8 >= 2
```

### Filter with Full-Text Search - SQL - all

```sql
SELECT * FROM phone WHERE QUERY(brand, 'Huawei OR OPPO')
```

### Combine Results with UNION ALL - SQL - all

```sql
SELECT nid, brand, price, size FROM phone WHERE nid < 5
UNION ALL
SELECT nid, brand, price, size FROM phone WHERE nid > 5
```

### Point Lookup on Key-Value Table - SQL - all

```sql
-- Point lookup by primary key
SELECT cat_id, category_name FROM category WHERE cat_id = 2;

-- Batch lookup using IN
SELECT cat_id, category_name FROM category WHERE cat_id IN (2, 3);
```

### Conditional Logic with CASE WHEN - SQL - all

```sql
SELECT
    CASE
        WHEN warehouse_id = 48 THEN warehouse_id
        WHEN warehouse_id = 24 THEN id
        ELSE wave_status
    END AS aa
FROM s_wmp_package_wave
WHERE wave_status = 0
LIMIT 10;
```

## Response Format

For Query Merged Table responses:

```json
{
  "requestId": "ABCDEFGH",
  "result": {
    "primaryKey": "-",
    "mergeTable": {
      "test": "test",
      "test2": 1
    },
    "fromTable": {
      "test": "test",
      "test2": 1
    }
  }
}
```

**Key Fields**:
- `requestId` — Unique identifier for the request
- `result.mergeTable` — Schema and metadata of the merged (wide) table
- `result.fromTable` — Source table information used in the JOIN
- `result.primaryKey` — Primary key definition of the resulting table

For SQL query responses (e.g., via `/_sql`):

```json
{
  "took": 123,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 100,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "phone",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "brand": "Apple",
          "count": 1
        }
      }
    ]
  }
}
```

**Key Fields**:
- `took` — Query execution time in milliseconds
- `hits.total.value` — Total number of matching documents
- `hits.hits[]._source` — Actual document data
- `hits.hits[]._score` — Relevance score for the document
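
A short sketch of reading these fields, assuming the response has the shape shown in the sample above. `rows_from_sql_response` is a hypothetical helper name.

```python
def rows_from_sql_response(response: dict) -> list:
    """Return the `_source` document of every hit, in response order."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit.get("_source", {}) for hit in hits]


sample = {
    "took": 123,
    "timed_out": False,
    "hits": {
        "total": {"value": 100, "relation": "eq"},
        "hits": [
            {"_index": "phone", "_id": "1", "_score": 1.0,
             "_source": {"brand": "Apple", "count": 1}},
        ],
    },
}
print(rows_from_sql_response(sample))  # [{'brand': 'Apple', 'count': 1}]
```

Using `.get` with defaults means a response with no hits yields an empty list instead of raising `KeyError`.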

## Error Handling

| Error Code | Description | Recommended Action |
|-----------|-------------|-------------------|
| 400 | Bad Request: The request body is invalid or missing required fields. | Validate request structure and required parameters (e.g., ensure `tables` is provided for merged table queries). |
| 401 | Unauthorized: The API key is missing or invalid. | Verify that `OPENSEARCH_API_KEY` is set and sent in the `Authorization` header. |
| 403 | Forbidden: The user does not have permission to perform this operation. | Check the RAM policy for the OpenSearch service and confirm it grants `opensearch:GenerateMergedTable` or equivalent permissions. |
| 404 | Not Found: The specified resource (e.g., table, index) does not exist. | Confirm the table or index name exists in your OpenSearch instance. |
| 429 | Too Many Requests: The request rate exceeded the allowed limit. | Slow down, apply exponential backoff, and respect the `Retry-After` header if present. |
| 500 | Internal Server Error: An unexpected error occurred on the server side. | Retry the request; if persistent, contact support with the `requestId`. |
| 503 | Service Unavailable: The service is temporarily unavailable due to overload or maintenance. | Implement exponential backoff and retry after a delay. |

### Rate Limits & Retry
- **Query Merged Table**: No explicit rate limit documented.
- **Execute SQL Query**: Up to 100 queries per second per user.
- **General Guidance**: For 429 or 503 errors, use exponential backoff (e.g., wait 1s, 2s, 4s, …) and respect the `Retry-After` header if present.
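
The backoff guidance can be sketched as follows. This is an illustration under stated assumptions, not a prescribed client: `send` is a hypothetical zero-argument callable that returns `(status_code, headers, body)`, and only the numeric-seconds form of `Retry-After` is handled.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in (429, 503):
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # Out of attempts: hand the last response to the caller.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and str(retry_after).isdigit():
            delay = float(retry_after)  # The server-provided delay wins.
        else:
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
        sleep(delay)
    return status, headers, body
```

Injecting `sleep` keeps the function easy to test, and returning the final response lets the caller decide how to report a persistent 429 or 503.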

## Environment Requirements

- Set your API key: `export OPENSEARCH_API_KEY=your_api_key_here`
- For Python examples: `pip install requests`
- OpenSearch Retrieval Engine Edition requires HA3 V3.7.0 or later (V3.7.5+ for advanced features like CASE WHEN and Pkey-Skey tables)
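
The version thresholds above lend themselves to a simple comparison. The sketch below assumes version strings of the form `V3.7.5`; how the running HA3 version is obtained is outside its scope, and the helper names are hypothetical.

```python
BASE_MINIMUM = "V3.7.0"      # Minimum for OpenSearch Retrieval Engine Edition
ADVANCED_MINIMUM = "V3.7.5"  # CASE WHEN, Pkey-Skey tables


def parse_ha3_version(text: str) -> tuple:
    """Turn 'V3.7.5' or '3.7.5' into (3, 7, 5)."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


def supports(version: str, minimum: str) -> bool:
    """True when `version` is at least `minimum`, compared numerically."""
    return parse_ha3_version(version) >= parse_ha3_version(minimum)


print(supports("V3.7.2", BASE_MINIMUM))      # True
print(supports("V3.7.2", ADVANCED_MINIMUM))  # False
```

Comparing integer tuples rather than raw strings avoids ranking `V3.10.0` below `V3.7.5`.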

## FAQ

Q: Do I need authentication for all SQL queries?
A: REST API calls to the management endpoints (e.g., `/v4/...`) and to the `/_sql` endpoint require a Bearer token. SQL executed inside the engine context (e.g., via the console) may rely on instance-level security instead.

Q: Can I use JOIN operations in OpenSearch SQL?
A: Direct `JOIN` syntax is not supported. Instead, use `UNNEST` for parent-child table relationships or pre-join data into a merged (wide) table using the GenerateMergedTable API.

Q: What’s the difference between a summary table and a key-value table?
A: A summary table supports efficient primary key lookups and is optimized for point queries. A key-value (or Pkey-Skey) table enforces primary key constraints and requires primary key conditions in WHERE clauses for batch lookups.

Q: Why am I getting a 400 error on a valid-looking SQL query?
A: Ensure your OpenSearch engine version supports the SQL features you’re using (e.g., CASE WHEN requires HA3 V3.7.5+). Also verify that all referenced fields and tables exist.

Q: How do I perform full-text search in a WHERE clause?
A: Use built-in functions like `MATCHINDEX(field, 'term')` for exact inverted index lookups or `QUERY(field, 'term1 OR term2')` for flexible full-text parsing. Write the search string in single quotes, the standard SQL delimiter for string literals.

## Pricing & Billing

### Billing Model
Operations are billed **per request**. Failed requests (e.g., 4xx/5xx) are generally not billed. Primary key lookups are the exception: they are billed even when they return no records.

### Price Reference

| Tier | Input Price | Output Price |
|------|-------------|--------------|
| default | 0.001 / | 0.001 / |
| summary_query | 0.0001 / | 0.0001 / |
| standard | 0.0001 / | 0.0001 / |
| standard | 0.0001 / | 0.0002 / |

### Free Tier
- Query Merged Table: 1,000 free calls per month
- Primary Key Lookup: 10,000 free queries per month
- General SQL Queries: 1,000 free queries per month

### Usage Limits
- Primary Key Lookup: Max 10,000 records per query
- WHERE Clause Queries: Max 10,000 rows per query
- SQL Queries: Max 10,000 records per query
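
One way to stay under the per-query record cap for primary key lookups is to split a large key list into batches. This sketch assumes each key returns at most one record, so 10,000 keys per query cannot exceed the limit. `chunk_keys` is a hypothetical helper name.

```python
def chunk_keys(keys, max_per_query=10_000):
    """Yield consecutive slices of `keys`, each at most `max_per_query` long."""
    if max_per_query < 1:
        raise ValueError("max_per_query must be at least 1")
    keys = list(keys)
    for start in range(0, len(keys), max_per_query):
        yield keys[start:start + max_per_query]


batches = list(chunk_keys(range(25_000)))
print([len(batch) for batch in batches])  # [10000, 10000, 5000]
```

Each batch can then be issued as its own `IN (...)` lookup, and each lookup counts as a separate billed request.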

### Billing Notes
- Billing occurs per successful request. Failed requests are not billed (except summary queries).
- Long-running queries may incur additional charges based on duration.
- Queries that exceed the free tier are billed per execution.