# opensearch-text_and_query_analysis

Part of **OPENSEARCH**

# OpenSearch Text and Query Analysis

## Capabilities Overview

| Sub-capability | Calling Mode | Description |
|--------|----------|------|
| Manage Text Analyzers | Synchronous | Create, list, and describe custom text analyzers for search. |
| Count Query Terms After Analysis | Synchronous | Count the number of terms in a query after text analysis processing. |
| Create Custom Analyzer | Synchronous | Define custom text analyzers for specialized text processing needs. |
| Define Custom Analyzer Intervention | Synchronous | Configure intervention rules for custom analyzers using UserAnalyzerEntry. |
| Manage User Dictionary | Synchronous | Create and maintain custom dictionaries for text analysis. |
| Manage Intervention Dictionary Entry | Synchronous | Handle individual entries in intervention dictionaries for text processing. |
| Add Custom Analyzer Entry | Synchronous | Add entries to custom text analyzers. |
| Manage Custom Dictionary | Synchronous | Handle custom dictionaries for text analysis. |
| Test Text Analysis Effect | Synchronous | Preview and test the results of text analysis configurations. |
| Configure Field Analyzer | Synchronous | Set analyzers for specific fields in the index schema. |
| Manage Custom Analyzers | Synchronous | Create, describe, list, and remove custom text analyzers for search indexing. |
| Manage Analyzer Entries | Synchronous | List and modify entries within custom analyzers. |
| Manage Intervention Dictionaries | Synchronous | Create, describe, list, and remove intervention dictionaries for search query modification. |
| Query NER Results | Synchronous | Retrieve named entity recognition results from intervention dictionaries. |
| Query Term Weight Analysis | Synchronous | Analyze term weighting results from intervention dictionaries. |
| Modify Intervention Entries | Synchronous | Add or remove entries in intervention dictionaries. |
| List Dictionary Related Entities | Synchronous | Retrieve related entities from intervention dictionaries. |
| List Dictionary Entries | Synchronous | Retrieve entries from intervention dictionaries. |
| Manage Query Processor | Synchronous | Create, describe, modify, list, or remove query analysis rules and processors. |
| Configure Query Analysis Rule | Synchronous | Set up query analysis rules using QueryProcessor data structure. |
| Manage Query Processors | Synchronous | Create, describe, list, modify, and remove query processors for search query analysis. |
| Get Entity Priority Settings | Synchronous | Retrieve named entity recognition priority settings from query processors. |
| Start Slow Query Analysis | Synchronous | Initiate analysis of slow-performing search queries. |
| Analyze Query | Synchronous | Process search queries to understand intent and extract key components. |
| Get Query Analysis | Synchronous | Performs linguistic and semantic analysis on a search query to extract structured features. |
| Create Intervention Dictionary | Synchronous | Set up dictionaries for search query interventions. |
| Configure Search Algorithm Modules | Synchronous | Set up and manage search algorithm parameters and modules. |
| Get Top Searches and Hints | Synchronous | Retrieve top search terms and hints for optimization. |
| Configure Rough Sort Expression | Synchronous | Set up first-stage ranking expressions for search results. |
| Configure Intervention Dictionary | Synchronous | Set up and manage intervention dictionaries for search optimization. |
| Manage Slow Query Analysis | Synchronous | Enable, disable, start analysis, or retrieve status and details of slow queries. |
| Optimize Query Latency | Synchronous | Use searcher_cache clause to reduce query response time. |
| Optimize Query Execution | Synchronous | Apply SQL hints to improve query performance. |
| Enable Query Result Caching | Synchronous | Configure caching to store and reuse query results. |
| Manage Slow Query Optimization | Synchronous | Disable, list categories, and retrieve slow query information for performance tuning. |
| Query Statistical Logs | Synchronous | Retrieve statistical log data for search operations. |
| List Statistic Report | Synchronous | Access statistical reports for search analytics. |

## API Calling Patterns

### Authentication
The primary authentication method is Bearer Token authentication.

- **Header format**: `Authorization: Bearer <your_api_key>`
- **Environment variable**: `DASHSCOPE_API_KEY`
- While other authentication methods may exist, Bearer Token is the recommended approach for all Text and Query Analysis APIs.

### Service Endpoint (Endpoint)
OpenSearch Text and Query Analysis APIs use region-specific endpoints:

- **Pattern**: `https://opensearch.{region}.aliyuncs.com`
- **Common regions**: 
  - `cn-hangzhou` (China Hangzhou)
  - `cn-shanghai` (US West)
  - `ap-southeast-1` (Singapore)

Some APIs may use global endpoints like `https://api.aliyun.com/v4/openapi/` or `https://openapi.aliyun.com/v4/openapi/`.

### Synchronous API Pattern
All Text and Query Analysis APIs follow a synchronous calling pattern:

1. **Make a direct HTTP request** (GET, POST, PUT, or DELETE) to the appropriate endpoint
2. **Include required headers**:
   - `Authorization: Bearer $DASHSCOPE_API_KEY`
   - `Content-Type: application/json` (for POST/PUT requests)
3. **Provide parameters** as query parameters (GET) or in the request body (POST/PUT)
4. **Receive immediate JSON response** with results or status information
5. **Handle errors** based on HTTP status codes and error response bodies

No polling or async task handling is required since all operations complete immediately.

## Parameter Reference

### Manage Text Analyzers

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| name | string | false | | | The name of the analyzer. |
| business | string | false | | | The name of the basic analyzer. |
| businessType | string | false | | one of: AUTO, MODEL, SYSTEM, USER | The type of the basic analyzer. |
| type | string | false | | one of: HA3, ES | The engine type. |
| businessAppGroupId | string | false | | | The application ID for the custom model-based analyzer. |
| dryRun | boolean | false | false | true or false | Specifies whether to perform a dry run. |

### Create Custom Analyzer

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| id | Integer | false | | | The ID of the custom analyzer. |
| name | String | true | | | The name of the custom analyzer. |
| business | String | true | | one of: chn_standard, chn_scene_name, chn_ecommerce, chn_it_content, en_min, th_standard, th_ecommerce, vn_standard, chn_community_it, chn_ecommerce_general, chn_esports_general, chn_edu_question | The built-in analyzer on which the custom analyzer is based. |
| dicts[] | Object | false | | | The custom dictionaries for analysis. |
| available | Boolean | false | true | | Specifies whether the custom analyzer is available. |
| created | Integer | false | | | The timestamp when the custom analyzer was created. Unit: seconds. |
| updated | Integer | false | | | The timestamp when the custom analyzer was last updated. Unit: seconds. |

### Define Custom Analyzer Intervention

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| cmd | string | true | | one of: add, delete | The action to perform on the intervention entry. |
| key | string | true | | | The search query to segment. |
| value | string | true | | | The analysis result for the search query. |
| status | string | false | ACTIVE | one of: ACTIVE, PENDING_ACTIVE | The status of the intervention entry. |
| splitEnabled | boolean | false | true | | Specifies whether to further segment the tokens generated after the search query is segmented. |
| created | integer | false | | | The Unix timestamp (in seconds) when the intervention entry was created. |
| updated | integer | false | | | The Unix timestamp (in seconds) when the intervention entry was last updated. |

### Manage Intervention Dictionary Entry

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| cmd | string | true | | one of: add, delete | The operation to perform. |
| word | string | true | | | The intervention entry (the query term to match). |
| status | string | false | | | The state of the intervention entry. ACTIVE means the entry is in effect. |
| created | integer | false | | | Unix timestamp when the entry was created. |
| updated | integer | false | | | Unix timestamp when the entry was last updated. |
| stopword | boolean | true | | | The intervention action. true adds the term as a stop word. false prevents the term from being treated as a stop word. |
| alias | array | false | | | The synonyms to add for the entry. |
| antiAlias | array | false | | | The synonyms to block for the entry. |
| correction | string | true | | | The corrected query to use in place of the matched term. |
| enabled | boolean | true | | | The intervention action. true applies the correction. false prevents this correction from taking effect. |
| relevance | object | true | | | Key-value pairs mapping category IDs to relevance scores. |
| tokens[] | object | true | | | The list of terms and their weights for this entry. |
| tokens[].token | string | true | | | The term. |
| tokens[].weight | integer | true | | range 1-7 | The weight of the term. 7 = high, 4 = medium, 1 = low. |
| tokens[].tag | string | true | | one of: brand, category, material, element, style, color, function, scenario, people, season, model, region, name, adjective, category-modifier, size, quality, suit, new-release, series, marketing, entertainment, organization, movie, game, number, unit, common, new-word, proper-noun, symbol, prefix, suffix, gift, negative, agent | The internal name of the entity type. |
| tokens[].order | integer | true | | range 1-999 | The position of the entity in the entry, starting from 1. |
| matchType | integer | false | 0 | one of: 0, 1, 2 | Determines when the intervention applies. |
| rank | integer | true | | range 1-10 | The position in the top search results. |
| expirationTime | integer | false | | | Unix timestamp when the entry expires. Unit: seconds. |

### Analyze Query

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| query | String | true | | | This request content. |
| history | List<Message> | false | | max 8MB request body | Historical messages. |
| functions | List<Function> | false | | | Enabled features and corresponding parameters. |
| functions[].name | String | false | | | If function is set, name must be set. |
| functions[].parameters.enable | boolean | false | | | Whether to enable this feature. |

## Code Examples

### Create Custom Analyzer - curl - all

```bash
curl -X POST https://api.alibabacloud.com/v4/openapi/user-analyzers \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "name": "jmbon_analyzer",
  "business": "Chinese-General analyzer",
  "businessType": "AUTO",
  "type": "HA3",
  "businessAppGroupId": "110123123",
  "dryRun": false
}'
```

### Describe User Analyzer - python - all

```python
import requests

url = "https://opensearch.cn-hangzhou.aliyuncs.com/v4/openapi/user-analyzers/kevin_test"
params = {
    "with": "all"
}
headers = {
    "Authorization": "Bearer $DASHSCOPE_API_KEY",
    "Content-Type": "application/json"
}

response = requests.get(url, params=params, headers=headers)
print(response.json())
```

### Test Text Analysis Effect - python - all

```python
import requests

url = "https://openapi.aliyun.com/v4/openapi/analyze"
params = {
    "text": "abcde",
    "tokenizer": "chn_standard",
    "dict": "segment:584,segment:585"
}
headers = {
    "Authorization": "Bearer $DASHSCOPE_API_KEY"
}

response = requests.get(url, params=params, headers=headers)
print(response.json())
```

### Query NER Results - curl - all

```bash
curl -X GET 'https://your-endpoint/v4/openapi/intervention-dictionaries/ner_dict11/ner-results?query=aaaa' \
-H 'Authorization: Bearer YOUR_API_KEY'
```

### Get Query Analysis - python - all

```python
from alibabacloud_tea_openapi.models import Config
from alibabacloud_searchplat20240529.client import Client
from alibabacloud_searchplat20240529.models import GetQueryAnalysisRequest, GetQueryAnalysisRequestHistory

if __name__ == '__main__':
    config = Config(bearer_token="替换为您的API-KEY",
                    # endpoint: 配置统一的请求入口 需要去掉http://
                    endpoint="替换API访问地址",
                    # 支持 protocol 配置 HTTPS/HTTP
                    protocol="http")
    client = Client(config=config)

    # --------------- 请求体参数 ---------------
    history = [
        GetQueryAnalysisRequestHistory(content="中国的首都在哪", role="user"),
        GetQueryAnalysisRequestHistory(content="北京", role="assistant")
    ]
    functions = [
        GetQueryAnalysisRequestFunctions(name = "intent", parameters = {"enable":True}),
        GetQueryAnalysisRequestFunctions(name = "similar_query", parameters = {"enable":True}),
        
        GetQueryAnalysisRequestFunctions(name = "nl2sql", parameters = {"enable":True,"config_name":""}),
    ]
    request = GetQueryAnalysisRequest(history=history, query="有多少人口？",functions=functions)

    # default：替换工作空间名称, ops-query-analyze-001: 服务id
    response = client.get_query_analysis("default", "ops-query-analyze-001", request)
    print(response)
```

### Get Top Searches and Hints - php - all

```php
require_once("../OpenSearch/Autoloader/Autoloader.php");

use OpenSearch\Client\OpenSearchClient;

// Read credentials from environment variables.
$accessKeyId = getenv('ALIBABA_CLOUD_ACCESS_KEY_ID');
$secret      = getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET');
$endPoint    = '<endpoint>';
$appName     = '<app-name>';
$modelName   = '<model-name>'; // Optional.

$options = array('debug' => true);
$client  = new OpenSearchClient($accessKeyId, $secret, $endPoint, $options);

$uri = "/apps/{$appName}/actions/hot";

$params = [];
$params['hit']        = 10;
$params['sort_type']  = 'default';
$params['user_id']    = '1231453';
$params['model_name'] = $modelName; // Optional. Specify the model name of the top searches or hints.

try {
    $ret = $client->get($uri, $params);
    print_r(json_decode($ret->result, true));
} catch (\Throwable $e) {
    print_r($e);
}
```

### Manage Slow Query Analysis - python - all

```python
import requests

url = "https://opensearch.cn-hangzhou.aliyuncs.com/v4/openapi/app-groups/{appGroupIdentity}/optimizers/slow-query/actions/run"
headers = {
    "Authorization": "Bearer $DASHSCOPE_API_KEY",
    "Content-Type": "application/json"
}
response = requests.post(url, headers=headers, json={})
print(response.json())
```

### Query Statistical Logs - python - all

```python
import requests

url = "https://opensearch.cn-hangzhou.aliyuncs.com/v4/openapi/app-groups/my_app/statistic-logs/hot"
params = {
    "columns": "wordsTopPv",
    "startTime": 1591459200
}
headers = {
    "Authorization": "Bearer $DASHSCOPE_API_KEY"
}

response = requests.get(url, params=params, headers=headers)
print(response.json())
```

## Response Format

```json
{
  "result": {},
  "RequestId": "98724351-D6B2-5D8A-B089-7FFD1821A7E9"
}
```

**Key Fields**:
- `result` — Contains the main response data specific to the API operation
- `RequestId` — Unique identifier for the request, useful for debugging and support

## Error Handling

| Error Code (Code) | Description (Description) | Recommended Action (Recommended Action) |
|---------------|--------------------|-----------------------------|
| InvalidParameter | One or more parameters are invalid. Check the parameter values and ensure they match the required format. | Validate all input parameters against the API specification and correct any formatting issues. |
| Unauthorized | The user does not have sufficient permissions to perform this operation. Verify RAM policy and AccessKey configuration. | Check your RAM policies and ensure your API key has the necessary permissions for the requested operation. |
| InternalError | An internal server error occurred. Retry the request after a short delay. | Implement exponential backoff and retry logic with a short initial delay (1-2 seconds). |
| 400 | Bad Request - The request is malformed or contains invalid parameters. | Review the request format and ensure all required parameters are provided with correct values. |
| 401 | Unauthorized - Authentication failed or missing API key. | Verify that your API key is valid and properly included in the Authorization header. |
| 403 | Forbidden - The user does not have sufficient permissions to access the resource. | Check your account permissions and ensure you have access to the specified resource. |
| 404 | Not Found - The specified analyzer does not exist. | Verify the resource name/ID exists and is spelled correctly in your request. |
| 429 | Too Many Requests – Rate limit exceeded. Wait before retrying. | Implement rate limiting in your client and respect the service's QPS limits. |
| 500 | Internal Server Error - An unexpected error occurred on the server side. | Retry the request after a short delay; if persistent, contact support with your RequestId. |

### Rate Limits & Retry
- Most APIs have a rate limit of 100 QPS per account/user
- Some APIs have lower limits (e.g., 10 QPS for query analysis)
- Implement exponential backoff for retries: start with 1-second delay, double each attempt up to 30 seconds
- For 429 errors, check if the response includes a `Retry-After` header and respect its value

## Environment Requirements

- **Environment variable setup**: `export DASHSCOPE_API_KEY=your_key`
- **Python SDK**: `alibabacloud_tea_openapi>=1.0.0`, `alibabacloud_searchplat20240529>=1.0.0`
- **PHP SDK**: `open-search-php-sdk>=1.0.0`

## FAQ

Q: How do I authenticate my API requests to OpenSearch Text and Query Analysis services?
A: Use Bearer Token authentication by including the header `Authorization: Bearer $DASHSCOPE_API_KEY` in all your requests, where `DASHSCOPE_API_KEY` is your API key stored as an environment variable.

Q: What's the difference between custom analyzers and intervention dictionaries?
A: Custom analyzers define how text should be processed (tokenized, normalized, etc.) for indexing and searching, while intervention dictionaries provide specific rules for modifying query behavior (synonyms, stopwords, spelling corrections, NER rules, etc.).

Q: How can I test my text analysis configuration before deploying it?
A: Use the `ListAnalyzerResults` API (Test Text Analysis Effect) to preview how your analyzer and dictionaries will process specific text inputs without affecting your production configuration.

Q: What regions are supported for OpenSearch Text and Query Analysis APIs?
A: Common regions include `cn-hangzhou` (China Hangzhou), `cn-shanghai` (US West), and `ap-southeast-1` (Singapore). Use the endpoint pattern `https://opensearch.{region}.aliyuncs.com` with your specific region.

Q: How are API calls billed and what are the rate limits?
A: Most APIs follow a per-request billing model with free tiers (typically 1000-10000 requests/month). Rate limits are usually 100 QPS per account, though some services like query analysis may have lower limits (10 QPS).

## Pricing & Billing

### Billing Model
Per-request billing model where each API call counts as one request regardless of response size or complexity.

### Price Reference

| Model/Specification | Input Price | Output Price | Other Fees |
|-----------|---------|---------|---------|
| default | 0.0001 / | 0.0001 / |
| standard | 0.0001 / | 0.0002 / |

### Free Tier
Monthly free quotas ranging from 100 to 10,000 requests depending on the specific API:
- Most management APIs: 1000 requests/month
- Query analysis APIs: 1000-10,000 requests/month  
- Slow query analysis: 100 requests/month

### Usage Limits
- Rate limits: Typically 100 QPS per account, with some services limited to 10 QPS
- Request size limits: Usually 8KB-8MB maximum request body size
- Pagination limits: Up to 100 items per page for list operations

### Billing Notes
- Dry run requests are typically free but still count toward quota limits
- Failed requests are usually billed the same as successful requests
- Free tier resets monthly and unused quota does not roll over
- Bulk operations are charged per request, not per individual item processed