# opensearch-model

# OpenSearch Model Management Troubleshooting Guide

## Problem Index

| Problem | Symptom | Severity | Solution Summary |
|--------|--------|---------|------------------|
| Invalid Request Body | `Algorithm.Model.Missing.RequestBody` | High | Provide a valid JSON request body with all required fields |
| Model Training Failed | `Algorithm.Model.TrainingHasNotSuccess` | High | Check data quality and retrigger training after fixing issues |
| Insufficient Training Data | `Algorithm.Data.NoData` or `Algorithm.Data.NotEnoughDay` | Medium | Ensure sufficient historical data is available before training |
| Duplicate Model Name | `Algorithm.Model.Duplicated` | Low | Use a unique model name or delete the existing model first |
| Model Not Found | `Algorithm.Model.NotFound` | Medium | Verify the model ID or name exists and was correctly specified |

## Problem Details

### Problem 1: Invalid Request Body

**Symptoms**
- Error message: `Algorithm.Model.Missing.RequestBody`
- Behavior: API returns 400 Bad Request immediately upon submission
- Context: Occurs during model creation, update, or prediction requests

**Root Cause**
The request body is either missing entirely, malformed, or lacks required fields. The OpenSearch Model API requires a well-formed JSON payload with specific parameters depending on the operation.

**Solution**
1. Validate that your request includes a JSON body
2. Ensure all required fields for the specific API endpoint are present (e.g., `app_group`, `model_name`, `model_type`)
3. Use a JSON validator to check syntax before sending the request

Example of a minimal valid request body for model creation:
```json
{
  "app_group": "my-app-group",
  "model_name": "recommendation-model-v1",
  "model_type": "ranking"
}
```
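
The checks in the steps above can be run client-side before the request is sent. The following is a minimal sketch, not part of any OpenSearch SDK: `validate_request_body` is a hypothetical helper, and the required-field list is taken from the example body above, so it would need adjusting per endpoint.

```python
import json

# Assumption: the required fields from the example above; the real set depends on the endpoint.
REQUIRED_FIELDS = ("app_group", "model_name", "model_type")


def validate_request_body(raw, required=REQUIRED_FIELDS):
    """Return a list of problems found in a raw request body (empty list means OK)."""
    if raw is None or not raw.strip():
        return ["request body is missing"]
    try:
        body = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    if not isinstance(body, dict):
        return ["request body must be a JSON object"]
    # Treat nulls and empty strings as missing, since they carry no usable value.
    return [
        f"missing required field: {name}"
        for name in required
        if body.get(name) in (None, "")
    ]
```

An empty list means the body passed all three checks; anything else lists what to fix before resubmitting.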

**Verification**
- Resubmit the request with the corrected body
- Expected response: a success status, such as `200 OK` with model metadata, `201 Created` for model creation, or `202 Accepted` for asynchronous operations

### Problem 2: Model Training Failed

**Symptoms**
- Error message: `Algorithm.Model.TrainingHasNotSuccess`
- Behavior: Subsequent operations (e.g., prediction) fail even after waiting
- Context: Occurs after initiating model training but before successful completion

**Root Cause**
The model training process encountered an unrecoverable error due to invalid configuration, insufficient data, or internal system issues. The model remains in a failed state until retrained successfully.

**Solution**
1. Check the training logs for detailed failure reasons (via model status API)
2. Verify that training data meets minimum requirements:
   - Enough days of history (otherwise `Algorithm.Data.NotEnoughDay`)
   - Enough page views on the last day (otherwise `Algorithm.Data.NotEnoughPvAtLastDay`)
   - Valid behavioral data types (otherwise `Algorithm.Data.TooManyInvalidBhvType`)
3. Correct any data or configuration issues
4. Retrigger training using the model retrain API

**Verification**
- Poll the model status endpoint until status changes to `TRAINING_SUCCESS`
- Expected status progression: `TRAINING` → `TRAINING_SUCCESS`
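
The polling step can be sketched as a small loop. This is an illustration under assumptions rather than an official client: `fetch_status` is a hypothetical caller-supplied function that wraps the model status API and returns the status string, and the interval and timeout defaults are arbitrary. It stops early on `TRAINING_FAILED` so a failed run is not polled until the timeout.

```python
import time

# Status that means training will not succeed without intervention.
TERMINAL_FAILURES = {"TRAINING_FAILED"}


def wait_for_training(fetch_status, interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll until the model reaches TRAINING_SUCCESS.

    fetch_status is a hypothetical callable returning the current status string.
    sleep is injectable so the loop can be exercised without real delays.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status == "TRAINING_SUCCESS":
            return status
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"training ended in state {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"still {status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

A `RuntimeError` here is the cue to return to step 1 of the solution and inspect the failure reason before retraining.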

### Problem 3: Insufficient Training Data

**Symptoms**
- Error messages:
  - `Algorithm.Data.NoData`
  - `Algorithm.Data.NotEnoughDay`
  - `Algorithm.Data.NotEnoughPvAtLastDay`
  - `Algorithm.Data.NotEnoughIpvAtLastDay`
- Behavior: Training request fails immediately or times out without progress
- Context: Attempting to train a model without adequate historical interaction data

**Root Cause**
OpenSearch Model Management enforces minimum data requirements to ensure model quality. Training cannot proceed if:
- No data is available for the specified app group
- Historical data spans fewer days than the minimum threshold
- Recent traffic volume (PVs/IPVs) is below required levels

**Solution**
1. Confirm data ingestion is working correctly for your app group
2. Wait until you have at least the minimum required days of data (typically 7+ days)
3. Ensure the most recent day of data has sufficient user interactions:
   - Page views (PVs) above threshold
   - Item page views (IPVs) above threshold
4. Clean behavioral data to remove invalid `bhv_type` values if needed
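
To see which error code a given dataset is likely to trigger, the checks above can be expressed as a small pre-flight function. This is a sketch with assumed numbers: the 7-day default follows the "typically 7+ days" guidance above, while `min_pv` and `min_ipv` are placeholder values, since the actual thresholds are enforced by the service and are not stated in this guide.

```python
def check_training_data(days_of_data, pv_last_day, ipv_last_day,
                        min_days=7, min_pv=1000, min_ipv=100):
    """Map data-summary numbers to the error codes they would likely trigger.

    min_pv and min_ipv are placeholders, not documented thresholds.
    """
    if days_of_data == 0:
        return ["Algorithm.Data.NoData"]
    codes = []
    if days_of_data < min_days:
        codes.append("Algorithm.Data.NotEnoughDay")
    if pv_last_day < min_pv:
        codes.append("Algorithm.Data.NotEnoughPvAtLastDay")
    if ipv_last_day < min_ipv:
        codes.append("Algorithm.Data.NotEnoughIpvAtLastDay")
    return codes
```

An empty result suggests the data should pass the minimum checks, subject to the real thresholds for your model type.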

**Verification**
- Query the data readiness endpoint for your app group
- Expected response: `"data_ready": true` and sufficient metrics in data summary

### Problem 4: Duplicate Model Name

**Symptoms**
- Error message: `Algorithm.Model.Duplicated`
- Behavior: Model creation request fails with 400 error
- Context: Attempting to create a model with a name already in use within the same app group

**Root Cause**
Model names must be unique within an app group. The system prevents accidental overwrites by rejecting duplicate names.

**Solution**
1. List existing models in the app group to confirm the name collision:
   ```http
   GET /v1/models?app_group=my-app-group
   ```
2. Choose one of these options:
   - Use a new, unique model name (e.g., append a version or timestamp)
   - Delete the existing model first if it is no longer needed:
     ```http
     DELETE /v1/models/{existing_model_id}
     ```
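
For the first option, a unique name can be derived from the names returned by the list call. The helper below is a hypothetical sketch of one naming scheme (a `-vN` version suffix); the guide does not prescribe a format, so treat the suffix convention as an assumption.

```python
import re


def next_model_name(base, existing_names):
    """Return base if unused, otherwise bump or append a -vN suffix until unique."""
    taken = set(existing_names)
    if base not in taken:
        return base
    # Reuse an existing -vN suffix if present; otherwise treat base as version 1.
    match = re.fullmatch(r"(.*)-v(\d+)", base)
    stem, version = (match.group(1), int(match.group(2))) if match else (base, 1)
    while True:
        version += 1
        candidate = f"{stem}-v{version}"
        if candidate not in taken:
            return candidate
```

For example, with `recommendation-model-v1` already taken, the function proposes `recommendation-model-v2`.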

**Verification**
- Retry model creation with the new name
- Expected response: `201 Created` with new model ID

### Problem 5: Model Not Found

**Symptoms**
- Error message: `Algorithm.Model.NotFound`
- Behavior: API operations return 404 error for a specific model ID/name
- Context: Performing operations on a model that doesn't exist or was deleted

**Root Cause**
The specified model ID or name does not correspond to any active model in the system. This can occur due to typos, using a deleted model, or referencing a model in the wrong app group.

**Solution**
1. Verify the model identifier is correct (check for typos)
2. List all models in the relevant app group:
   ```http
   GET /v1/models?app_group=my-app-group
   ```
3. If the model was deleted, recreate it with the same configuration
4. Ensure you're operating within the correct app group context
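
Typos are easier to spot when the requested name is compared against the list response. The sketch below assumes the list call returns objects with `model_name` and `model_id` keys; that response shape is an assumption for illustration and is not documented here.

```python
from difflib import get_close_matches


def locate_model(requested, models):
    """Look up a model by name in a list response; suggest near matches on a miss.

    models is assumed to be a list of dicts with "model_name" and "model_id" keys.
    Returns (model_or_None, suggestions).
    """
    by_name = {m["model_name"]: m for m in models}
    if requested in by_name:
        return by_name[requested], []
    # Up to three names that are at least 60% similar to the requested one.
    return None, get_close_matches(requested, list(by_name), n=3, cutoff=0.6)
```

If the model is missing and there are no suggestions, it was most likely deleted or belongs to a different app group.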

**Verification**
- Confirm the model appears in the list response
- Retry the original operation with the verified model ID

## FAQ

**Q: How do I check if my model is ready for prediction?**
A: Query the model status endpoint. The model must have `status: "TRAINING_SUCCESS"` before prediction operations will succeed. Models in `TRAINING`, `TRAINING_FAILED`, or `PREDICTING_FAILED` states cannot be used for inference.

**Q: What permissions are required to manage models in OpenSearch?**
A: You need authorization to perform model operations within your app group, including write access to that app group. Also confirm that your Alibaba Cloud account ID is valid: `Algorithm.Model.InvalidAliyunUserId` indicates a problem with the account ID or its permissions.

**Q: How can I debug model training failures?**
A: First check for data-related error codes (`Algorithm.Data.*`). Then verify all request parameters meet validation rules (name format, cron syntax, etc.). If the error is `InternalError`, wait a few minutes and retry; if it persists, contact support with your model ID and request details.
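
The triage order in this answer can be captured in a few lines. This is a hypothetical helper for client-side logging or alerting, not service behavior; the returned strings are illustrative, and only the error codes themselves come from this guide.

```python
def triage_error(code):
    """Suggest a next step for an error code, in the order described above."""
    if code.startswith("Algorithm.Data."):
        return "Fix the training data, then retrain."
    if code == "Algorithm.Model.ModelIsTraining":
        return "Wait for the current training run to finish."
    if code.startswith("Algorithm.Model."):
        return "Check the request parameters and model identifier."
    if code == "InternalError":
        return "Wait a few minutes and retry; contact support if it persists."
    return "Unrecognized code; contact support with the model ID and request details."
```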

**Q: What are the minimum data requirements for model training?**
A: Requirements vary by model type but generally include: (1) multiple days of historical data, (2) sufficient page views (PVs) and item page views (IPVs) on the most recent day, and (3) valid behavioral event types. The system enforces the specific thresholds and reports violations through error codes such as `Algorithm.Data.NotEnoughDay`.

**Q: Can I update a model while it's training?**
A: No. Operations on a model are blocked while it is in the `TRAINING` state (`Algorithm.Model.ModelIsTraining`). You must wait for training to complete (successfully or unsuccessfully) before making further changes.