# pai-experiment

Part of **PAI**

# Platform for AI (PAI) Experiment Management

## Capabilities Overview

| Sub-capability | Calling Mode | Description |
|--------|----------|------|
| Manage Experiments | Synchronous | Create, list, get details, stop, and delete individual experiments. |
| Manage Experiment Plans | Synchronous | Create, list, get details, update, and delete experiment plans. |
| Manage Experiment Plan Templates | Synchronous | Create, list, get details, update, and delete experiment plan templates. |
| Get Experiment Results | Synchronous | Retrieve result data from completed experiments. |
| Manage Experiment Resources | Synchronous | Control resource allocation and flow for experiments. |
| Manage Run | Synchronous | Create, delete, update, get details, or list runs within experiments. |
| Manage Experiment Label | Synchronous | Set or delete labels for experiments. |
| Manage Run Metric | Synchronous | Log or retrieve metrics associated with experiment runs. |
| Create Experiment Label | Synchronous | Create labels to categorize and track experiments. |
| Define and Create Labels | Synchronous | Define label structures and create labels for metadata entities. |
| Manage AutoFE Experiments | Synchronous | Create and retrieve details of automated feature engineering experiments. |
| Configure Auto FE Experiment | Synchronous | Configure settings for automated feature engineering experiments. |
| Manage HPO Experiments | Synchronous | Create, list, get, update, stop, and delete hyperparameter optimization experiments. |
| Manage HPO Trials | Synchronous | Get, list, stop, and restart individual trials within HPO experiments. |
| Access HPO Logs and Commands | Synchronous | Retrieve logs and commands associated with HPO experiments and trials. |
| Configure HPO Experiments | Synchronous | Set up configuration parameters for hyperparameter optimization experiments. |
| Define Hyperparameter Ranges | Synchronous | Define ranges of hyperparameters for automated tuning experiments. |

## API Calling Modes

### Authentication
The primary authentication method is Bearer Token authentication.

- **Header Format**: `Authorization: Bearer <your_api_key>`
- **Environment Variable**: `DASHSCOPE_API_KEY`
- While other methods may exist in the broader Alibaba Cloud ecosystem, Bearer Token is consistently used across all PAI Experiment Management API endpoints documented here.
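
The header format above can be assembled in a few lines. This is a minimal sketch that assumes the key is stored in `DASHSCOPE_API_KEY`; the helper name `build_auth_headers` is illustrative and not part of any SDK.

```python
import os


def build_auth_headers(env=None):
    """Return request headers carrying the Bearer token.

    `build_auth_headers` is a hypothetical helper used for illustration.
    """
    env = os.environ if env is None else env
    api_key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing early when the variable is missing avoids sending a request that would only come back as `401`.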

### Service Endpoint
The APIs use region-specific endpoints following this pattern:

`https://api.aliyun.com/api/{service}/{version}` for China regions  
`https://api.alibabacloud.com/api/{service}/{version}` for international regions

Common service paths include:
- `AIWorkSpace/2021-02-04` for core experiment and run management
- `eflo-cnp/2023-08-28` for experiment plans, templates, and results
- `paiAutoML/2022-08-28` for AutoFE and HPO operations

Common regions referenced are `cn-hangzhou`, `cn-shanghai`, and `cn-beijing`.
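
As a sketch of how the host, service path, and operation name combine into a full URL, the helper below joins the pieces listed above. The names `HOSTS` and `build_url` are hypothetical, and appending the operation name as the last path segment is an assumption based on the URLs used in the code examples in this document.

```python
# Hosts taken from the endpoint pattern above.
HOSTS = {
    "china": "https://api.aliyun.com",
    "international": "https://api.alibabacloud.com",
}


def build_url(service_path, operation, site="china"):
    """Join host, service path (e.g. 'AIWorkSpace/2021-02-04'), and operation."""
    if site not in HOSTS:
        raise ValueError(f"unknown site: {site!r}")
    return f"{HOSTS[site]}/api/{service_path.strip('/')}/{operation.strip('/')}"


print(build_url("AIWorkSpace/2021-02-04", "CreateExperiment"))
# https://api.aliyun.com/api/AIWorkSpace/2021-02-04/CreateExperiment
print(build_url("paiAutoML/2022-08-28", "ListHpoTrialLogs", site="international"))
# https://api.alibabacloud.com/api/paiAutoML/2022-08-28/ListHpoTrialLogs
```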

### Synchronous API Pattern
All operations in the PAI Experiment Management domain follow a synchronous calling pattern. This means:
1. The client sends an HTTP request (GET, POST, PUT, or DELETE) to the specific endpoint with required parameters and headers.
2. The server processes the request immediately.
3. The server returns a JSON response containing either the requested data (for GET/LIST operations) or a confirmation of the operation's success/failure (for CREATE/UPDATE/DELETE operations), along with a unique `RequestId`.
4. There is no need for polling or handling asynchronous job IDs; the response is final upon receipt.
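
Steps 3 and 4 can be sketched as a small helper that treats the response as final and pulls out the `RequestId` for later troubleshooting. The function name `interpret_response` is hypothetical, and treating any 2xx status as success is an assumption rather than documented behavior.

```python
import json


def interpret_response(status_code, body_text):
    """Split a synchronous response into (ok, request_id, payload)."""
    try:
        payload = json.loads(body_text) if body_text else {}
    except json.JSONDecodeError:
        # Keep the raw body so it can still be logged or attached to a ticket.
        payload = {"raw": body_text}
    ok = 200 <= status_code < 300
    request_id = payload.get("RequestId") if isinstance(payload, dict) else None
    return ok, request_id, payload
```

Because no job ID is returned, the caller can act on `ok` immediately instead of scheduling a follow-up status check.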

## Parameter Reference

### Manage Experiments

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| ExperimentId | string / integer | True | | | The unique identifier for the experiment. |
| WorkspaceId | string | True | | | The ID of the workspace where the experiment resides. |
| Name | string | True | | 1 to 63 characters; start with letter; contain only letters, digits, '_', '-'; case-sensitive | The name of the experiment. |
| ArtifactUri | string | False | | must be OSS path format (e.g., oss://bucket.region.aliyuncs.com/path) | The default output path of artifacts for all tasks associated with the experiment. Only OSS paths are supported. |
| Accessibility | string | False | PRIVATE | one of: PRIVATE, PUBLIC | The visibility of the experiment. PRIVATE is visible only to the creator and their Alibaba Cloud account; PUBLIC is visible to all users. |
| Verbose | boolean | False | false | one of: true, false | Specifies whether to retrieve the latest run information for the experiment. |
| ResourceGroupId | string | False | | | The ID of the resource group. |
| Order | integer | False | | | The order for listing experiments. |
| PageNumber | integer | False | | | The page number for pagination (starts from 1). |
| PageSize | integer | False | | | The number of entries per page. |
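
The `Name`, `ArtifactUri`, and `Accessibility` constraints in the table can be checked on the client before a request is sent. The sketch below is one way to do that; `validate_experiment` is a hypothetical helper, and it assumes "letters" means ASCII letters and that an OSS path only needs the `oss://` prefix.

```python
import re

# 1 to 63 characters: a leading letter, then letters, digits, '_' or '-'.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,62}")


def validate_experiment(name, artifact_uri=None, accessibility="PRIVATE"):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("Name must be 1-63 characters, start with a letter, "
                        "and contain only letters, digits, '_' or '-'")
    if artifact_uri is not None and not artifact_uri.startswith("oss://"):
        problems.append("ArtifactUri must be an OSS path (oss://...)")
    if accessibility not in ("PRIVATE", "PUBLIC"):
        problems.append("Accessibility must be PRIVATE or PUBLIC")
    return problems
```

Returning all problems at once, rather than raising on the first, lets a caller report every invalid field in a single pass.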

### Manage Experiment Plans

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| PlanId | integer | True | | | The ID of the experiment plan. |
| TemplateId | integer | True | | | The ID of the template used to create the plan. |
| ResourceId | integer | True | | | The ID of the resource allocated to the plan. |
| ExternalParams | object | False | | | Additional parameters for the plan. |
| ResourceGroupId | string | False | | | The ID of the resource group. |
| Tag | array<object> | False | | | Resource tags for the plan. |
| PlanTemplateName | string | False | | | The name of the plan template. |
| CreatTimeOrder | string | False | desc | one of: asc\|desc | The sort order for the creation time. |
| StartTimeOrder | string | False | desc | one of: asc\|desc | The sort order for the start time. |
| EndTimeOrder | string | False | desc | one of: asc\|desc | The sort order for the end time. |
| Page | integer | False | 1 | min 1 | The page number. |
| Size | integer | False | 100 | max 100 | The number of entries per page. |
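
The `Page` and `Size` parameters suggest a simple paging loop. The sketch below assumes a hypothetical callable `fetch_page(page, size)` that performs the list request and returns that page's plans as a list; the response shape of the real API is not described here, so the loop stops on the first page that comes back shorter than `Size`.

```python
def iter_plans(fetch_page, size=100):
    """Yield plans page by page until a short or empty page is returned.

    `fetch_page(page, size)` is a hypothetical callable supplied by the caller.
    """
    if not 1 <= size <= 100:
        raise ValueError("Size must be between 1 and 100")
    page = 1  # Page numbering starts at 1 per the table above.
    while True:
        items = fetch_page(page, size)
        yield from items
        if len(items) < size:
            break
        page += 1
```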

### Manage HPO Experiments

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| ExperimentId | string | True | | | The ID of the HPO experiment. |
| Name | string | False | | | The name of the experiment. |
| WorkspaceId | string | False | | | The workspace ID. |
| Description | string | False | | | The description of the experiment. |
| Accessibility | string | False | | one of: PUBLIC, PRIVATE | The visibility of the code configuration. |
| Tag | array<object> | False | | | Tag information for the experiment. |
| HpoExperimentConfiguration | HpoExperimentConfig | False | | | The parameter struct of the HPO experiment configuration. |
| MinCreateTime | string | False | | Format: yyyy-mm-dd hh:mm:ss or yyyy-mm-dd | The earliest experiment creation time. |
| MaxCreateTime | string | False | | Format: yyyy-mm-dd hh:mm:ss or yyyy-mm-dd | The latest experiment creation time. |
| Status | string | False | | One of: CREATED, RUNNING, FINISHED, FAILED, EARLY_STOPPED, USER_CANCELED, SYS_CANCELED, WAITING, NO_MORE_TRIAL, UNKNOWN | The status of the experiment. |
| IncludeConfigData | string | False | | One of: True, False | Specifies whether to include experiment configuration data in the response. |
| SortBy | string | False | | | The filter used to sort experiments. |
| Order | string | False | DESC | One of: ASC, DESC | The sorting order. |
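
The list filters above have fixed formats that are easy to get wrong by hand. The following sketch builds the query parameters from Python values; `hpo_list_params` is a hypothetical helper, and it always emits the `yyyy-mm-dd hh:mm:ss` form of the timestamp.

```python
from datetime import datetime

HPO_STATUSES = {
    "CREATED", "RUNNING", "FINISHED", "FAILED", "EARLY_STOPPED",
    "USER_CANCELED", "SYS_CANCELED", "WAITING", "NO_MORE_TRIAL", "UNKNOWN",
}


def hpo_list_params(status=None, min_create=None, max_create=None, order="DESC"):
    """Build query parameters for listing HPO experiments."""
    if order not in ("ASC", "DESC"):
        raise ValueError("Order must be ASC or DESC")
    params = {"Order": order}
    if status is not None:
        if status not in HPO_STATUSES:
            raise ValueError(f"unsupported Status: {status}")
        params["Status"] = status
    # Timestamps use the 'yyyy-mm-dd hh:mm:ss' form from the table.
    if min_create is not None:
        params["MinCreateTime"] = min_create.strftime("%Y-%m-%d %H:%M:%S")
    if max_create is not None:
        params["MaxCreateTime"] = max_create.strftime("%Y-%m-%d %H:%M:%S")
    return params
```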

### Manage Run Metric

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| RunId | string | True | | | The ID of the run. |
| Key | string | True | | | The key of the metric for the run. |
| PageToken | integer | False | 0 | range 0 to max | The paging token. |
| MaxResults | integer | False | 10 | max 100 | The maximum number of results to return. |
| Metrics | array | False | | | The list of metrics to log. |

### Configure Auto FE Experiment

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| data_type | string | False | | odps, oss | The type of the data source. |
| action | string | False | | fs_train, train, analyze, pipeline, selection, transform | The operations of the experiment. |
| iv_thresh | string | False | 0.02 | | The information value threshold for feature selection. |
| aggregate_only | string | False | | true / false | Specifies whether to perform only aggregate data analysis. |
| debug_mode | string | False | | | Specifies whether to enable the debug mode. |
| workers | string | False | | | The number of workers to be used in AutoFE. |
| memory | string | False | | | The memory usage of each worker. |
| cpu | string | False | | | The CPU usage of each worker. |
| label | string | False | | | The name of the label column of the input data. |
| data_source | string | False | | | The name of the data source. |
| exclude_columns | string | False | | | The name of the column to be ignored (e.g., ID columns). |
| sample_size | string | False | | | The number of data samples (for large datasets). |
| sample_ratio | string | False | | | The ratio of data samples (for large datasets). |
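
Every AutoFE setting in the table is typed as a string, including numeric and boolean ones. A small conversion step, sketched below with the hypothetical helper `autofe_config`, can keep calling code working with native Python values while still sending strings. Lowercasing booleans to `true` / `false` is an assumption based on the constraint shown for `aggregate_only`.

```python
DATA_TYPES = {"odps", "oss"}
ACTIONS = {"fs_train", "train", "analyze", "pipeline", "selection", "transform"}


def autofe_config(**settings):
    """Validate enumerated fields and convert every value to a string."""
    if "data_type" in settings and settings["data_type"] not in DATA_TYPES:
        raise ValueError("data_type must be 'odps' or 'oss'")
    if "action" in settings and settings["action"] not in ACTIONS:
        raise ValueError(f"unsupported action: {settings['action']}")
    config = {}
    for key, value in settings.items():
        # Booleans are lowercased to match the 'true / false' form in the table.
        config[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return config
```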

### Define Hyperparameter Ranges

| Parameter | Type | Required | Default | Constraints | Description |
|------|------|------|--------|------|------|
| Enum | array | False | | | Hyperparameter enumeration list. |
| MinLength | integer | False | | | Minimum length for string hyperparameters. |
| MaxLength | integer | False | | | Maximum length for string hyperparameters. |
| Minimum | string | False | | | Minimum value for numeric hyperparameters. |
| Maximum | string | False | | | Maximum value for numeric hyperparameters. |
| ExclusiveMinimum | boolean | False | | | Whether the minimum value is exclusive. |
| ExclusiveMaximum | boolean | False | | | Whether the maximum value is exclusive. |
| Pattern | string | False | | | Regular expression for string hyperparameters. |
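
To illustrate how these fields constrain a candidate value, the sketch below checks a value against a range definition expressed as a plain dictionary. The helper `check_value` is hypothetical. It assumes `Pattern` is applied as a search (as in JSON Schema) and that `Minimum` and `Maximum`, which the table types as strings, hold decimal numbers.

```python
import re


def check_value(value, spec):
    """Return True if `value` satisfies the range definition in `spec`."""
    if "Enum" in spec and value not in spec["Enum"]:
        return False
    if isinstance(value, str):
        if "MinLength" in spec and len(value) < spec["MinLength"]:
            return False
        if "MaxLength" in spec and len(value) > spec["MaxLength"]:
            return False
        if "Pattern" in spec and re.search(spec["Pattern"], value) is None:
            return False
        return True
    # Minimum and Maximum are typed as strings in the table, so parse them.
    if "Minimum" in spec:
        low = float(spec["Minimum"])
        if value < low or (spec.get("ExclusiveMinimum") and value == low):
            return False
    if "Maximum" in spec:
        high = float(spec["Maximum"])
        if value > high or (spec.get("ExclusiveMaximum") and value == high):
            return False
    return True
```

For example, a learning-rate range of `{"Minimum": "0.0001", "Maximum": "0.1", "ExclusiveMaximum": True}` accepts `0.01` and rejects `0.1`.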

## Code Examples

### Create an Experiment - Python - All Regions

```python
import requests

url = "https://api.aliyun.com/api/AIWorkSpace/2021-02-04/CreateExperiment"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "WorkspaceId": "478**",
    "Name": "exp-test",
    "ArtifactUri": "oss://test-bucket.oss-cn-hangzhou.aliyuncs.com/test",
    "Accessibility": "PRIVATE"
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

### List Experiments with Filtering - curl - All Regions

```bash
curl -X GET 'https://api.aliyun.com/api/AIWorkSpace/2021-02-04/ListExperiment?Name=exp-test&WorkspaceId=1517**&Labels=is_evaluation:true&SortBy=GmtCreateTime&Order=DESC&PageNumber=1&PageSize=10' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json'
```

### Log Metrics for a Run - curl - All Regions

```bash
curl -X POST 'https://api.aliyun.com/api/AIWorkSpace/2021-02-04/runs/run-1qJhzJ2YXgX****/metrics/action/log' \
-H "Authorization: Bearer <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
  "Metrics": [
    {
      "name": "accuracy",
      "value": 0.95
    }
  ]
}'
```

### Create an AutoFE Experiment - HTTP - All Regions

```http
POST /api/automl/v1/autofe/experiment HTTP/1.1
Host: api.alibabacloud.com
Content-Type: application/json
Authorization: Bearer <your-api-key>

{
  "Name": "my experiment x",
  "Description": "This is an AutoFE experiment.",
  "Accessibility": "PUBLIC",
  "WorkspaceId": "283301",
  "Tag": [
    {
      "Key": "group",
      "Value": "group_name"
    }
  ]
}
```

### Retrieve HPO Trial Logs - Python - All Regions

```python
import os

import requests

url = "https://api.alibabacloud.com/api/paiAutoML/2022-08-28/ListHpoTrialLogs"
params = {
    "ExperimentId": "abcde",
    "TrialId": "asdf",
    "LogName": "trial.log",
    "PageNumber": 1,
    "PageSize": 10
}
headers = {
    # Read the key from the environment; Python does not expand "$VAR" in strings.
    "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
    "Content-Type": "application/json"
}

response = requests.get(url, params=params, headers=headers)
print(response.json())
```

### Delete an Experiment - Python - All Regions

```python
import requests

url = "https://api.aliyun.com/api/AIWorkSpace/2021-02-04/experiments/exp-1zpfthdx******"
headers = {
    "Authorization": "Bearer <your-api-key>",
    "Content-Type": "application/json"
}

response = requests.delete(url, headers=headers)
print(response.json())
```

### Update Experiment Labels - Python - All Regions

```python
import requests

url = "https://api.aliyun.com/api/AIWorkSpace/2021-02-04/experiments/exp-1zpfthdx******/labels"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "Labels": [
        {"Key": "project", "Value": "ml-training"}
    ]
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

### Stop an HPO Experiment - HTTP - All Regions

```http
PUT /api/automl/v1/hpo/experiment/{ExperimentId}/stop HTTP/1.1
Host: api.alibabacloud.com
Authorization: Bearer <your-api-key>
Content-Type: application/json

{}
```

## Response Format

```json
{
  "Data": {
    "Task": {
      "TaskId": 167420,
      "CreateTime": 1710000000,
      "UpdateTime": 1710000000,
      "StartTime": 1710000000,
      "EndTime": 1710003600,
      "Params": {},
      "Scene": "baseline",
      "Status": "success"
    },
    "Workload": {
      "WorkloadId": 13,
      "WorkloadName": "test",
      "WorkloadDescription": "test",
      "WorkloadType": "AI",
      "Family": "AI",
      "Scene": "NLP-LLM",
      "Scope": "common",
      "JobKind": "PyTorchJob",
      "DefaultCpuPerWorker": 90,
      "DefaultGpuPerWorker": 8,
      "DefaultMemoryPerWorker": 500,
      "DefaultShareMemory": 500,
      "ParamSettings": [
        {
          "ParamName": "ITERATION",
          "ParamDesc": "number",
          "ParamValue": "100",
          "DefaultValue": "100",
          "ParamRegex": "[0-9]+",
          "ParamType": "number"
        }
      ],
      "StaticConfig": {
        "FrameWork": "pyTorch",
        "SoftwareStack": "python",
        "Os": "linux",
        "Parameters": "7B"
      },
      "VersionId": 1
    },
    "Resource": {
      "ResourceId": 189,
      "ResourceName": "cluster-abc"
    }
  },
  "RequestId": "E67E2E4C-2B47-5C55-AA17-1D771E070AEF",
  "AccessDeniedDetail": "{}",
  "TotalCount": 0
}
```

**Key Fields**:
- `Data.Task.TaskId`: The unique identifier for the task associated with the experiment.
- `Data.Task.Status`: The current status of the experiment task (e.g., "success", "running").
- `Data.Workload.WorkloadName`: The name of the workload configuration used for the experiment.
- `Data.Resource.ResourceName`: The name of the computing resource allocated to the experiment.
- `Data.Results.SamplesPerSecond`: Performance metric indicating throughput. (The `Data.Results` fields listed here do not appear in the sample response above.)
- `Data.Results.Mfu`: Model FLOPs utilization, a measure of hardware efficiency.
- `Data.Results.WarningWorker[].Hostname`: Hostnames of nodes reporting warnings during execution.
- `Data.Results.ErrorWorker[].Hostname`: Hostnames of nodes reporting errors during execution.
- `RequestId`: A unique identifier for the API request, useful for troubleshooting.
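
Some of these fields may be missing from a given response; the sample above, for instance, has no `Data.Results` object. A defensive lookup, sketched here with the hypothetical helper `dig`, avoids a `KeyError` when a path is absent. The `sample` dictionary is a trimmed copy of the response shown above.

```python
def dig(response, path, default=None):
    """Follow a dotted path such as 'Data.Task.Status' through nested dicts."""
    current = response
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


sample = {
    "Data": {"Task": {"TaskId": 167420, "Status": "success"}},
    "RequestId": "E67E2E4C-2B47-5C55-AA17-1D771E070AEF",
}
print(dig(sample, "Data.Task.Status"))  # success
print(dig(sample, "Data.Results.Mfu"))  # None
```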

## Error Handling

| Error Code | Description | Recommended Action |
|---------------|--------------------|-----------------------------|
| 400 | Bad Request - The request parameters are invalid or missing. | Validate all request parameters against the API specification. Check for correct data types, required fields, and format constraints (e.g., name format, OSS URI). |
| 401 | Unauthorized - Authentication failed. Check your API key or credentials. | Ensure the `Authorization: Bearer <your_api_key>` header is included and that the API key is valid and active. Verify the `DASHSCOPE_API_KEY` environment variable is set correctly. |
| 403 | Forbidden - The user does not have sufficient permissions to access the requested resource. | Check your RAM (Resource Access Management) policies to ensure your account has the necessary permissions (e.g., `paiexperiment:CreateExperiment`, `paiautoml:ListHpoExperiments`) for the requested operation and resources. |
| 404 | Not Found - The specified resource (e.g., experiment, plan, workspace) does not exist. | Verify that the provided IDs (ExperimentId, PlanId, WorkspaceId, etc.) are correct and that the resource has not been deleted. |
| 429 | Too Many Requests - Rate limit exceeded. Wait before retrying. | Implement exponential backoff in your client code. Reduce the frequency of your requests. |
| 500 | Internal Server Error - An unexpected error occurred on the server side. | Retry the request after a short delay. If the problem persists, contact Alibaba Cloud support with the `RequestId` from the error response. |
| 503 | Service Unavailable - The service is temporarily unavailable due to overload or maintenance. | Wait for a period of time and then retry the request. |

### Rate Limits & Retry
The system enforces rate limits to ensure service stability. Common limits observed are 100 QPS (Queries Per Second) per user/account for many endpoints, while some specific operations may have lower limits (e.g., 10 QPS).

- **Retry Strategy**: For `429` errors, implement an exponential backoff strategy. Start with a 1-second delay, then double the delay for each subsequent retry (e.g., 1s, 2s, 4s, 8s), up to a maximum of 30 seconds.
- **Handling `Retry-After`**: While not explicitly mentioned in all docs, it's good practice to check for a `Retry-After` header in `429` responses and respect its value if present.
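
The retry strategy above can be sketched as follows. The names `backoff_delays` and `call_with_retry` are hypothetical, `send` stands in for whatever function performs the HTTP request and returns `(status_code, headers, body)`, and retrying `500` and `503` in addition to `429` follows the recommendations in the error table. The sketch assumes `Retry-After` carries a number of seconds and falls back to the computed delay otherwise.

```python
import time

RETRYABLE = {429, 500, 503}


def backoff_delays(max_retries=6, base=1.0, cap=30.0):
    """Return the delay schedule: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return [min(base * 2 ** attempt, cap) for attempt in range(max_retries)]


def call_with_retry(send, max_retries=6, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status or retries run out.

    `send` is a hypothetical callable returning (status_code, headers, body).
    """
    for delay in backoff_delays(max_retries):
        status, headers, body = send()
        if status not in RETRYABLE:
            return status, body
        # Prefer the server's Retry-After value (in seconds) when it parses.
        try:
            wait = float(headers.get("Retry-After", delay))
        except ValueError:
            wait = delay
        sleep(wait)
    status, _headers, body = send()  # Final attempt after the last delay.
    return status, body
```

Passing `sleep` as a parameter keeps the function easy to exercise in tests without real waiting.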

## Environment Requirements

- **Authentication Setup**: Export your API key as an environment variable: `export DASHSCOPE_API_KEY=your_api_key_here`.
- **HTTP Client**: Any standard HTTP client library can be used (e.g., Python `requests`, Java `HttpClient`, `curl`).
- **Python Version**: No specific version is mandated, but a modern version (3.6+) is recommended for compatibility with common HTTP libraries.

## FAQ

Q: How do I authenticate my API requests to the PAI Experiment Management service?
A: You must include an `Authorization: Bearer <your_api_key>` header in every request. Your API key can be managed in the Alibaba Cloud console, and it's recommended to store it in the `DASHSCOPE_API_KEY` environment variable.

Q: What is the difference between an Experiment, an Experiment Plan, and an Experiment Plan Template?
A: An **Experiment Plan Template** is a reusable blueprint defining a sequence of workloads and their configurations. An **Experiment Plan** is a concrete instance created from a template and bound to specific resources. An **Experiment** is a higher-level container for tracking runs, metrics, and artifacts; it is often used alongside training jobs and can be linked to plans.

Q: How can I track the performance of my machine learning runs?
A: Use the `Manage Run Metric` capability. You can log custom metrics (like accuracy or loss) during your run using the `LogRunMetrics` API, and later retrieve these metrics with `ListRunMetrics` for analysis and visualization.

Q: Are there any free tiers or quotas for these APIs?
A: Yes, many APIs offer a free tier (e.g., 100 or 1000 free requests per month) before standard per-request pricing applies. Additionally, there are rate limits (e.g., 100 QPS) to prevent abuse. Check the specific pricing details for each API operation.

Q: Can I automate hyperparameter tuning with these APIs?
A: Absolutely. The `Manage HPO Experiments` and `Manage HPO Trials` capabilities allow you to fully automate hyperparameter optimization. You can define a search space, launch an HPO experiment, monitor trial progress, access logs, and retrieve the best-performing hyperparameters programmatically.

## Pricing & Billing

### Billing Model
The PAI Experiment Management API uses a **per-request** billing model. Each successful API call (e.g., `CreateExperiment`, `ListHpoTrials`, `LogRunMetrics`) is counted as one request and billed accordingly. Failed requests may also be counted depending on the specific API.

### Price Reference

| API Operation | Input Price | Output Price | Other Fees |
|-----------|---------|---------|---------|
| CreateExperiment | 0.001 / | | |
| ListExperiment | 0.001 / | 0.001 / | |
| GetExperiment | 0.0001 / | 0.0001 / | |
| CreateRun | 0.001 / | 0.001 / | |
| LogRunMetrics | 0.0001 / | 0.0001 / | |
| CreateHpoExperiment | 0.001 / | 0.002 / | |
| hpo_experiment | 0.002 / | 0.002 / | |

### Free Tier
Most APIs include a monthly free tier:
- `CreateExperiment`: 100 free calls/month
- `ListExperiment`: 1000 free requests/month
- `GetExperiment`: 1000 free requests/month
- `CreateRun`: 100 free requests/month
- `LogRunMetrics`: 1000 free calls/month
- `CreateHpoExperiment`: 100 free calls/month
- HPO Experiment Configuration: 100 free calls/month
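
As a rough illustration of how the free tier interacts with monthly usage, the sketch below subtracts the allowances listed above from a usage count per operation. The helper `billable_calls` is hypothetical. It assumes the allowance is deducted from the monthly total for each operation and that operations not listed have no free tier; actual billing rules may differ.

```python
# Monthly free allowances copied from the list above.
FREE_CALLS = {
    "CreateExperiment": 100,
    "ListExperiment": 1000,
    "GetExperiment": 1000,
    "CreateRun": 100,
    "LogRunMetrics": 1000,
    "CreateHpoExperiment": 100,
}


def billable_calls(usage):
    """Map each operation to the calls left after subtracting its free tier."""
    return {op: max(0, count - FREE_CALLS.get(op, 0)) for op, count in usage.items()}
```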

### Usage Limits
- **Rate Limits**: Commonly 100 QPS per user/account, with some operations limited to 10 QPS.
- **Pagination**: List operations often cap the number of results per page (e.g., max 100).
- **HPO Specific**: HPO experiments may have limits on the maximum number of trials (e.g., 100 trials per experiment).

### Billing Notes
- Billing occurs at the end of the month based on total usage.
- Each API call counts as one request, regardless of the amount of data returned.
- For HPO and AutoFE, the API call to create the experiment is billed separately from the underlying compute resources used to run the trials or feature engineering jobs.