# pai-dataset

Part of **PAI**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for [Manage and process training datasets](../../intent/pai-manage-data/SKILL.md). If you're unsure which path to take, check the routing skill first.

# Platform for AI (PAI) Dataset Acceleration

## Capabilities Overview

| Sub-capability | Calling Mode | Description |
|----------------|--------------|-------------|
| Describe Endpoint | Synchronous | Queries the details of a specified endpoint for dataset acceleration, including metadata such as creation time, status, network configuration, and ownership information. |
| Unbind Endpoint | Synchronous | Removes the association between mount points and dataset acceleration slots in the Platform for AI (PAI) service. |
| Update Slot | Synchronous | Updates the information for a dataset acceleration slot, including name, description, tags, storage type, storage URI, capacity, and lifecycle. |
| Slot Lifecycle | Synchronous | Configures the lifecycle management policy for a dataset acceleration slot (e.g., KeepAlive, RelativeTime). |
| Slot Status | Synchronous | Provides detailed status information about a dataset acceleration slot, including loaded file count and size. |

## API Calling Patterns

### Authentication
The primary authentication method is **Bearer Token** using an API key.

- Include the header: `Authorization: Bearer $DASHSCOPE_API_KEY`
- Set the environment variable: `DASHSCOPE_API_KEY=your_api_key_here`
- Some endpoints may accept AccessKey ID/Secret in the format `Authorization: Bearer $ACCESS_KEY_ID:$ACCESS_KEY_SECRET`, but the Bearer token with `DASHSCOPE_API_KEY` is recommended for consistency.
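
As a minimal sketch of the header construction described above, the helper below reads the key from the environment and builds the `Authorization` header. The function name `build_auth_headers` is hypothetical and not part of any PAI SDK.

```python
import os


def build_auth_headers(env_var: str = "DASHSCOPE_API_KEY") -> dict:
    """Return headers carrying the Bearer token read from the environment.

    `build_auth_headers` is an illustrative helper, not an official API.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early instead of sending a request with an empty token.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

Failing early on a missing variable avoids sending a request with an empty token.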

### Service Endpoint
APIs use region-specific base URLs:

- China regions: `https://api.aliyun.com/api/PAIElasticDatasetAccelerator/2022-08-01/...`
- International regions: `https://api.alibabacloud.com/api/PAIElasticDatasetAccelerator/2022-08-01/...`

Common regions include `cn-hangzhou`, `cn-shanghai`, and `ap-southeast-1`. The exact endpoint path varies by operation (see code examples).
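
One way to keep the two base URLs in a single place is sketched below. The `site` keys (`"china"`, `"international"`) and the function name `resource_url` are assumptions made for illustration; only the base URLs come from this document.

```python
from urllib.parse import quote

# Base URLs as listed above; the dictionary keys are illustrative labels.
BASE_URLS = {
    "china": "https://api.aliyun.com/api/PAIElasticDatasetAccelerator/2022-08-01",
    "international": "https://api.alibabacloud.com/api/PAIElasticDatasetAccelerator/2022-08-01",
}


def resource_url(site: str, *path_segments: str) -> str:
    """Join the base URL for `site` with URL-encoded path segments."""
    if site not in BASE_URLS:
        raise ValueError(f"Unknown site {site!r}; expected one of {sorted(BASE_URLS)}")
    encoded = [quote(segment, safe="") for segment in path_segments]
    return "/".join([BASE_URLS[site], *encoded])
```

For example, `resource_url("international", "endpoints", endpoint_id)` yields the URL used in the international Describe Endpoint example.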

### Synchronous Request Pattern
All operations in this domain follow a **Synchronous** calling pattern:
1. Send an HTTP request (GET, PUT, or DELETE) to the specific resource endpoint
2. Include required headers (`Authorization`, `Content-Type` for PUT)
3. Receive a JSON response immediately (no polling or async handling needed)
4. Parse the response body for results or error details
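
The four steps above can be sketched with the Python standard library. The helper names `build_request` and `send` are hypothetical, and the rule that only PUT carries a JSON body is an assumption drawn from step 2.

```python
import json
import urllib.request

ALLOWED_METHODS = {"GET", "PUT", "DELETE"}


def build_request(method, url, api_key, body=None):
    """Steps 1 and 2: create the HTTP request with the required headers."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"Unsupported method: {method}")
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # Content-Type is only needed when a JSON body is sent (PUT).
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request, timeout=10.0):
    """Steps 3 and 4: wait for the response and parse its JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx status codes.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes the first half testable without network access.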

## Parameter Reference

### Describe Endpoint

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| EndpointId | string | Yes | — | — | The ID of the dataset acceleration slot mount target. To get a mount target ID, see ListEndpoints. |

### Unbind Endpoint

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| EndpointId | string | Yes | — | — | The ID of the dataset acceleration slot mount target. For information about how to obtain the mount target ID, see ListEndpoints. |
| SlotId | string | Yes | — | — | The ID of the dataset acceleration slot. For information about how to obtain the dataset acceleration slot ID, see ListSlots. |

### Update Slot

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| SlotId | string | Yes | — | — | The slot ID. For more information, see ListSlots. |
| Name | string | No | — | max 64 characters | The name of the slot. |
| Description | string | No | — | max 1024 characters | The description of the slot. |
| Tags | array<object> | No | — | — | The custom tags for the slot. |
| Key | string | No | — | max 64 characters | The key of the tag (a field of each `Tags` item). |
| Value | string | No | — | max 64 characters | The value of the tag (a field of each `Tags` item). |
| StorageType | string | Yes | — | one of: OSS, NAS | The data storage type for the slot. Valid values: `OSS` (OSS file storage), `NAS` (NAS file storage). |
| StorageUri | string | Yes | — | — | The resource identifier for the data in the slot. The format varies based on the data type. |
| Capacity | string | Yes | — | format: quantity (e.g., 30.0G) | The maximum capacity of the slot. The string must be in the format specified in Quantity. |
| LifeCycle | SlotLifeCycle | No | — | — | The lifecycle of the slot. |
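
A client-side check of the constraints in this table might look like the sketch below. It assumes `SlotId` travels in the URL path (as in the curl example) rather than in the body, and it does not validate the `Capacity` quantity format. The function name `validate_update_slot` is hypothetical.

```python
REQUIRED_FIELDS = ("StorageType", "StorageUri", "Capacity")
MAX_LENGTHS = {"Name": 64, "Description": 1024}
STORAGE_TYPES = {"OSS", "NAS"}
TAG_PART_LIMIT = 64


def validate_update_slot(body: dict) -> list:
    """Return a list of problems found in an Update Slot request body."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not body.get(field):
            problems.append(f"{field} is required")
    for field, limit in MAX_LENGTHS.items():
        value = body.get(field)
        if value is not None and len(value) > limit:
            problems.append(f"{field} exceeds {limit} characters")
    storage_type = body.get("StorageType")
    if storage_type and storage_type not in STORAGE_TYPES:
        problems.append("StorageType must be OSS or NAS")
    for index, tag in enumerate(body.get("Tags", [])):
        for part in ("Key", "Value"):
            if len(tag.get(part, "")) > TAG_PART_LIMIT:
                problems.append(f"Tags[{index}].{part} exceeds {TAG_PART_LIMIT} characters")
    return problems
```

An empty list means the body passed these local checks; the service may still apply rules that are not listed in the table.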

### Slot Lifecycle

| Parameter | Type | Required | Default | Constraints | Description |
|----------|------|----------|---------|-------------|-------------|
| Type | string | Yes | — | One of: KeepAlive, RelativeTime, AbsoluteTime, MaximumIdleTime | The type of dataset acceleration slot lifecycle. |
| Config | string | No | `{}` | JSON string format | Configuration of the dataset acceleration slot lifecycle. The structure varies by type. |
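
Because `Config` is a JSON string rather than a nested object, it has to be serialized before it is placed in the request. The sketch below shows one way to do that. The helper name `build_lifecycle` is hypothetical, and the structure of `Config` for each type is not documented here, so any keys passed in are the caller's assumption.

```python
import json

LIFECYCLE_TYPES = {"KeepAlive", "RelativeTime", "AbsoluteTime", "MaximumIdleTime"}


def build_lifecycle(lifecycle_type, config=None):
    """Build a SlotLifeCycle object with Config serialized as a JSON string."""
    if lifecycle_type not in LIFECYCLE_TYPES:
        raise ValueError(f"Unknown lifecycle type: {lifecycle_type}")
    # Config defaults to "{}" as in the table above.
    return {"Type": lifecycle_type, "Config": json.dumps(config or {})}
```

The returned dictionary can then be supplied as the `LifeCycle` field of an Update Slot body.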

## Code Examples

### Describe Endpoint Details - Python - All Regions

```python
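import os

import requests

def describe_endpoint(endpoint_id):
    """Fetch the details of a dataset acceleration endpoint."""
    url = f"https://api.aliyun.com/api/PAIElasticDatasetAccelerator/2022-08-01/endpoints/{endpoint_id}"
    headers = {
        # Read the key from the environment; "$DASHSCOPE_API_KEY" is not
        # expanded inside a Python string literal.
        "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
        "Content-Type": "application/json"
    }
    response = requests.get(url, headers=headers, timeout=10)
    return response.json()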

# Example usage
result = describe_endpoint("end-my1tk3jggooi5uwks5")
print(result)
```

### Unbind Endpoint from Slot - Bash - All Regions

```bash
curl -X DELETE \
  https://api.aliyun.com/api/PAIElasticDatasetAccelerator/2022-08-01/UnbindEndpoint/end-my1tk3jggooi5uwks5/slot-my1tk3jggooi5uwks5 \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY"
```

### Update Dataset Acceleration Slot - Bash - All Regions

```bash
curl -X PUT https://api.aliyun.com/api/PAIElasticDatasetAccelerator/2022-08-01/slots/slot-my1tk3jggooi5uwks5 \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "Name": "slot_1",
  "Description": "xgboost dataset acceleration slot",
  "StorageType": "OSS",
  "StorageUri": "oss://pai-vision-data-hz2.oss-cn-hangzhou-internal.aliyuncs.com/data/VOCdevkit/VOC2007/ImageSets/Main/val.txt",
  "Capacity": "30.0G"
}'
```

### Describe Endpoint with International Endpoint - Bash

```bash
curl -X GET 'https://api.alibabacloud.com/api/PAIElasticDatasetAccelerator/2022-08-01/endpoints/end-my1tk3jggooi5uwks5' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json'
```

## Response Format

```json
{
  "RequestId": "A731A84D-55C9-44F7-99BB-E1CF0CF19197",
  "UserId": "276065346797410278",
  "OwnerId": "1557702098194904",
  "GmtCreateTime": "2014-10-02T15:01:23Z",
  "GmtModifiedTime": "2014-10-02T15:01:23Z",
  "Uuid": "end-ivrq92qhbyrg4jctih",
  "Name": "endpoint-1",
  "Type": "VPC",
  "VpcId": "vpc-j6co2fcdsl1q0gnuc3ym3",
  "VswitchId": "vsw-j6cmr00qjyrft6jo2mg7g",
  "Status": {
    "Phase": "Ready",
    "Code": "200",
    "Message": "Init Succeed",
    "Detail": {
      "IpPortMapping": {
        "key": {
          "Ip": "10.0.0.5",
          "Port": "3306"
        }
      }
    }
  }
}
```

**Key Fields**:
- `RequestId` — Unique identifier for the API request (useful for troubleshooting)
- `Status.Phase` — Current operational phase of the endpoint (e.g., Ready, Pending)
- `Status.Code` — Status code indicating success or the failure reason (returned as a string, e.g., `"200"`)
- `Status.Message` — Human-readable status message
- `Uuid` — Internal unique identifier for the endpoint
- `VpcId` / `VswitchId` — Network configuration identifiers for VPC-type endpoints
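
A short sketch of extracting these key fields from a parsed Describe Endpoint response follows. The function name `summarize_endpoint` and the output key names are illustrative choices, not part of the API.

```python
def summarize_endpoint(response: dict) -> dict:
    """Pick the key fields out of a parsed Describe Endpoint response."""
    status = response.get("Status") or {}
    return {
        "request_id": response.get("RequestId"),
        "endpoint_id": response.get("Uuid"),
        "phase": status.get("Phase"),
        "code": status.get("Code"),
        "message": status.get("Message"),
        # "Ready" is the phase shown in the sample response above.
        "ready": status.get("Phase") == "Ready",
    }
```

Using `.get` throughout means a response that lacks `Status` yields `None` values instead of raising `KeyError`.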

## Error Handling

| Error Code | Description | Recommended Action |
|-------------------|----------------------------|----------------------------------------|
| InvalidParameter.EndpointId | The specified EndpointId is invalid or does not exist. | Check the EndpointId format and confirm that it exists via ListEndpoints. |
| ResourceNotFound | The specified dataset acceleration endpoint could not be found. | Confirm that the endpoint exists and that the request targets the correct regional endpoint. |
| UnauthorizedOperation | You do not have permission to access this endpoint. | Check your RAM policy and verify that the API key is valid. |
| 400 | Bad Request: The request is malformed or contains invalid parameters. | Validate all required parameters and their formats (e.g., StorageType must be OSS or NAS). |
| 404 | Not Found: The specified EndpointId or SlotId does not exist. | Ensure both IDs are correct and belong to the same region/account. |
| 500 | Internal Server Error: An unexpected error occurred on the server side. | Retry the request after a short delay; contact support if persistent. |
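
The "retry after a short delay" advice for 500 errors can be sketched as below. Here `call` is a hypothetical zero-argument function that performs one request and returns a `(status_code, body)` tuple; the attempt count and delay are arbitrary defaults, not documented values.

```python
import time


def call_with_retry(call, max_attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Retry `call` while it reports a 5xx status, up to `max_attempts` times.

    4xx results are returned immediately because retrying an invalid
    request will not change the outcome.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = call()
        if status < 500 or attempt == max_attempts:
            return status, body
        sleep(delay_seconds)
```

The `sleep` parameter exists so that tests can substitute a no-op instead of waiting.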

## Environment Requirements

- Set the environment variable: `export DASHSCOPE_API_KEY=your_api_key_here`
- For Python examples: `pip install requests`
- No specific runtime version requirements are documented

## FAQ

Q: What is a dataset acceleration slot?
A: A dataset acceleration slot is a logical unit in PAI that caches and accelerates access to datasets stored in OSS or NAS, improving data loading performance for machine learning training jobs.

Q: How do I get an EndpointId or SlotId?
A: Use the ListEndpoints and ListSlots APIs (not covered in this skill) to retrieve existing IDs. These are typically created via the PAI console or other provisioning APIs.

Q: Can I use the same API key across regions?
A: Yes, the `DASHSCOPE_API_KEY` is global and works with both China (`api.aliyun.com`) and international (`api.alibabacloud.com`) endpoints.

Q: What storage types are supported for slots?
A: Only `OSS` (Object Storage Service) and `NAS` (Network Attached Storage) are supported. Specify the full URI in `StorageUri` (e.g., `oss://bucket/path`).

Q: Is there a way to check if an endpoint is ready for use?
A: Yes, call Describe Endpoint and check `Status.Phase`. A value of `"Ready"` indicates the endpoint is fully initialized and usable.

## Pricing & Billing

### Billing Model
Billing is **per-request**, meaning each API call (regardless of success/failure) counts as one billable request.

### Price Reference

| Tier/Model | Input Price | Output Price |
|------------|-------------|--------------|
| standard (DescribeEndpoint) | 0.0001 / | 0.0001 / |
| standard (UnbindEndpoint) | 0.001 / | 0.001 / |

### Free Tier
- DescribeEndpoint: 1000 free requests per month
- UnbindEndpoint: 100 free calls per month

### Usage Limits
- DescribeEndpoint: Maximum 10 requests per second
- UnbindEndpoint: Each request is limited to 8K tokens. This limit is likely a documentation artifact, since UnbindEndpoint is not a text-generation API.
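
To stay under the 10 requests per second limit for DescribeEndpoint, a client could space its calls with a small limiter such as the sketch below. The class name `MinIntervalLimiter` is hypothetical, and enforcing a minimum interval between calls is one of several possible strategies, not a documented requirement.

```python
import time


class MinIntervalLimiter:
    """Space calls so that at most `max_per_second` start in any second."""

    def __init__(self, max_per_second=10, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval
```

Calling `limiter.wait()` before each DescribeEndpoint request keeps consecutive calls at least 0.1 seconds apart with the default setting.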

### Billing Notes
Each API call is counted as one request regardless of response size or outcome (success/failure).