# Platform for AI (PAI) Pipeline & Workflow Management Troubleshooting Guide

## Problem Index

| Problem | Symptom | Severity | Solution Summary |
|--------|--------|----------|------------------|
| Component fails to execute with "Module not found" error | Error message: `ModuleNotFoundError: No module named 'xxx'` | High | Install missing dependency in the component environment or use a custom image |
| Workflow submission fails with permission denied | Error message: `AccessDeniedException` or HTTP 403 | High | Assign required RAM permissions to the user or role |
| Data loading fails due to path format mismatch | Error message: `Invalid OSS path format` or `File not found` | Medium | Use an `oss://<bucket-name>/<object-key>` URI without the endpoint |
| Designer UI becomes unresponsive during large pipeline editing | Behavior: UI freezes or lags when dragging components | Low | Reduce pipeline complexity or refresh browser with cleared cache |

## Problem Details

### Problem 1: Component Fails to Execute Due to Missing Python Module

**Symptoms**
- Error message: `ModuleNotFoundError: No module named 'sklearn'` (or similar)
- Behavior: Component status shows "Failed" immediately after starting
- Context: Occurs when using built-in components that depend on libraries not pre-installed in the default runtime

**Root Cause**
The default execution environment for PAI Designer components includes only a minimal set of Python packages. If a component requires additional libraries that are not installed (e.g., `xgboost`, `transformers`, or your own packages), the job fails at import time.

**Solution**
1. **Option A: Use a custom Docker image**  
   Build a Docker image with required dependencies and specify it in the component configuration:
   ```json
   {
     "image": "registry.cn-hangzhou.aliyuncs.com/pai-images/custom-ml:latest"
   }
   ```
2. **Option B: Add installation command in component code**  
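   For script-based components, install the package at the top of the script, before importing it:
   ```python
   import subprocess
   import sys
   # Install into the interpreter running this script; raises CalledProcessError if pip fails
   subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "scikit-learn==1.3.0"])
   import sklearn  # the PyPI package "scikit-learn" is imported as "sklearn"
   ```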
3. **Option C: Declare packages in the component properties** (if available in UI)  
   In the component properties panel, enter package names in the "Additional Python Packages" field (comma-separated).
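When several packages may be missing, fixing one `ModuleNotFoundError` per run is slow. The sketch below, assuming a Python script component, checks a list of top-level module names up front with the standard library's `importlib.util.find_spec` and reports every missing one in a single error. The helper names `find_missing_modules` and `require_modules` are hypothetical, not part of PAI.

```python
import importlib.util


def find_missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_modules(names):
    """Raise one ImportError listing every missing module, instead of failing on the first."""
    missing = find_missing_modules(names)
    if missing:
        raise ImportError("Missing Python modules: " + ", ".join(missing))
```

Calling `require_modules(["sklearn", "xgboost"])` at the top of the script produces one log line naming all missing modules. Note that these are import names (`sklearn`), not PyPI package names (`scikit-learn`).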

**Verification**
- Re-run the component
- Check logs for successful import (no `ModuleNotFoundError`)
- Confirm output artifacts are generated as expected

### Problem 2: Workflow Submission Fails with Access Denied

**Symptoms**
- Error message: `AccessDeniedException: User is not authorized to perform this action`
- Behavior: Pipeline fails to start; error appears in submission log
- Context: Occurs when submitting workflows that access OSS, MaxCompute, or other Alibaba Cloud resources

**Root Cause**
The RAM user or role used to run the PAI Designer workflow lacks necessary permissions to access required cloud resources (e.g., OSS buckets, MaxCompute projects, or PAI services).

**Solution**
1. Go to **RAM Console > Users** and select your user
2. Attach the following system policies. They are the quickest fix but grant broad access, so they are not least-privilege:
   - `AliyunOSSFullAccess` (if using OSS)
   - `AliyunMaxComputeFullAccess` (if using MaxCompute)
   - `AliyunPAIFullAccess`
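3. Alternatively, create a custom policy that limits OSS access to a specific bucket (replace `your-bucket`; you can also narrow `pai:*` to the PAI actions your workflow actually uses):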
   ```json
   {
     "Version": "1",
     "Statement": [
       {
         "Action": ["oss:GetObject", "oss:PutObject"],
         "Resource": "acs:oss:*:*:your-bucket/*",
         "Effect": "Allow"
       },
       {
         "Action": "pai:*",
         "Resource": "*",
         "Effect": "Allow"
       }
     ]
   }
   ```
4. Ensure the PAI workspace is associated with a service-linked role that has required trust relationships
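If you manage several buckets or folders, generating the custom policy can be less error-prone than editing JSON by hand. The sketch below builds the same two statements as the example policy, scoping OSS read/write to one bucket and an optional key prefix. `build_policy` is a hypothetical helper; it emits only the actions already shown in the example, and you still paste the result into the RAM console yourself.

```python
import json


def build_policy(bucket, prefix="", pai_actions=("pai:*",)):
    """Return a RAM policy document (a dict) mirroring the custom policy example.

    OSS read/write is limited to objects under bucket/prefix; pai_actions can be
    narrowed from the default "pai:*" if you know which PAI actions you need.
    """
    # Normalize the prefix so "datasets" and "datasets/" yield the same resource.
    prefix = prefix.strip("/")
    object_scope = f"{bucket}/{prefix}/*" if prefix else f"{bucket}/*"
    return {
        "Version": "1",
        "Statement": [
            {
                "Action": ["oss:GetObject", "oss:PutObject"],
                "Resource": f"acs:oss:*:*:{object_scope}",
                "Effect": "Allow",
            },
            {
                "Action": list(pai_actions),
                "Resource": "*",
                "Effect": "Allow",
            },
        ],
    }


# Print the JSON to paste into the custom policy editor in the RAM console.
print(json.dumps(build_policy("my-data-bucket", "datasets/"), indent=2))
```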

**Verification**
- Resubmit the workflow
- Confirm no `AccessDeniedException` in logs
- Verify data is read/written successfully to target storage

### Problem 3: Data Loading Fails Due to Incorrect OSS Path Format

**Symptoms**
- Error message: `Invalid OSS path format` or `File not found: oss://...`
- Behavior: Data import component fails during initialization
- Context: Common when copying paths from OSS Browser or external tools

**Root Cause**
PAI Designer requires OSS paths in the exact format: `oss://<bucket-name>/<object-key>`. Paths missing the bucket name, using HTTP or HTTPS URLs, or including region endpoints (e.g., `oss-cn-beijing.aliyuncs.com`) are invalid.

**Solution**
1. Use the **OSS URI format without endpoint**:  
   ✅ Correct: `oss://my-data-bucket/datasets/train.csv`  
   ❌ Incorrect: `https://my-data-bucket.oss-cn-beijing.aliyuncs.com/datasets/train.csv`
2. In the Designer UI, use the **OSS file selector** (folder icon) to browse and auto-fill valid paths
3. If the bucket is in a different region from the PAI workspace, either move the data to a bucket in the workspace's region or make sure cross-region access is configured, including RAM permissions for that bucket
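The conversion in step 1 can be automated when paths are pasted from other tools. The sketch below, a hypothetical helper `to_oss_uri` built on the standard `urllib.parse` module, rewrites a virtual-hosted-style OSS URL (`https://<bucket>.<endpoint>/<key>`) or an endpoint-qualified `oss://` path into the `oss://<bucket-name>/<object-key>` form. It assumes virtual-hosted-style URLs; path-style URLs and custom domains are not handled.

```python
from urllib.parse import unquote, urlparse


def to_oss_uri(path):
    """Convert an OSS URL or endpoint-qualified oss:// path to oss://<bucket>/<key>."""
    parsed = urlparse(path.strip())
    if parsed.scheme not in ("oss", "http", "https"):
        raise ValueError(f"Not an OSS path: {path!r}")
    host = parsed.netloc
    if parsed.scheme in ("http", "https") and not host.endswith(".aliyuncs.com"):
        raise ValueError(f"Not an OSS endpoint URL: {path!r}")
    # OSS bucket names cannot contain dots, so the bucket is the first DNS label;
    # for a plain oss://bucket/... path the host is the bucket itself.
    bucket = host.split(".", 1)[0]
    # Query strings (e.g. signed-URL parameters) are dropped; %-escapes are decoded.
    key = unquote(parsed.path.lstrip("/"))
    if not bucket or not key:
        raise ValueError(f"OSS path needs both a bucket and an object key: {path!r}")
    return f"oss://{bucket}/{key}"
```

For example, `to_oss_uri("https://my-data-bucket.oss-cn-beijing.aliyuncs.com/datasets/train.csv")` returns `"oss://my-data-bucket/datasets/train.csv"`.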

**Verification**
- Run a test data import component
- Check logs for successful file read (e.g., "Loaded 1000 rows")
- Confirm downstream components receive input data

### Problem 4: Designer UI Becomes Unresponsive with Large Pipelines

**Symptoms**
- Behavior: Browser tab freezes, lag when dragging components, slow save operations
- Context: Occurs with pipelines containing >50 components or complex connections

**Root Cause**
The browser-based Designer UI renders the entire pipeline as an interactive graph, so very large DAGs can exceed what typical client hardware renders smoothly.

**Solution**
1. Break large pipelines into **sub-pipelines** using the "Pipeline Component" feature
2. Disable auto-layout: In UI settings, turn off "Auto Arrange Components"
3. Clear browser cache and disable unnecessary extensions
4. Use **Chrome or Edge** (recommended browsers for PAI Designer)
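Deciding where to cut a large pipeline into sub-pipelines is easier with the dependency graph in hand. The sketch below is a planning aid only, not a Designer feature: given a hypothetical mapping from each component name to its upstream components (transcribed from the canvas), it orders components topologically with Python's `graphlib` (3.9+) and groups them into chunks of at most `max_size`. Each chunk depends only on earlier chunks, so each is a candidate sub-pipeline; the count of cross-chunk edges approximates how many inputs and outputs the sub-pipelines will need.

```python
from graphlib import TopologicalSorter


def plan_sub_pipelines(upstream, max_size=20):
    """Split components into consecutive chunks of a topological order.

    upstream maps each component name to the names of the components it depends on.
    Raises graphlib.CycleError if the mapping contains a cycle.
    """
    order = list(TopologicalSorter(upstream).static_order())
    chunks = [order[i:i + max_size] for i in range(0, len(order), max_size)]
    # Count dependency edges that cross chunk boundaries; each becomes a
    # sub-pipeline input/output connection.
    chunk_of = {name: idx for idx, chunk in enumerate(chunks) for name in chunk}
    cross_edges = sum(
        1
        for node, deps in upstream.items()
        for dep in deps
        if chunk_of[dep] != chunk_of[node]
    )
    return chunks, cross_edges
```

A fixed chunk size is the simplest rule; if the cross-edge count is high, moving a few components between adjacent chunks can reduce the number of sub-pipeline connections.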

**Verification**
- UI remains responsive during editing
- Save and load operations complete in <10 seconds
- Canvas zoom/pan works smoothly

## FAQ

**Q: How do I view detailed logs for a failed component?**  
A: In the pipeline run details page, click the failed component, then go to the "Logs" tab. You can also access raw logs in the OSS logging directory specified in your job configuration.

**Q: What permissions are required to use Machine Learning Designer?**  
A: At minimum, the user needs `pai:CreateJob`, `pai:ListJobs`, `oss:GetObject`, and `oss:PutObject` permissions. Full access is granted via the `AliyunPAIFullAccess` and `AliyunOSSFullAccess` RAM policies.

**Q: Can I use my own Docker image in Designer components?**  
A: Yes. In the component configuration, specify a valid image URI from Alibaba Cloud Container Registry (ACR). The image must include Python 3.7+ and required ML libraries.

**Q: Why does my pipeline succeed but produce empty output?**  
A: This often occurs when output paths are misconfigured. Verify that your component writes to its configured output path (e.g., `/mnt/output/data`) and that the OSS output path is writable.
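To catch this case inside the component rather than after the run, a script component can check its output before exiting. The sketch below assumes the output directory is a local path such as the `/mnt/output/data` example; `assert_output_written` is a hypothetical helper that raises if the directory is missing or holds no non-empty files, so the component fails visibly instead of succeeding with empty output.

```python
from pathlib import Path


def assert_output_written(output_dir):
    """Raise RuntimeError unless output_dir contains at least one non-empty file."""
    out = Path(output_dir)
    if not out.is_dir():
        raise RuntimeError(f"Output directory does not exist: {out}")
    # Search recursively, since components may write into subfolders.
    written = [p for p in out.rglob("*") if p.is_file() and p.stat().st_size > 0]
    if not written:
        raise RuntimeError(f"No non-empty output files under {out}")
    return written
```

Call it as the last step of the script, e.g. `assert_output_written("/mnt/output/data")`.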

**Q: How do I enable debug mode for component execution?**  
A: Set the environment variable `LOG_LEVEL=DEBUG` in the component's advanced settings to increase verbosity in the execution logs. Log output from your own script code becomes more verbose only if the script reads this variable and configures its logger accordingly.
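A minimal sketch, assuming a Python script component and the standard `logging` module, of a script that honors `LOG_LEVEL`: it maps the variable to a logging level and falls back to `INFO` when the variable is unset or not a recognized level name. `configure_logging` is a hypothetical helper name.

```python
import logging
import os


def configure_logging():
    """Configure the root logger from LOG_LEVEL and return the numeric level used."""
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    # Guard against unknown names and non-level attributes of the logging module.
    if not isinstance(level, int):
        level = logging.INFO
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,
    )
    return level
```

Call `configure_logging()` once at the start of the script; afterwards, `logging.getLogger(__name__).debug(...)` messages appear in the logs only when `LOG_LEVEL=DEBUG`.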