# alinux-system

Part of **ALINUX**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for [Diagnose and resolve system performance issues](../../intent/alinux-troubleshoot-performance/SKILL.md). If you're unsure which path to take, check the routing skill first.

# Alibaba Cloud Linux System Performance Console Guide

## Operations Overview

| Operation | Console Entry | Prerequisites | Description |
|-----------|---------------|---------------|-------------|
| Tune Dirty Page Thresholds | Product > Operating Systems > Alibaba Cloud Linux > System Tuning | Root access to an Alibaba Cloud Linux 2 or 3 instance; familiarity with basic Linux system administration | Adjust system-wide dirty page thresholds to control writeback |
| Configure Context Readahead | — | Alibaba Cloud Linux 2 with kernel 4.19.91-18 or later, or Alibaba Cloud Linux 3 | Set up context-based readahead to improve file access performance |
| Enable Continuous GPU Profiling | Console > ECS > Instances > Performance & Diagnostics > Continuous GPU Profiling | SysOM component installed and upgraded to version 3.9.0 or later; access to the operating system console | Enable ongoing GPU performance profiling for workloads |
| Track Process Hotspots | Console > ECS > System Observation > Process Hotspot Tracking | Instance is running; access to system observation features; Java process selected for advanced tracking | Identify CPU- or memory-intensive processes that cause performance issues |
| Analyze AI Performance | Operating System Console > AI Profiling | Instance is managed; instance has a GPU and is running an AI workload; Python 3.9–3.12; torch 2.4–2.7; CUDA 12.0–12.8 (excluding 12.7); target process uses the torch library; pip installed | Profile and monitor AI model performance, training speed, and inference frameworks |
| Perform Hotspot Analysis | System Observation > Hotspot Comparative Analysis | Java hotspot tracking configured (if analyzing memory/lock hotspots); x86 architecture instance; OS supports hotspot tracking | Compare hotspots across time periods or instances |
| Conduct GPU Profiling | Operating System Console > GPU Performance & Diagnostics > GPU Continuous Profiling | SysOM component installed, version 3.9.0 or later; target GPU instance is managed | Perform continuous GPU analysis and profiling for performance insights |
| Monitor Memory and CPU Latency | — | Alibaba Cloud Linux 3 with kernel 5.10.134-12 or later | Use memsli and sched_sli to track latency statistics in cgroups |
| Configure Reserved Memory for Page Cache | — | Alibaba Cloud Linux 3 with kernel 5.10.112-11 or later | Reserve memory specifically for the page cache |
| Diagnose Memory Usage | Console > ECS > System Diagnostics > Memory Diagnostics > Memory Panorama Analysis | JDK 1.8 or later; instance or Pod is managed (if enabling application memory profiling) | Perform comprehensive memory usage analysis and diagnostics |
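
The dirty page thresholds in the first row correspond to the standard Linux writeback sysctls under `/proc/sys/vm`. Before changing them (in the console or with `sysctl -w`), it can help to record the current values. The sketch below is a minimal local Python helper, not an Alibaba Cloud tool; the `vm_dir` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path

# Standard upstream Linux writeback knobs exposed under /proc/sys/vm.
DIRTY_KNOBS = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_bytes",
    "dirty_background_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_dirty_settings(vm_dir: str = "/proc/sys/vm") -> dict[str, int]:
    """Return the current value of every dirty-page knob present in vm_dir."""
    settings: dict[str, int] = {}
    base = Path(vm_dir)
    for knob in DIRTY_KNOBS:
        path = base / knob
        if path.is_file():
            settings[knob] = int(path.read_text().strip())
    return settings


def effective_mode(settings: dict[str, int]) -> str:
    """Report which foreground threshold is active: a non-zero dirty_bytes
    overrides dirty_ratio (the kernel then reads dirty_ratio back as 0)."""
    return "bytes" if settings.get("dirty_bytes", 0) > 0 else "ratio"


if __name__ == "__main__":
    current = read_dirty_settings()
    for name, value in current.items():
        print(f"vm.{name} = {value}")
    print("foreground threshold set via:", effective_mode(current))
```

Because writing `vm.dirty_bytes` resets `vm.dirty_ratio` to 0 (and vice versa), only one value of each pair is in effect at a time, which is what `effective_mode` reports for the foreground pair.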

## Operation Steps

### Enable Continuous GPU Profiling

**Navigation**: Console > ECS > Instances > Performance & Diagnostics > Continuous GPU Profiling

**Prerequisites**:
- SysOM component installed and upgraded to version 3.9.0 or later
- Access to the operating system console

1. Navigate to component management in the left-side navigation pane  
   - Element: **component management** (link) — left-side navigation pane  
   - Notes: This step ensures the SysOM agent is ready

2. Create a new configuration with continuous GPU profiling enabled  
   - Element: **Enable continuous GPU profiling** (checkbox) — configuration form  
   - Notes: Enter a configuration name such as 'gpu continuous profiling configuration'

3. Activate the configuration using a management plan  
   - Element: **Submit** (button) — management plan form  
   - Notes: Set the SysOM component configuration to the configuration you created. After submission, the SysOM Agent memory limit increases from 300 MB to 2 GB.

4. Navigate to the continuous GPU profiling page and start analysis  
   - Element: **Start Analysis** (button) — main content area  
   - Notes: Select the target instance, process PID, and time range before clicking.

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| Configuration Name | text_input | Yes | — | A user-defined name for the profiling configuration |
| Enable continuous GPU profiling | checkbox | No | — | Enables the continuous GPU profiling feature in the configuration |
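
Step 4 asks for the PID of the GPU process. On an instance with NVIDIA drivers, one way to find candidate PIDs is the `nvidia-smi` compute-apps query. The Python sketch below assumes `nvidia-smi` is on `PATH`; it is a local helper for filling in the form, not part of SysOM or the console.

```python
import shutil
import subprocess

# nvidia-smi query flags; "nounits" drops the "MiB" suffix from used_memory.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text: str) -> list[tuple[int, str, int]]:
    """Parse 'pid, process_name, used_memory' rows into (pid, name, MiB) tuples."""
    apps = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        head, mem = line.rsplit(",", 1)  # memory is the last column
        pid, name = head.split(",", 1)   # PID is the first column
        mem = mem.strip()
        # used_memory can read "[N/A]" on some drivers; treat that as 0.
        apps.append((int(pid), name.strip(), int(mem) if mem.isdigit() else 0))
    return apps


def list_gpu_processes() -> list[tuple[int, str, int]]:
    """Run nvidia-smi if it is installed; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    # Largest GPU memory consumers first; often the training or inference process.
    for pid, name, mem in sorted(list_gpu_processes(), key=lambda a: a[2], reverse=True):
        print(f"PID {pid:>8}  {mem:>7} MiB  {name}")
```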

### Track Process Hotspots

**Navigation**: Console > ECS > System Observation > Process Hotspot Tracking

**Prerequisites**:
- Instance must be running
- Access to system observation features
- Java process selected for advanced tracking

1. Navigate to the system observation section  
   - Element: **system observation** (menu) — left-side navigation pane  

2. Select process hotspot tracking  
   - Element: **process hotspot tracking** (menu) — left-side navigation pane  

3. Choose a tracking method  
   - Element: **Execute Hotspot Tracking** (button) — main content area  
   - Notes: Use this option if you do not need the Java call stack.

4. Configure Java hotspot tracking for Java processes  
   - Element: **Configure Java Hotspot Tracking** (button) — main content area  
   - Notes: This button stays disabled until an instance ID and a PID are selected. In rare cases, Java hotspot tracking may cause the application to crash.

5. Enable Java memory/lock hotspot tracking  
   - Element: **Enable Java Memory/Lock Hotspot Tracking** (checkbox) — configuration panel  
   - Notes: Required for collecting Java-specific hotspot data.

6. Set storage path and running duration  
   - Element: **Set Storage Path** (text_input) — configuration panel  
   - Notes: The directory needs 50 MB of free space and must be accessible to the user process.

7. Select hotspot type and time range  
   - Element: **Hotspot Type** (dropdown) — configuration panel  
   - Notes: Options are ONCPU, Memory, and Lock. Memory and Lock are available only when Java memory/lock hotspot tracking is enabled.

8. Execute hotspot tracking  
   - Element: **Execute Hotspot Tracking** (button) — main content area  

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| Memory/Lock Hotspot Tracking | checkbox | No | — | Specifies whether to enable Java memory/lock hotspot tracking. |
| Set Storage Path | text_input | Yes | — | Stores the ATP agent and the hotspot information it collects. Ensure the user process has access to this path. The directory requires 50 MB of storage space. |
| Running Duration | number_input | Yes | — | The duration of Java memory/lock hotspot tracking. The maximum duration is 7 days. |
| Hotspot Type | dropdown | Yes | ONCPU, Memory, Lock | Selects the type of hotspot to track. Memory and Lock appear only when Java tracking is enabled. |
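
The storage path requirements above (an existing directory the user process can access, with 50 MB free) can be checked on the instance before you start tracking. This is a minimal sketch that assumes "50 MB" means 50 MiB and that you run it as the same user that owns the Java process, since `os.access` checks permissions for the current user only.

```python
import os
import shutil
import sys

MIN_FREE_BYTES = 50 * 1024 * 1024  # 50 MB from the guide, read as MiB (assumption)


def check_storage_path(path: str, min_free: int = MIN_FREE_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the path looks usable."""
    if not os.path.isdir(path):
        return [f"{path} is not an existing directory"]
    problems = []
    # Read, write, and search permission for the user running this script.
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"{path} is not readable/writable/searchable by this user")
    free = shutil.disk_usage(path).free
    if free < min_free:
        mib = 1024 * 1024
        problems.append(f"only {free // mib} MB free; need {min_free // mib} MB")
    return problems


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    issues = check_storage_path(target)
    print("\n".join(issues) if issues else f"{target} looks usable")
```

For example, `sudo -u appuser python3 check_path.py /data/atp`, where the user name, script name, and path are placeholders.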

### Analyze AI Performance

**Navigation**: Operating System Console > AI Profiling

**Prerequisites**:
- Instance is managed
- Instance has GPU and is running AI workload
- Python 3.9–3.12
- torch 2.4–2.7
- CUDA 12.0–12.8 (excluding 12.7)
- Target process uses torch library
- pip installed
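
The version prerequisites can be checked from inside the Python environment that runs the AI job. The sketch below compares against the documented ranges. It assumes the CUDA requirement refers to the version reported by `torch.version.cuda` (the CUDA version the torch build targets); the console may check the driver or toolkit version instead.

```python
import importlib.util
import re
import sys


def _parse(version: str) -> tuple[int, ...]:
    """Leading numeric part of a version string: '2.5.1+cu124' -> (2, 5, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def check_ai_profiling_env(python_ver: str, torch_ver: str, cuda_ver: str) -> list[str]:
    """Compare versions against the documented AI Profiling ranges."""
    problems = []
    if not (3, 9) <= _parse(python_ver)[:2] <= (3, 12):
        problems.append(f"Python {python_ver} is outside 3.9-3.12")
    if not (2, 4) <= _parse(torch_ver)[:2] <= (2, 7):
        problems.append(f"torch {torch_ver} is outside 2.4-2.7")
    cuda = _parse(cuda_ver)[:2]
    if not (12, 0) <= cuda <= (12, 8) or cuda == (12, 7):
        problems.append(f"CUDA {cuda_ver} is outside 12.0-12.8 or is the excluded 12.7")
    return problems


def current_versions() -> tuple[str, str, str]:
    """Versions of this interpreter and of the torch build it imports."""
    python_ver = ".".join(str(n) for n in sys.version_info[:3])
    try:
        import torch  # may be absent on a host without the AI workload
    except (ImportError, OSError):
        return python_ver, "not installed", "unknown"
    # torch.version.cuda is None on CPU-only builds.
    return python_ver, torch.__version__, torch.version.cuda or "none"


if __name__ == "__main__":
    issues = check_ai_profiling_env(*current_versions())
    if importlib.util.find_spec("pip") is None:
        issues.append("pip is not available for this interpreter")
    print("\n".join(issues) if issues else "Environment matches the documented ranges")
```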

1. Access the AI Profiling page from the top-right corner  
   - Element: **Operating System Console-AI Profiling** (link) — top-right corner  

2. Select or enter conditions and click Start Analysis  
   - Element: **Start Analysis** (button) — main content area  
   - Notes: Enter one or more AI job PIDs or process names, separated by commas. If both fields are filled in, the union of the matching processes is analyzed.

3. In the analysis records section, click View Report  
   - Element: **View Report** (button) — analysis records section  

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| Instance ID | dropdown | Yes | — | Select a managed instance with GPU running AI workloads. |
| AI Job PID | text_input | No | — | Specify AI job process IDs, multiple allowed (comma-separated). |
| AI Job Process Name | text_input | No | — | Specify AI job process names, multiple allowed (comma-separated). |
| Data Richness | checkbox | No | GPU, Python, CPU, Torch (FLOPS), RDMA Monitor, DCGM Monitor, NVTX, Record Shapes, TCP | Select metrics to collect; multiple selections are supported. |
| Analysis Mode | radio | Yes | duration, iteration | Choose time-based or iteration-based collection. |
| Collection Duration | number_input | No | — | Default 2000 ms; range: 1000–5000 ms. |
| Iteration Range | number_input | No | — | Default: iterations 0–10; initial iterations can be skipped. |
| Iteration Entry Module | text_input | No | — | Example: transformers.trainer |
| Iteration Entry Function | text_input | No | — | Training default: Optimizer.step; inference default: LLMEngine.step |
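
Step 2 states that when both PIDs and process names are given, the union is analyzed. The sketch below illustrates that rule locally so you can preview which processes a given input would cover. It assumes names are matched exactly against `/proc/<pid>/comm` (which Linux truncates to 15 characters); the console's actual matching rules are not documented here, and both functions are hypothetical helpers.

```python
from pathlib import Path


def read_process_names(proc: str = "/proc") -> dict[int, str]:
    """Map PID -> short process name from /proc/<pid>/comm (Linux only)."""
    root = Path(proc)
    table: dict[int, str] = {}
    if not root.is_dir():
        return table
    for entry in root.iterdir():
        if entry.name.isdigit():
            try:
                table[int(entry.name)] = (entry / "comm").read_text().strip()
            except OSError:
                continue  # the process exited while we were scanning
    return table


def resolve_targets(pid_field: str, name_field: str, processes: dict[int, str]) -> set[int]:
    """Union of the listed PIDs and the PIDs whose name exactly matches a listed name."""
    pids = {int(p) for p in (s.strip() for s in pid_field.split(",")) if p}
    names = {s.strip() for s in name_field.split(",") if s.strip()}
    matched = {pid for pid, name in processes.items() if name in names}
    return pids | matched


if __name__ == "__main__":
    # Example: PID 4321 plus every process whose comm is "python".
    print(sorted(resolve_targets("4321", "python", read_process_names())))
```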

### Conduct GPU Profiling

**Navigation**: Operating System Console > GPU Performance & Diagnostics > GPU Continuous Profiling

**Prerequisites**:
- SysOM component installed
- SysOM component version 3.9.0 or above
- Target GPU instance is managed
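
To confirm the 3.9.0 requirement, compare the SysOM version shown in Component Management against the minimum. Version strings must be compared numerically rather than as text: as plain strings, "3.10.0" sorts before "3.9.0". A minimal sketch, assuming the version string is copied by hand from the console:

```python
import re

MIN_SYSOM = (3, 9, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted number, e.g. 'v3.10.1-1' -> (3, 10, 1)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(text: str, minimum: tuple[int, ...] = MIN_SYSOM) -> bool:
    """True if the version in text is >= minimum, comparing numerically."""
    version = parse_version(text)
    # Pad with zeros so that "3.9" and "3.9.0" compare as equal.
    width = max(len(version), len(minimum))
    padded = version + (0,) * (width - len(version))
    floor = minimum + (0,) * (width - len(minimum))
    return padded >= floor


if __name__ == "__main__":
    for shown in ("3.8.2", "3.9.0", "3.10.1"):
        print(shown, "OK" if meets_minimum(shown) else "upgrade SysOM first")
```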

1. Click Component Management in the left navigation panel to create a new configuration  
   - Element: **Component Management** (link) — left-side navigation panel  

2. Enter a configuration name and check the box to enable GPU continuous profiling  
   - Element: **Enable GPU Continuous Profiling** (checkbox) — configuration creation form  
   - Notes: Example configuration name: 'gpu continuous profiling configuration'

3. Use the OS console management plan to apply the configuration to GPU instances  
   - Element: **Submit** (button) — operation confirmation area  
   - Notes: After enabling, the SysOM Agent memory limit increases from 300 MB to 2 GB.

4. Navigate to GPU Continuous Profiling under GPU Performance & Diagnostics  
   - Element: **GPU Continuous Profiling** (link) — GPU Performance & Diagnostics menu  

5. Select the target instance, AI application PID, and time range, then click Start Analysis  
   - Element: **Start Analysis** (button) — analysis parameter setting area  
   - Notes: You can drag along the time axis to adjust the displayed interval interactively.

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| Configuration Name | text_input | Yes | — | Unique identifier for the configuration |
| Enable GPU Continuous Profiling | checkbox | No | — | Enables the GPU continuous profiling feature |
| Instance | dropdown | Yes | — | Select the target GPU instance running AI applications |
| PID | text_input | Yes | — | Enter the process ID of the AI application |
| Creation Time | date_picker | Yes | — | Select the analysis time interval |

### Diagnose Memory Usage

**Navigation**: Console > ECS > System Diagnostics > Memory Diagnostics > Memory Panorama Analysis

**Prerequisites**:
- JDK 1.8 or higher
- Instance or Pod is managed (if enabling application memory profiling)
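
The JDK 1.8 requirement can be checked by parsing `java -version` output, which handles both the legacy `1.8.0_xxx` scheme and the newer `17.0.2` style. This sketch checks the `java` found on `PATH`, which may differ from the JVM the target process uses, and it cannot distinguish a JDK from a JRE because both print the same kind of banner.

```python
import re
import shutil
import subprocess


def java_major(version_output: str) -> int:
    """Major Java version from `java -version` text (1.8.0_392 -> 8, 17.0.2 -> 17)."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("could not find a quoted version string")
    parts = re.findall(r"\d+", match.group(1))
    if not parts:
        raise ValueError(f"no digits in version {match.group(1)!r}")
    first = int(parts[0])
    # Releases before Java 9 use the "1.<major>" scheme.
    if first == 1 and len(parts) > 1:
        return int(parts[1])
    return first


def jdk_ok(version_output: str) -> bool:
    """True if the reported Java version is 1.8 (Java 8) or later."""
    return java_major(version_output) >= 8


if __name__ == "__main__":
    if shutil.which("java") is None:
        print("java not found on PATH")
    else:
        # `java -version` writes its banner to stderr, not stdout.
        banner = subprocess.run(["java", "-version"], capture_output=True, text=True).stderr
        print("Java 1.8+ detected" if jdk_ok(banner) else "Java older than 1.8")
```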

1. Click System Diagnostics in the left navigation bar  
   - Element: **System Diagnostics** (link) — left-side navigation bar  

2. Select Memory Diagnostics as the diagnosis type and Memory Panorama Analysis as the diagnosis item, choose the target instance, then click Execute Diagnosis  
   - Element: **Execute Diagnosis** (button) — diagnostic configuration area  

3. In the diagnosis records section, click View Report  
   - Element: **View Report** (button) — diagnosis records section  

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| Diagnosis Mode | dropdown | Yes | Instance, Pod | Scope: the entire instance or a specific Pod |
| Diagnosis Type | dropdown | Yes | — | Diagnostic category (only memory diagnostics is supported) |
| Diagnosis Item | dropdown | Yes | — | Specific diagnostic function |
| Instance ID | text_input | Yes | — | Target instance to diagnose |
| Pod | dropdown | No | — | Required only in Pod diagnosis mode |
| Cluster Type | dropdown | No | — | Required only in Pod diagnosis mode |
| Cluster ID | text_input | No | — | Required only in Pod diagnosis mode |
| Enable Application Memory Profiling | checkbox | No | — | Toggle application memory profiling on/off |
| Profiling Duration | number_input | No | — | Duration in minutes; 0 means no profiling |
| pid | text_input | No | — | Optional Java PID; defaults to the Java process with the highest memory usage |
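
When the pid field is left empty, the Java process with the highest memory usage is profiled. To predict which process that will be, you can rank Java processes yourself. This is a hypothetical helper that assumes "memory usage" means resident set size (`VmRSS` in `/proc/<pid>/status`) and that the process name in `/proc/<pid>/comm` is `java`; the console's exact selection criteria are not documented here.

```python
from pathlib import Path
from typing import Optional


def rss_kib(status_text: str) -> int:
    """Extract VmRSS (in kB) from /proc/<pid>/status contents; 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0


def largest_java_process(proc: str = "/proc") -> Optional[tuple[int, int]]:
    """Return (pid, rss_kib) of the Java process with the largest RSS, or None."""
    root = Path(proc)
    if not root.is_dir():
        return None
    best: Optional[tuple[int, int]] = None
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            if (entry / "comm").read_text().strip() != "java":
                continue
            rss = rss_kib((entry / "status").read_text())
        except OSError:
            continue  # process exited or is not readable by this user
        if best is None or rss > best[1]:
            best = (int(entry.name), rss)
    return best


if __name__ == "__main__":
    found = largest_java_process()
    print("no Java process found" if found is None else f"PID {found[0]} (VmRSS {found[1]} kB)")
```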

## FAQ

Q: Where do I find the GPU Continuous Profiling feature in the console?
A: Navigate to Operating System Console > GPU Performance & Diagnostics > GPU Continuous Profiling. Ensure your instance is managed and SysOM version is 3.9.0 or higher.

Q: What happens if I leave the Storage Path empty in Process Hotspot Tracking?
A: The system will use a default path, but you must ensure the user process has write access and at least 50 MB free space. Leaving it unspecified may cause tracking to fail.

Q: Can I modify the AI Profiling configuration after starting analysis?
A: No. Once analysis starts, parameters are fixed. You must create a new analysis session with updated settings.

Q: What permissions do I need to use System Observation features?
A: You need RAM permissions to access system observation and diagnostics. Your instance must also be in a running state and properly managed by SysOM.

Q: Why is the "Configure Java Hotspot Tracking" button disabled?
A: This button is only enabled after you select a valid instance ID and a PID corresponding to a Java process. Ensure both are chosen first.

## Pricing & Billing

### Billing Model
Most system performance features are free to use. Process Hotspot Tracking, however, is billed per request once the monthly free quota is exhausted.

### Price Reference
| Tier | Input Price | Output Price |
|------|-------------|--------------|
| process_hotspot_tracking | ¥0.001 per request | ¥0.001 per request |

### Free Tier
- Process Hotspot Tracking: 100 free requests per month
- All other features (GPU Profiling, AI Profiling, Memory Diagnostics, etc.): Free with no usage limits

### Billing Notes
- Enabling Continuous GPU Profiling raises the SysOM Agent memory limit from 300 MB to 2 GB but incurs no additional charges.
- AI Profiling and Memory Panorama Analysis are currently free, though they consume instance resources during execution.
- Process Hotspot Tracking charges apply only after exceeding the monthly free quota of 100 requests.
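
A rough monthly estimate for Process Hotspot Tracking follows from the free tier and the unit price. This sketch assumes the ¥0.001 rate is charged once per request beyond the 100 free requests. The price table lists the same figure as both input and output price, so if both apply, pass `unit_price=Decimal("0.002")`. `Decimal` is used to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

FREE_REQUESTS_PER_MONTH = 100      # from the Free Tier list
UNIT_PRICE_CNY = Decimal("0.001")  # assumed charge per billable request


def monthly_cost(requests: int,
                 free: int = FREE_REQUESTS_PER_MONTH,
                 unit_price: Decimal = UNIT_PRICE_CNY) -> Decimal:
    """Estimated monthly charge in CNY for Process Hotspot Tracking."""
    if requests < 0:
        raise ValueError("request count cannot be negative")
    billable = max(0, requests - free)
    return billable * unit_price


if __name__ == "__main__":
    for n in (80, 100, 250, 10_000):
        print(f"{n:>6} requests -> ¥{monthly_cost(n)}")
```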