# alinux-monitoring

Part of **ALINUX**

# Alibaba Cloud Linux System Monitoring and Logging CLI Reference

## Command Overview

| Command | Purpose | Syntax |
|--------|---------|--------|
| `kdumpctl status` | Check kdump service status | `sudo kdumpctl status` |
| `kdumpctl start` | Start the kdump service | `sudo kdumpctl start` |
| `kdumpctl readlog` | View kernel crash dump boot log | `sudo kdumpctl readlog` |
| `sysak list` | List available SysAK diagnostic tools | `sysak list [flags]` |
| `sysak loadtask` | Show system load summary | `sysak loadtask [flags]` |
| `sysak iofsstat` | Monitor disk I/O statistics | `sysak iofsstat [flags]` |
| `sysak memleak` | Detect memory leaks (e.g., slab) | `sysak memleak [flags]` |
| `sysak nosched` | Detect CPU scheduling delays | `sysak nosched [flags]` |
| `sysak irqoff` | Detect long interrupt-off periods | `sysak irqoff [flags]` |
| `sysak pingtrace` | Trace network latency to a host | `sysak pingtrace [flags]` |
| `sysak memgraph` | Display memory usage chart | `sysak memgraph [flags]` |
| `sysak mservice` | Manage continuous system monitoring service | `sudo sysak mservice [flags]` |

## Command Details

### kdumpctl

**Purpose**: Manage and inspect the kernel crash dump (kdump) service and its logs on Alibaba Cloud Linux 3.

**Syntax**:
```bash
sudo kdumpctl [command]
```

| Parameter | Short | Type | Required | Description |
|----------|-------|------|----------|-------------|
| `status` | — | string | Yes | Check if kdump is active and ready |
| `start` | — | string | Yes | Start the kdump service |
| `readlog` | — | string | Yes | Read the console and tty logs from the last crash dump |

```bash
# Check kdump service status
sudo kdumpctl status

# Start kdump service manually
sudo kdumpctl start

# Simulate a kernel crash (testing only: this panics the running kernel immediately, so unsaved data is lost)
echo c | sudo tee /proc/sysrq-trigger

# View boot logs from the crash dump environment
sudo kdumpctl readlog
```

**Example Output**:
```text
console log: 
[    0.000000] Linux version 5.10.134-17.2.al8.x86_64 ...
[    0.000000] Command line: BOOT_IMAGE=... panic=10 ...
...
ttylog:

Welcome to Alibaba Cloud Linux 3.2104 U10 (OpenAnolis Edition) dracut-049-228.git20230802.0.1.al8 (Initramfs)!
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Reached target Local File Systems.
...
```

### sysak

**Purpose**: Run system diagnostics and monitoring using the SysAK toolkit for CPU, memory, I/O, network, and scheduling analysis.

**Syntax**:
```bash
sysak [subcommand] [options]
```

| Parameter | Short | Type | Required | Description |
|----------|-------|------|----------|-------------|
| `list` | — | string | Yes | List all available diagnostic tools |
| `-a` | — | boolean | No | With `list`, show all tools including hidden ones |
| `loadtask` | — | string | Yes | Display current system load (CPU, memory, tasks) |
| `-s` | — | boolean | No | With `loadtask`, show summary only |
| `iofsstat` | — | string | Yes | Monitor filesystem I/O statistics |
| `-T` | — | integer | Yes | With `iofsstat`, monitoring duration in seconds |
| `memleak` | — | string | Yes | Scan for memory leaks in the specified allocator |
| `-t` | — | string | Yes | With `memleak`, leak type (e.g., `slab`) |
| `-c` | — | boolean | No | With `memleak`, quick check mode |
| `nosched` | — | string | Yes | Detect scheduling latency issues |
| `-t` | — | integer | Yes | With `nosched`, delay threshold in milliseconds |
| `-s` | — | integer | Yes | With `nosched`, sampling duration in seconds |
| `irqoff` | — | string | Yes | Detect long interrupt-disabled periods |
| `-t` | — | integer | Yes | With `irqoff`, threshold in milliseconds; the trailing positional argument is the duration in seconds |
| `pingtrace` | — | string | Yes | Measure and trace network latency |
| `-c` | — | string | Yes | With `pingtrace`, target IP address or hostname |
| `memgraph` | — | string | Yes | Visualize memory usage over time |
| `-g` | — | boolean | No | With `memgraph`, generate graphical (text-based) output |
| `mservice` | — | string | Yes | Control background monitoring service |
| `-S` | — | boolean | Yes | With `mservice`, start the monitoring service |
| `-l` | — | boolean | Yes | With `mservice`, launch the interactive viewer for collected metrics |

```bash
# List all diagnostic tools in SysAK
sysak list -a

# Show a quick system load summary
sysak loadtask -s

# Monitor disk I/O for 10 seconds
sysak iofsstat -T 10

# Perform a quick slab memory leak check
sysak memleak -t slab -c

# Detect CPU scheduling delays >20ms over 30 seconds
sysak nosched -t 20 -s 30

# Detect interrupt-off periods longer than 5ms for 60 seconds
sysak irqoff -t 5 60

# Trace network latency to 8.8.8.8
sysak pingtrace -c 8.8.8.8

# Display memory usage chart
sysak memgraph -g

# Start the background monitoring service
sudo sysak mservice -S

# View live monitoring data interactively
sysak mservice -l
```

**Example Output**:  
*(Varies by subcommand; typically tabular or time-series text output showing metrics like I/O ops/sec, memory usage %, latency ms, etc.)*

## Common Scenarios

### Scenario 1: Diagnose System Crash After Unexpected Reboot
```bash
# Step 1: Verify kdump is active
sudo kdumpctl status

# Step 2: If kdump is inactive, start it (logs from a recent crash may still be available)
sudo kdumpctl start

# Step 3: Retrieve and inspect crash boot logs
sudo kdumpctl readlog
```
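
Steps 1 and 2 can be combined into one conditional check. The following is a minimal sketch, not a documented interface: it assumes `kdumpctl status` exits with a non-zero status when kdump is not operational (this reference does not confirm that), and the names `ensure_kdump` and `KDUMPCTL` are hypothetical, introduced only for illustration.

```bash
# Hypothetical helper: start kdump only when the status check fails.
# KDUMPCTL may be overridden for a dry run; the default mirrors the steps above.
KDUMPCTL="${KDUMPCTL:-sudo kdumpctl}"

ensure_kdump() {
    # Assumption: a non-zero exit status from "status" means kdump is not operational.
    if $KDUMPCTL status >/dev/null 2>&1; then
        echo "kdump: already active"
    else
        echo "kdump: inactive, starting"
        $KDUMPCTL start
    fi
}

# Usage: ensure_kdump && sudo kdumpctl readlog
```

If the exit-status assumption does not hold on your system, inspect the text printed by `sudo kdumpctl status` instead.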

### Scenario 2: Investigate High System Load and Latency
```bash
# Step 1: Get an overview of current system load
sysak loadtask -s

# Step 2: Check for I/O bottlenecks over 15 seconds
sysak iofsstat -T 15

# Step 3: Detect CPU scheduling delays (>10ms) for 60 seconds
sysak nosched -t 10 -s 60

# Step 4: Check for memory leaks in slab allocator
sysak memleak -t slab -c
```
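
When the same triage is repeated across hosts, it can help to keep each tool's output in its own file. The sketch below runs the four commands from this scenario unchanged; the function name `run_triage`, the `SYSAK` override variable, and the output file names are hypothetical choices made for this example.

```bash
# Hypothetical wrapper: run the four checks above and save each output separately.
SYSAK="${SYSAK:-sysak}"

run_triage() {
    out_dir="${1:-./sysak-triage}"
    mkdir -p "$out_dir" || return 1
    # Commands and flags are exactly those used in the steps above;
    # stderr is captured alongside stdout so error messages are not lost.
    $SYSAK loadtask -s         > "$out_dir/loadtask.txt" 2>&1
    $SYSAK iofsstat -T 15      > "$out_dir/iofsstat.txt" 2>&1
    $SYSAK nosched -t 10 -s 60 > "$out_dir/nosched.txt"  2>&1
    $SYSAK memleak -t slab -c  > "$out_dir/memleak.txt"  2>&1
    echo "results written to $out_dir"
}

# Usage: run_triage "/tmp/triage-$(date +%Y%m%d-%H%M%S)"
```

The checks run sequentially, so expect the whole run to take at least as long as the combined monitoring durations.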

### Scenario 3: Set Up Continuous System Monitoring
```bash
# Step 1: Install SysAK (if not present)
sudo yum install -y sysak

# Step 2: Enable and start the SysAK service
sudo systemctl enable sysak
sudo systemctl start sysak

# Step 3: Start background metric collection
sudo sysak mservice -S

# Step 4: Access metrics via local HTTP endpoint (optional)
curl http://127.0.0.1:9200/metrics/raw/
```
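
The endpoint may not answer immediately after the service starts, so a script might poll it before reading metrics. This is a rough sketch under that assumption: the URL is the one shown in Step 4, while `wait_for_metrics`, `METRICS_URL`, the retry count, and the timeouts are hypothetical values chosen for illustration.

```bash
# Hypothetical helper: poll the metrics URL until it responds or attempts run out.
METRICS_URL="${METRICS_URL:-http://127.0.0.1:9200/metrics/raw/}"

wait_for_metrics() {
    attempts="${1:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f fails on HTTP errors, -s silences progress output,
        # --max-time keeps a dead socket from hanging the loop.
        if curl -fs --max-time 2 -o /dev/null "$METRICS_URL"; then
            echo "metrics endpoint is up"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "metrics endpoint not reachable after $attempts attempts" >&2
    return 1
}

# Usage: wait_for_metrics 15 && curl -s "$METRICS_URL"
```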

## Environment Setup

### Installation

SysAK is not installed by default. Install it using one of the following methods:

**Via YUM (recommended)**:
```bash
sudo yum install -y sysak
```
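
In provisioning scripts it can be useful to skip the install when the tool is already on the `PATH`. A minimal sketch, assuming the package provides a `sysak` executable as used throughout this reference; `ensure_tool` is a hypothetical helper name.

```bash
# Hypothetical helper: install a package with yum only if its command is missing.
ensure_tool() {
    tool="$1"
    pkg="${2:-$1}"   # package name defaults to the command name
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: already installed"
    else
        echo "$tool: not found, installing $pkg"
        sudo yum install -y "$pkg"
    fi
}

# Usage: ensure_tool sysak
```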

**Via RPM (offline)**:
```bash
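# Download the package on a machine with network access, then copy it to the target host if needed
wget https://mirrors.openanolis.cn/sysak/packages/sysak-1.3.0-2.x86_64.rpm
# --nodeps skips dependency checks, so make sure required packages are already present
sudo rpm -ivh --nodeps sysak-1.3.0-2.x86_64.rpm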
```

The `kdumpctl` tool is pre-installed on Alibaba Cloud Linux 3 with kernel version 5.10.134-14 or later.

### Configuration

- **kdump**: Requires more than 2 GB of system memory. Basic log viewing needs no additional configuration.
- **SysAK**: No authentication or cloud credentials required. Runs entirely locally.
- Ensure the system uses a supported OS: Alibaba Cloud Linux 2/3, Anolis OS 8.4+, or CentOS 7 with kernel ≥3.10 on x86_64.
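
The memory and architecture requirements above can be checked before installing anything. The sketch below is an approximation rather than an official check: it reads `MemTotal` from `/proc/meminfo`, which excludes memory reserved by the kernel and can therefore read slightly below the installed RAM. The function names `mem_total_kb` and `preflight` are hypothetical.

```bash
# Hypothetical helper: print MemTotal (in kB) from a meminfo-style file.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Hypothetical preflight: succeed when memory exceeds 2 GB and the arch is x86_64.
preflight() {
    kb="$(mem_total_kb "${1:-/proc/meminfo}")"
    arch="${2:-$(uname -m)}"
    # 2 GB expressed in kB: 2 * 1024 * 1024 = 2097152.
    if [ "${kb:-0}" -le 2097152 ]; then
        echo "memory: ${kb:-unknown} kB is not above 2 GB"
        return 1
    fi
    if [ "$arch" != "x86_64" ]; then
        echo "arch: $arch is not x86_64"
        return 1
    fi
    echo "preflight ok"
}

# Usage: preflight || echo "host does not meet the documented requirements"
```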

## FAQ

Q: How do I verify that kdump captured a crash?
A: Run `sudo kdumpctl status` and confirm it shows "ready", then check `/var/crash/` for a vmcore file. The status only tells you that kdump is armed; the vmcore file is what shows that a dump was captured. Use `sudo kdumpctl readlog` to view the boot log from the crash context.
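
Checking `/var/crash/` for a vmcore file can be scripted. This is a small sketch that assumes dump files have names beginning with `vmcore` and may sit in subdirectories of `/var/crash/`; the exact directory layout is not specified in this reference, and `list_vmcores` is a hypothetical name.

```bash
# Hypothetical helper: print the paths of vmcore files under a crash directory.
list_vmcores() {
    crash_dir="${1:-/var/crash}"
    # The pattern also matches companion files whose names start with "vmcore", if any exist.
    find "$crash_dir" -type f -name 'vmcore*' 2>/dev/null
}

# Usage (reading /var/crash may require root):
#   [ -n "$(list_vmcores)" ] && echo "crash dump found"
```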

Q: What’s the difference between `sysak memleak -t slab` and general memory monitoring?
A: `memleak` specifically scans kernel slab allocators for unreleased objects (indicating leaks), while `memgraph` or `loadtask` show overall memory usage trends without leak detection.

Q: Why does `sysak nosched` report high scheduling delays?
A: Delays above the threshold (e.g., 20 ms) point to CPU contention, real-time task interference, or kernel lock contention. Combine `nosched` with `iofsstat` and `irqoff` to narrow down the root cause (I/O, interrupts, or CPU-bound processes).

Q: Can I access SysAK metrics programmatically?
A: Yes. When `sysak mservice -S` is running, metrics are exposed via HTTP at `http://127.0.0.1:9200/metrics/raw/` in plain text format, suitable for scraping or scripting.
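
Because the response is plain text, ordinary line filters are enough for simple scripting. The sketch below only assumes line-oriented text at the URL given above; the metric names in the output are not documented here, so the search pattern is a placeholder you would replace. `metric_lines` and `METRICS_URL` are hypothetical names.

```bash
# Hypothetical helper: print only the metric lines that match a pattern.
METRICS_URL="${METRICS_URL:-http://127.0.0.1:9200/metrics/raw/}"

metric_lines() {
    pattern="$1"
    # -f fails on HTTP errors, -s silences progress output; grep's exit status
    # is returned, so a non-zero result means no line matched (or the fetch was empty).
    curl -fs --max-time 5 "$METRICS_URL" | grep -- "$pattern"
}

# Usage: metric_lines "PATTERN"   # replace PATTERN with a metric name from your own output
```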

Q: Is kdumpctl available on Alibaba Cloud Linux 2?
A: The documented `kdumpctl readlog` feature is specific to Alibaba Cloud Linux 3 (kernel ≥5.10.134-14). Alibaba Cloud Linux 2 uses older kexec-tools and may not support this subcommand.