# alinux-system

Part of **ALINUX**

# Alibaba Cloud Linux System Performance Troubleshooting Guide

## Problem Index

| Problem | Symptom | Severity | Solution Summary |
|--------|---------|----------|------------------|
| High ksoftirqd Latency | `132169 usec` IRQ-to-softirq latency on CPU 11 | High | Use bpftrace/ftrace to identify blocking tasks; adjust scheduling policy |
| Slab Unreclaimable Memory Leak | `SUnreclaim` > 10% of total memory | High | Use crash/perf to locate leaking slab; analyze kmalloc allocations |
| io_uring Creation Fails with ENOMEM | `io_uring_queue_init()` returns -ENOMEM | Medium | Increase locked memory limit via ulimit or run as root |
| CPU Sys Usage Spikes Under Memory Pressure | High %sys in top/iostat during memory stress | Medium | Increase `vm.watermark_scale_factor` to 150 |
| Load Average >1 on Idle System | `load average` always >1 with no workload | Low | Disable async load calc: `echo 0 > /proc/async_load_calc` |
| Container Memory Usage Higher Than Host | `free` shows more used memory in container than host | Low | Update procps-ng to version 3.3.15-14.0.3 or later |
| eBPF LRU Hash Causes CPU Spike | CPU utilization spikes with eBPF programs using LRU maps | High | Install kernel hotfix for LRU hash memory leak |
| ext4 Write Performance Degradation | Poor Buffer I/O write performance with specific mount options | Medium | Remove `nodelalloc` or upgrade kernel |

## Problem Details

### Problem 1: High ksoftirqd Latency

**Symptoms**
- Measured value: the diagnostic reports an IRQ-to-softirq latency such as `132169 usec` (about 132 ms) on a single CPU
- Behavior: Network or I/O operations experience high latency; softirq processing delayed
- Context: Occurs under heavy interrupt load or when real-time tasks monopolize CPU

**Root Cause**
The ksoftirqd thread is not scheduled promptly due to higher-priority real-time tasks (SCHED_FIFO, SCHED_RR, or SCHED_DEADLINE) or long-running kernel operations blocking softirq execution.

**Solution**
1. Install bpftrace if not present:
   ```bash
   sudo yum install -y bpftrace
   ```
2. Download and run the softirq latency diagnostic script:
   ```bash
   sudo wget -P /tmp https://gitee.com/dtcccccc/softirq_net_latency/raw/master/softirq_net_latency.bt
   sudo bpftrace /tmp/softirq_net_latency.bt 100000
   ```
3. For deeper analysis, enable ftrace events and apply patch:
   ```bash
   sudo wget -P /tmp https://gitee.com/dtcccccc/softirq_net_latency/raw/master/softirq_ftrace.patch
   sudo sh -c 'echo "irq:softirq_raise irq:softirq_entry sched:sched_switch sched:sched_wakeup raw_syscalls:sys_enter raw_syscalls:sys_exit" > /sys/kernel/debug/tracing/set_event'
   sudo sh -c 'echo 1 > /sys/kernel/debug/tracing/tracing_on'
   cd /tmp
   sudo patch -p1 < /tmp/softirq_ftrace.patch
   sudo bpftrace --unsafe /tmp/softirq_net_latency.bt 100000
   ```
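4. Check the trace output for the affected CPU (e.g., CPU 11); `vec=3` selects the NET_RX softirq:
   ```bash
   # Read the per-CPU trace buffer directly instead of opening a root shell
   sudo grep "vec=3" /sys/kernel/debug/tracing/per_cpu/cpu11/trace
   ```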
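
Since the root cause points at real-time tasks starving ksoftirqd, it may help to list the real-time threads that last ran on the affected CPU. The following is a minimal sketch, assuming the procps `ps` columns `cls`, `rtprio`, and `psr`; `rt_tasks_on_cpu` is a hypothetical helper name, not part of the diagnostic scripts above.

```bash
# Hypothetical helper: read `ps -eLo pid,tid,cls,rtprio,psr,comm` output on stdin
# and keep only real-time threads (FF = SCHED_FIFO, RR = SCHED_RR,
# DLN = SCHED_DEADLINE) whose last-used CPU (psr column) equals the first argument.
rt_tasks_on_cpu() {
    awk -v cpu="$1" 'NR > 1 && $5 == cpu && ($3 == "FF" || $3 == "RR" || $3 == "DLN")'
}

# Usage on a live system (CPU 11 as in the example above):
#   ps -eLo pid,tid,cls,rtprio,psr,comm | rt_tasks_on_cpu 11
```

Any thread listed here is only a candidate: `psr` shows where the thread last ran, so cross-check it against the `sched_switch` events in the trace before changing its scheduling policy.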

**Verification**
- After identifying and adjusting priority of blocking tasks, re-run diagnostic script
- Expected: IRQ-to-softirq latency drops below 10000 usec (10 ms)

### Problem 2: Slab Unreclaimable Memory Leak

**Symptoms**
- Error message: `slab_unreclaimable_high`
- Behavior: `SUnreclaim` in `/proc/meminfo` exceeds 10% of total system memory; OOM killer may trigger
- Context: Long-running systems with heavy kernel module or driver usage
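
To check the 10% threshold without doing the arithmetic by hand, the ratio can be computed from `/proc/meminfo`. This is a small sketch; `sunreclaim_pct` is a hypothetical helper name, and the optional file argument exists only so that the function can be tried against a saved copy of `meminfo`.

```bash
# Hypothetical helper: print SUnreclaim as a percentage of MemTotal.
# Reads a meminfo-formatted file (default: /proc/meminfo); both fields are in kB.
sunreclaim_pct() {
    awk '/^MemTotal:/   { total = $2 }
         /^SUnreclaim:/ { sunreclaim = $2 }
         END { if (total > 0) printf "%.1f\n", sunreclaim * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Usage on a live system:
#   sunreclaim_pct
```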

**Root Cause**
Kernel components or drivers allocate memory via slab allocator but fail to release it properly, causing unreclaimable slab memory accumulation.

**Solution**
1. Check current unreclaimable memory:
   ```bash
   grep SUnreclaim /proc/meminfo
   # Sort caches by number of active objects (-s takes a sort key; "a" = active objects)
   sudo slabtop -s a
   ```
2. Confirm that the suspect slab cache is unreclaimable (`reclaim_account` reads `0` for caches counted under `SUnreclaim`):
   ```bash
   cat /sys/kernel/slab/<slab_NAME>/reclaim_account
   ```
3. Install crash and debuginfo packages:
   ```bash
   sudo yum install crash -y
   sudo yum install kernel-debuginfo-$(uname -r) --enablerepo=alinux3-plus-debug
   ```
4. Analyze static allocations with crash:
   ```bash
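   # Open the live kernel; the remaining commands are typed at the crash> prompt
   sudo crash
   # Dump the slabs and objects of the suspect cache (kmalloc-192 as an example)
   kmem -S kmalloc-192
   kmem -S kmalloc-192 | tail -n 10
   # Inspect the memory of one slab taken from the kmem output (example address)
   rd ffff88028398a000 512 -S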
   ```
5. For dynamic analysis, use perf to track allocations/frees:
   ```bash
   sudo yum install perf -y
   sudo perf record -a -e kmem:kmalloc --filter 'bytes_alloc == 192' -e kmem:kfree --filter ' ptr != 0' sleep 200
   sudo perf script > testperf.txt
   cat testperf.txt
   ```
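
Reading `testperf.txt` by eye gets tedious quickly, so one option is to pair each `kmem:kmalloc` event with a later `kmem:kfree` of the same pointer and count what is left per call site. The sketch below assumes that each `perf script` line carries `call_site=` and `ptr=` fields, as in the mainline tracepoint format; the exact layout can differ between kernel and perf versions, and `unfreed_allocs` is a hypothetical helper name.

```bash
# Hypothetical helper: count allocations that were never freed inside the
# recording window, grouped by call_site. Reads perf-script text from the
# given files or from stdin.
unfreed_allocs() {
    awk '
        # Return the value of a key=value token on the current line.
        function field(key,    i) {
            for (i = 1; i <= NF; i++)
                if (index($i, key "=") == 1) return substr($i, length(key) + 2)
            return ""
        }
        /kmem:kmalloc:/ { site[field("ptr")] = field("call_site") }
        /kmem:kfree:/   { delete site[field("ptr")] }
        END {
            for (p in site) count[site[p]]++
            for (s in count) print count[s], s
        }
    ' "$@" | sort -rn
}

# Usage:
#   unfreed_allocs testperf.txt | head
```

Objects allocated shortly before the recording ends also show up as unfreed, so a call site is suspicious only if its count keeps growing across longer or repeated recordings.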

**Verification**
- After fixing the leaking component, monitor `SUnreclaim` over time
- Expected: `SUnreclaim` stabilizes below 5% of total memory

### Problem 3: io_uring Creation Fails with ENOMEM

**Symptoms**
- Error message: `ENOMEM`
- Behavior: `io_uring_queue_init()` fails and returns `-ENOMEM` (liburing reports errors as a negated errno value rather than through `errno`)
- Context: Creating io_uring instances in containers or unprivileged contexts

**Root Cause**
System cannot allocate sufficient locked memory for io_uring ring buffers due to low `ulimit -l` (max locked memory) setting or insufficient physical memory.

**Solution**
1. Check current limits:
   ```bash
   ulimit -a
   ```
2. Either run with elevated privileges:
   ```bash
   sudo your_io_uring_application
   ```
3. Or, in Docker, run the container in privileged mode, which lifts the locked memory restriction:
   ```bash
   docker run --privileged your_image
   ```
4. For non-containerized apps, adjust limits in `/etc/security/limits.conf`:
   ```text
   * soft memlock unlimited
   * hard memlock unlimited
   ```
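
Before changing any limit, it can be useful to confirm that the current `memlock` value is actually the constraint. The sketch below compares the value printed by `ulimit -l` with a required amount; `memlock_sufficient` is a hypothetical helper, and the 65536 KiB figure in the usage line is an illustrative assumption, not a value taken from io_uring.

```bash
# Hypothetical helper: succeed if a memlock limit (as printed by `ulimit -l`,
# i.e. a number of KiB or the word "unlimited") covers the required KiB.
memlock_sufficient() {
    local limit="$1" required_kib="$2"
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$required_kib" ]
}

# Usage (65536 KiB is an assumed requirement; size it to your rings and buffers):
#   memlock_sufficient "$(ulimit -l)" 65536 || echo "raise memlock before creating the ring"
```

For containers, Docker's `--ulimit memlock=-1:-1` option raises only this limit and may be a narrower alternative to `--privileged`, provided the rest of the application does not need extra capabilities.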

**Verification**
- Re-run io_uring application
- Expected: `io_uring_queue_init()` returns 0 (success)

### Problem 4: CPU Sys Usage Spikes Under Memory Pressure

**Symptoms**
- Behavior: High `%sys` CPU usage in `top` or `iostat` during memory-intensive workloads
- Context: Systems with large memory footprints approaching available RAM

**Root Cause**
Kernel spends excessive time in direct memory reclaim because watermark thresholds (min/low/high) are too close together, triggering frequent synchronous reclaim instead of background kswapd activity.
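
To see why the scale factor matters, the spacing between watermarks can be estimated. The sketch below follows the mainline kernel rule, where the gap between consecutive watermarks is the larger of `min / 4` and `managed_pages * watermark_scale_factor / 10000`; it assumes the Alibaba Cloud Linux kernel uses the same rule, and `wmark_gap_pages` is a hypothetical helper name.

```bash
# Hypothetical helper: estimate the gap (in pages) between the min/low and
# low/high watermarks of one zone, following the mainline kernel formula.
#   $1 = managed pages in the zone, $2 = watermark_scale_factor, $3 = min watermark
wmark_gap_pages() {
    local managed_pages="$1" scale_factor="$2" min_pages="$3"
    local by_scale=$(( managed_pages * scale_factor / 10000 ))
    local by_min=$(( min_pages / 4 ))
    echo $(( by_scale > by_min ? by_scale : by_min ))
}

# Illustrative zone with 4,000,000 managed pages and a min watermark of 16000 pages:
#   wmark_gap_pages 4000000 10  16000   # default factor 10 -> 4000 pages
#   wmark_gap_pages 4000000 150 16000   # factor 150        -> 60000 pages
```

With a wider gap, kswapd is woken earlier and has more room to reclaim in the background before allocations hit the min watermark and fall into direct reclaim.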

**Solution**
1. Increase watermark scale factor to widen gaps between watermarks:
   ```bash
   sudo sh -c 'echo 150 > /proc/sys/vm/watermark_scale_factor'
   ```
2. Verify zone watermarks:
   ```bash
   cat /proc/zoneinfo
   ```
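
The full `/proc/zoneinfo` dump is long, so a short filter that prints only the watermarks per zone may make the before/after comparison easier. This is a sketch that assumes the usual layout (a `Node N, zone NAME` header followed by `min`, `low`, and `high` lines); `zone_watermarks` is a hypothetical helper name.

```bash
# Hypothetical helper: print min/low/high watermarks (in pages) for each zone,
# plus the low-min gap. Reads a zoneinfo-formatted file (default: /proc/zoneinfo).
zone_watermarks() {
    awk '
        $1 == "Node" && $3 == "zone" { node = $2 + 0; zone = $4 }
        # Watermark lines have exactly two fields; per-CPU "high:" lines do not match.
        NF == 2 && ($1 == "min" || $1 == "low" || $1 == "high") { wm[$1] = $2 }
        NF == 2 && $1 == "high" {
            printf "node%d/%s min=%d low=%d high=%d low-min=%d\n",
                   node, zone, wm["min"], wm["low"], wm["high"], wm["low"] - wm["min"]
        }
    ' "${1:-/proc/zoneinfo}"
}

# Usage: run once before and once after changing watermark_scale_factor.
#   zone_watermarks
```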

**Verification**
- Under memory pressure, monitor CPU usage
- Expected: `%sys` decreases significantly; kswapd handles most reclaim

### Problem 5: Load Average >1 on Idle System

**Symptoms**
- Behavior: `load average` consistently >1 even with no user processes running
- Context: Alibaba Cloud Linux 3 with kernel version kernel-5.10.60-9.al8

**Root Cause**
Container resource statistics enhancement feature introduces inaccurate load calculation logic, though actual system performance is unaffected.

**Solution**
1. Disable async load calculation:
   ```bash
   sudo sh -c 'echo 0 > /proc/async_load_calc'
   ```
2. To persist the setting across reboots, add the command to a startup script (e.g., `/etc/rc.local`)
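
Because `/proc/async_load_calc` exists only on kernels that ship this feature, a startup script may want to guard the write. A minimal sketch follows; `disable_async_load_calc` is a hypothetical helper name, and the optional path argument is there only to make the function easy to try out.

```bash
# Hypothetical helper for a startup script: write 0 to the knob only if it exists
# and is writable, so the script stays harmless on kernels without it.
disable_async_load_calc() {
    local knob="${1:-/proc/async_load_calc}"
    if [ -w "$knob" ]; then
        echo 0 > "$knob"
    else
        echo "skip: $knob not present or not writable" >&2
    fi
}

# Usage (as root, e.g. from /etc/rc.local):
#   disable_async_load_calc
```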

**Verification**
- Run `uptime` or `top` after applying fix
- Expected: `load average` approaches 0.00 on idle system

### Problem 6: Container Memory Usage Higher Than Host

**Symptoms**
- Behavior: `free` command shows higher memory usage inside container than on host
- Context: Alibaba Cloud Linux 3 with older procps-ng versions

**Root Cause**
Memory calculation formula incorrectly treats shared memory (shmem) as free memory instead of used memory.

**Solution**
1. Update procps-ng package:
   ```bash
   sudo yum update procps-ng
   ```
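
To confirm that the update actually delivered the fixed build (3.3.15-14.0.3 or later, per the Problem Index), the installed version can be compared against that threshold. The sketch below uses GNU `sort -V` as an approximation of RPM version ordering, which is an assumption that holds for simple version strings but is not RPM's exact algorithm; `version_ge` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2,
# using GNU sort's version ordering (an approximation of RPM's comparison).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   installed="$(rpm -q --qf '%{VERSION}-%{RELEASE}' procps-ng)"
#   version_ge "$installed" 3.3.15-14.0.3 && echo "fixed procps-ng build installed"
```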

**Verification**
- Compare `free` output inside container and on host
- Expected: Container memory usage ≤ host memory usage

### Problem 7: eBPF LRU Hash Causes CPU Spike

**Symptoms**
- Error message: `CPU_Utilization_Spike`
- Behavior: Sudden CPU usage increase when running eBPF programs
- Context: Alibaba Cloud Linux 3 with kernel ≥ 5.10.134-15.al8.x86_64

**Root Cause**
Defect in kernel's LRU hash implementation causes memory leaks and lock contention in eBPF maps.

**Solution**
1. Verify kernel version and LRU map usage:
   ```bash
   uname -r
   # Plain bpftool output prints the map type right after the ID, without a "type" keyword
   sudo bpftool map show | grep -w lru_hash
   ```
2. Install hotfix:
   ```bash
   sudo yum install kernel-hotfix-22519882-<OS_SUBVERSION>
   ```
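
When many maps are loaded, a short summary of the LRU maps and their names can help tie the CPU spike to a specific eBPF program. This sketch assumes the plain-text layout of `bpftool map show` (`ID: TYPE  name NAME  flags ...` on the first line of each entry), which may vary between bpftool versions; `lru_hash_maps` is a hypothetical helper name.

```bash
# Hypothetical helper: read plain `bpftool map show` output on stdin and print
# "ID NAME" for every LRU hash map ("-" when the map has no name).
lru_hash_maps() {
    awk '$2 == "lru_hash" || $2 == "lru_percpu_hash" {
        id = $1
        sub(/:$/, "", id)
        name = ($3 == "name") ? $4 : "-"
        print id, name
    }'
}

# Usage:
#   sudo bpftool map show | lru_hash_maps
```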

**Verification**
- Monitor CPU usage after hotfix installation
- Expected: CPU utilization returns to baseline levels

### Problem 8: ext4 Write Performance Degradation

**Symptoms**
- Error message: `PerformanceDegradation`
- Behavior: Poor Buffer I/O write performance on ext4 filesystems
- Context: Alibaba Cloud Linux 2 with `dioread_nolock` and `nodelalloc` mount options

**Root Cause**
Combination of `dioread_nolock` and `nodelalloc` prevents merging of 4KB dirty pages, causing excessive small writes.

**Solution**
1. Identify affected mount points:
   ```bash
   df <$DIR> | grep -v Filesystem | awk '{ print $1 }'
   mount | grep -w <$Partition> | grep ext4 | grep -w dioread_nolock | grep -w nodelalloc
   ```
2. Remount with `delalloc`:
   ```bash
   sudo mount -o remount,delalloc <$Device> <$MountPoint>
   ```
3. Alternatively, upgrade to a kernel version that contains the fix
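
To survey a whole host rather than one directory at a time, the mount table can be scanned for ext4 file systems that carry both options. The following is a sketch that assumes both options appear in the options column of `/proc/mounts`; `affected_ext4_mounts` is a hypothetical helper name.

```bash
# Hypothetical helper: list "device mountpoint" for every ext4 mount that has
# both dioread_nolock and nodelalloc set. Reads a mounts-formatted file
# (default: /proc/mounts).
affected_ext4_mounts() {
    awk '$3 == "ext4" {
        n = split($4, opt, ",")
        dio = 0; nodel = 0
        for (i = 1; i <= n; i++) {
            if (opt[i] == "dioread_nolock") dio = 1
            if (opt[i] == "nodelalloc")     nodel = 1
        }
        if (dio && nodel) print $1, $2
    }' "${1:-/proc/mounts}"
}

# Usage:
#   affected_ext4_mounts
```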

**Verification**
- Run I/O benchmark after remounting
- Expected: Write throughput improves significantly

## FAQ

**Q: How do I check if Transparent Huge Pages (THP) is enabled?**
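A: Run `grep AnonHugePages /proc/meminfo`. A non-zero value means anonymous huge pages are currently in use; a value of zero does not by itself prove that THP is disabled. To check the configured mode, run `cat /sys/kernel/mm/transparent_hugepage/enabled`; the active setting is the one shown in square brackets.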
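
For scripts that need just the active THP mode, the bracketed entry can be extracted directly. A small sketch, assuming the usual `always [madvise] never` style of the `enabled` file; `thp_mode` is a hypothetical helper name.

```bash
# Hypothetical helper: print the active THP mode, i.e. the bracketed word in the
# "enabled" file (default path: /sys/kernel/mm/transparent_hugepage/enabled).
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# Usage:
#   thp_mode
```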

**Q: What permissions are needed to use io_uring with SQPOLL?**
A: The process needs CAP_SYS_ADMIN capability or must run as root. In containers, you may need `--privileged` mode or specific capability grants (`--cap-add=SYS_ADMIN`).

**Q: How do I enable debug logging for memory management issues?**
A: Use kernel tracing interfaces: `echo 1 > /sys/kernel/debug/tracing/tracing_on` and configure relevant events in `/sys/kernel/debug/tracing/set_event`. Tools like `crash`, `perf`, and `bpftrace` provide deeper diagnostics.

**Q: What are common causes of high %sys CPU usage?**
A: Common causes include memory pressure (triggering direct reclaim), network interrupt storms, excessive system calls, or kernel bugs. Use `perf top` to identify hot kernel functions.

**Q: How do I roll back a failed kernel parameter change?**
A: Most runtime parameters (e.g., in `/proc/sys/vm/`) revert on reboot. For boot parameters modified via `grubby`, restore original GRUB config from backup or re-run `grubby` with original arguments. Always create system snapshots before major changes.