# alinux-network

Part of **ALINUX**

# Alibaba Cloud Linux Networking Troubleshooting Guide

## Problem Index

| Problem | Symptom | Severity | Solution Summary |
|--------|--------|---------|------------------|
| SMC Connection Fails with Fallback to TCP | Error code `0x03010000` or `SMC-001`; connection falls back to TCP | High | Verify SMC support on both ends and check RDMA device availability |
| TCP BBR Causes High CPU and Low Performance | Elevated CPU usage and degraded network throughput under high PPS | Medium | Switch to Cubic congestion control or enable tc-fq qdisc |
| Missing sch_netem Kernel Module | `modprobe sch_netem` fails with "Module not found" | Medium | Install `kernel-modules-extra` or `kernel-modules-internal` package |
| IPVS Estimation Causes Network Jitter | Unstable latency in large Kubernetes clusters | High | Disable IPVS estimation via sysctl or modprobe configuration |
| Policy Routing Commands Fail | `ip rule add` or `ip route add table` returns "Operation not permitted" | Medium | Upgrade kernel or verify `CONFIG_IP_MULTIPLE_TABLES` support |

## Problem Details

### Problem 1: SMC Connection Fails with Fallback to TCP

**Symptoms**
- Fallback reason code: `0x03010000` ("Peer does not support SMC")
- Error message: `SMC-001` ("SMC connection negotiation failed, fell back to TCP")
- Behavior: Applications show no performance improvement despite SMC being enabled
- Context: Occurs when establishing SMC connections between Alibaba Cloud Linux 3 instances

**Root Cause**
- The peer endpoint does not support SMC (lacks SMC TCP option flags in SYN/SYN-ACK)
- No available SMC-R (RDMA) or SMC-D devices due to missing eRDMA configuration
- IPv6 is enabled, but the current SMCv2 implementation does not support AF_INET6
- UEID mismatch between peers when using SMCv2

**Solution**
1. Check SMC connection status and fallback reason:
   ```bash
   smcss -a
   ```
2. Verify RDMA device availability:
   ```bash
   smcr d
   ibv_devinfo -d <device> -v | grep max_mr
   ```
3. If IPv6 is not required, disable it system-wide:
   ```bash
   sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
   sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
   ```
   Or disable per interface:
   ```bash
   sudo sysctl -w net.ipv6.conf.<interface>.disable_ipv6=1
   ```
4. For SMCv2, ensure both hosts share a common UEID by adding the same value on each host:
   ```bash
   smcr ueid add <UEID>
   ```
5. Check RDMA error statistics (for eRDMA):
   ```bash
   eadm stat -d <ibdev_name> -l
   ```
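You can bundle the local checks from steps 2 and 3 into a quick pre-flight script. This is a minimal sketch and not part of smc-tools. The function name `smc_preflight` and the `SYSFS_ROOT`/`PROCFS_ROOT` overrides are hypothetical. The script assumes that `smc` is built as a loadable module, so it appears under `/sys/module`, and that RDMA devices register under `/sys/class/infiniband`. It inspects only the local host, so you still need to check the peer separately.

```bash
#!/usr/bin/env bash
# Sketch: local SMC pre-flight checks (hypothetical helper, not part of smc-tools).
# SYSFS_ROOT and PROCFS_ROOT are hypothetical overrides, used here only for testing.

smc_preflight() {
    local sys="${SYSFS_ROOT:-/sys}" proc="${PROCFS_ROOT:-/proc}" rc=0 devs

    # Loaded kernel modules appear as directories under /sys/module.
    if [ -d "$sys/module/smc" ]; then
        echo "smc module: loaded"
    else
        echo "smc module: not loaded"
        rc=1
    fi

    # RDMA devices (eRDMA included) register under /sys/class/infiniband.
    devs=$(ls -A "$sys/class/infiniband" 2>/dev/null | tr '\n' ' ')
    if [ -n "$devs" ]; then
        echo "RDMA devices: $devs"
    else
        echo "RDMA devices: none found"
        rc=1
    fi

    # Informational only: SMCv2 does not support AF_INET6 here (see Root Cause).
    if [ "$(cat "$proc/sys/net/ipv6/conf/all/disable_ipv6" 2>/dev/null)" = "1" ]; then
        echo "IPv6: disabled"
    else
        echo "IPv6: enabled or unknown"
    fi

    return "$rc"
}
```

After sourcing the script, run `smc_preflight`. A non-zero return means the SMC module or an RDMA device is missing. IPv6 state is reported for information only, because it affects only connections that would use AF_INET6.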

**Verification**
- Run `smcss -a` and confirm connections show `SMC-R` or `SMC-D` without fallback
- Use `sockperf` or `iperf3` to validate performance improvement over TCP baseline

### Problem 2: TCP BBR Causes High CPU and Low Performance

**Symptoms**
- Error message: None (silent performance degradation)
- Behavior: High CPU usage, reduced throughput, slow application response (e.g., Redis)
- Context: Occurs on Alibaba Cloud Linux 2 with kernel ≤ 4.19.48-14.al7 and default BBR

**Root Cause**
- In these older kernels, BBR running without the `fq` qdisc relies on TCP-internal pacing, which arms a high-resolution timer (hrtimer) per connection
- This increases CPU overhead under high connection count or PPS, degrading performance

**Solution**
Choose one of the following based on your use case:

**Option A: Switch to Cubic (recommended for internal services)**
```bash
sudo sysctl -w net.ipv4.tcp_congestion_control=cubic
# Make persistent
echo 'net.ipv4.tcp_congestion_control = cubic' | sudo tee -a /etc/sysctl.conf
```

**Option B: Enable tc-fq qdisc (recommended for public-facing services needing BBR)**
```bash
# Replace eth0 with your primary interface
sudo tc qdisc replace dev eth0 root fq
# Verify
tc qdisc show dev eth0
```
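The `tc` command above does not persist across reboots. One common approach is a sysctl drop-in that makes `fq` the default qdisc and keeps BBR selected. This is a sketch under assumptions: the file name `90-bbr-fq.conf`, the function name, and the `SYSCTL_DIR`/`SUDO` overrides are all hypothetical. Note that `net.core.default_qdisc` affects only qdiscs created after it is set, so the running interface still needs the `tc qdisc replace` command.

```bash
#!/usr/bin/env bash
# Sketch: persist fq + BBR with a sysctl drop-in (file name is an assumption).
# SYSCTL_DIR and SUDO are hypothetical overrides; set SUDO= to skip sudo.

write_bbr_fq_conf() {
    local file="${SYSCTL_DIR:-/etc/sysctl.d}/90-bbr-fq.conf"
    # default_qdisc applies to qdiscs created after it is set (e.g. at boot);
    # already-attached qdiscs keep their current type.
    printf '%s\n' \
        'net.core.default_qdisc = fq' \
        'net.ipv4.tcp_congestion_control = bbr' \
        | ${SUDO-sudo} tee "$file" >/dev/null || return 1
    echo "$file"
}
```

After writing the file, run `sudo sysctl --system` to reload all sysctl configuration files.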

**Option C: Upgrade kernel**
- Upgrade to a newer Alibaba Cloud Linux 2 kernel (> 4.19.48-14.al7) where this issue is resolved
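A version check can help you decide which option applies. This sketch assumes that `uname -r` has the usual `<version>-<release>.al7.<arch>` form and that GNU `sort -V` is available for the comparison. `kernel_le` and `bbr_at_risk` are hypothetical helper names, and the check does not detect whether `fq` is already attached (Option B).

```bash
#!/usr/bin/env bash
# Sketch: is this host in the BBR-without-fq risk window described above?
# Assumes `uname -r` looks like <version>-<release>.al7.<arch> and GNU sort -V.

# kernel_le A B: true if kernel release A <= B, comparing only the part before ".al".
kernel_le() {
    local a="${1%%.al*}" b="${2%%.al*}"
    [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | tail -n 1)" = "$b" ]
}

# bbr_at_risk [RELEASE] [ALGORITHM]: defaults to the running kernel and sysctl value.
# Does not check whether fq is already attached (Option B).
bbr_at_risk() {
    local kver="${1:-$(uname -r)}"
    local cc="${2:-$(sysctl -n net.ipv4.tcp_congestion_control 2>/dev/null)}"
    if [ "$cc" = "bbr" ] && kernel_le "$kver" "4.19.48-14.al7"; then
        echo "at risk: kernel $kver with bbr"
        return 0
    fi
    echo "not in the affected range: kernel $kver, algorithm ${cc:-unknown}"
    return 1
}
```

Run `bbr_at_risk` with no arguments to check the running host. You can also pass a release string and an algorithm explicitly, for example a hypothetical `bbr_at_risk 4.19.24-9.al7.x86_64 bbr`.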

**Verification**
- Monitor CPU usage during load test:
  ```bash
  top -p $(pgrep -d',' your_app)
  ```
- Confirm congestion control algorithm:
  ```bash
  sysctl net.ipv4.tcp_congestion_control
  ```
- Expected output: `cubic` or continued use of `bbr` with stable CPU under load
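If you chose Option B, you can check the qdisc listing together with the congestion control setting. This is a minimal sketch that assumes the `qdisc <kind> ...` line format printed by `tc qdisc show`, and `bbr_uses_fq` is a hypothetical helper. On multi-queue NICs where `fq` comes from `default_qdisc`, the `fq` instances appear as children of an `mq` root, and the pattern still matches them.

```bash
#!/usr/bin/env bash
# Sketch: confirm BBR is paced by fq. Reads `tc qdisc show dev <if>` output on stdin.
# bbr_uses_fq [ALGORITHM]: defaults to the current sysctl value.

bbr_uses_fq() {
    local cc="${1:-$(sysctl -n net.ipv4.tcp_congestion_control 2>/dev/null)}"
    if [ "$cc" != "bbr" ]; then
        echo "congestion control is ${cc:-unknown}, not bbr"
        return 1
    fi
    # Match "qdisc fq " with a trailing space so that fq_codel is not counted.
    if grep -q '^qdisc fq '; then
        echo "bbr paced by fq"
    else
        echo "bbr without fq: TCP-internal pacing in use"
        return 1
    fi
}
```

Usage: `tc qdisc show dev eth0 | bbr_uses_fq`.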

### Problem 3: Missing sch_netem Kernel Module

**Symptoms**
- Error message: `modprobe: FATAL: Module sch_netem not found`
- Behavior: Cannot simulate network delay, loss, or corruption using `tc netem`
- Context: On Alibaba Cloud Linux 3 with kernel < 5.10.134-16

**Root Cause**
- The `sch_netem` module was moved to separate packages (`kernel-modules-extra` or `kernel-modules-internal`) due to kernel modularization

**Solution**
1. Identify your exact kernel version:
   ```bash
   uname -r
   ```
2. Install the appropriate package:
   ```bash
   # For most users
   sudo yum install -y kernel-modules-extra-$(uname -r)
   # If above fails, try
   sudo yum install -y kernel-modules-internal-$(uname -r)
   ```
3. Load the module:
   ```bash
   sudo modprobe sch_netem
   ```
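You can combine steps 1 to 3 into one helper. As step 2 suggests, it tries `kernel-modules-extra` first and falls back to `kernel-modules-internal`. This is a sketch: `ensure_netem`, `run`, and the `DRY_RUN` switch are hypothetical, and the module lookup assumes that modules live under `/lib/modules/<release>`.

```bash
#!/usr/bin/env bash
# Sketch: install whichever package ships sch_netem, then load it.
# run, ensure_netem and DRY_RUN are hypothetical; DRY_RUN=1 only prints commands.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# ensure_netem [RELEASE]: defaults to the running kernel. modprobe always loads
# for the running kernel, so a different RELEASE is only meaningful with DRY_RUN=1.
ensure_netem() {
    local kver="${1:-$(uname -r)}"
    # Skip installation if a sch_netem module file already exists for this kernel.
    if [ -z "$(find "/lib/modules/$kver" -name 'sch_netem.ko*' -print -quit 2>/dev/null)" ]; then
        run sudo yum install -y "kernel-modules-extra-$kver" \
            || run sudo yum install -y "kernel-modules-internal-$kver" \
            || { echo "no package providing sch_netem for $kver" >&2; return 1; }
    fi
    run sudo modprobe sch_netem
}
```

`DRY_RUN=1 ensure_netem` prints the commands it would run, and `ensure_netem` alone executes them.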

**Verification**
- Confirm module is loaded:
  ```bash
  lsmod | grep sch_netem
  ```
- Test basic netem functionality:
  ```bash
  sudo tc qdisc add dev lo root netem delay 100ms
  ping -c 3 127.0.0.1  # Expect ~200ms RTT: netem on lo delays both the request and the reply
  sudo tc qdisc del dev lo root
  ```
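To check the delay numerically instead of by eye, you can parse the `ping` summary line. This is a minimal sketch that assumes the iputils `rtt min/avg/max/mdev = a/b/c/d ms` summary format. BusyBox `ping` prints a different line and is not handled. `avg_rtt` and `rtt_within` are hypothetical helpers, and the 20 ms default tolerance is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Sketch: parse the iputils ping summary ("rtt min/avg/max/mdev = a/b/c/d ms").

# avg_rtt: print the average RTT (ms) from ping output on stdin.
avg_rtt() {
    # Splitting on "/" puts the avg value in field 5 of the summary line.
    awk -F'/' '/^rtt / { print $5 }'
}

# rtt_within EXPECTED_MS [TOLERANCE_MS]: true if avg RTT is within tolerance (default 20).
rtt_within() {
    local expected="$1" tol="${2:-20}" avg
    avg=$(avg_rtt)
    [ -n "$avg" ] || return 1
    awk -v a="$avg" -v e="$expected" -v t="$tol" \
        'BEGIN { d = a - e; if (d < 0) d = -d; exit !(d <= t) }'
}
```

Usage with the 100 ms netem delay on `lo`: `ping -c 3 127.0.0.1 | rtt_within 200`. The target is 200 because each probe crosses the qdisc twice.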

### Problem 4: IPVS Estimation Causes Network Jitter

**Symptoms**
- Error message: None
- Behavior: Increased network latency jitter in Kubernetes clusters using IPVS mode
- Context: Large-scale container deployments with frequent service updates

**Root Cause**
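- The IPVS estimator runs periodically in kernel timer context and walks the statistics of every IPVS service and real server to compute rate estimates
- With a large number of services and real servers, as in large clusters where kube-proxy runs in IPVS mode, each pass takes longer and competes with packet processing for CPU, causing latency jitter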

**Solution**
1. Disable estimation via sysctl (immediate effect):
   ```bash
   sudo sysctl -w net.ipv4.vs.run_estimation=0
   ```
2. Make persistent across reboots:
   ```bash
   echo 'net.ipv4.vs.run_estimation = 0' | sudo tee -a /etc/sysctl.conf
   ```
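   Alternatively, because the `net.ipv4.vs.*` keys exist only after the `ip_vs` module is loaded, apply the setting whenever modprobe loads the module:
   ```bash
   echo 'install ip_vs /sbin/modprobe --ignore-install ip_vs $CMDLINE_OPTS && /sbin/sysctl -w net.ipv4.vs.run_estimation=0' | sudo tee /etc/modprobe.d/ipvs.conf
   ```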
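The `net.ipv4.vs.*` keys exist only while the `ip_vs` module is loaded, so a script that applies the setting should check for the key first. This is a sketch under that assumption: `disable_ipvs_estimation` and the `PROCFS_ROOT`/`SUDO` overrides are hypothetical. Writing to the `/proc/sys` file has the same effect as `sysctl -w`.

```bash
#!/usr/bin/env bash
# Sketch: disable IPVS estimation only if the sysctl key exists (ip_vs loaded).
# PROCFS_ROOT and SUDO are hypothetical overrides; set SUDO= to skip sudo.

disable_ipvs_estimation() {
    local key="${PROCFS_ROOT:-/proc}/sys/net/ipv4/vs/run_estimation"
    if [ ! -e "$key" ]; then
        echo "run_estimation not present (ip_vs not loaded or kernel lacks it)" >&2
        return 1
    fi
    # Writing the /proc/sys file is equivalent to: sysctl -w net.ipv4.vs.run_estimation=0
    echo 0 | ${SUDO-sudo} tee "$key" >/dev/null || return 1
    echo "run_estimation=$(cat "$key")"
}
```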

**Verification**
- Confirm setting is active:
  ```bash
  sysctl net.ipv4.vs.run_estimation
  # Expected: net.ipv4.vs.run_estimation = 0
  ```
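- Check that IPVS rate estimates no longer update. The `--rate` columns are computed by the estimator, whereas the `--stats` counters keep counting:
  ```bash
  sudo ipvsadm -Ln --rate
  # Run twice, 10s apart, while traffic flows; the rate values should remain static
  ```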
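Comparing two snapshots by hand is error-prone, so a small helper can run the command twice and compare the outputs. This is a generic sketch, and `outputs_static` is a hypothetical name. It assumes traffic is flowing during the interval. Otherwise an idle service's rates can also sit at zero with estimation enabled.

```bash
#!/usr/bin/env bash
# Sketch: run a command twice, INTERVAL seconds apart, and compare its output.
# outputs_static INTERVAL CMD [ARGS...]: 0 if identical, 1 if different, 2 on error.

outputs_static() {
    local interval="$1" first second
    shift
    first=$("$@") || return 2
    sleep "$interval"
    second=$("$@") || return 2
    [ "$first" = "$second" ]
}
```

Usage: `outputs_static 10 sudo ipvsadm -Ln --rate && echo "rates frozen: estimation is off"`.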

### Problem 5: Policy Routing Commands Fail

**Symptoms**
- Error message: `RTNETLINK answers: Operation not permitted`
- Error code: `OperationDenied` or `InvalidParameter`
- Behavior: Cannot create custom routing tables or policy rules
- Context: Alibaba Cloud Linux 2 with kernel ≤ 4.19.34-11.al7

**Root Cause**
- Kernel compiled without `CONFIG_IP_MULTIPLE_TABLES` support
- Required for `ip rule` and multi-table routing (`ip route ... table N`)
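You can confirm this root cause directly, rather than inferring it from the version, by checking the kernel build configuration. This sketch assumes the config ships as `/boot/config-$(uname -r)`, which is common on RHEL-derived distributions, or is exposed as `/proc/config.gz`. `kconfig_enabled` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: kconfig_enabled OPTION [CONFIG_FILE] -> 0 if OPTION=y, 1 if not, 2 if no config.
# Defaults to /boot/config-$(uname -r), falling back to /proc/config.gz.

kconfig_enabled() {
    local opt="$1" cfg="${2:-/boot/config-$(uname -r)}"
    if [ -r "$cfg" ]; then
        grep -q "^${opt}=y\$" "$cfg"
    elif [ -r /proc/config.gz ]; then
        zcat /proc/config.gz | grep -q "^${opt}=y\$"
    else
        echo "no readable kernel config found" >&2
        return 2
    fi
}
```

Usage: `kconfig_enabled CONFIG_IP_MULTIPLE_TABLES && echo "policy routing supported"`.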

**Solution**
1. Check kernel version:
   ```bash
   uname -r
   ```
2. If kernel ≤ 4.19.34-11.al7, upgrade to a newer Alibaba Cloud Linux 2 kernel
3. After upgrade, configure policy routing normally:
   ```bash
   # Example: Route traffic from specific source via secondary ENI
   sudo ip -4 route add default via <eth1_gateway> dev eth1 table 1001
   sudo ip -4 rule add from <source_ip> lookup 1001
   ```

**Verification**
- List custom routing table:
  ```bash
  ip route list table 1001
  ```
- Show policy rules:
  ```bash
  ip rule show
  ```
- Expected: No errors during creation and rules appear in output
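You can also script the rule check, for example in a provisioning job. This is a minimal sketch that assumes the `<priority>: from <addr> lookup <table>` format printed by `ip rule show`, and `rule_present` is a hypothetical helper. Tables that have a name in `/etc/iproute2/rt_tables` are printed by name, so pass the name in that case.

```bash
#!/usr/bin/env bash
# Sketch: rule_present TABLE [SOURCE_IP] reads `ip rule show` output on stdin and
# succeeds if some rule looks up TABLE (optionally only rules "from SOURCE_IP").

rule_present() {
    local table="$1" src="${2:-}"
    awk -v t="$table" -v s="$src" '
        {
            for (i = 1; i < NF; i++) {
                # The trailing space keeps 192.168.0.1 from matching 192.168.0.10.
                if ($i == "lookup" && $(i + 1) == t && (s == "" || index($0, "from " s " ")))
                    found = 1
            }
        }
        END { exit !found }'
}
```

Usage: `ip rule show | rule_present 1001 <source_ip>`.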

## FAQ

**Q: How do I check if SMC is working correctly?**  
A: Use `smcss -a` to list all SMC connections. Active SMC links will show protocol as `SMC-R` or `SMC-D`. If it shows `TCP`, the connection fell back. Also verify RDMA devices with `smcr d`.

**Q: What permissions are needed to configure traffic control (tc) rules?**  
A: Modifying qdiscs and filters requires the CAP_NET_ADMIN capability, which root holds by default, so run `tc` changes with `sudo`. Read-only commands such as `tc qdisc show` do not need it.
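To check whether the current shell already holds CAP_NET_ADMIN (capability number 12), you can decode the `CapEff` bitmask from `/proc/<pid>/status`. This is a sketch: `has_net_admin` is a hypothetical helper, and it relies on bash arithmetic with base-16 literals.

```bash
#!/usr/bin/env bash
# Sketch: has_net_admin [STATUS_FILE] succeeds if bit 12 (CAP_NET_ADMIN) is set in
# the CapEff line. Defaults to this shell's own /proc/$$/status.

has_net_admin() {
    local status="${1:-/proc/$$/status}" hex
    hex=$(awk '/^CapEff:/ { print $2 }' "$status")
    [ -n "$hex" ] || return 2
    # CapEff is a hex bitmask; shift capability 12 down to bit 0 and test it.
    (( (16#$hex >> 12) & 1 ))
}
```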

**Q: How do I enable debug logging for SMC issues?**  
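A: Enable the SMC module's dynamic debug messages. This requires a kernel built with `CONFIG_DYNAMIC_DEBUG` and debugfs mounted at `/sys/kernel/debug`:
```bash
echo 'module smc +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
sudo dmesg -w  # Monitor real-time logs during connection attempts
```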

**Q: Why does disabling IPv6 help with SMC?**  
A: Current SMCv2 implementation in Alibaba Cloud Linux does not support IPv6 (AF_INET6). If IPv6 is enabled, SMC may fail to establish links. Disabling IPv6 forces use of IPv4, which is fully supported.

**Q: Can I use BBR and tc-fq together safely?**  
A: Yes. Pairing BBR with the `fq` qdisc is the recommended setup. `fq` paces each flow at the rate BBR sets on the socket, so TCP does not need its own per-connection pacing timers, which are the overhead behind Problem 2. Its per-flow queuing also improves fairness between flows.