# alinux-storage

Part of **ALINUX**

# Alibaba Cloud Linux Storage and Filesystem Troubleshooting Guide

## Problem Index

| Problem | Symptom | Severity | Solution Summary |
|--------|--------|---------|------------------|
| ext4 "No space left on device" despite free disk space | Error message: `No space left on device` even when `df` shows available space | High | Clean up files or reformat with higher inode density using `mkfs.ext4 -N` |
| ext4 "Structure needs cleaning" error | Error message: `Structure needs cleaning` on file access; filesystem may be remounted read-only | High | Unmount and repair with `fsck.ext4 -y` |
| Poor Buffer I/O write performance on ext4 | Write throughput significantly lower than expected on Alibaba Cloud Linux 2 | Medium | Remount without `dioread_nolock,nodelalloc` or upgrade kernel |
| NFS mount fails with "command not found" or "Operation not permitted" | Errors: `mount.nfs: command not found`, `mount point does not exist`, or `Operation not permitted` | High | Install `nfs-utils`/`rpcbind`, create mount point, or configure `no_root_squash` on server |
| Low NFS read performance on Alibaba Cloud Linux 3 | Slow large-file reads due to a reduced read-ahead window | Medium | Increase `read_ahead_kb` via sysfs, udev rule, or `/etc/nfs.conf` |
| Data corruption risk in ext4 nojournal mode | Potential silent data corruption during frequent file operations | High | Avoid `nojournal`; reformat with journal enabled |
| OverlayFS "Permission denied" on touch/copy-up | Error: `Permission denied` when writing to an OverlayFS-mounted directory whose lower-layer copy lacks read permission | Medium | Apply kernel hotfix or upgrade kernel |
| FAT filesystem mount warning about UTF-8 | Kernel warning: `utf8 is not a recommended IO charset for FAT filesystems` | Low | Use `utf8=1,iocharset=ascii` instead of `iocharset=utf8` |

## Problem Details

### Problem 1: ext4 "No space left on device" Despite Available Disk Space

**Symptoms**
- Error message: `No space left on device`
- Behavior: Applications fail to write files even though `df -h` shows ample free space
- Context: Common in environments with many small files (e.g., logs, caches, mail spools)

**Root Cause**
The error occurs due to **inode exhaustion**, not actual disk space depletion. ext4 filesystems have a fixed number of inodes allocated at creation time. When all inodes are used (even if total data size is small), no new files can be created.

**Solution**
1. Check inode usage:
   ```bash
   df -i
   ```
2. Identify directories consuming the most inodes:
   ```bash
   for i in /path/to/suspect/*; do echo "$i: $(find "$i" | wc -l)"; done
   ```
3. Clean up unnecessary files (e.g., old logs, temporary files).
4. If cleanup isn't sufficient and the filesystem must support more files, back up data, reformat with higher inode count, and restore:
   ```bash
   # WARNING: this erases all data on the device; restore from the backup afterwards
   sudo mkfs.ext4 -N <higher_inode_count> /dev/<device>
   ```
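
The loop in step 2 prints one count per entry in no particular order. When a tree has many subdirectories, ranking them is faster to read. The following is a minimal sketch; `top_inode_dirs` is a hypothetical helper name, and hidden subdirectories (names starting with `.`) are not matched by the glob.

```bash
# top_inode_dirs: hypothetical helper that ranks the immediate subdirectories
# of a path by the number of entries (each one uses an inode) beneath them.
# Usage: top_inode_dirs /path/to/suspect [how_many]
top_inode_dirs() {
  local root="$1" count="${2:-10}" dir n
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    # -xdev keeps find from descending into other mounted filesystems
    n=$(find "$dir" -xdev | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$n" "${dir%/}"
  done | sort -rn | head -n "$count"
}
```

For example, `top_inode_dirs /var/spool 5` prints the five largest subdirectories of `/var/spool`, one `count<TAB>path` pair per line.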

**Verification**
After cleanup or reformatting:
```bash
df -i
```
Confirm `IUse%` is below 90% and new files can be created successfully.
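
On hosts with many filesystems, the same 90% check can be scripted. This is a sketch under the assumption that `df -iP` (POSIX output, one filesystem per line) is available; `inode_alert` is a hypothetical filter name, and mount points containing spaces are not handled.

```bash
# inode_alert: hypothetical filter that reads `df -iP` output on stdin and
# prints each filesystem whose IUse% is at or above a threshold (default 90).
# Filesystems reporting "-" for IUse% (no fixed inode table) are skipped.
inode_alert() {
  awk -v t="${1:-90}" 'NR > 1 && $5 ~ /^[0-9]+%$/ {
    pct = $5
    sub(/%$/, "", pct)
    if (pct + 0 >= t + 0) printf "%s %s%% %s\n", $1, pct, $6
  }'
}
# Example: df -iP | inode_alert 90
```

Reading from stdin rather than calling `df` directly keeps the filter testable with saved output.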

---

### Problem 2: ext4 "Structure needs cleaning" Error

**Symptoms**
- Error message: `Structure needs cleaning`
- Behavior: File operations fail, and the filesystem may be remounted read-only
- Context: Often follows unexpected shutdowns, power loss, or storage hardware issues

**Root Cause**
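Filesystem metadata has been corrupted. When ext4 detects the corruption, it returns `EUCLEAN` ("Structure needs cleaning") and, depending on the error behavior set by the `errors=` mount option or stored in the superblock, may remount the filesystem read-only (`errors=remount-ro`) to prevent further damage.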

**Solution**
1. Unmount the affected filesystem:
   ```bash
   sudo umount /mnt
   ```
2. Ensure `e2fsprogs` is installed:
   ```bash
   sudo yum -y install e2fsprogs
   ```
3. Repair the filesystem non-interactively:
   ```bash
   sudo fsck.ext4 -y /dev/vdd1
   ```
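   Replace `/dev/vdd1` with the affected device and `/mnt` with its mount point. Never run `fsck.ext4` on a mounted filesystem; doing so can cause further corruption.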
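
The steps above can be wrapped so that `fsck` is never started on a mounted device. This is a minimal sketch; `is_mounted` and `repair_ext4` are hypothetical helper names, the mounted-check is an exact string match against `/proc/mounts` (a device mounted through a `/dev/mapper/...` or `/dev/disk/by-uuid/...` path may not be detected), and the exit-status handling follows the bit-mask values documented in e2fsck(8).

```bash
# is_mounted: succeed if DEVICE appears as a mount source in a mounts table
# (default /proc/mounts). Exact string match only.
is_mounted() {
  awk -v d="$1" '$1 == d { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# repair_ext4: hypothetical wrapper that refuses to run on a mounted device,
# runs fsck.ext4 -y, and summarizes its exit status (a bit mask).
repair_ext4() {
  local dev="$1" rc
  if is_mounted "$dev" "${2:-/proc/mounts}"; then
    echo "refusing: $dev is mounted; unmount it first" >&2
    return 1
  fi
  fsck.ext4 -y "$dev"
  rc=$?
  if [ "$rc" -eq 0 ]; then echo "no errors found"
  elif [ $((rc & 4)) -ne 0 ]; then echo "errors left uncorrected (exit $rc)"
  elif [ $((rc & 3)) -ne 0 ]; then echo "errors corrected (exit $rc)"
  else echo "fsck failed (exit $rc)"
  fi
  return "$rc"
}
# Example: sudo bash -c "$(declare -f is_mounted repair_ext4); repair_ext4 /dev/vdd1"
```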

**Verification**
After repair:
```bash
sudo mount /dev/vdd1 /mnt
ls /mnt  # Should succeed without errors
```
Check system logs for any residual errors:
```bash
dmesg | tail -20
```

---

### Problem 3: Poor Buffer I/O Write Performance on ext4

**Symptoms**
- Behavior: Write performance significantly lower than expected on Alibaba Cloud Linux 2 instances
- Context: Occurs when ext4 is mounted with both `dioread_nolock` and `nodelalloc` options

**Root Cause**
The combination of `dioread_nolock` and `nodelalloc` prevents merging of 4KB dirty pages into larger writes, causing excessive I/O operations and degraded performance.

**Solution**
1. Identify the problematic mount:
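   ```bash
   # Print the device that backs the directory (substitute it for <Partition> below)
   df /path/to/dir | grep -v Filesystem | awk '{ print $1 }'
   # Any output here means the mount uses both dioread_nolock and nodelalloc
   mount | grep -w <Partition> | grep ext4 | grep -w dioread_nolock | grep -w nodelalloc
   ```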
2. Remount with `delalloc` (delayed allocation) enabled:
   ```bash
   sudo mount -o remount,delalloc /dev/<device> /mount/point
   ```
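   Alternatively, upgrade to a newer kernel version that resolves this issue. A remount does not survive a reboot; to make the change persistent, also remove `nodelalloc` from the filesystem's entry in `/etc/fstab`.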
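
Instead of checking one directory at a time, every mounted filesystem can be scanned in one pass. This sketch assumes the `/proc/mounts` layout (device, mount point, type, comma-separated options, ...); `find_slow_ext4` is a hypothetical helper name.

```bash
# find_slow_ext4: hypothetical helper that lists ext4 mounts whose options
# include both dioread_nolock and nodelalloc (default source: /proc/mounts).
find_slow_ext4() {
  awk '$3 == "ext4" {
    n = split($4, opt, ",")
    dio = 0; nodel = 0
    for (i = 1; i <= n; i++) {
      if (opt[i] == "dioread_nolock") dio = 1
      if (opt[i] == "nodelalloc") nodel = 1
    }
    if (dio && nodel) print $1, $2
  }' "${1:-/proc/mounts}"
}
# Example: find_slow_ext4   # prints "<device> <mount point>" per affected mount
```

Splitting the option list and comparing whole entries avoids false matches on options that merely contain one of the names.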

**Verification**
Monitor I/O after remount:
```bash
iostat -xm 1
```
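Write throughput (`wMB/s`) should increase, and the average write request size (`wareq-sz`; older sysstat releases report the combined `avgrq-sz`) should grow as dirty pages are merged into larger I/Os.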
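
`iostat` shows the effect interactively. For a scripted before/after comparison, a sketch that computes the average completed-write size for one device from two `/proc/diskstats` snapshots follows. `avg_write_kb` is a hypothetical helper; it assumes the kernel's `/proc/diskstats` layout, where field 8 is writes completed and field 10 is sectors written, always counted in 512-byte units. Averages near 4 KB are consistent with the unmerged writes described under Root Cause.

```bash
# avg_write_kb: hypothetical helper that compares two /proc/diskstats
# snapshots and prints the average size (KB) of writes completed in between.
# Usage: avg_write_kb DEVICE_NAME BEFORE_FILE AFTER_FILE
avg_write_kb() {
  awk -v d="$1" '
    FNR == NR { if ($3 == d) { w0 = $8; s0 = $10 }; next }
    $3 == d   { w1 = $8; s1 = $10 }
    END {
      # sectors / 2 = KB; divide by the number of writes in the interval
      if (w1 - w0 > 0) printf "%.1f\n", (s1 - s0) / 2 / (w1 - w0)
      else print "no writes"
    }' "$2" "$3"
}
# Example:
#   cat /proc/diskstats > /tmp/before; sleep 10; cat /proc/diskstats > /tmp/after
#   avg_write_kb vdb /tmp/before /tmp/after
```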

---

### Problem 4: NFS Mount Failures and Permission Errors

**Symptoms**
- Error messages:
  - `mount.nfs: command not found`
  - `mount point does not exist`
  - `Operation not permitted`
- Behavior: NFS mount fails or root user cannot write to mounted share
- Context: Initial setup or cross-version NFS client/server configurations

**Root Cause**
- Missing `nfs-utils` or `rpcbind` packages
- Local mount directory not created
- The NFS server maps the root user to an anonymous user (`nobody`) by default (root squashing, a security feature)

**Solution**
1. Install required packages:
   ```bash
   sudo yum install -y nfs-utils rpcbind
   sudo systemctl enable --now rpcbind
   ```
2. Create local mount point:
   ```bash
   sudo mkdir -p /nfs/mountpoint
   ```
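3. For root write access, on the **NFS server**, edit `/etc/exports`. Because `no_root_squash` gives root on the client root privileges on the export, grant it only to trusted client addresses: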
   ```text
   /exported/path client_ip(rw,sync,no_root_squash)
   ```
   Then re-export:
   ```bash
   sudo exportfs -ra
   ```
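
The client-side causes above can be checked before attempting a mount. This is a minimal sketch; `nfs_preflight` is a hypothetical helper name, and the `rpcbind` check is skipped on hosts without `systemctl`. Root squashing is configured on the server, so it cannot be verified from here.

```bash
# nfs_preflight: hypothetical client-side check. Prints one line per problem
# found and returns 0 only if none are found.
nfs_preflight() {
  local mp="$1" status=0
  # mount.nfs comes from nfs-utils and usually lives in /sbin or /usr/sbin,
  # which may not be on a non-root user's PATH
  if ! command -v mount.nfs >/dev/null 2>&1 &&
     [ ! -x /sbin/mount.nfs ] && [ ! -x /usr/sbin/mount.nfs ]; then
    echo "missing: mount.nfs (install nfs-utils)"
    status=1
  fi
  if command -v systemctl >/dev/null 2>&1 &&
     ! systemctl is-active --quiet rpcbind; then
    echo "inactive: rpcbind (run: sudo systemctl enable --now rpcbind)"
    status=1
  fi
  if [ ! -d "$mp" ]; then
    echo "missing: mount point $mp (run: sudo mkdir -p $mp)"
    status=1
  fi
  return "$status"
}
# Example: nfs_preflight /nfs/mountpoint
```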

**Verification**
Mount and test write access:
```bash
sudo mount -t nfs server:/exported/path /nfs/mountpoint
sudo touch /nfs/mountpoint/testfile
ls -l /nfs/mountpoint/testfile  # Should show root ownership
```

---

### Problem 5: Low NFS Read Performance on Alibaba Cloud Linux 3

**Symptoms**
- Behavior: Slow reading of large files from NFS shares
- Context: After upgrading to Alibaba Cloud Linux 3 (kernel 5.10+)

**Root Cause**
In newer kernels, the default `read_ahead_kb` of an NFS mount's backing device (BDI) was reduced from 15,360 KB to 128 KB, so far less data is prefetched during sequential reads of large files.

**Solution**
Choose one of the following methods:

**Option A: Temporary fix for a single mount (lost on unmount or reboot)**
```bash
# Get major:minor numbers
MAJOR_MINOR=$(sudo mountpoint -d /nfs/mountpoint)
# Set read_ahead_kb to 15360
echo 15360 | sudo tee /sys/class/bdi/$MAJOR_MINOR/read_ahead_kb
```
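
Option A changes one mount. A sketch that applies the same one-off setting to every NFS mount currently listed in `/proc/fs/nfsfs/volumes` follows. It relies on the same assumption as the udev rule in Option B: the fourth column of that file holds the `major:minor` name of each mount's BDI directory. `set_nfs_readahead` is a hypothetical helper; the file paths are parameters only so it can be tried against test directories. It must run as root, and like Option A the change is lost on remount.

```bash
# set_nfs_readahead: hypothetical helper that writes KB to read_ahead_kb for
# every NFS mount listed in the volumes file.
# Usage: set_nfs_readahead [KB] [VOLUMES_FILE] [BDI_ROOT]
set_nfs_readahead() {
  local kb="${1:-15360}"
  local volumes="${2:-/proc/fs/nfsfs/volumes}"
  local bdi_root="${3:-/sys/class/bdi}"
  local dev
  # Skip the header line; column 4 (DEV) is the BDI name, e.g. 0:52
  for dev in $(awk 'NR > 1 { print $4 }' "$volumes" | sort -u); do
    if [ -w "$bdi_root/$dev/read_ahead_kb" ]; then
      echo "$kb" > "$bdi_root/$dev/read_ahead_kb"
      echo "$dev: read_ahead_kb=$(cat "$bdi_root/$dev/read_ahead_kb")"
    else
      echo "$dev: skipped (no writable $bdi_root/$dev/read_ahead_kb)" >&2
    fi
  done
}
# Example (as root): set_nfs_readahead 15360
```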

**Option B: Persistent fix via udev (recommended for multiple mounts)**
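Create `/etc/udev/rules.d/99-nfs.rules` with the rule below. It applies to NFS mounts created after udev loads it (reload with `sudo udevadm control --reload-rules`); remount existing shares for the change to take effect: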
```text
SUBSYSTEM=="bdi", ACTION=="add", PROGRAM=="/bin/awk -v bdi=$kernel 'BEGIN{ret=1} {if ($4 == bdi) {ret=0}} END{exit ret}' /proc/fs/nfsfs/volumes", ATTR{read_ahead_kb}="15360"
```

**Option C: Global config (nfs-utils ≥ 2.3.3-57.0.1.al8.1)**
Edit `/etc/nfs.conf`:
```ini
[nfsrahead]
nfs=15000
nfs4=16000
```
Then remount all NFS shares.

**Verification**
Test read performance:
```bash
dd if=/nfs/mountpoint/largefile of=/dev/null bs=1M
```
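Compare timing before and after the change. Between runs, flush the client's page cache (`sync; echo 3 | sudo tee /proc/sys/vm/drop_caches`) so the second read is not served from memory. Also verify the setting: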
```bash
cat /sys/class/bdi/$(mountpoint -d /nfs/mountpoint)/read_ahead_kb
```
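Options A and B should return `15360`. With Option C, it should return the value set in `/etc/nfs.conf` for that mount's NFS version (for example, `16000` for an NFSv4 mount).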

---

### Problem 6: Data Corruption Risk in ext4 nojournal Mode

**Symptoms**
- Behavior: Silent data corruption during high-frequency file create/delete operations
- Context: Filesystems created with `-O ^has_journal` (nojournal mode)

**Root Cause**
Disabling the journal removes atomicity guarantees. Under memory pressure or concurrent I/O, metadata updates may become inconsistent, leading to data corruption.

**Solution**
1. **Do not use nojournal mode** for production workloads.
2. If already deployed, back up data immediately.
3. Recreate the filesystem with journal enabled:
   ```bash
   # WARNING: this erases all data on the device; restore from the backup afterwards
   sudo mkfs.ext4 /dev/<device>
   ```
   (Default includes journal; no special flags needed.)

**Verification**
Confirm journal is enabled:
```bash
sudo dumpe2fs -h /dev/<device> | grep has_journal
```
Output should include `Filesystem features: ... has_journal ...`.
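
When auditing several devices, a stricter check that looks only at the feature list can be scripted. This is a sketch assuming the `Filesystem features:` line format printed by `dumpe2fs -h` and `tune2fs -l`; `has_journal` is a hypothetical filter name.

```bash
# has_journal: hypothetical filter that reads `dumpe2fs -h` or `tune2fs -l`
# output on stdin and succeeds only if "has_journal" is a listed feature.
has_journal() {
  awk -F: '$1 == "Filesystem features" {
    n = split($2, f, " ")
    for (i = 1; i <= n; i++) if (f[i] == "has_journal") found = 1
  } END { exit !found }'
}
# Example, checking several devices:
#   for dev in /dev/vdb1 /dev/vdc1; do
#     if sudo dumpe2fs -h "$dev" 2>/dev/null | has_journal; then
#       echo "$dev: journal enabled"
#     else
#       echo "$dev: NO journal"
#     fi
#   done
```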

---

### Problem 7: OverlayFS "Permission denied" on Write Operations

**Symptoms**
- Error message: `Permission denied`
- Behavior: `touch` or another write fails in an OverlayFS-mounted directory when the corresponding lower-layer directory or file lacks read permission for the calling user
- Context: Common in containerized environments using OverlayFS

**Root Cause**
During "copy up" (copying a lower-layer file or directory to the upper layer before it is modified), OverlayFS requires read permission on the source. If the user lacks it, the operation fails even when the user has write permission.

**Solution**
Apply the appropriate kernel hotfix based on your OS version:

**For Alibaba Cloud Linux 3 (kernel 5.10):**
```bash
sudo yum install -y kernel-hotfix-13108708-5.10.134-13.1
```

**For Alibaba Cloud Linux 2 (kernel 4.19):**
```bash
sudo yum install -y kernel-hotfix-13110805-4.19.91-27
```

Alternatively, upgrade to a fixed kernel version:
```bash
sudo yum upgrade kernel
```
Then reboot.

**Verification**
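After the fix is in effect, reproduce the test case as root:
```bash
# The lower directory grants group bin write and execute, but not read (mode 0737)
mkdir -p /root/test/lower/dir /root/test/upper /root/test/work /root/test/mount
chmod 0737 /root/test/lower/dir
chown root:bin /root/test/lower/dir
mount -t overlay -o lowerdir=/root/test/lower,upperdir=/root/test/upper,workdir=/root/test/work overlay /root/test/mount
# Run touch as user bin; the relative path is resolved from /root/test
cd /root/test && sudo -u bin -g bin touch mount/dir/RANDOM  # Should succeed
umount /root/test/mount  # clean up
```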

---

### Problem 8: FAT Filesystem Mount Warning About UTF-8

**Symptoms**
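- Kernel warning (in `dmesg`): `FAT-fs (<device>): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!`
- Behavior: The filesystem mounts, but filename lookups become case-sensitive, unlike normal FAT behavior
- Context: Using `mount -o iocharset=utf8` on FAT (`vfat`) volumes such as FAT32; exFAT volumes are handled by a separate driver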

**Root Cause**
The kernel's `utf8` NLS table provides no case conversion. When it is used as the FAT driver's I/O charset, case-insensitive filename matching stops working, which breaks expected FAT behavior.

**Solution**
Use the dedicated `utf8=1` option with ASCII I/O charset:
```bash
sudo mount -t vfat -o utf8=1,iocharset=ascii /dev/vdb /mnt
```

**Verification**
Check mount options:
```bash
mount | grep /mnt
```
The options should include `iocharset=ascii` and `utf8` (the kernel typically lists the boolean without `=1`). Test file access with mixed-case names to confirm that case-insensitive lookups work.
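
The mixed-case test can be scripted as a quick probe. This is a minimal sketch; `probe_case_insensitive` is a hypothetical helper name, and it needs write access to the mounted directory because it creates and then removes a probe file.

```bash
# probe_case_insensitive: hypothetical check that creates a mixed-case file
# in DIR, looks it up by its all-lowercase name, and removes it again.
# Prints the result; returns 0 if the lookup matched (case-insensitive).
probe_case_insensitive() {
  local dir="$1" probe="CaseProbe.$$" lower rc
  lower=$(printf '%s' "$probe" | tr '[:upper:]' '[:lower:]')
  touch "$dir/$probe" || return 2
  if [ -e "$dir/$lower" ]; then
    echo "case-insensitive"; rc=0
  else
    echo "case-sensitive"; rc=1
  fi
  rm -f "$dir/$probe"
  return "$rc"
}
# Example: sudo bash -c "$(declare -f probe_case_insensitive); probe_case_insensitive /mnt"
```

On a FAT volume mounted with `utf8=1,iocharset=ascii`, the probe is expected to report `case-insensitive`; with `iocharset=utf8` it reports `case-sensitive`.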

## FAQ

**Q: How do I check if my ext4 filesystem is running out of inodes?**  
A: Run `df -i` to view inode usage. If `IUse%` is near 100%, you’ve exhausted inodes even if disk space remains.

**Q: What is required for root to write to a mounted NFS share?**  
A: The NFS server must export the share with `no_root_squash` in `/etc/exports`. Otherwise, root is mapped to `nobody`, causing "Operation not permitted" on writes.

**Q: How can I permanently fix low NFS read performance on Alibaba Cloud Linux 3?**  
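A: Create a udev rule in `/etc/udev/rules.d/99-nfs.rules` that sets `read_ahead_kb=15360` on the backing device of each NFS mount as it is created, or, with nfs-utils ≥ 2.3.3-57.0.1.al8.1, set the values in the `[nfsrahead]` section of `/etc/nfs.conf`.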

**Q: Is it safe to use ext4 in nojournal mode?**  
A: No. Disabling the journal increases data corruption risk during crashes or power loss. Always use the default journal-enabled mode for production systems.

**Q: Why does OverlayFS require read permission for write operations?**  
A: OverlayFS performs a "copy up" when modifying files from the lower layer, which requires reading the original file. Without read permission, this copy fails, causing "Permission denied" even for write attempts.