# ECS Storage Troubleshooting Guide

## Problem Index

| Problem | Symptoms | Severity | Solution Summary |
|--------|----------|----------|------------------|
| Cloud disk capacity appears smaller than purchased size | `df -h` shows less space than expected | Medium | Calculate filesystem overhead using `tune2fs` and account for reserved blocks and inodes |
| Snapshot deletion fails with error 403 | "Forbidden" or "Cannot delete snapshot used by image" | High | Remove image dependency before deleting the snapshot |
| Disk fails to mount after attachment | Mount command hangs or returns "special device does not exist" | High | Verify partition table and create filesystem if missing |

## Problem Details

### Problem 1: Cloud Disk Capacity Appears Smaller Than Purchased Size

**Symptoms**
- Error message: none (silent discrepancy)
- Behavior: Running `df -h` on a Linux ECS instance shows significantly less usable space than the purchased cloud disk capacity (e.g., 93 GiB instead of 100 GB)
- Context: Occurs after attaching and formatting a new data disk or after resizing an existing disk

**Root Cause**
The difference arises from multiple factors:
- Filesystem metadata overhead (superblocks, journal, etc.)
- Reserved blocks (typically 5% by default in ext4 for root user safety)
- Inode table allocation consuming space
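- Unit conversion: if the purchased size is quoted in decimal gigabytes (GB = 10^9 bytes), `df -h` shows a smaller number because it reports binary gibibytes (GiB = 2^30 bytes). Check which unit your console or bill uses before comparing.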
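
The unit-conversion effect alone can be checked with a little arithmetic. The following is a minimal sketch; the function name `gb_to_gib` is hypothetical, and it assumes the purchased size is expressed in decimal GB.

```bash
# Hypothetical helper: convert decimal gigabytes (10^9 bytes)
# to binary gibibytes (2^30 bytes), rounded to two decimals.
gb_to_gib() {
  awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 1000000000 / 1073741824 }'
}

gb_to_gib 100   # a 100 GB disk expressed in GiB
```

For 100 GB this prints `93.13`, which accounts for the example discrepancy in the symptoms before any filesystem overhead is considered.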

**Solution**
1. Identify the correct partition (e.g., `/dev/vdb1`) using:
   ```bash
   lsblk
   ```
2. Use `tune2fs` to inspect filesystem details:
   ```bash
   sudo tune2fs -l /dev/vdb1 | grep -E "Block count|Block size|Reserved block count|Inode count|Inode size"
   ```
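3. Estimate the usable space from the `tune2fs` values:
   - Raw filesystem capacity = `Block count` × `Block size`
   - Subtract the reserved space (`Reserved block count` × `Block size`) and the inode tables (`Inode count` × `Inode size`)
4. (Optional) If the disk holds non-root application data, reduce the reserved-block percentage: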
   ```bash
   sudo tune2fs -m 1 /dev/vdb1  # sets reserved blocks to 1%
   ```
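
The arithmetic in step 3 can be sketched as a small shell function. The name `ext4_space_estimate` and the sample numbers are hypothetical; in practice the five arguments would be read from the `tune2fs -l` output in step 2. The sketch ignores journal and group-descriptor overhead, so treat the result as a rough estimate rather than an exact match for `df`.

```bash
# Hypothetical helper: rough ext4 space breakdown from tune2fs -l values.
# Args: block_count block_size reserved_block_count inode_count inode_size
ext4_space_estimate() {
  local block_count=$1 block_size=$2 reserved=$3 inode_count=$4 inode_size=$5
  local raw=$(( block_count * block_size ))          # raw filesystem capacity
  local reserved_bytes=$(( reserved * block_size ))  # blocks reserved for root
  local inode_bytes=$(( inode_count * inode_size ))  # space used by inode tables
  echo "raw_bytes=$raw"
  echo "reserved_bytes=$reserved_bytes"
  echo "inode_table_bytes=$inode_bytes"
  echo "approx_usable_bytes=$(( raw - reserved_bytes - inode_bytes ))"
}

# Illustrative values for a 100 GiB ext4 filesystem with 5% reserved blocks.
ext4_space_estimate 26214400 4096 1310720 6553600 256
```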

**Verification**
- Re-run `df -h` and confirm usable space aligns with calculated value
- Compare with raw disk size from `lsblk` (which shows binary units):
  ```bash
  lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT
  ```

### Problem 2: Snapshot Deletion Fails with Error 403

**Symptoms**
- Error message: `{"Code":"Forbidden","Message":"The specified snapshot is used by other resources."}`
- Behavior: Attempt to delete a snapshot via console or API returns HTTP 403
- Context: User tries to clean up old snapshots but operation is blocked

**Root Cause**
Snapshots cannot be deleted if they are referenced by other resources, most commonly:
- Custom images created from the snapshot
- Snapshot chains where the snapshot is a base for incremental snapshots
- Ongoing backup or replication jobs

**Solution**
1. Check if the snapshot is used by an image:
   ```bash
   aliyun ecs DescribeImages --SnapshotId your-snapshot-id
   ```
2. If an image exists, either:
   - Delete the image first:
     ```bash
     aliyun ecs DeleteImage --ImageId your-image-id
     ```
   - Or use the console: Navigate to **Images > Custom Images**, find the image, and delete it
3. After removing dependencies, delete the snapshot:
   ```bash
   aliyun ecs DeleteSnapshot --SnapshotId your-snapshot-id
   ```
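
The check-then-delete flow above could be wrapped in a guard so a snapshot is only deleted when no image references it. This is a sketch under assumptions: the function names are hypothetical, the `aliyun` CLI is assumed to be configured with a default region, and the `grep`-based parsing assumes the JSON response contains a `"TotalCount": N` field.

```bash
# Hypothetical helper: number of images that reference a snapshot.
# Assumes the DescribeImages JSON response includes "TotalCount": N.
snapshot_image_count() {
  aliyun ecs DescribeImages --SnapshotId "$1" \
    | grep -o '"TotalCount": *[0-9][0-9]*' \
    | grep -o '[0-9][0-9]*$'
}

# Hypothetical helper: delete the snapshot only if no image depends on it.
# Returns 1 if the count could not be read, 2 if images still reference it.
safe_delete_snapshot() {
  local snap=$1 count
  count=$(snapshot_image_count "$snap") || return 1
  if [ "${count:-0}" -gt 0 ]; then
    echo "snapshot $snap is still referenced by $count image(s); delete them first" >&2
    return 2
  fi
  aliyun ecs DeleteSnapshot --SnapshotId "$snap"
}

# Usage: safe_delete_snapshot your-snapshot-id
```

The guard only covers image dependencies; the other causes listed under Root Cause still have to be checked separately.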

**Verification**
- Confirm snapshot no longer appears in list:
  ```bash
  aliyun ecs DescribeSnapshots --SnapshotId your-snapshot-id
  ```
- Expected output: empty `Snapshots` array or `InvalidSnapshotId.NotFound` error

### Problem 3: Disk Fails to Mount After Attachment

**Symptoms**
- Error message: `mount: /mnt/data: special device /dev/vdb1 does not exist.`
- Behavior: Disk is attached in ECS console, but `mount` command fails
- Context: New data disk attached to running Linux instance

**Root Cause**
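The disk is attached at the hypervisor level, but it is not yet usable from the OS for one of these reasons:
- No partition table exists on the disk, so `/dev/vdb1` does not exist (this produces the error above)
- The partition exists but no filesystem was created on it (`mount` then typically reports a wrong or unknown filesystem type instead)
- The kernel hasn’t rescanned the device (rare on modern systems)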
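
To tell these cases apart quickly, something like the following could be used. It is a sketch, not a definitive check: the function name `diagnose_data_disk` and its result strings are hypothetical, and it assumes `lsblk` can report the filesystem type for the partition.

```bash
# Hypothetical helper: classify why a data disk cannot be mounted yet.
# Args: whole-disk device (e.g. /dev/vdb), partition device (e.g. /dev/vdb1)
diagnose_data_disk() {
  local disk=$1 part=$2
  if [ ! -b "$disk" ]; then
    echo "disk-not-visible"      # OS does not see the disk at all
  elif [ ! -b "$part" ]; then
    echo "no-partition"          # disk visible, partition missing
  elif [ -z "$(lsblk -no FSTYPE "$part" 2>/dev/null)" ]; then
    echo "no-filesystem"         # partition exists but is unformatted
  else
    echo "ready-to-mount"
  fi
}

# Usage: diagnose_data_disk /dev/vdb /dev/vdb1
```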

**Solution**
1. Confirm disk is visible to OS:
   ```bash
   lsblk
   ```
   Should show `/dev/vdb` (or similar) with no partitions
2. Create a partition table (if needed):
   ```bash
   sudo fdisk /dev/vdb
   # In fdisk: n → p → 1 → [defaults] → w
   ```
3. Create a filesystem:
   ```bash
   sudo mkfs.ext4 /dev/vdb1
   ```
4. Mount the disk:
   ```bash
   sudo mkdir -p /mnt/data
   sudo mount /dev/vdb1 /mnt/data
   ```
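5. (Optional) Add an entry to `/etc/fstab` so the disk is mounted automatically at boot. Prefer the filesystem UUID over the device name, since device names can change between reboots.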
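
For the optional `/etc/fstab` step, one possible approach is to build the entry from the filesystem UUID. The helper name `fstab_line` is hypothetical, and the `defaults,nofail` options are an assumption chosen so that a missing data disk does not block boot; adjust them to your needs.

```bash
# Hypothetical helper: print an fstab entry for a filesystem UUID.
# Args: uuid mountpoint fstype
fstab_line() {
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$1" "$2" "$3"
}

# Usage (assumes /dev/vdb1 is formatted as ext4 and mounted at /mnt/data):
#   uuid=$(sudo blkid -s UUID -o value /dev/vdb1)
#   fstab_line "$uuid" /mnt/data ext4 | sudo tee -a /etc/fstab
#   sudo mount -a   # confirm the new entry mounts without errors
```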

**Verification**
- Check mount point:
  ```bash
  df -h /mnt/data
  ```
- Expected: Shows mounted filesystem with correct size
- Write test file:
  ```bash
  echo "test" | sudo tee /mnt/data/test.txt
  ```

## FAQ

**Q: How do I check if my cloud disk is properly attached to my ECS instance?**  
A: Use `lsblk` or `fdisk -l` in the instance OS to list block devices. Also verify attachment status in the ECS console under the instance’s **Cloud Disks** tab.

**Q: What permissions are required to manage snapshots?**  
A: Your RAM user needs `ecs:DescribeSnapshots`, `ecs:CreateSnapshot`, `ecs:DeleteSnapshot`, and related actions. Additionally, `ecs:DescribeImages` is required to check image dependencies before deletion.

**Q: Why does my resized disk still show the old size?**  
A: After resizing the disk online, you must also extend the partition and then the filesystem inside the OS. Extend the partition first (for example with `growpart /dev/vdb 1`), then grow the filesystem: for ext4, use `resize2fs /dev/vdb1`; for xfs, use `xfs_growfs /mount/point`.
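
As a rough illustration of the resize answer above, the filesystem-specific command can be selected from the filesystem type. The helper name `grow_fs_cmd` is hypothetical, and the usage lines assume `growpart` is installed and that the data partition is `/dev/vdb1` mounted at `/mnt/data`.

```bash
# Hypothetical helper: print the command that grows a filesystem.
# Args: fstype partition mountpoint
grow_fs_cmd() {
  local fstype=$1 partition=$2 mountpoint=$3
  case "$fstype" in
    ext2|ext3|ext4) echo "resize2fs $partition" ;;   # ext* grows by device
    xfs)            echo "xfs_growfs $mountpoint" ;; # xfs grows by mount point
    *)              echo "unsupported filesystem: $fstype" >&2; return 1 ;;
  esac
}

# Usage:
#   sudo growpart /dev/vdb 1    # extend partition 1 to fill the disk
#   sudo $(grow_fs_cmd "$(lsblk -no FSTYPE /dev/vdb1)" /dev/vdb1 /mnt/data)
```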

**Q: How can I reduce snapshot storage costs?**  
A: Delete unnecessary snapshots, especially those not part of a retention policy. Note that only the incremental changes are stored, so keeping a chain of snapshots may be more efficient than isolated full backups.

**Q: Are snapshots encrypted if the source disk is encrypted?**  
A: Yes. Snapshots created from encrypted cloud disks are automatically encrypted using the same key. You cannot decrypt or disable encryption on such snapshots.