# ECS Monitoring Troubleshooting Guide

## Problem Index

| Problem | Symptom | Severity | Solution Summary |
|--------|--------|---------|------------------|
| Missing performance metrics in console | No CPU, memory, disk, or network data shown for ECS instance | High | Install or restart the CloudMonitor plugin |
| CloudMonitor plugin not auto-installed on new instance | New ECS instance lacks monitoring data despite "auto-install" being enabled | Medium | Verify auto-install setting and manually install if needed |
| Monitoring data delay or gaps | Metrics appear intermittently or with significant latency | Low | Check instance network connectivity and plugin status |

## Problem Details

### Problem 1: Missing Performance Metrics in Console

**Symptoms**
- Error message: `No monitoring data available`
- Behavior: The CloudMonitor console shows empty charts for CPU, memory, disk, or network usage for a running ECS instance
- Context: Occurs after instance creation or after OS-level changes (e.g., kernel updates, firewall rules)

**Root Cause**
- The CloudMonitor plugin (also known as the host monitoring agent) is not installed, not running, or unable to communicate with the monitoring backend
- Common triggers include disabled auto-install during instance creation, manual removal of the agent, or restrictive security group rules blocking outbound traffic

**Solution**
1. Log in to the ECS instance via SSH
2. Check if the CloudMonitor plugin is installed and running:
   ```bash
   sudo /usr/local/cloudmonitor/wrapper/bin/cloudmonitor.sh status
   ```
3. If not installed, download and install the plugin:
   ```bash
   wget http://cms-download.aliyun.com/release/agent-linux64.tar.gz
   tar -xzf agent-linux64.tar.gz
   cd agent-linux64
   sudo ./install.sh
   ```
4. Start the service if it’s installed but stopped:
   ```bash
   sudo /usr/local/cloudmonitor/wrapper/bin/cloudmonitor.sh start
   ```
5. Ensure outbound HTTPS (port 443) is allowed in the instance’s security group
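
Steps 2 to 4 can be chained into a single check. The following is a minimal sketch, not an official tool: the function names `classify_status_text`, `agent_state`, and `ensure_agent_running` are hypothetical, and the matching on the word "running" assumes the status text quoted in the FAQ of this guide. Only the control-script path comes from the steps above.

```bash
# Hypothetical helpers; only the control-script path comes from this guide.
AGENT_CTL="${AGENT_CTL:-/usr/local/cloudmonitor/wrapper/bin/cloudmonitor.sh}"

# Map status text to a state.
# Assumption: a healthy agent prints text containing "running".
classify_status_text() {
  case "$1" in
    *"not running"*|*stopped*) echo "stopped" ;;
    *running*) echo "running" ;;
    *) echo "unknown" ;;
  esac
}

# Print "missing" if the control script does not exist,
# otherwise classify the output of its status command.
agent_state() {
  if [ ! -e "$1" ]; then
    echo "missing"
    return 0
  fi
  classify_status_text "$("$1" status 2>&1)"
}

# Start the agent only when it is installed but not reported as running.
ensure_agent_running() {
  case "$(agent_state "$AGENT_CTL")" in
    running) echo "agent already running" ;;
    missing) echo "agent not installed; follow step 3" >&2; return 1 ;;
    *) "$AGENT_CTL" start ;;
  esac
}
```

To use it, save the functions to a file, add a final line that calls `ensure_agent_running`, and run the file as root.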

**Verification**
- Wait 2–5 minutes, then refresh the CloudMonitor console at `https://cloudmonitornext.console.aliyun.com/`
- Expected behavior: CPU, memory, disk, and network charts display real-time data
- Confirm plugin status returns "running":
  ```bash
  sudo /usr/local/cloudmonitor/wrapper/bin/cloudmonitor.sh status
  ```
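
Because the agent can take a moment to come up, the status check can be polled instead of run once. The sketch below assumes a hypothetical `retry_until` helper and a hypothetical `agent_is_running` check that greps for "is running" (the wording quoted in the FAQ of this guide); neither ships with CloudMonitor.

```bash
# Hypothetical helpers; only the control-script path comes from this guide.
AGENT_CTL="${AGENT_CTL:-/usr/local/cloudmonitor/wrapper/bin/cloudmonitor.sh}"

# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry_until() {
  ru_attempts="$1"
  ru_delay="$2"
  shift 2
  ru_i=1
  while [ "$ru_i" -le "$ru_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$ru_i" -lt "$ru_attempts" ]; then
      sleep "$ru_delay"
    fi
    ru_i=$((ru_i + 1))
  done
  return 1
}

# Assumption: a healthy agent prints text containing "is running".
agent_is_running() {
  "$AGENT_CTL" status 2>&1 | grep -q "is running"
}
```

Run as root, `retry_until 10 30 agent_is_running && echo "agent is up"` polls for a little under five minutes, which matches the waiting window above.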

### Problem 2: CloudMonitor Plugin Not Auto-Installed on New Instance

**Symptoms**
- Behavior: A newly created ECS instance shows no monitoring data, even though the "Auto-install CloudMonitor on new ECS" toggle was left enabled
- Context: Occurs during or shortly after instance provisioning

**Root Cause**
- The auto-install feature depends on instance metadata service availability and OS compatibility during boot
- Some custom images or minimal OS distributions may lack required dependencies (e.g., curl, systemd), causing installation to fail silently
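
A quick preflight on the instance can surface both causes before reinstalling. This is a sketch under stated assumptions: `missing_deps` and `metadata_reachable` are hypothetical helper names, the command list passed to `missing_deps` is illustrative rather than an official dependency list, and `100.100.100.200` is the ECS instance metadata address.

```bash
# Print each command from the argument list that is not found on PATH.
missing_deps() {
  for md_cmd in "$@"; do
    command -v "$md_cmd" >/dev/null 2>&1 || echo "$md_cmd"
  done
}

# Return 0 if the instance metadata service answers within 3 seconds.
# Requires curl; the 3-second timeout is an arbitrary choice.
metadata_reachable() {
  curl -s -o /dev/null --max-time 3 \
    "http://100.100.100.200/latest/meta-data/instance-id"
}
```

For example, `missing_deps curl systemctl tar wget` prints one line per missing command and prints nothing when all of them are present.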

**Solution**
1. Navigate to the CloudMonitor console:  
   **Navigation path**: `Cloud Resources Monitoring > Host Monitoring`
2. Locate your ECS instance in the host list
3. Check the "Monitoring Status" column. If it shows "Not Installed", install the plugin manually
4. Install the plugin using the commands in Problem 1, Solution step 3
5. Confirm that auto-install is still enabled for future instances:
   - In the **Host Monitoring** page, find the **Settings** section
   - Ensure the toggle **"Auto-install CloudMonitor on new ECS"** is **ON**

**Verification**
- After manual installation, verify data appears in the console within 5 minutes
- For future instances, confirm the toggle remains enabled before launching new ECS instances

### Problem 3: Monitoring Data Delay or Gaps

**Symptoms**
- Behavior: Monitoring charts show intermittent data points or delays exceeding 5 minutes
- Context: Typically observed under high network load or during instance migration

**Root Cause**
- Network instability between the ECS instance and CloudMonitor endpoints
- High CPU or I/O pressure on the instance may delay plugin execution
- Plugin process crash due to resource constraints (e.g., low memory)

**Solution**
1. Check instance system logs for plugin errors:
   ```bash
   sudo journalctl -u cloudmonitor --since "1 hour ago"
   ```
2. Restart the CloudMonitor service:
   ```bash
   sudo /usr/local/cloudmonitor/wrapper/bin/cloudmonitor.sh restart
   ```
3. Validate network connectivity to CloudMonitor endpoints:
   ```bash
   curl -v https://metrichub-cms-cn-hangzhou.aliyuncs.com
   ```
   (Replace `cn-hangzhou` with your instance's region. An HTTP 200 or 403 response means the endpoint is reachable; a timeout points to a network problem.)
4. If gaps persist, increase system resources or optimize workload to reduce contention
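
The connectivity check in step 3 can be scripted so that the result is read from the HTTP status code rather than from verbose output. This is a minimal sketch: `classify_http_code` and `probe_endpoint` are hypothetical names, the 10-second default timeout is an arbitrary choice, and the 200/403 rule is taken from step 3. curl reports `000` as the status code when no HTTP response was received.

```bash
# Interpret an HTTP status code according to step 3:
# 200 or 403 means the endpoint answered; 000 means no response arrived.
classify_http_code() {
  case "$1" in
    200|403) echo "reachable" ;;
    000|"") echo "unreachable" ;;
    *) echo "unexpected" ;;
  esac
}

# Probe a URL ($1) with an optional timeout in seconds ($2, default 10)
# and print the classification of the returned status code.
probe_endpoint() {
  pe_code="$(curl -s -o /dev/null -w '%{http_code}' \
    --max-time "${2:-10}" "$1" || true)"
  classify_http_code "$pe_code"
}
```

For example, `probe_endpoint https://metrichub-cms-cn-hangzhou.aliyuncs.com` prints `reachable`, `unreachable`, or `unexpected`, which is easier to use in a cron job or health script than `curl -v` output.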

**Verification**
- Monitor the console for consistent metric updates every 15–60 seconds
- Confirm no error logs appear in `journalctl` output after restart
- Use `top` or `htop` to ensure the `cloudmonitor` process is active and consuming reasonable resources

## FAQ

**Q: How do I check if the CloudMonitor plugin is installed on my ECS instance?**  
A: Run the following command on the instance:  
```bash
sudo /usr/local/cloudmonitor/wrapper/bin/cloudmonitor.sh status
```  
If installed and running, it will return "CloudMonitor is running". If the path doesn’t exist, the plugin is not installed.

**Q: Is the CloudMonitor plugin free to use?**  
A: Yes. The CloudMonitor plugin for ECS host monitoring is a free service and does not incur additional charges.

**Q: Where can I view ECS performance metrics in the console?**  
A: Go to the CloudMonitor console at `https://cloudmonitornext.console.aliyun.com/`, then navigate to **Cloud Resources Monitoring > Host Monitoring**. Your ECS instances will be listed with real-time CPU, memory, disk, and network metrics.

**Q: Can I disable automatic installation of the CloudMonitor plugin for new instances?**  
A: Yes. In the **Host Monitoring** page of the CloudMonitor console, locate the **"Auto-install CloudMonitor on new ECS"** toggle in the settings area and switch it OFF. This only affects future instances.

**Q: What permissions does the CloudMonitor plugin require?**  
A: The plugin runs as a system service and requires standard read access to OS metrics (e.g., `/proc/stat`, `/proc/meminfo`). It does not require RAM roles beyond the instance’s default permissions, as it uses internal endpoints for metric submission.
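
To confirm that the metric sources named above are readable on a given instance, a check along these lines may help. It is a sketch only: `unreadable_files` is a hypothetical helper, and the two paths are the examples from this answer, not a complete list of what the plugin reads.

```bash
# Print each path from the argument list that the current user cannot read.
unreadable_files() {
  for uf_path in "$@"; do
    [ -r "$uf_path" ] || echo "$uf_path"
  done
}
```

Running `unreadable_files /proc/stat /proc/meminfo` as the user that owns the plugin process prints nothing when both files are readable.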