# ECS Cloud Assistant Troubleshooting Guide

## Problem Index

| Problem | Symptom | Severity | Solution Summary |
|------|------|---------|------------|
| Cloud Assistant Agent Not Running | Error: `ClientNotRunning` | High | Start or reinstall the Cloud Assistant agent service |
| Command Delivery Timeout | Error: `DeliveryTimeout` | Medium | Retry command execution and verify network connectivity |
| Instance Not in Running State | Error: `InstanceNotRunning` | High | Start the instance before executing Cloud Assistant commands |
| Network Connectivity Blocked | Error: `ClientNetworkBlocked` | High | Configure security group rules to allow outbound traffic on ports 443, 80, and 53 |
| Command Execution Timeout | Error: `ExecutionTimeout` | Medium | Increase the timeout duration beyond the default 60 seconds |
| Security Group Rules Blocking Access | Error: `SecurityGroupRuleDenied` | High | Update security group rules to allow access to Cloud Assistant service endpoints |
| Invalid Cron Expression | Error: `BadCronExpression` | Low | Correct the cron expression syntax according to scheduling documentation |

## Problem Details

### Problem 1: Cloud Assistant Agent Not Running

**Symptoms**
- Error message: `ClientNotRunning`
- Behavior: Commands fail to execute, and the agent status shows as stopped
- Context: Occurs when the Cloud Assistant agent service is not installed or has been stopped

**Root Cause**
The Cloud Assistant agent process is not running on the instance. This can happen if the service was manually stopped, failed to start during system boot, or was never installed.

**Solution**
1. Connect to the instance via VNC or SSH
2. Check if the Cloud Assistant agent process is running (the `[c]` bracket keeps `grep` from matching its own process):
   ```bash
   ps aux | grep '[c]loud-assistant'
   ```
3. If the process is not running, start the service:
   ```bash
   sudo systemctl start cloud-assistant
   ```
4. Enable the service to start automatically on boot:
   ```bash
   sudo systemctl enable cloud-assistant
   ```
5. Verify the service status:
   ```bash
   sudo systemctl status cloud-assistant
   ```
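
The status check above can be wrapped in a small helper so it is easy to script. This is a minimal sketch, assuming the systemd unit is named `cloud-assistant` as in this guide; `check_agent` is a hypothetical helper name, not part of the agent.

```bash
#!/usr/bin/env bash
# check_agent: hypothetical helper that reports whether a systemd unit is active.
# Assumes the unit name used in this guide (cloud-assistant); pass another name if yours differs.
check_agent() {
  local unit="${1:-cloud-assistant}"
  # "is-active --quiet" prints nothing and exits 0 only when the unit is active.
  if systemctl is-active --quiet "$unit"; then
    echo "OK: $unit is active"
    return 0
  fi
  echo "NOT RUNNING: $unit (try: sudo systemctl start $unit)"
  return 1
}
```

Calling `check_agent` with no argument checks the default unit. The non-zero exit status makes it easy to chain, for example `check_agent || sudo systemctl start cloud-assistant`.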

**Verification**
- Execute a simple test command through the Cloud Assistant console
- Check that the command execution status shows as "Success"
- Confirm the agent process is running with `ps aux | grep '[c]loud-assistant'` (the bracket keeps `grep` from matching itself)

### Problem 2: Command Delivery Timeout

**Symptoms**
- Error message: `DeliveryTimeout`
- Behavior: Commands remain in "Sending" state and eventually fail
- Context: Typically occurs during initial command delivery from the Cloud Assistant service to the agent

**Root Cause**
The Cloud Assistant service cannot deliver the command to the agent within the expected timeframe. This is usually caused by network connectivity issues between the instance and Cloud Assistant service endpoints.

**Solution**
1. Verify the instance has outbound internet access (`--max-time` keeps the check from hanging if traffic is silently dropped):
   ```bash
   curl -I --max-time 10 https://ecs.aliyuncs.com
   ```
2. Ensure security group rules allow outbound traffic on required ports:
   - TCP port 443 (HTTPS)
   - TCP port 80 (HTTP) 
   - UDP port 53 (DNS)
3. Check if the instance can resolve Cloud Assistant domains:
   ```bash
   nslookup ecs-cn-hangzhou.aliyuncs.com
   ```
4. Restart the Cloud Assistant agent service:
   ```bash
   sudo systemctl restart cloud-assistant
   ```
5. Retry the command execution after verifying connectivity
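
The connectivity checks in this list can be run against several endpoints in one pass. The sketch below is a hedged example: `probe_https` and `probe_all` are hypothetical helper names, and the 5-second limit is an arbitrary choice. Any HTTP response, including an error status, counts as reachable here because the goal is only to confirm that the network path is open.

```bash
#!/usr/bin/env bash
# probe_https: hypothetical helper that sends a HEAD request and reports reachability.
probe_https() {
  local url="$1"
  # --max-time bounds the whole request so a blocked network fails fast instead of hanging.
  if curl --silent --head --max-time 5 --output /dev/null "$url"; then
    echo "reachable: $url"
    return 0
  fi
  echo "unreachable: $url"
  return 1
}

# probe_all: probe every URL given and return non-zero if any of them failed.
probe_all() {
  local url failed=0
  for url in "$@"; do
    probe_https "$url" || failed=1
  done
  return "$failed"
}
```

For example, `probe_all https://ecs.aliyuncs.com https://ecs-cn-hangzhou.aliyuncs.com` probes the two hostnames used in the steps above; substitute the endpoint for your own region.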

**Verification**
- Monitor the command execution status in the Cloud Assistant console
- Check Cloud Assistant agent logs for successful command receipt:
   ```bash
   sudo tail -f /var/log/cloud-assistant.log
   ```
- Confirm the command completes successfully within the timeout period

### Problem 3: Instance Not in Running State

**Symptoms**
- Error message: `InstanceNotRunning`
- Behavior: Command execution fails immediately with status "Failed"
- Context: Attempting to run Cloud Assistant commands on stopped or pending instances

**Root Cause**
Cloud Assistant commands can only be executed on instances that are in the "Running" state. The error occurs when commands are sent to instances that are stopped, starting, or in any non-running state.

**Solution**
1. Navigate to the ECS console and locate the target instance
2. Check the current instance state in the instance list
3. If the instance is stopped, start it:
   - Select the instance in the console
   - Click "Start" in the action menu
   - Wait for the instance status to change to "Running"
4. Once the instance is running, retry the Cloud Assistant command
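
When this check is automated, the decision in the steps above reduces to a lookup on the instance status. The following is a minimal sketch: `command_readiness` is a hypothetical helper, and it assumes you have already obtained the status string (for example, from the console or an API response).

```bash
#!/usr/bin/env bash
# command_readiness: hypothetical helper mapping an ECS instance status to the next action.
# The status strings are the ECS lifecycle states: Pending, Starting, Running, Stopping, Stopped.
command_readiness() {
  case "$1" in
    Running)
      echo "ready: send the command"
      ;;
    Stopped)
      echo "blocked: start the instance, then retry"
      return 1
      ;;
    Pending|Starting|Stopping)
      echo "blocked: wait for the state transition to finish"
      return 1
      ;;
    *)
      echo "unknown status: $1"
      return 2
      ;;
  esac
}
```

Only `Running` returns a zero exit status, which mirrors the rule that commands can be executed only on running instances.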

**Verification**
- Confirm the instance status shows as "Running" in the ECS console
- Execute a simple Cloud Assistant command (e.g., `echo "test"`)
- Verify the command executes successfully and returns the expected output

### Problem 4: Network Connectivity Blocked

**Symptoms**
- Error message: `ClientNetworkBlocked`
- Behavior: Agent cannot communicate with Cloud Assistant service endpoints
- Context: Instances in restricted network environments or with overly restrictive security groups

**Root Cause**
The instance's network configuration blocks outbound connections to Cloud Assistant service endpoints. This includes missing security group rules, VPC route table misconfigurations, or local firewall restrictions.

**Solution**
1. Configure security group outbound rules to allow traffic to any destination (`0.0.0.0/0`) on:
   - TCP port 443 (HTTPS)
   - TCP port 80 (HTTP)
   - UDP port 53 (DNS)
2. Verify VPC route tables have a default route to the internet gateway
3. Check local firewall settings (if applicable):
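   ```bash
   # For firewalld: show the rules of the default zone
   sudo firewall-cmd --list-all

   # For iptables: show outbound rules with numeric addresses and packet counters
   sudo iptables -L OUTPUT -n -v
   ```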
4. Test connectivity to Cloud Assistant endpoints:
   ```bash
   telnet ecs-cn-hangzhou.aliyuncs.com 443
   ```
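
To audit an existing configuration, it can help to compare the outbound rules you actually have against the three this guide requires. The sketch below is illustrative only: `missing_rules` is a hypothetical helper, and the `protocol/port` notation is an assumption made for this example, not a format produced by any Alibaba Cloud tool.

```bash
#!/usr/bin/env bash
# Outbound rules this guide lists as required, written as protocol/port (illustrative notation).
REQUIRED_RULES="tcp/443 tcp/80 udp/53"

# missing_rules: hypothetical helper that prints the required rules absent from its arguments.
missing_rules() {
  local allowed=" $* " rule missing=""
  for rule in $REQUIRED_RULES; do
    case "$allowed" in
      *" $rule "*) ;;                  # rule is present, nothing to report
      *) missing="$missing $rule" ;;   # rule is absent, remember it
    esac
  done
  echo "${missing# }"                  # strip the leading space before printing
}
```

For example, `missing_rules tcp/443 udp/53` prints `tcp/80`, and an empty result means all three rules are present.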

**Verification**
- Use the Cloud Assistant console to execute a test command
- Monitor network connectivity with tools like `tcpdump` or `netstat`
- Confirm successful command execution and result retrieval

### Problem 5: Command Execution Timeout

**Symptoms**
- Error message: `ExecutionTimeout`
- Behavior: Long-running commands fail after 60 seconds (default timeout)
- Context: Commands that require extended execution time, such as large file transfers or complex scripts

**Root Cause**
The command execution exceeds the default timeout period of 60 seconds. Cloud Assistant terminates commands that don't complete within the specified timeout window.

**Solution**
1. When creating the command in the Cloud Assistant console:
   - Locate the "Timeout" parameter field
   - Increase the timeout value to accommodate the expected execution duration
   - The maximum timeout is 6 hours (21600 seconds)
2. For API calls, specify the `Timeout` parameter:
   ```json
   {
     "CommandId": "your-command-id",
     "Timeout": 300
   }
   ```
3. Optimize long-running commands by breaking them into smaller, sequential commands
4. Implement progress tracking within scripts to provide intermediate feedback
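
Before submitting a command, the timeout value can be checked locally against the limits stated in this guide. This is a minimal sketch: `check_timeout` is a hypothetical helper, and the 10 to 21600 second bounds are taken from the FAQ in this guide rather than queried from the service.

```bash
#!/usr/bin/env bash
# check_timeout: hypothetical helper that validates a timeout (in seconds)
# against the 10-21600 range quoted in this guide.
check_timeout() {
  local seconds="$1" min=10 max=21600
  if ! [[ "$seconds" =~ ^[0-9]+$ ]]; then
    echo "invalid: '$seconds' is not a whole number of seconds"
    return 1
  fi
  seconds=$((10#$seconds))   # force base 10 so values like "0300" are not read as octal
  if [ "$seconds" -lt "$min" ] || [ "$seconds" -gt "$max" ]; then
    echo "invalid: $seconds is outside $min-$max seconds"
    return 1
  fi
  echo "ok: $seconds"
}
```

For example, `check_timeout 300` prints `ok: 300`, while `check_timeout 5` reports that the value is outside the allowed range and returns a non-zero status.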

**Verification**
- Execute the command with the increased timeout setting
- Monitor the command execution progress in the Cloud Assistant console
- Confirm the command completes successfully and returns the expected results

### Problem 6: Security Group Rules Blocking Access

**Symptoms**
- Error message: `SecurityGroupRuleDenied`
- Behavior: Specific security group rules block access to Cloud Assistant service IPs
- Context: Overly restrictive inbound or outbound security group rules

**Root Cause**
Security group rules explicitly deny access to Cloud Assistant service endpoints. The error message typically includes the specific security group ID and blocked IP addresses.

**Solution**
1. Identify the problematic security group from the error message
2. Navigate to the Security Groups section in the ECS console
3. Edit the security group rules to add outbound permissions:
   - Rule direction: Outbound
   - Action: Allow
   - Protocol type: All protocols (or, more narrowly, TCP on port 443 for HTTPS)
   - Destination: `0.0.0.0/0` (or specific Cloud Assistant service IPs if known)
   - Priority: Choose a value that takes precedence over the blocking rule (a lower number means a higher priority)
4. If using network ACLs, ensure they also permit the required traffic
5. Apply the changes and wait a few minutes for propagation

**Verification**
- Attempt to execute a Cloud Assistant command
- Check that the command status progresses beyond the initial delivery phase
- Verify successful completion and result retrieval

### Problem 7: Invalid Cron Expression

**Symptoms**
- Error message: `BadCronExpression`
- Behavior: Scheduled commands fail to create or execute
- Context: Creating recurring Cloud Assistant commands with malformed cron expressions

**Root Cause**
The provided cron expression doesn't conform to the expected format or contains invalid values. Common issues include incorrect field counts, invalid characters, or out-of-range values.

**Solution**
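1. Validate the cron expression format. Standard cron uses the 5 fields below; confirm in the scheduling documentation whether Cloud Assistant expects additional fields, such as a leading seconds field: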
   - Minute (0-59)
   - Hour (0-23) 
   - Day of month (1-31)
   - Month (1-12)
   - Day of week (0-7, where 0 and 7 = Sunday)
2. Use online cron validators to test expressions before deployment
3. For Cloud Assistant-specific requirements:
   - Ensure GMT offset format is correct if specified
   - Verify hour values don't contain leading zeros
   - Confirm minute values are within 0-59 range
4. Example valid expressions:
   ```text
   # Run every day at 2:30 AM
   30 2 * * *
   
   # Run every Monday at 9:00 AM
   0 9 * * 1
   ```
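
The field-count and range checks from step 1 can be sketched as a small validator. This is a deliberately limited example, not the validation Cloud Assistant itself performs: `validate_cron` is a hypothetical helper that accepts only `*` or a plain number in each of the 5 standard fields, so ranges, lists, and steps such as `1-5` or `*/10` are out of scope.

```bash
#!/usr/bin/env bash
# validate_cron: hypothetical helper that checks a 5-field cron expression
# whose fields are each either "*" or a plain number.
validate_cron() {
  local -a fields
  local names=("minute" "hour" "day of month" "month" "day of week")
  local mins=(0 0 1 1 0)
  local maxs=(59 23 31 12 7)
  local i value
  # read -a splits on whitespace without expanding "*" as a filename pattern.
  read -r -a fields <<< "$1"
  if [ "${#fields[@]}" -ne 5 ]; then
    echo "invalid: expected 5 fields, got ${#fields[@]}"
    return 1
  fi
  for i in 0 1 2 3 4; do
    value="${fields[$i]}"
    [ "$value" = "*" ] && continue
    if ! [[ "$value" =~ ^[0-9]+$ ]]; then
      echo "invalid: ${names[$i]} '$value' is not '*' or a number"
      return 1
    fi
    # 10# forces base 10 so a value such as "08" is not rejected as bad octal.
    if [ "$((10#$value))" -lt "${mins[$i]}" ] || [ "$((10#$value))" -gt "${maxs[$i]}" ]; then
      echo "invalid: ${names[$i]} '$value' is outside ${mins[$i]}-${maxs[$i]}"
      return 1
    fi
  done
  echo "valid"
}
```

Both example expressions above pass this check (`validate_cron "30 2 * * *"` prints `valid`), while `validate_cron "60 2 * * *"` reports that the minute is outside 0-59.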

**Verification**
- Create a test scheduled command with the corrected cron expression
- Verify the command is accepted without validation errors
- Monitor the first scheduled execution to confirm proper timing

## FAQ

**Q: How do I check if the Cloud Assistant agent is properly installed and running?**
A: Connect to your instance and run `sudo systemctl status cloud-assistant` to check the service status. You can also verify the agent process is running with `ps aux | grep '[c]loud-assistant'` (the bracket keeps `grep` from matching itself). The agent should show as "active (running)" in the systemctl output.

**Q: What network ports and protocols does Cloud Assistant require?**
A: Cloud Assistant requires outbound connectivity on TCP port 443 (HTTPS), TCP port 80 (HTTP), and UDP port 53 (DNS). Ensure your security groups and network ACLs allow outbound traffic to these ports, particularly to Alibaba Cloud service endpoints.

**Q: How can I increase the command execution timeout beyond the default 60 seconds?**
A: In the Cloud Assistant console, you can specify a custom timeout value when creating commands. The timeout can be set from 10 seconds up to 6 hours (21600 seconds). For API calls, use the `Timeout` parameter in your request payload.

**Q: What permissions are needed to execute Cloud Assistant commands on ECS instances?**
A: The Cloud Assistant agent runs with root privileges on Linux instances and System privileges on Windows instances by default. You can specify alternative users when creating commands, but those users must exist on the target instance and have appropriate permissions for the commands being executed.

**Q: How do I troubleshoot Cloud Assistant commands that show "Success" but produce unexpected results?**
A: Check the command output in the Cloud Assistant console execution details. Even with exit code 0 (success), commands might produce unexpected results due to logic errors. Review the actual command output, verify environment variables, and ensure the working directory is correctly set for your operations.
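
One way to make such cases easier to diagnose is to have the script record the context it actually ran in, so the values appear in the command output. The sketch below is a minimal example; `print_context` is a hypothetical helper name, and the set of values it prints is only a suggestion.

```bash
#!/usr/bin/env bash
# print_context: hypothetical helper that records the environment a command ran in.
print_context() {
  echo "user=$(id -un)"          # effective user the command runs as
  echo "cwd=$(pwd)"              # working directory at the time of the call
  echo "shell=${SHELL:-unset}"   # login shell, if the variable is set
  echo "PATH=$PATH"              # search path used to resolve commands
}
```

Calling `print_context` at the top of a script puts these four lines at the start of the execution output, where they can be compared against what the script assumes.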