# alinux-monitoring

Part of **ALINUX**

<!-- intent-backlink:auto -->

> 💡 **Path Selection**: This skill is one implementation path for [Diagnose and resolve system performance issues](../../intent/alinux-troubleshoot-performance/SKILL.md). If you're unsure which path to take, check the routing skill first.

# Alibaba Cloud Linux System Monitoring and Logging Console Guide

## Operations Overview

| Operation | Console Entry Path | Prerequisites | Description |
|----------|-------------------|--------------|-------------|
| View Kernel Crash Dump Logs | Console > Elastic Compute Service (ECS) > Instances > Connect to Instance | An ECS instance with >2 GB memory; Alibaba Cloud Linux 3 or Anolis OS 8 (kernel ≥5.10.134-14); kexec-tools ≥2.0.25.0.2 | Use kdumpctl in a terminal session to read the boot log captured after a kernel crash. |
| Perform Memory Diagnostics | Console > Elastic Compute Service (ECS) > Instances > Memory Diagnostics | Instance is running; OS is Alibaba Cloud Linux; root permissions | Run automated memory diagnostics on a selected ECS instance via the console. |
| Enable Monitoring and Logging | Console > CloudMonitor / Cloud Config / ActionTrail / Log Service SLS | CloudMonitor, Cloud Config, ActionTrail, and SLS services enabled; RAM permissions | Activate core monitoring, configuration auditing, operation auditing, and log collection features. |
| View System Health Status | System Management > Managed Instances > Node Health | SysOM component installed; instance is managed | Access health scores, anomaly analysis, and diagnostic reports for managed nodes. |
| Configure Exception Event Alert | Console > Exception Event Alerting > Policy Management | CloudMonitor enabled; Cloud Assistant Agent installed on ECS | Create alert policies for system exceptions (e.g., node downtime) and set up notifications. |
| Configure Alert Policies | CloudMonitor Console > Exception Event Alerting > Policy Management | CloudMonitor service enabled; ECS instance running and managed | Set up alert policies and event subscriptions for abnormal system conditions such as node downtime. |

## Operation Steps

### View Kernel Crash Dump Logs

**Navigation**: Console > Elastic Compute Service (ECS) > Instances > Connect to Instance

**Prerequisites**:
- An Elastic Compute Service (ECS) instance with more than 2 GB of memory
- Alibaba Cloud Linux 3 or Anolis OS 8 with kernel version 5.10.134-14 or later
- kexec-tools version 2.0.25.0.2 or later

1. Connect to your ECS instance via the **Connect to Instance** feature in the ECS console.
   - Element: **Connect to Instance** (button) — located in the instance actions column
   - Notes: Ensure you have SSH or VNC access configured.

2. In the terminal session, run the kdump status command.
   - Element: **sudo kdumpctl status** (text_input) — main content area
   - Notes: This checks if kdump is active and ready to capture crashes.

3. If kdump is not operational, start it using the start command.
   - Element: **sudo kdumpctl start** (text_input) — main content area

4. Trigger a kernel crash for testing (do not use in production).
   - Element: **echo c | sudo tee /proc/sysrq-trigger** (text_input) — main content area
   - Notes: Warning: This command crashes the kernel immediately and should not be used in production environments.

5. After the instance restarts, reconnect and read the crash log.
   - Element: **sudo kdumpctl readlog** (text_input) — main content area
   - Notes: Important: run this only after a crash-triggered restart, not after a normal system restart, because a full restart clears the boot logs.
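
The prerequisites and the status check above can be sketched as a small preflight script. This is a minimal sketch, not an official tool: the function names are illustrative, it assumes GNU `sort -V` for version comparison and an RPM-based system for `rpm -q`, and it reads `MemTotal` from `/proc/meminfo`, which reports usable memory and is therefore slightly below the nominal instance size. The crash trigger from step 4 is deliberately left out.

```shell
#!/bin/sh
# Preflight sketch for the kdump prerequisites listed in this guide.
# Thresholds come from the guide; all function names are illustrative.

MIN_KERNEL="5.10.134-14"
MIN_MEM_KB=$((2 * 1024 * 1024))   # "more than 2 GB", expressed in kB

# version_ge A B: succeed when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# mem_total_kb FILE: print the MemTotal value (kB) from a meminfo-style file.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "$1"
}

# mem_ok FILE: succeed when MemTotal exceeds the 2 GB threshold.
mem_ok() {
    [ "$(mem_total_kb "$1")" -gt "$MIN_MEM_KB" ]
}

# preflight: report each prerequisite, then show the kdump service state.
preflight() {
    if version_ge "$(uname -r)" "$MIN_KERNEL"; then
        echo "kernel: ok ($(uname -r))"
    else
        echo "kernel: older than $MIN_KERNEL"
    fi
    if mem_ok /proc/meminfo; then
        echo "memory: ok"
    else
        echo "memory: 2 GB or less"
    fi
    # Print the installed package so it can be compared with the guide by hand.
    rpm -q kexec-tools
    sudo kdumpctl status
}

# Run the checks only when asked, so sourcing this file has no side effects.
if [ "${1:-}" = "--run" ]; then
    preflight
fi
```

Saved under a hypothetical name such as `kdump-preflight.sh`, it would be run as `sh kdump-preflight.sh --run` on the instance before testing a crash.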

### Perform Memory Diagnostics

**Navigation**: Console > Elastic Compute Service (ECS) > Instances > Memory Diagnostics

**Prerequisites**:
- Instance is running
- Instance operating system is Alibaba Cloud Linux
- Root administrator permissions

1. In the ECS instance list, select the target instance.
   - Element: **Select Instance** (checkbox) — instance list area

2. Click the **Memory Diagnostics** button in the Actions column.
   - Element: **Memory Diagnostics** (button) — Actions column
   - Notes: This button is only available when the instance status is "Running".

3. In the diagnostic dialog that appears, click **Start Diagnostics**.
   - Element: **Start Diagnostics** (button) — diagnostic configuration panel
   - Notes: The diagnostic process may take several minutes. Do not close the browser tab.
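
Before opening the console, the operating system and permission prerequisites can be checked from a terminal on the instance. The sketch below is illustrative only: it assumes Alibaba Cloud Linux reports `ID="alinux"` in `/etc/os-release`, it assumes the root requirement refers to the user on the instance, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: check the Memory Diagnostics prerequisites from inside the instance.

# os_id FILE: print the ID field of an os-release style file, without quotes.
os_id() {
    sed -n 's/^ID=//p' "$1" | tr -d '"'
}

# is_alinux FILE: succeed when the ID is "alinux".
# Assumption: Alibaba Cloud Linux sets ID="alinux" in /etc/os-release.
is_alinux() {
    [ "$(os_id "$1")" = "alinux" ]
}

# is_root: succeed when the effective user ID is 0.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

# check_diag_prereqs: print one line per prerequisite.
check_diag_prereqs() {
    if is_alinux /etc/os-release; then
        echo "os: Alibaba Cloud Linux"
    else
        echo "os: not detected as Alibaba Cloud Linux"
    fi
    if is_root; then
        echo "user: root"
    else
        echo "user: not root"
    fi
}

# Run the checks only when asked, so sourcing this file has no side effects.
if [ "${1:-}" = "--run" ]; then
    check_diag_prereqs
fi
```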

### Enable Monitoring and Logging

**Navigation**: Console > CloudMonitor > Alert Management; Console > Cloud Config > Resource Compliance; Console > ActionTrail > Event Tracking; Console > Log Service SLS > Logstore Management

**Prerequisites**:
- CloudMonitor service is enabled
- Cloud Config (Configuration Audit) is activated
- ActionTrail (Operation Audit) is enabled
- Log Service (SLS) is accessible
- RAM user has required permissions

1. Log in to the Alibaba Cloud Console.
   - Element: **Console** (link) — top navigation bar

2. Navigate to CloudMonitor from the left-side menu.
   - Element: **CloudMonitor** (menu) — left navigation panel

3. Enable key monitoring metrics for cloud products.
   - Element: **Enable Key Metrics for Specified Cloud Products** (button) — Basic CloudMonitor section

4. Go to Cloud Config for compliance auditing.
   - Element: **Cloud Config** (menu) — left navigation panel

5. Activate the Level 2.0 compliance pre-check feature.
   - Element: **Enable Level 2.0 Cloud Pre-check** (button) — Cloud Config > Level 2.0 Pre-check

6. Access ActionTrail to extend event retention.
   - Element: **ActionTrail** (menu) — left navigation panel

7. Create a trail to deliver events to SLS or OSS.
   - Element: **Create Trail** (button) — ActionTrail > Event Tracking
   - Notes: ActionTrail keeps events for 90 days by default; a trail delivers them to SLS or OSS for longer-term storage.

8. Open Log Service (SLS) and enable log collection.
   - Element: **Enable Log Service** (button) — Log Service SLS > Logstore Management
   - Notes: You can customize log storage duration here.

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| Alert Notification Methods | checkbox | No | Phone, SMS, Email, DingTalk Bot, Alibaba Cloud App Notification | Select channels to receive alert notifications |
| Alert Blocklist | text_input | No | — | Enter names of metrics to suppress alerts for |
| Log Storage Duration | number_input | No | — | Log retention in days (default: 30); the maximum depends on your SLS quota |
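
For teams that script the trail creation in step 7 instead of using the console, the sketch below prints the Alibaba Cloud CLI calls that would correspond to it. Treat it as an assumption to verify against the ActionTrail API reference: the `CreateTrail` and `StartLogging` operation names, the `--Name` and `--SlsProjectArn` parameters, and the SLS project ARN format are taken from memory of that API, and a real call may need further parameters such as a write role. The trail name, region, account ID, and project name are hypothetical placeholders. The script only prints the commands and does not run them.

```shell
#!/bin/sh
# Sketch: print (do not execute) CLI calls for delivering ActionTrail events to SLS.

# sls_project_arn REGION ACCOUNT_ID PROJECT: build the ARN of an SLS project.
# Assumed format: acs:log:<region>:<account-id>:project/<project-name>
sls_project_arn() {
    printf 'acs:log:%s:%s:project/%s\n' "$1" "$2" "$3"
}

# print_trail_commands TRAIL REGION ACCOUNT_ID PROJECT:
# print the calls that would create a trail and then enable it.
print_trail_commands() {
    arn=$(sls_project_arn "$2" "$3" "$4")
    echo "aliyun actiontrail CreateTrail --Name $1 --SlsProjectArn $arn"
    # Assumption: a newly created trail must be started before it delivers events.
    echo "aliyun actiontrail StartLogging --Name $1"
}

# Example with placeholder values (all hypothetical).
print_trail_commands audit-trail cn-hangzhou 1234567890123456 audit-logs
```

Printing the commands first makes it easy to review the ARN and names before anything is created or billed.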

### View System Health Status

**Navigation**: System Management > Managed Instances > Node Health

**Prerequisites**:
- SysOM component installed
- Instance is managed (enrolled in OS management)

1. In the left navigation panel, click **System Management**.
   - Element: **System Management** (menu) — left navigation panel

2. Select **Managed Instances** and search for your instance by ID or name.
   - Element: **Managed Instances** (dropdown) — main content area

3. In the Actions column, click **Node Health**.
   - Element: **Node Health** (link) — Actions column

4. In the anomaly analysis panel, click **Diagnose**.
   - Element: **Diagnose** (button) — Actions column

5. After diagnosis completes, click **View Diagnostic Report**.
   - Element: **View Diagnostic Report** (button) — Actions column

### Configure Exception Event Alert

**Navigation**: Console > Exception Event Alerting > Policy Management

**Prerequisites**:
- CloudMonitor service is enabled
- ECS instance is running and Cloud Assistant Agent is installed

1. Log in to the CloudMonitor console.
   - Element: **CloudMonitor console** (link) — top navigation
   - Notes: Opens in a new tab

2. Navigate to **Policy Management** under Exception Event Alerting.
   - Element: **Exception event alerting > Policy management** (menu) — left-side navigation pane

3. Click **Create policy**.
   - Element: **Create policy** (button) — Policy management page

4. Enter a name for the alert policy.
   - Element: **Policy Name** (text_input) — main content area

5. Select a cluster from the dropdown.
   - Element: **Cluster Name** (dropdown) — main content area

6. Select exception events (e.g., node downtime detection).
   - Element: **Add exception events for nodes and pods** (checkbox) — main content area

7. Move selected events to the right panel.
   - Element: **arrow** (button) — right panel

8. Ensure **Enable for This Edit** is checked.
   - Element: **Enable for This Edit** (checkbox) — main content area
   - Notes: Enabled by default

9. Click **Save** to create the policy.
   - Element: **Save** (button) — bottom of form

10. Set up alert contacts (if not already done).
    - Element: **Alert contacts** (menu) — left navigation
    - Element: **Create alert contact** (button) — top-right corner
    - Notes: Email must be verified before use

11. Create an alert contact group.
    - Element: **Alert contact groups** (menu) — left navigation
    - Element: **Create alert contact group** (button) — top-right corner

12. Go to **Event subscription** and click **Create subscription policy**.
    - Element: **Event subscription** (menu) — left navigation
    - Element: **Create subscription policy** (button) — top-right corner

13. Enter a name for the subscription policy.
    - Element: **Name** (text_input) — main content area

14. Select **System Event** as the subscription type.
    - Element: **System Event** (radio) — main content area

15. In Subscription Scope, enter product code **sysom**.
    - Element: **Product** (text_input) — Subscription scope section
    - Notes: Must enter exact value "sysom"

16. Select **Alibaba Cloud O&M Platform for Operating Systems**.
    - Element: **Alibaba Cloud O&M Platform for Operating Systems** (dropdown) — Subscription scope section

17. Choose event types and levels (default: all).
    - Element: **event type and event level** (dropdown) — Subscription scope section

18. In Notification Settings, select **Create notification configuration**.
    - Element: **Create notification configuration** (dropdown) — Notification settings section

19. Enter a notification policy name.
    - Element: **policy name** (text_input) — Notification settings section

20. Select the previously created contact group.
    - Element: **Contact group** (dropdown) — Notification settings section

21. Click **Submit** to save the subscription.
    - Element: **Submit** (button) — bottom of form

22. Test the configuration using debug.
    - Element: **Debug event subscription** (button) — top-right corner
    - Element: **Product** (text_input) — enter "sysom" in debug dialog
    - Element: **OK** (button) — bottom of dialog to send test notification

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| Policy Name | text_input | Yes | — | Enter a descriptive name for the alert policy |
| Cluster Name | dropdown | Yes | — | Select a managed cluster |
| Add exception events for nodes and pods | checkbox | No | — | Select multiple events (e.g., node downtime detection) |
| Name | text_input | Yes | — | Name for the subscription policy |
| Subscription Type | radio | Yes | System Event | Must select System Event |
| Product | text_input | Yes | — | Enter "sysom" exactly |
| Subscription scope | dropdown | Yes | Alibaba Cloud O&M Platform for Operating Systems | Select the correct platform |
| Event type | dropdown | No | — | Filter by event type if needed |
| Event level | dropdown | No | — | Filter by severity level |
| Notification settings | dropdown | Yes | Create notification configuration | Required selection |
| Contact group | dropdown | Yes | — | Must reference an existing alert contact group |

### Configure Alert Policies

**Navigation**: CloudMonitor Console > Exception Event Alerting > Policy Management

**Prerequisites**:
- CloudMonitor service is enabled
- ECS instance is running and managed

1. In the left navigation, select **Exception Event Alerting** > **Policy Management**.
   - Element: **Exception Event Alerting** (menu) — left navigation bar

2. Click **Create Policy**.
   - Element: **Create Policy** (button) — Policy Management page

3. Enter policy name and select cluster.
   - Element: **Policy Name** (text_input) — main page
   - Element: **Cluster Name** (dropdown) — main page

4. Select **Node Downtime Detection** as an exception event.
   - Element: **Add exception events for nodes and pods** (checkbox) — main page
   - Notes: Use arrow button to move selections to "Selected" list

5. Ensure **Enable for This Edit** is checked and click **Save**.
   - Element: **Save** (button) — page bottom
   - Notes: The **Enable for This Edit** checkbox is selected by default

6. Go to **Alert Contacts** and create a contact.
   - Element: **Alert Contacts** (menu) — left sidebar
   - Element: **Create Contact** (button) — Alert Contacts page
   - Notes: Email must be activated before use

7. Create a contact group.
   - Element: **Create Contact Group** (button) — Alert Contact Groups page

8. Navigate to **Event Subscription** and click **Create Subscription Policy**.
   - Element: **Create Subscription Policy** (button) — Event Subscription page

9. Enter subscription name, select **System Event**, and input product code **sysom**.
   - Element: **sysom** (text_input) — Subscription Scope
   - Notes: Select "Alibaba Cloud Operating System Intelligent O&M Platform"

10. In Notification Configuration, click **Create Notification Configuration** and select contact group.
    - Element: **Create Notification Configuration** (button) — Notification Configuration dropdown

11. Click **Submit**.
    - Element: **Submit** (button) — page bottom

12. Test with **Debug Event Subscription**.
    - Element: **Debug Event Subscription** (button) — Subscription Policy page

13. Enter **sysom** in the product field.
    - Element: **sysom** (text_input) — product field in dialog

14. Select your subscription and click **OK**.
    - Element: **OK** (button) — dialog bottom

| Parameter | Type | Required | Options/Values | Description |
|-----------|------|----------|----------------|-------------|
| Policy Name | text_input | Yes | — | Name of the alert policy |
| Cluster Name | dropdown | Yes | — | Select a managed cluster |
| Add exception events for nodes and pods | checkbox | No | Node Downtime Detection, Other Exception Events | Multi-select for event types |
| Name | text_input | Yes | — | Subscription policy name |
| Subscription Type | radio | Yes | System Event | Must select System Event |
| Product | text_input | Yes | — | Enter "sysom" |
| Event Type | checkbox | No | — | Optional filter |
| Event Level | checkbox | No | — | Optional filter |
| Contact Group | dropdown | Yes | — | Must select an existing group |

## FAQ

Q: Where can I find the memory diagnostics feature for my ECS instance?
A: In the ECS console, go to Instances, select your running Alibaba Cloud Linux instance, and look for the "Memory Diagnostics" button in the Actions column. It is only visible when the instance is in the "Running" state.

Q: What happens if I trigger a kernel crash using sysrq-trigger in production?
A: The command `echo c | sudo tee /proc/sysrq-trigger` causes an immediate kernel panic and crashes the instance. Never run it in production; use it only in test environments to validate kdump.

Q: Can I modify an alert policy after it has been created?
A: Yes. In the CloudMonitor console under Exception Event Alerting > Policy Management, you can edit existing policies by clicking the Edit button next to the policy name.

Q: Why don’t I see the "Node Health" option for my instance?
A: The "Node Health" link appears only if your instance is enrolled in the OS management service (that is, "managed") and has the SysOM component installed. Check under System Management > Managed Instances.

Q: Do I need to pay for memory diagnostics or kernel crash log analysis?
A: Memory diagnostics includes a free quota of 10 runs per instance per month; runs beyond the quota are billed by usage. Kernel crash analysis with kdumpctl is included with Alibaba Cloud Linux at no additional cost.

## Pricing & Billing

### Billing Model
- Memory Diagnostics: Free tier with usage-based billing beyond quota
- Exception Event Alerting: Pay-per-alert-trigger model
- Core monitoring and logging services (CloudMonitor, ActionTrail, SLS): Free tiers available with usage-based overages

### Price Reference
| Service | Unit Price |
|--------|------------|
| System Event Alert | ¥0.001 per alert trigger |

### Free Tier
- Memory Diagnostics: 10 free diagnostics per instance per month
- Exception Event Alerting: 100 free alert triggers per month
- Basic CloudMonitor, Cloud Config, ActionTrail, and SLS: Include free quotas as per official documentation

### Billing Notes
- Alert notifications are billed per actual trigger count; minimum unit is 1 alert.
- Delivering ActionTrail events to SLS or OSS incurs additional fees from the destination service (for example, SLS storage and ingestion fees).
- The kdumpctl tool and related kernel crash analysis features are provided at no extra cost on Alibaba Cloud Linux instances.
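
As a worked example of the alert pricing above, the sketch below estimates a monthly charge from the two figures in this section: 100 free alert triggers per month and ¥0.001 per trigger after that. It assumes a flat rate and a single monthly quota; the function names are illustrative, and actual bills may differ.

```shell
#!/bin/sh
# Sketch: estimate the monthly Exception Event Alerting charge.

FREE_ALERTS=100        # free alert triggers per month (from Free Tier)
PRICE_PER_ALERT=0.001  # CNY per trigger (from Price Reference)

# billable_alerts N: triggers charged once the free quota is used up.
billable_alerts() {
    if [ "$1" -gt "$FREE_ALERTS" ]; then
        echo $(( $1 - $FREE_ALERTS ))
    else
        echo 0
    fi
}

# alert_cost N: estimated monthly charge in CNY, three decimal places.
alert_cost() {
    LC_ALL=C awk -v n="$(billable_alerts "$1")" -v p="$PRICE_PER_ALERT" \
        'BEGIN { printf "%.3f\n", n * p }'
}

alert_cost 80      # within the free quota, prints 0.000
alert_cost 5100    # 5000 billable triggers, prints 5.000
```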