DaaS / Products / Automated ECS Operations and DR Setup

Automated ECS Operations and DR Setup

Provision and configure Alibaba Cloud Linux ECS instances at scale, use Cloud Assistant to remotely execute configuration and maintenance scripts across the fleet, and set up automated snapshot and disk backup policies for disaster recovery — forming a complete compute operations pipeline from provisioning to protection.

Products involved

Scenario

Use this workflow when deploying a scalable fleet of Alibaba Cloud Linux (Alinux) instances that require zero-touch baseline configuration and automated disaster recovery. It is ideal for CI/CD pipelines, batch clusters, or stateless services where consistent performance tuning and guaranteed point-in-time data protection must be enforced programmatically.

Integration steps

  1. Provision Alinux fleet (alinux-manage-lifecycle): Launch instances with Cloud Assistant pre-enabled.
  2. ``bash aliyun ecs RunInstances --RegionId cn-hangzhou --InstanceType ecs.c7.large \ --ImageId aliyun_3_x64_20G_alibase_20231215.vhd --Amount 5 \ --SecurityGroupId sg-xxx --VSwitchId vsw-xxx --InstanceName "alinux-prod" ``

  3. Define configuration script (ecs-execute-instances): Create a reusable Cloud Assistant command for Alinux tuning.
  4. ``bash aliyun ecs CreateCommand --Type RunShellScript --Name "alinux-tune" \ --CommandContent "echo 'vm.swappiness=10' >> /etc/sysctl.conf && sysctl -p && yum-config-manager --enable alinux3-plus" ``

  5. Execute across fleet: Push the command to target instances without SSH.
  6. ``bash aliyun ecs InvokeCommand --CommandId cmd-xxx --InstanceIds '["i-001","i-002","i-003"]' ``

  7. Create DR snapshot policy (ecs-manage-recovery): Define automated backup rules.
  8. ``bash aliyun ecs CreateAutoSnapshotPolicy --RegionId cn-hangzhou --Name "dr-daily" \ --RetentionDays 30 --TimePoints '["02:00"]' --RepeatWeekdays '[1,2,3,4,5,6,7]' ``

  9. Attach policy to disks: Bind the policy to system/data disks.
  10. ``bash aliyun ecs ApplyAutoSnapshotPolicy --RegionId cn-hangzhou --AutoSnapshotPolicyId sp-xxx \ --DiskIds '["d-001","d-002","d-003"]' ``

  11. Verify pipeline: Check execution logs and policy status.
  12. ``bash aliyun ecs DescribeInvocationResults --InvokeId inv-xxx aliyun ecs DescribeAutoSnapshotPolicyEx --AutoSnapshotPolicyId sp-xxx ``

Architecture

The control plane orchestrates three layers: (1) Provisioning (RunInstances) spins up Alinux nodes in a VPC. (2) Configuration routes payloads through the Cloud Assistant API; the pre-installed agent on each instance receives InvokeCommand requests, executes them locally, and returns stdout/stderr to the ECS console. (3) Protection hooks into the Block Storage service, where CreateAutoSnapshotPolicy schedules incremental disk captures. Snapshot metadata flows to the ECS API, while actual data is stored in isolated OSS-backed volumes, decoupling compute from backup storage.

Prerequisites

Common pitfalls

Typical questions