Closed-Loop Infrastructure Alert with Delivery Verification

This integration combines full-stack ECS/RDS observability and analytics lake ingestion (skill 4) with EventBridge-driven multi-channel fan-out and closed-loop delivery receipt tracking (skill 2). Infrastructure anomalies trigger EventBridge rules that dispatch SMS/WhatsApp via Twilio and branded email via Resend to on-call teams. Returning delivery receipts, bounces, and opens are validated and correlated back to the originating incident in the analytics lake. This enables post-mortem queries such as 'did the on-call engineer actually receive and open the P1 alert before the outage escalated?'

Products involved

Scenario

Use this integration when infrastructure anomalies in ECS/RDS require immediate, multi-channel on-call alerts with verifiable delivery tracking. It bridges real-time observability with closed-loop receipt validation, enabling post-incident audits to confirm whether P1 alerts were delivered, opened, and acknowledged before escalation.

Integration steps

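  1. Instrument ECS/RDS Observability: Create CloudWatch alarms on ECS/RDS metrics and enable RDS Performance Insights. CloudWatch emits alarm state changes to EventBridge with source aws.cloudwatch, so match on that source and on the ALARM state: aws events put-rule --name "InfraP1Alert" --event-pattern '{"source": ["aws.cloudwatch"], "detail-type": ["CloudWatch Alarm State Change"], "detail": {"state": {"value": ["ALARM"]}}}'. To limit the rule to ECS/RDS alarms, add a filter on detail.alarmName.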
  2. Configure EventBridge Fan-Out: Use eb-deliver-destinations to register API destinations. Create destinations for Twilio (POST https://api.twilio.com/2010-04-01/Accounts/{SID}/Messages.json, HTTP Basic auth with the Account SID and Auth Token) and Resend (POST https://api.resend.com/emails, Authorization: Bearer <RESEND_API_KEY>).
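  3. Dispatch Multi-Channel Alerts: Attach both destinations as targets of the rule, using twilio-send-notification for Twilio alongside the Resend destination. Inject incident metadata with an input transformer: map a variable such as alarm to $.detail.alarmName in InputPathsMap and reference it as <alarm> in the template, for example {"To": "+1555...", "Body": "P1 ECS CPU > 90% | ID: <alarm>", "StatusCallback": "https://your-ecs-endpoint/webhooks/twilio"}. Twilio also requires a From number or MessagingServiceSid, and its Messages endpoint expects form-encoded parameters, so confirm that twilio-send-notification delivers them in that encoding. For Resend, include {"to": ["[email protected]"], "subject": "P1 Alert", "tags": [{"name": "incident_id", "value": "<alarm>"}]}.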
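  4. Ingest & Validate Webhooks: Deploy an ECS task running twilio-handle-validation and resend-handle-events. Verify Twilio by recomputing X-Twilio-Signature from the callback URL and POST parameters with the account Auth Token. Verify Resend via its Svix signature headers (svix-id, svix-timestamp, svix-signature) using the webhook signing secret. Reject unverified payloads with 401.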
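  5. Normalize & Correlate Receipts: Extract MessageSid (Twilio) or email_id (Resend), status, and timestamp. Resolve incident_id from the Resend incident_id tag where the webhook payload carries it, and for Twilio from a MessageSid-to-incident mapping recorded at dispatch or an identifier carried in the StatusCallback URL. Stream normalized JSON to the DataWorks analytics lake: POST /api/v1/ingest/events with payload {incident_id, channel, status, timestamp, recipient}.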
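  6. Query Closed-Loop Status: Join receipts with incident records in OpenSearch/Supabase. Because escalation_time belongs to the incident rather than to each receipt, join against the incident table (called incidents here for illustration): SELECT e.incident_id, e.channel, e.status FROM alert_events e JOIN incidents i ON i.incident_id = e.incident_id WHERE e.status = 'opened' AND e.timestamp < i.escalation_time. Open tracking applies to email only; for SMS/WhatsApp, filter on delivered (or read for WhatsApp) instead.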
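
Steps 4 and 5 can be sketched for the Twilio channel as follows. This is a minimal illustration using only the Python standard library, not the implementation behind twilio-handle-validation. It assumes the status callback arrives as a form POST, that the incident identifier travels as an incident_id query parameter on the StatusCallback URL (an assumption made here for correlation), and that the function names are hypothetical.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def twilio_signature(auth_token: str, url: str, params: dict) -> str:
    """Recompute the value Twilio places in X-Twilio-Signature for a form POST."""
    # Twilio signs the full callback URL followed by every POST field name and
    # value, concatenated in order of field name, using HMAC-SHA1 and Base64.
    signed = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode(), signed.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def is_valid_twilio_request(auth_token: str, url: str, params: dict, header_signature) -> bool:
    """Constant-time comparison against the header; a missing header is invalid."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode(), (header_signature or "").encode())


def normalize_twilio_receipt(url: str, params: dict, received_at: datetime) -> dict:
    """Map a verified status callback to the ingest payload from step 5."""
    # Assumption: incident_id was appended to the StatusCallback URL at dispatch.
    query = parse_qs(urlparse(url).query)
    recipient = params.get("To", "")
    return {
        "incident_id": query.get("incident_id", [""])[0],
        "channel": "whatsapp" if recipient.startswith("whatsapp:") else "sms",
        "status": params["MessageStatus"],
        "timestamp": received_at.astimezone(timezone.utc).isoformat(),
        "recipient": recipient,
    }
```

A handler would call is_valid_twilio_request first, return 401 on failure, and only then pass the parameters to normalize_twilio_receipt and on to the ingest endpoint. The URL passed in must be exactly the one Twilio called, including the query string, or the signatures will not match.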

Architecture

ECS/RDS metrics trigger CloudWatch alarms, which emit events to EventBridge. EventBridge rules fan out to Twilio and Resend API destinations. Delivery receipts, opens, and bounces return as webhooks to an ECS-hosted validation service. Validated events are normalized and streamed into the DataWorks analytics lake, where they are indexed in OpenSearch/Supabase for correlation with original incident payloads.
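
The ECS-hosted validation service also has to authenticate the email-side webhooks. The sketch below assumes Resend delivers Svix-style signed webhooks (svix-id, svix-timestamp, and svix-signature headers, with a signing secret prefixed whsec_); confirm this against the current Resend documentation before relying on it. The function names, the lower-cased header keys, and the five-minute tolerance are illustrative choices, not part of resend-handle-events.

```python
import base64
import hashlib
import hmac
import time


def svix_signature(secret: str, msg_id: str, timestamp: str, body: bytes) -> str:
    """Compute the Base64 HMAC-SHA256 over '<id>.<timestamp>.<raw body>'."""
    # The key is the Base64-decoded portion of the secret after 'whsec_'.
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{msg_id}.{timestamp}.".encode() + body
    return base64.b64encode(hmac.new(key, signed, hashlib.sha256).digest()).decode()


def is_valid_resend_webhook(secret: str, headers: dict, body: bytes,
                            tolerance_s: int = 300, now=None) -> bool:
    """Return True only for a fresh request carrying a matching v1 signature."""
    try:
        msg_id = headers["svix-id"]
        timestamp = headers["svix-timestamp"]
        candidates = headers["svix-signature"].split()
        current = time.time() if now is None else now
        age = abs(current - int(timestamp))
    except (KeyError, ValueError):
        return False  # missing header or non-numeric timestamp
    if age > tolerance_s:
        return False  # stale or future-dated request, possibly a replay
    expected = "v1," + svix_signature(secret, msg_id, timestamp, body)
    # The header may list several space-separated signatures; any match passes.
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)
```

The check must run against the raw request bytes, before any JSON parsing or re-serialization, because even a whitespace change alters the signature.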

Prerequisites

Common pitfalls

Typical questions

FAQ

Q: How does the closed-loop infrastructure alert system verify delivery confirmation?
A: The platform triggers EventBridge rules to dispatch SMS, WhatsApp, and branded email alerts, then validates the returned delivery receipts, bounces, and opens to confirm successful delivery. These delivery and engagement events are correlated back to the originating incident in the analytics lake, so that during a post-mortem teams can verify whether on-call engineers actually received and opened critical notifications.
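
The post-mortem check described in this answer can be sketched with an in-memory SQLite database standing in for OpenSearch/Supabase. The incidents table, its escalation_time column, and every row below are made up for illustration; only the alert_events fields follow the ingest payload from the integration steps. Timestamps are stored as UTC ISO-8601 strings in one fixed format so that string comparison matches chronological order.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with hypothetical incidents and receipts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE incidents (incident_id TEXT PRIMARY KEY, escalation_time TEXT);
        CREATE TABLE alert_events (
            incident_id TEXT, channel TEXT, status TEXT, timestamp TEXT, recipient TEXT
        );
    """)
    conn.executemany("INSERT INTO incidents VALUES (?, ?)", [
        ("InfraP1Alert-cpu", "2024-01-01T00:15:00Z"),
        ("InfraP1Alert-db", "2024-01-01T00:20:00Z"),
    ])
    conn.executemany("INSERT INTO alert_events VALUES (?, ?, ?, ?, ?)", [
        ("InfraP1Alert-cpu", "sms", "delivered", "2024-01-01T00:01:05Z", "+15550100"),
        ("InfraP1Alert-cpu", "email", "delivered", "2024-01-01T00:01:10Z", "oncall@example.com"),
        ("InfraP1Alert-cpu", "email", "opened", "2024-01-01T00:04:30Z", "oncall@example.com"),
        ("InfraP1Alert-db", "email", "opened", "2024-01-01T00:25:00Z", "oncall@example.com"),
    ])
    return conn


def opened_before_escalation(conn: sqlite3.Connection, incident_id: str) -> list:
    """Return (channel, recipient, timestamp) for opens that precede escalation."""
    return conn.execute(
        """
        SELECT e.channel, e.recipient, e.timestamp
        FROM alert_events AS e
        JOIN incidents AS i ON i.incident_id = e.incident_id
        WHERE e.incident_id = ?
          AND e.status = 'opened'
          AND e.timestamp < i.escalation_time
        ORDER BY e.timestamp
        """,
        (incident_id,),
    ).fetchall()
```

An empty result for an incident means no open was recorded before escalation. That is not proof the alert went unseen, since SMS has no open event and email clients can block tracking pixels, so delivered receipts are worth checking alongside opens.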