How to Automate EC2 Disk Space Monitoring with CloudWatch Alarms (AWS Linux)



When managing multiple EC2 instances running Amazon Linux, one glaring gap in AWS's native monitoring becomes apparent: CloudWatch doesn't track disk space usage by default. While it monitors CPU, network, and disk I/O metrics out of the box, filesystem usage (like memory) requires custom instrumentation.

A convenient approach leverages the AWS Systems Manager (SSM) Agent, which comes pre-installed on Amazon Linux AMIs, together with CloudWatch custom metrics. Compared with traditional cron-and-email monitoring scripts, this eliminates the need for:

  • Manual script deployment
  • Email server configuration
  • Individual server maintenance
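
You can confirm that instances are registered with SSM before going further; a quick check (the query string is just one convenient output format):

aws ssm describe-instance-information \
  --query "InstanceInformationList[*].[InstanceId,PingStatus]" \
  --output table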

First, ensure your instances have an instance profile whose IAM role allows both SSM and publishing custom metrics; the cron script below relies on cloudwatch:PutMetricData:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:SendCommand",
        "ssm:CreateAssociation",
        "cloudwatch:PutMetricData"
      ],
      "Resource": "*"
    }
  ]
}
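
If you prefer AWS-managed policies, attaching AmazonSSMManagedInstanceCore to the instance role covers the SSM side; a sketch, assuming an illustrative role name of EC2-Monitoring-Role:

aws iam attach-role-policy \
  --role-name EC2-Monitoring-Role \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore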

Create a cron job (crontab -e) that pushes disk space data to CloudWatch every five minutes:

*/5 * * * * ~/monitor_disk.sh

Here's the monitor_disk.sh script:

#!/bin/bash
# Publish root-filesystem usage as a custom CloudWatch metric (requires cloudwatch:PutMetricData).
# Amazon Linux 2023 enforces IMDSv2, so fetch a session token first
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/region)

# Percentage used on the root filesystem, without the trailing %
DISK_USAGE=$(df -h / | awk 'NR==2{print $5}' | tr -d '%')

aws cloudwatch put-metric-data \
  --region "$REGION" \
  --namespace "Custom/EC2" \
  --metric-name "DiskSpaceUtilization" \
  --dimensions "InstanceId=$INSTANCE_ID" \
  --value "$DISK_USAGE" \
  --unit "Percent"
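
After the first few cron runs, you can verify that data points are arriving (the metric can take a minute or two to appear):

aws cloudwatch list-metrics --namespace "Custom/EC2"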

Once metrics are flowing, configure alarms in CloudWatch:

aws cloudwatch put-metric-alarm \
  --alarm-name "High-Disk-Usage-${INSTANCE_ID}" \
  --alarm-description "Alarm when disk usage exceeds 80%" \
  --metric-name "DiskSpaceUtilization" \
  --namespace "Custom/EC2" \
  --statistic "Maximum" \
  --period 300 \
  --threshold 80 \
  --comparison-operator "GreaterThanThreshold" \
  --evaluation-periods 2 \
  --alarm-actions "arn:aws:sns:us-east-1:123456789012:DiskSpace-Alerts" \
  --dimensions "Name=InstanceId,Value=${INSTANCE_ID}"
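
The --alarm-actions ARN assumes an SNS topic already exists; creating one and subscribing an address looks roughly like this (topic name and email are placeholders):

aws sns create-topic --name DiskSpace-Alerts
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:DiskSpace-Alerts \
  --protocol email \
  --notification-endpoint ops@example.com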

For larger environments, consider a Lambda-based solution that runs on a schedule (for example, from an EventBridge rule); a simplified sketch:

import time
import boto3

def lambda_handler(event, context):
    # The execution role needs ec2:DescribeInstances, ssm:SendCommand,
    # ssm:GetCommandInvocation, and cloudwatch:PutMetricData.
    ec2 = boto3.resource('ec2')
    cloudwatch = boto3.client('cloudwatch')
    ssm = boto3.client('ssm')
    command = "df --output=pcent / | sed -n '2s/%//p'"

    for instance in ec2.instances.filter(
            Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]):
        sent = ssm.send_command(
            InstanceIds=[instance.id],
            DocumentName="AWS-RunShellScript",
            Parameters={'commands': [command]})
        # Run Command is asynchronous; in production, poll until the invocation status is Success.
        time.sleep(3)
        result = ssm.get_command_invocation(
            CommandId=sent['Command']['CommandId'], InstanceId=instance.id)
        cloudwatch.put_metric_data(
            Namespace='Custom/EC2',
            MetricData=[{'MetricName': 'DiskSpaceUtilization',
                         'Dimensions': [{'Name': 'InstanceId',
                                         'Value': instance.id}],
                         'Value': float(result['StandardOutputContent']),
                         'Unit': 'Percent'}])

To monitor additional partitions, modify the script:

#!/bin/bash
# Publish one usage metric per mounted filesystem.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/region)

# Skip df's header line, then publish usage for every mount point
df --output=pcent,target | tail -n +2 | while read -r usage mount_point; do
  usage=${usage%\%}                                   # strip the trailing %
  suffix=$(echo "$mount_point" | sed 's#^/$#root#; s#^/##; s#[/ ]#_#g')

  aws cloudwatch put-metric-data \
    --region "$REGION" \
    --namespace "Custom/EC2" \
    --metric-name "DiskUsage_${suffix}" \
    --dimensions "InstanceId=${INSTANCE_ID}" \
    --value "$usage" \
    --unit "Percent"
done

The custom-script approach works well, but for production systems, where a full disk can cause a sudden outage, a more maintainable option is the unified CloudWatch agent. It collects disk (and memory) metrics natively, with no custom scripts to deploy or patch. Here's how to implement it:


# Install the CloudWatch agent
sudo yum install -y amazon-cloudwatch-agent

# Create configuration file
sudo vi /opt/aws/amazon-cloudwatch-agent/bin/config.json

Example configuration for disk monitoring:


{
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "disk": {
        "measurement": [
          "used_percent"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      }
    }
  }
}
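
Then load the configuration and start the agent, using the config path from above:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s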

After the agent starts sending metrics (they appear in the CWAgent namespace as disk_used_percent, with path, device, and fstype dimensions alongside InstanceId), create alarms. An alarm's dimensions must match the metric's full dimension set, so adjust the device and fstype values below to what your instance actually reports:


aws cloudwatch put-metric-alarm \
  --alarm-name "High-Disk-Utilization" \
  --alarm-description "Alarm when disk usage exceeds 85%" \
  --metric-name disk_used_percent \
  --namespace CWAgent \
  --statistic Average \
  --period 300 \
  --threshold 85 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 Name=path,Value=/ Name=device,Value=nvme0n1p1 Name=fstype,Value=xfs \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:disk-space-alerts

For environments where you can't install the CloudWatch agent, AWS Systems Manager offers another path:


# Create an SSM document to check disk space
aws ssm create-document \
  --content file://disk_check.json \
  --name "DiskSpaceCheck" \
  --document-type "Command"
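
A minimal disk_check.json might look like this (a sketch; it only reports usage, and any thresholding logic is up to you):

{
  "schemaVersion": "2.2",
  "description": "Report disk usage on all mounted filesystems",
  "mainSteps": [
    {
      "action": "aws:runShellScript",
      "name": "checkDiskSpace",
      "inputs": {
        "runCommand": [
          "df -h"
        ]
      }
    }
  ]
}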

Then set up a maintenance window to run this periodically and trigger alerts through EventBridge.
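
Creating the window itself is a single call; registering the document as a task and wiring EventBridge to the command's status changes follow from there (the window name and schedule are illustrative):

aws ssm create-maintenance-window \
  --name "DiskSpaceCheckWindow" \
  --schedule "rate(1 hour)" \
  --duration 1 \
  --cutoff 0 \
  --allow-unassociated-targets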

Remember these key points when implementing disk monitoring:

  • Monitor all mounted filesystems, not just root
  • Set different thresholds for different mount points (e.g., 90% for /var, 80% for /)
  • Include instance tags in your alarms for better identification
  • Consider log rotation policies if /var/log is filling up (a sample policy follows)
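
On that last point, a minimal logrotate policy (paths and retention are illustrative) keeps application logs under /var/log from filling the root volume:

/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}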