When examining system logs on my CentOS 5 EC2 instance, I discovered an unexpected reboot occurred on April 6 that wasn't initiated by me. The last
command output shows:
reboot system boot 2.6.18-xenU-ec2- Wed Apr 6 15:48 (1+05:27)
No user sessions were active at the time, ruling out manual intervention.
AWS EC2 can automatically reboot instances under specific circumstances:
- Host system hardware failure or maintenance
- Underlying hypervisor issues
- Critical security updates requiring reboot (less common for CentOS 5)
Check AWS's event history with the CLI:
aws ec2 describe-instance-status --instance-id i-1234567890abcdef0 \
--include-all-instances \
--query 'InstanceStatuses[].Events[]'
First, verify system logs around the reboot time:
grep -i "reboot" /var/log/messages*
grep -i "shutdown" /var/log/messages*
For EC2-specific insights, check cloud-init logs:
cat /var/log/cloud-init.log
cat /var/log/cloud-init-output.log
To rule out malicious activity:
# Check cron jobs
cat /etc/crontab
ls -la /etc/cron.*
# Review sudo logs
grep -i sudo /var/log/secure*
# Verify authorized_keys
ls -la ~/.ssh/
For critical instances, consider:
- Setting instance termination protection
- Configuring CloudWatch alarms for unexpected state changes
- Using EC2 Auto Recovery for supported instance types
Example CloudWatch alarm setup via CLI:
aws cloudwatch put-metric-alarm \
--alarm-name "EC2-Reboot-Alarm" \
--metric-name StatusCheckFailed_Instance \
--namespace AWS/EC2 \
--statistic Maximum \
--period 60 \
--threshold 1 \
--comparison-operator GreaterThanOrEqualToThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:my-sns-topic \
--dimensions Name=InstanceId,Value=i-1234567890abcdef0
The first smoking gun appears in the last
command output showing:
reboot system boot 2.6.18-xenU-ec2- Wed Apr 6 15:48 (1+05:27)
No matching SSH session precedes this reboot event, ruling out manual intervention.
AWS documentation confirms several automatic reboot scenarios:
- Host Maintenance: Hardware failures or hypervisor updates
- Instance Health Check Failure: Failed status checks trigger automatic recovery
- Spot Instance Termination: Though this would terminate, not reboot
Retrieve the exact reason through AWS CLI:
aws ec2 describe-instance-status --instance-id i-1234567890abcdef0 \
--query 'InstanceStatuses[].SystemStatus.Details[]'
Sample response indicating maintenance:
{
"Name": "reachability",
"Status": "passed",
"ImpairedSince": "2022-04-06T15:48:00Z"
}
Kernel Panic Analysis
Check for crashes in /var/log/messages
:
grep -i "kernel panic" /var/log/messages
Memory Threshold Monitoring
CentOS 5's aggressive OOM killer settings:
# Current OOM killer configuration
cat /proc/sys/vm/panic_on_oom
SSH Key Audit
Verify authorized_keys integrity:
stat -c %Y /home/*/.ssh/authorized_keys
find / -name "authorized_keys" -mtime -2
Hidden Process Detection
Check for unlogged reboots via utmp:
apt-get install utmpdump
utmpdump /var/log/wtmp | grep reboot
Configure CloudWatch alarms for reboot events:
aws cloudwatch put-metric-alarm \
--alarm-name "InstanceReboot" \
--metric-name StatusCheckFailed_System \
--namespace AWS/EC2 \
--statistic Maximum \
--period 60 \
--threshold 1 \
--comparison-operator GreaterThanOrEqualToThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:reboot-notify