AWS EC2: Technical Deep Dive into Reboot vs. Stop/Start Instance Operations


When working with Amazon EC2 instances, you'll encounter two distinct methods for restarting:

# Reboot operation
aws ec2 reboot-instances --instance-ids i-1234567890abcdef0

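# Stop/Start sequence (wait for "stopped" first, or start-instances is rejected)
aws ec2 stop-instances --instance-ids i-1234567890abcdef0
aws ec2 wait instance-stopped --instance-ids i-1234567890abcdef0
aws ec2 start-instances --instance-ids i-1234567890abcdef0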

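A reboot is essentially a software-level restart of the guest operating system, where:

  • The hypervisor sends an ACPI reset signal; if the OS does not shut down cleanly within a few minutes, EC2 falls back to a hard reboot
  • No hardware resources are reallocated
  • The instance keeps its:
    • Private and public IP addresses (even without an Elastic IP)
    • Instance store volumes and the data on them (if any)
    • Placement on the same physical host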

In contrast, stop/start involves:

1. Complete deallocation of virtual machine resources (data on instance store volumes is lost)
2. Potential migration to new underlying hardware
3. A new public IPv4 address (unless using an Elastic IP); the private IPv4 address is retained
4. Full instance initialization sequence
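
To check which addresses actually changed across a stop/start, one option is to snapshot the instance's network identity before and after. The following is a minimal sketch, not a definitive tool: `network_identity` and `changed_fields` are hypothetical helper names, the sample dicts are hand-written rather than real output, and the code assumes input shaped like the response of boto3's `describe_instances`.

```python
def network_identity(describe_response):
    """Extract the IP fields from a describe_instances-shaped response."""
    instance = describe_response["Reservations"][0]["Instances"][0]
    return {
        "private_ip": instance.get("PrivateIpAddress"),
        # PublicIpAddress is absent while the instance is stopped
        "public_ip": instance.get("PublicIpAddress"),
    }


def changed_fields(before, after):
    """Return the names of fields whose values differ between two snapshots."""
    return sorted(k for k in before if before[k] != after.get(k))


# Hand-written sample data (assumption: no Elastic IP is associated).
before = {"Reservations": [{"Instances": [
    {"PrivateIpAddress": "10.0.1.25", "PublicIpAddress": "203.0.113.10"}]}]}
after = {"Reservations": [{"Instances": [
    {"PrivateIpAddress": "10.0.1.25", "PublicIpAddress": "203.0.113.77"}]}]}

print(changed_fields(network_identity(before), network_identity(after)))
```

In real use, the two snapshots would come from `boto3.client("ec2").describe_instances(InstanceIds=[...])` called before the stop and after the start.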

From my benchmarks (m5.large instances in us-east-1):

Operation     Average Duration    Public IP Retained
Reboot        45-60 seconds       Yes
Stop/Start    3-5 minutes         No*

*Without Elastic IP association
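
Numbers like these are easy to reproduce. The sketch below shows one way a stop/start cycle might be timed; it is an illustration, not the method used for the table above. `time_stop_start` is a hypothetical helper, and `client` is assumed to behave like `boto3.client("ec2")`; it is passed in so the function can also be exercised with a stub.

```python
import time


def time_stop_start(client, instance_id, clock=time.monotonic):
    """Run one stop/start cycle and return its wall-clock duration in seconds."""
    started = clock()
    client.stop_instances(InstanceIds=[instance_id])
    # Block until EC2 reports the "stopped" state
    client.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    client.start_instances(InstanceIds=[instance_id])
    # Block until EC2 reports the "running" state
    client.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return clock() - started
```

Note that the `instance_running` waiter only confirms the EC2 state; the OS and your services may need additional time before they respond, so measured durations depend on where you stop the clock.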

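Use reboot-instances when the change only needs an OS restart:

# Example: Applying kernel updates
sudo yum update kernel -y
# Look up this instance's ID via IMDSv2 (a session token is required when IMDSv1 is disabled)
TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 reboot-instances --instance-ids "$INSTANCE_ID"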

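Opt for stop/start when the change requires a stopped instance:

# Example: Changing instance type
aws ec2 stop-instances --instance-ids i-1234567890abcdef0
# The instance type can only be modified once the instance is fully stopped
aws ec2 wait instance-stopped --instance-ids i-1234567890abcdef0
aws ec2 modify-instance-attribute --instance-id i-1234567890abcdef0 --instance-type m5.xlarge
aws ec2 start-instances --instance-ids i-1234567890abcdef0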

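Issue: Instance becomes unresponsive after reboot
Solution: Try a stop/start, which releases the instance's resources and may bring it up on different underlying hardware

Issue: Need to preserve the auto-assigned (non-Elastic) public IP during maintenance
Solution: Always use reboot unless you are changing instance attributes that require a stopped instance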
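
One way to decide when a reboot is not enough is to look at the EC2 status checks. The sketch below is a hedged illustration: `needs_stop_start` is a hypothetical helper, the sample dicts are hand-written, and the input is assumed to be shaped like the response of boto3's `describe_instance_status`. It escalates only when the system (host-level) check is impaired, on the assumption that an OS-level problem is better handled by a reboot or a configuration fix.

```python
def needs_stop_start(status_response):
    """Return True when the host-level status check suggests a stop/start."""
    statuses = status_response.get("InstanceStatuses", [])
    if not statuses:
        # Nothing reported (for example, the instance is not running): don't guess
        return False
    return statuses[0]["SystemStatus"]["Status"] == "impaired"


# Hand-written sample responses (assumption: one instance per response).
impaired = {"InstanceStatuses": [{
    "InstanceStatus": {"Status": "ok"},
    "SystemStatus": {"Status": "impaired"},
}]}
healthy = {"InstanceStatuses": [{
    "InstanceStatus": {"Status": "ok"},
    "SystemStatus": {"Status": "ok"},
}]}

print(needs_stop_start(impaired), needs_stop_start(healthy))
```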


When you call ec2.rebootInstances(), AWS requests a restart of the guest operating system, which is closer to running the OS reboot command than to power-cycling a physical server. If the OS does not shut down cleanly within a few minutes, EC2 performs a hard reboot. The operation typically completes in 60-120 seconds. In contrast, a stop/start (ec2.stopInstances() followed by ec2.startInstances()) is a full instance lifecycle change. Both operations look like this in code:

# Python example using boto3
import boto3

ec2 = boto3.client('ec2')

# Fast reboot (hypervisor-level)
response = ec2.reboot_instances(InstanceIds=['i-1234567890abcdef0'])

# Full stop/start (instance lifecycle change)
ec2.stop_instances(InstanceIds=['i-1234567890abcdef0'])
waiter = ec2.get_waiter('instance_stopped')
waiter.wait(InstanceIds=['i-1234567890abcdef0'])
ec2.start_instances(InstanceIds=['i-1234567890abcdef0'])

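Rebooting preserves the instance's attributes, including:

  • Public/private IP addresses (for non-Elastic IP cases)
  • Instance store volumes (ephemeral storage) and the data on them

In-memory processes and data are not preserved: the operating system restarts, so anything held only in RAM is lost.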

Stop/start operations fundamentally change the instance lifecycle:

  • Non-Elastic public IPv4 addresses are released back to the pool (the private IPv4 address is kept)
  • Instance store volumes are erased (EBS volumes persist)
  • The instance may move to different underlying hardware
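
Because a stop erases instance store data, it can be worth checking whether the instance type has instance store volumes before stopping. The following is a small sketch under stated assumptions: `has_instance_store` is a hypothetical helper, the sample dicts are hand-written, and the input is assumed to be shaped like the response of boto3's `describe_instance_types`.

```python
def has_instance_store(instance_types_response):
    """Return True if the described instance type includes instance store volumes."""
    info = instance_types_response["InstanceTypes"][0]
    return bool(info.get("InstanceStorageSupported", False))


# Hand-written sample responses, trimmed to the fields used here.
with_store = {"InstanceTypes": [
    {"InstanceType": "m5d.large", "InstanceStorageSupported": True}]}
ebs_only = {"InstanceTypes": [
    {"InstanceType": "m5.large", "InstanceStorageSupported": False}]}

print(has_instance_store(with_store), has_instance_store(ebs_only))
```

If the check returns True, copy anything you need off the ephemeral volumes before calling stop.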

Use reboot (ec2.rebootInstances) for:

  • Application-level issues needing OS restart
  • Kernel parameter changes requiring reboot
  • When IP persistence is critical

Use stop/start for:

  • Changing instance type (e.g., t2.micro → t2.large)
  • Moving to different tenancy (dedicated vs. shared)
  • When you want a "clean slate" hardware state
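
The guidance above can be condensed into a small decision helper. This is only a sketch of the rules listed in this article; `choose_operation` and its parameter names are hypothetical, and real automation would likely consider more inputs.

```python
def choose_operation(changing_instance_type=False, changing_tenancy=False,
                     want_new_hardware=False, has_elastic_ip=False):
    """Return (operation, warnings) based on what the maintenance requires."""
    needs_stop = changing_instance_type or changing_tenancy or want_new_hardware
    if not needs_stop:
        # An OS-level restart is enough and keeps IPs and instance store data
        return "reboot", []
    warnings = []
    if not has_elastic_ip:
        warnings.append("public IPv4 address will change")
    return "stop/start", warnings


print(choose_operation())
print(choose_operation(changing_instance_type=True))
print(choose_operation(want_new_hardware=True, has_elastic_ip=True))
```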

The reboot operation executes through the hypervisor's control plane (Xen on older instance families, Nitro on current ones such as m5), while stop/start triggers these AWS internal processes:

  1. Stop: The guest OS shuts down; RAM contents are discarded, EBS volumes persist
  2. Resource deallocation (compute, network)
  3. Start: New resource allocation from available capacity
  4. EBS volume reattachment and a fresh OS boot
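
From the API's point of view, these phases show up as instance state changes: a stop/start passes through stopping, stopped, and pending before returning to running, while a reboot stays in running throughout. The sketch below checks a list of polled states against that expectation; `EXPECTED`, `collapse`, and `matches_operation` are hypothetical names, and the polled lists are hand-written examples.

```python
from itertools import groupby

# Expected EC2 state sequence per operation, starting from a running instance.
EXPECTED = {
    "reboot": ["running"],
    "stop/start": ["running", "stopping", "stopped", "pending", "running"],
}


def collapse(states):
    """Drop consecutive duplicates produced by polling faster than states change."""
    return [state for state, _ in groupby(states)]


def matches_operation(operation, polled_states):
    """Return True if the polled states follow the expected sequence exactly."""
    return collapse(polled_states) == EXPECTED[operation]


polled = ["running", "stopping", "stopping", "stopped", "pending", "pending", "running"]
print(matches_operation("stop/start", polled))
print(matches_operation("reboot", ["running", "running", "running"]))
```

The comparison is strict, so a polling interval long enough to skip a short-lived state would make a genuine stop/start fail the check.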

Here's how you might implement intelligent recovery in Lambda:

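import boto3
from botocore.exceptions import ClientError

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    # Assumes an EC2 event whose 'detail' carries the instance ID
    instance_id = event['detail']['instance-id']

    # First try a reboot
    try:
        ec2.reboot_instances(InstanceIds=[instance_id])
        print(f"Soft reboot initiated for {instance_id}")
    except ClientError as e:
        # Only API-level failures land here; a hung OS does not raise an error
        print(f"Reboot failed, attempting stop/start: {e}")
        ec2.stop_instances(InstanceIds=[instance_id])
        # Wait for 'stopped' before starting; the Lambda timeout must allow for this
        waiter = ec2.get_waiter('instance_stopped')
        waiter.wait(InstanceIds=[instance_id])
        ec2.start_instances(InstanceIds=[instance_id])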

Operation     Average Duration    IP Change    Hardware Change
Reboot        90 sec              No           No
Stop/Start    4-7 min             Yes*         Possible

*Except when using Elastic IPs