June 30, 2012 became an infamous date for sysadmins worldwide when Linux servers started crashing en masse. The culprit? The leap second inserted at midnight UTC, which exposed bugs in the kernel's timekeeping code: some machines locked up on a spinlock, while others stayed up but had Java applications and other timer-heavy processes pinning the CPU.
[3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358
[3161000.864001] lock: ffff88083fc0d740, .magic: dead4ead, .owner: imapd/24737, .owner_cpu: 0
The problem appears when the kernel's timekeeping code applies the leap second: the clock steps back by one second, but the high-resolution timer (hrtimer) subsystem is not told about the change. Timed futex waits then expire immediately, so affected processes loop on futex system calls and consume 100% CPU.
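To check whether a machine hit the lockup variant, the kernel log can be searched for the signature in the excerpt above. This is a minimal sketch: the function name `count_lockups` is illustrative, and the search string is taken from that one log sample, so other kernel versions may word the message differently.

```shell
#!/bin/sh
# count_lockups FILE: print how many lines in FILE carry the spinlock
# lockup signature seen in the log excerpt.
count_lockups() {
    # grep -c prints the match count but exits 1 when the count is 0,
    # so mask the exit status for callers running under `set -e`.
    grep -c 'BUG: spinlock lockup' "$1" || true
}
```

A saved log can be checked with `dmesg > /tmp/dmesg.txt && count_lockups /tmp/dmesg.txt`; any non-zero count suggests the lockup rather than the CPU-spin symptom.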
For systems still running but experiencing CPU spikes:
# Setting the clock (even to its current value) makes the kernel call
# clock_was_set(), which resynchronizes the hrtimers
date -s now
For complete prevention before leap second events:
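#!/usr/bin/perl -w
# fixtime.pl - clear a pending leap second flag via adjtimex(8)
use strict;
# Prefer an adjtimex binary on PATH, else fall back to a local copy.
my $adjtimex = './adjtimex';
$adjtimex = 'adjtimex' if system("which adjtimex >/dev/null 2>&1") == 0;
# Read the kernel clock status word from the adjtimex report.
my $out = `$adjtimex --print`;
die "Cannot execute adjtimex\n" if $? != 0;
my ($status) = $out =~ /status:\s*(\d+)/ or die "No status in adjtimex output\n";
if (@ARGV) {
    # Clear STA_INS (0x10) and STA_DEL (0x20), the pending leap second bits.
    system($adjtimex, "--status", $status & ~0x30) == 0
        or die "Failed to clear leap second flags\n";
    print "Leap second flags cleared\n";
} else {
    print "Status: $status. Run with any argument to clear leap second flags\n";
}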
Modern setups avoid the discontinuous jump by slewing or smearing the clock instead. For ntpd, stepping can be disabled in the configuration:
# /etc/ntp.conf
server ntp.example.com iburst xleave
tinker step 0
For legacy systems, consider the approach described by Marco Marongiu, which starts ntpd with -x so that the clock is slewed gradually rather than stepped:
ntpd -x -g
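As a rough back-of-the-envelope check, the time a gradual correction takes is the offset divided by the rate. The sketch below assumes the 500 ppm slew limit documented for ntpd and, for comparison, a hypothetical 24-hour smear window. Both function names are illustrative, and the arithmetic is integer-only.

```shell
#!/bin/sh
# slew_seconds OFFSET_US RATE_PPM: seconds needed to absorb OFFSET_US
# microseconds when the clock is slewed at RATE_PPM parts per million
# (1 ppm corrects 1 microsecond per second).
slew_seconds() {
    echo $(( $1 / $2 ))
}

# smear_rate_ppb OFFSET_US WINDOW_S: rate, in parts per billion, needed to
# spread OFFSET_US microseconds evenly over WINDOW_S seconds.
smear_rate_ppb() {
    echo $(( $1 * 1000 / $2 ))
}
```

Under those assumptions, `slew_seconds 1000000 500` prints 2000 (about 33 minutes for one leap second), while `smear_rate_ppb 1000000 86400` prints 11574, i.e. roughly 11.6 ppm sustained for a full day.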
Some environments were hit harder than others:
- Dell PowerEdge M610 blades were particularly vulnerable
- Custom 3.2.x kernels showed higher crash rates
- Virtualized environments experienced cascading failures
Monitoring revealed interesting patterns:
- Java applications were first to show symptoms
- IMAP servers frequently triggered the spinlock
- Systems with kdump often failed to capture logs
Service recovery typically required:
- Stop NTP daemon
- Apply time adjustment
- Restart affected applications
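The recovery steps above can be sketched as a small script. The service names are placeholders (init script names vary by distribution), the `service` wrapper is assumed to be available, and the script defaults to a dry run that only prints the commands it would execute.

```shell
#!/bin/sh
# Recovery sketch. All names below are assumptions; adjust before use.
NTP_SERVICE="${NTP_SERVICE:-ntp}"    # assumed NTP init script name
APP_SERVICES="${APP_SERVICES:-}"     # space-separated list of affected services
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands, 0 = execute them

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_from_leap_second() {
    run service "$NTP_SERVICE" stop      # 1. stop the NTP daemon
    run date -s now                      # 2. set the clock to its current value
    for svc in $APP_SERVICES; do         # 3. restart affected applications
        run service "$svc" restart
    done
}
```

After sourcing the file, `APP_SERVICES="tomcat7" recover_from_leap_second` prints the three commands. Setting `DRY_RUN=0` and running as root executes them.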
On June 30, 2012, numerous Linux servers running Debian Squeeze (kernel versions 3.1-3.2) experienced hard crashes during the leap second insertion. The crashes manifested as unresponsive systems with console blanking, often without triggering kdump. One crash dump revealed:
[3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358
[3161000.864001] lock: ffff88083fc0d740, .magic: dead4ead, .owner: imapd/24737, .owner_cpu: 0
The issue stemmed from kernel timekeeping handling during leap second transitions, causing:
- CPU hogging futex loops in Java and userspace tools
- Spinlock contention between ntpd and other processes
- The kernel not calling clock_was_set() after the leap second, leaving hrtimers out of sync with the system clock
The simplest solution for affected systems:
date -s now
Setting the clock, even to its current value, makes the kernel call clock_was_set(), which resynchronizes the hrtimers and breaks the futex loops. On systems with GNU date, strace confirms that the command sets the clock as intended.
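For future leap seconds, consider running ntpd in slew mode, the approach described by Marco Marongiu:
ntpd -x
With -x, ntpd slews the clock gradually instead of stepping it. At ntpd's 500 ppm slew limit, absorbing one second takes about 2000 seconds rather than happening in a single jump. Alternative implementation using adjtimex: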
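#!/usr/bin/perl
use strict;
use warnings;
# fixtime.pl - leap second removal tool
# Use a local adjtimex binary if present, otherwise rely on PATH.
my $adjtimex = (-x './adjtimex') ? './adjtimex' : 'adjtimex';
my $mode = shift || 'check';
my $status = `$adjtimex --print`;
if ($status =~ /status:\s*(\d+)/) {
    my $current = $1;
    if ($mode eq 'check') {
        print "Leap second status: $current\n";
        exit;
    }
    # Clear STA_INS (0x10) and STA_DEL (0x20), the pending leap second bits.
    # (0x40 is STA_UNSYNC, which is unrelated to leap seconds.)
    system($adjtimex, '--status', $current & ~0x30) == 0
        or die "Failed to update clock status\n";
} else {
    die "Could not read status from '$adjtimex --print'\n";
}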
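The status value that `adjtimex --print` reports is a bitmask, and only two of its bits concern leap seconds: STA_INS (0x10, insert pending) and STA_DEL (0x20, delete pending), as defined in `<linux/timex.h>`. A minimal sketch for decoding and clearing them follows; the function names are illustrative.

```shell
#!/bin/sh
# leap_state STATUS: report which leap second bit, if any, is set in a
# kernel clock status word.
leap_state() {
    if [ $(( $1 & 0x10 )) -ne 0 ]; then
        echo insert      # STA_INS: a leap second will be inserted
    elif [ $(( $1 & 0x20 )) -ne 0 ]; then
        echo delete      # STA_DEL: a leap second will be deleted
    else
        echo none
    fi
}

# clear_leap_bits STATUS: print STATUS with STA_INS and STA_DEL cleared,
# leaving every other bit untouched.
clear_leap_bits() {
    echo $(( $1 & ~0x30 ))
}
```

For example, a status of 8209 (0x2011) decodes as `insert`, and `clear_leap_bits 8209` yields 8193 (0x2001), the value to pass back via `adjtimex --status`.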
After the event, most systems recovered automatically once the leap second passed. The primary observed impact was temporary VPN (OpenVPN) disconnections during the transition period. Key lessons:
- Disable console blanking for better crash diagnostics
- Test kdump configuration under leap second conditions
- Monitor Java applications closely during time transitions
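For the first lesson, a quick way to see whether console blanking is active is to read the kernel's `consoleblank` parameter, which holds the blanking timeout in seconds (0 means never blank). The sketch below assumes the parameter is exposed at the usual sysfs path; the function name is illustrative.

```shell
#!/bin/sh
# console_blank_status [FILE]: report the console blanking timeout.
# FILE defaults to the sysfs location of the consoleblank parameter.
console_blank_status() {
    file=${1:-/sys/module/kernel/parameters/consoleblank}
    if [ ! -r "$file" ]; then
        echo unknown
        return 0
    fi
    secs=$(cat "$file")
    if [ "$secs" -eq 0 ]; then
        echo disabled
    else
        echo "enabled after ${secs}s"
    fi
}
```

If blanking turns out to be enabled, booting with `consoleblank=0` on the kernel command line, or running `setterm -blank 0` on the console, keeps the last messages visible after a crash.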
For detailed technical analysis, see the FastMail.FM postmortem and Marco Marongiu's smearing solution.