Leap Second Bug in Linux: Debugging Kernel Spinlock Crashes and NTP Workarounds

June 30, 2012 became an infamous date for sysadmins worldwide when Linux servers started crashing en masse. The culprit was the leap second inserted at the end of that day: the kernel's handling of it caused spinlock lockups on some machines and runaway CPU usage on others, and hosts running Java applications or ntpd were hit especially hard. A typical kernel log from an affected machine:

[3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358
[3161000.864001]  lock: ffff88083fc0d740, .magic: dead4ead, .owner: imapd/24737, .owner_cpu: 0

The lockup occurs in the kernel's timekeeping code as it applies the one-second step. On machines that stay up, futex calls that use timeouts can return immediately and be retried in a tight loop, so the affected processes consume 100% CPU.
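The two-line lockup report quoted above names both the task that was stuck waiting and the task that held the lock. As a rough sketch for scanning saved kernel logs, the function below condenses each report to one line. The function name `summarize_lockups` is hypothetical, and the patterns assume the exact report layout shown in this article.

```shell
#!/bin/sh
# Sketch: condense "BUG: spinlock lockup" reports read from stdin.
# Assumes the two-line layout quoted in this article (report line, then lock line).
summarize_lockups() {
    awk '
        /BUG: spinlock lockup on CPU#/ {
            # The last field is "task/pid" of the waiter that gave up spinning.
            waiter = $NF
            cpu = $0
            sub(/.*CPU#/, "", cpu)
            sub(/,.*/, "", cpu)
        }
        /\.owner: / {
            # ".owner: task/pid" identifies the current lock holder.
            owner = $0
            sub(/.*\.owner: /, "", owner)
            sub(/,.*/, "", owner)
            if (waiter != "") {
                printf "cpu=%s waiter=%s owner=%s\n", cpu, waiter, owner
                waiter = ""
            }
        }
    '
}
```

A possible use is `dmesg | summarize_lockups`, or feeding it a saved kern.log when the machine had to be power-cycled.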

For systems still running but experiencing CPU spikes:

# Setting the clock makes the kernel call clock_was_set(), which resyncs its timers
date -s now

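To prevent the problem before a leap second event, clear the pending leap second flag in the kernel. Stop ntpd first, since a running ntpd may set the flag again:

#!/usr/bin/perl -w
# fixtime.pl - Clear a pending leap second insertion via adjtimex
use strict;

# Prefer an adjtimex binary on PATH; fall back to one in the current directory.
my $adjtimex = './adjtimex';
$adjtimex = 'adjtimex' if system("which adjtimex >/dev/null 2>&1") == 0;

# Read the kernel's current time status word.
my $out = `$adjtimex --print`;
die "Cannot execute adjtimex" if $? != 0;
my ($status) = $out =~ /status:\s*(\d+)/ or die "No status in adjtimex output";

if (@ARGV) {
    # STA_INS (0x10 in <linux/timex.h>) schedules an inserted leap second.
    system($adjtimex, "--status", $status & ~0x10) == 0
        or die "Failed to clear leap second flag";
    print "Leap second protection enabled\n";
} else {
    print "Run with any argument to activate protection\n";
}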

More recent setups avoid the discontinuous jump by slewing (smearing) the clock instead. With ntpd, a step threshold of 0 disables stepping entirely:

# /etc/ntp.conf
server ntp.example.com iburst xleave
tinker step 0

For legacy systems, consider Marco Marongiu's 24-hour smear approach:

ntpd -x -g

Some environments were hit harder than others:

Dell PowerEdge M610 blades were particularly vulnerable
Custom 3.2.x kernels showed higher crash rates
Virtualized environments experienced cascading failures

Monitoring revealed interesting patterns:

Java applications were first to show symptoms
IMAP servers frequently triggered the spinlock
Systems with kdump often failed to capture logs

Service recovery typically required:

Stop NTP daemon
Apply time adjustment
Restart affected applications
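The three recovery steps can be sketched as a small script. Several things here are assumptions rather than part of the original reports: the service name `ntp` (the Debian init script name), the use of `service` to control daemons, the application names you pass in, and the final step that starts NTP again. The sketch prints the commands by default and only runs them when `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the recovery sequence. Prints commands unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: names of application services to restart (hypothetical examples).
recover_from_leap_second() {
    run service ntp stop              # 1. stop the NTP daemon
    run date -s now                   # 2. apply the time adjustment (GNU date)
    for svc in "$@"; do               # 3. restart affected applications
        run service "$svc" restart
    done
    # Assumption: bring NTP back afterwards; the step list does not mention it.
    run service ntp start
}
```

Running `recover_from_leap_second myapp` first shows what would be executed; `DRY_RUN=0 recover_from_leap_second myapp` as root performs it. Defaulting to a dry run is a deliberate choice, since setting the clock on a production host is not something to trigger by accident.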

On June 30, 2012, numerous Linux servers running Debian Squeeze (kernel versions 3.1-3.2) experienced hard crashes during the leap second insertion. The crashes manifested as unresponsive systems with console blanking, often without triggering kdump. One crash dump revealed:

[3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358
[3161000.864001]  lock: ffff88083fc0d740, .magic: dead4ead, .owner: imapd/24737, .owner_cpu: 0

The issue stemmed from kernel timekeeping handling during leap second transitions, causing:

CPU hogging futex loops in Java and userspace tools
Spinlock contention between ntpd and other processes
Kernel timers left out of sync because clock_was_set() was not called after the leap second

The simplest solution for affected systems:

date -s now

Setting the clock, even to its current value, makes the kernel call clock_was_set(), which resynchronizes its timers and breaks the futex loops. With GNU date, the command has been verified with strace to set the clock as intended.

For future leap seconds, consider Marco Marongiu's ntpd smearing approach:

ntpd -x

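This spreads the leap second adjustment over 24 hours instead of applying it as a single jump. An alternative is to clear the kernel's pending leap second flag with adjtimex before the event:

#!/usr/bin/perl
use strict;
use warnings;

# fixtime.pl - leap second removal tool
# Use ./adjtimex if it is present and executable, otherwise the one on PATH.
my $adjtimex = (-x './adjtimex') ? './adjtimex' : 'adjtimex';

# 'check' (the default) only reports; any other argument clears the flag.
my $mode = shift || 'check';

my $status = `$adjtimex --print`;
if ($status =~ /status: (\d+)/) {
    my $current = $1;
    if ($mode eq 'check') {
        print "Leap second status: $current\n";
        exit;
    }
    # STA_INS (0x10 in <linux/timex.h>) is the "insert leap second" bit.
    system($adjtimex, '--status', $current & ~0x10) == 0
        or die "Failed to update status\n";
} else {
    die "Could not read status from $adjtimex --print\n";
}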
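The status value printed by `adjtimex --print` is a bit mask, so it helps to decode it before changing anything. In `<linux/timex.h>`, `STA_INS` (0x0010) arms a leap second insertion and `STA_DEL` (0x0020) arms a deletion. The helpers below are a minimal sketch of that decoding; the function names are hypothetical.

```shell
#!/bin/sh
# Bit values from <linux/timex.h>.
STA_INS=16   # 0x0010: insert a leap second at the end of the UTC day
STA_DEL=32   # 0x0020: delete a leap second at the end of the UTC day

# Print which leap action, if any, is armed in the given status word.
leap_state() {
    status=$1
    if [ $(( status & STA_INS )) -ne 0 ]; then
        echo insert
    elif [ $(( status & STA_DEL )) -ne 0 ]; then
        echo delete
    else
        echo none
    fi
}

# Print the status word with both leap bits cleared and all other bits kept.
clear_leap_bits() {
    echo $(( $1 & ~(STA_INS | STA_DEL) ))
}
```

One way to use it, assuming the `status:` line format matched by the Perl script, is `leap_state "$(adjtimex --print | awk '/status:/ {print $2}')"`. Checking the state first avoids rewriting the status word on hosts where no leap second is pending.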

After the event, most systems recovered automatically once the leap second passed. The primary observed impact was temporary VPN (OpenVPN) disconnections during the transition period. Key lessons:

Disable console blanking for better crash diagnostics
Test kdump configuration under leap second conditions
Monitor Java applications closely during time transitions
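For the first lesson, Linux exposes the console blanking interval (in seconds, 0 meaning never) under /sys, and it can be disabled at boot with the `consoleblank=0` kernel parameter. The check below is a small sketch that could go into a pre-event checklist; the function name and the message wording are hypothetical.

```shell
#!/bin/sh
# Where the kernel exposes the console blanking interval (seconds; 0 = never blank).
CONSOLEBLANK_PARAM=/sys/module/kernel/parameters/consoleblank

# Print a one-line verdict. An optional argument overrides the path for testing.
check_console_blanking() {
    param=${1:-$CONSOLEBLANK_PARAM}
    if [ ! -r "$param" ]; then
        echo "unknown: $param not readable"
        return 2
    fi
    secs=$(cat "$param")
    if [ "$secs" -eq 0 ]; then
        echo "ok: console blanking disabled"
    else
        echo "warn: console blanks after ${secs}s"
        return 1
    fi
}
```

A non-zero result means a panic message could be hidden behind a blanked screen, which matches the console blanking seen during these crashes.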

For detailed technical analysis, see the FastMail.FM postmortem and Marco Marongiu's smearing solution.
