Leap Second Bug in Linux: Debugging Kernel Spinlock Crashes and NTP Workarounds


June 30, 2012 became an infamous date for sysadmins worldwide when Linux servers started crashing en masse. The culprit? The leap second inserted at the end of that day exposed a bug in the kernel's timekeeping code, causing spinlock lockups in the kernel and runaway CPU usage in userspace, most visibly in Java applications and on hosts running ntpd.

[3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358
[3161000.864001]  lock: ffff88083fc0d740, .magic: dead4ead, .owner: imapd/24737, .owner_cpu: 0

The error appears when the kernel's timekeeping subsystem mishandles the one-second discontinuity. Processes waiting on futexes with timeouts then spin in tight loops, consuming 100% CPU as they contend for timing resources.
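
A lockup report like the excerpt above can be picked out of a kernel log mechanically. The following is a minimal shell sketch, assuming the two-line report format shown in the excerpt; the function name parse_spinlock_lockups is hypothetical and not part of any existing tool.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints
# "<cpu> <waiting task> <lock owner>" for every spinlock lockup report.
# Assumes the two-line report format shown in the excerpt above.
parse_spinlock_lockups() {
    awk '
        /BUG: spinlock lockup on CPU#/ {
            # First line: remember the CPU number and the waiting task
            cpu = $0
            sub(/.*CPU#/, "", cpu)
            sub(/,.*/, "", cpu)
            waiter = $NF
            next
        }
        /\.owner: / && waiter != "" {
            # Second line: extract the task that currently owns the lock
            owner = $0
            sub(/.*\.owner: /, "", owner)
            sub(/,.*/, "", owner)
            print cpu, waiter, owner
            waiter = ""
        }
    '
}
```

Typical use would be `dmesg | parse_spinlock_lockups`, or feeding it a saved console log.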

For systems still running but experiencing CPU spikes:

# Reset kernel's time_was_set flag
date -s now

For complete prevention before leap second events:

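#!/usr/bin/perl
# fixtime.pl - clear a pending leap second insertion in the kernel
use strict;
use warnings;

# Prefer adjtimex from PATH, fall back to a copy in the current directory
my $adjtimex = './adjtimex';
$adjtimex = 'adjtimex' if system("which adjtimex >/dev/null 2>&1") == 0;

my $output = `$adjtimex --print`;
die "Cannot execute adjtimex\n" if $? != 0;
print $output;

if (@ARGV) {
    my ($status) = $output =~ /status:\s*(\d+)/
        or die "No status field in adjtimex output\n";
    # 0x10 is STA_INS, the "insert leap second" status bit
    system($adjtimex, "--status", $status & ~0x10) == 0
        or die "Failed to update status\n";
    print "Leap second protection enabled\n";
} else {
    print "Run with any argument to activate protection\n";
}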

Modern setups avoid the discontinuous jump by slewing or smearing the clock. With ntpd, setting the step threshold to zero means the clock is never stepped:

# /etc/ntp.conf
server ntp.example.com iburst xleave
tinker step 0

For legacy systems, consider Marco Marongiu's 24-hour smear approach:

ntpd -x -g
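
To get a feel for how small the per-second correction is when one second is spread over 24 hours, the arithmetic can be sketched in shell. This assumes a purely linear spread, which is a simplification chosen for illustration and not a description of how ntpd's clock discipline actually behaves; the names SMEAR_WINDOW and smear_offset_us are hypothetical.

```shell
# Assumption: the one-second correction is applied linearly over 24 hours.
SMEAR_WINDOW=86400   # length of the smear window in seconds

# Print how many microseconds of the leap second have been absorbed
# after the given number of seconds into the window.
smear_offset_us() {
    elapsed="$1"
    if [ "$elapsed" -le 0 ]; then
        echo 0
    elif [ "$elapsed" -ge "$SMEAR_WINDOW" ]; then
        echo 1000000
    else
        echo $(( elapsed * 1000000 / SMEAR_WINDOW ))
    fi
}
```

Under that assumption, half of the second has been absorbed at the 12-hour mark, and the clock is never off by more than about 11.6 microseconds per elapsed second of real time.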

Reports from the incident pointed at some configurations more than others:

  • Dell PowerEdge M610 blades were particularly vulnerable
  • Custom 3.2.x kernels showed higher crash rates
  • Virtualized environments experienced cascading failures

Monitoring revealed interesting patterns:

  • Java applications were first to show symptoms
  • IMAP servers frequently triggered the spinlock
  • Systems with kdump often failed to capture logs

Service recovery typically required:

  1. Stop NTP daemon
  2. Apply time adjustment
  3. Restart affected applications
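
The three recovery steps above can be sketched as a small shell function. This is a sketch under stated assumptions: the service names are placeholders (they differ between distributions), the `service` wrapper is assumed to be available, and the helper names run and recover_from_leap_second are hypothetical. It defaults to a dry run that only prints the commands.

```shell
# Dry run by default; set DRY_RUN=0 to really execute (requires root).
DRY_RUN="${DRY_RUN:-1}"

# Either print a command or execute it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Usage: recover_from_leap_second <ntp-service> [application-service ...]
recover_from_leap_second() {
    ntp_service="${1:-ntp}"    # assumed name of the NTP service
    if [ "$#" -gt 0 ]; then shift; fi

    run service "$ntp_service" stop      # 1. stop the NTP daemon
    run date -s now                      # 2. apply the time adjustment
    for app in "$@"; do                  # 3. restart affected applications
        run service "$app" restart
    done
}
```

For example, `recover_from_leap_second ntp tomcat6` prints the commands it would run for an NTP service called ntp and an application service called tomcat6 (both names are examples only).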

On June 30, 2012, numerous Linux servers running Debian Squeeze (kernel versions 3.1-3.2) experienced hard crashes during the leap second insertion. The crashes manifested as unresponsive systems with blanked consoles, often without triggering kdump. One crash dump revealed:

[3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358
[3161000.864001]  lock: ffff88083fc0d740, .magic: dead4ead, .owner: imapd/24737, .owner_cpu: 0

The issue stemmed from how the kernel's timekeeping code handled the leap second transition, causing:

  • CPU hogging futex loops in Java and userspace tools
  • Spinlock contention between ntpd and other processes
  • Kernel's internal time_was_set variable getting stuck

The simplest solution for affected systems:

date -s now

Setting the clock, even to its current value, resets the kernel's time_was_set variable and breaks the futex loops. On systems with GNU date, strace confirms that the command performs the expected clock-setting call.

For future leap seconds, consider Marco Marongiu's ntpd smearing approach:

ntpd -x

This spreads the leap second adjustment over 24 hours instead of applying it as a single jump. An alternative is to clear the kernel's pending leap second flag with adjtimex:

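#!/usr/bin/perl
use strict;
use warnings;

# fixtime.pl - leap second removal tool
# Use a local adjtimex binary if present, otherwise rely on PATH
my $adjtimex = (-x './adjtimex') ? './adjtimex' : 'adjtimex';

my $mode = shift || 'check';

my $status = `$adjtimex --print`;
if ($status =~ /status: (\d+)/) {
    my $current = $1;
    if ($mode eq 'check') {
        print "Leap second status: $current\n";
        exit;
    }
    # Clear STA_INS (0x10), the pending "insert leap second" bit
    system($adjtimex, '--status', $current & ~0x10) == 0
        or die "adjtimex --status failed\n";
} else {
    die "Could not read status from adjtimex --print\n";
}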
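
The status value printed by adjtimex is a bitmask. In the Linux <sys/timex.h> header, STA_INS (0x0010) marks a pending leap second insertion and STA_DEL (0x0020) a pending deletion. The bit arithmetic can be sketched in plain shell; the function names leap_pending and clear_leap_bits are hypothetical helpers for illustration.

```shell
# Leap second bits of the kernel clock status word (from <sys/timex.h>)
STA_INS=16   # 0x0010: insert a leap second at the end of the day
STA_DEL=32   # 0x0020: delete a leap second at the end of the day

# Print "insert", "delete" or "none" for a numeric status word.
leap_pending() {
    status="$1"
    if [ $(( status & STA_INS )) -ne 0 ]; then
        echo insert
    elif [ $(( status & STA_DEL )) -ne 0 ]; then
        echo delete
    else
        echo none
    fi
}

# Print the status word with both leap second bits cleared;
# all other bits are left untouched.
clear_leap_bits() {
    echo $(( $1 & ~(STA_INS | STA_DEL) ))
}
```

With a status value read from `adjtimex --print` in a variable named current, the cleaned value could then be written back with `adjtimex --status "$(clear_leap_bits "$current")"`.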

After the event, most systems recovered automatically once the leap second passed. The primary observed impact was temporary VPN (OpenVPN) disconnections during the transition period. Key lessons:

  • Disable console blanking for better crash diagnostics
  • Test kdump configuration under leap second conditions
  • Monitor Java applications closely during time transitions
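
For the first lesson, a quick check of the current blanking setting might look like the sketch below. It assumes the kernel exposes the blanking interval in /sys/module/kernel/parameters/consoleblank, which may not hold on every kernel version; the function name console_blanking_status is hypothetical.

```shell
# Report whether console blanking is active.
# Assumption: the interval in seconds is exposed in the sysfs file below,
# with 0 meaning that blanking is disabled.
console_blanking_status() {
    param_file="${1:-/sys/module/kernel/parameters/consoleblank}"
    if [ ! -r "$param_file" ]; then
        echo unknown
        return 0
    fi
    interval=$(cat "$param_file")
    if [ "$interval" = "0" ]; then
        echo disabled
    else
        echo "enabled (${interval}s)"
    fi
}
```

Blanking can usually be switched off with `consoleblank=0` on the kernel command line, or with `setterm -blank 0` on a virtual console.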

For detailed technical analysis, see the FastMail.FM postmortem and Marco Marongiu's smearing solution.