How to Capture Kernel Panic Logs After a System Hang in Linux (Gentoo 2.6.x)


When a Linux server experiences a kernel panic and hangs, the most frustrating part is often the lack of post-mortem debugging information. This becomes particularly problematic in production environments where root cause analysis is critical for preventing recurrence. In your case with Gentoo Linux running a 2.6.x kernel, the hard reboot effectively wiped any volatile memory containing the panic details.

The default syslog daemon (including syslog-ng) typically writes to disk buffers, which may not get flushed during a panic. Kernel ring buffer messages (viewable via dmesg) are also volatile and lost on reboot. This creates a perfect storm for disappearing crash information.
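
You can check how aggressively your current syslog-ng configuration flushes to disk; the path below is the standard Gentoo location, adjust if yours differs:

# Look for buffering-related options; larger flush/sync values mean more
# messages sitting in memory when the box hard-locks
grep -nE "flush|sync" /etc/syslog-ng/syslog-ng.conf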

Here are several approaches to ensure you capture kernel panic information:

1. Kernel SysRq Magic Key

Enable the magic SysRq key to force a crash dump:

# Add to /etc/sysctl.conf
kernel.sysrq = 1

# For immediate effect
echo 1 > /proc/sys/kernel/sysrq

From the local console you can then force a crash with Alt+SysRq+c; on its own this just panics the machine, but with kdump configured (see the kdump section below) it also produces a crash dump.
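
Other standard SysRq combinations are worth memorizing for a wedged box, and the same commands can be sent without a keyboard via /proc/sysrq-trigger:

# Other standard SysRq commands useful during a hang:
#   Alt+SysRq+t   dump all tasks and their stack traces to the kernel log
#   Alt+SysRq+m   dump memory usage information
#   Alt+SysRq+s   sync all mounted filesystems
#   Alt+SysRq+u   remount all filesystems read-only
#   Alt+SysRq+b   reboot immediately (no sync, no unmount)
# Example: dump the task list from a shell instead of the keyboard
echo t > /proc/sysrq-trigger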

2. Netconsole for Remote Logging

Configure netconsole to stream kernel messages over UDP to a remote host. In the parameter string, the first address is the local IP and outgoing interface; the second is the receiver's IP and MAC address:

# Load module with parameters
modprobe netconsole netconsole=@192.168.1.100/eth0,@192.168.1.200/00:11:22:33:44:55

# Make persistent in /etc/modprobe.d/netconsole.conf
options netconsole netconsole=@192.168.1.100/eth0,@192.168.1.200/00:11:22:33:44:55
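
Something has to catch those UDP packets on the receiving host. netconsole targets port 6666 by default, and the simplest receiver is netcat; the syntax below is for traditional/GNU netcat, BSD netcat drops the -p:

# On the receiving host (192.168.1.200): capture netconsole output to a file
nc -l -u -p 6666 | tee -a /var/log/netconsole.log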

3. Kdump Configuration

For full crash analysis, set up kdump. This requires a kernel built with CONFIG_KEXEC and CONFIG_CRASH_DUMP, plus a crashkernel= memory reservation on the boot command line:

# Install kdump tools
emerge sys-apps/kexec-tools

# Configure in /etc/kdump.conf
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
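
kdump only captures anything if a reserved capture kernel has been loaded before the panic. A rough outline for a 2.6.x box using GRUB legacy; the kernel image names, root device, and reservation size are placeholders for your own setup:

# 1. Reserve memory for the capture kernel in /boot/grub/grub.conf:
#    kernel /boot/vmlinuz-2.6.32-gentoo root=/dev/sda3 crashkernel=128M@16M

# 2. After rebooting with that reservation, load the capture kernel:
kexec -p /boot/vmlinuz-2.6.32-gentoo \
    --initrd=/boot/initramfs-2.6.32-gentoo \
    --append="root=/dev/sda3 single irqpoll maxcpus=1"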

4. Serial Console and Local Fallbacks

For systems without remote logging capabilities, consider serial console logging (sketched below) or periodic snapshots to removable media:

# Raise the console log level so every kernel message reaches the console
dmesg -n 7
# Snapshot the current ring buffer to an already-mounted USB stick
dmesg > /mnt/usb/debug.log
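
If the machine has a serial port and you can attach another box (or the colo's console server) to it, a serial console is the classic way to catch the final panic messages. The port and baud rate below are assumptions, adjust for your hardware:

# Kernel command line in /boot/grub/grub.conf: keep the VGA console and
# mirror everything to the first serial port at 115200 baud
#   console=tty0 console=ttyS0,115200n8

# On the machine at the other end of the null-modem cable, capture the output
# (screen -L logs to screenlog.0 in the current directory)
screen -L /dev/ttyS0 115200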

After implementing any solution, test your setup by manually triggering a panic:

# Trigger a kernel panic (for testing only!)
echo c > /proc/sysrq-trigger

Remember to perform this test during a maintenance window as it will crash your system.
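
After the box comes back up, confirm that your capture paths actually worked:

# kdump: the vmcore should have appeared under the configured path
ls -lh /var/crash/
# netconsole: check the capture file on the remote host, e.g. the netcat log
# from the sketch above
tail -n 50 /var/log/netconsole.log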


When your colocated Gentoo server hard locks during a kernel panic, the standard logging mechanisms fail precisely when you need them most. The 2.6.x kernel series, while stable, is particularly awkward for crash forensics because it predates persistent-logging mechanisms such as pstore/ramoops, which only appeared at the very end of the 2.6 line.

The fundamental issue is that the kernel ring buffer (the source of dmesg output) lives entirely in volatile RAM:

# The ring buffer is read straight from kernel memory; after a hard reset it
# starts over empty, so nothing logged before the crash survives
dmesg | tail -n 5
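
A larger ring buffer does not survive the reset either, but it does keep more of the lead-up to a crash available for a dmesg snapshot, a serial console dump, or later extraction from a vmcore. The size is a boot parameter and must be a power of two; the kernel image and root device below are placeholders:

# /boot/grub/grub.conf: enlarge the printk ring buffer to 1 MiB
#   kernel /boot/vmlinuz-2.6.32-gentoo root=/dev/sda3 log_buf_len=1048576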

To make syslog-ng push kernel messages to disk as aggressively as possible, adjust /etc/syslog-ng/syslog-ng.conf along these lines:

options {
    chain_hostnames(off);
    use_fqdn(no);
    owner("root");
    group("adm");
    perm(0640);
    stats_freq(0);
    bad_hostname("^gconfd$");
    # Critical persistence settings: flush every message as it arrives and
    # keep a larger output queue so a burst of oops messages is not dropped
    flush_lines(0);
    log_fifo_size(2000);
};

source src {
    # On the syslog-ng versions shipped with 2.6.x-era Gentoo, read the syslog
    # socket and /proc/kmsg directly (newer syslog-ng can use system(), which
    # already includes kernel messages). Make sure klogd is not also running.
    unix-stream("/dev/log");
    internal();
    file("/proc/kmsg");
};

destination paniclog {
    file("/var/log/kernel-panic.log"
        owner(root)
        group(adm)
        perm(0600)
        # Force each message to disk as it is written
        fsync(yes)
    );
};

filter f_kernel { facility(kern); };
log { source(src); filter(f_kernel); destination(paniclog); };
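
Restart syslog-ng and generate a harmless kernel message to confirm it reaches the new destination (SysRq "h" only prints the help text to the kernel log):

/etc/init.d/syslog-ng restart
echo h > /proc/sysrq-trigger
tail /var/log/kernel-panic.log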

For 2.6.x kernels, these sysctl settings help:

# /etc/sysctl.conf additions (keep comments on their own lines so sysctl -p
# parses the values cleanly):
# Wait 30 seconds before auto-rebooting after a panic, to allow capture
kernel.panic = 30
# Treat an oops as a full panic so it is never silently ignored
kernel.panic_on_oops = 1
# Enable the magic SysRq keys
kernel.sysrq = 1
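
Apply the settings without rebooting and confirm they took effect:

sysctl -p /etc/sysctl.conf
sysctl kernel.panic kernel.panic_on_oops kernel.sysrq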

When local storage fails, network logging saves the day. Gentoo has no /etc/rc.local; load the module at boot from /etc/conf.d/local.start (or rely on the modprobe.d entry shown earlier):

# Set up netconsole logging
modprobe netconsole netconsole=@192.168.1.100/eth0,@192.168.1.200/00:11:22:33:44:55

After recovering logs, use these forensic tools:

# Install the crash analysis utility (kexec-tools was emerged earlier)
emerge -av dev-util/crash

# Basic analysis example (needs a vmlinux built with debug symbols):
crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/127.0.0.1-2024-02-15-13:00:00/vmcore

# Common commands within crash shell:
crash> log
crash> bt
crash> kmem -i
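
Often the kernel log alone answers the question; makedumpfile can extract it from the vmcore without starting a crash session (older makedumpfile versions may also need -x and the matching vmlinux):

# Pull just the kernel ring buffer out of the dump
makedumpfile --dump-dmesg /var/crash/127.0.0.1-2024-02-15-13:00:00/vmcore panic-dmesg.txt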

Implement these proactive checks in your monitoring system:

#!/bin/bash
# Watch the kernel ring buffer for oops/panic messages; if any are found,
# save the logs and reboot cleanly before the machine hard-locks
OOPS_COUNT=$(dmesg | grep -c "Oops")
PANIC_COUNT=$(dmesg | grep -c "Kernel panic")
if [ "$OOPS_COUNT" -gt 0 ] || [ "$PANIC_COUNT" -gt 0 ]; then
    /usr/local/bin/emergency-log-save.sh
    reboot
fi
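
The emergency-log-save.sh helper referenced above is site-specific; here is a minimal sketch of what it might contain. The USB mount point and the list of copied files are assumptions, adjust them to your layout:

#!/bin/bash
# Hypothetical emergency-log-save.sh: snapshot volatile state before rebooting
STAMP=$(date +%Y%m%d-%H%M%S)
DEST="/mnt/usb/crash-$STAMP"          # assumes removable media is already mounted here
mkdir -p "$DEST"
dmesg > "$DEST/dmesg.txt"             # current kernel ring buffer
cp /var/log/messages "$DEST/" 2>/dev/null
cp /var/log/kernel-panic.log "$DEST/" 2>/dev/null
sync                                  # make sure everything hits the disk before the reboot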