When working with Debian 7 Wheezy (3.2.0-4-amd64 kernel), you might encounter this critical system message:
kernel:[81927.464687] Uhhuh. NMI received for unknown reason 31 on CPU 3.
kernel:[81927.464743] Do you have a strange power saving mode enabled?
kernel:[81927.464791] Dazed and confused, but trying to continue
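Non-Maskable Interrupts (NMIs) are hardware-level signals that the CPU cannot mask with the usual interrupt-disable mechanism, so the kernel has to handle them as soon as they arrive. The reason code (31, printed in hexadecimal) is the status byte the kernel read when the NMI fired; "unknown" means it matched neither a memory parity/PCI SERR error nor an I/O check error, so the code alone does not identify the source. Common underlying causes include: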
- Hardware failure (CPU/motherboard)
- Firmware bugs
- Overheating issues
- Power supply problems
First, check your system logs for patterns:
# View recent kernel messages
dmesg | grep -i nmi
# Check for hardware errors in system logs
grep -i error /var/log/syslog
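To see whether the NMIs cluster on one CPU or share one reason code, the matching log lines can be tallied. The following is a minimal sketch, assuming the message format shown above; the function name summarize_nmi is made up for this example.

```shell
#!/bin/bash
# Summarize "NMI received for unknown reason" messages read from stdin.
# Prints one line per (reason, CPU) pair with the number of occurrences.
# summarize_nmi is a hypothetical helper name, not an existing tool.
summarize_nmi() {
    sed -n 's/.*NMI received for unknown reason \([0-9a-fA-F]*\) on CPU \([0-9]*\).*/reason=\1 cpu=\2/p' |
        sort | uniq -c | awk '{print $2, $3, "count=" $1}'
}
```

Typical use would be `dmesg | summarize_nmi`, or `summarize_nmi < /var/log/syslog` to cover messages that have already left the ring buffer.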
Try disabling the kernel's NMI watchdog with a boot parameter in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nmi_watchdog=0"
Update GRUB and reboot:
update-grub
reboot
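After the reboot it is worth confirming that the parameter actually reached the running kernel. A small sketch, assuming the kernel command line is readable from /proc/cmdline; has_boot_param is a hypothetical helper name.

```shell
#!/bin/bash
# Report whether a kernel parameter (e.g. nmi_watchdog=0) is present on a command line.
# $1 = parameter to look for, $2 = command line text (defaults to /proc/cmdline).
# has_boot_param is a hypothetical helper name used for illustration.
has_boot_param() {
    local param=$1
    local cmdline=${2-$(cat /proc/cmdline)}
    # Pad with spaces so that only whole parameters match
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_boot_param "nmi_watchdog=0"; then
    echo "nmi_watchdog=0 is active"
else
    echo "nmi_watchdog=0 is NOT on the running kernel's command line"
fi
```

If the parameter is missing, the usual suspects are a forgotten update-grub run or an edit to the wrong variable in /etc/default/grub.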
For production servers, consider these measures:
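# Install kernel debugging symbols for the running kernel
apt-get install linux-image-$(uname -r)-dbg
# Install Intel microcode updates (the package lives in the non-free archive section)
apt-get install intel-microcode
# Confirm that the microcode was loaded
dmesg | grep -i microcode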
Run hardware diagnostics:
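# Install a stress-testing tool (mprime/Prime95 is not packaged in Debian and must be downloaded separately)
apt-get install stress
# Load the CPU for one hour with one worker per core (adjust 4 to your core count; root is not required)
stress --cpu 4 --timeout 3600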
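A stress run is more informative when it reports whether any NMI messages appeared while it was running. The wrapper below is a sketch under stated assumptions: run_with_nmi_check and count_nmi are hypothetical helper names, and LOG_CMD is an invented override (defaulting to dmesg) that exists only so the function can be tried against a plain file.

```shell
#!/bin/bash
# Count NMI messages in the output of the log command (dmesg unless LOG_CMD is set).
# LOG_CMD is a hypothetical override for testing, not a standard variable.
count_nmi() {
    ${LOG_CMD:-dmesg} 2>/dev/null | grep -c "NMI received" || true
}

# Run the given command and report how many NMI messages were added meanwhile.
run_with_nmi_check() {
    local before after
    before=$(count_nmi)
    "$@"
    after=$(count_nmi)
    echo "NMI messages during run: $((after - before))"
}

# Example (takes an hour): run_with_nmi_check stress --cpu 4 --timeout 3600
```

The difference can come out negative if the kernel ring buffer wraps during a long run, in which case /var/log/syslog is the more reliable source.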
On my own Debian Wheezy system (Linux 3.2.0-4-amd64), the same three kernel messages proved particularly stubborn: every occurrence was eventually followed by a reboot, which made the machine significantly unstable.
Non-Maskable Interrupts are hardware-level signals that cannot be masked like ordinary interrupts, so the kernel must service them immediately. They typically indicate severe hardware issues or critical system states. The "reason 31" message comes from the kernel's unknown-NMI path in arch/x86/kernel/nmi.c, which is reached when no registered handler claims the interrupt; the number is a hexadecimal status byte, not an error code with a fixed meaning.
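The reason byte can be decoded by hand. The sketch below assumes the default x86 platform code, where the byte is read from I/O port 0x61 and the kernel tests bit 7 (0x80, memory parity / PCI SERR#) and bit 6 (0x40, I/O channel check) as NMI sources. decode_nmi_reason is a hypothetical helper name, and the remaining bits are deliberately left uninterpreted.

```shell
#!/bin/bash
# Decode the hexadecimal "reason" byte from an unknown-NMI message.
# Only the two NMI source bits are interpreted; decode_nmi_reason is a made-up name.
decode_nmi_reason() {
    local reason=$((16#$1))   # the kernel prints the value in hex
    local found=0
    if (( reason & 0x80 )); then echo "bit 7 set: memory parity / PCI SERR#"; found=1; fi
    if (( reason & 0x40 )); then echo "bit 6 set: I/O channel check (IOCHK)"; found=1; fi
    if (( found == 0 )); then
        echo "no NMI source bit set: cause is not reported through port 0x61"
    fi
    return 0
}

decode_nmi_reason 31
```

For 31 (binary 0011 0001) neither source bit is set, which is consistent with the kernel calling the reason "unknown" and explains why the investigation has to move to temperatures, firmware, and power management.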
Key scenarios that can trigger NMIs:
- Hardware watchdog timeouts
- Memory parity errors
- CPU thermal throttling
- Power supply issues
First, let's collect system information:
# Check CPU temperature
sensors
# Show battery/AC adapter status (laptops only; server PSUs are not exposed here and need IPMI or BIOS monitoring)
cat /sys/class/power_supply/*/status
# Examine kernel ring buffer
dmesg | grep -i nmi
# Query the running mcelog daemon for machine-check errors (--ascii only decodes text fed on stdin)
mcelog --client
1. Disabling Power Saving Features:
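# Temporarily disable CPU idle states (run as root; lasts until reboot)
# The per-state "disable" file may be missing on older kernels such as 3.2; use the boot parameters further down in that case
for f in /sys/devices/system/cpu/cpu*/cpuidle/state*/disable; do
    [ -w "$f" ] && echo 1 > "$f"
done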
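Before disabling anything, it helps to see which idle states the kernel actually exposes. A minimal sketch, assuming the cpuidle sysfs layout with name and latency files per state; list_cstates is a hypothetical helper name.

```shell
#!/bin/bash
# Print the idle states the kernel exposes for one CPU.
# $1 = cpuidle directory (defaults to cpu0's); latency is the exit latency in microseconds.
# list_cstates is a hypothetical helper name used for illustration.
list_cstates() {
    local dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    local state
    for state in "$dir"/state*; do
        [ -r "$state/name" ] || continue   # skip when no states are exposed
        printf '%s: %s (latency %s us)\n' \
            "${state##*/}" "$(cat "$state/name")" "$(cat "$state/latency" 2>/dev/null)"
    done
}

list_cstates
```

An empty result suggests that no cpuidle driver is active, in which case C-state tuning has to happen in the BIOS instead.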
2. Adjusting Watchdog Settings:
# Check active watchdogs
ls -l /dev/watchdog*
# Try disabling the kernel's NMI watchdog (hard-lockup detector) at runtime; run as root
echo 0 > /proc/sys/kernel/nmi_watchdog
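One way to check the effect is to watch the NMI counters in /proc/interrupts. While the NMI watchdog is enabled the kernel generates NMIs itself, so the total keeps climbing; with it disabled the total should stay nearly flat, and any further increase points at an external source. The sketch below assumes the usual x86 "NMI:" row; total_nmi_count is a hypothetical helper name.

```shell
#!/bin/bash
# Sum the per-CPU counters on the NMI line of /proc/interrupts-style input (stdin).
# total_nmi_count is a hypothetical helper name used for illustration.
total_nmi_count() {
    awk '$1 == "NMI:" {
        # Add numeric columns and stop at the trailing description text
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
        print sum + 0
    }'
}

if [ -r /proc/interrupts ]; then
    total_nmi_count < /proc/interrupts
fi
```

Running it twice a minute apart, before and after the echo above, shows whether the watchdog was the only NMI source.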
3. BIOS-Level Fixes:
- Disable C-states in BIOS
- Update to latest BIOS version
- Disable Turbo Boost if unstable
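Append these parameters to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then run update-grub and reboot:
# Prevent CPU from entering deep C-states
processor.max_cstate=1 intel_idle.max_cstate=0
# Keep idle CPUs polling instead of halting (raises power draw and heat) and disable the NMI watchdog
idle=poll nmi_watchdog=0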
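When combining these parameters with an existing GRUB line, it is easy to add one twice. The sketch below merges parameters into a current value without duplicating exact matches; merge_cmdline is a hypothetical helper name, and it does not resolve conflicts between different values of the same parameter.

```shell
#!/bin/bash
# Append kernel parameters to a GRUB_CMDLINE_LINUX_DEFAULT value, skipping ones already present.
# $1 = current value, remaining arguments = parameters to add. Prints the merged value.
# merge_cmdline is a hypothetical helper name used for illustration.
merge_cmdline() {
    local current=$1; shift
    local param
    for param in "$@"; do
        case " $current " in
            *" $param "*) ;;                               # already present, leave untouched
            *) current="${current:+$current }$param" ;;    # append with a separating space
        esac
    done
    echo "$current"
}

merge_cmdline "quiet splash nmi_watchdog=0" \
    processor.max_cstate=1 intel_idle.max_cstate=0 nmi_watchdog=0
```

The printed string can then be pasted between the quotes of GRUB_CMDLINE_LINUX_DEFAULT by hand, which keeps the edit to /etc/default/grub deliberate.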
Create a simple monitoring script:
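#!/bin/bash
# Log a snapshot whenever new NMI messages appear in the kernel ring buffer.
LOG_FILE="/var/log/nmi_monitor.log"
last_count=0
while true; do
    # Compare counts so that an old message is not reported again every minute
    count=$(dmesg | grep -c "NMI received")
    if [ "$count" -gt "$last_count" ]; then
        {
            echo "[$(date)] NMI detected ($count in ring buffer)"
            lscpu
            sensors
        } >> "$LOG_FILE"
    fi
    last_count=$count
    sleep 60
done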
If software solutions fail, consider:
- Testing with different RAM modules
- Checking CPU socket connections
- Verifying power supply voltage stability
- Testing with a different motherboard