Server crashes without logs are like crime scenes without fingerprints. From your description, the system fails so hard that even kernel logging stops mid-stream. The /var/log/messages cutoff suggests either a kernel panic or a hardware failure that prevents disk writes.
Your dmesg output shows these filesystem messages for md1 during boot:
[ 3.624786] EXT3-fs (md1): error: couldn't mount because of unsupported optional features (240)
[ 3.627095] EXT2-fs (md1): error: couldn't mount because of unsupported optional features (244)
[ 3.630284] EXT4-fs (md1): INFO: recovery required on readonly filesystem
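The EXT3 and EXT2 lines are harmless: the kernel tries those drivers first, and they refuse an ext4 filesystem because of its feature flags. The EXT4 line is the meaningful one. It says the journal had to be replayed, so md1 was not cleanly unmounted. That is what you would expect after a hard crash, so treat it as a symptom rather than proof of corruption, although repeated unclean shutdowns can damage the filesystem over time.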
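On a long dmesg it helps to filter for just the ext4 journal and error messages. The function below is a minimal sketch; the name `ext4_events` is made up for this example, and the pattern is only a rough guess at what is worth reading.

```shell
#!/bin/bash
# Hypothetical helper: reads kernel log text on stdin and prints only
# EXT4 lines that mention recovery, an error, or a warning.
# EXT2-fs / EXT3-fs lines are skipped because they do not match "EXT4-fs".
ext4_events() {
  grep -Ei 'EXT4-fs \([^)]+\): .*(recovery|error|warning)'
}
```

Typical use would be `dmesg | ext4_events`, or feeding it an old /var/log/messages file.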
Install these packages immediately:
sudo yum install -y sysstat crash kernel-devel mcelog smartmontools
Enable kdump for capturing crash context:
sudo yum install -y kexec-tools
sudo systemctl enable kdump
sudo systemctl start kdump
Configure /etc/kdump.conf:
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
Create a monitoring script (monitor.sh):
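#!/bin/bash
# Append memory, kernel log, and RAID state to log files every 30 seconds.
while true; do
  echo "$(date) - $(grep MemFree /proc/meminfo)" >> /var/log/mem_monitor.log
  dmesg -T | tail -n 20 >> /var/log/dmesg_monitor.log
  mdadm --detail /dev/md[0-9]* >> /var/log/raid_status.log 2>&1
  # Flush to disk so the last samples survive a hard crash.
  sync
  sleep 30
done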
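Check for hardware errors. mcelog --client queries the running mcelog daemon (mcelog --ascii would instead wait for log text on stdin), and every RAID member disk should be checked, not just sda:
sudo mcelog --client
for d in /dev/sd[a-c]; do sudo smartctl -a "$d"; done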
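The full smartctl report is long, and for a crash hunt only a few attributes usually matter. Here is a rough sketch that reads the attribute table (`smartctl -A`) and prints the sector-health counters whose raw value is above zero. The function name `smart_warnings` and the choice of three attributes are my assumptions, and not every drive reports these names.

```shell
#!/bin/bash
# Hypothetical helper: reads `smartctl -A` output on stdin.
# In that table, column 2 is the attribute name and column 10 the raw value.
# Prints "NAME RAW" for selected attributes with a non-zero raw value.
smart_warnings() {
  awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && ($10 + 0) > 0 {
    print $2, $10
  }'
}
```

Run it per disk, for example `sudo smartctl -A /dev/sda | smart_warnings`. No output means none of the three counters is raised, which does not by itself prove the disk is healthy.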
For kernel module issues:
sudo lsmod | grep md_mod
sudo modinfo md_mod
Check array consistency:
sudo mdadm --detail --scan
sudo mdadm --examine /dev/sd[a-c]1
Start a consistency check if needed (write repair instead of check to have md rewrite mismatched blocks):
echo check | sudo tee /sys/block/md1/md/sync_action
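Append these to the existing GRUB_CMDLINE_LINUX value in /etc/default/grub rather than replacing the line. Keep the NMI watchdog enabled (nmi_watchdog=1) so hard lockups are detected; softlockup_panic=1 turns soft lockups into panics that kdump can capture:
GRUB_CMDLINE_LINUX="<existing parameters> raid=noautodetect crashkernel=auto nmi_watchdog=1 softlockup_panic=1"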
Remember to update GRUB:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
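After the next reboot, it is worth confirming that the parameters actually reached the kernel, since a stale grub.cfg fails silently. This is a small sketch, with `missing_params` being a name I made up; it prints every wanted parameter that is absent from a given command line.

```shell
#!/bin/bash
# Hypothetical helper: missing_params "<kernel command line>" PARAM...
# Prints each PARAM that does not appear as a whole word in the command line.
missing_params() {
  local cmdline=" $1 "
  shift
  local p
  for p in "$@"; do
    case "$cmdline" in
      *" $p "*) ;;          # present, nothing to report
      *) echo "$p" ;;       # absent
    esac
  done
}
```

For example: `missing_params "$(cat /proc/cmdline)" crashkernel=auto softlockup_panic=1`. Empty output means both are active.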
The dmesg lines quoted above (the EXT3/EXT2 mount refusals and the EXT4 "recovery required on readonly filesystem" message) show that md1 was not cleanly unmounted, so both the array and its filesystem deserve a check.
First, verify your RAID array status with these commands:
cat /proc/mdstat
mdadm --detail /dev/md1
mdadm --detail /dev/md2
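If you have several arrays, reading /proc/mdstat by eye gets tedious. In that file, a member map such as [UU] means all members are up, and an underscore (for example [_U]) marks a missing one. The sketch below prints the names of arrays with a missing member; `degraded_arrays` is a hypothetical name, and it assumes the usual mdN naming.

```shell
#!/bin/bash
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints
# the name of every array whose member map contains an underscore.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1 }                 # remember the current array
    /\[[U_]*_[U_]*\]/ && name != "" {           # member map with a hole
      print name
      name = ""
    }
  '
}
```

Usage: `degraded_arrays < /proc/mdstat`. Empty output means no member map shows a missing device.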
To capture future crashes, configure kdump:
yum install kexec-tools
systemctl enable kdump.service
systemctl start kdump.service
Verify configuration with:
kdumpctl status
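Once kdump has fired, the dump normally lands in its own timestamped subdirectory under the configured path (/var/crash here). This sketch finds the most recent one; the function name `latest_vmcore` is my own, and it assumes the dump file is called vmcore.

```shell
#!/bin/bash
# Hypothetical helper: latest_vmcore [DIR]
# Prints the most recently modified DIR/*/vmcore, or nothing if none exist.
latest_vmcore() {
  local dir="${1:-/var/crash}"
  # ls -t sorts newest first; errors are hidden when the glob matches nothing.
  ls -1t "$dir"/*/vmcore 2>/dev/null | head -n 1
}
```

You can then open the result with the crash utility, which also needs the matching debug vmlinux from the kernel-debuginfo package.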
Install and configure sysstat for historical data:
yum install sysstat
sed -i 's/^HISTORY=.*/HISTORY=28/' /etc/sysconfig/sysstat
systemctl enable sysstat
systemctl start sysstat
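The point of sysstat here is to look at the minutes before a crash after the fact. On RHEL-style systems the daily data files are usually /var/log/sa/saDD, where DD is the day of the month. The sketch below maps a date to that path; `sa_file` is a made-up name, the directory is an assumption about your layout, and it relies on GNU date.

```shell
#!/bin/bash
# Hypothetical helper: sa_file DATE
# Prints the sysstat data file path for DATE (anything `date -d` accepts).
sa_file() {
  local day
  day=$(date -d "$1" +%d) || return 1
  echo "/var/log/sa/sa${day}"
}
```

For instance, `sar -r -f "$(sa_file yesterday)"` shows memory usage for the previous day, and `sar -u -s 02:00:00 -e 03:00:00 -f "$(sa_file yesterday)"` narrows CPU data to one hour.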
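For the EXT4 filesystem errors shown in your logs, force a check with the filesystem unmounted. If md1 holds the root filesystem, which the early-boot messages suggest, it cannot be unmounted on a running system, so run these from rescue media: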
umount /dev/md1
fsck.ext4 -f /dev/md1
mount /dev/md1
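Add these to /etc/sysctl.conf. They do not prevent crashes; they make failures visible and recoverable: an out-of-memory condition becomes a panic that kdump can capture, the machine reboots 10 seconds after a panic, and SysRq is available on a hung console: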
vm.panic_on_oom = 1
kernel.panic = 10
kernel.sysrq = 1
Configure journald for persistent logs:
mkdir /var/log/journal
chown root:systemd-journal /var/log/journal
chmod 2755 /var/log/journal
systemctl restart systemd-journald
Install lm_sensors for hardware monitoring:
yum install lm_sensors
sensors-detect
systemctl start lm_sensors
systemctl enable lm_sensors
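Create a monitoring script at /usr/local/bin/raid_monitor.sh. It only raises an alert: automatically re-adding members would push a failing disk back into the array on every run and hide the hardware fault.
#!/bin/bash
# Log an alert when md1 reports a bad state.
# A healthy array reports "clean" or "active", so match the bad states instead.
STATE=$(mdadm --detail /dev/md1 | awk -F' : ' '/State :/ {print $2}')
case "$STATE" in
  *degraded*|*FAILED*|*inactive*)
    logger -t RAID "md1 state is '$STATE'; check /proc/mdstat"
    # After replacing the failed disk, re-add it by hand, for example:
    #   mdadm --manage /dev/md1 --add /dev/sda1
    ;;
esac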
Make it executable and add to cron:
chmod +x /usr/local/bin/raid_monitor.sh
(crontab -l ; echo "*/15 * * * * /usr/local/bin/raid_monitor.sh") | crontab -