What we're seeing goes beyond typical filesystem corruption. When power loss causes inodes to mutate (files becoming directories or vice versa) and alters permissions on unchanged files, this suggests deep filesystem metadata corruption. Unlike traditional HDDs where corruption typically affects recently written sectors, SSDs exhibit unique failure modes due to:
- Flash translation layer (FTL) inconsistencies
- Partial page programming effects
- Write amplification during unexpected power cycles
For ext4 filesystems, implement these safeguards in /etc/fstab:
UUID=your-uuid / ext4 defaults,data=journal,commit=30,barrier=1,noatime 0 1
Critical mount options explanation:
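- data=journal: Journals both metadata AND file contents (performance impact but maximum safety)
- commit=30: Syncs data and metadata every 30 seconds instead of the default 5. A longer interval widens the window of writes that can be lost, so keep the default (or lower it) if durability matters more than throughput
- barrier=1: Ensures proper write ordering (especially crucial for SSDs); this is already the ext4 default, so stating it only makes the intent explicit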
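After remounting or rebooting, it is worth confirming that the options actually took effect. The following is a minimal sketch, not a definitive check: mount_has_opt is a hypothetical helper name, and it assumes the kernel lists the option in /proc/mounts (options that match the kernel default, such as barriers, may not be listed at all).

```shell
#!/bin/bash
# Sketch: check whether a mount point is mounted with a given option.
# mount_has_opt is a hypothetical helper, not part of any standard tool.

# Usage: mount_has_opt MOUNTPOINT OPTION [MOUNTS_FILE]
# Expects /proc/mounts-style lines: device mountpoint fstype options dump pass
mount_has_opt() {
    local mountpoint=$1 wanted=$2 mounts_file=${3:-/proc/mounts}
    awk -v mp="$mountpoint" -v opt="$wanted" '
        $2 == mp {
            # The last matching line wins, since it is the effective mount.
            n = split($4, opts, ",")
            found = 0
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$mounts_file"
}

# Report on the root filesystem of the running system.
for opt in data=journal noatime; do
    if mount_has_opt / "$opt"; then
        echo "/ is mounted with $opt"
    else
        echo "/ is NOT mounted with $opt"
    fi
done
```

The option is matched as a whole comma-separated token, so a query for "data" does not match "data=journal".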
Disable write caching (temporary solution until proper UPS implementation):
# Check current cache status
hdparm -W /dev/sda
# Disable write cache
hdparm -W0 /dev/sda
# Make persistent (add to rc.local)
echo 'hdparm -W0 /dev/sda' >> /etc/rc.local
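As a cross-check after disabling the cache, the kernel exposes its own view of the write cache mode in sysfs. This is a rough sketch under a few assumptions: write_cache_mode is a hypothetical helper, the device is sda, and the kernel is recent enough to provide queue/write_cache. hdparm -W remains the direct query against the drive itself.

```shell
#!/bin/bash
# Sketch: report the kernel's view of a disk's write cache mode.
# write_cache_mode is a hypothetical helper; the sysfs root is a parameter
# so the function can be pointed at a test directory.

# Usage: write_cache_mode DEVICE [SYSFS_ROOT]
write_cache_mode() {
    local dev=$1 sysfs_root=${2:-/sys/block}
    local attr="$sysfs_root/$dev/queue/write_cache"
    if [ -r "$attr" ]; then
        cat "$attr"
    else
        echo "unknown"
        return 1
    fi
}

mode=$(write_cache_mode sda) || true
case "$mode" in
    "write through") echo "sda: no volatile write cache reported" ;;
    "write back")    echo "sda: volatile write cache enabled" ;;
    *)               echo "sda: write cache mode unknown" ;;
esac
```

If this still reports "write back" after hdparm -W0, compare against the hdparm -W output before assuming the setting failed.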
For critical systems, consider implementing a read-only root with overlayfs:
mount -t overlay overlay -o lowerdir=/ro,upperdir=/rw,workdir=/work /merged
For PostgreSQL, enforce strict durability settings in postgresql.conf:
fsync = on
full_page_writes = on
synchronous_commit = on
wal_level = replica
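To catch a durability setting that was switched off at some point (for example during a bulk load), the config file can be scanned with a small script. This is only a sketch: pg_conf_value and check_pg_durability are hypothetical helpers, the default path is an assumption, and the parser ignores include files and postgresql.auto.conf. Running SHOW fsync; in psql against the live server is the authoritative check.

```shell
#!/bin/bash
# Sketch: read durability-related settings from a postgresql.conf file.
# pg_conf_value and check_pg_durability are hypothetical helper names.

# Print the value of KEY in CONF (empty if unset). Last occurrence wins.
pg_conf_value() {
    local key=$1 conf=$2
    awk -v k="$key" '
        { sub(/#.*/, ""); gsub(/=/, " ") }   # drop comments; "=" is optional
        $1 == k { val = $2 }
        END { gsub(/\047/, "", val); print val }   # strip single quotes
    ' "$conf"
}

# Return non-zero if any of the three settings is explicitly not "on".
check_pg_durability() {
    local conf=$1 rc=0 key val
    for key in fsync full_page_writes synchronous_commit; do
        val=$(pg_conf_value "$key" "$conf")
        # An unset key falls back to the built-in default, which is "on".
        case "${val:-on}" in
            on|true|yes|1) ;;
            *) echo "WARNING: $key = $val in $conf"; rc=1 ;;
        esac
    done
    return $rc
}

# Assumed location; set PGCONF to the real path of your cluster's config.
conf=${PGCONF:-/etc/postgresql/postgresql.conf}
if [ -r "$conf" ]; then
    check_pg_durability "$conf" && echo "durability settings look OK"
fi
```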
Create a pre-shutdown hook script (/etc/systemd/system/postgresql-powerfail.service):
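[Unit]
Description=PostgreSQL emergency flush
# Units stop in reverse start order, so ordering this after PostgreSQL
# makes ExecStop run while the server is still up (unit name assumed).
After=postgresql.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/usr/bin/runuser -l postgres -c '/usr/bin/psql -c "CHECKPOINT;"'
[Install]
# Started at boot so that it is active, and therefore stopped, at shutdown
WantedBy=multi-user.target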
When hardware UPS isn't available, implement basic power monitoring:
#!/bin/bash
# Monitor battery status and trigger safe shutdown
BAT=/sys/class/power_supply/BAT0
while true; do
    if [[ $(cat "$BAT/status") == "Discharging" ]] &&
       (( $(cat "$BAT/capacity") < 10 )); then
        sync
        /sbin/shutdown -h now
    fi
    sleep 60
done
Implement automated fsck on boot by creating /etc/initramfs-tools/scripts/init-premount/fsck_force:
#!/bin/sh
PREREQ=""
prereqs() { echo "$PREREQ"; }
case "$1" in
prereqs) prereqs; exit 0;;
esac
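for DEVICE in $(lsblk -o KNAME -lpn); do
    # Skip devices without a recognizable filesystem
    if [ -z "$(blkid -s TYPE -o value "$DEVICE")" ]; then continue; fi
    # -A (walk /etc/fstab) cannot be combined with an explicit device, so check
    # the device directly: safe automatic repairs first, then -y as a fallback
    fsck -C -T -a "$DEVICE" || fsck -y "$DEVICE"
done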
Make executable and update initramfs:
chmod +x /etc/initramfs-tools/scripts/init-premount/fsck_force
update-initramfs -u
Post-power-loss corruption on SSDs can go well beyond typical filesystem damage. Instead of being limited to recently modified files, it can change the type of existing inodes:
# Example of corrupted inode structure (hypothetical debug output)
$ ls -lai /var/www/html
12345 drwxr-xr-x 2 root root 4096 Jan 1 00:00 index.php # File became directory
67890 -rw-r--r-- 1 root root 0 Jan 1 00:00 assets/ # Directory became file
Consumer-grade SSDs often exhibit three critical vulnerabilities during power loss:
- Volatile write caches not properly flushed
- FTL (Flash Translation Layer) mapping tables corruption
- Partial page programming in NAND cells
This explains why even unchanged files get corrupted - the metadata structures in FTL may reference wrong physical blocks.
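Because this kind of damage hits files that were never written to, modification times will not reveal it. One way to detect it is a checksum manifest built while the system is known to be good and verified after each unclean shutdown. The following is a sketch using GNU coreutils and findutils; build_manifest and verify_manifest are hypothetical helper names, and the manifest should live outside the tree being checked (ideally on different media).

```shell
#!/bin/bash
# Sketch: detect silent corruption of unchanged files with a SHA-256 manifest.
# build_manifest and verify_manifest are hypothetical helper names.

# Usage: build_manifest DIR MANIFEST
# Records a checksum for every regular file under DIR, with relative paths.
build_manifest() {
    local dir=$1 manifest=$2
    ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# Usage: verify_manifest DIR MANIFEST
# Prints only mismatching or unreadable files; returns non-zero if any exist.
# A file that has turned into a directory shows up as a read failure.
verify_manifest() {
    local dir=$1 manifest=$2
    ( cd "$dir" && sha256sum --quiet -c - ) < "$manifest"
}
```

For example, build_manifest /var/www/html /mnt/backup/html.sha256 after a deployment, then verify_manifest /var/www/html /mnt/backup/html.sha256 after a power loss (both paths are placeholders). Files that change legitimately will also be reported, so this fits static content best.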
1. Filesystem Mount Options
Add these to /etc/fstab for critical partitions:
/dev/sda1 / ext4 defaults,data=journal,barrier=1 0 1
/dev/sda2 /var/lib/postgresql ext4 defaults,data=journal,nodelalloc 0 2
2. SSD Hardware Configuration
Disable volatile cache (caution: impacts performance):
# For Kingston drives shown in example
sudo hdparm -W0 /dev/sda
# Make persistent via udev rule:
echo 'ACTION=="add", SUBSYSTEM=="block", ATTRS{model}=="KINGSTON*", RUN+="/sbin/hdparm -W0 /dev/%k"' | sudo tee /etc/udev/rules.d/99-ssd-safety.rules
3. PostgreSQL-Specific Protections
# postgresql.conf critical settings:
wal_level = replica
synchronous_commit = on
full_page_writes = on
fsync = on
Create a failsafe initramfs script to verify critical partitions:
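#!/bin/sh
# /usr/share/initramfs-tools/scripts/init-premount/fsck_ssd
case "$1" in
prereqs)
    echo ""
    exit 0
    ;;
esac
# Provides panic(), which prints a message and drops to the initramfs shell
. /scripts/functions
# The root filesystem is not mounted yet at this stage, so check the device directly
fsck -y -f /dev/disk/by-label/rootfs
# fsck exit status: 0 = clean, 1 = errors corrected, 4 and above = errors left uncorrected
if [ $? -ge 4 ]; then
    panic "fsck_ssd: critical filesystem errors detected"
fi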
Implement proactive monitoring with smartmontools:
# smartd configuration (/etc/smartd.conf)
/dev/sda -a -o on -S on -n standby -s (S/../.././02|L/../../7/03) -W 4,35,40 \
-m admin@example.com -M exec /usr/local/bin/ssd_alert
The accompanying alert script:
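#!/bin/bash
# /usr/local/bin/ssd_alert
# smartd passes the affected device in the SMARTD_DEVICE environment variable, not as $1.
# Media_Wearout_Indicator is vendor-specific; check `smartctl -A` for the name your drive reports.
MEDIA_WEAROUT_INDICATOR=$(smartctl -A "$SMARTD_DEVICE" | awk '/Media_Wearout_Indicator/ {print $4}')
if [ -n "$MEDIA_WEAROUT_INDICATOR" ] && [ "$MEDIA_WEAROUT_INDICATOR" -lt 20 ]; then
    systemctl start emergency-readonly.service
fi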
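Since attribute names differ between vendors, it can help to pull the attribute lookup into a small function that is easy to test against saved smartctl output. This is a sketch only: smart_attr is a hypothetical helper, and it assumes the usual smartctl -A table layout, where column 4 is the normalized VALUE and column 10 is RAW_VALUE.

```shell
#!/bin/bash
# Sketch: extract one attribute from `smartctl -A` output read on stdin.
# smart_attr is a hypothetical helper name.

# Usage: smartctl -A /dev/sda | smart_attr ATTRIBUTE_NAME [COLUMN]
# COLUMN defaults to 4 (normalized VALUE); use 10 for RAW_VALUE.
# Returns non-zero if the attribute is not present in the input.
smart_attr() {
    local name=$1 column=${2:-4}
    awk -v n="$name" -v c="$column" '
        $2 == n { print $c + 0; found = 1 }   # "+ 0" strips leading zeros
        END { exit(found ? 0 : 1) }
    '
}
```

The non-zero return for a missing attribute lets the alert script tell "drive does not report this attribute" apart from "value is low", which the plain awk pattern cannot do.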
For mission-critical deployments, consider:
- Power-loss-protected (PLP) enterprise SSDs (e.g., Intel DC series)
- Hardware RAID controllers with battery-backed cache
- Copy-on-write filesystems with checksumming (ZFS, Btrfs)