Troubleshooting Buffer I/O Errors and Lost Async Page Writes on HP MSA2040 Storage in CentOS 7

When working with HP MSA2040 SAN storage mounted on CentOS 7 systems, you might encounter persistent kernel errors like:

Buffer I/O error on dev sda, logical block 886865171, lost async page write
blk_update_request: I/O error, dev sda, sector 7094923416

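These messages mean the kernel could not write buffered (page cache) data back to specific blocks on the storage device. Because the failed writes were asynchronous, the application that issued them may never see an error, so data can be lost silently. The errors typically appear in clusters and point to either a failing I/O path to the array (HBA, cabling, controller, LUN) or filesystem corruption left behind by earlier failed writes.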
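
To see how the errors cluster, the blk_update_request lines can be grouped by device. The sketch below assumes the message format shown above; summarize_io_errors is a hypothetical helper name, not a standard tool.

```shell
# Summarize "blk_update_request: I/O error" lines read from stdin.
# Prints one line per device: error count plus lowest and highest failing sector.
summarize_io_errors() {
    awk '
        /blk_update_request: I\/O error, dev / {
            dev = ""; sector = -1
            for (i = 1; i < NF; i++) {
                if ($i == "dev")    { dev = $(i + 1); sub(/,$/, "", dev) }
                if ($i == "sector") { sector = $(i + 1) + 0 }
            }
            if (dev == "" || sector < 0) next
            count[dev]++
            if (!(dev in min) || sector < min[dev]) min[dev] = sector
            if (!(dev in max) || sector > max[dev]) max[dev] = sector
        }
        END {
            for (d in count)
                printf "%s errors=%d min_sector=%.0f max_sector=%.0f\n", d, count[d], min[d], max[d]
        }
    '
}
```

Typical use would be `dmesg | summarize_io_errors` or `summarize_io_errors < /var/log/messages`. A narrow sector range hints at a localized problem on the LUN, while errors spread across the whole device are more consistent with a path or controller fault.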

Before attempting repairs, gather critical system information:

# Check block devices
lsblk

# Verify volume groups
pvs
vgs
lvs

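# Query device information (on a SAN LUN this describes the array's virtual volume, not the physical disks)
smartctl -a /dev/sda

In this case, smartctl reports the following for the SAN LUN. The OK status reflects the volume as presented by the controller, not the health of the individual disks in the array: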

Vendor: HP
Product: MSA 2040 SAN
User Capacity: 10,239,998,951,424 bytes
SMART Health Status: OK

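XFS filesystems are repaired with xfs_repair, which must be run on an unmounted filesystem. Resolve the underlying I/O errors first, because repairing a filesystem on a device that still drops writes can make the damage worse:

# Unmount first (xfs_repair refuses to run on a mounted filesystem)
umount /dev/sda

# Dry run: report problems without modifying anything
xfs_repair -n /dev/sda

# Last resort: zero the log when it cannot be replayed (caution: metadata updates still in the log are lost)
xfs_repair -L /dev/sda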
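
Since xfs_repair will not operate on a mounted filesystem, a guard like the following can precede the repair commands. It is a minimal sketch: is_mounted is a hypothetical helper, and it only matches the exact device path as listed in /proc/mounts (LVM or multipath devices may appear there under a /dev/mapper name instead).

```shell
# Succeed if the given device appears as a mount source in a mounts table.
# $1: device path, $2: mounts file (defaults to /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

For example: `is_mounted /dev/sda && echo "unmount /dev/sda before running xfs_repair"`.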

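The repair process may hang when encountering severe corruption, or when the device itself is still returning I/O errors. If xfs_repair stalls, check the kernel log for new errors before retrying. Separately, review the mount options in /etc/fstab:

# Example XFS mount options for SAN-backed storage
noatime,logbufs=8

noatime already implies nodiratime, so listing both is redundant. Avoid nobarrier unless the controller's write cache is battery- or flash-backed: it disables cache flushes, and a power loss can then corrupt the filesystem.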
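
Because nobarrier turns off cache flushes, it can be useful to list which XFS mounts currently use it. The sketch below reads an fstab-formatted stream from stdin; xfs_nobarrier_mounts is a hypothetical helper name.

```shell
# Print the mount point of every XFS entry that sets the nobarrier option.
# Input: fstab-formatted lines on stdin.
xfs_nobarrier_mounts() {
    awk '
        /^[ \t]*#/ || NF < 4 { next }   # skip comments and incomplete lines
        $3 == "xfs" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "nobarrier") print $2
        }
    '
}
```

For example: `xfs_nobarrier_mounts < /etc/fstab`. An empty result means no XFS entry in the file disables barriers.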

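Kernel log messages about a shrinking TCP window:

Peer ... unexpectedly shrunk window ... (repaired)

are TCP-level and relate to IP traffic (for example iSCSI or NFS), not to Fibre Channel. If the MSA2040 is attached over iSCSI, they suggest a network problem between the server and the array. If it is attached over Fibre Channel, they are unrelated to the storage path, and the FC link should be checked separately: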

# FC link status
cat /sys/class/fc_host/host*/port_state

# FC statistics
cat /sys/class/fc_host/host*/statistics/tx_frames
cat /sys/class/fc_host/host*/statistics/rx_frames
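
Frame counts alone do not show link problems; the same statistics directory also exposes error counters such as link_failure_count and invalid_crc_count. The sketch below prints only the non-zero ones. fc_error_counters is a hypothetical helper, and which counters a given HBA driver actually maintains varies.

```shell
# Print non-zero FC error counters for every HBA port.
# $1: sysfs root to scan (defaults to /sys/class/fc_host).
fc_error_counters() {
    root=${1:-/sys/class/fc_host}
    for host in "$root"/host*; do
        [ -d "$host/statistics" ] || continue
        for name in link_failure_count loss_of_sync_count \
                    loss_of_signal_count invalid_crc_count \
                    invalid_tx_word_count; do
            file=$host/statistics/$name
            [ -r "$file" ] || continue
            raw=$(cat "$file")
            # Counters are reported in hex (for example 0x1f). An all-ones value,
            # used for unsupported counters, fails the conversion and is skipped.
            value=$(printf '%d' "$raw" 2>/dev/null) || continue
            [ "$value" -gt 0 ] && echo "${host##*/} $name=$value"
        done
    done
    return 0
}
```

Running `fc_error_counters` twice a few minutes apart shows whether the counters are still increasing, which matters more than their absolute values.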

When standard repairs fail, deeper investigation is needed:

# Monitor I/O errors in real-time
dmesg -wH

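# Show cumulative block-layer I/O counters (completed reads and writes, in-flight requests; not errors)
cat /sys/block/sda/stat

# Show the SCSI error counter for the device (hexadecimal)
cat /sys/block/sda/device/ioerr_cnt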

# Force device rescan
echo 1 > /sys/block/sda/device/rescan

For persistent issues, consider:

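# Lower the block-layer request queue size (the per-LUN SCSI queue depth is a separate setting in /sys/block/sda/device/queue_depth)
echo 64 > /sys/block/sda/queue/nr_requests

# Test raw device access at the failing sector (skip is counted in units of bs, so use 512-byte sectors)
dd if=/dev/sda of=/dev/null bs=512 skip=7094923416 count=2048 iflag=direct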
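
dd interprets skip in units of bs, so it helps to convert the sector number from the kernel message into a byte offset and confirm that it lies inside the device before reading. A minimal sketch, assuming the kernel's 512-byte sector unit; sector_in_device is a hypothetical helper name.

```shell
# Convert a 512-byte sector number into a byte offset and report whether
# that offset lies inside a device of the given size.
# $1: sector number, $2: device size in bytes.
sector_in_device() {
    sector=$1
    dev_bytes=$2
    offset=$((sector * 512))
    if [ "$offset" -lt "$dev_bytes" ]; then
        echo "offset=$offset inside"
    else
        echo "offset=$offset outside"
    fi
}
```

With the values from this case, `sector_in_device 7094923416 10239998951424` prints `offset=3632600788992 inside`. On a live system the size can be taken from `blockdev --getsize64 /dev/sda`.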

Access the MSA2040 management interface to verify:

Controller firmware version
Disk health status in the array
FC port error counters
Cache battery status

Check for known issues with G210 firmware and consider upgrading if newer versions are available.

If data accessibility is critical and repairs fail:

# Attempt readonly mount
mount -o ro,norecovery /dev/sda /mnt/rescue

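# Use ddrescue to image the device (the destination needs as much free space as the source; -d reads with direct I/O, and the third argument is the mapfile that lets an interrupted copy resume)
ddrescue -d /dev/sda /mnt/backup/image.img /mnt/backup/image.map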

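For complete storage reinitialization as a last resort (this destroys all data on the device):

# Wipe filesystem signatures
wipefs -a /dev/sda

# Create a new XFS filesystem aligned to the array stripe (su = chunk size, sw = number of data disks; match both to the actual disk group)
mkfs.xfs -f -d su=64k,sw=8 /dev/sda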
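
The su and sw values follow from the disk group's geometry: su is the chunk (stripe unit) size and sw is the number of data-bearing disks. The sketch below shows that arithmetic, assuming a conventional RAID 5, 6 or 10 disk group. xfs_stripe_opts is a hypothetical helper, and the chunk size should be read from the array's configuration rather than assumed.

```shell
# Derive the mkfs.xfs -d stripe options from RAID geometry.
# $1: chunk size in KiB, $2: total disks in the RAID set, $3: RAID level (5, 6 or 10).
xfs_stripe_opts() {
    chunk_kib=$1
    disks=$2
    case $3 in
        5)  data=$((disks - 1)) ;;   # one disk's worth of parity
        6)  data=$((disks - 2)) ;;   # two disks' worth of parity
        10) data=$((disks / 2)) ;;   # half the disks hold mirror copies
        *)  echo "unsupported RAID level: $3" >&2; return 1 ;;
    esac
    [ "$data" -ge 1 ] || { echo "too few disks for RAID $3" >&2; return 1; }
    echo "su=${chunk_kib}k,sw=${data}"
}
```

For example, `xfs_stripe_opts 64 9 5` yields `su=64k,sw=8`, which matches the options used above for a nine-disk RAID 5 group with 64 KiB chunks.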

The kernel messages showing "Buffer I/O error on dev sda" with lost async page writes indicate serious communication problems between the server and the storage. With enterprise storage like the HP MSA2040, the usual causes are a degraded I/O path (HBA, cabling, switch, controller), a fault inside the array, or filesystem damage left behind by earlier failed writes.

# Sample error pattern you might see in /var/log/messages
kernel: blk_update_request: I/O error, dev sda, sector 7094923416
kernel: Buffer I/O error on dev sda, logical block 886865171, lost async page write

Before attempting repairs, verify the physical and logical connection status:

# Check device status as reported by the array (not per-disk SMART data)
smartctl -a /dev/sda

# Verify multipath status (if applicable)
multipath -ll

# Check dmesg for recent errors
dmesg | grep -i error

XFS filesystems require special handling when encountering I/O errors. Run these commands only on an unmounted filesystem:

# First attempt a dry run (no changes are made)
xfs_repair -n /dev/sda

# If the log cannot be replayed, zero it (metadata updates still in the log are lost)
xfs_repair -L /dev/sda

For HP MSA2040 specifically, examine these aspects:

Check controller logs through the management interface
Verify Fibre Channel connections and switch zoning
Confirm LUN presentation and masking settings

When facing persistent I/O errors:

# Try resetting the SCSI device (sg_reset is part of the sg3_utils package)
sg_reset -d /dev/sda

# Alternative: re-scan the SCSI bus (replace hostX with the actual host entry under /sys/class/scsi_host)
echo "- - -" > /sys/class/scsi_host/hostX/scan

For deeper investigation, consider capturing I/O patterns:

# Monitor per-device latency, queue size and utilization every second
iostat -x 1

# Show cumulative block-layer I/O counters (not errors)
cat /sys/block/sda/stat
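
The stat file counts completed I/O, not failures. For SCSI devices the error count is kept separately, in hexadecimal, under the device directory. The sketch below prints those counters in decimal; scsi_io_counters is a hypothetical helper name.

```shell
# Print the SCSI request, completion and error counters for one device in decimal.
# $1: device name (for example sda), $2: sysfs root (defaults to /sys/block).
scsi_io_counters() {
    dir=${2:-/sys/block}/$1/device
    for name in iorequest_cnt iodone_cnt ioerr_cnt; do
        [ -r "$dir/$name" ] || { echo "$name=unavailable"; continue; }
        # The kernel exposes these values as hex strings such as 0x1a2b.
        printf '%s=%d\n' "$name" "$(cat "$dir/$name")"
    done
}
```

For example: `scsi_io_counters sda`. An ioerr_cnt that keeps growing between runs indicates the device is still failing requests.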

Remember to check for firmware updates for both your server's HBA and the MSA2040 controllers, as many storage issues are resolved in later firmware versions.
