Diagnosing Inconsistent MD5/SHA1 Checksums on Aging Hard Drives: Read Error Analysis and Solutions


When working with large files (particularly multi-GB archives) on aging storage hardware, you might encounter a troubling scenario where repeated checksum operations yield different results. Here's what I observed with a 32GB HDD containing a 5GB tar file:

$ md5sum large_file.tar
d41d8cd98f00b204e9800998ecf8427e  large_file.tar
$ md5sum large_file.tar
5eb63bbbe01eeed093cb22bb8f5acdc3  large_file.tar

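If the file itself has not changed, differing checksums mean the bytes delivered to the hashing tool differ between reads. That strongly suggests read errors during file access, although faulty RAM, cabling, or a disk controller can produce the same symptom. To confirm this:

  1. Test different hashing algorithms (if the SHA variants show the same inconsistency, the hashing tool is not at fault)
  2. Compare with smaller files (in my case their checksums remained stable)
  3. Monitor disk I/O errors via SMART tools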
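The first two checks can be automated with a small loop. The following is a minimal sketch, not a definitive tool: `hash_repeat` is a hypothetical helper name, and it assumes `md5sum` (or whichever hash command you pass) is installed. Note that repeated reads may be served from the page cache rather than the disk, so on a machine with plenty of RAM the result can look more stable than the drive really is.

```shell
# hash_repeat FILE [PASSES] [HASH_CMD]
# Hashes FILE several times and reports whether every pass agrees.
# The body runs in a subshell, so its variables do not leak into the caller.
hash_repeat() (
    file=$1
    passes=${2:-3}
    hash_cmd=${3:-md5sum}
    [ -r "$file" ] || { echo "cannot read $file" >&2; exit 2; }
    first=""
    i=1
    while [ "$i" -le "$passes" ]; do
        # Reading via stdin keeps the file name out of the output.
        current=$("$hash_cmd" < "$file" | awk '{print $1}')
        [ -n "$current" ] || { echo "hashing failed on pass $i" >&2; exit 2; }
        echo "pass $i: $current"
        if [ -z "$first" ]; then
            first=$current
        elif [ "$current" != "$first" ]; then
            echo "UNSTABLE: pass $i differs from pass 1"
            exit 1
        fi
        i=$((i + 1))
    done
    echo "STABLE: $passes passes agree"
)
```

Usage would look like `hash_repeat large_file.tar 5 sha1sum`, then the same call on a small file for comparison. An exit status of 1 means at least one pass disagreed with the first.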

Before replacing hardware, run these diagnostics:

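# Check for filesystem errors (run on an unmounted partition, e.g. /dev/sdX1)
fsck /dev/sdX1

# SMART status check
smartctl -H /dev/sdX

# Bad block scan (read-only; -o writes the list of bad blocks to a file)
badblocks -v -o bad_sectors.txt /dev/sdX

# Alternative read method for checksum
# (iflag=direct bypasses the page cache on filesystems that support O_DIRECT)
dd if=large_file.tar bs=1M iflag=direct | md5sum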

If immediate replacement isn't possible, consider:

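# Use ddrescue to copy the file to a different, healthy disk, retrying bad areas 3 times
# (/mnt/healthy is a placeholder for wherever that disk is mounted)
ddrescue -r3 large_file.tar /mnt/healthy/large_file.tar /mnt/healthy/rescue.log

# Checksum with a progress display
pv large_file.tar | md5sum

# Read a single sector straight from the drive to test whether a suspect LBA is readable
# (sector 0 is only a placeholder; this does not make normal reads more reliable)
hdparm --read-sector 0 /dev/sdX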

These signs indicate impending drive failure:

  • Increasing reallocated sector count in SMART data
  • Timeout errors in kernel logs (dmesg)
  • Read errors on more than 0.1% of blocks during a full-disk scan
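To check the kernel log for the second sign, something like the following can help. It is a rough sketch: `count_io_errors` is a hypothetical helper, and the pattern only covers the common block-layer messages of the form `I/O error, dev sdX, sector ...` and `Buffer I/O error on dev sdX1, ...`. Other drivers log errors in different wording, so a count of zero is not proof of health.

```shell
# count_io_errors DEVICE
# Reads kernel log text on stdin and prints how many lines report an
# I/O error for DEVICE (e.g. sdb) or one of its partitions (e.g. sdb1).
count_io_errors() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the function's status clean.
    grep -cE "I/O error(,| on) dev $1[0-9]*," || true
}
```

A typical call is `sudo dmesg | count_io_errors sdX`. Running it before and after a full read of the large file shows whether the read itself triggers new errors.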

In my case, both md5sum and sha1sum returned different hash values on repeated runs against the same 5GB tar file, while smaller files consistently produced matching checksums. The large file touches far more sectors than the small ones, so it is much more likely to cross a failing region of the disk.


This behavior strongly suggests underlying disk media problems. Here's how to systematically verify:

  1. SMART Status Check:
    
    $ sudo smartctl -a /dev/sdX | grep -i "reallocated\|pending\|uncorrectable"
      5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       42
      197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       17
    
  2. Badblocks Scan:
    
    $ sudo badblocks -v /dev/sdX > bad_blocks.log
    Checking blocks 0 to 625142447
    Checking for bad blocks (read-only test): 
    Pass completed, 27 bad blocks found.
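The SMART check above can be turned into a pass/fail test. This sketch assumes the classic ATA attribute table printed by `smartctl -A`, where the raw value is the tenth column, as in the sample output shown earlier. `smart_warnings` is a hypothetical helper name, and the choice of attributes 5, 197 and 198 is my own shortlist rather than an official threshold.

```shell
# smart_warnings
# Reads `smartctl -A` output on stdin and prints every watched attribute
# whose raw value is greater than zero. Returns 1 if any was found.
smart_warnings() {
    awk '
        # Watched IDs: 5 (reallocated), 197 (pending), 198 (offline uncorrectable).
        # Column 10 of an attribute row is the raw value.
        $1 == "5" || $1 == "197" || $1 == "198" {
            raw = $10 + 0
            if (raw > 0) {
                printf "%s raw=%d\n", $2, raw
                bad = 1
            }
        }
        END { exit (bad + 0) }
    '
}
```

For example, `sudo smartctl -A /dev/sdX | smart_warnings || echo "drive needs attention"`. With the sample values above it would report both the reallocated and the pending sector counts.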
    

Before condemning the drive, try these verification techniques:

  • Read Test with dd (iflag=direct bypasses the page cache, so every run reads from the disk):
    
    $ dd if=large_file.tar bs=1M iflag=direct | md5sum
    
  • Filesystem-Level Check (on an unmounted partition):
    
    $ sudo fsck -vcf /dev/sdX1
    Pass 1: Checking inodes, blocks, and sizes
    Inode 18432 has EXTENTS_FL but invalid i_block...
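A whole-file checksum only says that something differs, not where. Hashing the file in fixed-size chunks and comparing two passes narrows the problem down to a region. The sketch below assumes GNU dd (for `iflag=fullblock`) and `md5sum`. `chunk_hashes` is a hypothetical helper, and the 64 MiB default chunk size is an arbitrary choice.

```shell
# chunk_hashes FILE [CHUNK_MIB]
# Prints "INDEX DIGEST" for each CHUNK_MIB-sized piece of FILE.
# The body runs in a subshell, so its variables do not leak into the caller.
chunk_hashes() (
    file=$1
    chunk_mib=${2:-64}
    [ -r "$file" ] || { echo "cannot read $file" >&2; exit 2; }
    size=$(wc -c < "$file")
    chunk_bytes=$((chunk_mib * 1024 * 1024))
    # Round up so a trailing partial chunk is hashed as well.
    chunks=$(( (size + chunk_bytes - 1) / chunk_bytes ))
    i=0
    while [ "$i" -lt "$chunks" ]; do
        # bs equals the chunk size, so skip=$i jumps straight to chunk i.
        digest=$(dd if="$file" bs="$chunk_bytes" skip="$i" count=1 \
                    iflag=fullblock 2>/dev/null | md5sum | awk '{print $1}')
        echo "$i $digest"
        i=$((i + 1))
    done
)
```

Run it twice, for instance `chunk_hashes large_file.tar > pass1.txt` and `chunk_hashes large_file.tar > pass2.txt`, then `diff pass1.txt pass2.txt`. A differing index N points at the byte range starting at N times the chunk size.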
    

When encountering checksum inconsistencies:


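# 1. Create disk image on a different, healthy drive (unreadable areas are skipped and logged)
$ sudo ddrescue -d /dev/sdX disk.img recovery.log

# 2. Verify image checksum stability
$ md5sum disk.img
$ md5sum disk.img  # Should match

# 3. Mount the stable image read-only and extract the archive from it
#    (loop0p1 and /mnt/recovery are examples; use the device name losetup prints,
#    or mount /dev/loop0 itself if the disk had no partition table)
$ sudo losetup --find --show --partscan --read-only disk.img
$ sudo mount -o ro /dev/loop0p1 /mnt/recovery
$ tar -xvf /mnt/recovery/large_file.tar --ignore-zeros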
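After imaging, the ddrescue mapfile (recovery.log in the commands above) records which regions were read successfully. The sketch below totals them. It assumes the mapfile layout described in the GNU ddrescue manual, where each data line is `position size status` in hexadecimal, `+` marks rescued areas and `-` marks bad sectors. `mapfile_summary` is a hypothetical helper, and older or newer ddrescue versions may differ in detail.

```shell
# mapfile_summary
# Reads a ddrescue mapfile on stdin and prints byte totals for rescued,
# bad, and not-yet-finished areas.
# The body runs in a subshell, so its variables do not leak into the caller.
mapfile_summary() (
    rescued=0
    bad=0
    pending=0
    while read -r pos size status; do
        # Skip comments and blank lines.
        case $pos in '#'* | '') continue ;; esac
        # Skip the current-position line, whose second field is not a hex size.
        case $size in 0x* | 0X*) ;; *) continue ;; esac
        case $status in
            '+') rescued=$((rescued + size)) ;;
            '-') bad=$((bad + size)) ;;
            *)   pending=$((pending + size)) ;;
        esac
    done
    echo "rescued=$rescued bad=$bad pending=$pending"
)
```

Calling `mapfile_summary < recovery.log` gives a quick idea of how much of the disk made it into the image before you rely on anything extracted from it.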

For critical data storage:

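  • Implement ZFS with checksumming (ZFS checksums every block by default; sha256 is an optional, stronger algorithm). /dev/sdY and /dev/sdZ stand for two empty disks, and creating the pool destroys any data on them:
    
    $ sudo zpool create tank mirror /dev/sdY /dev/sdZ
    $ sudo zfs set checksum=sha256 tank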
    
  • Schedule regular scrubs:
    
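    $ sudo zpool scrub tank    # starts one scrub; run it periodically, e.g. from cron
    $ zpool status -v tank     # shows scrub progress and any checksum errors found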