Diagnosing Inconsistent MD5/SHA1 Checksums on Aging Hard Drives: Read Error Analysis and Solutions

When working with large files (particularly multi-GB archives) on aging storage hardware, you might encounter a troubling scenario where repeated checksum operations yield different results. Here's what I observed with a 5GB tar file on a 32GB HDD (the digests shown are illustrative placeholders):

```
$ md5sum large_file.tar
d41d8cd98f00b204e9800998ecf8427e  large_file.tar
$ md5sum large_file.tar
5eb63bbbe01eeed093cb22bb8f5acdc3  large_file.tar
```

The inconsistent checksums strongly suggest read errors during file access. To confirm this:

Test a different hashing algorithm (if SHA variants show the same inconsistency, the hashing tool is not at fault)
Compare with smaller files (their checksums remain stable)
Monitor disk I/O errors via SMART tools
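
The first two checks can be automated with a small loop. The following is a minimal sketch, not a definitive tool: `hash_repeat` is a hypothetical helper name, and it assumes GNU `md5sum` is installed. It hashes one file several times and reports how many distinct digests it saw.

```shell
#!/bin/sh
# hash_repeat FILE [RUNS]
# Hypothetical helper: hash FILE RUNS times (default 5) and report how many
# distinct MD5 digests were seen. One distinct digest means stable reads.
hash_repeat() {
    file="$1"
    runs="${2:-5}"
    i=0
    digests=""
    while [ "$i" -lt "$runs" ]; do
        # Keep only the digest column of md5sum's output.
        d=$(md5sum "$file" | awk '{print $1}')
        # An empty digest means md5sum could not read the file at all.
        [ -n "$d" ] || return 2
        digests="$digests$d
"
        i=$((i + 1))
    done
    distinct=$(printf '%s' "$digests" | sort -u | wc -l | tr -d ' ')
    echo "runs=$runs distinct=$distinct"
    # Exit status 0 only when every run produced the same digest.
    [ "$distinct" -eq 1 ]
}

# Usage (run manually):
#   hash_repeat large_file.tar 5
```

Repeated reads of a file that fits in RAM may be served from the page cache instead of the disk. On Linux, running `sync; echo 3 > /proc/sys/vm/drop_caches` as root between runs forces the next read to hit the drive again.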

Before replacing hardware, run these diagnostics:

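```
# Check for filesystem errors (unmount first; -n reports problems without writing to the failing disk)
fsck -n /dev/sdX

# SMART overall health check
smartctl -H /dev/sdX

# Bad block scan (read-only by default; bad block numbers go to the file)
badblocks -v /dev/sdX > bad_sectors.txt

# Read the file through dd and hash the stream
dd if=large_file.tar bs=1M | md5sum
```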

If immediate replacement isn't possible, consider:

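```
# Image the drive with ddrescue, retrying bad areas up to 3 times
# (write disk.img and rescue.log to a different, healthy drive;
#  never use the original file as the output, it would be overwritten)
ddrescue -d -r3 /dev/sdX disk.img rescue.log

# Hash the file while showing read progress
pv large_file.tar | md5sum

# Read a single sector straight from the drive (here LBA 0) to test a suspect sector
hdparm --read-sector 0 /dev/sdX
```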

These signs indicate impending drive failure:

Increasing reallocated sector count in SMART data
Timeout errors in kernel logs (dmesg)
Read errors on more than 0.1% of blocks during a full-disk scan (any unreadable block is already a warning sign)
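
To watch the kernel log for the second sign, a simple filter is often enough. This is a rough sketch: `count_io_errors` is a hypothetical helper, and the patterns it matches are a heuristic assumption, since the exact wording of kernel messages varies between kernel versions and drivers.

```shell
#!/bin/sh
# count_io_errors
# Hypothetical helper: read kernel log text on stdin and print the number of
# lines that look like disk read failures. The patterns are assumptions.
count_io_errors() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -ciE 'i/o error|medium error|unrecovered read error' || true
}

# Usage (run manually, usually as root):
#   dmesg | count_io_errors
```

A count that grows between runs while the file is being read points at the drive rather than at the hashing tool.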

In my case, repeated md5sum and sha1sum runs on the 5GB tar file returned different hash values for the identical file. Smaller files consistently produced matching checksums, but the large file, which covers far more of the disk surface than any of them, showed variability.


This behavior strongly suggests underlying disk media problems. Here's how to systematically verify:

SMART Status Check:


```
$ smartctl -a /dev/sdX | grep -i "reallocated\|pending\|uncorrectable"
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       42
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       17
```
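
The raw values in the last column are what matter here. As a sketch, the filter below prints the sector-health attributes whose raw value is non-zero. `smart_flags` is a hypothetical helper; it assumes the usual ATA attribute table layout of `smartctl -A` (raw value in the tenth column) and the common IDs 5, 197 and 198, which not every drive reports.

```shell
#!/bin/sh
# smart_flags
# Hypothetical helper: read `smartctl -A` output on stdin and print
# NAME=RAW for attributes 5 (reallocated), 197 (pending) and
# 198 (offline uncorrectable) when the raw value is greater than zero.
smart_flags() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $10 + 0 > 0 { print $2 "=" $10 }'
}

# Usage (run manually, usually as root):
#   smartctl -A /dev/sdX | smart_flags
```

Empty output means none of these counters has moved yet; any line printed is worth tracking over time to see whether the value keeps rising.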

Badblocks Scan:


```
$ sudo badblocks -v /dev/sdX > bad_blocks.log
Checking blocks 0 to 625142447
Checking for bad blocks (read-only test): 
Pass completed, 27 bad blocks found.
```
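
To relate the scan result to a percentage threshold, count the entries in the log and divide by the number of blocks scanned. This is a minimal sketch: `bad_block_percent` is a hypothetical helper, and it assumes the log holds one bad block number per line, as written by the redirect above.

```shell
#!/bin/sh
# bad_block_percent LOG TOTAL_BLOCKS
# Hypothetical helper: count block numbers in LOG (one per line) and print
# them as a percentage of TOTAL_BLOCKS.
bad_block_percent() {
    log="$1"
    total="$2"
    # grep -c exits non-zero when nothing matches, so mask that status.
    bad=$(grep -c '^[0-9][0-9]*$' "$log" || true)
    awk -v b="$bad" -v t="$total" \
        'BEGIN { printf "%d bad of %d blocks (%.6f%%)\n", b, t, 100 * b / t }'
}

# Usage (run manually): the scan above covered blocks 0 to 625142447,
# which is 625142448 blocks in total.
#   bad_block_percent bad_blocks.log 625142448
```

For the sample scan, 27 bad blocks is a tiny fraction of the disk, yet it was enough to corrupt reads of a large file, so a low percentage alone is not a reason to keep trusting the drive.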

Before condemning the drive, try these verification techniques:

Read Test with dd:
```
$ dd if=large_file.tar bs=1M | md5sum
```

Filesystem-Level Check:


```
$ sudo fsck -vcf /dev/sdX
Pass 1: Checking inodes, blocks, and sizes
Inode 18432 has EXTENTS_FL but invalid i_block...
```

When encountering checksum inconsistencies:


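```
# 1. Create a disk image, skipping unreadable areas
#    (store disk.img and recovery.log on a different, healthy drive)
$ ddrescue -d /dev/sdX disk.img recovery.log

# 2. Verify image checksum stability
$ md5sum disk.img
$ md5sum disk.img  # Should match

# 3. Mount the image read-only (/mnt/recovered is an example mount point;
#    if the image contains a partition table, attach it with
#    losetup --partscan and mount the partition instead)
$ sudo mount -o loop,ro disk.img /mnt/recovered

# 4. Extract the archive from the stable image
$ tar -xvf /mnt/recovered/large_file.tar
```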

For critical data storage:

Implement ZFS with checksumming:


```
$ zpool create -f tank mirror /dev/sda /dev/sdb
$ zfs set checksum=sha256 tank
```

Schedule regular scrubs:
```
$ zpool scrub tank
```
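
The command above starts a single scrub. To make it regular, it can be wrapped in a script and run from cron. The sketch below is one way to do that under stated assumptions: `scrub_pool`, the script path in the crontab comment, and the monthly schedule are all hypothetical choices, not something the setup above requires.

```shell
#!/bin/sh
# scrub_pool [POOL]
# Hypothetical helper: start a scrub on POOL (default: tank), then print the
# pool's status summary. The scrub itself keeps running in the background.
scrub_pool() {
    pool="${1:-tank}"
    zpool scrub "$pool" || return 1
    # -x limits the report to pools that have problems.
    zpool status -x "$pool"
}

# Example crontab entry (assumed script path), 03:00 on the 1st of each month:
#   0 3 1 * * /usr/local/sbin/scrub_pool.sh tank
```

Checking `zpool status tank` after the scrub finishes shows whether any checksum errors were found and repaired from the mirror.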
