Ext4 File Recovery: How to Clone Files with Bad Blocks While Preserving Valid Data


When dealing with storage media failures on Linux systems, encountering files with bad blocks is particularly challenging when:

  • The file spans across both healthy and corrupted sectors
  • Standard copy operations fail with I/O errors
  • You need to preserve the maximum possible valid data

For ext4 filesystems, these packages provide the most effective tools (on Debian/Ubuntu, GNU ddrescue is packaged as gddrescue, while debugfs and filefrag ship with e2fsprogs):

sudo apt-get install gddrescue e2fsprogs smartmontools

The most reliable approach uses GNU ddrescue to map and recover good blocks:

# First pass - quick recovery of good sectors
ddrescue -n -v /path/to/bad/file /path/to/new/file rescue.map

# Second pass - attempt difficult sectors with retries
ddrescue -v -d -r3 /path/to/bad/file /path/to/new/file rescue.map
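
Between the two passes, it is worth checking how much data the mapfile already marks as rescued; assuming the GNU ddrescue package also provides the ddrescuelog utility (it normally does), a quick status check looks like this:

# Show rescued, non-tried, and bad-sector totals recorded in the mapfile
ddrescuelog -t rescue.map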

For more precise ext4 file recovery:

# Identify file's inode number
ls -i /path/to/bad/file

# Use debugfs to extract valid blocks
debugfs -R "dump <inode> /recovered/file" /dev/sdX

When dealing with partially recovered files:

  • For text files: Use the strings command to extract readable content (see the example after this list)
  • For binary files: Attempt opening with specialized viewers (e.g., hex editors)
  • For databases: Most ship repair utilities (e.g., MySQL's myisamchk for MyISAM tables)
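
As a minimal sketch for the text-file case (file names are placeholders), restricting strings to longer sequences filters out most of the binary noise:

# Extract printable sequences of at least 8 characters from a partially recovered file
strings -n 8 /path/to/new/file > readable_content.txt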

Prevent future issues by maintaining block maps:

# Generate block map for important files
filefrag -v /important/file > file.blockmap

# Store copies in multiple locations
cp file.blockmap /safe/location/
scp file.blockmap backup-server:/backups/

Always validate recovered files:

# Compare checksums where possible
sha256sum original_file recovered_file

# For critical binaries, confirm shared library references still resolve
ldd recovered_binary | grep "not found"
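
If an older copy of the file survives elsewhere (a backup or another machine), cmp can give a rough measure of how many bytes actually differ; this sketch assumes both files have the same length and the file names are placeholders:

# Count the number of differing bytes between a reference copy and the recovered file
cmp -l reference_copy recovered_file | wc -l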

When dealing with damaged storage media, file recovery becomes particularly tricky when bad blocks occur mid-file on ext4 filesystems. Unlike complete drive failures where tools like ddrescue can create full images, partial file recovery requires more surgical approaches.

The most effective tools for this scenario are:

1. ddrescue - for low-level data extraction
2. debugfs - for ext4-specific operations
3. badblocks - for sector verification
4. hdparm - for drive health checks

1. Initial Assessment

sudo badblocks -v /dev/sdX > bad_sectors.txt
sudo smartctl -a /dev/sdX
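
If badblocks reports specific bad sectors, debugfs can map them back to the files that own them. Note that badblocks counts 1024-byte blocks by default while ext4 usually uses 4096-byte blocks, so run badblocks with -b 4096 (or convert the numbers) first; the block and inode numbers below are placeholders:

# Find which inode owns a given filesystem block
debugfs -R "icheck 123456" /dev/sdX

# Translate that inode number back to a file path
debugfs -R "ncheck 7890" /dev/sdX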

2. Creating a Recovery Map

Use ddrescue to create a recovery map file first:

ddrescue -n /dev/sdX recovered_file.img recovery.log
ddrescue -r 3 /dev/sdX recovered_file.img recovery.log
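
Imaging the whole drive is often unnecessary when only one file is damaged. If the file's byte range on the device is known (physical block number times block size, from debugfs or filefrag), ddrescue can be limited to that region with its input-position and size options; the offsets below are placeholders:

# Rescue only the region containing the file: -i start offset, -s length (both in bytes)
ddrescue -i 52428800 -s 1048576 -r 3 /dev/sdX file_region.img file_region.map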

3. Targeted File Extraction

For ext4 filesystems, we can use debugfs to locate the exact file blocks:

debugfs /dev/sdX
debugfs: ls -l /path/to/damaged/file
debugfs: stat /path/to/damaged/file
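
For extent-mapped files (the default on modern ext4), the extent tree is often easier to read than the raw stat output; debugfs's dump_extents command should list each extent's logical and physical block ranges:

debugfs: dump_extents /path/to/damaged/file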

4. Partial File Reconstruction

Once we have the block information, we can manually extract good portions:

dd if=/dev/sdX of=good_part1.bin bs=4096 skip=[start_block] count=[good_blocks]
dd if=/dev/sdX of=good_part2.bin bs=4096 skip=[post_bad_block] count=[remaining_good_blocks]
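
The pieces then have to be stitched back together in their original order, with the unreadable region replaced by zero padding of the same size so later offsets stay correct; the two-block gap here is a placeholder:

# Create zero padding matching the size of the unreadable region (here: 2 blocks of 4096 bytes)
dd if=/dev/zero of=bad_gap.bin bs=4096 count=2

# Reassemble the file in order
cat good_part1.bin bad_gap.bin good_part2.bin > reconstructed_file.dat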

For complex cases, consider:

# Force direct disc access, retry bad areas 3 times, and re-trim previously failed blocks
ddrescue -d -r3 -M /dev/sdX output.img mapfile.log

# Combine with filesystem-aware tools
debugfs -R "dump /path/to/file recovered_file" /dev/sdX

Here's a sample bash script to automate partial recovery:

#!/bin/bash
DEVICE="/dev/sdX"
FILEPATH="/important/file.dat"
OUTPUT="recovered_file.dat"

# Get the list of blocks used by the file (path is relative to the filesystem root)
BLOCKS=$(debugfs -R "blocks ${FILEPATH}" ${DEVICE} 2>/dev/null)

# Convert to dd parameters (assumes the file's blocks are contiguous on disk)
START_BLOCK=$(echo ${BLOCKS} | awk '{print $1}')
BLOCK_COUNT=$(echo ${BLOCKS} | wc -w)
BLOCK_SIZE=4096  # Typical ext4 block size; confirm with tune2fs -l

# Recovery attempt: noerror,sync pads unreadable blocks with zeros so offsets are preserved
dd if=${DEVICE} of=${OUTPUT} bs=${BLOCK_SIZE} \
   skip=${START_BLOCK} count=${BLOCK_COUNT} \
   conv=noerror,sync iflag=direct
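
This sketch assumes the file's blocks are contiguous on disk; a fragmented file needs one dd invocation per extent. To run it (the script name is arbitrary):

chmod +x recover_file.sh
sudo ./recover_file.sh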

After recovery, verify file integrity:

file recovered_file.dat
md5sum original_if_available.dat recovered_file.dat

To reduce the risk of needing this procedure again:

  • Implement regular filesystem checks (fsck)
  • Use LVM snapshots for critical files (see the sketch after this list)
  • Monitor SMART attributes regularly
  • Consider using Btrfs or ZFS for better error detection
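
As a minimal sketch of the LVM snapshot idea, assuming a volume group named vg0 containing a logical volume named data (both names are placeholders):

# Create a 1 GiB copy-on-write snapshot of the data volume
sudo lvcreate --snapshot --size 1G --name data_snap /dev/vg0/data

# Back up from the snapshot, then remove it
sudo lvremove /dev/vg0/data_snap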