Optimizing dd Command Performance: The Impact of Block Size (bs) on Read/Write Throughput



The `bs` parameter in `dd` determines the number of bytes transferred in a single operation. Through extensive benchmarking across different storage devices (MMC and HDD), we observe significant performance variations based on block size selection.

# MMC Card (SanDisk Extreme Pro)
dd if=/dev/sdc of=/dev/null bs=4 count=250000000
→ 12MB/s (250M operations)
dd if=/dev/sdc of=/dev/null bs=1M count=1000
→ 14.1MB/s (1k operations)

Storage controllers have native transfer sizes (typically 4KB for modern SSDs); you can query them as shown after this list. Matching `bs` to these values reduces:

  • System call overhead (visible in sys time metrics)
  • Interrupt coalescing requirements
  • DMA setup operations
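
To check a device's native sizes, the kernel exposes them through sysfs and blockdev. A minimal sketch, assuming the device is /dev/sda (substitute your own):

# Logical and physical sector sizes as seen by the kernel
cat /sys/block/sda/queue/logical_block_size
cat /sys/block/sda/queue/physical_block_size
# Same information via blockdev (requires root)
blockdev --getss --getpbsz /dev/sda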

For general use:

# Safe defaults for most modern systems
dd if=/dev/sdX of=/dev/sdY bs=4M iflag=direct oflag=direct

When maximum throughput matters:

# Performance test with varying block sizes
# Note: count=1K means 1024 blocks per run, so the total written scales with bs;
# compare the MB/s figure dd reports, not the elapsed time
for bs in 512 1K 4K 64K 1M 4M; do
  echo "Testing bs=$bs"
  dd if=/dev/zero of=testfile bs=$bs count=1K oflag=direct
  sync; rm testfile
done

Combine block size with:

  • `oflag=direct`: Bypasses the page cache on writes
  • `conv=fdatasync`: Ensures data reaches physical storage before dd exits
  • `iflag=nocache`: Requests that already-read data be dropped from the cache

Example production-grade backup command:

dd if=/dev/sda bs=4M iflag=fullblock | 
pv -s $(blockdev --getsize64 /dev/sda) | 
dd of=/dev/sdb bs=4M oflag=direct conv=fdatasync

While default 512B blocks work, optimized sizes (typically 1MB-4MB) can yield 2-3x throughput improvements. The exact sweet spot requires benchmarking on your specific hardware stack.


The dd command's block size (bs) parameter determines how much data is read/written in a single operation. Through extensive testing across different hardware configurations, we've observed significant performance variations:

# Sample benchmark command structure
time dd if=/dev/sdX of=/dev/null bs=Y count=Z

Our tests reveal several consistent patterns across storage devices:

MMC Card Performance

  • bs=1: 3.1MB/s (worst case)
  • bs=4: 12MB/s (surprisingly close to maximum throughput)
  • bs≥5: 14.1-14.3MB/s (peak performance)

HDD Performance

  • bs=10: 29.9MB/s
  • bs=512: 95.3MB/s (default value)
  • bs=1M: 97.6MB/s (optimal for this hardware)

Smaller block sizes significantly increase system CPU time:

# bs=1 (HDD)
real    5m41.463s
user    0m56.000s
sys     4m44.340s

# bs=1M (HDD)
real    0m10.792s
user    0m0.008s
sys     0m1.144s
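
That sys time is dominated by the number of read()/write() system calls dd issues. A quick illustrative check, separate from the benchmarks above, is strace's call summary:

# Compare system-call counts for small vs. large blocks (64 KiB copied each time)
strace -c -e trace=read,write dd if=/dev/zero of=/dev/null bs=1 count=64K
strace -c -e trace=read,write dd if=/dev/zero of=/dev/null bs=64K count=1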

Based on these benchmarks, we recommend:

  • For quick operations: Use bs=1M as a safe default
  • For maximum throughput: Test bs values from 4K to 16M
  • When copying between devices: Match input and output block sizes (ibs and obs), or simply set bs for both (see the sketch below)
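
A hypothetical example for mismatched devices, assuming /dev/sdX reads best in smaller chunks while /dev/sdY writes best in larger ones (names and sizes are placeholders):

# Read in 64K blocks, re-block and write in 4M blocks
dd if=/dev/sdX of=/dev/sdY ibs=64K obs=4M iflag=fullblock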

This script helps find optimal block size for your hardware:

#!/bin/bash
# Finds a good block size by writing a 1 GiB test file to the target device.
# WARNING: writes raw data to $DEVICE and will destroy anything stored on it.
DEVICE=$1
TEST_FILE=/tmp/dd_test.img

if [ -z "$DEVICE" ]; then
    echo "Usage: $0 <block device>" >&2
    exit 1
fi

# Create test file
dd if=/dev/zero of="$TEST_FILE" bs=1M count=1024

echo "Testing optimal block size for $DEVICE"
echo "Block Size,Speed"

for bs in 512 1K 4K 16K 64K 256K 1M 4M 16M
do
    sync
    echo 3 > /proc/sys/vm/drop_caches   # requires root
    # dd prints its throughput on stderr; conv=fdatasync flushes to disk before timing stops
    speed=$(dd if="$TEST_FILE" of="$DEVICE" bs=$bs conv=fdatasync 2>&1 | grep -o '[0-9.,]\+ [kMG]B/s')
    echo "$bs,$speed"
done

rm "$TEST_FILE"

The performance impact stems from:

  • System call overhead (read/write operations)
  • Device I/O scheduler behavior
  • Filesystem block size alignment
  • DMA buffer sizes
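
A few quick ways to inspect these factors (device and mount point are placeholders; the tune2fs line assumes an ext2/3/4 filesystem):

# Active I/O scheduler for the device
cat /sys/block/sda/queue/scheduler
# Filesystem block size on ext2/3/4 (requires root)
tune2fs -l /dev/sda1 | grep 'Block size'
# Block size of any mounted filesystem
stat -f -c 'Block size: %s' /mnt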