Optimizing Sparse File Copy Performance: Benchmarking cp, dd, rsync, and virt-sparsify for QCOW2 VM Images


QCOW2 virtual machine images are usually sparse files: they appear large but only consume disk space for blocks that have actually been written. A 200GB VM image might use only 16GB of physical storage, so a copy method that fills in the holes can waste a lot of space.
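To see the difference between apparent and allocated size on a small scale, the sketch below creates a throwaway sparse file and reads both numbers back with stat. The file name and the 100MiB size are arbitrary choices for illustration, and it assumes GNU coreutils on a filesystem that supports holes.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

demo="$workdir/demo.img"   # hypothetical file name, not from the tests

# Create a 100 MiB file that is one big hole (no data is written).
truncate -s 100M "$demo"

# Apparent size: the figure ls -l reports.
apparent=$(stat -c %s "$demo")

# Allocated size: blocks in use times the unit stat counts them in.
allocated=$(( $(stat -c %b "$demo") * $(stat -c %B "$demo") ))

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"
```

The first column of ls -lhs and the output of du report the allocated figure, which is why ls -lhs shows two different sizes for a sparse file.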

Here's a detailed comparison of various copy methods tested on RHEL/CentOS 6.6 x64 systems:

# Original file inspection
ls -lhs srcFile
16G -rw-r--r-- 1 qemu qemu 201G Feb  4 11:50 srcFile

cp --sparse=always srcFile dstFile

Results: 200GB max/26GB actual (10GB bloat), Time: 1:02

dd if=srcFile of=dstFile iflag=direct oflag=direct bs=4M conv=sparse

Results: 200GB max/21GB actual (5GB bloat), Time: 2:02

rsync --ignore-existing -aS srcFile dstFile

Results: 200GB max/26GB actual (10GB bloat), Time: 24:49

virt-sparsify srcFile dstFile

Results: 200GB max/16GB actual (0 bloat), Time: 17:37
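"Bloat" in these results is the allocated size of the copy minus the allocated size of the source. A minimal sketch of how that could be measured is below; the helper names allocated_bytes and bloat_bytes are my own, and the tiny test files stand in for the real images. The copy here is made with plain dd (no conv=sparse) on purpose, so that the holes are written out as zeros.

```shell
#!/bin/sh
set -eu

# Bytes actually allocated on disk for a file (hypothetical helper).
allocated_bytes() {
    echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
}

# Bloat = allocated size of the copy minus allocated size of the source.
bloat_bytes() {
    echo $(( $(allocated_bytes "$2") - $(allocated_bytes "$1") ))
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/src.img"
dst="$workdir/dst.img"

# Source: 8 MiB apparent size, with only the first 1 MiB written.
dd if=/dev/urandom of="$src" bs=1M count=1 conv=fsync 2>/dev/null
truncate -s 8M "$src"

# Naive copy: without conv=sparse, dd writes every hole out as zeros.
dd if="$src" of="$dst" bs=1M conv=fsync 2>/dev/null

bloat=$(bloat_bytes "$src" "$dst")
echo "bloat: $bloat bytes"
```

On filesystems that compress or deduplicate zero blocks the reported bloat may be smaller than the size of the holes.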

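Testing dd with different block sizes on a second image (200GB apparent, 7.3GB allocated) shows a clear trade-off. Each row gives the block size, the elapsed time (m:ss), the percentage reported for that run, and the allocated size of the copy. Elapsed time levels off from about 64K upward, while the allocated size starts growing past 1M; at 32M the copy is both slower and three times larger than the source.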

4K:   5:54.64, 56%, 7.3GB
8K:   3:43.25, 58%, 7.3GB
16K:  2:23.20, 59%, 7.3GB
32K:  1:49.25, 62%, 7.3GB
64K:  1:33.62, 64%, 7.3GB
128K: 1:40.83, 55%, 7.4GB
256K: 1:22.73, 64%, 7.5GB
512K: 1:44.84, 74%, 7.6GB
1M:   1:16.59, 70%, 7.9GB
2M:   1:21.58, 66%, 8.4GB
4M:   1:17.52, 69%, 9.5GB
8M:   1:10.92, 76%, 12GB
16M:  1:17.09, 78%, 16GB
32M:  2:54.10, 90%, 22GB
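A likely explanation for the growth in the last column: GNU dd with conv=sparse skips an output block only when the whole block is zero, so with a large bs any block that contains even a little data gets written out in full. The sketch below reproduces the effect on a small synthetic file. The layout (4KiB of data at the start of every MiB) and the file names are assumptions chosen to make the effect visible, not a model of the real image.

```shell
#!/bin/sh
set -eu

# Bytes actually allocated on disk for a file (hypothetical helper).
allocated_bytes() { echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") )); }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/src.img"

# Synthetic source: 8 MiB apparent, 4 KiB of data at the start of each MiB.
truncate -s 8M "$src"
for i in 0 1 2 3 4 5 6 7; do
    dd if=/dev/urandom of="$src" bs=4K count=1 seek=$(( i * 256 )) \
       conv=notrunc 2>/dev/null
done

# Same copy, once with a small and once with a large block size.
dd if="$src" of="$workdir/small.img" bs=4K conv=sparse,fsync 2>/dev/null
dd if="$src" of="$workdir/large.img" bs=4M conv=sparse,fsync 2>/dev/null

small=$(allocated_bytes "$workdir/small.img")
large=$(allocated_bytes "$workdir/large.img")
echo "bs=4K copy allocates $small bytes"
echo "bs=4M copy allocates $large bytes"
```

Both copies have the same content as the source; only the amount of space allocated for the zero regions differs.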

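For raw speed: Use cp with --sparse=always when time is critical and some bloat is acceptable. It finished in 1:02 but added 10GB.

For balanced performance: dd with conv=sparse and bs=4M took about twice as long as cp (2:02) but halved the bloat to 5GB. The block size tests suggest a smaller block size such as 1M reduces bloat further at little cost in time.

For the smallest copy: virt-sparsify was the only method tested that produced a copy with no bloat (16GB allocated, the same as the source), at the cost of a 17:37 runtime.

For network transfers: rsync -S handles sparse files and works over a network, but it was the slowest method tested (24:49) and bloated as much as cp.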

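When the copy has to be taken quickly but the final file should be as small as possible, consider a two-phase approach:

# Phase 1: Fast initial copy
cp --sparse=always srcFile dstFile.tmp

# Phase 2: Optimize sparse structure, and remove the
# temporary copy only if virt-sparsify succeeded
virt-sparsify dstFile.tmp dstFile && rm dstFile.tmp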

The rest of this post goes through the same tests method by method. The goal in each case is to copy a sparse QCOW2 image on RHEL/CentOS 6.6 without filling in its holes and without the copy taking too long.

I tested several common copy methods on a 200GB QCOW2 image (16GB allocated) to evaluate both speed and storage efficiency:

# Original file stats
ls -lhs srcFile 
16G -rw-r--r-- 1 qemu qemu 201G Feb  4 11:50 srcFile

1. cp Command - Fastest Performance

cp --sparse=always srcFile dstFile
# Results:
# - Copied as 200GB max/26GB actual
# - 10GB bloat
# - Time: 1:02 (mm:ss)

2. dd Command - Best Balance

dd if=srcFile of=dstFile iflag=direct oflag=direct bs=4M conv=sparse
# Results:
# - Copied as 200GB max/21GB actual  
# - 5GB bloat
# - Time: 2:02 (mm:ss)

3. virt-sparsify - Perfect Size Preservation

virt-sparsify srcFile dstFile
# Results:
# - Perfect 200GB max/16GB actual copy
# - 0 bloat
# - Time: 17:37 (mm:ss)

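Testing different block sizes with dd showed significant variation in both speed and storage efficiency. The best block size appears to be between 64K and 1M: below 64K the copy slows down sharply (5:54 at 4K), and above 1M the copy starts to bloat.

# Block size tests on a 200GB file with 7.3GB allocated
# (block size: elapsed time, percentage, allocated size of the copy):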
64K:  1:33.62, 64%, 7.3GB
1M:   1:16.59, 70%, 7.9GB
4M:   1:17.52, 69%, 9.5GB
16M:  1:17.09, 78%, 16GB
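For anyone who wants to repeat this kind of comparison, here is a rough sketch of a timing loop. It is not the script I used for the numbers above: the source file is a small synthetic stand-in, the percentage column is left out, and the direct I/O flags are omitted because not every filesystem accepts O_DIRECT. It assumes GNU date (for %N) and GNU dd.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/src.img"

# Small stand-in for the VM image: 16 MiB apparent, 1 MiB of data at 8 MiB.
truncate -s 16M "$src"
dd if=/dev/urandom of="$src" bs=1M count=1 seek=8 conv=notrunc 2>/dev/null

rows=0
for bs in 64K 1M 4M 16M; do
    dst="$workdir/dst-$bs.img"
    start=$(date +%s%N)                       # nanoseconds since the epoch
    dd if="$src" of="$dst" bs="$bs" conv=sparse,fsync 2>/dev/null
    end=$(date +%s%N)
    elapsed_ms=$(( (end - start) / 1000000 ))
    allocated_kib=$(du -k "$dst" | cut -f1)   # allocated size, not apparent
    echo "$bs: ${elapsed_ms} ms, ${allocated_kib} KiB allocated"
    rows=$(( rows + 1 ))
done
```

To test a real image, point src at the image instead of generating it, and drop caches between runs if the timings should not benefit from earlier reads.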

Based on these results, here's my suggested approach for different scenarios:

# For fastest copy with moderate bloat:
cp --sparse=always srcFile dstFile

# For best balance between speed and size:
dd if=srcFile of=dstFile bs=1M conv=sparse iflag=direct oflag=direct

# For perfect sparse preservation:
virt-sparsify srcFile dstFile

When using dd, I recommend starting with bs=1M, which gave good speed with little storage overhead (7.9GB against 7.3GB in the source). The direct I/O flags (iflag=direct and oflag=direct, passed as separate operands) help keep performance consistent by bypassing the page cache.
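Whichever block size is used, cp, dd, and rsync copy the file's bytes as they are, so the copy should be byte-identical to the source no matter how the holes were handled. A quick sanity check along those lines is sketched below with cmp on a small synthetic file (the file names and sizes are placeholders). I would not expect this check to pass for virt-sparsify output, since it writes a new image rather than a byte-for-byte copy.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/src.img"
dst="$workdir/dst.img"

# Synthetic source: 8 MiB apparent, 64 KiB of data at offset 640 KiB.
truncate -s 8M "$src"
dd if=/dev/urandom of="$src" bs=64K count=1 seek=10 conv=notrunc 2>/dev/null

# Sparse copy with the suggested starting block size.
dd if="$src" of="$dst" bs=1M conv=sparse 2>/dev/null

# cmp -s is silent and only sets the exit status.
if cmp -s "$src" "$dst"; then
    verified=yes
else
    verified=no
fi

echo "content identical: $verified"
echo "apparent size:     $(stat -c %s "$dst") bytes"
echo "allocated:         $(du -k "$dst" | cut -f1) KiB"
```

On a 200GB image cmp has to read both files in full, so it is best kept for cases where the extra read time is acceptable.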

For automated systems where storage efficiency is critical, virt-sparsify is the clear winner despite its longer runtime: it was the only method whose output matched the source's 16GB allocation.