Severe KVM Disk Performance Issues: Diagnosing and Fixing Qcow2+Virtio Slow Writes (0.5-3MB/s Case Study)


During my KVM guest setup, I encountered bizarre disk performance issues where a RAID-backed qcow2 volume showed:

  • Host performance: 120MB/s (measured via dd if=/dev/zero of=/host/test bs=64k count=16000 oflag=direct)
  • Guest performance: Initially 0.5-3MB/s (same test inside VM)

Host:
- Ubuntu 12.04 LTS
- qemu-kvm 1.0+noroms-0ubuntu13
- libvirt 0.9.8-2ubuntu17.1
- Deadline IO scheduler
- Ext4 on RAID1 (4k aligned)

Guest:
- Virtio disk driver
- noop scheduler
- 4GB RAM
- Minimal Ubuntu 12.04 install
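
To confirm the guest really was using the virtio disk driver listed above (rather than falling back to emulated IDE), a quick check inside the guest looks roughly like this; the device names are the usual virtio defaults and may differ on other setups:

# Inside the guest: verify the disk is attached via virtio-blk
lspci | grep -i virtio       # should list a "Virtio block device"
lsmod | grep virtio          # virtio_blk, virtio_pci etc. (if built as modules)
ls -d /sys/block/vd*         # virtio disks appear as vda, vdb, ...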

Initial symptoms: The performance would mysteriously fluctuate between terrible (0.5MB/s) and acceptable (26MB/s) without configuration changes.

Key observations:

  1. Writeback caching masked the issue (but is unsafe for production)
  2. Virtio configuration appeared correct:
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/path/to/image.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
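
A quick way to double-check that libvirt actually passes cache='none' through to qemu is to look at the running process and the live domain XML; 'vm1' below is a placeholder for the real domain name:

# On the host: confirm the effective cache mode (vm1 is a placeholder)
ps -ef | grep [q]emu                      # the -drive argument should contain cache=none
virsh dumpxml vm1 | grep -A2 '<driver'    # the active XML should match the config above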
    

After extensive testing, the permanent fix involved abandoning qcow2 entirely:

# Host preparation:
pvcreate /dev/md0
vgcreate vg_kvm /dev/md0
lvcreate -L 20G -n lv_vm1 vg_kvm

# Libvirt XML configuration:
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/vg_kvm/lv_vm1'/>
  <target dev='vda' bus='virtio'/>
</disk>
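
If an existing qcow2 image has to be carried over to the new logical volume, the contents can be copied with qemu-img while the guest is shut down; the paths below are the example ones used above:

# Copy the old qcow2 contents onto the raw LV (guest powered off)
qemu-img convert -f qcow2 -O raw /path/to/image.qcow2 /dev/vg_kvm/lv_vm1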

Measured write speeds:

  Configuration              Write Speed     Safety
  qcow2 + virtio (default)   0.5-26 MB/s     Safe
  qcow2 + writeback          80 MB/s         Risk of corruption
  LVM raw + virtio           135 MB/s        Safe
  • For production systems requiring both performance and safety, LVM+raw outperforms qcow2
  • Always verify storage performance at both host and guest levels (a host-side check is sketched after this list)
  • Consider using direct benchmarks: fio --filename=/testfile --size=1G --direct=1 --rw=randwrite --bs=4k --ioengine=libaio --iodepth=64 --runtime=60 --numjobs=4 --time_based --group_reporting --name=test
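
For the host-side half of that verification, libvirt can report per-device I/O counters for a running guest without touching the guest at all; the domain name 'vm1' is again a placeholder:

# On the host: cumulative I/O counters for the guest's vda (vm1 is a placeholder)
virsh domblkstat vm1 vda
# Snapshot the wr_bytes value before and after a benchmark run inside the guest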

Final performance after migration to LVM raw volumes matched the underlying RAID array's capabilities, proving the bottleneck was indeed the qcow2 implementation in this specific environment.


To walk through the diagnosis in more detail: the qcow2 images showed write speeds 40-240x slower than the underlying RAID array. Here's how I narrowed the problem down and ultimately solved it.


# Host performance benchmark (120MB/s expected)
dd if=/dev/zero of=/mnt/kvm_storage/testfile oflag=direct bs=64k count=16000

# Guest performance benchmark (initially 0.5-3MB/s)
time dd if=/dev/zero of=/tmp/test oflag=direct bs=64k count=16000

My setup included:

  • Ubuntu 12.04 LTS on both host and guest
  • Virtio drivers properly installed
  • 4GB RAM allocation to guest
  • Host using deadline scheduler, guest using noop (see the sysfs check after this list)
  • qcow2 images on mirrored RAID1 array
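
For reference, the schedulers mentioned above can be checked and changed at runtime through sysfs; sda and vda are assumed device names:

# On the host: confirm/set the deadline scheduler
cat /sys/block/sda/queue/scheduler
echo deadline > /sys/block/sda/queue/scheduler

# Inside the guest: confirm/set noop for the virtio disk
cat /sys/block/vda/queue/scheduler
echo noop > /sys/block/vda/queue/scheduler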

Several factors significantly impacted performance:


# Cache modes tested:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='writeback'/>
  <!-- vs cache='none' -->
</disk>

# IO modes tested:
<driver name='qemu' type='qcow2' io='native' cache='none'/>
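
Note that changes to the <driver> line only take effect after the domain definition is updated and the guest is fully powered off and started again; a reboot from inside the guest is typically not enough, since the qemu process keeps running with the old options. A rough sequence, with 'vm1' as a placeholder domain:

# Apply a cache/io mode change (vm1 is a placeholder domain name)
virsh edit vm1            # adjust the <driver ...> attributes
virsh shutdown vm1        # wait for the domain to stop ('virsh destroy vm1' forces it)
virsh start vm1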

The solution came from switching to raw LVM volumes:


# Create raw volume:
lvcreate -L 20G -n vm_disk vg_kvm

# Libvirt configuration:
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/vg_kvm/vm_disk'/>
  <target dev='vda' bus='virtio'/>
</disk>
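
Before relying on new numbers, it's worth confirming from the host that vda is really backed by the logical volume now rather than the old image file; 'vm1' is a placeholder:

# Confirm the disk source after the switch (vm1 is a placeholder)
virsh dumpxml vm1 | grep -A3 '<disk'
# Expect type='block' and <source dev='/dev/vg_kvm/vm_disk'/>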

Benchmark results:

  Configuration              Write Speed
  qcow2 + writeback cache    26.6 MB/s
  qcow2 + no cache           0.5-3 MB/s
  RAW + virtio + no cache    135 MB/s

While qcow2 offers convenient features like snapshots, its performance overhead can be substantial. For I/O intensive workloads:

  • Consider raw volumes when snapshots aren't critical
  • Always test with oflag=direct to bypass page cache
  • Monitor disk I/O queues with iostat -x 1 during tests (example after this list)
  • Try different cache modes before abandoning a format
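
To illustrate the iostat suggestion above: run it on the host while the guest benchmark is going, and watch the queue and latency columns (the sysstat package is assumed to be installed):

# On the host, during a guest benchmark run
iostat -x 1
# Watch avgqu-sz (queue depth), await (ms per request) and %util on the md0
# and member-disk rows; high await with low write throughput points at the
# storage path rather than the guest workload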