When running virtualized environments, we typically expect near-native performance from KVM guests. However, your benchmark results showing 30-70% slower I/O in the guest than on the host indicate a configuration issue worth investigating.
Your current setup uses:
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/dev/vgkvmnode/lv2'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
Several factors could contribute to the performance gap:
- Cache Settings: Missing cache configuration in the disk definition
- IO Threads: Not using dedicated I/O threads
- Virtio Queue Depth: Default queue settings might be suboptimal
- NUMA Alignment: Potential NUMA node misalignment
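Before changing anything, it can help to confirm which of these actually apply. A minimal check, using guest1 as a hypothetical domain name:
# Does the disk driver line carry any cache/io/iothread settings at all?
virsh dumpxml guest1 | grep "driver name='qemu'"
# Host NUMA topology and the guest's current NUMA placement
numactl --hardware
virsh numatune guest1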
First, modify your disk configuration to include cache settings and an I/O thread. Note that in libvirt <iothreads> is a domain-level element rather than a child of <disk>; the disk's <driver> then references it through its iothread attribute (which requires a reasonably recent libvirt):
<domain type='kvm'>
...
<iothreads>1</iothreads>
...
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
<source file='/dev/vgkvmnode/lv2'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
...
</domain>
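To apply the change, edit the persistent definition and do a full shutdown/start cycle; virsh validates the XML on save (guest1 is again a placeholder name):
virsh edit guest1
virsh shutdown guest1
virsh start guest1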
Add these parameters to your guest's XML configuration for better performance (note that the <hyperv> enlightenments only benefit Windows guests and can be omitted for a Linux guest such as CentOS):
<domain type='kvm'>
...
<features>
<acpi/>
<apic/>
<hyperv>
<relaxed state='on'/>
<vapic state='on'/>
<spinlocks state='on' retries='8191'/>
</hyperv>
</features>
<cpu mode='host-passthrough' check='none'/>
<memoryBacking>
<hugepages/>
</memoryBacking>
...
</domain>
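The <hugepages/> backing only takes effect if the host has static hugepages reserved before the guest starts. A rough sketch for a hypothetical 4 GiB guest using 2 MiB pages; adjust the count to your guest's memory size:
# Reserve 2048 x 2 MiB = 4 GiB of hugepages on the host
echo 2048 > /proc/sys/vm/nr_hugepages
# Verify the reservation
grep HugePages /proc/meminfo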
On the host machine, consider these optimizations (the queue-depth tweak is applied inside the guest):
# Increase the block-layer request queue depth for the virtio disk (run inside the guest; replace vdX with the real device, e.g. vda)
echo 256 > /sys/block/vdX/queue/nr_requests
# Adjust swappiness
sysctl -w vm.swappiness=10
# Disable transparent hugepages system-wide (guest memory is backed by the static hugepages reserved above)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
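The host's I/O scheduler is also worth checking (the results at the end of this post mention it). A sketch assuming the RAID10 members are sda through sdd, which may differ on your system:
# Show the current scheduler for one member
cat /sys/block/sda/queue/scheduler
# Switch all assumed members to deadline
for d in sda sdb sdc sdd; do echo deadline > /sys/block/$d/queue/scheduler; done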
After applying these changes, rerun your benchmarks to verify improvements. The most critical metrics to watch are:
- Single-threaded read/write operations
- Concurrent I/O performance
- Latency under load
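For the latency-under-load metric, watching the await column in iostat while a benchmark runs gives a quick read (iostat comes from the sysstat package):
# In a second terminal during the benchmark; run in the guest and again on the host to compare
iostat -x 1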
If performance remains suboptimal, consider these alternatives:
# For raw performance (writeback is faster, but unflushed data can be lost if the host crashes):
<driver name='qemu' type='raw' cache='writeback'/>
# For safety with decent performance (only applicable if the backing image is actually qcow2):
<driver name='qemu' type='qcow2' cache='writethrough'/>
# For LVM passthrough (the LV is a block device, so type='block' avoids the file-backend overhead of the original definition):
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<source dev='/dev/vgkvmnode/lv2'/>
<target dev='vda' bus='virtio'/>
</disk>
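If you switch to the block-device definition, it is worth confirming the LV path on the host first so the guest does not fail to start:
# List logical volumes in the volume group used by the guest
lvs vgkvmnode
ls -l /dev/vgkvmnode/lv2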
Verify your virtio drivers are properly installed in the guest:
# On guest system:
lsmod | grep virtio
modprobe -a virtio_blk virtio_net virtio_pci virtio_ring virtio
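A further sanity check that the disk is actually attached through virtio rather than an emulated controller (lspci comes from the pciutils package):
lspci | grep -i virtio
ls /dev/vd*    # virtio disks show up as vdX; only sdX/hdX would suggest emulation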
When comparing raw storage performance between my KVM host and guest systems, I consistently observed 30-70% slower I/O operations in the guest environment. The host runs CentOS 6.3 with four 1TB SATA HDDs in software RAID10 configuration, while the guest uses LVM storage on virtio.
I conducted comprehensive tests using both iozone and dd to measure different aspects of storage performance:
# Single process read test
dd if=/dev/vgkvmnode/lv2 of=/dev/null bs=1M count=1024 iflag=direct
# Concurrent read test
for i in {1..4}; do
dd if=/dev/vgkvmnode/lv2 of=/dev/null bs=1M count=1024 iflag=direct skip=$((i*1024)) &
done
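The iozone runs followed the same pattern of direct, sequential I/O; one plausible invocation (file size and record size here are illustrative, not the exact values used):
# Sequential write/rewrite (-i 0) and read/reread (-i 1), 1 GiB file, 1 MiB records, O_DIRECT
iozone -i 0 -i 1 -s 1g -r 1m -I -f /mnt/test/iozone.tmp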
The current virtio disk configuration in libvirt XML shows:
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/dev/vgkvmnode/lv2'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
After extensive testing, these areas showed potential for improvement:
- Virtio Queue Depth:
<driver name='qemu' type='raw' queues='4' ioeventfd='on'/>
- CPU Pinning:
<vcpu placement='static' cpuset='0-3'>4</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='0'/>
<vcpupin vcpu='1' cpuset='1'/>
</cputune>
- Cache Mode:
<driver name='qemu' type='raw' cache='none'/>
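To confirm these settings actually took effect after the guest was restarted (guest1 is a placeholder for the real domain name):
# Physical CPU each vCPU is currently bound to
virsh vcpuinfo guest1
# Live driver line, including queues/ioeventfd/cache
virsh dumpxml guest1 | grep "driver name='qemu'"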
For enterprise environments, these kernel parameters improved performance:
# Add to /etc/sysctl.conf
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10
vm.swappiness = 10
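These values take effect on the next boot; to load them immediately:
# Re-read /etc/sysctl.conf without rebooting
sysctl -p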
After implementing these changes, concurrent write performance improved from 21.5MB/s to 38.7MB/s in the guest environment. The key was balancing virtio queues with available CPU cores and optimizing the host's I/O scheduler.