Optimizing KVM Virtual Machine Performance: CPU Feature Selection for Windows Guests on Ubuntu 12.04



The choice of CPU model in KVM virtualization can significantly impact guest performance. Based on your hardware configuration (Dell R910 with Intel Xeon E7-4870 processors running Ubuntu 12.04), we need to consider several technical factors when selecting the optimal CPU model for Windows guests.


# Current suboptimal configuration example:
/usr/bin/qemu-system-x86_64 -S -M pc-1.0 -cpu qemu32 -enable-kvm -m 4096 -smp 4,sockets=4,cores=1,threads=1 [...]

The available CPU models on your system show distinct performance profiles:

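  • qemu32/qemu64: Generic QEMU models with a minimal CPUID feature set, usually the slowest choice
  • kvm32/kvm64: Generic models intended for KVM, still with a conservative feature set
  • Nehalem/Penryn: Named Intel models that advertise that generation's instruction-set extensions
  • host: Exposes the physical CPU's supported features to the guest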

When live migration isn't a concern (or every host in the pool has an identical CPU), -cpu host typically delivers the best performance by exposing all host CPU features that KVM supports to the guest:


# Recommended configuration for maximum performance:
/usr/bin/qemu-system-x86_64 -S -M pc-1.0 -cpu host -enable-kvm -m 4096 \
-smp 4,sockets=4,cores=1,threads=1 [...]

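This approach lets Windows guests see every feature KVM supports on the host CPU. On the E7-4870 (Westmere-EX) that includes:

  • SSE4.1/SSE4.2 and POPCNT (AVX is not available on this processor generation)
  • AES-NI and PCLMULQDQ

VT-x itself is used by KVM on the host whichever CPU model is selected, and guest NUMA topology (relevant on your 4-socket system) is configured separately with -numa.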
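Before relying on -cpu host, it can help to confirm which features the host actually advertises. The following is a minimal sketch that reads the flags line of /proc/cpuinfo; has_flag is a hypothetical helper name, and the list of flags checked is only an example.

```shell
#!/bin/sh
# has_flag FLAG "FLAG LIST": succeed if FLAG appears as a whole word in a
# space-separated flag list (hypothetical helper, not part of QEMU or KVM).
has_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Read the flag list of the first CPU, if /proc/cpuinfo is readable.
host_flags=""
if [ -r /proc/cpuinfo ]; then
    host_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
fi

# Report a few flags that matter for the CPU models discussed here.
for f in sse4_1 sse4_2 aes avx; do
    if has_flag "$f" "$host_flags"; then
        echo "$f: present"
    else
        echo "$f: absent"
    fi
done
```

A flag reported as absent on the host cannot be given to a KVM guest, whichever -cpu model is chosen.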

Your observed improvement when switching from qemu32 to Nehalem (2 hours 40 minutes down to 40 minutes, roughly 4x) aligns with expected behavior. Here's why:


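# qemu32 behavior (slow):
- Presents a basic 32-bit CPU model to the guest
- No SSE4.x or AES-NI flags, so guest software falls back to generic code paths
- Code still runs natively under KVM; only the advertised feature set is limited

# Nehalem behavior (faster):
- Advertises SSSE3, SSE4.1, SSE4.2 and POPCNT
- Guest OS and libraries can select routines tuned for those extensions
- Hyperthreading is unaffected: it is set with -smp threads=, not by the CPU model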

While CPU model selection is crucial, consider these complementary optimizations:


# Enable VirtIO drivers (as you mentioned planning):
-device virtio-blk-pci,drive=drive0,bootindex=0 \
-drive file=/path/to/image,if=none,id=drive0,cache=writeback

# Guest NUMA topology (this does not pin the guest to host nodes; cpus= lists
# guest vCPU indices, which must stay within the -smp count of 4):
-numa node,nodeid=0,cpus=0-1 \
-numa node,nodeid=1,cpus=2-3
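The cpus= values given to -numa are guest vCPU indices, so they have to stay within the vCPU count passed to -smp. As a rough sketch, the options can be generated from the vCPU count and the number of guest nodes; numa_args is a hypothetical helper, and it assumes the vCPU count divides evenly by the node count.

```shell
#!/bin/sh
# numa_args VCPUS NODES: print one "-numa node" option per guest node,
# splitting guest vCPU indices 0..VCPUS-1 into equal contiguous ranges.
# Hypothetical helper; assumes VCPUS is divisible by NODES.
numa_args() {
    vcpus=$1
    nodes=$2
    per=$(( vcpus / nodes ))
    n=0
    while [ "$n" -lt "$nodes" ]; do
        first=$(( n * per ))
        last=$(( first + per - 1 ))
        if [ "$first" -eq "$last" ]; then
            range=$first
        else
            range="$first-$last"
        fi
        printf '%s\n' "-numa node,nodeid=$n,cpus=$range"
        n=$(( n + 1 ))
    done
}

# Example: a 4-vCPU guest split over 2 guest NUMA nodes.
numa_args 4 2
# prints:
#   -numa node,nodeid=0,cpus=0-1
#   -numa node,nodeid=1,cpus=2-3
```

This only describes the topology the guest sees. Keeping the QEMU process on one physical socket is a separate, host-side step.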

To quantitatively measure improvements:


# Inside Windows guest, run from command prompt
# (winsat ships with Windows Vista and later, so it is not available on a Windows 2003 guest):
winsat disk -drive c
winsat cpu -encryption
winsat mem

Compare results across different CPU model configurations to validate your optimization choices.


When running Windows guests on KVM (specifically Ubuntu 12.04 with kernel 3.2.0-25), the choice of CPU model significantly affects performance. Based on real-world testing with Dell R910 servers (Intel Xeon E7-4870 processors), we observed that copying files within a Windows 2003 32-bit guest took 2 hours 40 minutes with -cpu qemu32, but only 40 minutes when switching to -cpu Nehalem.
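The size of that difference is easy to put a number on. A small sketch, using only the two durations quoted above; to_minutes is a hypothetical helper name.

```shell
#!/bin/sh
# to_minutes H:MM: convert a duration to whole minutes (hypothetical helper).
to_minutes() {
    h=${1%%:*}
    m=${1##*:}
    m=${m#0}    # strip one leading zero so "08" is not parsed as octal
    echo $(( h * 60 + ${m:-0} ))
}

slow=$(to_minutes 2:40)   # copy time with -cpu qemu32
fast=$(to_minutes 0:40)   # copy time with -cpu Nehalem

# Integer division is sufficient for these figures.
echo "speedup: $(( slow / fast ))x"
# prints: speedup: 4x
```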

The available CPU models can be listed using:

qemu-system-x86_64 -cpu '?'
kvm -cpu '?model'

Key models include:

  • qemu32/qemu64: Basic emulation (poorest performance)
  • kvm32/kvm64: Basic KVM-optimized
  • Nehalem: Intel Core i7 architecture
  • Penryn/Conroe: Older Intel Core 2 architectures
  • Opteron_G*: AMD processor families

For maximum performance when migration isn't a concern, use:

-cpu host

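This exposes every host CPU feature that KVM supports to the guest, enabling:

  • All instruction-set extensions the host offers (SSE4.x, AES-NI, and AVX on hosts that have it)
  • Guest code paths tuned for the real CPU rather than a generic model

VT-x/AMD-V acceleration itself comes from -enable-kvm and is in use whichever CPU model is selected.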

For Windows guests specifically:

/usr/bin/qemu-system-x86_64 -S -M pc-1.0 -cpu host -enable-kvm -m 4096 \
-smp 4,sockets=4,cores=1,threads=1 [...]

Additional optimizations:

  1. Always use VirtIO drivers for storage and network
  2. Enable KSM (Kernel Samepage Merging) when memory density matters; it saves RAM at some CPU cost rather than speeding guests up
  3. Consider CPU pinning for latency-sensitive workloads
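For the KSM item, the kernel exposes its state under /sys/kernel/mm/ksm. The sketch below only reports that state; ksm_report is a hypothetical helper, and the optional directory argument exists purely so the function can be pointed at a different location.

```shell
#!/bin/sh
# ksm_report [DIR]: print whether KSM is running and how many pages it shares.
# DIR defaults to the kernel's KSM sysfs directory (hypothetical helper).
ksm_report() {
    dir=${1:-/sys/kernel/mm/ksm}
    if [ ! -r "$dir/run" ]; then
        echo "KSM not available"
        return 0
    fi
    run=$(cat "$dir/run")
    sharing=$(cat "$dir/pages_sharing" 2>/dev/null || echo 0)
    if [ "$run" = "1" ]; then
        state=running
    else
        state=stopped
    fi
    echo "KSM $state, pages_sharing=$sharing"
}

ksm_report

# To switch KSM on (as root):
#   echo 1 > /sys/kernel/mm/ksm/run
```

A pages_sharing value that stays near zero suggests the guests have little memory in common, in which case KSM's scanning cost buys nothing.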

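If migration between hosts is required, use a named model and add only flags that every host in the pool supports:

-cpu Nehalem,+aes

Nehalem already includes SSSE3, SSE4.1 and SSE4.2, and AVX is not available on the E7-4870, so AES-NI is the main addition worth making here. This provides a balance between performance and compatibility.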
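Any flag added to a named model has to exist on every host the guest may migrate to. One way to find the safe set is to intersect the flag lists of the hosts. This is a minimal sketch: common_flags is a hypothetical helper, and the two flag lists are made-up examples that would in practice come from the flags line of /proc/cpuinfo on each host.

```shell
#!/bin/sh
# common_flags "LIST_A" "LIST_B": print the flags present in both
# space-separated lists, sorted, on one line (hypothetical helper).
common_flags() {
    for f in $1; do
        case " $2 " in
            *" $f "*) echo "$f" ;;
        esac
    done | LC_ALL=C sort -u | paste -s -d ' ' -
}

# Illustrative flag lists only, not taken from real hosts.
host_a="fpu sse4_1 sse4_2 aes avx"
host_b="fpu sse4_1 sse4_2 aes"

common_flags "$host_a" "$host_b"
# prints: aes fpu sse4_1 sse4_2
```

Here avx drops out because only one host has it, so it should not be added to the -cpu option for a guest that may move between the two.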

For reliable performance testing:

# Install Phoronix Test Suite
sudo apt-get install phoronix-test-suite
# Run filesystem benchmark
phoronix-test-suite benchmark pts/iozone