Optimizing KVM Virtual Machine Performance: CPU Feature Selection for Windows Guests on Ubuntu 12.04



The choice of CPU model in KVM virtualization can significantly impact guest performance. Based on your hardware configuration (Dell R910 with Intel Xeon E7-4870 processors running Ubuntu 12.04), we need to consider several technical factors when selecting the optimal CPU model for Windows guests.


# Current suboptimal configuration example:
/usr/bin/qemu-system-x86_64 -S -M pc-1.0 -cpu qemu32 -enable-kvm -m 4096 -smp 4,sockets=4,cores=1,threads=1 [...]

The CPU models available on your system expose different feature sets to the guest:

  • qemu32/qemu64: Generic models with a minimal feature set; usually the slowest choice
  • kvm32/kvm64: Generic models intended for KVM guests, still with a conservative feature set
  • Nehalem/Penryn: Named Intel models that expose the features of that CPU generation
  • host: Passes the host CPU's features (those KVM supports) through to the guest

When migration isn't a concern (or every host in the pool has identical CPUs), -cpu host typically delivers the best performance by exposing the physical CPU's features to the guest:


# Recommended configuration for maximum performance:
/usr/bin/qemu-system-x86_64 -S -M pc-1.0 -cpu host -enable-kvm -m 4096 \
-smp 4,sockets=4,cores=1,threads=1 [...]

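This approach lets Windows guests use whatever the physical CPU supports, for example:

  • SSE4.1/SSE4.2 and AES-NI on the Westmere-EX-based E7-4870 (this generation has no AVX)
  • The host's real CPU family and model identification, so guest software can pick its optimized code paths

Note that VT-x hardware virtualization comes from -enable-kvm regardless of the CPU model, and guest NUMA topology is configured separately with -numa; neither depends on -cpu host.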
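What the guest can see with -cpu host depends on what the host CPU advertises. The following is a minimal sketch for checking that on the host; has_cpu_flag is a hypothetical helper name, and the flag names (sse4_2, aes, avx) are the spellings Linux uses in /proc/cpuinfo.

```shell
# Sketch: report whether the host CPU advertises a given feature flag.
# has_cpu_flag is a hypothetical helper; the second argument defaults to
# /proc/cpuinfo and exists only so the function can be tried on a saved copy.
has_cpu_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    # Take the first "flags" line, split it into one flag per line,
    # and look for an exact (whole-line) match.
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | cut -d: -f2 | tr ' ' '\n' | grep -qx -- "$flag"
}

for f in sse4_2 aes avx; do
    if has_cpu_flag "$f"; then
        echo "$f: advertised by host"
    else
        echo "$f: not advertised by host"
    fi
done
```

A flag that is missing here cannot be given to the guest by any -cpu setting under KVM.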

Your observed speedup when switching from qemu32 to Nehalem (2 hours 40 minutes down to 40 minutes, about 4x) aligns with expected behavior. Here's why:


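# qemu32 behavior (slow):
- Presents a basic 32-bit CPU to the guest
- Hides modern instruction-set extensions (SSE4.x and later)
- Guest software falls back to generic code paths

# Nehalem behavior (faster):
- Exposes SSSE3 and SSE4.1/SSE4.2 instructions
- Lets guest software select code paths optimized for that CPU generation
- Hyperthreading is not part of the CPU model; guest thread topology comes from -smp threads=N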

While CPU model selection is crucial, consider these complementary optimizations:


# Use VirtIO block storage (as you mentioned planning); the guest needs the VirtIO driver installed:
-device virtio-blk-pci,drive=drive0,bootindex=0 \
-drive file=/path/to/image,if=none,id=drive0,cache=writeback

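# Guest NUMA topology for your 4-socket system (this defines guest nodes, not host pinning;
# cpus= are guest vCPU indices, so this layout needs a 40-vCPU guest, e.g. -smp 40):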
-numa node,nodeid=0,cpus=0-9 \
-numa node,nodeid=1,cpus=10-19 \
-numa node,nodeid=2,cpus=20-29 \
-numa node,nodeid=3,cpus=30-39
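For guests with a different vCPU count, the -numa options can be generated rather than typed by hand. This is a minimal sketch; numa_args is a hypothetical helper, and it assumes the vCPUs divide evenly across nodes.

```shell
# Sketch: print one -numa option per guest node, splitting vCPUs evenly.
# numa_args is a hypothetical helper; arguments are the number of guest
# nodes and the number of vCPUs per node (guest vCPU indices, not host CPUs).
numa_args() {
    nodes=$1
    per_node=$2
    n=0
    while [ "$n" -lt "$nodes" ]; do
        first=$((n * per_node))
        last=$((first + per_node - 1))
        printf '%s\n' "-numa node,nodeid=$n,cpus=$first-$last"
        n=$((n + 1))
    done
}

# Reproduces the four-node, 40-vCPU layout shown above.
numa_args 4 10
```

The total (nodes times vCPUs per node) should match the vCPU count given to -smp.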

To quantitatively measure improvements:


# Inside a Windows guest running Vista or later (winsat is not included in Windows 2003), run from an elevated command prompt:
winsat disk -drive c
winsat cpu -encryption
winsat mem

Compare results across different CPU model configurations to validate your optimization choices.


When running Windows guests on KVM (specifically Ubuntu 12.04 with kernel 3.2.0-25), the choice of CPU model significantly affects performance. Based on real-world testing with Dell R910 servers (Intel Xeon E7-4870 processors), we observed that copying files within a Windows 2003 32-bit guest took 2 hours 40 minutes with -cpu qemu32, but only 40 minutes when switching to -cpu Nehalem.
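To compare runs consistently, it helps to reduce each pair of timings to a single speedup factor. A minimal sketch, using the two durations quoted above; speedup is a hypothetical helper name.

```shell
# Sketch: speedup factor from two wall-clock durations given in minutes.
# speedup is a hypothetical helper; awk is used because POSIX shell
# arithmetic is integer-only.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# 2 h 40 min = 160 min with qemu32, 40 min with Nehalem.
speedup 160 40    # prints 4.0
```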

The available CPU models can be listed using:

qemu-system-x86_64 -cpu '?'
kvm -cpu '?model'

Key models include:

  • qemu32/qemu64: Generic models with a minimal feature set (poorest performance)
  • kvm32/kvm64: Generic KVM-oriented models with a conservative feature set
  • Nehalem: First-generation Intel Core i7 class
  • Penryn/Conroe: Older Intel Core 2 architectures
  • Opteron_G*: AMD processor families

For maximum performance when migration isn't a concern, use:

-cpu host

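This passes the host CPU features that KVM supports through to the guest, enabling:

  • All instruction-set extensions the host has (SSE4.x and AES-NI on the E7-4870; AVX only on hosts that have it)
  • Accurate CPU model identification, so guest software picks its optimized code paths

Hardware acceleration itself (VT-x/AMD-V) comes from -enable-kvm and does not depend on the CPU model.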

For Windows guests specifically:

/usr/bin/qemu-system-x86_64 -S -M pc-1.0 -cpu host -enable-kvm -m 4096 \
-smp 4,sockets=4,cores=1,threads=1 [...]

Additional optimizations:

  1. Always use VirtIO drivers for storage and network
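  2. Enable KSM (Kernel Samepage Merging) if memory density matters; it saves RAM across similar guests at some CPU cost rather than making an individual guest faster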
  3. Consider CPU pinning for latency-sensitive workloads
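Before changing KSM, it is worth checking whether it is already active on the host. A minimal sketch, assuming the standard sysfs control file /sys/kernel/mm/ksm/run; ksm_state is a hypothetical helper name.

```shell
# Sketch: translate the KSM control value into a readable state.
# ksm_state is a hypothetical helper; the optional argument defaults to the
# kernel's control file and exists so the function can be tried on a copy.
ksm_state() {
    run_file=${1:-/sys/kernel/mm/ksm/run}
    [ -r "$run_file" ] || { echo "unavailable"; return 0; }
    case "$(cat "$run_file")" in
        0) echo "stopped" ;;
        1) echo "running" ;;
        2) echo "unmerging" ;;   # ksmd stopped and shared pages being unmerged
        *) echo "unknown" ;;
    esac
}

ksm_state
```

Writing 1 to the same file as root starts page merging, and writing 0 stops it while leaving already merged pages in place.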

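If migration between hosts is required, use a named model that every host supports and add only flags common to all of them:

-cpu Nehalem,+aes

Nehalem already includes SSSE3, SSE4.1, and SSE4.2. AES-NI is available on the Westmere-EX E7-4870, so +aes is safe there, while +avx is not supported by this CPU and should be left out. This provides a balance between performance and compatibility.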
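One way to find flags that are safe to add is to intersect the flag lists of every host in the migration pool. A minimal sketch, assuming each host's flags from /proc/cpuinfo have been saved to a file as a space-separated list; common_flags and the file names in the usage comment are hypothetical.

```shell
# Sketch: print the CPU flags common to two hosts, one per line.
# common_flags is a hypothetical helper; each argument is a file holding
# one host's space-separated flag list (e.g. copied from /proc/cpuinfo).
common_flags() {
    a=$(mktemp) && b=$(mktemp) || return 1
    tr -s ' ' '\n' < "$1" | sed '/^$/d' | sort -u > "$a"
    tr -s ' ' '\n' < "$2" | sed '/^$/d' | sort -u > "$b"
    comm -12 "$a" "$b"     # keep only lines present in both sorted lists
    rm -f "$a" "$b"
}

# Usage (hypothetical file names):
#   common_flags hostA.flags hostB.flags
```

Only flags that appear in this output, and that the chosen named model does not already include, are candidates for +flag options.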

For reliable performance testing:

# Install Phoronix Test Suite (on the Ubuntu host or in a Linux guest)
sudo apt-get install phoronix-test-suite
# Run filesystem benchmark
phoronix-test-suite benchmark pts/iozone