How to Span a Single Large-Scale VM Across Multiple Commodity Servers for High-Performance Database Deployment



When facing resource-intensive workloads like 32-core database servers with 64GB RAM requirements, the physical limitations of commodity hardware become apparent. Traditional virtualization solutions like VMware ESXi or KVM focus on partitioning single physical servers - not aggregating multiple hosts into unified virtual resources.

After testing several solutions, here are three viable approaches with different tradeoffs. As a reference point, this is the resource envelope a single VM would need, expressed as a libvirt definition:

<!-- Sample libvirt XML snippet for the VM resource definition -->
<domain type='kvm'>
  <vcpu placement='static'>32</vcpu>
  <memory unit='GiB'>64</memory>
  <cpu mode='host-passthrough' check='none'/>
</domain>

Technologies like ScaleMP's vSMP Foundation create a single-system image across nodes:

  • Pros: Presents unified memory space (NUMA-aware), supports Oracle licensing
  • Cons: Proprietary, requires kernel modules

In-memory data grids like Infinispan or Hazelcast can pool RAM across nodes at the application level:

// Java example using Hazelcast IMDG (4.x/5.x APIs)
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

Config config = new Config();
config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(true);  // multicast member discovery
HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);
IMap<String, String> clusterMap = instance.getMap("oracleBuffer");          // map distributed across the cluster

The third approach is to scale out with Kubernetes instead of spanning a single VM: stateless components run as ordinary Deployments, while the database side combines (a minimal manifest sketch follows the list):

  • StatefulSets for database pods
  • Operators for Oracle deployments
  • Network block devices for storage
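
If the scale-out route is taken, a minimal StatefulSet for the database pod might look like the sketch below; the image reference, resource sizes, and names are placeholders rather than a tested Oracle deployment:

# Hypothetical StatefulSet sketch for a database pod; adjust resources and storage to your cluster
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: oracle-db
spec:
  serviceName: oracle-db
  replicas: 1
  selector:
    matchLabels:
      app: oracle-db
  template:
    metadata:
      labels:
        app: oracle-db
    spec:
      containers:
      - name: oracle
        image: your-registry/oracle-db:19c   # placeholder image reference
        resources:
          requests:
            cpu: "8"
            memory: 16Gi
        volumeMounts:
        - name: data
          mountPath: /opt/oracle/oradata     # path is illustrative
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 500Gi
EOF

In practice an Operator would own a manifest like this and handle provisioning and failover, which is what the second bullet above refers to.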

Oracle's core-factor rules treat vCPUs differently across these technologies. Physical host aggregation typically requires licensing all underlying cores (spanning eight dual-socket hosts means licensing every core in all sixteen sockets, not just the 32 vCPUs the VM actually uses), while containerization may allow more granular licensing.

Rough benchmark results, with a single physical server as the baseline:

Approach          TPC-C Score   Latency
Physical Server   12,500        8 ms
vSMP Foundation   10,200        14 ms
K8s Cluster       8,700         22 ms

For the described Oracle deployment, the recommended path is (a host-level sanity check is sketched after the list):

  1. Deploy vSMP Foundation across 8 dual-socket hosts
  2. Enable NUMA in the BIOS (disable node interleaving) and leave kernel NUMA balancing on
  3. Use RDMA networking (RoCE v2) for memory coherence
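
Before layering vSMP on top, it is worth sanity-checking the NUMA and RDMA plumbing on each host; a rough checklist using standard Linux tools (device names are illustrative):

# Confirm the NUMA topology the BIOS exposes and that kernel NUMA balancing is on
numactl --hardware
cat /proc/sys/kernel/numa_balancing    # 1 = automatic NUMA balancing enabled

# Confirm the RDMA-capable adapters are visible and their links are up (RoCE v2)
ibv_devices
ibstat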

When dealing with resource-intensive database workloads like Oracle on commodity hardware, we often face a fundamental mismatch between application requirements and available infrastructure. The traditional approach would require expensive SMP servers when what we actually have is a cluster of smaller machines.

While no solution provides perfect transparent spanning of a single VM across multiple physical nodes (as you'd get with a true SMP system), several technologies offer partial solutions:


// Conceptual architecture for distributed VM components
interface DistributedVM {
    void aggregateCPU();
    void aggregateMemory();
    void synchronizeState();
    void handleNodeFailure();
}

1. ScaleMP vSMP Foundation:
This commercial product provides the closest match to your requirements, using InfiniBand to combine multiple x86 servers into a single virtual system. An illustrative configuration:


# Sample vSMP configuration for Oracle DB
vsmpctl --create large_oracle_vm \
    --nodes node1,node2,node3,node4 \
    --cpu 32 \
    --mem 64G \
    --network ib \
    --storage shared_san

2. Distributed Shared Memory Systems:
Technologies like Memcached or Redis can approximate a unified memory space at the application level (the database engine itself still sees only local RAM):


// Pseudocode for distributed memory access
function readMemory(address) {
    if (local_cache.contains(address)) {
        return local_cache.get(address);       // local hit: no network round trip
    } else {
        node = consistent_hash(address);       // map the address to its owning node
        value = network_fetch(node, address);  // remote read over the interconnect
        local_cache.put(address, value);       // keep a local copy for later reads
        return value;
    }
}

3. Database-Native Clustering:
For database workloads specifically, consider:

  • Oracle RAC (Real Application Clusters) - native clustering for Oracle
  • PostgreSQL with the Citus extension - sharded relational database (see the sketch after this list)
  • Distributed SQL engines like VoltDB or NuoDB
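
To give a flavour of the Citus route, distributing a large table across worker nodes is a single SQL call on the coordinator; the host, database, and table names below are placeholders:

# Hypothetical Citus example: shard the orders table across workers by customer_id
psql -h citus-coordinator -d appdb -c \
    "SELECT create_distributed_table('orders', 'customer_id');"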

When evaluating these solutions, pay special attention to the factors below (a quick measurement sketch follows the table):

Factor                     Impact
Memory coherence latency   Critical for DB performance
NUMA effects               Can create hotspots
Network bandwidth          Minimum 40 Gbps recommended
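
Before committing to a design, it is worth measuring what the interconnect actually delivers between two candidate nodes; a rough sketch using standard tools (hostnames are placeholders):

# Round-trip latency between node1 and node2
ping -c 100 -i 0.2 node2 | tail -1

# Aggregate TCP bandwidth: start the server on node2, then drive it from node1
iperf3 -s -D                  # on node2: run the server as a daemon
iperf3 -c node2 -P 4 -t 30    # on node1: 4 parallel streams for 30 seconds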

Here's how you might stand up a scale-out proof of concept using Linux and KVM, with one VM per node and the pieces tied together at the application layer using one of the options above:


# On each physical node, launch one VM (4 nodes x 8 vCPU / 16 GB each
# gives the 32 vCPU / 64 GB total). Each VM needs its own disk image -
# sharing one qcow2 between running VMs will corrupt it.
NODE_ID=1   # set to 1..4 on the respective node
qemu-system-x86_64 \
    -enable-kvm \
    -cpu host \
    -m 16G \
    -smp 8 \
    -netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
    -device virtio-net-pci,netdev=net0 \
    -drive file=/shared_storage/vm_disk_node${NODE_ID}.qcow2,if=virtio

# Test RDMA connectivity between nodes (run with no host argument on the
# server node, then add the server's address on the client); this verifies,
# rather than configures, the fabric
ibv_rc_pingpong -d mlx4_0 -g 0