Troubleshooting Inter-Host VM Ping Failures on a VMware vDS (ESXi 5.1) – Port Group and Network Segmentation Analysis



After migrating to a VMware Distributed Switch (vDS) in our ESXi 5.1 environment, we ran into the following behavior:

  • VMs residing on the same ESXi host could ping each other successfully
  • Cross-host VM communication attempts resulted in ping timeouts
  • vSphere Client showed all port groups and uplinks with normal (green) status indicators

First, let's verify the basic vDS configuration using PowerCLI:

# Connect to vCenter
Connect-VIServer -Server vcenter.example.com

# Get vDS information (Get-VDSwitch replaces the deprecated Get-VirtualSwitch -Distributed)
Get-VDSwitch | Select Name, NumPorts, Mtu

# Check port group settings
Get-VDPortgroup | Select Name, VlanConfiguration, NumPorts

# Verify VM network adapter bindings (Parent is the VM that owns the adapter)
Get-VM | Get-NetworkAdapter | Select Parent, NetworkName, Type

Based on similar cases, these factors often contribute to the issue:

  • VLAN Mismatch: Verify that each port group's VLAN ID is carried on the uplinks of every host
  • MTU Settings: Ensure the MTU is the same end to end on the vDS, the physical NICs, and the switch ports (the default is 1500)
  • Security Policies: Check the port group security policy (MAC address changes, forged transmits)
  • Physical Switch Configuration: Confirm trunk ports allow the required VLANs

Here's a structured approach to isolate the problem:

  1. Test connectivity between VM consoles using ping -t (Windows) or ping (Linux, which runs until interrupted)
  2. Capture packets simultaneously on source and destination VMs
  3. Verify ARP resolution with arp -a (Windows) or arp -n (Linux)
  4. Check vDS statistics for dropped packets
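
To make the first step more systematic, the ping results can be grouped by whether the two VMs share a host. The sketch below is only an illustration in Python: the VM names, host names, and the PingResult record are made up, and the results would have to be collected by hand or by your own tooling.

```python
from collections import namedtuple

# Hypothetical record of one ping test between two VMs.
PingResult = namedtuple("PingResult", "src dst ok")

def classify_failures(results, vm_host):
    """Count successes and failures separately for same-host and cross-host pairs.

    vm_host maps each VM name to the ESXi host it runs on.
    """
    summary = {"same_host": {"ok": 0, "fail": 0},
               "cross_host": {"ok": 0, "fail": 0}}
    for r in results:
        scope = "same_host" if vm_host[r.src] == vm_host[r.dst] else "cross_host"
        summary[scope]["ok" if r.ok else "fail"] += 1
    return summary

def points_to_uplink_path(summary):
    """True when every same-host ping works and every cross-host ping fails."""
    same, cross = summary["same_host"], summary["cross_host"]
    return (same["ok"] > 0 and same["fail"] == 0
            and cross["fail"] > 0 and cross["ok"] == 0)

# Example data (all names are invented for illustration).
vm_host = {"vm-a": "esxi01", "vm-b": "esxi01", "vm-c": "esxi02"}
results = [
    PingResult("vm-a", "vm-b", True),
    PingResult("vm-a", "vm-c", False),
    PingResult("vm-b", "vm-c", False),
]
summary = classify_failures(results, vm_host)
print(summary)
print("Suspect uplinks or VLAN trunking:", points_to_uplink_path(summary))
```

If the pattern is mixed (some cross-host pairs work, others fail), the problem is more likely tied to a specific host, uplink, or VLAN than to the vDS as a whole.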

Here's a proper vDS setup script that avoids common pitfalls:

# Create distributed switch with proper MTU
New-VDSwitch -Name "Prod-vDS-01" -Location (Get-Datacenter "DC1") -Mtu 1500

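# Add hosts to the vDS and attach a physical NIC as an uplink.
# A host with no uplink on the vDS only passes same-host VM traffic.
# "vmnic1" is an assumed NIC name; older PowerCLI releases call the parameter -VMHostPhysicalNic.
Get-VMHost | ForEach-Object {
    Add-VDSwitchVMHost -VDSwitch "Prod-vDS-01" -VMHost $_
    $pnic = Get-VMHostNetworkAdapter -VMHost $_ -Physical -Name "vmnic1"
    Add-VDSwitchPhysicalNetworkAdapter -DistributedSwitch "Prod-vDS-01" -VMHostNetworkAdapter $pnic -Confirm:$false
}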

# The uplink port group is created automatically with the vDS; inspect it instead of creating a new one
$vds = Get-VDSwitch "Prod-vDS-01"
$uplinkPG = Get-VDPortgroup -VDSwitch $vds | Where-Object { $_.IsUplink }

# Create VM port group with correct VLAN
New-VDPortgroup -Name "VM-Network" -VDSwitch $vds -VlanId 100

Ensure your physical switch ports are properly configured as trunks. Here's a Cisco example:

interface GigabitEthernet1/0/1
 description ESXi01 Uplink
 switchport mode trunk
 switchport trunk allowed vlan 100,200,300
 switchport nonegotiate
 spanning-tree portfast trunk
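
A quick way to catch a missing VLAN is to compare the trunk's allowed list against the VLAN IDs used by your port groups. The following Python sketch does that for a Cisco-style list; the function names are hypothetical and the inputs are typed in by hand rather than read from the switch or vCenter.

```python
def parse_vlan_list(spec):
    """Expand a Cisco-style VLAN list such as "100,200-202" into a set of ints."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            vlans.update(range(low, high + 1))
        else:
            vlans.add(int(part))
    return vlans

def missing_vlans(allowed_spec, portgroup_vlans):
    """Return the port group VLAN IDs that the trunk does not carry, sorted."""
    allowed = parse_vlan_list(allowed_spec)
    return sorted(v for v in portgroup_vlans if v not in allowed)

# Allowed list from the trunk example, checked against VLAN 100 ("VM-Network")
# and against an extra VLAN 150 that the trunk does not carry.
print(missing_vlans("100,200,300", [100]))       # -> []
print(missing_vlans("100,200,300", [100, 150]))  # -> [150]
```

Any VLAN reported as missing would be dropped at the physical switch, which produces exactly the "same host works, cross host fails" symptom for that port group.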

When basic checks don't reveal the issue, try these advanced techniques:

  1. Enable the vDS health check (VLAN and MTU, teaming and failover) in vCenter; it is available from vSphere 5.1 onward
  2. Verify VMkernel networking for vMotion interfaces
  3. Check for conflicting third-party firewall rules
  4. Review host physical NIC teaming policy

With ESXi 5.1 and distributed switches, the scenario where VMs on the same host can communicate while cross-host communication fails is particularly puzzling when the vDS topology shows all green indicators. Same-host traffic is switched inside the host and never touches the uplinks, so this pattern usually points at the uplink path: physical NICs, VLAN tagging, teaming, or the physical switch.

Before diving deep into vDS configuration, let's verify physical connectivity:

# Check physical NIC link status (run on each ESXi host)
esxcli network nic list
# Verify VLAN trunking (run on the Cisco physical switch)
show interfaces trunk

A common oversight is mismatched VLAN configurations between physical switches and vDS port groups.

The distributed switch requires careful attention to several key settings:

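# Sample PowerCLI commands to verify vDS settings
Get-VDSwitch | Select Name,Mtu,NumPorts
Get-VDPortgroup | Select Name,VlanConfiguration
# Teaming is not a port group property; query it with Get-VDUplinkTeamingPolicy
Get-VDPortgroup | Get-VDUplinkTeamingPolicy | Select VDPortgroup,LoadBalancingPolicy,ActiveUplinkPort,StandbyUplinkPort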

The default settings during vDS creation might not account for your specific network architecture. Pay special attention to:

  • Uplink VLAN tagging
  • Load balancing algorithm
  • Failover order configuration
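
The load balancing algorithm matters because it has to agree with the physical switch: "Route based on IP hash" expects the uplinks to be bundled in a port channel, while the other policies expect independent switch ports. A small Python sketch of that rule follows. The policy names mirror the values PowerCLI reports for LoadBalancingPolicy, but treat them as assumptions and adjust them to what your environment returns.

```python
# Policies that expect a port channel on the physical switch (assumed names).
NEEDS_PORT_CHANNEL = {"LoadBalanceIP"}
# Policies that expect independent, unbundled switch ports (assumed names).
NO_PORT_CHANNEL = {
    "LoadBalanceSrcId",
    "LoadBalanceSrcMac",
    "LoadBalanceLoadBased",
    "ExplicitFailover",
}

def teaming_problem(policy, switch_has_port_channel):
    """Return a description of the mismatch, or None when the pair is consistent."""
    if policy not in NEEDS_PORT_CHANNEL | NO_PORT_CHANNEL:
        return "unknown policy name: " + policy
    if policy in NEEDS_PORT_CHANNEL and not switch_has_port_channel:
        return "IP hash teaming without a port channel on the physical switch"
    if policy in NO_PORT_CHANNEL and switch_has_port_channel:
        return "port channel on the physical switch without IP hash teaming"
    return None

# Example: default port-ID teaming against a switch that bundles the uplinks.
print(teaming_problem("LoadBalanceSrcId", True))
print(teaming_problem("LoadBalanceIP", True))
```

Either mismatch tends to show up as intermittent or one-directional loss between hosts, even though every port reports link up.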

Here's my systematic approach to resolve this issue:

  1. Verify vDS uplink connectivity using esxcli network vswitch dvs vmware list
  2. Check VMkernel networking with esxcfg-vmknic -l
  3. Test with a temporary port group set to VLAN 0 (no VLAN tagging)
  4. Validate physical switch port configurations

In one case, the issue was caused by an MTU mismatch:

# ESXi host showing MTU 1500
esxcli network ip interface list
# While physical switch was configured for jumbo frames
show interface GigabitEthernet1/0/1

Adjusting either the vDS or physical switch MTU to match resolved the connectivity issue.
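
An MTU mismatch is easiest to confirm with a ping that sets the don't-fragment bit and uses the largest payload the MTU allows. For IPv4 that payload is the MTU minus 20 bytes of IP header and 8 bytes of ICMP header. The Python sketch below does the arithmetic and prints the matching commands; the target address is a placeholder, and vmkping tests VMkernel interfaces rather than VM traffic.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header

def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER

def df_ping_commands(target, mtu):
    """Build don't-fragment ping commands for common platforms."""
    size = max_ping_payload(mtu)
    return {
        "windows": f"ping -f -l {size} {target}",
        "linux": f"ping -M do -s {size} {target}",
        "esxi": f"vmkping -d -s {size} {target}",
    }

for mtu in (1500, 9000):
    print(mtu, "->", max_ping_payload(mtu))   # 1500 -> 1472, 9000 -> 8972

# 192.0.2.10 is a documentation address used as a placeholder target.
for platform, cmd in df_ping_commands("192.0.2.10", 1500).items():
    print(platform + ":", cmd)
```

If the full-size ping fails while a smaller payload succeeds, some device on the path has a smaller MTU than the endpoints assume.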

For complex environments, consider:

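# Packet capture on a virtual switch port (N is the port ID; pktcap-uw requires ESXi 5.5 or later)
pktcap-uw --switchport N -o /tmp/capture.pcap
# Find a VM's port ID, then read the statistics for that port
esxcli network vm list
esxcli network vm port list -w WORLD_ID
esxcli network port stats get -p PORTID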

These tools can reveal dropped packets or misrouted traffic.

Before considering the issue resolved, verify:

  • Consistent VLAN tagging across all layers
  • Matching MTU settings
  • Correct teaming policy for your infrastructure
  • Proper physical switch port configurations (trunk/allowed VLANs)