Optimizing Windows TCP Window Scaling for High-Latency Transfers: Solving Premature Throughput Plateau



On high-latency paths (100-160 ms RTT) between Windows clients and Linux servers, we observed throughput plateauing at just 5-6 Mbit/s despite 1 Gbit/s of available bandwidth. The key symptom: setting the window size explicitly (iperf -w1M) raised throughput to 70-130 Mbit/s, while the default settings never scaled beyond the plateau.

iperf runs showed fundamental differences in window scaling behavior:

// Windows default behavior (64KB window)
[  5] 0.0-10.0 sec  6.55 MBytes  5.48 Mbits/sec

// Linux default behavior (85KB window)  
[  5] 0.0-10.8 sec   142 MBytes   110 Mbits/sec

// Windows with forced 1MB window
[  4] 0.0-18.3 sec   196 MBytes  89.6 Mbits/sec

The Windows TCP stack exhibits two problematic behaviors:

  • Conservative initial window scaling (64KB default vs Linux's 85KB)
  • Premature cessation of window growth during slow-start phase
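The plateau is roughly what a fixed 64 KB window predicts: TCP can keep at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window / RTT. A minimal sketch of that arithmetic, using the 100-160 ms RTT range quoted above (the function name is illustrative):

```python
def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on TCP throughput for a fixed window: window / RTT."""
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / (rtt_ms / 1000.0) / 1e6


for rtt in (100, 160):
    for label, window in (("64 KB", 64 * 1024), ("1 MB", 1024 * 1024)):
        rate = max_throughput_mbps(window, rtt)
        print(f"{label} window @ {rtt} ms RTT: {rate:.1f} Mbit/s")
```

At 100 ms the 64 KB case comes out near 5.2 Mbit/s, in line with the measured plateau, while a 1 MB window raises the ceiling to roughly 84 Mbit/s.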

Recommended AWS Linux server settings (sysctl.conf):

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 1048576 16777216
net.ipv4.tcp_wmem = 4096 1048576 16777216
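One way to sanity-check the 16777216-byte maximum is to compare it with the bandwidth-delay product of the path, that is, the number of bytes that must be in flight to keep the link full. The sketch below assumes the 1 Gbit/s link and the 100-160 ms RTT range from the introduction; the names are illustrative:

```python
def bdp_bytes(bandwidth_bps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return int(round(bandwidth_bps * rtt_ms / 1000.0 / 8))


RMEM_MAX = 16777216  # net.core.rmem_max from the settings above

for rtt in (100, 160):
    need = bdp_bytes(1e9, rtt)
    verdict = "fits within" if need <= RMEM_MAX else "exceeds"
    print(f"1 Gbit/s @ {rtt} ms needs {need} bytes in flight ({verdict} 16 MiB)")
```

Linux reserves part of each socket buffer for bookkeeping, so the usable window is somewhat smaller than the configured buffer. Treat these figures as lower bounds on the buffer size needed.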

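Create a .reg file with these adjustments. Note that TcpWindowSize is a legacy value that Windows Vista and later ignore in favor of receive-window auto-tuning, so on current Windows versions also check the auto-tuning level with netsh int tcp show global: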

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"TcpWindowSize"=dword:00100000
"GlobalMaxTcpWindowSize"=dword:00100000
"Tcp1323Opts"=dword:00000003
"EnablePMTUDiscovery"=dword:00000001
"EnablePMTUBHDetect"=dword:00000000
"TcpMaxDupAcks"=dword:00000002
"SackOpts"=dword:00000001
"TCPNoDelay"=dword:00000001

When iperf results look suspect, cross-check them with NTttcp, Microsoft's network throughput tool:

# Server
ntttcpr -m 1,0,server.ip

# Client  
ntttcp -s -m 1,0,server.ip -t 60

For Intel NICs (particularly 82579V), ensure these settings:

  • Receive Buffers: 2048 (max)
  • Transmit Buffers: 2048 (max)
  • Disable: Large Send Offload (LSO), IPv6 Checksum Offloading
  • Enable: TCP Checksum Offloading (IPv4)

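Use this PowerShell script to validate the settings after applying them:

# Per-profile TCP settings, including the receive-window auto-tuning level
Get-NetTCPSetting | Select SettingName, InitialCongestionWindow, CongestionProvider, AutoTuningLevelLocal |
Format-Table -AutoSize

# Global offload state (RSS, RSC, Chimney, task offload)
Get-NetOffloadGlobalSetting |
Format-List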

netsh int tcp show global

When transferring large files (FTP/SVN/HTTP PUT/SCP) from Windows clients to Linux servers over links with 100-160 ms latency, we consistently observed throughput plateauing at 2-5 Mbit/s despite 1 Gbit/s of symmetric bandwidth. Initial iperf tests revealed:

# Default Windows behavior
iperf -c 1.2.3.4 → 5.48 Mbits/sec
# Manual window adjustment 
iperf -w1M -c 1.2.3.4 → 89.6 Mbits/sec

Wireshark analysis showed:

  • Windows SYN packets advertise 64KB window with scale factor 1
  • Linux responds with 14KB window and scale factor 9 (512x multiplier)
  • Linux-to-Linux transfers achieve 110Mbits/sec without manual tuning
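The scale factor here is the shift count carried in the TCP window scale option (RFC 7323). The 16-bit window field of every later segment is multiplied by 2 to the power of that shift, while the window field in the SYN itself is never scaled. A sketch of the arithmetic follows; the function name is illustrative and the raw window value of 2048 is a made-up example, not taken from the capture:

```python
def effective_window(raw_window: int, shift: int) -> int:
    """Scale a 16-bit TCP window field by the negotiated shift count."""
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("the window field is 16 bits wide")
    if not 0 <= shift <= 14:
        raise ValueError("RFC 7323 limits the shift count to 14")
    return raw_window << shift


print(1 << 9)                       # multiplier for a shift count of 9
print(effective_window(0xFFFF, 0))  # largest window without scaling
print(effective_window(2048, 9))    # hypothetical raw value with shift 9
```

A shift of 9 gives the 512x multiplier seen in the capture, so even a small raw window value can describe a window of 1 MB or more, whereas a shift of 0 caps the window at 65535 bytes.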

The core issue stems from Windows' conservative default TCP window scaling behavior. Key registry modifications can improve performance:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"TcpWindowSize"=dword:00100000
"GlobalMaxTcpWindowSize"=dword:00100000
"Tcp1323Opts"=dword:00000003
"EnablePMTUDiscovery"=dword:00000001
"EnablePMTUBHDetect"=dword:00000000
"TcpMaxDupAcks"=dword:00000002
"SackOpts"=dword:00000001

For Intel NICs (particularly 82579V chipsets):

  1. Update to latest drivers (v12.10.28.0+)
  2. Configure advanced settings:
    • Receive Buffers: 2048
    • Transmit Buffers: 2048
    • Disable Large Send Offload (LSO)
    • Enable Receive Side Scaling

Using NTttcp instead of iperf often yields better results:

# Server
ntttcpr -m 1,0,1.2.3.5
# Client 
ntttcp -s -m 1,0,1.2.3.5 -t 10

Typical NTttcp output shows 8 MB/s throughput (about 64 Mbit/s), versus 0.7 MB/s (about 5.6 Mbit/s) with default iperf.

Ensure proper kernel parameters in /etc/sysctl.conf:

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216  
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576
net.ipv4.tcp_rmem = 4096 1048576 16777216
net.ipv4.tcp_wmem = 4096 1048576 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
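After running sysctl -p, it can help to confirm that the kernel actually holds these values. On Linux each sysctl is exposed as a file under /proc/sys, with the dots in its name mapped to directory separators. The sketch below reads a subset of the settings back and reports differences; EXPECTED, read_sysctl and mismatches are illustrative names:

```python
from pathlib import Path

# Values copied from the sysctl.conf settings above.
EXPECTED = {
    "net.core.rmem_max": "16777216",
    "net.core.wmem_max": "16777216",
    "net.ipv4.tcp_rmem": "4096 1048576 16777216",
    "net.ipv4.tcp_wmem": "4096 1048576 16777216",
    "net.ipv4.tcp_window_scaling": "1",
}


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Read a sysctl by mapping the dots in its name to path separators."""
    text = Path(root, *name.split(".")).read_text()
    return " ".join(text.split())  # collapse tabs and newlines to spaces


def mismatches(expected: dict, root: str = "/proc/sys") -> dict:
    """Return {name: (expected, actual)} for every value that differs."""
    result = {}
    for name, want in expected.items():
        try:
            got = read_sysctl(name, root)
        except OSError:
            got = None  # sysctl missing or unreadable on this system
        if got != want:
            result[name] = (want, got)
    return result


if __name__ == "__main__":
    for name, (want, got) in mismatches(EXPECTED).items():
        print(f"{name}: expected {want!r}, found {got!r}")
```

No output means every listed value matches. The root parameter exists only so the check can be pointed at a copy of the tree for testing.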

To test configurations without production impact:

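# Linux test server: accept uploads and discard them
# (-k keeps listening across connections; the flag varies between netcat variants)
nc -lk 5000 > /dev/null

# Windows test client
# PowerShell 5.1+: time ten 16 MB uploads
1..10 | % {
  $data = [byte[]]::new(1MB)
  $client = [System.Net.Sockets.TcpClient]::new("1.2.3.4", 5000)
  $stream = $client.GetStream()
  $sw = [Diagnostics.Stopwatch]::StartNew()   # start timing once the connection is up
  1..16 | % { $stream.Write($data, 0, $data.Length) }
  $sw.Stop()                                  # stop only after all data has been written
  $client.Close()
  Write-Host ("Throughput: {0:N1} Mbit/s" -f (16 * 8 * 1.048576 / $sw.Elapsed.TotalSeconds))
}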
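The same upload test can be scripted in Python, which also makes it easy to request a larger send buffer, roughly what iperf -w does on the sending side. This is a minimal sketch that assumes a listener such as the nc command above. timed_upload and its parameters are illustrative names, and the result is approximate because the timer stops when the last bytes are handed to the kernel, not when they are acknowledged:

```python
import socket
import time


def timed_upload(host: str, port: int, total_bytes: int, sndbuf: int = 0) -> float:
    """Send total_bytes of zeros to host:port and return the rate in Mbit/s.

    A non-zero sndbuf requests a send buffer of that size via SO_SNDBUF
    before connecting; the operating system may adjust the value.
    """
    chunk = bytes(64 * 1024)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        if sndbuf:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
        sock.connect((host, port))
        start = time.perf_counter()
        sent = 0
        while sent < total_bytes:
            n = min(len(chunk), total_bytes - sent)
            sock.sendall(chunk[:n])
            sent += n
        elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1e6
```

Comparing timed_upload("1.2.3.4", 5000, 16 * 1024 * 1024) with the same call plus sndbuf=1024 * 1024 should show whether the default send buffer is the limiting factor on a given client.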

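For GPO deployment, the fragment below only sketches the intended values; it does not follow a documented Group Policy schema, so in practice push the equivalent registry values with Group Policy Preferences: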

<GroupPolicy xmlns="urn:schemas-microsoft-com:unattend">
  <Computer>
    <NetworkList>
      <Settings pass="specialize">
        <Component name="Microsoft-Windows-TCPIP" processorArchitecture="amd64">
          <TcpWindowSize>1048576</TcpWindowSize>
          <Tcp1323Opts>3</Tcp1323Opts>
          <EnableTCPChimney>false</EnableTCPChimney>
        </Component>
      </Settings>
    </NetworkList>
  </Computer>
</GroupPolicy>