How to Prevent Azure from Closing Idle TCP Connections on Windows Server 2008 R2 VM



When running stateful applications on Azure VMs, many developers encounter unexpected TCP connection drops after periods of inactivity. The cause is typically one of the following:

  • Azure's platform-level network timeouts
  • Windows Server TCP stack behavior
  • A combination of both

Azure's networking infrastructure includes idle timeout mechanisms to conserve resources:

// Typical Azure TCP timeout values:
- Load Balancer: 4 minutes (default, adjustable)
- Platform-level NAT: 4 minutes (fixed)
- VM Network Stack: Varies by OS configuration

For Windows Server 2008 R2 VMs, implement these registry tweaks:

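Windows Registry Editor Version 5.00

; Values are hexadecimal DWORDs in milliseconds. A .reg file only accepts ";" comments on their own line.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
; 0x0001d4c0 = 120000 ms = 2 minutes, below Azure's 4-minute idle timeout
"KeepAliveTime"=dword:0001d4c0
; 0x000003e8 = 1000 ms = 1 second between unanswered probes
"KeepAliveInterval"=dword:000003e8
; 0x00000014 = 20 retransmissions
"TcpMaxDataRetransmissions"=dword:00000014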

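Import the file, then reboot the server so the TCP/IP driver reads the new values. These settings only affect sockets on which the application has enabled SO_KEEPALIVE; a connection opened without it sends no keepalive probes regardless of the registry.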
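The DWORD values in a .reg file are hexadecimal milliseconds, which is easy to get wrong by hand. The following is a minimal Python sketch (the helper names are hypothetical, and it assumes Python 3 is available on some workstation) that converts between milliseconds and the `dword:` notation and checks whether a keepalive time falls below an idle timeout, using the 4-minute Azure default mentioned earlier:

```python
def to_reg_dword(milliseconds: int) -> str:
    """Format a millisecond value the way a .reg file expects a DWORD."""
    if not 0 <= milliseconds <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return f"dword:{milliseconds:08x}"


def from_reg_dword(text: str) -> int:
    """Decode a literal such as 'dword:000003e8' back into milliseconds."""
    prefix, _, digits = text.partition(":")
    if prefix.lower() != "dword" or not digits:
        raise ValueError(f"not a DWORD literal: {text!r}")
    return int(digits, 16)


def beats_idle_timeout(keepalive_time_ms: int, idle_timeout_minutes: float = 4) -> bool:
    """True when the first keepalive probe is sent before the idle timeout expires."""
    return keepalive_time_ms < idle_timeout_minutes * 60_000


if __name__ == "__main__":
    # 2 minutes, and the Windows default of 2 hours
    for ms in (120_000, 7_200_000):
        print(to_reg_dword(ms), beats_idle_timeout(ms))
```

Any keepalive time at or above the idle timeout is reported as insufficient, because the connection would already have been dropped by the time the first probe is due.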

For proprietary applications where you can't modify the code:

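  1. Run a PowerShell watchdog that probes the server port and restarts the application when the port stops answering. Test-NetConnection is not available on Windows Server 2008 R2, so this version uses the .NET TcpClient class:
# PowerShell TCP connectivity watchdog
# "yourserver", 12345 and "/reconnect" are placeholders for your environment
while ($true) {
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $client.Connect("yourserver", 12345)   # throws if the port is unreachable
    } catch {
        Start-Process "YourApp.exe" -ArgumentList "/reconnect"
    } finally {
        $client.Close()
    }
    Start-Sleep -Seconds 30
}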

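  2. Create a persistent tunnel using autossh (a Unix tool; on Windows it needs an environment such as Cygwin):

autossh -M 0 -N -f -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -L 3306:localhost:3306 user@yourserver

This maintains an SSH tunnel that automatically reconnects if dropped. With -M 0, autossh has no monitoring port of its own and only restarts ssh when ssh exits, so the ServerAlive options are what detect a dead connection. They also generate traffic every 30 seconds, which keeps the tunnel from going idle.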

To confirm your changes worked:

netstat -ano | findstr "ESTABLISHED"
wireshark (filter: tcp.port == yourport and tcp.analysis.keep_alive)

In the capture, a keepalive probe should appear on each idle connection once per configured KeepAliveTime, each answered by an ACK from the peer.
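To check connections on one port without reading the whole netstat listing, the output can be parsed. The following is a rough Python sketch, assuming English-language `netstat -ano` output with the usual five columns; the sample rows, addresses, and PIDs are made up for illustration, and port 3306 is borrowed from the tunnel example:

```python
from typing import List, NamedTuple


class TcpRow(NamedTuple):
    local: str
    remote: str
    state: str
    pid: int


def parse_netstat(output: str) -> List[TcpRow]:
    """Parse the TCP rows of Windows `netstat -ano` output."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # TCP rows have five columns: proto, local, remote, state, PID.
        if len(parts) == 5 and parts[0] == "TCP" and parts[4].isdigit():
            rows.append(TcpRow(parts[1], parts[2], parts[3], int(parts[4])))
    return rows


def established_on_port(rows: List[TcpRow], port: int) -> List[TcpRow]:
    """Keep ESTABLISHED rows whose local or remote endpoint uses `port`."""
    suffix = f":{port}"
    return [
        r for r in rows
        if r.state == "ESTABLISHED"
        and (r.local.endswith(suffix) or r.remote.endswith(suffix))
    ]


# Hypothetical sample output, used only to demonstrate the parser.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       716
  TCP    10.0.0.4:49712         10.0.0.5:3306          ESTABLISHED     1234
  TCP    10.0.0.4:49800         10.0.0.9:443           TIME_WAIT       0
"""

if __name__ == "__main__":
    for row in established_on_port(parse_netstat(SAMPLE), 3306):
        print(row)
```

On a live machine, the `SAMPLE` text could be replaced with the result of `subprocess.check_output(["netstat", "-ano"], text=True)`.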


When running proprietary client-server applications on Azure VMs, you might encounter unexpected TCP connection drops after periods of inactivity. This behavior stems from multiple layers in the Azure/Windows networking stack:

  • Azure's default Load Balancer has a 4-minute idle timeout (even for single instances)
  • Windows Server TCP stack has its own keepalive mechanisms
  • The idle timeout configured on the VM's public IP address or load balancer rule (Network Security Groups do not expose a timeout setting)

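First, inspect the established connections and the current keepalive settings on the VM. Get-NetTCPConnection requires Windows Server 2012 or later, so use netstat on 2008 R2:

# List established TCP connections with their owning process IDs
netstat -ano | findstr "ESTABLISHED"
 
# Show any keepalive overrides; if KeepAliveTime is absent, the default of 2 hours applies
Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"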

Option 1: Registry Modification (Windows-level fix)

Adjust the TCP keepalive parameters at the OS level:

# Set keepalive time to 2 minutes (120000 ms), below Azure's 4-minute idle timeout
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" `
    -Name "KeepAliveTime" -Value 120000 -PropertyType DWord -Force

# Set keepalive interval to 1 minute (60000 ms)
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" `
    -Name "KeepAliveInterval" -Value 60000 -PropertyType DWord -Force

# Reboot so the TCP/IP driver picks up the new values
Restart-Computer
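The registry values are machine-wide, and they only apply to sockets that have SO_KEEPALIVE set. Where the code that opens the socket can be changed, keepalive can instead be requested per connection. The following is a minimal Python sketch of that idea; `enable_keepalive` is a hypothetical helper, and the 120-second idle time is an assumption chosen to sit below the 4-minute Azure timeout:

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 120, interval_s: int = 1) -> None:
    """Turn on TCP keepalive for one socket, probing after `idle_s` seconds of silence."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time in ms, probe interval in ms)
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_s * 1000, interval_s * 1000))
    elif hasattr(socket, "TCP_KEEPIDLE"):
        # Linux and similar platforms: values are in seconds
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s)
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```

Per-socket settings leave other applications on the VM untouched, which the registry approach cannot guarantee.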

Option 2: Application-Level Workaround

If you can't modify the proprietary app, create a simple PowerShell watchdog:

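# PersistentConnectionWatcher.ps1
# Get-NetTCPConnection requires Windows Server 2012 or later, so this parses netstat instead
$targetPort = 12345 # Your application port
$interval = 60 # Seconds between checks

while ($true) {
    # Matches ESTABLISHED rows where either the local or the remote port is $targetPort
    $conn = netstat -ano -p tcp | Select-String -Pattern ":$targetPort\s.*ESTABLISHED"
    if (-not $conn) {
        Write-Host "Connection lost - $(Get-Date)"
        # Add reconnection logic here if possible
    }
    Start-Sleep -Seconds $interval
}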

Option 3: Azure Idle Timeout Adjustment

Network Security Groups have no idle timeout setting. The timeout is configured on the VM's public IP address (or on the load balancer rule, if the VM sits behind a load balancer):

# Azure CLI command to raise the idle timeout to 30 minutes
az network public-ip update \
    --resource-group YourResourceGroup \
    --name YourPublicIpName \
    --idle-timeout 30

Verify your changes with Wireshark or this PowerShell snippet:

# Monitor TCP connections continuously (netstat works on Windows Server 2008 R2)
$appPort = 12345 # Your application port
while ($true) {
    Get-Date
    netstat -ano -p tcp | Select-String -Pattern ":$appPort\s"
    Start-Sleep -Seconds 10
}

Remember that these changes might affect other applications on the same VM. Always test in a staging environment first.