How to Prevent Azure from Closing Idle TCP Connections on Windows Server 2008 R2 VM



When running stateful applications on Azure VMs, many developers encounter unexpected TCP connection drops after periods of inactivity. The cause is usually one of the following:

  • Azure's platform-level network timeouts
  • Windows Server TCP stack behavior
  • A combination of both

Azure's networking infrastructure includes idle timeout mechanisms to conserve resources:

// Typical Azure TCP timeout values:
- Load Balancer: 4 minutes (default, adjustable)
- Platform-level NAT: 4 minutes (fixed)
- VM Network Stack: Varies by OS configuration
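
Keepalives only help if a probe is sent before the shortest idle timer on the path expires, so it can be worth sanity-checking a planned setting against the defaults listed above. The following Python sketch is illustrative only: the function name and the 30-second safety margin are assumptions, not Azure or Windows values.

```python
def keepalive_beats_timeout(keepalive_ms, idle_timeout_min, margin_s=30):
    """Return True if the first keepalive probe would be sent at least
    margin_s seconds before the idle timeout expires."""
    timeout_ms = idle_timeout_min * 60 * 1000
    return keepalive_ms + margin_s * 1000 <= timeout_ms

AZURE_DEFAULT_IDLE_MIN = 4  # default idle timeout from the list above

# A 2-minute keepalive fits inside the 4-minute window
print(keepalive_beats_timeout(120_000, AZURE_DEFAULT_IDLE_MIN))
# The Windows default of 2 hours does not
print(keepalive_beats_timeout(7_200_000, AZURE_DEFAULT_IDLE_MIN))
```

If the idle timeout has been raised on the Azure side, pass the new value in minutes as the second argument.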

For Windows Server 2008 R2 VMs, implement these registry tweaks:

Windows Registry Editor Version 5.00

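[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
; KeepAliveTime: 120000 ms (2 minutes), below Azure's 4-minute idle timeout
"KeepAliveTime"=dword:0001d4c0
; KeepAliveInterval: 1000 ms (1 second) between retries of an unanswered probe
"KeepAliveInterval"=dword:000003e8
; TcpMaxDataRetransmissions: 20 retransmission attempts
"TcpMaxDataRetransmissions"=dword:00000014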

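Import the file and reboot the VM. The TCP/IP driver reads these values at startup, so restarting individual services is not enough. Note that the settings only affect sockets on which the application has enabled SO_KEEPALIVE.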
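
Registry DWORD values are written in hexadecimal, which makes it easy to mislabel a duration. A small Python sketch for converting between milliseconds and the `dword:` notation is shown here; both helper names are hypothetical and exist only for this illustration.

```python
def ms_to_reg_dword(ms):
    """Format a millisecond count as a .reg DWORD value (8 hex digits)."""
    if not 0 <= ms <= 0xFFFFFFFF:
        raise ValueError("value does not fit in a 32-bit DWORD")
    return "dword:{:08x}".format(ms)

def reg_dword_to_ms(text):
    """Parse a 'dword:xxxxxxxx' string back into an integer."""
    prefix, _, digits = text.partition(":")
    if prefix.lower() != "dword":
        raise ValueError("not a DWORD value: " + text)
    return int(digits, 16)

print(ms_to_reg_dword(120_000))           # dword:0001d4c0 (2 minutes)
print(reg_dword_to_ms("dword:000003e8"))  # 1000 (1 second)
```

Running a value through `reg_dword_to_ms` before importing a .reg file is a quick way to confirm that the number matches the duration you intended.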

For proprietary applications where you can't modify the code:

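  1. Run a PowerShell watchdog that checks whether the server port is still reachable and restarts the client when it is not. Test-NetConnection is not available on Windows Server 2008 R2, so this version uses the .NET TcpClient class. It opens its own short-lived connection; it does not send keepalives on the application's connection:
# PowerShell TCP reachability watchdog (works with PowerShell 2.0)
$server = "yourserver"   # placeholder: host name or IP of the server
$port   = 12345          # placeholder: your application port
while ($true) {
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $client.Connect($server, $port)   # throws if the port is unreachable
    } catch {
        # "/reconnect" is a placeholder; use whatever your application supports
        Start-Process "YourApp.exe" -ArgumentList "/reconnect"
    } finally {
        $client.Close()
    }
    Start-Sleep -Seconds 30
}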

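  2. Create a persistent tunnel using autossh:

autossh -M 0 -N -f -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -L 3306:localhost:3306 user@yourserver

This maintains an SSH tunnel that reconnects automatically if it drops. With -M 0, autossh relies on SSH's own ServerAlive probes to detect a dead connection, and those probes also keep the tunnel from going idle.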

To confirm your changes worked:

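netstat -ano | findstr "ESTABLISHED"
wireshark (filter: tcp.port == yourport and tcp.analysis.keep_alive)

In the capture, look for keepalive probes on the idle connection at the configured interval. A probe is an ACK segment carrying zero or one byte of data, with a sequence number one below the next expected one.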
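
The netstat check can also be scripted so that only the application's connections are reported. The Python sketch below parses text in the format printed by `netstat -ano`; the function name is hypothetical, and the sample rows are made-up addresses used purely for illustration.

```python
def established_on_port(netstat_output, port):
    """Return (local, remote, pid) tuples for ESTABLISHED TCP rows
    whose local or remote port matches `port`."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected row shape: TCP  local:port  remote:port  STATE  PID
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "ESTABLISHED":
            continue
        local, remote, pid = fields[1], fields[2], fields[4]
        ports = {local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]}
        if str(port) in ports:
            matches.append((local, remote, int(pid)))
    return matches

# Made-up sample in the format printed by `netstat -ano`
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    10.0.0.4:12345         203.0.113.7:50211      ESTABLISHED     1824
  TCP    10.0.0.4:3389          203.0.113.7:50300      ESTABLISHED     996
  TCP    0.0.0.0:12345          0.0.0.0:0              LISTENING       1824
"""
print(established_on_port(SAMPLE, 12345))
```

On the VM itself, the live output could be fed in with `subprocess.check_output(["netstat", "-ano"])` decoded to text.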


When running proprietary client-server applications on Azure VMs, you might encounter unexpected TCP connection drops after periods of inactivity. This behavior stems from multiple layers in the Azure/Windows networking stack:

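  • Azure's default Load Balancer has a 4-minute idle timeout (even for single instances)
  • Windows Server's TCP stack only sends keepalive probes on sockets that request them, and by default waits 2 hours before the first probe
  • The idle timeout is configured on the VM's public IP address or load-balancing rule (NSG rules have no idle timeout setting)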

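First, inspect the current state by running these commands in PowerShell on your VM:

# List established TCP connections with their owning process IDs
# (Get-NetTCPConnection requires Windows Server 2012 or later; netstat works on 2008 R2)
netstat -ano | findstr "ESTABLISHED"
 
# Check current keepalive settings; if KeepAliveTime is absent, the default of 2 hours applies
Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" |
    Select-Object KeepAliveTime, KeepAliveInterval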

Option 1: Registry Modification (Windows-level fix)

Adjust the TCP keepalive parameters at the OS level:

# Set keepalive time to 2 minutes (120000 ms), below Azure's 4-minute idle timeout
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" `
    -Name "KeepAliveTime" -Value 120000 -Type DWord

# Set keepalive interval to 1 minute (60000 ms)
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" `
    -Name "KeepAliveInterval" -Value 60000 -Type DWord

# Reboot so the TCP/IP driver picks up the new values
# (restarting a service such as Dnscache does not reload them)
Restart-Computer -Force
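
The registry values are system-wide defaults, and Windows only sends probes on sockets where the application has enabled keepalive. If you do control the source of one end of the connection, per-socket settings take effect without touching the registry. A minimal Python sketch follows, under stated assumptions: `enable_keepalive` is a hypothetical helper name, the timings are example values, the `SIO_KEEPALIVE_VALS` branch applies on Windows, and the `TCP_KEEPIDLE` branch is included only so the sketch also runs on Linux.

```python
import socket
import sys

def enable_keepalive(sock, idle_s=120, interval_s=10):
    """Turn on TCP keepalive for one socket: first probe after idle_s
    seconds of silence, then one probe every interval_s seconds."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if sys.platform == "win32":
        # Windows: (on/off, idle time in ms, probe interval in ms)
        sock.ioctl(socket.SIO_KEEPALIVE_VALS,
                   (1, idle_s * 1000, interval_s * 1000))
    elif hasattr(socket, "TCP_KEEPIDLE"):
        # Linux and similar: values are in seconds
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
# Confirm the option is set before the socket is connected
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Call the helper before or right after connecting; the idle time should stay below the shortest idle timeout on the path.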

Option 2: Application-Level Workaround

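If you can't modify the proprietary app, create a simple PowerShell watchdog that reports when the application's connection disappears. Get-NetTCPConnection requires Windows Server 2012 or later, so this version queries .NET directly:

# PersistentConnectionWatcher.ps1
$targetPort = 12345 # Your application port
$interval = 60 # Seconds between checks

while($true) {
    # All current TCP connections, filtered to established ones on the local port
    $props = [System.Net.NetworkInformation.IPGlobalProperties]::GetIPGlobalProperties()
    $conn = $props.GetActiveTcpConnections() | Where-Object {
        $_.LocalEndPoint.Port -eq $targetPort -and $_.State -eq "Established"
    }
    if(-not $conn) {
        Write-Host "Connection lost - $(Get-Date)"
        # Add reconnection logic here if possible
    }
    Start-Sleep -Seconds $interval
}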
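
The watchdog above only reports a lost connection. One possible building block for the reconnection logic is an active reachability probe that runs before the client is restarted, so the restart is attempted only when the server is actually accepting connections. The Python sketch below assumes Python is available on the machine; `port_is_open`, the host, and the port are placeholders for illustration.

```python
import socket

def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened
    within `timeout` seconds, False otherwise."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
    conn.close()
    return True

# Placeholder target; replace with your server and application port
print(port_is_open("127.0.0.1", 12345, timeout=1.0))
```

Like any probe of this kind, it opens a separate short-lived connection and says nothing about the state of the application's own connection.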

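Option 3: Azure Idle Timeout Adjustment

NSG rules have no idle timeout property. The timeout is set on the VM's public IP address (or on the load-balancing rule if the VM sits behind a load balancer). On a public IP it can be raised from the default 4 minutes up to 30 minutes:

# Azure CLI command to raise the idle timeout on a public IP
az network public-ip update \
    --resource-group YourResourceGroup \
    --name YourPublicIpName \
    --idle-timeout 30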

Verify your changes with Wireshark or this PowerShell snippet:

# Monitor TCP connections continuously
$appPort = 12345 # Your application port
while($true) {
    # netstat works on Windows Server 2008 R2, unlike Get-NetTCPConnection
    Get-Date
    # Matches the port on either the local or the remote side
    netstat -ano | Select-String ":$appPort\s"
    Start-Sleep -Seconds 10
}

Remember that these changes might affect other applications on the same VM. Always test in a staging environment first.