When running stateful applications on Azure VMs, many developers encounter unexpected TCP connection drops after periods of inactivity. The root cause is typically one of the following:
- Azure's platform-level network timeouts
- Windows Server TCP stack behavior
- A combination of both
Azure's networking infrastructure includes idle timeout mechanisms to conserve resources:
Typical Azure TCP idle timeout values:
- Load Balancer: 4 minutes (default, adjustable)
- Platform-level NAT: 4 minutes (fixed)
- VM network stack: varies by OS configuration
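One subtlety: OS-level keepalive timers only apply to sockets that opt in to keepalive, so an application that never sets the option gets no probes at all. As a minimal sketch of that opt-in (server name and port are placeholders), a .NET client can enable it from PowerShell:
# Open a TCP connection and ask the OS to send keepalive probes on it
$client = New-Object System.Net.Sockets.TcpClient("yourserver", 12345)
$client.Client.SetSocketOption([System.Net.Sockets.SocketOptionLevel]::Socket, [System.Net.Sockets.SocketOptionName]::KeepAlive, $true)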
For Windows Server 2008 R2 VMs, apply these registry tweaks (save as a .reg file and import with regedit):
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
; 10 minutes in milliseconds (0x000927c0 = 600000); to beat Azure's
; 4-minute idle timeout, a value below 240000 (0x0003a980) may be needed
"KeepAliveTime"=dword:000927c0
; 1 second between unanswered probes
"KeepAliveInterval"=dword:000003e8
; 20 retries
"TcpMaxDataRetransmissions"=dword:00000014
Apply these changes and restart the server or affected services.
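Once the server is back up, read the values back to confirm the import succeeded:
reg query "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v KeepAliveTime
reg query "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v KeepAliveInterval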
For proprietary applications where you can't modify the code, implement a TCP keepalive wrapper using PowerShell:
# PowerShell TCP keepalive monitor
# Placeholders: set these to your server and application port
$server = "yourserver"
$port = 12345
while ($true) {
    $connection = Test-NetConnection -ComputerName $server -Port $port
    if (-not $connection.TcpTestSucceeded) {
        # Port is no longer reachable: relaunch the app so it reconnects
        Start-Process "YourApp.exe" -ArgumentList "/reconnect"
    }
    Start-Sleep -Seconds 30
}
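To run the monitor without tying up a console, it can be saved as a script and launched as a background job; note that jobs end when the parent session closes, and the path here is a placeholder:
Start-Job -FilePath "C:\Scripts\KeepaliveMonitor.ps1" -Name "TcpKeepaliveMonitor"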
Create a persistent tunnel using autossh:
autossh -M 0 -N -f -L 3306:localhost:3306 user@yourserver
This maintains an SSH tunnel that automatically reconnects if dropped.
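Note that with -M 0 autossh has no monitoring channel of its own and relies on SSH's keepalives to detect a dead tunnel, so it's worth setting those explicitly (same placeholder host and ports as above):
autossh -M 0 -N -f -o "ServerAliveInterval=60" -o "ServerAliveCountMax=3" -L 3306:localhost:3306 user@yourserver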
To confirm your changes worked:
netstat -ano | findstr "ESTABLISHED"
wireshark (filter: tcp.port == yourport and tcp.analysis.keep_alive)
Monitor for TCP keepalive packets (bare ACKs) arriving at the configured interval.
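If installing Wireshark on the server isn't an option, Windows can capture packets natively with netsh; the trace file path below is a placeholder, and the resulting .etl can be converted for offline analysis:
netsh trace start capture=yes tracefile=C:\Temp\tcp-trace.etl
# ...let the connection sit idle past the suspected timeout window, then:
netsh trace stop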
When running proprietary client-server applications on Azure VMs, you might encounter unexpected TCP connection drops after periods of inactivity. This behavior stems from multiple layers in the Azure/Windows networking stack:
- Azure's default Load Balancer has a 4-minute idle timeout (even for single instances)
- Windows Server TCP stack has its own keepalive mechanisms
- Idle timeout on the VM's public IP (for instance-level IPs without a load balancer)
First, confirm where the timeout is occurring by running these PowerShell commands on your VM:
# Check TCP keepalive parameters
Get-NetTCPConnection -State Established | Select-Object -Property *
# Check current keepalive settings (the Windows default is 2 hours;
# if the values are absent, the defaults are in effect)
Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" |
    Select-Object KeepAliveTime, KeepAliveInterval
# Review global TCP parameters (note: this does not show keepalive values)
netsh interface tcp show global
Option 1: Registry Modification (Windows-level fix)
Adjust the TCP keepalive parameters at the OS level:
# Set keepalive time to 3 minutes (180000 ms) - it must be shorter than
# Azure's 4-minute idle timeout or the flow is dropped before the first probe
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" `
    -Name "KeepAliveTime" -Value 180000 -Type DWord
# Set keepalive interval to 1 minute (60000 ms) between unanswered probes
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" `
    -Name "KeepAliveInterval" -Value 60000 -Type DWord
# These values are read when the TCP/IP driver starts, so restart the VM
Restart-Computer
Option 2: Application-Level Workaround
If you can't modify the proprietary app, create a simple PowerShell watchdog:
# PersistentConnectionWatcher.ps1
$targetPort = 12345   # Your application port
$interval = 60        # Seconds between checks
while ($true) {
    # -ErrorAction keeps the loop alive when no matching connection exists
    $conn = Get-NetTCPConnection -LocalPort $targetPort -State Established -ErrorAction SilentlyContinue
    if (-not $conn) {
        Write-Host "Connection lost - $(Get-Date)"
        # Add reconnection logic here if possible
    }
    Start-Sleep -Seconds $interval
}
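To have the watcher survive reboots, one approach (assuming Server 2012 or later for the ScheduledTasks cmdlets; the task name and script path are placeholders) is to register it as a startup task:
$action = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-ExecutionPolicy Bypass -File C:\Scripts\PersistentConnectionWatcher.ps1"
$trigger = New-ScheduledTaskTrigger -AtStartup
Register-ScheduledTask -TaskName "PersistentConnectionWatcher" -Action $action -Trigger $trigger -RunLevel Highest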
Option 3: Azure Idle Timeout Adjustment
NSG rules don't expose an idle timeout; the configurable timeout lives on the VM's public IP (or on the load balancer rule), so adjust it there:
# Azure CLI command to raise the idle timeout (range is 4-30 minutes)
az network public-ip update \
  --resource-group YourResourceGroup \
  --name YourPublicIPName \
  --idle-timeout 30
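Confirm the new timeout is in place:
az network public-ip show \
  --resource-group YourResourceGroup \
  --name YourPublicIPName \
  --query idleTimeoutInMinutes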
Verify your changes with Wireshark or this PowerShell snippet:
# Monitor TCP connections continuously
$targetPort = 12345   # Your application port
while ($true) {
    Get-NetTCPConnection -LocalPort $targetPort -ErrorAction SilentlyContinue |
        Select-Object LocalAddress, LocalPort, RemoteAddress, RemotePort, State,
            @{Name = "Time"; Expression = { Get-Date }}
    Start-Sleep -Seconds 10
}
Remember that these changes might affect other applications on the same VM. Always test in a staging environment first.