When dealing with high-traffic web applications on Windows Server 2008 (especially in cloud environments like AWS EC2), the accumulation of TCP connections in the TIME_WAIT state can become a serious bottleneck. TIME_WAIT is a normal part of TCP: the endpoint that closes the connection first (here, our server) keeps the connection's information for 2*MSL (Maximum Segment Lifetime) after the close, typically 240 seconds by default on Windows.
In your case, several factors contribute to this issue:
- Ephemeral port exhaustion: Windows Server 2008 has a default dynamic port range of 49152-65535 (16384 ports). With ~85k connections in TIME_WAIT, you're exhausting available ports.
- Keep-alive settings: While keep-alive improves performance, it can exacerbate TIME_WAIT accumulation with high traffic.
- TCP stack limitations: Older Windows versions have less efficient TIME_WAIT handling compared to modern systems.
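To put rough numbers on the port-exhaustion point, here is a minimal back-of-the-envelope sketch. It assumes the simplest model: every new outbound connection to one destination takes one dynamic port, and that port is unusable until TIME_WAIT expires. The class and method names are made up for illustration, and real behaviour depends on how many distinct destinations are involved.

```java
// Back-of-the-envelope estimate, not a measurement: how many new outbound
// connections per second one (local IP, remote IP, remote port) combination
// can sustain before every dynamic port is parked in TIME_WAIT.
class TimeWaitBudget {

    // Ports available divided by the seconds each port stays in TIME_WAIT.
    static double sustainableRatePerSecond(int dynamicPorts, int timeWaitSeconds) {
        return (double) dynamicPorts / timeWaitSeconds;
    }

    public static void main(String[] args) {
        // Defaults quoted above: range 49152-65535 and a 240 s TIME_WAIT.
        int defaultPorts = 65535 - 49152 + 1; // 16384 ports
        System.out.println("default: " + sustainableRatePerSecond(defaultPorts, 240) + " conn/s");

        // Tuned values used later in this answer: 64510 ports and a 30 s TIME_WAIT.
        System.out.println("tuned:   " + sustainableRatePerSecond(64510, 30) + " conn/s");
    }
}
```

Under these assumptions the defaults allow roughly 68 new connections per second to a single destination, while the tuned values allow roughly 2150, which is why both the port range and the TIME_WAIT delay are worth adjusting.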
Add these registry settings under HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
; 30 seconds (hex 1E)
"TcpTimedWaitDelay"=dword:0000001e
; 65534 (increase ephemeral ports)
"MaxUserPort"=dword:0000fffe
; Enable strict checking
"StrictTimeWaitSeqCheck"=dword:00000001
; Increase TCP hash table
"MaxHashTableSize"=dword:00010000
After applying, reboot the server or run:
netsh int ipv4 set dynamicport tcp start=1025 num=64510
netsh int ipv4 set dynamicport udp start=1025 num=64510
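Because netsh takes a start port and a count rather than an end port, it is easy to pick values that overshoot 65535. The sketch below checks a start/num pair before you run the command. The limits it encodes (start of at least 1025, at least 255 ports, last port no higher than 65535) are what I recall from Microsoft's guidance on the dynamic port range, so treat them as assumptions and verify against your system. The class name is hypothetical.

```java
// Sanity check for "netsh int ipv4 set dynamicport ... start=S num=N" arguments.
// The limits below are assumptions based on Microsoft's documented constraints.
class DynamicPortRange {
    static final int MIN_START = 1025;   // lowest allowed start port (assumed)
    static final int MIN_COUNT = 255;    // smallest allowed range size (assumed)
    static final int MAX_PORT = 65535;   // highest TCP/UDP port number

    // Last port covered by a range that begins at start and spans num ports.
    static int lastPort(int start, int num) {
        return start + num - 1;
    }

    static boolean isValid(int start, int num) {
        return start >= MIN_START && num >= MIN_COUNT && lastPort(start, num) <= MAX_PORT;
    }

    public static void main(String[] args) {
        // Values from the commands above: covers ports 1025 through 65534.
        System.out.println("1025/64510 ends at " + lastPort(1025, 64510)
                + ", valid=" + isValid(1025, 64510));
        // Windows Server 2008 default range for comparison.
        System.out.println("49152/16384 ends at " + lastPort(49152, 16384)
                + ", valid=" + isValid(49152, 16384));
    }
}
```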
Adjust your web server settings to better manage connections:
# In httpd.conf
KeepAlive On
KeepAliveTimeout 5
MaxKeepAliveRequests 100
<IfModule mpm_winnt_module>
    ThreadsPerChild 250
    MaxConnectionsPerChild 10000
</IfModule>

<!-- In Tomcat's server.xml -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="500"
           acceptCount="1000"
           enableLookups="false"
           maxKeepAliveRequests="100"
           keepAliveTimeout="5000"/>
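Consider implementing TCP connection pooling at the application level, so that outbound requests reuse existing sockets instead of opening (and later closing) a new one each time. Here's a Java example using Apache HttpClient (4.3 or later):
import org.apache.http.client.config.RequestConfig;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.conn.socket.PlainConnectionSocketFactory;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

// Pool of reusable plain-HTTP connections
PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager(
    RegistryBuilder.<ConnectionSocketFactory>create()
        .register("http", PlainConnectionSocketFactory.getSocketFactory())
        .build());
cm.setMaxTotal(500);            // upper bound across all routes
cm.setDefaultMaxPerRoute(100);  // upper bound per target host

// Timeouts in milliseconds
RequestConfig config = RequestConfig.custom()
    .setConnectTimeout(5000)
    .setSocketTimeout(15000)
    .build();

// Share this single client instance across the application
CloseableHttpClient httpClient = HttpClients.custom()
    .setConnectionManager(cm)
    .setDefaultRequestConfig(config)
    .build();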
Create a PowerShell script to monitor TIME_WAIT states:
# Get TIME_WAIT connections count
# @(...) forces an array so .Count also works when there are 0 or 1 matches
$timeWaitCount = @(netstat -ano | Select-String "TIME_WAIT").Count
Write-Host "Current TIME_WAIT connections: $timeWaitCount"

# Rough ephemeral port usage: this counts every TCP row, so it overestimates
$usedPorts = @(netstat -ano | Select-String "TCP").Count
$availablePorts = 65535 - 49152 + 1   # 16384 ports in the default range
$portUsage = [math]::Round(($usedPorts / $availablePorts) * 100, 1)
Write-Host "Ephemeral port usage: $portUsage%"

# If critical, recycle the application pool (IIS only; for Apache/Tomcat restart the service instead)
if ($portUsage -gt 90) {
    Import-Module WebAdministration
    Restart-WebAppPool -Name "YourAppPool"
    Write-Host "Application pool recycled due to high port usage"
}
For AWS environments, consider these additional measures:
- Use a Network Load Balancer (NLB) instead of direct EC2 connections
- Implement Auto Scaling to distribute load across multiple instances
- Consider migrating to a newer Windows Server version with a better TCP stack
- Use EC2 Launch Templates to ensure consistent TCP/IP configuration across instances
When dealing with high-traffic web servers on Windows Server 2008 (especially on AWS EC2), you might encounter a situation where thousands of TCP connections remain stuck in TIME_WAIT state. This occurs even after stopping your web server (Apache httpd + Tomcat 6.02 in this case), and can eventually lead to connection exhaustion.
Key indicators we're seeing:
- 69,250+ connections on port 80 in TIME_WAIT
- 15,000 additional connections on other ports
- TCPv4 Active Connections: 145K
- TCPv4 Passive Connections: 475K
- Connection failures and resets occurring
The default TIME_WAIT duration on Windows is 4 minutes (2*MSL), but several factors can prevent proper cleanup:
- Insufficient ephemeral port range for the connection volume
- TCP connection recycling not properly configured
- Kernel resources being exhausted
- Possible connection leaks in the web stack
First, let's check and modify the TCP/IP parameters:
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"TcpTimedWaitDelay"=dword:0000001e
"MaxUserPort"=dword:0000fffe
"TcpNumConnections"=dword:00fffffe
This sets:
- TcpTimedWaitDelay to 30 seconds (0x1e)
- MaxUserPort to 65534 (0xfffe)
- TcpNumConnections to 16777214 (0xfffffe)
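If you want to double-check the hexadecimal dword values before importing the .reg file, a small sketch like the following decodes them. The class name is hypothetical; `long` is used because a registry dword is an unsigned 32-bit value.

```java
// Decodes the hex digits of a .reg "dword:" value into its decimal form.
class RegDword {
    static long decode(String hexDigits) {
        return Long.parseLong(hexDigits, 16);
    }

    public static void main(String[] args) {
        System.out.println("TcpTimedWaitDelay = " + decode("0000001e")); // 30
        System.out.println("MaxUserPort       = " + decode("0000fffe")); // 65534
        System.out.println("TcpNumConnections = " + decode("00fffffe")); // 16777214
    }
}
```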
For automated management, create a PowerShell script to monitor and alert:
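# TIME_WAIT connection monitor
$source = "TCP Monitor"
# Write-EventLog fails unless the source is registered (needs admin rights once)
if (-not [System.Diagnostics.EventLog]::SourceExists($source)) {
    New-EventLog -LogName Application -Source $source
}
# @(...) forces an array so .Count also works when there are 0 or 1 matches
$timeWaitCount = @(netstat -ano | Select-String "TIME_WAIT").Count
$threshold = 50000
if ($timeWaitCount -gt $threshold) {
    Write-EventLog -LogName Application -Source $source -EntryType Warning -EventId 1001 -Message "TIME_WAIT connections exceeded threshold: $timeWaitCount"
    # Optional: Increase dynamic port range temporarily
    netsh int ipv4 set dynamicport tcp start=10000 num=55535
}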
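A single total hides which listener or client port is responsible. As a complement to the script above, here is a minimal sketch that groups TIME_WAIT rows by local port. It assumes the usual five-column `netstat -ano` layout (Proto, Local Address, Foreign Address, State, PID); the class name and the sample rows are made up for illustration, not captured from a real server.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Groups TIME_WAIT rows from "netstat -ano"-style text by local port.
class TimeWaitCounter {

    static Map<Integer, Integer> countByLocalPort(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, UDP rows, and TCP rows in any other state.
            if (cols.length < 4 || !cols[0].equals("TCP") || !cols[3].equals("TIME_WAIT")) {
                continue;
            }
            String local = cols[1];
            int colon = local.lastIndexOf(':'); // also works for [ipv6]:port
            if (colon < 0) {
                continue;
            }
            try {
                int port = Integer.parseInt(local.substring(colon + 1));
                counts.merge(port, 1, Integer::sum);
            } catch (NumberFormatException e) {
                // Ignore rows whose port field is not numeric.
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample rows using documentation address ranges.
        List<String> sample = Arrays.asList(
            "  TCP    10.0.0.5:80       192.0.2.10:51000   TIME_WAIT    0",
            "  TCP    10.0.0.5:80       192.0.2.11:51001   TIME_WAIT    0",
            "  TCP    10.0.0.5:49200    10.0.0.5:8009      TIME_WAIT    0",
            "  TCP    10.0.0.5:80       192.0.2.12:51002   ESTABLISHED  1234",
            "  TCP    0.0.0.0:80        0.0.0.0:0          LISTENING    1234");
        System.out.println(countByLocalPort(sample));
    }
}
```

A large count on port 80 points at client-facing traffic, while large counts on ports in the dynamic range point at outbound connections made by the server itself, for example from httpd to Tomcat.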
For Apache httpd, adjust these settings in httpd.conf:
KeepAlive On
KeepAliveTimeout 5
MaxKeepAliveRequests 100
For Tomcat, tune the AJP connector in server.xml:
<Connector port="8009" protocol="AJP/1.3"
           connectionTimeout="20000"
           maxThreads="500"
           tcpNoDelay="true"
           socket.soLingerOn="true"
           socket.soLingerTime="1"
           socket.keepAlive="true" />
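If you absolutely must try to clear TIME_WAIT connections outside a planned restart, you can use this (risky) approach:
# WARNING: This resets the TCP/IP and Winsock configuration and will drop all connections, including established ones
netsh int ipv4 reset
netsh int ipv6 reset
netsh winsock reset
# Then try to restart the TCP/IP driver
sc stop tcpip
sc start tcpip
Remember that this is disruptive and should only be used in emergencies. In practice the netsh resets usually take full effect only after a restart, and Windows will generally refuse to stop the tcpip driver while the system is running, so expect to reboot anyway.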
Implement these architectural improvements:
- Upgrade to a newer Windows Server version with a better TCP stack
- Consider using connection pooling at the application level
- Implement proper connection termination in your application code
- Monitor TIME_WAIT connections as part of your regular health checks
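As an illustration of the "proper connection termination" point, here is a minimal sketch using plain `java.net` sockets. It does not make TIME_WAIT disappear (the side that closes first still enters it); it only guarantees that sockets are never leaked when an exception occurs. The class and method names are hypothetical, the loopback server exists purely to exercise the client code, and the timeouts mirror the 5000 ms and 15000 ms values used earlier.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class GracefulClose {

    // Opens one connection, reads a single line, and always closes the socket.
    static String fetchLine(String host, int port) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 5000); // connect timeout (ms)
            socket.setSoTimeout(15000);                              // read timeout (ms)
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            return in.readLine();
        } // try-with-resources closes the socket even if readLine() throws
    }

    // One-shot loopback server that sends "ok" and closes; demo scaffolding only.
    static String demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                    out.write("ok\n".getBytes(StandardCharsets.US_ASCII));
                } catch (IOException ignored) {
                    // Nothing to clean up beyond what try-with-resources already does.
                }
            });
            serverThread.start();
            String line = fetchLine(server.getInetAddress().getHostAddress(), server.getLocalPort());
            serverThread.join();
            return line;
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Closing reliably keeps handles from piling up, but each closed connection still costs a TIME_WAIT entry on one side, so reusing connections through a pool remains the more effective measure.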