When debugging connection resets in load-balanced environments, packet-level analysis becomes crucial. Your observation shows RST packets appearing with different source IPs depending on capture location - this is characteristic of middlebox interference.
Here's what's happening in your architecture:
Client (1.1.1.1) → Load Balancer (2.2.2.2) → Server (3.3.3.3)
The asymmetric RST visibility occurs because the LB sits between two separate packet flows:
- When the LB initiates (or relays) a reset toward the client, the client sees Source = LB_IP (2.2.2.2)
- On the backend leg the LB preserves the original client address, so the server sees the connection, and any reset on it, as originating from the client (1.1.1.1)
Use this Python snippet to detect RST packets:
```python
from scapy.all import sniff, IP, TCP

def detect_rst(pkt):
    if IP in pkt and TCP in pkt and pkt[TCP].flags & 0x04:  # RST flag set
        print(f"RST from {pkt[IP].src} to {pkt[IP].dst}")

sniff(filter="tcp", prn=detect_rst)
```
These settings often cause premature connection termination; note that the 5-minute values below line up with the observed drop after 5 minutes of inactivity:

```nginx
# Nginx example: 5-minute idle timeouts that match the observed resets
proxy_connect_timeout 5m;
proxy_send_timeout    5m;
proxy_read_timeout    5m;
keepalive_timeout     5m;
```
Configure servers to send TCP keepalives before the LB's idle timer expires (tcp_keepalive_time must be lower than the LB's idle timeout for the probes to help):

```bash
# Linux sysctl settings
echo 300 > /proc/sys/net/ipv4/tcp_keepalive_time    # seconds of idle before the first probe
echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl    # seconds between probes
echo 5 > /proc/sys/net/ipv4/tcp_keepalive_probes    # unanswered probes before the kernel gives up
```
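If you control the client or server application itself, the same behavior can be enabled per socket instead of system-wide. The sketch below uses Linux-specific socket options (TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT), and the host and port are placeholders for your LB endpoint:

```python
import socket

# Per-socket keepalive (Linux-only options); these values override the sysctls for this socket.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)    # idle seconds before the first probe
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)   # seconds between probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)      # probes before the connection is declared dead
s.connect(("your-lb.example.com", 8080))                     # placeholder endpoint
```

A first probe after 60 seconds of idle keeps traffic flowing well inside a 5-minute idle timeout.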
To correlate the two captures, use Wireshark display filters such as:

```
# Client-side filter
tcp.flags.reset == 1 && ip.src == your_lb_ip

# Server-side filter
tcp.flags.reset == 1 && tcp.port == your_app_port
```
For AWS ALB/ELB, raise the load balancer idle timeout above your longest expected idle period:

```bash
# Set the idle timeout (default 60s)
aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn your_arn \
    --attributes Key=idle_timeout.timeout_seconds,Value=300
```
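If you manage the balancer from Python, the same attribute can be set with boto3. This is a minimal sketch; the ARN is a placeholder:

```python
import boto3

elbv2 = boto3.client("elbv2")
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:region:account:loadbalancer/app/your-lb/abc123",  # placeholder ARN
    Attributes=[{"Key": "idle_timeout.timeout_seconds", "Value": "300"}],
)
```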
Stepping back to the underlying problem: with load balancers (LBs) and long-lived TCP connections, one common issue is unexpected connection resets (RST packets) during idle periods. To recap, you observed:
- 3 backend servers behind an LB
- Connections dropping after 5 minutes of inactivity
- RST packets appearing in Wireshark captures
- Conflicting source IPs in client/server packet captures
Most LBs operate in one of these modes (a quick way to check which one you have follows the list):
1. Transparent Proxy (Layer 4):
- Preserves original client IP
- Forwards packets unchanged
2. Application Proxy (Layer 7):
- Terminates TCP connection
- Creates new connection to backend
- Modifies packet headers
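To see which mode your LB is in, run a throwaway listener on a backend and check which peer address it reports for connections arriving through the LB. This is a minimal sketch; the port is a placeholder:

```python
import socket

# If connections through the LB report the LB's IP, it is a full (Layer 7) proxy;
# if they report the real client IP, the LB preserves source addresses (Layer 4 / transparent).
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 9000))   # placeholder port
srv.listen(5)
while True:
    conn, addr = srv.accept()
    print(f"Connection from {addr[0]}:{addr[1]}")
    conn.close()
```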
Your observations reveal an important behavior:
- Client-side capture: shows the LB IP as the RST source
- Server-side capture: shows the client IP as the RST source
Since the server is receiving (not sending) that RST, this suggests your LB is tearing down both legs of the connection itself, most likely when its idle timer expires:
- Toward the client it sends an RST sourced from its own IP (2.2.2.2)
- Toward the backend it sends an RST sourced from the preserved client IP (1.1.1.1), maintaining the illusion of a direct client connection
Here's how to verify the idle-timeout behavior with a small Python socket probe:

```python
import socket
from time import sleep

def test_connection(host, port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect((host, port))
        print("Connected - now idling...")
        sleep(350)           # stay idle longer than the suspected 5-minute timeout
        s.send(b"PING")      # the first send after an RST can still appear to succeed
        s.settimeout(10)
        s.recv(1)            # a follow-up read reliably surfaces the reset
        print("Connection survived the idle period")
    except (ConnectionResetError, BrokenPipeError):
        print("Connection was reset by peer")
    except socket.timeout:
        print("No reset seen (read timed out while the connection stayed open)")
    finally:
        s.close()
```
To prevent unwanted resets, enable keepalives on the backends (these are the same sysctls as above; add them to /etc/sysctl.conf to persist across reboots):

```bash
# Server-side keepalive configuration (Linux example)
echo 300 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 5 > /proc/sys/net/ipv4/tcp_keepalive_probes
```
For cloud LBs (AWS ALB example), the idle timeout attribute shown earlier is the setting that governs idle resets; deregistration delay is a separate knob that only controls how long existing connections are drained when a target is removed:

```bash
aws elbv2 modify-target-group-attributes \
    --target-group-arn YOUR_TG_ARN \
    --attributes Key=deregistration_delay.timeout_seconds,Value=600
```
Use this tcpdump command to monitor RST packets:

```bash
tcpdump -i any 'tcp[tcpflags] & (tcp-rst) != 0' -nn -v
```
Key things to verify when comparing the two captures (the script below pulls these fields out of each pcap):
- Sequence numbers match between the client-side and server-side captures
- Timestamps correlate with the idle timeout period
- TTL values, which help identify which hop generated the RST
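A small scapy sketch that prints those fields for every RST in a capture file; the pcap filenames are placeholders for captures taken on each side of the LB:

```python
from scapy.all import rdpcap, IP, TCP

def rst_summary(pcap_path):
    """Print time, endpoints, sequence number, and TTL for every RST in a capture."""
    for pkt in rdpcap(pcap_path):
        if IP in pkt and TCP in pkt and pkt[TCP].flags & 0x04:
            print(f"{float(pkt.time):.3f}  {pkt[IP].src} -> {pkt[IP].dst}  "
                  f"seq={pkt[TCP].seq}  ttl={pkt[IP].ttl}")

# Placeholder filenames: run this over both captures and compare the output lines.
rst_summary("client_side.pcap")
rst_summary("server_side.pcap")
```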
Different LBs handle this differently:

| LB Type | RST Behavior |
| --- | --- |
| AWS ALB | Sends a TCP RST to the client when the backend fails; idle behavior is governed by the idle timeout attribute |
| Nginx | Governed by proxy_read_timeout/proxy_send_timeout (HTTP) or proxy_timeout (stream module) |
| HAProxy | Governed by the timeout server setting (and timeout client on the frontend side) |