When network teams insist "our firewall has no idle timeout" but connections still drop after exactly 40 minutes, you're likely dealing with an unacknowledged session tracking limitation. Many enterprise firewalls implement hard-coded session timeouts despite vendor claims to the contrary.
Your initial configuration (tcp_keepalive_time=300, tcp_keepalive_intvl=300, tcp_keepalive_probes=30000) worked because:
- The 5-minute keepalive interval prevented NAT/firewall session table expiration
- Extremely high probe count effectively made connections persistent
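A quick back-of-the-envelope check (illustrative Python) shows just how persistent those settings made connections:

```python
# Worst-case dead-peer detection time under the initial settings:
# tcp_keepalive_time=300, tcp_keepalive_intvl=300, tcp_keepalive_probes=30000
seconds = 300 + 30000 * 300   # idle period + every probe timing out
days = seconds / 86400
print(round(days, 1))          # ≈ 104.2 days before a dead peer is declared
```

In other words, a genuinely dead client would hold its connection open for over three months, which is why this approach kept sessions alive but was useless for dead-peer detection.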
The problematic configuration (time=300, intvl=180, probes=10) reveals several firewall behaviors:
```
# Current problematic sysctl settings
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 180
net.ipv4.tcp_keepalive_probes = 10
```
Firewalls often implement these invisible behaviors:
- TCP Middlebox Interference: Some firewalls strip or respond to keepalive packets themselves
- Asymmetric Session Tracking: Firewalls may track only client→server traffic as "activity"
- Proprietary Health Checks: Vendor-specific keepalive mechanisms override standard TCP
To confirm firewall interference:
```
# On the Linux server:
tcpdump -ni any "tcp port 1025 and (tcp[13] & 0x7f != 0)"

# On the client (if accessible):
tcpdump -ni any "tcp port 1025 and (tcp[13] & 0x7f != 0)"
```
Key findings to look for:
- Missing keepalive packets on client-side captures
- Unexpected RST packets after exactly 2400 seconds (40 min)
- Firewall-generated ACKs instead of client responses
When standard TCP keepalive fails:
1. Application-Level Keepalive (for Teradata/SSH):
```
-- Teradata has no built-in heartbeat statement; instead, schedule a
-- trivial query from the client every few minutes, for example:
SELECT SESSION;
```

```
# SSH client option (~/.ssh/config): request a server response every 240 s
ServerAliveInterval 240
```
2. Firewall Policy Workarounds:
```
# Local iptables only: these ACCEPT rules help if a host-level policy is
# dropping idle-flow packets; they cannot change a remote firewall's timeout
iptables -I OUTPUT -p tcp --dport 1025 -j ACCEPT
iptables -I INPUT -p tcp --sport 1025 -m state --state ESTABLISHED -j ACCEPT
```
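A third option, alongside the two above, is a per-socket override: on Linux an application can tune its own keepalive timers without touching the system-wide sysctls. A sketch (the values are illustrative, chosen to keep probes well under the observed 40-minute cutoff):

```python
import socket

def enable_keepalive(sock, idle=240, intvl=60, probes=5):
    """Enable and tune TCP keepalive for a single socket (Linux).

    Overrides the system-wide tcp_keepalive_* sysctls for this socket
    only; idle=240 starts probing well before a 40-minute firewall cutoff.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, intvl)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
s.close()
```

This is useful when you control the application but cannot change kernel settings, or when only one service needs aggressive keepalives.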
Common firewall vendors with known TCP session issues:
| Vendor | Default Timeout | Hidden Setting |
|---|---|---|
| Palo Alto | 30 min | tcp-timeout |
| Cisco ASA | 60 min | timeout conn |
| FortiGate | 3600 s | set timeout-policy |
The most reliable solution is to coordinate with network teams to:
- Identify the actual session timeout value
- Configure keepalive intervals to 50-75% of that value
- Implement bidirectional application heartbeats
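The interval calculation above can be sketched as a tiny helper (a hypothetical function name; the 50-75% band is the guidance stated above):

```python
def keepalive_time_for(firewall_timeout_s, fraction=0.5):
    """Pick tcp_keepalive_time as a fraction (50-75%) of the discovered
    firewall idle timeout, so probes fire before the session entry expires.

    Illustrative helper, not part of any tool or library.
    """
    if not 0.5 <= fraction <= 0.75:
        raise ValueError("fraction should stay within the 50-75% band")
    return int(firewall_timeout_s * fraction)

print(keepalive_time_for(2400))  # 1200 s for the observed 40-minute cutoff
```

Staying at the low end of the band leaves room for at least one retransmitted probe to land before the session table entry expires.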
When our Teradata database connections started dropping like flies after exactly 40 minutes of inactivity, we initially suspected the firewall's idle timeout. But the network team insisted their firewall had no such timeout configured. This led us down a rabbit hole of TCP keepalive tuning and packet analysis.
Our first successful configuration used extremely aggressive keepalive settings:
```
# sysctl settings that maintained connections indefinitely
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 300
net.ipv4.tcp_keepalive_probes = 30000
```
This brute-force approach kept connections alive for days, but wasn't ideal for detecting dead clients.
When we tried more reasonable settings to balance connection maintenance and dead peer detection:
```
# More balanced keepalive configuration
net.ipv4.tcp_keepalive_time = 300    # 5 minutes
net.ipv4.tcp_keepalive_intvl = 180   # 3 minutes
net.ipv4.tcp_keepalive_probes = 10
```
We expected:
- Active probes every 5 minutes for alive clients
- Connection termination after ~35 minutes for dead clients (300 + 10 × 180 = 2100 seconds)
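Note the arithmetic: the reset fires after the idle period plus all probe intervals, i.e. time + probes × intvl. A quick check:

```python
# Keepalive parameters from the sysctl block above
keepalive_time = 300     # seconds of idle before the first probe
keepalive_intvl = 180    # seconds between unacknowledged probes
keepalive_probes = 10    # unacknowledged probes before reset

# A dead peer is declared after the idle period plus every probe timing out
dead_after = keepalive_time + keepalive_probes * keepalive_intvl
print(dead_after)         # 2100 seconds = 35 minutes
```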
Instead, Wireshark showed zero keepalive packets traversing the firewall, and connections still dropped at ~40 minutes.
Several firewall behaviors could explain this:
- TCP Normalization: Some firewalls rewrite TCP options, potentially stripping keepalive capability
- Proxy Behavior: Stateful inspection firewalls may maintain their own connection tracking
- Silent ACKing: The firewall might respond to keepalives on behalf of clients
To isolate the issue, we recommend:
```
# Check if keepalive is actually enabled per socket (-o shows keepalive timers)
ss -e -n -o -p | grep "1025"

# Alternative: check via /proc (0401 is hex for port 1025)
cat /proc/net/tcp | grep "0401"
```
Additionally, run simultaneous packet captures on both sides of the firewall to verify where packets disappear.
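Before blaming the network, it is also worth confirming that SO_KEEPALIVE is even set on the sockets in question; applications must opt in, and many do not. A minimal Python check:

```python
import socket

def keepalive_enabled(sock):
    """Return True if SO_KEEPALIVE is set on this socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(keepalive_enabled(s))   # False: keepalive is off by default
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
print(keepalive_enabled(s))   # True after opting in
s.close()
```

If the application never enables keepalive, no amount of sysctl tuning will produce probes, because the net.ipv4.tcp_keepalive_* values only apply to sockets that opted in.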
When OS-level keepalives fail, implement an application-level heartbeat:
```python
# Python example of an application-level keepalive
import socket
import time

def maintain_connection(sock):
    while True:
        try:
            sock.send(b'\x00')   # null-byte heartbeat (protocol permitting)
            time.sleep(240)      # send before the firewall timeout
        except socket.error:
            break                # handle disconnection
```
If you can identify the firewall type:
- Cisco ASA: Adjust TCP idle timeout with timeout conn
- Palo Alto: Modify TCP timeout in security policy
- Check Point: Adjust 'keepalive' service settings
Based on our experience:
- Verify keepalives are actually being sent at the socket level
- Push for firewall configuration details or exception rules
- Consider application-level heartbeat as a fallback
- Document the 40-minute pattern as evidence for network teams