When dealing with network operations in Linux environments (specifically Debian Wheezy), processes occasionally hang indefinitely during read operations. The hang shows up in two typical strace signatures:
// Common strace outputs showing the hang
$ strace -p 12089
Process 12089 attached - interrupt to quit
read(5,
$ strace -p 17527
Process 17527 attached - interrupt to quit
recvfrom(3,
The issue appears across different protocols and tools:
- Python scripts downloading from S3 (using urllib/urllib2)
- SVN operations with externals (svn:// protocol)
- Both Python 2.5 and 2.7 environments
Several important characteristics of this behavior:
// Network connections remain established
$ sudo lsof -i | grep 12089
python 12089 user 5u IPv4 809917771 0t0 TCP my.server.net:35427->185-201.amazon.com:https (ESTABLISHED)
An ESTABLISHED connection that never delivers data usually means the peer or an intermediate device dropped the connection silently; without keepalive probes or an application-level timeout, the blocking read never returns.
// Timeouts don't always help: setdefaulttimeout() only affects sockets
// created after the call, and some blocking paths (e.g. DNS resolution
// via getaddrinfo) are not governed by socket timeouts at all
import socket
socket.setdefaulttimeout(60)  # Still hangs sometimes
First, check fundamental network health:
# Check for packet drops
ifconfig | grep dropped
# Verify TCP keepalive settings (Linux defaults: 7200s idle, 9 probes, 75s interval)
cat /proc/sys/net/ipv4/tcp_keepalive_time
cat /proc/sys/net/ipv4/tcp_keepalive_probes
cat /proc/sys/net/ipv4/tcp_keepalive_intvl
For S3 downloads, implement robust timeout handling:
import urllib2
import socket

class RobustS3Downloader:
    def __init__(self, timeout=30, retries=3):
        self.timeout = timeout
        self.retries = retries

    def download(self, url):
        last_error = None
        for attempt in range(self.retries):
            try:
                req = urllib2.Request(url)
                # Set both the global socket timeout and the per-call
                # urllib2 timeout (the latter requires Python 2.6+)
                socket.setdefaulttimeout(self.timeout)
                return urllib2.urlopen(req, timeout=self.timeout).read()
            except (urllib2.URLError, socket.timeout) as e:
                last_error = e
                continue
        raise last_error
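A minimal usage sketch (the bucket URL is a placeholder, not taken from the original report):
downloader = RobustS3Downloader(timeout=30, retries=3)
data = downloader.download('https://example-bucket.s3.amazonaws.com/key')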
For hanging SVN operations, consider these approaches (a Python-level subprocess timeout wrapper is shown later in this section):
# 1. Wrap the command with the coreutils timeout utility
timeout 300 svn up --non-interactive
# 2. Prefer HTTP-based access over svn://
svn checkout http://...
# 3. Inspect the externals configuration
svn propget svn:externals .
When processes hang, gather more diagnostic data:
# Check TCP connection state
ss -tnp | grep <pid>
# Network buffer inspection (note: addresses and ports in /proc/net/tcp are hexadecimal)
cat /proc/<pid>/net/tcp | grep -i "<port>"
# Kernel stack trace of blocked (uninterruptible) tasks; requires root
echo w > /proc/sysrq-trigger
dmesg | tail -n 30
System-wide configuration changes that can help:
# Adjust TCP keepalive settings (add to /etc/sysctl.conf)
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 10
# Apply changes
sysctl -p
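Note that these kernel settings only take effect for sockets that explicitly enable keepalive; Python does not turn it on by default. A minimal sketch of opting in on a socket (the per-socket TCP_KEEP* constants are Linux-specific):
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Without SO_KEEPALIVE, the sysctl values above are ignored for this socket
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Linux-specific per-socket overrides of the sysctl defaults
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)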
# Cap virtual memory so a runaway process fails fast instead of lingering
ulimit -v 500000  # 500000 KB, roughly 500MB
Consider more robust HTTP clients:
# Using requests with explicit connect and read timeouts
import requests

try:
    # s3_url is the S3 object URL, defined elsewhere
    response = requests.get(s3_url, timeout=(10, 30))  # (connect, read)
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print("Download failed: %s" % e)
To recap the failure pattern: through strace analysis, we consistently see processes stuck at
recvfrom(3,
or
read(5,
and the issue manifests across multiple scenarios:
- Python scripts downloading from S3 (both with and without explicit timeouts)
- SVN operations launched via subprocess.Popen
- Different network endpoints (Amazon S3, telecommunity.com)
Checking the network connections of hung processes reveals established TCP connections:
lsof -i | grep <pid>
python 12089 user 5u IPv4 809917771 TCP my.server.net:35427->185-201.amazon.com:https (ESTABLISHED)
For Python network operations, implement comprehensive timeout protection:
import socket
import urllib2

# Global socket timeout
socket.setdefaulttimeout(60)

# Per-request timeout: the timeout argument belongs to urlopen()
# (Python 2.6+), not to the Request constructor
req = urllib2.Request(url)
try:
    response = urllib2.urlopen(req, timeout=30)
except socket.timeout:
    # Handle the timeout (retry, log, or fail fast)
    pass
For more robust solutions, consider using requests with proper session management:
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
adapter = HTTPAdapter(max_retries=3,
                      pool_connections=10,
                      pool_maxsize=10)
s.mount('http://', adapter)
s.mount('https://', adapter)

try:
    r = s.get(url, timeout=(3.05, 30))
except requests.exceptions.Timeout:
    # Handle the timeout
    pass
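The 3.05-second connect timeout follows the requests documentation's suggestion to use a value slightly larger than a multiple of 3, the default TCP packet retransmission window.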
Adjust TCP keepalive settings in /etc/sysctl.conf:
# Probe idle connections aggressively
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
# Shorten the FIN-WAIT-2 hold time (note: this tunable does not affect TIME_WAIT)
net.ipv4.tcp_fin_timeout = 30
Apply the changes with sysctl -p.
For SVN or other subprocess operations, implement timeouts:
import subprocess
import threading

def run_command(cmd, timeout_sec):
    proc = subprocess.Popen(cmd)
    # Kill the child if it has not finished within timeout_sec
    timer = threading.Timer(timeout_sec, proc.kill)
    try:
        timer.start()
        proc.communicate()
    finally:
        timer.cancel()
    return proc.returncode
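A usage sketch for the hanging SVN case (the 300-second limit mirrors the shell timeout example above; the return code is negative if the watchdog killed the process):
rc = run_command(['svn', 'up', '--non-interactive'], timeout_sec=300)
if rc != 0:
    print("svn did not complete cleanly (rc=%s)" % rc)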
Implement a watchdog for long-running network operations:
import signal

class TimeoutError(Exception):
    # Built in on Python 3; defined explicitly here for Python 2
    pass

class Timeout:
    def __init__(self, seconds=1, error_message='Timeout'):
        self.seconds = seconds
        self.error_message = error_message

    def handle_timeout(self, signum, frame):
        raise TimeoutError(self.error_message)

    def __enter__(self):
        signal.signal(signal.SIGALRM, self.handle_timeout)
        signal.alarm(self.seconds)

    def __exit__(self, type, value, traceback):
        signal.alarm(0)

# Usage (SIGALRM only works in the main thread, and not on Windows):
try:
    with Timeout(seconds=30):
        # Network operation here
        pass
except TimeoutError:
    # Handle the timeout
    pass
Consider async solutions for network-bound operations (Python 3 only):
import asyncio
import aiohttp

async def fetch(session, url):
    try:
        async with session.get(url, timeout=30) as response:
            return await response.text()
    except asyncio.TimeoutError:
        print(f"Timeout occurred for {url}")
        return None

async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'http://example.com')
        print(html)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
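On Python 3.7+, the manual event-loop handling above can be replaced with the preferred entry point:
asyncio.run(main())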