When dealing with large file transfers (e.g., 160GB Oracle dumps) over high-speed WAN connections (100Mbps+) with non-trivial latency (10ms+), traditional single-threaded TCP connections often underutilize the available bandwidth due to:
- TCP's congestion control algorithms
- ACK delay constraints
- Window size limitations (a window smaller than the bandwidth-delay product caps throughput; see the sketch after this list)
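The window-size limitation can be made concrete with a bandwidth-delay product (BDP) calculation. A minimal sketch in Python, using the 100Mbps / 10ms figures from the scenario above:

# Bandwidth-delay product: the amount of data that must be "in flight"
# to keep the link busy. The TCP window needs to be at least this large.
def bdp_bytes(bandwidth_mbps, rtt_ms):
    bits_in_flight = bandwidth_mbps * 1e6 * (rtt_ms / 1e3)
    return bits_in_flight / 8

print(bdp_bytes(100, 10))  # 125000.0 -> a ~125KB window is needed to fill the pipe

A default window well below that figure caps throughput at roughly window/RTT, no matter how fast the link is.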
Initial tests with default settings showed abysmal 5Mbps throughput. Adjusting window size improved this to 45Mbps, but the real breakthrough came when using parallel connections:
# Single connection test
iperf -c remote_host -t 60 -i 10
# Parallel connection test (4 streams)
iperf -c remote_host -t 60 -i 10 -P 4
The parallel test achieved 25Mbps per stream, saturating the 100Mbps link.
Even with optimized TCP settings (window size=256KB, MTU=1500), single-stream FTP transfers capped out at 20Mbps, and naive parallel FTP transfers ran into disk I/O bottlenecks:
- Multiple concurrent disk seeks
- No sequential read/write patterns
- Excessive head movement on HDDs
We need tools that can:
- Split files into logical chunks
- Transfer chunks simultaneously
- Reassemble at destination
- Minimize disk thrashing
1. aria2 (Cross-platform)
aria2c --split=4 --max-connection-per-server=4 http://example.com/largefile.dmp
2. lftp (Linux-focused)
lftp -e "pget -n 4 -c ftp://example.com/largefile.dmp; quit"
3. Custom Python Implementation
import concurrent.futures
import requests

def download_chunk(url, start, end, chunk_num):
    # Ask the server for just this byte range (requires HTTP Range support)
    headers = {'Range': f'bytes={start}-{end}'}
    r = requests.get(url, headers=headers, stream=True)
    with open(f'chunk_{chunk_num}', 'wb') as f:
        for chunk in r.iter_content(1024):
            f.write(chunk)

def parallel_download(url, num_threads=4):
    file_size = int(requests.head(url).headers['Content-Length'])
    chunk_size = file_size // num_threads
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = []
        for i in range(num_threads):
            start = i * chunk_size
            # The last chunk runs to the end of the file to pick up any remainder
            end = start + chunk_size - 1 if i != num_threads - 1 else file_size - 1
            futures.append(executor.submit(download_chunk, url, start, end, i))
        concurrent.futures.wait(futures)
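The chunks still have to be reassembled at the destination. A minimal sketch, assuming the chunk_<n> naming used by download_chunk above (the helper name and the example.com URL are illustrative):

import shutil

def reassemble(output_path, num_threads=4):
    # Concatenate the chunk files in order, streaming so large chunks never sit in memory
    with open(output_path, 'wb') as out:
        for i in range(num_threads):
            with open(f'chunk_{i}', 'rb') as part:
                shutil.copyfileobj(part, out)

parallel_download('http://example.com/largefile.dmp', num_threads=4)
reassemble('largefile.dmp', num_threads=4)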
When parallelizing transfers, keep the disks from becoming the bottleneck:
- Use --file-allocation=prealloc in aria2 (the same idea is sketched in Python after this list)
- Consider ZFS or other advanced filesystems with good concurrent I/O
- For HDDs, increase read-ahead:
blockdev --setra 8192 /dev/sdX
- Use --enable-mmap in aria2 to map downloaded files into memory
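Preallocation is worth carrying over to the custom Python approach as well: reserving the final file's space before reassembly lets the filesystem lay it out contiguously. A minimal sketch, assuming a Linux host where os.posix_fallocate is available (the helper name is illustrative):

import os

def preallocate(path, size_bytes):
    # Reserve the full file size up front to avoid fragmentation during heavy writes
    with open(path, 'wb') as f:
        os.posix_fallocate(f.fileno(), 0, size_bytes)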
For enterprise environments:
- UDP-based protocols: Aspera, Tsunami UDP
- Application-layer solutions: BBCP, GridFTP
- Cloud services: AWS S3 multipart uploads
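For the S3 route, multipart uploads are exposed through boto3's transfer configuration. A minimal sketch, assuming boto3 is installed and credentials are configured; the bucket, key, and size thresholds are placeholders:

import boto3
from boto3.s3.transfer import TransferConfig

# Files above the threshold are split into parts and uploaded concurrently
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # switch to multipart above 64MB
    multipart_chunksize=64 * 1024 * 1024,  # 64MB parts
    max_concurrency=8,                     # parallel part uploads
)
s3 = boto3.client('s3')
s3.upload_file('largefile.dmp', 'my-bucket', 'dumps/largefile.dmp', Config=config)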
Calculate the break-even point using:
def should_ship_physically(file_size_gb, bandwidth_mbps, shipping_hours):
    # Network transfer time in hours: GB -> megabits, divided by link rate, then by 3600
    transfer_hours = (file_size_gb * 8000) / bandwidth_mbps / 3600
    # Ship the drive only if it arrives in less than half the network transfer time
    return shipping_hours < transfer_hours / 2
For our 160GB/100Mbps case: ~3.5 hours transfer vs. overnight shipping.
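Plugging in the scenario's numbers (with overnight shipping taken as roughly 12 hours, an assumed figure):

should_ship_physically(160, 100, 12)  # False: the ~3.5 hour transfer wins comfortably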
When transferring massive files like Oracle dumps (160GB+) over WAN links with 100Mbps of bandwidth but 10ms of latency, traditional single-threaded protocols hit fundamental TCP limitations. Our iperf tests revealed:
# Single-threaded iperf (default window size)
$ iperf -c remote_host
[ 3] 0.0-10.0 sec 6.25 MBytes 5.00 Mbits/sec
# With tuned window size
$ iperf -c remote_host -w 512K
[ 3] 0.0-10.0 sec 56.25 MBytes 45.00 Mbits/sec
# Multi-threaded (4 connections)
$ iperf -c remote_host -P 4
[SUM] 0.0-10.0 sec 125.00 MBytes 100.00 Mbits/sec
Even with optimal TCP settings (window scaling, MTU=1500), single-stream FTP transfers plateau at ~20Mbps due to:
- ACK clocking delays in high-latency paths
- TCP's congestion avoidance algorithms
- Disk seek penalties during concurrent transfers
These tools implement BitTorrent-like chunking for single-source transfers:
1. BBCP (Advanced Multi-Stream Copy)
# Sender side (16 parallel streams, 8MB socket window, progress report every 8 seconds)
bbcp -P 8 -s 16 -w 8M -S "bbcp -P 8 -s 16" /path/to/dump.dmp user@remote:/dest/
# Key flags:
# -P: progress report interval in seconds
# -s: number of parallel streams
# -w: socket buffer / window size (tune toward the bandwidth-delay product)
# -S: command used to start bbcp on the source node
2. UFTP (UDP-based File Transfer)
# Server (uses multicast UDP)
uftpd -m 224.1.2.3 -p 10432 /export/path
# Client (8 threads)
uftpc -t 8 -m 224.1.2.3 -p 10432 -o /dest/file.dmp
3. Aspera FASP (Commercial Alternative)
Example config for 100Mbps link:
# aspera.conf
fasp.bw_max_target=100M
fasp.transfer_priority=10
fasp.udp.max_datagram_size=1472
When parallelizing transfers, mitigate disk contention with:
# Linux: set the transfer's I/O scheduling priority with ionice (best-effort class, highest priority)
ionice -c 2 -n 0 bbcp [options]
# Windows: Enable SMB Direct (RDMA)
Set-SmbClientConfiguration -EncryptionDisabled $true -Force
Essential sysctl adjustments for Linux servers:
# /etc/sysctl.conf
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_slow_start_after_idle = 0
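After adding these lines and running sysctl -p, it is worth confirming the kernel picked the values up. A minimal sketch in Python that reads them back from the standard procfs entries:

# Print the values the kernel is actually using
for key in ('net/core/rmem_max', 'net/core/wmem_max', 'net/ipv4/tcp_rmem', 'net/ipv4/tcp_wmem'):
    with open(f'/proc/sys/{key}') as f:
        print(key, '=', f.read().strip())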
Calculate the break-even point:
# Python bandwidth comparison (example values; all times in hours)
file_size_gb = 160
effective_wan_mbps = 80                  # assumed realistic sustained WAN throughput
physical_transfer_time = 2 + 12          # drive prep + overnight shipping (assumed)
wan_transfer_time = (file_size_gb * 8000) / effective_wan_mbps / 3600
if physical_transfer_time < wan_transfer_time:
    print(f"Shipping the drive wins for files of {file_size_gb} GB and up")