Optimizing Large File Transfers Over High-Latency WAN Links: Multithreaded TCP Solutions



When dealing with large file transfers (e.g., a 160GB Oracle dump) over high-speed WAN connections (100Mbps+) with non-trivial round-trip latency (10ms+), traditional single-threaded TCP connections often underutilize the available bandwidth (the calculation after this list shows why) due to:

  • TCP's congestion control algorithms
  • ACK delay constraints
  • Window size limitations
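
The ceiling comes from the bandwidth-delay product: a single TCP stream can never have more than one window of data in flight per round trip. A quick back-of-envelope check, assuming a default 64KB window:

# Max single-stream TCP throughput = window size / round-trip time
window_bytes = 64 * 1024        # default 64KB window (assumed)
rtt_s = 0.010                   # 10ms round trip
print(window_bytes * 8 / rtt_s / 1e6)   # ~52.4 Mbit/s ceiling

# Window needed to fill a 100Mbps pipe at 10ms RTT (the BDP)
print(100e6 * 0.010 / 8 / 1024)         # ~122 KB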

Initial tests with default settings showed an abysmal 5Mbps of throughput. Raising the TCP window size improved this to 45Mbps, but the real breakthrough came from parallel connections:

# Single connection test
iperf -c remote_host -t 60 -i 10

# Parallel connection test (4 streams)
iperf -c remote_host -t 60 -i 10 -P 4

The parallel test achieved 25Mbps per stream, saturating the 100Mbps link.

Even with optimized TCP settings (window size=256KB, MTU=1500), single FTP transfers capped at 20Mbps. Parallel FTP transfers hit disk I/O bottlenecks:

  • Multiple concurrent disk seeks
  • No sequential read/write patterns
  • Excessive head movement on HDDs

We need tools that can:

  1. Split files into logical chunks
  2. Transfer chunks simultaneously
  3. Reassemble at destination
  4. Minimize disk thrashing

Several tools fit the bill:

1. aria2 (Cross-platform)

aria2c --split=4 --max-connection-per-server=4 http://example.com/largefile.dmp

2. lftp (Linux-focused)

lftp -e "pget -n 4 -c ftp://example.com/largefile.dmp; quit"

3. Custom Python Implementation

import concurrent.futures
import shutil
import requests

def download_chunk(url, start, end, chunk_num):
    # An empty end means "from start to EOF" (used for the last chunk)
    headers = {'Range': f'bytes={start}-{end}'}
    r = requests.get(url, headers=headers, stream=True)
    r.raise_for_status()
    with open(f'chunk_{chunk_num}', 'wb') as f:
        for chunk in r.iter_content(64 * 1024):
            f.write(chunk)

def parallel_download(url, num_threads=4):
    file_size = int(requests.head(url, allow_redirects=True).headers['Content-Length'])
    chunk_size = file_size // num_threads

    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = []
        for i in range(num_threads):
            start = i * chunk_size
            # The last chunk runs to EOF to absorb any remainder
            end = start + chunk_size - 1 if i != num_threads - 1 else ''
            futures.append(executor.submit(download_chunk, url, start, end, i))
        for future in futures:
            future.result()  # propagate any download errors

    # Reassemble the chunks in order at the destination
    with open('largefile.dmp', 'wb') as out:
        for i in range(num_threads):
            with open(f'chunk_{i}', 'rb') as part:
                shutil.copyfileobj(part, out)
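
A minimal invocation against the placeholder URL from the earlier examples (the server must support HTTP Range requests for this to work):

parallel_download('http://example.com/largefile.dmp', num_threads=4)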

When dealing with parallel transfers:

  • Use --file-allocation=prealloc in aria2 to reserve space up front and avoid fragmentation (see the sketch after this list)
  • Consider ZFS or other advanced filesystems with good concurrent I/O
  • For HDDs, increase read-ahead: blockdev --setra 8192 /dev/sdX
  • Use --enable-mmap in aria2 to map downloaded files into memory
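
The preallocation trick is easy to replicate in custom tooling too; a minimal sketch using os.posix_fallocate (Linux-specific; the 160GB size is just our example dump):

import os

def preallocate(path, size_bytes):
    # Reserve contiguous space up front so concurrent chunk writers
    # don't interleave allocations and fragment the file
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)

preallocate('largefile.dmp', 160 * 1024**3)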

For enterprise environments:

  • UDP-based protocols: Aspera, Tsunami UDP
  • Application-layer solutions: BBCP, GridFTP
  • Cloud services: AWS S3 multipart uploads (sketched below)
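
For the S3 route, boto3's transfer manager handles the chunking and parallelism itself; a minimal sketch (bucket and key names are placeholders):

import boto3
from boto3.s3.transfer import TransferConfig

# 64MB parts uploaded by 8 concurrent threads; boto3 splits the file
# into parts and S3 reassembles them server-side
config = TransferConfig(multipart_threshold=64 * 1024**2,
                        multipart_chunksize=64 * 1024**2,
                        max_concurrency=8)
s3 = boto3.client('s3')
s3.upload_file('largefile.dmp', 'example-bucket', 'dumps/largefile.dmp',
               Config=config)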

Calculate the break-even point using:

def should_ship_physically(file_size_gb, bandwidth_mbps, shipping_hours):
    # GB -> megabits (x8000), divided by the link rate in Mbps, then to hours
    transfer_hours = (file_size_gb * 8000) / bandwidth_mbps / 3600
    # Ship the drive only if it arrives in under half the WAN transfer time
    return shipping_hours < transfer_hours / 2
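
For the scenario above, assuming a 16-hour overnight shipping window:

should_ship_physically(160, 100, shipping_hours=16)  # False: the WAN wins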

For our 160GB/100Mbps case: ~3.5 hours transfer vs. overnight shipping.


When transferring massive files like Oracle dumps (160GB+) over WAN links with 100Mbps of bandwidth and 10ms of latency, traditional single-threaded protocols hit fundamental TCP limitations. Our iperf tests revealed:

# Single-threaded iperf (default window size)
$ iperf -c remote_host
[  3]  0.0-10.0 sec  6.25 MBytes  5.00 Mbits/sec

# With tuned window size
$ iperf -c remote_host -w 512K
[  3]  0.0-10.0 sec  56.2 MBytes  45.0 Mbits/sec

# Multi-threaded (4 connections)
$ iperf -c remote_host -P 4
[SUM]  0.0-10.0 sec   125 MBytes   100 Mbits/sec

Even with optimal TCP settings (window scaling, MTU=1500), single-stream FTP transfers plateau at ~20Mbps due to:

  • ACK clocking delays in high-latency paths
  • TCP's congestion avoidance algorithms
  • Disk seek penalties during concurrent transfers

These tools implement BitTorrent-like chunking for single-source transfers:

1. BBCP (Advanced Multi-Stream Copy)

# 8 parallel streams, 8MB window, progress reports every 2 seconds
bbcp -s 8 -w 8M -P 2 /path/to/dump.dmp user@remote:/dest/

# Key flags:
# -s: number of parallel TCP streams
# -w: TCP window size (tune toward the bandwidth-delay product)
# -P: progress report interval in seconds
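
A reasonable starting point is a -w of roughly the bandwidth-delay product divided by the stream count, so the streams collectively fill the pipe without overbuffering.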

2. UFTP (UDP-based File Transfer)

# Sender (multicasts the file over UDP)
uftp -M 224.1.2.3 -p 10432 /export/path/file.dmp

# Receiver daemon (writes incoming files under /dest)
uftpd -M 224.1.2.3 -p 10432 -D /dest

3. Aspera FASP (Commercial Alternative)

Example ascp invocation capped at the 100Mbps link rate:

# -l sets the target transfer rate
ascp -l 100M /path/to/dump.dmp user@remote:/dest/

When parallelizing transfers, mitigate disk contention with:

# Linux: raise the transfer's I/O scheduling priority (best-effort class)
ionice -c 2 -n 0 bbcp [options]

# Windows: enable SMB Multichannel (parallel TCP streams per session)
Set-SmbClientConfiguration -EnableMultiChannel $true -Force

Essential sysctl adjustments for Linux servers:

# /etc/sysctl.conf
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_slow_start_after_idle = 0
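
Load the new values with sysctl -p, and size the maxima to at least the bandwidth-delay product of your longest path.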

Calculate the break-even point:

# Python bandwidth comparison (prep and shipping hours are assumptions)
file_size_gb = 160
effective_wan_mbps = 100    # measured throughput, not the nominal link rate
drive_prep_hours = 1        # copying onto a drive and packaging it
shipping_hours = 16         # overnight courier
physical_transfer_time = drive_prep_hours + shipping_hours               # hours
wan_transfer_time = (file_size_gb * 8000) / effective_wan_mbps / 3600    # hours
if physical_transfer_time < wan_transfer_time:
    print(f"FedEx wins for {file_size_gb}GB at {effective_wan_mbps}Mbps")
else:
    print("WAN transfer wins")