Command Line Tools to Split PCAP Files by TCP Connection (With Memory-Efficient Solutions)



Working with large packet capture files (700MB+ in your case) presents unique challenges. Wireshark's "Follow TCP Stream" is excellent for analysis but becomes resource-intensive with big files. We need command-line alternatives that can:

  • Process files without loading entire captures into memory
  • Maintain original PCAP format for downstream tools
  • Handle complex TCP session scenarios

The most robust solution is tshark (Wireshark's CLI version). Here's how to split by TCP connection:

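# First, list every TCP stream index (-Y tcp skips non-TCP packets,
# which would otherwise produce blank lines and a broken filter below)
tshark -r large_capture.pcap -Y tcp -T fields -e tcp.stream | sort -nu > streams.txt

# Then extract each stream to its own file (-F pcap: tshark writes pcapng by default)
while read -r stream; do
    tshark -r large_capture.pcap -Y "tcp.stream eq $stream" -F pcap -w "stream_${stream}.pcap"
done < streams.txt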

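This method keeps memory usage modest because no single tshark run loads the whole capture into RAM. The cost is time: the file is read once to list the streams and then once more for every stream, so a capture with thousands of connections means thousands of full passes.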
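The loop above starts one tshark process per stream. An alternative that reads the capture only once is to append each packet to its connection's file as it is read. A minimal sketch in plain Python, assuming a classic (non-pcapng) Ethernet capture carrying IPv4 without VLAN tags; connection_key, split_by_connection and the max_open limit are hypothetical names, not part of any tool:

```python
import os
import socket
import struct
from collections import OrderedDict

PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)  # classic PCAP, usec / nsec timestamps


def connection_key(frame):
    """Direction-independent (endpoint, endpoint) key for an Ethernet/IPv4/TCP frame, else None."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # IPv4 EtherType only (no VLAN)
        return None
    ip = frame[14:]
    if ip[9] != 6:  # IP protocol 6 = TCP
        return None
    if struct.unpack("!H", ip[6:8])[0] & 0x1FFF:  # later fragments carry no TCP header
        return None
    ihl = (ip[0] & 0x0F) * 4
    if ihl < 20 or len(ip) < ihl + 4:
        return None
    sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
    a = (socket.inet_ntoa(ip[12:16]), sport)
    b = (socket.inet_ntoa(ip[16:20]), dport)
    return (a, b) if a <= b else (b, a)  # same key for both directions


def split_by_connection(path, out_dir, max_open=256):
    """Copy each IPv4 TCP connection's packets into its own PCAP file in a single pass."""
    os.makedirs(out_dir, exist_ok=True)
    handles = OrderedDict()  # key -> open file, least recently used first
    names = {}  # key -> output path
    try:
        with open(path, "rb") as src:
            header = src.read(24)
            if len(header) < 24:
                raise ValueError("file too short for a PCAP header")
            endian = "<" if struct.unpack("<I", header[:4])[0] in PCAP_MAGICS else ">"
            if struct.unpack(endian + "I", header[:4])[0] not in PCAP_MAGICS:
                raise ValueError("not a classic PCAP file (pcapng is not handled)")
            if struct.unpack(endian + "I", header[20:24])[0] != 1:
                raise ValueError("only Ethernet (linktype 1) captures are handled")
            while True:
                rec_hdr = src.read(16)
                if len(rec_hdr) < 16:
                    break
                incl_len = struct.unpack(endian + "I", rec_hdr[8:12])[0]
                data = src.read(incl_len)
                key = connection_key(data)
                if key is None:
                    continue  # not IPv4 TCP: skip it
                fh = handles.pop(key, None)
                if fh is None:
                    if len(handles) >= max_open:
                        handles.popitem(last=False)[1].close()  # evict least recently used
                    if key in names:
                        fh = open(names[key], "ab")  # reopen a previously evicted file
                    else:
                        names[key] = os.path.join(out_dir, f"stream_{len(names)}.pcap")
                        fh = open(names[key], "wb")
                        fh.write(header)  # reuse the input's global header
                handles[key] = fh  # (re)insert as most recently used
                fh.write(rec_hdr + data)  # original record header keeps timestamps intact
    finally:
        for fh in handles.values():
            fh.close()
    return names
```

Capping the number of open files at max_open avoids hitting the operating system's open-file limit on captures with many connections; evicted files are simply reopened in append mode. Unlike tshark's tcp.stream index, this key merges separate connections that reuse the same addresses and ports into one file.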

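For simpler cases, note that Wireshark's editcap cannot filter by address or port: it selects packets by number or time and can split a file into fixed-size chunks (for example, editcap -F pcap -c 100000 large_capture.pcap chunk.pcap). To extract one known connection, use tcpdump with a BPF filter instead:

# Extract packets for a specific TCP connection (IPs and ports)
tcpdump -r large_capture.pcap -w connection1.pcap \
    "tcp and host 192.168.1.100 and host 192.168.1.200 and port 443 and port 54321"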
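A filter for a single connection has to match both directions of traffic. A minimal sketch that builds an exact two-direction BPF expression for tcpdump; connection_bpf is a hypothetical helper, and the address check relies on Python's ipaddress module:

```python
import ipaddress


def connection_bpf(host_a, port_a, host_b, port_b):
    """Return a BPF expression matching both directions of one TCP connection."""
    a = ipaddress.ip_address(host_a)  # raises ValueError for malformed addresses
    b = ipaddress.ip_address(host_b)
    pa, pb = int(port_a), int(port_b)
    for port in (pa, pb):
        if not 0 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    fwd = f"(src host {a} and src port {pa} and dst host {b} and dst port {pb})"
    rev = f"(src host {b} and src port {pb} and dst host {a} and dst port {pa})"
    return f"tcp and ({fwd} or {rev})"
```

The shorter form host A and host B and port X and port Y also matches a connection from A:Y to B:X; the explicit two-direction form rules that out.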

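tcpslice (from the tcpdump project) is sometimes suggested for this, but it does not extract TCP streams: it cuts captures by time range and has no packet filter. It can still break a huge file into smaller time windows before you split each one by connection:

tcpslice -w window.pcap <start-time> <end-time> input.pcap

Running tcpslice -r input.pcap prints the timestamps of the first and last packets in readable form, which helps when choosing window boundaries; the accepted time formats are described in the tcpslice man page.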
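Slicing by time requires knowing when a capture starts and ends. A minimal sketch that reads only the 16-byte record headers and seeks past the packet data, assuming a classic PCAP file (pcapng is not handled); capture_time_range is a hypothetical name:

```python
import struct

PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)  # microsecond / nanosecond timestamps


def capture_time_range(path):
    """Return (first, last) packet timestamps in epoch seconds, or (None, None) if empty."""
    with open(path, "rb") as f:
        header = f.read(24)
        for endian in ("<", ">"):
            magic = struct.unpack(endian + "I", header[:4])[0]
            if magic in PCAP_MAGICS:
                break
        else:
            raise ValueError("not a classic PCAP file")
        scale = 1e-6 if magic == 0xA1B2C3D4 else 1e-9  # usec vs nsec fraction
        first = last = None
        while True:
            rec = f.read(16)
            if len(rec) < 16:
                break
            ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
            f.seek(incl_len, 1)  # skip packet bytes without reading them
            ts = ts_sec + ts_frac * scale
            if first is None:
                first = ts
            last = ts
    return first, last
```

The returned epoch seconds can be converted with datetime.fromtimestamp before picking window boundaries in whatever format tcpslice accepts.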

For your 700MB file, here's how these tools compare:

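Tool      Memory usage  Output format  Splits by connection         Complexity
tshark    Low           PCAP           Yes (tcp.stream)             Medium
editcap   Very low      PCAP           No (packet count/time only)  Simple
tcpslice  Low           PCAP           No (time ranges only)        Medium
tcpflow   High          Raw payload    Yes (payload only)           Simple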

For programmatic control, Scapy offers flexibility:

from scapy.all import IP, TCP, rdpcap, wrpcap

def split_tcp_streams(pcap_file):
    packets = rdpcap(pcap_file)  # loads every packet into memory
    streams = {}

    for pkt in packets:
        # IPv4 TCP only: IPv6 packets have no IP layer and pkt[IP] would raise
        if IP in pkt and TCP in pkt:
            # frozenset gives both directions (A->B and B->A) the same key
            stream_id = frozenset([
                (pkt[IP].src, pkt[TCP].sport),
                (pkt[IP].dst, pkt[TCP].dport)
            ])
            streams.setdefault(stream_id, []).append(pkt)

    for i, stream in enumerate(streams.values()):
        wrpcap(f"stream_{i}.pcap", stream)

split_tcp_streams("large_capture.pcap")

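This approach gives you complete control, but rdpcap loads the entire capture into memory as Scapy packet objects, which need considerably more RAM than the file itself. Also, unlike tshark's tcp.stream index, this key merges separate connections that reuse the same addresses and ports into one file.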

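For your 700MB file, the tshark method provides the best balance between functionality and memory efficiency. Each pass streams through the file instead of holding it in RAM, and tshark can write classic PCAP (-F pcap) so downstream tools keep working; the price is one extra pass over the file per stream.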


When analyzing network traffic, we often encounter massive PCAP files containing hundreds or thousands of TCP connections. Tools like Wireshark's "Follow TCP Stream" are useful but impractical for large captures (700MB+) due to memory constraints. We need efficient command-line alternatives that preserve the original PCAP format.

While tcpflow (http://www.circlemud.org/~jelson/software/tcpflow/) reconstructs TCP streams, it has two key drawbacks:

  1. Output files often exceed original PCAP size
  2. Results aren't in PCAP format (losing packet-level metadata)
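Packet-level metadata here means the record header that classic PCAP stores in front of every packet: a timestamp, the number of bytes saved and the original length on the wire. A minimal sketch of that 16-byte header, assuming little-endian byte order and microsecond timestamps; RecordHeader, pack_record and unpack_record are hypothetical names:

```python
import struct
from typing import NamedTuple, Tuple


class RecordHeader(NamedTuple):
    """Per-packet metadata stored before every packet in a classic PCAP file."""
    ts_sec: int    # capture time, whole seconds since the Unix epoch
    ts_usec: int   # sub-second part in microseconds (magic 0xA1B2C3D4)
    incl_len: int  # bytes of the packet actually saved in the file
    orig_len: int  # length of the packet on the wire


def pack_record(hdr: RecordHeader, data: bytes) -> bytes:
    """Serialize one little-endian record: 16-byte header followed by the packet bytes."""
    if hdr.incl_len != len(data):
        raise ValueError("incl_len must equal the number of saved bytes")
    return struct.pack("<IIII", *hdr) + data


def unpack_record(buf: bytes) -> Tuple[RecordHeader, bytes]:
    """Parse one record back into its header and packet bytes."""
    hdr = RecordHeader(*struct.unpack("<IIII", buf[:16]))
    return hdr, buf[16:16 + hdr.incl_len]
```

A packet truncated by the snapshot length has incl_len smaller than orig_len; payload-only stream files keep neither value, nor the per-packet timestamps.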

Here are three approaches:

1. tcpdump with BPF Filters

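Extract specific connections using BPF filters. tcpdump does not understand Wireshark display filters such as tcp.stream eq N, so enumerate the connections with tshark (here from each initial SYN, IPv4 only) and pass their addresses and ports to tcpdump:

tshark -r input.pcap -Y "ip && tcp.flags.syn == 1 && tcp.flags.ack == 0" \
    -T fields -e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport | sort -u |
while read -r a pa b pb; do
    tcpdump -r input.pcap -w "stream_${a}_${pa}_${b}_${pb}.pcap" "tcp and host $a and host $b and port $pa and port $pb"
done

Connections whose SYN was not captured (already open when the capture started) are not listed.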
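Connections can also be enumerated without tshark by reading TCP flags directly: the first packet of a connection has the SYN bit (0x02) set and the ACK bit (0x10) clear. A minimal sketch, assuming a classic PCAP file with Ethernet framing, IPv4 and no VLAN tags; initial_syns is a hypothetical name:

```python
import socket
import struct

PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)


def initial_syns(path):
    """List (src, sport, dst, dport) for IPv4 TCP packets with SYN set and ACK clear."""
    conns = set()  # a set also drops retransmitted SYNs
    with open(path, "rb") as f:
        header = f.read(24)
        endian = "<" if struct.unpack("<I", header[:4])[0] in PCAP_MAGICS else ">"
        if struct.unpack(endian + "I", header[:4])[0] not in PCAP_MAGICS:
            raise ValueError("not a classic PCAP file")
        if struct.unpack(endian + "I", header[20:24])[0] != 1:
            raise ValueError("only Ethernet (linktype 1) captures are handled")
        while True:
            rec = f.read(16)
            if len(rec) < 16:
                break
            frame = f.read(struct.unpack(endian + "I", rec[8:12])[0])
            # Ethernet + IPv4 (EtherType 0x0800) carrying TCP (IP protocol 6)
            if len(frame) < 34 or frame[12:14] != b"\x08\x00" or frame[23] != 6:
                continue
            tcp = frame[14 + (frame[14] & 0x0F) * 4:]  # skip the IPv4 header (IHL words)
            if len(tcp) < 14:
                continue
            flags = tcp[13]
            if flags & 0x02 and not flags & 0x10:  # SYN without ACK
                sport, dport = struct.unpack("!HH", tcp[:4])
                conns.add((socket.inet_ntoa(frame[26:30]), sport,
                           socket.inet_ntoa(frame[30:34]), dport))
    return sorted(conns)
```

Each returned tuple supplies the two hosts and two ports for a tcpdump filter.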

2. tcpslice

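The tcpslice tool from the tcpdump project does not split by connection; it extracts a time range from one or more captures. It can still help with very large files: cut the capture into time windows, then split each window with one of the other methods (a connection that spans a window boundary ends up in two files):

tcpslice -w window.pcap <start-time> <end-time> original.pcap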

3. Python with Scapy

For custom processing:

from scapy.all import IP, TCP, rdpcap, wrpcap

def split_by_stream(pcap_file):
    packets = rdpcap(pcap_file)  # whole capture in memory
    streams = {}

    for p in packets:
        if IP in p and TCP in p:  # IPv4 TCP only
            # Same key for both directions of a connection
            stream_id = frozenset([
                (p[IP].src, p[TCP].sport),
                (p[IP].dst, p[TCP].dport)
            ])
            streams.setdefault(stream_id, []).append(p)

    for i, stream in enumerate(streams.values()):
        wrpcap(f"stream_{i}.pcap", stream)

split_by_stream("input.pcap")
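
Choosing by file size:

  • Under 1GB: the tcpdump BPF approach works well
  • 1GB-5GB: cut the file into time windows with tcpslice first, then split each window by connection
  • 5GB+: consider Scapy with lazy loading (PcapReader), which reads one packet at a time instead of the whole file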

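Always validate the split files. A byte-for-byte cmp against the original will not match (the merged file gets a new header, mergecap writes pcapng unless told otherwise, and non-TCP packets are dropped), so compare packet counts instead. The count reported by capinfos should equal the number of TCP packets in the original:

mergecap -F pcap -w merged.pcap stream_*.pcap
capinfos -c merged.pcap
tshark -r original.pcap -Y tcp | wc -l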
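The packet counts can also be checked without Wireshark's tools. A minimal sketch, assuming the split files are classic PCAP (not pcapng) and follow the stream_*.pcap naming used above; count_records and split_total are hypothetical helpers:

```python
import glob
import struct

PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)


def count_records(path):
    """Count packets in a classic PCAP file by walking its 16-byte record headers."""
    with open(path, "rb") as f:
        header = f.read(24)
        endian = "<" if struct.unpack("<I", header[:4])[0] in PCAP_MAGICS else ">"
        if struct.unpack(endian + "I", header[:4])[0] not in PCAP_MAGICS:
            raise ValueError(f"{path}: not a classic PCAP file")
        count = 0
        while True:
            rec = f.read(16)
            if len(rec) < 16:
                return count
            f.seek(struct.unpack(endian + "I", rec[8:12])[0], 1)  # skip packet bytes
            count += 1


def split_total(pattern="stream_*.pcap"):
    """Sum the packet counts of every split file matching the glob pattern."""
    return sum(count_records(p) for p in glob.glob(pattern))
```

Compare split_total() with the number of TCP packets in the original capture: a smaller total points at dropped packets, a larger one at packets written to more than one file.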