Filtering Binary Data from HTTP Traffic Captures in tcpdump: A Clean Output Solution



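When analyzing HTTP traffic with tcpdump -A, you'll often see strings like E.....@....... at the start of each packet. These are the IP and TCP headers printed as ASCII: -A prints every byte after the link-level header and shows non-printable bytes as dots. A BPF filter selects which packets are printed, not which bytes, so it cannot hide these headers on its own. They are essential for low-level debugging but clutter HTTP content analysis.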

A common starting point, based on the HTTP example in the tcpdump manual page, prints only IPv4 packets on port 80 that carry data:

sudo tcpdump -A 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'

The payload-length test at the end of the filter discards every segment that carries no TCP data, which includes:

  • SYN and FIN packets without data
  • ACK-only segments
  • Any other segment with an empty payload

The header bytes at the start of each dump remain, so combine the filter with one of these output approaches:

Method 1: ASCII-Only Output

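sudo tcpdump -l -A -q 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' | grep -a --line-buffered -v "^[0-9]"

Key improvements:

  • -l: line-buffers tcpdump's own output so each line reaches grep immediately
  • -q: quiet mode shortens the per-packet summary line
  • grep -a: treats the input as text even if it looks binary
  • --line-buffered: flushes grep's output after every line, keeping the display real-time
  • -v "^[0-9]": drops the timestamped summary lines; the header residue at the start of each dump remains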
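To see what that grep stage keeps, you can run it offline on a few lines shaped like -q -A output. The sample below is invented for illustration (with abbreviated header residue), not a real capture: the timestamped summary line is dropped, while the residue in front of the request line survives.

```shell
#!/usr/bin/env bash
# Illustrative only: SAMPLE is hand-written text shaped like `tcpdump -q -A`
# output (one summary line, then the ASCII dump), not a captured packet.
SAMPLE='12:00:00.000001 IP 192.0.2.10.51234 > 192.0.2.20.80: tcp 37
E..Y..@.@.......P....GET / HTTP/1.1
Host: example.com'

# Same grep stage as Method 1: drop every line that starts with a digit.
clean() {
  grep -a --line-buffered -v "^[0-9]"
}

printf '%s\n' "$SAMPLE" | clean
```

Only the two dump lines are printed, and the first one still starts with the E..Y..@ residue.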

Method 2: Hex Dump Alternative

For protocol analysis that needs both headers and content, -XX prints each packet, including its link-level header, in hex and ASCII:

sudo tcpdump -XX 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'

Printing only HTTP header blocks (from the first line containing HTTP through the blank line that ends the headers):

sudo tcpdump -l -A -s0 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' | awk '/HTTP/{x=1}x{print}/^\r?$/{x=0}'
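Recent tcpdump versions omit the carriage return before each line feed in -A output, so the blank line that ends an HTTP header block usually arrives as an empty line. An awk range that ends on /^\r?$/ handles both the empty and the CR-only form. A quick offline check on invented sample text (not a real capture; the function name is hypothetical) shows the range stopping before the body:

```shell
#!/usr/bin/env bash
# Illustrative only: SAMPLE is invented text shaped like `tcpdump -A` output.
SAMPLE='12:00:00.000001 IP 192.0.2.20.80 > 192.0.2.10.51234: Flags [P.], length 120
E.....@.......HTTP/1.1 200 OK
Content-Type: text/html

<html>hello</html>'

# Start printing at a line containing HTTP; stop after the first line that
# is empty or holds only a carriage return.
header_block() {
  awk '/HTTP/{x=1} x{print} /^\r?$/{x=0}'
}

printf '%s\n' "$SAMPLE" | header_block
```

The summary line and the <html> body line are not printed.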

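Monitoring POST requests specifically (HTTP method names are case-sensitive, so grep -i would only add matches such as "post" inside headers or bodies):

sudo tcpdump -l -A -s0 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' | grep -a --line-buffered 'POST '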

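For high-traffic servers, write raw packets to a file instead of printing them. -n skips DNS lookups, and -s 256 keeps only the first 256 bytes of each packet, which leaves about 200 bytes of HTTP data after typical Ethernet, IPv4, and TCP headers: enough for the request line and the start of the headers, but longer headers and bodies are truncated. Printing options such as -A and -l are not needed because nothing is printed while writing with -w:

sudo tcpdump -n -s 256 -w /tmp/http_capture.pcap 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'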

Then process the capture file separately:

tcpdump -r /tmp/http_capture.pcap -A | perl -ne 'print if /HTTP/../^\r?$/'

A typical dump begins like this. The E is the first IPv4 header byte (0x45: version 4, header length of 5 32-bit words), the @ is the flags byte with Don't Fragment set (0x40), and the dots are non-printable header bytes:

E.....@.......
....P..6.0.........D......
__..e=3...__HTTP/1.1 200 OK

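The BPF (Berkeley Packet Filter) expression in that command works in three steps:

tcp port 80                     selects TCP traffic to or from port 80
ip[2:2] - ((ip[0]&0xf)<<2)      IPv4 total length minus the IPv4 header length (IHL × 4) gives the TCP segment length
- ((tcp[12]&0xf0)>>2) != 0      subtracting the TCP header length (data offset × 4) gives the payload length; the packet is kept only if it is non-zero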
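The same arithmetic can be checked with shell arithmetic. The sketch below uses made-up header values rather than a real packet: an IPv4 byte 0 of 0x45 (version 4, IHL 5) and a TCP byte 12 of 0x80 (data offset 8, typical when the TCP timestamp option is present). The function name payload_len is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the BPF payload-length expression.
# Arguments: IPv4 total length (ip[2:2]), IPv4 byte 0 (ip[0]),
# TCP byte 12 (tcp[12]); decimal or 0x-prefixed hex both work.
payload_len() {
  local ip_total=$1 ip_byte0=$2 tcp_byte12=$3
  local ip_hdr=$(( (ip_byte0 & 0xf) << 2 ))      # IHL (32-bit words) * 4
  local tcp_hdr=$(( (tcp_byte12 & 0xf0) >> 2 ))  # data offset (words) * 4
  echo $(( ip_total - ip_hdr - tcp_hdr ))
}

payload_len 420 0x45 0x80   # data-carrying segment: 420 - 20 - 32 = 368
payload_len 52 0x45 0x80    # ACK-only segment: 52 - 20 - 32 = 0
```

A packet like the second one is exactly what the != 0 test discards.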

The following options control how those packets are displayed:

Option 1: ASCII-only Output with -A Flag

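-A prints each packet, minus its link-level header, in ASCII. Combined with the payload filter it shows only data-carrying packets, although the IP and TCP header bytes still appear at the start of each dump: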

sudo tcpdump -A 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'

Option 2: Hex/ASCII Combination

For debugging purposes, use -X to show both hex and ASCII:

sudo tcpdump -X 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'

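Option 3: Advanced Header Skipping

Skipping a fixed 54 bytes (14-byte Ethernet + 20-byte IPv4 + 20-byte TCP header) does not work on -A output: -A already omits the link-level header, and the IP and TCP headers grow when options are present (the common TCP timestamp option makes the TCP header 32 bytes). On the text output, it is more reliable to cut everything before the HTTP request or status line on the first line of each packet's dump:

sudo tcpdump -l -A -s0 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' | \
awk '/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\./ { first = 1; next }
     first && match($0, /(GET|POST|PUT|DELETE|HEAD|OPTIONS|PATCH) |HTTP\/1\.[01] /) { $0 = substr($0, RSTART) }
     { first = 0; print; fflush() }'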
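Because header lengths vary per packet, skipping bytes correctly means computing the offset for each packet. As a sketch of that idea, the hypothetical function below takes one packet as hex digits (tcpdump -x prints packet data, minus the link-level header, in hex), reads the IHL and TCP data offset fields, and prints only the payload. It assumes an untruncated IPv4/TCP packet; the sample packets are hand-built, not captured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the TCP payload of one IPv4/TCP packet given as
# hex digits without a link-level header, skipping each header by its real length.
# Assumes an untruncated IPv4 + TCP packet; whitespace in the input is ignored.
payload_from_hex() {
  local hex=${1//[[:space:]]/}

  # IPv4 header length: low nibble of byte 0 (IHL) times 4.
  local ip_hdr=$(( (16#${hex:0:2} & 0xf) * 4 ))

  # TCP header length: high nibble of TCP byte 12 (data offset) times 4.
  local off=$(( (ip_hdr + 12) * 2 ))
  local tcp_hdr=$(( (16#${hex:off:2} & 0xf0) >> 2 ))

  # Turn the remaining hex digits back into bytes, two digits at a time.
  local skip=$(( (ip_hdr + tcp_hdr) * 2 ))
  local body=${hex:skip} i
  for (( i = 0; i < ${#body}; i += 2 )); do
    printf '%b' "\\x${body:i:2}"
  done
}

# 20-byte TCP header (data offset 5), payload "HTTP".
payload_from_hex '4500 002c 0001 4000 4006 0000 c000 020a c000 0214
                  c822 0050 0000 0001 0000 0001 5018 ffff 0000 0000 4854 5450'
echo
# 32-byte TCP header (data offset 8, timestamp option), same payload.
payload_from_hex '4500 0038 0001 4000 4006 0000 c000 020a c000 0214
                  c822 0050 0000 0001 0000 0001 8018 ffff 0000 0000
                  0101 080a 0000 0000 0000 0000 4854 5450'
echo
```

Both calls print HTTP even though the headers differ by 12 bytes. For a real packet, the hex can be taken from tcpdump -x output after removing the 0x0000: offset column from each line.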

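Monitor specific HTTP methods while cleaning output. The BPF expression matches packets whose TCP payload, which starts at tcp[(tcp[12]&0xf0)>>2], begins with POST or GET; awk then drops the summary lines and cuts the header residue in front of the method:

sudo tcpdump -l -A -s0 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0) and (tcp[((tcp[12]&0xf0)>>2):4] = 0x504f5354 or tcp[((tcp[12]&0xf0)>>2):4] = 0x47455420)' | \
awk '/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\./ { next }
     match($0, /(GET|POST) /) { $0 = substr($0, RSTART) }
     { print; fflush() }'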
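The constants compared in that filter are the first four payload bytes read as one big-endian 32-bit value: "POST" is 0x504f5354 and "GET " (with its trailing space) is 0x47455420. A small sketch derives such a constant for any four-character prefix, for example to add PUT; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a 4-character ASCII prefix into the 32-bit hex
# constant that a BPF comparison such as tcp[offset:4] = 0x... expects.
prefix_to_bpf_const() {
  local s=$1 out=0x i
  if (( ${#s} != 4 )); then
    echo "need exactly 4 characters" >&2
    return 1
  fi
  for (( i = 0; i < 4; i++ )); do
    # printf with a leading quote in the argument yields the character's code.
    out+=$(printf '%02x' "'${s:i:1}")
  done
  echo "$out"
}

prefix_to_bpf_const 'GET '
prefix_to_bpf_const 'POST'
prefix_to_bpf_const 'PUT '
```

The three calls print 0x47455420, 0x504f5354, and 0x50555420.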

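Extract successful HTML responses, from the 200 OK status line through the closing </html> tag. Responses travel from port 80, so match on the source port (bodies sent with Content-Encoding: gzip will still appear as dots and symbols):

sudo tcpdump -l -A -s0 'tcp src port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' | \
sed -n '/HTTP\/1\.1 200 OK/,/<\/html>/p'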

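For production environments, pipe the stream through tee to keep a log. -n skips DNS lookups, and the grep stage drops the timestamped summary lines (omit it if the log needs timing information). Filtering out lines that start with E would also discard the request or status line, because it is printed on the same line as the header residue:

sudo tcpdump -l -n -A 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' | \
grep -a --line-buffered -v '^[0-9]' | \
tee http_traffic.log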