How to Detect Email Attachments Using Only Headers (MIME Content-Type Analysis)



When examining email headers for attachment detection, the key indicator is the Content-Type header. Emails with attachments typically use multipart/mixed as their content type. Here's the technical breakdown:

Content-Type: multipart/mixed; boundary="----=_NextPart_000_0000_1234567890"

A complete MIME message with an attachment will show these characteristics:

  • Primary header shows multipart/mixed
  • Contains boundary parameters
  • Has multiple parts with their own content types
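
As a rough illustration of the first two characteristics, the sketch below parses a header block with Python's standard email package and reports the declared type and boundary. The function name describe_content_type and the sample headers are made up for this example.

```python
from email.parser import HeaderParser

def describe_content_type(raw_headers):
    """Return (content_type, boundary) as declared in the top-level headers."""
    msg = HeaderParser().parsestr(raw_headers)
    # get_content_type() lowercases the value and drops parameters;
    # get_boundary() returns None when no boundary parameter is present.
    return msg.get_content_type(), msg.get_boundary()

# Hypothetical header block, ending at the blank line before the body.
sample = (
    "From: sender@example.com\n"
    "Subject: Quarterly report\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="----=_NextPart_000_0000_1234567890"\n'
    "\n"
)
print(describe_content_type(sample))
```

The third characteristic (the parts and their own content types) lives in the body, so it cannot be confirmed from headers alone.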

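Python example using the email library:

from email import policy
from email.parser import HeaderParser

def has_attachment(raw_headers):
    # Only the header block is parsed. Without a body, is_multipart() is
    # always False, so test the declared content type instead.
    msg = HeaderParser(policy=policy.default).parsestr(raw_headers)
    return msg.get_content_type() == 'multipart/mixed'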

Bash example using grep:

grep -qi '^Content-Type:[[:space:]]*multipart/mixed' email_headers.txt && echo "Has attachment"

This check is generally reliable, but there are exceptions:

  • Inline images may use multipart/related
  • Alternative message formats use multipart/alternative
  • Some clients may omit proper MIME headers
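
One way to account for these exceptions is to return a graded verdict instead of a plain yes/no. The following is a minimal sketch; the function name attachment_likelihood and its four labels are illustrative choices, not part of any library.

```python
from email.parser import HeaderParser

def attachment_likelihood(raw_headers):
    """Classify a header block as 'likely', 'possible', 'unlikely' or 'unknown'."""
    msg = HeaderParser().parsestr(raw_headers)
    if msg['Content-Type'] is None:
        # No MIME information at all: the headers cannot answer the question.
        return 'unknown'
    content_type = msg.get_content_type()
    if content_type == 'multipart/mixed':
        return 'likely'
    if content_type == 'multipart/related':
        # Usually inline content such as embedded images.
        return 'possible'
    # multipart/alternative and single-part text fall through to here.
    return 'unlikely'
```

A caller could then fetch the full message only for the 'possible' and 'unknown' cases, where headers are not enough to decide.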

For mail filtering systems like procmail:

:0
* ^Content-Type: multipart/mixed
{
    # Handle attachment processing
}

When processing large volumes:

  • Header-only parsing is significantly faster than full message parsing
  • Most MTA filters can process headers without loading message bodies
  • Consider caching results for repeated header checks
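
The caching suggestion could be sketched with functools.lru_cache from the standard library. The function name cached_has_attachment and the cache size of 4096 are arbitrary assumptions for illustration.

```python
from functools import lru_cache
from email.parser import HeaderParser

@lru_cache(maxsize=4096)  # size chosen arbitrarily for this sketch
def cached_has_attachment(raw_headers):
    """Header-only multipart/mixed check, memoized on the header text."""
    msg = HeaderParser().parsestr(raw_headers)
    return msg.get_content_type() == 'multipart/mixed'
```

Keying on the full header text is simple but keeps those strings in memory; a system that already has a stable message identifier might prefer to key on that instead.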



When building email processing systems, we often need to filter messages based on attachments. While most modern email clients provide direct attachment detection APIs, some legacy systems or restricted environments only expose email headers. Here's the technical approach to solve this.

The most reliable method is examining the Content-Type header. Emails with attachments use multipart MIME types:

Content-Type: multipart/mixed; boundary="----=_NextPart_000_0000_ABCDEF123456"

Key indicators in headers:

  • multipart/mixed: Contains attachments and text
  • multipart/related: Used for inline attachments (like embedded images)
  • multipart/alternative: Text/HTML versions (not attachments)

Here's Python code using the email library:

import email

def has_attachment(raw_headers):
    msg = email.message_from_string(raw_headers)
    # Lowercased type/subtype with parameters (boundary, charset) stripped
    content_type = msg.get_content_type()

    if content_type == 'multipart/mixed':
        return True
    if content_type == 'multipart/related':
        # Sub-parts are only visible when the body was supplied as well;
        # with headers alone, walk() yields just the top-level message.
        for part in msg.walk():
            if part.get_content_disposition() == 'attachment':
                return True
    return False

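Limitations of header-only detection:

1. A single-part message whose body is itself a base64-encoded file (for example Content-Type: application/pdf) is not multipart, so a multipart check alone misses it

2. Some email clients use non-standard Content-Type values

3. Very large attachments may be split across multiple parts or messages (for example with message/partial)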

For more robust detection, consider these additional checks:

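def looks_like_attachment(msg):
    # msg: a parsed email.message.Message; only top-level headers are read.
    # multipart/alternative is excluded because it carries text/HTML versions.
    return (msg.get_content_type() in ('multipart/mixed', 'multipart/related')
            or msg.get_content_maintype() == 'application'
            or msg.get_content_disposition() == 'attachment')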
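
To see why the top-level Content-Disposition check matters, consider a message whose entire body is a single file. The sketch below uses made-up sample headers; get_content_disposition() and get_filename() are methods of the standard library's Message class.

```python
from email.parser import HeaderParser

# Hypothetical headers of a single-part message that is itself a PDF file.
single_part = (
    "Subject: Invoice\n"
    "MIME-Version: 1.0\n"
    'Content-Type: application/pdf; name="invoice.pdf"\n'
    'Content-Disposition: attachment; filename="invoice.pdf"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
)

msg = HeaderParser().parsestr(single_part)
print(msg.get_content_type())         # declared type, lowercased
print(msg.get_content_disposition())  # 'attachment', 'inline' or None
print(msg.get_filename())             # filename parameter, if any
```

Nothing here is multipart, yet the headers still identify the message as a file transfer.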

When processing large volumes of emails:

  • Parse only headers initially (don't load entire message)
  • Cache results for frequently accessed messages
  • Consider using regular expressions for simple header scanning

import re

def quick_attachment_check(headers_text):
    # Anchor to the start of a line so text such as "X-Content-Type:" is not matched
    return bool(re.search(r'^Content-Type:\s*multipart/(mixed|related)', headers_text, re.I | re.M))
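
The advice to parse only headers can be sketched as follows: stop reading at the first blank line, which separates headers from the body. The helper name read_header_block is hypothetical, and the io.StringIO object stands in for a real message file.

```python
import io
from email.parser import HeaderParser

def read_header_block(fileobj):
    """Read lines up to (not including) the first blank line, then stop."""
    lines = []
    for line in fileobj:
        if line in ('\n', '\r\n'):
            break  # a blank line separates headers from body
        lines.append(line)
    return ''.join(lines)

# Usage with an in-memory stand-in for a message file.
message = io.StringIO(
    "Subject: Photos\n"
    'Content-Type: multipart/mixed; boundary="abc"\n'
    "\n"
    "--abc\n"
    "Content-Type: text/plain\n"
    "\n"
    "body text\n"
)
headers = read_header_block(message)
msg = HeaderParser().parsestr(headers)
print(msg.get_content_type())
```

Because reading stops at the separator, the cost stays proportional to the header size no matter how large the attachments are.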