When examining email headers for attachment detection, the key indicator is the Content-Type
header. Emails with attachments typically use multipart/mixed
as their content type. Here's the technical breakdown:
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0000_1234567890"
A complete MIME message with attachment will show these characteristics:
- Primary header shows multipart/mixed
- Contains boundary parameters
- Has multiple parts with their own content types
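To make these characteristics concrete, here is a minimal sketch that builds a small message with one attachment and then reads back the three properties listed above. The helper names (build_sample_with_attachment, describe_structure) and the file name data.bin are made up for illustration; only the standard library email package is assumed.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def build_sample_with_attachment():
    """Build a small message with one text body and one binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Report"
    msg.set_content("See the attached file.")
    msg.add_attachment(
        b"\x00\x01\x02",
        maintype="application",
        subtype="octet-stream",
        filename="data.bin",  # hypothetical file name
    )
    # Serializing generates the boundary parameter.
    return msg.as_string()


def describe_structure(raw_message):
    """Return (top-level type, boundary, list of part content types)."""
    msg = message_from_string(raw_message, policy=policy.default)
    part_types = [part.get_content_type() for part in msg.iter_parts()]
    return msg.get_content_type(), msg.get_boundary(), part_types


if __name__ == "__main__":
    print(describe_structure(build_sample_with_attachment()))
```

Note that this sketch parses the full message so the individual parts are visible; with headers alone, only the top-level type and boundary can be read.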
Python example using email library:
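import email
from email import policy

def has_attachment(raw_headers):
    # With headers only there is no body to split into parts, so
    # is_multipart() would be False; check the declared top-level type instead.
    msg = email.message_from_string(raw_headers, policy=policy.default)
    return msg.get_content_type() == 'multipart/mixed'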
Bash example using grep:
grep -qi '^Content-Type:[[:space:]]*multipart/mixed' email_headers.txt && echo "Has attachment"
While generally reliable, there are exceptions:
- Inline images may use multipart/related
- Alternative message formats use multipart/alternative
- Some clients may omit proper MIME headers
For mail filtering systems like procmail:
:0
* ^Content-Type: multipart/mixed
{
# Handle attachment processing
}
When processing large volumes:
- Header-only parsing is significantly faster than full message parsing
- Most MTA filters can process headers without loading message bodies
- Consider caching results for repeated header checks
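As a rough sketch of the first and third points, the code below reads only the header block from a stream (stopping at the first blank line) and caches the parsed top-level type. The function names read_header_block and top_level_type, and the cache size, are assumptions chosen for illustration; HeaderParser and lru_cache come from the standard library.

```python
from email import policy
from email.parser import HeaderParser
from functools import lru_cache


def read_header_block(stream):
    """Read lines up to the first blank line, leaving the body unread."""
    lines = []
    for line in stream:
        if line in ("\n", "\r\n"):
            break
        lines.append(line)
    return "".join(lines)


@lru_cache(maxsize=4096)  # cache size is an arbitrary illustrative choice
def top_level_type(header_text):
    """Parse headers only and return the lowercase top-level content type."""
    msg = HeaderParser(policy=policy.default).parsestr(header_text)
    return msg.get_content_type()


def has_mixed_type(stream):
    """True if the header block declares multipart/mixed."""
    return top_level_type(read_header_block(stream)) == "multipart/mixed"
```

Keying the cache on the header text itself is the simplest option; a real system would more likely key on a message identifier, which is outside what this sketch assumes.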
When building email processing systems, we often need to filter messages based on attachments. While most modern email clients provide direct attachment detection APIs, some legacy systems or restricted environments only expose email headers. Here's the technical approach to solve this.
The most reliable method is examining the Content-Type header. Emails with attachments use multipart MIME types:
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0000_ABCDEF123456"
Key indicators in headers:
- multipart/mixed: Contains attachments and text
- multipart/related: Used for inline attachments (like embedded images)
- multipart/alternative: Text/HTML versions (not attachments)
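The mapping above can be sketched as a small lookup on the parsed top-level type. The table name ATTACHMENT_HINTS, the function classify, and the label strings are hypothetical; the parsing calls are from the standard library email package.

```python
from email import policy
from email.parser import HeaderParser

# Labels are illustrative; they mirror the indicators listed above.
ATTACHMENT_HINTS = {
    "multipart/mixed": "attachments and text",
    "multipart/related": "inline attachments",
    "multipart/alternative": "text/HTML versions, no attachments",
}


def classify(raw_headers):
    """Map the top-level Content-Type of a header block to a hint string."""
    msg = HeaderParser(policy=policy.default).parsestr(raw_headers)
    # get_content_type() lowercases the value and falls back to text/plain
    # when the header is missing.
    return ATTACHMENT_HINTS.get(msg.get_content_type(), "no multipart indicator")
```

Because the header is parsed instead of substring-matched, differences in case or spacing in the Content-Type value do not affect the result.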
Here's Python code using the email library:
import email

def has_attachment(raw_headers):
    msg = email.message_from_string(raw_headers)
    content_type = msg.get('Content-Type', '').lower()
    if 'multipart/mixed' in content_type:
        return True
    if 'multipart/related' in content_type:
        # Check if parts have Content-Disposition: attachment.
        # With header-only input, walk() yields just the top-level message.
        for part in msg.walk():
            if part.get('Content-Disposition', '').lower().startswith('attachment'):
                return True
    return False
Edge cases to watch for:
1. Single-part messages whose body is itself a base64-encoded attachment are not multipart, so a multipart check on the headers misses them
2. Some email clients use non-standard Content-Type values
3. Very large attachments may be split across multiple parts
For more robust detection, consider these additional checks:
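def has_attachment_robust(msg):
    # msg is a message object parsed from the raw headers
    content_type = msg.get_content_type()
    disposition = msg.get('Content-Disposition', '').lower()
    # Match mixed/related only: is_multipart() also accepts multipart/alternative
    return (content_type in ('multipart/mixed', 'multipart/related')
            or content_type.startswith('application/')
            or disposition.startswith('attachment'))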
When processing large volumes of emails:
- Parse only headers initially (don't load entire message)
- Cache results for frequently accessed messages
- Consider using regular expressions for simple header scanning
import re

def quick_attachment_check(headers_text):
    # Anchor to the start of a line so headers such as X-Content-Type do not match
    return bool(re.search(r'^Content-Type:\s*multipart/(mixed|related)', headers_text, re.I | re.M))