When working with Maildir format messages (like those generated by fakemail), each email exists as a separate file containing raw headers and body. The key challenge appears when dealing with quoted-printable encoded messages containing special characters and soft line breaks.
The example German email demonstrates three characteristic features of quoted-printable encoding:
1. Soft line breaks indicated by "=" at end of line
2. Special characters encoded as =XX hexadecimal sequences
3. UTF-8 content needing proper decoding
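To make these three features concrete, here is a minimal sketch that decodes one fragment of the example body with Python's standard `quopri` module. The variable names are illustrative, and LF line endings are assumed.

```python
import quopri

# One fragment of the example body, as raw bytes (LF line endings assumed)
raw = b"Registrierung abzuschlie=C3=9Fen, klicken Sie auf folg=\nenden Link:"

# Step 1: undo quoted-printable. The trailing "=" plus newline disappears,
# and =C3=9F becomes the two bytes 0xC3 0x9F.
decoded_bytes = quopri.decodestring(raw)

# Step 2: interpret the resulting bytes as UTF-8 (0xC3 0x9F is the letter ß)
text = decoded_bytes.decode("utf-8")
print(text)
```

The two steps are separate on purpose: quoted-printable only restores bytes, and the charset from the Content-Type header decides how those bytes become characters.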
Here's a complete Python solution using the standard library:
import email
from email import policy

def read_maildir_message(filepath):
    # Open in binary mode so the declared charset, not the locale, drives decoding
    with open(filepath, 'rb') as f:
        msg = email.message_from_binary_file(f, policy=policy.default)
    # Handle multipart messages: return the first text/plain part
    if msg.is_multipart():
        for part in msg.walk():
            content_type = part.get_content_type()
            if content_type == 'text/plain':
                # decode=True undoes quoted-printable (or base64) and returns bytes
                payload = part.get_payload(decode=True)
                charset = part.get_content_charset() or 'utf-8'
                return payload.decode(charset)
        return None  # no text/plain part found
    # Single-part message: decode the body directly
    payload = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or 'utf-8'
    return payload.decode(charset)
For those working in Perl environments:
use strict;
use warnings;
use Email::MIME;

sub parse_maildir_message {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $email = Email::MIME->new(do { local $/; <$fh> });
    # body_str undoes the transfer encoding and decodes with the charset
    # declared in Content-Type; for multipart mail, call it on the wanted part
    return $email->body_str;
}
For quick inspection without writing code:
# Using munpack (part of mpack package)
munpack message_file
# Using reformime (from maildrop package)
reformime -e < message_file
# Using a Python one-liner (decodes the whole file, headers included)
python3 -c "import quopri; print(quopri.decodestring(open('message_file', 'rb').read()).decode('utf-8'))"
Some additional considerations for production code:
- Malformed quoted-printable sequences
- Multiple character encodings in single message
- Very long lines (some clients don't properly soft-wrap)
- Mixed content types (HTML + plaintext)
- Messages without explicit charset declaration
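Several of these cases can be handled in one place. The following is a sketch rather than a definitive implementation: `extract_text` and its `fallback_charset` parameter are hypothetical names, and falling back to latin-1 is an assumption that suits Western European mail but may mis-render other text.

```python
from email import message_from_bytes, policy

def extract_text(raw_bytes, fallback_charset="latin-1"):
    """Return the best text body of a raw message, or None if it has none."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    # Mixed content types: prefer text/plain, fall back to text/html
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return None
    # decode=True undoes quoted-printable or base64 and returns bytes
    payload = part.get_payload(decode=True)
    # No explicit charset declaration: assume UTF-8 first
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or bytes that do not match the label
        return payload.decode(fallback_charset)
```

Because each part is decoded with its own charset, a message that mixes encodings across parts is also covered as long as you call the decoding step per part.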
Create test cases with these challenging patterns:
1. A literal = that is not a soft break (e.g. "x=1", which a compliant encoder would emit as "x=3D1")
2. Invalid hex sequences (=GH)
3. Multiple consecutive soft breaks
4. Different line ending styles (CRLF vs LF)
5. Encoded words in headers (=?utf-8?q?...)
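A few of these patterns can be turned into checks with the standard library alone. This is a sketch: the sample strings are made up for illustration, and only the cases whose result follows directly from the quoted-printable rules are asserted. For malformed input the decoder's behaviour is simply printed so you can decide what your own code should do with it.

```python
import quopri
from email.header import decode_header, make_header

# Well-formed cases: the expected result follows from the encoding rules
cases = {
    "soft break, LF": (b"regist=\nriert", b"registriert"),
    "soft break, CRLF": (b"regist=\r\nriert", b"registriert"),
    "consecutive soft breaks": (b"a=\n=\nb", b"ab"),
    "encoded equals sign": (b"x=3D1", b"x=1"),
}
for name, (encoded, expected) in cases.items():
    assert quopri.decodestring(encoded) == expected, name

# Malformed cases: there is no single correct answer, so just inspect them
for sample in (b"x=1", b"=GH"):
    print(sample, "->", quopri.decodestring(sample))

# Encoded words in headers are decoded separately from the body
subject = str(make_header(decode_header("=?utf-8?q?Gr=C3=BC=C3=9Fe?=")))
print(subject)
```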
When working with mail servers or testing email functionality, many developers use Maildir format for storing individual email messages. Each message is stored as a separate file with encoded content. The quoted-printable (QP) encoding is commonly used for non-ASCII characters and line breaks in email messages.
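Since each Maildir message is its own file, Python's standard `mailbox` module can walk the directory for you. A minimal sketch, assuming the path points at an existing Maildir (with its `cur`, `new` and `tmp` subdirectories); `iter_text_bodies` is a hypothetical helper name.

```python
import mailbox
from email import message_from_bytes, policy

def iter_text_bodies(maildir_path):
    """Yield (key, text) for every message that has a text/plain body."""
    # create=False: raise instead of silently creating a missing Maildir
    box = mailbox.Maildir(maildir_path, create=False)
    for key in box.keys():
        # Re-parse the raw bytes with the modern policy to get EmailMessage objects
        msg = message_from_bytes(box.get_bytes(key), policy=policy.default)
        part = msg.get_body(preferencelist=("plain",))
        if part is not None:
            # get_content() undoes the transfer encoding and applies the charset
            yield key, part.get_content()
```

Usage would look like `for key, text in iter_text_bodies('/path/to/Maildir'): print(key, text)`, where the path is a placeholder.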
Linux provides several command-line utilities for handling quoted-printable encoding:
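# Using formail (from the procmail package) to strip the headers;
# formail does not decode quoted-printable itself, so pipe the body on
formail -I "" < mailfile | qprint -d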
# Using perl's MIME::QuotedPrint
perl -MMIME::QuotedPrint -e 'print decode_qp(join "", <>)' mailfile
# Using qprint (standalone decoder)
qprint -d < mailfile
For developers needing to process these files programmatically, here's a Python solution:
import email
from email import policy

def decode_maildir_message(filepath):
    # Open in binary mode so the declared charset, not the locale, drives decoding
    with open(filepath, 'rb') as f:
        msg = email.message_from_binary_file(f, policy=policy.default)
    # Handle multipart messages: return the first text/plain part
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == 'text/plain':
                # decode=True undoes quoted-printable (or base64) and returns bytes
                payload = part.get_payload(decode=True)
                charset = part.get_content_charset() or 'utf-8'
                return payload.decode(charset)
        return None  # no text/plain part found
    # Single-part message: decode the body directly
    payload = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or 'utf-8'
    return payload.decode(charset)

# Usage example
print(decode_maildir_message('/path/to/mailfile'))
When processing real-world email files, you might encounter these scenarios:
# 1. Messages with soft line breaks (ending with =)
def fix_soft_linebreaks(text):
    # Cover CRLF as well as LF line endings
    return text.replace('=\r\n', '').replace('=\n', '')

# 2. Messages with multiple encodings
def decode_with_fallback(payload, charset):
    try:
        return payload.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # latin-1 maps every byte, so it never fails, but it may mis-render text
        return payload.decode('latin-1')
For quick debugging or shell scripting:
#!/bin/bash
# Extract and decode the message body
sed '1,/^$/d' mailfile | qprint -d
# Or using awk: print everything after the first blank line, keeping the
# newlines so that soft line breaks stay recognisable to the decoder
awk 'body; /^$/ {body=1}' mailfile | qprint -d
Let's process the example message from the question:
import quopri
message = """Message-ID: <1317977606.4e8ebe06ceab7@myserver.local>
Date: Fri, 07 Oct 2011 10:53:26 +0200
Subject: Registrierung
From: me@me.com
To: tt99@example.com
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hallo,
Sie haben sich auf Meinserver.de als Benutzer regist=
riert. Um Ihre
Registrierung abzuschlie=C3=9Fen, klicken Sie auf folg=
enden Link:
http://meinserver.de/benutzer/bestaetigen/3lk6lp=
ga1kcgcg484kc8ksg"""
# Split the headers from the body at the first blank line
body = message.split('\n\n', 1)[1]
# decodestring removes soft line breaks itself and works on bytes
decoded = quopri.decodestring(body.encode('ascii')).decode('utf-8')
print(decoded)
The output shows "abzuschließen" with the ß restored, and the soft-wrapped words ("registriert", "folgenden") and the confirmation URL rejoined onto single lines.