Understanding Postfix Log Delay Metrics: Breakdown of delay= and delays= Parameters in Email Delivery Analysis


When analyzing Postfix mail delivery performance, the delay= and delays= values in log entries provide crucial timing insights. Let's examine this sample log entry:

delay=2.4, delays=0.18/0.01/1.4/0.81

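The complete breakdown consists of:

  • Total delay (delay=2.4): Time in seconds from message arrival to the delivery attempt being logged
  • Four-part breakdown (delays=a/b/c/d), whose parts add up to roughly the total (0.18 + 0.01 + 1.4 + 0.81 = 2.4):
    • a (0.18): Time before the Postfix queue manager, including receipt of the message (mostly inbound SMTP processing)
    • b (0.01): Time in the queue manager; for deferred mail this includes the wait between delivery attempts
    • c (1.4): Connection setup time with the remote server, including DNS lookup, HELO/EHLO, and TLS
    • d (0.81): Message transmission time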

Here's how to parse these values programmatically in Python:

import re

log_entry = "delay=2.4, delays=0.18/0.01/1.4/0.81"
match = re.search(r'delay=([\d.]+),\s*delays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)', log_entry)

if match:
    total_delay = float(match.group(1))
    # a/b/c/d: before queue manager, in queue manager, connection setup, transmission
    before_qmgr, in_qmgr, conn_setup, transmission = map(float, match.group(2, 3, 4, 5))

    print(f"Total delivery time: {total_delay:.1f}s")
    if total_delay > 0:  # avoid dividing by zero on delay=0 entries
        print(f"Connection setup took {conn_setup:.1f}s ({conn_setup/total_delay:.1%})")

Common patterns and their implications:

Pattern                  Potential Issue
High first value (a)     Slow inbound connection, sender verification, or pre-queue filtering
High second value (b)    Queue congestion, resource constraints, or earlier deferrals
High third value (c)     Slow DNS, TCP connect, or TLS handshake with the recipient server
High fourth value (d)    Large messages or a slow recipient server during transmission
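
As a rough illustration of reading these patterns automatically, the sketch below picks the largest of the four delays= components and returns a hint for it. The phase names and hint strings are hypothetical labels chosen for this example; they follow the a/b/c/d field order and are not identifiers used by Postfix.

```python
import re

# Hypothetical labels for the a/b/c/d components, in field order.
PHASES = ("before_qmgr", "in_qmgr", "conn_setup", "transmission")

# Illustrative hints only; real diagnosis needs more context than one log line.
HINTS = {
    "before_qmgr": "slow inbound reception or pre-queue filtering",
    "in_qmgr": "queue congestion or earlier deferrals",
    "conn_setup": "slow DNS, TCP connect, or TLS handshake",
    "transmission": "large message or slow remote server",
}

DELAYS_RE = re.compile(r"delays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)")


def dominant_phase(line):
    """Return (phase, seconds, hint) for the largest component, or None."""
    match = DELAYS_RE.search(line)
    if match is None:
        return None
    values = [float(v) for v in match.groups()]
    # Index of the largest component; ties resolve to the earliest phase.
    idx = max(range(len(values)), key=values.__getitem__)
    phase = PHASES[idx]
    return phase, values[idx], HINTS[phase]


print(dominant_phase("delay=2.4, delays=0.18/0.01/1.4/0.81"))
```

For the sample entry, the third component (1.4s) is the largest, so the function points at connection setup.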

This Bash command aggregates delay statistics:

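# Extract the five numbers (total, a, b, c, d) from each line, then average them
sed -nE 's#.*delay=([0-9.]+), delays=([0-9.]+)/([0-9.]+)/([0-9.]+)/([0-9.]+).*#\1 \2 \3 \4 \5#p' /var/log/mail.log | \
awk '{total+=$1; a+=$2; b+=$3; c+=$4; d+=$5; count++}
     END {if (count) printf "Avg total: %.2f\nAvg components: %.2f %.2f %.2f %.2f\n", total/count, a/count, b/count, c/count, d/count}'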

For production monitoring, consider these thresholds:

  • Total delay > 5s: Investigate
  • Any component > 2s: Analyze specific phase
  • Queue time (b) > 1s: Check server load
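
The thresholds above could be applied to individual log lines along these lines. This is a minimal sketch: the constant names and the check_thresholds function are made up for this example, and the limits are simply the values from the list, not Postfix defaults.

```python
import re

DELAY_RE = re.compile(
    r"delay=([\d.]+),\s*delays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)"
)

# Limits taken from the list above (seconds); tune them for your environment.
TOTAL_LIMIT = 5.0
COMPONENT_LIMIT = 2.0
QUEUE_LIMIT = 1.0


def check_thresholds(line):
    """Return a list of warnings for one log line (empty if none apply)."""
    match = DELAY_RE.search(line)
    if match is None:
        return []
    total, a, b, c, d = map(float, match.groups())
    warnings = []
    if total > TOTAL_LIMIT:
        warnings.append(f"total delay {total}s exceeds {TOTAL_LIMIT}s")
    for name, value in zip("abcd", (a, b, c, d)):
        if value > COMPONENT_LIMIT:
            warnings.append(f"component {name} at {value}s exceeds {COMPONENT_LIMIT}s")
    if b > QUEUE_LIMIT:
        warnings.append(f"queue time (b) at {b}s exceeds {QUEUE_LIMIT}s")
    return warnings


for warning in check_thresholds("delay=7.2, delays=0.1/1.5/0.4/5.2"):
    print(warning)
```

The sample entry used throughout this article (delay=2.4) stays under every limit, so it produces no warnings.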

In a full log line, the delay= and delays= fields appear next to each other:

delay=2.4, delays=0.18/0.01/1.4/0.81

Read left to right, the four delays= values cover these processing phases, in seconds:

  1. First value (0.18): Time before the queue manager, including receipt of the message
  2. Second value (0.01): Time spent in the queue manager
  3. Third value (1.4): Connection setup time, including DNS lookup, HELO/EHLO, and TLS
  4. Fourth value (0.81): Message transmission time

Consider this complete log entry:

Jan 1 10:00:00 mail postfix/smtp[1234]: ABC1234567: to=<user@example.com>,
relay=mx.example.com[1.2.3.4]:25, delay=2.4, delays=0.18/0.01/1.4/0.81,
dsn=2.0.0, status=sent (250 OK)
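
To show how the fields in such an entry fit together, here is a hedged sketch that splits one delivery line into named fields. The regular expression is an assumption based on the entry shown above and will not cover every Postfix log variant; the function name parse_delivery_line is made up for this example.

```python
import re

# Pattern modeled on the sample entry; orig_to= and conn_use= are treated
# as optional because they only appear on some delivery lines.
LINE_RE = re.compile(
    r"postfix/(?P<service>[\w-]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<queue_id>[0-9A-Za-z]+):\s+"
    r"to=<(?P<recipient>[^>]*)>,\s+"
    r"(?:orig_to=<[^>]*>,\s+)?"
    r"relay=(?P<relay>[^,]+),\s+"
    r"(?:conn_use=\d+,\s+)?"
    r"delay=(?P<delay>[\d.]+),\s+"
    r"delays=(?P<delays>[\d./]+),\s+"
    r"dsn=(?P<dsn>[\d.]+),\s+"
    r"status=(?P<status>\w+)"
)


def parse_delivery_line(line):
    """Return a dict of fields for one delivery log line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["delay"] = float(record["delay"])
    # delays= holds the a/b/c/d components separated by slashes.
    record["delays"] = tuple(float(v) for v in record["delays"].split("/"))
    return record


sample = (
    "Jan 1 10:00:00 mail postfix/smtp[1234]: ABC1234567: to=<user@example.com>, "
    "relay=mx.example.com[1.2.3.4]:25, delay=2.4, delays=0.18/0.01/1.4/0.81, "
    "dsn=2.0.0, status=sent (250 OK)"
)
print(parse_delivery_line(sample))
```

Keeping the queue ID and status alongside the timings makes it possible to group several delivery attempts for the same message later on.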

Extract delay statistics from logs:

grep "delay=" /var/log/mail.log | awk -F'delay=' '{print $2}' | awk -F, '{print $1}' | sort -n

Calculate average delivery time:

grep "status=sent" /var/log/mail.log | awk -F'delay=' '{print $2}' | awk -F, '{print $1}' | awk '{sum+=$1; count++} END {if (count) print "Average:",sum/count,"seconds"}'
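
An average can be skewed by a handful of very slow deliveries, so it may help to look at the median and maximum as well. The following is a small sketch under that assumption; delay_summary is a hypothetical helper, and it takes any iterable of log lines (for example, an open file).

```python
import re
import statistics

# \b keeps this from matching inside other words; "delays=" is not matched
# because the pattern requires "=" directly after "delay".
DELAY_RE = re.compile(r"\bdelay=([\d.]+)")


def delay_summary(lines):
    """Summarize delay= values for lines with status=sent, or return None."""
    delays = []
    for line in lines:
        if "status=sent" not in line:
            continue
        match = DELAY_RE.search(line)
        if match:
            delays.append(float(match.group(1)))
    if not delays:
        return None
    return {
        "count": len(delays),
        "mean": statistics.mean(delays),
        "median": statistics.median(delays),
        "max": max(delays),
    }


example_lines = [
    "delay=1.0, delays=0.2/0.1/0.3/0.4, dsn=2.0.0, status=sent",
    "delay=2.0, delays=0.5/0.5/0.5/0.5, dsn=2.0.0, status=sent",
    "delay=9.0, delays=1/1/3/4, dsn=2.0.0, status=sent",
    "delay=100, delays=1/90/9/0, dsn=4.4.1, status=deferred",
]
print(delay_summary(example_lines))
```

To run it against a real log, pass an open file object instead of the example list.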

Typical scenarios to watch for:

Pattern              Potential Issue
High first value     Slow inbound reception or pre-queue filtering
High second value    Queue congestion or deferred retries
High third value     DNS/network issues or slow TLS handshake
High fourth value    Large messages or slow remote

Python script to analyze delay patterns:

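import re
from collections import defaultdict

logfile = '/var/log/mail.log'
delay_pattern = re.compile(r'delay=([\d.]+),\s*delays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)')

# Names follow the field order: delay=, then the a/b/c/d parts of delays=
phases = ('total', 'before_qmgr', 'in_qmgr', 'conn_setup', 'transmission')
stats = defaultdict(lambda: {'count': 0, 'total': 0.0})

# errors='replace' keeps odd bytes in the log from aborting the run
with open(logfile, errors='replace') as f:
    for line in f:
        match = delay_pattern.search(line)
        if match:
            # Accumulate every captured value under its phase name
            for phase, value in zip(phases, map(float, match.groups())):
                stats[phase]['total'] += value
                stats[phase]['count'] += 1

# A phase only appears in stats after at least one match, so count is never 0
for phase in stats:
    avg = stats[phase]['total'] / stats[phase]['count']
    print(f"{phase}: {avg:.2f}s average")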