Implementing Truly Unique Mail Identification in IMAP: Solutions for UID Collisions and Message Tracking



When building a webmail system with extended functionality like mail-associated notes and contacts, developers face a fundamental limitation in IMAP: the protocol only guarantees UID uniqueness within a single mailbox, not across the entire server. This becomes problematic when:

  • Mails are moved between folders
  • Implementing cross-folder features
  • Building long-term tracking systems

While the Message-ID header seems like a natural solution, it is unreliable because some messages arrive without one:

// Example of missing Message-ID in raw email
Received: by mail.example.com with SMTP id xyz123
Date: Wed, 01 Jan 2020 00:00:00 -0000
From: sender@example.com
To: recipient@example.com
Subject: Test email without Message-ID

Reported figures vary by corpus, but roughly 5-15% of legitimate emails are said to lack this header, so any scheme built on Message-ID needs a fallback.
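As a minimal sketch of why a fallback is needed, the helper below looks for a Message-ID in a raw header block and returns null when none is present. The function name `extractMessageId` and the sample headers are illustrative assumptions, not part of any library.

```javascript
// Extract the Message-ID from a raw RFC 5322 header block, or return null.
// Hypothetical helper for illustration only.
function extractMessageId(rawHeaders) {
    // Unfold continuation lines (a line break followed by a space or tab).
    const unfolded = rawHeaders.replace(/\r?\n[ \t]+/g, ' ');
    for (const line of unfolded.split(/\r?\n/)) {
        const match = /^message-id:\s*(<[^<>\s]+>)/i.exec(line);
        if (match) return match[1];
    }
    return null; // caller must fall back to another identifier
}

const withoutId = [
    'From: sender@example.com',
    'To: recipient@example.com',
    'Subject: Test email without Message-ID'
].join('\r\n');

// The header value sits on a folded continuation line here.
const withId = 'Message-ID:\r\n <abc123@mail.example.com>\r\nSubject: Hi';

console.log(extractMessageId(withoutId)); // null
console.log(extractMessageId(withId));    // <abc123@mail.example.com>
```

A null result is the signal to switch to one of the other identification strategies rather than to skip the message.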

Some IMAP server implementations provide solutions:

Dovecot's GUID Extension

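Dovecot offers a proprietary per-message GUID, readable through its non-standard X-GUID fetch item, that is intended to remain constant across folder moves (confirm this on your storage backend). Note that X-GM-MSGID is Gmail's counterpart, not a Dovecot item.

// Querying the GUID in Dovecot (<guid> stands in for the returned value)
A01 UID FETCH 42 (X-GUID)
* 42 FETCH (UID 42 X-GUID "<guid>")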

Microsoft Exchange Approach

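Exchange identifies each item by its PR_ENTRYID MAPI property. This is a MAPI-level identifier rather than an IMAP one, so an IMAP-only client cannot read it, and you should verify on your own deployment whether it stays stable through: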

  • Folder moves
  • Server migrations
  • Cross-account operations

When you can't modify the server:

Composite Key Strategy

# Python example of generating a composite key
def generate_mail_key(mail):
    # `mail` is assumed to expose mailbox.id, uid, and internal_date (a datetime)
    return f"{mail.mailbox.id}:{mail.uid}:{mail.internal_date.timestamp()}"

Content-Based Hashing

// JavaScript example using email content hashing
const crypto = require('crypto');

function getEmailFingerprint(email) {
    const hash = crypto.createHash('sha256');
    hash.update(email.headers + '\r\n\r\n' + email.body);
    return hash.digest('hex');
}

When storing mail references in a database:

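-- SQL table design for mail tracking (PostgreSQL)
CREATE TABLE mail_references (
    id SERIAL PRIMARY KEY,
    -- Server-provided ID (e.g. Dovecot GUID); stored as text because formats differ
    server_guid VARCHAR(255),
    -- For generic IMAP: UID and UIDVALIDITY are unsigned 32-bit, so use BIGINT
    mailbox_path VARCHAR(255),
    uid_validity BIGINT,
    imap_uid BIGINT,
    -- Fallback content hash (SHA-256 as 64 hex characters)
    content_hash CHAR(64),
    -- Metadata
    notes TEXT,
    contacts JSONB
);

CREATE INDEX idx_mail_references_composite ON mail_references(mailbox_path, uid_validity, imap_uid);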
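One way to use such a table is to try the identifiers in order of stability: server ID first, then mailbox plus UID, then content hash. The sketch below does this over an in-memory array standing in for the table; `findReference` and the camel-cased field names are assumptions made for illustration.

```javascript
// Find the stored reference for a message, trying the most stable identifier first.
// `records` stands in for rows of mail_references; field names are hypothetical.
function findReference(records, message) {
    if (message.serverGuid) {
        const byGuid = records.find((r) => r.serverGuid === message.serverGuid);
        if (byGuid) return { record: byGuid, matchedBy: 'server_guid' };
    }
    const byUid = records.find(
        (r) => r.mailboxPath === message.mailboxPath && r.imapUid === message.imapUid
    );
    if (byUid) return { record: byUid, matchedBy: 'mailbox_uid' };
    if (message.contentHash) {
        const byHash = records.find((r) => r.contentHash === message.contentHash);
        if (byHash) return { record: byHash, matchedBy: 'content_hash' };
    }
    return null;
}

const records = [
    { id: 1, serverGuid: null, mailboxPath: 'INBOX', imapUid: 123, contentHash: 'aa11', notes: 'call back' }
];

// The same message after a move: new mailbox and UID, same content hash.
const movedMessage = { serverGuid: null, mailboxPath: 'Archive', imapUid: 9, contentHash: 'aa11' };

const found = findReference(records, movedMessage);
console.log(found.matchedBy, found.record.notes);
```

When a lookup only succeeds on the content hash, the row's mailbox path and UID can be updated so that the cheaper match works next time.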

When transitioning between identification methods:

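# Bash script example for migrating tracking data
# `imapcli get_guid` is a placeholder for whatever tool returns the server GUID
while read -r old_id mailbox uid; do
    # Keep the helper from consuming the loop's input file
    new_id=$(imapcli get_guid "$mailbox" "$uid" < /dev/null) || continue
    # Pass values as psql variables so they are quoted safely
    psql -v guid="$new_id" -v id="$old_id" <<'SQL'
UPDATE mail_references SET server_guid = :'guid' WHERE id = :'id';
SQL
done < legacy-ids.txt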

The core problem is easy to reproduce: because IMAP UIDs are only unique within a single mailbox folder, two different messages on the same server can carry the same UID:

// Example of problematic scenario
// Mail in INBOX has UID 123
// Mail in Trash may also have UID 123
// No reliable way to distinguish between them at server level
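The collision can be shown in a few lines. In this sketch, notes keyed by UID alone overwrite each other, while notes keyed by mailbox name, UIDVALIDITY, and UID stay separate (IMAP defines that triple as the lasting identifier of a message within one server). The message objects and the `scopedKey` helper are made up for the example.

```javascript
// Two different messages that happen to share UID 123 in different folders.
// All values here are invented for illustration.
const inboxMail = { mailbox: 'INBOX', uidValidity: 1577836800, uid: 123 };
const trashMail = { mailbox: 'Trash', uidValidity: 1577836801, uid: 123 };

// Keyed by UID alone: the second note silently overwrites the first.
const notesByUid = new Map();
notesByUid.set(inboxMail.uid, 'note for the INBOX message');
notesByUid.set(trashMail.uid, 'note for the Trash message');

// Keyed by mailbox + UIDVALIDITY + UID: both notes survive.
function scopedKey(mail) {
    // JSON encoding keeps the three parts unambiguous even if a name contains ':'.
    return JSON.stringify([mail.mailbox, mail.uidValidity, mail.uid]);
}
const notesByScopedKey = new Map();
notesByScopedKey.set(scopedKey(inboxMail), 'note for the INBOX message');
notesByScopedKey.set(scopedKey(trashMail), 'note for the Trash message');

console.log(notesByUid.size);       // 1
console.log(notesByScopedKey.size); // 2
```

The scoped key still changes when a message is moved, which is why the remaining approaches look for identifiers that are independent of the folder.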

The Message-ID header seems like a natural alternative, but:

  • Not all emails contain Message-ID headers (especially older or malformed messages)
  • Copies of the same message (for example a Sent copy, or a direct copy plus a mailing-list copy) share the same Message-ID
  • Some mailing list software modifies Message-IDs

1. Composite Key Approach

Combine multiple identifiers to create a reliable fingerprint:

function generateMailFingerprint(imapMessage) {
    const mailboxPath = imapMessage.mailbox; // Full mailbox path
    const uid = imapMessage.uid;
    const internalDate = imapMessage.internalDate.getTime();
    const size = imapMessage.size;
    
    return `${mailboxPath}|${uid}|${internalDate}|${size}`;
}
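A key that embeds the mailbox path and UID stops matching as soon as the message is moved. If the server reports which destination UID each moved message received (as UIDPLUS servers do in their COPYUID response, RFC 4315), the tracking entries can be re-keyed. The sketch below assumes that mapping is already available as a Map of source UID to destination UID; `makeKey` and `rekeyAfterMove` are hypothetical names, and the key is simplified to path and UID only.

```javascript
// Hypothetical simplified key: mailbox path and UID only.
function makeKey(mailboxPath, uid) {
    return `${mailboxPath}|${uid}`;
}

// Re-key tracked annotations after a move. `uidMap` maps each source UID to the
// UID assigned in the destination mailbox (assumed to come from the server).
function rekeyAfterMove(tracker, sourcePath, destPath, uidMap) {
    let rekeyed = 0;
    for (const [sourceUid, destUid] of uidMap) {
        const oldKey = makeKey(sourcePath, sourceUid);
        if (!tracker.has(oldKey)) continue; // nothing tracked for this message
        tracker.set(makeKey(destPath, destUid), tracker.get(oldKey));
        tracker.delete(oldKey);
        rekeyed += 1;
    }
    return rekeyed;
}

const tracker = new Map([
    [makeKey('INBOX', 123), { notes: 'follow up on Friday' }],
    [makeKey('INBOX', 124), { notes: 'waiting for reply' }]
]);

// Pretend the server moved INBOX UID 123 to Archive and assigned UID 9 there.
const rekeyedCount = rekeyAfterMove(tracker, 'INBOX', 'Archive', new Map([[123, 9]]));
console.log(rekeyedCount, [...tracker.keys()]);
```

This only works for moves your own client performs; moves made by another client still need a folder-independent identifier or a content hash to be detected.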

2. Server-Side Solutions

Some IMAP servers offer extensions:

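  • Dovecot: the non-standard X-GUID fetch item returns a per-message GUID
  • Microsoft Exchange: the PR_ENTRYID and PR_SEARCH_KEY MAPI properties, which are read through MAPI rather than IMAP
  • Cyrus: the OBJECTID extension (RFC 8474), whose EMAILID fetch item identifies a message independently of its mailbox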
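Because support differs per server, a client can pick its strategy from the capability list the server announces. The sketch below checks for two capabilities that advertise a folder-independent message ID: OBJECTID (RFC 8474, fetch item EMAILID) and Gmail's X-GM-EXT-1 (fetch item X-GM-MSGID). It falls back to a composite key otherwise. The function name and return shape are assumptions, and vendor items that are not announced as capabilities would need to be probed separately.

```javascript
// Pick an identification strategy from the capability names a server announced.
// Hypothetical helper; `capabilities` is any iterable of capability strings.
function chooseIdStrategy(capabilities) {
    const caps = new Set([...capabilities].map((c) => c.toUpperCase()));
    if (caps.has('OBJECTID')) {
        return { strategy: 'server-id', fetchItem: 'EMAILID' };
    }
    if (caps.has('X-GM-EXT-1')) {
        return { strategy: 'server-id', fetchItem: 'X-GM-MSGID' };
    }
    // No known stable ID advertised: fall back to mailbox + UIDVALIDITY + UID.
    return { strategy: 'composite-key', fetchItem: null };
}

console.log(chooseIdStrategy(['IMAP4rev1', 'OBJECTID']));
console.log(chooseIdStrategy(['IMAP4rev1', 'UIDPLUS']));
```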

3. Content-Based Hashing

Create a SHA-256 hash of key message components:

const crypto = require('crypto');

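function createContentHash(message) {
    // Join with a newline so adjacent fields cannot run together
    const headers = [
        message.headers.get('from'),
        message.headers.get('to'),
        message.headers.get('date'),
        message.headers.get('subject')
    ].map((value) => value ?? '').join('\n');

    // Either body part may be missing, so default each one to an empty string
    const content = (message.body.text ?? '') + '\n' + (message.body.html ?? '');
    return crypto.createHash('sha256')
        .update(headers + '\n' + content)
        .digest('hex');
}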
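One detail worth illustrating is field boundaries: if fields are concatenated with nothing between them, different messages can produce the same hash input. A possible remedy, sketched below, is to prefix each field with its byte length. `hashFields` and `naiveHash` are illustrative names, not part of Node's crypto API.

```javascript
const crypto = require('crypto');

// Hash a list of fields so that field boundaries are part of the input.
function hashFields(fields) {
    const hash = crypto.createHash('sha256');
    for (const field of fields) {
        const bytes = Buffer.from(field ?? '', 'utf8');
        hash.update(`${bytes.length}:`); // length prefix marks where the field ends
        hash.update(bytes);
    }
    return hash.digest('hex');
}

// Plain concatenation, for comparison.
function naiveHash(fields) {
    return crypto.createHash('sha256').update(fields.join('')).digest('hex');
}

// 'ab' + 'c' and 'a' + 'bc' concatenate to the same string.
console.log(naiveHash(['ab', 'c']) === naiveHash(['a', 'bc']));   // true
console.log(hashFields(['ab', 'c']) === hashFields(['a', 'bc'])); // false
```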

When choosing your approach, consider:

  • Performance impact of content hashing on large mailboxes
  • Whether your solution needs to survive message moves between folders
  • Compatibility with existing message cache systems
  • Storage overhead for your tracking database

Here's a complete example using Node.js and IMAP:

const { ImapFlow } = require('imapflow');
const client = new ImapFlow({
    host: 'imap.example.com',
    port: 993,
    secure: true,
    auth: {
        user: 'user@example.com',
        pass: 'password'
    }
});

async function trackMessages() {
    await client.connect();
    
    // Get list of all mailboxes
    const mailboxes = await client.list();
    
    const messageTracker = new Map();
    
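    for (const mailbox of mailboxes) {
        // Skip folders that cannot be opened (for example, pure hierarchy nodes)
        if (mailbox.flags.has('\\Noselect')) continue;
        await client.mailboxOpen(mailbox.path);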
        
        // Fetch all messages with envelope and headers
        for await (let message of client.fetch('1:*', {
            envelope: true,
            headers: ['message-id', 'date', 'from', 'to', 'subject'],
            bodyStructure: true
        })) {
            // Create composite key (date as epoch milliseconds, 0 if the header is missing)
            const trackingKey = `${mailbox.path}:${message.uid}:${message.envelope.date?.getTime() ?? 0}`;
            
            // Store in tracking system
            messageTracker.set(trackingKey, {
                mailbox: mailbox.path,
                uid: message.uid,
                date: message.envelope.date,
                subject: message.envelope.subject,
                // Add your custom notes/contacts here
                annotations: {}
            });
        }
    }
    
    await client.logout();
    return messageTracker;
}