Base64 vs. Radix-64: Decoding Password Hash Encoding in Linux Shadow Files



Consider /etc/shadow entries like:

user1:$6$somesalt$HASHEDPASSWORD:...
user2:$5$anothersalt$DIFFERENTHASH:...

The hash components use a specialized encoding often mistaken for Base64. Linux's crypt(3) actually uses its own Radix-64 variant (sometimes called "crypt(3) encoding"), which differs from Base64 in its alphabet, its bit order, and its lack of padding.

The encoding uses these 64 characters (in order):

./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

This differs from standard Base64 which uses:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
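
To make the difference concrete, here is a minimal sketch that looks up the 6-bit value a character stands for in each alphabet. The helper name sextet_values is made up for this illustration; the two alphabets are the ones listed above.

```python
import string

# Both alphabets, built from the character classes listed above.
CRYPT_ALPHABET = "./" + string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def sextet_values(ch):
    """Return (crypt value, Base64 value) for one character.

    A value is None when the character is not part of that alphabet.
    (Hypothetical helper, for illustration only.)
    """
    crypt_value = CRYPT_ALPHABET.find(ch)
    base64_value = BASE64_ALPHABET.find(ch)
    return (
        crypt_value if crypt_value >= 0 else None,
        base64_value if base64_value >= 0 else None,
    )


# The same character maps to different values in the two schemes.
for ch in "A0z/.+":
    print(repr(ch), sextet_values(ch))
```

Note that '.' exists only in the crypt alphabet and '+' only in Base64, so a string containing either one immediately tells you which scheme you are not looking at.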

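Here's how to encode raw bytes with the crypt(3) alphabet in Python:

import struct

CRYPT_ALPHABET = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def bytes_to_crypt(data):
    """Encode bytes in the order given.

    Real crypt(3) schemes (MD5, SHA-256, SHA-512) reorder the digest
    bytes before this step; that permutation is not applied here.
    """
    output = []
    for i in range(0, len(data), 3):
        chunk = data[i:i+3]

        # Right-align the 1-3 bytes in a 32-bit big-endian integer
        value = struct.unpack('>I', chunk.rjust(4, b'\x00'))[0]

        # No padding: a partial chunk yields fewer characters
        # (1 byte -> 2 characters, 2 bytes -> 3 characters)
        n_chars = (len(chunk) * 8 + 5) // 6

        # Emit the least significant 6 bits first
        for _ in range(n_chars):
            output.append(CRYPT_ALPHABET[value & 0x3f])
            value >>= 6

    return ''.join(output)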

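This encoding predates the standardization of Base64; crypt(3) already used it for traditional DES-based hashes. Its alphabet has a few practical properties:

  • It avoids shell metacharacters as well as the ':' and '$' separators used in /etc/shadow
  • Every character is printable ASCII and typeable on any keyboard
  • It packs the 64-bit DES result into 11 characters, which together with the 2-character salt gives the traditional 13-character hash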

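You can validate password hashes using Python's crypt module (Unix only; deprecated since Python 3.11 and removed in 3.13):

import crypt
import hmac

def verify_password(stored_hash, password):
    # crypt() reads the algorithm id and salt from stored_hash
    computed = crypt.crypt(password, stored_hash)
    # Constant-time comparison avoids leaking where the strings differ
    return hmac.compare_digest(computed, stored_hash)

# Example usage (placeholder hash, not a real one):
stored = "$6$somesalt$HASHEDPASSWORD"
print(verify_password(stored, "test123"))  # Prints True or False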

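With contemporary hashing algorithms (SHA-256/512), the representation follows this structure:

$id$salt$hash

The hash is Radix-64 encoded. The salt is not an encoding of binary data: it is a string of characters, normally drawn from the same alphabet, that is used as-is. Common algorithm identifiers:

ID   Algorithm
1    MD5
5    SHA-256
6    SHA-512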
In modern Linux systems, password hashes are stored in /etc/shadow using a specific encoding format. The hash string consists of three main components:

$id$salt$hash

Where id indicates the hashing algorithm (1=MD5, 5=SHA-256, 6=SHA-512), salt is the random salt value, and hash is the actual password hash.

The hash components are encoded using a modified Base64 scheme (sometimes called crypt(3) encoding) with the following character set:

./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

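This differs from standard Base64 in character set, bit order, and padding (crypt(3) emits none). The encoding process works like this:

1. Take the raw binary output from the hash function
2. Arrange it into 3-byte (24-bit) chunks; MD5 and SHA crypt pick the bytes in an algorithm-specific permuted order rather than sequentially
3. Split each chunk into four 6-bit values, least significant bits first
4. Map each 6-bit value to the corresponding character
5. Encode any leftover bytes with only as many characters as needed (2 for one byte, 3 for two)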
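
One consequence of these steps is that the length of the encoded hash follows directly from the digest size. The sketch below works that out for the three algorithms mentioned above and compares it with padded Base64; the function names are made up for this illustration.

```python
def crypt_encoded_length(num_bytes: int) -> int:
    """Characters needed when every 6 bits become one character, unpadded."""
    return (num_bytes * 8 + 5) // 6  # ceiling of bits / 6


def base64_padded_length(num_bytes: int) -> int:
    """Length of standard Base64 output, including '=' padding."""
    return 4 * ((num_bytes + 2) // 3)


# Digest sizes in bytes for the hash functions behind ids 1, 5 and 6.
for name, size in (("MD5", 16), ("SHA-256", 32), ("SHA-512", 64)):
    print(name, crypt_encoded_length(size), base64_padded_length(size))
```

This gives 22, 43 and 86 characters for MD5, SHA-256 and SHA-512 respectively, which is a handy sanity check when eyeballing a hash field.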

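Here's a Python implementation that decodes the hash portion of a shadow entry. Simply swapping the alphabet and calling base64.b64decode does not work, because Base64 packs the most significant bits first while crypt(3) emits the least significant bits first:

CRYPT_ALPHABET = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def crypt_base64_decode(encoded_str):
    """Decode crypt(3) text into bytes, in the order they were encoded."""
    output = bytearray()
    for i in range(0, len(encoded_str), 4):
        group = encoded_str[i:i+4]

        # The first character of each group holds the lowest 6 bits
        value = 0
        for shift, ch in enumerate(group):
            value |= CRYPT_ALPHABET.index(ch) << (6 * shift)

        # 4 characters -> 3 bytes, 3 -> 2 bytes, 2 -> 1 byte
        n_bytes = len(group) * 6 // 8
        value &= (1 << (8 * n_bytes)) - 1
        output += value.to_bytes(n_bytes, 'big')

    return bytes(output)

# Example shadow entry (illustrative placeholder, not a real hash:
# a genuine SHA-512 crypt hash is 86 characters long)
shadow_hash = "$6$somesalt$J8kgzKVa7ORks6uG6D2V7iQY9V7HLOQ0Bq8xU.zS5J3cXpJ5fWUc1r9sQeL2a3"
parts = shadow_hash.split('$')
salt = parts[2]
hash_part = parts[3]

# Decode the hash. The bytes come out in crypt(3)'s permuted order,
# not in the order the hash function produced them.
decoded_bytes = crypt_base64_decode(hash_part)
print(f"Decoded hash bytes: {decoded_bytes.hex()}")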

You can generate a reference hash with OpenSSL (the -6 option requires OpenSSL 1.1.1 or later):

openssl passwd -6 -salt somesalt yourpassword

This will output a hash in the same format as /etc/shadow, allowing you to compare the encoding.
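
When comparing such output, a quick shape check can catch truncated or mangled strings before any deeper analysis. The following is only a sketch: the function name is hypothetical, and the 86-character hash length, the 16-character salt limit, and the optional rounds=N field are assumptions taken from the SHA-512 crypt format rather than from the text above.

```python
import re

# Characters allowed in the salt and hash fields.
_CHARS = r"[./0-9A-Za-z]"

# $6$ [rounds=N$] salt (up to 16 chars) $ hash (exactly 86 chars)
SHA512_CRYPT_RE = re.compile(
    r"\$6\$(?:rounds=\d+\$)?" + _CHARS + r"{0,16}\$" + _CHARS + r"{86}"
)


def looks_like_sha512_crypt(candidate: str) -> bool:
    """Return True if the string has the shape of a SHA-512 crypt hash.

    This checks format only; it says nothing about which password
    the hash belongs to. (Hypothetical helper, for illustration.)
    """
    # strip() tolerates the trailing newline of command output.
    return SHA512_CRYPT_RE.fullmatch(candidate.strip()) is not None


print(looks_like_sha512_crypt("$6$somesalt$" + "." * 86))   # well-formed shape
print(looks_like_sha512_crypt("$6$somesalt$HASHEDPASSWORD"))  # too short
```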

While the encoding scheme itself doesn't affect security, understanding it is crucial for:

  • Password cracking research
  • Developing custom authentication systems
  • Forensic analysis of compromised systems

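The modified Base64 encoding was kept for compatibility with older Unix systems. Its alphabet also avoids characters that would cause problems in the files that store it, such as the ':' that separates /etc/shadow fields and the '$' that separates the parts of the hash string.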