Prohibited Characters in Unix/Windows Filenames: Complete Developer’s Guide to Illegal Path Characters



When working with cross-platform applications, knowing which characters are forbidden in filenames is crucial for robust file handling. Unix-like systems (Linux, macOS) and Windows have different rulesets, making this particularly important for developers creating portable software.

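Windows prohibits these characters in both file and directory names:


< > : " / \ | ? *

Control characters (ASCII 0-31) are rejected as well. Additionally, names cannot end with a period or a space, and the reserved device names (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9) are prohibited regardless of case, even when followed by an extension such as NUL.txt.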

Unix-like systems only forbid two characters:


/ (forward slash) and \0 (null character)

However, while technically allowed, these characters are problematic because shells and other tools give them special meaning, so they are best avoided:


! @ # $ % ^ & * ( ) [ ] { } ; ' : " , . < > ? \ | ~ space

Here's a Python function to validate filenames across both systems:


import re

def is_valid_filename(filename, platform='both'):
    # Windows illegal chars + reserved names
    windows_illegal = r'[<>:"/\\|?*\x00-\x1f]'
    windows_reserved = [
        'CON', 'PRN', 'AUX', 'NUL', 
        'COM1', 'COM2', 'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 'COM8', 'COM9',
        'LPT1', 'LPT2', 'LPT3', 'LPT4', 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9'
    ]
    
    # Unix only forbids / and null
    unix_illegal = r'[/\x00]'
    
    if platform.lower() == 'windows':
        pattern = windows_illegal
    elif platform.lower() == 'unix':
        pattern = unix_illegal
    else:  # both
        pattern = f'({windows_illegal}|{unix_illegal})'
    
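    # An empty name is invalid everywhere
    if not filename or re.search(pattern, filename):
        return False

    if platform.lower() in ('windows', 'both'):
        # Reserved device names apply with or without an extension (e.g. NUL.txt)
        if filename.split('.')[0].upper() in windows_reserved:
            return False
        # Test the raw name: calling rstrip() first would hide a trailing space
        if filename.endswith(('.', ' ')):
            return False

    return True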

Some characters are accepted by the filesystem (% on both platforms, * and spaces on Unix) but still cause issues in practice:

  • % is used in Windows environment variables
  • * is a wildcard in shell operations
  • Spaces require special handling in command-line operations
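
The special handling that spaces and wildcards need on a command line can be illustrated with Python's standard `shlex.quote`. This is a minimal sketch for POSIX shells only (it does not follow cmd.exe or PowerShell quoting rules), and `build_command` is a hypothetical helper name:

```python
import shlex

def build_command(program, filenames):
    """Return a POSIX-shell-safe command line for the given filenames."""
    # shlex.quote wraps any argument containing shell metacharacters
    # (spaces, *, and so on) in single quotes; plain names pass through.
    return " ".join([program] + [shlex.quote(name) for name in filenames])

print(build_command("ls", ["report.txt", "my file.txt", "100%*.log"]))
# ls report.txt 'my file.txt' '100%*.log'
```

When possible, passing arguments as a list to `subprocess.run` avoids the shell entirely, which sidesteps quoting altogether.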

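Best practice is to stick to ASCII alphanumerics, hyphens, underscores, and periods for maximum compatibility (an explicit character class is used because \w also matches non-ASCII letters in Python 3):


safe_name = re.sub(r'[^A-Za-z0-9_.-]', '_', original_name)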
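
A slightly fuller sanitizer might also handle the edge cases a bare substitution leaves behind, such as trailing periods and names that end up empty. The following is a sketch under assumptions: `sanitize_filename`, its `replacement` parameter, and the `"unnamed"` fallback are illustrative choices, and reserved device names are not handled here.

```python
import re

# Hypothetical helper; the allowed set is a conservative ASCII whitelist.
_UNSAFE = re.compile(r"[^A-Za-z0-9_.-]")

def sanitize_filename(name, replacement="_", fallback="unnamed"):
    # Replace every character outside the whitelist.
    cleaned = _UNSAFE.sub(replacement, name)
    # Windows rejects names ending in a period; spaces were already replaced.
    cleaned = cleaned.rstrip(".")
    # Names such as "", "." or ".." collapse to nothing, so use a placeholder.
    return cleaned or fallback

print(sanitize_filename("my report (final).txt"))  # my_report__final_.txt
print(sanitize_filename(".."))                     # unnamed
```

Note that this mapping is lossy: two different inputs can produce the same output, so callers that need uniqueness have to check for collisions themselves.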

Additional considerations:

  • Windows has a 260-character path limit (MAX_PATH), which extends to roughly 32,767 characters with the \\?\ prefix
  • NTFS is case-preserving but case-insensitive
  • HFS+ and APFS (macOS) are case-insensitive by default
  • ext4 (Linux) is case-sensitive
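
Because of these case-sensitivity differences, a set of files that coexists on ext4 can collide when copied to NTFS or a default macOS volume. A rough way to detect this ahead of time is sketched below. `find_case_collisions` is a hypothetical helper, and `str.casefold()` only approximates the case-folding tables real filesystems use.

```python
from collections import defaultdict

def find_case_collisions(names):
    """Group names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        # casefold() is a more aggressive lower() meant for caseless matching
        groups[name.casefold()].append(name)
    # Keep only the groups where two or more names map to the same key
    return [group for group in groups.values() if len(group) > 1]

print(find_case_collisions(["README.md", "readme.md", "Makefile", "src"]))
# [['README.md', 'readme.md']]
```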

The two prohibited character sets can also be written down as constants, for example in JavaScript:

// Unix/Linux prohibited characters:
const unixIllegalChars = ['/', '\0'];
// Windows prohibited characters: 
const windowsIllegalChars = ['<', '>', ':', '"', '/', '\\', '|', '?', '*', '\0'];

Some characters require special attention:

  • Null character (\0): Terminates strings in C-style systems
  • Forward slash (/): Directory separator in Unix
  • Backslash (\): Directory separator in Windows
  • Colon (:): Drive-letter separator on Windows and alternate data stream marker in NTFS
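
The separator differences above can be observed without touching the disk by using the pure path classes in Python's standard `pathlib`. This is a small illustration, and the sample paths are made up:

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = r"reports\2024/summary.txt"

# Windows accepts both "\" and "/" as separators.
print(PureWindowsPath(raw).parts)   # ('reports', '2024', 'summary.txt')

# On Unix only "/" separates components; the backslash is part of the name.
print(PurePosixPath(raw).parts)     # ('reports\\2024', 'summary.txt')

# The colon after a drive letter is parsed as part of the drive.
print(PureWindowsPath("C:/data/file.txt").drive)  # C:
```

The pure classes only parse strings, so the same script gives identical results on any operating system.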

A more compact Python check covers only the character sets and empty names:

import re

def is_valid_filename(filename, platform='both'):
    # Empty or whitespace-only names are rejected on every platform
    if not filename or filename.strip() == '':
        return False

    if platform.lower() == 'unix':
        pattern = r'[/\x00]'
    else:
        # The Windows set is a superset of the Unix set, so it also covers 'both'
        pattern = r'[<>:"/\\|?*\x00]'

    return re.search(pattern, filename) is None

Windows has additional restrictions on reserved names:

windows_reserved = [
    'CON', 'PRN', 'AUX', 'NUL',
    'COM1', 'COM2', 'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 'COM8', 'COM9',
    'LPT1', 'LPT2', 'LPT3', 'LPT4', 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9'
]
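
Windows matches these names case-insensitively and also when an extension follows, so NUL.txt and nul.tar.gz are treated like NUL. A check along those lines might look like this sketch, where `is_reserved_windows_name` and `WINDOWS_RESERVED` are hypothetical names:

```python
WINDOWS_RESERVED = frozenset(
    ["CON", "PRN", "AUX", "NUL"]
    + [f"COM{i}" for i in range(1, 10)]
    + [f"LPT{i}" for i in range(1, 10)]
)

def is_reserved_windows_name(filename):
    # Only the part before the first period matters, compared without case.
    stem = filename.split(".", 1)[0].upper()
    return stem in WINDOWS_RESERVED

print(is_reserved_windows_name("nul.tar.gz"))   # True
print(is_reserved_windows_name("console.txt"))  # False
```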

When you need to include special characters:

  1. Use URL encoding (%20 for space)
  2. Replace with underscores or hyphens
  3. Maintain a mapping table for special characters
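
General recommendations:

  • Stick to alphanumerics, underscores, hyphens, and periods
  • Keep each filename to 255 characters or fewer (the limit is 255 bytes on most Linux filesystems)
  • Avoid spaces - use CamelCase or underscores instead
  • Be aware of case-sensitivity differences (Unix vs Windows)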
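
The URL-encoding option mentioned above can be sketched with the standard `urllib.parse` module. It has the advantage of being reversible, unlike underscore replacement. `encode_filename` and `decode_filename` are hypothetical wrapper names, and the sample filename is made up:

```python
from urllib.parse import quote, unquote

def encode_filename(name):
    # safe="" makes quote() encode "/" too; letters, digits and "_.-~" stay as-is
    return quote(name, safe="")

def decode_filename(encoded):
    # Reverse the percent-encoding to recover the original name
    return unquote(encoded)

original = "Q3 report: draft/final?.txt"
encoded = encode_filename(original)
print(encoded)                    # Q3%20report%3A%20draft%2Ffinal%3F.txt
print(decode_filename(encoded))   # Q3 report: draft/final?.txt
```

Two trade-offs are worth keeping in mind: each encoded character grows to three characters, which counts against the filename length limit, and the resulting names contain %, which itself needs care in Windows batch scripts.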