When working with cross-platform applications, knowing which characters are forbidden in filenames is crucial for robust file handling. Unix-like systems (Linux, macOS) and Windows have different rulesets, making this particularly important for developers creating portable software.
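Windows prohibits these characters in both file and directory names:
< > : " / \ | ? *
It also rejects the control characters with codes 0 through 31. In addition, names cannot end with a period or a space, and the reserved device names CON, PRN, AUX, NUL, COM1-COM9 and LPT1-LPT9 are prohibited regardless of case, even when followed by an extension (NUL.txt is treated as NUL).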
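Because the device-name rule is case-insensitive and also applies when an extension follows, comparing the whole name against the list is not enough. A minimal sketch, using a hypothetical helper `windows_name_problem`, that reports the first Windows-specific rule a single name breaks (it does not check the forbidden characters):

```python
from typing import Optional

# Device names Windows reserves (matched case-insensitively)
RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}

def windows_name_problem(name: str) -> Optional[str]:
    """Return a short reason if Windows would reject `name`, else None."""
    # The device-name rule applies to the part before the first dot, so "nul.txt" counts
    stem = name.split(".", 1)[0]
    if stem.upper() in RESERVED:
        return "reserved device name"
    # Windows does not support a trailing period or space
    if name.endswith((".", " ")):
        return "ends with a period or space"
    return None

for candidate in ("nul.txt", "report.", "notes ", "console.log"):
    print(repr(candidate), "->", windows_name_problem(candidate))
```

Note that "console.log" passes: only an exact stem such as CON is reserved, not names that merely start with it.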
Unix-like systems only forbid two characters:
/ (forward slash) and \0 (null character)
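Python enforces the null-byte rule itself: its file APIs raise `ValueError` for a path containing `\0`, and a `/` is always read as a directory separator rather than part of a name. A small sketch (the helper name `is_valid_unix_name` is hypothetical) that checks one path component and shows the null-byte rejection:

```python
import os
import tempfile

def is_valid_unix_name(name: str) -> bool:
    """True if `name` can be a single path component on a Unix-like system."""
    # "", "." and ".." are not usable as new names; otherwise only "/" and "\0" are forbidden
    return name not in ("", ".", "..") and "/" not in name and "\0" not in name

# Python refuses a null byte before the call reaches the operating system
with tempfile.TemporaryDirectory() as tmp:
    try:
        with open(os.path.join(tmp, "bad\0name.txt"), "w"):
            pass
    except ValueError as exc:
        print("rejected:", exc)
```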
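These characters are technically allowed on Unix-like systems, but they are problematic in shells and scripts and are best avoided:
! @ # $ % ^ & * ( ) [ ] { } ; ' : " , < > ? \ | ~ and the space character
A leading period (.) is also legal, but it hides the file from ls and most file managers by default.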
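When a name containing such characters has to pass through a POSIX shell, quoting it is safer than assuming it contains nothing special. Python's standard `shlex.quote` does this; the sketch below builds (but never runs) an illustrative `rm` command line and uses `shlex.split`, which parses with POSIX shell rules, to confirm each name survives as one argument. `shlex.quote` targets POSIX shells only, not `cmd.exe`.

```python
import shlex

# Names that are legal on Unix but would be misread by an unquoted shell
names = ["my file.txt", "a;b.txt", "it's $HOME.txt", "*.log"]

for name in names:
    quoted = shlex.quote(name)
    # Building, not executing, a command line with the quoted name
    command = f"rm -- {quoted}"
    print(command)
    # Parsing it back with POSIX rules yields the original name as a single word
    assert shlex.split(command) == ["rm", "--", name]
```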
Here's a Python function to validate filenames across both systems:
```python
import re

# Windows forbids < > : " / \ | ? * and the control characters 0x00-0x1F
WINDOWS_ILLEGAL = r'[<>:"/\\|?*\x00-\x1f]'
# Unix-like systems forbid only the forward slash and the null byte
UNIX_ILLEGAL = r'[/\x00]'
# Reserved device names; Windows also rejects them with an extension (NUL.txt)
WINDOWS_RESERVED = {
    'CON', 'PRN', 'AUX', 'NUL',
    'COM1', 'COM2', 'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 'COM8', 'COM9',
    'LPT1', 'LPT2', 'LPT3', 'LPT4', 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9',
}

def is_valid_filename(filename, platform='both'):
    platform = platform.lower()
    # Empty names and the special entries "." and ".." are never valid
    if filename in ('', '.', '..'):
        return False
    if platform == 'unix':
        return re.search(UNIX_ILLEGAL, filename) is None
    # 'windows' and 'both': the Windows set already includes "/" and "\0"
    if re.search(WINDOWS_ILLEGAL, filename):
        return False
    # Reserved names match case-insensitively on the part before the first dot
    if filename.split('.', 1)[0].upper() in WINDOWS_RESERVED:
        return False
    # Check the raw name: calling rstrip() first would hide trailing spaces
    if filename.endswith(('.', ' ')):
        return False
    return True
```
Some characters are legal but still cause trouble:
- % is allowed on both systems, but cmd.exe expands %NAME% as an environment variable
- * is allowed on Unix (it is forbidden on Windows), but shells treat it as a wildcard
- Spaces are allowed on both, but must be quoted or escaped on the command line
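On Unix a file can literally be named with glob metacharacters such as `*`, `?` or `[`, so any code that turns a filename into a match pattern has to escape them. Python's `glob.escape` (available since Python 3.4) wraps each of those characters in brackets; this sketch compares unescaped and escaped matching with `fnmatch.fnmatchcase`:

```python
import fnmatch
import glob

literal_name = "report[1].txt"

# Unescaped, "[1]" is a character class matching the single character "1"
print(fnmatch.fnmatchcase("report1.txt", literal_name))   # True
print(fnmatch.fnmatchcase(literal_name, literal_name))    # False

# Escaped, the pattern matches only the literal name
pattern = glob.escape(literal_name)
print(pattern)                                            # report[[]1].txt
print(fnmatch.fnmatchcase(literal_name, pattern))         # True
```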
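Best practice is to stick to ASCII letters, digits, hyphens, underscores, and periods for maximum compatibility. In Python 3, \w also matches non-ASCII letters unless re.ASCII is set:
```python
import re
safe_name = re.sub(r'[^\w.-]', '_', original_name, flags=re.ASCII)  # original_name: the input name
```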
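A single substitution does not cover every portability rule. A slightly fuller sketch follows; the function name `make_portable_name`, the `_` replacement and the 255-byte default are illustrative assumptions, not a standard. It also avoids Windows device names, trailing periods, empty results and over-long names:

```python
import re

RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}

def make_portable_name(name: str, max_bytes: int = 255) -> str:
    """Hypothetical helper: map `name` to an ASCII name valid on Windows and Unix."""
    # Replace everything except ASCII letters, digits, "_", "." and "-"
    safe = re.sub(r"[^\w.-]", "_", name, flags=re.ASCII)
    # Prefix reserved device names (including forms like "nul.txt")
    if safe.split(".", 1)[0].upper() in RESERVED:
        safe = "_" + safe
    # The result is pure ASCII, so slicing by characters equals slicing by bytes;
    # strip trailing periods afterwards in case truncation exposed one
    safe = safe[:max_bytes].rstrip(".")
    # Fall back to a placeholder if nothing usable is left (e.g. "..")
    return safe or "_"

print(make_portable_name("café menu.pdf"))   # caf__menu.pdf
print(make_portable_name("nul.txt"))         # _nul.txt
```

Spaces never survive the substitution, so only trailing periods need stripping. The sketch assumes `max_bytes` is reasonably large; truncating to a handful of characters could in principle produce a reserved stem.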
Additional considerations:
- Windows APIs limit paths to 260 characters (MAX_PATH) by default; the \\?\ prefix extends this to about 32,767 characters
- NTFS is case-preserving but case-insensitive by default
- APFS and HFS+ (macOS) are case-preserving but case-insensitive by default
- ext4 (Linux) is case-sensitive
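Because ext4 is case-sensitive while NTFS and default macOS volumes are not, a directory that is fine on Linux can contain names that collide when copied elsewhere. A small sketch (the helper name `find_case_collisions` is hypothetical) groups names that differ only by case. `str.casefold()` only approximates each filesystem's own case-folding rules, so treat the result as a warning list rather than a guarantee:

```python
from collections import defaultdict

def find_case_collisions(names):
    """Group names that differ only by case (they collide on case-insensitive volumes)."""
    groups = defaultdict(list)
    for name in names:
        # casefold() is a more aggressive lower() intended for caseless matching
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]

print(find_case_collisions(["README.md", "Readme.md", "main.py", "readme.MD", "Main.c"]))
# [['README.md', 'Readme.md', 'readme.MD']]
```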
The two prohibited character sets can also be written out directly in code (JavaScript here):
```javascript
// Characters Unix-like systems prohibit in a filename
const unixIllegalChars = ['/', '\0'];
// Characters Windows prohibits (control characters \x01-\x1F are also rejected)
const windowsIllegalChars = ['<', '>', ':', '"', '/', '\\', '|', '?', '*', '\0'];
```
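Several of these characters are forbidden for specific reasons:
- Null character (\0): Marks the end of a string in C and in the operating-system APIs built on it, so it can never appear inside a name
- Forward slash (/): Directory separator in Unix (and also accepted as a separator by Windows)
- Backslash (\): Directory separator in Windows
- Colon (:): Drive-letter separator (C:) and alternate data stream marker in NTFS (file.txt:stream)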
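These separator rules can be observed with Python's `posixpath` and `ntpath` modules, which implement each platform's path syntax and can be imported on any operating system. A backslash is an ordinary character under POSIX rules but a separator under Windows rules, and a colon after a single letter is parsed as a drive:

```python
import ntpath
import posixpath

name = r"reports\2024\summary.txt"

# POSIX rules: the backslashes are part of one long filename
print(posixpath.basename(name))              # reports\2024\summary.txt
# Windows rules: the backslashes separate directories
print(ntpath.basename(name))                 # summary.txt

# A colon after a drive letter is a drive specification on Windows only
print(ntpath.splitdrive("C:notes.txt"))      # ('C:', 'notes.txt')
print(posixpath.splitdrive("C:notes.txt"))   # ('', 'C:notes.txt')
```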
Here's a Python function to check filename validity across platforms:
```python
import re

def is_valid_filename(filename, platform='both'):
    # Reject empty and whitespace-only names first
    if not filename.strip():
        return False
    if platform.lower() == 'unix':
        pattern = r'[/\x00]'
    else:
        # 'windows' and 'both': the Windows set (including control
        # characters 0x00-0x1F) already contains every Unix-illegal character
        pattern = r'[<>:"/\\|?*\x00-\x1f]'
    if re.search(pattern, filename):
        return False
    return True
```
Windows additionally reserves these device names, matched case-insensitively and also when followed by an extension (NUL.txt is treated as NUL):
```python
windows_reserved = [
    'CON', 'PRN', 'AUX', 'NUL',
    'COM1', 'COM2', 'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 'COM8', 'COM9',
    'LPT1', 'LPT2', 'LPT3', 'LPT4', 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9'
]
```
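When a name must carry special characters, you can:
- Percent-encode them as in URLs (%20 for a space)
- Replace them with underscores or hyphens
- Keep a mapping table from each special character to a safe substitute

General guidelines for portable names:
- Stick to alphanumerics, underscores, hyphens, and periods
- Keep each name under 255 characters (ext4 allows 255 bytes per name, NTFS 255 UTF-16 code units)
- Avoid spaces; use CamelCase or underscores instead
- Account for case sensitivity: Readme.txt and README.txt are distinct files on ext4 but the same file on NTFS and on default macOS volumes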
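Of the strategies above, URL-style percent-encoding has the advantage of being reversible without a separate table. A sketch using the standard `urllib.parse.quote`/`unquote` pair (the wrapper names `encode_name` and `decode_name` are hypothetical): every character outside ASCII letters, digits and `_ . - ~` is encoded, so the stored name contains none of the Windows-forbidden characters. Caveats: encoding can triple a name's length, the `%` it introduces may be expanded by `cmd.exe`, and it does not by itself fix reserved device names or a trailing period.

```python
from urllib.parse import quote, unquote

def encode_name(name: str) -> str:
    """Percent-encode every character except ASCII letters, digits and _ . - ~."""
    # safe="" makes quote() encode "/" as well; "%" itself becomes "%25"
    return quote(name, safe="")

def decode_name(stored: str) -> str:
    """Invert encode_name(); unquote() decodes the UTF-8 percent escapes."""
    return unquote(stored)

original = 'Q3 report: "final"?.txt'
stored = encode_name(original)
print(stored)  # Q3%20report%3A%20%22final%22%3F.txt
assert decode_name(stored) == original
```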