Cross-Platform Filename Character Restrictions: Allowed and Escaped Characters in Linux, Windows, and Unix Systems



Filename handling varies significantly across operating systems, creating compatibility challenges for developers working in cross-platform environments. Here's a technical breakdown of the core restrictions:

Unix-like systems (including Linux and macOS) have the most permissive filename rules:

  • Allowed: Almost all characters except forward slash (/) and null byte (\0)
  • Special handling needed: Spaces, newlines, tabs, and these special characters: !@#$%^&*()[]{};:'"|<>?~
  • Must escape: When using in shell commands, spaces and special characters require escaping with backslash (\) or quoting

# Example of escaping in bash
mv "file with spaces.txt" destination/
mv file\ with\ spaces.txt destination/
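
When a program has to build a shell command string itself, the same quoting can be applied programmatically. The following is a minimal sketch, assuming a POSIX shell and Python's standard shlex module; the helper name build_mv_command is hypothetical and used only for illustration.

```python
import shlex

def build_mv_command(src: str, dest: str) -> str:
    """Return a POSIX shell command string with both paths safely quoted."""
    # "--" marks the end of options, so a name starting with "-" is not read as a flag.
    return f"mv -- {shlex.quote(src)} {shlex.quote(dest)}"

cmd = build_mv_command("file with spaces.txt", "destination/")
print(cmd)  # mv -- 'file with spaces.txt' destination/

# shlex.split parses the string with POSIX shell word-splitting rules,
# so the awkward name should come back as a single argument.
print(shlex.split(build_mv_command("it's $x.txt", "destination/")))
```

Quoting this way is only needed when a command string is unavoidable; passing arguments as a list to the process API sidesteps the problem.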

Windows has more restrictive filename rules:

  • Disallowed: < > : " / \ | ? * and control characters (ASCII 0-31)
  • Reserved names: CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9 (these stay reserved when followed by an extension, e.g. NUL.txt)
  • Length limit: 260 characters for full path (MAX_PATH)

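// C# example of Windows filename sanitization
// Requires: using System.IO; using System.Text.RegularExpressions;
// Note: Path.GetInvalidFileNameChars() reflects the OS the code runs on,
// so this strips the Windows set only when executed on Windows.
// "input" is the untrusted filename supplied by the caller.
string invalidChars = new string(Path.GetInvalidFileNameChars());
string sanitized = Regex.Replace(input, $"[{Regex.Escape(invalidChars)}]", "");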
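
Because the C# call reports the invalid characters for whatever platform it runs on, a service hosted on Linux would not strip the Windows set. One way around this is to hard-code the Windows list from above. This is a rough sketch in Python; the names WINDOWS_INVALID_CHARS and strip_windows_invalid_chars are made up for this example.

```python
# Characters Windows disallows in filenames, taken from the list above:
# < > : " / \ | ? * plus the control characters 0-31.
WINDOWS_INVALID_CHARS = set('<>:"/\\|?*') | {chr(i) for i in range(32)}

def strip_windows_invalid_chars(name: str) -> str:
    """Remove characters that Windows rejects, regardless of the host OS."""
    return "".join(c for c in name if c not in WINDOWS_INVALID_CHARS)

print(strip_windows_invalid_chars('report: "Q3" <draft>?.txt'))  # report Q3 draft.txt
```

This only handles the character rule; reserved device names and the path length limit need separate checks.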

When developing applications that need to work across platforms:

  • Stick to alphanumerics, dash (-), underscore (_), and period (.)
  • Avoid spaces - use underscores or camelCase instead
  • Implement proper escaping when constructing command strings

For robust cross-platform path handling, build paths with a path library rather than concatenating strings with a hard-coded separator:

# Python example using pathlib for cross-platform paths
from pathlib import Path

safe_name = "data_file_2023"
p = Path("documents") / safe_name
p.parent.mkdir(parents=True, exist_ok=True)  # create "documents" if it is missing
p.write_text("content")

When dealing with user-provided filenames, always:

  1. Validate against blacklists for each target OS
  2. Normalize to a safe character set
  3. Implement proper escaping when passing to shell commands
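
The first two steps can be sketched as a single function. This is one possible approach rather than a definitive implementation: it assumes the conservative character set recommended above, and the names sanitize_filename, WINDOWS_RESERVED, and the "unnamed" fallback are hypothetical.

```python
import re

# Device names Windows reserves, as listed earlier (COM1-COM9, LPT1-LPT9).
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{device}{n}" for device in ("COM", "LPT") for n in range(1, 10)
}

def sanitize_filename(name: str, fallback: str = "unnamed") -> str:
    """Map an untrusted name onto letters, digits, '.', '_' and '-'."""
    # Replace each run of characters outside the portable set with one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    # Drop leading/trailing dots, underscores and dashes: Windows rejects a
    # trailing dot, and a leading dash can be mistaken for a command-line option.
    cleaned = cleaned.strip("._-")
    # Reserved names stay reserved with an extension, so test the part before the first dot.
    if cleaned.split(".", 1)[0].upper() in WINDOWS_RESERVED:
        cleaned = "_" + cleaned
    # An input made only of rejected characters ends up empty.
    return cleaned or fallback

print(sanitize_filename("Bad/File\\Name*?.txt"))  # Bad_File_Name_.txt
print(sanitize_filename("nul.txt"))               # _nul.txt
print(sanitize_filename("../../etc/passwd"))      # etc_passwd
```

Replacing rejected characters instead of deleting them keeps word boundaries visible, at the cost of occasional collisions between different inputs. Shell escaping (step 3) still applies if the result is later placed in a command string.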

Linux and Unix-based systems are generally permissive with filenames, with a few key restrictions:

  • Allowed: Almost all characters except forward slash (/) and null character (\0)
  • Must be escaped: Spaces, tabs, newlines, and special characters like *, ?, !, $ when used in shell commands

# Creating files with special characters in Linux
touch "file with spaces.txt"
touch \$special\$.txt
touch "quoted*character?.txt"

Windows has more restrictive filename rules:

  • Disallowed: < > : " / \ | ? * and control characters (ASCII 0-31)
  • Reserved names: CON, PRN, AUX, NUL, COM1-9, LPT1-9
  • Length limit: 260 characters for full path (MAX_PATH)

// C# example of Windows filename validation
// Requires: using System.IO; using System.Linq; using System.Text.RegularExpressions;
bool IsValidWindowsFilename(string name) {
    var invalidChars = Path.GetInvalidFileNameChars();
    // Reserved device names stay reserved with an extension (e.g. NUL.txt)
    return !name.Any(c => invalidChars.Contains(c))
           && !Regex.IsMatch(name, @"^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?$", RegexOptions.IgnoreCase);
}

When developing applications that need to work across systems:

  • Stick to alphanumerics, dashes (-), underscores (_), and periods (.)
  • Avoid spaces (use underscores, dashes, or camelCase instead)
  • Implement proper escaping for shell commands

Escaping rules differ between shells; the examples below use Bash:

# Bash escaping examples
mv "file with spaces.txt" "new name.txt"  # Quotes for spaces
rm -- -filename-starting-with-dash.txt    # -- for special filenames
find . -name "*.txt" -exec rm {} \;       # Proper find command escaping
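
From application code, the quoting concerns in these commands can often be avoided by not invoking a shell at all. The sketch below mirrors the find example using Python's pathlib in a temporary directory; the function name remove_txt_files and the sample filenames are illustrative assumptions.

```python
import tempfile
from pathlib import Path

def remove_txt_files(root: Path) -> list[str]:
    """Delete every *.txt file under root and return the removed names, sorted."""
    removed = []
    # Collect the matches first so the tree is not modified while it is being walked.
    for path in list(root.rglob("*.txt")):
        if path.is_file():
            path.unlink()  # no shell is involved, so no quoting or "--" is needed
            removed.append(path.name)
    return sorted(removed)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["file with spaces.txt", "-starts-with-dash.txt", "keep.log"]:
        (root / name).write_text("content")
    print(remove_txt_files(root))
    print([p.name for p in root.iterdir()])
```

The names are handed to the operating system as-is, so spaces and a leading dash need no special treatment here.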

Most modern languages provide utilities for safe filename handling:

# Python example

def safe_filename(name):
    keepcharacters = (' ','.','_','-')
    return "".join(c for c in name if c.isalnum() or c in keepcharacters).rstrip()

print(safe_filename("Bad/File\\Name*?.txt"))  # Outputs: BadFileName.txt

Modern systems generally support Unicode filenames, but with considerations:

  • Normalization forms may differ (NFD vs NFC)
  • Some legacy systems may have issues with non-ASCII
  • Maximum byte length limitations may apply
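
The normalization and byte-length points can be illustrated with Python's standard unicodedata module. This is a small sketch; the sample name and the helpers same_filename and utf8_length are hypothetical, and it assumes names are stored as UTF-8.

```python
import unicodedata

composed = "caf\u00e9.txt"     # "é" as a single code point (NFC form)
decomposed = "cafe\u0301.txt"  # "e" followed by a combining acute accent (NFD form)

def same_filename(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def utf8_length(name: str) -> int:
    """Number of bytes the name occupies when encoded as UTF-8."""
    return len(name.encode("utf-8"))

print(composed == decomposed)                # False: different code point sequences
print(same_filename(composed, decomposed))   # True: identical after normalization
print(len(composed), utf8_length(composed))  # 8 9: characters vs. bytes
```

Normalizing to one form before comparing or storing names avoids treating visually identical names as different files, and checking the encoded length rather than the character count guards against byte-based limits.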