Solving rsync --iconv UTF-8 Character Encoding Issues When Syncing Between Linux and macOS



When rsync runs between macOS and Linux, filenames containing accented characters (like é, ü, or ñ) can be deleted and recreated on every incremental sync. The cause is Unicode normalization: macOS filesystems (HFS+ in particular) store names as decomposed UTF-8, NFD (Normalization Form D), while most Linux systems produce precomposed UTF-8, NFC (Normalization Form C). The same visible name therefore becomes two different byte sequences, and rsync treats them as two different files.
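
To make the difference concrete, the following sketch prints the raw bytes of the same visible name in both forms. The helper name hex_bytes is made up for this illustration and is not part of rsync or any other tool.

```shell
# hex_bytes is a hypothetical helper: print the bytes of a string
# as lowercase hex with no separators.
hex_bytes() {
  printf '%s' "$1" | od -An -v -tx1 | tr -d ' \t\n'
}

# Octal escapes are used because POSIX printf does not guarantee \x escapes.
nfc=$(printf 'caf\303\251')    # "café" with precomposed U+00E9 (NFC)
nfd=$(printf 'cafe\314\201')   # "café" as "e" + combining U+0301 (NFD)

hex_bytes "$nfc"; echo         # 636166c3a9
hex_bytes "$nfd"; echo         # 63616665cc81
```

Both strings render identically in a terminal, but a byte-wise comparison (which is what rsync does without --iconv) sees them as unrelated names.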

Here's how to verify your rsync versions on both systems:

# On macOS:
rsync --version | head -n1

# On Linux:
ssh user@linux-server "rsync --version | head -n1"
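
If the check should be scripted, a minimal sketch could parse that first line and test the major version. The function names rsync_major and has_iconv_era_rsync are invented here, and the threshold of 3 reflects the point that releases before 3.0 lack --iconv.

```shell
# rsync_major is a hypothetical helper: extract the major version number
# from a version banner such as "rsync  version 3.2.7  protocol version 31".
rsync_major() {
  printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9]*\)\.[0-9].*/\1/p'
}

# has_iconv_era_rsync succeeds when the banner reports version 3.x or later.
# It does not prove that the binary was actually built with iconv support.
has_iconv_era_rsync() {
  major=$(rsync_major "$1")
  [ -n "$major" ] && [ "$major" -ge 3 ]
}

# Example use (not executed here, since rsync may not be installed):
#   has_iconv_era_rsync "$(rsync --version | head -n1)" || echo "rsync too old"
```

Recent builds also list "iconv" or "no iconv" among the capabilities printed by rsync --version, which is a more direct check than the version number alone.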

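The fundamental limitation stems from how rsync implements character set conversion:

  • The --iconv argument is ordered LOCAL,REMOTE relative to the machine that runs the command, not source,destination
  • macOS's utf-8-mac charset is non-standard: it comes from Apple's libiconv and is generally not known to glibc's iconv on Linux, so it must always name the macOS side
  • Older rsync versions (pre-3.0) have no --iconv option at all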
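
Because the rsync man page documents the argument order as --iconv=LOCAL,REMOTE, the correct spec depends on which machine starts the transfer. The sketch below picks it from the kernel name printed by uname -s. The function name iconv_spec_for is hypothetical, and the code assumes the peer is always the other operating system.

```shell
# iconv_spec_for is a hypothetical helper: given the kernel name of the
# machine that runs rsync, print the matching LOCAL,REMOTE spec.
# Assumption: a Mac always talks to a Linux peer and vice versa.
iconv_spec_for() {
  case "$1" in
    Darwin) printf '%s\n' 'utf-8-mac,UTF-8' ;;  # local side is the Mac
    Linux)  printf '%s\n' 'UTF-8,utf-8-mac' ;;  # remote side is the Mac
    *)      return 1 ;;                          # unknown platform: do not guess
  esac
}

# Example use (not executed here):
#   rsync -av --iconv="$(iconv_spec_for "$(uname -s)")" /source/path/ user@host:/dest/path/
```

The spec stays the same whether files are pushed or pulled; only the machine running the command changes it.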

Option 1: Initiate Sync from Correct Machine

For Linux → macOS transfer:

# Run this ON THE LINUX SERVER:
rsync -av --delete --iconv=UTF-8,utf-8-mac /source/path/ macuser@mac:/dest/path/

Option 2: Use Double Conversion Workaround

For macOS-initiated transfers:

# First pull into a staging directory; --iconv is LOCAL,REMOTE, so the Mac side is named first
rsync -av --delete --iconv=utf-8-mac,utf-8 remote:source/ intermediate_dir/

# Then sync staging to the final destination locally (no conversion needed)
rsync -av --delete intermediate_dir/ final_dest/

Option 3: Force Newer rsync Version

If managing both systems:

# On macOS (using Homebrew):
brew install rsync
# Homebrew's prefix is /usr/local on Intel Macs and /opt/homebrew on Apple Silicon
alias rsync="$(brew --prefix)/bin/rsync"

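To inspect filename encodings:

# On macOS (raw bytes; a decomposed é appears as 65 cc 81):
ls | hexdump -C

# On macOS, the same names converted to NFC for comparison (é appears as c3 a9):
ls | iconv -f utf-8-mac -t utf-8 | hexdump -C

# On Linux:
ls | hexdump -C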

If rsync proves too problematic:

  1. Use tar pipes for filename preservation:
    ssh user@linux "cd /source && tar cf - ." | (cd /dest && tar xvf -)
  2. Consider unison for bidirectional sync:
    unison -auto -batch -unicode=true /local/path ssh://remote//remote/path

For production environments:

  • Standardize on NFC naming via pre-sync scripts
  • Implement filesystem monitoring instead of full rescans
  • Document encoding requirements for cross-platform teams
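
As a starting point for such a pre-sync check, the sketch below flags names that contain combining marks from the U+0300 to U+036F block, which is a strong hint that a name is stored in decomposed form. It is only a heuristic, not a full NFC validator; the function names are invented for this example, and names containing newlines are not handled.

```shell
# has_combining_marks is a hypothetical helper: succeed if the string holds
# a UTF-8 encoded code point in U+0300..U+036F (bytes CC 80..CD AF).
has_combining_marks() {
  printf '%s' "$1" \
    | od -An -v -tx1 \
    | tr -s ' \t\n' ' ' \
    | grep -Eq ' (cc [89ab][0-9a-f]|cd [89a][0-9a-f])'
}

# list_decomposed_names is a hypothetical helper: print every path under
# a directory whose final component contains such a combining mark.
list_decomposed_names() {
  find "$1" -print | while IFS= read -r path; do
    name=${path##*/}
    if has_combining_marks "$name"; then
      printf '%s\n' "$path"
    fi
  done
}

# Example use (not executed here):
#   list_decomposed_names /source/path
```

Working on hex output instead of raw bytes keeps the check independent of the current locale and of grep's handling of non-ASCII patterns.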

When syncing files between Linux (UTF-8 NFC) and macOS (UTF-8 NFD/utf-8-mac), filename encoding differences cause rsync to repeatedly delete and re-transfer files containing special characters (é, ü, ø, etc.). The --iconv option should solve this, but implementation quirks create unexpected behavior.

Linux typically uses UTF-8 Normalization Form C (NFC) while macOS uses NFD. Example:


# Linux (NFC):
"é" = U+00E9 

# macOS (NFD): 
"e" + U+0301

Solution 1: Initiate sync from the source machine


# When syncing FROM Linux TO Mac (run on Linux):
rsync -av --iconv=UTF-8,utf-8-mac /source/ user@mac:/dest/

# When syncing FROM Mac TO Linux (run on Mac):
rsync -av --iconv=utf-8-mac,UTF-8 /source/ user@linux:/dest/

Solution 2: Force newer rsync versions


# On Mac (via Homebrew; the prefix is /usr/local on Intel, /opt/homebrew on Apple Silicon):
brew install rsync
"$(brew --prefix)/bin/rsync" --version | head -n1  # Verify ≥3.1.1

# On Linux (Debian/Ubuntu):
sudo apt install rsync
rsync --version | head -n1  # Verify ≥3.1.0

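The --iconv implementation has two key limitations:

  1. Both rsync binaries must be built with iconv support, and each side must recognize the charset name assigned to it (utf-8-mac is only meaningful on the macOS side)
  2. Older rsync versions (especially macOS's default 2.6.9) predate the --iconv option entirely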

Create wrapper scripts to ensure consistent behavior:


#!/bin/bash
# mac2linux.sh
/usr/local/bin/rsync -avh --iconv=utf-8-mac,UTF-8 --delete "$@"

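#!/bin/bash
# linux2mac.sh (starts rsync on the Linux host, which pushes to the Mac; the Linux host must be able to SSH to the Mac)
ssh user@linux "/usr/bin/rsync -avh --iconv=UTF-8,utf-8-mac --delete /remote/source/ macuser@mac:/local/dest/"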

Create test files with special characters:


# On Linux:
touch "Café_Test_ÆØÅ_Über_文件"

# On Mac:
touch "Renée_Test_ÆØÅ_Über_文件"  

Run the sync in both directions, then run it again: the second run should transfer and delete nothing, and timestamps should remain unchanged.
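
One way to automate that check is to count the deletions and file transfers that a dry run reports. The sketch below reads rsync's --itemize-changes output, where deletions start with "*deleting" and file transfers start with "<f" or ">f". The function name churn_count is made up for this example.

```shell
# churn_count is a hypothetical helper: read `rsync --itemize-changes`
# output on stdin and print how many lines are deletions or file transfers.
# Directory timestamp updates (lines such as ".d..t...... ./") are ignored.
churn_count() {
  grep -Ec '^(\*deleting|[<>]f)' || true   # grep exits 1 on zero matches
}

# Example use on the Mac after an initial sync (not executed here):
#   rsync -an --delete --itemize-changes --iconv=utf-8-mac,UTF-8 \
#     /source/ user@linux:/dest/ | churn_count
```

A result of 0 on the second run indicates that the name conversion is stable; any other number means files are still being deleted or re-sent.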

If rsync issues persist, consider:


# Using a tar pipe (the --transform expression is an identity placeholder for a real rename rule):
ssh user@linux "cd /source && tar cf - --transform='flags=r;s|.*|&|' ." |
  (cd /dest && tar xvf - --numeric-owner)