How to Check and Convert Filesystem Encoding to UTF-8 in Linux (Ext3/Ext4)


Many developers run into problems with special characters in filenames or file contents on Linux systems. The root cause is usually the locale configuration rather than the filesystem. File contents carry their own encoding, while filenames are stored as byte sequences that the system interprets according to the current locale.


# Check current locale settings
locale
# Sample output:
# LANG=en_US.UTF-8
# LC_CTYPE="en_US.UTF-8"
# LC_ALL=

Encoding isn't actually a property of the filesystem itself (ext3, ext4, etc.). These filesystems store filenames as opaque byte sequences, and the system locale determines how those bytes are interpreted and displayed. What matters is the system locale configuration.
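To make the "names are just bytes" point concrete, here is a minimal sketch that encodes the same name two ways and checks each with iconv. The helper name is_valid_utf8 is made up for this illustration, and the sketch assumes iconv and od are installed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the argument is a valid UTF-8 byte sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# "café" encoded two ways: é is the bytes C3 A9 in UTF-8,
# but the single byte E9 in ISO-8859-1.
utf8_name=$(printf 'caf\303\251')
latin1_name=$(printf 'caf\351')

for name in "$utf8_name" "$latin1_name"; do
    bytes=$(printf '%s' "$name" | od -An -tx1)
    if is_valid_utf8 "$name"; then
        echo "valid UTF-8:$bytes"
    else
        echo "not UTF-8:  $bytes"
    fi
done
```

An ext3/ext4 filesystem would accept either byte sequence as a filename. Only the second one shows up garbled in a UTF-8 locale.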


# Check mounted filesystems and their options
mount | grep -i "type ext"
# Example output for an ext3 root filesystem:
# /dev/sdb6 on / type ext3 (rw,relatime,errors=remount-ro)

To make the system use a UTF-8 locale, so that filenames are written and read as UTF-8:


# Step 1: Install locales package if needed
sudo apt-get install locales

# Step 2: Configure system locale
sudo dpkg-reconfigure locales
# Select en_US.UTF-8 or your preferred UTF-8 locale

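# Step 3: Update environment variables for your shell sessions
# (LC_ALL overrides every other LC_* variable, so set it only if you want that)
echo 'export LANG=en_US.UTF-8' >> ~/.bashrc
echo 'export LC_ALL=en_US.UTF-8' >> ~/.bashrc
source ~/.bashrc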

# Verify changes
locale
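
Both variables matter because of the lookup order for the character-type category: LC_ALL is consulted first, then LC_CTYPE, then LANG. The following is a minimal sketch of that precedence. The function effective_ctype is a hypothetical helper that only mirrors the documented order and does not query the C library.

```shell
#!/bin/sh
# Hypothetical helper mirroring the lookup order for the character-type
# category: LC_ALL wins, then LC_CTYPE, then LANG. If none is set, the
# default locale is "POSIX".
# Arguments: the values of LC_ALL, LC_CTYPE and LANG (empty if unset).
effective_ctype() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"
    else
        printf 'POSIX\n'
    fi
}

# Show which setting is in effect for the current shell.
effective_ctype "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}"
```

For the authoritative answer on a live system, `locale charmap` prints the character encoding that the current locale actually resolves to.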

For files whose names were created under a different encoding, you can use the convmv tool:


# Install convmv
sudo apt-get install convmv

# Preview the renames first (without --notest, convmv only prints what it would do)
convmv -f ISO-8859-1 -t UTF-8 /path/to/files/*

# Convert filenames from ISO-8859-1 to UTF-8
convmv -f ISO-8859-1 -t UTF-8 --notest /path/to/files/*

Key points:

  • Ext3/ext4 filesystems have no inherent encoding; filenames are bytes that the system interprets according to the locale
  • UTF-8 is recommended for modern systems that need to support international characters
  • Back up important data before making encoding changes
  • Some legacy applications might require specific locale settings

For system-wide settings that persist across reboots (on Debian/Ubuntu):


# Edit locale configuration
sudo nano /etc/default/locale
# Add these lines:
LANG="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

After making these changes, you may need to restart your session or reboot for all applications to recognize the new encoding settings.


The encoding in effect when a file is created determines which bytes its name is stored as on disk. On modern Linux systems UTF-8 has become the standard encoding, but older installations might still use legacy encodings like ISO-8859-1.

To check which encoding your system currently uses, run:

locale | grep -E "LANG|LC_CTYPE"

Some non-native filesystems (vfat, for example) do accept an encoding-related mount option such as utf8 or iocharset. To see which options are in use, examine the mounted filesystems:

mount | grep -E "/dev/sd[a-z][0-9]"
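
The same check can be scripted. This sketch reads /proc/mounts and reports every mount whose options mention UTF-8. The helper has_utf8_option is a hypothetical name, and it only looks for the utf8 and iocharset=utf8 spellings, which are used by filesystems such as vfat. An ext3/ext4 mount is not expected to appear in the output.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a comma-separated mount option string
# contains "utf8" or "iocharset=utf8" as a whole option.
has_utf8_option() {
    case ",$1," in
        *,utf8,*|*,iocharset=utf8,*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/mounts fields: device, mount point, type, options, dump, pass.
if [ -r /proc/mounts ]; then
    while read -r dev mnt fstype opts rest; do
        if has_utf8_option "$opts"; then
            echo "$mnt ($fstype): $opts"
        fi
    done < /proc/mounts
fi
```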

For ext3/ext4 filesystems (as shown in your fstab), follow these steps:

1. Backup Important Data

sudo rsync -aAXv / --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} /path/to/backup

2. Modify Locale Settings

Edit /etc/default/locale:

LANG="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

3. Reconfigure Locales

sudo dpkg-reconfigure locales

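Do not add a utf8 option to the ext3 partition mounted at / (as per your fstab). The utf8 and iocharset=utf8 mount options belong to filesystems such as vfat, NTFS and ISO 9660. Ext3 and ext4 do not recognize them, and an unknown option on the root filesystem can make the mount fail at boot. Leave the entry as it is:

# /etc/fstab
UUID=50d660f1... / ext3 relatime,errors=remount-ro 0 0

After reboot, verify with:

mount | grep " on / "
locale charmap

The mount options should be unchanged, and locale charmap should print UTF-8.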

For files with non-UTF-8 names, use convmv:

sudo apt-get install convmv
convmv -r -f old_encoding -t UTF-8 --notest /path/to/files
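
Before running a conversion, it helps to know which names actually need it. The following is a rough sketch that lists every path whose file name is not valid UTF-8. The function name list_non_utf8_names is hypothetical, and the sketch assumes find and iconv are installed.

```shell
#!/bin/sh
# Hypothetical helper: print every path under the given directory whose
# file name (last path component) is not a valid UTF-8 byte sequence.
list_non_utf8_names() {
    # Names are passed to the inner shell as arguments, so spaces and
    # other unusual characters in them are preserved.
    find "$1" -mindepth 1 -exec sh -c '
        for path in "$@"; do
            name=${path##*/}
            if ! printf "%s" "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
                printf "%s\n" "$path"
            fi
        done' sh {} +
}

# Usage: list_non_utf8_names /path/to/files
```

If the list is empty, there is nothing for convmv to fix. If it is not, the listed names give a hint about which source encoding to pass to -f.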

Here's a bash script that checks the locale encoding and switches it to UTF-8 if needed (Debian/Ubuntu):

#!/bin/bash
# Check the character encoding of the current locale
CURRENT_ENC=$(locale charmap)
if [ "$CURRENT_ENC" != "UTF-8" ]; then
    echo "Converting system to UTF-8..."
    # Generate the locale (interactive), then make it the system default
    sudo dpkg-reconfigure locales
    sudo update-locale LANG=en_US.UTF-8
    echo "Please log out and back in (or reboot) for changes to take effect"
else
    echo "Locale already uses UTF-8; nothing to change"
fi
# /etc/fstab is deliberately left untouched: ext3/ext4 have no utf8 mount option