How to Fix UTF-8 Character Display Issues in the ‘less’ Command on macOS



Many developers encounter encoding issues when piping UTF-8 output to the less command. While terminal emulators typically handle Unicode characters properly, the default configuration of less might display raw byte sequences instead of proper UTF-8 characters.

First, let's verify your current environment settings:

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"

The key issue here is that LC_ALL is set to "C". LC_ALL overrides LANG and every individual LC_* variable, so LC_CTYPE resolves to "C" even though LANG is en_US.UTF-8, and less treats the input as single bytes rather than UTF-8.
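
The precedence at work is LC_ALL first, then LC_CTYPE, then LANG. The sketch below mirrors that lookup order in plain shell; effective_ctype is a hypothetical helper name used only for illustration, and it approximates the lookup rather than reproducing what less or the C library does internally.

```shell
#!/bin/sh
# effective_ctype (hypothetical helper): print the value that decides
# character handling, using the POSIX precedence
# LC_ALL > LC_CTYPE > LANG.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'   # fallback when none of the three is set
    fi
}

# With the settings from the locale output, LC_ALL wins even though
# LANG names a UTF-8 locale. The subshell keeps the assignments local.
( LC_ALL=C; LC_CTYPE=; LANG=en_US.UTF-8; effective_ctype )   # prints: C
```

Running the same function with LC_ALL unset or empty prints en_US.UTF-8, which is why unsetting or correcting LC_ALL is usually enough.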

1. Temporary Solution: Using LESSCHARSET

You can explicitly tell less to use UTF-8 encoding:

$ echo -e '\xe2\x82\xac' | LESSCHARSET=utf-8 less

To make this setting permanent, add the following line to your shell configuration file:

export LESSCHARSET=utf-8

2. Permanent Solution: Fixing Locale Settings

A more thorough solution is to properly configure your locale settings:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

Add these lines to your ~/.bash_profile or ~/.zshrc (depending on your shell).

3. Alternative: Using less with -r flag

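In some cases the -r option (display raw control characters) might help, because less then passes the bytes through unchanged and leaves the decoding to the terminal. Note that with -r, less cannot always keep track of what is on screen, so long lines may wrap or scroll incorrectly: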

$ echo -e '\xe2\x82\xac' | less -r

After making changes, test with various UTF-8 characters:

$ echo -e '\xe2\x82\xac \xf0\x9f\x98\x80 \xe0\xa4\xb9' | less

This should display: € 😀 ह
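
If the test string itself is in doubt, it can help to confirm which bytes actually reach the pager. The following is a minimal sketch, assuming a POSIX shell with od available; euro_bytes is a hypothetical helper name. It uses printf with octal escapes, because not every shell's built-in echo understands -e.

```shell
#!/bin/sh
# euro_bytes (hypothetical helper): emit the UTF-8 encoding of the euro
# sign (U+20AC). \342\202\254 is the octal form of the bytes e2 82 ac.
euro_bytes() {
    printf '\342\202\254\n'
}

# Dump the bytes in hex: expect e2 82 ac followed by 0a (the newline).
euro_bytes | od -An -tx1
```

If the hex dump shows the expected bytes but less still prints escaped sequences, the input is fine and the problem lies in the pager's character set configuration.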

If you're still having issues, the version of less that ships with macOS may be outdated. Installing a current version with Homebrew might help (Homebrew formulae no longer accept build options such as --with-regex):

brew install less

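Or, on systems without Homebrew, build it from source. macOS does not ship wget, so this uses curl. The --with-regex option only selects the regular expression library and is not required for UTF-8 support:

curl -O https://www.greenwoodsoftware.com/less/less-590.tar.gz
tar xvf less-590.tar.gz
cd less-590
./configure
make
sudo make install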

The symptom is specific to less: in the macOS Terminal, direct output shows UTF-8 characters correctly, but piping through less renders them as escaped byte sequences such as <E2><82><AC>. The cause is visible in the locale output shown earlier, where LC_ALL and most LC_* variables are set to "C" instead of a UTF-8 locale. The fix comes down to one of two options.

Option 1: Force UTF-8 mode in less

echo -e '\xe2\x82\xac' | LESSCHARSET=utf-8 less

Option 2: Set environment variables before using less

export LC_CTYPE=en_US.UTF-8
export LC_ALL=en_US.UTF-8
echo -e '\xe2\x82\xac' | less

Add these lines to your shell configuration file (~/.zshrc for zsh, or ~/.bash_profile for bash, since Terminal on macOS starts login shells):

# Set locale to UTF-8
export LC_CTYPE=en_US.UTF-8
export LC_ALL=en_US.UTF-8

# Default less options for UTF-8
export LESSCHARSET=utf-8
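
A slightly more defensive variant of this configuration only switches the locale when the system actually provides it. This is a sketch under assumptions: has_locale is a hypothetical helper, and the exact match against en_US.UTF-8 relies on the spelling that macOS prints in `locale -a` (other systems may list it as en_US.utf8).

```shell
# has_locale (hypothetical helper): read a locale list on stdin, as
# printed by `locale -a`, and succeed if the named locale is present.
has_locale() {
    # -q: quiet, -i: ignore case, -x: match the whole line, -F: literal string
    grep -qixF "$1"
}

# Only override the locale if en_US.UTF-8 is installed.
if locale -a 2>/dev/null | has_locale 'en_US.UTF-8'; then
    export LC_ALL=en_US.UTF-8
fi

# LESSCHARSET does not depend on the installed locales, so set it always.
export LESSCHARSET=utf-8
```

The guard avoids "cannot change locale" warnings on machines where the locale is missing, while less still decodes UTF-8 because of LESSCHARSET.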

After making changes, verify with:

$ locale
$ echo -e '\xe2\x82\xac' | less

You should now see the euro symbol (€) displayed correctly in both direct output and when piped to less.

If the issue persists, consider:

  1. Updating your terminal emulator
  2. Using most as an alternative pager
  3. Checking font support in your terminal:
     • Test with different UTF-8 characters: 日本語, русский, 中文
     • Check terminal encoding settings (Preferences → Encodings)
     • Try different fonts like Menlo or Monaco