Handling Filenames with Spaces in AWK: Robust Solutions for File Renaming Scripts



When working with shell scripts, filenames containing spaces can be particularly tricky to handle. The issue occurs because the shell performs word splitting on unquoted variables, treating spaces as delimiters between arguments.

Here's what's happening in your script:


#!/bin/bash
for i in $(ls);  # Word splitting occurs here
do
  FILENAME=$(echo $i | awk -F\\. '{print $1}');  # And again here
  echo $FILENAME
done
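
The word splitting described above can be seen in isolation with a small sketch. The helper name count_args and the sample filename are made up for illustration; the helper simply reports how many arguments it received.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints the number of arguments it was given.
count_args() { printf '%s\n' "$#"; }

name='2012-01-04 18_31_03.wav'

unquoted=$(count_args $name)    # unquoted: the space splits the name into two arguments
quoted=$(count_args "$name")    # quoted: the name stays a single argument

echo "unquoted: $unquoted, quoted: $quoted"   # prints: unquoted: 2, quoted: 1
```

The same split happens to every unquoted expansion in the loop above, which is why a single file ends up processed as two separate words.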

There are several approaches to solve this problem:

Method 1: Use find with null-terminated output


#!/bin/bash
while IFS= read -r -d '' file; do
    filename="${file%.*}"
    echo "$filename"
done < <(find . -maxdepth 1 -type f -name '*.wav' -print0)

Method 2: Use shell parameter expansion


#!/bin/bash
for file in *.wav; do
    filename="${file%.*}"
    echo "$filename"
done
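
As a rough illustration of what the expansion in Method 2 does, the sketch below wraps two common forms in helper functions. The names strip_ext and get_ext are hypothetical, chosen only for this example.

```shell
#!/bin/bash
# strip_ext and get_ext are hypothetical helper names used only in this sketch.
strip_ext() { printf '%s\n' "${1%.*}"; }    # remove the shortest suffix matching ".*"
get_ext()   { printf '%s\n' "${1##*.}"; }   # remove the longest prefix matching "*."

f='my recording.2012.wav'
strip_ext "$f"   # prints: my recording.2012
get_ext "$f"     # prints: wav
```

Because % removes the shortest matching suffix, only the final extension is dropped. A name without any dot is returned unchanged by strip_ext, so it is worth checking for a dot first if that case matters.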

Method 3: Properly quote variables


#!/bin/bash
for i in *; do
    FILENAME=$(echo "$i" | awk -F\\. '{print $1}')
    echo "$FILENAME"
done

The ls command is meant for human-readable output, not for scripting. It can produce unexpected results with special characters in filenames. Instead, use shell globbing or find as shown above.

Here's a more robust AWK solution that handles multiple dots in filenames:


#!/bin/bash
for file in *; do
    # Drop the last dot-separated field; leave names without a dot unchanged
    base=$(echo "$file" | awk 'BEGIN{FS=OFS="."} NF>1{NF--} {print}')
    echo "$base"
done
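
Decrementing NF works in gawk and mawk, but not every awk implementation rebuilds the record when NF shrinks. A possible alternative, sketched here under the assumption that only the last extension should go, is to remove it with sub() instead. The function name strip_last_ext is hypothetical.

```shell
#!/bin/bash
# strip_last_ext is a hypothetical helper: it removes the final ".something"
# from a single-line name using awk's sub(), without touching NF.
strip_last_ext() {
    printf '%s\n' "$1" | awk '{ sub(/\.[^.]*$/, ""); print }'
}

strip_last_ext 'archive.tar.gz'    # prints: archive.tar
strip_last_ext 'a b.c d.wav'       # prints: a b.c d
strip_last_ext 'no_extension'      # prints: no_extension
```

Since awk reads its input line by line, this form is fine for spaces but still cannot cope with a newline inside a filename.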

Or skip AWK entirely and use parameter expansion, with bash's nullglob option so the loop does nothing when no file matches:


#!/bin/bash
shopt -s nullglob
for file in *.*; do
    echo "${file%.*}"
done

Here's a complete script for changing extensions while handling spaces:


#!/bin/bash
shopt -s nullglob
for wavfile in *.wav; do
    newname="${wavfile%.wav}.mp3"
    echo "Converting: $wavfile → $newname"
    # ffmpeg -i "$wavfile" "$newname"  # Actual conversion command
done

The word-splitting problem is easiest to see with a concrete example. Take the original loop:


#!/bin/bash
for i in $(ls);
do
  FILENAME=$(echo $i | awk -F\\. '{print $1}');
  echo $FILENAME
done

With a file named 11237_712312955_2012-01-04 18_31_03.wav, this script outputs:

11237_712312955_2012-01-04
18_31_03

The problem stems from multiple layers of interpretation:

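  1. The $(ls) expansion is unquoted, so its output is split on whitespace before the loop sees it
  2. The unquoted $i undergoes word splitting again when it is passed to echo
  3. The result of the command substitution is stored in FILENAME, and the unquoted $FILENAME is split once more when it is echoed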
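
The effect can be reproduced safely in a scratch directory. The sketch below assumes mktemp -d is available; it creates one file with a space in its name and counts how many times each style of loop iterates.

```shell
#!/bin/bash
# Scratch directory holding a single file whose name contains a space.
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/11237_712312955_2012-01-04 18_31_03.wav"

split_count=0
for i in $(ls "$tmpdir"); do       # ls output is word-split: one file, two words
    split_count=$((split_count + 1))
done

glob_count=0
for i in "$tmpdir"/*; do           # the glob yields one word per file
    glob_count=$((glob_count + 1))
done

echo "ls loop: $split_count, glob loop: $glob_count"   # prints: ls loop: 2, glob loop: 1
rm -rf "$tmpdir"
```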

Here are several approaches to handle spaces in filenames correctly:

1. Using find with null-terminated output


find . -maxdepth 1 -type f -print0 | while IFS= read -r -d '' file; do
    filename=$(basename "$file" .wav)
    echo "$filename"
done
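
One caveat with the pipeline form: bash normally runs each part of a pipeline in a subshell, so variables set inside the loop are gone once it finishes. The sketch below, which assumes bash without the lastpipe option and uses made-up filenames in a scratch directory, compares it with the process substitution form.

```shell
#!/bin/bash
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/a b.wav" "$tmpdir/c.wav"

piped=0
find "$tmpdir" -maxdepth 1 -type f -print0 | while IFS= read -r -d '' file; do
    piped=$((piped + 1))      # increments a copy inside the pipeline's subshell
done

direct=0
while IFS= read -r -d '' file; do
    direct=$((direct + 1))    # increments the variable in the current shell
done < <(find "$tmpdir" -maxdepth 1 -type f -print0)

echo "piped=$piped direct=$direct"   # prints: piped=0 direct=2
rm -rf "$tmpdir"
```

If the loop only prints results, the pipeline is fine. If it needs to collect names or counts for later use, the done < <(find ...) form is the safer choice.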

2. Proper quoting with globbing


for file in *; do
    [[ -f "$file" ]] || continue
    filename="${file%.*}"
    echo "$filename"
done

3. AWK-only solution (line-based)


printf '%s\n' * | awk '{
    # Strip a trailing .wav; safe for spaces, but not for newlines in names
    sub(/\.wav$/, "");
    print
}'

Key points to remember:

  • Always quote variables containing filenames: "$file"
  • Use find -print0 and read -d '' for reliable file handling
  • Prefer shell parameter expansion over external commands when possible
  • Avoid parsing ls output in scripts

Here's a more complete script that puts these points together:


#!/bin/bash

# Process all .wav files in current directory
while IFS= read -r -d '' file; do
    # Remove .wav extension
    newname="${file%.wav}"
    
    # Only process if the file has .wav extension
    if [[ "$file" != "$newname" ]]; then
        echo "Original: $file"
        echo "New name: $newname"
        # mv -n "$file" "$newname"  # Uncomment to actually rename
    fi
done < <(find . -maxdepth 1 -type f -name '*.wav' -print0)

This solution handles:

  • Filenames with spaces
  • Filenames with newlines
  • Filenames with other special characters
  • Files without a .wav extension (the -name filter in find skips them)
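
These claims can be checked with a dry run in a scratch directory. The sketch below is only a test harness, not part of the script itself; the filenames are invented for the purpose, and printf %q is used so that the newline in one of them stays visible in the output.

```shell
#!/bin/bash
tmpdir=$(mktemp -d) || exit 1
# Three .wav files (plain, with a space, with a newline) and one non-matching file.
touch "$tmpdir/plain.wav" "$tmpdir/with space.wav" \
      "$tmpdir/with"$'\n'"newline.wav" "$tmpdir/notes.txt"

count=0
while IFS= read -r -d '' file; do
    newname="${file%.wav}"
    count=$((count + 1))
    printf 'would rename: %q -> %q\n' "$file" "$newname"   # dry run only
done < <(find "$tmpdir" -maxdepth 1 -type f -name '*.wav' -print0)

echo "matched $count files"   # prints: matched 3 files
rm -rf "$tmpdir"
```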