How to Remove Empty/Blank Lines (Including Whitespace) in Unix/Linux Files Using sed, awk, and grep


When processing text files in Unix/Linux systems, we often encounter files containing empty lines or lines with only whitespace characters. These can cause issues during data processing, log analysis, or configuration file parsing.

Here's a sample file (file.txt) demonstrating this issue:

Line:Text
1:
2:AAA
3:
4:BBB
5:
6:   CCC
7:  
8:DDD

Using grep

The simplest solution is using grep with the -v (invert match) option:

grep -v '^[[:space:]]*$' file.txt > cleaned_file.txt

Explanation:

  • ^ matches start of line
  • [[:space:]]* matches zero or more whitespace characters
  • $ matches end of line
  • -v inverts the match (shows non-matching lines)
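
Before deleting anything, it can help to see which lines are actually whitespace-only. The sketch below is only an illustration: the temporary file and the variable name sample are assumptions made for the demo. It uses sed's l command, which prints each line unambiguously ($ marks the end of the line, tabs appear as \t), and then applies the grep pattern described above.

```shell
# Build a small sample with printf so the whitespace is explicit:
# an empty line, a spaces-only line, and a tab-only line mixed with text.
sample=$(mktemp)
printf 'AAA\n\n   \nBBB\n\t\n' > "$sample"

# Show every line unambiguously: "$" marks end of line, tabs print as \t.
sed -n 'l' "$sample"

# Keep only the lines that contain at least one non-whitespace character.
grep -v '^[[:space:]]*$' "$sample"
```

In the sed output, the lines that show nothing but spaces or \t before the $ are the ones the grep pattern drops.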

Using sed

A more powerful alternative using sed:

sed '/^[[:space:]]*$/d' file.txt > cleaned_file.txt

For in-place editing (GNU sed):

sed -i '/^[[:space:]]*$/d' file.txt

Using awk

For more complex processing, awk is excellent:

awk 'NF' file.txt > cleaned_file.txt

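An equivalent awk solution that matches the pattern explicitly instead of relying on field splitting (both forms print the surviving lines unchanged, including their leading whitespace):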

awk '!/^[[:space:]]*$/' file.txt > cleaned_file.txt

Removing Empty Lines While Preserving Line Numbers

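If you need each surviving line to keep its original line number, have grep print it with -n (piping the result through nl would instead renumber the remaining lines starting from 1):

grep -n -v '^[[:space:]]*$' file.txt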
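
One way to confirm which original line each surviving line came from is awk's NR variable, which counts input records before any filtering happens. A minimal sketch, using inline sample data invented for the demo:

```shell
# NR is the input line number, so it still reflects the position
# in the original file after blank lines have been skipped.
printf 'AAA\n\nBBB\n   \nCCC\n' | awk 'NF { print NR ":" $0 }'
# 1:AAA
# 3:BBB
# 5:CCC
```

The gaps in the numbering (2 and 4) correspond to the blank lines that were dropped.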

Processing Multiple Files

To process multiple files recursively:

find . -type f -name "*.txt" -exec sed -i '/^[[:space:]]*$/d' {} +
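
Because the command above rewrites files in place, a dry run that only lists the affected files may be worth doing first. The following is a sketch under assumptions: the temporary directory and the file names has_blank.txt and clean.txt are made up for the demo.

```shell
# Set up a throwaway directory with one file that has a blank line
# and one that does not.
workdir=$(mktemp -d)
printf 'AAA\n\nBBB\n' > "$workdir/has_blank.txt"
printf 'AAA\nBBB\n'   > "$workdir/clean.txt"

# Dry run: grep -l prints only the names of files that contain
# at least one empty or whitespace-only line. Nothing is modified.
find "$workdir" -type f -name '*.txt' -exec grep -l '^[[:space:]]*$' {} +
```

Only has_blank.txt is listed, so that is the only file the in-place sed command would change.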

Handling Different File Encodings

For DOS-formatted files (CRLF line endings), convert the line endings first so that the remaining lines do not keep a trailing carriage return:

dos2unix file.txt
sed -i '/^[[:space:]]*$/d' file.txt
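
If dos2unix is not installed, tr can strip the carriage returns instead. The sketch below uses a temporary file (the name crlf is an assumption for the demo). Note that tr -d '\r' removes every carriage return in the input, not only those at line ends.

```shell
# A CRLF sample: two text lines and one "empty" line that holds only \r.
crlf=$(mktemp)
printf 'AAA\r\n\r\nBBB\r\n' > "$crlf"

# The \r-only line already counts as blank, because \r is in [[:space:]];
# this prints the number of lines that would be kept.
grep -c -v '^[[:space:]]*$' "$crlf"

# The kept lines still end in \r, so strip the carriage returns first.
tr -d '\r' < "$crlf" | grep -v '^[[:space:]]*$'
```

The second command prints AAA and BBB with plain Unix line endings.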

For large files (>1GB), forcing the C locale avoids multibyte character handling and can speed up matching:

LC_ALL=C grep -v '^[[:space:]]*$' large_file.txt > cleaned_file.txt

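Or using parallel processing with GNU parallel (-k keeps the output in input order, and the command is quoted so that the pattern reaches grep intact):

parallel -k --pipepart --block 100M -a large_file.txt "grep -v '^[[:space:]]*\$'" > cleaned_file.txt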

When processing text files in Unix/Linux systems, empty lines or lines containing only whitespace characters can often interfere with data processing pipelines. These lines might appear harmless, but they can cause issues with line counting, data parsing, or when feeding the output to other commands.

In Unix/Linux, we typically consider two types of lines as "blank":

  • Completely empty lines (containing just a newline character)
  • Lines containing only whitespace characters (spaces, tabs)

The simplest way to remove empty lines is using grep:

grep -v '^$' file.txt

This removes completely empty lines but doesn't handle whitespace-only lines.
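
To see the limitation concretely, here is a small sketch with a throwaway sample file (the variable name sample is an assumption for the demo). grep -c -v prints how many lines survive each pattern.

```shell
# Four lines: text, an empty line, a spaces-only line, text.
sample=$(mktemp)
printf 'AAA\n\n   \nBBB\n' > "$sample"

# '^$' only matches truly empty lines, so the spaces-only line survives.
grep -c -v '^$' "$sample"              # 3 lines kept

# Allowing optional whitespace between ^ and $ catches both kinds.
grep -c -v '^[[:space:]]*$' "$sample"  # 2 lines kept
```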

To remove both truly empty lines and whitespace-only lines, use:

grep -v '^[[:space:]]*$' file.txt

Here are some other methods to achieve the same result:

Using sed

sed '/^[[:space:]]*$/d' file.txt

Using awk

awk 'NF' file.txt

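The NF variable in awk holds the number of fields in the current line. With the default field separator it is zero for empty lines and for lines containing only spaces or tabs, so the pattern NF is false for those lines and they are not printed. A line that contains only a carriage return typically still counts as one field, so awk 'NF' may keep it.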
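
The NF shortcut depends on the default field separator. As a sketch with made-up colon-separated input: once a custom separator is set with -F, a whitespace-only line counts as one field and is no longer dropped, whereas the explicit pattern still removes it.

```shell
# With -F: a spaces-only line is one (non-empty) field, so NF is 1
# and the line is printed along with the data lines.
printf 'a:b\n   \nc:d\n' | awk -F: 'NF'

# The explicit pattern does not depend on field splitting,
# so the whitespace-only line is removed here.
printf 'a:b\n   \nc:d\n' | awk -F: '!/^[[:space:]]*$/'
```

When the script already sets a field separator, the explicit pattern is the safer choice.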

Using perl

perl -ne 'print if !/^[[:space:]]*$/' file.txt

Let's process our sample file:

cat > file.txt << EOF
Line:Text

AAA

BBB
   CCC

DDD
EOF

Now let's clean it:

grep -v '^[[:space:]]*$' file.txt

The output will be:

Line:Text
AAA
BBB
   CCC
DDD

To modify the file directly (rather than just displaying the cleaned version), use:

With GNU sed

sed -i '/^[[:space:]]*$/d' file.txt

With BSD sed (macOS)

sed -i '' '/^[[:space:]]*$/d' file.txt
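
If a script has to run on both GNU and BSD sed, one common approach is to attach a backup suffix directly to -i, which both variants accept. A minimal sketch; the temporary file and the .bak suffix are choices made for the demo:

```shell
# Create a throwaway file with one blank line.
target=$(mktemp)
printf 'AAA\n\nBBB\n' > "$target"

# -i.bak (no space before the suffix) edits in place and keeps
# the original content as "$target.bak".
sed -i.bak '/^[[:space:]]*$/d' "$target"

cat "$target"       # cleaned content: AAA, BBB
cat "$target.bak"   # untouched original, including the blank line
```

The backup can be deleted once the cleaned file has been checked.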

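For very large files, relative speed depends on the implementation (GNU vs. BSD tools, gawk vs. mawk), the locale, and the input, so measure on your own data. As a rough guide:

  • awk 'NF' is often the fastest
  • grep -v is often slightly slower but very readable
  • sed solutions are often the slowest for this operation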
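
Relative speed is easy to check on your own machine. The sketch below is an assumption-laden illustration rather than a benchmark: it generates 200,000 synthetic lines (every second one empty) into a temporary file named big and relies on the shell's time keyword, as found in bash.

```shell
# Generate test data: "line N" followed by an empty line, 100000 times.
big=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100000; i++) print "line " i "\n" }' > "$big"

# Time each tool on the same input; output is discarded so that
# only the filtering cost is measured. Timings go to stderr.
time grep -v '^[[:space:]]*$' "$big" > /dev/null
time sed '/^[[:space:]]*$/d' "$big" > /dev/null
time awk 'NF' "$big" > /dev/null
```

Results on synthetic data may not carry over to real files, so repeating the measurement on a representative input is worthwhile.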

If you want to know how many blank lines will be removed, count them before cleaning the file in place:

grep -c '^[[:space:]]*$' file.txt
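
The same counting idea can give a fuller summary. A sketch with a throwaway file (the variable names f, blank, total, and kept are assumptions for the demo); grep -c '' counts every line because the empty pattern matches all of them.

```shell
# Five lines, two of them blank (one empty, one spaces-only).
f=$(mktemp)
printf 'AAA\n\nBBB\n   \nCCC\n' > "$f"

blank=$(grep -c '^[[:space:]]*$' "$f")    # lines that would be removed
total=$(grep -c '' "$f")                  # all lines
kept=$(grep -c -v '^[[:space:]]*$' "$f")  # lines that would remain

echo "$blank of $total lines are blank; $kept will remain"
# 2 of 5 lines are blank; 3 will remain
```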

The [[:space:]] character class includes:

  • Regular spaces
  • Tabs (\t)
  • Carriage returns (\r)
  • Form feeds (\f)
  • Vertical tabs (\v)

This makes it more robust than just checking for spaces.
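
As a quick illustration of the difference, the sketch below feeds a tab-only line (inline sample data made up for the demo) to a pattern that allows only literal spaces and to the [[:space:]] version. Each command prints the number of lines kept.

```shell
# '^ *$' allows only literal spaces, so the tab-only line is kept.
printf 'AAA\n\t\nBBB\n' | grep -c -v '^ *$'            # 3

# [[:space:]] also covers tabs, so the tab-only line is removed.
printf 'AAA\n\t\nBBB\n' | grep -c -v '^[[:space:]]*$'  # 2
```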

You can chain this operation with other text processing. For example, to remove blank lines and then sort:

grep -v '^[[:space:]]*$' file.txt | sort

Removing empty or whitespace-only lines is a common task in Unix/Linux text processing. The grep -v '^[[:space:]]*$' solution provides a good balance of readability and functionality, while awk 'NF' is a compact alternative that often performs well on large files.