How to Remove Line Breaks and Whitespace in AWK for Proper URL Encoding in Bash Scripts



When working with curl requests in bash scripts, we often need to pass multi-line text as a URL parameter. AWK prints every matching line followed by a newline, so a multi-line range ends up with embedded line breaks that corrupt the URL. Here's a typical problematic case:

# Original AWK command preserving line breaks
var2=$(awk 'NR>=38 && NR<=39' file.txt)
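
To see why this matters, here is a minimal sketch that uses two made-up sample lines in place of lines 38-39 of file.txt (the `sample` text and variable names are assumptions for illustration). It shows that command substitution drops only the trailing newline, and that replacing spaces does nothing about the newline in the middle:

```shell
# Two sample lines standing in for lines 38-39 of file.txt (hypothetical data).
sample=$(printf 'Hello world\nThis is a test message\n')

# Command substitution strips only trailing newlines; the inner one survives.
line_count=$(printf '%s\n' "$sample" | wc -l | tr -d ' ')
echo "lines held in the variable: $line_count"

# Swapping spaces for plus signs does not touch that inner newline.
plus_only=$(printf '%s' "$sample" | tr ' ' '+')
remaining=$(printf '%s\n' "$plus_only" | wc -l | tr -d ' ')
echo "lines after space replacement: $remaining"
```

Both counts come out as 2, which is why the line break has to be removed explicitly.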

We need to modify the AWK command to:

  1. Remove line breaks completely
  2. Replace spaces with plus signs
  3. Handle special characters properly

#!/bin/bash

file=myfile.txt

# Improved AWK command: join the lines with a space, then turn every space into a plus sign
var1=$(awk 'NR==30{print $2}' "$file")
var2=$(awk 'NR>=38 && NR<=39 {printf "%s ", $0}' "$file" | sed 's/ $//; s/ /+/g')

curl "http://example.com/send_sms?phone=$var1&text=$var2"

For more complex text processing, consider these methods:

# Method 1: Using ORS in AWK (gsub converts the spaces inside each line, ORS joins the lines)
var2=$(awk 'BEGIN {ORS="+"} NR>=38 && NR<=39 {gsub(/ /,"+"); print}' file.txt | sed 's/+$//')

# Method 2: Pure AWK solution (sep is empty before the first line and "+" afterwards)
var2=$(awk 'NR>=38 && NR<=39 {gsub(/ /,"+"); printf "%s%s", sep, $0; sep="+"}' file.txt)

# Method 3: Using xargs (plain text only: xargs collapses repeated spaces and treats quotes and backslashes specially)
var2=$(awk 'NR>=38 && NR<=39' file.txt | xargs | tr ' ' '+')

For URLs containing special characters beyond spaces:

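# URL encode all special characters (paste joins the lines first; jq encodes spaces as %20)
var2=$(awk 'NR>=38 && NR<=39' file.txt | paste -sd' ' - | jq -Rr @uri)

# Or using perl (chomp drops the final newline; spaces become plus signs)
var2=$(awk 'NR>=38 && NR<=39' file.txt | paste -sd' ' - | perl -pe 'chomp; s/([^\w ])/sprintf("%%%02X", ord($1))/ge; s/ /+/g')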
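
If neither jq nor perl is available, the same idea can be sketched with AWK alone. The `urlencode` function below is a hypothetical helper, not part of the original script. It assumes a single line of input (join the lines first) and percent-encodes every byte except the RFC 3986 unreserved characters, so spaces come out as %20 rather than plus signs:

```shell
# urlencode: hypothetical helper (not from the original script).
# Expects one line of text as its first argument.
urlencode() {
  printf '%s' "$1" | LC_ALL=C awk '
    BEGIN {
      # Map every byte value to its %XX form once, up front.
      for (i = 1; i < 256; i++) hex[sprintf("%c", i)] = sprintf("%%%02X", i)
    }
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        # Unreserved characters pass through; everything else is encoded.
        out = out (c ~ /[A-Za-z0-9._~-]/ ? c : hex[c])
      }
      print out
    }'
}

encoded=$(urlencode 'Hello world & more=1')
echo "$encoded"   # prints Hello%20world%20%26%20more%3D1
```

LC_ALL=C makes AWK work byte by byte, which is what percent-encoding expects. Treat non-ASCII input as untested in this sketch.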

In production scripts, validate the input file and let curl handle the encoding:

#!/bin/bash

file="${1:-myfile.txt}"
[ -f "$file" ] || { echo "Error: File not found: $file" >&2; exit 1; }

# Keep the text unencoded here: --data-urlencode does the encoding,
# so a pre-converted "+" would be sent as a literal plus sign (%2B)
var1=$(awk 'NR==30{print $2; exit}' "$file") || exit 1
var2=$(awk 'NR>=38 && NR<=39 {printf "%s%s", sep, $0; sep=" "}' "$file") || exit 1

curl -G \
  --data-urlencode "phone=$var1" \
  --data-urlencode "text=$var2" \
  "http://example.com/send_sms"

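One gap in this kind of error checking: AWK exits with status 0 even when the requested line does not exist, so `|| exit 1` alone will not notice missing data. A minimal sketch of an explicit check follows, using a throwaway sample file and hypothetical variable names (`phone`, `verdict`):

```shell
# Build a short sample file (hypothetical data) that has no line 30.
tmp=$(mktemp)
printf 'only one line\n' > "$tmp"

# awk succeeds even though nothing matched, so "|| exit 1" would not fire.
phone=$(awk 'NR==30 {print $2; exit}' "$tmp")
awk_status=$?

# An explicit emptiness test catches the missing value before curl runs.
if [ -z "$phone" ]; then
  verdict="missing"
  echo "Error: line 30 has no second field" >&2
else
  verdict="ok"
fi
echo "awk exit status: $awk_status, phone field: $verdict"
rm -f "$tmp"
```

In a real script the `missing` branch would typically exit instead of setting a variable.
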
When working with AWK to process text for curl requests, line breaks and whitespace can become major obstacles. The original script extracts the lines and then replaces the spaces:

var2=$(awk 'NR>=38 && NR<=39' $file)
var3=${var2// /+}

While this converts spaces to plus signs, it doesn't handle the newline characters between lines 38 and 39. Let's explore robust solutions.

Here are three effective approaches to eliminate newlines:

Method 1: AWK's printf (No Newline Between Records)

var2=$(awk 'NR>=38 && NR<=39 {printf "%s ", $0}' "$file" | sed 's/ $//')

Method 2: Using tr Command

var2=$(awk 'NR>=38 && NR<=39' "$file" | tr '\n' ' ' | sed 's/ $//')

Method 3: Pure AWK String Concatenation

var2=$(awk 'NR>=38 && NR<=39 {s=s $0 "+"} END {sub(/\+$/, "", s); print s}' "$file")
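
One detail worth checking with the tr approach: tr also converts the final newline into a space, and command substitution strips only newlines, so a trailing space is left behind unless it is removed. A small sketch with made-up sample text (the variable names are assumptions):

```shell
# Two sample lines standing in for lines 38-39 (hypothetical data).
sample=$(printf 'Hello world\nThis is a test message\n')

# tr alone: the last newline becomes a trailing space.
with_tr=$(printf '%s\n' "$sample" | tr '\n' ' ')

# tr plus sed: the trailing space is removed.
trimmed=$(printf '%s\n' "$sample" | tr '\n' ' ' | sed 's/ $//')

# Brackets make the trailing space visible.
printf '[%s]\n' "$with_tr" "$trimmed"
```

Left in place, that trailing space would turn into a stray plus sign at the end of the message once spaces are replaced.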

Here's the improved script with proper whitespace handling:

#!/bin/bash

cd /root/Msgs/TESTNEW/new || exit 1

file=myfile.txt

var1=$(awk '(NR==30){print $2}' "$file")
var2=$(awk 'NR>=38 && NR<=39 {s=s $0 " "} END {sub(/ $/, "", s); print s}' "$file")
var3=${var2// /+}

curl "http:///power_sms/send_sms.php?username=&password=&phoneno=$var1&text=$var3"

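Consider this input file (myfile.txt). The numeric prefixes indicate line numbers; on lines 38 and 39 they are not part of the text that gets extracted: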

...
30: 987654321
...
38: Hello world
39: This is a test message
...

With our solution, the output becomes:

Hello+world+This+is+a+test+message
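
To reproduce this result without the real file, the sketch below generates a throwaway 40-line file and runs the extraction against it. The filler lines and the "Phone:" label on line 30 are assumptions made up for the example. The only requirement is that the number is the second field on that line:

```shell
tmp=$(mktemp)

# Generate 40 lines; only lines 30, 38 and 39 carry meaningful (sample) data.
awk 'BEGIN {
  for (i = 1; i <= 40; i++) {
    if (i == 30)      print "Phone: 987654321"
    else if (i == 38) print "Hello world"
    else if (i == 39) print "This is a test message"
    else              print "filler line " i
  }
}' > "$tmp"

# Second field of line 30, then lines 38-39 joined with plus signs.
phone=$(awk 'NR==30 {print $2; exit}' "$tmp")
text=$(awk 'NR>=38 && NR<=39 {gsub(/ /, "+"); printf "%s%s", sep, $0; sep = "+"}' "$tmp")

echo "$phone"   # prints 987654321
echo "$text"    # prints Hello+world+This+is+a+test+message
rm -f "$tmp"
```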

Always test for:

  • Empty lines in your range
  • Special characters that might need URL encoding
  • Very long messages that might exceed URL length limits
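
The first and last checks can be sketched as follows. The sample text and the `max_len` value of 160 are assumptions chosen for illustration, since the real limit depends on the SMS gateway and the server:

```shell
# Sample range with an empty line in the middle (hypothetical data).
sample=$(printf 'Hello world\n\nThis is a test message\n')

# NF is 0 on empty or blank lines, so using NF as the pattern skips them.
text=$(printf '%s\n' "$sample" | awk 'NF {printf "%s%s", sep, $0; sep = " "}')

# Hypothetical length limit; adjust to whatever the gateway accepts.
max_len=160
if [ "${#text}" -gt "$max_len" ]; then
  status="too long"
else
  status="within limit"
fi

echo "$text"
echo "$status"
```

Without the NF pattern, the empty line would contribute an extra separator and show up as a doubled space (or a doubled plus sign) in the message.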