How to Remove Line Breaks and Whitespace in AWK for Proper URL Encoding in Bash Scripts



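When working with curl requests in bash scripts, we often need to pass multi-line text as URL parameters. AWK prints each matching line followed by a newline, and command substitution strips only the trailing one, so the line break between the lines stays in the variable and ends up unencoded in the URL. Here's a typical problematic case: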

# Original AWK command preserving line breaks
var2=$(awk 'NR>=38 && NR<=39' file.txt)
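
To see the problem concretely, the following sketch builds a throwaway file and counts the lines that survive inside the variable. The sample text and the `make_sample` helper are made up for illustration; they are not part of the original script.

```shell
# Hypothetical helper: write a 39-line file whose lines 38 and 39 hold the message
make_sample() {
  awk 'BEGIN {
    for (i = 1; i <= 37; i++) print "filler " i
    print "Hello world"
    print "This is a test message"
  }' > "$1"
}

sample=$(mktemp)
make_sample "$sample"

var2=$(awk 'NR>=38 && NR<=39' "$sample")

# Command substitution strips only the trailing newline; the one between
# the two lines is still inside the variable
line_count=$(printf '%s\n' "$var2" | wc -l | tr -d ' ')
echo "lines in var2: $line_count"

rm -f "$sample"
```

A count above 1 means the value still contains an embedded line break and is not yet safe to paste into a URL.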

We need to modify the AWK command to:

  1. Remove line breaks completely
  2. Replace spaces with plus signs
  3. Handle special characters properly

A first revision of the script:

#!/bin/bash

file=myfile.txt

# Improved AWK command: join the lines with spaces, then turn spaces into plus signs
var1=$(awk 'NR==30{print $2}' "$file")
var2=$(awk 'NR>=38 && NR<=39' "$file" | tr '\n' ' ' | sed 's/ $//; s/ /+/g')

curl "http://example.com/send_sms?phone=$var1&text=$var2"

For more complex text processing, consider these methods:

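# Method 1: Using ORS in AWK (gsub handles the spaces inside each line)
var2=$(awk 'BEGIN {ORS="+"} NR>=38 && NR<=39 {gsub(/ /,"+"); print}' file.txt | sed 's/+$//')

# Method 2: Pure AWK solution (sep is empty before the first line, "+" afterwards)
var2=$(awk 'NR>=38 && NR<=39 {gsub(/ /,"+"); printf "%s%s", sep, $0; sep="+"}' file.txt)

# Method 3: Using xargs (collapses runs of whitespace; note that xargs also interprets quotes and backslashes)
var2=$(awk 'NR>=38 && NR<=39' file.txt | xargs | tr ' ' '+')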
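
A quick way to gain confidence in variants like these is to run them against the same sample and compare the results. The sketch below uses a generated 39-line file with made-up text, and each variant is written so that it both joins the lines and converts the spaces.

```shell
sample=$(mktemp)
awk 'BEGIN {
  for (i = 1; i <= 37; i++) print "filler " i
  print "Hello world"
  print "This is a test message"
}' > "$sample"

# ORS variant: every record ends in "+", the trailing one is stripped afterwards
m1=$(awk 'BEGIN {ORS="+"} NR>=38 && NR<=39 {gsub(/ /,"+"); print}' "$sample" | sed 's/+$//')

# Pure AWK variant: sep is empty before the first line and "+" afterwards
m2=$(awk 'NR>=38 && NR<=39 {gsub(/ /,"+"); printf "%s%s", sep, $0; sep="+"}' "$sample")

# xargs variant: xargs re-emits all words on one line separated by single spaces
m3=$(awk 'NR>=38 && NR<=39' "$sample" | xargs | tr ' ' '+')

rm -f "$sample"

if [ "$m1" = "$m2" ] && [ "$m2" = "$m3" ]; then
  echo "all variants agree: $m1"
else
  echo "variants differ:" "$m1" "$m2" "$m3" >&2
fi
```

The variants can diverge on real data, for example when the text contains quotes (which xargs interprets) or runs of several spaces (which xargs collapses).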

For URLs containing special characters beyond spaces:

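# URL encode all special characters (join the lines first so the newlines are not encoded as %0A)
var2=$(awk 'NR>=38 && NR<=39' file.txt | tr '\n' ' ' | sed 's/ $//' | jq -sRr @uri)

# Or using perl (-0777 reads the whole input at once; spaces become plus signs)
var2=$(awk 'NR>=38 && NR<=39' file.txt | perl -0777 -pe 's/\n+\z//; s/\n/ /g; s/([^\w ])/sprintf("%%%02X", ord($1))/ge; s/ /+/g')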
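
Where neither jq nor perl is installed, the same idea can be sketched in AWK alone. The `urlencode` function below is a hypothetical helper, not part of any standard tool. It works byte by byte in the C locale, keeps the unreserved characters `A-Z a-z 0-9 . _ ~ -`, turns spaces into plus signs, and percent-encodes everything else. It has only been thought through for ASCII input.

```shell
# Hypothetical helper: percent-encode the string passed as $1
urlencode() {
  printf '%s' "$1" | LC_ALL=C awk '
    BEGIN {
      # Lookup table from a single byte to its numeric value
      for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
    }
    {
      # A second or later record means the input held a newline
      out = (NR > 1) ? out "%0A" : ""
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c ~ /[A-Za-z0-9._~-]/) out = out c
        else if (c == " ")         out = out "+"
        else                       out = out sprintf("%%%02X", ord[c])
      }
    }
    END { print out }'
}

urlencode 'Hello world & more'
```

The call on the last line prints `Hello+world+%26+more`.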

For production use, validate the input file and let curl perform the URL encoding:

#!/bin/bash

file="${1:-myfile.txt}"
[ -f "$file" ] || { echo "Error: File not found: $file" >&2; exit 1; }

# Extract the raw text; curl's --data-urlencode does the encoding,
# so spaces must not be converted to "+" here (they would be sent as %2B)
var1=$(awk 'NR==30{print $2; exit}' "$file") || exit 1
var2=$(awk 'NR>=38 && NR<=39 {printf "%s%s", sep, $0; sep=" "}' "$file") || exit 1

curl -G \
  --data-urlencode "phone=$var1" \
  --data-urlencode "text=$var2" \
  "http://example.com/send_sms"

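One gap worth noting: awk exits with status 0 even when the file is shorter than the requested line numbers, so `|| exit 1` alone does not catch an empty result. A minimal sketch of an explicit check follows. The `require_value` helper and the three-line sample file are assumptions made for illustration.

```shell
# Hypothetical helper: fail when an extracted value is empty
require_value() {
  # $1 = label for the error message, $2 = value to check
  if [ -z "$2" ]; then
    echo "Error: no value extracted for $1" >&2
    return 1
  fi
}

short=$(mktemp)
printf 'only\nthree\nlines\n' > "$short"

# Line 30 does not exist, yet awk still exits with status 0
phone=$(awk 'NR==30 {print $2; exit}' "$short")
awk_status=$?

if require_value "phone" "$phone" 2>/dev/null; then
  check="ok"
else
  check="empty"
fi
echo "awk status: $awk_status, check: $check"

rm -f "$short"
```

In a real script the check would typically be written as `require_value "phone" "$var1" || exit 1`, placed right after the extraction.
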
When working with AWK to process text for curl requests, line breaks and whitespace can become major obstacles. The original script extracts two lines and then replaces the spaces with plus signs:

var2=$(awk 'NR>=38 && NR<=39' "$file")
var3=${var2// /+}

While this converts spaces to plus signs, it doesn't handle the newline characters between lines 38 and 39. Let's explore robust solutions.

Here are three effective approaches to eliminate newlines:

Method 1: AWK printf Instead of print

var2=$(awk 'NR>=38 && NR<=39 {printf "%s ", $0}' "$file" | sed 's/ $//')

Method 2: Using tr Command

var2=$(awk 'NR>=38 && NR<=39' "$file" | tr '\n' ' ' | sed 's/ $//')

Method 3: Pure AWK String Concatenation

var2=$(awk 'NR>=38 && NR<=39 {s=s $0 "+"} END {sub(/\+$/, "", s); print s}' "$file")
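
The line numbers do not have to be hard-coded. As a sketch, the joining step can be wrapped in a small function that takes the range and separator as arguments. The name `join_range` and its parameter order are assumptions made for illustration.

```shell
# Hypothetical helper: join lines START..END of FILE with SEP between them
join_range() {
  # $1 = file, $2 = first line, $3 = last line, $4 = separator (default: one space)
  awk -v first="$2" -v last="$3" -v sep="${4- }" '
    NR >= first && NR <= last { out = out (n++ ? sep : "") $0 }
    NR > last { exit }          # stop reading once the range is done
    END { print out }
  ' "$1"
}

sample=$(mktemp)
printf 'one\ntwo words\nthree\nfour\n' > "$sample"

joined=$(join_range "$sample" 2 3)              # "two words three"
plus=$(join_range "$sample" 2 3 | tr ' ' '+')   # spaces converted afterwards

echo "$joined"
echo "$plus"

rm -f "$sample"
```

Because the separator is added only between lines, no trailing character needs to be stripped afterwards.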

Here's the improved script with proper whitespace handling:

#!/bin/bash

cd /root/Msgs/TESTNEW/new || exit 1

file=myfile.txt

var1=$(awk '(NR==30){print $2}' "$file")
var2=$(awk 'NR>=38 && NR<=39 {s=s $0 " "} END {sub(/ $/, "", s); print s}' "$file")
var3=${var2// /+}

curl "http:///power_sms/send_sms.php?username=&password=&phoneno=$var1&text=$var3"

Consider this input file (myfile.txt):

...
30: 987654321
...
38: Hello world
39: This is a test message
...

With our solution, and reading the `38:` and `39:` prefixes as line-number labels rather than file content, the text parameter becomes:

Hello+world+This+is+a+test+message
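
The example can be reproduced without sending anything by building the query string and printing it. In this sketch the file is generated on the fly, and line 30 is assumed to hold a label followed by the number (`Phone: 987654321`), since the script reads the second field. Only the query string is printed, so no host name has to be assumed.

```shell
sample=$(mktemp)
awk 'BEGIN {
  for (i = 1; i <= 39; i++) {
    if (i == 30)      print "Phone: 987654321"   # assumed layout: label, then number
    else if (i == 38) print "Hello world"
    else if (i == 39) print "This is a test message"
    else              print "filler"
  }
}' > "$sample"

var1=$(awk 'NR==30 {print $2}' "$sample")
var2=$(awk 'NR>=38 && NR<=39 {s = s $0 " "} END {sub(/ $/, "", s); print s}' "$sample")

# Same effect as ${var2// /+}, but also works in plain sh
var3=$(printf '%s' "$var2" | tr ' ' '+')

# Print the query string rather than calling curl
query="phoneno=$var1&text=$var3"
echo "$query"

rm -f "$sample"
```

This prints `phoneno=987654321&text=Hello+world+This+is+a+test+message`.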

Always test for:

  • Empty lines in your range
  • Special characters that might need URL encoding
  • Very long messages that might exceed URL length limits
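
Two of these checks can be sketched directly. The example below uses a three-line sample with a blank line in the middle, so the range is 1 to 3 rather than 38 to 39. The 2000-character limit is a placeholder, not a documented value; the real maximum depends on the server and gateway in use.

```shell
sample=$(mktemp)
printf 'Hello world\n\nThis is a test\n' > "$sample"

# NF is 0 for empty or whitespace-only lines, so "NF" as a condition skips them;
# without it the blank line would leave a doubled "+" in the result
text=$(awk 'NR>=1 && NR<=3 && NF {gsub(/ /,"+"); printf "%s%s", sep, $0; sep="+"}' "$sample")
echo "$text"

# Hypothetical limit (placeholder value)
max_len=2000
if [ "${#text}" -gt "$max_len" ]; then
  echo "Warning: message is ${#text} characters, limit is $max_len" >&2
fi

rm -f "$sample"
```

Special characters are a separate concern: this sketch only converts spaces, so text containing characters such as `&` or `%` still needs full URL encoding.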