Best Practices for Writing Multi-line XML/String Literals in Bash Scripts





When working with bash scripts, we often encounter situations where we need to store complex multi-line strings - particularly XML fragments or JSON data - directly in the script. Traditional approaches using regular strings can become messy:


# Problematic approach:
xml="<?xml version=\"1.0\" encoding='UTF-8'?>\n<painting>\n  <img src=\"madonna.jpg\" alt='Foligno Madonna, by Raphael'/>\n  <caption>This is Raphael's \"Foligno\" Madonna, painted in\n  <date>1511</date>-<date>1512</date>.</caption>\n</painting>"

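This requires excessive escaping and makes the XML unreadable within the script. Worse, bash does not interpret \n inside double quotes. The variable therefore ends up containing literal backslash-n sequences instead of line breaks.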
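A quick check shows the newline problem directly. This minimal sketch compares a double-quoted string with ANSI-C quoting ($'...'). ANSI-C quoting does turn \n into a real newline, but it still forces you to escape every single quote, so it does not fix readability.

```shell
#!/bin/bash
# Inside double quotes, \n stays a backslash followed by "n"
dq="<a>\n</a>"
# ANSI-C quoting ($'...') turns \n into a real newline character
ansi=$'<a>\n</a>'

printf '%s\n' "$dq"     # prints <a>\n</a> on one line
printf '%s\n' "$ansi"   # prints <a> and </a> on separate lines
```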

Bash provides a clean solution through "here documents" using the <<EOF syntax:


# Clean multi-line XML storage (the quoted 'EOF' keeps the body literal)
read -r -d '' xml_content <<'EOF'
<?xml version="1.0" encoding='UTF-8'?>
<painting>
  <img src="madonna.jpg" alt='Foligno Madonna, by Raphael'/>
  <caption>This is Raphael's "Foligno" Madonna, painted in
  <date>1511</date>-<date>1512</date>.</caption>
</painting>
EOF
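This pattern has one caveat. read -d '' reads up to a NUL byte, and a here document never contains one. So read reaches end of input and returns exit status 1, even though the variable was filled. In a script running under set -e, that status aborts execution. Here is a minimal sketch of the usual workaround, appending || true:

```shell
#!/bin/bash
set -euo pipefail

# read returns 1 at end of input; "|| true" keeps set -e from aborting the script
read -r -d '' xml_content <<'EOF' || true
<painting>
  <date>1511</date>
</painting>
EOF

printf '%s\n' "$xml_content"
```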

Key advantages of this approach:

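  • No need to escape quotes; with a quoted delimiter (<<'EOF'), $, backticks, and backslashes are also taken literally
  • Preserves line breaks and indentation (read trims leading and trailing whitespace from the whole string, so use IFS= read if the first line is indented)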
  • Maintains human readability within the script
  • Works with complex nested XML/JSON structures

For more complex scenarios, consider these variations:


# With variable substitution
read -r -d '' email_template <<EOF
From: $sender
To: $recipient
Subject: $subject

$message_body
EOF

# With command substitution
read -r -d '' deployment_config <<EOF
{
  "version": "$(git rev-parse HEAD)",
  "environment": "$ENV",
  "timestamp": "$(date -u)"
}
EOF
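Values substituted into an unquoted here document are inserted verbatim. A value containing & or < therefore produces XML that is no longer well-formed. The sketch below escapes values before substitution. Note that xml_escape is a hypothetical helper written for this example, not a standard command. It covers only XML; JSON, as in the deployment_config example, has its own escaping rules.

```shell
#!/bin/bash
# Hypothetical helper: escape &, <, > and " so a value can be placed in XML
# text or a double-quoted attribute. & is replaced first so the entities
# produced by the later rules are not escaped a second time.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' \
                         -e 's/</\&lt;/g' \
                         -e 's/>/\&gt;/g' \
                         -e 's/"/\&quot;/g'
}

title='Tom & Jerry <"classic">'

# Unquoted EOF: the command substitution runs and its escaped output is inserted
read -r -d '' snippet <<EOF || true
<film title="$(xml_escape "$title")"/>
EOF

printf '%s\n' "$snippet"
```

The sed approach keeps the helper portable across bash versions. It also sidesteps the special meaning that & takes on in ${var//pattern/replacement} in newer bash releases.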

Only a line consisting solely of the delimiter ends the here document. If the content might contain such a line (EOF in our examples), choose a more distinctive delimiter:


# Using a unique delimiter
read -r -d '' sql_query <<'UNIQUE_DELIMITER'
SELECT * FROM users 
WHERE created_at > '2023-01-01'
AND status = 'active'
UNIQUE_DELIMITER

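Note the single quotes around the delimiter name. Quoting any part of the delimiter disables parameter expansion, command substitution, and arithmetic expansion inside the here document, so the body is taken literally. Choosing a distinctive name and quoting it are independent decisions; you can do either or both.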
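The effects of quoting and of a distinctive delimiter name can be checked side by side. The variable names and the END_SNIPPET delimiter here are arbitrary choices for illustration:

```shell
#!/bin/bash
name="world"

# Unquoted delimiter: $name is expanded
read -r -d '' expanded <<EOF || true
Hello, $name
EOF

# Quoted delimiter: the body is taken literally
read -r -d '' literal <<'EOF' || true
Hello, $name
EOF

# A distinctive delimiter lets a bare "EOF" line appear in the body
read -r -d '' with_eof <<'END_SNIPPET' || true
Hello, $name
EOF
END_SNIPPET

printf '%s\n' "$expanded"   # Hello, world
printf '%s\n' "$literal"    # Hello, $name
printf '%s\n' "$with_eof"
```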

For large generated content (10KB+), wrap the here document in a function. The text is then built only when needed, and the generator can be reused:


# Generate the XML on demand inside a function
generate_large_xml() {
  cat <<EOF
<?xml version="1.0"?>
<data>
$(printf '  <record id="%d"/>\n' {1..10000})
</data>
EOF
}

xml_content=$(generate_large_xml)
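Capturing the output with $(...) keeps the whole document in a shell variable and strips its trailing newline. If the XML is headed for a file anyway, it can be simpler to redirect the function's output. Here is a minimal sketch, assuming a temporary file from mktemp stands in for the real destination:

```shell
#!/bin/bash
generate_large_xml() {
  cat <<EOF
<?xml version="1.0"?>
<data>
$(printf '  <record id="%d"/>\n' {1..10000})
</data>
EOF
}

# Temporary file used for illustration; removed when the script exits
out_file=$(mktemp)
trap 'rm -f "$out_file"' EXIT

# Write directly to the file instead of capturing the result in a variable
generate_large_xml > "$out_file"
grep -c '<record ' "$out_file"   # 10000
```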

Putting it together, a complete script looks like the one below. Because the delimiter is quoted, nothing in the body needs escaping. The only restriction is that no line of the body may consist solely of the delimiter.

#!/bin/bash

read -r -d '' xml_content <<'EOF'
<?xml version="1.0" encoding='UTF-8'?>
<painting>
  <img src="madonna.jpg" alt='Foligno Madonna, by Raphael'/>
  <caption>This is Raphael's "Foligno" Madonna, painted in
  <date>1511</date>-<date>1512</date>.</caption>
</painting>
EOF

echo "$xml_content"

The heredoc approach offers several useful variations:

# Version with variable substitution
cat <<EOF
<config>
  <user>$USER</user>
  <home>$HOME</home>
</config>
EOF

# Version without variable substitution (literal): $USER is printed as-is
cat <<'EOF'
<config>
  <user>$USER</user>
</config>
EOF
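The cat form can also be captured into a variable with command substitution. Unlike read -d '', the exit status comes from cat (0 on success), so this form is safe under set -e. Like any $(...), it strips trailing newlines. A minimal sketch:

```shell
#!/bin/bash
set -euo pipefail

# The quoted 'EOF' keeps $USER literal; the assignment's status is cat's (0)
xml_content=$(cat <<'EOF'
<config>
  <user>$USER</user>
</config>
EOF
)

printf '%s\n' "$xml_content"
```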

If the content itself contains a line reading EOF, use a more distinctive delimiter. The closing delimiter must appear alone on its own line, with no leading or trailing whitespace:

read -r -d '' sql_query <<'SQL_END'
SELECT * FROM users
WHERE name LIKE 'John%'
AND created_at > '2023-01-01'
SQL_END

When dealing with very large XML fragments, consider these alternatives:

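  • Store them in separate files and read them with $(< file.xml), which avoids running cat as in $(cat file.xml)
  • Use base64 encoding for binary payloads or data containing NUL bytes, which bash variables cannot hold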
  • Consider XML-aware tools like xmlstarlet for complex manipulations
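For the separate-file option, bash's $(< file) reads a file's contents without running the external cat program. Here is a minimal sketch, where a temporary file stands in for a real file.xml:

```shell
#!/bin/bash
# Temporary file standing in for file.xml; removed when the script exits
xml_file=$(mktemp)
trap 'rm -f "$xml_file"' EXIT

cat > "$xml_file" <<'EOF'
<painting>
  <date>1511</date>
</painting>
EOF

# $(< file) reads the file directly; trailing newlines are stripped as with $(cat file)
xml_content=$(< "$xml_file")
printf '%s\n' "$xml_content"
```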