When working in Unix shell environments, extracting substrings is one of the most fundamental operations. The challenge lies in choosing between powerful regex capabilities and simple, focused tools. Let's explore the spectrum of solutions from simplest to more complex.
For fixed-width or delimiter-separated fields, nothing beats cut for simplicity:
# Extract characters 2-5
echo "abcdef" | cut -c2-5
# Extract second field delimited by colon
echo "john:doe:30" | cut -d: -f2
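cut also accepts lists of fields and open-ended ranges. The sketch below uses hypothetical colon-delimited records shaped like /etc/passwd entries; the data and the variable name records are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical colon-delimited records shaped like /etc/passwd entries.
records='alice:x:1001:1001::/home/alice:/bin/bash
bob:x:1002:1002::/home/bob:/bin/sh'

# Fields 1 and 7 (username and login shell); cut rejoins them with the input delimiter.
printf '%s\n' "$records" | cut -d: -f1,7
# alice:/bin/bash
# bob:/bin/sh

# An open-ended range: -f6- means "field 6 through the last field".
printf '%s\n' "$records" | cut -d: -f6-
# /home/alice:/bin/bash
# /home/bob:/bin/sh

# The same range syntax works for characters: -c3- keeps character 3 onward.
echo "abcdef" | cut -c3-
# cdef
```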
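The expr command matches a basic regular expression against the beginning of a string and, when the pattern contains a \( \) group, prints the captured text:
# Extract everything before the first colon (prints "sample")
expr "sample:text" : '\([^:]*\)'
# Match the first sequence of digits (prints "2")
expr "version2.3.4" : '[^0-9]*\([0-9]*\)'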
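Two details of the expr : operator are easy to miss: without a \( \) group it prints the number of characters matched rather than the text, and it exits with status 1 when the result is empty or zero, which lets it drive a conditional. A sketch with illustrative inputs; the helper name first_number is hypothetical.

```shell
#!/usr/bin/env bash
# No capture group: expr prints the length of the match (patterns are anchored at the start).
expr "sample:text" : '[^:]*'          # 6
# With a capture group: expr prints the captured text.
expr "sample:text" : '\([^:]*\)'      # sample

# Hypothetical helper: first run of digits in $1.
# Exit status 1 means "null or zero result"; note that a captured "0" also yields status 1.
first_number() {
  expr "$1" : '[^0-9]*\([0-9][0-9]*\)'
}

if v=$(first_number "release"); then
  echo "found $v"
else
  echo "no digits"                    # this branch runs: "release" has no digits
fi
first_number "build42x"               # 42
```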
For Bash users, built-in parameter expansion offers efficient substring extraction:
str="hello_world"
# Substring from position 2 (0-based), 4 characters long
echo ${str:2:4}
# Remove shortest prefix matching pattern
echo ${str#*_}
# Remove shortest suffix matching pattern (%% removes the longest)
echo ${str%_*}
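Doubling the operator switches from the shortest to the longest match, which matters whenever the pattern can match in more than one place. A sketch with an illustrative filename containing two dots:

```shell
#!/usr/bin/env bash
path="archive.tar.gz"
echo "${path%.*}"    # archive.tar  (shortest suffix ".gz" removed)
echo "${path%%.*}"   # archive      (longest suffix ".tar.gz" removed)
echo "${path#*.}"    # tar.gz       (shortest prefix "archive." removed)
echo "${path##*.}"   # gz           (longest prefix "archive.tar." removed)
```

These four prefix and suffix operators are specified by POSIX, so they also work in dash and other sh implementations; the ${str:offset:length} form is an extension found in Bash, ksh and zsh.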
When you genuinely need a regular expression but want to keep it simple, grep -o prints only the matching text:
# Extract email addresses from text (each match is printed on its own line)
echo "contact me@example.com soon" | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+'
# Capture version number
echo "version 1.23.45 released" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+'
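Because grep -o prints every match on its own line, several matches in one input line become several output lines; piping through head -n 1 keeps only the first. grep -o is not part of POSIX, but GNU and BSD grep both support it. The input string and the array name ids are illustrative.

```shell
#!/usr/bin/env bash
text="ids: 12, 345, 6789"

# All matches, one per line.
echo "$text" | grep -oE '[0-9]+'
# 12
# 345
# 6789

# Only the first match.
echo "$text" | grep -oE '[0-9]+' | head -n 1
# 12

# Collect the matches into a Bash array (mapfile needs Bash 4 or later).
mapfile -t ids < <(echo "$text" | grep -oE '[0-9]+')
echo "${#ids[@]} ids, third is ${ids[2]}"
# 3 ids, third is 6789
```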
For processing large files or in performance-critical scripts:
- cut is fastest for fixed-format data
- Bash built-ins avoid process-creation overhead
- grep -o is often faster than sed for simple extractions
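When a regex runs inside a loop, Bash can match an extended regular expression in-process with [[ =~ ]], storing the whole match and each capture group in the BASH_REMATCH array, so no grep or sed process is started. A minimal sketch; the function name extract_version is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first MAJOR.MINOR.PATCH found in $1.
extract_version() {
  # Keeping the regex in a variable avoids quoting pitfalls inside [[ ]].
  local re='([0-9]+)\.([0-9]+)\.([0-9]+)'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[0]}"   # whole match; [1]..[3] hold the parts
    return 0
  fi
  return 1                               # no version number in the input
}

extract_version "version 1.23.45 released"   # 1.23.45
extract_version "no numbers here" || echo "not found"
```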
Common substring extraction scenarios:
# Get filename without extension
file="document.txt"
echo ${file%.*}
# Extract domain from URL
url="https://www.example.com/path"
domain=$(expr "$url" : 'https\{0,1\}://\([^/]*\)')
echo "$domain"
# Get process IDs only (the [s] keeps grep from matching its own command line)
ps aux | grep '[s]shd' | awk '{print $2}'
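The URL and filename cases can also be handled with parameter expansion alone, which avoids starting expr at all. A sketch assuming simple scheme://host[:port]/path URLs without user credentials; the function names url_host and file_parts are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: host part of a URL such as https://www.example.com/path
url_host() {
  local rest=${1#*://}        # drop the scheme and "://"
  rest=${rest%%/*}            # drop the path (everything from the first "/")
  printf '%s\n' "${rest%%:*}" # drop a ":port" suffix, if any
}

# Hypothetical helper: directory, base name, stem and extension of a path.
file_parts() {
  local path=$1
  local name=${path##*/}      # text after the last "/"
  printf 'dir=%s name=%s stem=%s ext=%s\n' \
    "${path%/*}" "$name" "${name%.*}" "${name##*.}"
}

url_host "https://www.example.com/path"   # www.example.com
url_host "http://example.org:8080/x"      # example.org
file_parts "/tmp/report.final.txt"
# dir=/tmp name=report.final.txt stem=report.final ext=txt
```

Edge cases are left unhandled on purpose: for a name with no dot, stem and ext both equal the whole name, and for a path with no "/", dir equals the path itself.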
When working in Unix shells, we often need quick solutions for substring extraction without diving into complex regular expressions or lengthy commands. Here are four fundamental approaches, ordered by simplicity:
For fixed-width or delimiter-based extraction, cut is the most straightforward tool:
# Extract characters 2-5
echo "abcdef" | cut -c2-5
# Output: bcde
# Extract second field delimited by commas
echo "apple,banana,cherry" | cut -d',' -f2
# Output: banana
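Two cut behaviors catch people out: a line with no delimiter at all is printed whole unless -s is given, and every single delimiter character counts as a field boundary, so runs of spaces produce empty fields (awk's default field splitting collapses them instead). Illustrative input:

```shell
#!/usr/bin/env bash
# A line without the delimiter passes through unchanged...
printf 'apple,banana\nno-commas-here\n' | cut -d',' -f2
# banana
# no-commas-here

# ...unless -s suppresses lines that contain no delimiter.
printf 'apple,banana\nno-commas-here\n' | cut -s -d',' -f2
# banana

# Two spaces create an empty field 2, so cut prints an empty line here.
echo "a  b" | cut -d' ' -f2
# awk treats any run of blanks as one separator.
echo "a  b" | awk '{print $2}'
# b
```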
For Bash users, parameter expansion provides substring capabilities without external commands:
str="programming"
echo ${str:3:5} # From index 3, length 5
# Output: gramm
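The offset and length in ${str:offset:length} are arithmetic expressions, so they can be computed from the string length ${#str}. A negative offset counts from the end, but it needs a space (or parentheses) so Bash does not read it as the :- default-value operator. Same illustrative string as above:

```shell
#!/usr/bin/env bash
str="programming"
echo "${#str}"               # 11        (length)
echo "${str:3}"              # gramming  (from index 3 to the end)
echo "${str: -4}"            # ming      (last 4 characters; note the space)
echo "${str:0:${#str}-3}"    # programm  (drop the last 3 characters)
```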
When you need slightly more sophisticated extraction, awk offers a good balance:
# Extract text between parentheses
echo "test (extract this) string" | awk -F'[()]' '{print $2}'
# Output: extract this
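awk can also do regex extraction on its own: the POSIX match() function sets RSTART and RLENGTH, which substr() then uses to cut out the matched text. Note that awk's substr() counts from 1, unlike Bash's 0-based ${str:offset:length}. Illustrative inputs:

```shell
#!/usr/bin/env bash
# Pull the first date-like token out of a line.
echo "build 2024-05-17 ok" |
  awk 'match($0, /[0-9]+-[0-9]+-[0-9]+/) { print substr($0, RSTART, RLENGTH) }'
# 2024-05-17

# 1-based substring: 5 characters starting at position 4.
echo "programming" | awk '{ print substr($0, 4, 5) }'
# gramm
```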
For genuine regular expression needs, such as capture groups, sed is the last of the simple options:
# Extract version number from string
echo "version-1.2.3-release" | sed -E 's/.*-([0-9.]+)-.*/\1/'
# Output: 1.2.3
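One caveat with the s/// approach: sed prints lines where the substitution fails unchanged, so non-matching input leaks into the output. Adding -n and the p flag prints only lines where a substitution actually happened. Illustrative input:

```shell
#!/usr/bin/env bash
# Default behavior: the non-matching second line is echoed as-is.
printf '%s\n' "version-1.2.3-release" "no version here" |
  sed -E 's/.*-([0-9.]+)-.*/\1/'
# 1.2.3
# no version here

# With -n and /p, only successful substitutions are printed.
printf '%s\n' "version-1.2.3-release" "no version here" |
  sed -nE 's/.*-([0-9.]+)-.*/\1/p'
# 1.2.3
```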
For scripts processing large files, the choice matters:
- cut is fastest for fixed-width data
- Bash built-ins have no process-creation overhead
- awk and sed have more startup overhead
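To see the startup cost for yourself, a rough benchmark can compare forking cut once per line, one awk over the whole input, and a pure read loop. This is a sketch on synthetic data (300 hypothetical item,N lines); the function names are illustrative, and absolute timings depend on the machine, so none are shown.

```shell
#!/usr/bin/env bash
# Synthetic input: "item1,1" ... "item300,300".
input=$(for ((i = 1; i <= 300; i++)); do printf 'item%d,%d\n' "$i" "$i"; done)

per_line_cut() {   # starts a cut process (plus a pipeline) for every line
  local line
  while IFS= read -r line; do
    printf '%s\n' "$line" | cut -d, -f2
  done <<< "$input"
}

single_awk() {     # one awk process for the whole input
  awk -F, '{ print $2 }' <<< "$input"
}

builtin_only() {   # Bash built-ins only: no external processes
  local name value
  while IFS=, read -r name value; do
    printf '%s\n' "$value"
  done <<< "$input"
}

# time reports on stderr; the extracted values are discarded.
time per_line_cut > /dev/null
time single_awk > /dev/null
time builtin_only > /dev/null
```

Expect the per-line cut loop to be far slower than the other two. Which of the remaining two wins depends on input size: a read loop is slow per line, while awk pays its startup cost only once.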