NeuroAgent

Bash String Splitting: The Ultimate Guide

Learn multiple methods to split strings on delimiters in Bash including IFS, tr command, parameter expansion, and arrays. Complete guide with examples and best practices for efficient string manipulation.

Question

How do I split a string on a delimiter in Bash?

I have this string stored in a variable:

IN="bla@some.com;john@home.com"

Now I would like to split the strings by ; delimiter so that I have:

ADDR1="bla@some.com"
ADDR2="john@home.com"

I don’t necessarily need the ADDR1 and ADDR2 variables. If they are elements of an array that’s even better.

After suggestions from the answers below, I ended up with the following which is what I was after:

#!/usr/bin/env bash

IN="bla@some.com;john@home.com"

mails=$(echo $IN | tr ";" "\n")

for addr in $mails
do
    echo "> [$addr]"
done

Output:

> [bla@some.com]
> [john@home.com]

There was a solution involving setting Internal Field Separator (IFS) to ;. I am not sure what happened with that answer, how do you reset IFS back to default?

RE: IFS solution, I tried this and it works, I keep the old IFS and then restore it:

IN="bla@some.com;john@home.com"

OIFS=$IFS
IFS=';'
mails2=$IN
for x in $mails2
do
    echo "> [$x]"
done

IFS=$OIFS

BTW, when I tried

mails2=($IN)

I only got the first string when printing it in loop, without brackets around $IN it works.

NeuroAgent

There are several effective ways to split a string on a delimiter in Bash, each with different advantages and use cases. The most common approaches include using IFS (Internal Field Separator), tr command, parameter expansion, and arrays.

Splitting Strings Using IFS

The IFS approach is a fundamental method in Bash for splitting strings based on field separators. As you discovered, you need to save and restore the original IFS value to avoid affecting other parts of your script.

bash
IN="bla@some.com;john@home.com"

# Save original IFS
OIFS=$IFS
# Set IFS to the delimiter
IFS=';'

# Split into array
mails=($IN)

# Restore original IFS
IFS=$OIFS

# Now access individual elements
echo "First email: ${mails[0]}"    # Outputs: bla@some.com
echo "Second email: ${mails[1]}"   # Outputs: john@home.com

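The key insight here is that when you use array=($variable), the unquoted expansion undergoes word splitting on the current IFS characters, so each substring between delimiters becomes one array element. Because the expansion is unquoted, it is also subject to pathname expansion: a field such as * would be replaced by matching filenames unless globbing is disabled with set -f.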
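The question mentions that looping over $mails2 after mails2=($IN) printed only the first string. That is expected: referencing an array without an index expands to element 0 only. A minimal sketch, reusing the variable names from the question (first_only is an illustrative name):

```shell
IN="bla@some.com;john@home.com"

OIFS=$IFS
IFS=';'
mails2=($IN)      # unquoted expansion is split on ';'
IFS=$OIFS

# $mails2 without an index is the same as ${mails2[0]}
first_only=$mails2
echo "first only: [$first_only]"

# "${mails2[@]}" expands to every element, one word per element
for x in "${mails2[@]}"; do
    echo "> [$x]"
done
```

Quoting "${mails2[@]}" also keeps elements that contain spaces intact.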

Important: Restore the original IFS after changing it, because IFS controls the word splitting of every later unquoted expansion, the way read splits its input, and the character used to join "$*".
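On the question of resetting IFS: its default value is space, tab, newline, and it can be assigned back explicitly. Another way to avoid the save-and-restore step is to confine the change to a function with local. The sketch below assumes a hypothetical helper name (split_on) and a hypothetical global result array (SPLIT_RESULT); neither comes from the original question.

```shell
# Hypothetical helper: split $1 on the single character $2 into the
# global array SPLIT_RESULT, leaving the caller's IFS untouched.
split_on() {
    local IFS=$2          # the change is confined to this function
    set -f                # turn off globbing so '*' in the input stays literal
    SPLIT_RESULT=($1)     # unquoted on purpose: word splitting does the work
    set +f                # note: re-enables globbing even if the caller had it off
}

split_on "bla@some.com;john@home.com" ";"
printf '%s\n' "${SPLIT_RESULT[@]}"

# If IFS was changed globally, its default value (space, tab, newline)
# can be restored explicitly:
IFS=$' \t\n'
```

Running unset IFS instead also makes word splitting behave as if IFS held its default value.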

Using the tr Command

Your tr approach is another excellent method that’s particularly useful when you want to process the results immediately:

bash
IN="bla@some.com;john@home.com"

# Replace delimiter with newlines and process
echo "$IN" | tr ";" "\n" | while read -r email; do
    echo "> [$email]"
done

This method has the advantage of not modifying the shell’s IFS setting, making it safer for use in complex scripts. The tr command translates each semicolon to a newline, effectively creating separate lines that can be processed individually.
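One caveat with piping into while: by default Bash runs each part of a pipeline in a subshell, so variables assigned inside the loop are gone when it finishes. If the results are needed afterwards, one option (assuming Bash 4+ for mapfile) is to read tr's output through process substitution:

```shell
IN="bla@some.com;john@home.com"

# tr turns each ';' into a newline; mapfile -t stores one line per element
# and strips the trailing newlines. Process substitution keeps mapfile in
# the current shell, so the array is still set afterwards.
mapfile -t mails < <(printf '%s\n' "$IN" | tr ';' '\n')

echo "Count: ${#mails[@]}"
for addr in "${mails[@]}"; do
    echo "> [$addr]"
done
```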

The difference between echo $IN and echo "$IN" is important here:

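  • echo $IN undergoes word splitting and pathname expansion: leading and trailing whitespace is dropped, runs of spaces collapse to one, and characters such as * may expand to filenames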
  • echo "$IN" preserves the exact string content
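The difference is easy to see with a value that contains extra spaces. This is a small illustration with a made-up input string, assuming IFS still has its default value:

```shell
IN="  first   entry;second entry  "

unquoted=$(echo $IN)      # word splitting collapses the runs of spaces
quoted=$(echo "$IN")      # the value is passed through unchanged

echo "unquoted: [$unquoted]"
echo "quoted:   [$quoted]"
```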

Parameter Expansion Methods

Bash can also split strings using builtins alone, with no external commands. The most common form relies on the read builtin and a here-string rather than on parameter expansion proper:

bash
IN="bla@some.com;john@home.com"

# Using read command with IFS
IFS=';' read -ra ADDR <<< "$IN"
echo "First: ${ADDR[0]}"    # bla@some.com
echo "Second: ${ADDR[1]}"   # john@home.com

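This method uses the <<< here-string operator and the -a flag with read to populate an array, while -r stops read from treating backslashes as escape characters. The IFS assignment applies only to this one read command, so there’s no need to restore it afterward. Note that read stops at the first newline, so this form handles single-line input only.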
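For comparison, splitting with parameter expansion alone can be sketched as a loop that peels off one field at a time. The variable names rest and parts are illustrative, and the sketch drops a trailing empty field if the input ends with the delimiter:

```shell
IN="bla@some.com;john@home.com;alice@work.com"

rest=$IN
parts=()
while [ -n "$rest" ]; do
    parts+=("${rest%%;*}")        # text before the first ';'
    if [ "$rest" = "${rest#*;}" ]; then
        rest=""                   # no ';' left: that was the last field
    else
        rest=${rest#*;}           # drop the first field and its ';'
    fi
done

printf '%s\n' "${parts[@]}"
```

Here ${rest%%;*} removes the longest suffix starting at a semicolon, and ${rest#*;} removes the shortest prefix ending in one.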

Array-Based Approaches

Bash arrays provide the most flexible way to handle split strings:

bash
IN="bla@some.com;john@home.com"

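# Method 1: readarray with a custom delimiter (-d requires Bash 4.4+).
# readarray splits on newlines by default and ignores IFS, so the
# delimiter must be given with -d. printf avoids the trailing newline
# that a here-string would append to the last element.
readarray -t -d ';' ADDRS < <(printf '%s' "$IN")

# Method 2: mapfile is a synonym for readarray
mapfile -t -d ';' ADDRS < <(printf '%s' "$IN")

# Method 3: Your original approach (IFS must be ';' while it runs)
OIFS=$IFS; IFS=';'
mails=($IN)
IFS=$OIFS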

# Accessing array elements
for ((i=0; i<${#ADDRS[@]}; i++)); do
    echo "Element $i: ${ADDRS[i]}"
done

The ${#ADDRS[@]} syntax gives you the length of the array, and you can iterate through all elements easily.

Complete Examples and Best Practices

Here’s a comprehensive example showing different approaches:

bash
#!/usr/bin/env bash

IN="bla@some.com;john@home.com;alice@work.com"

echo "=== Method 1: IFS with array (recommended) ==="
IFS=';' read -ra emails <<< "$IN"
for email in "${emails[@]}"; do
    echo "> [$email]"
done

echo -e "\n=== Method 2: Your tr approach ==="
echo "$IN" | tr ";" "\n" | while read -r email; do
    echo "> [$email]"
done

echo -e "\n=== Method 3: read -d in a while loop ==="
# read -d ';' consumes one field per iteration; printf adds a final ';'
# so that the last field is terminated too
while IFS= read -r -d ';' email; do
    echo "> [$email]"
done < <(printf '%s;' "$IN")

echo -e "\n=== Method 4: Your original IFS method (with restoration) ==="
OIFS=$IFS
IFS=';'
for email in $IN; do
    echo "> [$email]"
done
IFS=$OIFS

Output:

=== Method 1: IFS with array (recommended) ===
> [bla@some.com]
> [john@home.com]
> [alice@work.com]

=== Method 2: Your tr approach ===
> [bla@some.com]
> [john@home.com]
> [alice@work.com]

=== Method 3: read -d in a while loop ===
> [bla@some.com]
> [john@home.com]
> [alice@work.com]

=== Method 4: Your original IFS method (with restoration) ===
> [bla@some.com]
> [john@home.com]
> [alice@work.com]

Handling Edge Cases

When splitting strings, you should consider several edge cases:

1. Empty elements:

bash
IN="a;;b"  # Note the double semicolon
IFS=';' read -ra parts <<< "$IN"
echo "Number of parts: ${#parts[@]}"  # Output: 3 (including empty middle)
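If empty elements are not wanted, they can be filtered out after splitting. A minimal sketch, where nonempty is an illustrative array name:

```shell
IN="a;;b"
IFS=';' read -ra parts <<< "$IN"

# Copy only the non-empty fields into a second array
nonempty=()
for p in "${parts[@]}"; do
    if [ -n "$p" ]; then
        nonempty+=("$p")
    fi
done

echo "Before: ${#parts[@]} fields, after: ${#nonempty[@]} fields"
```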

2. Leading/trailing delimiters:

bash
IN=";a;b;"  # Leading and trailing delimiters
IFS=';' read -ra parts <<< "$IN"
# Results in an empty first element, but no empty last element:
# the final ';' only terminates "b", giving 3 elements ("", "a", "b")
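It is worth checking how read treats the final delimiter: it acts as a terminator for the last field, so a trailing empty field is not kept. When that field matters, a common workaround is to append one extra delimiter to the input. The array names plain and padded are illustrative:

```shell
IN=";a;b;"

IFS=';' read -ra plain <<< "$IN"       # trailing empty field is dropped
IFS=';' read -ra padded <<< "$IN;"     # extra delimiter keeps it

echo "plain:  ${#plain[@]} fields"
echo "padded: ${#padded[@]} fields"
```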

3. Whitespace handling:

bash
IN="a;b ; c"  # Spaces around the second delimiter
IFS=';' read -ra parts <<< "$IN"
echo "Parts: ${parts[@]}"  # "b " keeps its trailing space and " c" its leading space

To handle whitespace properly, you can use:

bash
IN="a;b ; c"
IFS=';' read -ra parts <<< "$IN"
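# Trim whitespace from each element with parameter expansion
# (piping through xargs also trims, but it interprets quotes and backslashes)
for i in "${!parts[@]}"; do
    p=${parts[i]}
    p=${p#"${p%%[![:space:]]*}"}   # strip leading whitespace
    p=${p%"${p##*[![:space:]]}"}   # strip trailing whitespace
    parts[i]=$p
done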

4. Special characters in delimiters:
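IFS characters are taken literally rather than as a regular expression, so delimiters such as | or . need no escaping. IFS is a set of single characters, though, so it cannot represent a multi-character delimiter.

bash
IN="file1.txt|file2.txt|file3.txt"
IFS='|' read -ra files <<< "$IN"  # '|' is literal here; no escaping needed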
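For a delimiter longer than one character, one possible approach is to replace each occurrence with a newline and then read the result line by line. The sketch below uses a hypothetical "::" delimiter, assumes Bash 4+ for mapfile, and assumes the data itself contains no newlines:

```shell
IN="file1.txt::file2.txt::file3.txt"
delim="::"                       # hypothetical multi-character delimiter
nl=$'\n'

# Replace every occurrence of the delimiter with a newline. Quoting
# "$delim" makes the pattern literal, so glob characters in it are safe.
lines=${IN//"$delim"/$nl}

# Read one element per line
mapfile -t files <<< "$lines"

printf '%s\n' "${files[@]}"
```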

Performance Considerations

For performance-sensitive applications, consider these factors:

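  1. Method speed: Builtins such as read avoid the cost of starting an external process like tr, which matters most when many short strings are split in a loop
  2. Memory usage: Large strings with many delimiters can consume significant memory
  3. Bash version: readarray and mapfile require Bash 4.0+, and their -d option requires Bash 4.4+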

Here’s a quick performance comparison:

bash
# Large test string
IN=$(printf "email%d@domain.com;" {1..10000})
IN=${IN%;}  # Remove trailing delimiter

# Time each method (tr output is discarded so terminal I/O is not measured)
time IFS=';' read -ra emails <<< "$IN"
time echo "$IN" | tr ";" "\n" > /dev/null
time for email in $(echo "$IN" | tr ";" "\n"); do :; done

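For typical short strings, the read-based method tends to be the most efficient because it runs entirely inside the shell. Results for very large inputs can differ, so measure on your own data. The tr method remains readable and doesn’t require IFS manipulation.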

Conclusion

  • For most cases, use IFS=';' read -ra array <<< "$string" as it’s clean, efficient, and doesn’t require IFS restoration
  • For shells without arrays or here-strings, such as plain POSIX sh, the tr approach is a good alternative
  • Always restore IFS when you modify it temporarily to avoid affecting other shell operations
  • Handle edge cases like empty elements and whitespace appropriately
  • Consider performance when working with very large strings or frequent operations

The method you discovered with saving and restoring IFS is perfectly valid, but the modern approach using read with here-strings is generally preferred as it limits the IFS change to a single operation.