NeuroAgent

Extract 5-Digit Number from Filename in Bash

Learn multiple methods to extract 5-digit numbers from filenames in Bash. Discover parameter expansion, regex, awk, sed, and grep approaches with complete examples and performance comparisons.

How to extract a 5-digit number from a filename in Bash?

Given a filename in the format someletters_12345_moreleters.ext, how can I extract the 5-digit number (12345 in this example) and store it in a variable?

I’m looking for multiple approaches to accomplish this task in Bash, where the filename has the following pattern:

  • A sequence of characters (x number of characters)
  • A single underscore
  • A 5-digit number
  • Another single underscore
  • Another sequence of characters (x number of characters)
  • A file extension

What are the different methods to extract the 5-digit number from such filenames in Bash?

NeuroAgent

You can extract the 5-digit number from a filename such as someletters_12345_moreleters.ext in several ways in Bash. The most common approaches are parameter expansion, Bash's built-in regex matching, and the external tools grep, awk, and sed. They differ in simplicity, performance, and how strictly they validate the match.

Parameter Expansion Methods

Parameter expansion in Bash provides efficient ways to extract patterns from strings without creating subprocesses.

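Method 1: Using Prefix and Suffix Removal

bash
filename="someletters_12345_moreleters.ext"
number="${filename#*_}"    # strip through the first underscore -> 12345_moreleters.ext
number="${number%%_*}"     # strip from the next underscore on  -> 12345
echo "$number"  # Output: 12345

How it works:

  • ${filename#*_} removes the shortest prefix ending in an underscore, that is, everything up to and including the first underscore
  • ${number%%_*} removes the longest suffix starting with an underscore, that is, everything from the next underscore onward
  • This method does not validate that the result is exactly 5 digits; it simply returns the text between the first two underscores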
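
Because parameter expansion does not check the shape of what it extracts, a validity check can be layered on top. The following is a minimal sketch, assuming a hypothetical helper named extract_middle_field (not part of the original answer) that rejects anything other than five digits:

```bash
#!/usr/bin/env bash
# Hypothetical helper: take the text between the first and second underscore,
# then accept it only if it is exactly five digits.
extract_middle_field() {
    local name="$1"
    local rest="${name#*_}"      # drop the shortest prefix ending in "_"
    local middle="${rest%%_*}"   # drop everything from the next "_" onward
    case "$middle" in
        [0-9][0-9][0-9][0-9][0-9]) printf '%s\n' "$middle" ;;
        *) return 1 ;;
    esac
}

extract_middle_field "someletters_12345_moreleters.ext"   # prints 12345
extract_middle_field "someletters_abc_moreleters.ext" || echo "no 5-digit field"
```

The case pattern spells out five digit classes, so no subprocess or regex engine is needed for the check.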

Method 2: Using Bash Regex Matching

bash
filename="someletters_12345_moreleters.ext"
if [[ $filename =~ ^[^_]*_([0-9]{5})_[^_]*\. ]]; then
    number="${BASH_REMATCH[1]}"
    echo "$number"  # Output: 12345
fi

How it works:

  • ^[^_]*_ matches everything up to the first underscore
  • ([0-9]{5}) captures exactly 5 digits in a group
  • _[^_]*\. matches from the second underscore to the dot
  • ${BASH_REMATCH[1]} contains the captured group
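
The same match can be wrapped in a function that signals failure through its exit status, which makes it easier to use in conditionals. This is a sketch only; the function name extract_with_regex and the second test filename are illustrative assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical helper built on [[ =~ ]]. The regex is kept in a variable so
# that shell quoting rules do not interfere with it.
extract_with_regex() {
    local re='^[^_]*_([0-9]{5})_[^_]*\.'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

for name in "someletters_12345_moreleters.ext" "someletters_1234_moreleters.ext"; do
    if number=$(extract_with_regex "$name"); then
        echo "$name -> $number"
    else
        echo "$name -> no match" >&2
    fi
done
```

The second name has only four digits between the underscores, so the function returns a non-zero status instead of printing a partial result.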

Using grep and Regular Expressions

The grep family of tools is excellent for pattern matching with regular expressions.

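Method 3: Using grep -E with a Simple Pattern

bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | grep -oE '[0-9]{5}')
echo "$number"  # Output: 12345

How it works:

  • grep -E enables extended regular expressions (egrep is a deprecated alias for it)
  • -o prints only the matching part
  • [0-9]{5} matches any run of 5 consecutive digits, including the first 5 digits of a longer number
  • The pattern is not anchored to the underscores, so it prints one line per match if the filename contains several 5-digit sequences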
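
To make the multiple-match caveat concrete, here is a small sketch using a made-up filename that contains digits outside the underscore-delimited field:

```bash
#!/usr/bin/env bash
name="report2023001_12345_v67890.ext"   # hypothetical filename with extra digits

# Every non-overlapping run of five digits is printed on its own line.
all_matches=$(printf '%s\n' "$name" | grep -oE '[0-9]{5}')
printf '%s\n' "$all_matches"

# Keeping only the first line is not a fix: here it is "20230", not "12345".
first_match=$(printf '%s\n' "$name" | grep -oE '[0-9]{5}' | head -n 1)
echo "first match: $first_match"
```

For names like this one, a pattern that includes the surrounding underscores is the safer choice.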

Method 4: Using grep with Context Matching

bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | grep -oP '(?<=_)[0-9]{5}(?=_)')
echo "$number"  # Output: 12345

How it works:

  • -P enables Perl-compatible regex (PCRE); it is a GNU grep option and is not available in every grep implementation
  • (?<=_) is a positive lookbehind for underscore
  • (?=_) is a positive lookahead for underscore
  • This ensures the 5-digit number is surrounded by underscores
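
If the script has to run where lookarounds are unavailable or unwanted, the underscores can be matched as part of the pattern and stripped afterwards. This is one possible sketch, not the only way to do it:

```bash
#!/usr/bin/env bash
filename="someletters_12345_moreleters.ext"

# Match the digits together with their surrounding underscores, then strip
# the underscores. No lookarounds are involved, so plain -E is enough.
number=$(printf '%s\n' "$filename" | grep -oE '_[0-9]{5}_' | head -n 1 | tr -d '_')
echo "$number"  # Output: 12345
```

head -n 1 keeps only the first underscore-delimited match in case the name contains more than one.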

Awk-Based Solutions

Awk is powerful for text processing and can handle complex pattern matching.

Method 5: Using awk with Field Separation

bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | awk -F'_' '{print $2}')
echo "$number"  # Output: 12345

How it works:

  • -F'_' sets underscore as the field separator
  • print $2 prints the second field (the number)
  • Simple but assumes the number is always the second field
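
Field splitting can be combined with a check on the field itself, so that a malformed name yields nothing rather than a wrong value. A minimal sketch, assuming the number is expected in field 2:

```bash
#!/usr/bin/env bash
filename="someletters_12345_moreleters.ext"

# Print field 2 only when there are at least three fields and field 2
# consists of exactly five digits; exit non-zero otherwise.
number=$(printf '%s\n' "$filename" |
    awk -F'_' 'NF >= 3 && $2 ~ /^[0-9][0-9][0-9][0-9][0-9]$/ { print $2; found = 1 }
               END { exit !found }')
echo "$number"  # Output: 12345
```

The digit class is written out five times rather than as {5} so that the sketch does not depend on interval-expression support in the awk at hand.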

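Method 6: Using awk with Regex Matching

bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | awk 'match($0, /_[0-9]{5}_/) { print substr($0, RSTART + 1, 5) }')
echo "$number"  # Output: 12345

How it works:

  • match() searches $0 for the pattern and sets RSTART to the position of the match (0 if there is none), so the action runs only when a match exists
  • substr($0, RSTART + 1, 5) extracts the 5 digits, skipping the leading underscore
  • The two-argument match() is standard awk; the three-argument form match($0, regex, arr) is a gawk extension
  • Some older awk implementations do not support interval expressions such as {5}; writing [0-9] five times is a portable fallback
  • More flexible than simple field separation, because the number does not have to be the second field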

Sed-Based Approach

Sed (stream editor) can extract patterns using substitution commands.

Method 7: Using sed with Pattern Substitution

bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | sed -n 's/.*_\([0-9]\{5\}\)_.*\.ext/\1/p')
echo "$number"  # Output: 12345

How it works:

  • -n suppresses automatic printing
  • s/.*_\([0-9]\{5\}\)_.*\.ext/\1/p substitutes and prints the captured group
  • \{5\} is equivalent to {5} in extended regex
  • The pattern explicitly matches the .ext extension, so filenames with any other extension produce no output
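
When the extension varies, the pattern can be loosened. The sketch below assumes a sed that accepts -E for extended regex (GNU and BSD sed both do) and uses a made-up filename:

```bash
#!/usr/bin/env bash
filename="someletters_12345_moreleters.tar.gz"   # hypothetical name with another extension

# -E switches to extended regex, so groups and intervals need no backslashes.
# ^[^_]*_ pins the match to the first underscore-delimited field.
number=$(printf '%s\n' "$filename" | sed -nE 's/^[^_]*_([0-9]{5})_.*$/\1/p')
echo "$number"  # Output: 12345
```

Anchoring on the first underscore instead of a leading .* also avoids picking up a later 5-digit field by accident.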

Using cut and tr Commands

These traditional Unix tools can be combined for number extraction.

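Method 8: Using cut and tr Combination

bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | cut -d'_' -f2 | tr -cd '0-9')
echo "$number"  # Output: 12345

How it works:

  • cut -d'_' -f2 extracts the second underscore-delimited field, which is already 12345 for this pattern
  • tr -cd '0-9' deletes every character that is not a digit, as a safeguard against stray characters in that field
  • tr operates on sets of characters, not strings: tr -d '.ext' would delete every ".", "e", "x", and "t" rather than the literal extension
  • Simple, but it does not check that the result is exactly 5 digits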

Method 9: Using tr to Remove Non-Digits

bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | tr -c -d '0-9' | grep -o '.....')
echo "$number"  # Output: 12345

How it works:

  • tr -c -d '0-9' keeps only digits, removing all other characters
  • grep -o '.....' extracts exactly 5 characters
  • Works only when the 5-digit number holds the only digits in the filename; digits elsewhere are concatenated with it before the first 5 characters are taken
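
A short sketch of that failure mode, using a made-up filename with one stray digit:

```bash
#!/usr/bin/env bash
name="v2_12345_final.ext"   # hypothetical filename with a stray digit

# All digits are kept and run together, regardless of where they came from.
digits=$(printf '%s' "$name" | tr -cd '0-9')
echo "$digits"  # Output: 212345

# The first five characters are no longer the number between the underscores.
wrong=$(printf '%s\n' "$digits" | grep -o '.....')
echo "$wrong"   # Output: 21234
```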

Comparing Methods

Method               Performance  Readability  Robustness  Dependencies
Parameter Expansion  Excellent    Good         Moderate    None (Bash built-in)
grep                 Good         Excellent    Good        grep (-P requires GNU grep)
Awk                  Moderate     Good         Excellent   awk
Sed                  Moderate     Moderate     Good        sed
cut/tr               Excellent    Good         Poor        cut, tr
Bash Regex           Excellent    Good         Excellent   None (Bash 3.0+)

Recommendations:

  • For pure Bash environments: Use parameter expansion or Bash regex
  • For maximum flexibility: Use awk with regex
  • For simple cases: grep with -o flag
  • For performance-critical scripts: parameter expansion
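
The performance ratings can be checked on your own machine with a rough timing loop. The sketch below is only illustrative: the function names and the iteration count are assumptions, and it relies on Bash's time keyword, which reports on stderr.

```bash
#!/usr/bin/env bash
filename="someletters_12345_moreleters.ext"
iterations=200   # arbitrary count chosen for illustration

# Built-in only: no process is started.
via_expansion() {
    local rest="${filename#*_}"
    result="${rest%%_*}"
}

# External tool: each call forks a subshell and runs grep.
via_grep() {
    result=$(printf '%s\n' "$filename" | grep -oE '[0-9]{5}')
}

for method in via_expansion via_grep; do
    echo "== $method"
    time for ((i = 0; i < iterations; i++)); do "$method"; done
done
```

Absolute timings depend on the system; the comparison mainly shows the cost of starting a process on every call.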

Complete Script Example

Here’s a complete script demonstrating multiple approaches:

bash
#!/bin/bash

# Function to extract 5-digit number using different methods
extract_number_param() {
    local filename="$1"
    local number="${filename#*_}"
    number="${number%%_*}"
    echo "$number"
}

extract_number_grep() {
    local filename="$1"
    echo "$filename" | grep -oP '(?<=_)[0-9]{5}(?=_)'
}

extract_number_awk() {
    local filename="$1"
    echo "$filename" | awk -F'_' '{print $2}'
}

extract_number_bash_regex() {
    local filename="$1"
    if [[ $filename =~ ^[^_]*_([0-9]{5})_[^_]*\. ]]; then
        echo "${BASH_REMATCH[1]}"
    fi
}

# Test with sample filename
filename="someletters_12345_moreleters.ext"

echo "Original filename: $filename"
echo "Parameter expansion: $(extract_number_param "$filename")"
echo "grep with lookaround: $(extract_number_grep "$filename")"
echo "awk field separation: $(extract_number_awk "$filename")"
echo "Bash regex: $(extract_number_bash_regex "$filename")"

# Process multiple files in a directory
echo -e "\nProcessing files in current directory:"
for file in *_*.ext; do
    if [[ -f "$file" ]]; then
        num=$(extract_number_bash_regex "$file")
        if [[ $num =~ ^[0-9]{5}$ ]]; then
            echo "File: $file -> Number: $num"
        fi
    fi
done

This script provides four different methods and demonstrates how to apply them to both individual filenames and bulk processing of files in a directory. The Bash regex method includes validation to ensure exactly 5 digits are extracted.

Conclusion

Extracting 5-digit numbers from filenames in Bash can be accomplished through multiple approaches, each with distinct advantages:

  • Parameter expansion offers the best performance and no external dependencies
  • Bash regex matching provides excellent robustness with built-in pattern validation
  • grep with lookaround delivers the most readable solution for complex patterns
  • awk offers the most flexibility for advanced text processing
  • sed is useful for patterns involving substitutions and captures

For the specific pattern someletters_12345_moreleters.ext, the Bash regex method with [[ $filename =~ ^[^_]*_([0-9]{5})_[^_]*\. ]] is recommended as it’s both performant and validates the exact 5-digit format. For bulk processing, combine this with a simple loop through files in your directory.

Choose the method that best fits your specific use case, considering factors like performance requirements, readability needs, and the complexity of your filename patterns.