How to extract a 5-digit number from a filename in Bash?
Given a filename in the format someletters_12345_moreleters.ext, how can I extract the 5-digit number (12345 in this example) and store it in a variable?
I’m looking for multiple approaches to accomplish this task in Bash, where the filename has the following pattern:
- A sequence of characters (x number of characters)
- A single underscore
- A 5-digit number
- Another single underscore
- Another sequence of characters (x number of characters)
- A file extension
What are the different methods to extract the 5-digit number from such filenames in Bash?
You can extract a 5-digit number from filenames in the format someletters_12345_moreleters.ext using several methods in Bash. The most common approaches are Bash parameter expansion, Bash's built-in regex matching, grep, awk, sed, and cut/tr. They differ in simplicity, performance, portability, and how strictly they check that the number really has 5 digits.
Contents
- Parameter Expansion Methods
- Using grep and Regular Expressions
- Awk-Based Solutions
- Sed-Based Approach
- Using cut and tr Commands
- Comparing Methods
- Complete Script Example
Parameter Expansion Methods
Parameter expansion in Bash provides efficient ways to extract patterns from strings without creating subprocesses.
Method 1: Using Prefix and Suffix Removal
```bash
filename="someletters_12345_moreleters.ext"
number="${filename#*_}"   # 12345_moreleters.ext
number="${number%%_*}"    # 12345
echo "$number" # Output: 12345
```
How it works:
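- `${filename#*_}` removes the shortest prefix ending in an underscore, i.e. everything up to and including the first underscore (`##*_` would instead strip up to the last underscore and leave `moreleters.ext`)
- `${number%%_*}` removes the longest suffix starting with an underscore, i.e. everything from the next underscore onward
- This method doesn't validate that the result is exactly 5 digits; it simply returns the text between the first and second underscores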
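Because this method doesn't validate the result, here is a minimal pure-Bash sketch that adds the check without starting any subprocess. The function name `extract_number_checked` is hypothetical; it uses a `case` glob of five `[0-9]` classes so that only exactly five digits are accepted.

```bash
# Hypothetical helper: print the text between the first two underscores,
# but only if it is exactly five digits (no subprocess is started).
extract_number_checked() {
    local filename="$1" rest number
    rest="${filename#*_}"             # drop everything up to the first "_"
    [[ $rest == *_* ]] || return 1    # require a second underscore
    number="${rest%%_*}"              # drop everything from that "_" onward
    case $number in
        [0-9][0-9][0-9][0-9][0-9]) printf '%s\n' "$number" ;;
        *) return 1 ;;
    esac
}

extract_number_checked "someletters_12345_moreleters.ext"    # prints 12345
extract_number_checked "someletters_1234_moreleters.ext" || echo "no 5-digit number"
```

The non-zero return status lets callers write `if n=$(extract_number_checked "$f"); then ...`.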
Method 2: Using Bash Regex Matching
```bash
filename="someletters_12345_moreleters.ext"
if [[ $filename =~ ^[^_]*_([0-9]{5})_[^_]*\. ]]; then
    number="${BASH_REMATCH[1]}"
    echo "$number" # Output: 12345
fi
```
How it works:
- `^[^_]*_` matches everything up to and including the first underscore
- `([0-9]{5})` captures exactly 5 digits in a group
- `_[^_]*\.` matches from the second underscore up to the dot
- `${BASH_REMATCH[1]}` contains the captured group
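A small sketch of how strict this regex is, using a few made-up names. Keeping the regex in a variable `re` is a readability choice assumed here (it also sidesteps quoting surprises inside `[[ ]]`), and the helper name `match_name` is hypothetical.

```bash
# Regex kept in a variable and expanded unquoted; quoting $re would make
# Bash match it as a literal string instead of a regular expression.
re='^[^_]*_([0-9]{5})_[^_]*\.'

# Hypothetical helper: print the captured number, or fail if the name doesn't fit.
match_name() {
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

for name in someletters_12345_moreleters.ext someletters_123456_x.ext a_b_12345_c.ext; do
    if number=$(match_name "$name"); then
        echo "$name -> $number"
    else
        echo "$name -> no match"
    fi
done
```

Only the first name matches: a 6-digit run fails because `_` must follow the fifth digit, and in `a_b_12345_c.ext` the number is not right after the first underscore.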
Using grep and Regular Expressions
The grep family of tools is excellent for pattern matching with regular expressions.
Method 3: Using grep -E (egrep) with an Extended Pattern
```bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | grep -Eo '[0-9]{5}')
echo "$number" # Output: 12345
```
How it works:
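- `egrep` is the older name for `grep -E`; GNU grep 3.8 and later warn that `egrep` is obsolescent, so `grep -E` is the safer spelling
- `-o` prints only the matching part
- `[0-9]{5}` matches any 5 consecutive digits
- This is simple, but it prints one line per match if the name contains several 5-digit sequences, and it also matches the first five digits of a longer run such as `123456`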
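`[0-9]{5}` can return several matches, and it also matches the first five digits of a longer run such as `123456`. One way to guard against both (a sketch; the function name is hypothetical) is to extract every maximal digit run with `grep -Eo '[0-9]+'`, keep only the runs that are exactly five digits long with `grep -Ex`, and reject the result unless exactly one run remains.

```bash
# Hypothetical helper: print the only run of exactly five digits in "$1".
extract_five_digit_run() {
    local runs
    # Every maximal digit run on its own line, then only the 5-digit ones
    runs=$(printf '%s\n' "$1" | grep -Eo '[0-9]+' | grep -Ex '[0-9]{5}')
    # Fail on zero matches, or on several (an embedded newline means more than one)
    if [[ -z $runs || $runs == *$'\n'* ]]; then
        return 1
    fi
    printf '%s\n' "$runs"
}

extract_five_digit_run "v2_12345_moreleters.ext"   # prints 12345
extract_five_digit_run "a123456_b.ext" || echo "no unique 5-digit run"
```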
Method 4: Using grep with Lookaround (PCRE)
```bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | grep -oP '(?<=_)[0-9]{5}(?=_)')
echo "$number" # Output: 12345
```
How it works:
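- `-o` prints only the matching part, and `-P` enables Perl-compatible regular expressions
- `(?<=_)` is a positive lookbehind for an underscore
- `(?=_)` is a positive lookahead for an underscore
- Together they ensure the 5-digit number is surrounded by underscores without including the underscores in the output
- `-P` is a GNU grep option that requires PCRE support; it may be missing from other implementations, such as the BSD grep on macOS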
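Because `-P` is not available in every grep implementation, a script can test for it at run time and fall back to a POSIX extended regex. In this sketch (both function names are hypothetical), the fallback matches the underscores too and then strips them with `tr`.

```bash
# Hypothetical helper: succeed if this grep accepts -P (PCRE).
grep_has_pcre() {
    echo x | grep -qP 'x' 2>/dev/null
}

# Hypothetical helper: lookaround when available, otherwise an ERE fallback.
extract_between_underscores() {
    if grep_has_pcre; then
        printf '%s\n' "$1" | grep -oP '(?<=_)[0-9]{5}(?=_)'
    else
        # "_12345_" is matched including the underscores, which tr then deletes
        printf '%s\n' "$1" | grep -Eo '_[0-9]{5}_' | tr -d '_'
    fi
}

extract_between_underscores "someletters_12345_moreleters.ext"   # prints 12345
```

The two branches can differ when two underscore-delimited numbers share an underscore (as in `x_11111_22222_y.ext`): the lookaround version reports both, while the ERE version consumes the shared underscore and reports only the first.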
Awk-Based Solutions
Awk is powerful for text processing and can handle complex pattern matching.
Method 5: Using awk with Field Separation
```bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | awk -F'_' '{print $2}')
echo "$number" # Output: 12345
```
How it works:
- `-F'_'` sets the underscore as the field separator
- `print $2` prints the second field (the number)
- Simple, but assumes the number is always the second field and doesn't check that it has 5 digits
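When many filenames need processing, a sketch like the following (the sample names are made up) feeds them all to a single awk process and adds the validation that plain field splitting lacks.

```bash
# Several names, one per line, go through one awk process.
# NF >= 3 requires at least two underscores; the regex requires exactly 5 digits,
# written as five [0-9] classes so awks without {n} support also accept it.
numbers=$(printf '%s\n' "someletters_12345_moreleters.ext" "bad_12a45_name.ext" "other_67890_x.ext" |
    awk -F'_' 'NF >= 3 && $2 ~ /^[0-9][0-9][0-9][0-9][0-9]$/ { print $2 }')
echo "$numbers"
```

This prints `12345` and `67890`; the middle name is skipped because its second field is not all digits.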
Method 6: Using awk with Regex Matching
```bash
filename="someletters_12345_moreleters.ext"
# Two-argument match() is portable; it sets RSTART to where the match begins
number=$(echo "$filename" | awk 'match($0, /_[0-9]{5}_/) { print substr($0, RSTART + 1, 5) }')
echo "$number" # Output: 12345
```
How it works:
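- `match()` returns the position of the first match of `/_[0-9]{5}_/` (0 if there is none) and sets `RSTART` to where the match begins and `RLENGTH` to its length; used as a pattern, it makes awk print only for names that match
- `substr($0, RSTART + 1, 5)` skips the leading underscore and takes the 5 digits
- The three-argument form `match($0, regex, arr)` also works, but only in GNU awk (gawk)
- Some older awk implementations don't support interval expressions such as `{5}`; writing `[0-9]` five times works everywhere
- More flexible than simple field separation, because the number doesn't have to be the second field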
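To illustrate that flexibility, here is a sketch with made-up names in which the number is not always the second field. It uses the portable two-argument `match()`, which sets `RSTART` to where the match begins, and prints each matching name and its number separated by a tab.

```bash
# match() finds the underscore-delimited number wherever it sits;
# names without one are skipped because match() returns 0 for them.
pairs=$(printf '%s\n' "someletters_12345_moreleters.ext" "pre_fix_67890_tail.ext" "no_number_here.ext" |
    awk 'match($0, /_[0-9][0-9][0-9][0-9][0-9]_/) { print $0 "\t" substr($0, RSTART + 1, 5) }')
printf '%s\n' "$pairs"
```

Field splitting with `$2` would return `fix` for the second name, while `match()` still finds `67890`.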
Sed-Based Approach
Sed (stream editor) can extract patterns using substitution commands.
Method 7: Using sed with Pattern Substitution
```bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | sed -n 's/.*_\([0-9]\{5\}\)_.*\.ext/\1/p')
echo "$number" # Output: 12345
```
How it works:
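- `-n` suppresses automatic printing
- `s/.*_\([0-9]\{5\}\)_.*\.ext/\1/p` replaces the whole name with the captured group and prints the result (`p`) only when the substitution succeeds
- `\{5\}` in basic regex is equivalent to `{5}` in extended regex
- The pattern explicitly matches the `.ext` extension, so names with other extensions produce no output
- Because the leading `.*` is greedy, the last `_NNNNN_` group wins if a name contains several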
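Here is a sketch of the same idea with extended regex (`sed -E`, which both GNU and BSD sed accept) that works with any extension. The helper name `sed_extract` is hypothetical.

```bash
# Hypothetical helper: -E selects extended regex, so ( ) and {5} need no backslashes.
# ^[^_]*_ ties the number to the first underscore, and no extension is hard-coded.
sed_extract() {
    printf '%s\n' "$1" | sed -En 's/^[^_]*_([0-9]{5})_.*/\1/p'
}

sed_extract "someletters_12345_moreleters.ext"   # prints 12345
sed_extract "other_67890_x.txt"                  # prints 67890
sed_extract "short_1234_x.ext"                   # prints nothing
```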
Using cut and tr Commands
These traditional Unix tools can be combined for number extraction.
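Method 8: Using cut
```bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | cut -d'_' -f2)
echo "$number" # Output: 12345
```
How it works:
- `cut -d'_' -f2` extracts the second underscore-separated field, which is already `12345` for this name
- Appending `tr -d '.ext'` to strip the extension would not help: `tr -d` deletes each of the characters `.`, `e`, `x`, and `t` wherever they appear rather than the string `.ext`, and the second field contains no extension anyway
- Simple, but like awk field splitting it assumes the number is the second field and doesn't validate it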
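The same field split as `cut -d'_' -f2` can be done without starting `cut` at all, using Bash's `read` builtin with a custom `IFS`. In this sketch, the variable names `prefix` and `rest` are illustrative.

```bash
filename="someletters_12345_moreleters.ext"
# IFS applies only to this read; fields go to prefix, number, and rest
# (rest receives everything after the second underscore).
IFS='_' read -r prefix number rest <<< "$filename"
echo "$number"   # prints 12345
```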
Method 9: Using tr to Remove Non-Digits
```bash
filename="someletters_12345_moreleters.ext"
number=$(echo "$filename" | tr -c -d '0-9' | grep -o '.....')
echo "$number" # Output: 12345
```
How it works:
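- `tr -c -d '0-9'` deletes every character that is not a digit (including the trailing newline), leaving only the digits
- `grep -o '.....'` prints the digits in chunks of 5 characters
- Works only when the filename contains no other digits: for `file2_12345_v3.ext` the digits are concatenated to `2123453`, and the first chunk is `21234`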
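This sketch keeps the digits-only idea but refuses ambiguous names. It swaps `tr` for Bash's `${var//pattern/}` substitution and checks that exactly five digits remain. The function name is hypothetical.

```bash
# Hypothetical helper: keep only digits, but accept the result only when
# the whole name contains exactly five of them.
digits_if_exactly_five() {
    local digits="${1//[!0-9]/}"   # Bash pattern substitution, no tr needed
    if (( ${#digits} == 5 )); then
        printf '%s\n' "$digits"
    else
        return 1
    fi
}

digits_if_exactly_five "someletters_12345_moreleters.ext"   # prints 12345
digits_if_exactly_five "file2_12345_v3.ext" || echo "ambiguous digits"
```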
Comparing Methods
| Method | Performance | Readability | Robustness | Dependencies |
|---|---|---|---|---|
| Parameter Expansion | Excellent | Good | Moderate | None (Bash built-in) |
| grep/egrep | Good | Excellent | Good | grep (`-P` needs GNU grep with PCRE) |
| Awk | Moderate | Good | Excellent | awk |
| Sed | Moderate | Moderate | Good | sed |
| cut/tr | Good | Good | Poor | cut, tr |
| Bash Regex | Excellent | Good | Excellent | None (Bash 3.0+ built-in) |
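The Performance column mostly reflects whether a method starts extra processes. A rough way to check this on your own system is to time a loop of each style with Bash's `time` keyword. This is a sketch: absolute timings depend on the machine and the Bash version, and the loop count of 500 is arbitrary.

```bash
# Rough timing sketch: the same extraction 500 times, built-in vs. subprocess.
f="someletters_12345_moreleters.ext"

time for ((i = 0; i < 500; i++)); do
    n="${f#*_}"; n="${n%%_*}"                   # parameter expansion, no fork
done

time for ((i = 0; i < 500; i++)); do
    n=$(printf '%s\n' "$f" | cut -d'_' -f2)     # command substitution + pipeline
done

echo "$n"
```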
Recommendations:
- For pure Bash environments: use parameter expansion or Bash regex
- For maximum flexibility: use awk with regex
- For simple cases: grep with the `-o` flag
- For performance-critical scripts: parameter expansion
Complete Script Example
Here’s a complete script demonstrating multiple approaches:
```bash
#!/bin/bash
# Function to extract 5-digit number using different methods
extract_number_param() {
    local filename="$1"
    local number="${filename#*_}"   # drop everything up to the first underscore
    number="${number%%_*}"          # drop everything from the next underscore on
    echo "$number"
}

extract_number_grep() {
    local filename="$1"
    echo "$filename" | grep -oP '(?<=_)[0-9]{5}(?=_)'   # needs GNU grep with PCRE
}

extract_number_awk() {
    local filename="$1"
    echo "$filename" | awk -F'_' '{print $2}'
}

extract_number_bash_regex() {
    local filename="$1"
    if [[ $filename =~ ^[^_]*_([0-9]{5})_[^_]*\. ]]; then
        echo "${BASH_REMATCH[1]}"
    fi
}

# Test with sample filename
filename="someletters_12345_moreleters.ext"
echo "Original filename: $filename"
echo "Parameter expansion: $(extract_number_param "$filename")"
echo "grep with lookaround: $(extract_number_grep "$filename")"
echo "awk field separation: $(extract_number_awk "$filename")"
echo "Bash regex: $(extract_number_bash_regex "$filename")"

# Process multiple files in the current directory
echo -e "\nProcessing files in current directory:"
for file in *_*.ext; do
    # Skip the literal pattern left behind when nothing matches the glob
    if [[ -f "$file" ]]; then
        num=$(extract_number_bash_regex "$file")
        # Only report files whose name yielded a 5-digit number
        if [[ $num =~ ^[0-9]{5}$ ]]; then
            echo "File: $file -> Number: $num"
        fi
    fi
done
```
This script wraps four of the methods in functions, applies them to a single filename, and then uses the Bash regex function to process every `*_*.ext` file in the current directory. The regex captures only exactly 5 digits, so files that don't fit the pattern return nothing, and the `^[0-9]{5}$` check in the loop skips those empty results.
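To include files in subdirectories, one possible extension applies the same regex to each file's base name. This is a sketch: the function name `list_numbers_recursive` is hypothetical, and it uses `find -print0` with `read -d ''` so that names containing spaces or newlines survive.

```bash
# Hypothetical helper: print "path -> number" for every matching file under DIR.
list_numbers_recursive() {
    local dir="${1:-.}" path base
    while IFS= read -r -d '' path; do
        base="${path##*/}"    # strip the directory part
        if [[ $base =~ ^[^_]*_([0-9]{5})_[^_]*\. ]]; then
            printf '%s -> %s\n' "$path" "${BASH_REMATCH[1]}"
        fi
    done < <(find "$dir" -type f -name '*_*.ext' -print0)
}
```

Calling `list_numbers_recursive .` lists every match below the current directory.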
Conclusion
Extracting 5-digit numbers from filenames in Bash can be accomplished through multiple approaches, each with distinct advantages:
- Parameter expansion offers the best performance and no external dependencies
- Bash regex matching provides excellent robustness with built-in pattern validation
- grep with lookaround gives a concise one-liner for context-sensitive patterns, provided `grep -P` is available
- awk offers the most flexibility for advanced text processing
- sed is useful for patterns involving substitutions and captures
For the specific pattern someletters_12345_moreleters.ext, the Bash regex method with `[[ $filename =~ ^[^_]*_([0-9]{5})_[^_]*\. ]]` is recommended because it is both fast and validates the exact 5-digit format. For bulk processing, combine it with a simple loop over the files in your directory.
Choose the method that best fits your specific use case, considering factors like performance requirements, readability needs, and the complexity of your filename patterns.