How to extract filename and extension separately in Bash when the filename contains multiple dots?
I want to get the filename (without extension) and the extension separately in Bash.
The best solution I found so far is:
NAME=`echo "$FILE" | cut -d'.' -f1`
EXTENSION=`echo "$FILE" | cut -d'.' -f2`
This approach is incorrect because it breaks when the filename contains multiple . characters. For example, with a file named a.b.js, it extracts a as the filename and b as the extension, instead of the correct a.b as the filename and js as the extension.
While this can be easily done in Python with:
file, ext = os.path.splitext(path)
I’d prefer not to invoke a Python interpreter just for this task if possible.
What are better approaches to correctly extract the filename and extension in Bash, especially for filenames with multiple dots?
The most reliable approach to extract filename and extension in Bash when dealing with multiple dots is to use parameter expansion with pattern removal, which correctly identifies the extension as everything after the rightmost dot. The simplest solution is:
FILENAME="${FILE%.*}"
EXTENSION="${FILE##*.}"
This will correctly handle filenames with multiple dots, giving you a.b as the filename and js as the extension for a.b.js.
Contents
- Understanding the Problem
- Parameter Expansion Solutions
- Alternative Approaches
- Comparison of Methods
- Edge Cases and Considerations
- Conclusion
Understanding the Problem
The fundamental issue with the cut-based approach is that it treats every dot as a delimiter and returns a single field. When a filename contains multiple dots like document.v1.2.txt, cut -d'.' -f1 captures only the first field (document) and cut -d'.' -f2 captures only the second (v1), so the rest of the name and the real extension (txt) are lost.
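To make the failure concrete, here is a small sketch that prints what each cut field selection returns for that name. The variable names FIRST, SECOND, and REST are only illustrative.

```shell
#!/bin/bash
# Illustrative only: show which dot-separated fields cut selects.
FILE="document.v1.2.txt"

FIRST=$(printf '%s\n' "$FILE" | cut -d'.' -f1)    # field 1 only
SECOND=$(printf '%s\n' "$FILE" | cut -d'.' -f2)   # field 2 only
REST=$(printf '%s\n' "$FILE" | cut -d'.' -f2-)    # field 2 through the end

echo "f1:  $FIRST"    # f1:  document
echo "f2:  $SECOND"   # f2:  v1
echo "f2-: $REST"     # f2-: v1.2.txt
```

None of the three selections yields document.v1.2 or txt, because cut counts fields from the left and has no notion of "the last dot".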
The correct approach, as implemented in Python’s os.path.splitext(), splits at the last dot, not the first. (os.path.splitext() also keeps the dot in the extension it returns, for example .gz; the Bash examples here drop it.)
This means:
- For filename.tar.gz → filename: filename.tar, extension: gz
- For archive.2024.01.zip → filename: archive.2024.01, extension: zip
- For config (no dots) → filename: config, extension: empty string
- For .hidden (starts with dot) → filename: .hidden, extension: empty string, because os.path.splitext() treats leading dots as part of the name
Parameter Expansion Solutions
Basic Parameter Expansion
The most elegant and efficient solution uses Bash’s built-in parameter expansion:
#!/bin/bash
FILE="a.b.js"
FILENAME="${FILE%.*}" # Remove shortest pattern from end
EXTENSION="${FILE##*.}" # Remove longest pattern from beginning
echo "Filename: $FILENAME" # Output: a.b
echo "Extension: $EXTENSION" # Output: js
How it works:
- ${FILE%.*} removes the shortest match of .* from the end, that is, the last dot and everything after it
- ${FILE##*.} removes the longest match of *. from the beginning, that is, everything up to and including the last dot
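The difference between the shortest and longest match is easiest to see by applying all four pattern-removal operators to one name. The filename below is a made-up example.

```shell
#!/bin/bash
# The four pattern-removal operators applied to one (hypothetical) name.
FILE="archive.2024.01.tar.gz"

echo "${FILE%.*}"    # shortest suffix match removed:  archive.2024.01.tar
echo "${FILE%%.*}"   # longest suffix match removed:   archive
echo "${FILE#*.}"    # shortest prefix match removed:  2024.01.tar.gz
echo "${FILE##*.}"   # longest prefix match removed:   gz
```

The first and last lines are the pair used throughout this article. The middle two split at the first dot instead, which is occasionally useful for compound extensions such as tar.gz but gives the wrong answer for names like a.b.js.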
Complete Script Example
#!/bin/bash
extract_parts() {
    local file="$1"
    local filename="${file%.*}"
    local extension="${file##*.}"
    # Handle case where there's no extension
    if [ "$filename" = "$file" ]; then
        extension=""
    fi
    echo "Filename: '$filename'"
    echo "Extension: '$extension'"
}
# Test cases
extract_parts "a.b.js" # Filename: 'a.b', Extension: 'js'
extract_parts "document.tar.gz" # Filename: 'document.tar', Extension: 'gz'
extract_parts "config" # Filename: 'config', Extension: ''
extract_parts ".hidden" # Filename: '', Extension: 'hidden'
extract_parts "..test.." # Filename: '..test.', Extension: ''
Alternative Approaches
Using basename and dirname
When FILE includes a directory, strip the directory with basename first and then apply parameter expansion to the result (dirname returns the directory part if you also need it):
#!/bin/bash
FILE="path/to/file.name.with.dots.txt"
BASENAME=$(basename "$FILE")
FILENAME="${BASENAME%.*}"
EXTENSION="${BASENAME##*.}"
echo "Filename: $FILENAME" # Output: file.name.with.dots
echo "Extension: $EXTENSION" # Output: txt
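One reason to strip the directory first is that a dot in a directory name can be mistaken for the extension separator. The sketch below uses a made-up path and the built-in expansion ${FILE##*/} in place of basename, which avoids an external process. Note that, unlike basename, this expansion returns an empty string for a path ending in a slash.

```shell
#!/bin/bash
# Hypothetical path: the directory contains a dot, the file has no extension.
FILE="releases/v1.2/README"

WRONG="${FILE%.*}"          # expansion applied to the whole path: releases/v1

BASENAME="${FILE##*/}"      # strip everything up to the last slash: README
FILENAME="${BASENAME%.*}"   # README (no dot, so nothing is removed)
if [ "$FILENAME" = "$BASENAME" ]; then
    EXTENSION=""            # no dot in the basename means no extension
else
    EXTENSION="${BASENAME##*.}"
fi

echo "Whole path: $WRONG"
echo "Filename:   $FILENAME"
echo "Extension:  '$EXTENSION'"
```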
Using awk
With awk, split the name on dots, print the last field as the extension, and rejoin the remaining fields as the filename:
#!/bin/bash
FILE="a.b.c.d.e"
# OFS must also be a dot; otherwise awk rejoins the fields with spaces
FILENAME=$(echo "$FILE" | awk -F'.' -v OFS='.' '{$NF=""; sub(/\.$/, ""); print}')
EXTENSION=$(echo "$FILE" | awk -F'.' '{print $NF}')
echo "Filename: $FILENAME" # Output: a.b.c.d
echo "Extension: $EXTENSION" # Output: e
Using sed
For those who prefer sed, here’s a solution:
#!/bin/bash
FILE="document.v1.2.txt"
FILENAME=$(echo "$FILE" | sed 's/\.[^.]*$//')
EXTENSION=$(echo "$FILE" | sed 's/^.*\.//')
echo "Filename: $FILENAME" # Output: document.v1.2
echo "Extension: $EXTENSION" # Output: txt
Using rev and cut
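This approach reverses the string, takes the first dot-separated field, and reverses it back to get the extension. The filename still comes from parameter expansion, and rev is not specified by POSIX, so it may be missing on minimal systems: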
#!/bin/bash
FILE="archive.2024.01.zip"
EXTENSION=$(echo "$FILE" | rev | cut -d'.' -f1 | rev)
FILENAME="${FILE%.*}"
echo "Filename: $FILENAME" # Output: archive.2024.01
echo "Extension: $EXTENSION" # Output: zip
Comparison of Methods
| Method | Performance | Readability | Portability | Edge Cases |
|---|---|---|---|---|
| Parameter Expansion | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Excellent |
| basename + %.* | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Very Good |
| awk | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | Good |
| sed | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ | Good |
| rev + cut | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | Fair |
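Parameter expansion comes out ahead on every criterion:
- Fastest: No external processes invoked
- Cleanest: Built-in shell syntax
- Most portable: The %, %%, # and ## expansions are specified by POSIX, so they work in Bash, Zsh, Dash, and other POSIX-style shells
- Most predictable: The remaining edge cases (no dot, leading dot) can be handled with simple string tests, as covered in the next section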
Edge Cases and Considerations
Filenames Without Extensions
FILE="config"
FILENAME="${FILE%.*}" # config
EXTENSION="${FILE##*.}" # config (problem - same as filename)
Solution: If removing the suffix left the name unchanged, there was no dot, so clear the extension:
if [ "$FILENAME" = "$FILE" ]; then
EXTENSION=""
fi
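An equivalent way to write this check is to test for a dot before expanding, using a case pattern. The sketch below wraps that in a helper called split_name, a hypothetical name, which sets the variables NAME and EXT.

```shell
#!/bin/bash
# split_name is a hypothetical helper; it sets NAME and EXT for its argument.
split_name() {
    case "$1" in
        *.*)  # at least one dot: split at the last one
            NAME="${1%.*}"
            EXT="${1##*.}"
            ;;
        *)    # no dot: the whole string is the name
            NAME="$1"
            EXT=""
            ;;
    esac
}

split_name "config"
echo "Name: '$NAME', Extension: '$EXT'"   # Name: 'config', Extension: ''

split_name "a.b.js"
echo "Name: '$NAME', Extension: '$EXT'"   # Name: 'a.b', Extension: 'js'
```

Setting variables rather than printing avoids a command substitution, so the helper stays free of subshells. The trade-off is that NAME and EXT are globals.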
Hidden Files (Starting with Dot)
FILE=".bashrc"
FILENAME="${FILE%.*}" # empty string
EXTENSION="${FILE##*.}" # bashrc
Files Ending with Dot
FILE="file."
FILENAME="${FILE%.*}" # file
EXTENSION="${FILE##*.}" # empty string
Files with Only Dots
FILE=".."
FILENAME="${FILE%.*}" # . (only the last dot is removed)
EXTENSION="${FILE##*.}" # empty string
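The cases above can be checked side by side with a short loop. The helper raw_split is a hypothetical name; it applies the two expansions with no special-case handling and prints the result as name|extension.

```shell
#!/bin/bash
# raw_split is a hypothetical helper: it prints "name|extension" using the
# two expansions with no special-case handling.
raw_split() {
    printf '%s|%s\n' "${1%.*}" "${1##*.}"
}

for f in "a.b.js" "config" ".bashrc" "file." ".."; do
    printf '%-8s -> %s\n' "$f" "$(raw_split "$f")"
done
# a.b.js   -> a.b|js
# config   -> config|config
# .bashrc  -> |bashrc
# file.    -> file|
# ..       -> .|
```

Only the first line is the intended split; the others are the inputs that need an explicit check.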
Complete Robust Function
#!/bin/bash
extract_file_parts() {
    local file="$1"
    local filename="${file%.*}"
    local extension="${file##*.}"
    # Handle special cases
    if [ "$filename" = "$file" ]; then
        # No dot at all: there is no extension
        extension=""
    elif [ -z "$filename" ] && [ -n "$extension" ]; then
        # Hidden file like .bashrc: treat the whole name as the filename
        filename=".$extension"
        extension=""
    fi
    printf "Filename: '%s'\n" "$filename"
    printf "Extension: '%s'\n" "$extension"
}
# Test all edge cases (comments show filename, extension)
extract_file_parts "a.b.js" # Normal case: 'a.b', 'js'
extract_file_parts "document.tar.gz" # Multiple dots: 'document.tar', 'gz'
extract_file_parts "config" # No extension: 'config', ''
extract_file_parts ".hidden" # Hidden file: '.hidden', ''
extract_file_parts "file." # Ends with dot: 'file', '' (the trailing dot is dropped)
extract_file_parts ".." # Only dots: '.', '' (one dot is dropped)
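If behavior closer to Python’s os.path.splitext() is preferred, the function above can be adapted. The following is a sketch rather than a verified port: splitext_like, ROOT, and EXT are hypothetical names, and the rules it follows (the extension keeps its dot, leading dots of the last path component belong to the name) reflect my reading of the Python documentation.

```shell
#!/bin/bash
# splitext_like is a hypothetical helper. It sets ROOT and EXT, aiming to
# follow os.path.splitext(): EXT keeps its dot, and leading dots in the
# last path component are never treated as an extension separator.
splitext_like() {
    local path="$1"
    local base="${path##*/}"         # last path component
    local dir="${path%"$base"}"      # directory prefix, possibly empty
    local leading="${base%%[!.]*}"   # run of leading dots, possibly empty
    local rest="${base#"$leading"}"  # the component without those dots
    case "$rest" in
        *.*)
            ROOT="$dir$leading${rest%.*}"
            EXT=".${rest##*.}"
            ;;
        *)
            ROOT="$path"
            EXT=""
            ;;
    esac
}

for p in "a.b.js" ".bashrc" "file." ".." "my.dir/config"; do
    splitext_like "$p"
    printf "%s -> root='%s' ext='%s'\n" "$p" "$ROOT" "$EXT"
done
# a.b.js -> root='a.b' ext='.js'
# .bashrc -> root='.bashrc' ext=''
# file. -> root='file' ext='.'
# .. -> root='..' ext=''
# my.dir/config -> root='my.dir/config' ext=''
```

Because ROOT and EXT always concatenate back to the original path, this variant is lossless, which is convenient when the parts are later reassembled into a new filename.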
Conclusion
Parameter expansion with ${FILE%.*} and ${FILE##*.} is the best approach for extracting filename and extension in Bash when dealing with multiple dots. This method:
- Is the most efficient - No external processes are called
- Handles multiple dots correctly, and covers hidden files and files without extensions once the checks shown earlier are added
- Works in all modern shells (Bash, Zsh, Dash, etc.)
- Maintains readability once you understand the syntax
- Is the most reliable for production scripts
For robust script writing, always include checks for filenames without extensions and hidden files. The parameter expansion approach outperforms alternatives like cut, awk, sed, and rev+cut in performance, reliability, and portability while maintaining excellent readability.
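When dealing with file paths that include directories, strip the directory before applying the expansion, for example BASENAME=$(basename "$FILE") followed by FILENAME="${BASENAME%.*}", to ensure you’re working with just the filename portion.
This approach gives you behavior close to Python’s os.path.splitext() using only built-in shell features, eliminating the need to invoke external interpreters for simple filename parsing tasks. The main differences are that os.path.splitext() keeps the leading dot in the extension and treats a name such as .bashrc as having no extension.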