How can I make POSIX ex portably ignore unmatched patterns in substitute commands under the following restrictions?
POSIX.1-2017 specifies two line-oriented file editing utilities: ed(1) and ex(1). ex is more efficient for noninteractive batch-editing because POSIX requires it to accept one or more file operands, while ed is only required to accept one. This means editing with ex avoids fork(2)ing when used in find(1)'s exec primary.
The issue is that when a search for a BRE fails to find a matching line, it’s considered an error. While this is explicitly stated in ed’s manual, all implementations of ex behave the same way. For example, in ex -sc 's/pattern1/repl1|s/pattern2/repl2|x' file, if pattern1 is not matched in the file, the second substitute command is never attempted.
Given these restrictions:
- Vim’s ‘e’ flag is an extension and not required by POSIX
- Implementations providing multiple -c options cannot be relied upon
- sed -i should never be used to edit files
Is there a portable way to make ex ignore unmatched patterns in substitute commands?
Making POSIX ex Portably Ignore Unmatched Patterns in Substitute Commands
POSIX ex can be made to portably ignore unmatched patterns in substitute commands by using the :global command with the substitute command, ensuring each substitution is attempted independently regardless of whether previous patterns matched. This approach leverages built-in POSIX functionality without relying on implementation-specific extensions or workarounds that violate the given restrictions.
Contents
- Understanding POSIX ex Behavior with Unmatched Patterns
- Using the Global Command Approach
- Shell Scripting Alternatives
- Practical Implementation Examples
- Limitations and Considerations
- Conclusion
Understanding POSIX ex Behavior with Unmatched Patterns
POSIX.1-2017 specifies that when a search for a BRE (Basic Regular Expression) fails to find a matching line, it’s considered an error condition. The ed manual states this explicitly, and ex implementations behave the same way in practice.
The core issue shows up in command sequences like ex -sc "s/pattern1/repl1|s/pattern2/repl2|x" file. If pattern1 doesn’t match, the rest of the command sequence is abandoned, so the substitution for pattern2 is never attempted. This is a real obstacle in batch processing, where some patterns may not exist in every file being processed.
This error-termination behavior differs from what many users expect when performing multiple substitutions, where they’d prefer each substitution to be attempted independently.
Using the Global Command Approach
The most portable solution within POSIX ex is to use the :global command (:g) with the substitute command (:s). This approach relies on the fact that :global doesn’t produce an error when its pattern matches no lines, allowing subsequent commands to execute normally.
Basic Syntax
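The fundamental pattern puts each command on its own line inside the single -c argument (this assumes the implementation accepts newlines there). A global command takes everything up to the end of the line as its command list, so a | placed after it would be absorbed into that list instead of starting a new command:
ex -sc 'g/pattern1/s//replacement1/
g/pattern2/s//replacement2/
x' file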
Key features of this approach:
- g/pattern/ selects all lines matching the pattern and does not fail if no line matches
- s//replacement/ uses the empty pattern //, which repeats the pattern from the preceding :global command
- Each substitution command is processed independently
Why This Works
The :global command in POSIX ex is designed to work with zero matches: it simply doesn’t execute its associated command if no lines match, and it doesn’t terminate the command sequence. This contrasts with the substitute command (:s) on its own, which does terminate the sequence when a pattern fails to match.
# This stops at the first substitute if pattern1 doesn't match:
ex -sc 's/pattern1/replacement1/|s/pattern2/replacement2/|x' file
# This attempts both substitutions regardless of whether the patterns match:
ex -sc 'g/pattern1/s//replacement1/
g/pattern2/s//replacement2/
x' file
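Because a global command treats the rest of its line as its command list, one conservative way to assemble several substitutions is to emit one command per line and pass the result as a single -c argument. The sketch below uses a hypothetical helper named build_ex_script (it is not part of ex) and assumes that patterns and replacements contain no /, |, backslash, or newline characters.

```shell
#!/bin/sh
# Hypothetical helper: print one "g/PATTERN/s//REPLACEMENT/" command per line
# for each PATTERN REPLACEMENT pair of arguments, followed by "x".
# Assumption: arguments contain no "/", "|", backslash, or newline.
build_ex_script() {
    while [ "$#" -ge 2 ]; do
        printf 'g/%s/s//%s/\n' "$1" "$2"
        shift 2
    done
    printf 'x\n'
}

# Usage sketch (commented out so the helper can be sourced without running ex):
# ex -sc "$(build_ex_script pattern1 repl1 pattern2 repl2)" file
```

Command substitution strips the trailing newline, so the -c argument ends with the x command itself.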
Shell Scripting Alternatives
For more complex scenarios, shell scripting can provide additional flexibility while maintaining POSIX compliance. These approaches are particularly useful when processing multiple files or when patterns and replacements need to be dynamically generated.
Simple Loop Approach
#!/bin/sh
file=$1
shift
while [ $# -ge 2 ]; do
    pattern=$1
    replacement=$2
    shift 2
    # The newline ends the global command; "|x" would become part of its command list
    ex -sc "g/$pattern/s//$replacement/
x" "$file"
done
This script accepts a file followed by alternating patterns and replacements, applying each substitution separately.
Pattern File Processing
For scenarios with many substitution patterns:
#!/bin/sh
file=$1
patterns=$2
while IFS= read -r pattern && IFS= read -r replacement; do
    # Redirect stdin so ex cannot consume lines from the pattern file
    ex -sc "g/$pattern/s//$replacement/
x" "$file" < /dev/null
done < "$patterns"
This reads pattern/replacement pairs from a file, allowing for more complex and maintainable substitution rules.
Find Integration
The original question mentioned using ex with find(1)'s exec primary. Here’s how to integrate the pattern-ignoring approach:
find . -name "*.txt" -exec ex -sc 'g/pattern1/s//replacement1/
g/pattern2/s//replacement2/
x' {} +
This processes multiple files efficiently while ensuring unmatched patterns don’t terminate the command sequence.
Practical Implementation Examples
Single File Multiple Substitutions
For processing a single file with multiple substitutions:
ex -sc 'g/OLD1/s//NEW1/
g/OLD2/s//NEW2/
g/OLD3/s//NEW3/
x' document.txt
This will attempt all three substitutions regardless of whether previous patterns matched.
Conditional Replacements with Patterns
When you need to perform different replacements based on different patterns:
#!/bin/sh
file="$1"
# First pass - replace all instances of "foo" with "bar" (the trailing g covers every occurrence on a line)
ex -sc 'g/foo/s//bar/g
x' "$file"
# Second pass - replace all instances of "baz" with "qux"
ex -sc 'g/baz/s//qux/g
x' "$file"
# Third pass - replace all instances of "quux" with "corge"
ex -sc 'g/quux/s//corge/g
x' "$file"
Logging Failed Patterns
For debugging or logging purposes, you can extend the shell script approach to track which patterns didn’t match:
#!/bin/sh
file="$1"
shift
log_file="substitution_log_$(date +%Y%m%d_%H%M%S).log"
echo "Starting substitutions on $(date)" > "$log_file"
while [ $# -ge 2 ]; do
    pattern="$1"
    replacement="$2"
    shift 2
    # Count matching lines before substitution; grep -c prints 0 itself when nothing matches
    match_count=$(grep -c -e "$pattern" "$file" 2>/dev/null)
    if [ "${match_count:-0}" -gt 0 ]; then
        ex -sc "g/$pattern/s//$replacement/
x" "$file"
        echo "Applied: $pattern -> $replacement ($match_count matching lines)" >> "$log_file"
    else
        echo "Skipped: $pattern -> $replacement (no matches found)" >> "$log_file"
    fi
done
echo "Completed substitutions on $(date)" >> "$log_file"
echo "Log saved to: $log_file"
Limitations and Considerations
While the approaches described above work within POSIX constraints, there are some limitations to consider:
Performance Considerations
Processing each substitution separately can be less efficient than a single command pipeline, especially for large files. The performance impact generally becomes noticeable only with:
- Very large files (many MB or GB)
- A large number of substitution patterns (dozens or more)
- Frequent execution in tight loops
For most common use cases, the performance difference is negligible compared to the reliability gained.
Pattern Escaping
When patterns contain special regular expression characters or shell metacharacters, they need proper escaping:
# Escape the BRE metacharacters . [ \ * ^ $ and the / delimiter in the pattern
pattern=$(printf '%s\n' 'special.*chars+' | sed 's|[[\.*^$/]|\\&|g')
# Escape \, / and & in the replacement
replacement=$(printf '%s\n' 'replacement' | sed 's|[\/&]|\\&|g')
ex -sc "g/$pattern/s//$replacement/
x" file
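The escaping step can be wrapped in two small functions so that literal strings are quoted consistently before they are placed into an ex command. This is a minimal sketch: escape_bre and escape_repl are hypothetical names, and the inputs are assumed to be single-line strings without | characters, since | also acts as the ex command separator.

```shell
#!/bin/sh
# Hypothetical helper: make a literal string safe as a BRE between "/" delimiters
# by prefixing . [ \ * ^ $ and / with a backslash.
escape_bre() {
    printf '%s\n' "$1" | sed 's|[[\.*^$/]|\\&|g'
}

# Hypothetical helper: make a literal string safe as a substitute replacement
# by prefixing \ / and & with a backslash.
escape_repl() {
    printf '%s\n' "$1" | sed 's|[\/&]|\\&|g'
}

# Usage sketch (commented out so the helpers can be sourced without running ex):
# ex -sc "g/$(escape_bre 'a.b*c')/s//$(escape_repl 'x&y')/
# x" file
```

Characters such as +, ?, (, ) and { are left alone because they are ordinary characters in a BRE, and preceding them with a backslash would change their meaning in some implementations.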
Atomicity Concerns
Each substitution is a separate operation on the file, which means:
- Intermediate states exist between operations
- If the script terminates partway through, some substitutions may be applied while others are not
- The file is modified between each substitution
For critical operations, consider creating a backup first:
cp file.txt file.txt.backup
ex -sc 'g/pattern1/s//replacement1/
g/pattern2/s//replacement2/
x' file.txt
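Another option, sketched here under stated assumptions, is to edit a working copy and move it over the original only when the editing command succeeds. The wrapper name edit_on_copy is hypothetical, and the sketch assumes the command reports failure through its exit status, which ex implementations may not do consistently. Replacing the file with mv also gives it a new inode, so hard links and ownership may not be preserved.

```shell
#!/bin/sh
# Hypothetical wrapper: edit_on_copy FILE COMMAND [ARGS...]
# Runs COMMAND ARGS... with the path of a working copy appended, and
# replaces FILE with the copy only if COMMAND exits successfully.
edit_on_copy() {
    target=$1
    shift
    tmp="$target.tmp.$$"
    cp -p "$target" "$tmp" || return 1
    if "$@" "$tmp"; then
        mv "$tmp" "$target"
    else
        # Leave the original untouched and discard the working copy
        rm -f "$tmp"
        return 1
    fi
}

# Usage sketch (commented out so the wrapper can be sourced without running ex):
# edit_on_copy file.txt ex -sc 'g/pattern1/s//replacement1/
# x'
```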
Portability Testing
Different Unix implementations may have subtle variations in ex behavior. When writing scripts for distribution:
- Test on multiple platforms (Linux, BSD, Solaris, etc.)
- Use the most basic POSIX features possible
- Avoid implementation-specific optimizations
Conclusion
To make POSIX ex portably ignore unmatched patterns in substitute commands:
- Use the :global command approach - This is the most reliable POSIX-compliant method: write each substitution as g/pattern/s//replacement/ and put each command on its own line, so that each substitution is attempted independently regardless of whether previous patterns matched.
- Leverage shell scripting for complex scenarios - When dealing with multiple files or dynamic patterns, shell scripts that process each substitution separately provide additional flexibility while maintaining POSIX compliance.
- Consider performance and atomicity trade-offs - While the global command approach is reliable, be aware of performance implications and atomicity concerns when processing large files or many patterns.
- Test across different implementations - POSIX ex behavior is generally consistent, but testing on multiple Unix platforms ensures maximum portability for critical scripts.
This approach meets all the specified restrictions: it doesn’t rely on Vim’s ‘e’ flag, doesn’t depend on multiple -c options, and avoids sed -i while providing a portable solution to the problem of unmatched patterns in ex substitute commands.