GLM 4.5 Air

How to Make POSIX ex Ignore Unmatched Patterns

Learn how to make POSIX ex portably ignore unmatched patterns in substitute commands. Discover the global command approach, which relies only on POSIX-specified ex behavior rather than on non-standard extensions.

Question

How can I make POSIX ex portably ignore unmatched patterns in substitute commands under the following restrictions?

POSIX.1-2017 specifies two line-oriented file editing utilities: ed(1) and ex(1). ex is more efficient for noninteractive batch editing because POSIX requires it to accept one or more file operands, while ed is only required to accept one. This means a single ex process can handle many files when used in find(1)'s -exec primary, instead of one fork(2) per file.

The issue is that when a search for a BRE fails to find a matching line, it’s considered an error. While this is explicitly stated in ed’s manual, all implementations of ex behave the same way. For example, in ex -sc 's/pattern1/repl1/|s/pattern2/repl2/|x' file, if pattern1 is not matched in the file, the second substitute command is never attempted.

Given these restrictions:

  1. Vim’s ‘e’ flag is an extension and not required by POSIX
  2. Implementations providing multiple -c options cannot be relied upon
  3. sed -i should never be used to edit files

Is there a portable way to make ex ignore unmatched patterns in substitute commands?

GLM 4.5 Air

Making POSIX ex Portably Ignore Unmatched Patterns in Substitute Commands

POSIX ex can be made to portably ignore unmatched patterns in substitute commands by wrapping each substitution in a :global command (g/pattern/s//replacement/) and giving each :global on its own command line. Every substitution is then attempted independently, regardless of whether previous patterns matched. This approach relies on built-in POSIX functionality, without implementation-specific extensions or workarounds that violate the given restrictions.



Understanding POSIX ex Behavior with Unmatched Patterns

POSIX.1-2017 states explicitly, in its specification of ed, that a search for a BRE (Basic Regular Expression) that fails to find a matching line is an error. The ex specification is less explicit on this point, but all implementations of ex behave the same way.

The problem shows up in |-separated command sequences such as ex -sc 's/pattern1/repl1/|s/pattern2/repl2/|x' file. If pattern1 does not match, the rest of the command line is abandoned: the second substitution is never attempted and x never writes the file. (Without an address, :s examines only the current line; %s examines every line but still fails when no line matches.) This is a real obstacle in batch processing, where a given pattern may occur in some files and not in others.

This error-termination behavior differs from what many users expect when performing multiple substitutions, where they’d prefer each substitution to be attempted independently.


Using the Global Command Approach

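The most portable solution within POSIX ex is to wrap each substitution (:s) in a :global command (:g). A :global whose pattern matches no line is not treated as an error, so the commands that follow it still run. One parsing rule matters here: :global takes everything after its pattern on the same line, including any |, as its own command list. Each :global must therefore be the last command on its line, so several of them have to be given on separate lines, for example by feeding the commands to ex -s on standard input rather than through -c.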

Basic Syntax

The fundamental pattern is:

printf '%s\n' 'g/pattern1/s//replacement1/' 'g/pattern2/s//replacement2/' x | ex -s file

Key features of this approach:

  • g/pattern/ searches for all lines matching the pattern without failing if no matches exist
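  • s//replacement/ uses the empty pattern //, which reuses the last regular expression, here the one from the enclosing :global command; append the g flag (s//replacement/g) to replace every occurrence on a line rather than only the first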
  • Each substitution command is processed independently
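Since :global treats the rest of its line as its command list, a script for several substitutions is just one :global per line followed by x, which makes it easy to generate. The following is a minimal sketch, not part of any standard tool: ex_subst_script is a hypothetical shell function, and it assumes each pattern and replacement is already escaped for use between / delimiters.

```shell
#!/bin/sh
# Hypothetical helper: print an ex script with one :global line per
# pattern/replacement pair, followed by x (write if modified, then exit).
# Arguments are assumed to be escaped already (no unescaped / inside).
ex_subst_script() {
  while [ "$#" -ge 2 ]; do
    # One :global per line: :global would swallow a following |command.
    printf 'g/%s/s//%s/\n' "$1" "$2"
    shift 2
  done
  printf 'x\n'
}
```

For example, ex_subst_script OLD1 NEW1 OLD2 NEW2 | ex -s document.txt would attempt both substitutions in a single ex run and then write the file.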

Why This Works

A :global command whose pattern matches no line simply runs its commands zero times; it does not terminate the command sequence, so the commands on the following lines still run. This contrasts with a bare substitute command (:s), which does terminate the sequence when its pattern fails to match.

bash
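# If pattern1 matches no line, the rest of this command line, including x, is skipped:
ex -s -c '%s/pattern1/replacement1/|%s/pattern2/replacement2/|x' file

# Every substitution is attempted; each :global sits on its own line
# because :global takes the rest of its line as its command list:
ex -s file <<'EOF'
g/pattern1/s//replacement1/
g/pattern2/s//replacement2/
x
EOF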

Shell Scripting Alternatives

For more complex scenarios, shell scripting can provide additional flexibility while maintaining POSIX compliance. These approaches are particularly useful when processing multiple files or when patterns and replacements need to be dynamically generated.

Simple Loop Approach

bash
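#!/bin/sh
# Usage: script file pattern1 replacement1 [pattern2 replacement2 ...]

file=$1
shift

while [ $# -ge 2 ]; do
  pattern=$1
  replacement=$2
  shift 2

  # :global takes the rest of its line as its command list,
  # so x must be on a separate line of the ex script
  printf '%s\n' "g/$pattern/s//$replacement/" x | ex -s "$file"
done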

This script accepts a file followed by alternating patterns and replacements, applying each substitution separately.

Pattern File Processing

For scenarios with many substitution patterns:

bash
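#!/bin/sh
# Usage: script file patterns-file
# patterns-file holds alternating lines: a pattern, then its replacement

file=$1
patterns=$2

while IFS= read -r pattern && IFS= read -r replacement; do
  # ex reads its commands from the pipe, so it cannot consume
  # the loop's standard input (the patterns file)
  printf '%s\n' "g/$pattern/s//$replacement/" x | ex -s "$file"
done < "$patterns"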

This reads pattern/replacement pairs from a file, allowing for more complex and maintainable substitution rules.
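The loop above starts one ex process, and rewrites the file once, per pair. A possible alternative, sketched here under the same alternating-line assumption, is to turn the whole pairs file into a single ex script; ex_script_from_pairs is a hypothetical name, and patterns and replacements are assumed to be pre-escaped.

```shell
#!/bin/sh
# Hypothetical helper: read alternating pattern/replacement lines from
# standard input and print one ex script covering all of them.
ex_script_from_pairs() {
  while IFS= read -r pattern && IFS= read -r replacement; do
    # Each :global on its own line, so none of them swallows the next
    printf 'g/%s/s//%s/\n' "$pattern" "$replacement"
  done
  # Write the buffer once (if modified) and exit
  printf 'x\n'
}
```

It would be used as ex_script_from_pairs < "$patterns" | ex -s "$file". As in the loop, a trailing pattern with no replacement line is ignored, and a final replacement line without a terminating newline is dropped because read returns a non-zero status for it.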

Find Integration

The original question mentioned using ex with find(1)'s -exec primary. Here’s how to integrate the pattern-ignoring approach:

bash
find . -name '*.txt' -exec sh -c 'for f do printf "%s\n" "g/pattern1/s//replacement1/" "g/pattern2/s//replacement2/" x | ex -s "$f"; done' sh {} +

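This starts one shell per batch of files and one ex process per file, and unmatched patterns do not terminate the command sequence. Handing the files directly to a single ex -c invocation would not work: POSIX executes the -c command only in the first edit buffer loaded, and :global would also absorb a trailing |x into its own command list.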
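To keep a single ex process per batch, as the question intends, one option is to repeat the :global lines for every file and move through ex's argument list with w and n (next), ending with x on the last file. The sketch below assumes that behavior of the POSIX w, next and xit commands, and pre-escaped arguments; emit_globals and ex_multi_file_script are hypothetical helpers, so test the generated script on your implementations first.

```shell
#!/bin/sh
# Sketch: build an ex script that edits COUNT files in one ex run.
# Usage: ex_multi_file_script COUNT PATTERN1 REPLACEMENT1 [PATTERN2 REPLACEMENT2 ...]
# Patterns and replacements are assumed to be escaped already.

# Print one :global line per pattern/replacement pair.
emit_globals() {
  while [ "$#" -ge 2 ]; do
    printf 'g/%s/s//%s/\n' "$1" "$2"
    shift 2
  done
}

ex_multi_file_script() {
  emfs_left=$1    # files still to be edited
  shift
  while [ "$emfs_left" -gt 0 ]; do
    emit_globals "$@"
    emfs_left=$((emfs_left - 1))
    if [ "$emfs_left" -gt 0 ]; then
      # Save this file, then load the next one from ex's argument list.
      printf 'w\nn\n'
    else
      # Last file: write it if modified, then exit.
      printf 'x\n'
    fi
  done
}
```

Inside a find ... -exec sh -c '...' sh {} + script that defines these functions, ex_multi_file_script "$#" pattern1 replacement1 pattern2 replacement2 | ex -s "$@" would edit the whole batch with one ex process. Any error, such as a malformed pattern, ends the run for the remaining files of that batch, so check ex's exit status.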


Practical Implementation Examples

Single File Multiple Substitutions

For processing a single file with multiple substitutions:

bash
printf '%s\n' 'g/OLD1/s//NEW1/' 'g/OLD2/s//NEW2/' 'g/OLD3/s//NEW3/' x | ex -s document.txt

This will attempt all three substitutions regardless of whether previous patterns matched.

Conditional Replacements with Patterns

When you need to perform different replacements based on different patterns:

bash
#!/bin/sh

file="$1"

# First pass - replace all instances of "foo" with "bar"
# (the trailing g flag makes :s replace every occurrence on a line)
printf '%s\n' 'g/foo/s//bar/g' x | ex -s "$file"

# Second pass - replace all instances of "baz" with "qux"
printf '%s\n' 'g/baz/s//qux/g' x | ex -s "$file"

# Third pass - replace all instances of "quux" with "corge"
printf '%s\n' 'g/quux/s//corge/g' x | ex -s "$file"

Logging Failed Patterns

For debugging or logging purposes, you can extend the shell script approach to track which patterns didn’t match:

bash
#!/bin/sh

file="$1"
shift
log_file="substitution_log_$(date +%Y%m%d_%H%M%S).log"

echo "Starting substitutions on $(date)" > "$log_file"

while [ $# -ge 2 ]; do
  pattern="$1"
  replacement="$2"
  shift 2
  
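  # Count matching lines before substitution (grep -c prints 0 when
  # nothing matches; an unreadable file gives no output, treated as 0)
  match_count=$(grep -c -e "$pattern" -- "$file" 2>/dev/null)
  match_count=${match_count:-0}

  if [ "$match_count" -gt 0 ]; then
    printf '%s\n' "g/$pattern/s//$replacement/" x | ex -s "$file"
    echo "Applied: $pattern -> $replacement ($match_count matching lines)" >> "$log_file"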
  else
    echo "Skipped: $pattern -> $replacement (no matches found)" >> "$log_file"
  fi
done

echo "Completed substitutions on $(date)" >> "$log_file"
echo "Log saved to: $log_file"

Limitations and Considerations

While the approaches described above work within POSIX constraints, there are some limitations to consider:

Performance Considerations

Running each substitution as a separate ex invocation, as the shell loops above do, can be less efficient than a single ex run that applies all of them, because the file is read and written once per substitution. The performance impact generally becomes noticeable only with:

  • Very large files (many MB or GB)
  • A large number of substitution patterns (dozens or more)
  • Frequent execution in tight loops

For most common use cases, the performance difference is negligible compared to the reliability gained.

Pattern Escaping

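When patterns contain special regular expression characters or shell metacharacters, they need proper escaping. ex patterns are basic regular expressions, so the characters to escape are . [ \ * ^ $, plus the / delimiter and, under ex's default magic option, ~. Characters such as + ( ) ? { are literal in a BRE, and backslash-escaping them can turn them into operators (for example, \( starts a group):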

bash
# Escape BRE metacharacters, the / delimiter and ex's ~ in the pattern
pattern=$(printf '%s\n' 'special.*chars+' | sed 's|[[\.*^$/~]|\\&|g')
# Escape \, the / delimiter, & and ~ in the replacement
replacement=$(printf '%s\n' 'replacement' | sed 's|[\/&~]|\\&|g')

printf '%s\n' "g/$pattern/s//$replacement/" x | ex -s file
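The same escaping can be wrapped in reusable functions. This is a sketch that assumes ex's default magic option (under which ~ is special) and single-line inputs; escape_pattern and escape_replacement are hypothetical names. The sed expressions use | as their delimiter so that / can appear inside the bracket expressions.

```shell
#!/bin/sh
# Hypothetical escaping helpers for the g/PATTERN/s//REPLACEMENT/ form.

# Backslash-escape BRE metacharacters (. [ \ * ^ $), the / delimiter and ~.
escape_pattern() {
  printf '%s\n' "$1" | sed 's|[[\.*^$/~]|\\&|g'
}

# Backslash-escape \, the / delimiter, & (the matched text) and ~.
escape_replacement() {
  printf '%s\n' "$1" | sed 's|[\/&~]|\\&|g'
}
```

A typical call would be printf '%s\n' "g/$(escape_pattern "$p")/s//$(escape_replacement "$r")/" x | ex -s file, where p and r hold the literal search text and replacement.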

Atomicity Concerns

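When substitutions run as separate ex invocations, as in the shell loops above, each one is a separate operation on the file (a single ex run that ends with x writes the file only once), which means: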

  • Intermediate states exist between operations
  • If the script terminates partway through, some substitutions may be applied while others are not
  • The file is modified between each substitution

For critical operations, consider creating a backup first:

bash
cp file.txt file.txt.backup
printf '%s\n' 'g/pattern1/s//replacement1/' 'g/pattern2/s//replacement2/' x | ex -s file.txt
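The backup can also be restored automatically when the edit fails. In this sketch, with_backup is a hypothetical function: it copies the file, runs the given command, and copies the original back if the command exits with a non-zero status. Restoring with cp rewrites the existing file rather than replacing it, so hard links to it are kept.

```shell
#!/bin/sh
# Hypothetical helper: run an editing command on FILE and restore the
# original contents if the command fails.
# Usage: with_backup FILE COMMAND [ARG...]
with_backup() {
  wb_file=$1
  shift
  # Keep a copy (with permissions and timestamps) next to the file.
  cp -p -- "$wb_file" "$wb_file.backup" || return
  if "$@"; then
    # The edit succeeded, so the backup is no longer needed.
    rm -f -- "$wb_file.backup"
  else
    wb_status=$?
    # Copy the original back over the same file, then drop the backup.
    cp -p -- "$wb_file.backup" "$wb_file" && rm -f -- "$wb_file.backup"
    return "$wb_status"
  fi
}
```

For example, printf '%s\n' 'g/pattern1/s//replacement1/' x | with_backup file.txt ex -s file.txt passes the script through to ex on standard input. Because POSIX sh functions share variables with the caller, the helper uses wb_-prefixed names to avoid clobbering the script's own variables.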

Portability Testing

Different Unix implementations may have subtle variations in ex behavior. When writing scripts for distribution:

  1. Test on multiple platforms (Linux, BSD, Solaris, etc.)
  2. Use the most basic POSIX features possible
  3. Avoid implementation-specific optimizations

Conclusion

To make POSIX ex portably ignore unmatched patterns in substitute commands:

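  1. Use the :global command approach - This is the most reliable POSIX-compliant method. Put each :global on its own line and feed the commands to ex on standard input, as in printf '%s\n' 'g/pattern1/s//replacement1/' 'g/pattern2/s//replacement2/' x | ex -s file, because :global treats the rest of its line, including |, as its own command list. Each substitution is then attempted independently regardless of whether previous patterns matched.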

  2. Leverage shell scripting for complex scenarios - When dealing with multiple files or dynamic patterns, shell scripts that process each substitution separately provide additional flexibility while maintaining POSIX compliance.

  3. Consider performance and atomicity trade-offs - While the global command approach is reliable, be aware of performance implications and atomicity concerns when processing large files or many patterns.

  4. Test across different implementations - POSIX ex behavior is generally consistent, but testing on multiple Unix platforms ensures maximum portability for critical scripts.

This approach meets all the specified restrictions: it doesn’t rely on Vim’s ‘e’ flag, doesn’t depend on multiple -c options, and avoids sed -i while providing a portable solution to the problem of unmatched patterns in ex substitute commands.