How to Filter JSON Objects with jq to Exclude Substrings
Learn to use jq to filter JSON objects and exclude those containing specific substrings in attributes like 'fruits'. Fix common issues with any() and select() for effective JSON processing and data filtering.
How do I filter JSON objects to exclude those containing specific substrings using jq? I need to filter objects in a list based on whether their ‘fruits’ attribute contains any of the blacklisted substrings, but my current approach with any() isn’t working correctly.
To filter JSON objects with jq and exclude those containing specific substrings in their ‘fruits’ attribute, combine select(), any(), and a string-matching function such as contains() or test(). The usual mistake with any() is forgetting to negate it. On its own, select(... any ...) keeps the objects that do contain a blacklisted substring. For exclusion you have to invert the result, either with | not or by comparing it to false.
Contents
- Understanding the Problem with jq Filtering
- Correct Approach Using select() and any()
- Blacklisted Substring Filtering Technique
- Practical Examples
- Common Pitfalls and Solutions
- Advanced Filtering Patterns
Understanding the Problem with jq Filtering
When working with JSON objects in jq, filtering on string content means combining a string test with a boolean check. The trouble usually starts when any() is used directly, without negating its result, in a filter that is supposed to exclude matches.
The official jq documentation describes any as taking an array of booleans and returning true if any element is true, and false for an empty array. The any(condition) form applies the condition to each element first. To exclude objects that contain a blacklisted substring you need the opposite: keep an object only when that result is false.
Let’s consider a typical JSON structure where you have an array of objects, each containing a ‘fruits’ attribute with an array of fruit names:
[
{"name": "Alice", "fruits": ["apple", "banana", "cherry"]},
{"name": "Bob", "fruits": ["orange", "grape", "lemon"]},
{"name": "Charlie", "fruits": ["pear", "blackberry", "kiwi"]}
]
Your goal might be to exclude any object whose ‘fruits’ array contains “banana” or “lemon”.
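A minimal end-to-end sketch of this goal, assuming jq is on the PATH and the sample array is inlined as a shell string (the variable names data and kept are only for illustration). It uses the map()/any == false pattern covered in the next section. Only Charlie's object should survive, since Alice's fruits include "banana" and Bob's include "lemon".

```shell
# Sample data from above, inlined so the snippet runs on its own
data='[
  {"name": "Alice", "fruits": ["apple", "banana", "cherry"]},
  {"name": "Bob", "fruits": ["orange", "grape", "lemon"]},
  {"name": "Charlie", "fruits": ["pear", "blackberry", "kiwi"]}
]'

# Drop every object whose fruits contain "banana" or "lemon";
# -c prints each surviving object on a single line
kept=$(printf '%s\n' "$data" | jq -c '.[] | select(.fruits | map(test("banana|lemon")) | any == false)')
printf '%s\n' "$kept"
```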
Correct Approach Using select() and any()
The proper approach involves several jq functions working together:
- select() - Filters objects based on a condition
- any() - Checks if any element in an array matches
- String matching functions - contains(), test(), etc.
Here’s the fundamental pattern:
jq '.[] | select(.fruits | map(contains("substring")) | any == false)'
Breaking this down:
- .[] iterates through each object in the array
- select() keeps only objects that meet the condition
- .fruits accesses the fruits array
- map(contains("substring")) creates an array of booleans, one per fruit
- any == false checks that no element contains the substring
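To see what each stage produces, the sketch below prints the boolean array from map(contains(...)) next to the final verdict for each object. It assumes the Alice/Bob/Charlie sample data and uses "banana" as the substring. The flags and keep keys are made up purely for display.

```shell
data='[
  {"name": "Alice", "fruits": ["apple", "banana", "cherry"]},
  {"name": "Bob", "fruits": ["orange", "grape", "lemon"]},
  {"name": "Charlie", "fruits": ["pear", "blackberry", "kiwi"]}
]'

# For each object: the per-fruit booleans, and whether select() would keep it
stages=$(printf '%s\n' "$data" | jq -c '.[] | {
  name,
  flags: (.fruits | map(contains("banana"))),
  keep: (.fruits | map(contains("banana")) | any == false)
}')
printf '%s\n' "$stages"
```

Only Alice's flags contain a true, so hers is the only object with keep set to false.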
Blacklisted Substring Filtering Technique
For filtering against multiple blacklisted substrings, you can use the test() function with a regex pattern. This is more concise than checking each substring with its own contains() call.
According to the JSON manipulation tutorial, you can create a regex pattern that combines all blacklisted substrings using the pipe | operator as an OR condition:
jq '.[] | select(.fruits | map(test("blacklist1|blacklist2|blacklist3")) | any == false)'
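This approach builds a single regex that matches any of the blacklisted substrings. test() returns true for each string in the fruits array that matches the pattern anywhere in the string, and any == false keeps only objects with no matches. Because the pattern is a regular expression, characters such as . or ( in a blacklisted term have special meaning and must be escaped if they should match literally.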
For example, to exclude objects containing “banana”, “lemon”, or “grape”:
jq '.[] | select(.fruits | map(test("banana|lemon|grape")) | any == false)'
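Since test() searches for the pattern anywhere in each string, an unanchored alternation also catches longer names that merely contain a term. The sketch below, using a hypothetical list of fruit names, shows "grapefruit" matching "grape". Anchoring the alternation with ^(...)$ is one option when only whole names should count.

```shell
# Hypothetical fruit names chosen to contrast substring and whole-string matching
fruits='["banana", "grape", "grapefruit", "pineapple"]'

# Unanchored: a term may appear anywhere inside the string
substr=$(printf '%s\n' "$fruits" | jq -c 'map(test("banana|lemon|grape"))')
# Anchored: the whole string must be exactly one of the terms
whole=$(printf '%s\n' "$fruits" | jq -c 'map(test("^(banana|lemon|grape)$"))')
printf '%s\n%s\n' "$substr" "$whole"
```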
Practical Examples
Let’s work through a complete example with sample data:
[
{"id": 1, "fruits": ["apple", "banana", "cherry"]},
{"id": 2, "fruits": ["orange", "grape", "lemon"]},
{"id": 3, "fruits": ["pear", "blackberry", "kiwi"]},
{"id": 4, "fruits": ["strawberry", "raspberry", "blueberry"]},
{"id": 5, "fruits": []}
]
To exclude objects containing “berry” or “banana”:
cat data.json | jq '.[] | select(.fruits | map(test("berry|banana")) | any == false)'
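This returns the objects with IDs 2 and 5. Object 1 is excluded because it contains “banana”, and objects 3 and 4 because they contain “berry”. Object 5 is kept because its empty array has nothing to match.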
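A self-contained version of this example, assuming the data is written to a temporary file that stands in for data.json. Wrapping the filter in [...] and projecting .id collects just the surviving IDs, which makes the result easy to check.

```shell
# Write the sample data to a temporary file (stands in for data.json)
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
[
  {"id": 1, "fruits": ["apple", "banana", "cherry"]},
  {"id": 2, "fruits": ["orange", "grape", "lemon"]},
  {"id": 3, "fruits": ["pear", "blackberry", "kiwi"]},
  {"id": 4, "fruits": ["strawberry", "raspberry", "blueberry"]},
  {"id": 5, "fruits": []}
]
EOF

# IDs of objects where no fruit matches "berry" or "banana"
ids=$(jq -c '[.[] | select(.fruits | map(test("berry|banana")) | any == false) | .id]' "$tmp")
rm -f "$tmp"
printf '%s\n' "$ids"
```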
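The jq community discussion shows that map_values() can do similar filtering on an object whose values are arrays: select() runs on each value, and in current jq releases a key whose value is not selected is dropped from the result. Here the argument 10,20 | [.] produces [10] and [20], so contains() checks each in turn, and the keys whose arrays contain 10 or 20 (foo, bar and baz) are kept: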
echo '{"foo":[10],"bar":[20],"baz":[10,20],"qux":[30],"quux":[]}' | jq -c 'map_values(select(contains(10,20 | [.])))'
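The community example is an include filter. A sketch of its exclusion counterpart negates the check instead, assuming jq 1.6 or later, where an update through |= (which map_values uses) deletes a key when the filter produces no output. With the same object, only qux and quux should remain.

```shell
# Same object as the community example: each value is an array of numbers
obj='{"foo":[10],"bar":[20],"baz":[10,20],"qux":[30],"quux":[]}'

# Keep only entries whose array contains neither 10 nor 20;
# select() emits nothing for the others, so map_values drops those keys
rest=$(printf '%s\n' "$obj" | jq -c 'map_values(select(any(. == 10 or . == 20) | not))')
printf '%s\n' "$rest"

# Remaining key names (jq sorts the output of keys)
names=$(printf '%s\n' "$rest" | jq -c 'keys')
```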
Common Pitfalls and Solutions
1. Incorrect boolean evaluation
Problem: Using any() without negating it when you want exclusion
# Incorrect - this keeps only objects WITH a match, i.e. it excludes objects with NO matches
jq '.[] | select(.fruits | any(contains("banana")))'
Solution: Negate the result for exclusion filtering, with not or the equivalent == false comparison
# Correct - excludes objects WITH matches
jq '.[] | select(.fruits | any(contains("banana")) | not)'
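Running both versions side by side on the Alice/Bob/Charlie sample data makes the inversion visible. This sketch projects .name so each result is a short list.

```shell
data='[
  {"name": "Alice", "fruits": ["apple", "banana", "cherry"]},
  {"name": "Bob", "fruits": ["orange", "grape", "lemon"]},
  {"name": "Charlie", "fruits": ["pear", "blackberry", "kiwi"]}
]'

# any() alone: keeps the objects that DO contain "banana"
wrong=$(printf '%s\n' "$data" | jq -c '[.[] | select(.fruits | any(contains("banana"))) | .name]')
# Negated with not: keeps the objects that do NOT contain it
right=$(printf '%s\n' "$data" | jq -c '[.[] | select(.fruits | any(contains("banana")) | not) | .name]')
printf '%s\n%s\n' "$wrong" "$right"
```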
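2. Calling contains() on the array instead of its strings
Problem: Calling contains() with a string argument directly on the fruits array
# Fails at runtime: jq reports that an array and a string cannot have their containment checked
jq '.[] | select(.fruits | contains("banana"))'
Solution: Apply contains() to each string with map(), then negate with any == false
# Works: excludes objects with a fruit containing "banana"
jq '.[] | select(.fruits | map(contains("banana")) | any == false)'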
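A small check of both commands on a two-object sample. The sketch relies only on jq exiting with a non-zero status on the type error; it does not depend on the exact wording of the error message.

```shell
data='[
  {"name": "Alice", "fruits": ["apple", "banana", "cherry"]},
  {"name": "Bob", "fruits": ["orange", "grape", "lemon"]}
]'

# contains() with a string argument on an array is a type error, so jq fails
if printf '%s\n' "$data" | jq '.[] | select(.fruits | contains("banana"))' 2>/dev/null; then
  status=ok
else
  status=error
fi

# Applying contains() to each string works and excludes Alice
fixed=$(printf '%s\n' "$data" | jq -c '[.[] | select(.fruits | map(contains("banana")) | any == false) | .name]')
printf '%s %s\n' "$status" "$fixed"
```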
3. Empty array handling
Problem: Objects with empty fruits arrays always pass the filter
# An empty fruits array contains no blacklisted substrings, so any returns false
# and the object is kept, which may not be what you want
Solution: Add a length check if empty arrays should be excluded as well
# Exclude empty arrays AND blacklisted substrings
jq '.[] | select(.fruits | length > 0 and (map(test("blacklist")) | any == false))'
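A quick comparison on a hypothetical three-object array, with "banana" standing in for the blacklist pattern. The basic filter keeps the empty-array object; the version with the length guard drops it.

```shell
# Hypothetical data: one blacklisted fruit, one allowed fruit, one empty array
data='[
  {"id": 1, "fruits": ["banana"]},
  {"id": 2, "fruits": ["kiwi"]},
  {"id": 3, "fruits": []}
]'

# Basic filter: object 3 has nothing to match, so it passes
basic=$(printf '%s\n' "$data" | jq -c '[.[] | select(.fruits | map(test("banana")) | any == false) | .id]')
# With length > 0, empty arrays are excluded as well
strict=$(printf '%s\n' "$data" | jq -c '[.[] | select(.fruits | length > 0 and (map(test("banana")) | any == false)) | .id]')
printf '%s\n%s\n' "$basic" "$strict"
```

The parentheses around the map(...) | any == false part matter: and binds more tightly than |, so without them the pipe would apply to the result of the and expression.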
Advanced Filtering Patterns
For more complex filtering scenarios, you can combine multiple conditions:
1. Case-insensitive matching
jq '.[] | select(.fruits | map(ascii_downcase | contains("banana")) | any == false)'
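A sketch with capitalized fruit names (hypothetical data) comparing the ascii_downcase approach with test() and its "i" flag, which makes the regex case-insensitive. Both should exclude Alice. Note that ascii_downcase only converts the ASCII letters A-Z.

```shell
# Hypothetical data with mixed-case fruit names
data='[
  {"name": "Alice", "fruits": ["Apple", "BANANA"]},
  {"name": "Bob", "fruits": ["Kiwi"]}
]'

# Lower-case each fruit before the substring check
lower=$(printf '%s\n' "$data" | jq -c '[.[] | select(.fruits | map(ascii_downcase | contains("banana")) | any == false) | .name]')
# Alternative: test() with the "i" (case-insensitive) regex flag
flag=$(printf '%s\n' "$data" | jq -c '[.[] | select(.fruits | map(test("banana"; "i")) | any == false) | .name]')
printf '%s\n%s\n' "$lower" "$flag"
```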
2. Multiple attributes filtering
jq '.[] | select((.fruits | map(test("blacklist")) | any == false) and (.name | test("allowed_pattern")))'
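The same pattern with the placeholders filled in. In this sketch "banana|lemon" replaces blacklist and "^C" (names starting with C) replaces allowed_pattern; both values and the data are hypothetical. An object must pass both checks to be kept.

```shell
data='[
  {"name": "Alice", "fruits": ["apple", "banana"]},
  {"name": "Charlie", "fruits": ["pear", "kiwi"]},
  {"name": "Dave", "fruits": ["pear", "kiwi"]}
]'

# Keep objects with no blacklisted fruit AND a name starting with "C"
names=$(printf '%s\n' "$data" | jq -c '[.[] | select((.fruits | map(test("banana|lemon")) | any == false) and (.name | test("^C"))) | .name]')
printf '%s\n' "$names"
```

Alice fails the fruit check and Dave fails the name check, so only Charlie remains.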
3. Using variables for blacklisted terms
BLACKLIST="banana|lemon|grape"
jq --arg blacklist "$BLACKLIST" '.[] | select(.fruits | map(test($blacklist)) | any == false)'
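Because the --arg value is passed to test() as a regular expression, a term containing characters such as . or ( would be read as regex syntax. One alternative, sketched here under the assumption that the terms are meant as literal substrings, passes them as a JSON array with --argjson and checks each with inside(), jq's reversed form of contains(). This swaps the regex approach for plain substring checks; the blacklist name is illustrative.

```shell
data='[
  {"name": "Alice", "fruits": ["apple", "banana", "cherry"]},
  {"name": "Bob", "fruits": ["orange", "grape", "lemon"]},
  {"name": "Charlie", "fruits": ["pear", "blackberry", "kiwi"]}
]'

# The blacklist as a JSON array of literal substrings (no regex escaping needed);
# a fruit is blacklisted if any term is a substring of it
kept=$(printf '%s\n' "$data" | jq -c --argjson blacklist '["banana", "lemon"]' '
  [.[]
   | select(.fruits | any(. as $fruit | any($blacklist[]; inside($fruit))) | not)
   | .name]')
printf '%s\n' "$kept"
```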
4. Filtering with custom functions
# The def has to be part of the same jq program as the filter that calls it
jq 'def exclude_blacklisted($blacklist):
      .fruits | map(test($blacklist)) | any == false;
    .[] | select(exclude_blacklisted("banana|lemon"))'
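In a script it can be tidier to keep the definition and the main filter in a program file and load it with jq's -f option. A sketch, assuming a temporary file in place of a real .jq file and the Alice/Bob/Charlie sample data:

```shell
# Put the function definition and the main filter in one jq program file
prog=$(mktemp)
cat > "$prog" <<'EOF'
# True when no fruit matches the blacklist regex
def exclude_blacklisted($blacklist):
  .fruits | map(test($blacklist)) | any == false;

[.[] | select(exclude_blacklisted("banana|lemon")) | .name]
EOF

data='[
  {"name": "Alice", "fruits": ["apple", "banana", "cherry"]},
  {"name": "Bob", "fruits": ["orange", "grape", "lemon"]},
  {"name": "Charlie", "fruits": ["pear", "blackberry", "kiwi"]}
]'

# -f reads the program from the file; the JSON input still comes from stdin
kept=$(printf '%s\n' "$data" | jq -c -f "$prog")
rm -f "$prog"
printf '%s\n' "$kept"
```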
Sources
- jq 1.8 Manual - Official documentation with detailed explanations of filtering functions and common pitfalls
- JSON Manipulation with jq Filter - Examples - Practical tutorial with working code examples for substring filtering
- How to filter a array contains any of the giving elements ? - Community discussion demonstrating advanced filtering patterns
Conclusion
Filtering JSON objects with jq to exclude those containing specific substrings comes down to combining select(), any(), and a string-matching function correctly. The key point is that exclusion requires negating any(), either with | not or with == false, rather than using any() alone. With test() and a regex alternation, a single pattern covers several blacklisted substrings at once. Objects with empty arrays pass such a filter by default, so add a length check if they should be excluded as well.