How to Filter JSON Objects with jq to Exclude Substrings

Learn to use jq to filter JSON objects and exclude those containing specific substrings in attributes like 'fruits'. Fix common issues with any() and select() for effective JSON processing and data filtering.

How do I filter JSON objects to exclude those containing specific substrings using jq? I need to filter objects in a list based on whether their ‘fruits’ attribute contains any of the blacklisted substrings, but my current approach with any() isn’t working correctly.

To filter JSON objects using jq and exclude those containing specific substrings in their ‘fruits’ attribute, you need to combine select(), any(), and string matching functions. The common mistake with any() is leaving its result un-negated: any() is true for objects that match the blacklist, so exclusion filtering has to invert it (with any == false or any | not).

Understanding the Problem with jq Filtering

When working with JSON objects in jq, filtering based on string content requires understanding how jq handles string matching and boolean operations. The challenge arises because many developers try to use any() directly without properly negating the result for exclusion filtering.

The official jq documentation explains that any() returns true if at least one element in an array matches the condition. When you want to exclude objects that contain blacklisted substrings, you need to invert this result, either by comparing it to false (any == false) or by piping it through not (any | not).
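As a quick check of that behavior, the sketch below evaluates any on a few literal boolean arrays. It assumes only that jq is installed; the variable name any_demo is illustrative.

```shell
# Evaluate any on literal boolean arrays; -n tells jq not to read any input.
any_demo=$(jq -n -c '[
  ([false, true]  | any),
  ([false, false] | any),
  ([]             | any),
  ([false, false] | any | not)
]')
echo "$any_demo"   # [true,false,false,true]
```

Note that any is false for an empty array, which is why objects with no fruits at all pass an exclusion filter.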

Let’s consider a typical JSON structure where you have an array of objects, each containing a ‘fruits’ attribute with an array of fruit names:

json
[
  {"name": "Alice", "fruits": ["apple", "banana", "cherry"]},
  {"name": "Bob", "fruits": ["orange", "grape", "lemon"]},
  {"name": "Charlie", "fruits": ["pear", "blackberry", "kiwi"]}
]

Your goal might be to exclude any object whose ‘fruits’ array contains “banana” or “lemon”.

Correct Approach Using select() and any()

The proper approach involves several jq functions working together:

  1. select() - Filters objects based on a condition
  2. any() - Checks if any element in an array matches
  3. String matching functions - contains(), test(), etc.

Here’s the fundamental pattern:

bash
jq '.[] | select(.fruits | map(contains("substring")) | any == false)'

Breaking this down:

  • .[] iterates through each object in the array
  • select() keeps only objects that meet the condition
  • .fruits accesses the fruits array
  • map(contains("substring")) creates an array of booleans
  • any == false checks that no elements contain the substring
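To make the stages above visible, this sketch prints the intermediate boolean array and the final keep decision for each object. The two-object input and the field names flags and keep are illustrative assumptions, not part of the original question.

```shell
# Two sample objects, inlined so the snippet is self-contained.
data='[{"name":"Alice","fruits":["apple","banana","cherry"]},{"name":"Bob","fruits":["orange","grape","lemon"]}]'

# For each object, show the booleans produced by map(contains(...))
# and the value that select() would receive.
stages=$(printf '%s' "$data" | jq -c '
  .[]
  | { name,
      flags: (.fruits | map(contains("banana"))),
      keep:  (.fruits | map(contains("banana")) | any == false) }')
echo "$stages"
# {"name":"Alice","flags":[false,true,false],"keep":false}
# {"name":"Bob","flags":[false,false,false],"keep":true}
```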

Blacklisted Substring Filtering Technique

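For filtering against multiple blacklisted substrings, you can use the test() function with a regex pattern. This is more concise than checking each substring individually. Keep in mind that test() interprets its argument as a regular expression, so blacklisted terms containing characters such as . or + must be escaped.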

According to the JSON manipulation tutorial, you can create a regex pattern that combines all blacklisted substrings using the regex alternation operator | (not to be confused with the jq pipe), which acts as an OR condition:

bash
jq '.[] | select(.fruits | map(test("blacklist1|blacklist2|blacklist3")) | any == false)'

This approach creates a regex that matches any of the blacklisted substrings. The test() function returns true for each string in the fruits array that matches the pattern, and any == false ensures only objects with no matches are kept.

For example, to exclude objects containing “banana”, “lemon”, or “grape”:

bash
jq '.[] | select(.fruits | map(test("banana|lemon|grape")) | any == false)'
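If the blacklisted terms are plain literals rather than patterns, one possible alternative is to pass them as a JSON array and match with contains(), which compares strings literally. This is only a sketch: the blacklist values and the variable names are assumptions chosen to match the example above.

```shell
# Sample data from the first example, inlined so the snippet is self-contained.
data='[
  {"name": "Alice", "fruits": ["apple", "banana", "cherry"]},
  {"name": "Bob", "fruits": ["orange", "grape", "lemon"]},
  {"name": "Charlie", "fruits": ["pear", "blackberry", "kiwi"]}
]'

# --argjson binds the JSON array to $blacklist.
# contains($blacklist[]) emits one boolean per (fruit, term) pair;
# the object is kept only if none of them is true.
names=$(printf '%s' "$data" | jq -r --argjson blacklist '["banana", "lemon", "grape"]' '
  .[]
  | select([.fruits[] | contains($blacklist[])] | any | not)
  | .name')
echo "$names"   # Charlie
```

This avoids regex escaping entirely, at the cost of losing pattern features such as anchors or case-insensitive flags.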

Practical Examples

Let’s work through a complete example with sample data:

json
[
  {"id": 1, "fruits": ["apple", "banana", "cherry"]},
  {"id": 2, "fruits": ["orange", "grape", "lemon"]},
  {"id": 3, "fruits": ["pear", "blackberry", "kiwi"]},
  {"id": 4, "fruits": ["strawberry", "raspberry", "blueberry"]},
  {"id": 5, "fruits": []}
]

To exclude objects containing “berry” or “banana”:

bash
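jq '.[] | select(.fruits | map(test("berry|banana")) | any == false)' data.json

This will return the objects with IDs 2 and 5. Object 1 is excluded because it contains “banana”, and objects 3 and 4 are excluded because they contain “berry”. Object 5 is kept because an empty array has no matching elements.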
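To verify the result, the sketch below runs the same filter on the sample data inlined as a shell variable (instead of data.json) and collects only the surviving IDs.

```shell
# The five sample objects from above.
data='[
  {"id": 1, "fruits": ["apple", "banana", "cherry"]},
  {"id": 2, "fruits": ["orange", "grape", "lemon"]},
  {"id": 3, "fruits": ["pear", "blackberry", "kiwi"]},
  {"id": 4, "fruits": ["strawberry", "raspberry", "blueberry"]},
  {"id": 5, "fruits": []}
]'

# Wrap the stream in [...] to get a single array of the IDs that pass the filter.
ids=$(printf '%s' "$data" | jq -c '[.[] | select(.fruits | map(test("berry|banana")) | any == false) | .id]')
echo "$ids"   # [2,5]
```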

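The jq community discussion shows a related technique for objects whose values are arrays: map_values() combined with select() keeps only the keys whose value satisfies a condition. For example, the following keeps the keys whose array contains 10 or 20 (in recent jq versions, keys for which select() produces no output are dropped):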

bash
echo '{"foo":[10],"bar":[20],"baz":[10,20],"qux":[30],"quux":[]}' | jq -c 'map_values(select(contains(10,20 | [.])))'

Common Pitfalls and Solutions

1. Incorrect boolean evaluation

Problem: Using any() without negating the result for exclusion

bash
# Incorrect - this keeps only the objects WITH a match
jq '.[] | select(.fruits | any(contains("banana")))'

Solution: Negate the result for exclusion filtering, with | not or == false

bash
# Correct - excludes objects WITH matches
jq '.[] | select(.fruits | any(contains("banana")) | not)'

2. String vs. filter confusion

Problem: Applying contains() with a string argument directly to the fruits array

bash
# Fails with a type error - .fruits is an array, but "banana" is a string
jq '.[] | select(.fruits | contains("banana"))'

Solution: Apply contains() to each string with map(), then combine the results with any

bash
# Correct - each fruit string is checked individually
jq '.[] | select(.fruits | map(contains("banana")) | any == false)'

3. Empty array handling

Problem: Objects with an empty fruits array always pass the exclusion filter

bash
# An empty fruits array will always match (no blacklisted substrings)
# but you might want to handle this case differently

Solution: Add additional conditions if needed

bash
# Exclude empty arrays AND blacklisted substrings
jq '.[] | select(.fruits | length > 0 and (map(test("blacklist")) | any == false))'
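The difference between the two behaviors can be seen side by side in this sketch. The two-object input and the variable names default_ids and strict_ids are illustrative.

```shell
# One object with fruits and one with an empty array.
data='[{"id":2,"fruits":["orange","grape","lemon"]},{"id":5,"fruits":[]}]'

# Plain exclusion filter: the empty array has no matches, so id 5 is kept.
default_ids=$(printf '%s' "$data" | jq -c '[.[] | select(.fruits | map(test("berry|banana")) | any == false) | .id]')

# With the length guard: the empty array is rejected before the blacklist check.
strict_ids=$(printf '%s' "$data" | jq -c '[.[] | select(.fruits | length > 0 and (map(test("berry|banana")) | any == false)) | .id]')

echo "$default_ids"   # [2,5]
echo "$strict_ids"    # [2]
```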

Advanced Filtering Patterns

For more complex filtering scenarios, you can combine multiple conditions:

1. Case-insensitive matching

bash
jq '.[] | select(.fruits | map(ascii_downcase | contains("banana")) | any == false)'
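Another way to get case-insensitive matching, assuming a jq build with regex support, is the "i" flag of test(). The mixed-case sample data below is made up for illustration.

```shell
# Mixed-case fruit names to exercise case-insensitive matching.
data='[{"id":1,"fruits":["Apple","BANANA"]},{"id":2,"fruits":["Orange","Kiwi"]}]'

# test(regex; flags): the "i" flag makes the match case-insensitive.
ci_ids=$(printf '%s' "$data" | jq -c '[.[] | select(.fruits | map(test("banana"; "i")) | any == false) | .id]')
echo "$ci_ids"   # [2]
```

Unlike ascii_downcase, which only lowercases ASCII letters before a literal comparison, this keeps the full regex syntax available for the blacklist pattern.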

2. Multiple attributes filtering

bash
jq '.[] | select((.fruits | map(test("blacklist")) | any == false) and (.name | test("allowed_pattern")))'

3. Using variables for blacklisted terms

bash
BLACKLIST="banana|lemon|grape"
jq --arg blacklist "$BLACKLIST" '.[] | select(.fruits | map(test($blacklist)) | any == false)'

4. Filtering with custom functions

bash
# The function definition must be part of the jq program itself
jq 'def exclude_blacklisted($blacklist):
      .fruits | map(test($blacklist)) | any == false;
    .[] | select(exclude_blacklisted("banana|lemon"))'

Conclusion

Filtering JSON objects with jq to exclude those containing specific substrings requires the right combination of select(), any(), and string matching functions. The key insight is that for exclusion filtering you must negate the result of any (any == false, or any | not) rather than use any() alone. With test() and a regex alternation pattern, you can check multiple blacklisted substrings in a single expression. This approach works reliably for arrays of strings in JSON objects; empty arrays pass the filter by default, so add a length check if you want them excluded as well.
