How to group by geography in BigQuery while preserving the geography column?

Question

I have the following code:

    SELECT
      h3s.h3id,
      h3s.geog,
      MIN(ST_DISTANCE(
        `carto-os`.carto.H3_CENTER(htsp.h3id),
        `carto-os`.carto.H3_CENTER(h3s.h3id)
      )) OVER (PARTITION BY h3s.h3id)
    FROM rgns_h3_geogs_clipped h3s
    CROSS JOIN hotspot_centers htsp

I would like to group by h3s.h3id while keeping the geography column. When I try to remove the OVER clause and use GROUP BY h3id, geog instead, I get the error:

    "Grouping by expressions of type GEOGRAPHY is not allowed"

How can I perform the grouping operation while retaining the geography column in BigQuery?

Neurogram · Accepted Answer

In BigQuery, you cannot group by a geography column directly because GEOGRAPHY is not a groupable data type. To work around this while preserving the geography, group by the H3 identifier and then recover the geography, for example by joining back to the original table on the H3 ID.

Solution 1: Group by H3 ID and Join Back

The most direct solution is to group by the H3 ID and then join back to retrieve the geography:

    WITH grouped_data AS (
      SELECT
        h3s.h3id,
        MIN(ST_DISTANCE(
          `carto-os`.carto.H3_CENTER(htsp.h3id),
          `carto-os`.carto.H3_CENTER(h3s.h3id)
        )) AS min_distance
      FROM rgns_h3_geogs_clipped h3s
      CROSS JOIN hotspot_centers htsp
      GROUP BY h3s.h3id
    )
    SELECT gd.h3id, h3s.geog, gd.min_distance
    FROM grouped_data gd
    JOIN rgns_h3_geogs_clipped h3s ON gd.h3id = h3s.h3id

This approach:

1. Groups by the H3 ID (a string or integer, both of which are groupable)
2. Calculates the minimum distance for each H3 cell
3. Joins back to the original table to retrieve the geography column

If rgns_h3_geogs_clipped holds more than one row per h3id, the join returns one output row for each of them.

Solution 2: Use Window Functions

If you prefer to keep the window function from your original query, compute the minimum over each partition and then keep a single row per H3 ID:

    SELECT
      h3s.h3id,
      h3s.geog,
      MIN(ST_DISTANCE(
        `carto-os`.carto.H3_CENTER(htsp.h3id),
        `carto-os`.carto.H3_CENTER(h3s.h3id)
      )) OVER (PARTITION BY h3s.h3id) AS min_distance
    FROM rgns_h3_geogs_clipped h3s
    CROSS JOIN hotspot_centers htsp
    -- GEOGRAPHY is not orderable, so order by the hotspot ID rather than geog
    QUALIFY ROW_NUMBER() OVER (PARTITION BY h3s.h3id ORDER BY htsp.h3id) = 1

This approach:

- Uses the window function as in your original query
- Uses QUALIFY to return only one row per H3 ID
- Preserves the geography column, because no GROUP BY is involved
- Orders the ROW_NUMBER window by htsp.h3id, since a GEOGRAPHY column cannot be used in ORDER BY

Solution 3: Convert Geography to String

If you need to group by the geography itself, convert it to a string representation (WKT) and group by that string:

    SELECT
      h3s.h3id,
      ST_ASTEXT(h3s.geog) AS geog_wkt,
      MIN(ST_DISTANCE(
        `carto-os`.carto.H3_CENTER(htsp.h3id),
        `carto-os`.carto.H3_CENTER(h3s.h3id)
      )) AS min_distance
    FROM rgns_h3_geogs_clipped h3s
    CROSS JOIN hotspot_centers htsp
    GROUP BY h3s.h3id, geog_wkt

However, this approach has limitations:

- The result is a STRING, so it must be converted back with ST_GEOGFROMTEXT before any further geography operations
- Performance might be impacted, because every geography is serialized before grouping
- Geographies that describe the same shape may serialize to different text and then end up in different groups

Performance Considerations

When working with H3 and geographic data in BigQuery, consider these performance aspects:

1. Keys: H3 IDs can serve as natural keys for spatial joins and grouping
2. Join Strategy: Solution 1 (join back) typically performs better than repeated distance calculations
3. Data Volume: For large datasets, consider pre-aggregating or using materialized views
4. Cross Join Impact: The CROSS JOIN in your query produces one row for every pair of cell and hotspot, so filter both inputs as much as possible first

Here is a version that computes each distance once in a CTE and aggregates afterwards, using ANY_VALUE so that the geography never appears in GROUP BY:

    WITH hotspots AS (
      SELECT
        h3s.h3id,
        ST_DISTANCE(
          `carto-os`.carto.H3_CENTER(htsp.h3id),
          `carto-os`.carto.H3_CENTER(h3s.h3id)
        ) AS distance
      FROM rgns_h3_geogs_clipped h3s
      CROSS JOIN hotspot_centers htsp
    )
    SELECT
      h3s.h3id,
      ANY_VALUE(h3s.geog) AS geog,
      MIN(h.distance) AS min_distance
    FROM hotspots h
    JOIN rgns_h3_geogs_clipped h3s ON h.h3id = h3s.h3id
    GROUP BY h3s.h3id

Best Practices for H3 Geographic Analysis

When working with H3 and geography data in BigQuery:

1. Use H3 IDs for grouping: Since H3 IDs are string/integer representations of a cell, they are ideal for grouping operations
2. Keep geography out of GROUP BY: Group on H3 IDs (or another groupable key) whenever possible
3. Leverage window functions: For preserving rows while performing aggregations
4. Consider pre-aggregation: For repeated analysis, create pre-aggregated views
5. Use appropriate H3 resolutions: Higher resolutions (smaller cells) provide more precision but increase data volume

By following these approaches, you can effectively group by geography in BigQuery while preserving the geography column information for your analysis.
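As a compact variant of Solution 1, the geography can also be carried through the aggregation with ANY_VALUE instead of a join back, on the assumption that each h3id maps to a single geography. The sketch below is only an illustration: the inline tables cells and hotspots and their point values are hypothetical stand-ins for rgns_h3_geogs_clipped and hotspot_centers, and plain points replace the CARTO H3_CENTER calls so the query can be run without any external dataset.

```sql
-- Hypothetical inline data standing in for the two tables in the question.
WITH cells AS (
  SELECT 'cell_a' AS h3id, ST_GEOGPOINT(0, 0) AS geog UNION ALL
  SELECT 'cell_b' AS h3id, ST_GEOGPOINT(1, 1) AS geog
),
hotspots AS (
  SELECT ST_GEOGPOINT(0, 1) AS center UNION ALL
  SELECT ST_GEOGPOINT(2, 2) AS center
)
SELECT
  c.h3id,
  -- The geography is aggregated rather than grouped, so GROUP BY only
  -- sees the groupable h3id column.
  ANY_VALUE(c.geog) AS geog,
  -- Distance in meters to the nearest hotspot for each cell.
  MIN(ST_DISTANCE(c.geog, h.center)) AS min_distance
FROM cells AS c
CROSS JOIN hotspots AS h
GROUP BY c.h3id;
```

If an h3id could map to several different geographies, ANY_VALUE would return an arbitrary one of them, so the join-back form in Solution 1 is the safer choice in that case.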

Complete Guide: Group by Geography in BigQuery

Brief Answer

Contents

Why BigQuery Doesn’t Allow GROUP BY on GEOGRAPHY

Solution 1: Group by H3 ID and Join Back

Solution 2: Use Window Functions

Solution 3: Convert Geography to String

Performance Considerations

Best Practices for H3 Geographic Analysis