How to Optimize a Slow PostgreSQL Query Without Using UNION
I have a large partitioned PostgreSQL table (approximately 150 GB) with the following schema:
CREATE TABLE table_partition
(
id int4,
"key" text,
value_type int4,
value jsonb,
device_time timestamptz,
ts timestamptz
) PARTITION BY RANGE (ts);
CREATE INDEX table_partition_ts_id_index ON table_partition (id, ts);
CREATE INDEX table_partition_ts_id_index_2 ON table_partition (ts DESC, id);
CREATE INDEX table_partition_key_idx ON table_partition (key);
CREATE INDEX table_partition_key_ts_idx ON table_partition (key, ts);
CREATE INDEX table_partition_ts_key_idx ON table_partition (ts, key);
The table is partitioned by date, with partitions ranging from 160K to 66 GB in size.
The Slow Query
My original query is:
SELECT csh.key, csh.ts
FROM table_partition202508 csh
WHERE csh.key IN ('string1', 'string2')
ORDER BY csh.ts
LIMIT 10;
This query has an execution plan showing:
- Execution time: 292,582.395 ms
- High I/O: 1,104,471 buffer reads
- Uses an Index Scan Backward on table_partition202508_ts_key_idx
The Fast Query (with UNION)
When I add a “magic” UNION with a non-existent key value, the query becomes dramatically faster:
SELECT csh.key, csh.ts
FROM table_partition202508 csh
WHERE csh.key IN ('string1', 'string2')
UNION ALL
SELECT csh.key, csh.ts
FROM table_partition202508 csh
WHERE csh.key = 'rubbish' -- there is no such key value in the DB!
ORDER BY csh.ts
LIMIT 10;
This query has an execution plan showing:
- Execution time: 82.582 ms
- Much lower I/O: 12,582 buffer reads
- Uses Parallel Bitmap Heap Scan with Parallel Append
Question
Is there a way to convince PostgreSQL to use the faster execution plan (with the bitmap scan and parallel processing) without resorting to the “magic” UNION trick? What PostgreSQL configuration settings, query hints, or index strategies could help optimize the original query to achieve similar performance?
The “magic” UNION trick works because it enables PostgreSQL’s parallel query execution by forcing the query planner to use a Parallel Append operation, which allows multiple worker processes to scan the table simultaneously. This transforms a single-threaded index scan into a parallel bitmap heap scan, dramatically reducing execution time from 292 seconds to just 82 milliseconds.
Contents
- Understanding the Performance Difference
- Why the UNION Trick Enables Parallel Processing
- Alternative Optimization Strategies
- PostgreSQL Configuration Settings
- Index Optimization Strategies
- Query Rewriting Techniques
- Partitioning Considerations
Understanding the Performance Difference
The dramatic performance difference between your two queries stems from how PostgreSQL’s query planner chooses execution strategies based on the query structure. Your original query uses a simple Index Scan Backward on the ts_key_idx, which processes the data sequentially in a single process.
According to PostgreSQL’s parallel plans documentation, the planner might choose a Parallel Append of regular Index Scan plans where “each individual index scan would have to be executed to completion by a single process, but different scans could be performed at the same time by different processes.”
The key insight is that your “fast” query with UNION ALL creates a scenario where PostgreSQL can:
- Use multiple worker processes to scan the table
- Employ bitmap operations for more efficient data retrieval
- Leverage shared memory structures for parallel processing
As stated in the documentation, “whenever PostgreSQL needs to combine rows from multiple sources into a single result set, it uses an Append or MergeAppend plan node. This commonly happens when implementing UNION ALL or when scanning a partitioned table.”
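To see whether the fast plan is genuinely running in parallel, it helps to look at the worker counts that EXPLAIN ANALYZE reports. A minimal sketch against the same partition and key values as above; VERBOSE adds per-worker detail:
-- Look for "Workers Planned" / "Workers Launched" under the Gather node,
-- and for Parallel Append / Parallel Bitmap Heap Scan beneath it
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT csh.key, csh.ts
FROM table_partition202508 csh
WHERE csh.key IN ('string1', 'string2')
UNION ALL
SELECT csh.key, csh.ts
FROM table_partition202508 csh
WHERE csh.key = 'rubbish'
ORDER BY csh.ts
LIMIT 10;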
Why the UNION Trick Enables Parallel Processing
The UNION trick works by forcing the query planner into a parallel execution path. When you add even a non-existent condition like WHERE csh.key = 'rubbish', PostgreSQL sees this as requiring multiple separate scans that can be executed in parallel.
From the cited discussion of parallel bitmap scans, we learn that “in a parallel bitmap heap scan, one process is chosen as the leader”; the leader builds the bitmap and the other workers then share the heap scan driven by it. This suggests that the UNION trick creates multiple scan paths that can be distributed across worker processes.
The performance improvement you’re seeing (3,500x faster) is consistent with the research showing that bitmap index scans can be “3x faster by using 3 workers whereas overall plan got ~40% faster” in some cases, though your improvement is even more dramatic.
The critical limitation is that “[there is] no such thing as a Parallel Bitmap Index Scan” as noted in one source. However, the heap scan phase can be parallelized, which is what the UNION trick enables.
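One way to test whether the bitmap strategy alone is enough, without the UNION, is to steer the planner away from the plain index scan for a single transaction. A sketch against the original query; SET LOCAL keeps the change scoped to the transaction, so nothing persists after the ROLLBACK:
BEGIN;
-- Discourage the ts-ordered index scan so the planner considers a
-- bitmap heap scan driven by the key predicate instead
SET LOCAL enable_indexscan = off;
SET LOCAL enable_indexonlyscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT csh.key, csh.ts
FROM table_partition202508 csh
WHERE csh.key IN ('string1', 'string2')
ORDER BY csh.ts
LIMIT 10;
ROLLBACK;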
Alternative Optimization Strategies
1. Explicit Query Hints and Planner Controls
PostgreSQL provides several ways to influence the query planner without resorting to UNION tricks:
-- Raise the ceiling on parallel workers (these allow, but do not force, parallel plans)
SET max_parallel_workers_per_gather = 4;
SET max_parallel_workers = 8;
-- Steer the planner toward or away from specific scan methods (session-wide; use with care)
SET enable_seqscan = off;
SET enable_indexscan = on;
SET enable_bitmapscan = on;
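Core PostgreSQL deliberately ships without per-query hints, but the third-party pg_hint_plan extension adds comment-based ones. The following is only a sketch, assuming pg_hint_plan is installed and loaded and that the BitmapScan and Parallel hint names match the version you install; check its documentation before relying on them:
/*+ BitmapScan(csh) Parallel(csh 4 hard) */
SELECT csh.key, csh.ts
FROM table_partition202508 csh
WHERE csh.key IN ('string1', 'string2')
ORDER BY csh.ts
LIMIT 10;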
2. Materialized Views
For frequently executed queries, consider creating a materialized view:
CREATE MATERIALIZED VIEW fast_key_lookup AS
SELECT key, ts
FROM table_partition202508
WHERE key IN ('string1', 'string2');
CREATE INDEX mv_fast_key_lookup_idx ON fast_key_lookup (ts);
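Keep in mind that a materialized view is a snapshot, so it has to be refreshed when new rows arrive. A short sketch of the maintenance step and of how the original query would then read from the view:
-- Re-run periodically (e.g. from cron or a scheduler) to pick up new rows
REFRESH MATERIALIZED VIEW fast_key_lookup;
-- The original query then becomes a cheap ordered read of the small view
SELECT key, ts
FROM fast_key_lookup
ORDER BY ts
LIMIT 10;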
3. Query Rewriting for Parallel Processing
Instead of UNION, you can try rewriting the predicate, although the planner normalizes a single-column IN list and the equivalent OR chain into the same form, so this variant alone usually produces an identical plan (a rewrite that does change the plan is sketched after this example):
-- Equivalent to the IN form; typically planned identically
SELECT csh.key, csh.ts
FROM table_partition202508 csh
WHERE (csh.key = 'string1' OR csh.key = 'string2')
ORDER BY csh.ts
LIMIT 10;
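A rewrite that is often suggested for this exact ORDER BY ... LIMIT problem is to wrap the sort column in a no-op expression. The planner can then no longer use the ts-ordered index to satisfy the ORDER BY, so it falls back to the (key, ts) index plus an explicit top-N sort. A sketch; the + INTERVAL '0' changes nothing about the result, only about which plans are available:
SELECT csh.key, csh.ts
FROM table_partition202508 csh
WHERE csh.key IN ('string1', 'string2')
ORDER BY csh.ts + INTERVAL '0'  -- same ordering, but not matched to the (ts, ...) indexes
LIMIT 10;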
4. Using LATERAL Joins
SELECT csh.key, csh.ts
FROM (VALUES ('string1'), ('string2')) AS keys(key)
CROSS JOIN LATERAL (
    SELECT key, ts
    FROM table_partition202508
    WHERE key = keys.key
    ORDER BY ts
    LIMIT 10
) csh
ORDER BY csh.ts
LIMIT 10;
Each key gets its own ordered scan of the (key, ts) index and contributes at most 10 rows, so the outer sort only has to merge a handful of rows before the final LIMIT is applied. (CROSS JOIN LATERAL avoids the NULL padding row that a LEFT JOIN would emit for a key with no matches.)
PostgreSQL Configuration Settings
Several configuration parameters can influence whether PostgreSQL chooses parallel execution plans:
Parallel Query Settings
-- Number of worker processes per gather operation
SET max_parallel_workers_per_gather = 4;
-- Total number of worker processes
SET max_parallel_workers = 8;
-- Minimum table size for parallel scans
SET min_parallel_table_scan_size = '8MB';
-- Minimum index size for parallel scans
SET min_parallel_index_scan_size = '512kB';
Cost Parameters
-- Cost of transferring tuples between processes
SET parallel_tuple_cost = 0.1; -- Default is 0.1
-- Cost of starting parallel workers
SET parallel_setup_cost = 1000.0;
-- Random page cost (affects bitmap vs index scan decisions)
SET random_page_cost = 1.1; -- Default is 4.0; values around 1.1 are commonly used for SSD storage
Work Memory Settings
-- Work memory for sort operations
SET work_mem = '100MB';
-- Memory for maintenance operations
SET maintenance_work_mem = '256MB';
As noted in one of the cited answers, simply raising random_page_cost to 2 was enough to change the plan, which shows how strongly this setting can affect whether bitmap scans are chosen over plain index scans.
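Beyond session-level GUCs, the number of workers the planner assumes for a particular table can be pinned with the parallel_workers storage parameter, which overrides the size-based heuristic for that table. A sketch applied to the partition the query touches:
-- Make the planner consider 4 workers for scans of this partition
ALTER TABLE table_partition202508 SET (parallel_workers = 4);
-- Revert to the size-based default later if needed
ALTER TABLE table_partition202508 RESET (parallel_workers);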
Index Optimization Strategies
1. Composite Index Optimization
Your current indexes are well-designed, but consider creating additional specialized indexes:
-- For your specific query pattern
CREATE INDEX partitioned_key_ts_idx ON table_partition202508 (key, ts)
WHERE key IN ('string1', 'string2');
-- Partial index for faster lookups
CREATE INDEX fast_string1_idx ON table_partition202508 (ts)
WHERE key = 'string1';
CREATE INDEX fast_string2_idx ON table_partition202508 (ts)
WHERE key = 'string2';
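If the key list is stable, a single partial index on ts restricted to exactly those keys lets the planner walk the first ten matching rows in ts order directly, which matches the shape of the original query. A sketch; the index name is a placeholder:
CREATE INDEX table_partition202508_hot_keys_ts_idx
    ON table_partition202508 (ts)
    WHERE key IN ('string1', 'string2');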
2. Index Reorganization
As shown in one of the cited answers, combining SET maintenance_work_mem TO '1GB' with CLUSTER foo USING val_index dramatically improved bitmap scan performance. Applied here (note that CLUSTER takes an ACCESS EXCLUSIVE lock while it rewrites the table, and must target the partition's own index):
-- Physically reorder the partition by its (key, ts) index
SET maintenance_work_mem TO '1GB';
CLUSTER table_partition202508 USING table_partition202508_key_ts_idx;
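Because the per-partition indexes created under a parent index get auto-generated names, it is worth looking the actual name up before clustering; a quick check:
-- Find the (key, ts) index that actually lives on this partition
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'table_partition202508';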
3. Index Scan Optimization
-- Consider creating a covering index
CREATE INDEX covering_key_ts_idx ON table_partition202508 (key, ts)
INCLUDE (id); -- if you need additional columns
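A covering index only turns into an index-only scan when the visibility map is reasonably current, so keeping the partition vacuumed matters for this to pay off; a minimal maintenance sketch:
-- Refresh statistics and the visibility map so index-only scans can skip heap fetches
VACUUM (ANALYZE) table_partition202508;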
Query Rewriting Techniques
1. Using WITH Clauses (CTEs)
WITH key_data AS (
SELECT key, ts
FROM table_partition202508
WHERE key IN ('string1', 'string2')
)
SELECT key, ts
FROM key_data
ORDER BY ts
LIMIT 10;
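On PostgreSQL 12 and later a plain CTE like the one above is inlined into the outer query, so by itself it usually plans exactly like the original statement. Forcing materialization keeps the key filter separate from the ORDER BY/LIMIT step, at the cost of materializing every matching row before the limit; a sketch:
WITH key_data AS MATERIALIZED (  -- evaluated on its own, then sorted and limited
    SELECT key, ts
    FROM table_partition202508
    WHERE key IN ('string1', 'string2')
)
SELECT key, ts
FROM key_data
ORDER BY ts
LIMIT 10;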
2. Window Function Approach
SELECT key, ts
FROM (
    SELECT key, ts,
           ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts) AS rn
    FROM table_partition202508
    WHERE key IN ('string1', 'string2')
) ranked
WHERE rn <= 10   -- at most 10 rows per key...
ORDER BY ts
LIMIT 10;        -- ...then the 10 earliest overall, matching the original query
3. Using EXPLAIN ANALYZE to Test Different Approaches
EXPLAIN (ANALYZE, BUFFERS)
SELECT csh.key, csh.ts
FROM table_partition202508 csh
WHERE csh.key IN ('string1', 'string2')
ORDER BY csh.ts
LIMIT 10;
Partitioning Considerations
1. Partition Pruning Optimization
Your table is already partitioned by date, but ensure partition pruning is working effectively:
-- Check which partitions are being used
EXPLAIN (ANALYZE, BUFFERS)
SELECT csh.key, csh.ts
FROM table_partition csh
WHERE csh.key IN ('string1', 'string2')
AND csh.ts >= '2025-08-01' AND csh.ts < '2025-09-01' -- half-open range covers all of August
ORDER BY csh.ts
LIMIT 10;
2. Partition-Wide Indexing
An index created on the partitioned parent is already implemented as a set of per-partition indexes, but you can also create indexes directly on individual partitions (a lock-friendly way to build them is sketched after this example):
-- Create partition-specific indexes
CREATE INDEX table_partition202508_key_ts_idx
ON table_partition202508 (key, ts);
-- A hash index supports only equality lookups on key; PostgreSQL has no stored
-- bitmap indexes (bitmaps are built at query time from any suitable index)
CREATE INDEX table_partition202508_key_bitmap_idx
ON table_partition202508 USING hash (key);
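On a 150 GB table it is often preferable to build new partition indexes without long locks. The usual pattern, sketched here with placeholder index names, is to create the parent index with ON ONLY, build each partition's index CONCURRENTLY, and then attach it:
-- Parent index only; does not touch existing partitions yet
CREATE INDEX table_partition_key_ts_idx2 ON ONLY table_partition (key, ts);
-- Build the matching index on the partition without blocking writes
CREATE INDEX CONCURRENTLY table_partition202508_key_ts_idx2
    ON table_partition202508 (key, ts);
-- Attach it; the parent index becomes valid once every partition has one
ALTER INDEX table_partition_key_ts_idx2
    ATTACH PARTITION table_partition202508_key_ts_idx2;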
3. Parallel Partition Scanning
For large tables, ensure parallel processing can work across partitions:
-- Enable parallel query for partitioned tables
SET max_parallel_workers_per_gather = 4;
SET max_parallel_workers = 8;
SET max_parallel_maintenance_workers = 4;
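Parallel Append over partitions (and over UNION ALL branches) also has its own planner switch, enable_parallel_append, which defaults to on but is worth confirming:
-- Parallel Append is controlled by its own planner switch (on by default)
SHOW enable_parallel_append;
SET enable_parallel_append = on;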
Conclusion
The “magic” UNION trick works by forcing PostgreSQL into a parallel execution plan that combines multiple scan operations. However, several alternatives can achieve similar performance without this workaround:
- Optimize PostgreSQL configuration by adjusting parallel query parameters, work memory settings, and cost factors to encourage the planner to choose bitmap scans and parallel execution.
- Reorganize your indexes using the CLUSTER command, which can dramatically improve bitmap scan performance as shown in the cited findings.
- Consider partial and specialized indexes that can be scanned more efficiently for your specific query patterns.
- Rewrite your queries using CTEs, lateral joins, or other constructs that can enable parallel processing without the UNION trick.
- Leverage your partitioning scheme more effectively by ensuring partition pruning works and creating appropriate partition-specific indexes.
The key takeaway is that PostgreSQL’s query planner needs the right conditions to choose parallel execution plans. By understanding these conditions and configuring your database appropriately, you can achieve the performance benefits of parallel processing without resorting to query tricks.
Sources
- PostgreSQL: Documentation: 18: 15.3. Parallel Plans
- PostgreSQL: Documentation: 11: 15.3. Parallel Plans
- PostgreSQL: Documentation: 15: 15.3. Parallel Plans
- PostgreSQL: Documentation: 12: 15.3. Parallel Plans
- PostgreSQL: Documentation: 13: 15.3. Parallel Plans
- PostgreSQL: The World’s Most Advanced Open Source Relational Database
- PostgreSQL: Re: Parallel bitmap index scan
- PostgreSQL Bitmap Heap Scan on index is very slow but Index Only Scan is fast - Stack Overflow
- In PostgreSQL, how can I make Bitmap Index Scan parallelized? - Database Administrators Stack Exchange
- PostgreSQL: Documentation: 10: 15.3. Parallel Plans
- PostgreSQL: Documentation: 8.1: Performance Tips
- Parallel Query v2 | EDB
- Query performance of index scans slower than parallel seq scan on Postgres - Database Administrators Stack Exchange
- PostgreSQL: Documentation: 18: 19.7. Query Planning
- PostgreSQL: Optimize Bitmap Heap Scan - Database Administrators Stack Exchange