NeuroAgent

UNION vs UNION ALL: Key Differences Explained

Learn the crucial differences between UNION and UNION ALL in SQL. Discover how UNION removes duplicates while UNION ALL preserves all rows, impacting performance and result size. Complete guide for SQL developers.

10/31/2025, 04:21 PM

Question

What is the difference between UNION and UNION ALL in SQL?

NeuroAgent

The main difference between UNION and UNION ALL in SQL is that UNION removes duplicate rows from the result set, while UNION ALL returns every row, duplicates included. This affects both result set size and performance: UNION performs an implicit DISTINCT over the combined rows, which costs extra work but guarantees unique results, whereas UNION ALL skips the deduplication step entirely.

Core Differences Explained
Performance Comparison
Result Set Size and NULL Handling
Syntax and Usage Rules
Practical Examples
When to Use Each Operator
Best Practices and Considerations

Core Differences Explained

The fundamental distinction between UNION and UNION ALL lies in how they handle duplicate records in the combined result set.

UNION performs an implicit DISTINCT operation on the final result set, automatically removing duplicate rows wherever they come from. If identical rows are returned by both queries, or more than once by the same query, only one instance appears in the final output.

UNION ALL, on the other hand, simply concatenates all the rows from both SELECT statements without any duplicate removal, preserving every row exactly as it appears in the source queries.

According to DataCamp, “Where UNION selects only distinct records, UNION ALL selects all of them, affecting performance and result set size.” This difference in duplicate handling is the primary distinction that influences all other aspects of these operators.
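
The behavior described above can be checked with a minimal sketch. The sales_team and support_team CTEs and their rows are hypothetical sample data, and the query assumes an engine that allows SELECT without a FROM clause (for example PostgreSQL, MySQL 8+, SQLite, or SQL Server):

```sql
-- Hypothetical inline rows standing in for two real tables.
WITH sales_team AS (
    SELECT 1 AS employee_id, 'Ana' AS employee_name
    UNION ALL SELECT 2, 'Ben'
    UNION ALL SELECT 2, 'Ben'   -- duplicate inside a single branch
),
support_team AS (
    SELECT 2 AS employee_id, 'Ben' AS employee_name
    UNION ALL SELECT 3, 'Cho'
)
SELECT
    -- UNION: distinct rows only
    (SELECT COUNT(*) FROM (
        SELECT employee_id, employee_name FROM sales_team
        UNION
        SELECT employee_id, employee_name FROM support_team
    ) AS u) AS union_rows,
    -- UNION ALL: every row from both branches
    (SELECT COUNT(*) FROM (
        SELECT employee_id, employee_name FROM sales_team
        UNION ALL
        SELECT employee_id, employee_name FROM support_team
    ) AS ua) AS union_all_rows;
-- union_rows = 3, union_all_rows = 5
```

Note that UNION also collapses the repeated (2, 'Ben') row inside sales_team, not only the copy shared with support_team.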

Performance Comparison

Performance differences between UNION and UNION ALL can be significant, particularly with large datasets.

UNION Performance:

Requires additional processing to identify and remove duplicate rows
Typically involves a sort or a hash-based deduplication step, depending on the plan the engine chooses
DevArt makes the same point from the other side: it is UNION ALL, not UNION, that performs better, because it does not spend resources on removing duplicate rows
In one SQL Server example analyzed by SQLShack, “SQL Union contains a Sort operator having cost 53.7% in overall batch operators”
The deduplication step can be computationally expensive, especially with large datasets

UNION ALL Performance:

No duplicate removal processing required
Simply concatenates result sets directly
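Often noticeably faster than UNION on large inputs; the size of the gap depends on row count, row width, and the execution plan, so treat figures such as 2-5x as a rough rule of thumb rather than a guarantee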
StrataScratch confirms that “UNION ALL gives better performance in query execution as it does not waste resources on removing duplicate rows”
More efficient resource utilization as it avoids the overhead of duplicate detection

The performance gap becomes more pronounced with larger datasets. As LearnSQL.com explains, “If you know that all of the records returned by UNION are going to be unique, use UNION ALL; it will be faster. This is especially relevant for larger datasets.”
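
One way to see the extra work is to compare execution plans. The sketch below is PostgreSQL-specific (generate_series and EXPLAIN ANALYZE are PostgreSQL features; other engines have their own plan tools), and the row counts are arbitrary values chosen for illustration:

```sql
-- PostgreSQL only: two overlapping ranges of 100,000 integers each.
EXPLAIN ANALYZE
SELECT n FROM generate_series(1, 100000) AS a(n)
UNION
SELECT n FROM generate_series(50001, 150000) AS b(n);

-- Same inputs, but no deduplication is requested.
EXPLAIN ANALYZE
SELECT n FROM generate_series(1, 100000) AS a(n)
UNION ALL
SELECT n FROM generate_series(50001, 150000) AS b(n);
```

In the UNION plan you would typically expect a deduplication node (a hash aggregate, or a sort followed by a unique step) on top of the node that appends the two inputs, while the UNION ALL plan usually stops at the append. Actual timings vary by version, settings, and hardware, so compare the numbers from your own system.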

Result Set Size and NULL Handling

Both operators keep NULL values; they differ only in whether rows, including rows that contain NULLs, are compared for duplicates.

Result Set Size:

UNION: Produces a result that is never larger than that of UNION ALL, and smaller whenever duplicates exist. The final result contains only unique rows.
UNION ALL: Produces larger datasets as it preserves all rows, including duplicates.

NULL Value Handling:

UNION: Treats NULLs as equal to each other when evaluating row uniqueness, even though NULL = NULL is not true in an ordinary comparison. According to Zentut, “The SQL UNION operator treats all NULL values as a single NULL value when evaluating duplicate.” If all column values (including NULLs) match between rows, UNION considers them duplicates and keeps only one.
UNION ALL: Performs no duplicate check at all, so rows containing NULLs are kept like any other row.

GeeksforGeeks confirms that “UNION ALL will include NULL values in the result set,” while StrataScratch notes that “NULL handling: The UNION operator treats NULLs as duplicates, meaning it removes them if all column values (including NULL) match.”
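
A small sketch of the NULL behavior, using hypothetical single-row CTEs t1 and t2 (made up for illustration). It assumes an engine that accepts an untyped NULL in a SELECT list, which recent versions of PostgreSQL, MySQL, SQLite, and SQL Server do:

```sql
-- Two hypothetical rows that are identical, including a NULL name.
WITH t1 AS (SELECT 1 AS id, NULL AS name),
     t2 AS (SELECT 1 AS id, NULL AS name)
SELECT
    -- UNION treats the two NULLs as the same value and keeps one row
    (SELECT COUNT(*) FROM (
        SELECT id, name FROM t1
        UNION
        SELECT id, name FROM t2
    ) AS u) AS union_rows,
    -- UNION ALL never compares rows, so both are kept
    (SELECT COUNT(*) FROM (
        SELECT id, name FROM t1
        UNION ALL
        SELECT id, name FROM t2
    ) AS ua) AS union_all_rows,
    -- An ordinary equality comparison does not match NULL with NULL
    (SELECT COUNT(*) FROM t1 JOIN t2 ON t1.name = t2.name) AS equality_matches;
-- union_rows = 1, union_all_rows = 2, equality_matches = 0
```

The third column is there only for contrast: deduplication in UNION uses "not distinct" semantics, which is why two NULLs collapse even though the join on name finds no match.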

Syntax and Usage Rules

Both operators follow specific syntax requirements and usage rules.

Basic Syntax:

sql

-- Template: write either UNION or UNION ALL between the two SELECT statements
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2;

Key Requirements:

Both SELECT statements must have the same number of columns
Corresponding columns must have compatible data types
Column names in the result set come from the first SELECT statement
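
As a quick illustration of these requirements, here is a sketch with made-up literal rows (the column aliases are hypothetical, and SELECT without FROM is assumed to be supported by the engine):

```sql
-- Column names come from the first SELECT: the result columns are
-- employee_id and employee_name, not contractor_id / contractor_name.
SELECT 1 AS employee_id, 'Ana' AS employee_name
UNION ALL
SELECT 7 AS contractor_id, 'Dev' AS contractor_name;

-- A mismatched column count is rejected by the engine. The statement is
-- left commented out because it raises an error:
-- SELECT 1, 'Ana' UNION ALL SELECT 7;
```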

Important Usage Rules:

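ORDER BY: Applies to the final combined result and is written once, after the last SELECT. Most engines reject an unparenthesized ORDER BY inside an individual SELECT
GROUP BY and HAVING: Apply to the individual SELECT in which they appear. To group the combined result, wrap the UNION in a subquery or CTE and group in the outer query
Aggregate Functions: Work within individual SELECT statements; to aggregate across the combined rows, aggregate over a subquery or CTE that contains the UNION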

According to MSSQLTips, “ORDER BY and COMPUTE clauses can only be issued for the overall result set and not within each individual result set. GROUP BY and HAVING clauses can only be issued for each individual result set and not for the overall result set.”
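
These rules can be sketched as follows. The team CTEs and their rows are hypothetical, and the derived-table alias combined is just an illustrative name:

```sql
-- ORDER BY is written once and sorts the combined result.
SELECT 2 AS employee_id, 'Ben' AS employee_name
UNION ALL
SELECT 1, 'Ana'
ORDER BY employee_id;
-- rows in order: (1, Ana), (2, Ben)

-- To group the combined rows, wrap the UNION ALL in a derived table
-- and apply GROUP BY / HAVING in the outer query.
WITH sales_team AS (
    SELECT 1 AS employee_id, 'Ana' AS employee_name
    UNION ALL SELECT 2, 'Ben'
),
support_team AS (
    SELECT 2 AS employee_id, 'Ben' AS employee_name
    UNION ALL SELECT 3, 'Cho'
)
SELECT employee_name, COUNT(*) AS team_count
FROM (
    SELECT employee_id, employee_name FROM sales_team
    UNION ALL
    SELECT employee_id, employee_name FROM support_team
) AS combined
GROUP BY employee_name
HAVING COUNT(*) > 1
ORDER BY team_count DESC;
-- one row: (Ben, 2), the only employee listed on both teams
```

UNION ALL is the deliberate choice in the second query: with UNION, Ben's two rows would collapse into one and the count could never exceed 1.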

Practical Examples

Let’s explore concrete examples demonstrating the differences between these operators.

Example 1: Employee Teams

sql

-- Using UNION (removes duplicates)
SELECT employee_id, employee_name FROM sales_team
UNION
SELECT employee_id, employee_name FROM support_team;

-- Using UNION ALL (keeps duplicates)
SELECT employee_id, employee_name FROM sales_team
UNION ALL
SELECT employee_id, employee_name FROM support_team;

Example 2: Wine Sales Analysis
From StrataScratch:

sql

-- Stack both region columns into a single region column. UNION ALL keeps
-- every row, so no price is dropped before the SUM.
WITH CTE AS (
    SELECT region_1 AS region, variety, price FROM winemag_p1
    UNION ALL
    SELECT region_2 AS region, variety, price FROM winemag_p1
)
SELECT region, variety, SUM(price) AS price_sum
FROM CTE
WHERE region IS NOT NULL AND price IS NOT NULL
GROUP BY region, variety
ORDER BY price_sum DESC;

Example 3: NULL Handling

sql

-- UNION: rows that match in every column, NULLs included, collapse to one row
SELECT id, name FROM table1
UNION
SELECT id, name FROM table2;

-- UNION ALL: no comparison is made, so every row (with or without NULLs) is kept
SELECT id, name FROM table1
UNION ALL
SELECT id, name FROM table2;

When to Use Each Operator

Choosing between UNION and UNION ALL depends on your specific requirements.

Use UNION When:

You need distinct results and duplicates should be eliminated
Data integrity requires unique records in the final output
You’re working with smaller datasets where performance impact is minimal
The business logic explicitly requires unique combinations of values

Use UNION ALL When:

Performance is critical and you’re working with large datasets
You need to preserve all records including duplicates
You know the data won’t contain duplicates between the result sets
You need to aggregate or process all rows individually

As SQLPad.io advises, “Prefer UNION ALL for faster query execution when duplicate rows in the result set are acceptable or desired.”
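
The choice is not only about speed: when rows are aggregated, UNION can silently change the answer. A minimal sketch, with hypothetical store_orders and web_orders rows invented for illustration:

```sql
-- Two legitimate, separate orders happen to have identical values.
WITH store_orders AS (
    SELECT 'widget' AS product, 10 AS amount
    UNION ALL SELECT 'gadget', 25
),
web_orders AS (
    SELECT 'widget' AS product, 10 AS amount
)
SELECT
    -- UNION drops the second (widget, 10) row before summing
    (SELECT SUM(amount) FROM (
        SELECT product, amount FROM store_orders
        UNION
        SELECT product, amount FROM web_orders
    ) AS u) AS total_with_union,
    -- UNION ALL keeps both orders
    (SELECT SUM(amount) FROM (
        SELECT product, amount FROM store_orders
        UNION ALL
        SELECT product, amount FROM web_orders
    ) AS ua) AS total_with_union_all;
-- total_with_union = 35, total_with_union_all = 45
```

Here UNION ALL gives the correct total, because the two widget orders are distinct events that only look like duplicates.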

Best Practices and Considerations

Performance Optimization:

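Prefer UNION ALL when duplicates are acceptable or when you know the combined rows are already unique
Filter rows inside each SELECT (with WHERE) before combining them, so that less data reaches the UNION step
Index the columns used in each SELECT's filters and joins; whether an index helps the deduplication step itself depends on the engine and the plan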

Data Quality:

Be aware that UNION might remove duplicates you actually need
Consider NULL handling implications when working with incomplete data
Test both operators with your actual data to understand the impact

Code Readability:

Use parentheses to clearly define the scope of each SELECT statement in complex queries
Add comments explaining why you chose UNION vs UNION ALL
Document any assumptions about data uniqueness
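
Parentheses matter most when UNION and UNION ALL are mixed, because the operators are evaluated left to right. The sketch below uses made-up literal rows and assumes an engine that accepts parenthesized query expressions, such as PostgreSQL or SQL Server; SQLite does not accept parentheses around the members of a compound SELECT:

```sql
-- Without parentheses: (1 UNION ALL 1) is built first, then UNION
-- deduplicates everything. Result: 1, 2 (two rows).
SELECT 1 AS n
UNION ALL
SELECT 1
UNION
SELECT 2;

-- With parentheses: the inner UNION yields 1, 2 and the outer UNION ALL
-- appends it to the first row without deduplicating.
-- Result: 1, 1, 2 (three rows).
-- Intent documented here: the first branch must always be preserved.
SELECT 1 AS n
UNION ALL
(SELECT 1
 UNION
 SELECT 2);
```

A short comment like the one in the second query records why a particular operator and grouping were chosen, which is easy to lose when the query is edited later.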

Common Pitfalls:

Forgetting that UNION requires identical column counts and compatible data types
Attempting to use ORDER BY within individual SELECT statements
Not considering the performance impact of UNION on large datasets

Conclusion

Understanding the differences between UNION and UNION ALL is crucial for writing efficient SQL queries. The key takeaways are:

UNION removes duplicates while UNION ALL preserves all rows, making UNION slower but ensuring unique results.
Performance difference can be significant - UNION ALL avoids the overhead of duplicate detection, and the gap grows with dataset size. Figures such as 2-5x are a rough rule of thumb; measure on your own data.
NULL handling - UNION treats NULLs as equal to each other when evaluating uniqueness, so rows that match in every column (NULLs included) collapse to one; UNION ALL makes no comparison and keeps every row.
Use UNION when you need distinct results and can tolerate the performance cost.
Use UNION ALL when performance is critical, duplicates are acceptable, or you know the data is unique between result sets.

Always consider your specific requirements and dataset characteristics when choosing between these operators, and test both approaches with your actual data to make informed decisions about performance and correctness.
