What is the difference between UNION and UNION ALL in SQL?
The main difference between UNION and UNION ALL in SQL is that UNION removes duplicate rows from the result set, while UNION ALL returns all rows, including duplicates. This affects both performance and result set size: UNION performs an implicit DISTINCT operation, which usually makes it slower but guarantees unique rows, while UNION ALL skips the deduplication step entirely and is usually more efficient.
Contents
- Core Differences Explained
- Performance Comparison
- Result Set Size and NULL Handling
- Syntax and Usage Rules
- Practical Examples
- When to Use Each Operator
- Best Practices and Considerations
Core Differences Explained
The fundamental distinction between UNION and UNION ALL lies in how they handle duplicate records in the combined result set.
UNION performs an implicit DISTINCT operation on the final result set, automatically removing duplicate rows. This applies whether the identical rows come from different SELECT statements or from the same one: only one instance of each distinct row appears in the final output.
UNION ALL, on the other hand, simply concatenates all the rows from both SELECT statements without any duplicate removal, preserving every row exactly as it appears in the source queries.
According to DataCamp, “Where UNION selects only distinct records, UNION ALL selects all of them, affecting performance and result set size.” This difference in duplicate handling is the primary distinction that influences all other aspects of these operators.
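To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table names sales_team and support_team echo the examples later in this article; the rows are invented for illustration, and other database engines may return rows in a different order.

```python
import sqlite3

# In-memory database with two hypothetical tables that share one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_team (employee_id INTEGER, employee_name TEXT);
    CREATE TABLE support_team (employee_id INTEGER, employee_name TEXT);
    INSERT INTO sales_team VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO support_team VALUES (2, 'Ben'), (3, 'Cleo');
""")

union_rows = conn.execute("""
    SELECT employee_id, employee_name FROM sales_team
    UNION
    SELECT employee_id, employee_name FROM support_team
""").fetchall()

union_all_rows = conn.execute("""
    SELECT employee_id, employee_name FROM sales_team
    UNION ALL
    SELECT employee_id, employee_name FROM support_team
""").fetchall()

print(len(union_rows))      # 3 rows: (2, 'Ben') appears once
print(len(union_all_rows))  # 4 rows: (2, 'Ben') appears twice
```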
Performance Comparison
Performance differences between UNION and UNION ALL can be significant, particularly with large datasets.
UNION Performance:
- Requires additional processing to identify and remove duplicate rows
- Typically involves a sort or a hash-based deduplication step
- DevArt contrasts this with UNION ALL, which performs better because it does not spend resources on removing duplicate rows
- SQLShack analysis shows “SQL Union contains a Sort operator having cost 53.7% in overall batch operators”
- The deduplication step can be computationally expensive, especially with large datasets
UNION ALL Performance:
- No duplicate removal processing required
- Simply concatenates result sets directly
- Often considerably faster than UNION; the exact gap depends on the dataset size, the proportion of duplicates, and the database engine
- StrataScratch confirms that “UNION ALL gives better performance in query execution as it does not waste resources on removing duplicate rows”
- More efficient resource utilization as it avoids the overhead of duplicate detection
The performance gap becomes more pronounced with larger datasets. As LearnSQL.com explains, “If you know that all of the records returned by UNION are going to be unique, use UNION ALL; it will be faster. This is especially relevant for larger datasets.”
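One way to see where the extra work comes from is to ask the database for its query plan. The sketch below uses SQLite through Python's sqlite3 module; the tables t1 and t2 are hypothetical. In the SQLite builds commonly bundled with Python, the UNION plan mentions a temporary B-tree used for deduplication while the UNION ALL plan does not, but the exact wording varies by version, and other engines (such as SQL Server, which the SQLShack quote refers to) present plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (a INTEGER);
""")

def plan_details(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column

union_plan = plan_details("SELECT a FROM t1 UNION SELECT a FROM t2")
union_all_plan = plan_details("SELECT a FROM t1 UNION ALL SELECT a FROM t2")

# Print both plans side by side for comparison.
for line in union_plan:
    print("UNION     :", line)
for line in union_all_plan:
    print("UNION ALL :", line)
```

Comparing plans like this on your own engine and data is usually more informative than any general speedup figure.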
Result Set Size and NULL Handling
Because only UNION deduplicates, the two operators differ in result set size and in how rows containing NULL values are treated.
Result Set Size:
- UNION: Produces smaller datasets due to duplicate removal. The final result contains only unique rows.
- UNION ALL: Produces larger datasets as it preserves all rows, including duplicates.
NULL Value Handling:
- UNION: Treats NULL values as equal to each other when evaluating row uniqueness, even though an ordinary comparison such as NULL = NULL does not evaluate to true. According to Zentut, “The SQL UNION operator treats all NULL values as a single NULL value when evaluating duplicate.” If all column values (including NULLs) match between rows, UNION will consider them duplicates and keep only one.
- UNION ALL: Includes NULL values without special treatment, preserving all NULL occurrences.
GeeksforGeeks confirms that “UNION ALL will include NULL values in the result set,” while StrataScratch notes that “NULL handling: The UNION operator treats NULLs as duplicates, meaning it removes them if all column values (including NULL) match.”
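The NULL behavior described here can be checked with a small sketch in SQLite via Python's sqlite3 module. The tables table1 and table2 and their single row are assumptions made up for the demonstration; Python shows SQL NULL as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, NULL);
    INSERT INTO table2 VALUES (1, NULL);
""")

# An ordinary comparison of NULL with NULL is neither true nor false.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

union_rows = conn.execute(
    "SELECT id, name FROM table1 UNION SELECT id, name FROM table2"
).fetchall()
union_all_rows = conn.execute(
    "SELECT id, name FROM table1 UNION ALL SELECT id, name FROM table2"
).fetchall()

print(null_equals_null)  # None
print(union_rows)        # [(1, None)]
print(union_all_rows)    # [(1, None), (1, None)]
```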
Syntax and Usage Rules
Both operators follow specific syntax requirements and usage rules.
Basic Syntax:
SELECT column1, column2 FROM table1
UNION | UNION ALL
SELECT column1, column2 FROM table2;
Key Requirements:
- Both SELECT statements must have the same number of columns
- Corresponding columns must have compatible data types
- Column names in the result set come from the first SELECT statement
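A short sketch of two of these requirements, again using SQLite through Python's sqlite3 module. The aliases (id, label, other_id, other_label) are hypothetical. Note that SQLite is loosely typed and does not enforce the data type rule, so that requirement is not demonstrated here; stricter engines will reject incompatible column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER, column2 TEXT);
    CREATE TABLE table2 (column1 INTEGER, column2 TEXT);
""")

# Result column names follow the first SELECT, even if later ones use other aliases.
cursor = conn.execute("""
    SELECT column1 AS id, column2 AS label FROM table1
    UNION ALL
    SELECT column1 AS other_id, column2 AS other_label FROM table2
""")
result_names = [col[0] for col in cursor.description]
print(result_names)  # ['id', 'label']

# A column-count mismatch is rejected before any rows are returned.
try:
    conn.execute(
        "SELECT column1, column2 FROM table1 UNION ALL SELECT column1 FROM table2"
    )
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```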
Important Usage Rules:
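- ORDER BY: Applies to the final combined result set and is written after the last SELECT statement. Sorting an individual SELECT generally requires wrapping it in a subquery or parentheses, depending on the database.
- GROUP BY and HAVING: Belong to individual SELECT statements. To group the combined result, wrap the whole UNION in a subquery or CTE and aggregate in the outer query.
- Aggregate Functions: Work within individual SELECT statements; aggregating across the combined rows likewise requires an outer query over the UNION.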
According to MSSQLTips, “ORDER BY and COMPUTE clauses can only be issued for the overall result set and not within each individual result set,” while “GROUP BY and HAVING clauses can only be issued for each individual result set and not for the overall result set.”
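The rules above can be sketched in SQLite through Python's sqlite3 module. The sample values are invented, and the subquery alias combined is a hypothetical name. The error behavior shown is SQLite's; other engines may word the error differently or accept parenthesized SELECTs with their own ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER);
    CREATE TABLE table2 (column1 INTEGER);
    INSERT INTO table1 VALUES (3), (1);
    INSERT INTO table2 VALUES (2), (1);
""")

# ORDER BY placed after the last SELECT sorts the combined result.
ordered = conn.execute("""
    SELECT column1 FROM table1
    UNION ALL
    SELECT column1 FROM table2
    ORDER BY column1
""").fetchall()
print(ordered)  # [(1,), (1,), (2,), (3,)]

# ORDER BY before the UNION ALL keyword is rejected by SQLite.
try:
    conn.execute("""
        SELECT column1 FROM table1 ORDER BY column1
        UNION ALL
        SELECT column1 FROM table2
    """)
    early_order_by_error = None
except sqlite3.OperationalError as exc:
    early_order_by_error = str(exc)

# To aggregate over the combined rows, wrap the compound query in a subquery.
counts = conn.execute("""
    SELECT column1, COUNT(*) AS occurrences
    FROM (
        SELECT column1 FROM table1
        UNION ALL
        SELECT column1 FROM table2
    ) AS combined
    GROUP BY column1
    ORDER BY column1
""").fetchall()
print(counts)  # [(1, 2), (2, 1), (3, 1)]
```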
Practical Examples
Let’s explore concrete examples demonstrating the differences between these operators.
Example 1: Employee Teams
-- Using UNION (removes duplicates)
SELECT employee_id, employee_name FROM sales_team
UNION
SELECT employee_id, employee_name FROM support_team;
-- Using UNION ALL (keeps duplicates)
SELECT employee_id, employee_name FROM sales_team
UNION ALL
SELECT employee_id, employee_name FROM support_team;
Example 2: Wine Sales Analysis
From StrataScratch:
WITH CTE AS(
SELECT region_1 AS region, variety, price FROM winemag_p1
UNION ALL
SELECT region_2 AS region, variety, price FROM winemag_p1
)
SELECT region, variety, SUM(price) AS price_sum
FROM CTE
WHERE region IS NOT NULL AND price IS NOT NULL
GROUP BY region, variety
ORDER BY price_sum DESC;
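To see why this query uses UNION ALL rather than UNION, the sketch below runs the CTE query above against a tiny, made-up winemag_p1 table in SQLite via Python's sqlite3 module. The two rows and the helper price_sums are assumptions for illustration and do not come from the StrataScratch dataset. Because several rows share the same region, variety, and price, UNION would collapse them and undercount the sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE winemag_p1 (region_1 TEXT, region_2 TEXT, variety TEXT, price REAL);
    -- Two distinct wines that happen to share region, variety, and price.
    INSERT INTO winemag_p1 VALUES
        ('Napa', 'Napa', 'Merlot', 20.0),
        ('Napa', NULL,   'Merlot', 20.0);
""")

def price_sums(set_operator):
    """Run the CTE query with UNION or UNION ALL between the two branches."""
    if set_operator not in ("UNION", "UNION ALL"):
        raise ValueError("set_operator must be 'UNION' or 'UNION ALL'")
    return conn.execute(f"""
        WITH CTE AS (
            SELECT region_1 AS region, variety, price FROM winemag_p1
            {set_operator}
            SELECT region_2 AS region, variety, price FROM winemag_p1
        )
        SELECT region, variety, SUM(price) AS price_sum
        FROM CTE
        WHERE region IS NOT NULL AND price IS NOT NULL
        GROUP BY region, variety
        ORDER BY price_sum DESC
    """).fetchall()

print(price_sums("UNION ALL"))  # [('Napa', 'Merlot', 60.0)]
print(price_sums("UNION"))      # [('Napa', 'Merlot', 20.0)]
```

With UNION ALL, three non-NULL region rows of 20.0 each are summed; with UNION, they are deduplicated to a single row first.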
Example 3: NULL Handling
-- UNION treats NULLs as potential duplicates
SELECT id, name FROM table1
UNION
SELECT id, name FROM table2;
-- UNION ALL preserves all NULLs
SELECT id, name FROM table1
UNION ALL
SELECT id, name FROM table2;
When to Use Each Operator
Choosing between UNION and UNION ALL depends on your specific requirements.
Use UNION When:
- You need distinct results and duplicates should be eliminated
- Data integrity requires unique records in the final output
- You’re working with smaller datasets where performance impact is minimal
- The business logic explicitly requires unique combinations of values
Use UNION ALL When:
- Performance is critical and you’re working with large datasets
- You need to preserve all records including duplicates
- You know the data won’t contain duplicates between the result sets
- You need to aggregate or process all rows individually
As SQLPad.io advises, “Prefer UNION ALL for faster query execution when duplicate rows in the result set are acceptable or desired.”
Best Practices and Considerations
Performance Optimization:
- Always use UNION ALL when duplicates are acceptable or when you know the data is unique
- Consider filtering data before using UNION to reduce the dataset size
- Use appropriate indexing on columns involved in UNION operations
Data Quality:
- Be aware that UNION might remove duplicates you actually need
- Consider NULL handling implications when working with incomplete data
- Test both operators with your actual data to understand the impact
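One way to test both operators against real data is to count how many rows UNION would drop. The helper below, rows_removed_by_union, is a hypothetical name and a minimal sketch for SQLite via Python's sqlite3 module. It interpolates the SELECT statements directly into SQL, so it assumes trusted, hand-written queries without trailing semicolons.

```python
import sqlite3

def rows_removed_by_union(conn, left_sql, right_sql):
    """Count how many rows UNION would drop compared with UNION ALL.

    left_sql and right_sql are trusted SELECT statements without a trailing
    semicolon; this helper is illustrative and does no input sanitising.
    """
    count_all = conn.execute(
        f"SELECT COUNT(*) FROM ({left_sql} UNION ALL {right_sql})"
    ).fetchone()[0]
    count_distinct = conn.execute(
        f"SELECT COUNT(*) FROM ({left_sql} UNION {right_sql})"
    ).fetchone()[0]
    return count_all - count_distinct

# Invented sample data: one duplicate inside table1, one across the tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (2, 'b');
    INSERT INTO table2 VALUES (2, 'b'), (3, 'c');
""")
removed = rows_removed_by_union(
    conn, "SELECT id, name FROM table1", "SELECT id, name FROM table2"
)
print(removed)  # 2
```

A result of 0 suggests UNION ALL is safe for the current data, though it says nothing about data that arrives later.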
Code Readability:
- Where your database supports it, use parentheses to clearly define the scope of each SELECT statement in complex queries
- Add comments explaining why you chose UNION vs UNION ALL
- Document any assumptions about data uniqueness
Common Pitfalls:
- Forgetting that both UNION and UNION ALL require identical column counts and compatible data types
- Attempting to use ORDER BY within individual SELECT statements
- Not considering the performance impact of UNION on large datasets
Conclusion
Understanding the differences between UNION and UNION ALL is crucial for writing efficient SQL queries. The key takeaways are:
- UNION removes duplicates while UNION ALL preserves all rows, making UNION slower but ensuring unique results.
- The performance difference can be significant, especially with large datasets, because UNION ALL avoids the overhead of duplicate detection. The exact speedup depends on the data and the database engine.
- NULL handling differs: UNION treats NULLs as equal when evaluating uniqueness, while UNION ALL preserves all NULL values.
- Use UNION when you need distinct results and can tolerate the performance cost.
- Use UNION ALL when performance is critical, duplicates are acceptable, or you know the data is unique between result sets.
Always consider your specific requirements and dataset characteristics when choosing between these operators, and test both approaches with your actual data to make informed decisions about performance and correctness.
Sources
- UNION vs UNION ALL in SQL | DataCamp
- What is the difference between UNION and UNION ALL? - Stack Overflow
- UNION vs. UNION ALL in SQL: What’s the Difference? | LearnSQL.com
- SQL UNION vs UNION ALL - Syntax, Differences & Examples | DevArt
- UNION vs UNION ALL in SQL | Atlassian
- SQL UNION vs UNION ALL: Differences You Need to Know - StrataScratch
- SQL Union vs Union All in SQL Server | SQLShack
- Union vs Union All in SQL: Key Differences Explained - SQLPad.io
- Performance of UNION versus UNION ALL in SQL Server - Stack Overflow
- UNION vs. UNION ALL in SQL Server | MSSQLTips