Find SQL Duplicates by Email & Name: GROUP BY Guide
Learn to find and remove SQL duplicates across multiple fields like email and name using GROUP BY, HAVING, and window functions. Examples cover detection, retrieving full duplicate rows, and safe deletion in MySQL, PostgreSQL, and SQL Server.
How to Find Duplicate Records in SQL Based on Multiple Fields (Email and Name)
I need to find duplicate records in a SQL table where both the email and name fields are identical. I know how to find duplicates based on a single field:
SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1
For example, with this table:
| ID | NAME | EMAIL |
|---|---|---|
| 1 | John | asd@asd.com |
| 2 | Sam | asd@asd.com |
| 3 | Tom | asd@asd.com |
| 4 | Bob | bob@asd.com |
| 5 | Tom | asd@asd.com |
The query above flags the asd@asd.com group (John, Sam, and both Toms) because all four rows share that email.
However, I need to find duplicates where both email AND name are the same. In this case, I want to identify the duplicate ‘Tom’ records.
I made an error in my database design and allowed duplicate name and email combinations. Now I need to find these duplicates so I can remove or modify them.
Use GROUP BY on both columns to find SQL duplicates by email and name:
SELECT email, name, COUNT(*) AS dup_cnt FROM users GROUP BY email, name HAVING COUNT(*) > 1 returns the (email, name) pairs that repeat. To list the full duplicate rows or remove the extras, join that grouped result back to the table, or use window functions like ROW_NUMBER() OVER (PARTITION BY email, name ORDER BY id). These patterns work in MySQL 8+, PostgreSQL, and SQL Server.
Contents
- Find SQL duplicates by email and name
- Retrieve full duplicate rows (JOIN / EXISTS)
- Window functions for SQL duplicates (COUNT() OVER / ROW_NUMBER())
- Delete duplicates safely — DB-specific patterns
- Normalization, constraints and prevention
- Performance, large-table strategies and common pitfalls
- Sources
- Conclusion
Find SQL duplicates by email and name
If you want duplicate rows where both email and name are identical, group by both columns:
SELECT name, email, COUNT(*) AS dup_cnt
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1;
Applied to your sample table:
| ID | NAME | EMAIL |
|---|---|---|
| 1 | John | asd@asd.com |
| 2 | Sam | asd@asd.com |
| 3 | Tom | asd@asd.com |
| 4 | Bob | bob@asd.com |
| 5 | Tom | asd@asd.com |
The query above returns:
| NAME | EMAIL | dup_cnt |
|---|---|---|
| Tom | asd@asd.com | 2 |
So only the Tom rows are returned — John and Sam share the same email but different names, so they are not a duplicate pair.
Why this works: GROUP BY aggregates rows by the exact combination (name,email) and HAVING COUNT(*) > 1 filters combinations that appear more than once. This is the classic pattern described in many how‑tos on duplicates (see the GROUP BY / HAVING pattern on GeeksforGeeks). https://www.geeksforgeeks.org/sql/how-to-find-duplicates-values-across-multiple-columns-in-sql/
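If you want to sanity-check the pattern before running it against real data, here is a minimal, self-contained sketch using Python's built-in sqlite3 module; the in-memory table mirrors the sample above:

```python
import sqlite3

# In-memory SQLite stand-in for the users table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# Group by BOTH columns; only combinations appearing more than once survive HAVING.
rows = conn.execute(
    """
    SELECT name, email, COUNT(*) AS dup_cnt
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()
print(rows)  # [('Tom', 'asd@asd.com', 2)]
```

Only the (Tom, asd@asd.com) pair comes back, exactly as in the table above.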
Retrieve full duplicate rows (JOIN / EXISTS)
GROUP BY gives the offending combinations, but not the full rows. To list every row that matches those duplicate (email,name) pairs, join back to the original table:
SELECT u.*
FROM users u
JOIN (
SELECT email, name
FROM users
GROUP BY email, name
HAVING COUNT(*) > 1
) dup
ON u.email = dup.email
AND u.name = dup.name;
This returns both Tom rows (IDs 3 and 5) in your sample.
Alternative (EXISTS) — sometimes clearer and index-friendly:
SELECT u.*
FROM users u
WHERE EXISTS (
SELECT 1
FROM users u2
WHERE u2.email = u.email
AND u2.name = u.name
AND u2.id <> u.id
);
Use whichever reads best for you. For very large tables you’ll want proper indexes on (email, name) so the join/exists walks fewer rows.
Need quick reference for single-/multi-column patterns? See common community answers on Stack Overflow (example discussions show GROUP BY, JOIN and window-based options). https://stackoverflow.com/questions/8149210/how-do-i-find-duplicates-across-multiple-columns
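To see the row-retrieval step end to end, here is a small SQLite sketch of the EXISTS variant (same sample data), confirming that both Tom rows, ids 3 and 5, come back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", "asd@asd.com"), (2, "Sam", "asd@asd.com"),
     (3, "Tom", "asd@asd.com"), (4, "Bob", "bob@asd.com"),
     (5, "Tom", "asd@asd.com")],
)

# EXISTS: a row is a duplicate if some OTHER row shares its (email, name).
dup_ids = [r[0] for r in conn.execute(
    """
    SELECT u.id FROM users u
    WHERE EXISTS (
        SELECT 1 FROM users u2
        WHERE u2.email = u.email AND u2.name = u.name AND u2.id <> u.id
    )
    ORDER BY u.id
    """
)]
print(dup_ids)  # [3, 5]
```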
Window functions for SQL duplicates (COUNT() OVER / ROW_NUMBER())
Window functions give more flexibility: mark duplicates in-place and choose which row to keep.
List every row that belongs to a duplicate group:
SELECT *
FROM (
SELECT u.*,
COUNT(*) OVER (PARTITION BY email, name) AS dup_cnt
FROM users u
) t
WHERE dup_cnt > 1;
Mark duplicates and keep only the “first” row per group (by id), then select extras:
SELECT *
FROM (
SELECT u.*,
ROW_NUMBER() OVER (PARTITION BY email, name ORDER BY id) AS rn
FROM users u
) t
WHERE rn > 1; -- rows to consider removing
ROW_NUMBER is handy because you can define the ORDER BY to keep the oldest (ORDER BY id), newest (ORDER BY created_at DESC), or the row with the most complete data. Window-function approaches are supported in MySQL 8+, PostgreSQL and SQL Server; if you’re on older MySQL, use the GROUP BY / JOIN approach instead. LearnSQL has a practical walkthrough of these options if you want step‑by‑step examples. https://learnsql.com/blog/how-to-find-duplicate-values-in-sql/
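The ROW_NUMBER pattern can also be rehearsed locally; this SQLite sketch (window functions need SQLite 3.25+, which ships with current Python builds) shows that only the extra Tom row, id 5, is marked for removal:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", "asd@asd.com"), (2, "Sam", "asd@asd.com"),
     (3, "Tom", "asd@asd.com"), (4, "Bob", "bob@asd.com"),
     (5, "Tom", "asd@asd.com")],
)

# rn = 1 marks the row we keep (lowest id per group); rn > 1 marks the extras.
extras = [r[0] for r in conn.execute(
    """
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY email, name ORDER BY id) AS rn
        FROM users
    )
    WHERE rn > 1
    ORDER BY id
    """
)]
print(extras)  # [5]
```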
Delete duplicates safely — DB-specific patterns
Want to remove duplicates but keep one canonical row per (email,name)? Test with SELECT first, backup, then run DELETE inside a transaction if possible. Below are common, tested patterns.
PostgreSQL (CTE + ROW_NUMBER):
WITH duplicates AS (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email, name ORDER BY id) AS rn
FROM users
)
DELETE FROM users
WHERE id IN (SELECT id FROM duplicates WHERE rn > 1);
SQL Server (CTE delete directly):
WITH cte AS (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email, name ORDER BY id) AS rn
FROM users
)
DELETE FROM cte WHERE rn > 1;
MySQL (pre-8.0): delete via self-join — keeps the row with the smallest id:
DELETE u1
FROM users u1
JOIN users u2
ON u1.email = u2.email
AND u1.name = u2.name
AND u1.id > u2.id;
MySQL (all versions) — delete keeping MIN(id) using a wrapped subquery to avoid “you can’t specify target table” error:
DELETE FROM users
WHERE id NOT IN (
SELECT id FROM (
SELECT MIN(id) AS id
FROM users
GROUP BY email, name
) x
);
A couple of safety tips:
- Always run the SELECT version first (the same JOIN/CTE but with SELECT) and verify the rows to be deleted.
- Work in a transaction and keep a backup.
- For very large tables, delete in batches (LIMIT-based loop or id ranges) to avoid long locks. Atlassian has a short checklist on duplicate handling you may find useful. https://www.atlassian.com/data/sql/how-to-find-duplicate-values-in-a-sql-table
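The "SELECT first, then DELETE" workflow above can be rehearsed safely on a throwaway copy. A SQLite sketch of the keep-MIN(id) delete (SQLite does not need MySQL's wrapped-subquery workaround, so the subquery is used directly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", "asd@asd.com"), (2, "Sam", "asd@asd.com"),
     (3, "Tom", "asd@asd.com"), (4, "Bob", "bob@asd.com"),
     (5, "Tom", "asd@asd.com")],
)

# Dry run first: which ids would be deleted?
to_delete = [r[0] for r in conn.execute(
    "SELECT id FROM users WHERE id NOT IN "
    "(SELECT MIN(id) FROM users GROUP BY email, name)"
)]
# Then the real delete, keeping MIN(id) per (email, name) group.
conn.execute(
    "DELETE FROM users WHERE id NOT IN "
    "(SELECT MIN(id) FROM users GROUP BY email, name)"
)
remaining = [r[0] for r in conn.execute("SELECT id FROM users ORDER BY id")]
print(to_delete, remaining)  # [5] [1, 2, 3, 4]
```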
Normalization, constraints and prevention
After cleaning duplicates, prevent them from coming back:
- Add a unique constraint on the combination:
  ALTER TABLE users ADD CONSTRAINT uniq_email_name UNIQUE (email, name);
- Normalize values for comparisons (case/whitespace): many duplicates are caused by "Tom" vs "tom" or trailing spaces. Either:
  - enforce normalization on insert/update (store LOWER(TRIM(email))), or
  - create a functional index on the normalized expression (database support varies).
- Use upsert/merge patterns on insert: MySQL's ON DUPLICATE KEY UPDATE or PostgreSQL's ON CONFLICT DO NOTHING/DO UPDATE avoids inserting repeats when a unique index exists.
Remember: you must remove existing duplicates before adding a UNIQUE constraint, otherwise the ALTER will fail.
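One way to sketch the prevention side is a unique index over normalized expressions plus an upsert that swallows the conflict. SQLite syntax below (PostgreSQL is nearly identical; MySQL would need a generated column plus ON DUPLICATE KEY UPDATE instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
# Unique index over normalized expressions: catches 'Tom' / 'tom ' style repeats.
conn.execute(
    "CREATE UNIQUE INDEX uniq_email_name "
    "ON users (LOWER(TRIM(email)), LOWER(TRIM(name)))"
)
conn.execute("INSERT INTO users VALUES (3, 'Tom', 'asd@asd.com')")
# A case/whitespace variant of the same person: silently skipped, not inserted.
conn.execute(
    "INSERT INTO users VALUES (5, 'tom ', ' ASD@ASD.COM ') "
    "ON CONFLICT DO NOTHING"
)
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 1
```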
Performance, large-table strategies and common pitfalls
Indexing:
- Create a composite index on (email, name) to speed GROUP BY, JOINs and PARTITION BY scans:
CREATE INDEX idx_users_email_name ON users(email, name);
Large-table strategies:
- Batch deletes: delete 1,000–10,000 rows at a time in a loop to reduce lock contention. MySQL supports LIMIT on DELETE; other DBs may require ranged deletes by id.
- Table rebuild: for huge datasets, create a new deduped table, swap it in, then rebuild indexes — faster and less locking for some workloads.
- Use EXPLAIN to check that the query uses the index you expect.
Common pitfalls:
- Collation / case sensitivity: your DB collation might make ‘Tom’ and ‘tom’ equal or different. Normalize explicitly if you want case-insensitive deduping.
- Nulls: UNIQUE(email, name) treats NULL differently across DBs; check your DB’s behavior if columns are nullable.
- Choosing which row to keep: prefer a deterministic rule (MIN(id), latest updated_at, non-null fields count, etc.).
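To dedupe case-insensitively regardless of collation, normalize inside the grouping itself. A quick SQLite illustration (note SQLite's LOWER is ASCII-only; accented names need database-specific collation support):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Tom", "asd@asd.com"), (2, "tom ", " ASD@ASD.COM")],
)

# Group on normalized expressions so 'Tom' and 'tom ' fall into one bucket.
rows = conn.execute(
    """
    SELECT LOWER(TRIM(name)) AS norm_name,
           LOWER(TRIM(email)) AS norm_email,
           COUNT(*) AS dup_cnt
    FROM users
    GROUP BY norm_name, norm_email
    HAVING COUNT(*) > 1
    """
).fetchall()
print(rows)  # [('tom', 'asd@asd.com', 2)]
```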
If you want community perspectives and alternative hacks (joins, DISTINCT tricks), see the classic Stack Overflow threads showing many variations on these patterns. https://stackoverflow.com/questions/4522431/find-duplicate-entries-in-a-column
Sources
- How to Find Duplicate Values Across Multiple Columns in SQL — GeeksforGeeks
- How do I find duplicates across multiple columns? — Stack Overflow
- How to Find Duplicate Values in a SQL Table — Atlassian
- How to Find Duplicate Values in SQL — LearnSQL.com
- Find duplicate entries in a column — Stack Overflow
Conclusion
Bottom line: run GROUP BY on both email and name to find which pairs repeat, then inspect the full rows with a JOIN or EXISTS and remove the extras with a window-function or join-based DELETE. Test with SELECT first, back up, then add a UNIQUE(email, name) constraint (or normalize on write) to stop future duplicates.