Databases

Find SQL Duplicates by Email & Name: GROUP BY Guide

Learn to find and remove SQL duplicates across multiple fields like email and name using GROUP BY, HAVING, and window functions. Examples cover detecting duplicates, retrieving full rows, and safe deletion in MySQL, PostgreSQL, and SQL Server.


How to Find Duplicate Records in SQL Based on Multiple Fields (Email and Name)

I need to find duplicate records in a SQL table where both the email and name fields are identical. I know how to find duplicates based on a single field:

sql
SELECT email, COUNT(email) 
FROM users
GROUP BY email
HAVING COUNT(email) > 1

For example, with this table:

ID  NAME  EMAIL
1   John  asd@asd.com
2   Sam   asd@asd.com
3   Tom   asd@asd.com
4   Bob   bob@asd.com
5   Tom   asd@asd.com

The query above returns asd@asd.com with a count of 4, flagging the John, Sam, and both Tom rows, since they all share that email.

However, I need to find duplicates where both email AND name are the same. In this case, I want to identify the duplicate ‘Tom’ records.

I made an error in my database design and allowed duplicate name and email combinations. Now I need to find these duplicates so I can remove or modify them.

Use GROUP BY on both email and name to find SQL duplicates:
SELECT email, name, COUNT(*) AS dup_cnt FROM users GROUP BY email, name HAVING COUNT(*) > 1 returns the (email, name) pairs that repeat. To list the full duplicate rows or remove the extras, join that grouped result back to the table, or use window functions such as ROW_NUMBER() OVER (PARTITION BY email, name). These patterns work in MySQL 8+, PostgreSQL, and SQL Server.




Find SQL duplicates by email and name

If you want duplicate rows where both email and name are identical, group by both columns:

sql
SELECT name, email, COUNT(*) AS dup_cnt
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1;

Applied to your sample table:

ID  NAME  EMAIL
1   John  asd@asd.com
2   Sam   asd@asd.com
3   Tom   asd@asd.com
4   Bob   bob@asd.com
5   Tom   asd@asd.com

The query above returns:

NAME  EMAIL        dup_cnt
Tom   asd@asd.com  2

So only the Tom rows are returned — John and Sam share the same email but different names, so they are not a duplicate pair.

Why this works: GROUP BY aggregates rows by the exact combination (name,email) and HAVING COUNT(*) > 1 filters combinations that appear more than once. This is the classic pattern described in many how‑tos on duplicates (see the GROUP BY / HAVING pattern on GeeksforGeeks). https://www.geeksforgeeks.org/sql/how-to-find-duplicates-values-across-multiple-columns-in-sql/


Retrieve full duplicate rows (JOIN / EXISTS)

GROUP BY gives the offending combinations, but not the full rows. To list every row that matches those duplicate (email,name) pairs, join back to the original table:

sql
SELECT u.*
FROM users u
JOIN (
 SELECT email, name
 FROM users
 GROUP BY email, name
 HAVING COUNT(*) > 1
) dup
 ON u.email = dup.email
 AND u.name = dup.name;

This returns both Tom rows (IDs 3 and 5) in your sample.

Alternative (EXISTS) — sometimes clearer and index-friendly:

sql
SELECT u.*
FROM users u
WHERE EXISTS (
 SELECT 1
 FROM users u2
 WHERE u2.email = u.email
 AND u2.name = u.name
 AND u2.id <> u.id
);

Use whichever reads best for you. For very large tables you’ll want proper indexes on (email, name) so the join/exists walks fewer rows.

Need quick reference for single-/multi-column patterns? See common community answers on Stack Overflow (example discussions show GROUP BY, JOIN and window-based options). https://stackoverflow.com/questions/8149210/how-do-i-find-duplicates-across-multiple-columns


Window functions for SQL duplicates (COUNT(*) OVER / ROW_NUMBER())

Window functions give more flexibility: mark duplicates in-place and choose which row to keep.

List every row that belongs to a duplicate group:

sql
SELECT *
FROM (
 SELECT u.*,
 COUNT(*) OVER (PARTITION BY email, name) AS dup_cnt
 FROM users u
) t
WHERE dup_cnt > 1;

Mark duplicates and keep only the “first” row per group (by id), then select extras:

sql
SELECT *
FROM (
 SELECT u.*,
 ROW_NUMBER() OVER (PARTITION BY email, name ORDER BY id) AS rn
 FROM users u
) t
WHERE rn > 1; -- rows to consider removing

ROW_NUMBER is handy because you can define the ORDER BY to keep the oldest (ORDER BY id), newest (ORDER BY created_at DESC), or the row with the most complete data. Window-function approaches are supported in MySQL 8+, PostgreSQL and SQL Server; if you’re on older MySQL, use the GROUP BY / JOIN approach instead. LearnSQL has a practical walkthrough of these options if you want step‑by‑step examples. https://learnsql.com/blog/how-to-find-duplicate-values-in-sql/
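As a sketch of the "most complete data" rule, the ORDER BY inside the window can rank rows by how much information they carry. Here `phone` is a hypothetical nullable column used only to illustrate the idea; swap in whichever columns matter in your schema:

```sql
-- Rank rows so the one with a non-null phone wins; ties fall back to smallest id.
-- "phone" is a hypothetical column illustrating the completeness rule.
SELECT *
FROM (
  SELECT u.*,
         ROW_NUMBER() OVER (
           PARTITION BY email, name
           ORDER BY CASE WHEN phone IS NOT NULL THEN 0 ELSE 1 END, id
         ) AS rn
  FROM users u
) t
WHERE rn > 1;  -- the less complete copies, i.e. candidates for removal
```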


Delete duplicates safely — DB-specific patterns

Want to remove duplicates but keep one canonical row per (email,name)? Test with SELECT first, backup, then run DELETE inside a transaction if possible. Below are common, tested patterns.

PostgreSQL (CTE + ROW_NUMBER):

sql
WITH duplicates AS (
 SELECT id,
 ROW_NUMBER() OVER (PARTITION BY email, name ORDER BY id) AS rn
 FROM users
)
DELETE FROM users
WHERE id IN (SELECT id FROM duplicates WHERE rn > 1);

SQL Server (CTE delete directly):

sql
WITH cte AS (
 SELECT id,
 ROW_NUMBER() OVER (PARTITION BY email, name ORDER BY id) AS rn
 FROM users
)
DELETE FROM cte WHERE rn > 1;

MySQL (pre-8.0): delete via self-join — keeps the row with the smallest id:

sql
DELETE u1
FROM users u1
JOIN users u2
 ON u1.email = u2.email
 AND u1.name = u2.name
 AND u1.id > u2.id;

MySQL (all versions) — delete keeping MIN(id) using a wrapped subquery to avoid “you can’t specify target table” error:

sql
DELETE FROM users
WHERE id NOT IN (
 SELECT id FROM (
 SELECT MIN(id) AS id
 FROM users
 GROUP BY email, name
 ) x
);

A couple of safety tips:

  • Always run the SELECT version first (the same JOIN/CTE but with SELECT) and verify the rows to be deleted.
  • Work in a transaction and keep a backup.
  • For very large tables, delete in batches (LIMIT-based loop or id ranges) to avoid long locks. Atlassian has a short checklist on duplicate handling you may find useful. https://www.atlassian.com/data/sql/how-to-find-duplicate-values-in-a-sql-table
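A batched variant of the MySQL self-join delete might look like the sketch below. Note that MySQL's multi-table DELETE does not accept LIMIT, so the duplicate ids are collected in a wrapped subquery first; re-run the statement until it reports zero rows affected:

```sql
-- Delete at most 1000 duplicate rows per run; repeat until 0 rows affected.
-- The derived table "batch" avoids MySQL's "can't specify target table" error.
DELETE FROM users
WHERE id IN (
  SELECT id FROM (
    SELECT u1.id
    FROM users u1
    JOIN users u2
      ON u1.email = u2.email
     AND u1.name  = u2.name
     AND u1.id    > u2.id
    LIMIT 1000
  ) batch
);
```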

Normalization, constraints and prevention

After cleaning duplicates, prevent them from coming back:

  • Add a unique constraint on the combination:
    ALTER TABLE users ADD CONSTRAINT uniq_email_name UNIQUE (email, name);

  • Normalize values for comparisons (case/whitespace): many duplicates are caused by “Tom” vs “tom” or trailing spaces. Either:

  • enforce normalization on insert/update (store LOWER(TRIM(email))), or

  • create a functional index on the normalized expression (database support varies).

  • Use upsert/merge patterns on insert: MySQL’s ON DUPLICATE KEY UPDATE or PostgreSQL’s ON CONFLICT DO NOTHING/DO UPDATE avoid inserting repeats if a unique index exists.

Remember: you must remove existing duplicates before adding a UNIQUE constraint, otherwise the ALTER will fail.
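The normalization and upsert ideas above can be sketched like this. The expression index uses PostgreSQL syntax (MySQL needs 8.0.13+ for functional indexes), and `updated_at` is a hypothetical column used only to show the update branch:

```sql
-- PostgreSQL: unique expression index so 'Tom ' and 'tom' collide at insert time.
CREATE UNIQUE INDEX uniq_users_norm
  ON users (LOWER(TRIM(email)), LOWER(TRIM(name)));

-- PostgreSQL: silently skip an insert that would duplicate an existing pair.
INSERT INTO users (name, email)
VALUES ('Tom', 'asd@asd.com')
ON CONFLICT DO NOTHING;

-- MySQL equivalent, with a UNIQUE (email, name) key in place:
-- (updated_at is a hypothetical column)
INSERT INTO users (name, email)
VALUES ('Tom', 'asd@asd.com')
ON DUPLICATE KEY UPDATE updated_at = NOW();
```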


Performance, large-table strategies and common pitfalls

Indexing:

  • Create a composite index on (email, name) to speed GROUP BY, JOINs and PARTITION BY scans:
    CREATE INDEX idx_users_email_name ON users(email, name);

Large-table strategies:

  • Batch deletes: delete 1,000–10,000 rows at a time in a loop to reduce lock contention. MySQL supports LIMIT on DELETE; other DBs may require ranged deletes by id.
  • Table rebuild: for huge datasets, create a new deduped table, swap it in, then rebuild indexes — faster and less locking for some workloads.
  • Use EXPLAIN to check that the query uses the index you expect.

Common pitfalls:

  • Collation / case sensitivity: your DB collation might make ‘Tom’ and ‘tom’ equal or different. Normalize explicitly if you want case-insensitive deduping.
  • Nulls: UNIQUE(email, name) treats NULL differently across DBs; check your DB’s behavior if columns are nullable.
  • Choosing which row to keep: prefer a deterministic rule (MIN(id), latest updated_at, non-null fields count, etc.).
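For the collation pitfall, a normalized variant of the duplicate check can be sketched as below (TRIM needs SQL Server 2017+; use LTRIM(RTRIM(...)) on older versions):

```sql
-- Case- and whitespace-insensitive duplicate detection.
SELECT LOWER(TRIM(email)) AS norm_email,
       LOWER(TRIM(name))  AS norm_name,
       COUNT(*)           AS dup_cnt
FROM users
GROUP BY LOWER(TRIM(email)), LOWER(TRIM(name))
HAVING COUNT(*) > 1;
```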

If you want community perspectives and alternative hacks (joins, DISTINCT tricks), see the classic Stack Overflow threads showing many variations on these patterns. https://stackoverflow.com/questions/4522431/find-duplicate-entries-in-a-column




Conclusion

Bottom line: run GROUP BY on both email and name to find which pairs repeat, then inspect the full rows with a JOIN or EXISTS and remove the extras with a window-function or join-based DELETE. Test first, back up, then add a UNIQUE(email, name) constraint (or normalize on write) to stop future duplicates.
