
Find Duplicates in Python Lists: Set, Counter, Sort

Efficient methods to find duplicates in Python lists using sets, collections.Counter or sorting. Handles unhashable items and covers time and space complexity.

How can I efficiently find elements that appear more than once in a Python list?

Using Python, my current approach is to convert the list nums to a set to get uniqueness, then iterate over nums removing and adding elements to detect duplicates:

python
nums = [4, 3, 2, 7, 8, 2, 3, 1]
z = set(nums)
for i in nums:
    if i in z:
        z.remove(i)
    else:
        z.add(i)

print(nums, z)

This works, but I want a more efficient and optimized solution. What are better approaches (for example a single-pass set-based method, collections.Counter, or other patterns), and what are their time and space complexities?

To find duplicates in a Python list, prefer either a single-pass set-based approach or collections.Counter. The single-pass seen/duplicates pattern is O(n) time and O(n) space; collections.Counter also counts everything in one pass and is ideal when you need frequencies. Your current toggle-on-a-prepopulated-set trick looks clever, but it misses duplicates that occur an odd number of times; the corrected patterns below are simpler and reliable.




Why your current code is fragile

Your snippet toggles membership in a set that you pre-filled with all unique elements. That makes the final set contain values with even multiplicity, not “appears more than once.” Example:

python
nums = [1, 1, 1]  # 1 appears three times (should be reported as a duplicate)
z = set(nums)     # {1}
for i in nums:
    if i in z:
        z.remove(i)
    else:
        z.add(i)
print(z)  # set() -> wrong: 1 is not reported, despite appearing more than once

Why? Each element starts out present (you pre-filled the set), and every occurrence toggles its membership. After k occurrences the element is in the set iff k is even, so counts of 2, 4, 6 are reported while counts of 3, 5, 7 (which are genuine duplicates) are not. That makes the approach incorrect for general duplicate detection. Use one of the patterns below instead.


Single-pass set method

This is the simplest and most memory/time efficient general-purpose method for hashable items.

Code (unique duplicates, order not preserved):

python
nums = [4, 3, 2, 7, 8, 2, 3, 1]
seen = set()
dups = set()
for x in nums:
    if x in seen:
        dups.add(x)
    else:
        seen.add(x)

print(list(dups))  # e.g. [2, 3]

If you want duplicates in the order they first become duplicates:

python
seen = set()
dups_set = set()
dups_in_order = []
for x in nums:
    if x in seen:
        if x not in dups_set:
            dups_in_order.append(x)
            dups_set.add(x)
    else:
        seen.add(x)
Complexity

  • Time: O(n) (one pass; set ops are amortized O(1))
  • Space: O(k) where k is number of unique items (<= n)

When to use

  • You only need to know which values are duplicated (not how often), and the elements are hashable. This is the sensible default for most lists.


collections.Counter: counts and frequencies

If you need counts (how many times each duplicate appears) or a concise implementation, Counter is ideal.

Code:

python
from collections import Counter

nums = [4, 3, 2, 7, 8, 2, 3, 1]
cnt = Counter(nums)
duplicates = [x for x, c in cnt.items() if c > 1]
duplicates_with_counts = {x: c for x, c in cnt.items() if c > 1}

print(duplicates) # [2, 3]
print(duplicates_with_counts) # {2: 2, 3: 2}

Notes

  • Counter works only with hashable elements, the same restriction as the set-based approach.
  • Counter.most_common() returns (value, count) pairs sorted by descending count, which is handy when you want the most frequent duplicates first.
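For example, a small sketch (using a variant of nums with an extra 2, so the frequencies differ) that lists duplicates ordered by how often they appear:

python
from collections import Counter

nums_variant = [4, 3, 2, 7, 8, 2, 3, 2, 1]  # 2 appears three times, 3 twice
cnt = Counter(nums_variant)

# most_common() yields (value, count) pairs sorted by descending count
dups_by_frequency = [(x, c) for x, c in cnt.most_common() if c > 1]
print(dups_by_frequency)  # [(2, 3), (3, 2)]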

Complexity

  • Time: O(n)
  • Space: O(k) for counts of unique items

Also see a practical guide and comparisons at GeeksforGeeks: https://www.geeksforgeeks.org/python/how-to-find-duplicates-in-a-list-python/


Sorting, NumPy and pandas alternatives

Sorting-based scan

  • Sort the list and scan adjacent items to find runs. This is useful when you can modify the list and want to avoid auxiliary sets.

Code:

python
nums_sorted = sorted(nums)
dups = []
for i in range(1, len(nums_sorted)):
    if nums_sorted[i] == nums_sorted[i - 1]:
        if not dups or dups[-1] != nums_sorted[i]:
            dups.append(nums_sorted[i])

Complexity

  • Time: O(n log n) (sorting dominates)
  • Space: O(1) extra if you sort in-place with nums.sort(), otherwise O(n)

NumPy (numeric arrays)

  • For large numeric arrays, NumPy can be faster because operations are in C:
python
import numpy as np
vals, counts = np.unique(np.array(nums), return_counts=True)
dups = vals[counts > 1].tolist()
  • Note: np.unique typically sorts internally, so it behaves like O(n log n), but it is highly optimized for large numeric data. For small-range non-negative integers, np.bincount + np.nonzero is O(n + max_val); a sketch follows below.
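A minimal sketch of the bincount variant, assuming nums holds only non-negative integers with a small maximum value:

python
import numpy as np

nums = [4, 3, 2, 7, 8, 2, 3, 1]            # non-negative ints, small max value
counts = np.bincount(np.array(nums))       # counts[v] == occurrences of value v
dups = np.nonzero(counts > 1)[0].tolist()  # values whose count exceeds 1
print(dups)  # [2, 3]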

pandas (tabular / large datasets)

  • If you already use pandas:
python
import pandas as pd
s = pd.Series(nums)
dups = s[s.duplicated(keep=False)].unique().tolist()
  • pandas has vectorized, C-backed routines that can be faster for very large datasets, but they carry library overhead.

For quick removal or order-preserving unique extraction see Dataquest notes on removing duplicates: https://www.dataquest.io/blog/how-to-remove-duplicates-from-a-python-list/


Indices, order and unhashable items — edge cases

Want indices (positions) of duplicate occurrences?

python
from collections import defaultdict

pos = defaultdict(list)
for i, x in enumerate(nums):
    pos[x].append(i)

duplicates = {x: idxs for x, idxs in pos.items() if len(idxs) > 1}
# duplicates maps value -> list of indices

This is O(n) time and O(n) space.

Unhashable items (lists, dicts)

  • Sets and Counter require hashable elements. If your list contains lists or dicts, convert each element to a hashable key when it is safe to do so (see the sketch after this list):
    • Convert lists to tuples: key = tuple(x)
    • Convert dicts to a tuple of sorted items: key = tuple(sorted(d.items()))
    • Or use a stable serialization (json.dumps(…, sort_keys=True)) as a fallback (slower).
  • Beware: serialization and conversions have costs and can produce collisions if not done carefully.
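A minimal sketch of this idea, assuming a list of flat dicts and using sorted-items tuples as keys; the make_key helper is illustrative, not a standard library function:

python
from collections import Counter

records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},  # duplicate of the first record
]

def make_key(d):
    # Hypothetical helper: turn a flat dict into a hashable, order-independent key.
    return tuple(sorted(d.items()))

cnt = Counter(make_key(d) for d in records)
dup_keys = {k for k, c in cnt.items() if c > 1}
duplicates = [d for d in records if make_key(d) in dup_keys]
print(duplicates)  # both copies of {'id': 1, 'name': 'a'}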

Streaming / very-large data

  • If the data is streaming or cannot fit in memory, exact duplicate detection requires external storage or multi-pass algorithms (external sort, database-backed dedup). For approximate detection with bounded memory you can use a Bloom filter, at the cost of occasional false positives (a minimal sketch follows).
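An illustrative Bloom filter sketch built on hashlib; the BloomFilter class, its bit-array size, and the number of hash functions are assumptions chosen for demonstration, not tuned production values:

python
import hashlib

class BloomFilter:
    """Illustrative Bloom filter: fixed memory, may report false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):  # assumed demo parameters
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        data = repr(item).encode()
        for i in range(self.num_hashes):
            digest = hashlib.sha256(data + str(i).encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

# Flag "possible duplicates" in a stream with bounded memory.
bf = BloomFilter()
possible_dups = []
for x in [4, 3, 2, 7, 8, 2, 3, 1]:  # stand-in for a large stream
    if x in bf:
        possible_dups.append(x)     # may include false positives
    else:
        bf.add(x)
print(possible_dups)                # typically [2, 3]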

For practical tips on indexing and locating duplicates see a compact guide: https://towardsdatascience.com/pythonic-tips-tricks-checking-and-indexing-duplicates-d62fdfea9eec/


Performance summary and recommendations

Quick cheat-sheet

  • Single-pass seen/dups (hash sets)
    • Time: O(n)
    • Space: O(k)
    • Use: general-purpose; fastest for mixed types that are hashable
  • collections.Counter
    • Time: O(n)
    • Space: O(k)
    • Use: when you need counts/frequencies; very readable
  • Sorting + scan
    • Time: O(n log n)
    • Space: O(1) extra if in-place
    • Use: when you can sort in-place and want to save auxiliary memory
  • NumPy / pandas
    • Time: often O(n log n) for unique (but fast in C)
    • Space: O(n)
    • Use: large numerical/tabular data where vectorized ops pay off
  • For unhashable items
    • Convert to stable hashable keys or use position-mapping strategies

Recommendation

  • For most use-cases: use the single-pass seen + dups method (simple and O(n)).
  • If you also need counts or a compact implementation: use collections.Counter.
  • If memory is constrained but sorting is allowed: sort and scan.
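To sanity-check these trade-offs on your own data, here is a minimal timing sketch; the list size and value range below are arbitrary choices for illustration:

python
import random
import timeit
from collections import Counter

nums = [random.randrange(10_000) for _ in range(100_000)]  # arbitrary test data

def set_pass(data):
    seen, dups = set(), set()
    for x in data:
        if x in seen:
            dups.add(x)
        else:
            seen.add(x)
    return dups

def counter_pass(data):
    return {x for x, c in Counter(data).items() if c > 1}

def sort_scan(data):
    s = sorted(data)
    return {s[i] for i in range(1, len(s)) if s[i] == s[i - 1]}

for fn in (set_pass, counter_pass, sort_scan):
    elapsed = timeit.timeit(lambda: fn(nums), number=10)
    print(f"{fn.__name__}: {elapsed:.3f}s for 10 runs")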

Community and reference examples that expand on these patterns: GeeksforGeeks (multiple examples) https://www.geeksforgeeks.org/python/python-program-print-duplicates-list-integers/ and the Counter documentation https://docs.python.org/3/library/collections.html#collections.Counter.




Conclusion

To reliably detect duplicates in a Python list, use a single-pass seen/duplicates set pattern for speed and simplicity, or collections.Counter when you also need frequency counts. Avoid the pre-filled-toggle trick: it flips membership on every occurrence and misses odd-count duplicates. For large numeric or tabular datasets, consider NumPy/pandas, or sort and scan when memory or in-place constraints make that sensible.
