Efficient Sock Pairing Algorithm: O(n log n) Guide
Discover the optimal sock-pairing algorithm: in-place sorting (e.g., heapsort) gives O(n log n) time under a log-space constraint. Compared with hashing and radix methods, plus practical tips for about 30 pairs and the relation to the element distinctness problem.
What is the most efficient algorithm for pairing socks from a pile?
As a computer scientist, I’m looking for an optimal solution to pair socks efficiently. My current approach of picking one sock and searching through the entire pile for its match has poor time complexity (approximately n²/8 operations on average).
I’m considering sorting the socks by size or color to achieve an O(n log n) solution, but I’m open to other approaches. Hashing or other not-in-place solutions aren’t viable since I can’t duplicate the socks.
Key aspects I’d like addressed:
- A theoretical solution for a large number of socks
- Practical considerations for a smaller collection (around 30 pairs)
- Whether this problem is equivalent to the element distinctness problem
Constraints: I can use up to logarithmic extra space (memory) for the solution.
The fastest deterministic method that fits a logarithmic-space constraint is an in-place comparison sort (e.g., heapsort or introsort) followed by a single linear scan that pairs adjacent identical socks, giving O(n log n) time overall. If you can use linear extra space, hashing achieves O(n) expected time and counting/radix achieves O(n) worst-case time, but both require extra memory (or small key ranges). For about 30 pairs, a quick physical bucket-by-color/pattern pass (radix-like) and local pairing is usually the fastest in practice.
Contents
- In-place sock pairing algorithm — O(n log n)
- Why O(n log n) is (essentially) optimal — relation to element distinctness
- Linear-time methods (hashing, counting, radix) — trade-offs
- Practical routine for ~30 pairs (human / engineer)
- Memory-limited tricks and time–space trade-offs (log-space)
- Pseudocode and example cost estimates (naive vs sort vs hash)
- Sources
- Conclusion
In-place sock pairing algorithm — O(n log n)
If you must restrict extra memory to O(1) (or O(log n)) and you can compare socks under a total order (for example by an encoded color/size/label), the cleanest, provably efficient approach is:
- Sort the pile in-place using a comparison-based in-place sorter (heapsort or introsort).
- Scan the sorted array once and pair adjacent equal items.
Why that works: sorting groups equal socks into contiguous runs; a single linear sweep then outputs floor(run_length/2) pairs per run. The sort step dominates cost, so the time is O(n log n), and extra space is O(1) for heapsort or O(log n) recursion/stack for introsort (used by many std::sort implementations).
Practical notes
- Use heapsort when you need strict O(1) extra space and O(n log n) worst-case time.
- Use introsort (quicksort with a heapsort fallback, as in many std::sort implementations) for better average performance with guaranteed O(n log n) worst-case time and O(log n) stack.
- After sorting, handle runs of identical items by pairing adjacent elements in-place (no extra memory).
A short runnable sketch (Python; list.sort() stands in for an in-place heapsort):

def pair_by_inplace_sort(a):
    # In-place sort; list.sort() (Timsort) stands in for heapsort here,
    # though Timsort may use O(n) auxiliary memory in the worst case.
    a.sort()
    pairs = []
    i, n = 0, len(a)
    while i < n:
        # find the run of socks equal to a[i]
        j = i + 1
        while j < n and a[j] == a[i]:
            j += 1
        count = j - i
        # a run of `count` equal socks yields floor(count/2) pairs
        pairs.extend([(a[i], a[i])] * (count // 2))
        i = j
    return pairs
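For example, on a small pile (illustrative sock keys, not from the original question):

socks = ["red", "blue", "red", "green", "blue", "red"]
print(pair_by_inplace_sort(socks))
# [('blue', 'blue'), ('red', 'red')]; one 'red' and the 'green' stay unmatched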
References for the sort→scan pattern and comparison-model costs: see the discussion and examples on Stack Overflow and CS StackExchange about pairing and sorting strategies — e.g., the classic linear-scan-with-storage pattern and the sort-based alternative (Stack Overflow, CS.SE). For algorithmic cost comparisons and sort theory, see the university lecture notes on sorting (McGill lecture notes).
Why O(n log n) is (essentially) optimal — relation to element distinctness
Is pairing socks as hard as element distinctness (detecting duplicates)? Short answer: in the standard comparison model, yes — pairing (grouping identical items) is at least as hard as element distinctness, so you hit the same lower bound.
Intuition/proof sketch
- If you have an algorithm that groups identical items (produces runs of identical keys) using only pairwise comparisons (less-than / equality), you can decide whether any duplicate exists by checking whether any produced run has length > 1. Thus any lower bound for element distinctness transfers to grouping/pairing.
- Element distinctness (and sorting) require Ω(n log n) comparisons in the comparison decision-tree model; therefore pairing requires Ω(n log n) comparisons in that model.
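To make the Ω(n log n) figure concrete, here is the standard decision-tree arithmetic for the sorting bound (which the sort-and-scan approach inherits): a comparison tree that distinguishes all n! input orders must have depth at least

log₂(n!) ≥ log₂((n/2)^(n/2)) = (n/2) · log₂(n/2) = Ω(n log n)

For n = 60, log₂(60!) ≈ 272, so any comparison sort needs at least about 272 comparisons in the worst case.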
Caveat: that lower bound is model-dependent. If you leave the comparison model and allow:
- hashing (randomized expected-time algorithms), or
- bucket/counting/radix methods because keys come from a small universe (integers, small color set),
then you can beat Ω(n log n): hashing gives O(n) expected time and counting/radix gives O(n) worst-case time, at the cost of using extra space (or special key structure). See the CS.SE thread and algorithm notes for these distinctions (CS.SE, Baeldung example).
Linear-time methods (hashing, counting, radix) — trade-offs
If you can use O(n) extra memory (or extra workspace proportional to the number of distinct keys), you can pair in O(n) (expected) time:
Hashing (typical, expected O(n))
- One-pass algorithm: keep a hash map from key → index (or presence flag). For each sock you see, check the map: if there’s an unmatched sock with the same key, pair them and remove the key; otherwise insert the sock into the map. Expected time is O(n), space O(m), where m is the number of distinct keys.
- This is the approach used in typical coding-challenge solutions (e.g., HackerRank “Sock Merchant”) and works well when you can store references to unmatched socks (HackerRank, GeeksforGeeks).
Counting / Radix (deterministic linear when keys small)
- If sock “keys” (colors/labels) are integers in a small range or can be lexicographically bucketed, counting sort or radix-sort variants run in O(n + k) time (k being the key range) or O(n) time for fixed key size, but require extra space for counts/buckets; a minimal sketch follows. The BBC/popular-science writeups describe the human analog: repeatedly bucket by color, then by pattern, until piles are small (BBC).
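A minimal counting-based sketch (Python; assumes, purely for illustration, that sock keys are integer color codes in range(k)):

def pair_by_counting(keys, k):
    # counts[c] = number of socks with color code c; O(k) extra space
    counts = [0] * k
    for c in keys:
        counts[c] += 1
    # each color c contributes counts[c] // 2 pairs
    return {c: counts[c] // 2 for c in range(k) if counts[c] >= 2}

print(pair_by_counting([2, 0, 2, 1, 0, 2], k=3))  # {0: 1, 2: 1}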
Why these aren’t allowed for your constraint
- Both hashing and counting use more than logarithmic extra memory in the worst case (they need space proportional to the number of distinct keys or to n). Since you stated you cannot allocate that extra workspace, these linear-time algorithms are not viable in your memory model.
Practical routine for ~30 pairs (human / engineer)
For a small pile (around 30 pairs → 60 socks), constants and physical movement dominate algorithmic asymptotics. Here’s a compact, fast routine you can do on a table:
- Spread socks out so you can see color/pattern at a glance.
- Make coarse color piles — one pass through the pile, putting each sock on the pile for its main color. (This is a human radix pass.)
- For each color pile, split by pattern/shade if needed until piles are small.
- Pair within each small pile by direct adjacency scanning or quick sorting of that pile if it’s getting big.
- Collect leftover unmatched socks (odd singles) separately.
Why this wins for ~30 pairs
- The bucket-by-color pass is roughly one move per sock (≈n operations). Within piles the average size is small, so local scans cost little. Compared to a full in-place heapsort, the human bucket method often runs faster in wall-clock time for small n because sorting overhead (heap operations, comparisons) has higher constants; see the rough arithmetic below.
Tip: if you have only a couple of dominant colors, you’ll often finish after two simple sweeps. The BBC and Business Insider describe the same idea as a practical radix/bucket strategy for humans (BBC, Business Insider).
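Rough arithmetic for that claim (assuming, purely for illustration, 6 distinct colors spread evenly): 60 socks bucket in ≈60 moves into piles of ≈10; naive scanning within a pile costs about 10²/8 ≈ 12 comparisons, so 6 piles cost ≈75 comparisons, for ≈135 operations total, versus roughly 2·60·log₂ 60 ≈ 700 comparisons for a full heapsort of the same pile.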
Memory-limited tricks and time–space trade-offs (log-space)
If you truly must keep extra memory to O(log n) (bits or pointers), the go-to deterministic choices are:
- Heapsort → O(n log n) time, O(1) extra space.
- Introsort (std::sort) → O(n log n) worst-case, O(log n) stack; fast in practice.
There are time–space trade-off results in the theoretical literature: with very small workspace you sometimes must accept more time in exchange for less space (multi-pass partitioning, selection trees). For the pairing/distinctness task, these approaches generally raise time above O(n log n) unless you relax the comparison model or allow randomization plus extra memory.
If the number of distinct types m is small relative to n, you can sometimes do better: some community/academic posts describe algorithms whose expected cost depends on m (e.g., O(n log m) behavior) — worth considering when m is tiny (CS.SE discussion). A sketch of the idea follows.
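A minimal sketch of the small-m idea (my illustration, not taken from the cited thread): keep the m distinct keys seen so far in a sorted list and binary-search each sock, for about O(n log m) comparisons and O(m) extra space, which fits the log-space budget only when m = O(log n):

import bisect

def pair_by_small_m(socks):
    # keys stays sorted; unmatched[i] is True when key keys[i] has an
    # unpaired sock waiting. Each lookup costs O(log m) comparisons.
    keys, unmatched, pairs = [], [], []
    for s in socks:
        i = bisect.bisect_left(keys, s)
        if i < len(keys) and keys[i] == s:
            if unmatched[i]:
                pairs.append((s, s))
            unmatched[i] = not unmatched[i]
        else:
            keys.insert(i, s)          # O(m) shift; cheap when m is tiny
            unmatched.insert(i, True)
    return pairs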
Pseudocode and example cost estimates (naive vs sort vs hash)
Naive repeated-scan algorithm (what you described): pick a sock, scan the remaining pile until its match appears, remove the pair, repeat. For n = 2k socks (k pairs), when 2j socks remain the match of the picked sock is equally likely at any of the 2j − 1 remaining positions, so the search costs j comparisons on average; summing over the k rounds gives

E[comparisons] ≈ Σ_{j=1..k} j = k(k+1)/2 ≈ k²/2 = n²/8

So naive expected cost ≈ n²/8 comparisons. Example: n = 60 → expected ≈ 60²/8 = 450 comparisons.
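A quick simulation (illustrative; assumes a uniformly shuffled pile) matches the estimate. The exact expectation is k(k+1)/2 = 465 for k = 30, close to the n²/8 = 450 approximation:

import random

def naive_comparisons(n_pairs):
    # build a shuffled pile of n_pairs matched pairs, then run the naive
    # pick-and-scan procedure, counting pairwise comparisons
    pile = [i for i in range(n_pairs) for _ in range(2)]
    random.shuffle(pile)
    comparisons = 0
    while pile:
        sock = pile.pop(0)
        for idx, other in enumerate(pile):
            comparisons += 1
            if other == sock:
                del pile[idx]
                break
    return comparisons

print(sum(naive_comparisons(30) for _ in range(1000)) / 1000)  # ≈ 465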
Heapsort-based plan: worst-case comparisons are O(n log n) (practical constants vary by implementation). A rough upper estimate for heapsort comparisons is about 2n log₂ n (implementation dependent), so for n = 60 that’s on the order of 700, i.e., a few hundred to low thousands depending on constants. For large n the sort wins comfortably: n = 1000 → naive ≈ n²/8 = 125,000 comparisons vs heapsort ≈ 2n log₂ n ≈ 20,000.
Hashing (one-pass, expected O(n)); a runnable Python sketch:

def pair_by_hash(a, key=lambda s: s):
    # waiting maps key -> index of an as-yet-unmatched sock with that key
    waiting = {}
    pairs = []
    for i, sock in enumerate(a):
        k = key(sock)
        if k in waiting:
            pairs.append((waiting.pop(k), i))  # pair them, drop the key
        else:
            waiting[k] = i
    return pairs
- Time: expected O(n); space: O(m) for m distinct keys, not acceptable under the log-space constraint.
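For example (illustrative keys; the function returns index pairs):

print(pair_by_hash(["red", "blue", "red", "green", "blue", "red"]))
# [(0, 2), (1, 4)]; the last 'red' and the 'green' stay unmatched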
References and worked-through algorithms: HackerRank (Sock Merchant) and GeeksforGeeks provide canonical hash/count solutions for the keyed version, while Baeldung and Stack Overflow compare naive vs sorted approaches (HackerRank, GeeksforGeeks, Baeldung, Stack Overflow).
Sources
- https://stackoverflow.com/questions/14415881/how-can-i-pair-socks-from-a-pile-efficiently
- https://www.baeldung.com/cs/pairing-socks-efficiently
- https://cs.stackexchange.com/questions/16133/sock-matching-algorithm
- https://cs.stackexchange.com/questions/75390/how-to-match-socks
- https://www.bbc.com/news/magazine-37196037
- https://www.businessinsider.com/engineers-write-algorithm-to-pair-socks-2014-4
- https://www.geeksforgeeks.org/dsa/minimum-number-of-socks-required-to-picked-to-have-at-least-k-pairs-of-the-same-color/
- https://www.hackerrank.com/challenges/sock-merchant/problem
- https://www.cs.mcgill.ca/~jpineau/comp102/Lectures/09Sorting.pdf
Conclusion
If you must stick to logarithmic extra space and want a deterministic guarantee, use an in-place comparison sort (heapsort or introsort) followed by a single linear scan — that in-place sock pairing algorithm runs in O(n log n) time and is asymptotically optimal in the comparison model. If extra workspace is allowed, hashing or counting/radix methods drop the time to O(n) (expected or worst-case) at the cost of extra memory; and for small piles (≈30 pairs) a quick bucket-by-color/pattern pass is usually the fastest in practice.