How do I split a list into equally-sized chunks?
How do I split a list of arbitrary length into equally sized chunks?
See also:
- How to iterate over a list in chunks
- Split string every nth character? (for chunking strings)
To split a list into equally-sized chunks in Python, you can use several methods, including list slicing, itertools.batched() (Python 3.12+), numpy.array_split(), or generator functions. The most straightforward approach uses list slicing with range() to step through the list and create chunks of the specified size.
Contents
- Basic List Slicing Method
- Using itertools.batched() (Python 3.12+)
- NumPy Approach
- Generator Functions
- List Comprehension Method
- Performance Comparison
- Handling Edge Cases
- Conclusion
Basic List Slicing Method
The most common approach uses list slicing driven by range() to create chunks of equal size. This method is simple, readable, and works in all Python versions.
def split_list_simple(my_list, chunk_size):
    """Split a list into chunks of specified size"""
    for i in range(0, len(my_list), chunk_size):
        yield my_list[i:i + chunk_size]
# Example usage
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
chunk_size = 3
chunks = list(split_list_simple(my_list, chunk_size))
print(chunks) # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
This function iterates over the original list and creates a new chunk on each iteration by slicing the list from the current index i to i + chunk_size.
As Python Engineer demonstrates, this approach batches data from an iterable into lists of length chunk_size; when the list length is not evenly divisible, the final chunk is simply shorter.
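For instance, a small illustration reusing split_list_simple from above with a list whose length is not a multiple of the chunk size:

print(list(split_list_simple([1, 2, 3, 4, 5, 6, 7], 3)))
# [[1, 2, 3], [4, 5, 6], [7]]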
Using itertools.batched() (Python 3.12+)
For Python 3.12 and later, the itertools.batched() function provides the most elegant and efficient solution:
from itertools import batched
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
chunk_size = 3
chunks = list(batched(my_list, chunk_size))
print(chunks) # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
As Real Python explains, itertools.batched() should be your preferred tool for splitting a Python list into fixed-size chunks. It’s shipped with the standard library starting in Python 3.12, making it well-tested, documented, portable, and efficient thanks to the native C implementation.
The function works with both finite and infinite iterables by evaluating their elements lazily and makes your code look more readable and concise.
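As a quick sketch of that laziness, batched() can be combined with islice() to take a few fixed-size batches from an infinite iterator such as itertools.count() (the values and counts here are chosen purely for illustration):

from itertools import batched, count, islice

# Take only the first four batches of three from an endless counter;
# batched() never tries to exhaust the infinite input
first_batches = list(islice(batched(count(1), 3), 4))
print(first_batches)  # [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]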
NumPy Approach
When working with data science applications, NumPy’s array_split() function provides an alternative that automatically handles unequal chunk sizes:
import numpy as np
my_list = [1, 2, 3, 4, 5, 6, 7, 8]
chunk_size = 3
# Calculate number of chunks needed
num_chunks = len(my_list) // chunk_size + (len(my_list) % chunk_size != 0)
res = np.array_split(my_list, num_chunks)
# Convert numpy arrays back to Python lists
chunks = [list(arr) for arr in res]
print(chunks) # [[1, 2, 3], [4, 5, 6], [7, 8]]
This method is particularly useful when the list size is not perfectly divisible by the chunk size.
As Spark By Examples notes, numpy.array_split() is another easy and efficient way to split a list into evenly sized chunks: it splits an array (or list) into the requested number of sub-arrays of equal or near-equal size. Note that it balances the sub-array lengths rather than keeping every chunk except the last at exactly chunk_size.
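A short sketch of that difference, using an illustrative ten-element list: asked for four sections, array_split() spreads the remainder across the earlier chunks instead of producing one short trailing chunk.

import numpy as np

# 10 items over 4 sections: sizes come out as 3, 3, 2, 2
data = list(range(1, 11))
print([list(arr) for arr in np.array_split(data, 4)])
# [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]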
Generator Functions
Generator functions provide a memory-efficient way to handle large lists by yielding chunks one at a time:
from itertools import islice
def chunked_list(lst, chunk_size):
    """Generator function to yield chunks of specified size"""
    it = iter(lst)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        yield chunk
# Example usage
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
chunk_size = 3
for chunk in chunked_list(my_list, chunk_size):
    print(chunk)
This approach uses itertools.islice to pull chunk_size items at a time from any iterable, so the input never has to be materialized in memory all at once.
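As a small sketch of that property, chunked_list() also accepts iterators that are never held fully in memory, such as a generator expression (the values below are purely illustrative):

# A generator expression produces values on demand; chunked_list() consumes
# it a few items at a time without ever building the full sequence
squares = (n * n for n in range(1, 10))
print(list(chunked_list(squares, 4)))
# [[1, 4, 9, 16], [25, 36, 49, 64], [81]]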
List Comprehension Method
For those who prefer a more compact approach, list comprehension can be used:
def chunkify(lst, chunk_size):
    """Split list into chunks using list comprehension"""
    return [lst[i:i+chunk_size] for i in range(0, len(lst), chunk_size)]
# Example usage
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
chunk_size = 3
chunks = chunkify(my_list, chunk_size)
print(chunks) # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
This one-liner approach is concise and effective for most use cases.
Performance Comparison
Different methods have different performance characteristics:
| Method | Memory Usage | Speed | Readability | Python Version |
|---|---|---|---|---|
| itertools.batched() | Low | Fast | Excellent | 3.12+ |
| List comprehension | Medium | Fast | Good | All |
| Generator function | Low | Medium | Good | All |
| NumPy array_split | Medium | Fast | Good | All (requires NumPy) |
As Real Python emphasizes, performance varies depending on the size of your list and whether you need lazy evaluation.
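As a rough, machine-dependent sketch (the list size, chunk size, and repeat count below are arbitrary choices), timeit can be used to compare the eager strategies; only the relative ordering on your machine is meaningful:

import timeit

setup = "data = list(range(100_000)); n = 64"

# Eager list-comprehension slicing, available on every Python version
print(timeit.timeit("[data[i:i+n] for i in range(0, len(data), n)]",
                    setup=setup, number=100))

# itertools.batched(), Python 3.12+ only
print(timeit.timeit("list(batched(data, n))",
                    setup="from itertools import batched; " + setup, number=100))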
Handling Edge Cases
When splitting lists, consider these edge cases:
- Empty list - Should return an empty list
- Chunk size larger than list - Should return the original list as a single chunk
- Chunk size of zero - Should raise an exception
- Negative chunk size - Should raise an exception
Here’s a robust implementation:
def safe_split_list(my_list, chunk_size):
    """Safely split list with error handling"""
    if chunk_size <= 0:
        raise ValueError("Chunk size must be positive")
    if not my_list:
        return []
    return list(split_list_simple(my_list, chunk_size))
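A quick check of the edge cases listed above, reusing split_list_simple from earlier:

print(safe_split_list([1, 2], 5))   # [[1, 2]]  (chunk size larger than list)
print(safe_split_list([], 3))       # []        (empty list)
# safe_split_list([1, 2, 3], 0) raises ValueError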
For lists that don’t divide evenly, the methods above will create the last chunk with fewer elements. If you need exactly equal-sized chunks with padding, you’ll need additional logic to handle this case.
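One common way to add that padding logic is the grouper recipe built on itertools.zip_longest; the helper name chunk_with_padding and the None fill value below are illustrative choices, not part of any standard API:

from itertools import zip_longest

def chunk_with_padding(lst, chunk_size, fillvalue=None):
    """Split lst into chunks of exactly chunk_size, padding the last one."""
    # Passing the same iterator chunk_size times makes zip_longest pull
    # consecutive items into each group, padding the final group as needed
    args = [iter(lst)] * chunk_size
    return [list(group) for group in zip_longest(*args, fillvalue=fillvalue)]

print(chunk_with_padding([1, 2, 3, 4, 5, 6, 7, 8], 3))
# [[1, 2, 3], [4, 5, 6], [7, 8, None]]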
Conclusion
Splitting a list into equally-sized chunks in Python can be accomplished through several methods:
- For Python 3.12+: Use itertools.batched() for the most elegant and efficient solution
- For all Python versions: Use list slicing with range() for simplicity and readability
- For data science applications: Consider NumPy’s array_split() for handling large datasets
- For memory efficiency: Use generator functions to handle very large lists
Choose the method that best fits your Python version, performance requirements, and coding style preferences. The list slicing approach remains the most universally applicable and readable solution for most use cases.