Why is checking if a large number is in a Python 3 range object so fast?
In Python 3, the range() function is an object type that generates its contents on the fly, similar to a generator. However, the following operation appears to be instantaneous regardless of the size of the numbers involved:
1_000_000_000_000_000 in range(1_000_000_000_000_001)
This behavior is unexpected because one might assume that checking if a quadrillion is in a range would require generating a quadrillion values. The same performance is observed even with extremely large ranges:
# Count by tens
1_000_000_000_000_000_000_000 in range(0, 1_000_000_000_000_000_000_001, 10)
In contrast, implementing a custom range function demonstrates much slower performance:
def my_crappy_range(N):
i = 0
while i < N:
yield i
i += 1
return
What implementation details make Python’s range() object so efficient for membership testing, even with extremely large ranges?
Python’s range() object achieves remarkable efficiency for membership testing through mathematical calculations rather than iteration. Unlike generators that must produce each value sequentially, range objects store only three parameters (start, stop, step) and use arithmetic to determine membership in constant time, making operations like 1_000_000_000_000_000 in range(1_000_000_000_000_001) instantaneous regardless of range size.
Contents
- How Range Objects Store Data
- The Mathematical Membership Algorithm
- Performance Comparison with Generators
- Memory Efficiency Benefits
- Practical Implications and Use Cases
How Range Objects Store Data
Unlike Python 2’s xrange which was a separate type, Python 3’s range is a full-fledged object type that implements the Sequence Abstract Base Class. However, it doesn’t store its elements - it stores only the mathematical parameters that define the sequence:
r = range(10, 100, 5)
print(r.start) # 10
print(r.stop) # 100
print(r.step) # 5
This storage approach means a range object representing numbers from 0 to 1 quintillion consumes only 24 bytes (3 integers) regardless of how many values it theoretically represents. In contrast, a list storing the same values would require astronomical amounts of memory.
The range object’s internal representation is essentially a mathematical formula: generate values starting from start, incrementing by step, stopping before stop. This mathematical representation is the key to its efficiency.
The Mathematical Membership Algorithm
When you use the in operator with a range object, Python doesn’t iterate through values. Instead, it performs a series of mathematical checks in constant time O(1):
def is_in_range(x, start, stop, step):
if step > 0:
return x >= start and x < stop and (x - start) % step == 0
else: # negative step
return x <= start and x > stop and (x - start) % step == 0
For the example 1_000_000_000_000_000 in range(1_000_000_000_000_001), Python performs:
- Check if
1_000_000_000_000_000 >= 1✓ - Check if
1_000_000_000_000_000 < 1_000_000_000_000_001✓ - Check if
(1_000_000_000_000_000 - 1) % 1 == 0✓
All three checks complete in nanoseconds, regardless of the magnitude of the numbers involved.
For the stepped range example 1_000_000_000_000_000_000_000 in range(0, 1_000_000_000_000_000_000_001, 10), the algorithm:
- Checks bounds:
0 ≤ 1_000_000_000_000_000_000_000 < 1_000_000_000_000_000_000_001✓ - Checks divisibility:
(1_000_000_000_000_000_000_000 - 0) % 10 == 0✓
Again, this is completed instantly because it’s just a few arithmetic operations.
Performance Comparison with Generators
The user’s custom my_crappy_range implementation demonstrates why standard generators are so much slower for membership testing:
def my_crappy_range(N):
i = 0
while i < N:
yield i
i += 1
return
# To check if 1_000_000_000_000_000 is in this range,
# Python would need to iterate through 1 trillion values!
Key differences:
| Aspect | Range Object | Generator |
|---|---|---|
| Storage | 3 integers (24 bytes) | No storage, generates values on demand |
| Membership Test | O(1) mathematical check | O(n) iteration |
| Memory Usage | Constant | Constant per value |
| Time for Large Ranges | Nanoseconds | Linear with range size |
The generator must produce every value from 0 up to the target (1 quadrillion in this case), which would take an impractical amount of time. Even with optimizations, there’s no way around the fundamental O(n) complexity for generator-based membership testing.
Memory Efficiency Benefits
Range objects provide several memory advantages:
- Constant Memory Usage: A range representing 0 to 2^64 uses the same memory as one representing 0 to 10
- No List Overhead: Unlike lists, ranges don’t store Python objects for each value
- Cache Friendly: The small size means the entire range object fits easily in CPU cache
This efficiency enables operations that would be impossible with other sequence types:
# This would crash with a list due to memory constraints
huge_range = range(0, 10**20, 2)
print(len(huge_range)) # 50000000000000000000 - works instantly
The len() function on ranges is also optimized mathematically: len(range(start, stop, step)) = max(0, (stop - start + step - 1) // step).
Practical Implications and Use Cases
The efficiency of range objects enables several practical patterns:
1. Index Validation
def safe_index(sequence, index):
if index not in range(len(sequence)):
raise IndexError("Index out of range")
return sequence[index]
This check is O(1) regardless of sequence length.
2. Mathematical Operations
# Efficient mathematical operations without iteration
sum(range(1, 1000001)) # Uses formula n(n+1)/2 internally
3. Loop Optimization
# Instead of: for i in list(range(N)):
# Use: for i in range(N)
# Especially important for large N
4. Memory-Efficient Testing
# Check if a number is in a large arithmetic sequence
def is_fibonacci(n):
# Check if n is a Fibonacci number using mathematical properties
# Can be done without generating the entire sequence
pass
Sources
- Python Documentation - The range type
- Python Wiki - How range() works
- Real Python - Python’s range() Function
- Efficient Python - Range vs List Performance
- CPython Implementation - rangeobject.c
Conclusion
The remarkable speed of range membership testing stems from mathematical optimization rather than iteration. Key takeaways:
- Three-parameter storage: Range objects only store start, stop, and step values
- O(1) membership testing: Uses arithmetic checks instead of iteration
- Memory efficiency: Constant memory usage regardless of range size
- Generator contrast: Custom generators require O(n) time for membership tests
For maximum efficiency when working with large sequences, prefer range() over lists or generators when you need to test membership or perform mathematical operations. This optimization makes Python capable of handling astronomically large numerical ranges that would be impossible to store in memory or process with iterative approaches.