How can I iterate over rows in a Pandas DataFrame and access column values by name?
I have a pandas DataFrame df:
c1 c2
0 10 100
1 11 110
2 12 120
How do I iterate over the rows of this DataFrame? For every row, I want to access its elements (values in cells) by the column name. For example:
for row in df.rows:
    print(row['c1'], row['c2'])
I found a similar question that suggests using either of these approaches:
for date, row in df.T.iteritems():
or
for row in df.iterrows():
But I don’t understand what the row object is and how I can work with it.
To iterate over rows in a Pandas DataFrame and access column values by name, you can use either the iterrows() or the itertuples() method. With iterrows(), each iteration yields an (index, Series) pair in which the Series holds the row values with the column names as its index, so you can access values using row['column_name']. For better performance, itertuples() yields namedtuples whose values you can access with dot notation like row.column_name or by tuple indexing. Note that DataFrame.iteritems(), used in the question, was removed in pandas 2.0; DataFrame.items() is its replacement.
Contents
- Using iterrows()
- Using itertuples()
- Performance Comparison
- Best Practices and Alternatives
- Complete Examples
Using iterrows()
The iterrows() method is the most straightforward approach for beginners. It returns an iterator that yields pairs of (index, row) where:
- index is the index label of the row
- row is a pandas Series containing the row values, with the column names as its index
For your DataFrame, you can access column values using dictionary-like syntax:
for index, row in df.iterrows():
    print(f"Row {index}: c1 = {row['c1']}, c2 = {row['c2']}")
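The loop in the question, for row in df.iterrows():, binds each (index, Series) pair to a single name, which is probably why the row object looked confusing. A minimal sketch, using the question's DataFrame, of what each item looks like before unpacking:

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# Without unpacking, each item is a 2-tuple: (index label, row as a Series)
first_item = next(df.iterrows())
index_label, row = first_item

print(type(first_item).__name__)  # tuple
print(type(row).__name__)         # Series
print(list(row.index))            # the column names
print(row.name)                   # the row's index label
```

Unpacking in the loop header (for index, row in ...) is just a shorthand for the two-step assignment shown here.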
What the row object is:
When you use iterrows(), each row is a pandas Series object. This means you can:
- Access values by column name using row['column_name']
- Access values by integer position using row.iloc[0], row.iloc[1], etc. (plain row[0] on a Series with string labels is deprecated in recent pandas versions)
- Use typical Series methods like row.mean(), row.sum(), etc.
for index, row in df.iterrows():
    # Accessing by column name (recommended)
    c1_value = row['c1']
    c2_value = row['c2']
    # Accessing by position (use .iloc; row[0] is deprecated for label-indexed Series)
    c1_value_alt = row.iloc[0]
    c2_value_alt = row.iloc[1]
    print(f"Row {index}: c1 = {c1_value}, c2 = {c2_value}")
Important note: Do not rely on modifying the row object to change the DataFrame. Depending on the column dtypes, the Series returned by iterrows() may be a copy of the data rather than a view, in which case writing to it has no effect on the original DataFrame.
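To illustrate the point about modifying row, here is a small sketch. It assumes a DataFrame with mixed dtypes (the name column is made up for the example), a case where each row Series is built from a copy. The write inside the loop is lost, so the sketch collects results and assigns them after the loop instead:

```python
import pandas as pd

# Mixed dtypes (int and str), so each row Series is built from a copy
df = pd.DataFrame({"c1": [10, 11, 12], "name": ["a", "b", "c"]})

for index, row in df.iterrows():
    row["c1"] = 0  # changes only the temporary Series

print(df["c1"].tolist())  # the DataFrame still holds the original values

# Collect results during the loop, then assign them in one step afterwards
doubled = [row["c1"] * 2 for _, row in df.iterrows()]
df["c1_doubled"] = doubled
print(df)
```

Assigning a whole column after the loop avoids writing to the DataFrame while it is being iterated.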
Using itertuples()
The itertuples() method is generally more efficient than iterrows(). It returns an iterator that yields namedtuples where each row is represented as a lightweight namedtuple with field names corresponding to column names.
for row in df.itertuples():
    print(f"Row {row.Index}: c1 = {row.c1}, c2 = {row.c2}")
What the row object is:
When you use itertuples(), each row is a namedtuple object with:
- row.Index - the index label of the row
- row.c1 - value of column 'c1' (attribute access)
- row.c2 - value of column 'c2' (attribute access)
You can access values using either:
- Dot notation: row.column_name (recommended for readability)
- Positional indexing: row[0], row[1], etc.
for row in df.itertuples():
    # Accessing by attribute name (recommended)
    c1_value = row.c1
    c2_value = row.c2
    # Accessing by position
    c1_value_alt = row[1]  # Note: position 0 is Index
    c2_value_alt = row[2]
    print(f"Row {row.Index}: c1 = {c1_value}, c2 = {c2_value}")
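Dot notation only works when a column name is a valid Python identifier. The sketch below uses a hypothetical column called "my col" to show how itertuples() renames such fields positionally, and how the index and name parameters change what is yielded:

```python
import pandas as pd

# "my col" is a hypothetical column name that is not a valid Python identifier
df = pd.DataFrame({"c1": [10, 11], "my col": [100, 110]})

row = next(df.itertuples())
print(row._fields)  # field names of the namedtuple; "my col" is renamed positionally

# index=False drops the Index field; name=None yields plain tuples
row_no_index = next(df.itertuples(index=False))
plain = next(df.itertuples(name=None))
print(row_no_index._fields)
print(plain)
```

If columns have spaces or other special characters, renaming them first or falling back to iterrows() with row['my col'] keeps access by name readable.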
Advantages of itertuples():
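- Faster performance (often cited as roughly 10-15x faster than iterrows(), though the factor varies with the DataFrame's shape and the pandas version)
- Preserves each column's dtype, whereas iterrows() converts a mixed-dtype row to a single common dtype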
- More memory-efficient
- Cleaner syntax with dot notation
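The dtype point can be checked with a short sketch. The two-column DataFrame here is an assumption made for illustration: one integer column and one float column.

```python
import pandas as pd

df = pd.DataFrame({"i": [1, 2], "f": [1.5, 2.5]})

_, series_row = next(df.iterrows())
tuple_row = next(df.itertuples())

print(series_row.dtype)        # one dtype for the whole row
print(type(series_row["i"]))   # the integer was upcast to a float
print(type(tuple_row.i))       # itertuples() keeps it an integer
```

Because a Series has a single dtype, iterrows() has to find one type that fits every value in the row, while a namedtuple can hold each value as-is.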
Performance Comparison
The performance difference between iterrows() and itertuples() is significant, especially for larger datasets:
import pandas as pd
import time

# Create a larger DataFrame for performance testing
df_large = pd.DataFrame({'c1': range(1000000), 'c2': range(1000000, 2000000)})

# Test iterrows() (perf_counter() is the clock intended for measuring elapsed time)
start_time = time.perf_counter()
for index, row in df_large.iterrows():
    pass  # Just iterating
iterrows_time = time.perf_counter() - start_time
print(f"iterrows() time: {iterrows_time:.4f} seconds")

# Test itertuples()
start_time = time.perf_counter()
for row in df_large.itertuples():
    pass  # Just iterating
itertuples_time = time.perf_counter() - start_time
print(f"itertuples() time: {itertuples_time:.4f} seconds")
print(f"itertuples() is {iterrows_time/itertuples_time:.1f}x faster")
Performance Results:
- iterrows(): Slower because it creates a new Series object for each row
- itertuples(): Faster because it returns lightweight namedtuples
- For DataFrames with 1M rows: itertuples() is typically 10-15x faster than iterrows()
According to the pandas documentation: “To preserve dtypes while iterating over the rows, it is better to use itertuples() which returns namedtuples of the values and which is generally much faster than iterrows()”
Best Practices and Alternatives
When to Use Iteration Methods:
Use iterrows() when:
- You need each row as a Series and will write any results back explicitly (modifying row itself does not reliably change the DataFrame)
- You’re working with small datasets (<10,000 rows)
- You need Series-specific methods during iteration
Use itertuples() when:
- You’re working with medium to large datasets
- Performance is important
- You want clean, readable code with dot notation
- You’re only reading data (not modifying)
Performance Considerations:
For large datasets, consider these alternatives to row-by-row iteration:
- Vectorized Operations (Recommended):
# Instead of iterating to calculate a new column
result = df['c1'] * df['c2']
# Instead of iterating to filter
filtered_df = df[df['c1'] > 11]
- List Comprehension:
# For creating lists from columns
c1_values = [row.c1 for row in df.itertuples()]
- Apply Methods:
# For applying functions to columns
df['c1_squared'] = df['c1'].apply(lambda x: x**2)
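As a rough illustration of the first alternative, the sketch below computes the same per-row product twice on the question's DataFrame: once with a loop and once with a single vectorized expression.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# Row-by-row version
loop_products = [row.c1 * row.c2 for row in df.itertuples(index=False)]

# Vectorized version: one expression over whole columns
vectorized_products = df["c1"] * df["c2"]

# Both approaches produce the same values
print(loop_products == vectorized_products.tolist())
```

The vectorized form returns a Series aligned with the DataFrame's index, so it can be assigned directly as a new column.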
Important Caveats:
Never modify something you are iterating over. As the pandas documentation notes, this is not guaranteed to work in all cases: depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
Complete Examples
Example 1: Basic Row Iteration with iterrows()
import pandas as pd

# Create the DataFrame
df = pd.DataFrame({
    'c1': [10, 11, 12],
    'c2': [100, 110, 120]
})
print("Original DataFrame:")
print(df)
print("\nUsing iterrows():")
for index, row in df.iterrows():
    print(f"Row {index}: c1 = {row['c1']}, c2 = {row['c2']}")
Example 2: Basic Row Iteration with itertuples()
print("Using itertuples():")
for row in df.itertuples():
    print(f"Row {row.Index}: c1 = {row.c1}, c2 = {row.c2}")
Example 3: Practical Application - Filtering and Transforming
# Using iterrows() for filtering
print("\nFiltering rows where c1 > 11 using iterrows():")
for index, row in df.iterrows():
    if row['c1'] > 11:
        print(f"Found: c1 = {row['c1']}, c2 = {row['c2']}")

# Using itertuples() for creating new data
print("\nCreating list of c2 values using itertuples():")
c2_values = [row.c2 for row in df.itertuples()]
print(c2_values)
Example 4: Working with Different Column Types
# DataFrame with mixed data types
df_mixed = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000.50, 60000.75, 70000.00],
    'is_employed': [True, False, True]
})
print("\nMixed DataFrame with different data types:")
print(df_mixed)
print("\nIterating through mixed DataFrame:")
for row in df_mixed.itertuples():
    print(f"{row.name} (age {row.age}) earns ${row.salary:.2f}, employed: {row.is_employed}")
Conclusion
Iterating over rows in a Pandas DataFrame and accessing column values by name is straightforward once you understand the two main approaches:
- Use iterrows() for simple cases with small datasets where you need Series methods
- Use itertuples() for better performance and cleaner code, especially with medium to large datasets
Key takeaways:
- iterrows() returns (index, Series) pairs - access values with row['column_name']
- itertuples() returns namedtuples - access values with row.column_name
- itertuples() is significantly faster (10-15x) and more memory-efficient
- For production code with large datasets, consider vectorized operations instead of row iteration
- Never modify DataFrame values while iterating - this leads to unpredictable behavior
The choice between these methods depends on your specific needs, but for most use cases, itertuples() provides the best balance of performance and readability.
Sources
- Python Pandas iterate over rows and access column names - Stack Overflow
- pandas.DataFrame.iterrows — pandas 2.3.3 documentation
- How to Iterate Over Rows with Pandas – Loop Through a Dataframe
- Pandas Iterate Over Rows with Examples - Spark By {Examples}
- Iterating over rows and columns in Pandas DataFrame - GeeksforGeeks
- How can I iterate over rows in a Pandas DataFrame? - Stack Overflow
- Pandas DataFrame itertuples() Method - GeeksforGeeks
- Panda DataFrames - iterrows vs itertuples