NeuroAgent

Complete Guide: Iterate Rows in Pandas DataFrame and Access Columns

Learn how to iterate over rows in Pandas DataFrames and access column values by name using iterrows() and itertuples(). Complete guide with examples and performance comparisons.

#dataframe-iteration #data-manipulation #pandas-dataframe #pandas-row-access #python-data-analysis #python-pandas

10/20/2025, 09:01 PM

Question

How can I iterate over rows in a Pandas DataFrame and access column values by name?

I have a pandas DataFrame df:

How do I iterate over the rows of this DataFrame? For every row, I want to access its elements (values in cells) by the column name. For example:

python

for row in df.rows:
    print(row['c1'], row['c2'])

I found a similar question that suggests using either of these approaches:

python

for date, row in df.T.iteritems():

python

for row in df.iterrows():

But I don’t understand what the row object is and how I can work with it.

NeuroAgent

To iterate over rows in a Pandas DataFrame and access column values by name, you can use either iterrows() or itertuples() methods. With iterrows(), each iteration returns an (index, Series) pair where the Series contains the row values with column names as index, allowing you to access values using row['column_name']. For better performance, itertuples() returns namedtuples where you can access values using dot notation like row.column_name or tuple indexing.

Using iterrows()
Using itertuples()
Performance Comparison
Best Practices and Alternatives
Complete Examples

Using iterrows()

The iterrows() method is the most straightforward approach for beginners. It returns an iterator that yields pairs of (index, row) where:

index is the index label of the row
row is a pandas Series containing the row values with column names as the index

For your DataFrame, you can access column values using dictionary-like syntax:

python

for index, row in df.iterrows():
    print(f"Row {index}: c1 = {row['c1']}, c2 = {row['c2']}")

What the row object is:
When you use iterrows(), each row is a pandas Series object. This means you can:

Access values by column name using row['column_name']
Access values by integer position using row.iloc[0], row.iloc[1], etc. (plain row[0] on a Series with string labels is deprecated in recent pandas versions)
Use typical Series methods like row.mean(), row.sum(), etc.

python

for index, row in df.iterrows():
    # Accessing by column name (recommended)
    c1_value = row['c1']
    c2_value = row['c2']
    
    # Accessing by position (use .iloc; plain row[0] is deprecated for label-indexed Series)
    c1_value_alt = row.iloc[0]
    c2_value_alt = row.iloc[1]
    
    print(f"Row {index}: c1 = {c1_value}, c2 = {c2_value}")

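Important note: The row object returned by iterrows() is generally a copy of the data, not a view, so modifying the row does not reliably change the original DataFrame. Because each row is packed into a single Series, iterrows() also does not preserve dtypes across the row: in a row that mixes integers and floats, the integers are upcast to floats.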
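A minimal sketch of the copy and dtype behavior described in the note above, assuming a small hypothetical DataFrame whose c2 column holds floats so that the two columns have different dtypes:

```python
import pandas as pd

# Hypothetical data: c1 is int64, c2 is float64
df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [1.5, 2.5, 3.5]})

for index, row in df.iterrows():
    # Each row is one Series with a single dtype,
    # so the integer c1 value is upcast to float here
    print(index, type(row).__name__, row.dtype, row['c1'])

    # Writing to the row changes only this temporary Series
    row['c1'] = 0

# The DataFrame still holds the original integer values
print(df['c1'].tolist())
```

With mixed dtypes like this, the row is built from a fresh array, which is why the assignment inside the loop never reaches df.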

Using itertuples()

The itertuples() method is generally more efficient than iterrows(). It returns an iterator that yields namedtuples where each row is represented as a lightweight namedtuple with field names corresponding to column names.

python

for row in df.itertuples():
    print(f"Row {row.Index}: c1 = {row.c1}, c2 = {row.c2}")

What the row object is:
When you use itertuples(), each row is a namedtuple object with:

row.Index - the index label of the row
row.c1 - value of column ‘c1’ (attribute access)
row.c2 - value of column ‘c2’ (attribute access)

You can access values using either:

Dot notation: row.column_name (recommended for readability)
Positional indexing: row[0], row[1], etc. (with the default index=True, row[0] is the index label and the columns start at row[1])

python

for row in df.itertuples():
    # Accessing by attribute name (recommended)
    c1_value = row.c1
    c2_value = row.c2
    
    # Accessing by position
    c1_value_alt = row[1]  # Note: position 0 is Index
    c2_value_alt = row[2]
    
    print(f"Row {row.Index}: c1 = {c1_value}, c2 = {c2_value}")

Advantages of itertuples():

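Faster performance (often by an order of magnitude or more; the exact factor depends on the DataFrame's shape, dtypes, and pandas version)
Preserves data types across rows
Lower per-row overhead, because no Series object is constructed for each row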
Cleaner syntax with dot notation
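A short sketch of how the namedtuple fields behave, assuming a hypothetical DataFrame with one column name ('unit price') that is not a valid Python identifier. The index and name parameters belong to itertuples() itself; the data and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data; 'unit price' cannot be used as an attribute name
df = pd.DataFrame({'c1': [10, 11], 'unit price': [1.5, 2.5]})

# Default call: the index comes first, and invalid column names are
# replaced by positional names such as _2
first = next(df.itertuples())
print(first._fields)        # field names of the namedtuple
print(first.c1, first[2])   # attribute access and positional access

# index=False drops the index field; name=None yields plain tuples
plain = list(df.itertuples(index=False, name=None))
print(plain)
```

If a column cannot be reached with dot notation, positional access or getattr(row, '_2') still works, though renaming the column beforehand usually reads better.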

Performance Comparison

The performance difference between iterrows() and itertuples() is significant, especially for larger datasets:

python

import pandas as pd
import time

# Create a larger DataFrame for performance testing
df_large = pd.DataFrame({'c1': range(1000000), 'c2': range(1000000, 2000000)})

# Test iterrows() (this loop can take a while on 1M rows)
# time.perf_counter() is a monotonic, high-resolution clock suited to timing
start_time = time.perf_counter()
for index, row in df_large.iterrows():
    pass  # Just iterating
iterrows_time = time.perf_counter() - start_time
print(f"iterrows() time: {iterrows_time:.4f} seconds")

# Test itertuples()
start_time = time.perf_counter()
for row in df_large.itertuples():
    pass  # Just iterating
itertuples_time = time.perf_counter() - start_time
print(f"itertuples() time: {itertuples_time:.4f} seconds")
print(f"itertuples() is {iterrows_time/itertuples_time:.1f}x faster")

Performance Results:

iterrows(): Slower because it creates new Series objects for each row
itertuples(): Faster because it returns lightweight namedtuples
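For a DataFrame of this size, itertuples() is usually faster by an order of magnitude or more; run the benchmark above to measure the actual factor on your own machine and pandas version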

According to the pandas documentation: “To preserve dtypes while iterating over the rows, it is better to use itertuples() which returns namedtuples of the values and which is generally much faster than iterrows()”

Best Practices and Alternatives

When to Use Iteration Methods:

Use iterrows() when:

You need the row's index label to write results back with df.at or df.loc (assigning to the row object itself does not reliably update the DataFrame)
You’re working with small datasets (<10,000 rows)
You need Series-specific methods during iteration

Use itertuples() when:

You’re working with medium to large datasets
Performance is important
You want clean, readable code with dot notation
You’re only reading data (not modifying)

Performance Considerations:

For large datasets, consider these alternatives to row-by-row iteration:

Vectorized Operations (Recommended):

python

# Instead of iterating to calculate a new column
result = df['c1'] * df['c2']

# Instead of iterating to filter
filtered_df = df[df['c1'] > 11]

List Comprehension:

python

# For per-row logic, zip the columns directly instead of building row objects
c1_plus_c2 = [a + b for a, b in zip(df['c1'], df['c2'])]

Apply Methods:

python

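# For applying functions to columns (for simple arithmetic like this,
# the vectorized df['c1'] ** 2 gives the same result without a Python-level loop)
df['c1_squared'] = df['c1'].apply(lambda x: x**2)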
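To make the comparison between these alternatives concrete, here is a small sketch that computes the same product column three ways and checks that the results agree. It reuses the c1/c2 values from this guide's examples; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# 1. Row iteration: one Python-level step per row
by_iteration = [row.c1 * row.c2 for row in df.itertuples(index=False)]

# 2. Row-wise apply: also calls a Python function once per row
by_apply = df.apply(lambda row: row['c1'] * row['c2'], axis=1).tolist()

# 3. Vectorized: a single operation on whole columns
by_vectorized = (df['c1'] * df['c2']).tolist()

print(by_iteration == by_apply == by_vectorized)
```

All three produce the same values. They differ in how much Python-level work happens per row, which is why the vectorized form tends to scale best.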

Important Caveats:

Never modify something you are iterating over. The pandas documentation warns that this is not guaranteed to work in all cases: depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
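One way to follow this advice, sketched under the assumption that the goal is to update one column based on another: compute the new values without touching the DataFrame during the loop, then assign them in one step. The update rules below are made-up examples.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# Option A: collect results while iterating, assign after the loop ends
new_c2 = []
for row in df.itertuples(index=False):
    # Hypothetical rule: double c2 where c1 > 10
    new_c2.append(row.c2 * 2 if row.c1 > 10 else row.c2)
df['c2'] = new_c2

# Option B: a conditional update with no loop, using a boolean mask
df.loc[df['c1'] > 11, 'c1'] = 0

print(df)
```

Option B is usually preferable when the rule can be written as a column expression; Option A is a fallback for logic that is awkward to vectorize.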

Complete Examples

Example 1: Basic Row Iteration with iterrows()

python

import pandas as pd

# Create the DataFrame
df = pd.DataFrame({
    'c1': [10, 11, 12],
    'c2': [100, 110, 120]
})

print("Original DataFrame:")
print(df)
print("\nUsing iterrows():")
for index, row in df.iterrows():
    print(f"Row {index}: c1 = {row['c1']}, c2 = {row['c2']}")

Example 2: Basic Row Iteration with itertuples()

python

print("Using itertuples():")
for row in df.itertuples():
    print(f"Row {row.Index}: c1 = {row.c1}, c2 = {row.c2}")

Example 3: Practical Application - Filtering and Transforming

python

# Using iterrows() for filtering
print("\nFiltering rows where c1 > 11 using iterrows():")
for index, row in df.iterrows():
    if row['c1'] > 11:
        print(f"Found: c1 = {row['c1']}, c2 = {row['c2']}")

# Using itertuples() for creating new data
print("\nCreating list of c2 values using itertuples():")
c2_values = [row.c2 for row in df.itertuples()]
print(c2_values)

Example 4: Working with Different Column Types

python

# DataFrame with mixed data types
df_mixed = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000.50, 60000.75, 70000.00],
    'is_employed': [True, False, True]
})

print("\nMixed DataFrame with different data types:")
print(df_mixed)

print("\nIterating through mixed DataFrame:")
for row in df_mixed.itertuples():
    print(f"{row.name} (age {row.age}) earns ${row.salary:.2f}, employed: {row.is_employed}")

Conclusion

Successfully iterating over rows in a Pandas DataFrame and accessing column values by name is straightforward once you understand the two main approaches:

Use iterrows() for simple cases with small datasets where you need Series methods
Use itertuples() for better performance and cleaner code, especially with medium to large datasets

Key takeaways:

iterrows() returns (index, Series) pairs - access values with row['column_name']
itertuples() returns namedtuples - access values with row.column_name
itertuples() is significantly faster (often by an order of magnitude or more) and has lower per-row overhead
For production code with large datasets, consider vectorized operations instead of row iteration
Never modify DataFrame values while iterating - this leads to unpredictable behavior

The choice between these methods depends on your specific needs, but for most use cases, itertuples() provides the best balance of performance and readability.
