NeuroAgent

Complete Guide: Iterate Rows in Pandas DataFrame and Access Columns

Learn how to iterate over rows in Pandas DataFrames and access column values by name using iterrows() and itertuples(). Complete guide with examples and performance comparisons.

Question

How can I iterate over rows in a Pandas DataFrame and access column values by name?

I have a pandas DataFrame df:

   c1   c2
0  10  100
1  11  110
2  12  120

How do I iterate over the rows of this DataFrame? For every row, I want to access its elements (values in cells) by the column name. For example:

python
for row in df.rows:
    print(row['c1'], row['c2'])

I found a similar question that suggests using either of these approaches:

python
for date, row in df.T.iteritems():

or

python
for row in df.iterrows():

But I don’t understand what the row object is and how I can work with it.

NeuroAgent

To iterate over rows in a Pandas DataFrame and access column values by name, you can use either the iterrows() or the itertuples() method. iterrows() yields an (index, Series) pair for each row; the Series holds the row values with the column names as its index, so you can access values using row['column_name']. For better performance, itertuples() yields namedtuples, whose values you can access with dot notation like row.column_name or by tuple position.

Using iterrows()

The iterrows() method is the most straightforward approach for beginners. It returns an iterator that yields pairs of (index, row) where:

  • index is the index label of the row
  • row is a pandas Series containing the row values with column names as the index

For your DataFrame, you can access column values using dictionary-like syntax:

python
for index, row in df.iterrows():
    print(f"Row {index}: c1 = {row['c1']}, c2 = {row['c2']}")

What the row object is:
When you use iterrows(), each row is a pandas Series object. This means you can:

  • Access values by column name using row['column_name']
  • Access values by integer position using row.iloc[0], row.iloc[1], etc.
  • Use typical Series methods like row.mean(), row.sum(), etc.

python
for index, row in df.iterrows():
    # Accessing by column name (recommended)
    c1_value = row['c1']
    c2_value = row['c2']

    # Accessing by position; use .iloc, because plain row[0] on a
    # string-labelled Series is deprecated in recent pandas versions
    c1_value_alt = row.iloc[0]
    c2_value_alt = row.iloc[1]
    
    print(f"Row {index}: c1 = {c1_value}, c2 = {c2_value}")

Important note: Depending on the data types, the row returned by iterrows() can be a copy of the data rather than a view, so writing to row is not guaranteed to change the original DataFrame. Treat row as read-only.
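
If the goal is to derive a new value for each row, one pattern that avoids writing to row is to collect the results in a list and assign them as a column once the loop has finished. A minimal sketch using the question's DataFrame; the total column name is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# Collect one result per row instead of assigning into `row`
totals = []
for index, row in df.iterrows():
    totals.append(row["c1"] + row["c2"])

# Assign the whole column once, after the loop has finished
df["total"] = totals
print(df)
```

For simple arithmetic like this, df['c1'] + df['c2'] produces the same column without a loop.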

Using itertuples()

The itertuples() method is generally more efficient than iterrows(). It returns an iterator that yields namedtuples where each row is represented as a lightweight namedtuple with field names corresponding to column names.

python
for row in df.itertuples():
    print(f"Row {row.Index}: c1 = {row.c1}, c2 = {row.c2}")

What the row object is:
When you use itertuples(), each row is a namedtuple object with:

  • row.Index - the index label of the row
  • row.c1 - value of column ‘c1’ (attribute access)
  • row.c2 - value of column ‘c2’ (attribute access)

You can access values using either:

  1. Dot notation: row.column_name (recommended for readability)
  2. Positional indexing: row[0], row[1], etc.

python
for row in df.itertuples():
    # Accessing by attribute name (recommended)
    c1_value = row.c1
    c2_value = row.c2
    
    # Accessing by position
    c1_value_alt = row[1]  # Note: position 0 is Index
    c2_value_alt = row[2]
    
    print(f"Row {row.Index}: c1 = {c1_value}, c2 = {c2_value}")
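
Dot notation only works when a column name is a valid Python identifier. According to the pandas documentation, itertuples() renames columns to positional names if they are invalid identifiers, repeated, or start with an underscore. The sketch below shows this with a hypothetical "unit price" column, along with the index and name parameters:

```python
import pandas as pd

# "unit price" is a hypothetical column name that is not a valid identifier
df_names = pd.DataFrame({"unit price": [1.5, 2.0], "qty": [3, 4]})

first = next(df_names.itertuples())
print(first._fields)  # the field names the namedtuple actually uses

# index=False drops the Index field; name sets the namedtuple's class name
for row in df_names.itertuples(index=False, name="Row"):
    print(row[0], row.qty)  # positional access still works for renamed fields
```

If the renamed fields are awkward to work with, renaming the columns before iterating, or using iterrows() with row['unit price'], keeps access by name.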

Advantages of itertuples():

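  • Faster performance (often cited as roughly 10-15x faster than iterrows(), though the exact factor depends on the DataFrame's shape, dtypes, and pandas version)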
  • Preserves data types across rows
  • More memory-efficient
  • Cleaner syntax with dot notation

Performance Comparison

The performance difference between iterrows() and itertuples() is significant, especially for larger datasets:

python
import time

import pandas as pd

# Create a larger DataFrame for performance testing
df_large = pd.DataFrame({'c1': range(1_000_000), 'c2': range(1_000_000, 2_000_000)})

# Test iterrows(); perf_counter() is a monotonic clock intended for timing intervals
start_time = time.perf_counter()
for index, row in df_large.iterrows():
    pass  # Just iterating
iterrows_time = time.perf_counter() - start_time
print(f"iterrows() time: {iterrows_time:.4f} seconds")

# Test itertuples()
start_time = time.perf_counter()
for row in df_large.itertuples():
    pass  # Just iterating
itertuples_time = time.perf_counter() - start_time
print(f"itertuples() time: {itertuples_time:.4f} seconds")
print(f"itertuples() is {iterrows_time/itertuples_time:.1f}x faster")

Performance Results:

  • iterrows(): Slower because it creates new Series objects for each row
  • itertuples(): Faster because it returns lightweight namedtuples
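  • For DataFrames with 1M rows: itertuples() is often reported to be around 10-15x faster than iterrows(), but the ratio varies with column count, dtypes, and pandas version, so measure it on your own data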

The pandas documentation makes the same recommendation: because iterrows() returns a Series for each row, it does not preserve dtypes across the row. To preserve dtypes while iterating, it suggests itertuples(), which returns namedtuples of the values and is generally faster than iterrows().
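
The dtype difference is easiest to see on a DataFrame that mixes integer and float columns. The following sketch uses a small made-up frame (the qty and price columns are assumptions for illustration) and compares the first row as produced by each method:

```python
import pandas as pd

# Hypothetical mixed-dtype frame: one integer column, one float column
df_mixed_num = pd.DataFrame({"qty": [3, 4], "price": [1.5, 2.0]})

_, series_row = next(df_mixed_num.iterrows())
tuple_row = next(df_mixed_num.itertuples())

print(df_mixed_num["qty"].dtype)  # dtype of the original column
print(series_row.dtype)           # dtype of the row Series from iterrows()
print(type(series_row["qty"]), type(tuple_row.qty))
```

Because a Series has a single dtype, iterrows() upcasts the integer qty to float so it can sit next to price, while itertuples() keeps it an integer.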

Best Practices and Alternatives

When to Use Iteration Methods:

Use iterrows() when:

  • You want each row as a Series, for example to look up values with column names held in variables (note that writing to row does not reliably update the DataFrame)
  • You’re working with small datasets (<10,000 rows)
  • You need Series-specific methods during iteration

Use itertuples() when:

  • You’re working with medium to large datasets
  • Performance is important
  • You want clean, readable code with dot notation
  • You’re only reading data (not modifying)
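
If plain dictionary-style access such as row['c1'] is all that is needed, another option is to convert the DataFrame to a list of dicts with to_dict(orient="records") and loop over that. This is a sketch rather than a recommendation: it builds every record in memory up front and does not include the index.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# Each record is a plain dict mapping column name -> value
records = df.to_dict(orient="records")
for record in records:
    print(record["c1"], record["c2"])
```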

Performance Considerations:

For large datasets, consider these alternatives to row-by-row iteration:

  1. Vectorized Operations (Recommended):

python
# Instead of iterating to calculate a new column
result = df['c1'] * df['c2']

# Instead of iterating to filter
filtered_df = df[df['c1'] > 11]

  2. List Comprehension:

python
# For creating lists from columns
c1_values = [row.c1 for row in df.itertuples()]

  3. Apply Methods:

python
# For applying functions to columns
df['c1_squared'] = df['c1'].apply(lambda x: x**2)
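
To make the trade-off concrete, the sketch below computes the same product three ways on the question's DataFrame: a row loop, a vectorized expression, and a zip() over the two columns. The zip() variant is an addition not covered above; it is a common lightweight loop when only a few columns are needed.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# Row-by-row version using itertuples()
looped = [row.c1 * row.c2 for row in df.itertuples()]

# Vectorized version: a single operation over whole columns
vectorized = df["c1"] * df["c2"]

# zip over the columns: a plain Python loop without building row objects
zipped = [c1 * c2 for c1, c2 in zip(df["c1"], df["c2"])]

# All three approaches agree on the values
print(looped == vectorized.tolist() == zipped)
```

The vectorized form returns a Series aligned with the DataFrame's index, so it can be assigned straight back as a new column.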

Important Caveats:

Never modify something you are iterating over. This is not guaranteed to work in all cases: depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect (pandas.DataFrame.iterrows documentation, item 2 in Sources).

Complete Examples

Example 1: Basic Row Iteration with iterrows()

python
import pandas as pd

# Create the DataFrame
df = pd.DataFrame({
    'c1': [10, 11, 12],
    'c2': [100, 110, 120]
})

print("Original DataFrame:")
print(df)
print("\nUsing iterrows():")
for index, row in df.iterrows():
    print(f"Row {index}: c1 = {row['c1']}, c2 = {row['c2']}")

Example 2: Basic Row Iteration with itertuples()

python
print("Using itertuples():")
for row in df.itertuples():
    print(f"Row {row.Index}: c1 = {row.c1}, c2 = {row.c2}")

Example 3: Practical Application - Filtering and Transforming

python
# Using iterrows() for filtering
print("\nFiltering rows where c1 > 11 using iterrows():")
for index, row in df.iterrows():
    if row['c1'] > 11:
        print(f"Found: c1 = {row['c1']}, c2 = {row['c2']}")

# Using itertuples() for creating new data
print("\nCreating list of c2 values using itertuples():")
c2_values = [row.c2 for row in df.itertuples()]
print(c2_values)

Example 4: Working with Different Column Types

python
# DataFrame with mixed data types
df_mixed = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000.50, 60000.75, 70000.00],
    'is_employed': [True, False, True]
})

print("\nMixed DataFrame with different data types:")
print(df_mixed)

print("\nIterating through mixed DataFrame:")
for row in df_mixed.itertuples():
    print(f"{row.name} (age {row.age}) earns ${row.salary:.2f}, employed: {row.is_employed}")

Conclusion

Successfully iterating over rows in a Pandas DataFrame and accessing column values by name is straightforward once you understand the two main approaches:

  • Use iterrows() for simple cases with small datasets where you need Series methods
  • Use itertuples() for better performance and cleaner code, especially with medium to large datasets

Key takeaways:

  1. iterrows() returns (index, Series) pairs - access values with row['column_name']
  2. itertuples() returns namedtuples - access values with row.column_name
  3. itertuples() is significantly faster (often cited as 10-15x, depending on the data) and more memory-efficient
  4. For production code with large datasets, consider vectorized operations instead of row iteration
  5. Never modify DataFrame values while iterating - this leads to unpredictable behavior

The choice between these methods depends on your specific needs, but for most use cases, itertuples() provides the best balance of performance and readability.

Sources

  1. Python Pandas iterate over rows and access column names - Stack Overflow
  2. pandas.DataFrame.iterrows — pandas 2.3.3 documentation
  3. How to Iterate Over Rows with Pandas – Loop Through a Dataframe
  4. Pandas Iterate Over Rows with Examples - Spark By {Examples}
  5. Iterating over rows and columns in Pandas DataFrame - GeeksforGeeks
  6. How can I iterate over rows in a Pandas DataFrame? - Stack Overflow
  7. Pandas DataFrame itertuples() Method - GeeksforGeeks
  8. Panda DataFrames - iterrows vs itertuples