NeuroAgent

Complete Guide: List Files in Python

Learn multiple methods to list files in Python directories and store them in lists. Compare os.listdir(), pathlib, os.scandir(), and os.walk() with practical examples and best practices.

Question

How can I list all files in a directory using Python and store them in a list?

NeuroAgent

You can list the files in a directory with several standard-library tools: os.listdir() for a simple listing, pathlib.Path.glob() for pattern matching, and os.scandir() for better performance when you also need file type information. os.listdir() returns a list directly, while Path.glob() and os.scandir() return iterators that you can collect into a list. With any of them you can filter for specific file types or build full file paths.

Using os.listdir()

The os.listdir() function is the most straightforward method to list all files and directories in a specified path. It returns a list containing the names of the entries in the directory.

python
import os

# List all entries in current directory
entries = os.listdir()
print(entries)

# List all entries in a specific directory
directory_path = '/path/to/your/directory'
entries = os.listdir(directory_path)
print(entries)

Important: os.listdir() only provides the names of files and directories without their full paths. To get complete file paths, you need to join the directory path with each entry:

python
import os

directory_path = '/path/to/your/directory'
all_files = [os.path.join(directory_path, entry) for entry in os.listdir(directory_path)]
print(all_files)

This method is simple and works well for basic needs, but it doesn’t distinguish between files and directories and doesn’t provide file metadata.
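If you do need to tell files and directories apart while still using os.listdir(), one option is to test each joined path with os.path.isfile() and os.path.isdir(). The sketch below assumes a hypothetical helper named split_entries; it is an illustration, not part of the standard library.

```python
import os

def split_entries(directory_path):
    """Hypothetical helper: return (files, directories) as lists of full paths."""
    files, directories = [], []
    # sorted() gives a stable order; os.listdir() itself returns entries in arbitrary order
    for name in sorted(os.listdir(directory_path)):
        full_path = os.path.join(directory_path, name)
        if os.path.isfile(full_path):
            files.append(full_path)
        elif os.path.isdir(full_path):
            directories.append(full_path)
    return files, directories
```

Calling split_entries('.') returns two lists for the current directory. Each isfile()/isdir() check may cost an extra system call per entry, which is the overhead os.scandir() is designed to avoid.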

Using pathlib.Path Methods

The pathlib module (introduced in Python 3.4) provides an object-oriented interface for filesystem paths and is generally preferred for modern Python code.

Basic listing with iterdir():

python
from pathlib import Path

# Get all entries as Path objects
directory = Path('/path/to/your/directory')
all_entries = list(directory.iterdir())
print(all_entries)

Using glob() for pattern matching:

python
from pathlib import Path

# Get all files (excluding directories)
directory = Path('/path/to/your/directory')
all_files = list(directory.glob('*'))  # '*' matches everything
files_only = [entry for entry in all_files if entry.is_file()]
print(files_only)

pathlib offers more readable syntax than string manipulation with os.path, and Path objects handle platform-specific path separators for you.
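pathlib can also search subdirectories through Path.rglob(), which behaves like glob() with a recursive pattern. The following is a minimal sketch; the function name list_files_recursive and its default pattern are assumptions made for illustration.

```python
from pathlib import Path

def list_files_recursive(directory, pattern="*"):
    """Hypothetical helper: regular files under `directory` whose names match `pattern`."""
    # rglob() yields directories as well, so keep only regular files
    return sorted(p for p in Path(directory).rglob(pattern) if p.is_file())
```

For example, list_files_recursive('/path/to/your/directory', '*.txt') would return a sorted list of Path objects for every .txt file in the tree.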

Using os.scandir()

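os.scandir() is a more efficient alternative to os.listdir() introduced in Python 3.5 (the with-statement form shown below requires Python 3.6 or later). It yields os.DirEntry objects that usually carry file type information without requiring additional system calls.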

python
import os

directory_path = '/path/to/your/directory'
with os.scandir(directory_path) as entries:
    files = [entry.path for entry in entries if entry.is_file()]
    print(files)

Performance benefits: os.scandir() can be significantly faster than os.listdir() followed by os.path.isfile() checks, especially for directories with many files, because the operating system usually returns file type information during the directory scan itself, so no separate stat() call is needed per entry.
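The DirEntry objects also expose a stat() method, which helps when you want metadata such as file size alongside each path. Here is a small sketch under that assumption; files_with_sizes is a hypothetical name.

```python
import os

def files_with_sizes(directory_path):
    """Hypothetical helper: list of (path, size_in_bytes) for regular files."""
    results = []
    with os.scandir(directory_path) as entries:
        for entry in entries:
            # follow_symlinks=False counts only real files, not links pointing to files
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                results.append((entry.path, size))
    return results
```

Whether stat() needs an extra system call depends on the platform, so treat any speed gain here as something to measure rather than assume.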

Using os.walk() for Recursive Listing

When you need to list files from a directory and all its subdirectories recursively, os.walk() is the ideal choice:

python
import os

def get_all_files(directory_path):
    all_files = []
    for root, dirs, files in os.walk(directory_path):
        for file in files:
            full_path = os.path.join(root, file)
            all_files.append(full_path)
    return all_files

# Usage
directory_path = '/path/to/your/directory'
all_files = get_all_files(directory_path)
print(all_files)

os.walk() visits the tree top-down by default, yielding each directory before its subdirectories, so the loop above collects files from every level.
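In the default top-down mode you can also prune the traversal by editing the dirs list in place, which stops os.walk() from descending into the removed directories. The sketch below assumes a hypothetical function walk_skipping and an example skip list; adjust both to your project.

```python
import os

def walk_skipping(directory_path, skip_dirs=(".git", "__pycache__")):
    """Hypothetical helper: recursive file list that ignores the named directories."""
    all_files = []
    for root, dirs, files in os.walk(directory_path):
        # Slice assignment mutates the same list object that os.walk() consults next
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            all_files.append(os.path.join(root, name))
    return all_files
```

Rebinding the name instead (dirs = [...]) would have no effect, because os.walk() keeps a reference to the original list.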

Using glob Module

The glob module provides pattern matching functionality similar to Unix shell wildcards:

python
import glob

# Get all non-hidden entries (files and directories) in the current directory
files = glob.glob('*')
print(files)

# Get all non-hidden entries in a specific directory
files = glob.glob('/path/to/your/directory/*')
print(files)

# Recursive search (Python 3.5+ with recursive=True); results include directories
all_files = glob.glob('/path/to/your/directory/**/*', recursive=True)
print(all_files)

glob is particularly useful when you need to find files matching specific patterns, such as all .txt files or files with certain name patterns.
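As an illustration of such a pattern, the sketch below collects only regular files that match a wildcard, optionally searching subdirectories. The function name glob_files and its parameters are assumptions, not part of the glob module.

```python
import glob
import os

def glob_files(directory_path, pattern="*.txt", recursive=False):
    """Hypothetical helper: sorted paths of regular files matching `pattern`."""
    # glob.escape() protects special characters such as '[' in the directory name
    base = glob.escape(directory_path)
    if recursive:
        full_pattern = os.path.join(base, "**", pattern)
    else:
        full_pattern = os.path.join(base, pattern)
    matches = glob.glob(full_pattern, recursive=recursive)
    # glob can return directories too, so keep only regular files
    return sorted(p for p in matches if os.path.isfile(p))
```

Note that glob patterns such as '*.txt' do not match names that begin with a dot, so hidden files are left out unless the pattern itself starts with a dot.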

Filtering Files by Extension

Often you’ll want to list only files with specific extensions. Here are several approaches:

Using list comprehension with endswith():

python
import os
from pathlib import Path

# Method 1: Using os.listdir()
txt_files = [file for file in os.listdir('/path/to/your/directory') 
             if file.endswith('.txt')]

# Method 2: Using pathlib
directory = Path('/path/to/your/directory')
txt_files = [file for file in directory.glob('*.txt') if file.is_file()]

# Method 3: Same as Method 2, but as strings instead of Path objects
txt_files_full = [str(file) for file in directory.glob('*.txt') if file.is_file()]

Using multiple extensions:

python
from pathlib import Path

directory = Path('/path/to/your/directory')
text_files = [file for file in directory.glob('*') 
              if file.is_file() and file.suffix.lower() in ['.txt', '.md', '.rst']]
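A related approach relies on the fact that str.endswith() accepts a tuple of suffixes, which also handles compound extensions such as .tar.gz that Path.suffix would report only as .gz. The helper below is a sketch with a hypothetical name, filter_by_extensions.

```python
import os

def filter_by_extensions(directory_path, extensions=(".txt", ".md", ".rst")):
    """Hypothetical helper: names of regular files ending in any given extension."""
    # Lowercase both sides so that 'NOTES.TXT' matches '.txt'
    suffixes = tuple(ext.lower() for ext in extensions)
    with os.scandir(directory_path) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_file() and entry.name.lower().endswith(suffixes)
        )
```

The function returns bare names; join them with the directory path if full paths are needed.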

Complete Best Practices Example

Here’s a comprehensive example that demonstrates best practices for listing files:

python
import os
from pathlib import Path
from typing import List, Optional

def list_files(directory_path: str,
               extensions: Optional[List[str]] = None,
               recursive: bool = False) -> List[str]:
    """
    List all files in a directory with optional filtering by extension.

    Args:
        directory_path: Path to the directory to scan
        extensions: Lowercase file extensions to include, with the
            leading dot (e.g., ['.txt', '.py']); None includes every file
        recursive: Whether to search subdirectories

    Returns:
        List of absolute file paths
    """
    if recursive:
        return _list_files_recursive(directory_path, extensions)
    else:
        return _list_files_single(directory_path, extensions)

def _list_files_single(directory_path: str, extensions: Optional[List[str]] = None) -> List[str]:
    """List files in a single directory."""
    try:
        directory = Path(directory_path)
        if not directory.exists():
            raise FileNotFoundError(f"Directory not found: {directory_path}")

        files = []
        for entry in directory.iterdir():
            if entry.is_file():
                if extensions is None or entry.suffix.lower() in extensions:
                    files.append(str(entry.absolute()))

        return files
    except PermissionError:
        print(f"Permission denied accessing: {directory_path}")
        return []

def _list_files_recursive(directory_path: str, extensions: Optional[List[str]] = None) -> List[str]:
    """List files recursively including subdirectories."""
    try:
        directory = Path(directory_path)
        if not directory.exists():
            raise FileNotFoundError(f"Directory not found: {directory_path}")
        
        files = []
        # Note: os.walk() silently skips directories it cannot read unless onerror is given
        for root, dirs, files_in_dir in os.walk(directory_path):
            for file in files_in_dir:
                full_path = os.path.join(root, file)
                if extensions is None or any(full_path.lower().endswith(ext) for ext in extensions):
                    files.append(os.path.abspath(full_path))
        
        return files
    except PermissionError:
        print(f"Permission denied accessing: {directory_path}")
        return []

# Usage examples
if __name__ == "__main__":
    # List all files in current directory
    all_files = list_files('.')
    print(f"Found {len(all_files)} files")
    
    # List only Python files
    python_files = list_files('.', extensions=['.py'])
    print(f"Found {len(python_files)} Python files")
    
    # List all text files recursively
    text_files = list_files('/path/to/docs', extensions=['.txt', '.md', '.rst'], recursive=True)
    print(f"Found {len(text_files)} text files recursively")

This example includes error handling, type hints, and flexible filtering options. It demonstrates both single-directory and recursive listing with extension filtering.
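One caveat about error handling in recursive listings: by default os.walk() ignores errors raised while scanning a directory, so unreadable subdirectories are skipped without notice. If you want to know about them, you can pass a callback through the onerror parameter. The sketch below assumes a hypothetical function walk_collecting_errors that simply gathers the exceptions.

```python
import os

def walk_collecting_errors(directory_path):
    """Hypothetical helper: return (files, errors) from a recursive walk."""
    files, errors = [], []
    # onerror is called with the OSError raised when a directory cannot be scanned
    for root, _dirs, names in os.walk(directory_path, onerror=errors.append):
        files.extend(os.path.join(root, name) for name in names)
    return files, errors
```

Each collected exception carries a filename attribute naming the directory that failed, which you could log or report as appropriate.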

Conclusion

To effectively list all files in a directory using Python, consider these key approaches:

  1. For simple needs: Use os.listdir() combined with os.path.join() for basic file listing
  2. For modern Python: Prefer pathlib.Path.glob() for its cleaner syntax and better cross-platform support
  3. For performance: Use os.scandir() when working with large directories as it provides file type information more efficiently
  4. For recursive searches: Choose os.walk() when you need to traverse directory trees
  5. For pattern matching: Use the glob module when you need to find files matching specific naming patterns

Remember to handle potential errors like permission issues and non-existent directories, especially in production code. The pathlib module is generally recommended for new projects due to its more intuitive object-oriented interface and better handling of path operations across different operating systems.
