How can I list all files in a directory using Python and store them in a list?
You can list all files in a directory using Python by utilizing several methods from the standard library, with the most common approaches being os.listdir() for simple directory listing, pathlib.Path.glob() for more powerful pattern matching, and os.scandir() for better performance. Each method returns a list that you can store and manipulate as needed, with options to filter for specific file types or include full file paths.
Contents
- Using os.listdir()
- Using pathlib.Path Methods
- Using os.scandir()
- Using os.walk() for Recursive Listing
- Using glob Module
- Filtering Files by Extension
- Complete Best Practices Example
Using os.listdir()
The os.listdir() function is the most straightforward method to list all files and directories in a specified path. It returns a list containing the names of the entries in the directory.
import os
# List all entries in current directory
entries = os.listdir()
print(entries)
# List all entries in a specific directory
directory_path = '/path/to/your/directory'
entries = os.listdir(directory_path)
print(entries)
Important: os.listdir() only provides the names of files and directories without their full paths. To get complete file paths, you need to join the directory path with each entry:
import os
directory_path = '/path/to/your/directory'
all_files = [os.path.join(directory_path, entry) for entry in os.listdir(directory_path)]
print(all_files)
This method is simple and works well for basic needs, but it doesn’t distinguish between files and directories and doesn’t provide file metadata.
Using pathlib.Path Methods
The pathlib module (introduced in Python 3.4) provides an object-oriented interface for filesystem paths and is generally preferred for modern Python code.
Basic listing with iterdir():
from pathlib import Path
# Get all entries as Path objects
directory = Path('/path/to/your/directory')
all_entries = list(directory.iterdir())
print(all_entries)
Using glob() for pattern matching:
from pathlib import Path
# Get all files (excluding directories)
directory = Path('/path/to/your/directory')
all_files = list(directory.glob('*')) # '*' matches everything
files_only = [entry for entry in all_files if entry.is_file()]
print(files_only)
pathlib offers more readable syntax and better cross-platform compatibility compared to the older os module.
Using os.scandir()
os.scandir() is a more efficient alternative to os.listdir() introduced in Python 3.5. It provides file type information without requiring additional system calls.
import os
directory_path = '/path/to/your/directory'
with os.scandir(directory_path) as entries:
files = [entry.path for entry in entries if entry.is_file()]
print(files)
Performance benefits: os.scandir() is significantly faster, especially for directories with many files, because it provides file type information during the initial directory scan rather than requiring additional stat() calls.
Using os.walk() for Recursive Listing
When you need to list files from a directory and all its subdirectories recursively, os.walk() is the ideal choice:
import os
def get_all_files(directory_path):
all_files = []
for root, dirs, files in os.walk(directory_path):
for file in files:
full_path = os.path.join(root, file)
all_files.append(full_path)
return all_files
# Usage
directory_path = '/path/to/your/directory'
all_files = get_all_files(directory_path)
print(all_files)
This method traverses the directory tree depth-first, collecting all files from all subdirectories.
Using glob Module
The glob module provides pattern matching functionality similar to Unix shell wildcards:
import glob
# Get all files in current directory
files = glob.glob('*')
print(files)
# Get all files in specific directory
files = glob.glob('/path/to/your/directory/*')
print(files)
# Recursive search (Python 3.5+ with recursive=True)
all_files = glob.glob('/path/to/your/directory/**/*', recursive=True)
print(all_files)
glob is particularly useful when you need to find files matching specific patterns, such as all .txt files or files with certain name patterns.
Filtering Files by Extension
Often you’ll want to list only files with specific extensions. Here are several approaches:
Using list comprehension with endswith():
import os
from pathlib import Path
# Method 1: Using os.listdir()
txt_files = [file for file in os.listdir('/path/to/your/directory')
if file.endswith('.txt')]
# Method 2: Using pathlib
directory = Path('/path/to/your/directory')
txt_files = [file for file in directory.glob('*.txt') if file.is_file()]
# Method 3: Full paths
txt_files_full = [str(file) for file in directory.glob('*.txt') if file.is_file()]
Using multiple extensions:
from pathlib import Path
directory = Path('/path/to/your/directory')
text_files = [file for file in directory.glob('*')
if file.is_file() and file.suffix.lower() in ['.txt', '.md', '.rst']]
Complete Best Practices Example
Here’s a comprehensive example that demonstrates best practices for listing files:
import os
from pathlib import Path
from typing import List
def list_files(directory_path: str,
extensions: List[str] = None,
recursive: bool = False) -> List[str]:
"""
List all files in a directory with optional filtering by extension.
Args:
directory_path: Path to the directory to scan
extensions: List of file extensions to include (e.g., ['.txt', '.py'])
recursive: Whether to search subdirectories
Returns:
List of full file paths
"""
if recursive:
return _list_files_recursive(directory_path, extensions)
else:
return _list_files_single(directory_path, extensions)
def _list_files_single(directory_path: str, extensions: List[str] = None) -> List[str]:
"""List files in a single directory."""
try:
directory = Path(directory_path)
if not directory.exists():
raise FileNotFoundError(f"Directory not found: {directory_path}")
files = []
for entry in directory.iterdir():
if entry.is_file():
if extensions is None or entry.suffix.lower() in extensions:
files.append(str(entry.absolute()))
return files
except PermissionError:
print(f"Permission denied accessing: {directory_path}")
return []
def _list_files_recursive(directory_path: str, extensions: List[str] = None) -> List[str]:
"""List files recursively including subdirectories."""
try:
directory = Path(directory_path)
if not directory.exists():
raise FileNotFoundError(f"Directory not found: {directory_path}")
files = []
for root, dirs, files_in_dir in os.walk(directory_path):
for file in files_in_dir:
full_path = os.path.join(root, file)
if extensions is None or any(full_path.lower().endswith(ext) for ext in extensions):
files.append(os.path.abspath(full_path))
return files
except PermissionError:
print(f"Permission denied accessing: {directory_path}")
return []
# Usage examples
if __name__ == "__main__":
# List all files in current directory
all_files = list_files('.')
print(f"Found {len(all_files)} files")
# List only Python files
python_files = list_files('.', extensions=['.py'])
print(f"Found {len(python_files)} Python files")
# List all text files recursively
text_files = list_files('/path/to/docs', extensions=['.txt', '.md', '.rst'], recursive=True)
print(f"Found {len(text_files)} text files recursively")
This example includes error handling, type hints, and flexible filtering options. It demonstrates both single-directory and recursive listing with extension filtering.
Conclusion
To effectively list all files in a directory using Python, consider these key approaches:
- For simple needs: Use
os.listdir()combined withos.path.join()for basic file listing - For modern Python: Prefer
pathlib.Path.glob()for its cleaner syntax and better cross-platform support - For performance: Use
os.scandir()when working with large directories as it provides file type information more efficiently - For recursive searches: Choose
os.walk()when you need to traverse directory trees - For pattern matching: Use the
globmodule when you need to find files matching specific naming patterns
Remember to handle potential errors like permission issues and non-existent directories, especially in production code. The pathlib module is generally recommended for new projects due to its more intuitive object-oriented interface and better handling of path operations across different operating systems.