Why does adding fmt.Println() inside filepath.WalkDir() significantly slow down my Go program?
I’m using filepath.WalkDir() in Go to traverse directories and collect all .pdf files. I’ve noticed a significant performance difference between these two code versions:
Code 1 (without fmt.Println):
err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
	if err != nil {
		fmt.Println(err)
		return nil
	}
	if !d.IsDir() && filepath.Ext(path) == ".pdf" {
		files = append(files, d.Name())
	}
	return nil
})
Code 2 (with fmt.Println):
err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
	if err != nil {
		fmt.Println(err)
		return nil
	}
	if !d.IsDir() && filepath.Ext(path) == ".pdf" {
		files = append(files, d.Name())
	}
	fmt.Println(path) // this line was added
	return nil
})
When I measure the execution time:
- Code 1: around 600ms
- Code 2: around 6s
Why does adding a single fmt.Println() call inside the WalkDir callback make the program 10x slower?
Why Adding fmt.Println() Inside filepath.WalkDir() Slows Down Go Programs
Adding fmt.Println() inside a filepath.WalkDir() callback can dramatically slow down your Go program because each print performs I/O, and that cost compounds across the thousands of entries visited during directory traversal. A single additional line can cause a 10x slowdown because of system call overhead, locking, and the fact that os.Stdout is unbuffered.
Contents
- Understanding the Performance Impact
- Why fmt.Println() Is Expensive
- The Compound Effect in Directory Traversal
- Solutions and Optimizations
- When Print Statements Are Useful
Understanding the Performance Impact
The dramatic slowdown you’re experiencing (from 600ms to 6s) is not unusual when adding I/O operations to high-frequency code paths. Directory traversal inherently visits many files and directories, potentially thousands in a typical filesystem. Each fmt.Println() call adds overhead that compounds across all these visited entries.
When you use filepath.WalkDir(), it visits every file and directory in the tree, not just the PDF files you’re collecting. If you have a directory with 10,000 files (including non-PDF files), you’re adding 10,000 print statements, each with its own performance cost.
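To make the "every entry, not just PDFs" point concrete, here is a minimal sketch that counts visited entries against matched PDFs. It assumes an in-memory tree built with fstest.MapFS and walks it with fs.WalkDir as a stand-in for filepath.WalkDir on a real disk (the callback signature is the same); the file names and the helpers sampleTree and countEntries are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// sampleTree builds a small in-memory tree with hypothetical file names.
func sampleTree() fs.FS {
	return fstest.MapFS{
		"docs/a.pdf":     {Data: []byte("x")},
		"docs/b.txt":     {Data: []byte("x")},
		"docs/img/c.png": {Data: []byte("x")},
		"readme.md":      {Data: []byte("x")},
	}
}

// countEntries reports how many entries the walk visits and how many are PDFs.
func countEntries(fsys fs.FS) (visited, pdfs int, err error) {
	err = fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		visited++ // the callback runs for the root, every directory, and every file
		if !d.IsDir() && path.Ext(p) == ".pdf" {
			pdfs++
		}
		return nil
	})
	return visited, pdfs, err
}

func main() {
	visited, pdfs, err := countEntries(sampleTree())
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	// Prints: visited 7 entries, 1 of them PDFs
	fmt.Printf("visited %d entries, %d of them PDFs\n", visited, pdfs)
}
```

A print placed at the end of the callback would run once per visited entry (7 times here), even though only one entry is a PDF.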
Why fmt.Println() Is Expensive
The fmt.Println() function looks simple, but each call does several things:
- String formatting: It formats the path string into bytes for output
- Lock acquisition: The write on os.Stdout takes an internal lock so that concurrent writes stay safe
- System call: It issues a write system call to send the data to the output stream
- Lock release: It releases the lock once the write returns
- No buffering: os.Stdout is unbuffered in Go, so every call is written out immediately instead of being batched
// Simplified view of what fmt.Println() does internally (inside package fmt)
func Println(a ...any) (n int, err error) {
	return Fprintln(os.Stdout, a...)
}
The write system call is the costly part: each one crosses from user space into kernel space, and when stdout is a terminal, the terminal then has to process and render every line. Repeated thousands of times, that overhead can dominate the run time.
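The one-write-per-call behaviour can be observed without touching a real terminal. The sketch below assumes a hypothetical countingWriter whose Write calls stand in for write system calls on os.Stdout, and compares printing directly against printing through a bufio.Writer.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter is a hypothetical helper: each Write call stands in for
// one write system call that os.Stdout would make.
type countingWriter struct {
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// printLines prints n path-like lines to w, one fmt.Fprintln call per line.
func printLines(w io.Writer, n int) {
	for i := 0; i < n; i++ {
		fmt.Fprintln(w, "/some/dir/file", i)
	}
}

// unbufferedCalls reports the Write calls made when printing n lines directly.
func unbufferedCalls(n int) int {
	cw := &countingWriter{}
	printLines(cw, n)
	return cw.calls
}

// bufferedCalls reports the Write calls made when the same lines pass
// through a bufio.Writer with the default buffer size.
func bufferedCalls(n int) int {
	cw := &countingWriter{}
	bw := bufio.NewWriter(cw)
	printLines(bw, n)
	bw.Flush() // push out whatever is still sitting in the buffer
	return cw.calls
}

func main() {
	fmt.Println("unbuffered writes:", unbufferedCalls(10000))
	fmt.Println("buffered writes:  ", bufferedCalls(10000))
}
```

The unbuffered count equals the number of lines, while the buffered count is far smaller because many lines are batched into each underlying write.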
The Compound Effect in Directory Traversal
Directory traversal involves visiting many filesystem entries, and the performance impact compounds with each entry:
- Base traversal cost: Your program already spends about 600ms just walking the directory structure
- Per-entry overhead: Each fmt.Println() adds anywhere from microseconds to milliseconds, depending on where stdout is connected
- Multiplicative effect: 1,000 entries × 1ms per print = 1,000ms of additional time
In your example, the 10x slowdown suggests one or more of the following:
- You’re traversing thousands of files/directories
- The underlying filesystem is slow (network, cloud, or mechanical storage)
- The terminal processing adds additional overhead
Operation | Cost (approximate) | Impact in WalkDir |
---|---|---|
Filesystem stat | 0.01-1ms | High (happens for each entry) |
String comparison | <0.001ms | Negligible |
fmt.Println() | 0.1-5ms | Very high (compounds across entries) |
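The same arithmetic can be run backwards to estimate the per-entry cost from the two timings in the question. In this sketch the 600ms and 6s figures come from the question, while the entry count of 10,000 is an assumption for illustration; perEntryOverhead is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// perEntryOverhead estimates the average extra time each visited entry costs,
// given the run time without printing, the run time with printing, and the
// number of entries visited.
func perEntryOverhead(without, with time.Duration, entries int) time.Duration {
	if entries <= 0 {
		return 0
	}
	return (with - without) / time.Duration(entries)
}

func main() {
	// 600ms and 6s are the timings from the question; 10,000 entries is assumed.
	d := perEntryOverhead(600*time.Millisecond, 6*time.Second, 10000)
	fmt.Println("extra time per entry:", d) // extra time per entry: 540µs
}
```

Under that assumption each print costs roughly half a millisecond, which falls inside the range given in the table above.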
Solutions and Optimizations
If you need to output progress information during directory traversal, consider these optimized approaches:
1. Use a Buffered Writer
buf := bufio.NewWriter(os.Stdout)
defer buf.Flush()
err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
	// ... your existing code ...
	fmt.Fprintln(buf, path) // Much faster than fmt.Println()
	return nil
})
2. Use a Logging Package
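Logging packages like logrus or zap add levels and structured fields, which makes per-entry output easy to switch off. They do not remove the write cost by themselves, so point them at a buffered writer:
import "github.com/sirupsen/logrus"
buf := bufio.NewWriter(os.Stdout)
defer buf.Flush()
logrus.SetOutput(buf)
logrus.SetFormatter(&logrus.TextFormatter{})
err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
	// ... your existing code ...
	logrus.Info(path) // Goes through the buffer instead of straight to stdout
	return nil
})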
3. Conditional Output
Make the output optional or rate-limited:
var printCounter int
const printEvery = 100 // Print only every 100th entry
err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
	// ... your existing code ...
	printCounter++
	if printCounter%printEvery == 0 {
		fmt.Println(path)
	}
	return nil
})
4. Redirect Output When Performance Matters
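// For production runs, disable or redirect output
if os.Getenv("PERFORMANCE_MODE") == "true" {
	// Setting os.Stdout to nil makes every print fail; redirect to the null device instead
	if devNull, err := os.OpenFile(os.DevNull, os.O_WRONLY, 0); err == nil {
		os.Stdout = devNull
	}
}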
When Print Statements Are Useful
Despite the performance cost, print statements can be valuable for:
- Debugging: Understanding what files are being processed
- Progress tracking: Showing users that your program is working
- Error reporting: Highlighting permission issues or inaccessible files
The key is to use them judiciously:
- Only enable them when needed (via flags or environment variables)
- Use rate limiting for large directory trees
- Consider alternative output methods for production code
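One way to enable output only when needed is to choose the destination writer once, up front. This is a minimal sketch, not a definitive design: the -verbose flag name and the progressWriter helper are hypothetical, and the path printed in main is a placeholder for what the WalkDir callback would pass.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// progressWriter returns where per-entry output should go: os.Stdout when
// verbose is set, otherwise io.Discard, which drops writes without a
// system call.
func progressWriter(verbose bool) io.Writer {
	if verbose {
		return os.Stdout
	}
	return io.Discard
}

func main() {
	// -verbose is a hypothetical flag name chosen for this sketch.
	verbose := flag.Bool("verbose", false, "print every visited path")
	flag.Parse()

	out := progressWriter(*verbose)

	// Inside the WalkDir callback, this call would replace fmt.Println(path).
	fmt.Fprintln(out, "example/path")
}
```

The formatting work in fmt.Fprintln still runs when output is discarded, but the write to stdout, which is the expensive part, is skipped.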
In summary, the dramatic slowdown from adding fmt.Println() in filepath.WalkDir() occurs because print statements involve expensive I/O operations that compound across every file and directory visited. For production code, avoid direct print statements in performance-critical paths like directory traversal callbacks.