Fix Snakemake RuntimeError: Can't Start New Thread
Resolve the Snakemake RuntimeError "can't start new thread" in large workflows with many studies: use --max-threads 1, raise ulimit -u to 4096, and tune --jobs/--cores to avoid job_scheduler.py thread exhaustion near 98% completion.
Snakemake RuntimeError: can’t start new thread when executing large workflow with many studies
I’m running a Snakemake workflow that performs the following steps:
- Downloads studies from a database (each study contains data for multiple individuals).
- Splits each study into one file per individual.
- Filters data for each individual.
The workflow completes successfully with a few studies, but fails when processing many studies, raising the error:
RuntimeError: can't start new thread
This occurs near the end of execution (e.g., 98% complete), during job scheduling reevaluation in job_scheduler.py. Full traceback points to threading.py failing to start a new thread.
Command used:
snakemake --cores 3 --jobs 3 --resources mem_mb=7000 download_slots=1 \
--config config_path="shiny-app/configs/config.yaml"
It appears Snakemake may be retaining open job threads after completion, exhausting the thread limit. How can I resolve this issue and successfully run the workflow on large datasets?
The Snakemake RuntimeError: can’t start new thread strikes when a large workflow (many studies to download, one file per individual to split out, filters to apply) hits the OS thread limit late in the run, often around 98% complete inside job_scheduler.py. It happens because Snakemake’s scheduler spins up threads for job monitoring and DAG reevaluation, and in a DAG with thousands of jobs those threads exhaust the per-user limit. Fix it fast by capping per-job threads with --max-threads 1, raising the limit with ulimit -u 4096, and lowering --jobs; the updated invocation becomes ulimit -u 4096 && snakemake --cores 3 --jobs 2 --max-threads 1 --resources mem_mb=7000 download_slots=1 --config ....
Contents
- Understanding the Snakemake RuntimeError: can’t start new thread
- Root Causes: Thread Limits in Snakemake Large Workflows
- Quick Fix: Using --max-threads in Snakemake CLI
- System-Level Solution: Adjusting ulimit for Snakemake
- Optimizing Snakemake Jobs, Cores, and Resources for Many Studies
- Best Practices for Large DAGs and Checkpoints in Snakemake
- Monitoring and Troubleshooting Snakemake Thread Exhaustion
- Sources
- Conclusion
Understanding the Snakemake RuntimeError: can’t start new thread
Picture this: your workflow hums along perfectly for a handful of studies—downloads zip through, individuals get split out, filters apply without a hitch. Scale up to hundreds or thousands, and bam, right around 98% done, Snakemake chokes with RuntimeError: can't start new thread deep in job_scheduler.py and Python’s threading.py.
Why late? The scheduler reevaluates the DAG after jobs finish, trying to queue more while still holding threads from earlier ones. In a closely matching case on Stack Overflow, a similar pipeline failed at job 3476 of 3532. Your command (--cores 3 --jobs 3) lets Snakemake parallelize nicely at first, but the thread pile-up eventually kills it. This is not a bug in your rules; it is resource exhaustion in big workflows.
Root Causes: Thread Limits in Snakemake Large Workflows
Large Snakemake workflows run into the OS-imposed threading ceiling; in practice a Python process often hits a wall around 900-1000 threads once per-user limits kick in. Each job needs monitor threads, and with --jobs 3 and --cores 3 plus scheduler overhead, you burn through them fast across 3500+ jobs from many studies.
Key culprits? Snakemake’s job scheduler creates threads for output checking, DAG updates, and retries, which is exacerbated in huge DAGs, as noted in Snakemake GitHub issue #2354 for 40k-job runs. The OS enforces ulimit -u (max user processes/threads), which often defaults to 1024-2048 on Linux, and every Python thread is a kernel thread that counts against it. Running in Docker? Containers tighten limits further, per the Docker forums.
And here’s the kicker: small tests mask it. Few studies mean tiny DAGs, low thread use. Ramp up, and threads linger post‑job, starving new ones.
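To see the ceiling the scheduler is fighting, you can query the per-user limit and the live thread count from Python (a sketch for Linux; the numbers on your machine will differ):
import resource
import threading

# RLIMIT_NPROC is what `ulimit -u` reports; kernel threads count against it.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"max user processes/threads: soft={soft} hard={hard}")

# Threads alive in *this* interpreter; inside snakemake the scheduler,
# job monitors, and DAG reevaluation all add to the equivalent number.
print(f"threads in this process: {threading.active_count()}")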
Quick Fix: Using --max-threads in Snakemake CLI
Don’t overthink it: --max-threads 1 is the immediate fix. It caps the number of threads any single job can request, overriding rule-level threads: directives, so far fewer threads are alive at once.
Update your command:
ulimit -u 4096 && snakemake --cores 3 --jobs 2 --max-threads 1 --resources mem_mb=7000 download_slots=1 \
--config config_path="shiny-app/configs/config.yaml"
Per the official Snakemake CLI docs, --max-threads sets a global upper bound on the threads any rule may use, regardless of what the rule declares. Users in that Stack Overflow thread confirmed it pushed their run to completion. Why --jobs 2 as well? Fewer concurrent jobs means fewer scheduler and monitor threads. Test it; your workflow should cruise past 98%.
Pro tip: Run snakemake --help | grep threads to verify.
System-Level Solution: Adjusting ulimit for Snakemake
CLI tweaks help, but if threads still cap out, crank the OS limit. ulimit -u governs user threads/processes—check yours with ulimit -a. Defaults like 1024? No wonder it fails on large datasets.
Boost it temporarily:
ulimit -u 4096 # Or 8192 for massive runs
snakemake ... # Your workflow here
For persistence, edit /etc/security/limits.conf:
youruser soft nproc 4096
youruser hard nproc 4096
Log out and back in (or reboot). The SuperUser diagnostics thread suggests monitoring with ps -fLu $USER | wc -l (threads running under your user) and cat /proc/sys/kernel/threads-max. Pair this with --max-threads 1; one report there fixed a Python app that was hitting 1000+ threads.
Containers? Docker defaults tighter—docker run --ulimit nproc=4096:4096 ....
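If you drive Snakemake from a Python script instead of a shell, the same headroom can be granted with the resource module before spawning the process (a sketch assuming Linux and that the hard limit allows 4096; the argument list mirrors the command above):
import resource
import subprocess

# Raise the soft "max user processes" limit (what `ulimit -u` shows)
# toward 4096 without exceeding the hard limit the OS permits.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NPROC, (max(soft, target), hard))

# The child process inherits the raised limit.
subprocess.run([
    "snakemake", "--cores", "3", "--jobs", "2", "--max-threads", "1",
    "--resources", "mem_mb=7000", "download_slots=1",
    "--config", "config_path=shiny-app/configs/config.yaml",
], check=True)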
Optimizing Snakemake Jobs, Cores, and Resources for Many Studies
Your command uses --cores 3 --jobs 3, but for large Snakemake workflows it pays to treat them separately: --jobs caps concurrent jobs (and thus scheduler and monitor threads), while --cores caps the total CPUs shared across those jobs.
Tune like this:
| Flag | Role | Recommendation for Large Runs |
|---|---|---|
| --jobs | Concurrent jobs | 2-4 (lower = fewer scheduler threads) |
| --cores | Total CPUs | Match your machine (e.g., 12) |
| --max-threads | Per-job threads | 1 (eliminates intra-job parallelism) |
| --resources | Custom limits | Add io=2 for downloads, scale mem_mb |
In rules, declare conservatively:
rule split_individuals:
    input: "study_{study}.zip"
    output: "individual_{individual}.txt"
    threads: 1  # threads is its own directive, not a resource; --max-threads can only lower it
    resources:
        mem_mb=1000
    shell: "..."
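The same mechanism throttles the download step: a rule can declare the custom download_slots resource you already pass on the CLI, so only one download runs at a time (a sketch; the rule name, output path, and mem_mb value are illustrative):
rule download_study:
    output: "study_{study}.zip"
    resources:
        mem_mb=2000,       # assumed value; scale to your data
        download_slots=1   # counted against --resources download_slots=1
    shell: "..."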
Snakemake GitHub issue #2532 highlights bugs where rule-level threads: settings interact badly with high --jobs values. The advanced tutorial explains how --cores distributes threads: with --cores 12 and --max-threads 4, each job gets at most 4 cores, and rules requesting more are scaled down. Dial back parallelism for stability.
Best Practices for Large DAGs and Checkpoints in Snakemake
Massive study counts explode your DAG. Break it: use checkpoints to split downloads/splits/filters into phases.
Example Snakefile snippet:
import os

checkpoint download_studies:
    input: config["studies"]
    output: directory("studies/")
    ...

def aggregate_individuals(wildcards):
    # Only evaluated after the checkpoint has run, so the DAG grows incrementally
    checkpoint_output = checkpoints.download_studies.get(**wildcards).output[0]
    ids = glob_wildcards(os.path.join(checkpoint_output, "{id}.zip")).id
    return expand("individual_{id}.txt", id=ids)

rule filter_individual:
    input: "individual_{id}.txt"
    output: "filtered_{id}.txt"
    ...
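To actually trigger filtering for every individual, a target rule can take an aggregation function as its input once the checkpoint resolves (a minimal sketch; the rule name and the filtered_* pattern follow the example above):
def aggregate_filtered(wildcards):
    # Same pattern as above, but requesting the *filtered* outputs
    checkpoint_output = checkpoints.download_studies.get(**wildcards).output[0]
    ids = glob_wildcards(os.path.join(checkpoint_output, "{id}.zip")).id
    return expand("filtered_{id}.txt", id=ids)

rule all_filtered:
    input: aggregate_filtered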
This rebuilds the DAG incrementally, so fewer jobs (and threads) are in flight at once. As the advanced docs note, checkpoints shine for dynamic outputs. Also consider --latency-wait 60 to tolerate filesystem lag after the split step.
Why bother? Smaller DAGs mean less scheduler strain, dodging thread errors entirely.
Monitoring and Troubleshooting Snakemake Thread Exhaustion
Spot issues early. Before runs:
- ulimit -a | grep process: check your per-user limit ("max user processes").
- cat /proc/sys/kernel/threads-max: the system-wide thread maximum.
During: htop or ps -eLf | grep snakemake | wc -l for thread counts. If spiking near ulimit, throttle more.
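If you prefer scripting that check, a small Python watcher can poll the thread count of the running snakemake process via /proc (a sketch assuming Linux; pass the PID as the first argument, and treat the warning level as a rough heuristic near the ~1000-thread wall):
import sys
import time

def thread_count(pid: int) -> int:
    # /proc/<pid>/status exposes a "Threads:" line on Linux
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    return 0

if __name__ == "__main__":
    pid = int(sys.argv[1])
    while True:
        n = thread_count(pid)
        print(f"snakemake pid {pid}: {n} threads")
        if n > 900:  # rough warning level; tune to your ulimit
            print("WARNING: approaching thread limits; consider lowering --jobs")
        time.sleep(30)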
Logs? Add --verbose for scheduler and DAG insights. Still failing? Discussions of Python thread limits peg roughly 1000 threads as a common wall. Docker? Raise the container's nproc ulimit or upgrade the base image, as suggested in the forums.
Incremental tests: 10 studies, 100, then all. You’ll nail it.
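One way to script those incremental tests is a hypothetical max_studies config knob that truncates the study list before the DAG is built (a sketch; max_studies and the STUDIES variable are illustrative assumptions, and config["studies"] is assumed to be a list of study identifiers):
# In the Snakefile, before any rules that expand over studies:
STUDIES = sorted(config["studies"])
if "max_studies" in config:
    STUDIES = STUDIES[: int(config["max_studies"])]  # e.g. 10, then 100, then all

# Downstream expand() calls use STUDIES, so the DAG scales with the test:
#   snakemake --cores 3 --jobs 2 --max-threads 1 --config max_studies=10 ...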
Sources
- Snakemake RuntimeError can’t start new thread — Exact traceback match for large workflow failure at 98%: https://stackoverflow.com/questions/79867047/snakemake-runtimeerror-cant-start-new-thread
- Error: can’t start new thread — Explains Python process thread limits and ulimit checks: https://stackoverflow.com/questions/1834919/error-cant-start-new-thread
- Snakemake CLI Arguments — Official docs on --max-threads, --jobs, --cores usage: https://snakemake.readthedocs.io/en/stable/executing/cli.html
- RuntimeError: can’t start new thread — Linux diagnostics with ulimit -a and thread counting: https://superuser.com/questions/1682148/runtimeerror-cant-start-new-thread
- Extreme scheduler strain in large workflows — GitHub discussion on 40k-job DAG overhead: https://github.com/snakemake/snakemake/issues/2354
- Snakemake Tutorial: Advanced Features — Guidance on threads, resources, and checkpoints: https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html
- RuntimeError: can’t start new thread (Docker) — Container-specific threading fixes: https://forums.docker.com/t/runtimeerror-cant-start-new-thread/138142
Conclusion
Tame the Snakemake RuntimeError: can’t start new thread in large workflows by capping per-job threads with --max-threads 1, raising ulimit -u to 4096, and trimming --jobs to 2 while keeping --cores high enough for throughput. Add checkpoints to contain DAG bloat, monitor thread counts as you go, and test incrementally, and you will process those thousands of studies smoothly. These tweaks turn a failing run into a reliable pipeline; give the updated command a spin and watch it finish.