
Fix PyCUDA cuInit Failed Error on Ubuntu CUDA GPUs

Resolve 'pycuda._driver.Error: cuInit failed: unknown error' on Ubuntu with NVIDIA P102-100 GPUs. Reload nvidia-uvm modules, verify drivers, reinstall CUDA toolkit. Step-by-step fix for ubuntu cuda setups where nvidia-smi works but PyCUDA fails.


How to fix pycuda._driver.Error: cuInit failed: unknown error when importing pycuda.autoinit on Ubuntu with NVIDIA P102-100 GPUs?

Problem Description

I encounter this error on a specific Ubuntu server (works on others):

>>> import pycuda.autoinit
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/grl00/local_env/lib/python3.12/site-packages/pycuda/autoinit.py", line 9, in <module>
    cuda.init()
pycuda._driver.Error: cuInit failed: unknown error

System Details

  • NVIDIA Driver: 580.95.05 (CUDA Version: 13.0)
  • GPUs: 3x NVIDIA P102-100

NVIDIA-SMI output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05               Driver Version: 580.95.05      CUDA Version: 13.0    |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA P102-100                Off |   00000000:04:00.0 Off |                  N/A |
| 51%   27C    P8               9W / 250W |        5MiB / 10240MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA P102-100                Off |   00000000:83:00.0 Off |                  N/A |
| 51%   20C    P8               8W / 250W |        5MiB / 10240MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA P102-100                Off |   00000000:84:00.0 Off |                  N/A |
| 50%   14C    P8               8W / 250W |        5MiB / 10240MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2005      G   /usr/lib/xorg/Xorg                        4MiB |
|    1   N/A  N/A            2005      G   /usr/lib/xorg/Xorg                        4MiB |
|    2   N/A  N/A            2005      G   /usr/lib/xorg/Xorg                        4MiB |
+-----------------------------------------------------------------------------------------+

Attempts

  1. Installed NVIDIA driver: sudo apt install nvidia-driver-580-server
  2. CUDA 12.0 via apt: sudo apt install nvidia-cuda-toolkit (nvcc --version: 12.0)
  3. CUDA 13.0/13.1 via NVIDIA repo (detailed steps followed, nvcc --version: 13.0)
  4. PyCUDA compiles successfully via pip in both cases.
  5. Multiple uninstall/reinstall cycles.

PyCUDA works on other servers with similar setups. What could cause cuInit to fail, and how to resolve it?

The “cuInit failed: unknown error” raised when importing pycuda.autoinit on Ubuntu CUDA systems with NVIDIA P102-100 GPUs usually points to the nvidia-uvm kernel module not being loaded, even when nvidia-smi shows your GPUs just fine. The quick fix: run sudo nvidia-modprobe -u to load nvidia-uvm and create its device files, confirm with lsmod | grep nvidia, and retest; this clears it for most users on Ubuntu 24.04 and similar releases. If that doesn’t stick, purge and reinstall the NVIDIA driver (for example nvidia-driver-580-server), since the P102-100 needs a driver branch that still supports Pascal.


Understanding the cuInit Failed Error

Ever hit that wall where nvidia-smi happily lists your three P102-100 GPUs, driver 580.95.05 reports CUDA 13.0 ready to go, but import pycuda.autoinit bombs with “cuInit failed: unknown error”? You’re not alone; it’s a classic gotcha on Ubuntu servers. PyCUDA’s autoinit calls straight into the CUDA driver to initialize a context, and if that low-level handshake with the kernel modules fails, the import blows up on the spot.

Why this specific error? cuInit is the first CUDA API call, checking for capable devices and loading essentials. On Ubuntu CUDA setups, it often flakes because the nvidia-uvm module (handles unified virtual memory) isn’t loaded or mismatches your kernel. P102-100 cards, being Pascal-era mining beasts repurposed for compute, add quirks—they show in nvidia-smi but need explicit module support. And since it works on your other servers? Likely a per-machine driver install hiccup or kernel param difference.

Short version: nvidia-smi tests basic driver access, but cuInit probes deeper into GPU memory management. Fix the modules, and PyCUDA breathes easy.
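Before changing anything, it helps to see exactly which layer is failing. A quick sanity check (a sketch, assuming PyCUDA is already installed in the active environment) separates the device nodes and the raw driver init from autoinit’s context creation:

# cuInit needs /dev/nvidia-uvm, which plain nvidia-smi never touches
ls -l /dev/nvidia*

# Call the driver init directly, without autoinit creating a context
python3 -c "import pycuda.driver as drv; drv.init(); print(drv.get_version(), drv.Device.count())"

If /dev/nvidia-uvm is missing, or the one-liner throws the same cuInit error, the module steps below are the place to start.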


Verify NVIDIA Driver and P102-100 Compatibility

First things first: confirm your driver plays nice with the P102-100 on Ubuntu. The P102-100 is a GP102-based mining card (closer to a cut-down GTX 1080 Ti than a Tesla P100), and per NVIDIA’s 580.95.05 release notes Pascal parts like it are still covered by the 580 branch; real-world reports on Ubuntu 24.04 also show good results with 535 and 550. Your 580-server install looks solid, yet inconsistencies creep in across servers.

Run this to double-check:

nvidia-smi --query-gpu=driver_version --format=csv

If it’s not reporting 580.95.05 everywhere, that’s suspect. P102-100 users report success on Ubuntu 24.04 with these driver branches, no exotic workarounds needed.
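It also pays to confirm the loaded kernel module and the userspace libraries agree; a driver upgrade without a reboot leaves them mismatched, which produces exactly this kind of init failure. A quick check (standard driver introspection, nothing exotic):

# Version of the kernel module that is actually loaded right now
cat /proc/driver/nvidia/version

# Userspace view, per GPU
nvidia-smi --query-gpu=index,name,driver_version --format=csv

Both should say 580.95.05; if they disagree, reboot before trying anything else.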

Purge everything for a clean slate:

sudo apt purge 'nvidia-*' && sudo apt autoremove
sudo apt update
sudo apt install nvidia-driver-580-server

Reboot, then run nvidia-smi. Matches your output? Great. But if cuInit still gripes, modules are next. Why purge? Half-installed packages, or leftovers from the distro nvidia-cuda-toolkit package, mess with runtime detection.

One user tip: P102-100 thrives without persistence mode initially—your “Off” status is fine for headless servers.
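To confirm the slate really is clean after a purge-and-reinstall cycle, something like this helps (a sketch; exact package names vary by driver flavor). It lists leftovers and shows whether DKMS built the module for the kernel you are booted into:

# 'rc' entries are half-removed packages that can leave stale files around
dpkg -l | grep -i nvidia

# The 580 module should be built and installed for your running kernel
dkms status
uname -r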


Check and Reload NVIDIA Kernel Modules

Here’s the hero fix from countless Stack Overflow threads: nvidia-uvm isn’t loaded. nvidia-smi doesn’t need it, but CUDA init does for memory ops.

Diagnose:

lsmod | grep nvidia
dmesg | grep -i nvidia

Look for nvidia_uvm, nvidia_drm, nvidia_modeset. Missing nvidia_uvm? That’s your culprit—cuInit chokes without it.

Reload:

sudo modprobe nvidia_uvm
# Or, to also create the /dev/nvidia-uvm device files:
sudo nvidia-modprobe -u

Loading the module takes effect immediately and clears the “unknown error” on most Ubuntu CUDA rigs. It won’t survive a reboot on its own, though, so make it persistent:

echo 'nvidia-uvm' | sudo tee -a /etc/modules
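On systemd-managed Ubuntu releases the same persistence can live in /etc/modules-load.d instead; a minimal sketch (the filename is arbitrary):

echo 'nvidia-uvm' | sudo tee /etc/modules-load.d/nvidia-uvm.conf

Either location works; pick one so the module comes up on every boot.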

Users with P102-100 swear by this—especially post-upgrade when modules desync.

Still no? Check Secure Boot (unsigned NVIDIA modules won’t load with it on; enroll a MOK key or disable it in the BIOS) and kernel version mismatches. uname -r should align with the kernels your driver supports per NVIDIA’s release notes.
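Checking both takes seconds; mokutil may need installing first (sudo apt install mokutil), so treat this as a sketch:

# SecureBoot enabled means unsigned NVIDIA modules get rejected at load time
mokutil --sb-state

# Kernel you are running vs. kernels the DKMS module was built against
uname -r
dkms status | grep -i nvidia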


Reinstall CUDA Toolkit on Ubuntu

Your mix of the apt nvidia-cuda-toolkit (12.0) and a driver reporting CUDA 13.0? A potential mismatch, though PyCUDA leans on the driver libraries primarily. Still, a clean CUDA toolkit install prevents header and library conflicts.

Ditch apt version—use NVIDIA’s repo for Ubuntu CUDA consistency:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-0 # Match your driver

Adjust the repo path for 22.04 if needed (ubuntu2204). nvcc --version should now report 13.0. No separate runtime install is needed for initialization itself; the driver ships libcuda, which is what cuInit actually calls into.

Why this over apt? Distro CUDA lags and bundles old libs, tripping PyCUDA compile/link. Post-install, export PATH=/usr/local/cuda-13.0/bin:$PATH in ~/.bashrc.
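A minimal ~/.bashrc addition, assuming the default /usr/local/cuda-13.0 prefix the repo packages use:

# nvcc on PATH for PyCUDA's build step, toolkit libraries visible to the loader
export PATH=/usr/local/cuda-13.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:${LD_LIBRARY_PATH:-}

Open a fresh shell (or source ~/.bashrc) before rebuilding PyCUDA so the build picks up the right nvcc.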


Proper PyCUDA Installation

PyCUDA compiles fine, but runtime link fails? Ensure build env matches:

pip uninstall pycuda
CUDA_PATH=/usr/local/cuda pip install pycuda --no-cache-dir

Or, inside a venv, make sure /usr/local/cuda/bin (where nvcc lives) is on PATH before installing. pycuda.autoinit needs no extra flags at runtime, but verify the compiled extension links against the NVIDIA libraries: ldd $(python3 -c "import pycuda._driver as d; print(d.__file__)") should list libcuda.

Pro tip: on a multi-GPU box like your 3x P102-100, set CUDA_VISIBLE_DEVICES (e.g. CUDA_VISIBLE_DEVICES=0) to isolate one card while testing. Works on other servers? Copy their /etc/ld.so.conf.d/nvidia.conf or environment variables over.
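Once the install finishes, a quick enumeration check confirms the driver initializes and all three P102-100s are visible (a sketch; assumes the environment with PyCUDA installed is active):

# Enumerate every GPU the driver exposes
python3 -c "import pycuda.driver as drv; drv.init(); print([drv.Device(i).name() for i in range(drv.Device.count())])"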


Advanced Troubleshooting Steps

If basics flop:

  1. Kernel params: Add nvidia-drm.modeset=1 to GRUB (sudo nano /etc/default/grub, update GRUB_CMDLINE_LINUX_DEFAULT, sudo update-grub, reboot).
  2. Containers/WSL? Rare for servers, but nvidia-docker or WSL2 needs extras—your nvidia-smi says bare metal.
  3. ECC/Compute mode: your nvidia-smi output already shows ECC as N/A on these cards, so ECC isn’t the blocker; leave compute mode at Default.
  4. Logs deep dive: journalctl -u nvidia-persistenced or dmesg | grep UVM.
  5. Driver downgrade: try the 550 branch if 580 misbehaves; P102-100 owners report it runs well on Ubuntu 24.04.

Compare servers: dpkg -l | grep nvidia, kernel versions. Diff there? Harmonize.
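A quick way to capture that comparison (a sketch; good-server and bad-server are placeholder hostnames, and it assumes ssh access to both):

# Snapshot kernel version and NVIDIA packages on each box, then diff
for host in good-server bad-server; do
  ssh "$host" 'uname -r; dpkg -l | grep -i nvidia' > "/tmp/nvidia-$host.txt"
done
diff /tmp/nvidia-good-server.txt /tmp/nvidia-bad-server.txt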


Verification and Testing

Success markers:

lsmod | grep nvidia_uvm # Loaded?
nvidia-smi # GPUs good
python3 -c "import pycuda.autoinit; import pycuda.driver as drv; print(drv.Device(0).name())" # Prints 'NVIDIA P102-100'

Run a kernel test:

import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Compile a trivial kernel, launch it once, and wait for it to finish
mod = SourceModule("__global__ void test(){}")
mod.get_function("test")(block=(1, 1, 1), grid=(1, 1))
drv.Context.synchronize()

No traceback? You’re golden. Benchmark across servers if you want extra confidence; the P102-100s here should perform in line with the same cards on your working machines.


Sources

  1. How to remove cuInit failed: unknown error in CUDA (PyCuda) — Stack Overflow fix using nvidia-modprobe for Ubuntu: https://stackoverflow.com/questions/53369652/how-to-remove-cuinit-failed-unknown-error-in-cuda-pycuda
  2. pycuda._driver.Error: cuInit failed: unknown error — Matches P102-100 nvidia-smi success but PyCUDA fail: https://stackoverflow.com/questions/79861471/pycuda-driver-error-cuinit-failed-unknown-error
  3. Version 580.95.05(Linux) - NVIDIA Data Center GPU Driver — Official P102-100 support and module troubleshooting: https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-580-95-05/index.html
  4. Nvidia P102-100 10GB Mining GPU - Ubuntu 24.04 — Driver recommendations for P102-100 on recent Ubuntu: https://hardwarerecs.stackexchange.com/questions/18640/nvidia-p102-100-10gb-mining-gpu-ubuntu-24-04
  5. pycuda._driver.Error: cuInit failed: unknown error — NVIDIA forums on module issues with similar setups: https://forums.developer.nvidia.com/t/pycuda-driver-error-cuinit-failed-unknown-error/199168

Conclusion

Load nvidia-uvm with sudo nvidia-modprobe -u, verify the modules, and match the CUDA toolkit to your 580 driver; that resolves the cuInit failure for PyCUDA on Ubuntu with P102-100 cards most of the time. Tweak for your server’s quirks, test rigorously, and it’ll behave like your other machines. Persistent headaches? Dive into dmesg; NVIDIA’s ecosystem rewards persistence.
