
Installing NVIDIA Drivers (Debian)

Introduction

To make full use of a GPU-accelerated VM on the Ori Global Cloud (OGC) platform, you need to install the appropriate NVIDIA drivers so that the VM can use the underlying hardware for GPU-intensive tasks such as machine learning, deep learning, and high-performance computing. This guide walks you through installing the NVIDIA GPU drivers on a VM running Debian 12 (Bookworm).

Prerequisites

  • Ensure that your VM instance is available.
  • Verify that you can reach your VM over SSH. Use the Ori-provided SSH command on the VM details page.

Installation Steps

All the steps are based on the following guide: https://wiki.debian.org/NvidiaGraphicsDrivers#Debian_12_.22Bookworm.22
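Because the steps target Debian 12 "Bookworm", it can help to confirm the release before starting. The following is a minimal sketch, not part of the referenced guide; the function names `debian_codename` and `check_bookworm` are illustrative, and it assumes the VM provides the standard /etc/os-release file with a VERSION_CODENAME entry.

```shell
#!/bin/sh
# Illustrative helpers (names are hypothetical, not from the Debian wiki).

# Print the release codename from an os-release file (default: /etc/os-release).
debian_codename() {
    os_release="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into the caller.
    ( . "$os_release" && printf '%s\n' "${VERSION_CODENAME:-unknown}" )
}

# Succeed only when the codename is "bookworm"; otherwise explain and fail.
check_bookworm() {
    codename="$(debian_codename "$@")"
    if [ "$codename" = "bookworm" ]; then
        echo "Debian 12 (bookworm) detected"
    else
        echo "Expected bookworm, found: $codename" >&2
        return 1
    fi
}
```

Calling `check_bookworm` with no arguments reads /etc/os-release on the VM. If it fails, the package names and repository line used in the steps may not apply as written.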

  1. Update the package lists:
sudo apt update
  2. Upgrade all packages to their latest versions:
sudo apt upgrade -y
  3. Check that the GPU is detected:
lspci | grep -i nvidia
  4. Install the NVIDIA driver build prerequisites:
sudo apt -y install linux-headers-$(uname -r) build-essential libglvnd-dev pkg-config

(https://wiki.debian.org/NvidiaGraphicsDrivers#Prerequisites)

  5. Reboot your system:
sudo systemctl reboot
  6. Add the "contrib", "non-free" and "non-free-firmware" components to /etc/apt/sources.list, for example:
deb http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
  7. Update the list of available packages, then install the nvidia-driver package and the required firmware:
sudo apt update
sudo apt install nvidia-driver firmware-misc-nonfree
  8. Reboot your system:
sudo reboot
  9. Disable the default nouveau GPU driver. To do that, create and open a new configuration file:
sudo nano /etc/modprobe.d/blacklist-nouveau.conf
  10. Add the following lines to the file, then save the changes and exit. In nano, press Ctrl+X, confirm with Y, and press Enter:
blacklist nouveau
options nouveau modeset=0
  11. Rebuild the kernel initramfs:
sudo update-initramfs -u
  12. Reboot your system:
sudo reboot
  13. Verify that the driver is working:
debian@t1-le-45-gra7:~$ nvidia-smi
Fri Feb  9 22:39:13 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:00:06.0 Off |                    0 |
| N/A   32C    P0    25W / 250W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
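The two file edits in the steps above (adding components to /etc/apt/sources.list and creating the nouveau blacklist file) can also be made without opening an editor, which is convenient when provisioning several VMs. The following is a minimal sketch under stated assumptions: the function names are hypothetical, GNU sed is available (as it is on Debian), and sources.list uses the classic one-line format with entries that end in "main". If your image uses a different layout, edit the file by hand as described in the steps.

```shell
#!/bin/sh
# Illustrative helpers (names are hypothetical). Run as root against the real
# files, or pass a different path as the first argument to try them on a copy.

# Append contrib, non-free and non-free-firmware to every deb/deb-src line
# that currently ends in "main". Lines that already list more components do
# not end in "main", so running this twice changes nothing the second time.
enable_nonfree_components() {
    list_file="${1:-/etc/apt/sources.list}"
    sed -i -E \
        '/^deb(-src)? .* main[[:space:]]*$/ s/[[:space:]]*$/ contrib non-free non-free-firmware/' \
        "$list_file"
}

# Write the two blacklist lines from the steps above to the modprobe config.
write_nouveau_blacklist() {
    conf_file="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf_file"
}
```

After calling these functions as root, the remaining steps are unchanged: run `sudo apt update` before installing the driver, and `sudo update-initramfs -u` followed by a reboot after writing the blacklist file.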

Troubleshooting

If you encounter issues during the driver installation, consider the following troubleshooting steps:

  • Check the NVIDIA developer forums and knowledge base for solutions to common issues.
  • Ensure that any previous NVIDIA driver installations are completely removed before attempting a fresh installation.
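When nvidia-smi fails, a useful first check is which kernel modules are actually loaded, since a still-loaded nouveau module or a missing nvidia module points to different causes. The sketch below is illustrative only: the function names are hypothetical, and it assumes the standard /proc/modules listing, where each line starts with the module name followed by a space.

```shell
#!/bin/sh
# Illustrative diagnostics (names are hypothetical).

# Succeed if the named module appears in a /proc/modules-style listing.
module_loaded() {
    module="$1"
    modules_file="${2:-/proc/modules}"
    grep -q "^${module} " "$modules_file"
}

# Print one status line each for the nouveau and nvidia modules.
report_driver_state() {
    modules_file="${1:-/proc/modules}"
    for module in nouveau nvidia; do
        if module_loaded "$module" "$modules_file"; then
            echo "$module: loaded"
        else
            echo "$module: not loaded"
        fi
    done
}
```

Running `report_driver_state` on the VM reads /proc/modules. If nouveau is still loaded, recheck the blacklist file and the initramfs rebuild; if nvidia is not loaded, `dpkg -l | grep -i nvidia` lists the NVIDIA packages that are currently installed, which helps spot leftovers from an earlier installation.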

Additional Resources

For more detailed instructions, advanced configurations, and troubleshooting advice, refer to the official NVIDIA documentation available on the NVIDIA developer pages. You can also find support and community discussions that may assist with unique installation scenarios or issues.