
Installing NVIDIA Drivers (SXM)

Introduction

NVIDIA SXM GPUs are socketed GPU modules built for high-performance computing and deep learning. Compared with PCIe cards of the same generation, they run at higher power limits and are designed to connect to each other over NVLink.

Installation Steps for NVIDIA CUDA 12.5.1 (NVIDIA driver 555) on H100/H200 SXM

Ubuntu 24.04

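#!/bin/bash

# Register NVIDIA's CUDA apt repository for Ubuntu 24.04 via the keyring package.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb &&
sudo dpkg -i cuda-keyring_1.1-1_all.deb &&
sudo apt-get update &&
# Install the CUDA 12.5 toolkit and the 555 driver packages.
sudo apt-get -y install cuda-toolkit-12-5 &&
sudo apt-get install -y nvidia-driver-555-open &&
sudo apt-get install -y cuda-drivers-555 &&
# Blacklist the nvidia_uvm module and disable NVLink through modprobe options.
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
# Rebuild the initramfs so the modprobe settings are picked up at boot, then reboot.
sudo update-initramfs -u && sudo reboot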

Ubuntu 22.04

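#!/bin/bash

# Register NVIDIA's CUDA apt repository for Ubuntu 22.04 via the keyring package.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb &&
sudo dpkg -i cuda-keyring_1.1-1_all.deb &&
sudo apt-get update &&
# Install the CUDA 12.5 toolkit and the 555 driver packages.
sudo apt-get -y install cuda-toolkit-12-5 &&
sudo apt-get install -y nvidia-driver-555-open &&
sudo apt-get install -y cuda-drivers-555 &&
# Blacklist the nvidia_uvm module and disable NVLink through modprobe options.
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
# Rebuild the initramfs so the modprobe settings are picked up at boot, then reboot.
sudo update-initramfs -u && sudo reboot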

Ubuntu 20.04

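#!/bin/bash

# Register NVIDIA's CUDA apt repository for Ubuntu 20.04 via the keyring package.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb &&
sudo dpkg -i cuda-keyring_1.1-1_all.deb &&
sudo apt-get update &&
# Install the CUDA 12.5 toolkit and the 555 driver packages.
sudo apt-get -y install cuda-toolkit-12-5 &&
sudo apt-get install -y nvidia-driver-555-open &&
sudo apt-get install -y cuda-drivers-555 &&
# Blacklist the nvidia_uvm module and disable NVLink through modprobe options.
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
# Rebuild the initramfs so the modprobe settings are picked up at boot, then reboot.
sudo update-initramfs -u && sudo reboot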
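
The three scripts differ only in the repository path segment (ubuntu2404, ubuntu2204, ubuntu2004). As a minimal sketch, that segment could be derived from VERSION_ID in /etc/os-release instead of being hard-coded. The function name cuda_keyring_url is hypothetical, and the sketch assumes an x86_64 host and the same cuda-keyring_1.1-1_all.deb package used in the scripts.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): build the keyring URL
# from an Ubuntu VERSION_ID such as "24.04".
cuda_keyring_url() {
    local version_id="$1"
    local release
    # "24.04" -> "ubuntu2404": the repo directory is "ubuntu" plus the version without the dot.
    release="ubuntu$(printf '%s' "$version_id" | tr -d '.')"
    echo "https://developer.download.nvidia.com/compute/cuda/repos/${release}/x86_64/cuda-keyring_1.1-1_all.deb"
}

# Usage: read VERSION_ID from /etc/os-release and download the matching keyring.
#   . /etc/os-release
#   wget "$(cuda_keyring_url "$VERSION_ID")"
```

The rest of the installation steps are identical across the three releases, so only the download URL needs to vary.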

Verification

After the reboot, reconnect to the VM and confirm that the driver is installed and reports the expected version.

nvidia-smi

Expected output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:09:00.0 Off |                    0 |
| N/A   36C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
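
For a scripted check, the driver version can be read without parsing the table. The sketch below assumes nvidia-smi's `--query-gpu=driver_version --format=csv,noheader` query, which prints one version string per GPU. The function name driver_on_branch is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a driver version string belongs to the given branch.
# Example: driver_on_branch "555.42.06" 555 returns exit status 0.
driver_on_branch() {
    local version="$1"
    local branch="$2"
    # The branch is the part of the version before the first dot.
    [ "${version%%.*}" = "$branch" ]
}

# Usage on a machine with the driver installed (one line per GPU):
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader | while read -r v; do
#       driver_on_branch "$v" 555 || echo "unexpected driver version: $v"
#   done
```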

Verify that NVLink is disabled.

nvidia-smi -q | grep -A5 Fabric

Expected output:

    Fabric
        State                             : N/A
        Status                            : N/A
        CliqueId                          : N/A
        ClusterUUID                       : N/A
        Health

The State field must read N/A, not In Progress.
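
This check can also be automated. The sketch below extracts the State field from the Fabric section of `nvidia-smi -q` output read on standard input. The function name fabric_state is hypothetical, and it assumes the field layout shown in the expected output.

```shell
#!/bin/bash
# Hypothetical helper: print the Fabric "State" value from `nvidia-smi -q` output on stdin.
# Only the first Fabric section is reported, so on multi-GPU hosts this covers the first GPU.
fabric_state() {
    awk '
        # A line containing only "Fabric" marks the start of the section.
        /^[ \t]*Fabric[ \t]*$/ { in_fabric = 1; next }
        in_fabric && /^[ \t]*State[ \t]*:/ {
            sub(/^[^:]*:[ \t]*/, "")   # drop the field name, colon, and leading spaces
            sub(/[ \t]+$/, "")         # drop trailing spaces
            print
            exit
        }
    '
}

# Usage on a machine with the driver installed:
#   state="$(nvidia-smi -q | fabric_state)"
#   [ "$state" = "N/A" ] || echo "NVLink is not disabled (Fabric State: $state)"
```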