
Installing NVIDIA Drivers (SXM)

Introduction

NVIDIA SXM GPUs are built for high-performance computing and deep learning, and offer higher performance and efficiency than standard PCIe GPUs. Ori offers two SXM GPU types: V100 and H100. The SXM form factor delivers more power to the GPU, which allows higher performance and better use of the GPU's capabilities.

Installation Steps for NVIDIA CUDA 12.5.1 (NVIDIA driver 555) on H100 SXM

Ubuntu 24.04

#!/bin/bash 

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb &&
sudo dpkg -i cuda-keyring_1.1-1_all.deb &&
sudo apt-get update &&
sudo apt-get -y install cuda-toolkit-12-5 &&
sudo apt-get install -y nvidia-driver-555-open &&
sudo apt-get install -y cuda-drivers-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot

Ubuntu 22.04

#!/bin/bash 

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb &&
sudo dpkg -i cuda-keyring_1.1-1_all.deb &&
sudo apt-get update &&
sudo apt-get -y install cuda-toolkit-12-5 &&
sudo apt-get install -y nvidia-driver-555-open &&
sudo apt-get install -y cuda-drivers-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot

Ubuntu 20.04

#!/bin/bash 

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb &&
sudo dpkg -i cuda-keyring_1.1-1_all.deb &&
sudo apt-get update &&
sudo apt-get -y install cuda-toolkit-12-5 &&
sudo apt-get install -y nvidia-driver-555-open &&
sudo apt-get install -y cuda-drivers-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot
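
The three scripts above differ only in the repository directory (ubuntu2404, ubuntu2204, ubuntu2004). As a minimal sketch, that directory name could be derived from the release number rather than hard-coded. `repo_slug` and `keyring_url` are hypothetical helper names used for illustration; they are not part of any NVIDIA tooling.

```shell
#!/bin/bash

# Hypothetical helper: turn an Ubuntu VERSION_ID such as "24.04"
# into the CUDA repo directory name, e.g. "ubuntu2404".
repo_slug() {
    echo "ubuntu$(printf '%s' "$1" | tr -d '.')"
}

# Hypothetical helper: build the keyring URL used by the scripts above
# for a given Ubuntu release number.
keyring_url() {
    echo "https://developer.download.nvidia.com/compute/cuda/repos/$(repo_slug "$1")/x86_64/cuda-keyring_1.1-1_all.deb"
}

# Usage on a real host (VERSION_ID comes from /etc/os-release):
#   . /etc/os-release && wget "$(keyring_url "$VERSION_ID")"
```

This only covers the download step; the remaining commands are identical across the three releases.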

Verification

After the reboot, reconnect to the VM and verify that the driver is installed and check its version.

nvidia-smi

Expected output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:09:00.0 Off |                    0 |
| N/A   36C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
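
For scripted checks, the driver version can also be read in machine-friendly form with the `--query-gpu` option of `nvidia-smi`. The sketch below assumes the expected branch is 555, as installed above; `driver_branch_ok` is a hypothetical helper name.

```shell
#!/bin/bash

# Hypothetical helper: succeed if a driver version string belongs to
# the expected branch.
#   $1 = version string, e.g. 555.42.06
#   $2 = expected branch, e.g. 555
driver_branch_ok() {
    case "$1" in
        "$2".*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage on the VM:
#   version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_branch_ok "$version" 555 && echo "driver OK: $version"
```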

Verify that NVLink is disabled.

nvidia-smi -q | grep -A5 Fabric

Expected output:

    Fabric
        State : N/A
        Status : N/A
        CliqueId : N/A
        ClusterUUID : N/A
        Health

The State field must read N/A, not In Progress.
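
The same check can be automated. The sketch below reads `nvidia-smi -q` output on standard input and succeeds only if a Fabric section is present and its State field is N/A. `fabric_state_na` is a hypothetical helper name, and the parsing assumes the field layout shown in the expected output above.

```shell
#!/bin/bash

# Hypothetical helper: read `nvidia-smi -q` output on stdin.
# Exit status 0 only if at least one Fabric section was seen and every
# Fabric "State" field is N/A; otherwise exit status 1.
fabric_state_na() {
    awk '
        NF == 1 && $1 == "Fabric" { in_fabric = 1; seen = 1; next }
        in_fabric && $1 == "State" && $2 == ":" {
            if ($3 != "N/A") bad = 1
            in_fabric = 0
        }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Usage on the VM:
#   nvidia-smi -q | fabric_state_na && echo "Fabric state is N/A"
```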

Installation Steps for NVIDIA CUDA 12.5.1 (NVIDIA driver 555) on V100 SXM

Ubuntu 24.04

#!/bin/bash 

sudo apt update && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-ubuntu2404.pin &&
sudo mv cuda-ubuntu2404.pin /etc/apt/preferences.d/cuda-repository-pin-600 &&
wget https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers/cuda-repo-ubuntu2404-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo dpkg -i cuda-repo-ubuntu2404-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo cp /var/cuda-repo-ubuntu2404-12-5-local/cuda-*-keyring.gpg /usr/share/keyrings/ &&
sudo apt-get update && sudo apt-get -y install cuda-toolkit-12-5 &&
sudo add-apt-repository ppa:graphics-drivers/ppa --yes && sudo apt update && sudo apt install -y nvidia-driver-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot

Ubuntu 22.04

#!/bin/bash 

sudo apt update && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin &&
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 &&
wget https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers/cuda-repo-ubuntu2204-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo dpkg -i cuda-repo-ubuntu2204-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo cp /var/cuda-repo-ubuntu2204-12-5-local/cuda-*-keyring.gpg /usr/share/keyrings/ &&
sudo apt-get update && sudo apt-get -y install cuda-toolkit-12-5 &&
sudo add-apt-repository ppa:graphics-drivers/ppa --yes && sudo apt update && sudo apt install -y nvidia-driver-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot

Ubuntu 20.04

#!/bin/bash 

sudo apt update && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin &&
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600 &&
wget https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers/cuda-repo-ubuntu2004-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo dpkg -i cuda-repo-ubuntu2004-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo cp /var/cuda-repo-ubuntu2004-12-5-local/cuda-*-keyring.gpg /usr/share/keyrings/ &&
sudo apt-get update &&
sudo apt-get -y install cuda-toolkit-12-5 &&
sudo add-apt-repository ppa:graphics-drivers/ppa --yes && sudo apt update && sudo apt install -y nvidia-driver-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot
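
Each script above writes two modprobe settings before rebuilding the initramfs. Once the VM is back up, one way to confirm that both settings are in place is sketched below. `modprobe_settings_present` is a hypothetical helper name, and the directory argument exists only so the check can be pointed at a different location; it defaults to /etc/modprobe.d.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if both settings written by the
# install scripts exist as whole lines somewhere under the given
# directory (default: /etc/modprobe.d).
modprobe_settings_present() {
    dir="${1:-/etc/modprobe.d}"
    grep -rqxF 'blacklist nvidia_uvm' "$dir" &&
    grep -rqxF 'options nvidia NVreg_NvLinkDisable=1' "$dir"
}

# Usage on the VM:
#   modprobe_settings_present && echo "modprobe settings in place"
```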

Verification

After the reboot, reconnect to the VM and verify that the driver is installed and check its version.

nvidia-smi

Expected output (this sample shows an H100; on a V100 the GPU name, power cap, and memory size will differ):

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:09:00.0 Off |                    0 |
| N/A   36C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Verify that NVLink is disabled.

nvidia-smi -q | grep -A5 Fabric

Expected output:

    Fabric
        State : N/A
        Status : N/A
        CliqueId : N/A
        ClusterUUID : N/A
        Health

The State field must read N/A, not In Progress.