Installing NVIDIA Drivers (SXM)
Introduction
NVIDIA SXM GPUs are built for high-performance computing and deep learning workloads. SXM is a socketed module form factor: instead of sitting in a PCIe slot, the GPU mounts directly on the server board. This allows a higher power limit than a standard PCIe card (the H100 SXM in the sample output below is capped at 700 W), so the GPU can sustain higher performance and make fuller use of its capabilities. At Ori, we offer two SXM GPU types: V100 and H100.
Installation Steps for NVIDIA CUDA 12.5.1 (NVIDIA Driver 555) on H100 SXM
Ubuntu 24.04
#!/bin/bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb &&
sudo dpkg -i cuda-keyring_1.1-1_all.deb &&
sudo apt-get update &&
sudo apt-get -y install cuda-toolkit-12-5 &&
sudo apt-get install -y nvidia-driver-555-open &&
sudo apt-get install -y cuda-drivers-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot
Ubuntu 22.04
#!/bin/bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb &&
sudo dpkg -i cuda-keyring_1.1-1_all.deb &&
sudo apt-get update &&
sudo apt-get -y install cuda-toolkit-12-5 &&
sudo apt-get install -y nvidia-driver-555-open &&
sudo apt-get install -y cuda-drivers-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot
Ubuntu 20.04
#!/bin/bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb &&
sudo dpkg -i cuda-keyring_1.1-1_all.deb &&
sudo apt-get update &&
sudo apt-get -y install cuda-toolkit-12-5 &&
sudo apt-get install -y nvidia-driver-555-open &&
sudo apt-get install -y cuda-drivers-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot
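The three scripts above differ only in the Ubuntu release segment of the repository URL (`ubuntu2404`, `ubuntu2204`, `ubuntu2004`). As a minimal sketch, assuming the VM's `/etc/os-release` reports one of those three releases, that segment can be derived once and the keyring URL built from it. The function names `cuda_repo_id`, `cuda_keyring_url`, and `host_ubuntu_version` are hypothetical helpers, not part of any NVIDIA tooling.

```shell
#!/bin/bash
# Hypothetical helpers: derive NVIDIA's repo segment (e.g. "ubuntu2204")
# from an Ubuntu VERSION_ID and build the cuda-keyring URL used above.

# Print the repo segment for a supported Ubuntu release; fail otherwise.
cuda_repo_id() {
  local version="$1"
  case "$version" in
    20.04|22.04|24.04) echo "ubuntu${version//./}" ;;   # "22.04" -> "ubuntu2204"
    *) echo "unsupported Ubuntu release: $version" >&2; return 1 ;;
  esac
}

# Print the cuda-keyring .deb URL for a supported Ubuntu release.
cuda_keyring_url() {
  local repo
  repo="$(cuda_repo_id "$1")" || return 1
  echo "https://developer.download.nvidia.com/compute/cuda/repos/${repo}/x86_64/cuda-keyring_1.1-1_all.deb"
}

# Source /etc/os-release in a subshell so its variables do not leak
# into the current shell, and print VERSION_ID.
host_ubuntu_version() {
  (. /etc/os-release && echo "$VERSION_ID")
}
```

On the VM, `wget "$(cuda_keyring_url "$(host_ubuntu_version)")"` would then replace the release-specific `wget` line; the remaining steps are identical across the three releases.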
Verification
After the reboot, reconnect to the VM and verify that the driver is installed and check its version:
nvidia-smi
Expected output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:09:00.0 Off |                    0 |
| N/A   36C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
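To check the driver version from a script rather than by reading the table, `nvidia-smi` can print only the `driver_version` field with `--query-gpu=driver_version --format=csv,noheader`. The sketch below compares the major version against 555; the helper names `driver_major_matches` and `check_installed_driver` are hypothetical, and the check assumes every GPU on the VM reports the same driver version, so it only reads the first line.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a driver version string such as
# "555.42.06" has the expected major version (e.g. "555").
driver_major_matches() {
  local version="$1" want_major="$2"
  # ${version%%.*} strips everything from the first "." onward.
  [[ -n "$version" && "${version%%.*}" == "$want_major" ]]
}

# Query the first GPU's driver version via nvidia-smi and compare it to 555.
check_installed_driver() {
  local version
  version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if driver_major_matches "$version" 555; then
    echo "OK: driver $version"
  else
    echo "unexpected driver version: ${version:-none}" >&2
    return 1
  fi
}
```

Run `check_installed_driver` after reconnecting; it prints an `OK` line and exits 0 when the major version is 555, and fails otherwise (including when `nvidia-smi` is missing and the version string is empty).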
Verify that NVLink is disabled:
nvidia-smi -q | grep -A5 Fabric
Expected output:
    Fabric
        State                             : N/A
        Status                            : N/A
        CliqueId                          : N/A
        ClusterUUID                       : N/A
        Health
The State value must be N/A, not In Progress.
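The Fabric check can also be scripted. A minimal sketch, assuming the `nvidia-smi -q` layout shown above (a `Fabric` heading followed by an indented `State : ...` line): `awk` prints the first `State` value after each `Fabric` heading, and the check fails if no value is found or any GPU reports something other than `N/A`. `fabric_states` and `fabric_disabled` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helpers for the Fabric "State" field printed by `nvidia-smi -q`.

# Read `nvidia-smi -q` output on stdin and print the Fabric State of each GPU.
fabric_states() {
  awk '
    /^[ \t]*Fabric[ \t]*$/ { in_fabric = 1; next }
    in_fabric && /^[ \t]*State[ \t]*:/ {
      sub(/^[^:]*:[ \t]*/, "")   # drop the "State ... :" prefix
      sub(/[ \t]+$/, "")         # drop trailing whitespace
      print
      in_fabric = 0
    }
  '
}

# Succeed only if at least one State was found and every State is "N/A".
fabric_disabled() {
  local states
  states="$(fabric_states)"
  [[ -n "$states" ]] && ! grep -qv '^N/A$' <<< "$states"
}
```

Usage: `nvidia-smi -q | fabric_disabled && echo "Fabric State is N/A on all GPUs"`. Checking every GPU matters on multi-GPU VMs, where a single GPU stuck in `In Progress` is easy to miss when reading the output by eye.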
Installation Steps for NVIDIA CUDA 12.5.1 (NVIDIA Driver 555) on V100 SXM
Ubuntu 24.04
#!/bin/bash
sudo apt update && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-ubuntu2404.pin &&
sudo mv cuda-ubuntu2404.pin /etc/apt/preferences.d/cuda-repository-pin-600 &&
wget https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers/cuda-repo-ubuntu2404-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo dpkg -i cuda-repo-ubuntu2404-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo cp /var/cuda-repo-ubuntu2404-12-5-local/cuda-*-keyring.gpg /usr/share/keyrings/ &&
sudo apt-get update && sudo apt-get -y install cuda-toolkit-12-5 &&
sudo add-apt-repository ppa:graphics-drivers/ppa --yes && sudo apt update && sudo apt install -y nvidia-driver-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot
Ubuntu 22.04
#!/bin/bash
sudo apt update && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin &&
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 &&
wget https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers/cuda-repo-ubuntu2204-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo dpkg -i cuda-repo-ubuntu2204-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo cp /var/cuda-repo-ubuntu2204-12-5-local/cuda-*-keyring.gpg /usr/share/keyrings/ &&
sudo apt-get update && sudo apt-get -y install cuda-toolkit-12-5 &&
sudo add-apt-repository ppa:graphics-drivers/ppa --yes && sudo apt update && sudo apt install -y nvidia-driver-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot
Ubuntu 20.04
#!/bin/bash
sudo apt update && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin &&
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600 &&
wget https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers/cuda-repo-ubuntu2004-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo dpkg -i cuda-repo-ubuntu2004-12-5-local_12.5.1-555.42.06-1_amd64.deb &&
sudo cp /var/cuda-repo-ubuntu2004-12-5-local/cuda-*-keyring.gpg /usr/share/keyrings/ &&
sudo apt-get update && sudo apt-get -y install cuda-toolkit-12-5 &&
sudo add-apt-repository ppa:graphics-drivers/ppa --yes && sudo apt update && sudo apt install -y nvidia-driver-555 &&
echo "blacklist nvidia_uvm" | sudo tee /etc/modprobe.d/nvlink-denylist.conf &&
echo "options nvidia NVreg_NvLinkDisable=1" | sudo tee /etc/modprobe.d/disable-nvlink.conf &&
sudo update-initramfs -u && sudo reboot
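On either GPU type, it can help to confirm that the two modprobe files written by the `echo ... | sudo tee` lines near the end of each install script contain exactly the expected directives, for example before rebooting or when NVLink still appears active afterwards. A minimal sketch; `check_nvlink_conf` is a hypothetical helper, and its optional directory argument exists only so the check can be pointed at a copy of the files.

```shell
#!/bin/bash
# Hypothetical helper: confirm the modprobe files created by the install
# scripts contain the expected lines. Defaults to /etc/modprobe.d.
check_nvlink_conf() {
  local dir="${1:-/etc/modprobe.d}"
  local status=0
  # -x: match the whole line, -F: treat the pattern as a fixed string.
  if ! grep -qxF 'blacklist nvidia_uvm' "$dir/nvlink-denylist.conf" 2>/dev/null; then
    echo "missing 'blacklist nvidia_uvm' in $dir/nvlink-denylist.conf" >&2
    status=1
  fi
  if ! grep -qxF 'options nvidia NVreg_NvLinkDisable=1' "$dir/disable-nvlink.conf" 2>/dev/null; then
    echo "missing 'options nvidia NVreg_NvLinkDisable=1' in $dir/disable-nvlink.conf" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_nvlink_conf` with no argument checks the real files; it reports every missing directive rather than stopping at the first, so one run shows everything that needs fixing before `update-initramfs -u` is rerun.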
Verification
After the reboot, reconnect to the VM and verify that the driver is installed and check its version:
nvidia-smi
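Expected output (this sample was captured on an H100; on a V100 SXM the GPU name, power limit, and memory size differ, but the driver and CUDA versions should match):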
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:09:00.0 Off |                    0 |
| N/A   36C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
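The sample output above was captured on an H100, so on a V100 VM it is more reliable to check the GPU name and driver version directly than to compare the table line by line. `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` prints one `name, driver_version` line per GPU; the sketch below (hypothetical helper `all_gpus_match`) fails if any line lacks the expected model substring or driver major version. Matching on the substring `V100` is an assumption about how the model name is reported.

```shell
#!/bin/bash
# Hypothetical helper: read "name, driver_version" CSV lines on stdin and
# succeed only if every GPU's name contains MODEL and its driver major
# version equals MAJOR. Fails if no lines are read.
all_gpus_match() {
  local model="$1" major="$2" name version seen=0
  while IFS=',' read -r name version; do
    version="${version# }"   # csv output separates fields with ", "
    seen=1
    if [[ "$name" != *"$model"* || "${version%%.*}" != "$major" ]]; then
      echo "mismatch: $name, $version" >&2
      return 1
    fi
  done
  (( seen ))
}
```

Usage: `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | all_gpus_match V100 555`. Requiring at least one line means an empty result (for example, no GPUs visible to the driver) counts as a failure rather than a silent pass.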
Verify that NVLink is disabled:
nvidia-smi -q | grep -A5 Fabric
Expected output:
    Fabric
        State                             : N/A
        Status                            : N/A
        CliqueId                          : N/A
        ClusterUUID                       : N/A
        Health
The State value must be N/A, not In Progress.