Installing NVIDIA Drivers (SXM)
Introduction
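NVIDIA SXM GPUs use a socketed mezzanine form factor instead of a PCIe add-in card and are designed for high-performance computing and deep learning workloads. Compared with PCIe cards, SXM modules run at higher power limits (700 W per H200 in the output below) and are interconnected over NVLink, typically through NVSwitch on HGX boards, which gives much higher GPU-to-GPU bandwidth than PCIe. NVSwitch-based systems also require the NVIDIA Fabric Manager service, which is installed alongside the driver below.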
Installation Steps for NVIDIA CUDA 12.8.1 (NVIDIA Driver 570) on H200 SXM
Ubuntu 22.04
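The repository pin, local installer, and package names in these steps all target Ubuntu 22.04 (`ubuntu2204`). As an optional preflight, the sketch below reads an `os-release` file without sourcing it and succeeds only for `ID=ubuntu` with `VERSION_ID="22.04"`. The function name `is_ubuntu_2204` is hypothetical; it is plain shell, not part of NVIDIA's tooling.

```shell
#!/usr/bin/env sh
# is_ubuntu_2204 (hypothetical helper): succeeds only if the os-release file
# (default /etc/os-release) reports ID=ubuntu and VERSION_ID=22.04.
is_ubuntu_2204() {
  local file="${1:-/etc/os-release}" id="" version="" key="" value=""
  [ -r "$file" ] || return 1
  # Parse KEY=VALUE lines instead of sourcing the file; strip optional quotes.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    value="${value%\"}"
    value="${value#\"}"
    case "$key" in
      ID) id="$value" ;;
      VERSION_ID) version="$value" ;;
    esac
  done < "$file"
  [ "$id" = "ubuntu" ] && [ "$version" = "22.04" ]
}
```

For example, run `is_ubuntu_2204 || echo "not Ubuntu 22.04"` first; on other releases the `ubuntu2204` repository and installer names would need to change.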
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.8.1/local_installers/cuda-repo-ubuntu2204-12-8-local_12.8.1-570.124.06-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-8-local_12.8.1-570.124.06-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
sudo apt-get install -y nvidia-driver-570-open
sudo apt-get install -y nvidia-fabricmanager-570
sudo systemctl enable --now nvidia-fabricmanager
sudo reboot
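The `cuda-toolkit-12-8` package installs under `/usr/local/cuda-12.8`, and that directory's `bin` folder is not on `PATH` by default, so `nvcc` may not be found after the reboot. A minimal sketch, assuming a POSIX shell: prepend the toolkit to `PATH` (the same idiom NVIDIA's post-installation guidance uses) and read the release number back from `nvcc --version`. The helper name `nvcc_release` is hypothetical.

```shell
#!/usr/bin/env sh
# Make the CUDA 12.8 compiler visible in this shell; add the same line to
# ~/.bashrc (or your shell's profile) to make it persistent.
export PATH="/usr/local/cuda-12.8/bin${PATH:+:${PATH}}"

# nvcc_release (hypothetical helper): reads `nvcc --version` output on stdin
# and prints the "release X.Y" value from the "Cuda compilation tools" line.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}
```

On the node, `nvcc --version | nvcc_release` should print `12.8` for this toolkit.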
Verification
After the reboot, reconnect to the SC cluster and verify that the driver is loaded and reports the expected version:
nvidia-smi
Expected output (abbreviated; temperatures, power draw, and bus IDs will vary by system):
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    Off |   00000000:18:00.0 Off |                    0 |
| N/A   31C    P0             77W /  700W |       1MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200                    Off |   00000000:2A:00.0 Off |                    0 |
| N/A   31C    P0             78W /  700W |       1MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
...
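The table above is truncated after GPU 1, so a GPU that came up on a different driver is easy to miss, and `nvidia-smi` does not show whether Fabric Manager matches the driver. A minimal sketch of two hypothetical shell helpers: `all_gpus_on_version` checks the per-GPU lines printed by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, and `fabricmanager_matches_driver` compares a Debian package version string (for example from `dpkg-query`) with the driver version, since NVIDIA's Fabric Manager documentation requires the two versions to match. Both names are assumptions for illustration, not NVIDIA tools.

```shell
#!/usr/bin/env sh
# all_gpus_on_version EXPECTED (hypothetical helper)
# stdin: one driver version per GPU, as printed by
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Succeeds only if at least one GPU is listed and every line equals EXPECTED.
all_gpus_on_version() {
  local expected="$1" line="" count=0
  while IFS= read -r line || [ -n "$line" ]; do
    line=$(printf '%s' "$line" | tr -d '[:space:]')  # drop stray spaces/CRs
    [ -z "$line" ] && continue
    count=$((count + 1))
    if [ "$line" != "$expected" ]; then
      echo "GPU line $count reports driver $line, expected $expected" >&2
      return 1
    fi
  done
  [ "$count" -gt 0 ]
}

# fabricmanager_matches_driver DRIVER_VERSION PACKAGE_VERSION (hypothetical)
# PACKAGE_VERSION is a Debian version string such as "570.124.06-1"; an
# epoch ("1:") and the Debian revision ("-1") are stripped before comparing.
fabricmanager_matches_driver() {
  local driver="$1" pkg="$2"
  pkg="${pkg#*:}"   # drop an epoch if present
  pkg="${pkg%-*}"   # drop the Debian revision
  [ "$pkg" = "$driver" ]
}
```

On the node, for example:
nvidia-smi --query-gpu=driver_version --format=csv,noheader | all_gpus_on_version 570.124.06
fabricmanager_matches_driver 570.124.06 "$(dpkg-query -W -f='${Version}' nvidia-fabricmanager-570)"
systemctl is-active nvidia-fabricmanager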