General info
- Our Linux machine is equipped with two NVIDIA RTX 6000 ‘Ada Lovelace’ GPUs. The compute capability of each of these GPUs is 8.9 (see here).
- CUDA is a closed-source, proprietary parallel computing platform and application programming interface (API) that allows certain software to use GPUs. We currently run CUDA v12.2, which is installed in /usr/local/cuda on our Linux machine. You can always check the version by running:
nvcc --version
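If you only need the release number (for example, to compare it against what a package expects), the sketch below pulls it out of the nvcc output. It assumes the usual "Cuda compilation tools, release X.Y, ..." line is present, and that nvcc lives in /usr/local/cuda/bin when it is not on your PATH. The cuda_release helper is a made-up name for this example, not part of the CUDA toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvcc --version` output on stdin and print
# only the release number from the "release X.Y," line.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Prefer nvcc from PATH; otherwise fall back to the install location
# mentioned above (assumed layout: /usr/local/cuda/bin/nvcc).
NVCC="$(command -v nvcc || echo /usr/local/cuda/bin/nvcc)"

if [ -x "$NVCC" ]; then
    echo "CUDA toolkit release: $("$NVCC" --version | cuda_release)"
else
    echo "nvcc not found on PATH or in /usr/local/cuda/bin"
fi
```

If the second message appears, adding /usr/local/cuda/bin to your PATH is usually the first thing to try.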
Monitoring GPU usage
nvidia-smi, which is installed with the NVIDIA driver, provides very basic output that is useful for checking the status and usage of the GPUs. Several other tools offer a better UI with the same or more information, including nvtop and nvitop.
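For logging or quick scripted checks, nvidia-smi can also print selected fields as CSV. The sketch below turns that CSV into one summary line per GPU. The summarize_gpus helper is a made-up name for this example, and the choice of fields is only a suggestion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format CSV rows of
#   index, name, utilization.gpu, memory.used, memory.total
# (as produced with --format=csv,noheader,nounits) into one line per GPU.
summarize_gpus() {
    awk -F', ' '{ printf "GPU %s (%s): %s%% util, %s/%s MiB\n", $1, $2, $3, $4, $5 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask nvidia-smi for just the fields we want, without header or units.
    nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
               --format=csv,noheader,nounits | summarize_gpus
else
    echo "nvidia-smi not found; is the NVIDIA driver installed?"
fi
```

To refresh the default view continuously instead, running nvidia-smi under watch (for example, watch -n 1 nvidia-smi) is a common alternative.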
Troubleshooting
- Occasionally, the commands above return an error complaining that no GPU is detected. This can happen after our Ubuntu OS is updated, because even a minor update can break the GPU drivers. To fix this issue:
- First, run
ubuntu-drivers devices
to produce output showing the GPUs currently recognized on the server. Here’s an example of what you should see on our server.
- Next, auto-update all drivers with
sudo ubuntu-drivers autoinstall
This often won’t solve the problem, but it’s still a good idea to do this step.
- Finally, install the specific driver you need, based on the output from the first step. In this case, it would be
sudo apt install nvidia-driver-525
- Restart the computer for your changes to take effect. Now you should be good to go!
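The driver-selection part of this procedure can be sketched as a small script. It assumes that ubuntu-drivers devices marks one line with the word "recommended", in the form "driver : nvidia-driver-XXX - ... recommended"; check the actual output on the server before relying on it. The recommended_driver helper is a made-up name for this example. The script only prints the install command and does not run anything with sudo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ubuntu-drivers devices` output on stdin and
# print the package name from the first "driver" line tagged "recommended".
# Assumed line format: "driver   : nvidia-driver-525 - distro non-free recommended"
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

if command -v ubuntu-drivers >/dev/null 2>&1; then
    pkg="$(ubuntu-drivers devices 2>/dev/null | recommended_driver)"
    if [ -n "$pkg" ]; then
        echo "Recommended driver package: $pkg"
        # Dry run: show the command instead of executing it.
        echo "To install it: sudo apt install $pkg (then restart the machine)"
    else
        echo "No recommended driver found; inspect 'ubuntu-drivers devices' by hand."
    fi
else
    echo "ubuntu-drivers not found on this machine."
fi
```

Printing the command instead of executing it keeps the final decision with whoever is at the keyboard, which matters because the recommended package may differ from the driver version the server is meant to stay on.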