Install CUDA 12.4 On Ubuntu 22.04 With Docker GPU Setup

by Alex Johnson

This comprehensive guide walks you through the process of installing CUDA 12.4 on Ubuntu 22.04 and configuring Docker to utilize your host GPU. This setup is crucial for machine learning, deep learning, and other GPU-accelerated applications. We will cover everything from driver installation to Docker configuration, ensuring a smooth and efficient workflow. By following this guide, you'll be able to harness the power of your GPU within Docker containers, optimizing performance and resource utilization.

I. System Drivers

Before diving into the installation process, it's essential to ensure that your system recognizes the NVIDIA GPU and that the appropriate drivers are installed. This foundational step ensures that CUDA can communicate with your hardware effectively.

  1. Verifying NVIDIA Graphics Card Recognition:

Start by checking if your system has detected the NVIDIA graphics card. Open your terminal and run the following command:

lspci | grep -i nvidia

This command lists all PCI devices and filters the output to show only those related to NVIDIA. If your NVIDIA card is correctly detected, you'll see an output displaying information about your GPU, such as its model name. If nothing is displayed, it indicates that the system hasn't recognized the GPU, and you may need to check your hardware connections or BIOS settings.

  2. Checking Existing Driver Installation:

Next, determine if you already have NVIDIA drivers installed. The nvidia-smi (NVIDIA System Management Interface) command provides detailed information about your GPU and the installed drivers. Run the following command in your terminal:

nvidia-smi

If the drivers are installed, the output will display information like the driver version, CUDA version, and GPU utilization. If you encounter an error such as "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver," it suggests that the drivers are either not installed or not functioning correctly. This is a critical step, as CUDA requires a compatible driver version to operate correctly.

A successful output would look similar to this:

Mon Nov  3 14:46:32 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76                 Driver Version: 550.76         CUDA Version: 12.4     |
...

  3. Adding the Official Drivers PPA (Personal Package Archive):

To ensure you have access to the latest drivers, add the official NVIDIA drivers PPA to your system's software sources. This PPA is maintained by the graphics drivers team and provides updated drivers for NVIDIA GPUs. Use the following command:

sudo add-apt-repository ppa:graphics-drivers/ppa

This command adds the PPA to your system's list of software sources. You'll be prompted to press Enter to continue. Once added, your system will be able to find and install the latest NVIDIA drivers from this source.

  4. Listing Available Drivers:

To see a list of available drivers for your GPU, use the ubuntu-drivers devices command. This command identifies your GPU and suggests suitable drivers from the added PPA. Run the following command:

sudo apt install ubuntu-drivers-common
ubuntu-drivers devices

The output will show a list of recommended and available drivers. It's crucial to choose a driver that is compatible with CUDA 12.4. The output will display various driver versions, along with their descriptions and recommendations. Pay attention to the "recommended" driver, as it is typically the most stable and suitable option for your hardware.

A sample output might look like this:

== /sys/devices/pci0000:00/0000:00:03.0 ==
modalias : pci:v000010DEd00002684sv000010DEsd000016F3bc03sc00i00
vendor   : NVIDIA Corporation
manual_install: True
driver   : nvidia-driver-580 - distro non-free recommended
driver   : nvidia-driver-570-server - distro non-free
driver   : nvidia-driver-535 - distro non-free
...

  5. Installing the Recommended Driver:

Install the recommended driver version using apt, replacing nvidia-driver-570 below with the version you identified in the previous step (nvidia-driver-580 in the sample output above). Keep in mind that CUDA 12.4 requires a driver of at least version 550. Execute the following command:

sudo apt install nvidia-driver-570

This command downloads and installs the specified NVIDIA driver along with its dependencies. During the installation, you may be prompted to configure Secure Boot; follow the on-screen instructions to complete the process.

After the installation, reboot your system so the new driver is loaded:

sudo reboot

By completing these steps, you ensure that your system is properly equipped with the necessary NVIDIA drivers, setting the stage for a successful CUDA installation.
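Once the machine is back up, a quick check confirms the driver loaded correctly before moving on:

```shell
# Confirm the NVIDIA kernel module is loaded and the driver responds.
lsmod | grep nvidia
nvidia-smi --query-gpu=name,driver_version --format=csv
```

If nvidia-smi now reports your GPU model and a driver version of 550 or higher, the system is ready for the CUDA Toolkit.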

II. CUDA Installation

With the drivers in place, the next step is to install the CUDA Toolkit. CUDA is a parallel computing platform and programming model developed by NVIDIA, enabling significant increases in computing performance by harnessing the power of GPUs. This section guides you through downloading and installing the CUDA Toolkit 12.4, which is essential for developing and running GPU-accelerated applications.

  1. Downloading the CUDA Toolkit:

First, you need to download the CUDA Toolkit installer from NVIDIA's website. Visit the NVIDIA CUDA Toolkit download page: NVIDIA CUDA Toolkit Download Archive. Select the appropriate target operating system, architecture, distribution, and version (Ubuntu 22.04 in this case). This will generate the commands you need to download and install CUDA. It’s critical to choose the correct options to ensure compatibility with your system.

After selecting your system specifications, the website will provide you with a series of commands. These commands include downloading the CUDA installer and running it with the necessary options. Copy these commands, as you’ll need them in the next step.

  2. Executing the Installation Commands:

Open your terminal and navigate to the directory where you want to download the CUDA installer, such as ~/Downloads or your home directory. Paste the commands you copied from the NVIDIA website into the terminal and execute them. The commands will typically look like this:

wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
sudo sh cuda_12.4.0_550.54.14_linux.run

The wget command downloads the CUDA installer, and the sh command executes the installer. The installation process will begin, and you'll be presented with a series of prompts and options.

  3. Following the Installation Prompts:

During the installation, you'll be prompted to accept the CUDA EULA (End User License Agreement). Read through the agreement and accept it to proceed. The installer will then ask you to choose the installation options. It’s crucial to pay attention to these options to avoid conflicts with existing drivers or installations.

The installer will present several options, including installing the NVIDIA drivers, the CUDA Toolkit, and the CUDA Samples. If you have already installed the NVIDIA drivers in the previous section, it is important to deselect the driver installation option in the CUDA installer. This prevents potential conflicts between different driver versions. Select the options for the CUDA Toolkit and CUDA Samples.

The installer will also ask for the installation directory. The default directory is /usr/local/cuda-12.4/, which is generally recommended. Once you’ve made your selections, the installer will proceed with the installation.
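If you prefer to script this step, the runfile installer also supports a non-interactive mode. This is a sketch rather than a required step; the --silent and --toolkit flags are part of the runfile's documented options:

```shell
# Unattended install: accept the EULA implicitly (--silent) and install
# only the CUDA Toolkit, leaving the already-installed driver untouched.
sudo sh cuda_12.4.0_550.54.14_linux.run --silent --toolkit
```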

  4. Verifying Successful Installation:

After the installation completes, you should see a message confirming the successful installation of the CUDA Toolkit. The message will also provide important information about setting up environment variables. The success message will typically look like this:

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-12.4/

Please make sure that
 -   PATH includes /usr/local/cuda-12.4/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.4/lib64, or, add /usr/local/cuda-12.4/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.4/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 550.00 is required for CUDA 12.4 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log

Pay close attention to any warnings or errors during the installation process. If you encounter issues, consult the CUDA installation guide or NVIDIA’s documentation for troubleshooting steps. The warning message in the example indicates that the CUDA driver was not installed, which is expected if you chose to skip driver installation. However, it also emphasizes the importance of having a compatible driver version (at least 550.00) for CUDA 12.4 to function correctly.

  5. Setting Up Environment Variables:

To use CUDA, you need to add the CUDA installation directory to your system’s PATH and LD_LIBRARY_PATH environment variables. This allows the system to find the CUDA binaries and libraries. Open your shell configuration file (e.g., ~/.zshrc for Zsh or ~/.bashrc for Bash) using a text editor:

vi ~/.zshrc

Add the following lines to the end of the file:

export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH

These lines append the CUDA bin and lib64 directories to the PATH and LD_LIBRARY_PATH variables, respectively. Save the file and exit the text editor. Then, apply the changes to your current session by running:

source ~/.zshrc

This command reloads the shell configuration file, making the new environment variables available in your current terminal session. You can verify that the environment variables are set correctly by echoing them in the terminal:

echo $PATH
echo $LD_LIBRARY_PATH
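Eyeballing long path strings is error-prone, so the check can also be made explicit with a small shell test. This sketch assumes the default installation path used throughout this guide; the exports are repeated here only to make the snippet self-contained:

```shell
# In practice these exports come from your shell configuration file.
export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:${LD_LIBRARY_PATH:-}

# Report whether each variable contains the expected CUDA directory.
case ":$PATH:" in
  *":/usr/local/cuda-12.4/bin:"*) echo "PATH includes CUDA bin" ;;
  *) echo "PATH is missing CUDA bin" ;;
esac
case ":$LD_LIBRARY_PATH:" in
  *":/usr/local/cuda-12.4/lib64:"*) echo "LD_LIBRARY_PATH includes CUDA lib64" ;;
  *) echo "LD_LIBRARY_PATH is missing CUDA lib64" ;;
esac
```

Both checks should report that the CUDA directories are present.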

  6. Verifying CUDA Version:

Finally, verify that CUDA is installed correctly by checking the CUDA version. Run the nvcc command, which is the NVIDIA CUDA Compiler driver:

nvcc -V

If CUDA is installed correctly, the output will display the CUDA compiler version and build information. A successful output will look similar to this:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

By following these steps, you ensure that the CUDA Toolkit is installed correctly and that your system is configured to use it. This is a crucial step for developing and running GPU-accelerated applications.
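As a final sanity check, you can compile and run a minimal CUDA program. This sketch (the file name check_cuda.cu is arbitrary) simply asks the runtime how many GPUs it can see, which exercises nvcc, the CUDA libraries, and the driver together:

```shell
# Write a minimal CUDA program that queries the device count.
cat > check_cuda.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaError_t err = cudaGetDeviceCount(&n);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Found %d CUDA device(s)\n", n);
    return 0;
}
EOF
# Compile with nvcc (on PATH from the previous step) and run it.
nvcc check_cuda.cu -o check_cuda && ./check_cuda
```

A working installation prints the number of detected GPUs; a CUDA error here usually points back to a driver or environment-variable problem.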

III. cuDNN Installation

cuDNN (CUDA Deep Neural Network library) is a GPU-accelerated library of primitives for deep learning. It provides highly tuned implementations of standard routines such as convolution, pooling, normalization, and activation layers. Installing cuDNN can significantly improve the performance of deep learning applications that run on CUDA-enabled GPUs. This section guides you through the process of downloading and installing cuDNN for CUDA 12.4.

  1. Downloading cuDNN:

To download cuDNN, visit the NVIDIA cuDNN download page: NVIDIA cuDNN Download. You will need an NVIDIA Developer account and must be logged in to access the downloads. Select the cuDNN version that corresponds to your CUDA version (in this case, CUDA 12.4) and download the cuDNN package for Linux.

  2. Extracting cuDNN:

Once the download is complete, navigate to the directory where you downloaded the cuDNN package. The package will be in the tar.xz format. Extract the contents of the package using the tar command:

tar -xvf ./cudnn-linux-x86_64-9.10.2.21_cuda12-archive.tar.xz

This command extracts the cuDNN files into a directory named cudnn-linux-x86_64-9.10.2.21_cuda12-archive.

  3. Copying cuDNN Files to CUDA Directory:

Next, you need to copy the cuDNN files to the corresponding CUDA directories. The cuDNN package contains include files (headers) and library files. These files need to be placed in the CUDA include and lib64 directories, respectively. Use the following commands to copy the files:

sudo cp cudnn-linux-x86_64-9.10.2.21_cuda12-archive/include/cudnn*.h /usr/local/cuda-12.4/include
sudo cp -P cudnn-linux-x86_64-9.10.2.21_cuda12-archive/lib/libcudnn* /usr/local/cuda-12.4/lib64

The first command copies all header files (cudnn*.h) from the cuDNN include directory to the CUDA include directory (/usr/local/cuda-12.4/include). The second command copies the cuDNN library files (libcudnn*) from the cuDNN lib directory to the CUDA lib64 directory (/usr/local/cuda-12.4/lib64). The -P option preserves symbolic links, which is important for maintaining the correct library dependencies.

  4. Setting Permissions:

After copying the files, you need to set the appropriate permissions to ensure that the cuDNN libraries are accessible. Use the following command to set read permissions for all users:

sudo chmod a+r /usr/local/cuda-12.4/include/cudnn*.h /usr/local/cuda-12.4/lib64/libcudnn*

This command grants read permissions to all users for the cuDNN header files and library files.

  5. Verifying cuDNN Installation:

To verify that cuDNN is installed correctly, you can check the cuDNN version by examining the cudnn_version.h header file. Use the following command:

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

This command filters the contents of the cudnn_version.h file to display the cuDNN major, minor, and patchlevel versions. A successful output will look similar to this:

#define CUDNN_MAJOR 9
#define CUDNN_MINOR 10
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 10000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

/* cannot use constexpr here since this is a C-only file */

The output shows the cuDNN version, which confirms that cuDNN is installed and accessible. You can also verify the installation by running deep learning applications that utilize cuDNN. If the applications run without errors and show performance improvements, it indicates that cuDNN is working correctly.
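If you work with a deep learning framework, you can also confirm that it picks cuDNN up at runtime. This sketch assumes PyTorch is installed, which this guide does not cover; any cuDNN-backed framework offers a similar check:

```shell
# Prints True and the cuDNN version number if PyTorch can load the library.
# Assumes PyTorch is installed (pip install torch).
python3 -c "import torch; print(torch.backends.cudnn.is_available(), torch.backends.cudnn.version())"
```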

By completing these steps, you successfully install cuDNN, which enhances the performance of your deep learning applications. cuDNN provides optimized implementations of deep learning primitives, enabling faster training and inference times.

IV. Docker and Docker Compose Installation

Docker is a platform for developing, shipping, and running applications in containers. Containers allow you to package an application with all of its dependencies, ensuring that it runs consistently across different environments. Docker Compose is a tool for defining and running multi-container Docker applications. Together, Docker and Docker Compose provide a powerful way to manage and deploy GPU-accelerated applications.

  1. Installing Docker:

To install Docker on Ubuntu, follow these steps:

  • Update the package index:

    sudo apt update
    

    This command updates the list of available packages and their versions.

  • Install Docker:

    sudo apt install docker.io
    

    This command installs the Docker package along with its dependencies. You may be prompted to confirm the installation by typing Y and pressing Enter.
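Two optional follow-up commands are worth considering at this point; both are standard Docker post-install steps rather than something this guide strictly requires:

```shell
# Start Docker now and on every boot.
sudo systemctl enable --now docker
# Allow your user to run docker without sudo
# (log out and back in for the group change to take effect).
sudo usermod -aG docker $USER
```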

  2. Installing Docker Compose:

Docker Compose is a tool for defining and running multi-container Docker applications. To install Docker Compose, use the following command:

sudo apt install docker-compose

This command installs Docker Compose from the Ubuntu repositories. Note that the packaged version may lag behind the latest release; for the current Compose v2 plugin (invoked as docker compose rather than docker-compose), you can instead follow Docker's official installation instructions.

  3. Verifying Docker Installation:

To verify that Docker is installed correctly, run the following command:

docker --version

This command displays the Docker version, confirming that Docker is installed. A successful output will look similar to this:

Docker version 25.0.0, build xxxxx

  4. Verifying Docker Compose Installation:

Similarly, to verify that Docker Compose is installed correctly, run the following command:

docker-compose --version

This command displays the Docker Compose version, confirming that Docker Compose is installed. A successful output will look similar to this:

docker-compose version v2.23.3

By installing Docker and Docker Compose, you set the foundation for containerizing and deploying your GPU-accelerated applications. Docker provides a consistent and reproducible environment, while Docker Compose simplifies the management of multi-container applications.
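To see how the two fit together for GPU workloads, here is a sketch that writes a hypothetical docker-compose.yml (the service name gpu-test is a placeholder) granting a service access to all host GPUs. It uses Compose's GPU-reservation syntax and requires the NVIDIA Container Toolkit on the host:

```shell
# Write a hypothetical compose file that reserves all host GPUs
# for one service. Requires the NVIDIA Container Toolkit.
cat > docker-compose.yml <<'EOF'
services:
  gpu-test:
    image: nvidia/cuda:12.4.0-base-ubuntu22.04
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
# Then bring the service up with: docker-compose up
```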

V. Troubleshooting Common Issues

During the installation and configuration process, you may encounter some common issues. This section provides solutions to these problems, ensuring a smooth experience.

A. Permission Denied When Uploading Files via Xftp

If you encounter a "permission denied" error when uploading files to your Ubuntu system using Xftp, there are a couple of workarounds:

  1. Upload to /home/ubuntu and Move:

Upload the files to the /home/ubuntu directory, which typically has write permissions for the ubuntu user. Then, use the mv command to move the files to the desired location:

sudo mv /home/ubuntu/yourfile /destination/directory

This method allows you to bypass permission issues by uploading to a directory you have access to and then moving the files to the final destination with elevated privileges.

  2. Modify Directory Permissions:

Change the permissions of the target directory to grant write access to your user. Use the chmod command to modify the directory permissions:

sudo chmod -R 777 /destination/directory

This command grants read, write, and execute permissions to all users for the specified directory and its contents. However, it’s crucial to use this method cautiously, as it can pose security risks. A more secure approach is to grant ownership of the directory to your user:

sudo chown -R $USER:$USER /destination/directory

This command changes the owner and group of the directory to your current user, allowing you to write files without needing elevated privileges.

B. ImportError: libGL.so.1: cannot open shared object file in Docker

This error indicates that a shared library, libGL.so.1, which is part of the OpenGL library, is missing within the Docker container. This issue typically arises when running graphical applications or applications that depend on OpenGL, such as those using OpenCV.

To resolve this issue, you need to install the libgl1-mesa-glx package (or libgl1 as a fallback) in your Dockerfile. Add the following lines to your Dockerfile before installing your Python dependencies:

RUN apt-get update && \
    apt-get install -y libgl1-mesa-glx --no-install-recommends || \
    (echo "libgl1-mesa-glx not found, trying libgl1..." && apt-get install -y libgl1 --no-install-recommends) && \
    rm -rf /var/lib/apt/lists/*

This Dockerfile snippet first updates the package index and then attempts to install libgl1-mesa-glx. If that package is not found, it tries to install libgl1. The --no-install-recommends flag reduces the size of the installed packages by avoiding the installation of recommended dependencies. After the installation, the command removes the package lists to further reduce the image size.

C. could not select device driver "" with capabilities: [[gpu]]

This error occurs when you request GPU access from Docker (for example, with docker run --gpus all) but Docker cannot find a device driver capable of providing it. It almost always means that the NVIDIA Container Toolkit, the component that connects Docker to the host's NVIDIA driver, is missing or not registered with Docker.
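The usual fix is to install the NVIDIA Container Toolkit and register its runtime with Docker. The following sketch is based on NVIDIA's documented apt installation steps for the toolkit; verify the commands against NVIDIA's current Container Toolkit documentation before running them:

```shell
# Add NVIDIA's signing key and apt repository for the container toolkit.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit, register the NVIDIA runtime with Docker, and restart Docker.
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Smoke test: this should print the same table as running nvidia-smi on the host.
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```

If the smoke test prints your GPU's details, Docker containers can now use the host GPU, which completes the setup this guide set out to achieve.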