
You have a Proxmox server with a GPU and want to enable hardware acceleration for several different services, such as Ollama, Plex, Frigate etc. One option is to create a VM on Proxmox and do a GPU passthrough to that VM. Your services then have access to the GPU inside that VM.

What if you want to keep your services isolated and don’t want them in the same VM? Then GPU passthrough is off the table, since the GPU can only be passed to a single VM. In this case, there is a second option that lets you share your GPU with multiple services that don’t run in the same place. The catch is that you cannot use virtual machines (VMs) for this: the services have to run in LXC containers instead. That is not necessarily a bad trade-off, since it is very rare for a standalone service to be unable to run in an LXC container.

Pros:

  • Share GPU with multiple LXC containers.
  • Isolated services with hardware acceleration.
  • LXC => Easier maintenance, deployment, backup and restore.

Cons:

  • Does not work with VMs.
  • Does not work if GPU-passthrough is done with another VM.

Shout out: These instructions are largely based on Yomi’s excellent blog post here. Feel free to follow that one instead, or fail along with the instructions below.

My Proxmox host runs Debian 11 (Bullseye), and the LXC container runs Ubuntu 22.04 (Jammy Jellyfish).

Install Nvidia driver on the Proxmox host

Important: Make sure the GPU is not passed through to any existing VM on the Proxmox hypervisor.

Download driver

We will not use the apt repository to install the driver; instead, we download the .run file from Nvidia’s servers. This is crucial because we want to install the exact same driver version on both the Proxmox host and the LXC container.

Find a recent driver in the Nvidia archive:
https://download.nvidia.com/XFree86/Linux-x86_64/

Note: I used version 550.127.05. Check your application’s requirements to choose an appropriate version.

Tip: Use Nvidia’s driver search tool if the link above does not work.
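
For example, you can fetch the .run file directly onto the Proxmox host with wget. This is a minimal sketch that assumes the archive keeps its usual <version>/NVIDIA-Linux-x86_64-<version>.run layout; swap in the version you picked:

$ cd /root
$ wget https://download.nvidia.com/XFree86/Linux-x86_64/550.127.05/NVIDIA-Linux-x86_64-550.127.05.run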

Install driver

Run the .run file with the --dkms flag, which registers the kernel modules with DKMS so they can be rebuilt for new kernels. Do not install display-related (xorg etc.) modules during installation.

$ sh NVIDIA-Linux-x86_64-550.127.05.run --dkms
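
If the installer complains that it cannot build the kernel module, the host is probably missing DKMS, a compiler, or the kernel headers. A hedged sketch of installing them, assuming the standard Proxmox VE package names:

$ apt update
$ apt install -y dkms build-essential pve-headers-$(uname -r)   # package names assume Proxmox VE; adjust to your kernel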

Reboot the Proxmox host. When you log back in, run nvidia-smi to check that the driver installed correctly:

$ nvidia-smi
Sat Oct 26 23:19:53 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05   Driver Version: 550.127.05   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   47C    P8    13W / 180W |      1MiB /  8117MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Remember: You may need to repeat this process (possibly with a more recent driver version) if you upgrade the Proxmox kernel (e.g., when upgrading Proxmox from Debian 11 to Debian 12).
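
Because the driver was installed with --dkms, DKMS should rebuild the module automatically for a new kernel as long as matching headers are installed. You can check what is registered (the nvidia module should be listed against your running kernel) with:

$ dkms status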

Share GPU with LXC Container

Next, we will share the kernel modules on the Proxmox host with an LXC container.

You may use an existing LXC container, or create a new one if you haven’t already.

LXC Configuration on the Proxmox host

Still on the Proxmox host, list the Nvidia devices:

$ ls -al /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Oct 25 22:57 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Oct 25 22:57 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Oct 25 22:57 /dev/nvidia-modeset
crw-rw-rw- 1 root root 234,   0 Oct 25 22:57 /dev/nvidia-uvm
crw-rw-rw- 1 root root 234,   1 Oct 25 22:57 /dev/nvidia-uvm-tools

Note down the major device numbers in the column next to the group column. They are 195 and 234 in my case; yours could be different.

My LXC container ID is 105. I opened its configuration file with nano; you may of course use another text editor, such as vi, vim, nvim etc.

$ nano /etc/pve/lxc/105.conf

Append the following to the .conf file. Remember to use the numbers you found earlier (195 and 234).

lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 234:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
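
Restart the container to apply the new configuration and check that the Nvidia device nodes appear inside it. A quick sketch from the host, using Proxmox’s pct tool and my container ID 105 (yours will differ):

$ pct stop 105 && pct start 105
$ pct exec 105 -- sh -c 'ls -al /dev/nvidia*'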

The configuration on the Proxmox host is now complete. The rest of the instructions are done inside the LXC container.

Install driver on LXC container

Start the LXC container and switch to its console. Download the exact same NVIDIA-Linux-x86_64-550.127.05.run driver into the LXC container and run it with the --no-kernel-module flag to skip the installation of kernel modules; the container will use the modules already loaded on the Proxmox host.
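
If you kept the .run file on the Proxmox host, one way to copy the exact same file into the container is pct push, run from the host (a sketch, again assuming container ID 105); alternatively, just download it again from inside the container:

$ pct push 105 NVIDIA-Linux-x86_64-550.127.05.run /root/NVIDIA-Linux-x86_64-550.127.05.run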

Do not install display-related (xorg etc.) modules during installation.

$ sh NVIDIA-Linux-x86_64-550.127.05.run --no-kernel-module

Reboot the LXC container and run nvidia-smi after logging in. You should see the same output as on the Proxmox host:

$ nvidia-smi
Sun Oct 27 04:16:31 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05   Driver Version: 550.127.05   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   48C    P8    13W / 180W |      1MiB /  8117MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

You can now run your service in the LXC container with GPU hardware acceleration!
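
As a quick sanity check from inside the container, you can query exactly what your services will see (a minimal sketch using nvidia-smi’s CSV query mode):

$ nvidia-smi --query-gpu=name,driver_version --format=csv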
