Issue Overview
After upgrading from an NVIDIA Quadro M2000 to an RTX 3060 on my TrueNAS Scale system, I ran into an issue where the Plex app failed to start whenever I selected the GPU for hardware acceleration. Everything else was working, so given what I had just changed I assumed it had to do with the GPU upgrade.
The error itself was pretty generic, at least to me:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 509, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 556, in __run_body
    rv = await self.middleware.run_in_thread(self.method, *args)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1367, in run_in_thread
    return await self.run_in_executor(io_thread_pool_executor, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1364, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/service/crud_service.py", line 268, in nf
    rv = func(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 55, in nf
    res = f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 183, in nf
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/apps/crud.py", line 287, in do_update
    app = self.update_internal(job, app, data, trigger_compose=app['state'] != 'STOPPED')
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/apps/crud.py", line 325, in update_internal
    compose_action(app_name, app['version'], 'up', force_recreate=True, remove_orphans=True)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/apps/compose_utils.py", line 57, in compose_action
    raise CallError(
middlewared.service_exception.CallError: [EFAULT] Failed 'up' action for 'plex' app, please check /var/log/app_lifecycle.log for more details
Steps Taken to Diagnose and Resolve
1. Verified GPU Detection
First I wanted to double-check that the card was being detected properly. Before this step I had tried removing the GPU from the Plex app's settings, which allowed Plex to start (further reinforcing my guess that the GPU was at fault), and then enabling it again, which brought the issue straight back. I also tried disabling the GPU, restarting, and re-enabling it, but got the same result.
• I ran nvidia-smi -L to confirm that TrueNAS recognized the RTX 3060.

• I ran nvidia-smi to look for errors. The card was listed and seemed to be working correctly.

• Checked the installed NVIDIA driver version (550.127.05) and CUDA version (12.4).
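If you would rather script the detection check than eyeball it, here is a minimal Python sketch that pulls the index, name, and UUID out of nvidia-smi -L output. It assumes the usual one-line-per-GPU format, "GPU 0: <name> (UUID: GPU-...)"; the function names are my own, not part of any NVIDIA or TrueNAS tooling.

```python
import re
import subprocess

# Assumed line format from `nvidia-smi -L`:
#   GPU 0: NVIDIA GeForce RTX 3060 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)\s*$")


def parse_gpu_list(text):
    """Return a list of (index, name, uuid) tuples parsed from `nvidia-smi -L` output."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def installed_gpus():
    """Run `nvidia-smi -L` on this machine and parse whatever it prints."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_gpu_list(result.stdout)
```

Calling installed_gpus() on the NAS should give you the UUID you need for the comparison later on. parse_gpu_list() is kept separate so it can be tried on pasted output from any machine.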
Checking the OTHER logs
There is another place to look: /var/log/app_lifecycle.log, the file the error message points to. I’m not super familiar with this log, but from my understanding it is a running record of app lifecycle events, including image pulls and errors when starting apps.
Using sudo nano /var/log/app_lifecycle.log, I was able to spot the next clue. Comparing the UUID from the nvidia-smi -L command with the UUID of the GPU named in the errors got me closer to the issue.
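The UUID comparison can also be done in a few lines of Python. This is only a sketch: it does not assume anything about the log's line format, it just searches the text for strings shaped like NVIDIA GPU UUIDs and reports the ones that are not currently installed. The function name stale_gpu_uuids is my own.

```python
import re

# NVIDIA GPU UUIDs look like GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (hex digits).
UUID_PATTERN = re.compile(
    r"GPU-[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


def stale_gpu_uuids(log_text, installed_uuids):
    """Return UUIDs found in log_text that are not in installed_uuids.

    Results keep first-seen order and contain no duplicates.
    """
    installed = {uuid.lower() for uuid in installed_uuids}
    stale = []
    for found in UUID_PATTERN.findall(log_text):
        if found.lower() not in installed and found not in stale:
            stale.append(found)
    return stale
```

Feed it the contents of /var/log/app_lifecycle.log (reading that file will likely need root) plus the UUIDs reported by nvidia-smi -L. Anything it returns is a GPU the apps still reference but the system no longer has.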

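At this point, I suspected the K3s configuration was somehow holding onto the old GPU. I unfortunately don’t know K3s (and judging by the compose_action call in the traceback, the apps may actually be running on Docker Compose these days), so let’s figure out where the configuration is stored.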
Fixing Incorrect GPU Assignments in K3s
Weirdly, finding the mount point for the Kubernetes configuration was harder than I expected, coming from Unraid. It turns out it is stored under /mnt/ but hidden behind a leading dot. I found it by running df -h (the disk free command, with -h to make the sizes human-readable), which showed that the configs are stored in /mnt/.ix-apps/.
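Since the directory is hidden only by its leading dot, another way to find it is to list dot-prefixed directories directly. Here is a small sketch using nothing but the Python standard library; the helper name hidden_directories is my own.

```python
import os


def hidden_directories(root):
    """Return the sorted names of dot-prefixed directories directly under root."""
    with os.scandir(root) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_dir() and entry.name.startswith(".")
        )
```

On my system I would expect hidden_directories("/mnt") to include .ix-apps, though I have not checked what else might show up there on other installs.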

At this point I was in the home stretch.
This might expose my lack of Kubernetes knowledge but I needed to figure out the configuration paths and update the GPU. I assumed it would be contained within a plex folder, so I looked at /mnt/.ix-apps/app_configs/Plex/ but the only configuration files were stored in versioned folders which I think might be used for rollback functionality?
In the end it seems like K3s stores all the configuration in a single file, /mnt/.ix-apps/user_config.yaml. After a little scrolling, there it was.

I was a little worried that the 0000:81:00.0 entry, which looks like the card's PCI address, would need to change as well, but I guess it didn’t. All I had to do was update the uuid value to the new GPU's UUID, save, and restart the Plex app.
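For anyone who would rather not hand-edit the file, the same change can be sketched in Python. This version treats the config as plain text rather than parsing the YAML, so it makes no assumptions about the file's structure: it swaps every occurrence of the old UUID for the new one and keeps a .bak copy first. The function name replace_gpu_uuid is my own, and you would need to run it as root against /mnt/.ix-apps/user_config.yaml.

```python
import shutil
from pathlib import Path


def replace_gpu_uuid(config_path, old_uuid, new_uuid):
    """Swap old_uuid for new_uuid in a text config file, keeping a .bak copy.

    Returns the number of replacements made. When the old UUID is not
    present, the file is left untouched and no backup is written.
    """
    path = Path(config_path)
    text = path.read_text()
    count = text.count(old_uuid)
    if count == 0:
        return 0
    # Back up the original next to it, e.g. user_config.yaml.bak
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(text.replace(old_uuid, new_uuid))
    return count
```

A return value of 0 means the old UUID was not in the file, which is a good hint that you are looking at the wrong file or the wrong UUID. The app still needs a restart afterwards to pick up the change.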
This was a bit of a pain, but it was a great learning experience. The best part is that it really helped me understand Kubernetes configuration better. The next thing I need to do is set up regular backups of my K3s configuration to my UNAS Pro.