
[6.17]NVIDIA: VR: SAUCE: firmware: smccc: lfa: fix work item re-initialization race #343

Open
nirmoy wants to merge 1 commit into NVIDIA:24.04_linux-nvidia-6.17-next from nirmoy:lfa_fix

@nirmoy nirmoy commented Mar 13, 2026

Move INIT_WORK() for fw_images_update_work from update_fw_images_tree() to lfa_init() so the work item is initialized once at module load rather than re-initialized on every firmware image tree update. Re-initializing a work item that may already be queued is unsafe and can corrupt the workqueue.

Add flush_workqueue() in lfa_notify_handler() before rescanning the image list to ensure any pending remove_invalid_fw_images work completes first, preventing use-after-free on the image list.

Fixes: 1dd9a8f ("NVIDIA: VR: SAUCE: firmware: smccc: add support for Live Firmware Activation (LFA)")

LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/2138342
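To illustrate why re-running INIT_WORK() on a queued item is unsafe, here is a minimal userspace model, not the driver code. It assumes only what the description states plus the fact that INIT_WORK() resets the work item's list linkage and pending state. Every name in it (`model_work`, `model_init_work`, `model_queue_work`, `queue_is_consistent`) is a hypothetical stand-in for the kernel types and calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct node {
    struct node *next, *prev;
};

static void node_init(struct node *n)
{
    n->next = n;
    n->prev = n;
}

static void node_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical stand-in for struct work_struct: a pending flag plus a link. */
struct model_work {
    bool pending;
    struct node entry;
};

/* Stand-in for INIT_WORK(): resets flag and link unconditionally. */
static void model_init_work(struct model_work *w)
{
    w->pending = false;
    node_init(&w->entry);
}

/* Stand-in for queue_work(): links the item unless it is already pending. */
static bool model_queue_work(struct node *queue, struct model_work *w)
{
    assert(queue != NULL && w != NULL);
    if (w->pending)
        return false;
    w->pending = true;
    node_add_tail(&w->entry, queue);
    return true;
}

/* True if every node's neighbors point back at it (bounded walk). */
static bool queue_is_consistent(const struct node *queue)
{
    const struct node *n = queue;

    for (int steps = 0; steps < 1024; steps++) {
        if (n->next->prev != n || n->prev->next != n)
            return false;
        n = n->next;
        if (n == queue)
            return true;
    }
    return false;
}

/* Reproduces the bug pattern: initialize, queue, then initialize again. */
static bool reinit_while_queued_breaks_queue(void)
{
    struct node queue;
    struct model_work w;

    node_init(&queue);
    model_init_work(&w);
    model_queue_work(&queue, &w);
    model_init_work(&w); /* the queue still points at w, but w forgot it */
    return !queue_is_consistent(&queue);
}
```

In this model the second initialization leaves the queue head pointing at an item whose own links point only at itself, so whatever walks the queue next follows stale pointers. Initializing once at load time, as the patch does, rules this state out.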

…ion race

Move INIT_WORK() for fw_images_update_work from update_fw_images_tree()
to lfa_init() so the work item is initialized once at module load rather
than re-initialized on every firmware image tree update. Re-initializing
a work item that may already be queued is unsafe and can corrupt the
workqueue.

Add flush_workqueue() in lfa_notify_handler() before rescanning the
image list to ensure any pending remove_invalid_fw_images work completes
first, preventing use-after-free on the image list.

Fixes: 1dd9a8f ("NVIDIA: VR: SAUCE: firmware: smccc: add support for Live Firmware Activation (LFA)")
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
@nirmoy nirmoy changed the title NVIDIA: VR: SAUCE: firmware: smccc: lfa: fix work item re-initialization race [6.17]NVIDIA: VR: SAUCE: firmware: smccc: lfa: fix work item re-initialization race Mar 13, 2026

@clsotog clsotog left a comment

Acked-by: Carol L Soto <csoto@nvidia.com>

nvmochs commented Mar 13, 2026

@nirmoy I agree this fixes the issue, but the usage convention seems a bit awkward with the single global work struct.

Basically, if anyone calls update_fw_images_tree() they need to ensure the workqueue is flushed before calling again. Maybe it would be cleaner to dynamically allocate the work struct in update_fw_images_tree(), INIT/enqueue it, and then free in the handler? Then we don't need to "serialize" the flushes and calling update_fw_images_tree(). Of course, the downside to that approach is what to do if the kmalloc fails...
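For reference, the dynamically allocated alternative suggested above could look roughly like the following userspace sketch. It is not proposed driver code: `update_request`, `update_queue`, `request_update`, and `run_pending` are hypothetical names, `malloc`/`free` stand in for `kmalloc`/`kfree`, and a plain FIFO stands in for the workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-request item; the real driver uses struct work_struct. */
struct update_request {
    struct update_request *next; /* singly linked FIFO link */
    int generation;              /* illustrative payload only */
};

struct update_queue {
    struct update_request *head, *tail;
    int handled;                 /* number of requests processed so far */
};

/* Allocate and enqueue a fresh item per call; nothing is re-initialized. */
static bool request_update(struct update_queue *q, int generation)
{
    struct update_request *req;

    assert(q != NULL);
    req = malloc(sizeof(*req));
    if (!req)
        return false; /* the open question from the review: caller must cope */

    req->next = NULL;
    req->generation = generation;
    if (q->tail)
        q->tail->next = req;
    else
        q->head = req;
    q->tail = req;
    return true;
}

/* Drain the queue; each item is freed by the code that handles it. */
static int run_pending(struct update_queue *q)
{
    int n = 0;

    while (q->head) {
        struct update_request *req = q->head;

        q->head = req->next;
        if (!q->head)
            q->tail = NULL;
        q->handled++;
        free(req);
        n++;
    }
    return n;
}
```

The trade-off is the one named in the comment: callers no longer need to coordinate around a shared item, but every request can now fail at allocation time and needs an error path.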

Nirmoy and I met and reviewed his proposed changes and the workqueue API, specifically queue_work(). That function tolerates a work item that is already pending: it returns false and leaves the existing queue entry untouched, so I no longer have a concern about the usage convention. The key change in this PR is moving INIT_WORK() to the init() path so that it is invoked only once.
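The queue_work() behavior relied on here can be modeled in a few lines of userspace C. This is a sketch under stated assumptions rather than the kernel implementation: `pending_work`, `pw_init`, `pw_queue`, and `pw_run` are hypothetical names, and a C11 `atomic_flag` plays the role of the kernel's pending bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a work item with a pending bit. */
struct pending_work {
    atomic_flag pending; /* set while the item is waiting to run */
    int runs;            /* how many times the handler has run */
};

/* One-time setup, analogous to calling INIT_WORK() once at init. */
static void pw_init(struct pending_work *w)
{
    atomic_flag_clear(&w->pending);
    w->runs = 0;
}

/*
 * Analogous to queue_work(): returns true if this call made the item
 * pending, false if it was already pending (nothing else happens then).
 */
static bool pw_queue(struct pending_work *w)
{
    assert(w != NULL);
    return !atomic_flag_test_and_set(&w->pending);
}

/*
 * Analogous to the worker picking the item up: the pending bit is cleared
 * before the handler body runs, so a request that arrives during the
 * handler can queue one more run.
 */
static void pw_run(struct pending_work *w)
{
    atomic_flag_clear(&w->pending);
    w->runs++;
}
```

Queuing twice before the worker runs yields a single run, which is why a single global work item is safe as long as it is initialized exactly once.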

@nvmochs nvmochs left a comment

No further issues or concerns from me.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

nvmochs commented Mar 13, 2026

PR sent to Canonical.

@KobaKoNvidia

This PR fixes a real bug I encountered:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
CPU: 0 UID: 0 PID: 4521 Comm: kworker/u1409:6
Workqueue: remove_invalid_fw_images (fw_images_update_wq)
pc : process_one_work+0xd4/0x430
lr : worker_thread+0x310/0x430
Code: 91020279 d2800401 53041ee0 b9003260 (f94006b8)

Acked
