configd: race condition in script_output.py cached_results cleanup causes KeyError

## Important notices

- [x] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
- [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue

## Describe the bug

The caching mechanism added in #7515 has a race condition in `script_output.py`'s `execute()` method. The class-level `Action.cached_results` dict is shared across all configd worker threads with no synchronization. When multiple API calls arrive concurrently, two threads can race on the cache cleanup loop:

```python
# Thread A iterates cached keys
for key in list(Action.cached_results.keys()):
    if Action.cached_results[key]['expire'] < now:          # Thread A reads key
        os.remove(Action.cached_results[key]['filename'])    # Thread B deletes key first → KeyError
        del Action.cached_results[key]
```

Thread A snapshots the keys and begins iterating. Thread B concurrently expires and deletes the same key. Thread A then hits a `KeyError` on line 71 or 72.

## To reproduce

Any frequent API poller triggers this reliably. In my case, [opnsense-exporter](https://github.com/AthennaMind/opnsense-exporter) scraped by Prometheus every 15 seconds. Each scrape fires multiple concurrent OPNsense API calls (gateways, services, firewall stats), each spawning a configd thread that enters the cache cleanup path.

Within a few hours of a `service configd restart`, the cache re-corrupts and tracebacks recur every few minutes:

```
unable to sendback response for <uuid>, message was Traceback (most recent call last):
  File "/usr/local/opnsense/service/modules/processhandler.py", line 188, in run
    result = self.action_handler.execute(...)
  File "/usr/local/opnsense/service/modules/processhandler.py", line 355, in execute
    return action_obj.execute(action_params, message_uuid, connection)
  File "/usr/local/opnsense/service/modules/actions/script_output.py", line 71, in execute
    os.remove(Action.cached_results[key]['filename'])
KeyError: '<sha256_hash>'
```

Also seen on line 72 (`del Action.cached_results[key]`) and occasionally `FileNotFoundError` on `/tmp/tmpcfd_*` when two threads race to `os.remove()` the same temp file.

## Expected behavior

Cache cleanup should be thread-safe. Concurrent API calls should not corrupt the shared cache dict.

## Suggested fix

Wrap the cleanup loop in a try/except:

```python
for key in list(Action.cached_results.keys()):
    try:
        if Action.cached_results[key]['expire'] < now:
            try:
                os.remove(Action.cached_results[key]['filename'])
            except (KeyError, FileNotFoundError):
                pass
            del Action.cached_results[key]
    except KeyError:
        pass
```

Or use a `threading.Lock` around all access to `Action.cached_results`.

## Environment

- OPNsense 25.1 (also reproducible on 24.7)
- Triggered by opnsense-exporter with 15s Prometheus scrape interval
- Workaround: increased scrape interval to 60s, which reduces but doesn't eliminate the race

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

configd: race condition in script_output.py cached_results cleanup causes KeyError #9864

Important notices

Describe the bug

To reproduce

Expected behavior

Suggested fix

Environment

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

configd: race condition in script_output.py cached_results cleanup causes KeyError #9864

Description

Important notices

Describe the bug

To reproduce

Expected behavior

Suggested fix

Environment

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions