Skip to content

AE-1225: add request id to worker logs#80

Open
jhcipar wants to merge 4 commits intomainfrom
jchipar/AE-1545/add-request-id-to-worker-logs
Open

AE-1225: add request id to worker logs#80
jhcipar wants to merge 4 commits intomainfrom
jchipar/AE-1545/add-request-id-to-worker-logs

Conversation

@jhcipar
Copy link
Contributor

@jhcipar jhcipar commented Mar 16, 2026

AE-1225: add request id to worker logs

Copy link

@runpod-Henrik runpod-Henrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. ContextVar usage

Correct pattern — ContextVar.set() returns a Token, and each async task gets its own context in Python's async model. Request IDs are properly isolated across concurrent jobs.

2. Bug (Medium): _generated can be NoneTypeError on first job

_load_generated_handler() returns None if the generated handler file is missing or fails to import. _generated is typed as Optional[Callable], and the check in handle_job is:

if not self._generated:
    self._generated = self._load_generated_handler()
result = await self._generated(job)  # TypeError if still None

If _load_generated_handler() returns None (missing file, import error), the call on the next line crashes with TypeError: 'NoneType' object is not callable. The error message won't mention the real cause.

Suggest: raise a descriptive error immediately after the assignment if _generated is still None.

3. Issue: _fmt is a private CPython attribute

setup_logging accesses handler.formatter._fmt to read the current format string. _fmt is a CPython implementation detail — not part of the public logging.Formatter API. It works today but could break on PyPy or a future CPython release.

Safer alternative: store the desired format string as a constant and use it directly when constructing the formatter, rather than reading it back from the installed formatter.

4. Log format change — third column shifts

The new format inserts %(request_id)s as the third space-separated field (after timestamp and level). Any log parser or monitoring rule that extracts fields by position will silently start reading the wrong column.

Worth a note in the PR description or release notes if downstream log consumers exist (Datadog, CloudWatch, Loki pipelines).

5. Test gaps

  • _extract_request_id has three extraction paths (event["id"], event["job_id"], event["job"]["id"]) plus a UUID fallback — none are covered by tests
  • set_request_id edge cases not covered: empty string, whitespace-only, None
  • No concurrency isolation test (two concurrent tasks should not bleed request IDs)
  • ensure_request_id_filter is called for both root logger and new handlers — the idempotency guard (isinstance check) prevents double-registration, but this is not tested

The first two are the higher-value gaps given the three-path extraction logic.

Verdict

PASS with two asks: fix the _generated is None crash path, and add tests for _extract_request_id's three extraction paths.

@jhcipar jhcipar changed the title AE-1545: add request id to worker logs AE-1225: add request id to worker logs Mar 16, 2026
Copy link

@runpod-Henrik runpod-Henrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: UUID fallback not tested

_extract_request_id has four paths: id, job_id, job.id, and UUID fallback. The three lookup paths are tested. The fallback (empty event → uuid.uuid4()) has no test. Worth adding a test_falls_back_to_uuid_when_no_id_fields case to confirm the fallback fires and returns a non-empty string.


Nit: get_request_id() is exported but unused in the PR

Exported from logger.py, not called anywhere in the diff. Fine if the intent is for user code to include request ID in their own log lines — but a short docstring comment would make that intent clear.


Verdict

Clean implementation. ContextVar + token-based reset is the right approach for concurrent request isolation. The filter deduplication, the None handler early-fail with a clear error message, and the LogStreamer hookup are all correct.

@jhcipar
Copy link
Contributor Author

jhcipar commented Mar 18, 2026

AE-1225

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants