Warnings and errors are not persisted between runs #49

@SeanClay10

Problem

When running the pipeline, especially in batch mode over large folders of PDFs, warnings and errors are only visible in the terminal during the run. If a PDF fails to extract, triggers an OCR fallback, or causes an LLM extraction error, that information is lost as soon as the terminal session ends. There is currently no way to review what went wrong after the fact without re-running the pipeline.

Impact

  • Silent failures in overnight or unattended batch runs are undetectable after the fact
  • No audit trail of which files produced warnings across multiple runs
  • Debugging recurring issues requires re-running the pipeline and watching the terminal

Proposed solution

Add persistent logging using Python's standard logging module. All existing terminal output should remain unchanged. Warnings and errors should additionally be written to a rotating log file (e.g. logs/fracfeed.log) that accumulates across runs.

  • No changes to CLI output or researcher-facing behaviour
  • Clean runs should produce no log entries
  • Log file should rotate automatically to avoid unbounded growth
  • Should follow standard Python logging conventions — no third-party dependencies
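
A minimal sketch of how the requirements above could be met with the standard library alone. The function name `configure_logging`, the logger name `"fracfeed"`, and the size and backup limits are assumptions chosen for illustration, not existing project code; only the path logs/fracfeed.log comes from this issue. The sketch also assumes terminal output keeps being produced by the existing code, so records are not propagated to other handlers.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def configure_logging(log_path="logs/fracfeed.log",
                      max_bytes=1_000_000, backup_count=3):
    """Attach a rotating WARNING+ file handler to a hypothetical 'fracfeed' logger."""
    abs_path = os.path.abspath(log_path)
    os.makedirs(os.path.dirname(abs_path), exist_ok=True)

    logger = logging.getLogger("fracfeed")  # logger name is an assumption
    logger.setLevel(logging.WARNING)
    # Assumption: terminal output is handled elsewhere, so do not forward
    # these records to root handlers (avoids changing what the CLI prints).
    logger.propagate = False

    # Reuse an existing handler for the same file instead of stacking duplicates.
    for handler in logger.handlers:
        if (isinstance(handler, RotatingFileHandler)
                and handler.baseFilename == abs_path):
            return logger

    handler = RotatingFileHandler(
        abs_path,
        maxBytes=max_bytes,        # rotate once the file reaches this size
        backupCount=backup_count,  # keep this many rotated files
        encoding="utf-8",
        delay=True,                # open the file only on the first record
    )
    handler.setLevel(logging.WARNING)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = configure_logging()
    # Hypothetical call site: record an OCR fallback for one file.
    log.warning("OCR fallback triggered for %s", "example.pdf")
```

Because the handler appends by default and is created with `delay=True`, entries accumulate across runs and a clean run writes nothing to the log file. Rotation is size-based here; `TimedRotatingFileHandler` would be the standard-library alternative if rotation by date is preferred.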
