Problem
When running the pipeline, especially in batch mode over large folders of PDFs, warnings and errors are only visible in the terminal during the run. If a PDF fails to extract, triggers an OCR fallback, or causes an LLM extraction error, that information is lost as soon as the terminal session ends. There is currently no way to review what went wrong after the fact without re-running the pipeline.
Impact
- Silent failures in overnight or unattended batch runs are undetectable after the fact
- No audit trail of which files produced warnings across multiple runs
- Debugging recurring issues requires re-running the pipeline and watching the terminal
Proposed solution
Add persistent logging using Python's standard logging module. All existing terminal output should remain unchanged. Warnings and errors should additionally be written to a rotating log file (e.g. logs/fracfeed.log) that accumulates across runs.
- No changes to CLI output or researcher-facing behaviour
- Clean runs should produce no log entries
- Log file should rotate automatically to avoid unbounded growth
- Should follow standard Python logging conventions, with no third-party dependencies
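A minimal sketch of how this could look, using only the standard library's `logging.handlers.RotatingFileHandler`. The function name `configure_file_logging`, the logger name `fracfeed`, the 1 MB rotation threshold, and the five retained backups are all assumptions chosen for illustration; only the `logs/fracfeed.log` path comes from the proposal above. The handler is set to `WARNING` and opened lazily, so a clean run writes nothing.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def configure_file_logging(
    log_path="logs/fracfeed.log",  # path suggested in this issue
    max_bytes=1_000_000,           # assumed rotation threshold (~1 MB)
    backup_count=5,                # assumed number of rotated files to keep
    logger_name="fracfeed",        # assumed logger name
):
    """Attach a WARNING-level rotating file handler to the named logger."""
    abs_path = os.path.abspath(log_path)
    os.makedirs(os.path.dirname(abs_path), exist_ok=True)

    logger = logging.getLogger(logger_name)

    # Do not attach a second handler for the same file on repeated calls.
    for existing in logger.handlers:
        if (
            isinstance(existing, RotatingFileHandler)
            and existing.baseFilename == abs_path
        ):
            return logger

    handler = RotatingFileHandler(
        abs_path,
        maxBytes=max_bytes,
        backupCount=backup_count,
        encoding="utf-8",
        delay=True,  # open the file only when the first record is written
    )
    handler.setLevel(logging.WARNING)  # warnings and errors only
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Under these assumptions, the pipeline would call `configure_file_logging()` once at startup and then add calls such as `logger.warning("OCR fallback used for %s", pdf_path)` next to the existing terminal messages, which stay as they are. Because the handler appends by default, entries accumulate across runs until the size threshold triggers rotation.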