
feat(#131): checkpointed pipeline + fix import path issues #398

Draft
redbbean wants to merge 2 commits into fireform-core:main from redbbean:feat/checkpoint-tests-docs

Conversation

redbbean commented Mar 31, 2026

Description

Closes #131

Implements a checkpointed LLM extraction pipeline so that an interrupted run (container crash, Ollama timeout, Ctrl+C) resumes exactly where it left off without re-processing already-extracted fields.
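
As a rough illustration of the resume behaviour described above, here is a minimal sketch. The names `load_checkpoint`, `run_extraction`, and `extract_field` are hypothetical and are not taken from `src/llm.py`; the checkpoint is assumed to be a flat JSON object mapping field names to extracted values, and the write is a plain one for brevity rather than the atomic write this PR describes.

```python
import json
import os


def load_checkpoint(path):
    """Return previously extracted fields, or an empty dict if none exist."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def run_extraction(field_names, extract_field, checkpoint_path):
    """Extract each field once, skipping any already in the checkpoint."""
    results = load_checkpoint(checkpoint_path)
    for name in field_names:
        if name in results:
            continue  # extracted in an earlier run; do not call the LLM again
        results[name] = extract_field(name)  # hypothetical per-field LLM call
        # Persist after every field so a crash loses at most one extraction.
        with open(checkpoint_path, "w", encoding="utf-8") as fh:
            json.dump(results, fh)
    return results
```

If a run dies while extracting the third of three fields, the next call to `run_extraction` with the same checkpoint path would only invoke `extract_field` for that third field.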

Also fixes a ModuleNotFoundError: No module named 'src' that prevented make exec from running, caused by an incomplete PYTHONPATH. Fixes #116 and #118.

src/llm.py

  • Session ID computed lazily at start of main_loop()
  • Session ID hashes both the transcript and the field names, so two different form templates run on the same transcript won't share a checkpoint (an addition over PR #133, "#131- stateful saving", which hashes only the transcript)
  • _get_field_names() handles _target_fields as either a dict (from pypdf) or a list
  • Atomic checkpoint writes via .tmp + os.replace()
  • SIGINT handler so Ctrl+C flushes the checkpoint before exit
  • Retry loop (3 attempts, backoff) on Ollama timeouts with checkpoint saved
  • JSONL error log for -1 responses to help with debugging
  • Renamed json=None param to json_data=None
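
The session ID and atomic-write items above could look roughly like the sketch below. This is not the code from `src/llm.py`: the function names, the choice of SHA-256, the 16-character truncation, the sorting of field names, and the JSON checkpoint format are all assumptions made for illustration.

```python
import hashlib
import json
import os


def get_field_names(target_fields):
    """Return field names whether target_fields is a dict or a list."""
    if isinstance(target_fields, dict):
        return list(target_fields.keys())
    return list(target_fields)


def compute_session_id(transcript, target_fields):
    """Hash the transcript together with the field names (assumed scheme)."""
    digest = hashlib.sha256()
    digest.update(transcript.encode("utf-8"))
    # Sorting makes the ID independent of field order (an assumption here).
    for name in sorted(str(n) for n in get_field_names(target_fields)):
        digest.update(b"\x00")  # separator so "ab" + "c" differs from "a" + "bc"
        digest.update(name.encode("utf-8"))
    return digest.hexdigest()[:16]


def save_checkpoint_atomic(path, data):
    """Write to path + '.tmp', then swap it into place with os.replace()."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
        fh.flush()
        os.fsync(fh.fileno())  # push bytes to disk before the rename
    os.replace(tmp_path, path)  # readers see the old file or the new one
```

Because `os.replace()` swaps the file in a single step, an interruption during the write leaves the previous checkpoint intact instead of a half-written file.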

src/main.py

  • Added missing from typing import Union (pre-existing bug from issue #

docker-compose.yml + Dockerfile

  • PYTHONPATH=/app/src changed to PYTHONPATH=/app:/app/src
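
To show why the extra `/app` entry matters, the sketch below rebuilds a hypothetical stand-in for the container layout (`<root>/app/src/llm.py`, with an `__init__.py` assumed) in a temporary directory and asks Python's path-based finder which top-level names resolve under each search path. The helper `resolves` is illustrative only.

```python
import os
import tempfile
from importlib.machinery import PathFinder


def resolves(module_name, search_path):
    """Return True if a top-level module is findable on the given path entries."""
    return PathFinder.find_spec(module_name, search_path) is not None


# Hypothetical stand-in for the container layout: <root>/app/src/llm.py
root = tempfile.mkdtemp()
app_dir = os.path.join(root, "app")
src_dir = os.path.join(app_dir, "src")
os.makedirs(src_dir)
for filename in ("__init__.py", "llm.py"):
    open(os.path.join(src_dir, filename), "w").close()

old_path = [src_dir]           # like PYTHONPATH=/app/src
new_path = [app_dir, src_dir]  # like PYTHONPATH=/app:/app/src

print(resolves("src", old_path))  # "src" is not inside /app/src itself
print(resolves("src", new_path))  # /app contains the "src" package
print(resolves("llm", new_path))  # bare "llm" still resolves via /app/src
```

Under these assumptions the first check fails and the other two succeed: with only `/app/src` on the path, `import src` has nothing to find, which matches the reported `ModuleNotFoundError: No module named 'src'`.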

docs/session_resumption.md (new)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • PYTHONPATH fix: make exec runs successfully

  • Checkpointing logic: has not been tested

  • Test A — make exec completes without import errors

Test Configuration: Docker Desktop, python:3.11-slim, Ollama + Mistral

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Tests for the checkpoint logic are in progress.

Coming across this issue was interesting because I've implemented a similar checkpointing and resuming process in LLM benchmarking, so I was able to reuse some of the error log, checkpoint write, and retry loop logic. Please let me know if there's anything I can improve!

implemented checkpointing/resuming pipeline:
- hash transcript + field names to avoid checkpoint collisions between different form templates on the same transcript
- atomic checkpoint writes to .tmp, then replacing previous checkpoint file
- handling for Ctrl+C using SIGINT
- retry loop for Ollama timeouts
- JSONL error logging for Ollama -1 responses
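
A minimal sketch of how the SIGINT, retry, and JSONL items above might fit together. All names are hypothetical, the built-in `TimeoutError` stands in for whatever exception the Ollama client actually raises, and exponential backoff is an assumption (the description only says "backoff").

```python
import json
import signal
import sys
import time


def log_error(log_path, field, detail):
    """Append one JSON object per line (JSONL) describing a failed response."""
    record = {"ts": time.time(), "field": field, "detail": detail}
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def call_with_retry(fn, on_failure, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on TimeoutError run on_failure, back off, and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TimeoutError as exc:  # stand-in for the real timeout exception
            on_failure(attempt, exc)  # e.g. save the checkpoint, log the error
            if attempt == attempts:
                raise  # out of attempts; let the caller decide what to do
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, ... (assumed)


def install_sigint_flush(flush):
    """Flush the checkpoint on Ctrl+C, then exit; returns the old handler."""
    def handler(signum, frame):
        flush()
        sys.exit(130)  # conventional exit status for termination by SIGINT
    return signal.signal(signal.SIGINT, handler)
```

Passing `sleep` as a parameter keeps the retry loop testable without real delays, and returning the previous handler from `install_sigint_flush` lets a caller restore it afterwards.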

added docs required by issue fireform-core#131/fireform-core#133
have NOT added testing for the checkpointing yet

added import error resolution from issue fireform-core#116/fireform-core#117 and fireform-core#118/fireform-core#119
added updated Makefile test path from issue fireform-core#380/fireform-core#381

haven't completely closed fireform-core#131/fireform-core#133 yet

Development

Successfully merging this pull request may close these issues.

  • [FEAT]: Stateful Resumption & Checkpointed Pipeline for LLM Extraction
  • [BUG] : Docker PYTHONPATH Inconsistency
