Skip to content

fix: update packet evaluation metric#95

Open
regularizer wants to merge 3 commits intoawslabs:devfrom
regularizer:fix/packet_evaluation
Open

fix: update packet evaluation metric#95
regularizer wants to merge 3 commits intoawslabs:devfrom
regularizer:fix/packet_evaluation

Conversation

@regularizer
Copy link
Copy Markdown

Updated packet evaluation metric in doc_split module to cover single page documents in the packet.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@adiadd
Copy link
Copy Markdown
Contributor

adiadd commented Mar 31, 2026

Thanks for this @regularizer - would we be able to soundly address/change/adjust the failing test cases to be correct and aligned?

Comment on lines -311 to -321
def test_single_page_groups_excluded(self):
"""Groups with only 1 page should not appear in ordering scores."""
data = [
_page("invoice", "inv-01", 1, "invoice", "inv-01", 1),
_page("form", "form-01", 2, "form", "form-01", 2),
]
df = pd.DataFrame(data)
scores = calculate_ordering_score_per_group(df)
assert len(scores) == 0
assert calculate_average_ordering_score(scores) == 0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic change looks good but we're deleting the test for this code path without replacing it - can you swap in a test that asserts single-page groups now get 1.0 instead? something like:

   def test_single_page_groups_score_perfect(self):
       """Single-page groups should receive a perfect ordering score of 1.0."""
       data = [
           _page("invoice", "inv-01", 1, "invoice", "inv-01", 1),
           _page("form", "form-01", 2, "form", "form-01", 2),
       ]
       df = pd.DataFrame(data)
       scores = calculate_ordering_score_per_group(df)
       assert len(scores) == 2
       assert all(v == 1.0 for v in scores.values())
       assert calculate_average_ordering_score(scores) == 1.0

just wanna make sure the path we're changing stays covered. rest lgtm

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants