
feat(detect): Handle rounded-rect redactions and nearly-unicolor bars #198

Open
mlissner wants to merge 5 commits into 196-skip-white-on-white-redactions-20260326 from detect-rounded-rect-redactions-20260327

Member

@mlissner mlissner commented Mar 28, 2026

Summary

  • Detect redaction bars drawn as rounded rectangles (lines + bezier curves) by falling back to drawing["rect"] when no re items exist
  • Replace strict pixmap.is_unicolor with _is_nearly_unicolor(), which checks interior pixels within a tolerance of 5 per channel, excluding a 1px border where rendering artifacts appear
  • Add tools/inspect-pdf.py and tools/debug-pipeline.py for investigating PDFs
  • Detection on the test PDF went from 4 to 85 bad redactions
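
The rounded-rect fallback from the first bullet might look roughly like the sketch below. It is not the code in this PR: `candidate_rects` is a hypothetical name, and plain dicts and tuples stand in for the drawing entries that PyMuPDF's `page.get_drawings()` returns (each with an `"items"` list of path commands and a bounding `"rect"`).

```python
# Hypothetical sketch of the fallback; plain dicts and tuples stand in
# for the entries returned by PyMuPDF's page.get_drawings().

def candidate_rects(drawing):
    """Return the rectangles worth checking as possible redaction bars."""
    # Explicit rectangles are emitted as "re" items.
    rects = [item[1] for item in drawing.get("items", []) if item[0] == "re"]
    if rects:
        return rects
    # Assumed fallback: a shape built from lines ("l") and bezier
    # curves ("c") has no "re" item, so use its bounding box instead.
    bbox = drawing.get("rect")
    return [bbox] if bbox is not None else []


# A rounded bar drawn with a line and a curve, and a plain "re" bar.
rounded = {
    "items": [
        ("l", (12, 10), (88, 10)),
        ("c", (88, 10), (89, 10), (90, 11), (90, 12)),
    ],
    "rect": (10, 10, 90, 22),
}
plain = {"items": [("re", (10, 30, 90, 42))], "rect": (10, 30, 90, 42)}

print(candidate_rects(rounded))
print(candidate_rects(plain))
```

Using the bounding box only when no `"re"` item exists keeps the existing behavior for ordinary rectangles unchanged.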

Test plan

🤖 Generated with Claude Code

mlissner and others added 5 commits March 27, 2026 23:01
This PDF uses rounded rectangles (lines + bezier curves) with slight
color variations for its redaction bars, exercising both the
rounded-rect fallback and the nearly-unicolor pixmap check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two changes to catch more bad redactions:

1. Fall back to drawing["rect"] when a drawing has no "re" items.
   Some tools draw redaction bars as rounded rectangles using lines
   and bezier curves instead of the "re" command.

2. Replace strict pixmap.is_unicolor with _is_nearly_unicolor(),
   which checks interior pixels (excluding a 1px border) within a
   tolerance. Some PDFs render solid bars as two nearly-identical
   colors (e.g., RGB(34,31,31) vs RGB(35,31,32)).
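
As an illustration of the second change, here is a minimal sketch of a tolerance-based check. It is an assumption about how such a check could work, not the PR's `_is_nearly_unicolor()`: the function name, its parameters, and the choice to compare every interior pixel against the first interior pixel are all hypothetical. It operates on raw, tightly packed pixel bytes of the kind a PyMuPDF `Pixmap` exposes through `samples`, `width`, `height`, and `n`.

```python
def is_nearly_unicolor(samples, width, height, n, tolerance=5, border=1):
    """Return True if all interior pixels are within `tolerance` per
    channel of a reference interior pixel (hypothetical sketch)."""
    if width <= 2 * border or height <= 2 * border:
        return False  # Assumption: no interior left to inspect.
    stride = width * n  # Assumes rows are tightly packed.

    def pixel(x, y):
        start = y * stride + x * n
        return samples[start:start + n]

    # Assumption: the first interior pixel serves as the reference color.
    reference = pixel(border, border)
    for y in range(border, height - border):
        for x in range(border, width - border):
            if any(abs(a - b) > tolerance
                   for a, b in zip(pixel(x, y), reference)):
                return False
    return True


# A 4x4 RGB image: a white 1px border around the two nearly identical
# dark colors mentioned in the commit message.
white, dark_a, dark_b = (255, 255, 255), (34, 31, 31), (35, 31, 32)
rows = [
    [white] * 4,
    [white, dark_a, dark_b, white],
    [white, dark_b, dark_a, white],
    [white] * 4,
]
samples = bytes(c for row in rows for px in row for c in px)
print(is_nearly_unicolor(samples, 4, 4, 3))
```

A strict equality check would reject this image because the interior holds two distinct colors; skipping the border and allowing a small per-channel difference accepts it.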

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
inspect-pdf.py dumps redaction-relevant PDF structure (drawings with
fill colors and item types, annotations, text spans).

debug-pipeline.py walks x-ray's detection pipeline step by step,
showing what gets kept/dropped at each stage with pixmap color
analysis for filtered entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mlissner mlissner marked this pull request as ready for review March 28, 2026 06:13