Fix/20260408 #832

Merged
paullizer merged 2 commits into Development from fix/20260408
Apr 8, 2026

Conversation

@paullizer
Collaborator

  • Legacy Office Binary Upload Support
    • Added native OLE-based support for older Word .doc and PowerPoint .ppt files instead of relying on OOXML-only assumptions during processing.
    • Legacy .doc uploads now extract available metadata and follow the same shared document-processing workflow used for richer Office files, so enhanced citations and final metadata extraction stay consistent when those features are enabled.
    • Legacy .ppt uploads now extract slide text and available summary metadata from the OLE presentation streams while keeping the same enhanced-citation and final-metadata workflow used by .pptx uploads.
    • .pptx uploads now also populate presentation metadata such as title, author, subject, and keywords during the initial metadata update when metadata extraction is enabled.
    • (Ref: functions_content.py, functions_documents.py, test_legacy_doc_ole_extraction.py, test_legacy_ppt_ole_extraction.py, legacy Office OLE support and metadata parity)
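The split between legacy OLE files (.doc, .ppt) and OOXML files (.docx, .pptx) described above can be illustrated by checking a file's leading bytes. The following is a minimal sketch, not the code from this PR: the names `sniff_container` and `Container` are hypothetical, and it assumes only the well-known signatures of the two container formats (the OLE compound file header and the ZIP local file header that OOXML packages start with).

```python
from enum import Enum

# Well-known container signatures.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file (.doc, .ppt)
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header (.docx, .pptx)


class Container(Enum):
    """Hypothetical labels for the container a file appears to use."""
    OLE = "ole"
    ZIP = "zip"          # possibly OOXML; a ZIP signature alone does not prove it
    UNKNOWN = "unknown"


def sniff_container(header: bytes) -> Container:
    """Classify a file by its first bytes rather than by its extension."""
    if header.startswith(OLE_MAGIC):
        return Container.OLE
    if header.startswith(ZIP_MAGIC):
        return Container.ZIP
    return Container.UNKNOWN


def sniff_path(path: str) -> Container:
    """Read only the first 8 bytes of a file on disk and classify it."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(8))
```

A check like this lets an upload pipeline route a file to an OLE parser or an OOXML parser even when the extension is missing or wrong. Whether the merged code routes on content or on extension is not stated in this PR description.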

@paullizer paullizer merged commit 6d0b627 into Development Apr 8, 2026
1 check passed
@paullizer paullizer deleted the fix/20260408 branch April 8, 2026 18:11