TL;DR
Supporting Purview/RMS-encrypted and sensitivity-labeled documents in SimpleChat involves three capabilities with different technical approaches and complexity levels:
Reading sensitivity labels from documents sourced from OneDrive/SharePoint (via Microsoft Graph API) — most feasible, moderate effort
Decrypting RMS/MIP-protected documents server-side so they can be processed through the existing pipeline — feasible but complex, significant prerequisites
Key Findings from Research
Current state:
No handling of encrypted or protected documents exists. If a password-protected or RMS-protected file is uploaded, Azure Document Intelligence fails and processing errors out with no special handling.
No OneDrive/SharePoint file browsing or import integration exists.
All MS Graph API calls use delegated (user session) auth only — no app-only client_credentials flow.
The app already requests User.Read, User.ReadBasic.All, People.Read.All, Group.Read.All scopes via MSAL.
Documents have an existing tag system (tags array in Cosmos DB, propagated to search index chunks) and metadata fields (document_classification, title, authors, etc.).
SDK/API landscape:
| Approach | SDK/API | Python Support | Notes |
|---|---|---|---|
| MIP SDK (File SDK) | C++, .NET, Java (no Python) | No native Python SDK | Can decrypt, read labels, remove protection. Requires C++ wrapper or .NET subprocess. |
| MS Graph API: driveItem/extractSensitivityLabels | REST API | Yes (via msgraph-sdk-python or raw HTTP) | Reads sensitivity labels from files already in OneDrive/SharePoint. Cannot decrypt files. |
| MS Graph API: driveItem/assignSensitivityLabel | REST API | Yes | Assigns labels to files in OneDrive/SharePoint. Metered API (charges apply). |
| MS Graph API: driveItem/content (download) | REST API | Yes | Downloads file content. If the file is RMS-encrypted, the downloaded bytes are still encrypted. |
| Azure RMS Super User | PowerShell cmdlets | PowerShell only (subprocess) | Can bulk-decrypt files. Requires Enable-AipServiceSuperUserFeature + admin config. |
| Set-FileLabel PowerShell cmdlet | PowerShell | PowerShell only | Part of PurviewInformationProtection module; can apply/remove labels and encryption. |
Critical constraint:
The MIP SDK has no Python bindings. The three available language bindings are C++, .NET, and Java. For a Flask/Python backend, this means handling protected files requires one of the following:
A .NET/Java microservice sidecar that the Python app calls
A subprocess shelling out to PowerShell or a .NET CLI tool
Using the MS Graph API approach (which only works for files already in OneDrive/SharePoint, not arbitrary uploads)
Steps
Phase 1: Detect Protected Documents on Upload (Low effort, high value)
Add a protection detection step in process_document_upload_background() in functions_documents.py — before dispatching to the format-specific handler, inspect the uploaded file for RMS/MIP encryption signatures:
For Office files (DOCX/XLSX/PPTX): check for the EncryptedPackage stream in the OLE compound file, or check for LabelInfo/MSOEncryptionInfo XML parts in the OOXML package
For PDF files: check for /Encrypt dictionary entries and Microsoft IRM markers
Use the olefile Python library to detect compound files with encrypted streams
Use python-docx / zipfile to check for [Content_Types].xml containing customXml with MIP label metadata
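The detection checks above can be sketched with the standard library alone. This is a heuristic sketch under stated assumptions, not the app's implementation: the function name `classify_protection` is hypothetical, the `DRMEncryptedTransform` storage name and the `docMetadata/LabelInfo.xml` / `docProps/custom.xml` (`MSIP_Label_` properties) label locations are assumptions to verify against real protected samples, and a byte search for `/Encrypt` cannot distinguish a password-protected PDF from an IRM-protected one. A production version would more likely use olefile to enumerate streams properly.

```python
import io
import zipfile

# Signature of an OLE compound file; encrypted OOXML is wrapped in one.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Compound-file directory entry names are stored as UTF-16LE.
ENCRYPTED_PACKAGE = "EncryptedPackage".encode("utf-16-le")
# Assumption: RMS/IRM protection adds a storage with this name.
DRM_TRANSFORM = "DRMEncryptedTransform".encode("utf-16-le")
# Assumption: where label metadata lives in an unencrypted OOXML package.
LABEL_INFO_PART = "docMetadata/LabelInfo.xml"
CUSTOM_PROPS_PART = "docProps/custom.xml"


def classify_protection(data: bytes) -> str:
    """Return a protection_status guess for raw file bytes (heuristic)."""
    if data.startswith(OLE_MAGIC):
        if ENCRYPTED_PACKAGE not in data:
            # Legacy binary Office file; its encryption is not detected here.
            return "none"
        return "rms_encrypted" if DRM_TRANSFORM in data else "password_protected"
    if data.startswith(b"%PDF-"):
        return "password_protected" if b"/Encrypt" in data else "none"
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as package:
            names = set(package.namelist())
            if LABEL_INFO_PART in names:
                return "sensitivity_labeled"
            if CUSTOM_PROPS_PART in names and b"MSIP_Label_" in package.read(CUSTOM_PROPS_PART):
                return "sensitivity_labeled"
    return "none"
```

The return values match the `protection_status` enum proposed in the next step, so the result could be stored directly on the document record.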
Add new fields to the Cosmos DB document schema: protection_status (enum: none, rms_encrypted, sensitivity_labeled, password_protected), sensitivity_labels (array of label objects with id, name, assignment_method), protection_source (e.g., purview, rms, onedrive)
When a protected document is detected but cannot be decrypted, set status to a new value like "Protected - requires decryption" instead of erroring out, and store the protection metadata. Return this status to the UI so the user sees a clear explanation.
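One way to picture the new fields and the non-error status is the sketch below. The class and function names (`ProtectionStatus`, `SensitivityLabel`, `ProtectionMetadata`, `build_document_update`) are hypothetical, and how the resulting dict is merged into the existing Cosmos DB document is left to the app's own update helpers.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List, Optional


class ProtectionStatus(str, Enum):
    NONE = "none"
    RMS_ENCRYPTED = "rms_encrypted"
    SENSITIVITY_LABELED = "sensitivity_labeled"
    PASSWORD_PROTECTED = "password_protected"


@dataclass
class SensitivityLabel:
    id: str
    name: str
    assignment_method: str


@dataclass
class ProtectionMetadata:
    protection_status: ProtectionStatus = ProtectionStatus.NONE
    sensitivity_labels: List[SensitivityLabel] = field(default_factory=list)
    protection_source: Optional[str] = None


# Statuses whose content cannot be read without a decryption step.
UNREADABLE = {ProtectionStatus.RMS_ENCRYPTED, ProtectionStatus.PASSWORD_PROTECTED}


def build_document_update(meta: ProtectionMetadata, can_decrypt: bool = False) -> dict:
    """Build the fields to merge into the document record."""
    update = asdict(meta)
    update["protection_status"] = meta.protection_status.value
    if meta.protection_status in UNREADABLE and not can_decrypt:
        # Surface a clear status instead of letting processing fail.
        update["status"] = "Protected - requires decryption"
    return update
```

A document that is only labeled (not encrypted) gets no `status` override here, so it would continue through the normal pipeline.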
Phase 2: Read Sensitivity Labels via Graph API (Moderate effort)
Add a new OneDrive/SharePoint file import route — rather than uploading a file from the user's machine, allow importing a file directly from OneDrive/SharePoint via Graph API:
GET /me/drive/root/children to browse OneDrive files
POST /drives/{drive-id}/items/{item-id}/extractSensitivityLabels to read labels
GET /drives/{drive-id}/items/{item-id}/content to download the file content
This requires adding Files.Read.All scope to the MSAL configuration in config.py and the app registration
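The label-extraction call listed above could look roughly like this with raw HTTP. It is a sketch only: the helper names are hypothetical, the request is built but not sent, and the response shape (a `labels` array whose items carry `sensitivityLabelId` and `assignmentMethod`) is an assumption to confirm against the Graph documentation. The access token would come from the app's existing MSAL delegated flow.

```python
import urllib.request
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_extract_labels_request(drive_id: str, item_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for extractSensitivityLabels."""
    url = (
        f"{GRAPH_BASE}/drives/{quote(drive_id, safe='')}"
        f"/items/{quote(item_id, safe='')}/extractSensitivityLabels"
    )
    return urllib.request.Request(
        url,
        data=b"",  # the action takes no request body
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def parse_extracted_labels(payload: dict) -> list:
    """Map the (assumed) Graph response onto sensitivity_labels entries."""
    return [
        {
            "id": label.get("sensitivityLabelId"),
            "name": None,  # resolved to a display name in a later step
            "assignment_method": label.get("assignmentMethod"),
        }
        for label in payload.get("labels", [])
    ]
```

Sending the request with `urllib.request.urlopen` (or the app's existing HTTP client) and decoding the JSON body would yield the `payload` passed to `parse_extracted_labels`.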
After extracting sensitivity labels via Graph, resolve label IDs to human-readable names using the Graph Information Protection API or a configured label-name mapping in admin settings.
Store the extracted label metadata on the document record in Cosmos DB (using the new sensitivity_labels field introduced in Phase 1) and propagate it to search index chunks (similar to how document_tags are propagated today).
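Name resolution and chunk propagation might be sketched as follows, assuming the "configured label-name mapping" variant rather than a Graph lookup. Both function names and the `sensitivity_labels` field on chunks are hypothetical; the real chunk schema and index update path are the app's own.

```python
def resolve_label_names(labels: list, label_names: dict) -> list:
    """Fill in 'name' from an admin-configured {label_id: display_name} mapping."""
    return [
        {**label, "name": label_names.get(label["id"], label.get("name"))}
        for label in labels
    ]


def apply_labels_to_chunks(chunks: list, labels: list) -> list:
    """Copy resolved label names onto each chunk, leaving inputs unmodified."""
    names = [label["name"] for label in labels if label.get("name")]
    return [{**chunk, "sensitivity_labels": names} for chunk in chunks]
```

Labels whose ID is missing from the mapping keep whatever name they already had (possibly none) and are simply not propagated to chunks.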
Phase 3: Decrypt RMS-Protected Documents (High effort, significant prerequisites)
Option A — .NET MIP SDK Sidecar Microservice: Build a small ASP.NET Web API or Azure Function that:
Accepts an encrypted file via HTTP POST
Uses the MIP File SDK (Microsoft.InformationProtection.File NuGet package) to decrypt it
Returns the decrypted file content and extracted label metadata
Requires: Entra ID app registration with Azure Rights Management Services > user_impersonation + Microsoft Information Protection Sync Service > UnifiedPolicy.User.Read API permissions
Requires: The service account/app to be added as an RMS Super User via Add-AipServiceSuperUser PowerShell cmdlet + Enable-AipServiceSuperUserFeature
Requires: An Information Protection Integration Agreement (IPIA) with Microsoft if the app is released publicly
Option B — PowerShell Subprocess: Use subprocess from Python to call the Set-FileLabel PowerShell cmdlet (from the PurviewInformationProtection module) on the server to strip protection before processing. Simpler for internal-only deployments but less scalable and requires PowerShell to be installed on the app server.
Option C — Graph API "Download via SharePoint" (limited): For files sourced from OneDrive/SharePoint, SharePoint can sometimes serve unprotected content when the requesting user/app has appropriate permissions. This only works for files that use labels backed by Azure RMS (not DKE — double key encryption), and the app must have Sites.ReadWrite.All or equivalent permissions.
Integrate the chosen decryption mechanism into the existing process_document_upload_background() flow: detect protection → call decryption service → receive plaintext file → continue with normal processing pipeline → store protection metadata.
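The flow in this step can be sketched with the detection, decryption, and processing stages injected as callables, so the same orchestration works whichever option is chosen. Everything here is hypothetical scaffolding: `process_with_protection` is not an existing function, `decrypt` stands in for the sidecar or subprocess call and is assumed to return a `(plaintext_bytes, labels)` pair, and `process` stands in for the existing format-specific handlers.

```python
from typing import Callable, Optional, Tuple

ENCRYPTED_STATUSES = {"rms_encrypted"}


def process_with_protection(
    data: bytes,
    detect: Callable[[bytes], str],
    decrypt: Optional[Callable[[bytes], Tuple[bytes, list]]],
    process: Callable[[bytes], dict],
) -> dict:
    """Detect protection, decrypt if possible, then run the normal pipeline."""
    protection_status = detect(data)
    labels: list = []
    if protection_status in ENCRYPTED_STATUSES:
        if decrypt is None:
            # No decryption mechanism configured: report instead of failing.
            return {
                "status": "Protected - requires decryption",
                "protection_status": protection_status,
                "sensitivity_labels": labels,
            }
        data, labels = decrypt(data)
    result = process(data)
    return {**result, "protection_status": protection_status, "sensitivity_labels": labels}
```

Passing `decrypt=None` reproduces the Phase 1 behavior, which is one reason detection can ship before any decryption option exists.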
Phase 4: UI & Metadata Display (Low-moderate effort)
Add UI indicators in the document list views (personal, group, public workspaces) showing protection status: a badge/icon for sensitivity_labeled, rms_encrypted, etc. Modify the templates and JavaScript in workspace-documents.js and similar files.
Add sensitivity label names to the existing tag system or as a separate metadata section in the document detail view. Consider auto-tagging documents with their sensitivity label name (e.g., auto-applying a tag purview:confidential).
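The auto-tagging idea could be as small as the sketch below. The function name `label_to_tag` and the slug rules (lowercase, runs of non-alphanumeric characters collapsed to a hyphen) are assumptions; the existing tag system may impose its own character or length limits that this does not check.

```python
import re


def label_to_tag(label_name: str, prefix: str = "purview") -> str:
    """Turn a sensitivity label display name into a prefixed tag."""
    slug = re.sub(r"[^a-z0-9]+", "-", label_name.strip().lower()).strip("-")
    return f"{prefix}:{slug}"


print(label_to_tag("Confidential"))  # purview:confidential
```

Keeping a fixed prefix makes label-derived tags easy to distinguish from user-created ones when filtering.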
Add admin settings for configuring the MIP integration: enable/disable, sidecar service URL (for Option A), label-to-tag mapping rules, and accepted protection levels.
Phase 5: Configuration & Admin (Moderate effort)
Add new admin configuration settings in functions_settings.py / settings UI:
enable_mip_integration (boolean)
mip_sidecar_url (string, URL of the .NET decryption microservice)
mip_client_id, mip_tenant_id (for the separate MIP app registration)
allowed_sensitivity_levels (list — which label classifications are allowed for upload)
auto_tag_sensitivity_labels (boolean — auto-create tags from labels)
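As an illustration of how these settings might be consumed, the sketch below defines defaults and an upload gate based on allowed_sensitivity_levels. The names `DEFAULT_MIP_SETTINGS` and `is_upload_allowed` are hypothetical, and two behaviors are assumptions rather than decisions from this plan: an empty allow-list is treated as "no restriction", and label names are compared case-insensitively.

```python
DEFAULT_MIP_SETTINGS = {
    "enable_mip_integration": False,
    "mip_sidecar_url": "",
    "mip_client_id": "",
    "mip_tenant_id": "",
    "allowed_sensitivity_levels": [],
    "auto_tag_sensitivity_labels": False,
}


def is_upload_allowed(settings: dict, label_names: list) -> bool:
    """Check a document's label names against the configured allow-list."""
    merged = {**DEFAULT_MIP_SETTINGS, **settings}
    if not merged["enable_mip_integration"]:
        return True  # integration disabled: no label-based gate
    allowed = {name.lower() for name in merged["allowed_sensitivity_levels"]}
    if not allowed:
        return True  # assumption: an empty allow-list means no restriction
    return all(name.lower() in allowed for name in label_names)
```

Unlabeled documents (an empty `label_names` list) pass this check; whether they should is a policy question for the admin settings design.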
Register a new Entra ID app (or extend the existing one) with the additional API permissions: Files.Read.All, InformationProtection.Read.All (for label resolution), Azure Rights Management Services > Content.SuperUser (for the decryption service)
Verification
Phase 1: Upload a known RMS-protected DOCX and verify the app detects it and shows "Protected - requires decryption" status instead of an error
Phase 2: Browse OneDrive from the app, select a labeled file, verify label metadata appears in the document record
Phase 3: Upload an RMS-encrypted file, verify it gets decrypted and processed with full text extraction, and the original labels are stored
Phase 4: Verify sensitivity badges appear in document lists and search results include label-based filtering
Decisions
Decision: Decryption approach — Option A (.NET MIP SDK sidecar) is recommended over Option B (PowerShell subprocess) for production reliability and scalability, but Option B is faster to prototype internally
Decision: No native Python MIP SDK exists — this is the core constraint; any decryption capability requires a non-Python component
Decision: Phase 1 (detection) and Phase 4 (UI) are independent of the decryption capability and can ship first to give users visibility into why some documents fail processing
Decision: Graph API approach (Phase 2) works only for OneDrive/SharePoint-sourced files, not arbitrary file uploads from disk — both paths should be supported
Decision: RMS Super User feature is a tenant-level admin decision with security implications (audit logging, restricted access) — this must be documented and configured by the customer's IT admin, not automated