Description
Add support for translating completed transcriptions into other languages. The translation should preserve the format of the original output (plain text or a subtitle format such as .srt or .vtt) and be available both as a single-file action and as a bulk operation.
The source language is already known from transcription metadata, so the user only needs to select the target language from a dropdown list to trigger the translation.
Expected Behavior
- User selects one or more completed transcriptions
- A “Translate” button becomes available
- User chooses a target language (e.g., English or German)
- The system translates, depending on the transcription mode used:
  - full text files (.txt, .docx), or
  - subtitle files (.srt, .vtt)
- Output files retain original formatting and structure but are translated
- Translated files are stored and/or made available for download alongside the originals
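The format handling and "alongside the originals" storage described above could be sketched as follows. This is only an illustration: the helper names `output_kind` and `translated_path`, and the `name.<lang>.<ext>` naming scheme, are assumptions and not part of the existing codebase.

```python
from pathlib import Path

# Formats named in this issue, grouped by how they would be translated.
TEXT_FORMATS = {".txt", ".docx"}
SUBTITLE_FORMATS = {".srt", ".vtt"}


def output_kind(path: Path) -> str:
    """Classify a transcription output as 'text' or 'subtitle' by its extension."""
    suffix = path.suffix.lower()
    if suffix in TEXT_FORMATS:
        return "text"
    if suffix in SUBTITLE_FORMATS:
        return "subtitle"
    raise ValueError(f"Unsupported transcription format: {suffix!r}")


def translated_path(original: Path, target_lang: str) -> Path:
    """Place the translation next to the original, e.g. talk.srt -> talk.de.srt.

    The naming scheme is hypothetical; any scheme that keeps the original
    extension and avoids overwriting the source file would work.
    """
    return original.with_name(f"{original.stem}.{target_lang}{original.suffix}")
```

Keeping the original extension as the final suffix means downstream tools (players, editors) still recognize the file type of the translated output.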
Bulk Support
- User can select multiple transcriptions (using the checkboxes in the list view)
- All selected files are translated in batch to the selected target language
- Translation progress and status should be displayed per file
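A minimal sketch of per-file status tracking for the bulk case, assuming a hypothetical `translate_file(file_id, target_lang)` callable supplied by the application. The `Status` values and the `on_update` callback are illustrative names, not an existing API. One file failing does not abort the rest of the batch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Iterable, Optional


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class FileProgress:
    status: Status = Status.PENDING
    error: Optional[str] = None


def translate_bulk(
    file_ids: Iterable[str],
    target_lang: str,
    translate_file: Callable[[str, str], None],
    on_update: Optional[Callable[[str, FileProgress], None]] = None,
) -> Dict[str, FileProgress]:
    """Translate each selected file in turn and record a status per file."""
    progress = {fid: FileProgress() for fid in file_ids}
    for fid, entry in progress.items():
        entry.status = Status.RUNNING
        if on_update:
            on_update(fid, entry)  # e.g. push the new state to the list view
        try:
            translate_file(fid, target_lang)
            entry.status = Status.DONE
        except Exception as exc:  # keep going; report the failure for this file only
            entry.status = Status.FAILED
            entry.error = str(exc)
        if on_update:
            on_update(fid, entry)
    return progress
```

The sketch runs files sequentially for clarity; a job queue or worker pool could replace the loop without changing the per-file status model.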
Technical Notes
- Translation runs as a post-processing step (after transcription)
- Suggested on-prem translation tools:
  - Helsinki-NLP MarianMT models via HuggingFace Transformers
  - Argos Translate (Python, offline-capable)
- Translation must preserve:
  - Unicode integrity
  - Line structure and timing for subtitle files
- Subtitle text should retain block structure and timing of the original transcribed subtitle file
- Long documents should be segmented and translated in batches for efficiency
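The notes above could be combined roughly as in the sketch below, which translates only the text lines of an .srt file in batches and passes indices, timings, and blank lines through untouched. It assumes `\n` line endings and SRT-style timing lines (.vtt would need its own pattern). `translate_srt`, `chunked`, and `marian_translator` are hypothetical names. The MarianMT backend assumes the `transformers` package is installed and that a `Helsinki-NLP/opus-mt-<src>-<tgt>` model exists for the language pair, which is not true for every pair.

```python
import re
from typing import Callable, Iterator, List

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}[,.]\d{3}")

TranslateBatch = Callable[[List[str]], List[str]]


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_srt(srt_text: str, translate_batch: TranslateBatch, batch_size: int = 16) -> str:
    """Translate subtitle text line by line; block structure and timing stay as-is."""
    lines = srt_text.split("\n")
    text_positions = []
    in_text = False  # True between a timing line and the next blank line
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped:
            in_text = False
        elif TIMING.match(stripped):
            in_text = True
        elif in_text:
            text_positions.append(i)

    originals = [lines[i] for i in text_positions]
    translated: List[str] = []
    for batch in chunked(originals, batch_size):
        result = translate_batch(batch)
        if len(result) != len(batch):
            raise ValueError("translator must return one output per input line")
        translated.extend(result)

    for pos, new_text in zip(text_positions, translated):
        lines[pos] = new_text
    return "\n".join(lines)


def marian_translator(source_lang: str, target_lang: str) -> TranslateBatch:
    """Build a batch translator backed by a MarianMT model (optional dependency)."""
    from transformers import MarianMTModel, MarianTokenizer  # imported lazily

    name = f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)

    def translate_batch(texts: List[str]) -> List[str]:
        encoded = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**encoded)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate_batch
```

Usage would look like `translate_srt(srt_text, marian_translator("de", "en"))`. Translating line by line keeps the line structure exact but gives the model less context than translating a whole subtitle block at once; which trade-off is preferable is an open design question. An Argos Translate backend could be plugged in through the same `TranslateBatch` signature.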
Priority
Medium: to be implemented after core transcription and formatting workflows are stable