Summary
Improve the quality of automatically generated subtitles by introducing a language-aware post-processing step that refines block segmentation and line breaks.
The goal is to move beyond purely rule-based splitting and achieve more natural, readable subtitles that better follow linguistic structure.
Background
Current subtitle generation is primarily driven by deterministic rules such as:
- character limits
- reading speed
- timing and pauses
While this ensures technically valid subtitles, it often leads to:
- unnatural line breaks
- splitting of phrases that should stay together
- suboptimal grouping of text into blocks
- reduced readability, especially for longer or more complex speech
High-quality subtitle segmentation requires an understanding of language, not just timing and length.
Proposed Functionality
Introduce a refinement step that:
- reviews an already generated subtitle segmentation
- adjusts block boundaries where appropriate
- improves line breaks within blocks
- takes into account both timing and linguistic structure
This step should work conservatively:
- existing segmentation should be kept if already acceptable
- changes should only be made where they are a clear improvement
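A minimal sketch of the conservative behaviour described above, assuming some readability score is available. The names `choose_segmentation`, `score` and `min_gain` are hypothetical and not part of the existing pipeline; the point is only that the original segmentation wins unless the candidate is better by a clear margin.

```python
from typing import Callable, List

Block = List[str]  # one subtitle block, represented as its lines


def choose_segmentation(
    original: Block,
    candidate: Block,
    score: Callable[[Block], float],  # hypothetical readability score, higher is better
    min_gain: float,
) -> Block:
    """Keep the original unless the candidate beats it by at least min_gain."""
    if score(candidate) - score(original) >= min_gain:
        return candidate
    return original
```

With a threshold like this, small or ambiguous gains leave the rule-based output untouched, which keeps the refinement step predictable.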
Suggested Approach
To achieve language-aware segmentation, this feature may leverage a language model (e.g. an LLM or similar) capable of evaluating linguistic structure in context.
The model would act as a refinement layer on top of the existing rule-based segmentation, focusing on improving readability rather than generating new text.
The exact implementation is intentionally left open.
Important Constraints
- The original transcription must remain unchanged
- No words may be added, removed, or altered
- Only segmentation (blocks and line breaks) may be adjusted
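These constraints can be enforced mechanically, whatever produces the refined segmentation. The following is a sketch under the assumption that a segmentation is a list of blocks and each block is a list of line strings; `words_of` and `text_unchanged` are illustrative names, not existing functions.

```python
from typing import List

Segmentation = List[List[str]]  # blocks, each block a list of lines


def words_of(segmentation: Segmentation) -> List[str]:
    """Flatten a segmentation into its word sequence, ignoring all breaks."""
    return [
        word
        for block in segmentation
        for line in block
        for word in line.split()
    ]


def text_unchanged(before: Segmentation, after: Segmentation) -> bool:
    """True if both segmentations contain exactly the same words in the same order."""
    return words_of(before) == words_of(after)
```

A refined result that fails this check could simply be discarded in favour of the original segmentation.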
Linguistic Guidelines
The refinement step should aim to follow established subtitle and readability principles, including:
- Keep syntactic units together where possible
- Avoid splitting:
  - verb + particle (e.g. "ta upp", "gå igenom")
  - auxiliary + main verb
  - prepositional phrases
  - names and fixed expressions
- Prefer breaks at natural clause or sentence boundaries
- Avoid leaving very short trailing lines
- Aim for balanced line lengths within a block
- Ensure each subtitle block forms a coherent and readable unit
- Use pauses in speech as guidance for segmentation, but not as the sole deciding factor
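Some of these guidelines can be expressed as a simple penalty over candidate line breaks, which could serve as a baseline or as a sanity check on model output. The sketch below is only an illustration: the word list, the weights and the 42-character limit are all assumptions, and a real implementation would need language-specific data.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical, deliberately tiny English word list. Breaking right after one
# of these would separate an article, preposition or auxiliary from what follows.
NO_BREAK_AFTER = {"a", "an", "the", "of", "in", "on", "to", "with", "will", "has", "have", "is"}


def break_penalty(words: List[str], i: int, max_chars: int = 42, min_tail_chars: int = 10) -> float:
    """Penalty for splitting into words[:i] / words[i:]; lower is better."""
    first, second = " ".join(words[:i]), " ".join(words[i:])
    if len(first) > max_chars or len(second) > max_chars:
        return math.inf  # violates the line length limit
    penalty = float(abs(len(first) - len(second)))  # prefer balanced lines
    if len(second) < min_tail_chars:
        penalty += 20  # avoid very short trailing lines
    if words[i - 1].lower() in NO_BREAK_AFTER:
        penalty += 30  # avoid splitting inside a syntactic unit
    if words[i - 1][-1] in ",.;:?!":
        penalty -= 15  # prefer breaks at clause or sentence boundaries
    return penalty


def best_break(words: List[str], max_chars: int = 42) -> Optional[Tuple[str, str]]:
    """Return the two-line split with the lowest penalty, or None if none fits."""
    scored = [(break_penalty(words, i, max_chars), i) for i in range(1, len(words))]
    valid = [(p, i) for p, i in scored if p != math.inf]
    if not valid:
        return None
    _, i = min(valid)
    return " ".join(words[:i]), " ".join(words[i:])
```

For the sentence "We will go through the plan, and then we take questions", this picks the break after the comma rather than after "the", since the comma break is both balanced and at a clause boundary.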
Expected Improvements
- More natural line breaks
- Better grouping of words into readable units
- Reduced need for manual editing
- Subtitles that align better with established captioning practices
Acceptance Criteria
- Subtitles remain technically valid (timing, line limits, etc.)
- No changes to the underlying text content
- Improved readability compared to current output
- Clear reduction in manual corrections needed in the editor
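The first criterion lends itself to an automated check that runs after refinement. The sketch below assumes a simple block structure and illustrative limits (2 lines, 42 characters per line, 17 characters per second); the actual limits would come from the existing rule-based configuration, and `Block` and `violations` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    start: float  # seconds
    end: float    # seconds
    lines: List[str]


def violations(
    block: Block,
    max_lines: int = 2,      # assumed limit
    max_chars: int = 42,     # assumed limit per line
    max_cps: float = 17.0,   # assumed reading speed limit, characters per second
) -> List[str]:
    """Return a list of rule violations for one block; empty means valid."""
    problems = []
    if len(block.lines) > max_lines:
        problems.append("too many lines")
    if any(len(line) > max_chars for line in block.lines):
        problems.append("line too long")
    duration = block.end - block.start
    chars = sum(len(line) for line in block.lines)
    if duration <= 0:
        problems.append("non-positive duration")
    elif chars / duration > max_cps:
        problems.append("reading speed too high")
    return problems
```

A refined block with any violation could be rejected and replaced with the original rule-based block, so the refinement step never makes the output technically invalid.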
Notes
This feature is intended as a quality improvement layer on top of the existing subtitle generation pipeline, not a replacement for it.
Implementation details (e.g. model choice, architecture, prompt design) are intentionally left open.