The Caption Processor is a command-line utility for processing SMPTE timecode-based captions. It reads an input file containing timecodes, speaker names, and speech text, and writes a formatted Markdown file with the following modifications:
- Removes all lines containing timecode.
- Formats speaker names in bold followed by a colon.
- Consolidates speech from the same speaker into a single block.
- Converts the first character of appended text to lowercase if the previous text does not end with a period.
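The consolidation and formatting rules above can be sketched in Python. This is a minimal, hypothetical sketch, not the code in `src/main.py`. The names `consolidate` and `format_blocks` are illustrative. It assumes captions arrive already parsed as `(speaker, text)` pairs, that "same speaker" means consecutive captions, and that bold is rendered with Markdown `**...**` with the colon after the bold name.

```python
# Hypothetical helpers illustrating the rules; not the project's actual API.


def consolidate(captions):
    """Merge consecutive (speaker, text) pairs that share a speaker.

    Returns a list of [speaker, text] blocks.
    """
    blocks = []
    for speaker, text in captions:
        if blocks and blocks[-1][0] == speaker:
            previous = blocks[-1][1]
            # Lowercase the first character of the appended text unless
            # the text it is being appended to ends with a period.
            if text and not previous.endswith("."):
                text = text[0].lower() + text[1:]
            blocks[-1][1] = f"{previous} {text}".strip()
        else:
            blocks.append([speaker, text])
    return blocks


def format_blocks(blocks):
    """Render each block as a bold speaker name, a colon, then the speech."""
    return "\n\n".join(f"**{speaker}**: {text}" for speaker, text in blocks)


captions = [
    ("Speaker 1", "I'm going to tell you something very important"),
    ("Speaker 1", "But I think you'll have to wait a while before I tell you"),
    ("Speaker 2", "This is very frustrating."),
]
print(format_blocks(consolidate(captions)))
```

Applied literally, the lowercase rule also affects a leading "I" (for example, "I think" would become "i think"), so a real implementation may need an exception for it.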
To install the required dependencies, run:
pip install -r requirements.txt
The input is a .txt file formatted as follows:
00:00:00:00 - 00:00:22:19
Speaker 1
I'm going to tell you something very important
00:00:22:20 - 00:00:27:22
Speaker 1
but I think you'll have to wait a while before I tell you
00:00:27:24 - 00:00:30:02
Speaker 2
This is very frustrating.
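One way to read this format is the hypothetical `parse_captions` sketch below. It assumes that each caption is a timecode-range line, followed by one speaker line, followed by one or more speech lines. The `TIMECODE_RANGE` pattern assumes colon-separated `HH:MM:SS:FF` values, exactly as in the sample. Drop-frame timecode written with `;` before the frame count would not match.

```python
import re

# Hypothetical pattern for a line like "00:00:22:20 - 00:00:27:22".
# Only colon-separated HH:MM:SS:FF values are matched (as in the sample).
TIMECODE_RANGE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:\d{2}\s*-\s*\d{2}:\d{2}:\d{2}:\d{2}$"
)


def parse_captions(text):
    """Return (speaker, speech) tuples, dropping timecode-range lines."""
    # Strip whitespace and skip blank lines.
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]
    captions = []
    i = 0
    while i < len(lines):
        if TIMECODE_RANGE.match(lines[i]):
            i += 1
            continue
        # The first non-timecode line of a caption is the speaker name.
        speaker = lines[i]
        i += 1
        speech = []
        # Everything up to the next timecode line is speech text.
        while i < len(lines) and not TIMECODE_RANGE.match(lines[i]):
            speech.append(lines[i])
            i += 1
        captions.append((speaker, " ".join(speech)))
    return captions


sample = """00:00:00:00 - 00:00:22:19
Speaker 1
I'm going to tell you something very important
00:00:22:20 - 00:00:27:22
Speaker 1
but I think you'll have to wait a while before I tell you
00:00:27:24 - 00:00:30:02
Speaker 2
This is very frustrating.
"""
print(parse_captions(sample))
```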
To run the utility, use the following command in the terminal:
python src/main.py <input_file> <output_file>.md
Replace <input_file> with the path to your input file containing the SMPTE Timecode-based captions, and <output_file> with the desired path for the output file.
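The two positional arguments map naturally onto Python's `argparse`. The sketch below is an assumption about how such an entry point could be wired, not the contents of `src/main.py`. `transform` is a hypothetical placeholder that only drops timecode-range lines and stands in for the full processing.

```python
import argparse
import re
from pathlib import Path

# Hypothetical pattern for SMPTE timecode-range lines (HH:MM:SS:FF - HH:MM:SS:FF).
TIMECODE_RANGE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:\d{2}\s*-\s*\d{2}:\d{2}:\d{2}:\d{2}$"
)


def transform(text):
    """Placeholder processing step: removes timecode-range lines only."""
    kept = [line for line in text.splitlines() if not TIMECODE_RANGE.match(line.strip())]
    return "\n".join(kept) + "\n"


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Process SMPTE timecode-based captions."
    )
    parser.add_argument("input_file", type=Path, help="caption .txt file to read")
    parser.add_argument("output_file", type=Path, help="path of the .md file to write")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv
    text = args.input_file.read_text(encoding="utf-8")
    args.output_file.write_text(transform(text), encoding="utf-8")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```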
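To install the utility as a package with pipx (installed here via Homebrew), run:
brew install pipx
pipx ensurepath
Then change into this application's directory and run:
pipx install .
To reinstall over an existing installation, run:
pipx install --force .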
To run the tests, navigate to the tests directory and execute:
pytest
This will run all unit tests defined in the test_processor.py file.
Contributions are welcome! Please feel free to submit a pull request or open an issue for any enhancements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.