Web-based Interrogation and Testimony via a Neural Engaged Speech System
W.I.T.N.E.S.S. is a fully offline, web-based simulation platform designed to emulate realistic police witness interviews. It combines a local language model (via Ollama) with open-source speech technology including Piper TTS and Faster-Whisper STT, producing authentic dialogue interactions between an interviewer and a virtual witness persona.
All AI logic, persona data, and audio processing occur locally: no external APIs or cloud services are required. The system features both text-based and voice-based interview modes with full speech-to-text and text-to-speech capabilities.
Developed on an M1 Mac using locally-run software. There is significant potential to enhance it further on other platforms using alternate software and internet-enabled services (such as cloud TTS/STT and LLM APIs).
witness-promo.mp4
Static site served by FastAPI from /frontend/.
All pages use a black background with white text (no light/dark mode).
| Page | Function |
|---|---|
| `index.html` | Home page with persona upload/validation before navigation to Create, Edit, or Interview modes. |
| `create-persona.html` | Create a new persona from template with inline field validation. |
| `edit-persona.html` | Edit an existing persona file with validation, export, and cancel safeguards. |
| `conduct-interview.html` | Main interaction page: conducts text or voice-based interviews with real-time audio. |
| `admin/index.html` | Admin dashboard providing access to system utilities (hidden from main navigation). |
| `admin/system-check.html` | Dependency version checker showing installed Python packages and system components. |
| `admin/voice-demos.html` | Voice preview tool for testing all available Piper TTS voices. |
| `admin/custom-dictionary.html` | Spell checker configuration for custom words and suspicious word warnings. |
```
frontend/
├── index.html
├── create-persona.html
├── edit-persona.html
├── conduct-interview.html
├── admin/
│   ├── index.html
│   ├── system-check.html
│   ├── voice-demos.html
│   └── custom-dictionary.html
├── styles.css
├── images/
└── temp_audio/    (auto-generated for TTS output)
```
Production-ready persona files for testing and demonstration:
```
sample_data/
├── Sally-Ann-Smith-1993-04-12.json
├── Rick-Sampson-1987-02-16.json
├── Mereana-Rangi-1990-06-14.json
├── Liam-O'Connor-1985-12-02.json
├── Priya-Patel-1993-09-07.json
├── Alex-Taylor-1978-04-19.json
└── Rob-McFlinty-2004-05-07.json
```
Each persona includes detailed background, facts with certainty/reason fields, and behavioral modifiers.
Located in /backend/, powered by FastAPI (served via uvicorn).
| File | Purpose |
|---|---|
| `api.py` | Main FastAPI app with all endpoints, text normalization functions (NZ spelling, TTS, Unicode safety), persona validation (4-stage check + spell checking), and route handlers. Runs on port 8010. |
| `models/llm/llm_handler.py` | Hybrid deterministic + LLM response system. 30+ fact extraction functions, conversation memory, post-processing guardrails. Communicates with local Ollama. |
| `models/audio/audio_handler.py` | Speech-to-Text (Faster-Whisper) and Text-to-Speech (Piper) integrations with adaptive sample rate handling. |
| `persona_template.json` | JSON template defining required and optional persona fields. Used for validation and creation workflows. |
| `custom_dictionary.txt` | User-editable spell checker dictionary for persona names, NZ terms, and technical vocabulary. Hot-reloads on save. |
| `suspicious_words.txt` | List of valid words that should trigger warnings (e.g., common typos). Hot-reloads on save. |
```
backend/models/
├── audio/
│   ├── audio_handler.py
│   ├── stt/
│   │   └── whisper/          (Faster-Whisper cache)
│   └── tts/
│       └── piper-voices/     (ONNX voice models + configs)
├── common/
│   └── utils.py
└── llm/
    └── llm_handler.py
```
All personas are .json files following the persona_template.json structure.
- `persona_type` (Witness/Suspect)
- `persona_voice_model` (ONNX filename)
- `persona_voice_speaker_id` (integer or null)
- `full_name`
- `date_of_birth`
- `home_address`
- `interview_instructions`
- `persona_prompt`
Each fact in facts_to_provide array contains:
- `fact` - The information the persona knows
- `certainty` - Confidence level (certain/pretty sure/unsure)
- `reason` - Why they have that certainty level
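Concretely, a minimal persona skeleton might look like this in Python. Field names follow `persona_template.json`; all values, including the voice model filename, name, and address, are purely illustrative:

```python
# Minimal persona skeleton; field names follow persona_template.json,
# values are illustrative only.
persona = {
    "persona_type": "Witness",
    "persona_voice_model": "en_GB-alba-medium.onnx",  # example filename
    "persona_voice_speaker_id": None,
    "full_name": "Sally-Ann Smith",
    "date_of_birth": "1993-04-12",
    "home_address": "12 Example Street, Wellington",  # illustrative
    "interview_instructions": "Answer only what you are asked.",
    "persona_prompt": "You are Sally-Ann Smith, a witness to a burglary.",
    "facts_to_provide": [
        {
            "fact": "The car was a red hatchback.",
            "certainty": "pretty sure",
            "reason": "I only saw it for a few seconds in the dark.",
        }
    ],
}

# Every fact must carry all three fields.
for f in persona["facts_to_provide"]:
    assert {"fact", "certainty", "reason"} <= set(f)
```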
| Condition | Message |
|---|---|
| Wrong schema | 🔴 "Sorry, this is not a valid W.I.T.N.E.S.S. persona file." |
| Missing key fields | 🔴 "Sorry, this W.I.T.N.E.S.S. persona file is missing required fields and cannot be used." |
| All fields empty | 🔴 "Sorry, this W.I.T.N.E.S.S. persona file has no information in it and cannot be used." |
| Some fields empty | |
| Spelling warnings | |
| Valid and complete | ✅ Displays name, DoB, address with Proceed/Cancel options. |
All validation messages appear inline under the relevant action buttons. All content uses New Zealand English conventions (e.g., "111" emergency number, metric units).
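The staged checks can be sketched as follows. This is a simplified illustration of the logic described in the table above, not the actual code in `backend/api.py` (which also runs spell checking); the exact set of required fields checked here is an assumption:

```python
REQUIRED_FIELDS = [
    "persona_type", "persona_voice_model", "full_name", "date_of_birth",
    "home_address", "interview_instructions", "persona_prompt",
]

def validate_persona(data):
    """Staged validation sketch: schema -> required keys -> emptiness -> OK."""
    # Stage 1: is it even a persona-shaped dict?
    if not isinstance(data, dict) or "facts_to_provide" not in data:
        return False, "Sorry, this is not a valid W.I.T.N.E.S.S. persona file."
    # Stage 2: are all required keys present?
    if any(field not in data for field in REQUIRED_FIELDS):
        return False, ("Sorry, this W.I.T.N.E.S.S. persona file is missing "
                       "required fields and cannot be used.")
    # Stage 3: is there any content at all?
    if all(not data[field] for field in REQUIRED_FIELDS):
        return False, ("Sorry, this W.I.T.N.E.S.S. persona file has no "
                       "information in it and cannot be used.")
    # Valid: surface identity details for the Proceed/Cancel prompt.
    return True, f"{data['full_name']} ({data['date_of_birth']})"
```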
The system includes an integrated spell checker for persona files:
- Custom Dictionary (`backend/custom_dictionary.txt`) - Add persona names, NZ place names, Māori terms, and technical vocabulary
- Suspicious Words (`backend/suspicious_words.txt`) - Flag valid words that are commonly typos (e.g., "dunner" for "dinner")
- Curly Quote Normalization - Automatically handles both straight (`'`) and curly (`’`) apostrophes
- Export Normalization - All exported persona files use straight quotes to prevent compatibility issues
- Admin Interface - Manage dictionaries via `admin/custom-dictionary.html` with hot-reload on save
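A minimal sketch of the curly-quote normalisation step (the real implementation lives in `backend/api.py` and may differ):

```python
# Map typographic quotes to their straight equivalents before
# spell checking and export.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
})

def normalise_quotes(text: str) -> str:
    return text.translate(CURLY_TO_STRAIGHT)

print(normalise_quotes("Liam O\u2019Connor said \u201chello\u201d"))
# Liam O'Connor said "hello"
```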
The LLM handler is the reasoning engine of W.I.T.N.E.S.S.
- Builds context-aware prompts using persona data.
- Maintains dialogue memory and contextual consistency.
- Enforces strict persona immersion (no "I am an AI" statements).
- Handles contradiction resolution, direction/distance reasoning, and object/scene coherence.
- Uses local Ollama instance for model execution.
OLLAMA_BASE_URL=http://localhost:11434
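As a sketch of how a client might call that local Ollama endpoint (the real prompt construction in `llm_handler.py` is far richer; the persona text and question here are illustrative):

```python
import json
import os
import urllib.request

OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
MODEL = os.environ.get("OLLAMA_MODEL", "llama3.2:3b-instruct-q4_K_M")

def build_payload(persona_prompt: str, history: list) -> dict:
    """Assemble a non-streaming /api/chat request: system prompt from the
    persona, then the full message history (the backend is stateless)."""
    return {
        "model": MODEL,
        "stream": False,
        "messages": [{"role": "system", "content": persona_prompt}, *history],
    }

if __name__ == "__main__":
    payload = build_payload(
        "You are a witness named Sally-Ann Smith. Stay in character.",
        [{"role": "user", "content": "Where were you at 9 pm?"}],
    )
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["message"]["content"])
```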
```
uvicorn backend.api:app --host 0.0.0.0 --port 8010
```

- Press-to-Talk - Browser-based microphone recording with visual feedback
- STT Pipeline - WebM/Opus upload → WAV conversion → Faster-Whisper transcription
- TTS Pipeline - Piper synthesis with persona-specific voices and adaptive resampling
- End-to-End Voice - Complete speech-to-speech interview workflow
- Text Normalization - NZ English spelling, TTS number/date handling, Unicode safety
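The WebM → WAV conversion step of the STT pipeline can be sketched as a plain ffmpeg invocation. This is an illustration, not the actual `audio_handler.py` code; the 16 kHz mono target is a common choice for Whisper-family models, assumed here:

```python
import subprocess

def webm_to_wav_cmd(src: str, dst: str, rate: int = 16000) -> list:
    """Build the ffmpeg command that decodes a browser-recorded
    WebM/Opus clip to mono PCM WAV at the given sample rate."""
    return [
        "ffmpeg", "-y",    # overwrite output without asking
        "-i", src,         # WebM/Opus input from the browser upload
        "-ar", str(rate),  # resample (16 kHz assumed for Whisper)
        "-ac", "1",        # downmix to mono
        dst,
    ]

if __name__ == "__main__":
    subprocess.run(webm_to_wav_cmd("clip.webm", "clip.wav"), check=True)
```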
Voice model files (.onnx + .onnx.json) are NOT INCLUDED in this repository due to their size. Place downloaded models in backend/models/audio/tts/piper-voices/. See the README.md and piper_curated_audio.csv in that folder for the list of voices used in v1.0 and the URLs from where you can download the voice files.
```
witness/
├── backend/
├── frontend/
├── sample_data/
├── venv/
├── start.sh
└── requirements.txt
```
```
# 1. Create and activate venv (Python 3.10 recommended)
python3.10 -m venv venv
source venv/bin/activate

# 2. Install Python dependencies
pip install -r requirements.txt
```

3. Install system dependencies
macOS (Homebrew):

```
brew install ollama ffmpeg espeak-ng
```

Linux - Ubuntu/Debian:

```
curl -fsSL https://ollama.ai/install.sh | sh
sudo apt install ffmpeg espeak-ng libespeak-ng-dev
```

Linux - Fedora/RHEL:

```
curl -fsSL https://ollama.ai/install.sh | sh
sudo dnf install ffmpeg espeak-ng espeak-ng-devel
```

Note for macOS users: If Homebrew reports permission errors during install (e.g. for `/opt/homebrew/share/zsh`), fix them before continuing:

```
sudo chown -R $(whoami) /opt/homebrew/share/man/man8 /opt/homebrew/share/zsh /opt/homebrew/share/zsh/site-functions
brew upgrade
```
```
# 4. Pull Ollama model
ollama pull llama3.2:3b-instruct-q4_K_M

# 5. Set environment variables (optional - start.sh sets sensible defaults)
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_MODEL=llama3.2:3b-instruct-q4_K_M
export PIPER_VOICES_DIR="$PWD/backend/models/audio/tts/piper-voices"
```

The espeak-ng library path must match your platform. start.sh defaults to the macOS Homebrew paths; Linux users should override before running:

macOS (Homebrew, Apple Silicon):

```
export PHONEMIZER_ESPEAK_LIBRARY=/opt/homebrew/lib/libespeak-ng.dylib
export ESPEAK_DATA_PATH=/opt/homebrew/share/espeak-ng-data
```

Linux (x86_64):

```
export PHONEMIZER_ESPEAK_LIBRARY=/usr/lib/x86_64-linux-gnu/libespeak-ng.so.1
export ESPEAK_DATA_PATH=/usr/lib/x86_64-linux-gnu/espeak-ng-data
```

Linux (ARM64 - e.g. Raspberry Pi):

```
export PHONEMIZER_ESPEAK_LIBRARY=/usr/lib/aarch64-linux-gnu/libespeak-ng.so.1
export ESPEAK_DATA_PATH=/usr/lib/aarch64-linux-gnu/espeak-ng-data
```

Recommended (Unified Startup):
```
./start.sh
```

This script handles Ollama startup and launches the FastAPI server.
Manual Startup:
```
# Start Ollama separately, then:
uvicorn backend.api:app --host 0.0.0.0 --port 8010 --reload
```

- Main Interface: http://localhost:8010/
- Admin Dashboard: http://localhost:8010/admin/
No internet connection is required during operation, with the exception of the Admin Dashboard's System Check feature.
- NZ English mandatory - Use NZ spelling and context (metres, licence, 111, favour, colour)
- No embedded personas - Require user upload each session; personas stored in sessionStorage only
- Scenario-agnostic design - All fact extraction uses generic patterns; no hardcoded scenario references
- Facts-first philosophy - Attempt deterministic extraction before LLM calls
- Text normalization required - Apply `normalize_en_nz()`, `normalize_tts_nz()`, and `normalize_unicode_safe()` to all outputs
- Stateless backend - Always send full message history; no server-side session state
- Three-field facts - All facts must include: fact, certainty, reason
- CSS centralized - All styles in `styles.css` unless inline specifically required
- Frontend = static HTML/JS; Backend = FastAPI only - Clear separation of concerns
- This project is licensed under the GNU General Public Licence v3.0 (GPL v3).
- Voice models are not distributed with this repository; only the metadata CSV (`piper_curated_audio.csv`) is included, with download URLs for each voice used.
- See the `LICENSE` file in the root of the repository for full licence terms.
- Python: 3.10.19 (3.10 recommended for stability)
- LLM: Ollama (llama3.2:3b-instruct-q4_K_M)
- TTS: Piper TTS 1.4.1 (ONNX voice models)
- STT: Faster-Whisper 1.2.1 (small.en with int8 quantization)
- Inference Engine: CTranslate2 4.6.3
- Backend: FastAPI 0.129.0 + Uvicorn 0.41.0 + Pydantic 2.12.5
- ML Framework: PyTorch 2.7.1
- Frontend: Vanilla HTML/CSS/JavaScript
- Audio: ffmpeg, espeak-ng (phonemizer)
- Platform: Developed on macOS M1 (Apple Silicon)
For full functionality, use a modern browser that supports the WebRTC and MediaRecorder APIs:
- Google Chrome (recommended)
- Mozilla Firefox
Voice mode (microphone recording) requires browser permission for localhost:8010. Safari has limited support for the audio formats used and is not recommended.
- Make sure Ollama is installed: `ollama --version`
- Make sure the correct model is downloaded: `ollama list`
- Check the terminal output from `start.sh` for error messages
- Visit the System Check page: http://localhost:8010/admin/system-check.html
Large ML packages (PyTorch, Transformers, Faster-Whisper) take longer to load on first launch than on subsequent runs. If start.sh reports "Timed out waiting for FastAPI to become healthy" with an empty log, this is the most likely cause.
- Wait a few seconds after `pip install` finishes, then run `./start.sh` again; the second run is significantly faster
- Check what FastAPI is actually doing: `cat /tmp/witness_fastapi.log`
- On macOS, ensure Homebrew finished upgrading cleanly (no permission errors) before running `./start.sh`; see the installation note above
- Check that your browser has microphone permission for localhost:8010
- Make sure you are pressing and holding the microphone button while speaking
- Ensure voice model files (`.onnx` + `.onnx.json`) are present in `backend/models/audio/tts/piper-voices/`
- Use Chrome or Firefox; Safari has limited MediaRecorder support
- Check the file is a `.json` file exported from W.I.T.N.E.S.S. (or matching the template structure)
- Ensure all 7 required fields are completed and non-empty
- Review any spelling warnings; these do not prevent validation but may indicate data entry errors
- Review the `persona_prompt`; make sure it is specific and detailed
- Add more facts to `facts_to_provide`, particularly for details the persona should know precisely
- Check that certainty and reason fields are completed for each fact
- Ensure the interview has at least one exchange before exporting
- Check that `python-docx` is installed: visible on the System Check page
Developed by Philip Roy. Architectural and technical development support by ChatGPT (OpenAI) and Claude (Anthropic). All design and testing localised for the New Zealand English environment.