Protecting you during the call, not after. When a scammer asks for your OTP, VoxGuard steps in before you hand it over.
To our knowledge, the first real-time multimodal scam detection agent with active intervention. Gemini Live API + Rust WASM + Psychological AI + Natural Voice TTS = Protection in under 80ms.
TL;DR: Open the demo, click START, pick a demo script, watch real-time scam detection happen, then watch the system intervene and block you from giving away your money - with a natural human voice warning you in your language.
1. Open https://voxguard-kappa.vercel.app
2. Click the "MONITOR" tab (default view)
3. Click "START"
4. Click any Demo Script (e.g., "Bank Impersonation")
5. Watch: alerts fire in real-time, threat score rises, psych vectors light up
6. When the caller asks for your OTP -> INTERVENTION OVERLAY fires automatically
7. HEAR: VoxGuard speaks a natural voice warning via Gemini TTS
8. SEE: Multimodal Explanation Card shows WHY this is a scam (audio + visual signals)
9. Take the Verification Challenge or hit SAFE EXIT
10. SEE: Guided Action Agent gives you step-by-step recovery actions for your country
11. Click "REPORT" tab -> see forensic report with intervention history -> export PDF
| Tab | What It Demonstrates |
|---|---|
| MONITOR | 2-way dialog (ME + CALLER), waveform, <80ms alerts, caller HUD, live intervention overlay, natural voice warnings, explanation cards |
| PSYCH | 6 Cialdini vectors + 5 lie detection indicators + user vulnerability (world first) |
| PATTERNS | 50+ grounded patterns with fullscreen detail view + interpretation |
| REPORT | Full transcript, intervention history, guided action plan, forensic export (PDF + HTML), session gallery |
| ABOUT | Architecture + data sources + why this is unprecedented |
Every other tool either blocks calls before they connect or sends a passive alert after the damage is done. VoxGuard is, to our knowledge, the first system that protects you while the scammer is talking - with a natural voice that warns you, an AI that explains why this is dangerous, and an agent that walks you step-by-step to safety.
Every 30 seconds, someone somewhere in the world loses money to a phone or video call scam.
Last year, my neighbor's father wired $12,000 to someone impersonating a bank representative. He knew scams were everywhere. He had seen the warnings, read the advice, and understood the basics. But when the caller said his account would be frozen within ten minutes and asked for his one-time password, he handed it over immediately.
I kept thinking about that moment. Not because of the money, but because of what it revealed: most anti-scam advice fails at the exact moment it matters. He had a phone, life experience, and enough caution to know better. None of that helped in the thirty seconds when pressure, fear, and urgency took over.
And that gap is far bigger than one family. According to the FBI's IC3 2024 Annual Report, reported internet crime losses in the United States hit $16.6 billion in 2024. In its 2024 report, the Global Anti-Scam Alliance estimated that consumers worldwide lost more than $1.026 trillion to scams.
So I built VoxGuard: a real-time multimodal AI agent that listens to live conversations, detects scam patterns as they emerge, and intervenes before the damage is done. Not after the call. Not the next day. Right at the moment the scammer asks for your one-time password.
Every existing tool shares one fatal flaw: none of them acts during the call itself.
"The difference between a scam succeeding and failing is often a single moment of doubt. VoxGuard creates that moment, then forces you to think before you act."
Wiqi Lee
| Feature | Truecaller | Hiya | ScamShield (SG) | VoxGuard |
|---|---|---|---|---|
| Pre-call blocking | Yes | Yes | Yes | No (by design) |
| During-call analysis | No | No | No | Yes (First) |
| Live intervention (blocks fatal actions) | No | No | No | Yes (First) |
| Natural voice intervention (Gemini TTS) | No | No | No | Yes (First) |
| Multimodal explanation cards (audio + vision) | No | No | No | Yes (First) |
| Guided anti-scam action agent | No | No | No | Yes (First) |
| Scenario-based verification challenge | No | No | No | Yes (First) |
| Auto-disconnect countdown | No | No | No | Yes (First) |
| Multimodal (audio + vision) | No | No | No | Yes (First) |
| 2-way transcript (ME + CALLER) | No | No | No | Yes (First) |
| Screen share scam detection | No | No | No | Yes (First) |
| Sub-100ms alert latency | No | No | No | Yes (Rust WASM) |
| Psychological manipulation scoring | No | No | No | Yes (First) |
| Lie detection analysis | No | No | No | Yes (First) |
| User vulnerability scoring | No | No | No | Yes (First) |
| Multi-language support | Partial | Partial | SG only | Yes, 40 languages |
| Per-country recommended actions | No | No | No | Yes (9 countries) |
| Country-specific emergency contacts + reporting | No | No | No | Yes (9 countries) |
| Intervention history in forensic reports | No | No | No | Yes (First) |
| Session gallery with playback | No | No | No | Yes |
| Audio session recording (REC) | No | No | No | Yes (First) |
| Forensic export (PDF + HTML) | No | No | No | Yes |
| Grounded to global scam databases | No | Partial | Partial | Yes |
| Works on any call platform (desktop + mobile) | No | No | No | Yes (browser-based, responsive) |
VoxGuard operates in two modes that can run simultaneously:
Open voxguard-kappa.vercel.app, click START, and select a demo script.
VoxGuard plays a simulated scam conversation with realistic 2-way dialog (ME + CALLER), fires real-time alerts, and triggers live interventions when danger is detected. Demo Mode runs entirely in the browser: no microphone, no backend, no API key required. This is the fastest way for judges to see the full detection -> intervention -> action agent flow.
Available Demo Scripts: Bank Impersonation, Tech Support Scam, Government/Tax Scam, AI Voice Clone, Digital Arrest, Job Offer Scam, Family Emergency, plus regional variants in Indonesian, Chinese, Japanese, Korean, Spanish, French, Hindi, Arabic, Malay, and Portuguese.
Click START, then click the LIVE MIC button. Grant microphone permission.
VoxGuard captures your actual microphone audio at 16kHz Mono PCM via the Web Audio API. Audio is preprocessed through the Rust WASM engine (Wiener noise reduction, spectral subtraction, VAD) and streamed to the backend through WebSocket. The backend sends audio to Gemini for real-time transcription and threat analysis.
How to use with a real call:
- Click START to begin a session
- Click LIVE MIC and grant microphone permission
- A green status banner appears: "LIVE MICROPHONE - Real audio capture active"
- Put your phone on speaker and place it near your device, or use the same device for both the call and VoxGuard
- VoxGuard listens to the live call and detects scam patterns as they happen
- Optionally click REC to record session audio into the forensic report
- Click MIC ON again to disconnect the microphone
Requirements: Live Mode requires a running backend on Google Cloud Run with a valid Gemini API key (VITE_WS_URL environment variable pointing to the backend WebSocket endpoint).
Live Mic + Demo Scripts: You can run a demo script while Live Mic is active. The demo provides simulated dialog and alerts, while the microphone captures real ambient audio in parallel. This is useful for demonstrations where you want to show both the scripted flow and the live microphone capability side by side.
Live Mic + REC: Enable both Live Mic and REC to record your actual call audio into the forensic report. REC is only available when Live Mic is active because it records directly from the microphone. This is the recommended setup for real-world use.
No existing tool we surveyed does this. This is what sets VoxGuard apart.
When VoxGuard's threat engine determines you are about to take an irreversible action, it does not wait for you to check a dashboard. It takes over your screen, speaks to you in a natural human voice, explains exactly why this is dangerous, and forces a decision point.
| Level | Trigger | What Happens |
|---|---|---|
| WARN | Threat score crosses 55, or a high-risk manipulation pattern is detected | Amber warning banner. Natural voice: calm, advisory tone. Verify Caller + Safe Exit + Continue With Caution. Speech pauses. |
| BLOCK | Threat score crosses 75, or the caller requests OTP / account credentials / gift cards / crypto transfer | Full-screen red overlay. Natural voice: firm, urgent tone. Fatal patterns (OTP/transfer/crypto): Safe Exit only. Verifiable patterns: Verification Challenge + Safe Exit. |
| LOCKDOWN | Threat score crosses 90. Confirmed scam with maximum confidence. | Full-screen red lockdown with 30-second auto-disconnect countdown. Natural voice: commanding, sharp tone. Safe Exit only. No challenge - too dangerous. |
Some patterns are so dangerous that VoxGuard does not wait for the threat score to accumulate. These high-lethality patterns trigger an immediate BLOCK-level intervention the moment they are detected, regardless of cumulative score:
- OTP / Credential Extraction ("Read me the code", "Confirm your PIN") -> Safe Exit only
- Safe Account Transfer ("Transfer your funds to this protection account") -> Safe Exit only
- Gift Card Demand ("Purchase prepaid cards and read me the numbers") -> Safe Exit only
- Crypto Transfer Scam ("Send Bitcoin to this wallet address") -> Safe Exit only
These fatal patterns skip the Verification Challenge entirely - when someone is actively extracting your credentials, the only safe action is to disconnect.
This works across all supported languages. If the caller asks for your OTP in Indonesian, Chinese, Japanese, Korean, Spanish, French, Hindi, or Arabic, VoxGuard blocks it instantly.
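The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `threat_engine.py`: the pattern identifiers, the `Intervention` type, and the choice of `>=` for "crosses" are assumptions, and the pattern-based WARN trigger from the table is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Score thresholds from the intervention table (55 / 75 / 90).
WARN_SCORE, BLOCK_SCORE, LOCKDOWN_SCORE = 55, 75, 90

# Hypothetical identifiers for the four high-lethality patterns listed above.
FATAL_PATTERNS = {
    "otp_extraction",
    "safe_account_transfer",
    "gift_card_demand",
    "crypto_transfer",
}


@dataclass(frozen=True)
class Intervention:
    level: str             # "WARN", "BLOCK", or "LOCKDOWN"
    allow_challenge: bool  # False means Safe Exit is the only option offered


def decide_intervention(score: float, pattern_id: Optional[str] = None) -> Optional[Intervention]:
    """Return the intervention to fire, or None if the call looks safe so far."""
    if score >= LOCKDOWN_SCORE:
        return Intervention("LOCKDOWN", allow_challenge=False)
    if pattern_id in FATAL_PATTERNS:
        # Fatal patterns bypass score accumulation and skip the challenge.
        return Intervention("BLOCK", allow_challenge=False)
    if score >= BLOCK_SCORE:
        return Intervention("BLOCK", allow_challenge=True)
    if score >= WARN_SCORE:
        return Intervention("WARN", allow_challenge=True)
    return None
```

Under these assumptions, `decide_intervention(10, "otp_extraction")` fires a BLOCK with Safe Exit only even though the cumulative score is still low, which is the behavior the fatal-pattern list describes.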
VoxGuard doesn't just show you a warning - it speaks to you.
When an intervention fires, gemini-2.5-flash-preview-tts generates a natural human voice warning that matches the detected scam type, the urgency level, and the user's language. This is not robotic text-to-speech - it sounds like a real person warning you.
| Level | Voice Profile | Example |
|---|---|---|
| WARN | Calm, advisory (Kore) | "Caution. VoxGuard has detected suspicious patterns in this call. The caller may not be who they claim to be." |
| BLOCK | Firm, authoritative (Puck) | "Stop immediately. The caller is asking for your one-time password. A real bank will never ask for this. Hang up now." |
| LOCKDOWN | Sharp, commanding (Charon) | "Emergency. VoxGuard has confirmed this is a scam. This call will disconnect in 30 seconds." |
Voice warnings are:
- Contextual - different scripts for bank impersonation, OTP extraction, gift card demand, government scam, tech support, and more
- Localized - fully scripted in 9 languages (EN, ID, ZH, JA, KO, ES, FR, HI, AR)
- Graceful fallback - if Gemini TTS is unavailable, falls back to browser speech synthesis automatically
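The voice selection and fallback behavior can be sketched as follows. This is an illustrative outline only: `speak_warning` and the `gemini_tts` callable are hypothetical names standing in for whatever `tts_service.py` actually exposes, and the returned dictionary shape is an assumption about what the browser would need in order to fall back to its own speech synthesis.

```python
from typing import Callable

# Voice profiles per intervention level, as listed in the table above.
VOICE_BY_LEVEL = {"WARN": "Kore", "BLOCK": "Puck", "LOCKDOWN": "Charon"}


def speak_warning(level: str, text: str, gemini_tts: Callable[[str, str], bytes]) -> dict:
    """Try the natural voice first; on any failure, hand the text to the browser."""
    voice = VOICE_BY_LEVEL[level]
    try:
        # gemini_tts is a hypothetical wrapper: (text, voice_name) -> audio bytes.
        audio = gemini_tts(text, voice)
    except Exception:
        # Fallback: the client speaks `text` with browser speech synthesis.
        return {"engine": "browser", "voice": None, "audio": None, "text": text}
    return {"engine": "gemini", "voice": voice, "audio": audio, "text": text}
```

Keeping the warning text in both return shapes means the overlay can always display or speak the message, whichever engine ends up producing the audio.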
After a high-severity alert, VoxGuard generates an Explanation Card that tells you in plain language why this is dangerous - combining evidence from both audio and visual analysis.
Example Explanation Card:
CRITICAL: Bank Impersonation + Fake Login Page
Audio | Screen | 95% confidence
The caller claims to be from your bank's fraud department (Authority tactic) while your screen shows a login page at 'bank-secure-verify.com' - this is NOT your real bank's domain. The combination of voice impersonation and a phishing page is a confirmed scam technique.
-> End this call and call your bank using the number on the BACK of your card.
Each card shows:
- Signal badges - which sources detected the threat (Audio, Screen, or both)
- Confidence score - how certain VoxGuard is
- Expandable signals - detailed breakdown of each detected pattern with source and severity
- Risk factors - specific quotes or elements that triggered the alert
- Recommended action - one clear thing to do right now
When a non-fatal WARN or BLOCK-level intervention fires (e.g., impersonation, fake support, government scare), the user can take a Verification Challenge. Unlike generic questionnaires, VoxGuard's challenges are contextual to the detected scam type:
Bank Impersonation (2-3 questions):
- "Did this caller contact you first, or did you call them?"
- "Are they asking you to share your OTP, PIN, or password?"
- "Did they tell you NOT to call your bank directly?"
Government Impersonation (2 questions):
- "Is this caller threatening arrest or legal action if you don't pay now?"
- "Are they demanding payment via gift cards, crypto, or wire transfer?"
Tech Support Impersonation (3 questions):
- "Did this caller contact you first about a 'virus' or 'security issue'?"
- "Are they asking you to install remote access software?"
- "Are they rushing you to act immediately?"
The challenge library covers 7 scam types (bank, government, tech support, investment, family impersonation, prize/lottery, urgency), with a generic fallback for unknown patterns.
After the user answers, VoxGuard provides a clear result:
- LIKELY SCAM: recommends immediate disconnection
- EXERCISE CAUTION: offers an "I Will Verify Through Official Channel" action with specific guidance (e.g., "Call the number on the BACK of your bank card")
Challenges are fully localized in 9 languages (EN, ID, ZH, JA, KO, ES, FR, HI, AR).
The End Call - Safe Exit button is the primary protective action. When pressed:
- Voice warning stops immediately (both Gemini TTS playback and browser speech synthesis)
- Demo/call playback halts completely
- Detection stream terminates
- Session status changes to terminated
- Guided Action Agent launches with personalized recovery steps
- App switches to the REPORT tab automatically
- Forensic report displays with full session data, intervention history, and export options
This ensures the user experiences a clear, decisive break from the scam call - not just a UI dismiss.
After Safe Exit, VoxGuard becomes your recovery coach.
Instead of just saying "call your bank", VoxGuard generates a personalized, step-by-step action plan based on the scam type, your country, and how severe the threat was. Each step has a checkbox so you can track your progress.
Example Action Plan (Bank Impersonation, Indonesia, Critical):
Anti-Scam Action Plan
CRITICAL | 🇮🇩 Indonesia | 15-30 minutes
⚠️ Act within the next 15 minutes. Time is critical.
Based on the call pattern, the caller was impersonating a bank officer
and attempted to extract your OTP. If you shared any codes, contact
your bank immediately to freeze your account.
☐ Blokir nomor penelepon di pengaturan HP Anda (Block the caller's number in your phone settings) [immediate]
☐ Hubungi bank menggunakan nomor di BELAKANG kartu ATM (Call your bank using the number on the BACK of your ATM card) [critical]
☐ Minta bank untuk blokir sementara rekening (Ask the bank to temporarily freeze your account) [critical]
☐ Laporkan ke OJK: 157 atau konsumen@ojk.go.id (Report to OJK: 157 or konsumen@ojk.go.id) [high]
☐ Laporkan ke Bareskrim: patrolisiber.id atau 110 (Report to Bareskrim: patrolisiber.id or 110) [recommended]
☐ Ganti password semua akun yang dibicarakan (Change the passwords of every account that was discussed) [recommended]
Emergency: Call 110 if you feel in danger
Supported countries (9):
| Country | Flag | Emergency | Reporting Channels |
|---|---|---|---|
| United States | 🇺🇸 | 911 | FTC, FBI IC3, Credit bureaus |
| Indonesia | 🇮🇩 | 110 | OJK, Bareskrim, Bank Indonesia |
| China | 🇨🇳 | 110 | National Anti-Fraud Center app |
| Japan | 🇯🇵 | 110 | NPA #9110, Consumer Hotline 188 |
| South Korea | 🇰🇷 | 112 | FSS 1332, Cyber investigation |
| Spain | 🇪🇸 | 112 | Guardia Civil, INCIBE 017 |
| France | 🇫🇷 | 17 | PHAROS, Info Escroqueries |
| India | 🇮🇳 | 112 | Cybercrime 1930, cybercrime.gov.in |
| Singapore | 🇸🇬 | 999 | ScamShield app, SPF, NCPC ScamAlert |
Action plans are:
- Personalized - steps prioritized based on what was detected (OTP extraction gets bank-first, gift card gets report-first)
- AI-enhanced - Gemini adds personalized advice based on the specific call transcript
- Interactive - checkbox progress tracking with completion percentage
- Localized - action text in the user's language with local phone numbers and websites
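A rough sketch of how such a plan could be assembled is shown below. The emergency numbers come from the country table above; everything else (the step identifiers, `build_plan`, `completion_percent`, and the simple "move one step to the front" prioritization) is a hypothetical simplification rather than the logic in `action_agent.py`.

```python
from typing import Optional

# Emergency numbers from the supported-countries table.
EMERGENCY = {
    "US": "911", "ID": "110", "CN": "110", "JP": "110", "KR": "112",
    "ES": "112", "FR": "17", "IN": "112", "SG": "999",
}

# Hypothetical step identifiers; the real plans carry localized text.
BASE_STEPS = ["block_caller", "call_bank", "report_to_authority", "change_passwords"]

# Which step leads the plan for a given scam type (bank-first vs report-first).
FIRST_STEP = {"otp_extraction": "call_bank", "gift_card_demand": "report_to_authority"}


def build_plan(scam_type: str, country: str) -> dict:
    """Order the steps for the detected scam type and attach the local emergency number."""
    steps = list(BASE_STEPS)
    first: Optional[str] = FIRST_STEP.get(scam_type)
    if first is not None:
        steps.remove(first)
        steps.insert(0, first)
    return {
        "country": country,
        "emergency": EMERGENCY.get(country),  # None for unsupported countries
        "steps": [{"id": step, "done": False} for step in steps],
    }


def completion_percent(plan: dict) -> int:
    """Checkbox progress as a whole-number percentage."""
    steps = plan["steps"]
    return round(100 * sum(step["done"] for step in steps) / len(steps))
```

For example, `build_plan("otp_extraction", "ID")` puts `call_bank` first and attaches `110`, while a gift card scam leads with `report_to_authority`.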
Every intervention displays localized safe exit actions specific to the user's country:
- Hang up now
- Call your bank's real number (the one on the back of your card)
- Call a trusted family member before doing anything
- Never share OTP / PIN / password on a call
These are available in English, Indonesian, Chinese, Japanese, Korean, Spanish, French, Hindi, and Arabic.
Scammers succeed because they keep victims in a state of panic that shuts down rational thinking. The caller manufactures urgency ("your account will be frozen in 10 minutes"), establishes false authority ("I am calling from the fraud department"), and enforces isolation ("do not contact your bank directly").
VoxGuard's three-layer defense breaks that panic loop:
- Voice Intervention physically interrupts the scammer's narrative with a calm, authoritative human voice speaking directly to the victim
- Explanation Card replaces confusion with clarity, showing exactly which signals triggered the alert and why
- Action Agent replaces helplessness with a concrete plan, providing step-by-step instructions with local emergency numbers
The intervention overlay pauses the conversation and forces a moment of reflection. The scenario-based verification challenge makes the victim confront the reality of what is happening, with questions tailored to the specific scam type they are experiencing. And if the victim is too far gone to respond, the 30-second lockdown countdown ends the call automatically.
No existing scam detection tool we surveyed does this. Every other system either blocks calls before they connect (which misses new numbers) or sends a passive notification after the call ends (which is too late). VoxGuard is, to our knowledge, the first system that intervenes during the critical moment when the victim is about to hand over their money.
Every intervention event is tracked and preserved:
- Intervention counter displays in the monitor tab during active sessions
- Intervention history is included in the forensic report with level, trigger pattern, threat score at time of firing, and user response (dismissed, safe exit, challenge passed, challenge failed)
- Alert cards that triggered an intervention display an INTERVENED badge
- HTML and PDF exports include a dedicated intervention section
- Session gallery shows intervention count per saved session
- Action plan history is preserved with completion status
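One way to picture the tracked data is as a small record per intervention. The sketch below uses the fields named in the list above (level, trigger pattern, score at time of firing, user response); the class and method names are hypothetical and are not taken from the VoxGuard codebase.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class UserResponse(Enum):
    DISMISSED = "dismissed"
    SAFE_EXIT = "safe_exit"
    CHALLENGE_PASSED = "challenge_passed"
    CHALLENGE_FAILED = "challenge_failed"


@dataclass(frozen=True)
class InterventionRecord:
    level: str               # WARN / BLOCK / LOCKDOWN
    trigger_pattern: str     # pattern that fired the intervention
    score_at_trigger: int    # threat score when it fired
    user_response: UserResponse

    def to_dict(self) -> dict:
        """Plain-dict form suitable for a JSON, HTML, or PDF export step."""
        return {
            "level": self.level,
            "trigger_pattern": self.trigger_pattern,
            "score_at_trigger": self.score_at_trigger,
            "user_response": self.user_response.value,
        }


@dataclass
class SessionReport:
    interventions: List[InterventionRecord] = field(default_factory=list)

    def record(self, rec: InterventionRecord) -> None:
        self.interventions.append(rec)

    @property
    def intervention_count(self) -> int:
        # Value shown by the monitor counter and the session gallery.
        return len(self.interventions)

    def intervention_section(self) -> List[dict]:
        return [rec.to_dict() for rec in self.interventions]
```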
VoxGuard is a four-layer real-time pipeline that processes audio, video, and screen input from any call platform and delivers protection in under 80 milliseconds.
Three concurrent input streams feed the system:
- Caller Audio - the scammer's voice or video call (phone, WhatsApp, Zoom, Teams, any platform)
- User Microphone - the protected party's microphone via Web Audio API for 2-way transcript
- Screen Share - optional screen capture (JPEG 1280px every 2 seconds) for visual scam detection
| Component | Technology | Role |
|---|---|---|
| React UI | Vite 5 + JSX | 5-tab interface, live alerts, intervention overlay, action plan display |
| Rust WASM | wasm-pack, Wiener NR | Spectral subtraction, Float32 PCM, zero-copy audio at <100ms latency |
| Web Audio | MediaStream API | 16kHz Mono PCM capture in 250ms frames via ScriptProcessor |
| WebSocket | useWebSocket hook | Bidirectional: sends audio/screen, receives alerts/TTS/explanations/actions |
| Intervention | InterventionOverlay.jsx | WARN / BLOCK / LOCKDOWN overlay with verification challenge and auto-disconnect |
| Action Agent | ActionAgent.jsx | Post-session guided recovery: 9 languages, 9 countries, step-by-step checklist |
| Screen Capture | getDisplayMedia API | Opt-in only, Base64 JPEG, 2-second interval |
| Service | File | Role |
|---|---|---|
| FastAPI | main.py | REST + WebSocket /ws/session, auto-scaling on Cloud Run, health check, CORS |
| Threat Engine | threat_engine.py | Weighted scoring (0.45 Language + 0.35 Behavioral + 0.20 Visual), 500ms cycle, intervention triggers at score 55/75/90 + instant pattern matching |
| Audio Analyzer | audio_analyzer.py | VAD + buffer management, streams to Gemini Live API for real-time transcription |
| Vision Analyzer | vision_analyzer.py | Screenshot analysis via Gemini Vision: fake login pages, phishing domains, QR codes |
| Psych Analyzer | psych_analyzer.py | Single Gemini call returns 6 Cialdini vectors + 5 lie detection indicators + intervention recommendation |
| TTS Service | tts_service.py | Natural voice intervention via Gemini TTS with 3 voice profiles (Kore/Puck/Charon) |
| Explanation Service | explanation_service.py | Combines audio + visual analysis into plain-language explanation cards |
| Action Agent | action_agent.py | Generates personalized recovery plans with country-specific emergency contacts and reporting channels for 9 countries |
| Storage Service | storage_service.py | Persists session data and forensic reports to Cloud Firestore. Stores audio recordings to Cloud Storage for forensic export and replay. |
| Model | Version | Purpose |
|---|---|---|
| Gemini Audio | gemini-2.5-flash | Real-time audio analysis via generate_content_async with inline 16kHz PCM audio data. Transcription + scam pattern detection. |
| Gemini Vision | gemini-2.5-flash | Screenshot analysis: fake UI detection, phishing domain identification, QR code scanning |
| Gemini Text | gemini-2.5-flash | Transcript analysis, psychological scoring, 50+ pattern matching, explanation generation |
| Gemini TTS | gemini-2.5-flash-preview-tts | Natural voice intervention with contextual scripts in 9 languages |
| Grounding DB | scam_patterns.json | 50+ verified patterns from FTC, FBI IC3, GASA, MAS, ACCC - zero hallucination |
Caller Voice ──┐
               ├── Rust WASM (spectral analysis, noise reduction)
User Mic ──────┘          │
                          ▼
            WebSocket (audio chunks + screen frames)
                          │
                          ▼
              FastAPI Backend (Cloud Run)
                ┌─────────┼─────────┐
                ▼         ▼         ▼
           Audio SVC  Vision SVC  Psych SVC
           (Gemini    (Gemini     (Gemini
            Live)      Vision)     Text)
                └─────────┼─────────┘
                          ▼
        Threat Engine (scoring + intervention triggers)
                          │
                ┌─────────┼──────────────┐
                ▼         ▼              ▼
              Alert   Intervention   Explanation
              Event   Event + TTS    Card Event
                └─────────┼──────────────┘
                          ▼
              WebSocket (back to browser)
                          │
                ┌─────────┼──────────────┐
                ▼         ▼              ▼
           AlertCard  Intervention   Explanation
           (live)     Overlay +      Card +
                      Voice Warning  Action Agent
The Rust WebAssembly audio engine captures microphone input at the browser level with zero-copy processing. Audio is downsampled to 16kHz Mono PCM, processed through Wiener noise reduction, and streamed to Gemini Live API in 250ms frames, achieving <80ms alert latency from speech to alert.
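The frame arithmetic is easy to check: 16,000 samples per second times 0.25 seconds is 4,000 samples per frame. The Python sketch below illustrates that framing step only. The real pipeline runs in Rust WASM, and the conversion to 16-bit little-endian PCM is an assumption made here for illustration (the document states 16kHz Mono PCM but not the bit depth sent over the wire).

```python
import struct
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000   # Hz, mono
FRAME_MS = 250         # frame length used for streaming
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 4000 samples


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Clip floats to [-1, 1] and pack them as little-endian signed 16-bit PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def frames(samples: Sequence[float]) -> Iterator[bytes]:
    """Yield complete 250 ms frames; a trailing partial frame is left for the next call."""
    last_start = len(samples) - SAMPLES_PER_FRAME
    for start in range(0, last_start + 1, SAMPLES_PER_FRAME):
        yield float_to_pcm16(samples[start:start + SAMPLES_PER_FRAME])
```

With 16-bit samples, each frame is 8,000 bytes, so one second of audio produces four WebSocket messages.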
With explicit user consent, VoxGuard captures screen frames (JPEG 1280px) every 2 seconds and sends them to Gemini Vision for analysis: fake bank login pages, fraudulent investment dashboards, malicious QR codes, and spoofed government portals.
A weighted composite scoring system running every 500ms. Language score (45%) handles transcript pattern matching against 50+ verified patterns. Behavioral score (35%) tracks urgency signals, isolation tactics, and impersonation markers. Visual score (20%) covers screen analysis results when active. Output: 0-100 threat score with severity classification. The engine also evaluates every alert for intervention triggers, checking both score thresholds (55/75/90) and instant-pattern matches.
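The weighting can be written out directly. In the sketch below the weights are the ones stated above; treating an inactive visual channel as a score of 0 and clamping the result to 0-100 are assumptions, and `composite_score` is an illustrative name rather than the function in `threat_engine.py`.

```python
# Channel weights from the description above; each channel is scored 0-100.
WEIGHTS = {"language": 0.45, "behavioral": 0.35, "visual": 0.20}


def composite_score(language: float, behavioral: float, visual: float = 0.0) -> float:
    """Weighted sum of the three channel scores, clamped to the 0-100 range.

    Assumption: when screen sharing is off, the visual channel contributes 0.
    """
    channels = {"language": language, "behavioral": behavioral, "visual": visual}
    raw = sum(WEIGHTS[name] * value for name, value in channels.items())
    return max(0.0, min(100.0, raw))
```

As a worked example, a language score of 80 and a behavioral score of 60 with no screen share give 0.45 x 80 + 0.35 x 60 = 36 + 21 = 57, which is already past the WARN threshold of 55.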
All patterns grounded to published sources: FTC Consumer Sentinel, FBI IC3 2024, GASA Global Scam Report, MAS ScamShield (SG), and ACCC ScamWatch. No hallucination. Verified structured knowledge only. Each pattern includes an intervention_level field that determines whether detection triggers WARN, BLOCK, or no automatic intervention.
VoxGuard is the only scam detection system we are aware of that maps psychological manipulation vectors in real-time using three analytical frameworks:
Framework 1: Cialdini's 6 Influence Principles. Maps which persuasion vectors the caller is deploying:
| Vector | Trigger Example |
|---|---|
| SCARCITY | "This offer expires in 10 minutes" |
| AUTHORITY | "I'm calling from the tax office" |
| FEAR | "Your account will be frozen" |
| RECIPROCITY | "We already helped you, now you must..." |
| ISOLATION | "Don't tell your family about this" |
| COMMITMENT | "You already agreed to verify your identity" |
Framework 2: User Vulnerability State derives metrics showing how the manipulation is affecting the user's decision-making (Panic Level, Compliance Risk, Misplaced Trust). These scores directly feed the intervention system: high panic combined with high authority triggers earlier intervention.
Each vector includes real-time interpretation (Inactive, Low, Moderate, Elevated, High, Critical) with explanations, plus a pie chart distribution view.
Framework 3: Lie Detection. Scores 5 behavioral deception indicators based on Criteria-Based Content Analysis (CBCA) methodology:
| Indicator | What It Detects |
|---|---|
| Inconsistency | Contradictions between claims made at different points |
| Strategic Vagueness | Deliberately avoids specifics when challenged |
| Excessive Detail | Unprompted flood of irrelevant details (overcompensation) |
| Question Deflection | Changes subject or responds with new claims |
| Pressure to Comply | Uses urgency to prevent verification |
Lie detection scores are displayed in the PSYCH tab alongside manipulation vectors, included in forensic reports (PDF/HTML), and saved to the session gallery. The Psych Analyzer service returns both psych scores and lie scores in a single Gemini call, along with an intervention recommendation that feeds back into the threat engine.
Both sides of the conversation are transcribed in real-time:
- ME (user), displayed in green
- CALLER (scammer), displayed in orange with flag markers
Flagged statements trigger real-time alerts. Full 2-way transcript is preserved in session reports and gallery.
Every session generates a complete forensic report with:
- Full 2-way transcript with timestamps
- Intervention history with level, trigger type, pattern, score at time of firing, and user response
- Action plan history with completion status and country-specific steps taken
- Alert timeline with confidence scores (alerts that triggered an intervention carry the INTERVENED badge)
- Psychological vector breakdown + lie detection scores
- Country-specific recommended actions with local emergency numbers
- Country flag and language indicator
- Session audio recording (when REC is enabled)
Export: Dark-theme HTML or print-ready PDF with colored bars, dedicated intervention section, all analytical sections, and "Built by Wiqi Lee" footer.
Session Gallery: Saved sessions with threat score preview, country label, duration, and intervention count. Click any session for fullscreen detail view with tabs (Transcript, Alerts, Interventions, Psych + Lie Detection, Action Plan, Recommended Actions). Audio playback when recording is available.
The REC button in the Monitor tab captures live microphone audio for forensic export. REC requires Live Mic mode because it records directly from your device microphone via the Web Audio API. In Demo Mode (no microphone), the REC button is disabled because browser security restrictions prevent capturing audio played through new Audio() or speechSynthesis.
The recording is saved as a WebM or MP4 blob (depending on browser support) and embedded in the forensic report. When you export to HTML, the audio player is included so you can replay the session later.
How to record:
- Click START to begin a session
- Click LIVE MIC and grant microphone permission
- Click the red REC button (it will pulse to indicate recording is active)
- Put your phone on speaker so VoxGuard can record the ambient audio from the live call
- When you click STOP or SAFE EXIT, the recording is automatically saved
- Open the REPORT tab to see the audio player and export to HTML
Note: REC is greyed out until Live Mic is enabled. This is by design: there is no audio to record in Demo Mode.
VoxGuard supports real-time audio capture from your device microphone, in addition to demo mode. Both can run simultaneously.
How to use Live Mic:
- Click START to begin a session
- Click the LIVE MIC button in the Monitor controls
- Grant microphone permission when your browser asks
- A green status banner appears: "LIVE MICROPHONE - Real audio capture active"
- Put your phone on speaker and place it near your device, or use the same device for both the call and VoxGuard
- VoxGuard captures your microphone audio at 16kHz Mono PCM via the Web Audio API
- In production mode (with a running backend), the audio is streamed to the Gemini Live API for real-time analysis
- Click MIC ON again to disconnect the microphone
Live Mic + REC together: Enable both Live Mic and REC at the same time to record your actual call audio into the forensic report. This is the recommended setup for real-world use.
Live Mic + Demo Scripts: You can also run a demo script while Live Mic is active. The demo script provides simulated 2-way dialog and alerts, while the microphone captures real ambient audio in parallel. This is useful for demonstrations where you want to show both the scripted flow and the live microphone capability.
Gemini Live API supports 40 languages natively. VoxGuard includes region-specific scam patterns, localized alerts, localized intervention UI, natural voice warnings (Gemini TTS), scenario-based verification challenges, safe exit actions, and guided action plans.
| Language | Flag | Demo Scripts | Regional Scams | Intervention | Voice TTS | Action Agent |
|---|---|---|---|---|---|---|
| English | 🇺🇸 | Bank Fraud, Tech Support, Gov/Tax, Investment | FTC/FBI patterns | Full | Full (3 voices) | Full (US) |
| Indonesian | 🇮🇩 | Bank XYZ, Pinjol, Mama Minta Pulsa, Giveaway Palsu | OJK/Bareskrim | Full | Full | Full (ID) |
| Chinese | 🇨🇳 | 公安局诈骗 (Police Impersonation) | MPS Advisory | Full | Full | Full (CN) |
| Japanese | 🇯🇵 | オレオレ詐欺 (Ore Ore) | NPA patterns | Full | Full | Full (JP) |
| Korean | 🇰🇷 | 보이스피싱 (Voice Phishing) | FSS patterns | Full | Full | Full (KR) |
| Spanish | 🇪🇸 | Fraude Bancario | Guardia Civil | Full | Full | Full (ES) |
| French | 🇫🇷 | Arnaque CPF | DGCCRF | Full | Full | Full (FR) |
| Hindi | 🇮🇳 | Digital Arrest Fraud | MHA/RBI | Full | Full | Full (IN) |
| Arabic | 🇸🇦 | احتيال مصرفي (Bank Fraud) | GASA | Full | Full | Full (SA) |
| Language | Flag | Demo Scripts | Regional Scams | Intervention | Voice TTS | Action Agent |
|---|---|---|---|---|---|---|
| Malay | 🇲🇾 | Bank, Polis Diraja, Hadiah Palsu | BNM patterns | Full | Full | English fallback |
| Portuguese | 🇧🇷 | Fraude Bancária, Polícia Federal, Prêmio Falso | GASA patterns | Full | Full | English fallback |
Filipino 🇵🇭, Thai 🇹🇭, Vietnamese 🇻🇳, German 🇩🇪, Italian 🇮🇹, Dutch 🇳🇱, Turkish 🇹🇷, Polish 🇵🇱, Russian 🇷🇺, Ukrainian 🇺🇦, Romanian 🇷🇴, Czech 🇨🇿, Hungarian 🇭🇺, Swedish 🇸🇪, Danish 🇩🇰, Finnish 🇫🇮, Greek 🇬🇷, Hebrew 🇮🇱, Persian 🇮🇷, Bengali 🇧🇩, Urdu 🇵🇰, Tamil 🇱🇰, Swahili 🇰🇪, Amharic 🇪🇹, Yoruba 🇳🇬, Hausa 🇳🇬, Afrikaans 🇿🇦, Norwegian 🇳🇴
⚠️ English fallback languages show a yellow notice in the app. Full native support planned for future release.
Fully optimized for desktop and mobile browsers. Works on any smartphone via the browser, with no app installation required. On phones: header wraps, tabs scroll horizontally, content padding reduced, footer stacks vertically. Touch-friendly buttons and controls throughout.
voxguard/
├── .github/workflows/
│   ├── ci.yml  # CI: WASM build, frontend build, backend tests
│   └── deploy.yml  # CD: deploy backend to GCP Cloud Run
│
├── frontend/  # React SPA (Vite 5 + JSX)
│   ├── src/
│   │   ├── components/
│   │   │   ├── PixelLogo.jsx  # Animated pixel shield logo with color cycling
│   │   │   ├── Primitives.jsx  # Reusable UI: PBox (bordered panel), PBtn, StatCard
│   │   │   ├── AlertCard.jsx  # Expandable threat alert card with intervention badge
│   │   │   ├── InterventionOverlay.jsx  # Live Scam Intervention: WARN/BLOCK/LOCKDOWN overlay
│   │   │   ├── ExplanationCard.jsx  # Multimodal explanation card (audio + vision signals)
│   │   │   ├── ActionAgent.jsx  # Guided anti-scam action agent with step checklist
│   │   │   ├── ThreatMeter.jsx  # SVG arc gauge: composite threat score 0-100
│   │   │   ├── WaveformVisualizer.jsx  # Real-time audio waveform bar visualization
│   │   │   ├── LanguageSelector.jsx  # Language dropdown (40 languages)
│   │   │   └── PixelParticles.jsx  # Animated pixel particles (header + monitor)
│   │   ├── pages/
│   │   │   ├── MonitorTab.jsx  # Main dashboard: waveform, alerts, explanation cards
│   │   │   └── Tabs.jsx  # Psych, Patterns, Report (with action plan), About
│   │   ├── hooks/
│   │   │   ├── useWebSocket.js  # WebSocket client: alerts, TTS audio, explanations, actions
│   │   │   ├── useAudioEngine.js  # Mic capture + Rust WASM bridge (Web Audio fallback)
│   │   │   └── useScreenCapture.js  # Screen share via getDisplayMedia, 2s JPEG frames
│   │   ├── wasm/  # Generated by wasm-pack (gitignored, built in CI)
│   │   │   ├── scam_shield_audio.js  # JS bindings for the Rust WASM module
│   │   │   └── scam_shield_audio_bg.wasm  # Compiled WASM binary
│   │   ├── utils/
│   │   │   └── constants.js  # Patterns, psych tactics, intervention config, challenges
│   │   ├── App.jsx  # Root component: tab routing, state management, effects
│   │   └── main.jsx  # React DOM mount point
│   ├── package.json  # Dependencies: React 18, Vite 5
│   └── vite.config.js  # Dev server proxy, WASM support, build config
│
├── rust-engine/  # Rust WASM audio preprocessor
│   ├── src/
│   │   └── lib.rs  # DSP pipeline: Wiener NR, spectral sub, VAD, RMS norm
│   ├── Cargo.toml  # Deps: wasm-bindgen, web-sys, js-sys, serde
│   └── Cargo.lock  # Locked dependency versions
│
├── backend/  # Python FastAPI backend
│   ├── app/
│   │   ├── api/
│   │   │   └── websocket.py  # WebSocket: /ws/session + TTS + explanations + actions
│   │   ├── services/
│   │   │   ├── threat_engine.py  # Scoring + intervention triggers + session state tracking
│   │   │   ├── audio_analyzer.py  # VAD + buffer management, Gemini audio streaming
│   │   │   ├── vision_analyzer.py  # Screenshot analysis via Gemini Vision API
│   │   │   ├── psych_analyzer.py  # Cialdini + lie detection + intervention recommendation
│   │   │   ├── tts_service.py  # Natural voice intervention via Gemini TTS
│   │   │   ├── explanation_service.py  # Multimodal explanation card generation
│   │   │   ├── action_agent.py  # Guided anti-scam action plans (9 countries)
│   │   │   └── storage_service.py  # Cloud Firestore (sessions) + Cloud Storage (audio recordings)
│   │   └── core/
│   │       ├── config.py  # Pydantic settings from env vars (incl. TTS, Firestore, Storage)
│   │       └── gemini_client.py  # Google GenAI SDK wrapper (audio + vision)
│   ├── data/
│   │   └── scam_patterns.json  # 50+ patterns with intervention_level field per pattern
│   ├── tests/
│   │   └── test_threat_engine.py  # Unit tests for scoring logic and intervention triggers
│   ├── main.py  # FastAPI entry (legacy, redirects to app.main)
│   ├── requirements.txt  # Python deps: FastAPI, google-generativeai, google-cloud-firestore, google-cloud-storage, numpy, scipy
│   └── Dockerfile  # Cloud Run container: Python 3.11-slim, PORT=8080
│
├── docs/svgs/
│   ├── architecture-badge.svg  # Animated pipeline badge for README header
│   ├── features-badge.svg  # Animated capabilities overview
│   ├── intervention.svg  # Animated intervention tiers (WARN/BLOCK/LOCKDOWN)
│   ├── threat-demo.svg  # Threat score gauge demo graphic
│   ├── psych-vectors.svg  # Psychological vector bar chart
│   ├── lie-detection.svg  # Lie detection indicator chart
│   └── audio-stream.svg  # Animated audio waveform graphic
│
├── scripts/
│   └── deploy.sh  # One-command GCP Cloud Run deployment
│
├── .env.example  # Root template (shared reference for all env vars)
├── frontend/.env.example  # Frontend: VITE_GEMINI_API_KEY, VITE_WS_URL
├── backend/.env.example  # Backend: GOOGLE_API_KEY, GEMINI_MODEL, GEMINI_TTS_MODEL
├── docker-compose.yml  # Local dev: backend + frontend orchestration
├── vercel.json  # Vercel config for frontend deployment
├── .gitignore  # Ignores: node_modules, .env, target/, wasm/
├── LICENSE  # MIT License
└── README.md  # You are here
Note: `frontend/src/wasm/` is gitignored. It is generated by `wasm-pack build` during CI. The Frontend Build job depends on the Rust WASM Build job in the CI/CD pipeline.

VoxGuard uses three .env files to separate frontend, backend, and shared configuration. Only .env.example templates are committed to git; actual .env files are gitignored.
| File | Variable | Description |
|---|---|---|
| `frontend/.env` | `VITE_GEMINI_API_KEY` | Gemini API key for demo mode TTS (client-side) |
| `frontend/.env` | `VITE_WS_URL` | Backend WebSocket URL (e.g., wss://your-backend.run.app/ws/session) |
| `backend/.env` | `GOOGLE_API_KEY` | Gemini API key for backend services (audio, vision, TTS, psych) |
| `backend/.env` | `GEMINI_MODEL` | Audio model (default: gemini-2.5-flash) |
| `backend/.env` | `GEMINI_VISION_MODEL` | Vision model (default: gemini-2.5-flash) |
| `backend/.env` | `GEMINI_TTS_MODEL` | TTS model (default: gemini-2.5-flash-preview-tts) |
| `backend/.env` | `GOOGLE_CLOUD_PROJECT` | GCP project ID for Cloud Run, Firestore, and Storage |
| `backend/.env` | `FIRESTORE_ENABLED` | Enable Cloud Firestore session persistence (default: true) |
| `backend/.env` | `STORAGE_ENABLED` | Enable Cloud Storage audio uploads (default: true) |
| `backend/.env` | `STORAGE_BUCKET` | Cloud Storage bucket name for audio recordings |
| `.env` | (shared reference) | Root template combining all variables for quick setup |
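As a rough illustration of how the backend variables above could be read with their documented defaults, here is a standard-library sketch. The project tree says `config.py` uses Pydantic settings; this stand-in uses a plain dataclass instead so it runs without extra packages. The names `BackendSettings` and `load_settings` are hypothetical, and treating `GOOGLE_API_KEY` as mandatory is an assumption.

```python
import os
from dataclasses import dataclass


def _flag(value: str) -> bool:
    """Interpret common truthy strings from an env var."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class BackendSettings:  # hypothetical name, not the project's class
    google_api_key: str
    gemini_model: str
    gemini_vision_model: str
    gemini_tts_model: str
    google_cloud_project: str
    firestore_enabled: bool
    storage_enabled: bool
    storage_bucket: str


def load_settings(env=None) -> BackendSettings:
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "")
    if not api_key:
        # Assumption: the backend cannot call Gemini without a key.
        raise ValueError("GOOGLE_API_KEY is required")
    return BackendSettings(
        google_api_key=api_key,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        gemini_vision_model=env.get("GEMINI_VISION_MODEL", "gemini-2.5-flash"),
        gemini_tts_model=env.get("GEMINI_TTS_MODEL", "gemini-2.5-flash-preview-tts"),
        google_cloud_project=env.get("GOOGLE_CLOUD_PROJECT", ""),
        firestore_enabled=_flag(env.get("FIRESTORE_ENABLED", "true")),
        storage_enabled=_flag(env.get("STORAGE_ENABLED", "true")),
        storage_bucket=env.get("STORAGE_BUCKET", ""),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"GOOGLE_API_KEY": "k", "STORAGE_ENABLED": "false"})`.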
Judges: Just open the live demo. No setup is required for Demo Mode.
1. Open https://voxguard-kappa.vercel.app
2. Click START
3. Click any Demo Script (e.g., "Bank Impersonation")
4. Watch the intervention fire when the scammer asks for your OTP
Note: The Vercel deployment runs in Demo Mode with simulated 2-way dialog. Full real-time audio analysis (Live Mic mode) requires the backend running on Cloud Run with a Gemini API key. See below for local setup.
- Node.js 20+, Python 3.11+, Rust 1.75+ with `wasm32-unknown-unknown` target
- `wasm-pack` installed
- Google Gemini API key

git clone https://github.com/wiqilee/VoxGuard.git
cd VoxGuard
# Configure environment variables
cp .env.example .env # Root (shared reference)
cp frontend/.env.example frontend/.env # Frontend: add VITE_GEMINI_API_KEY
cp backend/.env.example backend/.env # Backend: add GOOGLE_API_KEY
# Edit each .env file and add your Gemini API key

# Build the Rust WASM module
cd rust-engine
wasm-pack build --target web --out-dir ../frontend/src/wasm
cd ..

# Start the frontend dev server (from the repo root)
cd frontend && npm install && npm run dev

# Start the backend (in a separate terminal, from the repo root)
cd backend && pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

Pre-loaded scripts for Demo Mode (no microphone needed):
English (EN) - 7 Scripts:
Script A: Bank Impersonation (Critical)
"Hello, I'm calling from your bank's fraud prevention department. We've detected suspicious activity on your account. Your account will be frozen in 10 minutes unless you verify your identity. Please provide your account number and the OTP."
Intervention trigger: When the caller asks for OTP, a BLOCK-level intervention fires instantly - Safe Exit only (fatal pattern). Voice warning: "Stop immediately. The caller is asking for your one-time password. A real bank will never ask for this. Hang up now." Explanation Card: Shows combined audio signals (Authority + Fear tactics) with confidence score. Action Agent: Country-specific steps to secure your bank account.
Script B: Tech Support Scam (High)
"Your computer has been compromised. I'm calling from the Security Center. You must install our remote access tool immediately or we cannot protect your credit cards."
Intervention trigger: WARN fires on impersonation detection. User can Verify Caller or Safe Exit. Voice warning: "Caution. No legitimate company will cold-call you about a virus on your computer."
Script C: Government / Tax Scam (Critical)
"This is an officer from the tax enforcement division. A warrant has been issued for your arrest. Settle this balance right now or face arrest. Purchase prepaid debit cards and read me the card numbers."
Intervention trigger: Gift Card Demand fires instant BLOCK - Safe Exit only (fatal pattern). Voice warning: "Stop. No legitimate organization accepts payment through gift cards. This is a confirmed scam."
Script D: AI Voice Clone (Critical)
The caller uses an AI-generated voice to impersonate the victim's child, fabricating a car accident and demanding $8,000 in bail via wire transfer.
Intervention trigger: Wire Transfer Instruction + Isolation Tactic. BLOCK fires on the bail demand. Voice warning: "Stop. This person is impersonating a family member. Verify their identity by calling them on their known number."
Script E: Digital Arrest (Critical)
The caller poses as a federal cyber crime officer, claims the victim's identity was used in money laundering, and demands the victim stay on video call while transferring funds to a "government-secured holding account."
Intervention trigger: Government Impersonation + Safe Account Transfer fires instant BLOCK. Voice warning: "Stop immediately. Law enforcement never demands money transfers by phone. This is a scam."
Script F: Job Offer Scam (High)
The caller offers a remote data entry position paying $5,000/month, then demands a $299 "training kit" payment via gift cards or cryptocurrency.
Intervention trigger: Gift Card Demand fires instant BLOCK when gift card payment is requested. Voice warning: "Stop. No legitimate employer asks you to pay upfront for a job. This is employment fraud."
Script G: Family Emergency (High)
The caller impersonates a family member in distress, claims a car accident and hospital emergency, and demands wire transfer.
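The trigger rules in the scripts above can be summarized as a small decision function. This is a sketch under stated assumptions, not the logic in `threat_engine.py`: the pattern identifiers, the `decide_intervention` name, and the option labels are hypothetical, and LOCKDOWN is left out because this section does not describe what triggers it.

```python
# Hypothetical pattern identifiers derived from the script descriptions.
# "Fatal" patterns fire BLOCK with Safe Exit as the only option.
FATAL_PATTERNS = {
    "otp_request",
    "gift_card_demand",
    "wire_transfer_instruction",
    "safe_account_transfer",
}
# Softer signals fire WARN, where the user may verify the caller or exit.
WARN_PATTERNS = {"impersonation", "government_impersonation", "isolation_tactic"}


def decide_intervention(patterns: set[str]) -> dict:
    """Map the patterns detected so far to an intervention level and options."""
    if patterns & FATAL_PATTERNS:
        return {"level": "BLOCK", "options": ["safe_exit"]}
    if patterns & WARN_PATTERNS:
        return {"level": "WARN", "options": ["verify_caller", "safe_exit"]}
    return {"level": "NONE", "options": []}
```

With these assumed identifiers, Script B (impersonation only) yields WARN, while Script E (government impersonation plus a safe account transfer) yields BLOCK because the fatal pattern takes precedence.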
VoxGuard has no text box. The user never types. The interface is entirely driven by audio (microphone stream via Rust WASM to Gemini Live API), vision (screen capture to Gemini Vision API), voice (Gemini TTS for spoken intervention), and inference (psychological vector scoring via Gemini Text). The interaction is ambient: the AI listens, watches, speaks, explains, and guides while the user is on their call.
The Live Scam Intervention system is entirely new. No existing scam detection product actively blocks the user from completing a dangerous action during a live call. VoxGuard's three-tier escalation (WARN, BLOCK, LOCKDOWN) with natural voice warnings, multimodal explanation cards, scenario-based verification challenges, guided action agent, and auto-disconnect represents a fundamental shift from passive detection to active protection.
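To make the intervention loop more concrete, the sketch below builds the three server events named in the architecture notes (`intervention`, `intervention_audio`, `explanation_card`) and records the client's `intervention_response`. Only the event names come from this README; every payload field (`level`, `reason`, `audio_b64`, `signals`, `confidence`, `choice`) and both function names are assumptions for illustration.

```python
import base64
import json


def build_intervention_events(level, reason, tts_audio, signals, confidence):
    """Return the JSON messages a backend might send for one intervention."""
    events = [
        {"type": "intervention", "level": level, "reason": reason},
        # Binary TTS audio is base64-encoded so it fits in a JSON text frame.
        {"type": "intervention_audio",
         "audio_b64": base64.b64encode(tts_audio).decode("ascii")},
        {"type": "explanation_card", "signals": signals, "confidence": confidence},
    ]
    return [json.dumps(event) for event in events]


def record_response(session, raw_message):
    """Store the user's choice from an intervention_response message."""
    message = json.loads(raw_message)
    if message.get("type") != "intervention_response":
        raise ValueError("not an intervention_response message")
    session.setdefault("interventions", []).append(message.get("choice"))
    return session
```

Sending the overlay event first and the audio second is a design choice in this sketch, so the UI can render immediately while the larger audio payload is still in flight.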
- Google GenAI SDK: All Gemini functionality is implemented with the official Google Generative AI SDK for Python (`google-generativeai==0.5.4`) on Google Cloud Run. Audio analysis uses `generate_content_async` with inline 16kHz PCM audio data buffered from the Rust WASM engine. Session data and forensic reports are persisted via Cloud Firestore (`storage_service.py`, using `google-cloud-firestore`). Session audio recordings are stored via Cloud Storage (`storage_service.py`, using `google-cloud-storage`). Authentication is handled by `google-auth`.
- Gemini Audio Analysis: `gemini-2.5-flash` for audio analysis via `generate_content_async` with inline 16kHz PCM audio data. Audio chunks are buffered (2-second flush with VAD) and sent as base64 to the standard Gemini API, which returns structured JSON with transcript, scam indicators, tactics, and lie indicators.
- Gemini Text/Vision: `gemini-2.5-flash` for screenshot analysis, transcript analysis, psychological scoring, and multimodal explanation generation.
- Gemini TTS: `gemini-2.5-flash-preview-tts` for natural voice intervention with 3 voice profiles (Charon for scammer simulation, Kore for user, Puck for warm advisory) and contextual scripts in 9 languages.
- Rust WASM: Zero-copy audio processing, Wiener NR, Float32 PCM, <100ms latency
- Cloud Run + Cloud Firestore + Cloud Storage: Fully containerized backend on Cloud Run with auto-scaling, health check endpoints, and session affinity for WebSocket. `storage_service.py` implements both persistence layers: session data (alerts, interventions, psych scores, transcripts, action plans) is persisted to Cloud Firestore via `google-cloud-firestore`, and audio recordings are uploaded to Cloud Storage via `google-cloud-storage`. Three Google Cloud services in production.
- Grounding: Reasoning against 50+ verified patterns with zero hallucination.
- Intervention Engine: Backend evaluates every alert for intervention eligibility, emitting `intervention` + `intervention_audio` + `explanation_card` events via WebSocket. Frontend renders the overlay with scenario-appropriate UI, plays TTS audio, shows explanation cards, and sends `intervention_response` back. The full loop is tracked in session state.
- Explanation Service: Combines audio transcript analysis + screenshot analysis into a single Gemini call, producing plain-language explanation cards with signal badges, confidence scores, and recommended actions.
- Action Agent: Generates personalized step-by-step recovery plans with country-specific emergency contacts, reporting channels, and AI-enhanced advice based on the call transcript.
- Psych Analyzer: Single Gemini call returns both Cialdini scores and lie detection indicators, plus an intervention recommendation.
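The audio buffering step in the list above (16kHz PCM, 2-second flush with VAD, base64 payload) could look roughly like the following. It is a simplified sketch rather than the code in `audio_analyzer.py`: the `AudioChunkBuffer` class is hypothetical, the VAD decision is passed in by the caller instead of being computed, and 2 bytes per sample is an assumption (the README also mentions Float32 PCM, which would be 4).

```python
import base64

SAMPLE_RATE_HZ = 16_000  # from the README: 16kHz PCM
FLUSH_SECONDS = 2.0      # from the README: 2-second flush


class AudioChunkBuffer:
    """Accumulate PCM bytes and emit one base64 payload per 2-second window."""

    def __init__(self, bytes_per_sample: int = 2):  # assumption: 16-bit PCM
        self._bytes_per_flush = int(SAMPLE_RATE_HZ * FLUSH_SECONDS) * bytes_per_sample
        self._buf = bytearray()
        self._speech_seen = False

    def push(self, pcm: bytes, has_speech: bool):
        """Add a chunk; return a base64 string when a window with speech is full.

        Returns None while the window is still filling, or when the full
        window contained no speech (the silent window is dropped).
        """
        self._buf.extend(pcm)
        self._speech_seen = self._speech_seen or has_speech
        if len(self._buf) < self._bytes_per_flush:
            return None
        window = bytes(self._buf[: self._bytes_per_flush])
        del self._buf[: self._bytes_per_flush]
        speech = self._speech_seen
        # Simplification: leftover bytes inherit the flag of the latest chunk.
        self._speech_seen = has_speech and len(self._buf) > 0
        return base64.b64encode(window).decode("ascii") if speech else None
```

Dropping silent windows keeps the number of model calls proportional to actual speech, which is one plausible reading of "2-second flush with VAD".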
- Demo Mode on Vercel: The live demo runs with simulated 2-way dialog and TTS alerts. Full real-time analysis requires a running backend with a valid Gemini API key.
- Browser Speech Synthesis: Demo voice quality varies by browser/OS. When Gemini TTS is unavailable, falls back to browser speech synthesis.
- English fallback: 31 languages use English voice and alerts in demo. 9 languages have full native support (EN, ID, ZH, JA, KO, ES, FR, HI, AR).
- Browser-only: No native mobile or desktop clients yet.
- Latency depends on network: <80ms measured locally; 100-300ms over public internet with Cloud Run.
- No persistent storage in demo: Session reports use localStorage only.
- Screen capture requires user consent: Vision analysis is opt-in and desktop-only.
- No brand names in demos: All demo scripts use generic institution names to avoid trademark issues.
- TTS voice availability: Gemini TTS voice profiles may vary by region. The system gracefully falls back to browser speech synthesis if TTS is unavailable.
- Audio recording requires Live Mic: The REC button is disabled in Demo Mode. Browser security restrictions prevent capturing audio played through `new Audio()` (Gemini TTS) or `speechSynthesis` (browser TTS). To record session audio, enable Live Mic + REC together so the microphone input is recorded directly from your device. This is a browser platform limitation, not a bug in VoxGuard.
- Live Mic requires HTTPS: The `getUserMedia` API requires a secure context (HTTPS or localhost). The Vercel deployment and local dev server both satisfy this requirement.
- Native mobile app: iOS and Android with platform-level call interception for always-on protection.
- Carrier-level integration: Deploying VoxGuard as an inline telecom network service.
- Expanded pattern library: Growing from 50 to 500+ patterns with global regional coverage.
- On-device WASM inference: Running scam classification directly in Rust WASM for offline-capable protection.
- Community pattern submissions: Crowd-sourced, continuously updated threat intelligence.
- Enterprise API: Hosted API for banks, telcos, and contact centers.
- Real-time video deepfake detection: Detect AI-generated video in video call scams.
- Auto-detect call platform: Automatically identify if user is on phone, Zoom, WhatsApp, or Teams.
- Emotional contagion scoring: Measure how the caller's emotional state transfers to the victim.
- Intervention learning: Track which intervention levels and challenge questions are most effective at stopping victims from complying with scammers, and adapt the system over time.
- Full native support for all 40 languages: Extend localized demo scripts, alerts, voice TTS, and intervention UI beyond the current 9 languages.
- Expanded action agent: Add more countries, integrate with local banking APIs for one-click account freeze, and provide follow-up reminder notifications.
Wiqi Lee - Data Scientist, AI/ML Researcher, Software Engineer, Cellist
Programming Languages: Python, Java, Rust, Julia
Submitted to: Gemini Live Agent Challenge 2026 #GeminiLiveAgentChallenge
"This is not a hackathon project. This is infrastructure for human safety."
| Source | URL | Usage |
|---|---|---|
| FBI IC3 2024 Annual Report | ic3.gov/AnnualReport/Reports/2024_IC3Report.pdf | Statistics ($16.6B), scam categories |
| FBI IC3 Annual Reports Index | ic3.gov/annualreport/reports | All yearly reports archive |
| FTC Consumer Sentinel | ftc.gov/enforcement/consumer-sentinel-network | Pattern taxonomy, linguistic markers |
| GASA Global Scam Report | gasa.org | Global $1T+ loss estimates |
| MAS ScamShield (SG) | scamshield.org.sg | Southeast Asian variants |
| ACCC ScamWatch (AU) | scamwatch.gov.au | Australian variant patterns |
| OJK Indonesia | ojk.go.id | Indonesian financial authority patterns |
| Bareskrim Cyber (ID) | patrolisiber.id | Indonesian cybercrime reporting |
| NPA Japan (警察庁) | npa.go.jp | Japanese ore-ore sagi patterns |
| FSS South Korea (금감원) | fss.or.kr | Korean voice phishing patterns |
| MHA India Cyber Crime | cybercrime.gov.in | Indian digital arrest scam data |
| INCIBE Spain | incibe.es | Spanish cybersecurity incident data |
| PHAROS France | internet-signalement.gouv.fr | French online fraud reporting |
| SAMA Saudi Arabia | sama.gov.sa | Saudi monetary authority fraud alerts |
| China National Anti-Fraud Center | mps.gov.cn | Chinese anti-fraud app & public security data |
| Interpol Financial Crime | interpol.int | International scam pattern intelligence |
No proprietary or licensed data. No personal victim data. All examples reconstructed from published public reports.
- No audio persisted by default: Audio is processed in real-time streams and discarded immediately after analysis. Raw audio is only retained when the user explicitly enables the REC recording feature.
- Minimal data transmission: Rust WASM preprocesses audio locally. Only the necessary audio frames are streamed to the backend for Gemini Live API analysis. No raw audio is stored server-side.
- No TTS audio stored: Voice intervention audio is generated on demand and not persisted.
- Explicit screen consent: Screen capture requires explicit user activation.
- No PII collection: No personally identifiable information is collected by VoxGuard.
- No brand names: Demo scripts use generic institution names.
- Intervention is protective, not punitive: The system helps users make informed decisions. It never prevents them from continuing a call if they choose to after the verification challenge.
MIT License. See LICENSE for details.
VOXGUARD 2026 · WIQI LEE · MIT LICENSE · #GeminiLiveAgentChallenge
Built to protect the people who need it most.


