pipecat-ai · markbackman · Mar 24, 2026
diff --git a/docs.json b/docs.json
@@ -266,6 +266,7 @@
                   "server/utilities/audio/audio-buffer-processor",
                   "server/utilities/audio/koala-filter",
                   "server/utilities/audio/krisp-viva-filter",
+                  "server/utilities/audio/krisp-viva-vad-analyzer",
                   "server/utilities/audio/silero-vad-analyzer",
                   "server/utilities/audio/soundfile-mixer"
                 ]

diff --git a/guides/features/krisp-viva.mdx b/guides/features/krisp-viva.mdx
@@ -6,12 +6,13 @@ description: "Learn how to integrate Krisp's VIVA voice isolation and turn detec
 
 ## Overview
 
-Krisp's VIVA SDK provides two capabilities for Pipecat applications:
+Krisp's VIVA SDK provides three capabilities for Pipecat applications:
 
 - **Voice Isolation** — Filter out background noise and voices from the user's audio input stream, yielding clearer audio for fewer false interruptions and better transcription.
 - **Turn Detection** — Determine when a user has finished speaking using Krisp's streaming turn detection model, as an alternative to the [Smart Turn model](/server/utilities/turn-detection/smart-turn-overview).
+- **Voice Activity Detection** — Detect speech in audio streams using Krisp's VAD model, supporting sample rates from 8kHz to 48kHz.
 
-You can use either or both features together.
+You can use these features individually or in any combination.
 
 <CardGroup cols={2}>
   <Card
@@ -28,12 +29,19 @@ You can use either or both features together.
   >
     API reference for turn detection
   </Card>
+  <Card
+    title="KrispVivaVadAnalyzer Reference"
+    icon="code"
+    href="/server/utilities/audio/krisp-viva-vad-analyzer"
+  >
+    API reference for voice activity detection
+  </Card>
   <Card
     title="Krisp VIVA Example"
     icon="play"
     href="https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/07p-interruptible-krisp-viva.py"
   >
-    Complete example with voice isolation and turn detection
+    Complete example with Krisp features
   </Card>
   <Card
     title="Krisp Developers"
@@ -102,12 +110,15 @@ KRISP_VIVA_FILTER_MODEL_PATH=/PATH_TO_UNZIPPED_MODELS/krisp-viva-tel-v2.kef
 
 # Turn detection model path
 KRISP_VIVA_TURN_MODEL_PATH=/PATH_TO_UNZIPPED_MODELS/krisp-viva-tt-v2.kef
+
+# Voice activity detection model path (optional)
+KRISP_VIVA_VAD_MODEL_PATH=/PATH_TO_UNZIPPED_MODELS/krisp-viva-vad-v2.kef
 ```
 
 <Note>
-  The voice isolation and turn detection features use **different models**. Set
-  `KRISP_VIVA_FILTER_MODEL_PATH` for voice isolation and
-  `KRISP_VIVA_TURN_MODEL_PATH` for turn detection.
+  Each feature uses a **different model**. Set `KRISP_VIVA_FILTER_MODEL_PATH`
+  for voice isolation, `KRISP_VIVA_TURN_MODEL_PATH` for turn detection, and
+  `KRISP_VIVA_VAD_MODEL_PATH` for voice activity detection.
 </Note>
 
 ## Test the integration
@@ -170,3 +181,27 @@ user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 ```
 
 See the [KrispVivaTurn reference](/server/utilities/turn-detection/krisp-viva-turn) for configuration options.
+
+## Voice Activity Detection
+
+`KrispVivaVadAnalyzer` detects speech in audio streams using Krisp's VAD model. It supports 8kHz, 16kHz, 32kHz, 44.1kHz, and 48kHz sample rates, covering both telephony and high-quality audio.
+
+Configure it as a VAD analyzer:
+
+```python
+from pipecat.audio.vad.krisp_viva_vad import KrispVivaVadAnalyzer
+from pipecat.audio.vad.vad_analyzer import VADParams
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context,
+    user_params=LLMUserAggregatorParams(
+        vad_analyzer=KrispVivaVadAnalyzer(params=VADParams(stop_secs=0.2)),
+    ),
+)
+```
+
+See the [KrispVivaVadAnalyzer reference](/server/utilities/audio/krisp-viva-vad-analyzer) for configuration options.
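
Since each feature reads its own model path from the environment, a startup check can catch a missing or mistyped variable before the pipeline is built. The following is a minimal sketch, assuming the three variable names from the `.env` hunk above. `missing_krisp_models` and the feature labels are hypothetical names used only for illustration and are not part of Pipecat or the Krisp SDK.

```python
import os

# Variable names come from the .env example in this diff.
# The feature labels used as keys are illustrative only.
KRISP_MODEL_ENV_VARS = {
    "voice isolation": "KRISP_VIVA_FILTER_MODEL_PATH",
    "turn detection": "KRISP_VIVA_TURN_MODEL_PATH",
    "voice activity detection": "KRISP_VIVA_VAD_MODEL_PATH",
}


def missing_krisp_models(features, environ=None):
    """Return {feature: reason} for each requested feature with an unusable model path.

    Hypothetical helper: it only inspects environment variables and the file
    system, and never loads a model.
    """
    environ = os.environ if environ is None else environ
    problems = {}
    for feature in features:
        var = KRISP_MODEL_ENV_VARS[feature]
        path = environ.get(var, "")
        if not path:
            problems[feature] = f"{var} is not set"
        elif not path.endswith(".kef"):
            problems[feature] = f"{var} does not point to a .kef file: {path}"
        elif not os.path.isfile(path):
            problems[feature] = f"{var} points to a missing file: {path}"
    return problems


if __name__ == "__main__":
    # Check only the features this bot actually enables.
    for feature, reason in missing_krisp_models(["voice activity detection"]).items():
        print(f"{feature}: {reason}")
```

Passing only the enabled features keeps the check consistent with the guide, where the VAD model path is marked optional.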
diff --git a/guides/learn/speech-input.mdx b/guides/learn/speech-input.mdx
@@ -26,9 +26,9 @@ Custom strategies can also be implemented for specific use cases. By combining t
 
 ### What VAD Does
 
-VAD is responsible for detecting when a user starts and stops speaking. Pipecat uses the [Silero VAD](https://github.com/snakers4/silero-vad), an open-source model that runs locally on CPU with minimal overhead.
+VAD is responsible for detecting when a user starts and stops speaking. Pipecat includes [Silero VAD](https://github.com/snakers4/silero-vad), an open-source model that runs locally on CPU with minimal overhead. [Krisp VIVA VAD](/server/utilities/audio/krisp-viva-vad-analyzer) is also available for applications requiring support for higher sample rates.
 
-**Performance characteristics:**
+**Silero VAD performance characteristics:**
 
 - Processes 30+ms audio chunks in less than 1ms
 - Runs on a single CPU thread

diff --git a/server/utilities/audio/krisp-viva-vad-analyzer.mdx b/server/utilities/audio/krisp-viva-vad-analyzer.mdx
@@ -0,0 +1,104 @@
+---
+title: "KrispVivaVadAnalyzer"
+description: "Voice Activity Detection analyzer using the Krisp VIVA SDK"
+---
+
+## Overview
+
+`KrispVivaVadAnalyzer` is a Voice Activity Detection (VAD) analyzer that uses the Krisp VIVA SDK to detect speech in audio streams. It provides high-accuracy speech detection with support for multiple sample rates.
+
+## Installation
+
+```bash
+pip install "pipecat-ai[krisp]"
+```
+
+## Prerequisites
+
+You need a Krisp VIVA VAD model file (`.kef` extension). Set the model path via:
+
+- The `model_path` constructor parameter, or
+- The `KRISP_VIVA_VAD_MODEL_PATH` environment variable
+
+## Constructor Parameters
+
+<ParamField path="model_path" type="str" default="None">
+  Path to the Krisp model file (`.kef` extension). If not provided, uses the
+  `KRISP_VIVA_VAD_MODEL_PATH` environment variable.
+</ParamField>
+
+<ParamField path="frame_duration" type="int" default="10">
+  Frame duration in milliseconds. Must be 10, 15, 20, 30, or 32ms.
+</ParamField>
+
+<ParamField path="sample_rate" type="int" default="None">
+  Audio sample rate in Hz. Must be 8000, 16000, 32000, 44100, or 48000.
+</ParamField>
+
+<ParamField path="params" type="VADParams" default="VADParams()">
+  Voice Activity Detection parameters object
+  <Expandable title="properties">
+    <ParamField path="confidence" type="float" default="0.7">
+      Confidence threshold for speech detection. Higher values make detection stricter. Must
+      be between 0 and 1.
+    </ParamField>
+
+    <ParamField path="start_secs" type="float" default="0.2">
+      Time in seconds that speech must be detected before transitioning to SPEAKING state.
+    </ParamField>
+
+    <ParamField path="stop_secs" type="float" default="0.2">
+      Time in seconds of silence required before transitioning back to QUIET state.
+    </ParamField>
+
+    <ParamField path="min_volume" type="float" default="0.6">
+      Minimum audio volume threshold for speech detection. Must be between 0 and 1.
+    </ParamField>
+
+  </Expandable>
+</ParamField>
+
+## Usage Example
+
+```python
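+from pipecat.audio.vad.krisp_viva_vad import KrispVivaVadAnalyzer
+from pipecat.audio.vad.vad_analyzer import VADParams
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+
+vad = KrispVivaVadAnalyzer(model_path="/path/to/model.kef", params=VADParams(stop_secs=0.2))
+context = LLMContext(messages)  # `messages` is defined elsewhere in your bot
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context, user_params=LLMUserAggregatorParams(vad_analyzer=vad)
+)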
+```
+
+## Technical Details
+
+### Sample Rate Requirements
+
+The analyzer supports five sample rates:
+
+- 8000 Hz
+- 16000 Hz
+- 32000 Hz
+- 44100 Hz
+- 48000 Hz
+
+### Model Requirements
+
+- Model files must have a `.kef` extension
+- Model path can be specified via constructor or environment variable
+- Model is loaded once during initialization
+
+## Notes
+
+- High-accuracy speech detection using Krisp VIVA SDK
+- Supports multiple sample rates (8kHz to 48kHz)
+- Requires external `.kef` model file
+- Thread-safe for pipeline processing
+- Automatic session management
+- Configurable frame duration
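
The reference page restricts `sample_rate` and `frame_duration` to fixed sets of values. As a rough illustration of how the two interact, the sketch below validates both and computes the number of samples in one frame. `samples_per_frame` is a hypothetical helper, not a Pipecat or Krisp SDK function, and the allowed values are copied from the parameter descriptions in this diff.

```python
# Allowed values copied from the KrispVivaVadAnalyzer reference page in this diff.
SUPPORTED_SAMPLE_RATES = (8000, 16000, 32000, 44100, 48000)
SUPPORTED_FRAME_DURATIONS_MS = (10, 15, 20, 30, 32)


def samples_per_frame(sample_rate: int, frame_duration: int = 10) -> int:
    """Validate the documented constraints and return the samples in one frame.

    Hypothetical helper for illustration only; how the SDK itself sizes
    frames is not described in the source.
    """
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(
            f"sample_rate must be one of {SUPPORTED_SAMPLE_RATES}, got {sample_rate}"
        )
    if frame_duration not in SUPPORTED_FRAME_DURATIONS_MS:
        raise ValueError(
            f"frame_duration must be one of {SUPPORTED_FRAME_DURATIONS_MS}, "
            f"got {frame_duration}"
        )
    # Integer division truncates combinations that are not a whole number of
    # samples, such as 44100 Hz with 15 ms frames (661.5 samples).
    return sample_rate * frame_duration // 1000


if __name__ == "__main__":
    print(samples_per_frame(16000))      # default 10 ms frame
    print(samples_per_frame(8000, 32))   # telephony rate, 32 ms frame
```

Running a check like this before constructing the analyzer surfaces an unsupported rate, for example 22050 Hz, as a clear `ValueError` at configuration time.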