feat: add Pulse STT support for smallest.ai pulse (streaming + pre-recorded) #4858

mahimairaja wants to merge 20 commits into livekit:main
Tested pre-recorded:

```python
import asyncio
from pathlib import Path

import aiohttp
from dotenv import load_dotenv

from livekit.agents import utils
from livekit.plugins import smallestai

load_dotenv()


async def main():
    wav = Path(__file__).resolve().parent / "sample.wav"
    async with aiohttp.ClientSession() as session:
        stt = smallestai.STT(language="en", http_session=session)
        frames = [
            f
            async for f in utils.audio.audio_frames_from_file(
                str(wav), sample_rate=16000, num_channels=1
            )
        ]
        event = await stt.recognize(frames)
        print(event.alternatives[0].text if event.alternatives else "")


if __name__ == "__main__":
    asyncio.run(main())
```
Tested streaming:

```python
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import Agent, AgentServer, AgentSession, room_io
from livekit.plugins import silero
from livekit.plugins.openai.llm import LLM
from livekit.plugins.smallestai.stt import STT
from livekit.plugins.smallestai.tts import TTS
from livekit.plugins.turn_detector.english import EnglishModel

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.""",
        )


server = AgentServer()


@server.rtc_session(agent_name="my-agent")
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        stt=STT(),
        llm=LLM(model="gpt-4.1-mini"),
        tts=TTS(),
        vad=silero.VAD.load(),
        turn_detection=EnglishModel(),
    )
    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(),
    )
    await session.generate_reply(instructions="Greet the user and offer your assistance.")


if __name__ == "__main__":
    agents.cli.run_app(server)
```
After conversations with @harshitajain165 from smallest.ai, I learned that a few more steps are needed on the smallest server side for streaming support. For now, I am moving this PR to draft.
```python
if self._is_last_event.is_set():
    closing_ws = True
    return
```
🔴 recv_task early return on is_last leaves keepalive_task blocking tasks_group, causing stream to hang
After both send_task and recv_task complete normally, keepalive_task continues running indefinitely, preventing asyncio.gather from completing. This causes _run to never return, _event_ch to never close, and any consumer iterating the speech stream to hang after receiving all transcripts.
Root Cause and Detailed Walkthrough
The shutdown sequence proceeds as follows:
1. `send_task` exhausts `self._input_ch`, sends the END message (stt.py:366-368), and returns normally.
2. The server processes the end signal and sends a final transcript with `is_last=True`.
3. `recv_task` processes this event, `_process_stream_event` sets `self._is_last_event` (stt.py:524-525), and `recv_task` returns early at stt.py:400-402:
   ```python
   if self._is_last_event.is_set():
       closing_ws = True
       return
   ```
4. `keepalive_task` (stt.py:327-333) is still running: it pings every 30 seconds and only exits when `ws.ping()` raises an exception.
5. `tasks_group = asyncio.gather(*tasks)` at stt.py:416 requires ALL three tasks to complete. Since `keepalive_task` is still alive, the gather never resolves.
6. `asyncio.wait` at stt.py:419-422 blocks forever (or until `keepalive_task`'s next ping detects a closed connection, which may take up to 30 seconds, or never, if the server doesn't close the WebSocket).
7. `_run` never returns, so `_main_task` never returns, `_event_ch` never closes, and the consumer's `async for event in stream` hangs after the last real event.
Compare with the Deepgram plugin (livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py:531-559): Deepgram's recv_task does NOT exit early; it continues to call ws.receive() until the WebSocket is actually closed by the server, which naturally causes keepalive_task to fail on its next send and exit.
Impact: In standalone stream usage (e.g. async for event in stt.stream(): ...), the iteration hangs indefinitely after all transcripts are received. In AgentSession usage, the hang is masked by explicit aclose() calls, but cleanup is still delayed.
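The hang described above can be reproduced in isolation. The sketch below is a minimal stand-in, not the plugin's code: the task names mirror the plugin's three tasks, their bodies are placeholders, and the 0.2 s timeout exists only so the demo terminates. It shows why `asyncio.gather` never resolves while a keepalive loop is still alive:

```python
import asyncio


async def send_task():
    return "sent"  # stand-in: input channel drained, END message sent


async def recv_task():
    return "received"  # stand-in: returns early once is_last is seen


async def keepalive_task():
    while True:  # only exits if the ping raises; here, never
        await asyncio.sleep(30)


async def main():
    gather = asyncio.gather(send_task(), recv_task(), keepalive_task())
    try:
        # stand-in for the wait on tasks_group: gather resolves only
        # when ALL three tasks finish, and keepalive_task never does
        await asyncio.wait_for(gather, timeout=0.2)
        return "completed"
    except asyncio.TimeoutError:
        return "hung"


result = asyncio.run(main())
print(result)  # prints "hung"
```

Without the demo timeout, the `await` would simply never return, which is exactly the consumer-side symptom described in the Impact note.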
Prompt for agents
In livekit-plugins/livekit-plugins-smallestai/livekit/plugins/smallestai/stt.py, the recv_task function (lines 371-402) returns early when is_last_event is set (lines 400-402), but this leaves keepalive_task running and blocking asyncio.gather from completing.
The fix: instead of returning early from recv_task, continue the while loop to let the WebSocket close naturally. Since closing_ws is already set to True at line 401, the existing close-frame handler at lines 376-382 will cleanly return when the server closes the connection. This matches the Deepgram plugin's pattern.
Replace lines 400-402:

```python
if self._is_last_event.is_set():
    closing_ws = True
    return
```

With:

```python
if self._is_last_event.is_set():
    closing_ws = True
    # Don't return early; continue loop so ws.receive() sees
    # the server-side close frame, which also lets keepalive_task
    # detect the closed connection and exit.
```
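For illustration, here is a self-contained sketch of that receive-until-close pattern with a stubbed WebSocket. The `FakeWS` class, the message-type constants, and the inline `is_last` parsing are all invented for the demo (the real plugin uses aiohttp's WebSocket API); the point is only the control flow: after `is_last` the loop keeps receiving, so the server's close frame is what ends the task.

```python
import asyncio
import json

TEXT, CLOSE = "text", "close"  # stand-ins for real WebSocket message types


class FakeWS:
    """Stub WebSocket: server sends a final transcript, then a close frame."""

    def __init__(self):
        self._msgs = [(TEXT, '{"is_last": true}'), (CLOSE, None)]

    async def receive(self):
        return self._msgs.pop(0)


async def recv_task(ws, is_last_event: asyncio.Event):
    closing_ws = False
    while True:
        msg_type, data = await ws.receive()
        if msg_type == CLOSE:
            if closing_ws:
                return "clean close"  # expected: server closed after is_last
            raise RuntimeError("connection closed unexpectedly")
        if json.loads(data).get("is_last"):  # stand-in for event processing
            is_last_event.set()
        if is_last_event.is_set():
            closing_ws = True
            # no early return: keep looping so receive() sees the close frame


outcome = asyncio.run(recv_task(FakeWS(), asyncio.Event()))
print(outcome)  # prints "clean close"
```

Because the loop only exits on the close frame, a sibling keepalive task also gets a chance to fail on the dead connection and exit, letting the gather resolve.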
What does this PR do?

Adds Speech-to-Text (STT) support to the `livekit-plugins-smallestai` plugin using Smallest AI's Pulse STT API. The existing plugin only supported TTS; this PR brings it to parity with plugins like Deepgram, ElevenLabs, and Soniox that offer both TTS and STT.

Closes #4856
Summary of Changes
New: `STT` class (stt.py)

- Pre-recorded recognition via REST (`/api/v1/pulse/get_text`)
- Streaming via WebSocket (`wss://waves-api.smallest.ai/api/v1/pulse/get_text`)
- Language auto-detection (`language="multi"`)
- `streaming=True`, `interim_results=True` options

New: `SpeechStream` class (stt.py)

- Audio buffering via `AudioByteStream` (~4096 byte chunks per Smallest AI docs)
- Event flow: `START_OF_SPEECH` → `INTERIM_TRANSCRIPT`/`FINAL_TRANSCRIPT` → `END_OF_SPEECH`
- End-of-stream via `{"type": "end"}` signaling
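As a rough illustration of the buffering step, fixed-size chunking of raw audio can be sketched as below. `chunk_audio` is a hypothetical helper written for this example, not the plugin's actual `AudioByteStream` API; only the ~4096-byte chunk size comes from the Smallest AI docs.

```python
def chunk_audio(pcm: bytes, chunk_size: int = 4096) -> list[bytes]:
    """Split raw PCM bytes into fixed-size chunks for the WebSocket.

    Hypothetical stand-in for the buffering AudioByteStream performs;
    the 4096-byte size follows the Smallest AI docs.
    """
    return [pcm[i : i + chunk_size] for i in range(0, len(pcm), chunk_size)]


# 10000 bytes of silence -> two full chunks plus a 1808-byte remainder
sizes = [len(c) for c in chunk_audio(bytes(10000))]
print(sizes)  # [4096, 4096, 1808]
```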
Usage

Configuration via the `SMALLEST_API_KEY` environment variable (same key used for TTS).

Testing
Tested with `language="en"` and `language="multi"` (auto-detection).

API Reference