Skip to content

cschuman/voice-form

Repository files navigation

voice-form

CI npm version bundle size license

Drop-in voice input for web forms. Speak naturally. Fill forms intelligently.

voice-form transforms the way users interact with forms. Instead of typing, clicking, and tabbing through fields, users speak — and your forms fill themselves. Built on the Web Speech API and LLM-powered parsing, it's framework-agnostic, requires zero API keys in the browser, and everything happens in under 30 milliseconds.

Install

npm install @voiceform/core
# or
pnpm add @voiceform/core

Quickstart

Minimal example — contact form with voice input in 15 lines:

import { createVoiceForm } from '@voiceform/core'

const voiceForm = createVoiceForm({
  endpoint: '/api/voice-parse', // Your backend endpoint
  schema: {
    formName: 'Contact Form',
    fields: [
      { name: 'fullName', label: 'Full Name', type: 'text', required: true },
      { name: 'email', label: 'Email', type: 'email', required: true },
      { name: 'message', label: 'Message', type: 'textarea' },
    ],
  },
  events: {
    onDone: (result) => {
      if (result.success) {
        document.querySelector('form')?.submit()
      }
    },
  },
})

The mic button appears automatically. User taps it, speaks "John Smith, john at example dot com, tell them I'm interested", and the form fields fill before they even finish speaking. One confirmation tap and you're done.

How It Works

1. Tap mic  →  2. Speak naturally  →  3. Confirm values  →  4. Fields fill
  • Capture: Web Speech API captures audio from the user's microphone
  • Transcribe: Raw speech is converted to text — via Web Speech or your own STT backend
  • Parse: You send the transcript to your backend endpoint, which calls an LLM to extract structured field values (see OpenAI Structured Outputs or Anthropic Tool Use)
  • Confirm: Users verify what was heard before the form fills — no surprises
  • Inject: Form fields update with sanitized values; synthetic input and change events fire so frameworks stay in sync

BYOE: Bring Your Own Endpoint

voice-form never touches your API keys. You own the endpoint, the LLM calls, and all the secrets. The browser sends only the transcript and form schema — both visible on the Network tab and intentionally public-safe.

// Your backend (Node, Python, Go, whatever)
// POST /api/voice-parse
import { buildSystemPrompt, buildUserPrompt } from '@voiceform/server-utils'

export async function parseVoice(req) {
  const { transcript, schema } = req.body

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: buildSystemPrompt(schema) },
      { role: 'user', content: buildUserPrompt(transcript) },
    ],
  })

  return {
    fields: {
      fullName: { value: 'John Smith' },
      email: { value: 'john@example.com' },
    },
  }
}

Full reference endpoints for SvelteKit, Next.js, and Express in examples/.

Performance

Everything voice-form does takes under 30ms. That's one animation frame at 60fps. The LLM inference time (200-500ms), network latency, and user reading time are all on your backend — not us. Check our bundle size on Bundlephobia.

Package Size (gzip)
@voiceform/core (headless) ~7.2 KB
@voiceform/core/ui ~4.8 KB
@voiceform/svelte ~171 B

Svelte Integration

First-class Svelte 5 support:

pnpm add @voiceform/svelte
<script>
  import { VoiceForm } from '@voiceform/svelte'
</script>

<VoiceForm
  endpoint="/api/voice-parse"
  schema={mySchema}
/>

See the API Reference for the full component API, snippets, and store integration.

React Integration

First-class React 18+ support:

pnpm add @voiceform/react
# peer deps: react >=18, react-dom >=18, @voiceform/core >=2.0.0

Hook API:

import { useVoiceForm } from '@voiceform/react'

function ContactForm() {
  const { state, instance } = useVoiceForm({
    endpoint: '/api/voice-parse',
    schema: {
      formName: 'Contact Form',
      fields: [
        { name: 'fullName', label: 'Full Name', type: 'text', required: true },
        { name: 'email', label: 'Email', type: 'email', required: true },
      ],
    },
  })

  return (
    <button
      onClick={() => instance.start()}
      disabled={state.status !== 'idle'}
    >
      {state.status === 'recording' ? 'Listening…' : 'Speak to fill form'}
    </button>
  )
}

Component API (with default UI):

import { VoiceForm } from '@voiceform/react'

<VoiceForm
  endpoint="/api/voice-parse"
  schema={mySchema}
  onDone={(result) => console.log('Filled:', result)}
/>

With React Hook Form (onFieldsResolved bypasses DOM injection):

<VoiceForm
  endpoint="/api/voice-parse"
  schema={mySchema}
  onFieldsResolved={(fields) => {
    Object.entries(fields).forEach(([name, value]) => setValue(name, value))
  }}
/>

See the API Reference for the full hook and component API.

What's New in V2

  • React integration@voiceform/react with useVoiceForm hook and VoiceForm component
  • Whisper STT adapterMediaRecorder-based adapter via @voiceform/core/adapters/whisper
  • Append mode — Speak to append to existing field values (appendMode: true)
  • Multi-step forms — Inject across wizard steps without errors on missing DOM fields (multiStep: true)
  • Field corrections — Users can edit parsed values in the confirmation panel (correctField())
  • Auto-detect schema — Scan a form element to infer schema from DOM (autoDetectSchema: true)
  • Developer tooling@voiceform/dev with schema inspector, logging middleware, and state visualizer

Security Highlights

  • No API keys in the browser. Ever. BYOE is not an option — it's the only path.
  • Output sanitization. All LLM values are sanitized before DOM injection. XSS impossible from parsed fields.
  • CSRF protection. Requests include the X-VoiceForm-Request header. Your endpoint validates it.
  • Prompt injection defense. The transcript is passed to the LLM as JSON-escaped data in a separate user message, not string-interpolated into instructions. See OWASP LLM01: Prompt Injection and the Prompt Injection Prevention Cheat Sheet.
  • Transparent data flows. Web Speech API sends audio to Google. If that concerns you, bring your own STT adapter.

See SECURITY.md for the full threat model and the OWASP Top 10 for LLM Applications.

Privacy

Voice input must be disclosed to users. The Web Speech API sends audio to Google's servers. voice-form provides built-in controls to show a privacy notice before requesting mic permission.

createVoiceForm({
  privacyNotice: 'Voice input uses your browser\'s speech recognition, processed by Google.',
  requirePrivacyAcknowledgement: true,
  // ...
})

Read PRIVACY.md for data flows, GDPR considerations, and developer responsibilities.

Accessibility

voice-form follows WAI-ARIA Authoring Practices for all interactive elements:

  • Mic button: role="button", keyboard-activatable (Enter/Space), aria-label per state
  • Confirmation panel: role="dialog", focus trap, Escape to cancel
  • Screen reader announcements via aria-live regions
  • prefers-reduced-motion respected for all animations

Browser Support

voice-form requires Web Speech API support:

  • Chrome, Edge, Safari 14.1+ (full)
  • Firefox 25+ (via flag in Nightly)

On unsupported browsers, the library gracefully disables the mic button and shows an "unavailable" message.

First Use

The first time a user taps the mic button, their browser shows a microphone permission prompt. This is one-time only — the permission is remembered per origin.

Links

  • API Reference — Complete config, methods, types, and error codes
  • Demo Site — Working contact form with voice input
  • Security Guide — Threat model, developer checklist, mitigation strategies
  • Privacy Guide — Data flows, compliance, user disclosure
  • Architecture — High-level design, state machine, data flow
  • Examples — SvelteKit, Next.js, Express endpoint implementations

Related Projects

  • form2agent-ai-react — Voice-assisted AI form filling with React and OpenAI
  • whisper-anywhere — Chrome extension for voice input using Whisper
  • Vocode — Open-source library for voice-based LLM applications
  • Ultravox — Multimodal LLM with native speech understanding

Contributing

Issues and PRs welcome. We use Changesets for versioning. Check CONTRIBUTING.md for the full workflow.

License

MIT

About

Drop-in voice-to-form component. Speak naturally, fill forms intelligently.

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors