voice-form

Drop-in voice input for web forms. Speak naturally. Fill forms intelligently.

voice-form transforms the way users interact with forms. Instead of typing, clicking, and tabbing through fields, users speak — and your forms fill themselves. Built on the Web Speech API and LLM-powered parsing, it's framework-agnostic, requires zero API keys in the browser, and everything happens in under 30 milliseconds.

Install

npm install @voiceform/core
# or
pnpm add @voiceform/core

Quickstart

Minimal example — contact form with voice input in 15 lines:

import { createVoiceForm } from '@voiceform/core'

const voiceForm = createVoiceForm({
  endpoint: '/api/voice-parse', // Your backend endpoint
  schema: {
    formName: 'Contact Form',
    fields: [
      { name: 'fullName', label: 'Full Name', type: 'text', required: true },
      { name: 'email', label: 'Email', type: 'email', required: true },
      { name: 'message', label: 'Message', type: 'textarea' },
    ],
  },
  events: {
    onDone: (result) => {
      if (result.success) {
        document.querySelector('form')?.submit()
      }
    },
  },
})

The mic button appears automatically. User taps it, speaks "John Smith, john at example dot com, tell them I'm interested", and the form fields fill before they even finish speaking. One confirmation tap and you're done.

How It Works

1. Tap mic  →  2. Speak naturally  →  3. Confirm values  →  4. Fields fill

Capture: Web Speech API captures audio from the user's microphone
Transcribe: Raw speech is converted to text — via Web Speech or your own STT backend
Parse: You send the transcript to your backend endpoint, which calls an LLM to extract structured field values (see OpenAI Structured Outputs or Anthropic Tool Use)
Confirm: Users verify what was heard before the form fills — no surprises
Inject: Form fields update with sanitized values; synthetic input and change events fire so frameworks stay in sync

BYOE: Bring Your Own Endpoint

voice-form never touches your API keys. You own the endpoint, the LLM calls, and all the secrets. The browser sends only the transcript and form schema — both visible on the Network tab and intentionally public-safe.

// Your backend (Node, Python, Go, whatever)
// POST /api/voice-parse
import { buildSystemPrompt, buildUserPrompt } from '@voiceform/server-utils'

export async function parseVoice(req) {
  const { transcript, schema } = req.body

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: buildSystemPrompt(schema) },
      { role: 'user', content: buildUserPrompt(transcript) },
    ],
  })

  return {
    fields: {
      fullName: { value: 'John Smith' },
      email: { value: 'john@example.com' },
    },
  }
}

Full reference endpoints for SvelteKit, Next.js, and Express in examples/.

Performance

Everything voice-form does takes under 30ms. That's one animation frame at 60fps. The LLM inference time (200-500ms), network latency, and user reading time are all on your backend — not us. Check our bundle size on Bundlephobia.

Package	Size (gzip)
`@voiceform/core` (headless)	~7.2 KB
`@voiceform/core/ui`	~4.8 KB
`@voiceform/svelte`	~171 B

Svelte Integration

First-class Svelte 5 support:

pnpm add @voiceform/svelte

<script>
  import { VoiceForm } from '@voiceform/svelte'
</script>

<VoiceForm
  endpoint="/api/voice-parse"
  schema={mySchema}
/>

See the API Reference for the full component API, snippets, and store integration.

React Integration

First-class React 18+ support:

pnpm add @voiceform/react
# peer deps: react >=18, react-dom >=18, @voiceform/core >=2.0.0

Hook API:

import { useVoiceForm } from '@voiceform/react'

function ContactForm() {
  const { state, instance } = useVoiceForm({
    endpoint: '/api/voice-parse',
    schema: {
      formName: 'Contact Form',
      fields: [
        { name: 'fullName', label: 'Full Name', type: 'text', required: true },
        { name: 'email', label: 'Email', type: 'email', required: true },
      ],
    },
  })

  return (
    <button
      onClick={() => instance.start()}
      disabled={state.status !== 'idle'}
    >
      {state.status === 'recording' ? 'Listening…' : 'Speak to fill form'}
    </button>
  )
}

Component API (with default UI):

import { VoiceForm } from '@voiceform/react'

<VoiceForm
  endpoint="/api/voice-parse"
  schema={mySchema}
  onDone={(result) => console.log('Filled:', result)}
/>

With React Hook Form (onFieldsResolved bypasses DOM injection):

<VoiceForm
  endpoint="/api/voice-parse"
  schema={mySchema}
  onFieldsResolved={(fields) => {
    Object.entries(fields).forEach(([name, value]) => setValue(name, value))
  }}
/>

See the API Reference for the full hook and component API.

What's New in V2

React integration — @voiceform/react with useVoiceForm hook and VoiceForm component
Whisper STT adapter — MediaRecorder-based adapter via @voiceform/core/adapters/whisper
Append mode — Speak to append to existing field values (appendMode: true)
Multi-step forms — Inject across wizard steps without errors on missing DOM fields (multiStep: true)
Field corrections — Users can edit parsed values in the confirmation panel (correctField())
Auto-detect schema — Scan a form element to infer schema from DOM (autoDetectSchema: true)
Developer tooling — @voiceform/dev with schema inspector, logging middleware, and state visualizer

Security Highlights

No API keys in the browser. Ever. BYOE is not an option — it's the only path.
Output sanitization. All LLM values are sanitized before DOM injection. XSS impossible from parsed fields.
CSRF protection. Requests include the X-VoiceForm-Request header. Your endpoint validates it.
Prompt injection defense. The transcript is passed to the LLM as JSON-escaped data in a separate user message, not string-interpolated into instructions. See OWASP LLM01: Prompt Injection and the Prompt Injection Prevention Cheat Sheet.
Transparent data flows. Web Speech API sends audio to Google. If that concerns you, bring your own STT adapter.

See SECURITY.md for the full threat model and the OWASP Top 10 for LLM Applications.

Privacy

Voice input must be disclosed to users. The Web Speech API sends audio to Google's servers. voice-form provides built-in controls to show a privacy notice before requesting mic permission.

createVoiceForm({
  privacyNotice: 'Voice input uses your browser\'s speech recognition, processed by Google.',
  requirePrivacyAcknowledgement: true,
  // ...
})

Read PRIVACY.md for data flows, GDPR considerations, and developer responsibilities.

Accessibility

voice-form follows WAI-ARIA Authoring Practices for all interactive elements:

Mic button: role="button", keyboard-activatable (Enter/Space), aria-label per state
Confirmation panel: role="dialog", focus trap, Escape to cancel
Screen reader announcements via aria-live regions
prefers-reduced-motion respected for all animations

Browser Support

voice-form requires Web Speech API support:

Chrome, Edge, Safari 14.1+ (full)
Firefox 25+ (via flag in Nightly)

On unsupported browsers, the library gracefully disables the mic button and shows an "unavailable" message.

First Use

The first time a user taps the mic button, their browser shows a microphone permission prompt. This is one-time only — the permission is remembered per origin.

Links

API Reference — Complete config, methods, types, and error codes
Demo Site — Working contact form with voice input
Security Guide — Threat model, developer checklist, mitigation strategies
Privacy Guide — Data flows, compliance, user disclosure
Architecture — High-level design, state machine, data flow
Examples — SvelteKit, Next.js, Express endpoint implementations

Related Projects

form2agent-ai-react — Voice-assisted AI form filling with React and OpenAI
whisper-anywhere — Chrome extension for voice input using Whisper
Vocode — Open-source library for voice-based LLM applications
Ultravox — Multimodal LLM with native speech understanding

Contributing

Issues and PRs welcome. We use Changesets for versioning. Check CONTRIBUTING.md for the full workflow.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.changeset		.changeset
.github/workflows		.github/workflows
docs		docs
examples		examples
packages		packages
.gitignore		.gitignore
.nvmrc		.nvmrc
.prettierignore		.prettierignore
.size-limit.json		.size-limit.json
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
PHASE_3_COMPLETE.md		PHASE_3_COMPLETE.md
README.md		README.md
eslint.config.js		eslint.config.js
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml
prettier.config.js		prettier.config.js
tsconfig.base.json		tsconfig.base.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

voice-form

Install

Quickstart

How It Works

BYOE: Bring Your Own Endpoint

Performance

Svelte Integration

React Integration

What's New in V2

Security Highlights

Privacy

Accessibility

Browser Support

First Use

Links

Related Projects

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

voice-form

Install

Quickstart

How It Works

BYOE: Bring Your Own Endpoint

Performance

Svelte Integration

React Integration

What's New in V2

Security Highlights

Privacy

Accessibility

Browser Support

First Use

Links

Related Projects

Contributing

License

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages