yotsuda/Speech

Speech

PowerShell modules for text-to-speech (TTS) and speech-to-text (STT) across multiple providers.

.NET 8.0 · PowerShell 7.4+ · License: MIT

Providers

| Module | TTS | STT | Requires |
|---|---|---|---|
| Speech.Windows | Offline SAPI | Offline SAPI | Windows 10/11 |
| Speech.Azure | 400+ neural voices | Real-time streaming | Azure Speech key |
| Speech.OpenAI | 11 multilingual voices | Whisper (batch) | OpenAI API key |
| Speech.Google | Standard/WaveNet/Neural2 | Batch | Google Cloud credential JSON |
| Speech.Core | shared config, microphone, and output device | | |

Platform Support

| Cmdlet | Windows | Linux/macOS |
|---|---|---|
| Out-*Speech (all providers) | Yes | Yes |
| Read-AzureSpeech | Yes | Yes |
| Read-GoogleSpeech | Yes | Yes |
| Read-WindowsSpeech | Yes | No (SAPI) |
| Read-OpenAISpeech | Yes | No (NAudio WinMM) |
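Because `Read-WindowsSpeech` and `Read-OpenAISpeech` rely on Windows-only audio stacks, cross-platform scripts can branch on PowerShell's built-in `$IsWindows` automatic variable (available in PowerShell 6+). A minimal sketch — the choice of fallback cmdlet here is illustrative:

```powershell
# Choose a speech-to-text cmdlet that works on the current platform.
# The fallback to Read-AzureSpeech is illustrative; any cross-platform
# Read-*Speech cmdlet from the table above would do.
$sttCmdlet = if ($IsWindows) { 'Read-WindowsSpeech' } else { 'Read-AzureSpeech' }

# Invoke it by name once the matching provider is configured:
#   $text = & $sttCmdlet
$sttCmdlet
```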

Quick Start

# Windows — no setup needed
Out-WindowsSpeech "Hello, world!"

# Azure
Set-AzureSpeechConfig -Key "your-key" -Region "eastus"
Out-AzureSpeech "Hello" -Language en-US

# OpenAI
Set-OpenAISpeechConfig -Key "sk-..."
Out-OpenAISpeech "Hello" -Voice nova

# Google
Set-GoogleSpeechConfig -Credential "path/to/key.json"
Out-GoogleSpeech "Hello"

# Speech recognition (all providers)
$text = Read-WindowsSpeech
$text = Read-AzureSpeech -Language ja-JP
$text = Read-OpenAISpeech -Language ja
$text = Read-GoogleSpeech -Language ja-JP

Installation & Configuration

Install-PSResource Speech

With PowerShell.MCP, AI can configure everything for you:

Install-PSResource PowerShell.MCP
claude mcp add PowerShell -s user -- "$(Get-MCPProxyPath)"

Then just ask:

Install the Az module and help me create an Azure Speech resource.
Help me set up OpenAI Speech. I don't have an API key yet.
Guide me through setting up Google Cloud Speech.
Say 'Hello world' using Windows Speech.

Windows SAPI works offline with zero configuration — the quickest way to get started.

Settings are stored in ~/Documents/PowerShell/Modules/Speech/SpeechConfig.json. API keys are masked when displayed.

Get-SpeechConfig          # View all settings
Get-SpeechConfig -Path    # Get config file path

Provider setup

Azure Speech Services

# Get key: Azure Portal > Create "Speech" resource > Keys and Endpoint
# Free tier (F0): 0.5M chars TTS + 5h STT / month
Set-AzureSpeechConfig -Key "your-key" -Region "eastus"
Get-AzureSpeech -Locale ja
Set-AzureSpeechConfig -Voice "ja-JP-NanamiNeural"

OpenAI

# Get key: https://platform.openai.com/api-keys
Set-OpenAISpeechConfig -Key "sk-..."
Set-OpenAISpeechConfig -Voice nova -Model tts-1

Google Cloud

# Get credential: Google Cloud Console > IAM > Service Accounts > Create key (JSON)
Set-GoogleSpeechConfig -Credential "C:\path\to\service-account.json"
Get-GoogleSpeech -Language ja-JP
Set-GoogleSpeechConfig -Voice "ja-JP-Neural2-B"

Windows

# No API key needed. Add voices: Settings > Time & language > Speech
Get-WindowsSpeech
Set-WindowsSpeechConfig -Voice "Microsoft Haruka Desktop"

Common options

All Out-*Speech cmdlets accept pipeline input and share these patterns:

# Pipeline
"Line 1", "Line 2" | Out-AzureSpeech

# Output device selection (Tab completion available)
Out-AzureSpeech "Hello" -OutputDevice "Speakers (Realtek)"
Set-SpeechConfig -OutputDevice "Speakers (Realtek)"   # persist

# Microphone selection
Read-AzureSpeech -Microphone "Headset Microphone"
Set-SpeechConfig -Microphone "Headset Microphone"     # persist

# Explicit parameters take priority over stored config for all settings
Out-AzureSpeech "Hello" -Key "temp-key" -Region "westus"  # one-time override
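The "parameter over config" rule above can be sketched as a small resolution helper. This is purely illustrative — `Resolve-SpeechSetting` is not a cmdlet in these modules:

```powershell
# Illustrative sketch of "parameter > config" resolution.
# Resolve-SpeechSetting is hypothetical; the modules resolve settings internally.
function Resolve-SpeechSetting {
    param(
        $Parameter,          # value passed on the command line, if any
        [hashtable]$Config,  # persisted settings (e.g. from SpeechConfig.json)
        [string]$Name        # setting name, e.g. 'Region'
    )
    if ($null -ne $Parameter) { return $Parameter }  # explicit parameter wins
    return $Config[$Name]                            # otherwise fall back to stored config
}

$config = @{ Region = 'eastus' }
Resolve-SpeechSetting -Parameter 'westus' -Config $config -Name 'Region'  # → westus
Resolve-SpeechSetting -Config $config -Name 'Region'                      # → eastus
```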

AI Voice Conversation

With PowerShell.MCP configured, AI can speak and listen through your speakers and microphone:

Let's have a voice conversation in English.
When I type 't', start listening and respond by voice.
Find me a good English voice and play a sample.

Compatible MCP Clients

Any MCP-compatible client that supports PowerShell.MCP can use the Speech modules.

Cmdlet Reference

Each provider has 4 cmdlets following a consistent pattern:

| Verb | Purpose | Example |
|---|---|---|
| Out-*Speech | Text-to-speech | Out-AzureSpeech "Hello" |
| Read-*Speech | Speech-to-text | $text = Read-AzureSpeech |
| Get-*Speech | List voices | Get-AzureSpeech -Locale ja |
| Set-*SpeechConfig | Configure provider | Set-AzureSpeechConfig -Voice "..." |

Plus shared cmdlets in Speech.Core: Get-SpeechConfig, Set-SpeechConfig, Get-Microphone, Test-Microphone.

Use Get-Help <cmdlet> -Full for detailed documentation.
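The naming pattern is strict enough that the 16 per-provider cmdlet names can be generated from the provider list — which, together with the 4 Speech.Core cmdlets, accounts for all 20:

```powershell
# Generate the per-provider cmdlet names from the naming pattern above.
$providers = 'Windows', 'Azure', 'OpenAI', 'Google'
$cmdlets = foreach ($p in $providers) {
    "Out-${p}Speech"        # text-to-speech
    "Read-${p}Speech"       # speech-to-text
    "Get-${p}Speech"        # list voices
    "Set-${p}SpeechConfig"  # configure provider
}
$cmdlets.Count  # → 16 (plus 4 shared cmdlets in Speech.Core = 20)
```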

All 20 cmdlets

Speech.Core — Shared configuration and audio devices

  • Get-SpeechConfig — Display current configuration (-Path for file location)
  • Set-SpeechConfig — Set common settings: -Rate, -Volume, -Language, -Microphone, -OutputDevice
  • Get-Microphone — List audio input devices
  • Test-Microphone — Test microphone input level

Speech.Azure — Azure Cognitive Services

  • Out-AzureSpeech — TTS with SSML prosody (-Rate, -Volume, -Pitch, -Language, -Voice)
  • Read-AzureSpeech — Real-time streaming STT (-Language, -Detailed)
  • Get-AzureSpeech — List 400+ neural voices (-Locale to filter)
  • Set-AzureSpeechConfig — Set -Key, -Region, -Voice, -Pitch

Speech.OpenAI — OpenAI Audio API

  • Out-OpenAISpeech — TTS with 11 voices (-Voice, -Model, -Speed)
  • Read-OpenAISpeech — Whisper batch STT (-Language, -Model)
  • Get-OpenAISpeech — List available voices
  • Set-OpenAISpeechConfig — Set -Key, -Voice, -Model, -STTModel

Speech.Google — Google Cloud Speech

  • Out-GoogleSpeech — TTS with Standard/WaveNet/Neural2 (-Voice, -Language, -Speed)
  • Read-GoogleSpeech — Batch STT (-Language)
  • Get-GoogleSpeech — List available voices (-Language to filter)
  • Set-GoogleSpeechConfig — Set -Voice, -Credential

Speech.Windows — Windows SAPI

  • Out-WindowsSpeech — Offline TTS (-Voice, -Rate, -Volume)
  • Read-WindowsSpeech — Offline STT (-Language, -Confidence, -Detailed)
  • Get-WindowsSpeech — List installed SAPI voices (-Culture to filter)
  • Set-WindowsSpeechConfig — Set -Voice

Tab Completion

Most parameters support Tab or Ctrl+Space completion. Voice and language lists are fetched from each provider's API and cached for the session.

| Cmdlet | Tab-completable Parameters |
|---|---|
| Out-WindowsSpeech | -Voice, -OutputDevice |
| Out-AzureSpeech | -Language, -Voice, -OutputDevice |
| Out-OpenAISpeech | -Model, -Voice, -OutputDevice |
| Out-GoogleSpeech | -Language, -Voice, -OutputDevice |
| Read-WindowsSpeech | -Culture, -Microphone |
| Read-AzureSpeech | -Language, -Microphone |
| Read-OpenAISpeech | -Language, -Model, -Microphone |
| Read-GoogleSpeech | -Language, -Microphone |
| Get-WindowsSpeech | -Culture |
| Get-AzureSpeech | -Locale |
| Get-GoogleSpeech | -Language |
| Set-*SpeechConfig | -Voice, -Microphone, -OutputDevice |

# Language narrows the voice list
Out-AzureSpeech "Hello" -Language <Tab> -Voice <Tab>
# → en-US-JennyNeural, en-US-GuyNeural, ...

Out-OpenAISpeech "Hello" -Voice <Tab>
# → alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse

Read-AzureSpeech -Language <Tab>
# → en-US, ja-JP, zh-CN, ...

Read-OpenAISpeech -Microphone <Tab>
# → Headset Microphone, Microphone Array, ...
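The per-session caching of voice lists mentioned above can be sketched with a script-scoped hashtable; `Get-VoiceListCached` and the `$Fetch` script block are illustrative stand-ins for the modules' internal caching and provider API calls:

```powershell
# Illustrative sketch of session-scoped caching for completion data.
# Get-VoiceListCached is hypothetical; the real modules cache internally.
$script:VoiceCache = @{}

function Get-VoiceListCached {
    param([string]$Provider, [scriptblock]$Fetch)
    if (-not $script:VoiceCache.ContainsKey($Provider)) {
        # First request this session: call the (slow) provider API and memoize.
        $script:VoiceCache[$Provider] = & $Fetch
    }
    $script:VoiceCache[$Provider]  # later requests return instantly
}

$voices = Get-VoiceListCached -Provider 'Azure' -Fetch {
    'en-US-JennyNeural', 'ja-JP-NanamiNeural'   # stand-in for the API call
}
```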

Troubleshooting

Common issues

"key not configured" / "credential not configured"
Run the provider's Set-*SpeechConfig cmdlet. See Get-Help Set-AzureSpeechConfig -Full.

No microphone input

Get-Microphone       # List devices
Test-Microphone      # Check input level (> 30 = OK)

Windows STT not recognizing language
Install the language pack: Settings > Time & language > Language & region > Add a language > "Speech" feature.

License

MIT

Third-party: NAudio (MIT), Azure Speech SDK (MIT).
