feat: Add Gemini Realtime provider implementing IRealtimeClient/IRealtimeClientSession #256
Open
tarekgh wants to merge 7 commits into googleapis:main
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
32fa581 to 1d54288
Author
CC @stephentoub
jeffhandley reviewed on Mar 19, 2026
1d54288 to a5345ce
…timeClientSession
a5345ce to dd1b649
- Use SendRealtimeInputAsync for all input types (text, image, audio) to avoid interleaving with SendClientContentAsync which causes WebSocket close
- Fix VAD handling: use ActivityStart/ActivityEnd framing when VAD is disabled, AudioStreamEnd when VAD is enabled for push-to-talk
- Fix image input: send as Video blob without activity framing, use minimal text trigger in CreateResponse since Gemini treats images as streaming context
- Fix function calling: convert MEAI JsonSchema to Google Schema type with proper uppercase type names (STRING, OBJECT, etc.)
- Text input auto-triggers model response without framing
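The type-name part of the function-calling fix above can be pictured with a small mapping. This is only a sketch under stated assumptions: SchemaTypeNames and ToGoogleTypeName are hypothetical names, not identifiers from the PR, and the set of names beyond STRING and OBJECT (which the commit message mentions) is assumed here rather than taken from the change itself.

```csharp
using System;

// Hypothetical helper (not from the PR): maps lowercase JSON Schema "type"
// keywords to the uppercase names the commit message refers to.
public static class SchemaTypeNames
{
    public static string ToGoogleTypeName(string jsonSchemaType)
    {
        if (jsonSchemaType is null)
        {
            throw new ArgumentNullException(nameof(jsonSchemaType));
        }

        return jsonSchemaType switch
        {
            "string" => "STRING",
            "object" => "OBJECT",
            // The remaining entries are assumptions for illustration.
            "number" => "NUMBER",
            "integer" => "INTEGER",
            "boolean" => "BOOLEAN",
            "array" => "ARRAY",
            _ => throw new NotSupportedException(
                $"Unsupported JSON Schema type '{jsonSchemaType}'.")
        };
    }
}
```

A real conversion would also walk nested properties and array items; the sketch covers only the casing rule.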
dd1b649 to 6121fcf
…ction calling reliability, and test corrections
- Wait for SetupComplete after ConnectAsync so tools are configured before caller sends audio/text
- Add convenience constructor GoogleGenAIRealtimeClient(apiKey, defaultModelId)
- Guard against null function call IDs with synthetic GUID fallback
- Always store callId-to-functionName mapping regardless of null checks
- Fix 5 test expectations to match actual Gemini Live API behavior:
  - ParametersJsonSchema -> Parameters (Google Schema)
  - Text auto-triggers response (no turnComplete needed)
  - SendRealtimeInputAsync has no role field
- Branch on SessionKind.Transcription in BuildLiveConnectConfig to create minimal config: input transcription only, text modality, no voice/tools/instructions
- Map TranscriptionOptions.SpeechLanguage to AudioTranscriptionConfig.LanguageCodes for both transcription and conversation modes
- Add 8 tests covering transcription mode config, language mapping, VAD, and verifying conversation-oriented options are excluded
56db929 to 7d22cc8
Add GenAIJsonContext (JsonSerializerContext) with source-generated metadata for all types used in serialization, enabling AOT (Ahead-of-Time) compilation support without duplicating the SDK code in provider implementations.
Changes:
- Add GenAIJsonContext.cs with [JsonSerializable] entries for 93 root types (nested types are auto-discovered by the source generator)
- Wire source-gen context into JsonConfig.JsonSerializerOptions with DefaultJsonTypeInfoResolver fallback for non-AOT scenarios
- Add compact InternalSerializerOptions for intermediate serialize-then-parse round-trips (avoids WriteIndented overhead on internal transforms)
- Update all bare JsonSerializer.Serialize/Deserialize calls (~90 sites) to use the configured options with source-gen type metadata
- Add System.Text.Json PackageReference for source generator support on both netstandard2.0 and net8.0 targets
- Add AotJsonContextTest.cs with 4 tests verifying context coverage, nested type auto-discovery, and serialization round-trips
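For readers unfamiliar with the pattern this commit describes, here is a minimal sketch of a source-generated context combined with a reflection fallback. It is not the PR's GenAIJsonContext or JsonConfig: SamplePayload, SampleJsonContext, and SampleJsonConfig are hypothetical stand-ins, and the sketch assumes a System.Text.Json version that provides JsonTypeInfoResolver.Combine (7.0 or later).

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Text.Json.Serialization.Metadata;

// Hypothetical payload type standing in for one of the SDK's root types.
public sealed class SamplePayload
{
    public string Name { get; set; } = "";
    public int Count { get; set; }
}

// The source generator emits serialization metadata for each listed root type.
[JsonSerializable(typeof(SamplePayload))]
internal partial class SampleJsonContext : JsonSerializerContext
{
}

public static class SampleJsonConfig
{
    // Indented options, analogous to the "API output" options in the commit.
    public static readonly JsonSerializerOptions Options = Create(indented: true);

    // Compact options for intermediate serialize-then-parse round-trips.
    public static readonly JsonSerializerOptions InternalOptions = Create(indented: false);

    private static JsonSerializerOptions Create(bool indented)
    {
        return new JsonSerializerOptions
        {
            WriteIndented = indented,
            // Source-generated metadata is consulted first; the reflection-based
            // resolver only handles types the context does not cover.
            TypeInfoResolver = JsonTypeInfoResolver.Combine(
                SampleJsonContext.Default,
                new DefaultJsonTypeInfoResolver())
        };
    }
}
```

Calling JsonSerializer.Serialize(payload, SampleJsonConfig.InternalOptions) then resolves SamplePayload through the generated metadata, while a type missing from the context still works in non-AOT builds through the fallback.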
…time-provider
# Conflicts:
#	Google.GenAI/Batches.cs
#	Google.GenAI/Models.cs
#	Google.GenAI/Tunings.cs
3a4177f to c76d80c
…normalization
- Fix SendAsync error handling: rethrow ODE as named ObjectDisposedException, swallow WebSocketException only when disposed (not blanket catch)
- Add concurrent enumeration guard (_activeStreamingEnumeration) to GetStreamingResponseAsync to prevent multiple simultaneous readers
- Wrap DisposeAsync resources in individual try/catch with ExceptionDispatchInfo to prevent resource leaks on partial failure
- Fix CreateSessionAsync to dispose asyncSession on setup failure
- Replace shallow tool result serialization with deep NormalizeToolPayload, NormalizeToolArguments, ConvertJsonElementToToolPayload
- Use FunctionCallContent.CreateFromParsedArguments for tool call args (consistent with MEAI conventions, AOT-safe)
- Add MaxToolPayloadDepth (64) depth guard to prevent stack overflow
- Add post-lock disposed recheck for race with concurrent DisposeAsync
- Add 5 regression tests, update 3 existing error handling tests
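The concurrent enumeration guard mentioned in this commit could look roughly like the sketch below. Only the field name _activeStreamingEnumeration comes from the commit message; the SingleReaderStream type, the integer flag with Interlocked.CompareExchange, and the InvalidOperationException are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stream type: allows only one active async enumeration at a time.
public sealed class SingleReaderStream
{
    // 0 = no reader, 1 = a reader is currently enumerating.
    private int _activeStreamingEnumeration;

    public async IAsyncEnumerable<int> ReadAsync(
        int count,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Atomically claim the reader slot; fail if someone else holds it.
        if (Interlocked.CompareExchange(ref _activeStreamingEnumeration, 1, 0) != 0)
        {
            throw new InvalidOperationException(
                "Another enumeration of this stream is already active.");
        }

        try
        {
            for (int i = 0; i < count; i++)
            {
                cancellationToken.ThrowIfCancellationRequested();
                await Task.Yield(); // stands in for awaiting the next server message
                yield return i;
            }
        }
        finally
        {
            // Runs on completion, cancellation, or enumerator disposal.
            Volatile.Write(ref _activeStreamingEnumeration, 0);
        }
    }
}
```

Because async iterators start lazily, the guard takes effect on the first MoveNextAsync call rather than when ReadAsync is invoked.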
Summary
Adds a Gemini Live API provider implementing the Microsoft.Extensions.AI Realtime abstractions (IRealtimeClient/IRealtimeClientSession), enabling real-time audio, text, and function-calling conversations with Gemini models through the standardized MEAI interface.
This PR also updates the repository to depend on the official Microsoft.Extensions.AI.Abstractions 10.4.1 NuGet package (replacing the private 10.5.0-dev builds).
AOT Compatibility
This PR includes changes to make the Google GenAI SDK fully compatible with AOT (Ahead-of-Time) compilation. This was done to avoid duplicating the SDK's WebSocket and JSON protocol code in the realtime provider, since the MEAI realtime wrapper (GoogleGenAIRealtimeClient/GoogleGenAIRealtimeSession) delegates all serialization and WebSocket communication to the core SDK's Live.cs/AsyncSession/Transformers.cs/LiveConverters.cs code paths.
AOT changes:
- Added GenAIJsonContext: a source-generated JsonSerializerContext with [JsonSerializable] entries for 93 root types (nested property types are auto-discovered by the generator)
- Wired the source-gen context into JsonConfig.JsonSerializerOptions with DefaultJsonTypeInfoResolver fallback for non-AOT scenarios (anonymous types, user-provided types)
- Added compact InternalSerializerOptions for intermediate serialize-then-parse round-trips to avoid WriteIndented overhead on internal transforms
- Updated ~90 bare JsonSerializer.Serialize/Deserialize call sites across the SDK to use configured options with source-gen metadata
- Added System.Text.Json PackageReference for source generator support on both netstandard2.0 and net8.0 targets
- Added AotJsonContextTest.cs with 4 tests verifying context coverage, nested type auto-discovery, and serialization round-trips
Usage Example
What's Included
New Files
- GoogleGenAIRealtimeClient.cs: IRealtimeClient implementation that wraps a Google.GenAI.Client and creates realtime sessions via the Gemini Live API. Includes a convenience constructor accepting just an API key and model ID.
- GoogleGenAIRealtimeSession.cs: IRealtimeClientSession implementation that manages the WebSocket connection, audio buffering, message mapping, and function call orchestration.
- GoogleGenAIRealtimeTest.cs: 118 unit tests covering the full surface area.
- GenAIJsonContext.cs: Source-generated JSON serialization context for AOT compatibility.
- AotJsonContextTest.cs: 4 unit tests verifying AOT source-gen coverage and round-trip correctness.
Modified Files
- GoogleGenAIExtensions.cs: Added AsIRealtimeClient() extension method.
- Directory.Packages.props: Updated Microsoft.Extensions.AI.Abstractions from 10.5.0-dev → 10.4.1.
- Google.GenAI.csproj: Added System.Text.Json PackageReference for source generator support.
- JsonConfig.cs: Dual options: JsonSerializerOptions (indented, for API output) + InternalSerializerOptions (compact, for internal transforms). Both use source-gen context with reflection fallback.
- Live.cs: Minor adjustment to expose AsyncSession for the realtime provider; added serialization options.
- Batches.cs, Caches.cs, Files.cs, Models.cs, Operations.cs, Tunings.cs: Updated all bare JsonSerializer calls to use configured options (intermediate → InternalSerializerOptions, HTTP body/response → JsonSerializerOptions).
- Transformers.cs, Common.cs, TokensConverters.cs: Updated bare JsonSerializer calls to use configured options.
- packages.lock.json files regenerated.
Features
- Function calling through the FunctionInvokingRealtimeSession middleware; tool responses are batched into a single SendToolResponseAsync call
- A SemaphoreSlim serializes all WebSocket sends, safe for concurrent middleware + caller usage
- CreateSessionAsync waits for the server's SetupComplete acknowledgment before returning, ensuring tools and modalities are fully configured before the caller sends audio or text
- Convenience constructor GoogleGenAIRealtimeClient(string apiKey, string? defaultModelId) for simple setup without manually creating a Client
Key Design Decisions
SetupComplete handshake: The Google SDK's ConnectAsync sends the setup config but returns immediately without waiting for the server's SetupComplete acknowledgment. Our CreateSessionAsync drains this message before returning, ensuring the session is fully ready (tools configured, modalities set). Without this, function calling fails when the user speaks immediately after connecting.
Tool response batching: The MEAI FunctionInvokingRealtimeSession middleware sends a separate CreateConversationItem per function result. Gemini expects all results in one SendToolResponseAsync call. The provider buffers results and flushes them as a single batch when CreateResponse arrives.
TurnComplete suppression after tool responses: After SendToolResponseAsync, Gemini automatically continues generating. Sending client_content with turn_complete: true causes the server to close the WebSocket. The provider tracks this via _lastSendWasToolResponse and skips TurnComplete accordingly.
Null function call ID guard: If a function call arrives without an ID, a synthetic GUID is generated to ensure the call-ID-to-function-name mapping always works for the round-trip.
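The batching, TurnComplete suppression, and null-ID guard described above can be sketched together as a small buffer. This is an illustration, not the provider's code: ToolResult, ToolResponseBuffer, and every member name are hypothetical, and the actual send over the WebSocket is left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one buffered function result.
public sealed record ToolResult(string CallId, string FunctionName, string ResultJson);

// Hypothetical buffer: collects per-call results and releases them as one batch.
public sealed class ToolResponseBuffer
{
    private readonly Dictionary<string, string> _callIdToFunctionName = new();
    private readonly List<ToolResult> _pending = new();

    // True when the most recent send was a tool response batch, in which case
    // the caller would skip sending TurnComplete.
    public bool LastSendWasToolResponse { get; private set; }

    // Records an incoming function call. A missing ID is replaced by a
    // synthetic GUID so the ID-to-name mapping always has a key.
    public string RegisterCall(string? callId, string functionName)
    {
        string id = string.IsNullOrEmpty(callId) ? Guid.NewGuid().ToString("N") : callId!;
        _callIdToFunctionName[id] = functionName;
        return id;
    }

    // Buffers one function result instead of sending it immediately.
    public void AddResult(string callId, string resultJson)
    {
        if (!_callIdToFunctionName.TryGetValue(callId, out string? functionName))
        {
            throw new InvalidOperationException($"Unknown function call ID '{callId}'.");
        }

        _pending.Add(new ToolResult(callId, functionName, resultJson));
    }

    // Called when the response request arrives: returns everything buffered so
    // the caller can send it in a single tool-response call.
    public IReadOnlyList<ToolResult> Flush()
    {
        ToolResult[] batch = _pending.ToArray();
        _pending.Clear();
        LastSendWasToolResponse = batch.Length > 0;
        return batch;
    }

    // Any other kind of send clears the suppression flag.
    public void MarkOtherSend() => LastSendWasToolResponse = false;
}
```

In this sketch, a caller would check LastSendWasToolResponse after Flush and omit the turn-complete message when it is true.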
VAD handling: When VAD is disabled (default), the provider wraps audio commits with explicit ActivityStart/ActivityEnd framing. When enabled, the server handles speech boundary detection automatically.
Audio buffer cap: Audio appends are capped at 10 MB to prevent unbounded memory growth. Frames exceeding 32 KB are automatically split.
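The audio buffer cap and frame splitting can be illustrated with a short sketch. Several details here are assumptions rather than facts from the PR: AudioAppendBuffer is a hypothetical type, 32 KB and 10 MB are interpreted as binary units, and exceeding the cap throws (the description does not say how the provider reacts).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical buffer: splits appended audio into frames and enforces a total cap.
public sealed class AudioAppendBuffer
{
    public const int MaxFrameBytes = 32 * 1024;           // assumed meaning of "32 KB"
    public const int MaxBufferedBytes = 10 * 1024 * 1024; // assumed meaning of "10 MB"

    private readonly List<byte[]> _frames = new List<byte[]>();
    private long _bufferedBytes;

    public int FrameCount => _frames.Count;
    public long BufferedBytes => _bufferedBytes;

    public void Append(ReadOnlySpan<byte> audio)
    {
        // Reject the append up front so the buffer never grows past the cap.
        // Throwing is an assumption; dropping or truncating would also be a cap.
        if (_bufferedBytes + audio.Length > MaxBufferedBytes)
        {
            throw new InvalidOperationException("Audio buffer cap exceeded.");
        }

        // Split oversized input into frames of at most MaxFrameBytes each.
        while (!audio.IsEmpty)
        {
            int size = Math.Min(audio.Length, MaxFrameBytes);
            _frames.Add(audio.Slice(0, size).ToArray());
            _bufferedBytes += size;
            audio = audio.Slice(size);
        }
    }
}
```

Under these assumptions, appending 70 KB of audio yields three frames: two of 32 KB and one of 6 KB.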
AOT via source-gen in core SDK: Rather than reimplementing the WebSocket+JSON protocol in the MEAI wrapper (which would duplicate ~1000 lines of code), we made the core SDK AOT-compatible by adding a JsonSerializerContext with source-generated metadata for all types. This ensures the MEAI realtime wrapper can delegate to Live.cs/AsyncSession without any reflection-based JSON calls on the hot path.
Test Coverage
122 unit tests covering:
- BuildLiveConnectConfig mapping (all option combinations)