model_dump() in _generate_content_stream serializes full image contents per response (~32 MB wasted) #2235
Description
Describe the bug
In `_generate_content_stream()` (and `_generate_content()`, and all async variants), every streaming response creates a full `parameter_model.model_dump()` to pass as kwargs to `GenerateContentResponse._from_response()`:

```python
# models.py, line 4324 (and ~35 other identical call sites)
return_value = types.GenerateContentResponse._from_response(
    response=response_dict, kwargs=parameter_model.model_dump()
)
```

`_from_response()` (types.py:6943) only reads `kwargs['config']['response_schema']` and `kwargs['config']['response_json_schema']` from that dict. It never accesses `kwargs['contents']`.
When `contents` includes inline image data (e.g. reference images for image generation), each `model_dump()` call serializes the entire image bytes into a Python dict — then discards them.
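The asymmetry is easy to see with a stdlib-only sketch (no google-genai dependency; the dict shape and ~3 MB per-image payload are stand-ins mirroring the numbers below):

```python
import json

# Stand-in for a dumped _GenerateContentParameters: three ~3 MB inline images
# plus the small config that _from_response() actually reads.
image_payload = "A" * 3_000_000  # placeholder for one reference image's bytes
request_dump = {
    "contents": [
        {"parts": [{"inline_data": {"data": image_payload}}]} for _ in range(3)
    ],
    "config": {"response_schema": None, "response_json_schema": None},
}

full_size = len(json.dumps(request_dump))                          # whole request
config_size = len(json.dumps({"config": request_dump["config"]}))  # config only

print(f"full dump: ~{full_size / 1e6:.0f} MB, config only: {config_size} bytes")
```

The full dump scales with the image payload; the config-only dict stays at a few dozen bytes regardless of how many reference images are attached.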
Measured impact
We instrumented the SDK (monkey-patched `_GenerateContentParameters.model_dump` to measure `json.dumps(result)` size) and ran isolated benchmarks:
| Scenario | model_dump() size per call |
|---|---|
| No reference images | ~1 KB |
| 3 reference images (~3 MB each) | 31.7 MB |
| 5 reference images (~3 MB each) | 52.8 MB |
With 24 concurrent image generation threads (common for batch image generation), this creates ~760 MB of transient allocations that serve no purpose, contributing to OOM kills in memory-constrained environments.
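The instrumentation itself is a small wrapper, sketched here against a stand-in class (in the real SDK the patch target would be `_GenerateContentParameters` in types.py; the class and payload here are illustrative):

```python
import functools
import json

class FakeParameterModel:
    """Minimal stand-in for the pydantic parameter model in types.py."""
    def __init__(self, contents, config):
        self.contents, self.config = contents, config

    def model_dump(self):
        return {"contents": self.contents, "config": self.config}

def instrument_model_dump(cls):
    """Monkey-patch cls.model_dump to report the serialized size of each dump."""
    original = cls.model_dump

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        wrapper.last_size = len(json.dumps(result, default=repr))
        print(f"model_dump() -> {wrapper.last_size:,} serialized bytes")
        return result

    wrapper.last_size = 0
    cls.model_dump = wrapper
    return wrapper

patched = instrument_model_dump(FakeParameterModel)
model = FakeParameterModel(
    contents=["x" * 3_000_000] * 3,  # three fake ~3 MB reference images
    config={"response_schema": None},
)
model.model_dump()  # reports a multi-MB dump for a request with 3 images
```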
To reproduce
- Call `generate_content_stream()` with inline image data in `contents` (e.g. reference images via `Part.from_bytes`)
- Monitor RSS memory — each streaming response allocates a dict equal in size to the full serialized request, including all image bytes
- With multiple concurrent requests, memory spikes rapidly
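For the RSS check, no external tools are needed; a stdlib-only helper works on Unix (note `ru_maxrss` is reported in KB on Linux but bytes on macOS — the allocation below only simulates the transient dump, it is not the SDK call itself):

```python
import resource
import sys

def peak_rss_mb() -> float:
    """Peak RSS of this process in MB (ru_maxrss: KB on Linux, bytes on macOS)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024

baseline = peak_rss_mb()
# Simulate the transient allocation of one full dump with 3 reference images:
transient = [bytearray(3 * 1024 * 1024) for _ in range(3)]
print(f"peak RSS: {baseline:.0f} MB -> {peak_rss_mb():.0f} MB")
```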
Proposed fix
Pass only the config to `_from_response()` instead of dumping the full parameter model:

```python
# Instead of:
kwargs=parameter_model.model_dump()

# Pass just what _from_response needs:
kwargs={"config": parameter_model.config.model_dump() if parameter_model.config else {}}
```

This affects ~35 call sites in models.py (all `_from_response` calls follow the same pattern).
Environment
- SDK version: google-genai 1.63.0 (also confirmed present in 1.68.0)
- Python: 3.12
- OS: Linux (Docker) / macOS
Related
- slow memory leak #1258 reports a slow memory leak during streaming. Our instrumentation suggests `model_dump()` is a contributing factor — the large dicts hold references to image bytes that delay GC, especially under concurrent load.