
model_dump() in _generate_content_stream serializes full image contents per response (~32 MB wasted) #2235

@feynnon

Description

Describe the bug

In _generate_content_stream() (as well as _generate_content() and all async variants), every streaming response triggers a full parameter_model.model_dump(), whose result is passed as kwargs to GenerateContentResponse._from_response():

```python
# models.py, line 4324 (and ~35 other identical call sites)
return_value = types.GenerateContentResponse._from_response(
    response=response_dict, kwargs=parameter_model.model_dump()
)
```

_from_response() (types.py:6943) only reads kwargs['config']['response_schema'] and kwargs['config']['response_json_schema'] from that dict. It never accesses kwargs['contents'].

When contents includes inline image data (e.g. reference images for image generation), each model_dump() call serializes the entire image bytes into a Python dict — then discards them.
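To make the waste concrete, here is a self-contained sketch (plain Python, not the real SDK classes — `FakeGenerateContentParameters` is a hypothetical stand-in for the pydantic model) showing how a full dump copies the inline image bytes even though the consumer only ever reads the `config` sub-dict:

```python
import sys

class FakeGenerateContentParameters:
    """Hypothetical stand-in for the SDK's pydantic parameter model."""

    def __init__(self, contents, config):
        self.contents = contents  # may contain multi-MB image payloads
        self.config = config      # small: just schema settings

    def model_dump(self):
        # Like pydantic's model_dump(), this copies everything,
        # including the image bytes that nobody downstream reads.
        return {
            "contents": [bytes(c) for c in self.contents],
            "config": dict(self.config),
        }

image = b"\x00" * (3 * 1024 * 1024)  # one ~3 MB reference image
params = FakeGenerateContentParameters(
    contents=[image] * 3,
    config={"response_schema": None, "response_json_schema": None},
)

dumped = params.model_dump()
wasted = sum(sys.getsizeof(c) for c in dumped["contents"])
print(f"bytes duplicated per call: {wasted / 1024 / 1024:.1f} MB")
# _from_response() only ever touches dumped["config"].
```

The ~9 MB copied here corresponds to a single call; the real dump also JSON-encodes the bytes, which is where the measured 31.7 MB figure comes from.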

Measured impact

We instrumented the SDK (monkey-patched _GenerateContentParameters.model_dump to measure json.dumps(result) size) and ran isolated benchmarks:

| Scenario | model_dump() size per call |
| --- | --- |
| No reference images | ~1 KB |
| 3 reference images (~3 MB each) | 31.7 MB |
| 5 reference images (~3 MB each) | 52.8 MB |

With 24 concurrent image generation threads (common for batch image generation), this creates ~760 MB of transient allocations that serve no purpose, contributing to OOM kills in memory-constrained environments.
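The ~760 MB figure is straightforward arithmetic on the measurement above:

```python
# Back-of-the-envelope check: 24 concurrent requests, each dumping
# ~31.7 MB of redundant data per streaming response.
threads = 24
per_call_mb = 31.7
total_mb = threads * per_call_mb
print(f"~{total_mb:.0f} MB of transient allocations")  # ~761 MB
```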

To reproduce

  1. Call generate_content_stream() with inline image data in contents (e.g. reference images via Part.from_bytes)
  2. Monitor RSS memory — each streaming response allocates a dict equal in size to the full serialized request including all image bytes
  3. With multiple concurrent requests, memory spikes rapidly
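One way to watch step 2 without external tooling is to sample the process's peak RSS via the standard-library `resource` module (POSIX-only; note `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS). This is a generic sketch, not the instrumentation we used, with a `bytearray` standing in for the dumped dict:

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    scale = 1 if sys.platform == "darwin" else 1024  # bytes vs. KB
    return peak * scale / (1024 * 1024)

before = peak_rss_mb()
blob = bytearray(50 * 1024 * 1024)  # stand-in for one dumped dict
after = peak_rss_mb()
print(f"peak RSS grew by ~{after - before:.0f} MB")
del blob
```

Wrapping the actual generate_content_stream() call between two such samples makes the per-response spike visible directly.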

Proposed fix

Pass only the config to _from_response() instead of dumping the full parameter model:

```python
# Instead of:
kwargs=parameter_model.model_dump()

# Pass just what _from_response needs:
kwargs={"config": parameter_model.config.model_dump() if parameter_model.config else {}}
```

This affects ~35 call sites in models.py (all _from_response calls follow the same pattern).
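Since the same pattern repeats across call sites, one option is to factor it into a small helper. The sketch below uses plain classes in place of the SDK's pydantic models (`build_from_response_kwargs`, `_FakeConfig`, and `_FakeParams` are hypothetical names, not SDK API):

```python
def build_from_response_kwargs(parameter_model):
    """Forward only the keys _from_response() actually reads.

    The image-laden `contents` field is never serialized.
    """
    config = getattr(parameter_model, "config", None)
    return {"config": config.model_dump() if config else {}}

# Minimal fakes to demonstrate the behavior:
class _FakeConfig:
    def model_dump(self):
        return {"response_schema": None, "response_json_schema": None}

class _FakeParams:
    config = _FakeConfig()
    contents = [b"\x00" * (3 * 1024 * 1024)]  # never copied

kwargs = build_from_response_kwargs(_FakeParams())
print(kwargs)
# {'config': {'response_schema': None, 'response_json_schema': None}}
```

A helper also keeps the ~35 call sites consistent if _from_response() ever starts reading additional fields.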

Environment

  • SDK version: google-genai 1.63.0 (also confirmed present in 1.68.0)
  • Python: 3.12
  • OS: Linux (Docker) / macOS

Related

  • Issue #1258 reports a slow memory leak during streaming. Our instrumentation suggests model_dump() is a contributing factor — the large dicts hold references to image bytes that delay GC, especially under concurrent load.
