
model_dump() in _generate_content_stream serializes full image contents per response (~32 MB wasted) #2235

@feynnon

Description

Describe the bug

In _generate_content_stream() (as well as _generate_content() and all async variants), every streaming response triggers a full parameter_model.model_dump(), whose result is passed as kwargs to GenerateContentResponse._from_response():

```python
# models.py, line 4324 (and ~35 other identical call sites)
return_value = types.GenerateContentResponse._from_response(
    response=response_dict, kwargs=parameter_model.model_dump()
)
```

_from_response() (types.py:6943) only reads kwargs['config']['response_schema'] and kwargs['config']['response_json_schema'] from that dict. It never accesses kwargs['contents'].

When contents includes inline image data (e.g. reference images for image generation), each model_dump() call serializes the entire image bytes into a Python dict — then discards them.
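To make the waste concrete, here is a self-contained sketch (plain Python, not the real SDK classes — `FakeGenerateContentParameters` is a hypothetical stand-in for the pydantic model) showing how a full dump copies the inline image bytes even though the consumer only ever reads the `config` sub-dict:

```python
import sys

class FakeGenerateContentParameters:
    """Hypothetical stand-in for the SDK's pydantic parameter model."""

    def __init__(self, contents, config):
        self.contents = contents  # may contain multi-MB image payloads
        self.config = config      # small: just schema settings

    def model_dump(self):
        # Like pydantic's model_dump(), this copies everything,
        # including the image bytes that nobody downstream reads.
        return {
            "contents": [bytes(c) for c in self.contents],
            "config": dict(self.config),
        }

image = b"\x00" * (3 * 1024 * 1024)  # one ~3 MB reference image
params = FakeGenerateContentParameters(
    contents=[image] * 3,
    config={"response_schema": None, "response_json_schema": None},
)

dumped = params.model_dump()
wasted = sum(sys.getsizeof(c) for c in dumped["contents"])
print(f"bytes duplicated per call: {wasted / 1024 / 1024:.1f} MB")
# _from_response() only ever touches dumped["config"].
```

The ~9 MB copied here corresponds to a single call; the real dump also JSON-encodes the bytes, which is where the measured 31.7 MB figure comes from.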

Measured impact

We instrumented the SDK (monkey-patched _GenerateContentParameters.model_dump to measure json.dumps(result) size) and ran isolated benchmarks:

| Scenario | model_dump() size per call |
| --- | --- |
| No reference images | ~1 KB |
| 3 reference images (~3 MB each) | 31.7 MB |
| 5 reference images (~3 MB each) | 52.8 MB |

With 24 concurrent image generation threads (common for batch image generation), this creates ~760 MB of transient allocations that serve no purpose, contributing to OOM kills in memory-constrained environments.
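The ~760 MB figure is straightforward arithmetic on the measurement above:

```python
# Back-of-the-envelope check: 24 concurrent requests, each dumping
# ~31.7 MB of redundant data per streaming response.
threads = 24
per_call_mb = 31.7
total_mb = threads * per_call_mb
print(f"~{total_mb:.0f} MB of transient allocations")  # ~761 MB
```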

To reproduce

  1. Call generate_content_stream() with inline image data in contents (e.g. reference images via Part.from_bytes)
  2. Monitor RSS memory — each streaming response allocates a dict equal in size to the full serialized request including all image bytes
  3. With multiple concurrent requests, memory spikes rapidly
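One way to watch step 2 without external tooling is to sample the process's peak RSS via the standard-library `resource` module (POSIX-only; note `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS). This is a generic sketch, not the instrumentation we used, with a `bytearray` standing in for the dumped dict:

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    scale = 1 if sys.platform == "darwin" else 1024  # bytes vs. KB
    return peak * scale / (1024 * 1024)

before = peak_rss_mb()
blob = bytearray(50 * 1024 * 1024)  # stand-in for one dumped dict
after = peak_rss_mb()
print(f"peak RSS grew by ~{after - before:.0f} MB")
del blob
```

Wrapping the actual generate_content_stream() call between two such samples makes the per-response spike visible directly.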

Proposed fix

Pass only the config to _from_response() instead of dumping the full parameter model:

```python
# Instead of:
kwargs=parameter_model.model_dump()

# Pass just what _from_response needs:
kwargs={"config": parameter_model.config.model_dump() if parameter_model.config else {}}
```

This affects ~35 call sites in models.py (all _from_response calls follow the same pattern).
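Since the same pattern repeats across call sites, one option is to factor it into a small helper. The sketch below uses plain classes in place of the SDK's pydantic models (`build_from_response_kwargs`, `_FakeConfig`, and `_FakeParams` are hypothetical names, not SDK API):

```python
def build_from_response_kwargs(parameter_model):
    """Forward only the keys _from_response() actually reads.

    The image-laden `contents` field is never serialized.
    """
    config = getattr(parameter_model, "config", None)
    return {"config": config.model_dump() if config else {}}

# Minimal fakes to demonstrate the behavior:
class _FakeConfig:
    def model_dump(self):
        return {"response_schema": None, "response_json_schema": None}

class _FakeParams:
    config = _FakeConfig()
    contents = [b"\x00" * (3 * 1024 * 1024)]  # never copied

kwargs = build_from_response_kwargs(_FakeParams())
print(kwargs)
# {'config': {'response_schema': None, 'response_json_schema': None}}
```

A helper also keeps the ~35 call sites consistent if _from_response() ever starts reading additional fields.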

Environment

  • SDK version: google-genai 1.63.0 (also confirmed present in 1.68.0)
  • Python: 3.12
  • OS: Linux (Docker) / macOS

Related

  • Issue #1258 reports a slow memory leak during streaming. Our instrumentation suggests model_dump() is a contributing factor — the large dicts hold references to image bytes that delay GC, especially under concurrent load.
