
Fix: support multimodal model configs for attention-based estimators #435

Merged
ArtemVazh merged 2 commits into IINemo:main from vndee:fix/multimodal-model-config-compat on Mar 23, 2026

Conversation

@vndee (Contributor) commented Mar 14, 2026

Summary

  • Multimodal models like Gemma-3 nest text model parameters (e.g. num_attention_heads, num_hidden_layers) under text_config instead of the top-level config object.
  • This causes AttributeError: 'Gemma3Config' object has no attribute 'num_attention_heads' when running attention-based uncertainty estimators.
  • The fix uses a getattr(config, "text_config", config) fallback to resolve these attributes from the correct config level, supporting both standard and multimodal models.

Files changed

  • stat_calculators/greedy_probs.py
  • stat_calculators/sample.py
  • stat_calculators/attention_forward_pass_visual.py
  • estimators/attention_score.py
  • estimators/rauq.py

Reproduction

from transformers import AutoConfig

# Standard model - attributes at top level
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(config.num_attention_heads)  # works

# Multimodal model - attributes nested under text_config
config = AutoConfig.from_pretrained("google/gemma-3-12b-it")
print(config.num_attention_heads)  # AttributeError
print(config.text_config.num_attention_heads)  # works
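
The fallback this PR applies can be sketched as follows. This is a minimal illustration, not the PR's exact code: the helper name `resolve_text_config` and the `SimpleNamespace` stand-ins are assumptions for demonstration; the actual change inlines the `getattr` call in each affected file.

```python
# Sketch of the getattr fallback pattern used by this PR.
# SimpleNamespace objects stand in for transformers config classes.
from types import SimpleNamespace


def resolve_text_config(config):
    """Return the config level that holds text-model attributes.

    Multimodal configs (e.g. Gemma3Config) nest them under `text_config`;
    standard configs keep them at the top level. getattr falls back to
    the config itself when no `text_config` attribute exists.
    """
    return getattr(config, "text_config", config)


# Stand-in for a standard config: attributes at the top level.
standard = SimpleNamespace(num_attention_heads=32, num_hidden_layers=32)

# Stand-in for a multimodal config: attributes nested under text_config.
multimodal = SimpleNamespace(
    text_config=SimpleNamespace(num_attention_heads=16, num_hidden_layers=48)
)

assert resolve_text_config(standard).num_attention_heads == 32
assert resolve_text_config(multimodal).num_attention_heads == 16
assert resolve_text_config(multimodal).num_hidden_layers == 48
```

The same one-line pattern works for any attribute that multimodal configs nest under `text_config`, which is why the PR can apply it uniformly across the listed stat calculators and estimators.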

Test plan

  • Verified fix works with Gemma-3-12B-IT (multimodal)
  • Verified fix works with standard models (Phi-4, LLaMA-3.1, Qwen-3, etc.)

…sed estimators

Multimodal models like Gemma-3 nest text model parameters under
`text_config` instead of the top-level config. This causes
`AttributeError: 'Gemma3Config' object has no attribute
'num_attention_heads'` when running attention-based uncertainty
estimators.

Use `getattr(config, "text_config", config)` fallback to resolve
`num_attention_heads` and `num_hidden_layers` from the correct
config level for both standard and multimodal models.
@ArtemVazh (Collaborator) commented
@vndee Hi! Thank you for your interest in our project! These fixes are definitely valuable, but could you also address the lint and Black issues?

@vndee vndee force-pushed the fix/multimodal-model-config-compat branch from a3b7087 to 679aacd on March 22, 2026 10:58
@vndee (Contributor, Author) commented Mar 22, 2026

Hey @ArtemVazh, I have just updated it.

@ArtemVazh ArtemVazh self-requested a review March 23, 2026 10:20
@ArtemVazh ArtemVazh merged commit ec4bbd3 into IINemo:main Mar 23, 2026
1 check passed