
Fix: support multimodal model configs for attention-based estimators #435

Merged
ArtemVazh merged 2 commits into IINemo:main from vndee:fix/multimodal-model-config-compat on Mar 23, 2026

Conversation

@vndee (Contributor) commented Mar 14, 2026

Summary

  • Multimodal models like Gemma-3 nest text model parameters (e.g. num_attention_heads, num_hidden_layers) under text_config instead of the top-level config object.
  • This causes AttributeError: 'Gemma3Config' object has no attribute 'num_attention_heads' when running attention-based uncertainty estimators.
  • The fix uses a getattr(config, "text_config", config) fallback to resolve these attributes from the correct config level, supporting both standard and multimodal models.

Files changed

  • stat_calculators/greedy_probs.py
  • stat_calculators/sample.py
  • stat_calculators/attention_forward_pass_visual.py
  • estimators/attention_score.py
  • estimators/rauq.py

Reproduction

from transformers import AutoConfig

# Standard model - attributes at top level
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(config.num_attention_heads)  # works

# Multimodal model - attributes nested under text_config
config = AutoConfig.from_pretrained("google/gemma-3-12b-it")
print(config.num_attention_heads)  # AttributeError
print(config.text_config.num_attention_heads)  # works
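
The fallback this PR applies can be sketched as follows. This is a minimal illustration, not the PR's exact code: the helper name `resolve_text_config` and the `SimpleNamespace` stand-ins are assumptions for demonstration; the actual change inlines the `getattr` call in each affected file.

```python
# Sketch of the getattr fallback pattern used by this PR.
# SimpleNamespace objects stand in for transformers config classes.
from types import SimpleNamespace


def resolve_text_config(config):
    """Return the config level that holds text-model attributes.

    Multimodal configs (e.g. Gemma3Config) nest them under `text_config`;
    standard configs keep them at the top level. getattr falls back to
    the config itself when no `text_config` attribute exists.
    """
    return getattr(config, "text_config", config)


# Stand-in for a standard config: attributes at the top level.
standard = SimpleNamespace(num_attention_heads=32, num_hidden_layers=32)

# Stand-in for a multimodal config: attributes nested under text_config.
multimodal = SimpleNamespace(
    text_config=SimpleNamespace(num_attention_heads=16, num_hidden_layers=48)
)

assert resolve_text_config(standard).num_attention_heads == 32
assert resolve_text_config(multimodal).num_attention_heads == 16
assert resolve_text_config(multimodal).num_hidden_layers == 48
```

The same one-line pattern works for any attribute that multimodal configs nest under `text_config`, which is why the PR can apply it uniformly across the listed stat calculators and estimators.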

Test plan

  • Verified fix works with Gemma-3-12B-IT (multimodal)
  • Verified fix works with standard models (Phi-4, LLaMA-3.1, Qwen-3, etc.)

…sed estimators

Multimodal models like Gemma-3 nest text model parameters under
`text_config` instead of the top-level config. This causes
`AttributeError: 'Gemma3Config' object has no attribute
'num_attention_heads'` when running attention-based uncertainty
estimators.

Use `getattr(config, "text_config", config)` fallback to resolve
`num_attention_heads` and `num_hidden_layers` from the correct
config level for both standard and multimodal models.
@ArtemVazh (Collaborator) commented
@vndee Hi! Thank you for your interest in our project! These fixes are definitely valuable, but could you also address the lint and Black issues?

@vndee vndee force-pushed the fix/multimodal-model-config-compat branch from a3b7087 to 679aacd on March 22, 2026 10:58
@vndee (Contributor, Author) commented Mar 22, 2026

Hey @ArtemVazh, I have just updated it.

@ArtemVazh ArtemVazh self-requested a review March 23, 2026 10:20
@ArtemVazh ArtemVazh merged commit ec4bbd3 into IINemo:main Mar 23, 2026
1 check passed