Skip to content

fix: resolve runtime crashes and mypy errors with transformers >= 4.57 + Pydantic v2#369

Draft
rahul-tuli wants to merge 3 commits intomainfrom
fix/mypy-py313-regressions
Draft

fix: resolve runtime crashes and mypy errors with transformers >= 4.57 + Pydantic v2#369
rahul-tuli wants to merge 3 commits intomainfrom
fix/mypy-py313-regressions

Conversation

@rahul-tuli
Copy link
Copy Markdown
Collaborator

@rahul-tuli rahul-tuli commented Mar 27, 2026

Summary

Fix three runtime crashes, three static type errors, and one integration test
failure introduced by transformers v5 (>= 4.57.x with py.typed inline stubs,
PretrainedConfig rewritten as @strict @dataclass) when used with Pydantic v2
and mypy 1.15.0.

All fixes are backwards-compatible with transformers 4.57.x (verified via
Eagle3SpeculatorConfig construction and save_pretrained/from_pretrained
roundtrip under both transformers 4.57.6 and 5.4.0).

Closes #370.


Runtime crashes (affect Python 3.10+ with transformers >= 4.57.x + Pydantic v2)

1. PydanticUndefinedAnnotation: name 'torch' is not defined (import-time crash)

Error — every test file fails at collection:

pydantic.errors.PydanticUndefinedAnnotation: name 'torch' is not defined

Root causereload_schemas() in speculators/__init__.py calls
model_rebuild(force=True) on SpeculatorModelConfig. transformers v5 added
dtype: Union[str, "torch.dtype"] | None to PretrainedConfig. Pydantic
evaluates this forward reference during model_rebuild(), but torch is not
in Pydantic's evaluation namespace.

FixSpeculatorModelConfig.reload_schema() passes
_types_namespace={"torch": torch} to model_rebuild() AND explicitly iterates
cls.registry.values() to rebuild each subclass (parent rebuild does not
propagate to subclasses).

Backwards compat — transformers 4.57.x has no dtype: "torch.dtype"
annotation; passing extra keys to _types_namespace is harmless.


2. AttributeError: 'Eagle3SpeculatorConfig' has no attribute '__pydantic_fields_set__' (construction crash)

Error — constructing any SpeculatorModelConfig subclass that does not define
its own __init__:

AttributeError: 'Eagle3SpeculatorConfig' object has no attribute '__pydantic_fields_set__'

Root cause — transformers v5's PretrainedConfig.__init_subclass__ applies
@dataclass(repr=False) + wrap_init_to_accept_kwargs to every subclass that
lacks __init__ in cls.__dict__. This replaces the inherited Pydantic
SpeculatorModelConfig.__init__ with a dataclass-generated wrapper that calls
setattr(self, f.name, f.default) for each field before Pydantic has
initialized __pydantic_fields_set__, triggering Pydantic's __setattr__ too
early.

FixSpeculatorModelConfig.__init_subclass__ injects
cls.__init__ = SpeculatorModelConfig.__init__ into each subclass's __dict__
BEFORE calling super().__init_subclass__(). The check
"__init__" in cls.__dict__ in transformers then skips wrapping.

Backwards compat — transformers 4.57.x does not apply
@dataclass + wrap_init_to_accept_kwargs in __init_subclass__; the injection
is a no-op.


3. TypeError: BaseModel.validate() missing 1 required positional argument: 'value' (save_pretrained crash)

Errormodel.save_pretrained(path) raises:

TypeError: BaseModel.validate() missing 1 required positional argument: 'value'

from transformers/configuration_utils.py:517: self.validate().

Root cause — transformers v5's @strict decorator adds a validate()
instance method to PretrainedConfig that runs class validators
(validate_architecture, validate_token_ids, etc.). Pydantic's
BaseModel.validate(cls, value) classmethod appears earlier in the MRO and
shadows it. When save_pretrained() calls self.validate(), it hits
BaseModel.validate which requires a value argument.

FixSpeculatorModelConfig.validate() instance method delegates explicitly
to PretrainedConfig.validate(self), with a
hasattr(PretrainedConfig, "validate") guard for transformers < 5.x (4.57.x
does not have this method).

Backwards compat — transformers 4.57.x has no validate() on
PretrainedConfig; the hasattr guard correctly skips delegation.


Static type errors (mypy 1.15.0)

4. 79 [call-arg] errors on LlamaConfig(...) across 5 files

Error — mypy 1.15.0 reports on every LlamaConfig(vocab_size=..., ...) call:

error: Unexpected keyword argument "vocab_size" for "LlamaConfig"  [call-arg]
error: Unexpected keyword argument "hidden_size" for "LlamaConfig"  [call-arg]
... (79 errors total across 5 files)

Root cause — transformers v5's @strict(accept_kwargs=True) wraps
PretrainedConfig.__init__ using @wraps. mypy 1.15.0 follows __wrapped__
back to PretrainedConfig.__init__, which only declares base-class fields. All
LlamaConfig-specific kwargs appear unknown.

Fix — Use llama_kwargs: dict[str, Any] = {...}; LlamaConfig(**llama_kwargs).
mypy cannot verify individual key names in dict[str, Any] unpacking, so
[call-arg] is bypassed without type: ignore suppressions:

# Use dict unpacking to work around transformers v5's @strict decorator
# which wraps __init__ via @wraps, hiding LlamaConfig fields from mypy.
llama_kwargs: dict[str, Any] = {
    "vocab_size": eagle_config.get("vocab_size", 32000),
    "hidden_size": eagle_config.get("hidden_size", 4096),
    ...
}
return LlamaConfig(**llama_kwargs)

5. [misc] incompatible forward() override in eagle3/model_definitions.py

Error:

error: Definition of "forward" in base class "LlamaDecoderLayer" is incompatible
       with definition in base class "Qwen3DecoderLayer"  [misc]

Root cause — transformers v5 removed cache_position: torch.LongTensor | None
from the explicit parameter list of LlamaDecoderLayer.forward() and
Qwen3DecoderLayer.forward() (it now flows through **kwargs). Our mixin still
declared cache_position, causing an incompatible-override error.

Fix — Remove cache_position from Eagle3FirstLayerMixin.forward() and the
cache_position=cache_position kwarg in the self.self_attn(...) call. It flows
through **kwargs.


6. [assignment] on base_model_pp_plan in config.py

Error:

error: Incompatible types in assignment (expression has type
       "dict[str, tuple[list[str]]] | None", base class "PretrainedConfig"
       defined the type as "dict[str, Sequence[list[str]]] | None")  [assignment]

Root cause — transformers v5 widened PretrainedConfig.base_model_pp_plan
from dict[str, tuple[list[str]]] to dict[str, Sequence[list[str]]].

Fix — Align SpeculatorModelConfig.base_model_pp_plan annotation to
Sequence[list[str]].


Integration test fix

7. KeyError in test_verifier_config_from_verifier_config

ErrorPretrainedConfig.from_pretrained("RedHatAI/Llama-3.1-8B-Instruct")
raises a KeyError during rope_scaling validation.

Root cause — transformers v5 strictly validates rope_scaling and requires
rope_theta inside rope_scaling when rope_type=llama3. The
RedHatAI/Llama-3.1-8B-Instruct Hub config predates this requirement.

Fix — Use PretrainedConfig.get_config_dict() (raw fetch, no validation),
drop rope_scaling, then construct PretrainedConfig(**config_dict).
VerifierConfig.from_config() only reads architectures, not rope parameters.


Files changed (9 files)

File Changes
src/speculators/config.py Fixes 1, 2, 3, 6 (reload_schema, __init_subclass__, validate, base_model_pp_plan)
src/speculators/convert/eagle/eagle_converter.py Fix 4 (dict unpacking for LlamaConfig)
src/speculators/convert/eagle/eagle3_converter.py Fix 4 (dict unpacking for LlamaConfig)
src/speculators/models/eagle3/model_definitions.py Fix 5 (remove cache_position)
tests/unit/models/test_eagle_config.py Fix 4 (dict unpacking for LlamaConfig)
tests/unit/models/test_eagle_model.py Fix 4 (dict unpacking for LlamaConfig)
tests/unit/train/test_setup_model.py Fix 4 (dict unpacking for LlamaConfig)
tests/integration/test_config.py Fix 7 (rope_scaling workaround)
pyproject.toml Pin transformers >= 4.57.0

Test results

  • pytest tests/unit/ — 193/194 pass (1 failure: distributed test requiring nccl
    port 29500, already in use — pre-existing env issue, unrelated to these changes)
  • pytest tests/integration/ — 2 passed, 7 skipped
  • Eagle3SpeculatorConfig(...) construction: OK under both 4.57.6 and 5.4.0
  • model.save_pretrained(path) + from_pretrained(path) roundtrip: OK under both versions
  • from speculators import reload_schemas; reload_schemas(): OK under both versions

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 27, 2026

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/23651808998/artifacts/6146021792.
They will be retained for up to 30 days.
Commit: 70fcb53

@mergify
Copy link
Copy Markdown

mergify bot commented Mar 27, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional install to get the required linting packages:
https://github.com/vllm-project/speculators/blob/main/CONTRIBUTING.md

@rahul-tuli rahul-tuli force-pushed the fix/mypy-py313-regressions branch from eb10ce2 to fdf5ddf Compare March 27, 2026 13:07
@rahul-tuli rahul-tuli changed the title fix(mypy): suppress Python 3.13 mypy regressions from transformers v5 stubs fix(mypy): fix Python 3.13 + mypy 1.15.0 + transformers 5.4.0 type errors Mar 27, 2026
@mergify mergify bot removed the quality-failed label Mar 27, 2026
…ibility

transformers v5 (available since 4.57.x as py.typed with inline stubs) introduced
three incompatibilities with our Pydantic + PretrainedConfig multiple-inheritance
pattern, causing test collection failures and runtime crashes.

## Problem 1 — PydanticUndefinedAnnotation at import time
Symptom: all tests fail at collection with:
  pydantic.errors.PydanticUndefinedAnnotation: name 'torch' is not defined

Root cause: reload_schemas() calls model_rebuild(force=True) on SpeculatorModelConfig
and subclasses. transformers v5's PretrainedConfig.dtype field uses the forward
reference "torch.dtype". Pydantic evaluates this at rebuild time, but torch is not
in the evaluation namespace, so the rebuild fails.

Fix: SpeculatorModelConfig.reload_schema() overrides the base implementation to pass
_types_namespace={"torch": torch} to model_rebuild(), and rebuilds each registered
subclass (Eagle3SpeculatorConfig, EagleSpeculatorConfig, etc.) individually since
model_rebuild() on a parent does not propagate to subclasses.

## Problem 2 — AttributeError on SpeculatorModelConfig subclass construction
Symptom: constructing Eagle3SpeculatorConfig (or any subclass that doesn't define
its own __init__) raises:
  AttributeError: 'Eagle3SpeculatorConfig' object has no attribute '__pydantic_fields_set__'

Root cause: transformers v5's PretrainedConfig.__init_subclass__ applies @DataClass
and wrap_init_to_accept_kwargs to every subclass that lacks __init__ in cls.__dict__.
This replaces the inherited SpeculatorModelConfig.__init__ with a dataclass-generated
wrapper that calls setattr() for every field before Pydantic can initialize
__pydantic_fields_set__, triggering Pydantic's __setattr__ too early.

Fix: SpeculatorModelConfig.__init_subclass__ injects __init__ into each subclass's
__dict__ before super().__init_subclass__() runs. PretrainedConfig.__init_subclass__
checks "__init__" in cls.__dict__ before wrapping, so the injection prevents the
dataclass wrapper from running. Python's @DataClass(repr=False) also skips __init__
generation when the class already defines one.

## Problem 3 — TypeError in save_pretrained via self.validate()
Symptom: save_pretrained() raises:
  TypeError: BaseModel.validate() missing 1 required positional argument: 'value'

Root cause: transformers v5's @strict decorator adds a validate() instance method to
PretrainedConfig to run class validators (validate_architecture, validate_token_ids,
etc.). Pydantic's BaseModel.validate() is a classmethod (def validate(cls, value))
that comes earlier in our MRO, shadowing PretrainedConfig.validate(). When
save_pretrained() calls self.validate(), it hits BaseModel.validate() which requires
a value argument.

Fix: SpeculatorModelConfig.validate() explicitly delegates to PretrainedConfig.validate()
so the @strict validators run correctly.

## mypy fixes (Python 3.13 + mypy 1.15.0)
- config.py: base_model_pp_plan type widened from tuple[list[str]] to Sequence[list[str]]
  to match the updated PretrainedConfig class variable declaration in transformers v5
- model_definitions.py: Eagle3FirstLayerMixin.forward() drops cache_position parameter
  which was removed from LlamaDecoderLayer.forward() / Qwen3DecoderLayer.forward() in v5;
  keeping it caused [misc] incompatible-override errors on both base classes
- eagle_converter.py, eagle3_converter.py, test_eagle_config.py, test_eagle_model.py,
  test_setup_model.py: LlamaConfig() calls flagged as [call-arg] because transformers v5's
  @strict decorator wraps LlamaConfig.__init__ via @wraps, which makes mypy follow
  __wrapped__ back to PretrainedConfig.__init__ (losing all LlamaConfig-specific fields).
  Fix: use llama_kwargs: dict[str, Any] = {...}; LlamaConfig(**llama_kwargs) — mypy cannot
  check specific key names when unpacking dict[str, Any], so [call-arg] is bypassed without
  any type: ignore suppressions.

pyproject.toml: add local/ and output/ to ruff's exclude list so local experiment
artifacts don't trigger lint errors during development.

Signed-off-by: Rahul-Tuli <rtuli@redhat.com>
@rahul-tuli rahul-tuli force-pushed the fix/mypy-py313-regressions branch from fdf5ddf to 3d6fcc5 Compare March 27, 2026 14:14
@rahul-tuli rahul-tuli marked this pull request as draft March 27, 2026 14:17
…egration test

transformers v5 added strict validation for rope_type=llama3 configs, requiring
rope_theta inside rope_scaling. RedHatAI/Llama-3.1-8B-Instruct's Hub config
predates this requirement, so PretrainedConfig.from_pretrained() now raises:

  KeyError: Missing required keys in `rope_parameters` for 'rope_type'='llama3': {'rope_theta'}

The test only needs architectures from the config (to feed VerifierConfig.from_config).
Use get_config_dict() to fetch the raw dict without triggering validation, drop
rope_scaling, then construct a minimal PretrainedConfig for the test assertion.

Signed-off-by: Rahul-Tuli <rtuli@redhat.com>
…rmers < 5.x

transformers 4.57.x does not add validate() to PretrainedConfig (@strict was not
yet present in that release line). Calling PretrainedConfig.validate(self)
unconditionally raises AttributeError on 4.57.x. Add a hasattr() guard so the
delegation only runs when validate() actually exists (transformers >= 5.x).

The __init_subclass__ and reload_schema fixes are already no-ops on 4.57.x:
- 4.57.x does not apply @DataClass + wrap_init_to_accept_kwargs in __init_subclass__
- 4.57.x has no dtype: "torch.dtype" forward reference in PretrainedConfig

Confirmed: Eagle3SpeculatorConfig construction and save_pretrained/from_pretrained
roundtrip work correctly under both transformers 4.57.6 and 5.4.0.

Signed-off-by: Rahul-Tuli <rtuli@redhat.com>
@rahul-tuli rahul-tuli changed the title fix(mypy): fix Python 3.13 + mypy 1.15.0 + transformers 5.4.0 type errors fix: resolve runtime crashes and mypy errors with transformers >= 4.57 + Pydantic v2 Mar 27, 2026
@dsikka
Copy link
Copy Markdown
Collaborator

dsikka commented Mar 27, 2026

@rahul-tuli What else do we need to do before this is ready for review?

Copy link
Copy Markdown
Collaborator

@fynnsu fynnsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These errors were introduced by transformers 5.4.0 which was released yesterday. This seems to have broken type checking for model configs and caused an issue when PreTrainedConfig is used as a pydantic field.

I don't think this pr is the right way to solve these issues. These are problems with transformers itself and shouldn't be solved by:

  1. Replacing all instantiations of transformers configs with creating a dict that then gets passed in as **kwargs to the config class. This is a hacky workaround that tricks the type checker into not validating the types.
  2. The code in src/speculators/config.py which manually injects torch into the namespace used to rebuild config because it wasn't included correctly in the transformers code.

These are issues with the upstream transformers project, and therefore should be solved there, rather than patched here. I've opened issues for both of these problems in the transformers repo. In the meantime, I suggest we cap transformers at <5.4.0.

@dsikka
Copy link
Copy Markdown
Collaborator

dsikka commented Mar 27, 2026

I want to also note that we will be capping to transformers <v5 for the next release.

I think it may make sense to revert upgrading if the time to fully support v5 is going to surpass our release time which is about 1-2 weeks out.

Or at least capping the version so that we have a green CI which is required for dflash

Cc @fynnsu

@fynnsu
Copy link
Copy Markdown
Collaborator

fynnsu commented Mar 27, 2026

@dsikka I opened #372 which just caps the version at <5.4.0. This would allow us to support "v5" without needed this workaround while the upstream issues are resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

transformers >= 4.57 + Pydantic v2: runtime crashes and mypy errors in speculators

3 participants