Support transformers v5 #481

Draft
jlamypoirier wants to merge 11 commits into main from jlp_transformers_v5
Conversation

@jlamypoirier (Collaborator)

✨ Description

jlamypoirier and others added 11 commits April 1, 2026 17:13
- Widen transformers version constraint to >=4.57.3,<6.0.0
- Version-gate PretrainedConfig init (__init__ vs __post_init__) and dtype attribute (torch_dtype vs dtype) using dataclasses.is_dataclass detection
- Fall back to transformers.modeling_utils.no_init_weights for 4.x
- Support both rope_parameters (5.x) and rope_theta/rope_scaling (4.x) in Llama import/export config
- Handle both attribute paths for vision_tower in multimodal HF model test
- Fix mtp_llama LlamaRotaryEmbedding to handle both rope config formats
- Add _gdn_fla_available and _kda_fla_available flags to apriel2; use them to properly skip backup SSM tests when fla kernels are absent
- Update CLAUDE.md with redirect-to-file and external model test guidance

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
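The version gates listed above can be sketched as follows. This is a minimal illustration, not the PR's code: the helper names are invented, and the real change detects 5.x via `dataclasses.is_dataclass` on `PretrainedConfig` rather than parsing a version string.

```python
# Illustrative sketch of gating v4/v5 behavior on the transformers version.
def parse_major(version: str) -> int:
    """Return the major component of a version string such as '5.0.0.dev0'."""
    return int(version.split(".", 1)[0])

def dtype_attribute(version: str) -> str:
    """transformers 5.x renamed the config attribute torch_dtype to dtype."""
    return "dtype" if parse_major(version) >= 5 else "torch_dtype"
```

The same predicate can then drive the `__init__` vs `__post_init__` and `rope_theta`/`rope_scaling` vs `rope_parameters` branches described in the bullets.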
…compatibility

- apriel2/modeling_apriel2.py: add _TRANSFORMERS_V5 flag; fix _tied_weights_keys
  to dict format for 5.x (list for 4.x); add rope_parameters to PixtralRotaryEmbedding
  SimpleNamespace config
- mtp_llama/modeling_mtp_llama.py: add _TRANSFORMERS_V5 flag; fix _tied_weights_keys
- apriel2/conversion/llava/config.py: handle 5.x rope_parameters dict in text and
  vision configs alongside 4.x rope_theta
- apriel2/conversion/llava/plan.py: version-conditional source weight key prefixes
  (5.x LlavaForConditionalGeneration adds model. prefix to submodules)
- test_cache_contracts.py: update DynamicLayer.get_mask_sizes calls to pass int in 5.x
  (query_length) vs tensor in 4.x; update sdpa_mask signature for 5.x (q_length/q_offset)
- test_convert_from_llava.py: use version-conditional embed_tokens source key
- test_equivalence.py: fix get_image_features handling — 5.x returns BaseModelOutput
  with projected features in pooler_output (not last_hidden_state)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix num_blocks off-by-one in import_config (was subtracting 1)
- Fix num_hidden_layers off-by-one in export_config (was adding 1)
- Fix mtp_heads index off-by-one in get_converters (was prediction_distance - 1)
- Fix hidden state collection order in MTPLlamaModel: add embedding before
  trunk loop and add trunk layer outputs inside the loop, consistent with
  standard transformers @capture_outputs behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
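The property these off-by-one fixes restore is that importing and then exporting a layer count round-trips exactly. A sketch with hypothetical helper names (the real logic lives in `import_config`/`export_config`):

```python
def import_num_blocks(hf_num_hidden_layers: int) -> int:
    # before the fix this returned hf_num_hidden_layers - 1
    return hf_num_hidden_layers

def export_num_hidden_layers(num_blocks: int) -> int:
    # before the fix this returned num_blocks + 1
    return num_blocks
```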
Update TOKENIZER_NAME from "bigcode/santacoder" to "gpt2" and update all
hardcoded token values in data tests to match the gpt2 vocabulary.
Also fix deprecated huggingface_hub.HfFolder.get_token() → get_token().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
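The `HfFolder.get_token()` deprecation fix amounts to preferring the module-level `get_token()` from `huggingface_hub`. A hedged sketch with a fallback for older library versions (`resolve_get_token` is an illustrative name, not from the PR):

```python
def resolve_get_token():
    """Return the token getter, preferring the non-deprecated module-level API."""
    try:
        from huggingface_hub import get_token  # current API
        return get_token
    except ImportError:
        from huggingface_hub import HfFolder  # legacy fallback
        return HfFolder.get_token
```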
…ERS_V4

- Deduplicate rope-type dispatch in LlamaAttentionConverter.import_config by
  normalizing rope_params/rope_theta from either checkpoint format first
- Rename _TRANSFORMERS_V5 → _TRANSFORMERS_V4 (inverted flag) so v4 compat
  code is in `if _TRANSFORMERS_V4:` blocks — grep-and-delete to drop v4
- Flip all if/else so v5 code is the default path and v4 is the guarded branch
- Import _TRANSFORMERS_V4 from config.py in huggingface.py; replace try/except
  with explicit if/else
- Add comments for v5 changes that can't use the flag (TYPE_CHECKING guard,
  checkpoint format detection, model.model structure)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
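The inverted-flag convention described above can be sketched as follows; the flag value and rope dict contents are illustrative, and the real flag is computed from `transformers.__version__`:

```python
def make_v4_flag(transformers_version: str) -> bool:
    """True when running under transformers 4.x (the legacy branch)."""
    return int(transformers_version.split(".", 1)[0]) < 5

_TRANSFORMERS_V4 = make_v4_flag("5.0.0")  # illustrative input

# Every compat branch reads `if _TRANSFORMERS_V4:` so v4 support can later be
# dropped by grepping for the flag. v5 code is the default (else) path.
if _TRANSFORMERS_V4:
    rope_config = {"rope_theta": 10000.0}  # 4.x flat attributes
else:
    rope_config = {"rope_parameters": {"rope_type": "default", "rope_theta": 10000.0}}
```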
Use tuple prefixes unpacked into W(...) instead of the / operator,
keeping the _TRANSFORMERS_V4 branching for the path prefix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep llava_layer/apriel_layer intermediate variables (with / operator)
in loops; only the layer root W() calls use *prefix unpacking.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
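The `*prefix` unpacking pattern from these two commits can be illustrated with a stand-in `W` that just joins weight-path segments with dots; the real `W` is the conversion plan's path helper, whose exact semantics are assumed here.

```python
_TRANSFORMERS_V4 = False  # illustrative

def W(*parts: str) -> str:
    """Stand-in path helper: join weight key segments with dots."""
    return ".".join(parts)

# 5.x LlavaForConditionalGeneration nests submodules under an extra "model."
# prefix, so the source key prefix is version-conditional and unpacked into W.
prefix = () if _TRANSFORMERS_V4 else ("model",)
key = W(*prefix, "language_model", "embed_tokens", "weight")
```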