Support transformers v5 #481

Draft
jlamypoirier wants to merge 11 commits into main from jlp_transformers_v5
Conversation

@jlamypoirier (Collaborator)

✨ Description

jlamypoirier and others added 11 commits April 1, 2026 17:13
- Widen transformers version constraint to >=4.57.3,<6.0.0
- Version-gate PretrainedConfig init (__init__ vs __post_init__) and dtype attribute (torch_dtype vs dtype) using dataclasses.is_dataclass detection
- Fall back to transformers.modeling_utils.no_init_weights for 4.x
- Support both rope_parameters (5.x) and rope_theta/rope_scaling (4.x) in Llama import/export config
- Handle both attribute paths for vision_tower in multimodal HF model test
- Fix mtp_llama LlamaRotaryEmbedding to handle both rope config formats
- Add _gdn_fla_available and _kda_fla_available flags to apriel2; use them to properly skip backup SSM tests when fla kernels are absent
- Update CLAUDE.md with redirect-to-file and external model test guidance

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
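The version gates listed above can be sketched as follows. This is a minimal illustration, not the PR's code: the helper names are invented, and the real change detects 5.x via `dataclasses.is_dataclass` on `PretrainedConfig` rather than parsing a version string.

```python
# Illustrative sketch of gating v4/v5 behavior on the transformers version.
def parse_major(version: str) -> int:
    """Return the major component of a version string such as '5.0.0.dev0'."""
    return int(version.split(".", 1)[0])

def dtype_attribute(version: str) -> str:
    """transformers 5.x renamed the config attribute torch_dtype to dtype."""
    return "dtype" if parse_major(version) >= 5 else "torch_dtype"
```

The same predicate can then drive the `__init__` vs `__post_init__` and `rope_theta`/`rope_scaling` vs `rope_parameters` branches described in the bullets.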
…compatibility

- apriel2/modeling_apriel2.py: add _TRANSFORMERS_V5 flag; fix _tied_weights_keys
  to dict format for 5.x (list for 4.x); add rope_parameters to PixtralRotaryEmbedding
  SimpleNamespace config
- mtp_llama/modeling_mtp_llama.py: add _TRANSFORMERS_V5 flag; fix _tied_weights_keys
- apriel2/conversion/llava/config.py: handle 5.x rope_parameters dict in text and
  vision configs alongside 4.x rope_theta
- apriel2/conversion/llava/plan.py: version-conditional source weight key prefixes
  (5.x LlavaForConditionalGeneration adds model. prefix to submodules)
- test_cache_contracts.py: update DynamicLayer.get_mask_sizes calls to pass int in 5.x
  (query_length) vs tensor in 4.x; update sdpa_mask signature for 5.x (q_length/q_offset)
- test_convert_from_llava.py: use version-conditional embed_tokens source key
- test_equivalence.py: fix get_image_features handling — 5.x returns BaseModelOutput
  with projected features in pooler_output (not last_hidden_state)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix num_blocks off-by-one in import_config (was subtracting 1)
- Fix num_hidden_layers off-by-one in export_config (was adding 1)
- Fix mtp_heads index off-by-one in get_converters (was prediction_distance - 1)
- Fix hidden state collection order in MTPLlamaModel: add embedding before
  trunk loop and add trunk layer outputs inside the loop, consistent with
  standard transformers @capture_outputs behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
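The property these off-by-one fixes restore is that importing and then exporting a layer count round-trips exactly. A sketch with hypothetical helper names (the real logic lives in `import_config`/`export_config`):

```python
def import_num_blocks(hf_num_hidden_layers: int) -> int:
    # before the fix this returned hf_num_hidden_layers - 1
    return hf_num_hidden_layers

def export_num_hidden_layers(num_blocks: int) -> int:
    # before the fix this returned num_blocks + 1
    return num_blocks
```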
Update TOKENIZER_NAME from "bigcode/santacoder" to "gpt2" and update all
hardcoded token values in data tests to match the gpt2 vocabulary.
Also fix deprecated huggingface_hub.HfFolder.get_token() → get_token().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
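The `HfFolder.get_token()` deprecation fix amounts to preferring the module-level `get_token()` from `huggingface_hub`. A hedged sketch with a fallback for older library versions (`resolve_get_token` is an illustrative name, not from the PR):

```python
def resolve_get_token():
    """Return the token getter, preferring the non-deprecated module-level API."""
    try:
        from huggingface_hub import get_token  # current API
        return get_token
    except ImportError:
        from huggingface_hub import HfFolder  # legacy fallback
        return HfFolder.get_token
```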
…ERS_V4

- Deduplicate rope-type dispatch in LlamaAttentionConverter.import_config by
  normalizing rope_params/rope_theta from either checkpoint format first
- Rename _TRANSFORMERS_V5 → _TRANSFORMERS_V4 (inverted flag) so v4 compat
  code is in `if _TRANSFORMERS_V4:` blocks — grep-and-delete to drop v4
- Flip all if/else so v5 code is the default path and v4 is the guarded branch
- Import _TRANSFORMERS_V4 from config.py in huggingface.py; replace try/except
  with explicit if/else
- Add comments for v5 changes that can't use the flag (TYPE_CHECKING guard,
  checkpoint format detection, model.model structure)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
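The inverted-flag convention described above can be sketched as follows; the flag value and rope dict contents are illustrative, and the real flag is computed from `transformers.__version__`:

```python
def make_v4_flag(transformers_version: str) -> bool:
    """True when running under transformers 4.x (the legacy branch)."""
    return int(transformers_version.split(".", 1)[0]) < 5

_TRANSFORMERS_V4 = make_v4_flag("5.0.0")  # illustrative input

# Every compat branch reads `if _TRANSFORMERS_V4:` so v4 support can later be
# dropped by grepping for the flag. v5 code is the default (else) path.
if _TRANSFORMERS_V4:
    rope_config = {"rope_theta": 10000.0}  # 4.x flat attributes
else:
    rope_config = {"rope_parameters": {"rope_type": "default", "rope_theta": 10000.0}}
```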
Use tuple prefixes unpacked into W(...) instead of the / operator,
keeping the _TRANSFORMERS_V4 branching for the path prefix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep llava_layer/apriel_layer intermediate variables (with / operator)
in loops; only the layer root W() calls use *prefix unpacking.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
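The `*prefix` unpacking pattern from these two commits can be illustrated with a stand-in `W` that just joins weight-path segments with dots; the real `W` is the conversion plan's path helper, whose exact semantics are assumed here.

```python
_TRANSFORMERS_V4 = False  # illustrative

def W(*parts: str) -> str:
    """Stand-in path helper: join weight key segments with dots."""
    return ".".join(parts)

# 5.x LlavaForConditionalGeneration nests submodules under an extra "model."
# prefix, so the source key prefix is version-conditional and unpacked into W.
prefix = () if _TRANSFORMERS_V4 else ("model",)
key = W(*prefix, "language_model", "embed_tokens", "weight")
```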