- Widen transformers version constraint to >=4.57.3,<6.0.0
- Version-gate PretrainedConfig init (__init__ vs __post_init__) and dtype attribute (torch_dtype vs dtype) using dataclasses.is_dataclass detection
- Fall back to transformers.modeling_utils.no_init_weights for 4.x
- Support both rope_parameters (5.x) and rope_theta/rope_scaling (4.x) in Llama import/export config
- Handle both attribute paths for vision_tower in multimodal HF model test
- Fix mtp_llama LlamaRotaryEmbedding to handle both rope config formats
- Add _gdn_fla_available and _kda_fla_available flags to apriel2; use them to properly skip backup SSM tests when fla kernels are absent
- Update CLAUDE.md with redirect-to-file and external model test guidance

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
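The dataclass-based version gate described above can be sketched as follows. The two config classes here are hypothetical stand-ins (the real check runs against transformers' own PretrainedConfig, which became a dataclass in 5.x); they only demonstrate the detection mechanism:

```python
import dataclasses

# Hypothetical stand-ins for illustration only: the real code inspects
# transformers' PretrainedConfig rather than defining classes like these.
@dataclasses.dataclass
class ConfigV5:               # mimics a 5.x config (dataclass, `dtype`)
    dtype: str = "float32"

class ConfigV4:               # mimics a 4.x config (plain class, `torch_dtype`)
    torch_dtype = "float32"

def get_dtype(config):
    # 5.x renamed `torch_dtype` to `dtype`; choose the attribute based
    # on whether the config is a dataclass.
    attr = "dtype" if dataclasses.is_dataclass(config) else "torch_dtype"
    return getattr(config, attr, None)
```

Feature detection like this avoids parsing version strings, so it keeps working across patch releases of either major version.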
…compatibility

- apriel2/modeling_apriel2.py: add _TRANSFORMERS_V5 flag; fix _tied_weights_keys to dict format for 5.x (list for 4.x); add rope_parameters to PixtralRotaryEmbedding SimpleNamespace config
- mtp_llama/modeling_mtp_llama.py: add _TRANSFORMERS_V5 flag; fix _tied_weights_keys
- apriel2/conversion/llava/config.py: handle 5.x rope_parameters dict in text and vision configs alongside 4.x rope_theta
- apriel2/conversion/llava/plan.py: version-conditional source weight key prefixes (5.x LlavaForConditionalGeneration adds model. prefix to submodules)
- test_cache_contracts.py: update DynamicLayer.get_mask_sizes calls to pass int in 5.x (query_length) vs tensor in 4.x; update sdpa_mask signature for 5.x (q_length/q_offset)
- test_convert_from_llava.py: use version-conditional embed_tokens source key
- test_equivalence.py: fix get_image_features handling — 5.x returns BaseModelOutput with projected features in pooler_output (not last_hidden_state)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
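A minimal normalization helper for the two rope-config formats might look like the sketch below. The key names inside `rope_parameters` are assumptions drawn from the commit description, not verified against a real 5.x checkpoint:

```python
def read_rope_config(hf_config: dict):
    """Return (rope_theta, rope_scaling) from either config format.

    transformers 5.x stores a single `rope_parameters` dict, while 4.x
    splits the same information into `rope_theta` plus an optional
    `rope_scaling` dict. Normalizing up front keeps the downstream
    import/export logic version-agnostic.
    """
    if "rope_parameters" in hf_config:  # assumed 5.x format
        params = dict(hf_config["rope_parameters"])
        theta = params.pop("rope_theta", 10000.0)
        scaling = params or None
    else:  # 4.x format
        theta = hf_config.get("rope_theta", 10000.0)
        scaling = hf_config.get("rope_scaling")
    return theta, scaling
```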
- Fix num_blocks off-by-one in import_config (was subtracting 1)
- Fix num_hidden_layers off-by-one in export_config (was adding 1)
- Fix mtp_heads index off-by-one in get_converters (was prediction_distance - 1)
- Fix hidden state collection order in MTPLlamaModel: add embedding before trunk loop and add trunk layer outputs inside the loop, consistent with standard transformers @capture_outputs behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
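The corrected collection order can be illustrated with a toy loop, where plain callables stand in for the embedding and trunk layers (the real model operates on tensors and batched inputs):

```python
def collect_hidden_states(embed, trunk_layers, inputs):
    # Match the @capture_outputs convention: record the embedding output
    # before entering the trunk loop, then append each trunk layer's
    # output inside the loop.
    hidden = embed(inputs)
    all_hidden_states = [hidden]
    for layer in trunk_layers:
        hidden = layer(hidden)
        all_hidden_states.append(hidden)
    return hidden, all_hidden_states
```

This yields one entry for the embedding plus one per layer, so downstream consumers indexing into the list see the same layout as a standard transformers model.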
Update TOKENIZER_NAME from "bigcode/santacoder" to "gpt2" and update all hardcoded token values in data tests to match the gpt2 vocabulary. Also fix deprecated huggingface_hub.HfFolder.get_token() → get_token(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ERS_V4

- Deduplicate rope-type dispatch in LlamaAttentionConverter.import_config by normalizing rope_params/rope_theta from either checkpoint format first
- Rename _TRANSFORMERS_V5 → _TRANSFORMERS_V4 (inverted flag) so v4 compat code is in `if _TRANSFORMERS_V4:` blocks — grep-and-delete to drop v4
- Flip all if/else so v5 code is the default path and v4 is the guarded branch
- Import _TRANSFORMERS_V4 from config.py in huggingface.py; replace try/except with explicit if/else
- Add comments for v5 changes that can't use the flag (TYPE_CHECKING guard, checkpoint format detection, model.model structure)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use tuple prefixes unpacked into W(...) instead of the / operator, keeping the _TRANSFORMERS_V4 branching for the path prefix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep llava_layer/apriel_layer intermediate variables (with / operator) in loops; only the layer root W() calls use *prefix unpacking. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
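The prefix-unpacking pattern might look like the sketch below, where `W` stands in for the project's weight-key builder. Its real behavior is not shown in this thread, so the dotted-join here is an assumption for illustration:

```python
_TRANSFORMERS_V4 = False  # assumed: imported from config.py in the real code

def W(*parts: str) -> str:
    # Stand-in for the real weight-key builder; illustration only.
    return ".".join(parts)

# 5.x LlavaForConditionalGeneration nests submodules under `model.`,
# so the path prefix is version-conditional and unpacked into W(...).
prefix = () if _TRANSFORMERS_V4 else ("model",)
key = W(*prefix, "language_model", "embed_tokens", "weight")
```

Unpacking a tuple keeps the version branching confined to the `prefix` definition, while the per-layer intermediate variables can keep using the `/` operator inside loops.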
✨ Description