Open
flash feature refactor #778
michaelfeil wants to merge 3 commits into huggingface:main from michaelfeil:mf/flash-feature-refactor
+27 −55
Conversation
This PR extends the Qwen2 architecture to models other than `Alibaba-NLP/gte-Qwen2-7B-instruct`; the prior implementation only covered that model, using causal attention on CUDA and relying on the provided tokenizer rather than patching it.

# What does this PR do?

- Makes flash-attn-3 and flash-attn-cpu easier to add.

Fixes # (issue)

## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/text-embeddings-inference/blob/main/CONTRIBUTING.md)?
- [ ] Was this discussed/approved via a GitHub issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs).
- [ ] Did you write any new necessary tests? If applicable, did you include or update the `insta` snapshots?

## Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
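To make the stated goal concrete, here is a minimal, purely illustrative sketch of how attention backends can be selected behind Cargo features so that adding a new variant such as flash-attn-3 or flash-attn-cpu only touches one dispatch point. All names, feature flags, and types below are assumptions for the example, not code from this PR or from text-embeddings-inference:

```rust
// Illustrative sketch only; feature names such as `flash-attn-v3` and
// `flash-attn-cpu` are assumptions, not the crate's real feature flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AttentionBackend {
    FlashAttentionV3,
    FlashAttention,
    FlashAttentionCpu,
    Naive,
}

fn select_backend(flash_requested: bool) -> AttentionBackend {
    if !flash_requested {
        return AttentionBackend::Naive;
    }
    // Each flash variant sits behind its own Cargo feature, so adding a new
    // one means adding one arm here rather than scattering cfg checks.
    if cfg!(all(feature = "cuda", feature = "flash-attn-v3")) {
        AttentionBackend::FlashAttentionV3
    } else if cfg!(feature = "cuda") {
        AttentionBackend::FlashAttention
    } else if cfg!(feature = "flash-attn-cpu") {
        AttentionBackend::FlashAttentionCpu
    } else {
        AttentionBackend::Naive
    }
}

fn main() {
    println!("selected backend: {:?}", select_backend(true));
}
```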
kozistr approved these changes Dec 20, 2025
kozistr (Contributor) left a comment:
looks good to me!
Comment on lines +121 to +126:

```rust
let flash_attn_enabled = &std::env::var("USE_FLASH_ATTENTION").unwrap_or("true".to_string()).to_lowercase() == "true";

if cfg!(not(feature = "cuda")) {
    // if not cuda support, always false for now.
    return false;
};
```
Contributor:

It's a nit; how about moving line 121 to right after line 126, so the early-return non-CUDA case comes first?
Suggested change:

```diff
-let flash_attn_enabled = &std::env::var("USE_FLASH_ATTENTION").unwrap_or("true".to_string()).to_lowercase() == "true";
-if cfg!(not(feature = "cuda")) {
-    // if not cuda support, always false for now.
-    return false;
-};
+if cfg!(not(feature = "cuda")) {
+    // if not cuda support, always false for now.
+    return false;
+};
+let flash_attn_enabled = &std::env::var("USE_FLASH_ATTENTION").unwrap_or("true".to_string()).to_lowercase() == "true";
```
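For illustration, a minimal sketch of how the surrounding helper might read with this suggestion applied; the function name and signature are assumptions, not taken from the diff. Since `cfg!(...)` expands to a compile-time boolean constant, the non-CUDA early return is resolved when the binary is built and costs nothing at runtime:

```rust
/// Sketch only: decides whether flash attention should be used.
/// The name `use_flash_attention` and the exact surrounding code are
/// assumptions for this example, not the PR's actual code.
pub fn use_flash_attention() -> bool {
    // Without the `cuda` feature there is no flash-attention kernel to call,
    // so bail out before reading the environment.
    if cfg!(not(feature = "cuda")) {
        // if not cuda support, always false for now.
        return false;
    }

    // Opt-out switch: flash attention stays on unless USE_FLASH_ATTENTION
    // is explicitly set to something other than "true".
    std::env::var("USE_FLASH_ATTENTION")
        .unwrap_or_else(|_| "true".to_string())
        .to_lowercase()
        == "true"
}
```

Callers would then take the flash-attention code path only when `use_flash_attention()` returns `true`, keeping the environment check in a single place.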