derivative for aten::_scaled_dot_product_efficient_attention_backward is not implemented #429

@Darius888

Description

Hello,

When I try to apply the Sine Wave example approach to a transformer-based model, I get the following output:

File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 767, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: derivative for aten::_scaled_dot_product_efficient_attention_backward is not implemented

The setup is a regression task with multiple sequences.

Is it possible to work around this somehow?

Thank you,
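
A minimal sketch of a common workaround, assuming the error comes from a second-order (double) backward pass through PyTorch's fused efficient-attention kernel, which does not implement its own derivative: force the math SDPA backend, which is built from ordinary differentiable ops, for the part of the computation that needs higher-order gradients. The tensor shapes, the CUDA device (where the efficient kernel is selected by default), and the use of `torch.autograd.grad` below are illustrative, not taken from the original model.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend  # PyTorch >= 2.3

# Illustrative shapes: (batch, heads, seq_len, head_dim).
q = torch.randn(2, 4, 16, 8, device="cuda", requires_grad=True)
k = torch.randn(2, 4, 16, 8, device="cuda")
v = torch.randn(2, 4, 16, 8, device="cuda")

# The fused efficient/flash kernels have no double-backward, but the
# math backend is a plain composition of ops, so autograd can
# differentiate its backward pass a second time.
with sdpa_kernel(SDPBackend.MATH):
    out = F.scaled_dot_product_attention(q, k, v).sum()
    (g,) = torch.autograd.grad(out, q, create_graph=True)
    g.pow(2).sum().backward()  # second-order backward now succeeds
```

On PyTorch versions before 2.3, the equivalent context manager is `torch.backends.cuda.sdp_kernel(enable_math=True, enable_flash=False, enable_mem_efficient=False)`. Wrapping only the forward pass that needs higher-order gradients in the context keeps the fast kernels available everywhere else.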
