This repository was archived by the owner on Jun 13, 2024. It is now read-only.
The following code generates an error in some of the most recent versions of PyTorch:
oac-explore/trainer/trainer.py
Lines 146 to 159 in cbc0333
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
To solve it, it is necessary to move these lines
oac-explore/trainer/trainer.py
Lines 120 to 124 in cbc0333
between the Q-networks' gradient steps and the steps on the policy network, like so:
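The repository's own snippet is not reproduced here. As a rough illustration only, the reordering can be sketched with toy stand-ins: `qf`, `policy`, the optimizers, and the random batch below are all hypothetical and much simpler than the networks in `trainer.py`. The point is just that the policy loss is built after the Q-network optimizer step, so its autograd graph refers to the weights that currently exist.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy stand-ins, NOT the repository's networks.
qf = nn.Linear(4 + 2, 1)   # Q-network: (obs, action) -> value
policy = nn.Linear(4, 2)   # policy: obs -> action
qf_optimizer = torch.optim.SGD(qf.parameters(), lr=0.1)
policy_optimizer = torch.optim.SGD(policy.parameters(), lr=0.1)

# Random placeholder batch.
obs = torch.randn(8, 4)
actions = torch.randn(8, 2)
q_target = torch.randn(8, 1)


def compute_qf_loss():
    q_pred = qf(torch.cat([obs, actions], dim=1))
    return ((q_pred - q_target) ** 2).mean()


def compute_policy_loss():
    # The graph built here holds references to the current qf weights.
    new_actions = policy(obs)
    return -qf(torch.cat([obs, new_actions], dim=1)).mean()


def update_old_order():
    """Policy loss built BEFORE the Q step; expected to raise a
    RuntimeError on recent PyTorch versions."""
    policy_loss = compute_policy_loss()
    qf_loss = compute_qf_loss()

    qf_optimizer.zero_grad()
    qf_loss.backward()
    qf_optimizer.step()        # updates qf weights in place

    policy_optimizer.zero_grad()
    policy_loss.backward()     # graph still points at the pre-step weights
    policy_optimizer.step()


def update_fixed_order():
    """Policy loss built AFTER the Q step, as the issue suggests."""
    qf_loss = compute_qf_loss()
    qf_optimizer.zero_grad()
    qf_loss.backward()
    qf_optimizer.step()

    policy_loss = compute_policy_loss()   # uses the updated qf weights
    policy_optimizer.zero_grad()
    policy_loss.backward()
    policy_optimizer.step()
    return policy_loss.item()
```

Calling `update_fixed_order()` runs one full update. `update_old_order()` is kept only for contrast with the ordering that triggers the error.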
Be aware that if you simply use an old version of PyTorch to work around this problem, the behaviour might not be what you expect, since policy_loss was computed based on a network which no longer exists.