
Refactor JAX SAC agent's critic update to return gradients separately #426

Open
MorningFrog wants to merge 1 commit into Toni-SM:develop from MorningFrog:morningfrog-develop

@MorningFrog
Contributor

Summary

This PR fixes the critic update logic in the JAX SAC agent, as reported in #425.

Changes

  • compute and keep separate gradients for critic_1 and critic_2
  • apply critic_1's gradient only to critic_1_optimizer
  • apply critic_2's gradient only to critic_2_optimizer
  • preserve the existing loss/value outputs and overall SAC training flow
  • rename gradient variables to critic_1_grad, critic_2_grad, policy_grad, entropy_grad for clarity (this can be adjusted or reverted if maintainers prefer the previous naming style)
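The change list above can be sketched in JAX roughly as follows. This is a minimal, hypothetical sketch and not the code in this PR: the linear critic `q_value`, the helper names `critic_loss_fn`, `update_critic`, `sgd_step` and `init_critic`, and the averaged two-critic MSE loss are all assumptions, and plain SGD stands in for `critic_1_optimizer` and `critic_2_optimizer`. The relevant idea is that `jax.value_and_grad(..., argnums=(0, 1))` returns one gradient pytree per critic, so each gradient can be applied only to its own critic.

```python
import jax
import jax.numpy as jnp


def q_value(params, states, actions):
    """Hypothetical linear critic: Q(s, a) = [s, a] @ w + b."""
    inputs = jnp.concatenate([states, actions], axis=-1)
    return inputs @ params["w"] + params["b"]


def critic_loss_fn(critic_1_params, critic_2_params, states, actions, target_q):
    """Average of both critics' MSE against a shared target (assumed form)."""
    critic_1_values = q_value(critic_1_params, states, actions)
    critic_2_values = q_value(critic_2_params, states, actions)
    loss = (jnp.mean((critic_1_values - target_q) ** 2)
            + jnp.mean((critic_2_values - target_q) ** 2)) / 2
    return loss, (critic_1_values, critic_2_values)


@jax.jit
def update_critic(critic_1_params, critic_2_params, states, actions, target_q):
    # argnums=(0, 1) differentiates w.r.t. both parameter pytrees and returns
    # one gradient per critic instead of a single shared gradient
    grad_fn = jax.value_and_grad(critic_loss_fn, argnums=(0, 1), has_aux=True)
    (critic_loss, (q1, q2)), (critic_1_grad, critic_2_grad) = grad_fn(
        critic_1_params, critic_2_params, states, actions, target_q)
    return critic_1_grad, critic_2_grad, critic_loss, q1, q2


def sgd_step(params, grad, learning_rate=0.1):
    """Plain SGD, standing in for each critic's own optimizer (assumption)."""
    return jax.tree_util.tree_map(lambda p, g: p - learning_rate * g, params, grad)


def init_critic(key, in_dim=4):
    """Hypothetical initializer for the toy linear critic."""
    return {"w": 0.1 * jax.random.normal(key, (in_dim, 1)), "b": jnp.zeros((1,))}


# Usage: toy batch of 32 transitions with 3-D states and 1-D actions
key_1, key_2, key_s, key_a = jax.random.split(jax.random.PRNGKey(0), 4)
critic_1_params, critic_2_params = init_critic(key_1), init_critic(key_2)
states = jax.random.normal(key_s, (32, 3))
actions = jax.random.normal(key_a, (32, 1))
target_q = jnp.ones((32, 1))

critic_1_grad, critic_2_grad, critic_loss, _, _ = update_critic(
    critic_1_params, critic_2_params, states, actions, target_q)
# each critic is updated only with its own gradient
critic_1_params = sgd_step(critic_1_params, critic_1_grad)
critic_2_params = sgd_step(critic_2_params, critic_2_grad)
```

Because the averaged loss in this sketch is a sum of per-critic terms, `critic_1_grad` does not depend on critic_2's parameters (and vice versa), which is why the two gradients can be kept and applied separately.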

Expected impact

Closes #425

MorningFrog changed the title "Refactor SAC agent's critic update function to return gradients separ…" to "Refactor SAC agent's critic update function to return gradients separately" on Apr 10, 2026
MorningFrog changed the title "Refactor SAC agent's critic update function to return gradients separately" to "Refactor JAX SAC agent's critic update to return gradients separately" on Apr 10, 2026
MorningFrog force-pushed the morningfrog-develop branch from 14f0e03 to 85c3579 on May 15, 2026 06:31

Labels: None yet

Projects: None yet

Development

Successfully merging this pull request may close these issues:

  • Critic grad in the JAX implementation of the SAC algorithm (#425)

1 participant