
[Feature] Evaluator weight sync via WeightSyncScheme + multi-model support #3627

Open: vmoens wants to merge 3 commits into main from evaluator-weight-sync


@vmoens (Collaborator) commented Apr 13, 2026

Summary

  • Remove _ProcessEvalBackend / _process_eval_worker — the process backend is now a _ThreadEvalBackend with a MultiSyncCollector (1 worker), eliminating custom mp.Queue weight serialization
  • Add weight_sync_schemes param to Evaluator for scheme-based cross-process weight sync (reuses the collector weight-sync infrastructure)
  • Add weights_dict param to evaluate() / trigger_eval() for multi-model sync (policy + env transforms + VecNormV2 running stats)
  • Extend WeightStrategy tensordict mode to capture get_extra_state() / set_extra_state() (needed for VecNormV2 running mean/var/count which aren't in TensorDict.from_module())
  • Document multi-model weight sync for both regular collectors and evaluators in collectors_weightsync.rst
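
The extra-state bullet above can be illustrated with plain PyTorch. The following is a minimal sketch, not the code in this PR: `RunningNorm`, `extract_snapshot` and `apply_snapshot` are hypothetical names, and the snapshot layout is an assumption made for illustration. It relies only on `nn.Module.get_extra_state()` / `set_extra_state()`, which PyTorch provides, and shows why a parameters-only snapshot would miss running statistics stored as plain attributes.

```python
import torch
from torch import nn


class RunningNorm(nn.Module):
    """Hypothetical normalizer whose running stats are plain attributes."""

    def __init__(self, num_features: int) -> None:
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_features))
        # Not registered as parameters or buffers, so a parameter walk misses them.
        self.mean = torch.zeros(num_features)
        self.count = 0

    def update(self, batch: torch.Tensor) -> None:
        # Incremental mean over all samples seen so far.
        total = self.count + batch.shape[0]
        self.mean = (self.mean * self.count + batch.sum(0)) / total
        self.count = total

    def get_extra_state(self) -> dict:
        return {"mean": self.mean.clone(), "count": self.count}

    def set_extra_state(self, state: dict) -> None:
        self.mean = state["mean"].clone()
        self.count = state["count"]


def extract_snapshot(module: nn.Module) -> dict:
    """Collect parameters plus the extra state of every submodule that defines one."""
    weights = {n: p.detach().clone() for n, p in module.named_parameters()}
    extra = {
        name: sub.get_extra_state()
        for name, sub in module.named_modules()
        if type(sub).get_extra_state is not nn.Module.get_extra_state
    }
    return {"weights": weights, "extra": extra}


def apply_snapshot(module: nn.Module, snapshot: dict) -> None:
    """Copy parameters in place, then restore extra state on matching submodules."""
    with torch.no_grad():
        for name, param in module.named_parameters():
            param.copy_(snapshot["weights"][name])
    submodules = dict(module.named_modules())
    for name, state in snapshot["extra"].items():
        submodules[name].set_extra_state(state)
```

Usage would look like `apply_snapshot(eval_norm, extract_snapshot(train_norm))`; without the `extra` entry, only `scale` would be transferred and the evaluator would keep its initial mean and count.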

Test plan

  • 6 new TestWeightStrategyExtraState tests — extract/apply/roundtrip for get_extra_state
  • 4 new TestEvaluatorWeightsDict tests — weights_dict API (policy-only, backward compat, merged, async)
  • 5 new TestEvaluatorProcessBackendAsMultiCollector tests — process backend via MultiSyncCollector
  • All 61 existing evaluator tests pass (2 skipped: multi-CUDA)

🤖 Generated with Claude Code

…pport

Replace the process backend's custom mp.Queue weight serialization with a
MultiSyncCollector (1 worker), and add weights_dict support for syncing
multiple models (policy, env transforms, VecNormV2 running stats).

- Remove _ProcessEvalBackend / _process_eval_worker entirely
- backend="process" now creates _ThreadEvalBackend + MultiSyncCollector
- Add weight_sync_schemes param to Evaluator for scheme-based sync
- Add weights_dict param to evaluate() / trigger_eval()
- Extend WeightStrategy tensordict mode to capture get_extra_state()
- Document multi-model weight sync for both collectors and evaluators

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pytorch-bot bot commented Apr 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3627

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit b94b2fc with merge base 09ef76d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Apr 13, 2026
@github-actions github-actions bot added the Feature, Documentation, Collectors, WeightUpdate, and Integrations/torch_geometric Integrations labels and removed the Feature label Apr 13, 2026
Evaluator environments should not accumulate running statistics during
evaluation — they should only receive frozen stats from the training env.
This adds auto-freeze for VecNormV2 transforms in all evaluator env
creation paths (eager, lazy init, and multi-collector factory wrapping),
plus documentation for both evaluator and regular collector freeze patterns.
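
The freeze behaviour described in this commit message can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the VecNormV2 implementation: `RunningMean`, `freeze` and `load_stats` are hypothetical names. A frozen normalizer still applies the statistics it was given, but no longer updates them from the data it sees.

```python
class RunningMean:
    """Hypothetical scalar normalizer with a freeze switch."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.count = 0
        self.frozen = False

    def freeze(self) -> "RunningMean":
        # After this call, normalize() stops accumulating statistics.
        self.frozen = True
        return self

    def load_stats(self, mean: float, count: int) -> None:
        # Receive statistics computed elsewhere, e.g. by the training env.
        self.mean = mean
        self.count = count

    def normalize(self, x: float) -> float:
        if not self.frozen:
            # Incremental mean update, training-side behaviour only.
            self.count += 1
            self.mean += (x - self.mean) / self.count
        return x - self.mean


# Training side accumulates; evaluation side only consumes.
train_stats = RunningMean()
for obs in (1.0, 3.0):
    train_stats.normalize(obs)

eval_stats = RunningMean().freeze()
eval_stats.load_stats(train_stats.mean, train_stats.count)
centered = eval_stats.normalize(10.0)  # uses mean 2.0, leaves the stats untouched
```

The design point is that evaluation observations never leak into the normalization statistics, so evaluation results stay comparable across calls.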

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the Feature label Apr 13, 2026
_wrap_env_factory_frozen now returns an EnvCreator subclass when given an
EnvCreator, preserving pre-computed meta_data and shared-memory state
dicts that MultiSyncCollector relies on. Plain callables still get a
simple wrapper as before.
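
A rough sketch of the wrapping pattern described above, in plain Python. `Creator`, `FrozenCreator` and `wrap_factory_frozen` are hypothetical stand-ins for illustration, not the classes or helper touched by this commit. The idea is that a factory object carrying precomputed state is wrapped by a subclass instance sharing that state, while a plain callable gets a closure.

```python
class Creator:
    """Hypothetical factory object that carries precomputed metadata."""

    def __init__(self, fn, meta_data=None):
        self.fn = fn
        self.meta_data = meta_data

    def __call__(self):
        return self.fn()


class FrozenCreator(Creator):
    """Subclass that post-processes whatever the parent factory builds."""

    def __call__(self):
        return self.post_process(super().__call__())


def wrap_factory_frozen(factory, post_process):
    if isinstance(factory, Creator):
        # Build the subclass instance without re-running __init__, then share
        # the original attributes (same objects, not copies).
        wrapped = FrozenCreator.__new__(FrozenCreator)
        wrapped.__dict__.update(factory.__dict__)
        wrapped.post_process = post_process
        return wrapped

    def plain_wrapper():
        # Plain callables carry no state worth preserving.
        return post_process(factory())

    return plain_wrapper


def mark_frozen(env):
    # Toy post-processing step standing in for "freeze the normalizer".
    return {**env, "frozen": True}


creator = Creator(lambda: {"frozen": False}, meta_data={"spec": 1})
wrapped_creator = wrap_factory_frozen(creator, mark_frozen)
wrapped_plain = wrap_factory_frozen(lambda: {"frozen": False}, mark_frozen)
```

Code that checks `isinstance(factory, Creator)` or reads `factory.meta_data` keeps working on `wrapped_creator`, which is the property the commit message attributes to the EnvCreator case.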

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions (Contributor)

$\color{#D29922}\textsf{\Large&#x26A0;\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}18$. Worsened: $\large\color{#d91a1a}8$.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 79.6271μs 78.9097μs 12.6727 KOps/s 12.6760 KOps/s $\color{#d91a1a}-0.03\%$
test_tensor_to_bytestream_speed[torch.save] 0.1370ms 0.1368ms 7.3082 KOps/s 7.2810 KOps/s $\color{#35bf28}+0.37\%$
test_tensor_to_bytestream_speed[untyped_storage] 99.7659ms 99.4231ms 10.0580 Ops/s 10.1729 Ops/s $\color{#d91a1a}-1.13\%$
test_tensor_to_bytestream_speed[numpy] 2.4142μs 2.4037μs 416.0315 KOps/s 421.4131 KOps/s $\color{#d91a1a}-1.28\%$
test_tensor_to_bytestream_speed[safetensors] 34.9193μs 34.7884μs 28.7453 KOps/s 28.0645 KOps/s $\color{#35bf28}+2.43\%$
test_simple 0.5349s 0.5331s 1.8759 Ops/s 1.7846 Ops/s $\textbf{\color{#35bf28}+5.12\%}$
test_transformed 1.0653s 1.0607s 0.9428 Ops/s 0.9152 Ops/s $\color{#35bf28}+3.02\%$
test_serial 1.6529s 1.6448s 0.6080 Ops/s 0.5871 Ops/s $\color{#35bf28}+3.55\%$
test_parallel 0.9966s 0.9929s 1.0072 Ops/s 0.9771 Ops/s $\color{#35bf28}+3.08\%$
test_step_mdp_speed[True-True-True-True-True] 0.1276ms 39.9757μs 25.0152 KOps/s 24.5024 KOps/s $\color{#35bf28}+2.09\%$
test_step_mdp_speed[True-True-True-True-False] 45.6920μs 21.8617μs 45.7420 KOps/s 45.2722 KOps/s $\color{#35bf28}+1.04\%$
test_step_mdp_speed[True-True-True-False-True] 44.9320μs 22.3060μs 44.8311 KOps/s 43.2124 KOps/s $\color{#35bf28}+3.75\%$
test_step_mdp_speed[True-True-True-False-False] 43.6620μs 12.0980μs 82.6583 KOps/s 79.6768 KOps/s $\color{#35bf28}+3.74\%$
test_step_mdp_speed[True-True-False-True-True] 66.5930μs 42.0493μs 23.7816 KOps/s 23.2292 KOps/s $\color{#35bf28}+2.38\%$
test_step_mdp_speed[True-True-False-True-False] 52.8230μs 24.0699μs 41.5456 KOps/s 40.1852 KOps/s $\color{#35bf28}+3.39\%$
test_step_mdp_speed[True-True-False-False-True] 53.0430μs 25.2006μs 39.6817 KOps/s 37.9495 KOps/s $\color{#35bf28}+4.56\%$
test_step_mdp_speed[True-True-False-False-False] 43.9620μs 14.7913μs 67.6072 KOps/s 64.1571 KOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_step_mdp_speed[True-False-True-True-True] 77.2340μs 45.2939μs 22.0780 KOps/s 21.7039 KOps/s $\color{#35bf28}+1.72\%$
test_step_mdp_speed[True-False-True-True-False] 64.5330μs 26.7599μs 37.3694 KOps/s 36.6670 KOps/s $\color{#35bf28}+1.92\%$
test_step_mdp_speed[True-False-True-False-True] 53.2520μs 24.8377μs 40.2614 KOps/s 38.6403 KOps/s $\color{#35bf28}+4.20\%$
test_step_mdp_speed[True-False-True-False-False] 37.3820μs 14.8485μs 67.3467 KOps/s 67.4210 KOps/s $\color{#d91a1a}-0.11\%$
test_step_mdp_speed[True-False-False-True-True] 99.5650μs 47.8919μs 20.8804 KOps/s 20.6393 KOps/s $\color{#35bf28}+1.17\%$
test_step_mdp_speed[True-False-False-True-False] 63.6430μs 29.9513μs 33.3876 KOps/s 34.0058 KOps/s $\color{#d91a1a}-1.82\%$
test_step_mdp_speed[True-False-False-False-True] 54.6630μs 27.6169μs 36.2097 KOps/s 36.9384 KOps/s $\color{#d91a1a}-1.97\%$
test_step_mdp_speed[True-False-False-False-False] 42.9020μs 17.0556μs 58.6318 KOps/s 57.5991 KOps/s $\color{#35bf28}+1.79\%$
test_step_mdp_speed[False-True-True-True-True] 86.3250μs 45.6509μs 21.9054 KOps/s 22.4015 KOps/s $\color{#d91a1a}-2.21\%$
test_step_mdp_speed[False-True-True-True-False] 61.5730μs 27.3900μs 36.5097 KOps/s 36.6124 KOps/s $\color{#d91a1a}-0.28\%$
test_step_mdp_speed[False-True-True-False-True] 2.4135ms 29.1850μs 34.2641 KOps/s 34.3735 KOps/s $\color{#d91a1a}-0.32\%$
test_step_mdp_speed[False-True-True-False-False] 51.9230μs 16.4338μs 60.8502 KOps/s 60.9348 KOps/s $\color{#d91a1a}-0.14\%$
test_step_mdp_speed[False-True-False-True-True] 76.1940μs 47.0445μs 21.2565 KOps/s 20.6126 KOps/s $\color{#35bf28}+3.12\%$
test_step_mdp_speed[False-True-False-True-False] 60.9040μs 29.3892μs 34.0262 KOps/s 33.7245 KOps/s $\color{#35bf28}+0.89\%$
test_step_mdp_speed[False-True-False-False-True] 70.7530μs 31.5854μs 31.6602 KOps/s 32.0377 KOps/s $\color{#d91a1a}-1.18\%$
test_step_mdp_speed[False-True-False-False-False] 45.9220μs 18.9448μs 52.7849 KOps/s 53.3976 KOps/s $\color{#d91a1a}-1.15\%$
test_step_mdp_speed[False-False-True-True-True] 84.4950μs 49.6394μs 20.1453 KOps/s 19.6786 KOps/s $\color{#35bf28}+2.37\%$
test_step_mdp_speed[False-False-True-True-False] 59.3830μs 31.6679μs 31.5778 KOps/s 31.6849 KOps/s $\color{#d91a1a}-0.34\%$
test_step_mdp_speed[False-False-True-False-True] 59.4530μs 30.8980μs 32.3646 KOps/s 32.7792 KOps/s $\color{#d91a1a}-1.26\%$
test_step_mdp_speed[False-False-True-False-False] 49.0820μs 18.8264μs 53.1168 KOps/s 53.3085 KOps/s $\color{#d91a1a}-0.36\%$
test_step_mdp_speed[False-False-False-True-True] 0.1021ms 52.7498μs 18.9574 KOps/s 19.2784 KOps/s $\color{#d91a1a}-1.67\%$
test_step_mdp_speed[False-False-False-True-False] 0.1168ms 33.5083μs 29.8433 KOps/s 29.2426 KOps/s $\color{#35bf28}+2.05\%$
test_step_mdp_speed[False-False-False-False-True] 55.7630μs 32.4205μs 30.8446 KOps/s 28.9289 KOps/s $\textbf{\color{#35bf28}+6.62\%}$
test_step_mdp_speed[False-False-False-False-False] 47.9530μs 20.8050μs 48.0653 KOps/s 46.1528 KOps/s $\color{#35bf28}+4.14\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7014s 0.6981s 1.4325 Ops/s 1.3552 Ops/s $\textbf{\color{#35bf28}+5.70\%}$
test_non_tensor_env_rollout_speed[1000-single-False] 0.6961s 0.5904s 1.6938 Ops/s 1.6783 Ops/s $\color{#35bf28}+0.92\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.6842s 1.5932s 0.6277 Ops/s 0.6187 Ops/s $\color{#35bf28}+1.44\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4688s 1.3812s 0.7240 Ops/s 0.7136 Ops/s $\color{#35bf28}+1.46\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9184s 1.8382s 0.5440 Ops/s 0.5365 Ops/s $\color{#35bf28}+1.40\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7067s 1.6314s 0.6130 Ops/s 0.6084 Ops/s $\color{#35bf28}+0.75\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6520s 4.4595s 0.2242 Ops/s 0.2225 Ops/s $\color{#35bf28}+0.77\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.3600s 4.2342s 0.2362 Ops/s 0.2294 Ops/s $\color{#35bf28}+2.96\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9106s 1.8115s 0.5520 Ops/s 0.5483 Ops/s $\color{#35bf28}+0.67\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6384s 1.5430s 0.6481 Ops/s 0.6508 Ops/s $\color{#d91a1a}-0.41\%$
test_values[generalized_advantage_estimate-True-True] 10.0439ms 9.8480ms 101.5431 Ops/s 101.6638 Ops/s $\color{#d91a1a}-0.12\%$
test_values[vec_generalized_advantage_estimate-True-True] 19.2729ms 17.4666ms 57.2522 Ops/s 56.9824 Ops/s $\color{#35bf28}+0.47\%$
test_values[td0_return_estimate-False-False] 0.2397ms 0.1279ms 7.8206 KOps/s 7.7132 KOps/s $\color{#35bf28}+1.39\%$
test_values[td1_return_estimate-False-False] 27.3983ms 26.8500ms 37.2440 Ops/s 36.9636 Ops/s $\color{#35bf28}+0.76\%$
test_values[vec_td1_return_estimate-False-False] 17.9903ms 17.5297ms 57.0462 Ops/s 56.8559 Ops/s $\color{#35bf28}+0.33\%$
test_values[td_lambda_return_estimate-True-False] 40.3674ms 39.9240ms 25.0476 Ops/s 24.7922 Ops/s $\color{#35bf28}+1.03\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.1039ms 17.4695ms 57.2428 Ops/s 57.0421 Ops/s $\color{#35bf28}+0.35\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.2173ms 8.7278ms 114.5768 Ops/s 114.8056 Ops/s $\color{#d91a1a}-0.20\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7605ms 1.5109ms 661.8668 Ops/s 665.5851 Ops/s $\color{#d91a1a}-0.56\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4642ms 0.4159ms 2.4046 KOps/s 2.3929 KOps/s $\color{#35bf28}+0.49\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 34.5956ms 34.1299ms 29.2998 Ops/s 28.6313 Ops/s $\color{#35bf28}+2.33\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8863ms 1.7158ms 582.8058 Ops/s 576.2579 Ops/s $\color{#35bf28}+1.14\%$
test_dqn_speed[False-None] 1.8370ms 1.3977ms 715.4533 Ops/s 704.4070 Ops/s $\color{#35bf28}+1.57\%$
test_dqn_speed[False-backward] 2.0089ms 1.9188ms 521.1545 Ops/s 507.8672 Ops/s $\color{#35bf28}+2.62\%$
test_dqn_speed[True-None] 0.6487ms 0.5618ms 1.7800 KOps/s 1.7704 KOps/s $\color{#35bf28}+0.54\%$
test_dqn_speed[True-backward] 1.0729ms 1.0275ms 973.2582 Ops/s 921.6081 Ops/s $\textbf{\color{#35bf28}+5.60\%}$
test_dqn_speed[reduce-overhead-None] 0.9655ms 0.5551ms 1.8013 KOps/s 1.7905 KOps/s $\color{#35bf28}+0.60\%$
test_ddpg_speed[False-None] 3.2115ms 2.8141ms 355.3590 Ops/s 352.3564 Ops/s $\color{#35bf28}+0.85\%$
test_ddpg_speed[False-backward] 4.2347ms 4.0351ms 247.8268 Ops/s 246.3400 Ops/s $\color{#35bf28}+0.60\%$
test_ddpg_speed[True-None] 1.8738ms 1.4377ms 695.5722 Ops/s 685.8587 Ops/s $\color{#35bf28}+1.42\%$
test_ddpg_speed[True-backward] 2.8868ms 2.4887ms 401.8239 Ops/s 326.1937 Ops/s $\textbf{\color{#35bf28}+23.19\%}$
test_ddpg_speed[reduce-overhead-None] 1.6062ms 1.4218ms 703.3343 Ops/s 693.0622 Ops/s $\color{#35bf28}+1.48\%$
test_sac_speed[False-None] 8.5586ms 7.9980ms 125.0316 Ops/s 122.4935 Ops/s $\color{#35bf28}+2.07\%$
test_sac_speed[False-backward] 11.9479ms 11.2569ms 88.8346 Ops/s 87.5468 Ops/s $\color{#35bf28}+1.47\%$
test_sac_speed[True-None] 2.6614ms 2.1937ms 455.8489 Ops/s 446.7500 Ops/s $\color{#35bf28}+2.04\%$
test_sac_speed[True-backward] 5.6260ms 4.3012ms 232.4924 Ops/s 233.2586 Ops/s $\color{#d91a1a}-0.33\%$
test_sac_speed[reduce-overhead-None] 2.3224ms 2.1827ms 458.1473 Ops/s 443.9426 Ops/s $\color{#35bf28}+3.20\%$
test_redq_speed[False-None] 15.8832ms 10.6495ms 93.9011 Ops/s 94.6499 Ops/s $\color{#d91a1a}-0.79\%$
test_redq_speed[False-backward] 21.5267ms 17.9926ms 55.5784 Ops/s 54.8175 Ops/s $\color{#35bf28}+1.39\%$
test_redq_speed[True-None] 5.0863ms 4.6100ms 216.9214 Ops/s 210.7062 Ops/s $\color{#35bf28}+2.95\%$
test_redq_speed[reduce-overhead-None] 4.7998ms 4.5010ms 222.1705 Ops/s 218.4965 Ops/s $\color{#35bf28}+1.68\%$
test_redq_deprec_speed[False-None] 11.5706ms 11.1014ms 90.0785 Ops/s 89.8095 Ops/s $\color{#35bf28}+0.30\%$
test_redq_deprec_speed[False-backward] 16.5192ms 16.1009ms 62.1083 Ops/s 61.4617 Ops/s $\color{#35bf28}+1.05\%$
test_redq_deprec_speed[True-None] 4.0803ms 3.6933ms 270.7601 Ops/s 271.3451 Ops/s $\color{#d91a1a}-0.22\%$
test_redq_deprec_speed[True-backward] 7.7122ms 7.5405ms 132.6171 Ops/s 119.1505 Ops/s $\textbf{\color{#35bf28}+11.30\%}$
test_redq_deprec_speed[reduce-overhead-None] 3.9281ms 3.6424ms 274.5426 Ops/s 255.0273 Ops/s $\textbf{\color{#35bf28}+7.65\%}$
test_td3_speed[False-None] 8.2636ms 7.9953ms 125.0727 Ops/s 124.2136 Ops/s $\color{#35bf28}+0.69\%$
test_td3_speed[False-backward] 11.3070ms 10.9467ms 91.3519 Ops/s 91.3566 Ops/s $-0.01\%$
test_td3_speed[True-None] 1.9090ms 1.8542ms 539.3019 Ops/s 523.6536 Ops/s $\color{#35bf28}+2.99\%$
test_td3_speed[True-backward] 3.7794ms 3.6435ms 274.4590 Ops/s 272.8157 Ops/s $\color{#35bf28}+0.60\%$
test_td3_speed[reduce-overhead-None] 1.9004ms 1.7987ms 555.9470 Ops/s 538.7366 Ops/s $\color{#35bf28}+3.19\%$
test_cql_speed[False-None] 30.7631ms 26.5177ms 37.7107 Ops/s 38.1241 Ops/s $\color{#d91a1a}-1.08\%$
test_cql_speed[False-backward] 39.3533ms 35.7406ms 27.9794 Ops/s 27.8805 Ops/s $\color{#35bf28}+0.35\%$
test_cql_speed[True-None] 16.9124ms 12.9357ms 77.3053 Ops/s 71.8756 Ops/s $\textbf{\color{#35bf28}+7.55\%}$
test_cql_speed[True-backward] 18.3857ms 17.9282ms 55.7782 Ops/s 55.7801 Ops/s $-0.00\%$
test_cql_speed[reduce-overhead-None] 12.6993ms 12.4610ms 80.2505 Ops/s 80.8604 Ops/s $\color{#d91a1a}-0.75\%$
test_a2c_speed[False-None] 5.6582ms 5.4164ms 184.6256 Ops/s 188.0736 Ops/s $\color{#d91a1a}-1.83\%$
test_a2c_speed[False-backward] 12.4703ms 11.9969ms 83.3547 Ops/s 85.2261 Ops/s $\color{#d91a1a}-2.20\%$
test_a2c_speed[True-None] 4.0468ms 3.7993ms 263.2082 Ops/s 260.1535 Ops/s $\color{#35bf28}+1.17\%$
test_a2c_speed[True-backward] 8.9709ms 8.7500ms 114.2860 Ops/s 111.4869 Ops/s $\color{#35bf28}+2.51\%$
test_a2c_speed[reduce-overhead-None] 4.0828ms 3.7853ms 264.1784 Ops/s 260.1536 Ops/s $\color{#35bf28}+1.55\%$
test_ppo_speed[False-None] 6.1564ms 5.8777ms 170.1345 Ops/s 171.4637 Ops/s $\color{#d91a1a}-0.78\%$
test_ppo_speed[False-backward] 12.8121ms 12.5546ms 79.6519 Ops/s 79.7849 Ops/s $\color{#d91a1a}-0.17\%$
test_ppo_speed[True-None] 3.8923ms 3.8032ms 262.9332 Ops/s 259.5267 Ops/s $\color{#35bf28}+1.31\%$
test_ppo_speed[True-backward] 8.9338ms 8.7385ms 114.4361 Ops/s 111.0422 Ops/s $\color{#35bf28}+3.06\%$
test_ppo_speed[reduce-overhead-None] 3.9349ms 3.7618ms 265.8337 Ops/s 262.7547 Ops/s $\color{#35bf28}+1.17\%$
test_reinforce_speed[False-None] 4.8792ms 4.5665ms 218.9880 Ops/s 215.7460 Ops/s $\color{#35bf28}+1.50\%$
test_reinforce_speed[False-backward] 7.7034ms 7.5108ms 133.1419 Ops/s 132.1674 Ops/s $\color{#35bf28}+0.74\%$
test_reinforce_speed[True-None] 3.3314ms 3.0141ms 331.7699 Ops/s 325.2242 Ops/s $\color{#35bf28}+2.01\%$
test_reinforce_speed[True-backward] 8.2496ms 7.9874ms 125.1974 Ops/s 114.4792 Ops/s $\textbf{\color{#35bf28}+9.36\%}$
test_reinforce_speed[reduce-overhead-None] 3.2036ms 2.9868ms 334.8043 Ops/s 333.4653 Ops/s $\color{#35bf28}+0.40\%$
test_iql_speed[False-None] 25.9358ms 20.5608ms 48.6362 Ops/s 51.1994 Ops/s $\textbf{\color{#d91a1a}-5.01\%}$
test_iql_speed[False-backward] 36.3305ms 31.0189ms 32.2384 Ops/s 33.2495 Ops/s $\color{#d91a1a}-3.04\%$
test_iql_speed[True-None] 9.0988ms 8.5664ms 116.7348 Ops/s 117.1332 Ops/s $\color{#d91a1a}-0.34\%$
test_iql_speed[True-backward] 17.4201ms 16.9894ms 58.8602 Ops/s 56.4463 Ops/s $\color{#35bf28}+4.28\%$
test_iql_speed[reduce-overhead-None] 8.7463ms 8.5412ms 117.0789 Ops/s 116.3096 Ops/s $\color{#35bf28}+0.66\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0865ms 5.9139ms 169.0939 Ops/s 169.1276 Ops/s $\color{#d91a1a}-0.02\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.0193ms 0.2861ms 3.4959 KOps/s 3.1942 KOps/s $\textbf{\color{#35bf28}+9.44\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5226ms 0.2653ms 3.7698 KOps/s 2.9967 KOps/s $\textbf{\color{#35bf28}+25.80\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8980ms 5.6822ms 175.9882 Ops/s 176.2903 Ops/s $\color{#d91a1a}-0.17\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0760ms 0.2789ms 3.5861 KOps/s 3.0822 KOps/s $\textbf{\color{#35bf28}+16.35\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5050ms 0.2613ms 3.8264 KOps/s 3.3127 KOps/s $\textbf{\color{#35bf28}+15.51\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5141ms 1.2628ms 791.8606 Ops/s 768.5540 Ops/s $\color{#35bf28}+3.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4561ms 1.1722ms 853.0829 Ops/s 830.1689 Ops/s $\color{#35bf28}+2.76\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.7197ms 5.8648ms 170.5082 Ops/s 171.7066 Ops/s $\color{#d91a1a}-0.70\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8662ms 0.4302ms 2.3243 KOps/s 2.1862 KOps/s $\textbf{\color{#35bf28}+6.32\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6599ms 0.4148ms 2.4106 KOps/s 2.2328 KOps/s $\textbf{\color{#35bf28}+7.96\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.7582ms 5.6543ms 176.8574 Ops/s 176.9587 Ops/s $\color{#d91a1a}-0.06\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0687ms 0.2912ms 3.4343 KOps/s 3.4724 KOps/s $\color{#d91a1a}-1.10\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4159ms 0.2661ms 3.7575 KOps/s 3.7190 KOps/s $\color{#35bf28}+1.04\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8386ms 5.5932ms 178.7880 Ops/s 176.7150 Ops/s $\color{#35bf28}+1.17\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0711ms 0.3538ms 2.8262 KOps/s 3.4529 KOps/s $\textbf{\color{#d91a1a}-18.15\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6611ms 0.3462ms 2.8882 KOps/s 3.3750 KOps/s $\textbf{\color{#d91a1a}-14.42\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.9461ms 5.8176ms 171.8915 Ops/s 172.7918 Ops/s $\color{#d91a1a}-0.52\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3098ms 0.5089ms 1.9651 KOps/s 2.0332 KOps/s $\color{#d91a1a}-3.35\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6748ms 0.4894ms 2.0432 KOps/s 2.2753 KOps/s $\textbf{\color{#d91a1a}-10.20\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3720ms 4.9073ms 203.7783 Ops/s 49.4116 Ops/s $\textbf{\color{#35bf28}+312.41\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.8860ms 1.9721ms 507.0820 Ops/s 514.6576 Ops/s $\color{#d91a1a}-1.47\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.3089ms 0.9340ms 1.0707 KOps/s 838.6191 Ops/s $\textbf{\color{#35bf28}+27.68\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6426s 17.7779ms 56.2497 Ops/s 199.3072 Ops/s $\textbf{\color{#d91a1a}-71.78\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 12.4768ms 1.9510ms 512.5580 Ops/s 530.6386 Ops/s $\color{#d91a1a}-3.41\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.0836ms 1.1833ms 845.1113 Ops/s 1.1555 KOps/s $\textbf{\color{#d91a1a}-26.86\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 6.6372ms 5.1182ms 195.3817 Ops/s 192.6138 Ops/s $\color{#35bf28}+1.44\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 5.9988ms 1.9144ms 522.3553 Ops/s 521.3902 Ops/s $\color{#35bf28}+0.19\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.1624ms 0.9955ms 1.0046 KOps/s 970.3880 Ops/s $\color{#35bf28}+3.52\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 44.5508ms 39.4719ms 25.3345 Ops/s 25.5820 Ops/s $\color{#d91a1a}-0.97\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.4486ms 18.0861ms 55.2911 Ops/s 55.4523 Ops/s $\color{#d91a1a}-0.29\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 44.3821ms 39.8491ms 25.0946 Ops/s 24.5916 Ops/s $\color{#35bf28}+2.05\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.8203ms 18.3826ms 54.3992 Ops/s 54.3679 Ops/s $\color{#35bf28}+0.06\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 42.8075ms 41.1962ms 24.2741 Ops/s 23.7567 Ops/s $\color{#35bf28}+2.18\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.9985ms 19.7020ms 50.7563 Ops/s 50.2571 Ops/s $\color{#35bf28}+0.99\%$
test_storage_write_lazystack[50-img_shape0-small] 0.5587s 0.4431ms 2.2568 KOps/s 4.5192 KOps/s $\textbf{\color{#d91a1a}-50.06\%}$
test_storage_write_lazystack[100-img_shape1-atari] 1.5504ms 1.4152ms 706.6168 Ops/s 722.8397 Ops/s $\color{#d91a1a}-2.24\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7639ms 2.3116ms 432.5923 Ops/s 435.1482 Ops/s $\color{#d91a1a}-0.59\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0883ms 2.9447ms 339.5956 Ops/s 341.9980 Ops/s $\color{#d91a1a}-0.70\%$
test_storage_write_contiguous[50-img_shape0-small] 0.6489ms 0.1378ms 7.2576 KOps/s 7.5198 KOps/s $\color{#d91a1a}-3.49\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3426ms 0.1928ms 5.1878 KOps/s 5.5567 KOps/s $\textbf{\color{#d91a1a}-6.64\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9153ms 1.7534ms 570.3080 Ops/s 578.5600 Ops/s $\color{#d91a1a}-1.43\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5697ms 1.2776ms 782.7229 Ops/s 780.4269 Ops/s $\color{#35bf28}+0.29\%$
test_collector_stack_then_write[50-img_shape0-small] 1.1834ms 1.0970ms 911.5966 Ops/s 908.3801 Ops/s $\color{#35bf28}+0.35\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8839ms 3.5287ms 283.3919 Ops/s 283.8920 Ops/s $\color{#d91a1a}-0.18\%$
test_collector_stack_then_write[100-img_shape2-large_img] 5.8809ms 5.5970ms 178.6665 Ops/s 180.2586 Ops/s $\color{#d91a1a}-0.88\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.3701ms 6.9661ms 143.5532 Ops/s 147.1585 Ops/s $\color{#d91a1a}-2.45\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4211ms 0.2742ms 3.6470 KOps/s 3.6027 KOps/s $\color{#35bf28}+1.23\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6801ms 1.5309ms 653.2054 Ops/s 657.4796 Ops/s $\color{#d91a1a}-0.65\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6502ms 2.4276ms 411.9306 Ops/s 413.5784 Ops/s $\color{#d91a1a}-0.40\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.4919ms 3.1495ms 317.5089 Ops/s 319.7966 Ops/s $\color{#d91a1a}-0.72\%$
test_collector_without_rb[100-img_shape0-atari] 33.0833ms 32.4072ms 30.8573 Ops/s 30.6161 Ops/s $\color{#35bf28}+0.79\%$
test_collector_without_rb[200-img_shape1-large_batch] 64.0477ms 63.6930ms 15.7003 Ops/s 15.5407 Ops/s $\color{#35bf28}+1.03\%$
test_collector_with_rb[100-img_shape0-atari] 38.0901ms 36.9799ms 27.0417 Ops/s 26.9023 Ops/s $\color{#35bf28}+0.52\%$
test_collector_with_rb[200-img_shape1-large_batch] 73.9796ms 72.8712ms 13.7228 Ops/s 13.7853 Ops/s $\color{#d91a1a}-0.45\%$
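
For readers checking the Change column by hand, the values are consistent with a relative difference between the Ops and Ops on Repo HEAD columns. The sketch below is an inference from the table, not the benchmark action's code: the 5% threshold is an assumption that happens to match the bolded rows and the Improved/Worsened counts, and both inputs must be in the same unit (convert KOps/s to Ops/s first).

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR against repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is inferred from the table, not documented."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# Values copied from the test_simple and test_iql_speed[False-None] rows above.
simple = percent_change(1.8759, 1.7846)
iql = percent_change(48.6362, 51.1994)
labels = (classify(simple), classify(iql))
```

Small discrepancies in the last digit are expected, since the table was presumably computed from unrounded measurements.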

@github-actions (Contributor)

$\color{#D29922}\textsf{\Large&#x26A0;\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}8$.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 82.7936μs 81.8305μs 12.2204 KOps/s 12.2140 KOps/s $\color{#35bf28}+0.05\%$
test_tensor_to_bytestream_speed[torch.save] 0.1428ms 0.1424ms 7.0225 KOps/s 6.9636 KOps/s $\color{#35bf28}+0.85\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1135s 0.1129s 8.8591 Ops/s 8.7514 Ops/s $\color{#35bf28}+1.23\%$
test_tensor_to_bytestream_speed[numpy] 2.6035μs 2.5885μs 386.3289 KOps/s 375.0734 KOps/s $\color{#35bf28}+3.00\%$
test_tensor_to_bytestream_speed[safetensors] 37.3300μs 37.0864μs 26.9641 KOps/s 27.1064 KOps/s $\color{#d91a1a}-0.52\%$
test_simple 0.8126s 0.8033s 1.2448 Ops/s 1.2174 Ops/s $\color{#35bf28}+2.25\%$
test_transformed 1.3907s 1.3890s 0.7200 Ops/s 0.7055 Ops/s $\color{#35bf28}+2.05\%$
test_serial 2.3681s 2.3441s 0.4266 Ops/s 0.4287 Ops/s $\color{#d91a1a}-0.50\%$
test_parallel 1.9307s 1.8253s 0.5479 Ops/s 0.5534 Ops/s $\color{#d91a1a}-1.00\%$
test_step_mdp_speed[True-True-True-True-True] 0.1796ms 42.0519μs 23.7802 KOps/s 22.8701 KOps/s $\color{#35bf28}+3.98\%$
test_step_mdp_speed[True-True-True-True-False] 48.7510μs 22.7106μs 44.0322 KOps/s 42.1423 KOps/s $\color{#35bf28}+4.48\%$
test_step_mdp_speed[True-True-True-False-True] 59.0410μs 23.9795μs 41.7023 KOps/s 40.3025 KOps/s $\color{#35bf28}+3.47\%$
test_step_mdp_speed[True-True-True-False-False] 38.3410μs 12.9118μs 77.4483 KOps/s 75.6388 KOps/s $\color{#35bf28}+2.39\%$
test_step_mdp_speed[True-True-False-True-True] 87.5310μs 45.0554μs 22.1949 KOps/s 21.5654 KOps/s $\color{#35bf28}+2.92\%$
test_step_mdp_speed[True-True-False-True-False] 49.1610μs 25.2029μs 39.6779 KOps/s 38.3540 KOps/s $\color{#35bf28}+3.45\%$
test_step_mdp_speed[True-True-False-False-True] 58.7510μs 25.8763μs 38.6455 KOps/s 37.1602 KOps/s $\color{#35bf28}+4.00\%$
test_step_mdp_speed[True-True-False-False-False] 53.0810μs 15.5397μs 64.3514 KOps/s 63.1168 KOps/s $\color{#35bf28}+1.96\%$
test_step_mdp_speed[True-False-True-True-True] 87.1720μs 47.5028μs 21.0514 KOps/s 20.5609 KOps/s $\color{#35bf28}+2.39\%$
test_step_mdp_speed[True-False-True-True-False] 69.3710μs 27.8153μs 35.9515 KOps/s 34.6676 KOps/s $\color{#35bf28}+3.70\%$
test_step_mdp_speed[True-False-True-False-True] 64.6410μs 26.1012μs 38.3123 KOps/s 36.6156 KOps/s $\color{#35bf28}+4.63\%$
test_step_mdp_speed[True-False-True-False-False] 41.9500μs 15.2871μs 65.4144 KOps/s 63.4149 KOps/s $\color{#35bf28}+3.15\%$
test_step_mdp_speed[True-False-False-True-True] 83.1210μs 49.4108μs 20.2385 KOps/s 19.4552 KOps/s $\color{#35bf28}+4.03\%$
test_step_mdp_speed[True-False-False-True-False] 72.0410μs 30.2486μs 33.0594 KOps/s 32.2003 KOps/s $\color{#35bf28}+2.67\%$
test_step_mdp_speed[True-False-False-False-True] 64.4410μs 28.9049μs 34.5962 KOps/s 33.5772 KOps/s $\color{#35bf28}+3.03\%$
test_step_mdp_speed[True-False-False-False-False] 44.1400μs 17.9340μs 55.7599 KOps/s 54.0143 KOps/s $\color{#35bf28}+3.23\%$
test_step_mdp_speed[False-True-True-True-True] 97.6120μs 47.1860μs 21.1927 KOps/s 20.4653 KOps/s $\color{#35bf28}+3.55\%$
test_step_mdp_speed[False-True-True-True-False] 60.8510μs 28.3902μs 35.2234 KOps/s 34.9781 KOps/s $\color{#35bf28}+0.70\%$
test_step_mdp_speed[False-True-True-False-True] 2.4255ms 30.5698μs 32.7120 KOps/s 32.8068 KOps/s $\color{#d91a1a}-0.29\%$
test_step_mdp_speed[False-True-True-False-False] 44.4200μs 17.2233μs 58.0609 KOps/s 57.7377 KOps/s $\color{#35bf28}+0.56\%$
test_step_mdp_speed[False-True-False-True-True] 0.1230ms 49.5488μs 20.1821 KOps/s 19.4348 KOps/s $\color{#35bf28}+3.85\%$
test_step_mdp_speed[False-True-False-True-False] 59.9110μs 30.5216μs 32.7636 KOps/s 32.2129 KOps/s $\color{#35bf28}+1.71\%$
test_step_mdp_speed[False-True-False-False-True] 0.1051ms 32.1560μs 31.0984 KOps/s 30.1675 KOps/s $\color{#35bf28}+3.09\%$
test_step_mdp_speed[False-True-False-False-False] 54.2410μs 19.3127μs 51.7794 KOps/s 49.8870 KOps/s $\color{#35bf28}+3.79\%$
test_step_mdp_speed[False-False-True-True-True] 0.1780ms 51.6309μs 19.3683 KOps/s 18.8388 KOps/s $\color{#35bf28}+2.81\%$
test_step_mdp_speed[False-False-True-True-False] 68.8010μs 33.7494μs 29.6301 KOps/s 29.4876 KOps/s $\color{#35bf28}+0.48\%$
test_step_mdp_speed[False-False-True-False-True] 69.6210μs 33.2389μs 30.0853 KOps/s 30.4765 KOps/s $\color{#d91a1a}-1.28\%$
test_step_mdp_speed[False-False-True-False-False] 57.3510μs 19.7564μs 50.6164 KOps/s 50.2512 KOps/s $\color{#35bf28}+0.73\%$
test_step_mdp_speed[False-False-False-True-True] 0.1241ms 54.2017μs 18.4496 KOps/s 17.9600 KOps/s $\color{#35bf28}+2.73\%$
test_step_mdp_speed[False-False-False-True-False] 69.6120μs 35.4691μs 28.1935 KOps/s 27.4062 KOps/s $\color{#35bf28}+2.87\%$
test_step_mdp_speed[False-False-False-False-True] 63.7110μs 34.1459μs 29.2861 KOps/s 28.1524 KOps/s $\color{#35bf28}+4.03\%$
test_step_mdp_speed[False-False-False-False-False] 57.3010μs 22.0253μs 45.4023 KOps/s 43.8782 KOps/s $\color{#35bf28}+3.47\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7274s 0.7240s 1.3812 Ops/s 1.3221 Ops/s $\color{#35bf28}+4.47\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7088s 0.6096s 1.6404 Ops/s 1.6158 Ops/s $\color{#35bf28}+1.53\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7397s 1.6592s 0.6027 Ops/s 0.6002 Ops/s $\color{#35bf28}+0.42\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5191s 1.4353s 0.6967 Ops/s 0.6921 Ops/s $\color{#35bf28}+0.66\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9987s 1.9101s 0.5235 Ops/s 0.5200 Ops/s $\color{#35bf28}+0.68\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7752s 1.6928s 0.5907 Ops/s 0.5880 Ops/s $\color{#35bf28}+0.47\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6697s 4.5888s 0.2179 Ops/s 0.2154 Ops/s $\color{#35bf28}+1.16\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5106s 4.4044s 0.2270 Ops/s 0.2245 Ops/s $\color{#35bf28}+1.13\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9715s 1.8746s 0.5334 Ops/s 0.5340 Ops/s $\color{#d91a1a}-0.11\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6995s 1.5892s 0.6293 Ops/s 0.6342 Ops/s $\color{#d91a1a}-0.77\%$
test_values[generalized_advantage_estimate-True-True] 21.5524ms 20.9742ms 47.6777 Ops/s 48.8874 Ops/s $\color{#d91a1a}-2.47\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1393s 3.7072ms 269.7486 Ops/s 252.2978 Ops/s $\textbf{\color{#35bf28}+6.92\%}$
test_values[td0_return_estimate-False-False] 0.1077ms 84.8863μs 11.7805 KOps/s 11.8245 KOps/s $\color{#d91a1a}-0.37\%$
test_values[td1_return_estimate-False-False] 51.4648ms 50.2191ms 19.9128 Ops/s 20.4418 Ops/s $\color{#d91a1a}-2.59\%$
test_values[vec_td1_return_estimate-False-False] 1.3857ms 1.0929ms 914.9923 Ops/s 916.2466 Ops/s $\color{#d91a1a}-0.14\%$
test_values[td_lambda_return_estimate-True-False] 85.3064ms 81.8532ms 12.2170 Ops/s 12.4667 Ops/s $\color{#d91a1a}-2.00\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2910ms 1.0889ms 918.3349 Ops/s 913.0630 Ops/s $\color{#35bf28}+0.58\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.1597ms 21.6885ms 46.1075 Ops/s 48.3425 Ops/s $\color{#d91a1a}-4.62\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0225ms 0.7847ms 1.2744 KOps/s 1.3111 KOps/s $\color{#d91a1a}-2.80\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7498ms 0.6937ms 1.4416 KOps/s 1.4698 KOps/s $\color{#d91a1a}-1.92\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5488ms 1.4876ms 672.2283 Ops/s 668.7622 Ops/s $\color{#35bf28}+0.52\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7395ms 0.6940ms 1.4410 KOps/s 1.4269 KOps/s $\color{#35bf28}+0.99\%$
test_dqn_speed[False-None] 1.7216ms 1.5811ms 632.4544 Ops/s 627.0818 Ops/s $\color{#35bf28}+0.86\%$
test_dqn_speed[False-backward] 2.3523ms 2.2110ms 452.2921 Ops/s 445.9063 Ops/s $\color{#35bf28}+1.43\%$
test_dqn_speed[True-None] 0.7308ms 0.6312ms 1.5844 KOps/s 1.6027 KOps/s $\color{#d91a1a}-1.14\%$
test_dqn_speed[True-backward] 1.3249ms 1.2893ms 775.6089 Ops/s 755.7293 Ops/s $\color{#35bf28}+2.63\%$
test_dqn_speed[reduce-overhead-None] 0.6849ms 0.6160ms 1.6233 KOps/s 1.5749 KOps/s $\color{#35bf28}+3.07\%$
test_ddpg_speed[False-None] 3.4167ms 3.0063ms 332.6335 Ops/s 335.4124 Ops/s $\color{#d91a1a}-0.83\%$
test_ddpg_speed[False-backward] 4.7115ms 4.3662ms 229.0317 Ops/s 229.4013 Ops/s $\color{#d91a1a}-0.16\%$
test_ddpg_speed[True-None] 1.5530ms 1.4271ms 700.7267 Ops/s 700.2279 Ops/s $\color{#35bf28}+0.07\%$
test_ddpg_speed[True-backward] 2.7102ms 2.6278ms 380.5509 Ops/s 371.0124 Ops/s $\color{#35bf28}+2.57\%$
test_ddpg_speed[reduce-overhead-None] 1.5064ms 1.4105ms 708.9543 Ops/s 694.4026 Ops/s $\color{#35bf28}+2.10\%$
test_sac_speed[False-None] 8.8603ms 8.4322ms 118.5930 Ops/s 115.0923 Ops/s $\color{#35bf28}+3.04\%$
test_sac_speed[False-backward] 12.0013ms 11.5655ms 86.4641 Ops/s 85.0984 Ops/s $\color{#35bf28}+1.60\%$
test_sac_speed[True-None] 2.2168ms 1.9900ms 502.5189 Ops/s 502.4154 Ops/s $\color{#35bf28}+0.02\%$
test_sac_speed[True-backward] 4.1325ms 3.8311ms 261.0228 Ops/s 270.5279 Ops/s $\color{#d91a1a}-3.51\%$
test_sac_speed[reduce-overhead-None] 17.6092ms 10.2037ms 98.0038 Ops/s 99.2129 Ops/s $\color{#d91a1a}-1.22\%$
test_redq_deprec_speed[False-None] 10.3849ms 9.4869ms 105.4087 Ops/s 105.3848 Ops/s $\color{#35bf28}+0.02\%$
test_redq_deprec_speed[False-backward] 13.3587ms 12.7477ms 78.4454 Ops/s 79.5332 Ops/s $\color{#d91a1a}-1.37\%$
test_redq_deprec_speed[True-None] 2.8392ms 2.7593ms 362.4155 Ops/s 354.9069 Ops/s $\color{#35bf28}+2.12\%$
test_redq_deprec_speed[True-backward] 4.9403ms 4.5216ms 221.1623 Ops/s 219.0141 Ops/s $\color{#35bf28}+0.98\%$
test_redq_deprec_speed[reduce-overhead-None] 14.6794ms 9.6294ms 103.8491 Ops/s 103.5529 Ops/s $\color{#35bf28}+0.29\%$
test_td3_speed[False-None] 8.4318ms 8.2924ms 120.5923 Ops/s 120.1253 Ops/s $\color{#35bf28}+0.39\%$
test_td3_speed[False-backward] 11.3225ms 10.8821ms 91.8942 Ops/s 91.7657 Ops/s $\color{#35bf28}+0.14\%$
test_td3_speed[True-None] 1.8073ms 1.7585ms 568.6651 Ops/s 563.8865 Ops/s $\color{#35bf28}+0.85\%$
test_td3_speed[True-backward] 3.4575ms 3.3369ms 299.6764 Ops/s 299.3717 Ops/s $\color{#35bf28}+0.10\%$
test_td3_speed[reduce-overhead-None] 50.6727ms 25.9171ms 38.5846 Ops/s 38.5811 Ops/s $+0.01\%$
test_cql_speed[False-None] 18.1110ms 17.6186ms 56.7583 Ops/s 55.7095 Ops/s $\color{#35bf28}+1.88\%$
test_cql_speed[False-backward] 23.7099ms 23.0599ms 43.3654 Ops/s 43.4817 Ops/s $\color{#d91a1a}-0.27\%$
test_cql_speed[True-None] 3.7139ms 3.5352ms 282.8726 Ops/s 282.7709 Ops/s $\color{#35bf28}+0.04\%$
test_cql_speed[True-backward] 6.7260ms 5.8815ms 170.0234 Ops/s 168.6167 Ops/s $\color{#35bf28}+0.83\%$
test_cql_speed[reduce-overhead-None] 18.1670ms 12.1363ms 82.3974 Ops/s 83.7679 Ops/s $\color{#d91a1a}-1.64\%$
test_a2c_speed[False-None] 3.8087ms 3.3400ms 299.4056 Ops/s 293.1510 Ops/s $\color{#35bf28}+2.13\%$
test_a2c_speed[False-backward] 6.7647ms 6.3399ms 157.7314 Ops/s 160.1211 Ops/s $\color{#d91a1a}-1.49\%$
test_a2c_speed[True-None] 2.0078ms 1.5138ms 660.5701 Ops/s 668.8793 Ops/s $\color{#d91a1a}-1.24\%$
test_a2c_speed[True-backward] 3.4306ms 3.3855ms 295.3813 Ops/s 308.6139 Ops/s $\color{#d91a1a}-4.29\%$
test_a2c_speed[reduce-overhead-None] 1.2933ms 1.1141ms 897.5668 Ops/s 875.0856 Ops/s $\color{#35bf28}+2.57\%$
test_ppo_speed[False-None] 4.0912ms 3.9465ms 253.3912 Ops/s 241.5229 Ops/s $\color{#35bf28}+4.91\%$
test_ppo_speed[False-backward] 7.6616ms 7.2240ms 138.4281 Ops/s 138.3077 Ops/s $\color{#35bf28}+0.09\%$
test_ppo_speed[True-None] 1.7590ms 1.6310ms 613.1282 Ops/s 604.9864 Ops/s $\color{#35bf28}+1.35\%$
test_ppo_speed[True-backward] 3.5744ms 3.3688ms 296.8373 Ops/s 289.6463 Ops/s $\color{#35bf28}+2.48\%$
test_ppo_speed[reduce-overhead-None] 1.2896ms 1.1713ms 853.7467 Ops/s 833.8859 Ops/s $\color{#35bf28}+2.38\%$
test_reinforce_speed[False-None] 2.5689ms 2.4087ms 415.1672 Ops/s 416.0512 Ops/s $\color{#d91a1a}-0.21\%$
test_reinforce_speed[False-backward] 3.6077ms 3.5328ms 283.0615 Ops/s 282.5706 Ops/s $\color{#35bf28}+0.17\%$
test_reinforce_speed[True-None] 1.5918ms 1.4879ms 672.1043 Ops/s 677.9767 Ops/s $\color{#d91a1a}-0.87\%$
test_reinforce_speed[True-backward] 3.4234ms 3.3344ms 299.9057 Ops/s 291.7759 Ops/s $\color{#35bf28}+2.79\%$
test_reinforce_speed[reduce-overhead-None] 0.6619s 10.4370ms 95.8134 Ops/s 113.3391 Ops/s $\textbf{\color{#d91a1a}-15.46\%}$
test_iql_speed[False-None] 9.9819ms 9.6698ms 103.4150 Ops/s 102.9754 Ops/s $\color{#35bf28}+0.43\%$
test_iql_speed[False-backward] 13.8253ms 13.4017ms 74.6174 Ops/s 74.8822 Ops/s $\color{#d91a1a}-0.35\%$
test_iql_speed[True-None] 2.5728ms 2.3719ms 421.6061 Ops/s 414.4795 Ops/s $\color{#35bf28}+1.72\%$
test_iql_speed[True-backward] 5.2103ms 5.1056ms 195.8640 Ops/s 194.8623 Ops/s $\color{#35bf28}+0.51\%$
test_iql_speed[reduce-overhead-None] 16.5940ms 10.0578ms 99.4249 Ops/s 98.1299 Ops/s $\color{#35bf28}+1.32\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.4222ms 5.9973ms 166.7404 Ops/s 164.3130 Ops/s $\color{#35bf28}+1.48\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.7401ms 0.4396ms 2.2750 KOps/s 2.7745 KOps/s $\textbf{\color{#d91a1a}-18.01\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6023ms 0.3549ms 2.8173 KOps/s 2.9116 KOps/s $\color{#d91a1a}-3.24\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1796ms 5.8439ms 171.1200 Ops/s 171.6744 Ops/s $\color{#d91a1a}-0.32\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9524ms 0.3865ms 2.5871 KOps/s 3.4803 KOps/s $\textbf{\color{#d91a1a}-25.67\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6552ms 0.3696ms 2.7053 KOps/s 3.7064 KOps/s $\textbf{\color{#d91a1a}-27.01\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6859ms 1.4615ms 684.2130 Ops/s 780.3965 Ops/s $\textbf{\color{#d91a1a}-12.32\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6572ms 1.3719ms 728.9172 Ops/s 761.9611 Ops/s $\color{#d91a1a}-4.34\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.4285ms 6.1062ms 163.7683 Ops/s 166.9528 Ops/s $\color{#d91a1a}-1.91\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.5403ms 0.4930ms 2.0285 KOps/s 2.0306 KOps/s $\color{#d91a1a}-0.10\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9374ms 0.4724ms 2.1170 KOps/s 2.0601 KOps/s $\color{#35bf28}+2.76\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9265ms 5.7941ms 172.5904 Ops/s 170.2235 Ops/s $\color{#35bf28}+1.39\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9246ms 0.3050ms 3.2782 KOps/s 2.6838 KOps/s $\textbf{\color{#35bf28}+22.15\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4773ms 0.2699ms 3.7057 KOps/s 2.8763 KOps/s $\textbf{\color{#35bf28}+28.84\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1224ms 5.7824ms 172.9380 Ops/s 173.5887 Ops/s $\color{#d91a1a}-0.37\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9352ms 0.3431ms 2.9146 KOps/s 3.1360 KOps/s $\textbf{\color{#d91a1a}-7.06\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4797ms 0.2755ms 3.6292 KOps/s 3.7168 KOps/s $\color{#d91a1a}-2.36\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2699ms 5.9031ms 169.4022 Ops/s 167.4434 Ops/s $\color{#35bf28}+1.17\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.6978ms 0.4378ms 2.2841 KOps/s 2.2101 KOps/s $\color{#35bf28}+3.34\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6279ms 0.4173ms 2.3964 KOps/s 1.9386 KOps/s $\textbf{\color{#35bf28}+23.61\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.9582s 24.1307ms 41.4410 Ops/s 34.5132 Ops/s $\textbf{\color{#35bf28}+20.07\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 12.2142ms 2.1503ms 465.0431 Ops/s 458.6593 Ops/s $\color{#35bf28}+1.39\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.0870ms 1.1894ms 840.7849 Ops/s 757.7829 Ops/s $\textbf{\color{#35bf28}+10.95\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.8121ms 5.0499ms 198.0218 Ops/s 193.3336 Ops/s $\color{#35bf28}+2.42\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9403ms 1.8494ms 540.7093 Ops/s 472.5843 Ops/s $\textbf{\color{#35bf28}+14.42\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 10.9163ms 1.3684ms 730.7849 Ops/s 820.2687 Ops/s $\textbf{\color{#d91a1a}-10.91\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.6667s 18.5481ms 53.9139 Ops/s 184.9026 Ops/s $\textbf{\color{#d91a1a}-70.84\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.1541ms 2.0005ms 499.8767 Ops/s 491.3098 Ops/s $\color{#35bf28}+1.74\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.9358ms 1.1402ms 877.0431 Ops/s 896.1474 Ops/s $\color{#d91a1a}-2.13\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 42.7514ms 39.4799ms 25.3293 Ops/s 24.8733 Ops/s $\color{#35bf28}+1.83\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.9014ms 18.4867ms 54.0929 Ops/s 54.8829 Ops/s $\color{#d91a1a}-1.44\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 44.9993ms 40.8865ms 24.4580 Ops/s 23.8399 Ops/s $\color{#35bf28}+2.59\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.6858ms 19.0014ms 52.6276 Ops/s 52.6135 Ops/s $\color{#35bf28}+0.03\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 43.9850ms 42.7200ms 23.4083 Ops/s 22.8639 Ops/s $\color{#35bf28}+2.38\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 22.3638ms 20.6906ms 48.3311 Ops/s 48.9675 Ops/s $\color{#d91a1a}-1.30\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8971ms 0.2314ms 4.3224 KOps/s 4.4622 KOps/s $\color{#d91a1a}-3.13\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7059ms 1.4578ms 685.9867 Ops/s 699.2575 Ops/s $\color{#d91a1a}-1.90\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7791ms 2.3794ms 420.2680 Ops/s 422.5745 Ops/s $\color{#d91a1a}-0.55\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1411ms 2.9636ms 337.4221 Ops/s 334.2297 Ops/s $\color{#35bf28}+0.96\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2922ms 0.1695ms 5.9005 KOps/s 6.1019 KOps/s $\color{#d91a1a}-3.30\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3912ms 0.2250ms 4.4441 KOps/s 4.4714 KOps/s $\color{#d91a1a}-0.61\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.0050ms 1.8448ms 542.0598 Ops/s 531.3856 Ops/s $\color{#35bf28}+2.01\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5740ms 1.4040ms 712.2417 Ops/s 670.4803 Ops/s $\textbf{\color{#35bf28}+6.23\%}$
test_collector_stack_then_write[50-img_shape0-small] 1.4935ms 1.1633ms 859.6033 Ops/s 858.6454 Ops/s $\color{#35bf28}+0.11\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.6699ms 3.7147ms 269.2040 Ops/s 268.8646 Ops/s $\color{#35bf28}+0.13\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.4443ms 6.0856ms 164.3215 Ops/s 167.7639 Ops/s $\color{#d91a1a}-2.05\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.4050ms 7.1026ms 140.7937 Ops/s 138.2992 Ops/s $\color{#35bf28}+1.80\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4355ms 0.2758ms 3.6257 KOps/s 3.5810 KOps/s $\color{#35bf28}+1.25\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7042ms 1.5497ms 645.2667 Ops/s 646.4502 Ops/s $\color{#d91a1a}-0.18\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.9601ms 2.4800ms 403.2191 Ops/s 398.0611 Ops/s $\color{#35bf28}+1.30\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3659ms 3.1992ms 312.5791 Ops/s 310.5382 Ops/s $\color{#35bf28}+0.66\%$
test_collector_without_rb[100-img_shape0-atari] 34.2482ms 33.4358ms 29.9081 Ops/s 29.6611 Ops/s $\color{#35bf28}+0.83\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.7173ms 66.0890ms 15.1311 Ops/s 15.1760 Ops/s $\color{#d91a1a}-0.30\%$
test_collector_with_rb[100-img_shape0-atari] 38.4097ms 37.7830ms 26.4669 Ops/s 26.3339 Ops/s $\color{#35bf28}+0.51\%$
test_collector_with_rb[200-img_shape1-large_batch] 97.5165ms 78.5453ms 12.7315 Ops/s 13.1493 Ops/s $\color{#d91a1a}-3.18\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 55.5917ms 55.3648ms 18.0620 Ops/s 17.6661 Ops/s $\color{#35bf28}+2.24\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1106s 0.1102s 9.0724 Ops/s 8.7956 Ops/s $\color{#35bf28}+3.15\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 57.4745ms 57.3102ms 17.4489 Ops/s 17.1558 Ops/s $\color{#35bf28}+1.71\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1165s 0.1145s 8.7367 Ops/s 8.5799 Ops/s $\color{#35bf28}+1.83\%$

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Collectors Documentation Improvements or additions to documentation Feature New feature Integrations/torch_geometric Integrations WeightUpdate
