[Feature] Add Hash transform #2648

Open: kurtamohler wants to merge 1 commit into gh/kurtamohler/1/base

kurtamohler (Collaborator) commented Dec 13, 2024

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]

pytorch-bot bot commented Dec 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2648

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 9 New Failures, 6 Unrelated Failures

As of commit 7a66edf with merge base e3c3047:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kurtamohler added a commit that referenced this pull request Dec 13, 2024
ghstack-source-id: 80f920674e13db2fcbed6e82a990d35cb14c6d11
Pull Request resolved: #2648
facebook-github-bot added the CLA Signed label Dec 13, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}25$. Worsened: $\large\color{#d91a1a}7$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_simple 0.4284s 0.4260s 2.3474 Ops/s 2.2748 Ops/s $\color{#35bf28}+3.19\%$
test_transformed 0.6026s 0.6017s 1.6618 Ops/s 1.6154 Ops/s $\color{#35bf28}+2.87\%$
test_serial 1.3218s 1.3099s 0.7634 Ops/s 0.7424 Ops/s $\color{#35bf28}+2.83\%$
test_parallel 1.3658s 1.2941s 0.7727 Ops/s 0.7580 Ops/s $\color{#35bf28}+1.94\%$
test_step_mdp_speed[True-True-True-True-True] 0.1170ms 29.8328μs 33.5201 KOps/s 32.1446 KOps/s $\color{#35bf28}+4.28\%$
test_step_mdp_speed[True-True-True-True-False] 52.3980μs 17.7660μs 56.2873 KOps/s 53.0341 KOps/s $\textbf{\color{#35bf28}+6.13\%}$
test_step_mdp_speed[True-True-True-False-True] 70.0710μs 16.8702μs 59.2761 KOps/s 56.2922 KOps/s $\textbf{\color{#35bf28}+5.30\%}$
test_step_mdp_speed[True-True-True-False-False] 36.6980μs 9.8854μs 101.1594 KOps/s 93.0029 KOps/s $\textbf{\color{#35bf28}+8.77\%}$
test_step_mdp_speed[True-True-False-True-True] 67.4360μs 31.1735μs 32.0785 KOps/s 30.0398 KOps/s $\textbf{\color{#35bf28}+6.79\%}$
test_step_mdp_speed[True-True-False-True-False] 50.1230μs 19.2451μs 51.9612 KOps/s 48.1339 KOps/s $\textbf{\color{#35bf28}+7.95\%}$
test_step_mdp_speed[True-True-False-False-True] 55.4130μs 18.6609μs 53.5880 KOps/s 51.3294 KOps/s $\color{#35bf28}+4.40\%$
test_step_mdp_speed[True-True-False-False-False] 38.0220μs 11.5978μs 86.2229 KOps/s 80.2694 KOps/s $\textbf{\color{#35bf28}+7.42\%}$
test_step_mdp_speed[True-False-True-True-True] 0.1152ms 33.1301μs 30.1840 KOps/s 28.5337 KOps/s $\textbf{\color{#35bf28}+5.78\%}$
test_step_mdp_speed[True-False-True-True-False] 59.4010μs 21.0010μs 47.6168 KOps/s 44.4027 KOps/s $\textbf{\color{#35bf28}+7.24\%}$
test_step_mdp_speed[True-False-True-False-True] 46.8470μs 18.3587μs 54.4700 KOps/s 51.2810 KOps/s $\textbf{\color{#35bf28}+6.22\%}$
test_step_mdp_speed[True-False-True-False-False] 51.3140μs 11.5836μs 86.3286 KOps/s 80.2340 KOps/s $\textbf{\color{#35bf28}+7.60\%}$
test_step_mdp_speed[True-False-False-True-True] 78.2470μs 34.3757μs 29.0903 KOps/s 27.2335 KOps/s $\textbf{\color{#35bf28}+6.82\%}$
test_step_mdp_speed[True-False-False-True-False] 69.6500μs 22.4947μs 44.4548 KOps/s 41.9092 KOps/s $\textbf{\color{#35bf28}+6.07\%}$
test_step_mdp_speed[True-False-False-False-True] 74.6590μs 20.3071μs 49.2439 KOps/s 47.9243 KOps/s $\color{#35bf28}+2.75\%$
test_step_mdp_speed[True-False-False-False-False] 38.9830μs 13.3378μs 74.9750 KOps/s 71.2036 KOps/s $\textbf{\color{#35bf28}+5.30\%}$
test_step_mdp_speed[False-True-True-True-True] 85.8000μs 32.5268μs 30.7439 KOps/s 28.5369 KOps/s $\textbf{\color{#35bf28}+7.73\%}$
test_step_mdp_speed[False-True-True-True-False] 43.6010μs 20.8721μs 47.9109 KOps/s 44.9539 KOps/s $\textbf{\color{#35bf28}+6.58\%}$
test_step_mdp_speed[False-True-True-False-True] 67.9670μs 20.8615μs 47.9351 KOps/s 45.3815 KOps/s $\textbf{\color{#35bf28}+5.63\%}$
test_step_mdp_speed[False-True-True-False-False] 41.1570μs 12.9393μs 77.2840 KOps/s 73.4784 KOps/s $\textbf{\color{#35bf28}+5.18\%}$
test_step_mdp_speed[False-True-False-True-True] 84.1580μs 35.1839μs 28.4221 KOps/s 27.9390 KOps/s $\color{#35bf28}+1.73\%$
test_step_mdp_speed[False-True-False-True-False] 55.1330μs 22.9334μs 43.6046 KOps/s 42.2320 KOps/s $\color{#35bf28}+3.25\%$
test_step_mdp_speed[False-True-False-False-True] 2.6338ms 23.0063μs 43.4663 KOps/s 43.0292 KOps/s $\color{#35bf28}+1.02\%$
test_step_mdp_speed[False-True-False-False-False] 51.3760μs 14.8156μs 67.4963 KOps/s 65.6547 KOps/s $\color{#35bf28}+2.80\%$
test_step_mdp_speed[False-False-True-True-True] 80.4810μs 37.2675μs 26.8330 KOps/s 26.4946 KOps/s $\color{#35bf28}+1.28\%$
test_step_mdp_speed[False-False-True-True-False] 71.3730μs 24.8002μs 40.3223 KOps/s 38.8602 KOps/s $\color{#35bf28}+3.76\%$
test_step_mdp_speed[False-False-True-False-True] 51.4760μs 22.8324μs 43.7973 KOps/s 42.7831 KOps/s $\color{#35bf28}+2.37\%$
test_step_mdp_speed[False-False-True-False-False] 64.4810μs 14.7666μs 67.7204 KOps/s 66.1926 KOps/s $\color{#35bf28}+2.31\%$
test_step_mdp_speed[False-False-False-True-True] 94.5960μs 38.5340μs 25.9511 KOps/s 25.4379 KOps/s $\color{#35bf28}+2.02\%$
test_step_mdp_speed[False-False-False-True-False] 77.1330μs 25.9212μs 38.5784 KOps/s 36.5303 KOps/s $\textbf{\color{#35bf28}+5.61\%}$
test_step_mdp_speed[False-False-False-False-True] 63.3080μs 24.2901μs 41.1690 KOps/s 40.0275 KOps/s $\color{#35bf28}+2.85\%$
test_step_mdp_speed[False-False-False-False-False] 68.9990μs 16.1604μs 61.8797 KOps/s 58.5344 KOps/s $\textbf{\color{#35bf28}+5.71\%}$
test_values[generalized_advantage_estimate-True-True] 11.6459ms 9.4203ms 106.1537 Ops/s 105.0891 Ops/s $\color{#35bf28}+1.01\%$
test_values[vec_generalized_advantage_estimate-True-True] 37.3293ms 35.2983ms 28.3300 Ops/s 28.3896 Ops/s $\color{#d91a1a}-0.21\%$
test_values[td0_return_estimate-False-False] 0.2386ms 0.1908ms 5.2418 KOps/s 5.4350 KOps/s $\color{#d91a1a}-3.55\%$
test_values[td1_return_estimate-False-False] 25.1536ms 24.0826ms 41.5238 Ops/s 42.1936 Ops/s $\color{#d91a1a}-1.59\%$
test_values[vec_td1_return_estimate-False-False] 37.7663ms 35.4169ms 28.2351 Ops/s 28.3246 Ops/s $\color{#d91a1a}-0.32\%$
test_values[td_lambda_return_estimate-True-False] 48.5995ms 34.2553ms 29.1925 Ops/s 29.3722 Ops/s $\color{#d91a1a}-0.61\%$
test_values[vec_td_lambda_return_estimate-True-False] 37.3004ms 35.4035ms 28.2458 Ops/s 28.4848 Ops/s $\color{#d91a1a}-0.84\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 10.1309ms 8.2906ms 120.6181 Ops/s 121.6798 Ops/s $\color{#d91a1a}-0.87\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.3065ms 1.9524ms 512.1772 Ops/s 528.4409 Ops/s $\color{#d91a1a}-3.08\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4385ms 0.3538ms 2.8261 KOps/s 2.7462 KOps/s $\color{#35bf28}+2.91\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 48.0003ms 46.2631ms 21.6155 Ops/s 23.1325 Ops/s $\textbf{\color{#d91a1a}-6.56\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.8587ms 3.0234ms 330.7573 Ops/s 322.8436 Ops/s $\color{#35bf28}+2.45\%$
test_dqn_speed[False-None] 1.8400ms 1.3604ms 735.0574 Ops/s 727.3876 Ops/s $\color{#35bf28}+1.05\%$
test_dqn_speed[False-backward] 1.9095ms 1.8379ms 544.1135 Ops/s 544.0986 Ops/s $+0.00\%$
test_dqn_speed[True-None] 0.7468ms 0.4603ms 2.1725 KOps/s 2.1277 KOps/s $\color{#35bf28}+2.11\%$
test_dqn_speed[True-backward] 0.9430ms 0.8641ms 1.1573 KOps/s 1.1138 KOps/s $\color{#35bf28}+3.91\%$
test_dqn_speed[reduce-overhead-None] 0.6711ms 0.4579ms 2.1838 KOps/s 2.1332 KOps/s $\color{#35bf28}+2.37\%$
test_dqn_speed[reduce-overhead-backward] 0.9226ms 0.8639ms 1.1576 KOps/s 1.1191 KOps/s $\color{#35bf28}+3.44\%$
test_ddpg_speed[False-None] 3.2180ms 2.8109ms 355.7613 Ops/s 347.3294 Ops/s $\color{#35bf28}+2.43\%$
test_ddpg_speed[False-backward] 4.1883ms 3.9444ms 253.5242 Ops/s 228.8986 Ops/s $\textbf{\color{#35bf28}+10.76\%}$
test_ddpg_speed[True-None] 1.3007ms 0.9850ms 1.0153 KOps/s 993.8859 Ops/s $\color{#35bf28}+2.15\%$
test_ddpg_speed[True-backward] 1.9080ms 1.8394ms 543.6645 Ops/s 527.5385 Ops/s $\color{#35bf28}+3.06\%$
test_ddpg_speed[reduce-overhead-None] 1.3189ms 0.9904ms 1.0097 KOps/s 988.9810 Ops/s $\color{#35bf28}+2.09\%$
test_ddpg_speed[reduce-overhead-backward] 1.9144ms 1.8588ms 537.9671 Ops/s 526.4680 Ops/s $\color{#35bf28}+2.18\%$
test_sac_speed[False-None] 0.1584s 9.1057ms 109.8214 Ops/s 126.4614 Ops/s $\textbf{\color{#d91a1a}-13.16\%}$
test_sac_speed[False-backward] 12.1175ms 10.6215ms 94.1487 Ops/s 93.0930 Ops/s $\color{#35bf28}+1.13\%$
test_sac_speed[True-None] 2.2232ms 1.8149ms 551.0002 Ops/s 540.7900 Ops/s $\color{#35bf28}+1.89\%$
test_sac_speed[True-backward] 3.9965ms 3.5894ms 278.5983 Ops/s 280.9162 Ops/s $\color{#d91a1a}-0.83\%$
test_sac_speed[reduce-overhead-None] 2.3084ms 1.8184ms 549.9376 Ops/s 542.4036 Ops/s $\color{#35bf28}+1.39\%$
test_sac_speed[reduce-overhead-backward] 3.5601ms 3.4943ms 286.1787 Ops/s 284.4178 Ops/s $\color{#35bf28}+0.62\%$
test_redq_speed[False-None] 13.9207ms 12.9537ms 77.1978 Ops/s 78.9014 Ops/s $\color{#d91a1a}-2.16\%$
test_redq_speed[False-backward] 22.9738ms 21.9694ms 45.5179 Ops/s 45.2087 Ops/s $\color{#35bf28}+0.68\%$
test_redq_speed[True-None] 6.0995ms 5.0493ms 198.0482 Ops/s 220.4251 Ops/s $\textbf{\color{#d91a1a}-10.15\%}$
test_redq_speed[True-backward] 12.4289ms 11.8829ms 84.1546 Ops/s 84.1021 Ops/s $\color{#35bf28}+0.06\%$
test_redq_speed[reduce-overhead-None] 6.1007ms 4.8291ms 207.0781 Ops/s 217.2478 Ops/s $\color{#d91a1a}-4.68\%$
test_redq_speed[reduce-overhead-backward] 13.3657ms 12.2250ms 81.7995 Ops/s 84.2369 Ops/s $\color{#d91a1a}-2.89\%$
test_redq_deprec_speed[False-None] 14.4122ms 12.8429ms 77.8639 Ops/s 77.8752 Ops/s $\color{#d91a1a}-0.01\%$
test_redq_deprec_speed[False-backward] 19.8996ms 18.4329ms 54.2508 Ops/s 53.5471 Ops/s $\color{#35bf28}+1.31\%$
test_redq_deprec_speed[True-None] 4.6005ms 3.5032ms 285.4533 Ops/s 280.8011 Ops/s $\color{#35bf28}+1.66\%$
test_redq_deprec_speed[True-backward] 9.0453ms 8.4102ms 118.9034 Ops/s 127.5949 Ops/s $\textbf{\color{#d91a1a}-6.81\%}$
test_redq_deprec_speed[reduce-overhead-None] 4.2876ms 3.5604ms 280.8701 Ops/s 280.0445 Ops/s $\color{#35bf28}+0.29\%$
test_redq_deprec_speed[reduce-overhead-backward] 8.2166ms 7.9003ms 126.5772 Ops/s 124.5959 Ops/s $\color{#35bf28}+1.59\%$
test_td3_speed[False-None] 8.3712ms 7.7954ms 128.2801 Ops/s 123.7002 Ops/s $\color{#35bf28}+3.70\%$
test_td3_speed[False-backward] 12.0098ms 10.2342ms 97.7115 Ops/s 96.4880 Ops/s $\color{#35bf28}+1.27\%$
test_td3_speed[True-None] 1.8364ms 1.6831ms 594.1290 Ops/s 581.4219 Ops/s $\color{#35bf28}+2.19\%$
test_td3_speed[True-backward] 3.4496ms 3.3366ms 299.7043 Ops/s 284.4551 Ops/s $\textbf{\color{#35bf28}+5.36\%}$
test_td3_speed[reduce-overhead-None] 1.8814ms 1.6783ms 595.8375 Ops/s 581.6416 Ops/s $\color{#35bf28}+2.44\%$
test_td3_speed[reduce-overhead-backward] 3.6970ms 3.4154ms 292.7927 Ops/s 300.3870 Ops/s $\color{#d91a1a}-2.53\%$
test_cql_speed[False-None] 37.7895ms 35.6789ms 28.0278 Ops/s 27.7927 Ops/s $\color{#35bf28}+0.85\%$
test_cql_speed[False-backward] 0.2822s 51.2021ms 19.5305 Ops/s 21.5484 Ops/s $\textbf{\color{#d91a1a}-9.36\%}$
test_cql_speed[True-None] 16.5677ms 15.3056ms 65.3356 Ops/s 64.0210 Ops/s $\color{#35bf28}+2.05\%$
test_cql_speed[True-backward] 23.2556ms 21.7375ms 46.0034 Ops/s 44.6278 Ops/s $\color{#35bf28}+3.08\%$
test_cql_speed[reduce-overhead-None] 15.8580ms 15.2349ms 65.6389 Ops/s 63.7731 Ops/s $\color{#35bf28}+2.93\%$
test_cql_speed[reduce-overhead-backward] 22.9861ms 21.9688ms 45.5191 Ops/s 44.6137 Ops/s $\color{#35bf28}+2.03\%$
test_a2c_speed[False-None] 7.7173ms 7.0801ms 141.2411 Ops/s 136.4422 Ops/s $\color{#35bf28}+3.52\%$
test_a2c_speed[False-backward] 14.5194ms 13.9761ms 71.5509 Ops/s 69.1928 Ops/s $\color{#35bf28}+3.41\%$
test_a2c_speed[True-None] 4.8642ms 4.1666ms 240.0049 Ops/s 234.3110 Ops/s $\color{#35bf28}+2.43\%$
test_a2c_speed[True-backward] 10.9922ms 10.6308ms 94.0663 Ops/s 93.1067 Ops/s $\color{#35bf28}+1.03\%$
test_a2c_speed[reduce-overhead-None] 4.7098ms 4.1139ms 243.0796 Ops/s 237.2537 Ops/s $\color{#35bf28}+2.46\%$
test_a2c_speed[reduce-overhead-backward] 10.8099ms 10.5481ms 94.8036 Ops/s 94.0300 Ops/s $\color{#35bf28}+0.82\%$
test_ppo_speed[False-None] 7.9129ms 7.3722ms 135.6448 Ops/s 134.6680 Ops/s $\color{#35bf28}+0.73\%$
test_ppo_speed[False-backward] 15.6432ms 14.8689ms 67.2546 Ops/s 68.0913 Ops/s $\color{#d91a1a}-1.23\%$
test_ppo_speed[True-None] 4.3872ms 3.6863ms 271.2721 Ops/s 270.3066 Ops/s $\color{#35bf28}+0.36\%$
test_ppo_speed[True-backward] 10.1740ms 9.4681ms 105.6180 Ops/s 101.8465 Ops/s $\color{#35bf28}+3.70\%$
test_ppo_speed[reduce-overhead-None] 4.1663ms 3.6198ms 276.2589 Ops/s 267.3947 Ops/s $\color{#35bf28}+3.32\%$
test_ppo_speed[reduce-overhead-backward] 10.1748ms 9.4140ms 106.2249 Ops/s 104.1800 Ops/s $\color{#35bf28}+1.96\%$
test_reinforce_speed[False-None] 7.7858ms 6.4003ms 156.2434 Ops/s 152.9618 Ops/s $\color{#35bf28}+2.15\%$
test_reinforce_speed[False-backward] 9.9892ms 9.5757ms 104.4307 Ops/s 101.0212 Ops/s $\color{#35bf28}+3.38\%$
test_reinforce_speed[True-None] 3.4532ms 2.7097ms 369.0459 Ops/s 367.0321 Ops/s $\color{#35bf28}+0.55\%$
test_reinforce_speed[True-backward] 9.1066ms 8.4951ms 117.7155 Ops/s 115.9353 Ops/s $\color{#35bf28}+1.54\%$
test_reinforce_speed[reduce-overhead-None] 3.1580ms 2.6439ms 378.2227 Ops/s 373.5645 Ops/s $\color{#35bf28}+1.25\%$
test_reinforce_speed[reduce-overhead-backward] 9.6828ms 8.6686ms 115.3583 Ops/s 115.3940 Ops/s $\color{#d91a1a}-0.03\%$
test_iql_speed[False-None] 33.4961ms 32.2070ms 31.0491 Ops/s 31.1124 Ops/s $\color{#d91a1a}-0.20\%$
test_iql_speed[False-backward] 46.7664ms 44.2349ms 22.6066 Ops/s 22.2489 Ops/s $\color{#35bf28}+1.61\%$
test_iql_speed[True-None] 12.6197ms 10.6654ms 93.7614 Ops/s 94.2066 Ops/s $\color{#d91a1a}-0.47\%$
test_iql_speed[True-backward] 22.4105ms 21.4806ms 46.5536 Ops/s 46.3132 Ops/s $\color{#35bf28}+0.52\%$
test_iql_speed[reduce-overhead-None] 11.5388ms 10.7400ms 93.1099 Ops/s 93.0013 Ops/s $\color{#35bf28}+0.12\%$
test_iql_speed[reduce-overhead-backward] 23.3362ms 21.8195ms 45.8305 Ops/s 45.8192 Ops/s $\color{#35bf28}+0.02\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.8055ms 5.0738ms 197.0901 Ops/s 196.5750 Ops/s $\color{#35bf28}+0.26\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.4047ms 0.5343ms 1.8717 KOps/s 1.9425 KOps/s $\color{#d91a1a}-3.64\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7992ms 0.5009ms 1.9963 KOps/s 2.0353 KOps/s $\color{#d91a1a}-1.92\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.8585ms 4.8580ms 205.8465 Ops/s 205.6818 Ops/s $\color{#35bf28}+0.08\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3.3339ms 0.4970ms 2.0122 KOps/s 1.9961 KOps/s $\color{#35bf28}+0.81\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8447ms 0.4646ms 2.1525 KOps/s 2.1174 KOps/s $\color{#35bf28}+1.66\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.9006ms 1.6084ms 621.7463 Ops/s 609.5575 Ops/s $\color{#35bf28}+2.00\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.3317ms 1.5643ms 639.2792 Ops/s 631.3406 Ops/s $\color{#35bf28}+1.26\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 7.4248ms 4.7553ms 210.2895 Ops/s 201.2143 Ops/s $\color{#35bf28}+4.51\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.9324ms 0.6478ms 1.5438 KOps/s 1.5431 KOps/s $\color{#35bf28}+0.04\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.0322ms 0.6086ms 1.6431 KOps/s 1.6144 KOps/s $\color{#35bf28}+1.78\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 4.9433ms 4.6513ms 214.9932 Ops/s 210.6810 Ops/s $\color{#35bf28}+2.05\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.7130ms 0.5109ms 1.9573 KOps/s 1.9565 KOps/s $\color{#35bf28}+0.04\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7201ms 0.4711ms 2.1227 KOps/s 2.0780 KOps/s $\color{#35bf28}+2.15\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.1422ms 4.7297ms 211.4311 Ops/s 209.0342 Ops/s $\color{#35bf28}+1.15\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7411ms 0.5018ms 1.9929 KOps/s 1.9672 KOps/s $\color{#35bf28}+1.31\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7104ms 0.4712ms 2.1224 KOps/s 2.0994 KOps/s $\color{#35bf28}+1.10\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.8144ms 4.7575ms 210.1946 Ops/s 203.7984 Ops/s $\color{#35bf28}+3.14\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.9591ms 0.6413ms 1.5592 KOps/s 1.5593 KOps/s $-0.00\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.1029ms 0.6172ms 1.6202 KOps/s 1.6078 KOps/s $\color{#35bf28}+0.77\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 5.5225ms 4.0754ms 245.3742 Ops/s 241.6437 Ops/s $\color{#35bf28}+1.54\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 7.8716ms 2.2208ms 450.2830 Ops/s 433.6117 Ops/s $\color{#35bf28}+3.84\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 5.8025ms 1.2884ms 776.1586 Ops/s 719.7547 Ops/s $\textbf{\color{#35bf28}+7.84\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.3666s 11.4443ms 87.3797 Ops/s 241.6920 Ops/s $\textbf{\color{#d91a1a}-63.85\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.2336ms 2.2495ms 444.5394 Ops/s 388.0885 Ops/s $\textbf{\color{#35bf28}+14.55\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 4.3323ms 1.2635ms 791.4688 Ops/s 751.4334 Ops/s $\textbf{\color{#35bf28}+5.33\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 5.6580ms 4.2619ms 234.6392 Ops/s 241.4182 Ops/s $\color{#d91a1a}-2.81\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.0630ms 2.3607ms 423.6014 Ops/s 415.3647 Ops/s $\color{#35bf28}+1.98\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.6726ms 1.4164ms 706.0375 Ops/s 694.0256 Ops/s $\color{#35bf28}+1.73\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.4028ms 10.8934ms 91.7990 Ops/s 84.7870 Ops/s $\textbf{\color{#35bf28}+8.27\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 21.5577ms 15.4142ms 64.8753 Ops/s 65.3089 Ops/s $\color{#d91a1a}-0.66\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 0.3706s 26.5938ms 37.6027 Ops/s 49.6662 Ops/s $\textbf{\color{#d91a1a}-24.29\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 16.8120ms 15.1880ms 65.8417 Ops/s 65.2866 Ops/s $\color{#35bf28}+0.85\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 21.0005ms 19.7268ms 50.6926 Ops/s 49.7658 Ops/s $\color{#35bf28}+1.86\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 18.6505ms 16.1639ms 61.8663 Ops/s 61.2049 Ops/s $\color{#35bf28}+1.08\%$
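
The Change column appears to be the relative difference between Ops and Ops on Repo HEAD. The sketch below reproduces a few CPU rows under that reading. The function names are hypothetical, and the 5% threshold is an assumption inferred from the table itself: the 25 bolded gains and 7 bolded losses match the Improved and Worsened counts in the summary, and every bolded entry has a magnitude of at least 5%. It is not taken from the benchmark tooling.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput (Ops) relative to the repo HEAD run."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row. The 5% threshold is an assumption inferred from the bolded rows."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table, all in Ops/s.
rows = {
    "test_simple": (2.3474, 2.2748),
    "test_ddpg_speed[False-backward]": (253.5242, 228.8986),
    "test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]": (
        87.3797,
        241.6920,
    ),
}

for name, (ops, ops_head) in rows.items():
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Some rows mix units between the two columns (for example KOps/s against Ops/s), so both values need to be converted to the same unit before applying the formula.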

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}15$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_simple 0.7746s 0.7577s 1.3198 Ops/s 1.3373 Ops/s $\color{#d91a1a}-1.30\%$
test_transformed 1.0333s 1.0193s 0.9811 Ops/s 0.9909 Ops/s $\color{#d91a1a}-0.99\%$
test_serial 2.2329s 2.1898s 0.4567 Ops/s 0.4612 Ops/s $\color{#d91a1a}-0.98\%$
test_parallel 2.1542s 2.0116s 0.4971 Ops/s 0.4962 Ops/s $\color{#35bf28}+0.19\%$
test_step_mdp_speed[True-True-True-True-True] 0.2125ms 39.6060μs 25.2487 KOps/s 25.0556 KOps/s $\color{#35bf28}+0.77\%$
test_step_mdp_speed[True-True-True-True-False] 0.2071ms 22.7530μs 43.9503 KOps/s 44.3636 KOps/s $\color{#d91a1a}-0.93\%$
test_step_mdp_speed[True-True-True-False-True] 48.6220μs 21.5799μs 46.3393 KOps/s 45.4056 KOps/s $\color{#35bf28}+2.06\%$
test_step_mdp_speed[True-True-True-False-False] 53.6530μs 12.5252μs 79.8391 KOps/s 77.3972 KOps/s $\color{#35bf28}+3.16\%$
test_step_mdp_speed[True-True-False-True-True] 0.1697ms 41.8331μs 23.9045 KOps/s 23.4412 KOps/s $\color{#35bf28}+1.98\%$
test_step_mdp_speed[True-True-False-True-False] 66.9740μs 24.3010μs 41.1505 KOps/s 40.3830 KOps/s $\color{#35bf28}+1.90\%$
test_step_mdp_speed[True-True-False-False-True] 57.0640μs 24.0294μs 41.6156 KOps/s 42.3484 KOps/s $\color{#d91a1a}-1.73\%$
test_step_mdp_speed[True-True-False-False-False] 0.1267ms 14.8406μs 67.3826 KOps/s 66.8765 KOps/s $\color{#35bf28}+0.76\%$
test_step_mdp_speed[True-False-True-True-True] 84.0350μs 44.1237μs 22.6636 KOps/s 22.5851 KOps/s $\color{#35bf28}+0.35\%$
test_step_mdp_speed[True-False-True-True-False] 0.1139ms 26.8629μs 37.2260 KOps/s 36.5812 KOps/s $\color{#35bf28}+1.76\%$
test_step_mdp_speed[True-False-True-False-True] 57.1930μs 23.6980μs 42.1977 KOps/s 41.7676 KOps/s $\color{#35bf28}+1.03\%$
test_step_mdp_speed[True-False-True-False-False] 82.5140μs 14.8218μs 67.4683 KOps/s 67.4919 KOps/s $\color{#d91a1a}-0.03\%$
test_step_mdp_speed[True-False-False-True-True] 0.1024ms 45.5645μs 21.9469 KOps/s 21.8510 KOps/s $\color{#35bf28}+0.44\%$
test_step_mdp_speed[True-False-False-True-False] 58.5430μs 28.4142μs 35.1937 KOps/s 34.3477 KOps/s $\color{#35bf28}+2.46\%$
test_step_mdp_speed[True-False-False-False-True] 77.3540μs 25.6526μs 38.9824 KOps/s 39.3004 KOps/s $\color{#d91a1a}-0.81\%$
test_step_mdp_speed[True-False-False-False-False] 43.1520μs 16.8050μs 59.5060 KOps/s 58.5393 KOps/s $\color{#35bf28}+1.65\%$
test_step_mdp_speed[False-True-True-True-True] 0.1820ms 44.0204μs 22.7168 KOps/s 22.4590 KOps/s $\color{#35bf28}+1.15\%$
test_step_mdp_speed[False-True-True-True-False] 47.8520μs 26.5421μs 37.6760 KOps/s 36.6752 KOps/s $\color{#35bf28}+2.73\%$
test_step_mdp_speed[False-True-True-False-True] 54.9230μs 27.7846μs 35.9912 KOps/s 35.7935 KOps/s $\color{#35bf28}+0.55\%$
test_step_mdp_speed[False-True-True-False-False] 48.1830μs 16.3382μs 61.2061 KOps/s 60.8109 KOps/s $\color{#35bf28}+0.65\%$
test_step_mdp_speed[False-True-False-True-True] 0.1150ms 45.8779μs 21.7970 KOps/s 21.5808 KOps/s $\color{#35bf28}+1.00\%$
test_step_mdp_speed[False-True-False-True-False] 58.9540μs 28.6362μs 34.9208 KOps/s 33.9480 KOps/s $\color{#35bf28}+2.87\%$
test_step_mdp_speed[False-True-False-False-True] 3.1840ms 30.5303μs 32.7543 KOps/s 33.7608 KOps/s $\color{#d91a1a}-2.98\%$
test_step_mdp_speed[False-True-False-False-False] 74.0240μs 18.8658μs 53.0058 KOps/s 54.1816 KOps/s $\color{#d91a1a}-2.17\%$
test_step_mdp_speed[False-False-True-True-True] 76.5340μs 48.1860μs 20.7529 KOps/s 20.6336 KOps/s $\color{#35bf28}+0.58\%$
test_step_mdp_speed[False-False-True-True-False] 88.1240μs 31.3497μs 31.8982 KOps/s 31.8953 KOps/s $+0.01\%$
test_step_mdp_speed[False-False-True-False-True] 0.1492ms 29.3646μs 34.0546 KOps/s 33.0438 KOps/s $\color{#35bf28}+3.06\%$
test_step_mdp_speed[False-False-True-False-False] 44.1930μs 18.9231μs 52.8455 KOps/s 54.1011 KOps/s $\color{#d91a1a}-2.32\%$
test_step_mdp_speed[False-False-False-True-True] 87.0150μs 49.5156μs 20.1956 KOps/s 19.8607 KOps/s $\color{#35bf28}+1.69\%$
test_step_mdp_speed[False-False-False-True-False] 57.5330μs 33.1902μs 30.1294 KOps/s 29.9130 KOps/s $\color{#35bf28}+0.72\%$
test_step_mdp_speed[False-False-False-False-True] 0.1434ms 30.7524μs 32.5178 KOps/s 31.6154 KOps/s $\color{#35bf28}+2.85\%$
test_step_mdp_speed[False-False-False-False-False] 46.5020μs 20.2564μs 49.3671 KOps/s 49.0217 KOps/s $\color{#35bf28}+0.70\%$
test_values[generalized_advantage_estimate-True-True] 28.3440ms 27.0286ms 36.9978 Ops/s 38.7480 Ops/s $\color{#d91a1a}-4.52\%$
test_values[vec_generalized_advantage_estimate-True-True] 99.3684ms 2.8948ms 345.4523 Ops/s 352.8031 Ops/s $\color{#d91a1a}-2.08\%$
test_values[td0_return_estimate-False-False] 0.1184ms 83.2461μs 12.0126 KOps/s 12.0959 KOps/s $\color{#d91a1a}-0.69\%$
test_values[td1_return_estimate-False-False] 61.7568ms 59.7496ms 16.7365 Ops/s 17.6177 Ops/s $\textbf{\color{#d91a1a}-5.00\%}$
test_values[vec_td1_return_estimate-False-False] 1.4448ms 1.1008ms 908.4190 Ops/s 912.6914 Ops/s $\color{#d91a1a}-0.47\%$
test_values[td_lambda_return_estimate-True-False] 97.2347ms 95.5231ms 10.4687 Ops/s 11.1214 Ops/s $\textbf{\color{#d91a1a}-5.87\%}$
test_values[vec_td_lambda_return_estimate-True-False] 1.5655ms 1.1065ms 903.7732 Ops/s 908.1601 Ops/s $\color{#d91a1a}-0.48\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 27.6898ms 26.5342ms 37.6872 Ops/s 38.9557 Ops/s $\color{#d91a1a}-3.26\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0698ms 0.7635ms 1.3098 KOps/s 1.2979 KOps/s $\color{#35bf28}+0.92\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8387ms 0.6805ms 1.4695 KOps/s 1.4596 KOps/s $\color{#35bf28}+0.68\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.6448ms 1.4921ms 670.1965 Ops/s 668.7619 Ops/s $\color{#35bf28}+0.21\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9019ms 0.6931ms 1.4427 KOps/s 1.4319 KOps/s $\color{#35bf28}+0.76\%$
test_dqn_speed[False-None] 7.0756ms 1.5332ms 652.2359 Ops/s 657.7485 Ops/s $\color{#d91a1a}-0.84\%$
test_dqn_speed[False-backward] 2.2618ms 2.1500ms 465.1230 Ops/s 463.7388 Ops/s $\color{#35bf28}+0.30\%$
test_dqn_speed[True-None] 0.7301ms 0.5483ms 1.8239 KOps/s 1.8204 KOps/s $\color{#35bf28}+0.19\%$
test_dqn_speed[True-backward] 1.1426ms 1.0884ms 918.7581 Ops/s 884.1694 Ops/s $\color{#35bf28}+3.91\%$
test_dqn_speed[reduce-overhead-None] 0.9622ms 0.5449ms 1.8353 KOps/s 1.7688 KOps/s $\color{#35bf28}+3.76\%$
test_dqn_speed[reduce-overhead-backward] 1.1043ms 0.9570ms 1.0449 KOps/s 1.0129 KOps/s $\color{#35bf28}+3.16\%$
test_ddpg_speed[False-None] 3.4111ms 2.8920ms 345.7812 Ops/s 335.0676 Ops/s $\color{#35bf28}+3.20\%$
test_ddpg_speed[False-backward] 4.4791ms 4.1354ms 241.8164 Ops/s 235.2899 Ops/s $\color{#35bf28}+2.77\%$
test_ddpg_speed[True-None] 1.4924ms 1.0813ms 924.8436 Ops/s 914.2024 Ops/s $\color{#35bf28}+1.16\%$
test_ddpg_speed[True-backward] 2.2664ms 2.1395ms 467.4044 Ops/s 447.2467 Ops/s $\color{#35bf28}+4.51\%$
test_ddpg_speed[reduce-overhead-None] 1.3203ms 1.1006ms 908.5859 Ops/s 880.6786 Ops/s $\color{#35bf28}+3.17\%$
test_ddpg_speed[reduce-overhead-backward] 1.8568ms 1.6340ms 611.9803 Ops/s 597.3282 Ops/s $\color{#35bf28}+2.45\%$
test_sac_speed[False-None] 8.6826ms 8.1978ms 121.9842 Ops/s 121.5921 Ops/s $\color{#35bf28}+0.32\%$
test_sac_speed[False-backward] 11.9746ms 11.2507ms 88.8836 Ops/s 88.9066 Ops/s $\color{#d91a1a}-0.03\%$
test_sac_speed[True-None] 1.7985ms 1.5320ms 652.7342 Ops/s 638.4463 Ops/s $\color{#35bf28}+2.24\%$
test_sac_speed[True-backward] 3.6007ms 3.4151ms 292.8148 Ops/s 301.1118 Ops/s $\color{#d91a1a}-2.76\%$
test_sac_speed[reduce-overhead-None] 23.1218ms 12.8516ms 77.8114 Ops/s 79.0833 Ops/s $\color{#d91a1a}-1.61\%$
test_sac_speed[reduce-overhead-backward] 1.6307ms 1.5104ms 662.0845 Ops/s 727.4001 Ops/s $\textbf{\color{#d91a1a}-8.98\%}$
test_redq_speed[False-None] 8.3630ms 7.6544ms 130.6442 Ops/s 130.0231 Ops/s $\color{#35bf28}+0.48\%$
test_redq_speed[False-backward] 13.6026ms 12.1399ms 82.3728 Ops/s 85.9347 Ops/s $\color{#d91a1a}-4.14\%$
test_redq_speed[True-None] 2.2515ms 1.9995ms 500.1304 Ops/s 494.4818 Ops/s $\color{#35bf28}+1.14\%$
test_redq_speed[True-backward] 4.0358ms 3.8739ms 258.1345 Ops/s 267.7060 Ops/s $\color{#d91a1a}-3.58\%$
test_redq_speed[reduce-overhead-None] 2.4860ms 2.0236ms 494.1619 Ops/s 491.8693 Ops/s $\color{#35bf28}+0.47\%$
test_redq_speed[reduce-overhead-backward] 3.8836ms 3.6898ms 271.0175 Ops/s 266.9504 Ops/s $\color{#35bf28}+1.52\%$
test_redq_deprec_speed[False-None] 9.8677ms 9.2087ms 108.5933 Ops/s 107.4049 Ops/s $\color{#35bf28}+1.11\%$
test_redq_deprec_speed[False-backward] 12.8797ms 12.3195ms 81.1719 Ops/s 80.9441 Ops/s $\color{#35bf28}+0.28\%$
test_redq_deprec_speed[True-None] 2.6953ms 2.3453ms 426.3817 Ops/s 424.4103 Ops/s $\color{#35bf28}+0.46\%$
test_redq_deprec_speed[True-backward] 4.2442ms 4.0333ms 247.9373 Ops/s 247.4113 Ops/s $\color{#35bf28}+0.21\%$
test_redq_deprec_speed[reduce-overhead-None] 2.7393ms 2.3969ms 417.2141 Ops/s 422.5019 Ops/s $\color{#d91a1a}-1.25\%$
test_redq_deprec_speed[reduce-overhead-backward] 4.4659ms 4.0311ms 248.0707 Ops/s 236.8423 Ops/s $\color{#35bf28}+4.74\%$
test_td3_speed[False-None] 8.2539ms 8.0910ms 123.5936 Ops/s 124.1099 Ops/s $\color{#d91a1a}-0.42\%$
test_td3_speed[False-backward] 10.9242ms 10.4211ms 95.9593 Ops/s 48.0993 Ops/s $\textbf{\color{#35bf28}+99.50\%}$
test_td3_speed[True-None] 1.6947ms 1.6260ms 615.0084 Ops/s 635.2741 Ops/s $\color{#d91a1a}-3.19\%$
test_td3_speed[True-backward] 3.2524ms 3.1191ms 320.6062 Ops/s 300.2969 Ops/s $\textbf{\color{#35bf28}+6.76\%}$
test_td3_speed[reduce-overhead-None] 83.5249ms 26.2980ms 38.0257 Ops/s 36.9529 Ops/s $\color{#35bf28}+2.90\%$
test_td3_speed[reduce-overhead-backward] 1.3603ms 1.3107ms 762.9768 Ops/s 678.7676 Ops/s $\textbf{\color{#35bf28}+12.41\%}$
test_cql_speed[False-None] 16.8800ms 16.4850ms 60.6613 Ops/s 60.0929 Ops/s $\color{#35bf28}+0.95\%$
test_cql_speed[False-backward] 22.4572ms 21.6930ms 46.0977 Ops/s 44.8432 Ops/s $\color{#35bf28}+2.80\%$
test_cql_speed[True-None] 3.3213ms 2.9206ms 342.3991 Ops/s 333.1749 Ops/s $\color{#35bf28}+2.77\%$
test_cql_speed[True-backward] 5.2564ms 5.0372ms 198.5220 Ops/s 187.5200 Ops/s $\textbf{\color{#35bf28}+5.87\%}$
test_cql_speed[reduce-overhead-None] 21.7830ms 13.3168ms 75.0929 Ops/s 58.8993 Ops/s $\textbf{\color{#35bf28}+27.49\%}$
test_cql_speed[reduce-overhead-backward] 1.6793ms 1.5070ms 663.5549 Ops/s 588.8388 Ops/s $\textbf{\color{#35bf28}+12.69\%}$
test_a2c_speed[False-None] 3.6493ms 3.2589ms 306.8513 Ops/s 303.0421 Ops/s $\color{#35bf28}+1.26\%$
test_a2c_speed[False-backward] 6.6559ms 6.2193ms 160.7903 Ops/s 153.9303 Ops/s $\color{#35bf28}+4.46\%$
test_a2c_speed[True-None] 1.3851ms 0.9985ms 1.0015 KOps/s 959.1259 Ops/s $\color{#35bf28}+4.42\%$
test_a2c_speed[True-backward] 2.9062ms 2.5992ms 384.7308 Ops/s 379.3678 Ops/s $\color{#35bf28}+1.41\%$
test_a2c_speed[reduce-overhead-None] 22.2008ms 11.7708ms 84.9559 Ops/s 86.8272 Ops/s $\color{#d91a1a}-2.16\%$
test_a2c_speed[reduce-overhead-backward] 1.1476ms 0.9799ms 1.0205 KOps/s 998.5481 Ops/s $\color{#35bf28}+2.20\%$
test_ppo_speed[False-None] 4.2321ms 3.8659ms 258.6728 Ops/s 261.0679 Ops/s $\color{#d91a1a}-0.92\%$
test_ppo_speed[False-backward] 7.4279ms 6.9774ms 143.3198 Ops/s 142.5946 Ops/s $\color{#35bf28}+0.51\%$
test_ppo_speed[True-None] 1.1939ms 0.9751ms 1.0255 KOps/s 1.0401 KOps/s $\color{#d91a1a}-1.40\%$
test_ppo_speed[True-backward] 2.7708ms 2.5542ms 391.5107 Ops/s 387.6586 Ops/s $\color{#35bf28}+0.99\%$
test_ppo_speed[reduce-overhead-None] 7.0103ms 0.5153ms 1.9405 KOps/s 1.9165 KOps/s $\color{#35bf28}+1.25\%$
test_ppo_speed[reduce-overhead-backward] 1.1652ms 1.0616ms 941.9402 Ops/s 873.4766 Ops/s $\textbf{\color{#35bf28}+7.84\%}$
test_reinforce_speed[False-None] 2.5338ms 2.3247ms 430.1674 Ops/s 429.0849 Ops/s $\color{#35bf28}+0.25\%$
test_reinforce_speed[False-backward] 4.2860ms 3.5467ms 281.9522 Ops/s 290.1113 Ops/s $\color{#d91a1a}-2.81\%$
test_reinforce_speed[True-None] 1.0148ms 0.8217ms 1.2169 KOps/s 1.1838 KOps/s $\color{#35bf28}+2.80\%$
test_reinforce_speed[True-backward] 2.7734ms 2.5605ms 390.5497 Ops/s 382.5537 Ops/s $\color{#35bf28}+2.09\%$
test_reinforce_speed[reduce-overhead-None] 23.3695ms 11.9770ms 83.4936 Ops/s 87.9641 Ops/s $\textbf{\color{#d91a1a}-5.08\%}$
test_reinforce_speed[reduce-overhead-backward] 1.2511ms 1.1917ms 839.1528 Ops/s 828.8903 Ops/s $\color{#35bf28}+1.24\%$
test_iql_speed[False-None] 10.3971ms 9.7182ms 102.8996 Ops/s 105.8369 Ops/s $\color{#d91a1a}-2.78\%$
test_iql_speed[False-backward] 14.2781ms 13.6413ms 73.3070 Ops/s 74.3596 Ops/s $\color{#d91a1a}-1.42\%$
test_iql_speed[True-None] 2.1031ms 1.8377ms 544.1555 Ops/s 538.3884 Ops/s $\color{#35bf28}+1.07\%$
test_iql_speed[True-backward] 4.8368ms 4.4236ms 226.0593 Ops/s 223.4413 Ops/s $\color{#35bf28}+1.17\%$
test_iql_speed[reduce-overhead-None] 20.6795ms 11.7359ms 85.2088 Ops/s 109.4676 Ops/s $\textbf{\color{#d91a1a}-22.16\%}$
test_iql_speed[reduce-overhead-backward] 1.9060ms 1.6078ms 621.9850 Ops/s 695.3433 Ops/s $\textbf{\color{#d91a1a}-10.55\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 8.0103ms 6.4119ms 155.9594 Ops/s 151.7818 Ops/s $\color{#35bf28}+2.75\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6045ms 0.3664ms 2.7292 KOps/s 3.4626 KOps/s $\textbf{\color{#d91a1a}-21.18\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6018ms 0.3475ms 2.8778 KOps/s 3.7328 KOps/s $\textbf{\color{#d91a1a}-22.91\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.6084ms 6.1819ms 161.7630 Ops/s 159.5074 Ops/s $\color{#35bf28}+1.41\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7564ms 0.2686ms 3.7230 KOps/s 2.6744 KOps/s $\textbf{\color{#35bf28}+39.21\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6057ms 0.3489ms 2.8660 KOps/s 2.8739 KOps/s $\color{#d91a1a}-0.27\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.8066ms 1.5539ms 643.5422 Ops/s 690.5075 Ops/s $\textbf{\color{#d91a1a}-6.80\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5537ms 1.3353ms 748.8967 Ops/s 719.9004 Ops/s $\color{#35bf28}+4.03\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.6543ms 6.3944ms 156.3864 Ops/s 155.4807 Ops/s $\color{#35bf28}+0.58\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1809ms 0.5113ms 1.9559 KOps/s 2.2642 KOps/s $\textbf{\color{#d91a1a}-13.62\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8003ms 0.5001ms 1.9996 KOps/s 2.1861 KOps/s $\textbf{\color{#d91a1a}-8.53\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.5268ms 6.2151ms 160.8989 Ops/s 157.8360 Ops/s $\color{#35bf28}+1.94\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9680ms 0.2946ms 3.3939 KOps/s 3.4630 KOps/s $\color{#d91a1a}-2.00\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6202ms 0.3700ms 2.7029 KOps/s 3.7672 KOps/s $\textbf{\color{#d91a1a}-28.25\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.6782ms 6.2076ms 161.0917 Ops/s 160.3695 Ops/s $\color{#35bf28}+0.45\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9072ms 0.3762ms 2.6578 KOps/s 3.6129 KOps/s $\textbf{\color{#d91a1a}-26.44\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5930ms 0.3320ms 3.0117 KOps/s 2.7249 KOps/s $\textbf{\color{#35bf28}+10.52\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.6621ms 6.3881ms 156.5407 Ops/s 155.6070 Ops/s $\color{#35bf28}+0.60\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2730ms 0.4753ms 2.1041 KOps/s 1.9378 KOps/s $\textbf{\color{#35bf28}+8.58\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6765ms 0.4607ms 2.1704 KOps/s 2.0399 KOps/s $\textbf{\color{#35bf28}+6.40\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 7.1711ms 5.3248ms 187.7999 Ops/s 164.6979 Ops/s $\textbf{\color{#35bf28}+14.03\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 10.4456ms 2.1263ms 470.3002 Ops/s 438.4793 Ops/s $\textbf{\color{#35bf28}+7.26\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.0448ms 1.2366ms 808.6653 Ops/s 775.8311 Ops/s $\color{#35bf28}+4.23\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.9618ms 5.3714ms 186.1699 Ops/s 187.5583 Ops/s $\color{#d91a1a}-0.74\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 0.5125s 12.3577ms 80.9215 Ops/s 435.4868 Ops/s $\textbf{\color{#d91a1a}-81.42\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.2895ms 1.1649ms 858.4399 Ops/s 933.7989 Ops/s $\textbf{\color{#d91a1a}-8.07\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 8.4924ms 5.6062ms 178.3733 Ops/s 32.7436 Ops/s $\textbf{\color{#35bf28}+444.76\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 7.5519ms 2.2035ms 453.8155 Ops/s 470.7679 Ops/s $\color{#d91a1a}-3.60\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 9.1481ms 1.4715ms 679.5993 Ops/s 713.2783 Ops/s $\color{#d91a1a}-4.72\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.3563ms 13.1665ms 75.9506 Ops/s 74.7589 Ops/s $\color{#35bf28}+1.59\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 21.0020ms 18.1245ms 55.1739 Ops/s 56.6122 Ops/s $\color{#d91a1a}-2.54\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 18.9078ms 18.2586ms 54.7686 Ops/s 54.4909 Ops/s $\color{#35bf28}+0.51\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.7357ms 18.0251ms 55.4783 Ops/s 54.0463 Ops/s $\color{#35bf28}+2.65\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 18.4175ms 17.5296ms 57.0463 Ops/s 54.8854 Ops/s $\color{#35bf28}+3.94\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.5430ms 19.2737ms 51.8843 Ops/s 50.9305 Ops/s $\color{#35bf28}+1.87\%$

Contributor

@vmoens vmoens left a comment

Do we want to interact with SipHash and other hash modules?

Or perhaps a tokenizer (not strictly a hash, but the signature is similar for strings)?

Or perhaps a custom hash function?

I'm happy with this being restrictive, but if possible I'd prefer to avoid having multiple transforms that do the str -> int map.

@kurtamohler
Collaborator Author

kurtamohler commented Dec 18, 2024

Good questions. I agree that the transform should allow the user to specify any hashing function or tokenizer they want to use, including nn.Modules like SipHash. In fact, I think all the transform basically needs to do is this (along with adding the out_keys to the spec):

for in_key, out_key in zip(in_keys, out_keys):
    td[out_key] = user_specified_fn(td[in_key])

The user_specified_fn can be any unary function that knows how to handle whatever is in td[in_key]. However, the user would be free to specify a function that has nothing to do with hashing or tokenizing--it could even be something like lambda x: x + 4, so it doesn't seem right to call this a Hash transform.
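
To make the loop above concrete, here is a minimal sketch that uses a plain dict as a stand-in for the tensordict. `apply_keywise` is a hypothetical helper name, not part of TorchRL, and spec handling is left out entirely.

```python
from typing import Any, Callable, Dict, Sequence


def apply_keywise(
    td: Dict[str, Any],
    in_keys: Sequence[str],
    out_keys: Sequence[str],
    fn: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Write fn(td[in_key]) to out_key for each key pair (hypothetical helper)."""
    if len(in_keys) != len(out_keys):
        raise ValueError("in_keys and out_keys must have the same length")
    out = dict(td)  # shallow copy so the caller's mapping is left untouched
    for in_key, out_key in zip(in_keys, out_keys):
        out[out_key] = fn(td[in_key])
    return out


# Any unary callable works, which is why the name "Hash" would be too narrow.
data = {"fen": "startpos", "score": 3}
with_len = apply_keywise(data, ["fen"], ["fen_len"], len)
shifted = apply_keywise(data, ["score"], ["score_plus_4"], lambda x: x + 4)
```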

It seems like what we really need is just a general key-wise UnaryTransform. I think its signature would be something like:

class UnaryTransform(Transform):
    def __init__(self, in_keys, out_keys, fn, output_spec):
        ...

In order to use this to implement a Python-hash transform for ChessEnv, for instance, we would just do:

import torch
from torchrl.data import Unbounded

# UnaryTransform is the class proposed above
chess_hash_t = UnaryTransform(
    in_keys=["fen"],
    out_keys=["hashing"],
    fn=hash,
    output_spec=Unbounded(shape=(), dtype=torch.int64),
)

Or, to implement a transform that only uses Python's built-in hash, like the one currently in this PR, we would just do this:

class PythonHash(UnaryTransform):
    def __init__(self, in_keys, out_keys):
        super().__init__(
            in_keys=in_keys,
            out_keys=out_keys,
            fn=hash,
            output_spec=Unbounded(shape=(), dtype=torch.int64)
        )

Although it would be nice if the user didn't have to even think about specs. I wonder if it would be possible to make a transform automatically guess what the spec updates need to be, based on the output of forward.
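
One way the guessing idea could work is to call the function once on a sample input and record what comes back. The sketch below assumes plain Python values rather than tensors; `GuessedSpec` and `guess_spec` are hypothetical names that only illustrate the idea and are not TorchRL specs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class GuessedSpec:
    """Hypothetical stand-in for a spec: a Python type and a shape."""

    kind: type
    shape: Tuple[int, ...]


def guess_spec(fn: Callable[[Any], Any], sample_input: Any) -> GuessedSpec:
    """Run fn once on a sample input and describe the result."""
    out = fn(sample_input)
    # Flat sequences get a 1-D shape; everything else is treated as a scalar.
    if isinstance(out, (list, tuple)):
        kind = type(out[0]) if out else object
        return GuessedSpec(kind=kind, shape=(len(out),))
    return GuessedSpec(kind=type(out), shape=())


spec = guess_spec(hash, "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
```

A single sample reveals the type and shape but not the bounds, so anything tighter than an unbounded spec would presumably still need input from the user.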

We could allow the user to optionally specify inverse keys and an inverse function to UnaryTransform. If we make the class robust enough, we could probably simplify the implementations of many of the existing transforms by making them inherit from UnaryTransform and then set the proper configurations in the derived class's __init__ function.
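
As a rough illustration of the optional inverse, the sketch below works on plain dicts. `KeywiseUnary` is a hypothetical class, and it simplifies the proposal by reusing the forward key pairs in reverse instead of taking separate inverse keys.

```python
from typing import Any, Callable, Dict, Optional, Sequence


class KeywiseUnary:
    """Hypothetical plain-dict sketch of a unary transform with an optional inverse."""

    def __init__(
        self,
        in_keys: Sequence[str],
        out_keys: Sequence[str],
        fn: Callable[[Any], Any],
        inv_fn: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.in_keys = list(in_keys)
        self.out_keys = list(out_keys)
        self.fn = fn
        self.inv_fn = inv_fn

    def forward(self, td: Dict[str, Any]) -> Dict[str, Any]:
        out = dict(td)
        for in_key, out_key in zip(self.in_keys, self.out_keys):
            out[out_key] = self.fn(td[in_key])
        return out

    def inverse(self, td: Dict[str, Any]) -> Dict[str, Any]:
        if self.inv_fn is None:
            raise NotImplementedError("no inverse function was provided")
        out = dict(td)
        # The inverse maps each out_key back onto its in_key.
        for in_key, out_key in zip(self.in_keys, self.out_keys):
            out[in_key] = self.inv_fn(td[out_key])
        return out


double = KeywiseUnary(["x"], ["x2"], fn=lambda v: 2 * v, inv_fn=lambda v: v // 2)
forward_out = double.forward({"x": 21})
roundtrip = double.inverse({"x2": 42})
```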

We could also consider other kinds of generalized transforms. For instance, ReduceTransform could take multiple in_keys and reduce them down to one out_key. Stack and CatTensors are examples of this kind of transform.
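
A reduce-style transform could be sketched the same way. Here `reduce_keys` is a hypothetical helper over plain dicts, and list concatenation stands in loosely for what CatTensors does with tensors.

```python
from typing import Any, Callable, Dict, List, Sequence


def reduce_keys(
    td: Dict[str, Any],
    in_keys: Sequence[str],
    out_key: str,
    fn: Callable[[List[Any]], Any],
) -> Dict[str, Any]:
    """Combine the values at in_keys into one value at out_key (hypothetical helper)."""
    out = dict(td)
    out[out_key] = fn([td[key] for key in in_keys])
    return out


# Concatenate two list-valued entries into a single "state" entry.
obs = {"pos": [0.0, 1.0], "vel": [0.5]}
merged = reduce_keys(obs, ["pos", "vel"], "state", lambda parts: sum(parts, []))
```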

What do you think?

@vmoens
Contributor

vmoens commented Dec 18, 2024

@kurtamohler I'm on board up to PythonHash, but I'd be cautious about having a big fraction of the transforms all inherit from the same parent.
The design philosophy in the PT ecosystem is (usually) to avoid too many levels of inheritance to keep a clear API for the users; it also makes it easier for people to hack their own transforms.

@vmoens vmoens added the enhancement New feature or request label Dec 18, 2024
@vmoens
Contributor

vmoens commented Dec 20, 2024

For the record, here is a script that makes it possible to use SHA-256 hashes (fewer collisions) in a reproducible manner (i.e., sort of seeded):

import hashlib


def reproducible_hash_parts(string, seed):
    """
    Creates a reproducible 256-bit hash from a string using a seed and splits it into four 64-bit parts.

    Args:
        string (str): The input string.
        seed (str): The seed value.

    Returns:
        tuple: Four 64-bit integers representing the parts of the 256-bit hash value.
    """
    # Prepend the seed to the string
    seeded_string = seed + string

    # Create a new SHA-256 hash object
    hash_object = hashlib.sha256()

    # Update the hash object with the seeded string
    hash_object.update(seeded_string.encode('utf-8'))

    # Get the hash value as bytes
    hash_bytes = hash_object.digest()

    # Split the hash bytes into four parts
    part1 = hash_bytes[:8]
    part2 = hash_bytes[8:16]
    part3 = hash_bytes[16:24]
    part4 = hash_bytes[24:]

    # Convert each part to a 64-bit integer
    part1_value = int.from_bytes(part1, 'big')
    part2_value = int.from_bytes(part2, 'big')
    part3_value = int.from_bytes(part3, 'big')
    part4_value = int.from_bytes(part4, 'big')

    return part1_value, part2_value, part3_value, part4_value


# Example usage:
string = "Hello, World!"
seed = "my_seed"

part1, part2, part3, part4 = reproducible_hash_parts(string, seed)
print(f"Part 1: {part1}")
print(f"Part 2: {part2}")
print(f"Part 3: {part3}")
print(f"Part 4: {part4}")
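
One caveat if these parts are meant for a torch.int64 entry, as in the output_spec proposed earlier in the thread: `int.from_bytes(..., 'big')` returns unsigned values up to 2**64 - 1, which can fall outside the signed 64-bit range. A minimal variant, assuming the same seed-prefix scheme, decodes each chunk as signed instead. `reproducible_hash_int64` is a hypothetical name.

```python
import hashlib
from typing import Tuple


def reproducible_hash_int64(string: str, seed: str) -> Tuple[int, ...]:
    """SHA-256 of seed + string, returned as four signed 64-bit integers."""
    digest = hashlib.sha256((seed + string).encode("utf-8")).digest()
    # signed=True keeps every part inside [-2**63, 2**63 - 1].
    return tuple(
        int.from_bytes(digest[i : i + 8], "big", signed=True)
        for i in range(0, 32, 8)
    )


parts = reproducible_hash_int64("Hello, World!", "my_seed")
```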

@vmoens
Contributor

vmoens commented Dec 20, 2024

Another random thought: we could add the option to store a table of hash-to-value within the transform:

class HashTransform(...):
    _hash_table: Dict[HashType, str]
    ...

and include that in the transform state-dict (or make it easy to save this for future use)
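
A plain-Python sketch of that idea might look like the following. `HashWithTable` and its methods are hypothetical and only mimic the state-dict pattern; they do not use the actual Transform or nn.Module machinery, and the toy byte-sum hash is there purely so the example is deterministic.

```python
from typing import Any, Callable, Dict


class HashWithTable:
    """Hypothetical sketch: hash strings and remember hash -> original string."""

    def __init__(self, hash_fn: Callable[[str], int]) -> None:
        self.hash_fn = hash_fn
        self._hash_table: Dict[int, str] = {}

    def __call__(self, value: str) -> int:
        h = self.hash_fn(value)
        self._hash_table[h] = value  # a colliding value would overwrite this entry
        return h

    def lookup(self, h: int) -> str:
        return self._hash_table[h]

    def state_dict(self) -> Dict[str, Any]:
        return {"hash_table": dict(self._hash_table)}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self._hash_table = dict(state["hash_table"])


# Toy deterministic hash (sum of UTF-8 bytes), used only for illustration.
hasher = HashWithTable(hash_fn=lambda s: sum(s.encode("utf-8")))
code = hasher("e2e4")
restored = HashWithTable(hash_fn=hasher.hash_fn)
restored.load_state_dict(hasher.state_dict())
```

A saved table is only useful across runs if the hash itself is reproducible. Python's built-in hash is salted per process for strings by default, which is one more argument for a seeded digest like the SHA-256 script above.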

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. enhancement New feature or request
3 participants