Release v0.5.0: Dynamic specs, envs with non-tensor data and replay buffer checkpointers · pytorch/rl

What's Changed

This new release makes it possible to run environments that output non-tensor data. #1944

We also introduce dynamic specs, which allow environments to change the size of their observations / actions during the
course of a rollout. This feature is compatible with parallel environments and collectors! #2143
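To give a rough idea of what a dynamic spec permits, here is a minimal, library-free sketch. It assumes a convention where `-1` in a spec shape marks a dimension whose size may vary between steps; the helper `shape_matches` is hypothetical and is not part of TorchRL's API.

```python
def shape_matches(spec_shape, value_shape):
    """Return True if value_shape is compatible with spec_shape.

    Assumption: a -1 entry in spec_shape means "any size" for that dimension.
    The number of dimensions must still agree.
    """
    if len(spec_shape) != len(value_shape):
        return False
    return all(s == -1 or s == v for s, v in zip(spec_shape, value_shape))


# Hypothetical observation spec: a variable number of 3-dimensional points.
obs_spec_shape = (-1, 3)

# Two steps of the same rollout may return observations of different sizes.
step_0_ok = shape_matches(obs_spec_shape, (5, 3))
step_1_ok = shape_matches(obs_spec_shape, (2, 3))

# A mismatch on a fixed dimension is still rejected.
bad_ok = shape_matches(obs_spec_shape, (5, 4))
```

Only the dimensions marked as variable are relaxed in this sketch; every other dimension is checked exactly, so a fixed feature size is still enforced.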

Additionally, it is now possible to update a Replay Buffer in-place by assigning values at a given index. #2224
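The idea of index-based, in-place assignment can be sketched with a toy buffer. `ToyReplayBuffer` below is a hypothetical stand-in written in plain Python, not TorchRL's `ReplayBuffer`; it only illustrates overwriting a stored element at a given index without re-adding data.

```python
class ToyReplayBuffer:
    """Hypothetical fixed-capacity buffer used only for illustration."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._storage = []
        self._cursor = 0  # next position to write once the buffer is full

    def extend(self, items):
        # Fill the storage first, then overwrite the oldest entries.
        for item in items:
            if len(self._storage) < self.capacity:
                self._storage.append(item)
            else:
                self._storage[self._cursor] = item
            self._cursor = (self._cursor + 1) % self.capacity

    def __len__(self):
        return len(self._storage)

    def __getitem__(self, index):
        return self._storage[index]

    def __setitem__(self, index, value):
        # In-place update: only positions that already hold data can be assigned.
        if not -len(self._storage) <= index < len(self._storage):
            raise IndexError("index out of range of the stored data")
        self._storage[index] = value


rb = ToyReplayBuffer(capacity=4)
rb.extend([{"reward": 0.0}, {"reward": 1.0}, {"reward": 2.0}])

# Overwrite the second stored transition in place.
rb[1] = {"reward": 10.0}
```

The assignment leaves the buffer length and the write cursor untouched, so later calls to `extend` behave as if no update had happened.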

Finally, TorchRL is now compatible with Python 3.12 (#2282, #2281).

As always, a huge thanks to the vibrant OSS community that helps us develop this library!

New algorithms

[Algorithm] CrossQ by @BY571 in #2033
[Algorithm] TD3+BC by @BY571 in #2249

Features

[Feature] ActionDiscretizer by @vmoens in #2247
[Feature] Add KL approximation in PPO loss metadata by @albertbou92 in #2166
[Feature] Add modules.AdditiveGaussianModule by @kurtamohler in #2296
[Feature] Add modules.OrnsteinUhlenbeckProcessModule by @kurtamohler in #2297
[Feature] Autocomplete for losses by @vmoens in #2148
[Feature] Crop Transform by @albertbou92 in #2336
[Feature] Dynamic specs by @vmoens in #2143
[Feature] Extract primers from modules that contain RNNs by @albertbou92 in #2127
[Feature] Jumanji from_pixels=True by @vmoens in #2129
[Feature] Make ProbabilisticActor compatible with Composite distributions by @vmoens in #2220
[Feature] Replay buffer checkpointers by @vmoens in #2137
[Feature] Some improvements to VecNorm by @vmoens in #2251
[Feature] Split-trajectories and represent as nested tensor by @vmoens in #2043
[Feature] _make_ordinal_device by @vmoens in #2237
[Feature] assigning values to RB storage by @vmoens in #2224

Bug fixes

[BugFix,Feature] Allow non-tensor data in envs by @vmoens in #1944
[BugFix] Allow zero alpha value for PrioritizedSampler by @albertbou92 in #2164
[BugFix] Expose MARL modules by @vmoens in #2321
[BugFix] Fit vecnorm out_keys by @vmoens in #2157
[BugFix] Fix Brax by @vmoens in #2233
[BugFix] Fix OOB sampling in PrioritizedSliceSampler by @vmoens in #2239
[BugFix] Fix VecNorm test in test_collectors.py by @vmoens in #2162
[BugFix] Fix to in MultiDiscreteTensorSpec by @Quinticx in #2204
[BugFix] Fix and test PRB priority update across dims and rb types by @vmoens in #2244
[BugFix] Fix another ctx test by @vmoens in #2284
[BugFix] Fix async gym env with non-sync resets by @vmoens in #2170
[BugFix] Fix async gym when all reset by @vmoens in #2144
[BugFix] Fix brax wrapping by @vmoens in #2190
[BugFix] Fix collector tests where device ordinal is needed by @vmoens in #2240
[BugFix] Fix collectors with non tensors by @vmoens in #2232
[BugFix] Fix done/terminated computation in slice samplers by @vmoens in #2213
[BugFix] Fix info reading with async gym by @vmoens in #2150
[BugFix] Fix isaac - bis by @vmoens in #2119
[BugFix] Fix lib tests by @vmoens in #2218
[BugFix] Fix max value within buffer during update priority by @vmoens in #2242
[BugFix] Fix max-priority update by @vmoens in #2215
[BugFix] Fix non-tensor passage in _StepMDP by @vmoens in #2260
[BugFix] Fix non-tensor passage in _StepMDP by @vmoens in #2262
[BugFix] Fix prefetch in samples without replacement - .sample() compatibility issues by @vmoens in #2226
[BugFix] Fix sampling in NonTensorSpec by @vmoens in #2172
[BugFix] Fix sampling of values from NonTensorSpec by @vmoens in #2169
[BugFix] Fix slice sampler end computation at the cursor place by @vmoens in #2225
[BugFix] Fix sliced PRB when only traj is provided by @vmoens in #2228
[BugFix] Fix strict length in PRB+SliceSampler by @vmoens in #2202
[BugFix] Fix strict_length in prioritized slice sampler by @vmoens in #2194
[BugFix] Fix tanh normal mode by @vmoens in #2198
[BugFix] Fix tensordict private imports by @vmoens in #2275
[BugFix] Fix test_specs.py by @vmoens in #2214
[BugFix] Fix torch 2.3 compatibility of padding indices by @vmoens in #2216
[BugFix] Fix truncated normal by @vmoens in #2147
[BugFix] Fix typo in weight assignment in PRB by @vmoens in #2241
[BugFix] Fix update_priority generic signature for Samplers by @vmoens in #2252
[BugFix] Fix vecnorm state-dicts by @vmoens in #2158
[BugFix] Global import of optional library by @matteobettini in #2217
[BugFix] Gym async with _reset full of True by @vmoens in #2145
[BugFix] MLFlow logger by @GJBoth in #2152
[BugFix] Make DMControlEnv aware of truncated signals by @vmoens in #2196
[BugFix] Make _reset follow done shape by @matteobettini in #2189
[BugFix] EnvBase._complete_done to complete "terminated" key properly by @kurtamohler in #2294
[BugFix] LazyTensorStorage only allocates data on the given device by @matteobettini in #2188
[BugFix] done = done | truncated in collector by @vmoens in #2333
[BugFix] buffer iter for samplers without replacement + prefetch by @JulianKu in #2185
[BugFix] buffer __iter__ for samplers without replacement + prefetch by @JulianKu in #2178
[BugFix] missing deprecated kwargs by @fedebotu in #2125

Docs

[Doc] Add Custom Options for VideoRecorder by @N00bcak in #2259
[Doc] Add documentation for masks in tensor specs by @kurtamohler in #2289
[Doc] Better doc for make_tensordict_primer by @vmoens in #2324
[Doc] Dynamic envs by @vmoens in #2191
[Doc] Edit README for local installs by @vmoens in #2255
[Doc] Fix algorithms references in tutos by @vmoens in #2320
[Doc] Fix documentation mismatch for default argument by @TheRisenPhoenix in #2149
[Doc] Fix links in doc by @vmoens in #2151
[Doc] Fix mistakes in docs for Trainer checkpointing backends by @kurtamohler in #2285
[Doc] Indicate necessary context to run multiprocessed collectors in doc by @GJBoth in #2126
[Doc] Restore colab links by @vmoens in #2197
[Doc] Update README.md by @KPCOFGS in #2155
[Doc] default_interaction_type doc by @vmoens in #2177
[Docs] InitTracker cleanup by @matteobettini in #2330
[Docs] Reintroduce BenchMARL pointers in MARL tutos by @matteobettini in #2159

Performance

[Performance, Refactor, BugFix] Faster loading of uninitialized storages by @vmoens in #2221
[Performance] consolidate TDs in ParallelEnv without buffers by @vmoens in #2231

Others

Fix "Run in Colab" and "Download Notebook" links in tutorials by @kurtamohler in #2268
Fix brax examples by @Jendker in #2318
Fixed several broken links in readme.md by @drMJ in #2156
Revert "[BugFix] Fix non-tensor passage in _StepMDP" by @vmoens in #2261
Revert "[BugFix] Fix tensordict private imports" by @vmoens in #2276
Revert "[BugFix] buffer __iter__ for samplers without replacement + prefetch" by @vmoens in #2182
[CI, Tests] Fix windows tests by @vmoens in #2337
[CI] Bump jinja2 from 3.1.3 to 3.1.4 in /docs by @dependabot in #2250
[CI] Fix CI by @vmoens in #2245
[CI] Fix nightly by @vmoens in #2279
[CI] Fix wheels by @vmoens in #2274
[CI] Pin transformers version to < 4.42.0 to make vmap happy by @vmoens in #2278
[CI] Upgrade SDL to install pygame 2.6 by @vmoens in #2248
[CI] Windows build fix by @vmoens in #2335
[CI] python 3.12 nightlies by @vmoens in #2281
[Example,BugFix] Add a Async gym env example by @vmoens in #2139
[MINOR] Fix unclear language by @software-samurai in #2165
[Minor] Code quality improvements by @vmoens in #2140
[Quality] Fix low/high in SOTA implementations by @vmoens in #2266
[Quality] Fix repr of MARL modules by @vmoens in #2192
[Quality] Remove global seeding in set_seed by @vmoens in #2195
[Quality] Warn if the sampler is not prioritized but update_priority is called by @vmoens in #2253
[Quality] better error message for CompositeSpec shape mismatch by @vmoens in #2223
[Refactor] Deprecate NormalParamWrapper by @vmoens in #2308
[Refactor] Remove _run_checks from TensorDict.__init__ by @vmoens in #2256
[Refactor] Update all instances of exploration *Wrapper to *Module by @kurtamohler in #2298
[Refactor] Use td.transpose in multi-step transform by @vmoens in #2288
[Refactor] tensordict._tensordict -> tensordict._C by @vmoens in #2286
[Tests] Fix VMAS tests by @matteobettini in #2287
[Tests] Fix windows tests by @vmoens in #2219
[Versioning] Add python 3.12 to setup.py by @vmoens in #2282
[Versioning] Allow any torch version for local builds by @vmoens in #2130
[Versioning] Bump torch 2.0 as minimal version by @vmoens in #2200
[Versioning] v0.5 bump by @vmoens in #2267
[Versioning] windows build - add legacy back and .bat env-script by @vmoens in #2339
init by @vmoens in #2322

New Contributors

@GJBoth made their first contribution in #2126
@TheRisenPhoenix made their first contribution in #2149
@drMJ made their first contribution in #2156
@KPCOFGS made their first contribution in #2155
@software-samurai made their first contribution in #2165
@JulianKu made their first contribution in #2178
@Quinticx made their first contribution in #2204
@kurtamohler made their first contribution in #2268
@N00bcak made their first contribution in #2259
@Jendker made their first contribution in #2318

Full Changelog: v0.4.0...v0.5.0
