0.2.0: Faster collection, MARL compatibility and RLHF prototype
TorchRL 0.2.0
This release provides many new features and bug fixes.
TorchRL now publishes Apple Silicon compatible wheels.
We drop support for Python 3.7 and add support for Python 3.11.
New and updated algorithms
Most algorithms have been cleaned up and designed to reach (at least) state-of-the-art (SOTA) results.
Compatibility with MARL settings has been drastically improved, and we provide a good number of MARL examples within the library (MAPPO, IPPO, MADDPG, IDDPG, IQL, QMIX and VDN; #1027).
A prototype RLHF training script is also provided (#1597).
A whole new category of offline RL algorithms has been integrated: Decision Transformers.
- [Algorithm] Update offpolicy examples by @BY571 in #1206
- [Algorithm] Online Decision transformer by @BY571 in #1149
- [Algorithm] QMixer loss and multiagent models by @matteobettini in #1378
- [Algorithm] RLHF end-to-end, clean by @vmoens in #1597
- [Algorithm] Update A2C examples by @albertbou92 in #1521
- [Algorithm] Update DDPG Example by @BY571 in #1525
- [Algorithm] Update DT by @BY571 in #1560
- [Algorithm] Update PPO examples by @albertbou92 in #1495
- [Algorithm] Update SAC Example by @BY571 in #1524
- [Algorithm] Update TD3 Example by @BY571 in #1523
New features
One of the major new features is the distinction between terminated, truncated and done states, introduced at no extra cost throughout the library. All primary and third-party environments are now compatible with it, as are losses and data-collection primitives (collectors, etc.). This feature also works with complex data structures, such as those found in MARL training pipelines.
All losses are now compatible with tensordict-free inputs, which makes them easier to use outside of TensorDict-based pipelines.
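The sketch below illustrates why the two signals are kept separate when computing advantages (see also #1581). It is a minimal pure-Python version of generalized advantage estimation (GAE), not TorchRL's implementation; the function name and argument layout are hypothetical. Here `terminated` decides whether the next state's value is bootstrapped, while `done` (terminated or truncated) only stops the recursion from crossing a trajectory boundary.

```python
from typing import List, Sequence


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    terminated: Sequence[bool],
    done: Sequence[bool],
    gamma: float = 0.99,
    lmbda: float = 0.95,
) -> List[float]:
    """Hypothetical GAE helper over a flat sequence of transitions.

    terminated[t]: s_{t+1} is a true terminal state, so V(s_{t+1}) is not bootstrapped.
    done[t]: terminated[t] or truncated[t]; the trajectory ends after step t.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # A truncated step still bootstraps from V(s_{t+1}); a terminated one does not.
        bootstrap = 0.0 if terminated[t] else next_values[t]
        delta = rewards[t] + gamma * bootstrap - values[t]
        # Any end of trajectory (terminated or truncated) resets the recursion.
        carry = 0.0 if done[t] else gamma * lmbda * running
        running = delta + carry
        advantages[t] = running
    return advantages


# Same two-step trajectory, ending either by truncation or by termination.
truncated_adv = gae([1.0, 1.0], [0.0, 0.0], [0.0, 10.0], [False, False], [False, True], gamma=1.0, lmbda=1.0)
terminated_adv = gae([1.0, 1.0], [0.0, 0.0], [0.0, 10.0], [False, True], [False, True], gamma=1.0, lmbda=1.0)
print(truncated_adv)   # [12.0, 11.0]
print(terminated_adv)  # [2.0, 1.0]
```

Treating the truncated step as terminal would silently drop the value of the state at which the episode was cut, biasing advantages near time limits.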
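As a rough illustration of what tensordict-free inputs mean, the sketch below shows a dispatch-style decorator that lets a loss written against a dictionary also be called with keyword arguments. The `dispatch` decorator and the `SquaredErrorLoss` class are hypothetical stand-ins written for this example (plain dicts replace TensorDict); they are not TorchRL's own API.

```python
import functools
from typing import Callable, Dict, Optional, Sequence


def dispatch(in_keys: Sequence[str]) -> Callable:
    """Hypothetical decorator: accept either a dict or keyword arguments."""

    def decorator(forward: Callable) -> Callable:
        @functools.wraps(forward)
        def wrapper(self, data: Optional[Dict] = None, **kwargs):
            if data is None:
                # Tensordict-free call: pack the keyword arguments into a dict.
                data = {key: kwargs[key] for key in in_keys}
            return forward(self, data)

        return wrapper

    return decorator


class SquaredErrorLoss:
    """Toy loss standing in for a real loss module."""

    @dispatch(in_keys=("prediction", "target"))
    def __call__(self, data: Dict) -> float:
        return sum((p - t) ** 2 for p, t in zip(data["prediction"], data["target"]))


loss_fn = SquaredErrorLoss()
# Both call styles compute the same value.
print(loss_fn({"prediction": [1.0, 2.0], "target": [0.0, 0.0]}))  # 5.0
print(loss_fn(prediction=[1.0, 2.0], target=[0.0, 0.0]))          # 5.0
```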
New transforms
Atari games can now benefit from an EndOfLifeTransform, which makes it possible to use the end of a life as a done state in the loss (#1605).
We provide a KL transform that adds a KL penalty term to the reward in RLHF settings (#1196).
Action masking is made possible through the ActionMask transform (#1421).
VC1 is also integrated for better image embeddings (#1211).
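The idea can be sketched as follows, assuming the game reports a lives counter after every step (as ALE games do). The helper names are hypothetical and this is not the EndOfLifeTransform code; in this sketch the environment is assumed to keep running after a lost life, and only the done signal handed to the loss is augmented.

```python
from typing import List, Sequence


def end_of_life(lives: Sequence[int]) -> List[bool]:
    """lives[0] is the counter at reset; lives[t + 1] is the counter after step t."""
    # Step t ends a life if the counter dropped during that step.
    return [after < before for before, after in zip(lives, lives[1:])]


def done_for_loss(lives: Sequence[int], terminated: Sequence[bool]) -> List[bool]:
    # The loss treats a lost life like a terminal state; a true game over still counts.
    return [eol or term for eol, term in zip(end_of_life(lives), terminated)]


lives = [3, 3, 2, 2, 1, 0]
terminated = [False, False, False, False, True]
print(end_of_life(lives))                # [False, True, False, True, True]
print(done_for_loss(lives, terminated))  # [False, True, False, True, True]
```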
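A common way to write this penalty (used, for instance, in PPO-based RLHF) subtracts a coefficient times the log-ratio between the trained policy and a frozen reference model from each reward. The minimal sketch below assumes per-token log-probabilities are already available; the function name and `coef` value are illustrative and not the KL transform's actual parameters.

```python
from typing import List, Sequence


def kl_penalized_rewards(
    rewards: Sequence[float],
    logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    coef: float = 0.1,
) -> List[float]:
    """Hypothetical helper: r_t - coef * (log pi(a_t|s_t) - log pi_ref(a_t|s_t))."""
    # The per-step log-ratio is a single-sample estimate of KL(pi || pi_ref).
    return [
        r - coef * (lp - ref_lp)
        for r, lp, ref_lp in zip(rewards, logprobs, ref_logprobs)
    ]


print(kl_penalized_rewards([0.0, 1.0], [-1.0, -0.5], [-1.5, -0.5], coef=0.5))
# [-0.25, 1.0]
```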
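Below is a sketch of the two usual ways a boolean action mask is applied: greedy (and epsilon-greedy) selection restricted to allowed actions for value-based policies, and setting masked logits to negative infinity before a softmax for stochastic policies. It is plain Python with hypothetical function names, not the ActionMask transform itself.

```python
import math
import random
from typing import List, Sequence


def _allowed(mask: Sequence[bool]) -> List[int]:
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("the mask must allow at least one action")
    return allowed


def masked_greedy(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    # Only actions whose mask entry is True can be selected.
    return max(_allowed(mask), key=lambda i: q_values[i])


def masked_epsilon_greedy(q_values, mask, epsilon: float, rng: random.Random) -> int:
    # Random exploration is also restricted to allowed actions.
    if rng.random() < epsilon:
        return rng.choice(_allowed(mask))
    return masked_greedy(q_values, mask)


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    _allowed(mask)  # fail early if every action is masked
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    top = max(masked)
    exps = [math.exp(x - top) for x in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


print(masked_greedy([5.0, 1.0, 3.0], [False, True, True]))  # 2
print(masked_softmax([0.0, 0.0, 0.0], [True, False, True]))  # [0.5, 0.0, 0.5]
```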
- [Feature] Allow sequential transforms to work offline by @vmoens in #1136
- [Feature] ClipTransform + rename min/maximum -> low/high by @vmoens in #1500
- [Feature] End-of-life transform by @vmoens in #1605
- [Feature] KL Transform for RLHF by @vmoens in #1196
- [Features] Conv3dNet and PermuteTransform by @xmaples in #1398
- [Feature, Refactor] Scale in ToTensorImage based on the dtype and new from_int parameter by @hyerra in #1208
- [Feature] CatFrames used as inverse by @BY571 in #1321
- [Feature] Masking actions by @vmoens in #1421
- [Feature] VC1 integration by @vmoens in #1211
New models
We now provide GRU modules alongside LSTM ones for POMDP training.
MARL model coverage now includes a MultiAgentMLP and a MultiAgentCNN. Other MARL improvements include support for nested keys in most parts of the library (losses, data collection, environments, ...).
- [Feature] Support for GRU by @vmoens in #1586
- [Feature] TanhModule by @vmoens in #1213
- [Features] Conv3dNet and PermuteTransform by @xmaples in #1398
- [Feature] CNN version of MultiAgentMLP by @MarkHaoxiang in #1479
Other features (misc)
- [Feature] RLHF Rollouts (reopened) by @vmoens in #1329
- [Feature] Add CQL by @BY571 in #1239
- [Feature] Allow multiple (nested) action, reward, done keys in env, vec_env and collectors by @matteobettini in #1462
- [Feature] Auto-DoubleToFloat by @vmoens in #1442
- [Feature] CompositeSpec.lock by @vmoens in #1143
- [Feature] Device transform by @vmoens in #1472
- [Feature] Dispatch DiscreteSAC loss module by @Blonck in #1248
- [Feature] Dispatch PPO loss module by @Blonck in #1249
- [Feature] Dispatch REDQ loss module by @Blonck in #1251
- [Feature] Dispatch SAC loss module by @Blonck in #1244
- [Feature] Dispatch TD3 loss module by @Blonck in #1254
- [Feature] Dispatch for DDPG loss module by @Blonck in #1215
- [Feature] Dispatch for SAC loss module by @Blonck in #1223
- [Feature] Dispatch reinforce loss module by @Blonck in #1252
- [Feature] Dispatch IQL loss module by @Blonck in #1230
- [Feature] Fix DType casting lazy init by @vmoens in #1589
- [Feature] Heterogeneous Environments compatibility by @matteobettini in #1411
- [Feature] Log hparams from python dict by @matteobettini in #1517
- [Feature] MARL exploration e-greedy compatibility by @matteobettini in #1277
- [Feature] Make advantages compatible with Terminated, Truncated, Done by @vmoens in #1581
- [Feature] Make losses inherit from TDMBase by @vmoens in #1246
- [Feature] Making action masks compatible with q value modules and e-greedy by @matteobettini in #1499
- [Feature] Nested keys in OrnsteinUhlenbeckProcess by @matteobettini in #1305
- [Feature] Optional mapping of "state" in gym specs by @matteobettini in #1431
- [Feature] Parallel environments lazy heterogeneous data compatibility by @matteobettini in #1436
- [Feature] Pettingzoo: add multiagent dimension to single agent groups by @matteobettini in #1550
- [Feature] RLHF Reward Model (reopened) by @vmoens in #1328
- [Feature] RLHF dataloading by @vmoens in #1309
- [Feature] RLHF networks by @apbard in #1319
- [Feature] Refactor categorical dists: Masked one-hot and pass-through gradients by @vmoens in #1488
- [Feature] ReplayBuffer.empty by @vmoens in #1238
- [Feature] Separate losses by @MateuszGuzek in #1240
- [Feature] Single call to value network in advantages [bis] by @vmoens in #1263
- [Feature] Single call to value network in advantages by @vmoens in #1256
- [Feature] TensorStorage by @vmoens in #1310
- [Feature] Threaded collection and parallel envs by @vmoens in #1559
- [Feature] Unbind specs by @vmoens in #1555
- [Feature] VMAS obs dict by @matteobettini in #1419
- [Feature] VMAS: choose between categorical or one-hot actions by @matteobettini in #1484
- [Feature] dispatch for DQNLoss by @vmoens in #1194
- [Feature] log histograms by @vmoens in #1306
- [Feature] make csv logger exist_ok on logging folder by @matteobettini in #1561
- [Feature] shifted for all adv by @vmoens in #1276
New environments and third-party improvements
We now support SMACv2, PettingZoo, IsaacGymEnvs (prototype) and RoboHive. D4RL datasets can now be downloaded and used without installing the eponymous library, which makes it possible to train with more recent or older versions of gym.
- [Environment, Docs] SMACv2 and docs on action masking by @matteobettini in #1466
- [Environment] Petting zoo by @matteobettini in #1471
- [Feature] D4rl direct download by @MateuszGuzek in #1430
- [Feature] Gym 'vectorized' envs compatibility by @vmoens in #1519
- [Feature] Gym compatibility: Terminal and truncated by @vmoens in #1539
- [Feature] IsaacGymEnvs integration by @vmoens in #1443
- [Feature] RoboHive integration by @vmoens in #1119
Performance improvements
We provide several speed improvements, in particular for data collection.
- [Performance] Accelerate GAE by @Blonck in #1142
- [Performance] Accelerate TD lambda return estimate by @Blonck in #1158
- [Performance] Accelerate _split_and_pad_sequence by @Blonck in #1147
- [Performance] Faster GAE by @vmoens in #1153
- [Performance] Faster losses by @vmoens in #1272
- [Performance] Improve performance and streamline the generating of the gammalambda tensor by @Blonck in #1171
- [Performance] Miscellaneous efficiency improvements by @vmoens in #1513
- [Performance] Reduce key accessing in transforms by @matteobettini in #1590
- [Performance] Some efficiency improvements by @vmoens in #1250
- [Performance] Vmas vectorized reset by @matteobettini in #1146
Bug fixes
- [BugFix] Fix entropy signature in truncated normal by @vmoens in #1536
- [BugFix,CI] Fix virtualenv not found by @vmoens in #1280
- [BugFix] Add torch.no_grad() for rendering in multiagent PPO tutorial by @matteobettini in #1511
- [BugFix] Batched envs compatibility with custom keys by @matteobettini in #1348
- [BugFix] C++17 by @vmoens in #1169
- [BugFix] Check env specs for nested envs by @matteobettini in #1332
- [BugFix] CompositeSpec.unsqueeze by @btx0424 in #1464
- [BugFix] DDPG select also critic input for actor loss by @matteobettini in #1563
- [BugFix] DQN loss dispatch respect configured tensordict keys by @Blonck in #1285
- [BugFix] Discrete SAC rewrite by @matteobettini in #1461
- [BugFix] Empty-spec tolerance by @vmoens in #1501
- [BugFix] Fix Brax reset by @vmoens in #1195
- [BugFix] Fix CatFrames by @vmoens in #1336
- [BugFix] Fix ClipTransform device by @vmoens in #1508
- [BugFix] Fix Cython for D4RL by @vmoens in #1429
- [BugFix] Fix DDPG by @vmoens in #1183
- [BugFix] Fix DDPG squeezing by @matteobettini in #1487
- [BugFix] Fix Dreamer test error by @vmoens in #1558
- [BugFix] Fix Gym Categorical/One-hot issues by @vmoens in #1482
- [BugFix] Fix KL import errors by @vmoens in #1207
- [BugFix] Fix KLTransform execution with LSTM by @vmoens in #1426
- [BugFix] Fix KeyError in inverse transform replay buffer by @BY571 in #1165
- [BugFix] Fix LSTM - VecEnv compatibility by @vmoens in #1427
- [BugFix] Fix LSTM use with padded/masked segments by @smorad in #1399
- [BugFix] Fix NoopResetEnv behavior when trials exceeded. by @skandermoalla in #1477
- [BugFix] Fix QValueModule multi_one_hot by @smorad in #1439
- [BugFix] Fix RLHF tests - transformers v4.34 by @vmoens in #1601
- [BugFix] Fix RewardSum spec transform to mimic reward spec by @matteobettini in #1478
- [BugFix] Fix SAC alpha optim by @vmoens in #1192
- [BugFix] Fix SAC by @vmoens in #1189
- [BugFix] Fix SAC by @vmoens in #1190
- [BugFix] Fix SACv2 by @vmoens in #1191
- [BugFix] Fix SMAC-v2 by @vmoens in #1538
- [BugFix] Fix TD3 and compat with pytorch/tensordict#482 by @vmoens in #1375
- [BugFix] Fix TD3 inplace updates by @vmoens in #1219
- [BugFix] Fix TD3 target net by @vmoens in #1186
- [BugFix] Fix LazyStackedCompositeSpec and introducing consolidate_spec by @matteobettini in #1392
- [BugFix] Fix step_mdp() by @matteobettini in #1334
- [BugFix] Fix action mask test by @vmoens in #1492
- [BugFix] Fix brax by @vmoens in #1346
- [BugFix] Fix bug in ppo example config by @degensean in #1396
- [BugFix] Fix envpool by @vmoens in #1530
- [BugFix] Fix error message of .set_keys() in advantage modules by @Blonck in #1218
- [BugFix] Fix examples by @vmoens in #1173
- [BugFix] Fix locked params modif by @vmoens in #1307
- [BugFix] Fix max length by @vmoens in #1233
- [BugFix] Fix missing ("next", "observation") key in dispatch of losses by @Blonck in #1235
- [BugFix] Fix nested CompositeSpec creation by @vmoens in #1261
- [BugFix] Fix nightly tensordict dependency by @skandermoalla in #1302
- [BugFix] Fix ppo example by @vmoens in #1225
- [BugFix] Fix ppo training NaN occurrences by @vmoens in #1403
- [BugFix] Fix reward sum within parallel envs by @vmoens in #1454
- [BugFix] Fix run_type_checks by @vmoens in #1570
- [BugFix] Fix safe tanh for older torch versions by @vmoens in #1220
- [BugFix] Fix serialization of parallel envs by @vmoens in #1197
- [BugFix] Fix split_trajs by @vmoens in #1444
- [BugFix] Fix tanh/atanh vmap compatibility by @vmoens in #1217
- [BugFix] Fix the bug of RoundRobinWriter.extend(data) by @xmaples in #1295
- [BugFix] Fix tutorials by @vmoens in #1382
- [BugFix] Fix typo in CatFrames Transform error message. by @skandermoalla in #1491
- [BugFix] Fix vmap in VmapModule (torch 1.13 compat) by @vmoens in #1350
- [BugFix] Improve collector buffer initialisation when policy spec is unavailable by @matteobettini in #1547
- [BugFix] Instantiate 2 losses with different keys by @matteobettini in #1553
- [BugFix] KL module integration by @vmoens in #1212
- [BugFix] Key selection in batched envs by @vmoens in #1253
- [BugFix] Load collector frames and iter by @matteobettini in #1557
- [BugFix] Make VecNorm Transform picklable by @albertbou92 in #1596
- [BugFix] Minor fixes PPO / A2C examples by @albertbou92 in #1591
- [BugFix] Multiagent "auto" entropy fix in SAC by @matteobettini in #1494
- [BugFix] Nested envs compatibility by @matteobettini in #1347
- [BugFix] Nested key in replay buffer by @matteobettini in #1485
- [BugFix] Nested keys in transforms by @matteobettini in #1355
- [BugFix] Nested keys to probabilistic modules by @matteobettini in #1363
- [BugFix] Parametric rand_action() in BaseEnv by @matteobettini in #1267
- [BugFix] Parametric collectors by @matteobettini in #1303
- [BugFix] Patch SAC to allow state_dict manipulation before exec by @vmoens in #1607
- [BugFix] PettingZoo seeding by @matteobettini in #1554
- [BugFix] Picklable buffer by @albertbou92 in #1410
- [BugFix] QValue modules and nested action by @matteobettini in #1351
- [BugFix] Reward sum custom key by @matteobettini in #1413
- [BugFix] SafeModule not safely handling specs by @matteobettini in #1352
- [BugFix] Small patches to SMAC by @matteobettini in #1533
- [BugFix] Sparse info in SMACv2 by @matteobettini in #1546
- [BugFix] ToTensorImage unsqueeze would not update the observation spec by @hyerra in #1161
- [BugFix] Torch 1.13 compat by @vmoens in #1294
- [BugFix] Unbreak tensordict import by @vmoens in #1231
- [BugFix] Vectorized priority update in replay buffers by @matteobettini in #1598
- [BugFix] _transpose_time with single dim by @vmoens in #1155
- [BugFix] RewardSum transform for multiple reward keys by @matteobettini in #1544
- [BugFix] step_mdp nested keys by @matteobettini in #1339
- [BugFix] include buffers in policy_weights by @vmoens in #1185
- [BugFix] load_state_dict in param updates for collectors by @vmoens in #1145
- [BugFix] make value estimator with value_key from the PPOLoss init arg by @xmaples in #1144
- [BugFix] unlock in tensordictmodules tests by @vmoens in #1417
- [BugFix] valid_size not saved as attribute by @tcbegley in #1337
Miscellaneous
- Envpool Tests to Nova by @osalpekar in #1283
- Fix CI by @matteobettini in #1368
- Fix MacOS Mujoco Failure by @osalpekar in #1450
- Linux GPU Brax Unittests by @osalpekar in #1133
- Linux Gym Unittests to GHA by @osalpekar in #1139
- Linux Olddeps tests to Nova by @osalpekar in #1289
- Move to More Efficient Windows Runner by @osalpekar in #1476
- OptDeps Tests to Nova by @osalpekar in #1290
- Remove Distributed CCI job by @osalpekar in #1374
- Remove Envpool from CCI by @osalpekar in #1390
- Remove old CircleCI Lint by @osalpekar in #1134
- Removing Migrated and Unused CCI jobs by @osalpekar in #1288
- Revert "[Feature] Single call to value network in advantages" by @vmoens in #1262
- Revert "[Refactor,Performance] Faster collectors" by @vmoens in #1330
- Sklearn test to Nova by @osalpekar in #1291
- Windows Unittests on GHA by @osalpekar in #1086
- [Benchmark,CI] Benchmarks in PR (pre) by @vmoens in #1342
- [Benchmark,CI] Benchmarks in PR by @vmoens in #1341
- [Benchmark] Benchmark Gym vs TorchRL by @vmoens in #1602
- [Benchmark] Benchmark losses by @vmoens in #1287
- [Benchmark] Benchmark number GPU vectorised environments in VMAS (TorchRL vs RLlib) by @matteobettini in #1446
- [Benchmark] Improve benchmark precision + step_mdp + fix GPU by @vmoens in #1340
- [CI] Add macOS M1 binaries Wheels by @DanilBaibak in #1504
- [CI] Add ninja for MacOS builds by @vmoens in #1564
- [CI] Concurrency on gha by @vmoens in #1152
- [CI] Deprecate Windows GPU CCI by @osalpekar in #1387
- [CI] Doc CI fix by @matteobettini in #1384
- [CI] Fix CI PettingZoo by @matteobettini in #1528
- [CI] Fix CI by @vmoens in #1529
- [CI] Fix GHA gpu tests by @vmoens in #1356
- [CI] Fix Jax version in Jumanji by @vmoens in #1242
- [CI] Fix Mujoco version by @vmoens in #1475
- [CI] Fix RoboHive CI by @vmoens in #1541
- [CI] Fix brax and habitat by @vmoens in #1353
- [CI] Fix examples CI by @matteobettini in #1489
- [CI] Fix failing jobs by @vmoens in #1318
- [CI] Fix failing jobs by @vmoens in #1335
- [CI] Fix habitat CI by @vmoens in #1537
- [CI] Fix jumanji by @vmoens in #1566
- [CI] Fix nightly build dependency on tensordict by @vmoens in #1300
- [CI] Fix opt deps machine and docker by @vmoens in #1362
- [CI] Fix tuto deps by @matteobettini in #1416
- [CI] Fix wheels by @vmoens in #1301
- [CI] Less old deps by @vmoens in #1255
- [CI] Less warnings in CI (costs) by @vmoens in #1349
- [CI] Merge Distributed and Linux GPU job by @osalpekar in #1182
- [CI] Migrate examples by @vmoens in #1364
- [CI] Move linux stable to GHA by @vmoens in #1503
- [CI] Reduce CI time by @vmoens in #1226
- [CI] Remove CCI Config by @osalpekar in #1456
- [CI] Remove examples from CCI by @vmoens in #1367
- [CI] Update cuda version by @vmoens in #1380
- [CI] Windows GPU Tests by @osalpekar in #1386
- [Doc] Add link to paper in readme by @giadefa in #1298
- [Doc] Add paper refs in doc and KB by @vmoens in #1241
- [Doc] CITATION.cff by @vmoens in #1229
- [Doc] Do not clean gh-pages by @vmoens in #1150
- [Doc] Fix GPU benchmark by @vmoens in #1151
- [Doc] Fix advantage examples by @vmoens in #1600
- [Doc] Fix default value of tanh_loc in the documentation of TruncatedNormal by @skandermoalla in #1205
- [Doc] Fix doctest examples by @degensean in #1393
- [Doc] Fix exploration modules docstrings by @vmoens in #1326
- [Doc] Fix tanh_loc in docstrings by @vmoens in #1203
- [Doc] TorchRL Logo by @vmoens in #1234
- [Doc] Update citation by @vmoens in #1228
- [Doc] Update coding_ppo.py by @kushaangupta in #1483
- [Doc] correct typos in pendulum tutorial by @kushaangupta in #1502
- [Doc] fixed typos in ppo tutorial by @MatteoGaetzner in #1314
- [Docs] Fix multi-agent tutorial by @matteobettini in #1599
- [Docs] Multi-agent environments by @matteobettini in #1383
- [Example] Multiagent examples: MAPPO-IPPO-MADDPG-IDDPG-IQL-QMIX-VDN by @matteobettini in #1027
- [Fix] Remove loss device by @matteobettini in #1395
- [Lint] Add TorchFix linter by @kit1980 in #1580
- [Minor] Capture error in CatFrame edit by @vmoens in #1498
- [Minor] Fix prints by @vmoens in #1257
- [Minor] Fix typo by @vmoens in #1193
- [Minor] Missing commit from #1488 by @vmoens in #1490
- [Minor] Missing lint by @vmoens in #1556
- [Minor] More efficient SAC v1 by @vmoens in #1507
- [Minor] Remove ya gymnasium deprecation warning in vectorized envs by @vmoens in #1573
- [Minor] small fixes by @vmoens in #1237
- [Nova] Jumanji Tests to GHA by @osalpekar in #1282
- [Nova] Remove windows Unittests from CCI by @osalpekar in #1159
- [Nova] Removing CircleCI Gym Unittests by @osalpekar in #1179
- [Nova] Vmas Tests to GHA by @osalpekar in #1284
- [Quality] Filter out warnings in subprocs by @vmoens in #1552
- [Refacto] Migration due to tensordict 473 and 474 by @vmoens in #1354
- [Refactor,Performance] Faster collectors (bis) by @vmoens in #1331
- [Refactor,Performance] Faster collectors by @vmoens in #1327
- [Refactor] Better GymLikeEnv by @vmoens in #1168
- [Refactor] Better batch-size handling by RBs by @vmoens in #1311
- [Refactor] Better updaters by @vmoens in #1184
- [Refactor] Change objectives parameter/buffer/target logic by @vmoens in #1424
- [Refactor] Edit ppo params by @vmoens in #1322
- [Refactor] Expose all wrappers in torchrl.envs by @vmoens in #1532
- [Refactor] Faster envs (2) by @vmoens in #1457
- [Refactor] Fix imports by @vmoens in #1551
- [Refactor] Follow-up on tensordict PR 473 by @vmoens in #1361
- [Refactor] More unravel fixes by @vmoens in #1357
- [Refactor] Nested reward and done specs by @vmoens in #1115
- [Refactor] Refactor DDPG loss in standalone methods by @vmoens in #1603
- [Refactor] Refactor _reset in ParallelEnv by @vmoens in #1172
- [Refactor] Refactor losses for generalization by @vmoens in #1286
- [Refactor] Remove pkg_resources import by @vmoens in #1379
- [Refactor] Remove private calls to _set by @vmoens in #1370
- [Refactor] Shape ops in LSTM based on tensor shape, not tensordict by @vmoens in #1170
- [Refactor] Use _set_tuple for faster set by @vmoens in #1372
- [Refactor] Use wait instead of is_set to get results in ParallelEnv by @vmoens in #1562
- [Refactor] Use masking in collectors by @vmoens in #1412
- [Refactor] Vmas nested by @matteobettini in #1366
- [Refactor] the usage of tensordict keys in loss modules by @Blonck in #1175
- [Setup] Update setup.py python versions by @vmoens in #1496
- [Test,BugFix] Fix Jax backend tests by @vmoens in #1162
- [Test,CI,Feature] Total time per test by @vmoens in #1232
- [Test] Remove import of test class by @matteobettini in #1549
- [Test] Skip tests in python 3.11 by @vmoens in #1535
- [Test] Skip threading tests in OSX by @vmoens in #1571
- [Test] Test split trajs by @vmoens in #1445
- [Test] Test state_dict and loss modules by @vmoens in #1527
- [Tests] Collector compatibility for heterogeneous environments by @matteobettini in #1414
- [Tests] DDPG extra critic input tests by @matteobettini in #1568
- [Tutorial] Multiagent PPO tutorial by @matteobettini in #1385
- [Versioning] Python 3.11 by @vmoens in #1433
- [Versioning] Use python 3.8 for GPU tests by @vmoens in #1577
- [Versioning] Write version all cases in setup.py by @vmoens in #1579
- d4rl Test to Nova by @osalpekar in #1293
- python 3.11 in README by @vmoens in #1434
New Contributors
- @Blonck made their first contribution in #1142
- @hyerra made their first contribution in #1161
- @skandermoalla made their first contribution in #1205
- @giadefa made their first contribution in #1298
- @MatteoGaetzner made their first contribution in #1314
- @MateuszGuzek made their first contribution in #1240
- @degensean made their first contribution in #1393
- @smorad made their first contribution in #1399
- @kushaangupta made their first contribution in #1483
- @kit1980 made their first contribution in #1580
- @MarkHaoxiang made their first contribution in #1479
- @DanilBaibak made their first contribution in #1504
A big thank-you to all our contributors, in particular (in no particular order) @skandermoalla, @matteobettini, @BY571 and @albertbou92 for their tremendous dedication.
Full Changelog: v0.1.1...v0.2.0