2024-06-11-kolev24a.md

File metadata and controls

59 lines (59 loc) · 2.44 KB
---
title: Efficient imitation learning with conservative world models
abstract: We tackle the problem of policy learning from expert demonstrations without a reward function. A central challenge in this space is that these policies fail upon deployment due to issues of distributional shift, environment stochasticity, or compounding errors. Adversarial imitation learning alleviates this issue but requires additional on-policy training samples for stability, which presents a challenge in realistic domains due to inefficient learning and high sample complexity. One approach to this issue is to learn a world model of the environment and use synthetic data for policy training. While successful in prior works, we argue that this is sub-optimal due to additional distribution shifts between the learned model and the real environment. Instead, we re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one. Drawing theoretical connections to offline RL and fine-tuning algorithms, we argue that standard online world model algorithms are not well suited to the imitation learning problem. We derive a principled conservative optimization bound and demonstrate empirically that it leads to improved performance on two very challenging manipulation environments from high-dimensional raw pixel observations. We achieve state-of-the-art performance on the Franka Kitchen environment from images, requiring only 10 demos and no reward labels, as well as solving a complex dexterous manipulation task.
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: kolev24a
month: 0
tex_title: Efficient imitation learning with conservative world models
firstpage: 1777
lastpage: 1790
page: 1777-1790
order: 1777
cycles: false
bibtex_author: Kolev, Victor and Rafailov, Rafael and Hatch, Kyle and Wu, Jiajun and Finn, Chelsea
author:
- given: Victor
  family: Kolev
- given: Rafael
  family: Rafailov
- given: Kyle
  family: Hatch
- given: Jiajun
  family: Wu
- given: Chelsea
  family: Finn
date: 2024-06-11
address:
container-title: Proceedings of the 6th Annual Learning for Dynamics & Control Conference
volume: 242
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 6
  - 11
pdf:
extras:
---