| Field | Value |
|---|---|
| title | Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions |
| section | Poster |
| openreview | 0I3su3mkuL |
| abstract | In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large diverse real-world robotic manipulation task suite. |
| layout | inproceedings |
| series | Proceedings of Machine Learning Research |
| publisher | PMLR |
| issn | 2640-3498 |
| id | chebotar23a |
| month | 0 |
| tex_title | Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions |
| firstpage | 3909 |
| lastpage | 3928 |
| page | 3909-3928 |
| order | 3909 |
| cycles | false |
| bibtex_author | Chebotar, Yevgen and Vuong, Quan and Hausman, Karol and Xia, Fei and Lu, Yao and Irpan, Alex and Kumar, Aviral and Yu, Tianhe and Herzog, Alexander and Pertsch, Karl and Gopalakrishnan, Keerthana and Ibarz, Julian and Nachum, Ofir and Sontakke, Sumedh Anand and Salazar, Grecia and Tran, Huong T. and Peralta, Jodilyn and Tan, Clayton and Manjunath, Deeksha and Singh, Jaspiar and Zitkovich, Brianna and Jackson, Tomas and Rao, Kanishka and Finn, Chelsea and Levine, Sergey |
| author | |
| date | 2023-12-02 |
| address | |
| container-title | Proceedings of The 7th Conference on Robot Learning |
| volume | 229 |
| genre | inproceedings |
| issued | |
| extras | |
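The abstract describes discretizing each action dimension and treating the per-dimension Q-values as separate tokens that a Transformer predicts autoregressively. The sketch below illustrates that tokenization idea only (not the offline TD training loop), assuming a PyTorch-style setup; the module names, layer sizes, bin count, and greedy decoding loop are all illustrative assumptions, not the authors' released Q-Transformer implementation.

```python
# Illustrative sketch of per-dimension action tokenization with autoregressive
# Q-value prediction, as described in the abstract. All sizes and names below
# are assumptions for illustration; this is NOT the authors' implementation.
import torch
import torch.nn as nn

NUM_ACTION_DIMS = 4   # assumed: e.g. a 4-DoF robot action
NUM_BINS = 256        # assumed: discretization bins per action dimension
EMBED_DIM = 64        # assumed model width

class PerDimensionQModel(nn.Module):
    """Predicts Q-values over the discretized bins of the next action
    dimension, conditioned on a state embedding and the action tokens
    already chosen for earlier dimensions."""
    def __init__(self):
        super().__init__()
        self.token_embed = nn.Embedding(NUM_BINS, EMBED_DIM)
        layer = nn.TransformerEncoderLayer(
            d_model=EMBED_DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.q_head = nn.Linear(EMBED_DIM, NUM_BINS)

    def forward(self, state_emb, prev_action_tokens):
        # state_emb: (batch, EMBED_DIM); prev_action_tokens: (batch, t)
        tokens = self.token_embed(prev_action_tokens)           # (batch, t, D)
        seq = torch.cat([state_emb.unsqueeze(1), tokens], dim=1)
        hidden = self.encoder(seq)                              # (batch, t+1, D)
        return self.q_head(hidden[:, -1])                       # (batch, NUM_BINS)

def greedy_action(model, state_emb):
    """Maximize Q one action dimension at a time, autoregressively."""
    batch = state_emb.shape[0]
    chosen = torch.zeros(batch, 0, dtype=torch.long)
    for _ in range(NUM_ACTION_DIMS):
        q_values = model(state_emb, chosen)          # Q over bins for this dim
        best_bin = q_values.argmax(dim=-1, keepdim=True)
        chosen = torch.cat([chosen, best_bin], dim=1)
    return chosen                                    # (batch, NUM_ACTION_DIMS)

if __name__ == "__main__":
    model = PerDimensionQModel()
    state = torch.randn(2, EMBED_DIM)   # stand-in for an encoded observation
    print(greedy_action(model, state))  # discretized action tokens
```

Decoding dimension by dimension keeps the maximization over the action space tractable: each step is an argmax over NUM_BINS values rather than a joint search over NUM_BINS to the power of NUM_ACTION_DIMS combinations.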