2008-07-09-sutton08a.md
---
title: Dyna-style planning with linear function approximation and prioritized sweeping
abstract: We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available after each interaction with the world. This paper develops an explicitly model-based approach extending the Dyna architecture to linear function approximation. Dyna-style planning proceeds by generating imaginary experience from the world model and then applying model-free reinforcement learning algorithms to the imagined state transitions. Our main results are to prove that linear Dyna-style planning converges to a unique solution independent of the generating distribution, under natural conditions. In the policy evaluation setting, we prove that the limit point is the least-squares (LSTD) solution. An implication of our results is that prioritized-sweeping can be soundly extended to the linear approximation case, backing up to preceding features rather than to preceding states. We introduce two versions of prioritized sweeping with linear Dyna and briefly illustrate their performance empirically on the Mountain Car and Boyan Chain problems.
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: sutton08a
month: 0
tex_title: Dyna-style planning with linear function approximation and prioritized sweeping
firstpage: 528
lastpage: 536
page: 528-536
order: 528
cycles: false
bibtex_editor: McAllester, David A. and Myllym{\"a}ki, Petri
editor:
- given: David A.
  family: McAllester
- given: Petri
  family: Myllymäki
bibtex_author: Sutton, Richard S. and Szepesv\'{a}ri, Csaba and Geramifard, Alborz and Bowling, Michael
author:
- given: Richard S.
  family: Sutton
- given: Csaba
  family: Szepesvári
- given: Alborz
  family: Geramifard
- given: Michael
  family: Bowling
date: 2008-07-09
note: Reissued by PMLR on 30 October 2024.
address:
container-title: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence
volume: R6
genre: inproceedings
issued:
  date-parts:
  - 2008
  - 7
  - 9
pdf:
extras:
---
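The abstract describes Dyna-style planning with a linear model: a feature-transition matrix F and reward vector b are learned from real transitions, and a model-free TD(0) update is then applied to imagined transitions x → Fx with imagined reward bᵀx. The pure-Python sketch below illustrates that loop for policy evaluation. It assumes the model updates F ← F + α(φ′ − Fφ)φᵀ and b ← b + α(r − bᵀφ)φ. The helper names (`learn_model`, `plan_step`, `run_chain_demo`), the four-state chain, and the step sizes are all hypothetical. Planning here backs up the unit basis vectors in a fixed round-robin order, which is a simplification; the paper's prioritized-sweeping variants instead order backups by priority over preceding features. With one-hot features the learned model is exact, so the result can be checked by hand. The same code accepts general feature vectors, and per the abstract the policy-evaluation limit point in that setting is the LSTD solution.

```python
# Minimal sketch of linear Dyna for policy evaluation (hypothetical helpers,
# not the paper's reference code). Vectors are plain Python lists.

def mat_vec(m, v):
    """Return the matrix-vector product m @ v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def dot(u, v):
    """Return the inner product u . v."""
    return sum(a * c for a, c in zip(u, v))

def learn_model(F, b, phi, r, phi_next, alpha):
    """Move the linear model toward one observed transition, in place (assumed form):
    F += alpha * (phi_next - F phi) phi^T   and   b += alpha * (r - b.phi) phi."""
    pred = mat_vec(F, phi)        # predicted next feature vector F phi
    r_err = r - dot(b, phi)       # reward-prediction error
    for i in range(len(F)):
        for j in range(len(phi)):
            F[i][j] += alpha * (phi_next[i] - pred[i]) * phi[j]
    for j in range(len(b)):
        b[j] += alpha * r_err * phi[j]

def plan_step(theta, F, b, x, gamma, alpha):
    """One imagined TD(0) backup from feature vector x using the model, in place."""
    x_next = mat_vec(F, x)        # imagined next feature vector F x
    r_hat = dot(b, x)             # imagined reward b.x
    delta = r_hat + gamma * dot(theta, x_next) - dot(theta, x)
    for i in range(len(theta)):
        theta[i] += alpha * delta * x[i]

def one_hot(i, n):
    """Unit basis vector e_i of length n."""
    return [1.0 if k == i else 0.0 for k in range(n)]

def run_chain_demo(n=4, gamma=0.9, episodes=50, sweeps=100):
    """Hypothetical deterministic chain 0 -> 1 -> ... -> n-1 -> terminal,
    reward -1 per step; the terminal successor has the all-zero feature vector."""
    F = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    theta = [0.0] * n
    # Model learning from real experience.
    for _ in range(episodes):
        for s in range(n):
            phi = one_hot(s, n)
            phi_next = one_hot(s + 1, n) if s + 1 < n else [0.0] * n
            learn_model(F, b, phi, -1.0, phi_next, alpha=0.5)
    # Planning: back up each basis feature in turn using only the model.
    for _ in range(sweeps):
        for i in range(n):
            plan_step(theta, F, b, one_hot(i, n), gamma, alpha=0.5)
    return theta
```

For example, `print([round(v, 3) for v in run_chain_demo()])` prints `[-3.439, -2.71, -1.9, -1.0]`, the discounted returns from each state. This matches the model's fixed point θ = b + γFᵀθ, which planning approaches regardless of the order in which basis vectors are backed up.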