Skip to content

Latest commit

 

History

History
5 lines (3 loc) · 528 Bytes

README.md

File metadata and controls

5 lines (3 loc) · 528 Bytes

q-learning with ocbo

This code applies concepts from OCBO (Offline Contextual Bayesian Optimization) to reinforcement learning by choosing start states for each episode in a "smart" fashion: each episode starts at an "interesting" state that is expected to give high improvement, rather than choosing start states randomly.

The environment is tabular (discrete actions, discrete states), and a customizable Grid-World environment is implemented.