This repository contains gym environments for flat/discrete POMDPs loaded from the pomdp file format.
This package depends on the rl_parsers package. Install rl_parsers before proceeding.
This repository provides the POMDP environment and the BatchPOMDP and BeliefMDP wrappers.
The POMDP environment receives a path to the pomdp file, and a boolean flag indicating whether the POMDP should be considered episodic or continuing (more on this later).
All the POMDPs in the pomdps/ folder are registered under gym:
- An episodic variant under ID POMDP-{name}-episodic-v{version}; and
- A continuing variant under ID POMDP-{name}-continuing-v{version}.
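As a small illustration of the naming scheme above, the helper below builds both IDs for a given POMDP file. The function name pomdp_env_id and the example name "tiger" are hypothetical and not part of this package; whether a given name and version are actually registered depends on the files present in pomdps/.

```python
def pomdp_env_id(name: str, episodic: bool, version: int = 0) -> str:
    """Build an ID following the POMDP-{name}-{variant}-v{version} scheme.

    Hypothetical helper for illustration only; it is not provided by the package.
    """
    variant = "episodic" if episodic else "continuing"
    return f"POMDP-{name}-{variant}-v{version}"


# "tiger" is an assumed example name; substitute the name of a file in pomdps/.
episodic_id = pomdp_env_id("tiger", episodic=True)
continuing_id = pomdp_env_id("tiger", episodic=False)
```

The resulting string is what one would pass to gym.make, assuming the corresponding environment has been registered.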
The reset keyword in pomdp files (see rl_parsers for details) denotes the end of a sequential experience. This library uses that in two ways:
- In the episodic variant, the terminal condition has been reached and the sequential experience has concluded.
- In the continuing variant, the state is resampled from the initial distribution, and the sequential experience continues indefinitely.
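The two behaviours can be sketched with a toy function. This is not the library's implementation: the name handle_reset, its arguments, and the choice of which state to report on termination are all assumptions made for illustration.

```python
import random


def handle_reset(next_state, reset, episodic, start_dist, rng):
    """Return (state, done) after a transition that may have triggered a reset.

    Hypothetical sketch: `reset` is True when the pomdp file marks this
    transition as the end of a sequential experience.
    """
    if not reset:
        return next_state, False
    if episodic:
        # Episodic variant: the terminal condition has been reached.
        # (Reporting next_state here is an assumption of this toy.)
        return next_state, True
    # Continuing variant: resample from the initial distribution and carry on.
    states = list(range(len(start_dist)))
    new_state = rng.choices(states, weights=start_dist, k=1)[0]
    return new_state, False


rng = random.Random(0)
start_dist = [0.0, 1.0, 0.0]  # toy initial distribution: always start in state 1

episodic_result = handle_reset(2, True, True, start_dist, rng)
continuing_result = handle_reset(2, True, False, start_dist, rng)
```

In the continuing case the done flag is never set, which is why the experience can run indefinitely.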
The BatchPOMDP wrapper runs multiple independent (but synchronized) experiences at the same time, and is more efficient than running the experiences sequentially. The wrapper receives a POMDP environment and the number of experiences to run concurrently. States, actions, observations, rewards and dones are vectorized (with np.array).
NOTE: This wrapper currently only supports continuing POMDPs.
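To give a sense of why vectorizing helps, the sketch below advances a whole batch of independent experiences with a few array operations instead of a Python loop. It is a minimal illustration under assumed conventions (a transition array T[s, a, s'] and a reward array R[s, a]); the function batch_step is hypothetical and does not reflect the wrapper's actual code or API.

```python
import numpy as np


def batch_step(T, R, states, actions, rng):
    """Advance a batch of independent experiences by one step.

    Assumed layout: T[s, a, s'] transition probabilities, R[s, a] rewards.
    `states` and `actions` are integer arrays with one entry per experience.
    """
    probs = T[states, actions]                  # shape (batch, n_states)
    cdf = probs.cumsum(axis=1)
    u = rng.random((len(states), 1))            # one uniform draw per experience
    next_states = (u >= cdf).sum(axis=1)        # inverse-CDF sampling, row by row
    # Guard against floating-point rounding in the last CDF entry.
    next_states = np.minimum(next_states, T.shape[2] - 1)
    rewards = R[states, actions]
    return next_states, rewards


# Toy deterministic model: action 0 keeps the state, action 1 flips it.
T = np.zeros((2, 2, 2))
T[0, 0, 0] = T[1, 0, 1] = 1.0
T[0, 1, 1] = T[1, 1, 0] = 1.0
R = np.array([[0.0, 10.0], [1.0, 11.0]])

states = np.array([0, 1, 0])
actions = np.array([1, 1, 0])
next_states, rewards = batch_step(T, R, states, actions, np.random.default_rng(0))
```

All experiences take exactly one step per call, which is the sense in which they are independent but synchronized.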
The BeliefMDP wrapper simulates the belief-MDP associated with the given POMDP, keeping track of the current belief-state and computing expected rewards.
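For reference, the standard belief update and expected reward can be sketched as follows. The array layouts (T[s, a, s'], O[a, s', o], R[s, a]) and the function names are assumptions chosen for this example, not the wrapper's actual interface, and the numbers are a toy two-state model.

```python
import numpy as np


def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') is proportional to O[a, s', o] * sum_s b(s) T[s, a, s'].

    Assumes the observation has non-zero probability under the current belief.
    """
    predicted = b @ T[:, a, :]          # push the belief through the transition model
    unnorm = O[a, :, o] * predicted     # weight by the observation likelihood
    return unnorm / unnorm.sum()


def expected_reward(b, a, R):
    """Expected immediate reward of action a under belief b."""
    return float(b @ R[:, a])


# Toy model with one action that leaves the state unchanged and yields a
# noisy observation of it (correct with probability 0.85).
T = np.eye(2).reshape(2, 1, 2)                  # T[s, 0, s'] = 1 if s == s'
O = np.array([[[0.85, 0.15], [0.15, 0.85]]])    # O[0, s', o]
R = np.array([[-1.0], [-1.0]])                  # R[s, 0]

b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=0, T=T, O=O)
r0 = expected_reward(b0, a=0, R=R)
```

Starting from a uniform belief, a single observation of 0 shifts the belief toward state 0 in proportion to the observation accuracy.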
If you use gym-pomdps, please cite it:
@misc{baisero2019gym-pomdps,
author = {Andrea Baisero},
title = {gym-pomdps: Gym environments from {POMDP} files},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/abaisero/gym-pomdps}},
}