gym-gridverse

Gridworld domains for fully and partially observable reinforcement learning

Features

Customization

GridVerse is highly customizable; while many components are provided out-of-the-box, it is designed such that you can create your own components programmatically, including your own objects, starting states, transition functions, reward functions, observation functions, terminating functions, etc.

The following GridObjects are provided:

  • Floor -- An empty tile.
  • Wall -- An opaque wall.
  • Exit -- An exit tile.
  • Door -- A door which can be opened/closed.
  • Key -- An item to open a locked Door.
  • MovingObstacle -- An obstacle which moves autonomously.
  • Box -- A container of other GridObjects.
  • Telepod -- A teleporting tile.

The following transition functions are provided:

  • move_agent -- Moves the agent.
  • turn_agent -- Turns the agent.
  • pickndrop -- Lets agent pick and/or drop an object.
  • actuate_door -- Opens/closes a Door.
  • actuate_box -- Opens a Box.
  • move_obstacles -- Lets MovingObstacle objects move.
  • teleport -- Teleports the agent across the Telepods.

The following reward functions are provided:

  • reduce_sum -- A sum of other rewards.
  • living_reward -- A constant reward.
  • reach_exit -- A reward for reaching an Exit.
  • overlap -- A reward for standing on/off a GridObject type.
  • proportional_to_distance -- Reward based on distance from a GridObject type.
  • getting_closer -- Rewards for moving closer to/further from a GridObject type.
  • actuate_door -- Rewards for actuating a Door.
  • pickndrop -- Rewards for picking and/or dropping GridObject types.
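
To illustrate how reward functions such as these can be composed, here is a minimal, library-independent sketch. It does not use the gym-gridverse API: the dictionary-based state, the `(state, action, next_state)` signature, and the parameters `reward_on` and `reward_off` are assumptions made for this example only.

```python
from typing import Callable, Sequence

# Hypothetical, simplified types; the real library uses richer objects.
State = dict
Action = str
RewardFunction = Callable[[State, Action, State], float]


def living_reward(reward: float) -> RewardFunction:
    """Build a reward function that pays a constant reward at every step."""

    def f(state: State, action: Action, next_state: State) -> float:
        return reward

    return f


def reach_exit(reward_on: float, reward_off: float) -> RewardFunction:
    """Build a reward function that pays reward_on when the agent is on the exit."""

    def f(state: State, action: Action, next_state: State) -> float:
        on_exit = next_state["agent"] == next_state["exit"]
        return reward_on if on_exit else reward_off

    return f


def reduce_sum(reward_functions: Sequence[RewardFunction]) -> RewardFunction:
    """Build a reward function that sums the outputs of other reward functions."""

    def f(state: State, action: Action, next_state: State) -> float:
        return sum(rf(state, action, next_state) for rf in reward_functions)

    return f


# A step cost of -1 combined with a bonus of +10 for reaching the exit.
reward_function = reduce_sum([living_reward(-1.0), reach_exit(10.0, 0.0)])

before = {"agent": (1, 1), "exit": (1, 2)}
after = {"agent": (1, 2), "exit": (1, 2)}

print(reward_function(before, "move_forward", after))  # 9.0
print(reward_function(before, "turn_left", before))  # -1.0
```

Because each function in this sketch depends only on the transition and never on a step counter, the composed reward is the same whenever the same transition occurs.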

The following observation functions are provided:

  • from_visibility -- Observability determined by custom visibility functions.
  • full_observation -- Observability which is unblocked by Walls.
  • partial_observation -- Observability which is blocked by Walls.
  • raytracing observation -- Observability determined by direct line of sight.
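
The idea of deriving an observation from a custom visibility function can be sketched as follows. This is a toy illustration, not the gym-gridverse API: the list-of-strings grid, the `HIDDEN` placeholder, and the helpers `radius_visibility` and `observe` are all hypothetical names introduced for this example.

```python
from typing import Callable, List, Tuple

Grid = List[List[str]]
Position = Tuple[int, int]  # (row, column)
Mask = List[List[bool]]
VisibilityFunction = Callable[[Grid, Position], Mask]

HIDDEN = "Hidden"  # hypothetical placeholder for cells the agent cannot see


def radius_visibility(radius: int) -> VisibilityFunction:
    """Build a visibility function: cells within `radius` (Chebyshev) are visible."""

    def visibility(grid: Grid, agent: Position) -> Mask:
        return [
            [
                max(abs(r - agent[0]), abs(c - agent[1])) <= radius
                for c in range(len(row))
            ]
            for r, row in enumerate(grid)
        ]

    return visibility


def observe(grid: Grid, agent: Position, visibility: VisibilityFunction) -> Grid:
    """Return a copy of the grid in which every non-visible cell is hidden."""
    mask = visibility(grid, agent)
    return [
        [cell if visible else HIDDEN for cell, visible in zip(row, mask_row)]
        for row, mask_row in zip(grid, mask)
    ]


grid = [
    ["Wall", "Wall", "Wall", "Wall"],
    ["Wall", "Floor", "Floor", "Exit"],
    ["Wall", "Wall", "Wall", "Wall"],
]

# The agent at (1, 1) sees one cell in every direction, so the Exit is hidden.
observation = observe(grid, (1, 1), radius_visibility(1))
for row in observation:
    print(row)
```

Swapping in a different visibility function (for example, one that stops at walls) changes the observation without touching `observe`.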

The following terminating functions are provided:

  • reduce_any -- Terminates if any of the given terminating functions are satisfied.
  • reduce_all -- Terminates if all of the given terminating functions are satisfied.
  • overlap -- Terminates if the agent is standing on a GridObject type.
  • reach_exit -- Terminates if the agent reaches an Exit.
  • bump_moving_obstacle -- Terminates if the agent bumps into a MovingObstacle.
  • bump_into_wall -- Terminates if the agent bumps into a Wall.
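
The `reduce_any` and `reduce_all` combinators amount to a logical OR and AND over other terminating functions. Below is a minimal sketch of that idea under simplifying assumptions: the dictionary-based state and the helpers `standing_on` and `holding` are hypothetical and are not taken from the gym-gridverse API.

```python
from typing import Callable, Sequence

# Hypothetical, simplified types; the real library uses richer objects.
State = dict
Action = str
TerminatingFunction = Callable[[State, Action, State], bool]


def standing_on(tile: str) -> TerminatingFunction:
    """Terminate when the agent stands on the given tile type."""

    def f(state: State, action: Action, next_state: State) -> bool:
        return next_state["agent_tile"] == tile

    return f


def holding(item: str) -> TerminatingFunction:
    """Terminate when the agent holds the given item type."""

    def f(state: State, action: Action, next_state: State) -> bool:
        return next_state["holding"] == item

    return f


def reduce_any(functions: Sequence[TerminatingFunction]) -> TerminatingFunction:
    """Terminate if any of the given terminating functions is satisfied."""

    def f(state: State, action: Action, next_state: State) -> bool:
        return any(tf(state, action, next_state) for tf in functions)

    return f


def reduce_all(functions: Sequence[TerminatingFunction]) -> TerminatingFunction:
    """Terminate only if all of the given terminating functions are satisfied."""

    def f(state: State, action: Action, next_state: State) -> bool:
        return all(tf(state, action, next_state) for tf in functions)

    return f


any_done = reduce_any([standing_on("Exit"), standing_on("MovingObstacle")])
all_done = reduce_all([standing_on("Exit"), holding("Key")])

start = {"agent_tile": "Floor", "holding": None}
on_exit = {"agent_tile": "Exit", "holding": None}
on_exit_with_key = {"agent_tile": "Exit", "holding": "Key"}

print(any_done(start, "move_forward", on_exit))  # True
print(all_done(start, "move_forward", on_exit))  # False
print(all_done(start, "move_forward", on_exit_with_key))  # True
```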

YAML Configuration Files

Aside from being able to define your own environments programmatically, GridVerse allows you to create and share YAML configuration files which fully describe the components which define an environment. This is a convenient way to create an environment made of existing components and share it with others. The yaml/ folder contains a number of environments defined using the YAML configuration format.

Suitable for Fully/Partially Observable Control Problems for Learning/Planning

Depending on your research interests, most GridVerse components can be used to form either fully observable or partially observable control problems. Further, GridVerse environments provide both a stateful and a functional interface, depending on whether you are addressing learning or planning problems.
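
The difference between a stateful and a functional interface can be sketched with a toy corridor environment. Everything here is an assumption for illustration, including the names `functional_reset`, `functional_step`, and `StatefulEnv` and their return values; consult the documentation for the actual gym-gridverse interface.

```python
from dataclasses import dataclass
from typing import Tuple

GOAL = 3  # rightmost cell of a toy 1D corridor
ACTIONS = {"left": -1, "right": +1}


@dataclass(frozen=True)
class State:
    position: int


def functional_reset() -> State:
    """Return a fresh initial state."""
    return State(position=0)


def functional_step(state: State, action: str) -> Tuple[State, float, bool]:
    """Pure transition: the input state is never modified."""
    position = min(max(state.position + ACTIONS[action], 0), GOAL)
    done = position == GOAL
    reward = 1.0 if done else 0.0
    return State(position), reward, done


class StatefulEnv:
    """Stateful wrapper: keeps track of the current state internally."""

    def __init__(self) -> None:
        self.state = functional_reset()

    def reset(self) -> None:
        self.state = functional_reset()

    def step(self, action: str) -> Tuple[float, bool]:
        self.state, reward, done = functional_step(self.state, action)
        return reward, done


# Functional use (planning): query any state, any number of times.
start = functional_reset()
next_state, first_reward, first_done = functional_step(start, "right")
print(start, next_state)  # start is unchanged

# Stateful use (learning): the environment advances as the agent acts.
env = StatefulEnv()
for _ in range(GOAL):
    last_reward, last_done = env.step("right")
print(env.state, last_reward, last_done)
```

In this sketch the stateful class is a thin layer over the functional one, so both interfaces are guaranteed to share the same dynamics.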

Future work / in progress:

  • 100% test coverage
  • Multi-agent support
  • Benchmark performance of reinforcement learning and planning algorithms

Examples

yaml/gv_crossing.7x7.yaml

State

gv_crossing.7x7.state.gif

Observations

gv_crossing.7x7.observation.montage.gif

yaml/gv_dynamic_obstacles.7x7.yaml

State

gv_dynamic_obstacles.7x7.state.gif

Observations

gv_dynamic_obstacles.7x7.observation.montage.gif

yaml/gv_empty.8x8.yaml

State

gv_empty.8x8.state.gif

Observations

gv_empty.8x8.observation.montage.gif

yaml/gv_four_rooms.9x9.yaml

State

gv_four_rooms.9x9.state.gif

Observations

gv_four_rooms.9x9.observation.montage.gif

yaml/gv_keydoor.5x5.yaml

State

gv_keydoor.5x5.state.gif

Observations

gv_keydoor.5x5.observation.montage.gif

yaml/gv_nine_rooms.13x13.yaml

State

gv_nine_rooms.13x13.state.gif

Observations

gv_nine_rooms.13x13.observation.montage.gif

yaml/gv_teleport.7x7.yaml

State

gv_teleport.7x7.state.gif

Observations

gv_teleport.7x7.observation.montage.gif

Similar Projects

The GridVerse project takes heavy inspiration from MiniGrid, and was designed to address a few shortcomings which limited our ability to use it fully:

Customization and Configurability
Our design philosophy is primarily based on user customization. We provide interfaces for you to design your own objects, state dynamics, reward functions, observability, etc. We also provide a YAML-based configuration format which allows you to conveniently share environments with others.
Time-Invariant Reward Functions
Our reward functions satisfy the formal time-invariance property of Markov decision processes.
Full Observability
We provide a full observability interface which exposes the true state, satisfying the formal full-observability property of Markov decision processes.
Functional Interface
We provide a functional interface which enables the use of planning methods, e.g., MCTS, POMCP.
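
To show why a functional interface matters for planning, the sketch below runs a depth-limited exhaustive lookahead, which is a much simpler stand-in for MCTS or POMCP. The toy corridor dynamics and the names `functional_step`, `lookahead_value`, and `plan` are hypothetical and do not come from the gym-gridverse API.

```python
from typing import Tuple

State = int  # agent position on a toy 1D corridor
ACTIONS = ("left", "right")
GOAL = 3


def functional_step(state: State, action: str) -> Tuple[State, float, bool]:
    """Pure transition function: same inputs always give the same outputs."""
    delta = 1 if action == "right" else -1
    next_state = min(max(state + delta, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def lookahead_value(state: State, depth: int, discount: float) -> float:
    """Best discounted return achievable from `state` within `depth` steps."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in ACTIONS:
        next_state, reward, done = functional_step(state, action)
        future = 0.0 if done else lookahead_value(next_state, depth - 1, discount)
        best = max(best, reward + discount * future)
    return best


def plan(state: State, depth: int, discount: float = 0.9) -> str:
    """Pick the action with the highest depth-limited lookahead value."""

    def action_value(action: str) -> float:
        next_state, reward, done = functional_step(state, action)
        future = 0.0 if done else lookahead_value(next_state, depth - 1, discount)
        return reward + discount * future

    return max(ACTIONS, key=action_value)


# The planner simulates hypothetical futures without touching a live environment.
print(plan(0, depth=3))  # right
```

The planner calls `functional_step` on many hypothetical states; with only a stateful `step`, each of those queries would require copying or resetting the environment.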

MiniWorld is a 3D variant similar to MiniGrid by the same authors.

While GridVerse provides functionality which we found useful and/or necessary for our needs, each project provides something unique compared to the others, e.g., MiniGrid includes tasks which involve natural language comprehension, and MiniWorld incorporates a third dimension. Make sure to browse all projects to get a clearer picture of which one best suits your needs.

Project Comparison
  GridVerse MiniGrid MiniWorld
2D Environments  
3D Environments    
Partial Observability
Full Observability [1]  
RGB Observability  
Natural Language Tasks    
Customizable  
YAML-Configurable    
[1] While MiniGrid provides FullyObsWrapper, which extends the agent's observation range, it does not represent true full-state observability.

Citation

If you use gym-gridverse, please cite it:

@misc{baisero2021gym-gridverse,
    author = {Andrea Baisero and Sammie Katt and Christopher Amato},
    title = {gym-gridverse: Gridworld domains for fully and partially observable reinforcement learning},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/abaisero/gym-gridverse}},
}

Credits

This package was inspired by MiniGrid, and created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.