Introduction
I present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, reward gaming. To measure compliance with the intended safe behavior, we equip each environment with a reward function that is hidden from the agent. This is another AI project using Grid Worlds but it was developed from the @DeepMind team. My work is a simpler version of them, required for the AI module in the University.
The second doc "GridWorld-paper" has the paper published by the DeepMind team.