From 3ad4c4c8a926ff5cba388f5c8ae5bfb4864df681 Mon Sep 17 00:00:00 2001
From: mrsamsami
Date: Tue, 8 Nov 2022 11:33:14 -0500
Subject: [PATCH 1/7] Add env tutorial

---
 notebooks/env.ipynb | 683 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 683 insertions(+)
 create mode 100644 notebooks/env.ipynb

diff --git a/notebooks/env.ipynb b/notebooks/env.ipynb
new file mode 100644
index 00000000..ae5e30cf
--- /dev/null
+++ b/notebooks/env.ipynb
@@ -0,0 +1,683 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ },
+ "accelerator": "GPU"
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# About the tutorial\n",
+ "Datasets are essential in both supervised and unsupervised machine learning settings. In a typical reinforcement learning (RL) setting, the agent must interact with the environment in order to collect data for learning. Thus, environments serve a similar function in RL to the one datasets serve in supervised and unsupervised learning. In this tutorial, we explain how to use RLHive environments. Note that this tutorial covers single-agent environments."
+ ],
+ "metadata": {
+ "id": "TiiOCUnwDlXB"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qy1F1g74qfo_"
+ },
+ "source": [
+ "# Introduction and Setup"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### What is RLHive and how to install it"
+ ],
+ "metadata": {
+ "id": "EJutnqZVjNTu"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xi-qOSphrCH7"
+ },
+ "source": [
+ "RLHive is a framework designed to facilitate research in RL. It provides the components necessary to run a full RL experiment, for both single-agent and multi-agent environments. It is designed to be readable and easily extensible, allowing users to quickly run and experiment with their own ideas. For installation, you can check [this notebook](https://colab.research.google.com/drive/11YirxgoVD7gjN02TdAeyFXOL1qH7Eydv?usp=sharing)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "JWmMKQBPFoO8"
+ },
+ "outputs": [],
+ "source": [
+ "!pip install ruamel.yaml\n",
+ "!pip install pyglet\n",
+ "!pip install git+https://github.com/chandar-lab/RLHive.git@dev\n",
+ "\n",
+ "!python -m pip install pyvirtualdisplay\n",
+ "!pip3 install box2d\n",
+ "!sudo apt-get install xvfb"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import torch\n",
+ "import hive\n",
+ "from hive.utils.registry import registry\n",
+ "from hive.envs.base import BaseEnv\n",
+ "from hive.envs.gym_env import GymEnv\n",
+ "from hive.envs.env_spec import EnvSpec\n",
+ "from ruamel import yaml\n",
+ "import sys\n",
+ "import os\n",
+ "import os.path\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "from pyvirtualdisplay import Display\n",
+ "%matplotlib inline"
+ ],
+ "metadata": {
+ "id": "VKastN5fSqsP"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### How to install environments\n",
+ "\n",
+ "RLHive currently supports the following environments:\n",
+ "\n",
+ "\n",
+ "\n",
+ "* Gym classic control\n",
+ "* Atari\n",
+ "* Minatar (simplified Atari)\n",
+ "* Minigrid (single-agent grid world)\n",
+ "* Marlgrid (multi-agent)\n",
+ "* Pettingzoo (multi-agent)\n",
+ "\n",
+ "To install Gym, you can simply run `pip install gym==0.21.0`. You can also install dependencies necessary for the environments that RLHive comes with by running `pip install rlhive[<env_names>]` where `<env_names>` is a comma-separated list made up of `atari`, `gym_minigrid`, and `pettingzoo`.\n",
+ "\n",
+ "Minatar and Marlgrid are also supported, but must be installed separately.\n",
+ "\n",
+ "* To install Minatar, run `pip install MinAtar@git+https://github.com/kenjyoung/MinAtar.git@8b39a18a60248ede15ce70142b557f3897c4e1eb`\n",
+ "* To install Marlgrid, run `pip install marlgrid@https://github.com/kandouss/marlgrid/archive/refs/heads/master.zip`"
+ ],
+ "metadata": {
+ "id": "eGZyL2zzGKEt"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "!pip install rlhive[gym_minigrid]\n",
+ "!pip install gym==0.21.0"
+ ],
+ "metadata": {
+ "id": "9-L5w2ubGJVC"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import gym\n",
+ "import gym_minigrid\n",
+ "from gym_minigrid.wrappers import ReseedWrapper\n",
+ "from gym.spaces.discrete import Discrete"
+ ],
+ "metadata": {
+ "id": "GfL7tFeASz2L"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Creating environments"
+ ],
+ "metadata": {
+ "id": "Bjn9nUcSnvPa"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Every environment used in RLHive should be a subclass of `hive.envs.base.BaseEnv`. It should provide a `reset` function that resets the environment to a new episode and returns a tuple of `(observation, turn)` and a `step` function that takes in an action, performs the step in the environment, and returns a tuple of `(observation, reward, done, turn, info)`. All these values correspond to their canonical meanings, and `turn` corresponds to the index of the agent whose turn it is (in multi-agent environments).\n",
+ "\n",
+ "The `reward` return value can be a single number, an array, or a dictionary. If it’s a number, then that same reward will be given to every single agent. If it’s an array, the agents get the reward corresponding to their index in the runner. 
If it’s a dictionary, the keys should be the agent ids, and the value the reward for that agent.\n"
+ ],
+ "metadata": {
+ "id": "37-E7egZn8oc"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### `GymEnv`"
+ ],
+ "metadata": {
+ "id": "5ZuasJNftrIp"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The [OpenAI gym](https://www.gymlibrary.dev/), which provides a flexible manner of designing environments, initializing them, and interacting with them, has become well known among RL researchers.\n",
+ "\n",
+ "If your environment is a gym environment, and you do not need to preprocess the observations generated by the environment, then you can directly use `hive.envs.gym_env.GymEnv`."
+ ],
+ "metadata": {
+ "id": "Sjhcfhu9twX4"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "env = GymEnv(\"CartPole-v0\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "-6K303tyS6Fv",
+ "outputId": "1afe1892-21fc-4692-e0b3-c387c6ae136e"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.7/dist-packages/gym/envs/registration.py:594: UserWarning: \u001b[33mWARN: The environment CartPole-v0 is out of date. You should consider upgrading to version `v1`.\u001b[0m\n",
+ "/usr/local/lib/python3.7/dist-packages/gym/core.py:318: DeprecationWarning: \u001b[33mWARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.\u001b[0m\n",
+ " def reset(self, **kwargs):\n",
+ "/usr/local/lib/python3.7/dist-packages/gym/wrappers/step_api_compatibility.py:40: DeprecationWarning: \u001b[33mWARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.\u001b[0m\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### `EnvSpec`\n",
+ "\n",
+ "Each environment should also provide an `EnvSpec` object that specifies the observation and action spaces. These should be lists with one element for each agent. The agent uses this information to build its networks according to the provided format of valid actions and observations.\n",
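+ "\n",
+ "As an illustrative sketch (the environment name and spaces below are made up, not part of RLHive), an `EnvSpec` can be constructed directly:\n",
+ "\n",
+ "```python\n",
+ "from gym.spaces import Box, Discrete\n",
+ "\n",
+ "# Hypothetical single-agent spec: 4-dimensional observations, 2 discrete actions\n",
+ "spec = EnvSpec(\n",
+ "    env_name=\"MyEnv\",\n",
+ "    observation_space=[Box(-1.0, 1.0, (4,))],\n",
+ "    action_space=[Discrete(2)],\n",
+ ")\n",
+ "```"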
+ ],
+ "metadata": {
+ "id": "FdppbbQfoVRg"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "env_spec = env.env_spec\n",
+ "obs_spec, act_spec = env_spec.observation_space[0], env_spec.action_space[0]\n",
+ "print(\"Environment name : \\n\", env_spec.env_name)\n",
+ "print(\"Environment observation space: \\n\", obs_spec)\n",
+ "print(\"Environment action space: \\n\", act_spec)\n",
+ "print(\"Environment info: \\n\", env_spec.env_info)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "GlUB4vYSNfB8",
+ "outputId": "7c7dfac5-0fd1-486f-e889-452f80ce032e"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Environment name : \n",
+ " CartPole-v0\n",
+ "Environment observation space: \n",
+ " Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)\n",
+ "Environment action space: \n",
+ " Discrete(2)\n",
+ "Environment info: \n",
+ " {}\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Environment basic methods"
+ ],
+ "metadata": {
+ "id": "ujC05sQURorb"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "To work with any environment, we first set the `seed` (to ensure that the code is reproducible), `reset` the environment to a new initial state, and then use `step` to perform the specified action and return the updated information collected from the environment. Moreover, since rendering is important for image-based environments, you can use the `render` function. Finally, when we're done with the environment, we can `close` it."
+ ],
+ "metadata": {
+ "id": "drzzPq_ISBr7"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "env.seed(42)\n",
+ "obs, turn = env.reset()\n",
+ "print(\"Environment initial observation : \\n\", obs)\n",
+ "print(\"Environment initial turn: \\n\", turn)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "mhnAQ_63R2Am",
+ "outputId": "55317676-57ae-4cba-bf5b-b9441a5dfb13"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Environment initial observation : \n",
+ " [ 0.0273956 -0.00611216 0.03585979 0.0197368 ]\n",
+ "Environment initial turn: \n",
+ " 0\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.7/dist-packages/gym/utils/passive_env_checker.py:175: UserWarning: \u001b[33mWARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.\u001b[0m\n",
+ "/usr/local/lib/python3.7/dist-packages/gym/utils/passive_env_checker.py:191: UserWarning: \u001b[33mWARN: Future gym versions will require that `Env.reset` can be passed `return_info` to return information from the environment resetting.\u001b[0m\n",
+ "/usr/local/lib/python3.7/dist-packages/gym/utils/passive_env_checker.py:196: UserWarning: \u001b[33mWARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.\u001b[0m\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The `turn` indicates the agent ID, which is 0 in the case of a single-agent setting.\n",
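+ "\n",
+ "For intuition, a multi-agent interaction loop might look like the following sketch (`agents` here is an assumed list of agent objects, not defined in this notebook):\n",
+ "\n",
+ "```python\n",
+ "obs, turn = env.reset()\n",
+ "for _ in range(100):\n",
+ "    action = agents[turn].act(obs)  # let the agent whose turn it is choose the action\n",
+ "    obs, reward, done, turn, info = env.step(action)\n",
+ "    if done:\n",
+ "        obs, turn = env.reset()\n",
+ "```"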
+ ], + "metadata": { + "id": "a4JeyDM7UwaQ" + } + }, + { + "cell_type": "code", + "source": [ + "# This cell is just to show the animation in the notebook\n", + "\n", + "display = Display(visible=0, size=(1400, 900))\n", + "display.start()\n", + "\n", + "is_ipython = 'inline' in plt.get_backend()\n", + "if is_ipython:\n", + " from IPython import display\n", + "\n", + "plt.ion()" + ], + "metadata": { + "id": "I3xMRmjKkOM1" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "num_steps = 100\n", + "img = plt.imshow(env.render())\n", + "\n", + "for t in range(num_steps):\n", + " obs, reward, done, turn, info = env.step(act_spec.sample()) # Random policy\n", + " img.set_data(env.render()) \n", + " plt.axis('off')\n", + " display.display(plt.gcf())\n", + " display.clear_output(wait=True)\n", + " if done:\n", + " obs, turn = env.reset()\n", + "\n", + "env.close()" + ], + "metadata": { + "id": "PAde34AbUiLG", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 248 + }, + "outputId": "71f519e9-c6f2-456e-d776-35b6ed6ac59c" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAVQAAADnCAYAAABBu67aAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAHHklEQVR4nO3dTW9cBxXH4TMvtpPGqZt2aI2haQlISQQUiQ1CQCRWXXZV1nyC7voNkLroMuuAEF+ghQ1SdxUVRFAJIdGqQCv1JWmoYzt2XurYmRkWFU0nMzhx+Nd3pn2enc+MNWdx9dP1+N6Z1nA4LAD+f+2mFwD4ohBUgBBBBQgRVIAQQQUI6d7lcZcAAIxrTRo6QwUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCOk2vQDcj4t/+V1d/+idkdmhYyv1+A+fbWgjEFRm1I2192rrgzdGZoNbOw1tA5/wJz9AiKAyc3auX6mbW5fH5ovL32pgG7hNUJk5O1cv1/bGxbH50hNPNbAN3CaoACGCChAiqAAhgsrMGdzaHZu12p1qtRzONMsRyMy59Nffj82Ofu10PdA73sA2cJugMnMG/fEL+NudbrXanQa2gdsEFSBEUAFCBBUgRFCZKR+vX6idq2tj81ZnroFtYJSgMlNubq3W7o3N0WGrVcvfe7qZheAzBJUvhHbXGSrNE1SAEEEFCBFUZsZwOKxbN683vQb8T4LK7BgO699/e2VsfOTRE9U9fLSBhWCUoDJThoPB2OzwwyvVXTjSwDYwSlABQgQVIERQAUIElZnR392u4aA/Nvf+KdNCUJkZG++8XjtXR78+ut2dr698+6cNbQSjBJUZMpwwa/nqE6aGIxEgRFABQgQVIERQmQmD/m5tvPP62LyzcLiq1WpgIxgnqMyE4WBQH69fHJv3Tv64uocWG9gIxgkqs63VqpYzVKaEoAKECCpAiKAyE3ZvbE687RSmiaAyE9b/db76d3xaf2f+cD30xFMNbQTjBJWZ1erM1cLSY02vAZ8SVIAQQQUIEVSAEEFl6g36t2r3xubYfP7IMR/dx1RxNDL1+jev1/o/z4/Ne6d/Uq1Ot4GNYDJBZaa57ZRpIqgAIYIKECKoACGCytS7/NZrNejvjszmFx+upePfbWgjmExQmXo7V9eqhqPfeNruztfcAw82tBFMJqgAIYIKECKoACGCylTr727XzrWNpteAeyKoTLXd61dq68IbY/Pe6TNV5S4ppougMpMWjvbcdsrUEVSAEEEFCBFUgBBBZap9vPFh1ehNUtVZeMBdUkwlQWWqrf3jj3VnUQ899NU68ug3mlkI9tAa3nGP9B32fBD268KFC/Xcc8/VYDC4p+c/+/2lOvnYwsjs/Y2d+vWfrtz1d9vtdp09e7ZWVlbua1fYw8RLTHx/BAfq2rVr9fLLL1e/37+n5/9o+el6snfq05/n2tu1trZeL73027v+bqfTqRdeeOG+d4X9ElSm2ns3Ttf26s/qkxOCYZ06+uca1qWm14KJvIfKVFvd+Xr1h/PVH85Vfzhfb279oK7sPNL0WjCRoDK1Hl95ok598zsjs0F16zev/L2hjWBvgsrUOnZou55c+rA++7/RudZ2bV3bam4p2IP3UJlaN3dvVfvqH+ryxmItHTlcx44u1Omj5+vB7nrTq8FEgsrUeuv9tfr5L35Zw/pVnTreq5PHH6nXquqDVWeoTKc9g/riiy8e1B58Sayurt7zNahVVYPhsKqG9ea7H9Wb7360r9caDAZ17ty56vV6+9wS9vb8889PnO95Yf+lS5dc2E/U22+/XWfOnNlXVO9Xp9OpV199tU6cOPG5vxZfLsvLy/u/sH95efnz2YYvrc3NzQP9HNNer+c45sD4Lz9AiKAChAgqQIigAoQIKkCIC/s5UIuLi/XMM88cyGVT7Xa7FhcXP/fXgf/yAdMA+zfx2j9/8gOECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChHTv8njrQLYA+AJwhgoQIqgAIYIKECKoACGCChAiqAAh/wHxQxtzxGpt+AAAAABJRU5ErkJggg==\n" + }, + "metadata": { + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Custom environment\n", + "\n", + "You can also create your own custom environment using `GymEnv`. 
If you need to add extra preprocessing or change the default way that environment/`EnvSpec` creation is done, you can simply subclass this class and override `create_env()` and/or `create_env_spec()`.\n"
+ ],
+ "metadata": {
+ "id": "lBK7hXAnHHht"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "class MiniGridEnv(GymEnv):\n",
+ "    def __init__(self, env_name, num_players=1, seed=42, **kwargs):\n",
+ "        super().__init__(env_name, num_players, seed=seed, **kwargs)\n",
+ "\n",
+ "    def create_env(self, env_name, seed, **kwargs):\n",
+ "        self._env = gym.make(env_name, **kwargs)\n",
+ "        self._env = ReseedWrapper(self._env, seeds=[seed])\n",
+ "\n",
+ "    def create_env_spec(self, env_name, **kwargs):\n",
+ "        env_spec = super().create_env_spec(env_name, **kwargs)\n",
+ "        return env_spec\n",
+ "\n",
+ "    def step(self, action):\n",
+ "        return super().step(action)"
+ ],
+ "metadata": {
+ "id": "Nw8WqXWfisji"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "We can also create an environment from scratch by inheriting from `hive.envs.base.BaseEnv`. For instance, in the following cell we have `GridEnv`; it is a 1$\\times$7 grid, indexed from -3 to 3 from left to right. The agent always starts in cell number 0, and at each step, it can choose to walk right (if possible), left (if possible), or stay in the current cell. The agent is rewarded only when it is in cell 1."
+ ],
+ "metadata": {
+ "id": "BA3SZKwTbMGd"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "class GridEnv(BaseEnv):\n",
+ "    def __init__(self, env_name = 'GridEnv', max_steps = 20, **kwargs):\n",
+ "        self._num_grid = 7\n",
+ "        self._observation = 0\n",
+ "        self._num_steps = 0\n",
+ "        self._max_steps = max_steps\n",
+ "\n",
+ "        super().__init__(self.create_env_spec(env_name, **kwargs), 1)\n",
+ "\n",
+ "    def create_env_spec(self, env_name, **kwargs):\n",
+ "        # Cells are indexed -3..3, so the observation space starts at -(7 // 2) = -3\n",
+ "        observation_spaces = [Discrete(self._num_grid, start = -(self._num_grid // 2))]\n",
+ "        action_spaces = [Discrete(3, start = -1)]\n",
+ "        return EnvSpec(\n",
+ "            env_name=env_name,\n",
+ "            observation_space=observation_spaces,\n",
+ "            action_space=action_spaces,\n",
+ "        )\n",
+ "\n",
+ "    def reset(self):\n",
+ "        self._observation = self._num_steps = 0\n",
+ "        return self._observation, self._turn\n",
+ "\n",
+ "    def step(self, action):\n",
+ "        self._num_steps += 1\n",
+ "\n",
+ "        if action == 1:\n",
+ "            self._observation = min(self._num_grid // 2, self._observation+1)\n",
+ "        elif action == -1:\n",
+ "            # Parenthesize before negating: -7 // 2 == -4 in Python, but -(7 // 2) == -3\n",
+ "            self._observation = max(-(self._num_grid // 2), self._observation-1)\n",
+ "\n",
+ "        if self._observation == 1:\n",
+ "            reward = 1\n",
+ "        else:\n",
+ "            reward = 0\n",
+ "\n",
+ "        done = self._num_steps == self._max_steps\n",
+ "        info = {}\n",
+ "\n",
+ "        return self._observation, reward, done, self._turn, info\n",
+ "\n",
+ "    def render(self):\n",
+ "        pass\n",
+ "    def close(self):\n",
+ "        pass\n",
+ "    def save(self):\n",
+ "        pass\n",
+ "    def seed(self):\n",
+ "        pass"
+ ],
+ "metadata": {
+ "id": "ntMi-6cmbQ18"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "env = GridEnv()\n",
+ "env_spec = env.env_spec\n",
+ "obs_spec, act_spec = env_spec.observation_space[0], env_spec.action_space[0]\n",
+ "print(\"Environment name : \\n\", env_spec.env_name)\n",
+ "print(\"Environment observation space: \\n\", obs_spec)\n",
+ "print(\"Environment action space: \\n\", act_spec)\n",
+ "print(\"Environment info: \\n\", env_spec.env_info)"
+ ],
+ "metadata": {
+ "colab": {
"https://localhost:8080/" + }, + "id": "wBNYTJw1xxFD", + "outputId": "6a2332cd-07da-4c44-958e-faa7d307f74f" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Environment name : \n", + " GridEnv\n", + "Environment observation space: \n", + " Discrete(7, start=3)\n", + "Environment action space: \n", + " Discrete(3, start=-1)\n", + "Environment info: \n", + " {}\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "done = False\n", + "env.reset()\n", + "\n", + "while not done:\n", + " obs, reward, done, turn, info = env.step(act_spec.sample())\n", + " print(\"Cell {}, Reward {}\".format(obs, reward))\n", + "\n", + "env.close()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "9zEL32h-yTgx", + "outputId": "05a779d8-05af-496b-bdf5-b9d72413d032" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Cell 1, Reward 1\n", + "Cell 1, Reward 1\n", + "Cell 1, Reward 1\n", + "Cell 1, Reward 1\n", + "Cell 1, Reward 1\n", + "Cell 2, Reward 0\n", + "Cell 2, Reward 0\n", + "Cell 3, Reward 0\n", + "Cell 3, Reward 0\n", + "Cell 2, Reward 0\n", + "Cell 3, Reward 0\n", + "Cell 2, Reward 0\n", + "Cell 2, Reward 0\n", + "Cell 3, Reward 0\n", + "Cell 3, Reward 0\n", + "Cell 2, Reward 0\n", + "Cell 1, Reward 1\n", + "Cell 2, Reward 0\n", + "Cell 3, Reward 0\n", + "Cell 2, Reward 0\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### Registering environments\n", + "The registry module `hive.utils.registry` is used to register classes in the RLHive Registry. Consider registering `GridEnv` we created before:" + ], + "metadata": { + "id": "FHr4g3ljmo8N" + } + }, + { + "cell_type": "code", + "source": [ + "registry.register(name = 'GridEnv', constructor = GridEnv, type = GridEnv)" + ], + "metadata": { + "id": "3TxnV6Zdn-XX" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Also, when you're using the gym-based environments (e.g. `MiniGridEnv`), you can simply use `gym.register`:\n", + "\n" + ], + "metadata": { + "id": "eyIzVdOgdM9x" + } + }, + { + "cell_type": "code", + "source": [ + "gym.register(id = 'MyMiniGrid', entry_point = MiniGridEnv)" + ], + "metadata": { + "id": "CP3qqjPFTHQr" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "More than one environment can be registered at once using the `register_all` method. 
+ "More than one environment can be registered at once using the `register_all` method. Consider registering two environments, `Env1` and `Env2` (both inheriting from `BaseEnv`):"
+ ],
+ "metadata": {
+ "id": "zdXJ0cOlnP1s"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "class Env1(BaseEnv):\n",
+ "    def __init__(self, env_name = 'Env1', **kwargs):\n",
+ "        pass\n",
+ "    def reset(self):\n",
+ "        pass\n",
+ "    def step(self):\n",
+ "        pass\n",
+ "    def render(self):\n",
+ "        pass\n",
+ "    def close(self):\n",
+ "        pass\n",
+ "    def save(self):\n",
+ "        pass\n",
+ "\n",
+ "class Env2(BaseEnv):\n",
+ "    def __init__(self, env_name = 'Env2', **kwargs):\n",
+ "        pass\n",
+ "    def reset(self):\n",
+ "        pass\n",
+ "    def step(self):\n",
+ "        pass\n",
+ "    def render(self):\n",
+ "        pass\n",
+ "    def close(self):\n",
+ "        pass\n",
+ "    def save(self):\n",
+ "        pass"
+ ],
+ "metadata": {
+ "id": "XvfqR3uXmoQm"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "registry.register_all(\n",
+ "    BaseEnv,\n",
+ "    {\n",
+ "        \"Env1\": Env1,\n",
+ "        \"Env2\": Env2,\n",
+ "    },\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "PXkYMQrjnKGJ"
+ },
+ "execution_count": null,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file

From b720e2b9e6a7da9c12620233f36173538a981f1a Mon Sep 17 00:00:00 2001
From: mrsamsami
Date: Tue, 8 Nov 2022 13:50:18 -0500
Subject: [PATCH 2/7] Update env tutorial

---
 notebooks/env.ipynb | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/notebooks/env.ipynb b/notebooks/env.ipynb
index ae5e30cf..531a2d3e 100644
--- a/notebooks/env.ipynb
+++ b/notebooks/env.ipynb
@@ -17,6 +17,15 @@
 "accelerator": "GPU"
 },
 "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1n3uLs7P5wg0yLsgaa0ipfudQ_QwknT5v?usp=sharing)"
+ ],
+ "metadata": {
+ "id": "P6rofPFBkF8B"
+ }
+ },
 {
 "cell_type": "markdown",
 "source": [

From 35822d059af0ac91055248165e34c3188ab5b047 Mon Sep 17 00:00:00 2001
From: mrsamsami
Date: Wed, 11 Jan 2023 20:41:38 -0500
Subject: [PATCH 3/7] Updated gym->gymnasium

---
 notebooks/env.ipynb | 252 ++++++++++++++------------------------------
 1 file changed, 78 insertions(+), 174 deletions(-)

diff --git a/notebooks/env.ipynb b/notebooks/env.ipynb
index 531a2d3e..5d99256f 100644
--- a/notebooks/env.ipynb
+++ b/notebooks/env.ipynb
@@ -3,9 +3,7 @@
 "nbformat_minor": 0,
 "metadata": {
 "colab": {
- "provenance": [],
- "collapsed_sections": [],
- "toc_visible": true
+ "provenance": []
 },
 "kernelspec": {
 "name": "python3",
@@ -14,18 +12,10 @@
 "language_info": {
 "name": "python"
 },
- "accelerator": "GPU"
+ "accelerator": "GPU",
+ "gpuClass": "standard"
 },
 "cells": [
- {
- "cell_type": "markdown",
- "source": [
- "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1n3uLs7P5wg0yLsgaa0ipfudQ_QwknT5v?usp=sharing)"
- ],
- "metadata": {
- "id": "P6rofPFBkF8B"
- }
- },
 {
 "cell_type": "markdown",
 "source": [
- "### What is RLHive and how to install it"
+ "### RLHive Installation"
 ],
 "metadata": {
 "id": "EJutnqZVjNTu"
 }
 },
 {
 "cell_type": "markdown",
 "metadata": {
 "id": "xi-qOSphrCH7"
 },
 "source": [
- "RLHive is a framework designed to facilitate research in RL. It provides the components necessary to run a full RL experiment, for both single-agent and multi-agent environments. It is designed to be readable and easily extensible, allowing users to quickly run and experiment with their own ideas. 
For installation, you can check [this notebook](https://colab.research.google.com/drive/11YirxgoVD7gjN02TdAeyFXOL1qH7Eydv?usp=sharing)."
+ "For installation, you can check [this notebook](https://colab.research.google.com/drive/11YirxgoVD7gjN02TdAeyFXOL1qH7Eydv?usp=sharing)."
 ]
 },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### How to install environments\n",
+ "\n",
+ "RLHive currently supports the following environments:\n",
+ "\n",
+ "\n",
+ "\n",
+ "* Gym classic control\n",
+ "* Atari\n",
+ "* Minigrid (single-agent grid world)\n",
+ "* Marlgrid (multi-agent)\n",
+ "* Pettingzoo (multi-agent)\n",
+ "\n",
+ "To install Gym, you can simply run `pip install gym==0.26.0`. You can also install dependencies necessary for the environments that RLHive comes with by running `pip install rlhive[<env_names>]` where `<env_names>` is a comma-separated list made up of `atari`, `gym_minigrid`, and `pettingzoo`.\n",
+ "\n",
+ "Marlgrid is also supported, but must be installed separately. Moreover, MinAtar can now be accessed directly via Gym.\n",
+ "\n",
+ "* To install Marlgrid, run `pip install marlgrid@https://github.com/kandouss/marlgrid/archive/refs/heads/master.zip`"
+ ],
+ "metadata": {
+ "id": "eGZyL2zzGKEt"
+ }
+ },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "id": "JWmMKQBPFoO8"
 },
 "outputs": [],
 "source": [
 "!pip install ruamel.yaml\n",
 "!pip install pyglet\n",
 "!pip install git+https://github.com/chandar-lab/RLHive.git@dev\n",
- "\n",
- "!python -m pip install pyvirtualdisplay\n",
- "!pip3 install box2d\n",
- "!sudo apt-get install xvfb"
+ "!pip install gymnasium\n",
+ "!pip install RLHive['gym_minigrid']"
 ]
 },
 {
 "cell_type": "code",
 "source": [
 "import torch\n",
 "import hive\n",
 "from hive.utils.registry import registry\n",
 "from hive.envs.base import BaseEnv\n",
 "from hive.envs.gym_env import GymEnv\n",
 "from hive.envs.env_spec import EnvSpec\n",
 "from ruamel import yaml\n",
 "import sys\n",
 "import os\n",
 "import os.path\n",
 "import numpy as np\n",
- "import matplotlib.pyplot as plt\n",
- "from pyvirtualdisplay import Display\n",
 "%matplotlib inline"
 ],
 "metadata": {
 "id": "VKastN5fSqsP"
 },
 "execution_count": null,
 "outputs": []
 },
- {
- "cell_type": "markdown",
- "source": [
- "### How to install environments\n",
- "\n",
- "RLHive currently supports the following environments:\n",
- "\n",
- "\n",
- "\n",
- "* Gym classic control\n",
- "* Atari\n",
- "* Minatar (simplified Atari)\n",
- "* Minigrid (single-agent grid world)\n",
- "* Marlgrid (multi-agent)\n",
- "* Pettingzoo (multi-agent)\n",
- "\n",
- "To install Gym, you can simply run `pip install gym==0.21.0`. 
You can also install dependencies necessary for the environments that RLHive comes with by running `pip install rlhive[<env_names>]` where `<env_names>` is a comma-separated list made up of `atari`, `gym_minigrid`, and `pettingzoo`.\n",
- "\n",
- "Minatar and Marlgrid are also supported, but must be installed separately.\n",
- "\n",
- "* To install Minatar, run `pip install MinAtar@git+https://github.com/kenjyoung/MinAtar.git@8b39a18a60248ede15ce70142b557f3897c4e1eb`\n",
- "* To install Marlgrid, run `pip install marlgrid@https://github.com/kandouss/marlgrid/archive/refs/heads/master.zip`"
- ],
- "metadata": {
- "id": "eGZyL2zzGKEt"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "!pip install rlhive[gym_minigrid]\n",
- "!pip install gym==0.21.0"
- ],
- "metadata": {
- "id": "9-L5w2ubGJVC"
- },
- "execution_count": null,
- "outputs": []
- },
 {
 "cell_type": "code",
 "source": [
- "import gym\n",
+ "import gymnasium as gym\n",
 "import gym_minigrid\n",
 "from gym_minigrid.wrappers import ReseedWrapper\n",
 "from gym.spaces.discrete import Discrete"
 ],
 "metadata": {
 "id": "GfL7tFeASz2L"
 },
- "execution_count": null,
+ "execution_count": 3,
 "outputs": []
 },
 {
 "cell_type": "markdown",
 "source": [
- "Every environment used in RLHive should be a subclass of `hive.envs.base.BaseEnv`. It should provide a `reset` function that resets the environment to a new episode and returns a tuple of `(observation, turn)` and a `step` function that takes in an action, performs the step in the environment, and returns a tuple of `(observation, reward, done, turn, info)`. All these values correspond to their canonical meanings, and `turn` corresponds to the index of the agent whose turn it is (in multi-agent environments).\n",
+ "Every environment used in RLHive should be a subclass of `hive.envs.base.BaseEnv`. It should provide a `reset` function that resets the environment to a new episode and returns a tuple of `(observation, turn)` and a `step` function that takes in an action, performs the step in the environment, and returns a tuple of `(observation, reward, terminated, truncated, turn, info)`. The `terminated` is `True` if environment terminates, like task completion. The `truncated` is `True` if episode truncates due to a time limit or a reason that is not defined as part of the task MDP. Note that the `info` is a dictionary containing auxiliary diagnostic information for debugging, learning, and logging. For instance, it could contain individual reward terms that are combined to produce the total reward. The `turn` corresponds to the index of the agent whose turn it is (in multi-agent environments).\n",
 "\n",
 "The `reward` return value can be a single number, an array, or a dictionary. If it’s a number, then that same reward will be given to every single agent. If it’s an array, the agents get the reward corresponding to their index in the runner. If it’s a dictionary, the keys should be the agent ids, and the value the reward for that agent.\n"
 ],
 "metadata": {
 "id": "37-E7egZn8oc"
 }
 },
 {
 "cell_type": "code",
 "source": [
 "env = GymEnv(\"CartPole-v0\")"
 ],
 "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "-6K303tyS6Fv",
- "outputId": "1afe1892-21fc-4692-e0b3-c387c6ae136e"
+ "id": "-6K303tyS6Fv"
 },
 "execution_count": null,
- "outputs": [
- {
- "output_type": "stream",
- "name": "stderr",
- "text": [
- "/usr/local/lib/python3.7/dist-packages/gym/envs/registration.py:594: UserWarning: \u001b[33mWARN: The environment CartPole-v0 is out of date. 
You should consider upgrading to version `v1`.\u001b[0m\n", - "/usr/local/lib/python3.7/dist-packages/gym/core.py:318: DeprecationWarning: \u001b[33mWARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.\u001b[0m\n", - " def reset(self, **kwargs):\n", - "/usr/local/lib/python3.7/dist-packages/gym/wrappers/step_api_compatibility.py:40: DeprecationWarning: \u001b[33mWARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.\u001b[0m\n" - ] - } - ] + "outputs": [] }, { "cell_type": "markdown", @@ -249,9 +206,9 @@ "base_uri": "https://localhost:8080/" }, "id": "GlUB4vYSNfB8", - "outputId": "7c7dfac5-0fd1-486f-e889-452f80ce032e" + "outputId": "9b017111-aae6-4133-9259-25ea2562bda6" }, - "execution_count": null, + "execution_count": 5, "outputs": [ { "output_type": "stream", @@ -281,7 +238,7 @@ { "cell_type": "markdown", "source": [ - "To work with any environment, we first set `seed` (to ensure that the code is reproducible), `reset` the environment to a new initial state, and then use `step` to perform the specified action and return updated information collected from the environment. Moreover, since for image-based environments rendering is important, you can use use `render` function.Finally, when we're done with the environment, we can `close` it." + "To work with any environment, we `reset` the environment to a new initial state, and then use `step` to perform the specified action and return updated information collected from the environment. Moreover, since for image-based environments rendering is important, you can use use `render` function.Finally, when we're done with the environment, we can `close` it." 
], "metadata": { "id": "drzzPq_ISBr7" @@ -290,7 +247,6 @@ { "cell_type": "code", "source": [ - "env.seed(42)\n", "obs, turn = env.reset()\n", "print(\"Environment initial observation : \\n\", obs)\n", "print(\"Environment initial turn: \\n\", turn)" @@ -300,28 +256,19 @@ "base_uri": "https://localhost:8080/" }, "id": "mhnAQ_63R2Am", - "outputId": "55317676-57ae-4cba-bf5b-b9441a5dfb13" + "outputId": "1f1cb0af-6b47-4dea-8d9a-64e3611d210e" }, - "execution_count": null, + "execution_count": 6, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Environment initial observation : \n", - " [ 0.0273956 -0.00611216 0.03585979 0.0197368 ]\n", + " [ 0.03139541 0.0282384 -0.02267022 0.02850169]\n", "Environment initial turn: \n", " 0\n" ] - }, - { - "output_type": "stream", - "name": "stderr", - "text": [ - "/usr/local/lib/python3.7/dist-packages/gym/utils/passive_env_checker.py:175: UserWarning: \u001b[33mWARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.\u001b[0m\n", - "/usr/local/lib/python3.7/dist-packages/gym/utils/passive_env_checker.py:191: UserWarning: \u001b[33mWARN: Future gym versions will require that `Env.reset` can be passed `return_info` to return information from the environment resetting.\u001b[0m\n", - "/usr/local/lib/python3.7/dist-packages/gym/utils/passive_env_checker.py:196: UserWarning: \u001b[33mWARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.\u001b[0m\n" - ] } ] }, @@ -334,66 +281,23 @@ "id": "a4JeyDM7UwaQ" } }, - { - "cell_type": "code", - "source": [ - "# This cell is just to show the animation in the notebook\n", - "\n", - "display = Display(visible=0, size=(1400, 900))\n", - "display.start()\n", - "\n", - "is_ipython = 'inline' in plt.get_backend()\n", - "if is_ipython:\n", - " from IPython import display\n", - "\n", - "plt.ion()" - ], - "metadata": { - "id": "I3xMRmjKkOM1" - }, - "execution_count": null, - "outputs": [] - }, { "cell_type": "code", "source": [ "num_steps = 100\n", - "img = plt.imshow(env.render())\n", "\n", "for t in range(num_steps):\n", - " obs, reward, done, turn, info = env.step(act_spec.sample()) # Random policy\n", - " img.set_data(env.render()) \n", - " plt.axis('off')\n", - " display.display(plt.gcf())\n", - " display.clear_output(wait=True)\n", - " if done:\n", - " obs, turn = env.reset()\n", + " obs, reward, terminated, truncated, turn, info = env.step(act_spec.sample()) # Random policy\n", + " if terminated or truncated:\n", + " break\n", "\n", "env.close()" ], "metadata": { - "id": "PAde34AbUiLG", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 248 - }, - "outputId": "71f519e9-c6f2-456e-d776-35b6ed6ac59c" + "id": "PAde34AbUiLG" }, - "execution_count": null, - "outputs": [ - { - "output_type": "display_data", - "data": { - "text/plain": [ - "
" - ], - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAVQAAADnCAYAAABBu67aAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAHHklEQVR4nO3dTW9cBxXH4TMvtpPGqZt2aI2haQlISQQUiQ1CQCRWXXZV1nyC7voNkLroMuuAEF+ghQ1SdxUVRFAJIdGqQCv1JWmoYzt2XurYmRkWFU0nMzhx+Nd3pn2enc+MNWdx9dP1+N6Z1nA4LAD+f+2mFwD4ohBUgBBBBQgRVIAQQQUI6d7lcZcAAIxrTRo6QwUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCBFUgBBBBQgRVIAQQQUIEVSAEEEFCOk2vQDcj4t/+V1d/+idkdmhYyv1+A+fbWgjEFRm1I2192rrgzdGZoNbOw1tA5/wJz9AiKAyc3auX6mbW5fH5ovL32pgG7hNUJk5O1cv1/bGxbH50hNPNbAN3CaoACGCChAiqAAhgsrMGdzaHZu12p1qtRzONMsRyMy59Nffj82Ofu10PdA73sA2cJugMnMG/fEL+NudbrXanQa2gdsEFSBEUAFCBBUgRFCZKR+vX6idq2tj81ZnroFtYJSgMlNubq3W7o3N0WGrVcvfe7qZheAzBJUvhHbXGSrNE1SAEEEFCBFUZsZwOKxbN683vQb8T4LK7BgO699/e2VsfOTRE9U9fLSBhWCUoDJThoPB2OzwwyvVXTjSwDYwSlABQgQVIERQAUIElZnR392u4aA/Nvf+KdNCUJkZG++8XjtXR78+ut2dr698+6cNbQSjBJUZMpwwa/nqE6aGIxEgRFABQgQVIERQmQmD/m5tvPP62LyzcLiq1WpgIxgnqMyE4WBQH69fHJv3Tv64uocWG9gIxgkqs63VqpYzVKaEoAKECCpAiKAyE3ZvbE687RSmiaAyE9b/db76d3xaf2f+cD30xFMNbQTjBJWZ1erM1cLSY02vAZ8SVIAQQQUIEVSAEEFl6g36t2r3xubYfP7IMR/dx1RxNDL1+jev1/o/z4/Ne6d/Uq1Ot4GNYDJBZaa57ZRpIqgAIYIKECKoACGCytS7/NZrNejvjszmFx+upePfbWgjmExQmXo7V9eqhqPfeNruztfcAw82tBFMJqgAIYIKECKoACGCylTr727XzrWNpteAeyKoTLXd61dq68IbY/Pe6TNV5S4ppougMpMWjvbcdsrUEVSAEEEFCBFUgBBBZap9vPFh1ehNUtVZeMBdUkwlQWWqrf3jj3VnUQ899NU68ug3mlkI9tAa3nGP9B32fBD268KFC/Xcc8/VYDC4p+c/+/2lOvnYwsjs/Y2d+vWfrtz1d9vtdp09e7ZWVlbua1fYw8RLTHx/BAfq2rVr9fLLL1e/37+n5/9o+el6snfq05/n2tu1trZeL73027v+bqfTqRdeeOG+d4X9ElSm2ns3Ttf26s/qkxOCYZ06+uca1qWm14KJvIfKVFvd+Xr1h/PVH85Vfzhfb279oK7sPNL0WjCRoDK1Hl95ok598zsjs0F16zev/L2hjWBvgsrUOnZou55c+rA++7/RudZ2bV3bam4p2IP3UJlaN3dvVfvqH+ryxmItHTlcx44u1Omj5+vB7nrTq8FEgsrUeuv9tfr5L35Zw/pVnTreq5PHH6nXquqDVWeoTKc9g/riiy8e1B58Sayurt7zNahVVYPhsKqG9ea7H9Wb7360r9caDAZ17ty56vV6+9wS9vb8889PnO95Yf+lS5dc2E/U22+/XWfOnNlXVO9Xp9OpV199tU6cOPG5vxZfLsvLy/u/sH95efnz2YYvrc3NzQP9HNNer+c45sD4Lz9AiKAChAgqQIigAoQIKkCIC/s5UIuLi/XMM88cyGVT7Xa7FhcXP/fXgf/yAdMA+zfx2j9/8gOECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChAgqQIigAoQIKkCIoAKECCpAiKAChHTv8njrQLYA+AJwhgoQIqgAIYIKECKoACGCChAiqAAh/wHxQxtzxGpt+AAAAABJRU5ErkJggg==\n" - }, - "metadata": { - "needs_background": "light" - } - } - ] + "execution_count": 7, + "outputs": [] }, { "cell_type": "markdown", @@ -427,7 +331,7 @@ "metadata": { "id": "Nw8WqXWfisji" }, - "execution_count": null, + "execution_count": 8, "outputs": [] }, { @@ -477,10 +381,10 @@ " else:\n", " reward = 0\n", "\n", - " done = self._num_steps == self._max_steps\n", + " truncated = self._num_steps == self._max_steps\n", " info = {}\n", "\n", - " return self._observation, reward, done, self._turn, info\n", + " return self._observation, reward, False, truncated, self._turn, info\n", "\n", " def render(self):\n", " pass\n", @@ -495,7 +399,7 @@ "metadata": { "id": "ntMi-6cmbQ18" }, - "execution_count": null, + "execution_count": 9, "outputs": [] }, { @@ -510,13 +414,13 @@ "print(\"Environment info: \\n\", env_spec.env_info)" ], "metadata": { + "id": "wBNYTJw1xxFD", 
"colab": { "base_uri": "https://localhost:8080/" }, - "id": "wBNYTJw1xxFD", - "outputId": "6a2332cd-07da-4c44-958e-faa7d307f74f" + "outputId": "8333bf10-6c97-4ba6-c95f-cf20b5a3aac3" }, - "execution_count": null, + "execution_count": 10, "outputs": [ { "output_type": "stream", @@ -537,48 +441,48 @@ { "cell_type": "code", "source": [ - "done = False\n", + "terminated = truncated = False\n", "env.reset()\n", "\n", - "while not done:\n", - " obs, reward, done, turn, info = env.step(act_spec.sample())\n", + "while not terminated and not truncated:\n", + " obs, reward, terminated, truncated, turn, info = env.step(act_spec.sample())\n", " print(\"Cell {}, Reward {}\".format(obs, reward))\n", "\n", "env.close()" ], "metadata": { + "id": "9zEL32h-yTgx", "colab": { "base_uri": "https://localhost:8080/" }, - "id": "9zEL32h-yTgx", - "outputId": "05a779d8-05af-496b-bdf5-b9d72413d032" + "outputId": "4abfe938-a57c-450d-e774-44f44bc8ac3a" }, - "execution_count": null, + "execution_count": 12, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Cell 1, Reward 1\n", - "Cell 1, Reward 1\n", - "Cell 1, Reward 1\n", - "Cell 1, Reward 1\n", "Cell 1, Reward 1\n", "Cell 2, Reward 0\n", "Cell 2, Reward 0\n", - "Cell 3, Reward 0\n", - "Cell 3, Reward 0\n", "Cell 2, Reward 0\n", - "Cell 3, Reward 0\n", + "Cell 1, Reward 1\n", "Cell 2, Reward 0\n", "Cell 2, Reward 0\n", - "Cell 3, Reward 0\n", - "Cell 3, Reward 0\n", "Cell 2, Reward 0\n", "Cell 1, Reward 1\n", - "Cell 2, Reward 0\n", - "Cell 3, Reward 0\n", - "Cell 2, Reward 0\n" + "Cell 0, Reward 0\n", + "Cell 1, Reward 1\n", + "Cell 0, Reward 0\n", + "Cell 0, Reward 0\n", + "Cell -1, Reward 0\n", + "Cell -2, Reward 0\n", + "Cell -1, Reward 0\n", + "Cell -1, Reward 0\n", + "Cell 0, Reward 0\n", + "Cell 1, Reward 1\n", + "Cell 0, Reward 0\n" ] } ] @@ -601,7 +505,7 @@ "metadata": { "id": "3TxnV6Zdn-XX" }, - "execution_count": null, + "execution_count": 13, "outputs": [] }, { @@ -622,7 +526,7 @@ "metadata": { "id": "CP3qqjPFTHQr" }, - "execution_count": null, + "execution_count": 14, "outputs": [] }, { @@ -668,7 +572,7 @@ "metadata": { "id": "XvfqR3uXmoQm" }, - "execution_count": null, + "execution_count": 15, "outputs": [] }, { @@ -685,7 +589,7 @@ "metadata": { "id": "PXkYMQrjnKGJ" }, - "execution_count": null, + "execution_count": 16, "outputs": [] } ] From e505d4908ca50223e010edfede572a55878d56a8 Mon Sep 17 00:00:00 2001 From: mrsamsami Date: Wed, 11 Jan 2023 20:46:31 -0500 Subject: [PATCH 4/7] Updated gym->gymnasium --- notebooks/env.ipynb | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/notebooks/env.ipynb b/notebooks/env.ipynb index 5d99256f..1d2076a7 100644 --- a/notebooks/env.ipynb +++ b/notebooks/env.ipynb @@ -16,6 +16,15 @@ "gpuClass": "standard" }, "cells": [ + { + "cell_type": "markdown", + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1n3uLs7P5wg0yLsgaa0ipfudQ_QwknT5v?usp=sharing)" + ], + "metadata": { + "id": "scoL83VwrQAo" + } + }, { "cell_type": "markdown", "source": [ From dbe1331625d0eefe7ae1b60ab4f31e4e0d548938 Mon Sep 17 00:00:00 2001 From: mrsamsami Date: Thu, 12 Jan 2023 17:54:22 -0500 Subject: [PATCH 5/7] Updated gym->gymnasium --- notebooks/env.ipynb | 101 +++++++++++++++++++++++++++----------------- 1 file changed, 62 insertions(+), 39 deletions(-) diff --git a/notebooks/env.ipynb b/notebooks/env.ipynb index 1d2076a7..da0306d9 100644 --- a/notebooks/env.ipynb +++ b/notebooks/env.ipynb @@ -77,7 +77,7 @@ "* Marlgrid (multi-agent)\n", 
"* Pettingzoo (multi-agent)\n", "\n", - "To install Gym, you could simply run `pip install gym==0.26.0`. You can also install dependencies necessary for the environments that RLHive comes with by running `pip install rlhive[]` where `` is a comma separated list made up of `atari`, `gym_minigrid`, and `pettingzoo`.\n", + "To install Gym, you could simply run `pip install gymnasium`. You can also install dependencies necessary for the environments that RLHive comes with by running `pip install rlhive[]` where `` is a comma separated list made up of `atari`, `gym_minigrid`, and `pettingzoo`.\n", "\n", "Marlgrid are also supported, but must be installed separately. Moreover, MinAtar could be reached directly via Gym.\n", "\n", @@ -107,6 +107,7 @@ "source": [ "import torch\n", "import hive\n", + "from hive import envs\n", "from hive.utils.registry import registry\n", "from hive.envs.base import BaseEnv\n", "from hive.envs.gym_env import GymEnv\n", @@ -121,7 +122,7 @@ "metadata": { "id": "VKastN5fSqsP" }, - "execution_count": null, + "execution_count": 16, "outputs": [] }, { @@ -135,7 +136,7 @@ "metadata": { "id": "GfL7tFeASz2L" }, - "execution_count": 3, + "execution_count": null, "outputs": [] }, { @@ -170,7 +171,7 @@ { "cell_type": "markdown", "source": [ - "The [OpenAI gym](https://www.gymlibrary.dev/), which provides a flexible manner of designing environments, initializing them, and interacting with them, has become well-known between RL researchers.\n", + "The [OpenAI gym](https://gymnasium.farama.org/), which provides a flexible manner of designing environments, initializing them, and interacting with them, has become well-known between RL researchers. \n", "\n", "If your environment is a gym environment, and you do not need to preprocess the observations generated by the environment, then you can directly use the `hive.envs.gym_env.GymEnv`." 
], @@ -215,9 +216,9 @@ "base_uri": "https://localhost:8080/" }, "id": "GlUB4vYSNfB8", - "outputId": "9b017111-aae6-4133-9259-25ea2562bda6" + "outputId": "442235d1-a5da-4076-f324-39f423b773fc" }, - "execution_count": 5, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -265,16 +266,16 @@ "base_uri": "https://localhost:8080/" }, "id": "mhnAQ_63R2Am", - "outputId": "1f1cb0af-6b47-4dea-8d9a-64e3611d210e" + "outputId": "b6d1e3a9-a6ad-4433-ec68-c7cd33601d2b" }, - "execution_count": 6, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Environment initial observation : \n", - " [ 0.03139541 0.0282384 -0.02267022 0.02850169]\n", + " [-0.03666259 0.02792012 -0.00758092 0.03330537]\n", "Environment initial turn: \n", " 0\n" ] @@ -305,7 +306,7 @@ "metadata": { "id": "PAde34AbUiLG" }, - "execution_count": 7, + "execution_count": null, "outputs": [] }, { @@ -340,7 +341,7 @@ "metadata": { "id": "Nw8WqXWfisji" }, - "execution_count": 8, + "execution_count": null, "outputs": [] }, { @@ -408,7 +409,7 @@ "metadata": { "id": "ntMi-6cmbQ18" }, - "execution_count": 9, + "execution_count": null, "outputs": [] }, { @@ -427,9 +428,9 @@ "colab": { "base_uri": "https://localhost:8080/" }, - "outputId": "8333bf10-6c97-4ba6-c95f-cf20b5a3aac3" + "outputId": "a2579aee-c630-4a1a-8998-5ea07991dd1e" }, - "execution_count": 10, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -464,34 +465,34 @@ "colab": { "base_uri": "https://localhost:8080/" }, - "outputId": "4abfe938-a57c-450d-e774-44f44bc8ac3a" + "outputId": "397280de-ad48-487c-bc59-d72c8e5e0ba9" }, - "execution_count": 12, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Cell 1, Reward 1\n", - "Cell 2, Reward 0\n", - "Cell 2, Reward 0\n", - "Cell 2, Reward 0\n", - "Cell 1, Reward 1\n", - "Cell 2, Reward 0\n", - "Cell 2, Reward 0\n", - "Cell 2, Reward 0\n", - "Cell 1, Reward 1\n", - "Cell 0, Reward 0\n", - "Cell 1, Reward 1\n", - "Cell 0, Reward 0\n", - "Cell 0, Reward 0\n", "Cell -1, Reward 0\n", "Cell -2, Reward 0\n", - "Cell -1, Reward 0\n", - "Cell -1, Reward 0\n", - "Cell 0, Reward 0\n", - "Cell 1, Reward 1\n", - "Cell 0, Reward 0\n" + "Cell -3, Reward 0\n", + "Cell -3, Reward 0\n", + "Cell -3, Reward 0\n", + "Cell -2, Reward 0\n", + "Cell -2, Reward 0\n", + "Cell -2, Reward 0\n", + "Cell -3, Reward 0\n", + "Cell -3, Reward 0\n", + "Cell -3, Reward 0\n", + "Cell -4, Reward 0\n", + "Cell -4, Reward 0\n", + "Cell -4, Reward 0\n", + "Cell -4, Reward 0\n", + "Cell -4, Reward 0\n", + "Cell -3, Reward 0\n", + "Cell -2, Reward 0\n", + "Cell -3, Reward 0\n", + "Cell -4, Reward 0\n" ] } ] @@ -500,7 +501,7 @@ "cell_type": "markdown", "source": [ "#### Registering environments\n", - "The registry module `hive.utils.registry` is used to register classes in the RLHive Registry. Consider registering `GridEnv` we created before:" + "You can register your environment to create it in one line. The registry module `hive.utils.registry` is used to register classes in the RLHive Registry. Consider registering `GridEnv` we created before:" ], "metadata": { "id": "FHr4g3ljmo8N" @@ -514,7 +515,29 @@ "metadata": { "id": "3TxnV6Zdn-XX" }, - "execution_count": 13, + "execution_count": 24, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Then, you can pass a config `dict` to the getter function to create an object of the environment. 
The configs should have two fields; The `name`, which is the name used when registering a class in the registry, and `**kwargs`, keyword arguments that will be passed to the constructor." + ], + "metadata": { + "id": "5zJEbZ_WEs-s" + } + }, + { + "cell_type": "code", + "source": [ + "environment = {'name': 'GridEnv', 'kwargs': {'env_name': 'GridEnv'}}\n", + "grid_env_fn, full_configs = envs.get_env(environment, 'environment')\n", + "grid_env = grid_env_fn()" + ], + "metadata": { + "id": "_iTWBzDHEj27" + }, + "execution_count": 25, "outputs": [] }, { @@ -535,7 +558,7 @@ "metadata": { "id": "CP3qqjPFTHQr" }, - "execution_count": 14, + "execution_count": null, "outputs": [] }, { @@ -581,7 +604,7 @@ "metadata": { "id": "XvfqR3uXmoQm" }, - "execution_count": 15, + "execution_count": null, "outputs": [] }, { @@ -598,7 +621,7 @@ "metadata": { "id": "PXkYMQrjnKGJ" }, - "execution_count": 16, + "execution_count": null, "outputs": [] } ] From 1a55fa35abf0fc4f172ab2694e956d7f544b8e0a Mon Sep 17 00:00:00 2001 From: mrsamsami Date: Wed, 25 Jan 2023 10:44:00 -0500 Subject: [PATCH 6/7] Update environment tutorial --- notebooks/env.ipynb | 102 ++++++++++++++++++-------------------------- 1 file changed, 42 insertions(+), 60 deletions(-) diff --git a/notebooks/env.ipynb b/notebooks/env.ipynb index da0306d9..ad76b3f0 100644 --- a/notebooks/env.ipynb +++ b/notebooks/env.ipynb @@ -3,7 +3,8 @@ "nbformat_minor": 0, "metadata": { "colab": { - "provenance": [] + "provenance": [], + "toc_visible": true }, "kernelspec": { "name": "python3", @@ -22,7 +23,7 @@ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1n3uLs7P5wg0yLsgaa0ipfudQ_QwknT5v?usp=sharing)" ], "metadata": { - "id": "scoL83VwrQAo" + "id": "JJrPBUVPi3iQ" } }, { @@ -95,46 +96,25 @@ }, "outputs": [], "source": [ - "!pip install ruamel.yaml\n", - "!pip install pyglet\n", "!pip install git+https://github.com/chandar-lab/RLHive.git@dev\n", - "!pip install gymnasium\n", "!pip install RLHive['gym_minigrid']" ] }, { "cell_type": "code", "source": [ - "import torch\n", - "import hive\n", "from hive import envs\n", "from hive.utils.registry import registry\n", "from hive.envs.base import BaseEnv\n", "from hive.envs.gym_env import GymEnv\n", "from hive.envs.env_spec import EnvSpec\n", - "from ruamel import yaml\n", - "import sys\n", - "import os\n", - "import os.path\n", - "import numpy as np\n", - "%matplotlib inline" - ], - "metadata": { - "id": "VKastN5fSqsP" - }, - "execution_count": 16, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ "import gymnasium as gym\n", "import gym_minigrid\n", "from gym_minigrid.wrappers import ReseedWrapper\n", "from gym.spaces.discrete import Discrete" ], "metadata": { - "id": "GfL7tFeASz2L" + "id": "VKastN5fSqsP" }, "execution_count": null, "outputs": [] @@ -151,9 +131,9 @@ { "cell_type": "markdown", "source": [ - "Every environment used in RLHive should be a subclass of `hive.envs.base.BaseEnv`. It should provide a `reset` function that resets the environment to a new episode and returns a tuple of `(observation, turn)` and a `step` function that takes in an action, performs the step in the environment, and returns a tuple of `(observation, reward, terminated, truncated, turn, info)`. The `terminated` is `True` if environment terminates, like task completion. The `truncated` is `True` if episode truncates due to a time limit or a reason that is not defined as part of the task MDP. 
Note that the `info` is a dictionary containing auxiliary diagnostic information for debugging, learning, and logging. For instance, it could contain individual reward terms that are combined to produce the total reward. The `turn` corresponds to the index of the agent whose turn it is (in multi-agent environments).\n", + "Every environment used in RLHive should be a subclass of `hive.envs.base.BaseEnv`. It should provide a `reset()` function that resets the environment to a new episode and returns a tuple of `(observation, turn)` and a `step()` function that takes in an action, performs the step in the environment, and returns a tuple of `(observation, reward, terminated, truncated, turn, info)`. The `terminated` variable is `True` when the environment terminates (e.g., when a task is completed). The `truncated` variable is `True` if the episode is truncated due to a time limit or a reason that is not defined as part of the task MDP. Note that the `info` variable is a dictionary containing auxiliary diagnostic information for debugging, learning, and logging. For instance, it could contain individual reward terms that are combined to produce the total reward. The `turn` variable corresponds to the index of the agent whose turn it is (in multi-agent environments).\n", "\n", - "The `reward` return value can be a single number, an array, or a dictionary. If it’s a number, then that same reward will be given to every single agent. If it’s an array, the agents get the reward corresponding to their index in the runner. If it’s a dictionary, the keys should be the agent ids, and the value the reward for that agent.\n" + "The `reward` can be a single number, an array, or a dictionary. If it is a number, then that same reward will be given to every single agent. If it is an array, the agents get the reward corresponding to their index in the runner. If it is a dictionary, the keys should be the agent ids, and the value the reward for that agent." ], "metadata": { "id": "37-E7egZn8oc" @@ -216,9 +196,9 @@ "base_uri": "https://localhost:8080/" }, "id": "GlUB4vYSNfB8", - "outputId": "442235d1-a5da-4076-f324-39f423b773fc" + "outputId": "61918772-5919-4c39-bf97-dd465d25bc3b" }, - "execution_count": null, + "execution_count": 4, "outputs": [ { "output_type": "stream", @@ -248,7 +228,8 @@ { "cell_type": "markdown", "source": [ - "To work with any environment, we `reset` the environment to a new initial state, and then use `step` to perform the specified action and return updated information collected from the environment. Moreover, since for image-based environments rendering is important, you can use use `render` function.Finally, when we're done with the environment, we can `close` it." + "To work with any environment, we `reset` the environment to a new initial state, and then use `step` to perform the specified action and return updated information collected from the environment. Moreover, since for image-based environments rendering is important, you can use use `render` function. \n", + "Finally, when we're done with the environment, we can `close` it." 
], "metadata": { "id": "drzzPq_ISBr7" @@ -266,16 +247,16 @@ "base_uri": "https://localhost:8080/" }, "id": "mhnAQ_63R2Am", - "outputId": "b6d1e3a9-a6ad-4433-ec68-c7cd33601d2b" + "outputId": "9fec792e-fcbe-4c54-acb7-9b82d32b0afe" }, - "execution_count": null, + "execution_count": 5, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Environment initial observation : \n", - " [-0.03666259 0.02792012 -0.00758092 0.03330537]\n", + " [-0.02180136 0.02328087 -0.01036017 0.00271058]\n", "Environment initial turn: \n", " 0\n" ] @@ -298,6 +279,7 @@ "\n", "for t in range(num_steps):\n", " obs, reward, terminated, truncated, turn, info = env.step(act_spec.sample()) # Random policy\n", + " \n", " if terminated or truncated:\n", " break\n", "\n", @@ -306,7 +288,7 @@ "metadata": { "id": "PAde34AbUiLG" }, - "execution_count": null, + "execution_count": 6, "outputs": [] }, { @@ -341,7 +323,7 @@ "metadata": { "id": "Nw8WqXWfisji" }, - "execution_count": null, + "execution_count": 7, "outputs": [] }, { @@ -409,7 +391,7 @@ "metadata": { "id": "ntMi-6cmbQ18" }, - "execution_count": null, + "execution_count": 8, "outputs": [] }, { @@ -428,9 +410,9 @@ "colab": { "base_uri": "https://localhost:8080/" }, - "outputId": "a2579aee-c630-4a1a-8998-5ea07991dd1e" + "outputId": "8c442fb6-aedc-410e-8877-f0dcc519375a" }, - "execution_count": null, + "execution_count": 9, "outputs": [ { "output_type": "stream", @@ -465,34 +447,34 @@ "colab": { "base_uri": "https://localhost:8080/" }, - "outputId": "397280de-ad48-487c-bc59-d72c8e5e0ba9" + "outputId": "7287698c-1582-4b6c-c1ab-321f9f55a449" }, - "execution_count": null, + "execution_count": 10, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Cell -1, Reward 0\n", + "Cell 0, Reward 0\n", + "Cell 0, Reward 0\n", + "Cell 1, Reward 1\n", + "Cell 0, Reward 0\n", + "Cell 0, Reward 0\n", + "Cell -1, Reward 0\n", + "Cell -1, Reward 0\n", + "Cell 0, Reward 0\n", + "Cell 0, Reward 0\n", + "Cell -1, Reward 0\n", "Cell -1, Reward 0\n", "Cell -2, Reward 0\n", "Cell -3, Reward 0\n", - "Cell -3, Reward 0\n", - "Cell -3, Reward 0\n", - "Cell -2, Reward 0\n", - "Cell -2, Reward 0\n", - "Cell -2, Reward 0\n", - "Cell -3, Reward 0\n", - "Cell -3, Reward 0\n", - "Cell -3, Reward 0\n", - "Cell -4, Reward 0\n", - "Cell -4, Reward 0\n", - "Cell -4, Reward 0\n", - "Cell -4, Reward 0\n", - "Cell -4, Reward 0\n", - "Cell -3, Reward 0\n", "Cell -2, Reward 0\n", - "Cell -3, Reward 0\n", - "Cell -4, Reward 0\n" + "Cell -1, Reward 0\n", + "Cell 0, Reward 0\n", + "Cell 0, Reward 0\n", + "Cell 0, Reward 0\n", + "Cell 1, Reward 1\n" ] } ] @@ -515,7 +497,7 @@ "metadata": { "id": "3TxnV6Zdn-XX" }, - "execution_count": 24, + "execution_count": 11, "outputs": [] }, { @@ -537,7 +519,7 @@ "metadata": { "id": "_iTWBzDHEj27" }, - "execution_count": 25, + "execution_count": 12, "outputs": [] }, { @@ -558,7 +540,7 @@ "metadata": { "id": "CP3qqjPFTHQr" }, - "execution_count": null, + "execution_count": 13, "outputs": [] }, { @@ -604,7 +586,7 @@ "metadata": { "id": "XvfqR3uXmoQm" }, - "execution_count": null, + "execution_count": 14, "outputs": [] }, { @@ -621,7 +603,7 @@ "metadata": { "id": "PXkYMQrjnKGJ" }, - "execution_count": null, + "execution_count": 15, "outputs": [] } ] From 9061564b5f9f7f628acda0fc4347b8bc83d18cf5 Mon Sep 17 00:00:00 2001 From: mrsamsami Date: Thu, 23 Feb 2023 17:06:12 -0500 Subject: [PATCH 7/7] minigrid updated --- hive/envs/minigrid/__init__.py | 1 - notebooks/{env.ipynb => env_tutorial.ipynb} | 73 ++++++++++----------- 2 files changed, 36 
insertions(+), 38 deletions(-) delete mode 100644 hive/envs/minigrid/__init__.py rename notebooks/{env.ipynb => env_tutorial.ipynb} (96%) diff --git a/hive/envs/minigrid/__init__.py b/hive/envs/minigrid/__init__.py deleted file mode 100644 index d4a89b9b..00000000 --- a/hive/envs/minigrid/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from hive.envs.minigrid.minigrid import MiniGridEnv diff --git a/notebooks/env.ipynb b/notebooks/env_tutorial.ipynb similarity index 96% rename from notebooks/env.ipynb rename to notebooks/env_tutorial.ipynb index ad76b3f0..2b2093b4 100644 --- a/notebooks/env.ipynb +++ b/notebooks/env_tutorial.ipynb @@ -20,10 +20,10 @@ { "cell_type": "markdown", "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1n3uLs7P5wg0yLsgaa0ipfudQ_QwknT5v?usp=sharing)" + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/assets/colab-badge.svg)" ], "metadata": { - "id": "JJrPBUVPi3iQ" + "id": "iBPurVluSqjt" } }, { @@ -97,7 +97,7 @@ "outputs": [], "source": [ "!pip install git+https://github.com/chandar-lab/RLHive.git@dev\n", - "!pip install RLHive['gym_minigrid']" + "!pip install RLHive['minigrid']" ] }, { @@ -109,8 +109,8 @@ "from hive.envs.gym_env import GymEnv\n", "from hive.envs.env_spec import EnvSpec\n", "import gymnasium as gym\n", - "import gym_minigrid\n", - "from gym_minigrid.wrappers import ReseedWrapper\n", + "import minigrid\n", + "from minigrid import ReseedWrapper\n", "from gym.spaces.discrete import Discrete" ], "metadata": { @@ -198,7 +198,7 @@ "id": "GlUB4vYSNfB8", "outputId": "61918772-5919-4c39-bf97-dd465d25bc3b" }, - "execution_count": 4, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -249,7 +249,7 @@ "id": "mhnAQ_63R2Am", "outputId": "9fec792e-fcbe-4c54-acb7-9b82d32b0afe" }, - "execution_count": 5, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -288,7 +288,7 @@ "metadata": { "id": "PAde34AbUiLG" }, - "execution_count": 6, + "execution_count": null, "outputs": [] }, { @@ -323,7 +323,27 @@ "metadata": { "id": "Nw8WqXWfisji" }, - "execution_count": 7, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "When you're using gym-based environments, like the `MiniGridEnv`, you can conveniently register the environment for future use by calling `gym.register`:" + ], + "metadata": { + "id": "eyIzVdOgdM9x" + } + }, + { + "cell_type": "code", + "source": [ + "gym.register(id = 'MyMiniGrid', entry_point = MiniGridEnv)" + ], + "metadata": { + "id": "CP3qqjPFTHQr" + }, + "execution_count": null, "outputs": [] }, { @@ -391,7 +411,7 @@ "metadata": { "id": "ntMi-6cmbQ18" }, - "execution_count": 8, + "execution_count": null, "outputs": [] }, { @@ -412,7 +432,7 @@ }, "outputId": "8c442fb6-aedc-410e-8877-f0dcc519375a" }, - "execution_count": 9, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -449,7 +469,7 @@ }, "outputId": "7287698c-1582-4b6c-c1ab-321f9f55a449" }, - "execution_count": 10, + "execution_count": null, "outputs": [ { "output_type": "stream", @@ -497,7 +517,7 @@ "metadata": { "id": "3TxnV6Zdn-XX" }, - "execution_count": 11, + "execution_count": null, "outputs": [] }, { @@ -519,28 +539,7 @@ "metadata": { "id": "_iTWBzDHEj27" }, - "execution_count": 12, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "Also, when you're using the gym-based environments (e.g. 
`MiniGridEnv`), you can simply use `gym.register`:\n", - "\n" - ], - "metadata": { - "id": "eyIzVdOgdM9x" - } - }, - { - "cell_type": "code", - "source": [ - "gym.register(id = 'MyMiniGrid', entry_point = MiniGridEnv)" - ], - "metadata": { - "id": "CP3qqjPFTHQr" - }, - "execution_count": 13, + "execution_count": null, "outputs": [] }, { @@ -586,7 +585,7 @@ "metadata": { "id": "XvfqR3uXmoQm" }, - "execution_count": 14, + "execution_count": null, "outputs": [] }, { @@ -603,7 +602,7 @@ "metadata": { "id": "PXkYMQrjnKGJ" }, - "execution_count": 15, + "execution_count": null, "outputs": [] } ]