⚡ SheepRL 🐑

Environment	Total frames	Training time	Test reward	Paper reward	GPUs
Crafter	1M	1d 3h	12.1	11.7	1-V100
Atari-MsPacman	100K	14h	1542	1327	1-3080
Atari-Boxing	100K	14h	84	78	1-3080
DOA++(w/o optimizations)¹	7M	18d 22h	2726/3328²	N.A.	1-3080
Minecraft-Nav(w/o optimizations)	8M	16d 4h	27% >= 70, 14% >= 100	N.A.	1-V100

¹ For comparison: 1M frames took 2d 7h before the optimizations and 1d 5h after them.
² Best leaderboard score in DIAMBRA (11/7/2023).

Benchmarks

The training times of our implementations compared to the ones of Stable Baselines3 are shown below:

		SheepRL v0.4.0	SheepRL v0.4.9	SheepRL v0.5.2 (Numpy Buffers)	SheepRL v0.5.5 (Numpy Buffers)	StableBaselines3¹
PPO	1 device	192.31s ± 1.11	138.3s ± 0.16	80.81s ± 0.68	81.27s ± 0.47	77.21s ± 0.36
PPO	2 devices	85.42s ± 2.27	59.53s ± 0.78	46.09s ± 0.59	36.88s ± 0.30	N.D.
A2C	1 device	N.D.	N.D.	N.D.	84.76s ± 0.37	84.22s ± 0.99
A2C	2 devices	N.D.	N.D.	N.D.	28.95s ± 0.75	N.D.
SAC	1 device	421.37s ± 5.27	363.74s ± 3.44	318.06s ± 4.46	320.21s ± 6.29	336.06s ± 12.26
SAC	2 devices	264.29s ± 1.81	238.88s ± 4.97	210.07s ± 27	225.95s ± 3.65	N.D.
Dreamer V1	1 device	4201.23s	N.D.	2921.38s	2207.13s	N.D.
Dreamer V2	1 device	1874.62s	N.D.	1148.1s	906.42s	N.D.
Dreamer V3	1 device	2022.99s	N.D.	1378.01s	1589.30s	N.D.

Note

All experiments were run on 4 CPUs in Lightning Studio. All benchmarks except the Dreamer ones were run 5 times, and we report the mean and standard deviation across the runs. For these runs we disabled the test function, logging, and checkpointing. Moreover, the models were not registered using MLFlow.

The Dreamer benchmarks were run once, with logging and checkpoints enabled and without running the test function.

¹ The StableBaselines3 version is v2.2.1. Install the package with `pip install stable-baselines3==2.2.1`.
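To read the table as speedups rather than raw timings, the following minimal sketch divides the v0.4.0 mean training time by the v0.5.5 one for a few rows. The numbers are copied from the table above; the dictionary layout and names are only illustrative.

```python
# Mean training times in seconds, copied from the benchmark table:
# (SheepRL v0.4.0, SheepRL v0.5.5)
timings = {
    "PPO (1 device)": (192.31, 81.27),
    "PPO (2 devices)": (85.42, 36.88),
    "SAC (1 device)": (421.37, 320.21),
    "Dreamer V3 (1 device)": (2022.99, 1589.30),
}

# Speedup = old time / new time (values above 1 mean v0.5.5 is faster)
speedups = {name: old / new for name, (old, new) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x faster in v0.5.5 than in v0.4.0")
```

Since the Dreamer rows come from a single run each, their ratios should be treated as rough indications only.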

What

An easy-to-use framework for reinforcement learning in PyTorch, accelerated with Lightning Fabric.
The algorithms sheeped by sheeprl out-of-the-box are:

Algorithm	Coupled	Decoupled	Recurrent	Vector obs	Pixel obs	Status
A2C	✔️	❌	❌	✔️	❌	✔️
A3C	✔️	❌	❌	✔️	❌	🚧
PPO	✔️	✔️	❌	✔️	✔️	✔️
PPO Recurrent	✔️	❌	✔️	✔️	✔️	✔️
SAC	✔️	✔️	❌	✔️	❌	✔️
SAC-AE	✔️	❌	❌	✔️	✔️	✔️
DroQ	✔️	❌	❌	✔️	❌	✔️
Dreamer-V1	✔️	❌	✔️	✔️	✔️	✔️
Dreamer-V2	✔️	❌	✔️	✔️	✔️	✔️
Dreamer-V3	✔️	❌	✔️	✔️	✔️	✔️
Plan2Explore (Dreamer V1)	✔️	❌	✔️	✔️	✔️	✔️
Plan2Explore (Dreamer V2)	✔️	❌	✔️	✔️	✔️	✔️
Plan2Explore (Dreamer V3)	✔️	❌	✔️	✔️	✔️	✔️

and more are coming soon! Open a PR if you have any particular request 🐑

The actions supported by sheeprl agents are:

Algorithm	Continuous	Discrete	Multi-Discrete
A2C	✔️	✔️	✔️
A3C	✔️	✔️	✔️
PPO	✔️	✔️	✔️
PPO Recurrent	✔️	✔️	✔️
SAC	✔️	❌	❌
SAC-AE	✔️	❌	❌
DroQ	✔️	❌	❌
Dreamer-V1	✔️	✔️	✔️
Dreamer-V2	✔️	✔️	✔️
Dreamer-V3	✔️	✔️	✔️
Plan2Explore (Dreamer V1)	✔️	✔️	✔️
Plan2Explore (Dreamer V2)	✔️	✔️	✔️
Plan2Explore (Dreamer V3)	✔️	✔️	✔️

The environments supported by sheeprl are:

Environment	Installation command	More info	Status
Classic Control	`pip install sheeprl`		✔️
Box2D	`pip install sheeprl[box2d]`	Please install first `swig` with `pip install swig`	✔️
Mujoco (Gymnasium)	`pip install sheeprl[mujoco]`	how_to/mujoco	✔️
Atari	`pip install sheeprl[atari]`	how_to/atari	✔️
DeepMind Control	`pip install sheeprl[dmc]`	how_to/dmc	✔️
MineRL	`pip install sheeprl[minerl]`	how_to/minerl	✔️
MineDojo	`pip install sheeprl[minedojo]`	how_to/minedojo	✔️
DIAMBRA	`pip install sheeprl[diambra]`	how_to/diambra	✔️
Crafter	`pip install sheeprl[crafter]`	https://github.com/danijar/crafter	✔️
Super Mario Bros	`pip install sheeprl[supermario]`	https://github.com/Kautenja/gym-super-mario-bros/tree/master	✔️

Why

We want to provide a framework for RL algorithms that is at the same time simple and scalable thanks to Lightning Fabric.

Moreover, in many RL repositories, the RL algorithm is tightly coupled with the environment, making it harder to extend them beyond the gym interface. We want to provide a framework that makes it easy to decouple the RL algorithm from the environment, so that it can be used with any environment.

How to use it

Installation

There are three options for installing SheepRL:

1. Install the latest version directly from the PyPI index
2. Clone the repo and install the local version
3. pip-install the framework using the GitHub clone URL

Instructions for the three methods are shown below.

Install SheepRL from PyPI

You can install the latest version of SheepRL with

pip install sheeprl

Note

To install optional dependencies one can run for example pip install sheeprl[atari,box2d,dev,mujoco,test]

For detailed information about all the optional dependencies you can install, please have a look at the What section.

Cloning and installing a local version

First, clone the repo with:

git clone https://github.com/Eclectic-Sheep/sheeprl.git
cd sheeprl

From inside the newly created folder run

pip install .

Note

To install optional dependencies one can run for example pip install .[atari,box2d,dev,mujoco,test]

Installing the framework from the GitHub repo

If you haven't already done so, create an environment with your choice of venv or conda.

The example uses Python's standard venv module and assumes macOS or Linux.

# create a virtual environment
python3 -m venv .venv

# activate the environment
source .venv/bin/activate

# if you do not wish to install extras such as mujoco or atari, do
pip install "sheeprl @ git+https://github.com/Eclectic-Sheep/sheeprl.git"

# or, to install with atari and mujoco environment support, do
pip install "sheeprl[atari,mujoco,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"

# or, to install with box2d environment support, do
pip install swig
pip install "sheeprl[box2d] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"

# or, to install with minedojo environment support, do
pip install "sheeprl[minedojo,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"

# or, to install with minerl environment support, do
pip install "sheeprl[minerl,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"

# or, to install with diambra environment support, do
pip install "sheeprl[diambra,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"

# or, to install with super mario bros environment support, do
pip install "sheeprl[supermario,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"

# or, to install all extras, do
pip install swig
pip install "sheeprl[box2d,atari,mujoco,minerl,supermario,dev,test] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"

Additional: installing on an M-series Mac

Caution

If you are on an M-series Mac and encounter an error attributed to box2dpy during installation, you need to install SWIG using the instructions shown below.

It is recommended to use homebrew to install SWIG to support Gym.

# if needed install homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# then, do
brew install swig

# then attempt to pip install with the preferred method, such as
pip install "sheeprl[atari,box2d,mujoco,dev,test] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"

Additional: MineRL and MineDojo

Note

If you want to install the minedojo or minerl environment support, Java JDK 8 is required: you can install it by following the instructions at this link.

Caution

MineRL and MineDojo environments have conflicting requirements, so DO NOT install them together with the pip install sheeprl[minerl,minedojo] command, but instead install them individually with either the command pip install sheeprl[minerl] or pip install sheeprl[minedojo] before running an experiment with the MineRL or MineDojo environment, respectively.
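As a quick sanity check before launching an experiment, the following minimal sketch reports whether both packages ended up in the same environment. It assumes the installed distributions are named `minerl` and `minedojo`; the helper names are hypothetical and not part of SheepRL.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Return {name: version} for every distribution in `names` that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass  # not installed in this environment
    return found


def has_conflict(found):
    """True when both MineRL and MineDojo are present in the same environment."""
    return "minerl" in found and "minedojo" in found


# Assumption: the distribution names match the extras used above
found = installed_versions(["minerl", "minedojo"])
if has_conflict(found):
    print(f"Both packages are installed ({found}); use separate environments.")
else:
    print(f"No MineRL/MineDojo conflict detected: {found}")
```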

Run an experiment with SheepRL

Now you can use one of the already available algorithms, or create your own. For example, to train a PPO agent on the CartPole environment with only vector-like observations, just run

python sheeprl.py exp=ppo env=gym env.id=CartPole-v1

if you have installed from a cloned repo, or

sheeprl exp=ppo env=gym env.id=CartPole-v1

if you have installed SheepRL from PyPI.

Similarly, you can check all the available algorithms with

python sheeprl/available_agents.py

if you have installed from a cloned repo, or

sheeprl-agents

if you have installed SheepRL from PyPI.

That's all it takes to train an agent with SheepRL! 🎉

Before you start using the SheepRL framework, it is highly recommended that you read the following instructional documents:

How to run experiments

How to modify the default configs

How to work with steps

How to select observations

Moreover, the howto folder contains other useful documents with guidance on how to properly use the framework.

📈 Check your results

Once you have trained an agent, a new folder called logs will be created, containing the logs of the training. You can visualize them with TensorBoard:

tensorboard --logdir logs


🤓 More about running an algorithm

What you ran above is the PPO algorithm with the default configuration, but you can also change the configuration by passing arguments to the script.

For example, in the default configuration, the number of parallel environments is 4. Let's change it to 8 by passing the env.num_envs argument:

sheeprl exp=ppo env=gym env.id=CartPole-v1 env.num_envs=8

All the available arguments, with their descriptions, are listed in the sheeprl/configs directory. You can find more information about the hierarchy of configs here.
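When experiments are launched from a script, the key=value arguments shown above can be assembled programmatically. This is a minimal sketch: `build_command` is a hypothetical helper, not part of SheepRL, and it only reproduces the command-line syntax used in this section.

```python
import shlex


def build_command(overrides, entrypoint="sheeprl"):
    """Turn a mapping of config keys to values into a list of CLI arguments."""
    args = [entrypoint]
    for key, value in overrides.items():
        args.append(f"{key}={value}")
    return args


cmd = build_command(
    {
        "exp": "ppo",
        "env": "gym",
        "env.id": "CartPole-v1",
        "env.num_envs": 8,
    }
)

# Shell-safe rendering of the argument list, useful for logging
command_line = shlex.join(cmd)
print(command_line)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in an environment where the `sheeprl` entry point is installed.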

Running with Lightning Fabric

To run the algorithm with Lightning Fabric, you need to specify the Fabric parameters through the CLI. For example, to run the PPO algorithm with 4 parallel environments on 2 devices, you can run:

sheeprl fabric.accelerator=cpu fabric.strategy=ddp fabric.devices=2 exp=ppo env=gym env.id=CartPole-v1

You can check the available parameters for Lightning Fabric here.

Evaluate your Agents

You can easily evaluate your trained agents from checkpoints: training configurations are retrieved automatically.

sheeprl-eval checkpoint_path=/path/to/checkpoint.ckpt fabric.accelerator=gpu env.capture_video=True

For more information, check the corresponding howto.

📖 Repository structure

The repository is structured as follows:

algos: contains the implementations of the algorithms. Each algorithm is in a separate folder, and (possibly) contains the following files:
- <algorithm>.py: contains the implementation of the algorithm.
- <algorithm>_decoupled.py: contains the implementation of the decoupled version of the algorithm, if present.
- agent: optional, contains the implementation of the agent.
- loss.py: contains the implementation of the loss functions of the algorithm.
- utils.py: contains utility functions for the algorithm.
configs: contains the default configs of the algorithms.
data: contains the implementation of the data buffers.
envs: contains the implementation of the environment wrappers.
models: contains the implementation of some standard models (building blocks), like the multi-layer perceptron (MLP) or a simple convolutional network (NatureCNN)
utils: contains utility functions for the framework.

Coupled vs Decoupled

In the coupled version of an algorithm, the agent interacts with the environment and executes the training loop.

In the decoupled version, one process (the player) is responsible only for interacting with the environment, while all the other processes (the trainers) are responsible for executing the training loop. The player and the trainers communicate through distributed collectives, adopting the abstraction provided by Fabric's TorchCollective.

Coupled

The algorithm is implemented in the <algorithm>.py file.

There are 2 functions inside this script:

main(): initializes all the components of the algorithm, and executes the interactions with the environment. Once enough data is collected, the training loop is executed by calling the train() function.
train(): executes the training loop. It samples a batch of data from the buffer, computes the loss, and updates the parameters of the agent.
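The split between the two functions can be illustrated with a toy example. This is a minimal sketch in pure Python, not SheepRL's actual code: the "environment" is a random reward generator, the "agent" is a single scalar parameter, and names such as `learning_starts` are assumptions made for the illustration.

```python
import random


def train(params, buffer, batch_size, lr, rng):
    """Sample a batch from the buffer, compute a loss, and update the parameters."""
    batch = rng.sample(buffer, batch_size)
    target = sum(transition["reward"] for transition in batch) / batch_size
    loss = (params["mean"] - target) ** 2
    # Gradient step on the squared error with respect to params["mean"]
    params["mean"] -= lr * 2 * (params["mean"] - target)
    return loss


def main(total_steps=200, learning_starts=32, batch_size=16, seed=0):
    """Initialize the components, collect data, and call train() once enough is stored."""
    rng = random.Random(seed)
    params = {"mean": 0.0}
    buffer = []
    losses = []
    for _ in range(total_steps):
        # Stand-in for one environment interaction: reward drawn around 1.0
        buffer.append({"reward": rng.gauss(1.0, 0.1)})
        if len(buffer) >= learning_starts:
            losses.append(train(params, buffer, batch_size, lr=0.1, rng=rng))
    return params, losses


params, losses = main()
print(f"estimated mean reward: {params['mean']:.3f} after {len(losses)} updates")
```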

Decoupled

The decoupled version of an algorithm is implemented in the <algorithm>_decoupled.py file.

There are 3 functions inside this script:

main(): initializes all the components of the algorithm, the collectives for the communication between the player and the trainers, and calls the player() and trainer() functions.
player(): executes the interactions with the environment. It samples an action from the policy network, executes it in the environment, and stores the transition in the buffer. After a predefined number of interactions with the environment, the player randomly splits the collected data into almost equal chunks and sends them separately to the trainers. It then waits for the trainers to finish the agent update.
trainer(): executes the training loop. It receives a chunk of data from the player, computes the loss, and updates the parameters of the agent. After the agent has been updated, the first of the trainers sends back the updated agent weights to the player, which can interact again with the environment.
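The data-splitting step performed by the player can be sketched as follows. This is an illustration of "randomly splits the collected data into almost equal chunks" in pure Python, not the code SheepRL uses; `split_into_chunks` is a hypothetical helper, and the real implementation exchanges the chunks through distributed collectives rather than returning them.

```python
import random


def split_into_chunks(num_samples, num_trainers, rng):
    """Shuffle sample indices and split them into chunks whose sizes differ by at most one."""
    indices = list(range(num_samples))
    rng.shuffle(indices)
    base, extra = divmod(num_samples, num_trainers)
    chunks = []
    start = 0
    for rank in range(num_trainers):
        # The first `extra` trainers receive one additional sample
        size = base + (1 if rank < extra else 0)
        chunks.append(indices[start:start + size])
        start += size
    return chunks


chunks = split_into_chunks(num_samples=10, num_trainers=3, rng=random.Random(0))
sizes = [len(chunk) for chunk in chunks]
print(sizes)  # [4, 3, 3]
```

Each trainer would then compute the loss on its own chunk, which is why the chunks need to be nearly equal in size.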

Algorithms implementation

For details about the implementation of each algorithm, check the README.md file inside its folder.

All algorithms are kept as simple as possible, in a CleanRL fashion. But to allow for more flexibility and also more clarity, we tried to abstract away anything that is not strictly related to the training loop of the algorithm.

For example, we decided to create a models folder with already-made models that can be composed to create the model of the agent.

For each algorithm, losses are kept in a separate module, so that their implementation is clear and can be easily utilized for the decoupled or the recurrent version of the algorithm.

🗂️ Buffer

For the buffer implementation, we chose to use a wrapper around a dictionary of Numpy arrays.

To enable a simple way to work with numpy memory-mapped arrays, we implemented the sheeprl.utils.memmap.MemmapArray, a container that handles the memory-mapped arrays.

This flexibility makes it very simple to implement, with the classes ReplayBuffer, SequentialReplayBuffer, EpisodeBuffer, and EnvIndependentReplayBuffer, all the buffers needed for on-policy and off-policy algorithms.
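To give an idea of what a memory-mapped array provides, here is a minimal sketch that uses plain `numpy.memmap` rather than SheepRL's `MemmapArray`, whose API is not described in this document. The file name and shapes are arbitrary choices for the illustration.

```python
import tempfile
from pathlib import Path

import numpy as np

T, B = 8, 2  # timesteps and parallel environments (illustrative values)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rewards.memmap"

    # Create a file-backed array: data lives on disk, not only in RAM
    rewards = np.memmap(path, dtype=np.float32, mode="w+", shape=(T, B, 1))
    rewards[0] = 1.0  # write the rewards of the first timestep
    rewards.flush()   # make sure the data reaches the file

    # Re-open the same file read-only, as another process could do
    reopened = np.memmap(path, dtype=np.float32, mode="r", shape=(T, B, 1))
    first_step_sum = float(reopened[0].sum())

    # Release the file handles before the temporary directory is removed
    del rewards, reopened

print(first_step_sum)
```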

🔍 Technical details

The shape of the Numpy arrays in the dictionary is (T, B, *), where T is the number of timesteps, B is the number of parallel environments, and * is the shape of the data.

For the ReplayBuffer to be used as a RolloutBuffer, the proper buffer_size must be specified. For example, for PPO, the buffer_size must be [T, B], where T is the number of timesteps and B is the number of parallel environments.
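The `(T, B, *)` layout can be illustrated with a toy buffer built on a dictionary of Numpy arrays. This is a minimal sketch under stated assumptions, not SheepRL's `ReplayBuffer`: the class name `DictRolloutBuffer` and its methods are hypothetical.

```python
import numpy as np


class DictRolloutBuffer:
    """Toy buffer: a dictionary of arrays, each shaped (T, B, *)."""

    def __init__(self, timesteps, n_envs):
        self.timesteps = timesteps
        self.n_envs = n_envs
        self.data = {}
        self.pos = 0

    def add(self, step):
        """Store one timestep; every value in `step` must have shape (B, *)."""
        for key, value in step.items():
            value = np.asarray(value)
            if value.shape[0] != self.n_envs:
                raise ValueError(f"{key}: expected leading dimension {self.n_envs}")
            if key not in self.data:
                # Allocate the full (T, B, *) array lazily, on first use of the key
                self.data[key] = np.zeros((self.timesteps, *value.shape), dtype=value.dtype)
            self.data[key][self.pos] = value
        # Wrap around so old data is overwritten once the buffer is full
        self.pos = (self.pos + 1) % self.timesteps


buffer = DictRolloutBuffer(timesteps=4, n_envs=2)
for t in range(4):
    buffer.add(
        {
            "observations": np.full((2, 3), t, dtype=np.float32),
            "rewards": np.ones((2, 1), dtype=np.float32),
        }
    )

obs_shape = buffer.data["observations"].shape
print(obs_shape)  # (4, 2, 3)
```

With `timesteps` equal to the rollout length, a buffer of this kind is filled exactly once per rollout, which is the situation described above for PPO.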

🙇 Contributing

The best way to contribute is by opening an issue to discuss a new feature or a bug, or by opening a PR to fix a bug or to add a new feature.

📭 Contacts

You can contact us for any further questions or discussions:

Federico Belotti: [email protected]
Davide Angioni: [email protected]
Refik Can Malli: [email protected]
Michele Milesi: [email protected]

