We provide an environment for a predator/prey game and explore two methods: a simple DQN architecture, and a true multi-agent architecture based on a policy gradient approach, Multi-Agent Deep Deterministic Policy Gradient (MADDPG) (Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O. P., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems (pp. 6379-6390)).
Results after 1400 episodes of training:
DDQN 2vs2 | MADDPG 2vs2 | DDQN 2v1 Magic Switch |
---|---|---|
Blue dots represent prey and orange dots represent predators.
The action space is discrete.
Every agent can take one of five actions: `none`, `left`, `right`, `top`, or `bottom`.
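As an illustration of the discrete action space, here is a minimal sketch of how the five actions could be encoded and applied to a position. The `Action` enum, the `DELTAS` table, the unit step size, and the sign convention for `top`/`bottom` are all assumptions made for this example, not the environment's actual implementation.

```python
from enum import IntEnum


class Action(IntEnum):
    """The five discrete actions named above (integer codes are hypothetical)."""
    NONE = 0
    LEFT = 1
    RIGHT = 2
    TOP = 3
    BOTTOM = 4


# Hypothetical unit displacement (dx, dy) for each action.
# Assumption: TOP increases y and BOTTOM decreases it.
DELTAS = {
    Action.NONE: (0, 0),
    Action.LEFT: (-1, 0),
    Action.RIGHT: (1, 0),
    Action.TOP: (0, 1),
    Action.BOTTOM: (0, -1),
}


def apply_action(position, action):
    """Return the new position after one step.

    Only x and y are moved; any further coordinates (such as z)
    are passed through unchanged, which is an assumption of this sketch.
    """
    x, y, *rest = position
    dx, dy = DELTAS[Action(action)]
    return (x + dx, y + dy, *rest)
```

Using an `IntEnum` keeps the actions usable as plain integer indices, which is convenient when a network outputs one value per action.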
The state is fully observable by all agents.
It consists of the 3D coordinates (x, y, z) of every agent.
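To make the shared state concrete, the sketch below builds one flat vector from every agent's (x, y, z) coordinates and hands an identical copy to each agent. The function names `build_state` and `observations`, the flat layout, and the agent ordering are assumptions for illustration only.

```python
def build_state(positions):
    """Flatten one (x, y, z) triple per agent into a single list of floats.

    Assumption: agents are listed in a fixed order, so index 3*i..3*i+2
    always refers to agent i.
    """
    state = []
    for agent_index, coords in enumerate(positions):
        if len(coords) != 3:
            raise ValueError(f"agent {agent_index}: expected (x, y, z)")
        state.extend(float(c) for c in coords)
    return state


def observations(positions):
    """Give every agent its own copy of the same full state."""
    shared = build_state(positions)
    return [list(shared) for _ in positions]
```

For example, `observations([(0, 1, 0), (2, 3, 1)])` returns two identical six-element lists, one per agent.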