Using more than 2 GPUs with Polybeast #31
Comments
That's a great question, and doing so is not currently easy. One way to go about this would be to add PyTorch In practice, we've used our GPU fleet to run more experiments in parallel instead. Later this year we also hope to share an update to TorchBeast that allows using more GPUs for a single learner, but it isn't quite ready yet.
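For readers who want to try the "more experiments in parallel" route mentioned above, here is a minimal sketch, assuming one independent training run per GPU. It relies only on the standard `CUDA_VISIBLE_DEVICES` environment variable; the helper name `gpu_environments` is hypothetical and not part of TorchBeast.

```python
import os


def gpu_environments(num_gpus, base_env=None):
    """Return one environment dict per GPU, each exposing a single device.

    Hypothetical helper: every returned dict is a copy of the base
    environment with CUDA_VISIBLE_DEVICES set to one GPU index, so a
    process started with it only sees that GPU (as device 0).
    """
    source = os.environ if base_env is None else base_env
    envs = []
    for gpu_index in range(num_gpus):
        env = dict(source)  # copy so the runs do not share state
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
        envs.append(env)
    return envs
```

Each environment could then be passed to `subprocess.Popen(command, env=env)`, where `command` is whatever training command you already use, with different hyperparameters per run.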
Hi, Thanks for the answer. After my colleague posted this question, we actually managed to add Having done that, we are looking for other ways to speed up training further, and would greatly appreciate feedback from your end on the following ideas:
Thanks a lot for your time.
Hey @ege-k, That's great and we're more than happy to have a pull request for this. As for your questions:
Distributing the actor model: I'd assume a ratio of 1:3 or 1:4 for actor GPUs to learner GPUs is ideal in a typical setting. Once you want to use many more learner GPUs, distributing the actor model makes sense. This could be done by having different learners with different addresses and telling each actor which one to use. Dynamic batching would still happen, but only on that learner.

Your third question is the hardest one. Unfortunately, in RL often "everything depends on everything", so I cannot rule out that the number of actors influences the learning dynamics and therefore also changes the optimal hyperparameter setting. It certainly would if you also change batch sizes, which is likely required in order to find the best throughput. I don't know of a better way than to try various settings, aiming to slightly overshoot: modern Linux kernels are quite efficient at thread/process scheduling, so context switching does not generate much waste.

As for the second part of your question: TorchBeast can be fully distributed if you find a way to tell each node the names of its peers. E.g., if you know your setup and have fixed IP addresses, you could hardcode them. Often that's not the case and you'll need some other means of communicating the names/addresses. E.g., you could use a shared file system (lots of issues around that, but it can work "in practice"), a real name service, or a lock service on top of something like etcd.

BTW, we are currently working on somewhat alternative designs and might have an update on that in a few weeks. Feel free to drop me an email if you would like to get an early idea of what we want to do.
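To make the two ideas above concrete (pointing each actor at one of several learner addresses, and exchanging addresses through a shared file system), here is a minimal sketch using only the Python standard library. All function names, the registry directory layout, and the example addresses are assumptions for illustration; none of this is TorchBeast API, and a shared file system still has the caveats mentioned above.

```python
import json
import os
import tempfile
from pathlib import Path


def assign_actors(num_actors, learner_addresses):
    """Map each actor index to a learner address in round-robin order."""
    if not learner_addresses:
        raise ValueError("need at least one learner address")
    return {
        actor: learner_addresses[actor % len(learner_addresses)]
        for actor in range(num_actors)
    }


def publish_address(registry_dir, name, address):
    """Record this node's address as <name>.json in a shared directory.

    The entry is written to a temporary file first and then renamed,
    so readers never see a partially written file.
    """
    registry = Path(registry_dir)
    registry.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=registry, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"name": name, "address": address}, f)
    os.replace(tmp_path, registry / f"{name}.json")


def discover_addresses(registry_dir):
    """Read every published entry and return a name -> address dict."""
    peers = {}
    for path in sorted(Path(registry_dir).glob("*.json")):
        with open(path) as f:
            entry = json.load(f)
        peers[entry["name"]] = entry["address"]
    return peers
```

In this sketch each learner would call `publish_address` once at startup, and an actor launcher would call `discover_addresses` followed by `assign_actors` to decide which learner each actor connects to. Dynamic batching would then happen per learner, as described above.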
Hey everyone! I've since left Facebook, but my amazing colleagues have written https://github.com/facebookresearch/moolib, which you should check out for a multi-GPU IMPALA.
Hi,
Firstly, thanks for the repository!
As far as our understanding goes, IMPALA can be distributed across more than 2 GPUs. The example you have in the repo uses up to 2 GPUs. We have access to more GPUs in a single machine and want to utilize all in order to get the maximal throughput. What would be the best way to do it (more learners etc.) and what do we have to add/change to the code?