More ZMQ Streamers #72

Open
cjacoby opened this issue Mar 30, 2017 · 5 comments

@cjacoby
Collaborator

cjacoby commented Mar 30, 2017

Just some things I've been thinking about prototyping soon but I wanted to talk through them first / get your thoughts @bmcfee. This idea isn't fully fleshed out yet.

I've been doing some work with ZMQ at work lately, and I am learning that there are actually various different paradigms ZMQ is designed for. Currently, we are using only the "paired" mode (request/recv).

In particular, I am thinking about creating multiple Streamers, each wrapped as a sort of "Worker" in a separate Python process, all streaming to one central ZMQ receiver, which supplies batches to the training process. It should also be possible to have the ZMQ Workers live on other machines, thereby enabling a sort of "CloudStreamer" through AWS/GCP/etc.

I have this pyzmq example in mind. (Although it's currently unclear to me if we'd want a Queue device or a Streamer ZMQ device).

A first step might be to create something like a ZMQWorkerStreamer, which is a Streamer, but sends to the intermediate Queue-like space. Then, you have a ZMQConsumerStreamer which might operate in much the way that ZMQStream does now, except with external sources.
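
To make the worker/consumer idea above concrete, here is a rough sketch using pyzmq's PUSH/PULL sockets. Everything in it is an assumption for illustration: `worker` and `consume` are hypothetical names (not pescador API), PUSH/PULL is only one candidate pattern, and threads stand in for the separate processes or machines that would be used in practice.

```python
import threading

import zmq


def worker(endpoint, worker_id, n_batches):
    """Hypothetical stand-in for a 'ZMQWorkerStreamer': push batches upstream."""
    context = zmq.Context.instance()
    sock = context.socket(zmq.PUSH)
    sock.connect(endpoint)
    for i in range(n_batches):
        # A real worker would send samples drawn from a Streamer here.
        sock.send_pyobj({"worker": worker_id, "batch": i})
    sock.close()


def consume(n_workers=3, n_batches=4, timeout_ms=5000):
    """Hypothetical stand-in for a 'ZMQConsumerStreamer': collect from all workers."""
    context = zmq.Context.instance()
    pull = context.socket(zmq.PULL)
    # Fail with zmq.Again instead of blocking forever if a worker dies.
    pull.setsockopt(zmq.RCVTIMEO, timeout_ms)
    port = pull.bind_to_random_port("tcp://127.0.0.1")
    endpoint = "tcp://127.0.0.1:%d" % port

    threads = [
        threading.Thread(target=worker, args=(endpoint, w, n_batches))
        for w in range(n_workers)
    ]
    for t in threads:
        t.start()

    # Messages from the different workers arrive interleaved on the one socket.
    received = [pull.recv_pyobj() for _ in range(n_workers * n_batches)]

    for t in threads:
        t.join()
    pull.close()
    return received


if __name__ == "__main__":
    for message in consume():
        print(message)
```

Because the workers only need the consumer's endpoint, the same sketch should carry over to workers on other hosts by binding to a reachable address instead of `127.0.0.1`.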

(This also might be along the lines of the asyncio version we've talked a little about, with the added bonus of external sources).

@bmcfee
Collaborator

bmcfee commented Mar 30, 2017

Sounds interesting... At this point, I don't really have the time or inclination to become an expert in ZMQ. But if you want to take a stab at some version of this (however it makes sense), I'm all for it.

@cjacoby
Collaborator Author

cjacoby commented Mar 30, 2017

Yup, I didn't really expect you to dive in, so much as to offer any high-level API or feature or design guidance if you had any to offer. Otherwise, I'll just try to prototype a thing and submit a PR (...after I finish my taxes).

@bmcfee
Collaborator

bmcfee commented Mar 30, 2017

I guess the one thing that comes to mind is that we should maybe think about the right distribution model here. Before diving down into this, it would be worth scoping out what kinds of problems this would be appropriate for, compared to, say, Dask, and then plan accordingly.

For my purposes, ZMQStreamer exists just to make it easy to decouple the data generator from the model fitter; distributed computation doesn't really enter into the picture.

@cjacoby
Collaborator Author

cjacoby commented Mar 30, 2017 via email

@ejhumphrey
Collaborator

potentially related (somewhat maybe?) tensorflow/tensorflow#8728

@bmcfee bmcfee added this to the 3.3.0 milestone Jul 18, 2019