StochasticMux Process Pool #160

Open
beasteers opened this issue Jul 13, 2020 · 2 comments

@beasteers

It would be super convenient if you could create a process pool with StochasticMux so that multiple files can be processed simultaneously with a fixed number of processes. I know you can wrap each file streamer in a ZMQStreamer, or wrap the StochasticMux itself in one, but in the first case you'd be spawning something like len(files) * epoch_size processes, paying the overhead of starting a fresh process each time; in the second, the entire mux lives in a single separate process and files are still loaded sequentially.
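For concreteness, here's a rough stdlib sketch of the desired behavior (a fixed-size pool shared across all files), using a thread pool as a stand-in for processes; `load_file` is a hypothetical per-file loader, not part of the library:

```python
from multiprocessing.pool import ThreadPool  # thread pool as a stand-in for a process pool

def load_file(path):
    # Hypothetical per-file loading work; imagine librosa.load(path) here.
    return (path, len(path))

files = ["a.wav", "bb.wav", "ccc.wav"]

# Fixed-size pool: at most 2 files are loaded at once, and the workers are
# reused instead of spawning a fresh process per (file, epoch) pair.
with ThreadPool(processes=2) as pool:
    results = list(pool.imap_unordered(load_file, files))
```

`imap_unordered` yields results as workers finish, which is the interleaving you'd want from a mux.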

I suppose this might get complicated with sharding and sampling across files, so I'm assuming that's why it hasn't been implemented.

And if I'm missing something and this is already simple to do, let me know!

@bmcfee
Collaborator

bmcfee commented Jul 13, 2020

Yeah, we've thought a lot about this kind of thing.

There's a slightly more fundamental issue here, which is that streamers are, at the end of the day, generators, and will therefore block until their current output is consumed. This means that even if you have a pool of streamers in parallel, each one will be idle most of the time, even if it lives in its own process.
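A minimal illustration of that blocking behavior in plain Python:

```python
events = []

def stream():
    # The body between yields only runs when a consumer asks for the next
    # item; until then the generator is suspended (i.e. it "blocks").
    events.append("work-1")
    yield 1
    events.append("work-2")
    yield 2

g = stream()
print(events)    # [] -- nothing has run yet: generators are lazy
first = next(g)  # resumes the body up to the first yield
print(events)    # ['work-1'] -- "work-2" won't run until we consume again
```

So a pool of streamers sitting in separate processes would still spend most of its time suspended at a `yield`.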

I think the best way to resolve this is to first go asynchronous #30, and then, once that works, go parallel with the async generators. Making this work efficiently will probably also require some kind of buffering structure, e.g. so that a generator can spin ahead for up to some fixed number of outputs (to be collected later) before it blocks.
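One way to sketch that buffering idea with the stdlib: a background thread drains the generator into a bounded queue, so it can run ahead by up to `buffer_size` outputs before blocking (names and sizes here are made up for illustration):

```python
import threading
import queue

_SENTINEL = object()  # marks the end of the wrapped generator

def buffered(gen, buffer_size=4):
    """Run `gen` in a background thread, letting it produce up to
    `buffer_size` outputs ahead of the consumer before it blocks."""
    q = queue.Queue(maxsize=buffer_size)

    def worker():
        for item in gen:
            q.put(item)  # blocks once the buffer is full
        q.put(_SENTINEL)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            return
        yield item
```

The consumer still sees an ordinary generator, but up to `buffer_size` items are computed ahead of time.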

@beasteers
Author

beasteers commented Jul 13, 2020

Hmm, yeah, I've never had much luck with asyncio. It seems like you need to rewrite everything using async modules, which is quite daunting. (For instance, I don't understand how you would use librosa.load asynchronously.) But I can see how it could be the better option.

Maybe this won't work, but from a quick poke at how ZMQStreamer is implemented, you might be able to do it if you had a pool equal in size to the number of active streamers; then, when one of the active streamers is exhausted and you need to activate a new one, you pool.submit it as a new ZMQ worker.

(This is under the assumption that zmq.socket.send_multipart is non-blocking, but I'm not sure it is.)

And if it is blocking, maybe you could have a thread that collects the arrays into a fixed-length queue that blocks when it's full (the buffering structure you mentioned).
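Combining those two ideas (a fixed worker pool plus a bounded collection queue), a rough stdlib sketch could look like this, using threads instead of ZMQ worker processes; all names here are hypothetical:

```python
import threading
import queue

_DONE = object()  # each worker pushes this once it runs out of streams

def pooled_mux(stream_factories, n_workers=2, buffer_size=8):
    """Interleave many streams with a fixed-size worker pool: each worker
    drains one stream at a time into a shared bounded queue, picking up a
    fresh stream whenever its current one is exhausted."""
    pending = queue.Queue()
    for factory in stream_factories:
        pending.put(factory)
    out = queue.Queue(maxsize=buffer_size)

    def worker():
        while True:
            try:
                factory = pending.get_nowait()
            except queue.Empty:
                out.put(_DONE)
                return
            for item in factory():
                out.put(item)  # blocks when the shared buffer is full

    for _ in range(n_workers):
        threading.Thread(target=worker, daemon=True).start()

    finished = 0
    while finished < n_workers:
        item = out.get()
        if item is _DONE:
            finished += 1
        else:
            yield item
```

This interleaves outputs in whatever order workers produce them, which loses the weighted sampling that StochasticMux does; it only illustrates the pool-plus-queue plumbing.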

(Also, does keras work with asyncio generators? I feel like it's probably not implementing an event loop, but maybe it is.)
