Separate workers for parsing and database insertions #137
Comments
The apply/reducer architecture used by the snorkel-extraction project could be used here too.
The Snorkel team has discussed multithreaded reduce at snorkel-team/snorkel#562.
This is a great question! Is there a way to use another package to do this instead of writing our own code?
We could use (Py)Spark, Dask, etc. for distributed computing, but the bottleneck would be the data persistence layer, i.e., PostgreSQL. One idea is to use different appliers for different storage backends: one for in-memory, another for PostgreSQL, yet another for Hive, etc.
That's one idea! I think it would be better to modularize so that we 1) have better support for distributed computing frameworks from other parties (e.g., PySpark, Dask) and 2) can easily extend to other data layers.
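The "different appliers for different storage backends" idea from the comments could look roughly like the sketch below. Every name here (`StorageBackend`, `InMemoryBackend`, `SqlBackend`, `apply_udf`, `split_sentences`) is hypothetical and is not an API of this project or of Snorkel. SQLite stands in for PostgreSQL only so that the example runs without a database server.

```python
import sqlite3
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface: the only thing the applier knows about storage."""

    @abstractmethod
    def write(self, rows):
        """Persist a list of (doc_id, text) tuples."""


class InMemoryBackend(StorageBackend):
    """Keeps rows in a plain list; useful for tests or small runs."""

    def __init__(self):
        self.rows = []

    def write(self, rows):
        self.rows.extend(rows)


class SqlBackend(StorageBackend):
    """Writes rows through a DB-API connection (SQLite here, as a stand-in)."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS sentence (doc_id TEXT, text TEXT)")

    def write(self, rows):
        self.conn.executemany("INSERT INTO sentence VALUES (?, ?)", rows)
        self.conn.commit()


def split_sentences(doc):
    """Toy UDF: split a (doc_id, text) pair into (doc_id, sentence) rows."""
    doc_id, text = doc
    return [(doc_id, s.strip()) for s in text.split(".") if s.strip()]


def apply_udf(udf, docs, backend):
    """Run the UDF on each document and hand the result to the chosen backend."""
    for doc in docs:
        backend.write(udf(doc))
```

With this split, the parsing code never imports a database driver, and a Hive or Spark-based backend would only need to implement `write`.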
Is your feature request related to a problem? Please describe.
Decouple `UDF` processes from the backend/database session.
Right now, when we run `UDFRunner.apply_mt()`, we create a number of `UDF` worker processes. These processes all own an sqlalchemy `Session` object and add/commit to the database at the end of their respective parsing loop.
Describe the solution you'd like
Make the UDF processes backend-agnostic, e.g. by having a set of separate `BackendWorker` processes handle the insertion of sentences. One possible way: connect the output_queue of `UDF` to the input of `BackendWorker`, which receives `Sentence` lists and handles the sqlalchemy commits.
This will not fully decouple `UDF` from the backend, because the parser returns sqlalchemy-specific `Sentence` objects, but it could be one step towards that goal.
Additional context
This feature request refers to decoupling of parsing and backend.
There's likely more coupling with the backend later in the processing pipeline.
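A minimal sketch of the proposed topology, under several assumptions: the worker functions `udf_worker`, `backend_worker` and `run_pipeline` are hypothetical, threads and `queue.Queue` stand in for the real worker processes so the example stays small and runnable, sentences are plain tuples rather than sqlalchemy `Sentence` objects, and SQLite replaces the actual database. The point is only that the parsing workers put lists of sentences on a queue and a single separate worker owns the database connection and commits.

```python
import queue
import sqlite3
import threading

SENTINEL = None  # each parser puts this on the queue when it is done


def udf_worker(docs, out_queue):
    """Hypothetical parsing worker: produces sentence lists, never touches the DB."""
    for doc_id, text in docs:
        sentences = [(doc_id, s.strip()) for s in text.split(".") if s.strip()]
        out_queue.put(sentences)
    out_queue.put(SENTINEL)


def backend_worker(in_queue, n_producers, db_path):
    """Hypothetical BackendWorker: the only place that opens a DB connection."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS sentence (doc_id TEXT, text TEXT)")
    finished = 0
    while finished < n_producers:
        batch = in_queue.get()
        if batch is SENTINEL:
            finished += 1  # one parser has drained its input
            continue
        conn.executemany("INSERT INTO sentence VALUES (?, ?)", batch)
        conn.commit()
    conn.close()


def run_pipeline(doc_chunks, db_path):
    """Start one parser per chunk plus one backend worker, then wait for all."""
    q = queue.Queue()
    parsers = [
        threading.Thread(target=udf_worker, args=(chunk, q)) for chunk in doc_chunks
    ]
    writer = threading.Thread(target=backend_worker, args=(q, len(parsers), db_path))
    writer.start()
    for p in parsers:
        p.start()
    for p in parsers:
        p.join()
    writer.join()
```

Because the workers only depend on `put` and `get`, the same functions should also work with `multiprocessing.Process` and `multiprocessing.Queue`, provided the queued objects can be pickled. That requirement is one reason the sqlalchemy-specific `Sentence` objects remain a coupling point.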