Use temporary SQLite db instead of list[dict] #45
Comments
Considering all options, it looks like iteratively building a SQLite database would be the best option (or at least worth trying).
I've done some experiments, and the temporary SQLite db seems to be the way to go.
Writing the records out to SQLite works nicely, with barely any memory consumption. I'm not terribly happy with the code, and it still needs work to convert the result to Parquet, but overall this seems to be an elegant and sufficiently efficient solution.
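For illustration, a minimal sketch of what "writing the records out to SQLite" can look like. This is not the code from this repository: `write_records`, the table name `records`, and `commit_every` are hypothetical names, and the sketch assumes each record is a non-empty flat dict of scalar values with well-behaved keys (no double quotes in column names).

```python
import sqlite3


def write_records(db_path, records, table="records", commit_every=1000):
    """Stream an iterable of flat dicts into a SQLite table, one row at a time.

    Only the current record is held in memory; the table grows new columns
    whenever a record brings a key that has not been seen before.
    """
    con = sqlite3.connect(db_path)
    columns = []  # column names created so far
    count = 0
    try:
        for record in records:
            # Create the table on the first key, add a column for each new key.
            for key in record:
                if key in columns:
                    continue
                if not columns:
                    con.execute(f'CREATE TABLE "{table}" ("{key}")')
                else:
                    con.execute(f'ALTER TABLE "{table}" ADD COLUMN "{key}"')
                columns.append(key)
            names = ", ".join(f'"{k}"' for k in record)
            marks = ", ".join("?" for _ in record)
            # Columns missing from this record are left as NULL by SQLite.
            con.execute(
                f'INSERT INTO "{table}" ({names}) VALUES ({marks})',
                list(record.values()),
            )
            count += 1
            if count % commit_every == 0:
                con.commit()
        con.commit()
    finally:
        con.close()
    return count
```

Because `records` can be a generator, nothing like the old `list[dict]` ever has to exist in memory; the trade-off is one `INSERT` per record, which could be batched if it turns out to be too slow.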
Tests are still fine in this state as they don't read the files.
SQLite DBs are now converted to Parquet after creation.
In the previous version we had array columns:
:-\
alto_info/alto4pandas is done now, too.
That's a whopping 28 GB of memory after reading just 22% of the data...
→ Need a more memory-efficient way to handle this.
Progress: