DNM: Dumb Read Parquet Implementation
This is a dumb, mostly-from-scratch implementation of read_parquet. It only supports:

- local and S3 storage
- column selection
- grouping partitions when we select fewer columns (+ threads!)
- the Arrow engine/filesystem

It is very broken in many ways, but...

- it's only around 100 lines of code
- I get 250 MB/s bandwidth when reading all columns on an m6i.xlarge (only 50 MB/s when selecting a subset of columns, though)

See dask/dask#10602
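The "grouping partitions when we select fewer columns (+ threads!)" idea can be sketched in plain Python. This is not the commit's code: `group_files`, `read_group`, and the rule "bundle `n_total // n_selected` files per output partition" are hypothetical, chosen only to illustrate why reading fewer columns makes it worthwhile to combine files and fetch them on a thread pool. In practice `read_one` might wrap `pyarrow.parquet.read_table(path, columns=columns)`; here a stand-in reader keeps the sketch dependency-free.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def group_files(paths: Sequence[str], n_selected: int, n_total: int) -> List[List[str]]:
    """Bundle several files into one output partition when only some columns are read.

    Hypothetical heuristic: reading k of n columns yields roughly k/n of each
    file's data, so combine about n // k files per partition (at least 1).
    """
    if n_selected <= 0 or n_total <= 0:
        raise ValueError("column counts must be positive")
    per_group = max(1, n_total // n_selected)
    return [list(paths[i:i + per_group]) for i in range(0, len(paths), per_group)]


def read_group(paths: Sequence[str], read_one: Callable[[str], T], max_workers: int = 8) -> List[T]:
    """Read every file in one group concurrently on threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_one, paths))


if __name__ == "__main__":
    files = [f"part.{i}.parquet" for i in range(10)]
    # Reading 2 of 8 columns: bundle 8 // 2 = 4 files per partition.
    groups = group_files(files, n_selected=2, n_total=8)
    print([len(g) for g in groups])  # [4, 4, 2]
    # Stand-in reader; a real one might call pyarrow.parquet.read_table(path, columns=...).
    print(read_group(groups[-1], str.upper))  # ['PART.8.PARQUET', 'PART.9.PARQUET']
```

Threads fit here because reading Parquet from local disk or S3 is largely I/O-bound, and the per-partition results would then be concatenated (for example with `pyarrow.concat_tables`) before handing them to pandas.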