Add a to_cudf method for reading directly into GPU memory #17
Comments
I could see it either way, as an argument to to_pandas (and/or to_dask), or as its own method. How many of the sources do you think it would apply to? I know cuDF has performant Parquet and CSV readers.
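To make the two shapes concrete, here is a minimal sketch assuming the intake-parquet plugin; the `engine` keyword and the `to_cudf` method are hypothetical and shown only for discussion (only `read`/`to_dask` exist today):

```python
import intake

# Existing CPU paths via the intake-parquet plugin
source = intake.open_parquet("data.parquet")   # placeholder path
pdf = source.read()      # pandas DataFrame
ddf = source.to_dask()   # dask DataFrame

# Shape (a), hypothetical: reuse the existing reader with an engine switch
# gdf = source.read(engine="cudf")

# Shape (b), hypothetical: a dedicated method returning a cudf.DataFrame
# gdf = source.to_cudf()
```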
True, since it's possible to have dataframes loaded into a single GPU (ala …
One problem with Option 1 is that the …
Looking at cudf's IO readers at https://docs.rapids.ai/api/cudf/stable/api.html#module-cudf.io.csv, several file formats are currently available.
Perhaps we should discuss this upstream at https://github.com/intake/intake too 😁
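As a quick illustration of the cudf reader API referenced above (file paths are placeholders), the CSV and Parquet readers return cudf.DataFrame objects resident in GPU memory:

```python
import cudf

# Both readers return a cudf.DataFrame living in GPU memory
gdf_csv = cudf.read_csv("example.csv")               # placeholder path
gdf_parquet = cudf.read_parquet("example.parquet")   # placeholder path

# Copy back to host memory when a pandas DataFrame is needed
pdf = gdf_parquet.to_pandas()
```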
Hi there,

Just wondering if there's scope for a `to_cudf` type functionality so that users can read Parquet files directly into GPU memory (bypassing the CPU). This would use the `cudf.read_parquet` function.

Happy to submit a Pull Request for this, but would like to have a discussion around the implementation first: whether it should be handled as a `to_cudf` method, or via something like `engine="cudf"` (though `cudf` also has a "pyarrow" engine, like pandas).

One issue though is that `cudf` cannot read multi-file Parquet folders yet (see rapidsai/cudf#1688), only single binary Parquet files. This might get implemented in a future (v0.16?) `cudf` release though.