feat: expose load from csv file via Python driver #504

Open
rad-pat opened this issue Nov 6, 2024 · 5 comments

Comments

@rad-pat

rad-pat commented Nov 6, 2024

Currently, the Python driver only exposes loading a data array via the stream_load method, which means reading the data from the file yourself and passing it into the method.
It would be good to have a method to load from a CSV file with options, and/or the ability to stage a file through the driver bindings.
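
For reference, the current workaround described above might look roughly like the sketch below. It assumes the connection object exposes stream_load(sql, rows), where rows is a list of lists of strings, and that the statement takes the form "INSERT INTO <table> VALUES"; both details are assumptions and should be checked against the installed driver version. The helper names read_csv_rows and load_csv are hypothetical.

```python
import csv
from typing import List


def read_csv_rows(path: str, skip_header: bool = True) -> List[List[str]]:
    """Read a CSV file into a list of rows, each row a list of strings."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if skip_header:
            next(reader, None)  # drop the header line if the file has one
        return [row for row in reader]


def load_csv(conn, table: str, path: str, skip_header: bool = True) -> int:
    """Load a local CSV file through the connection's stream_load method.

    `conn` is assumed to expose stream_load(sql, rows); the exact signature
    and SQL form are assumptions, not confirmed driver API.
    """
    rows = read_csv_rows(path, skip_header)
    if rows:
        conn.stream_load(f"INSERT INTO {table} VALUES", rows)
    return len(rows)
```

The whole file is held in memory before it is sent, which is the overhead this request is trying to avoid.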

@everpcpc
Member

How about adding support to stream_load a pandas DataFrame?

@rad-pat
Author

rad-pat commented Dec 19, 2024

We're looking for the fastest way to load data into Databend. If we have a local CSV file, what is the best option? Currently we push the file to cloud storage and execute a COPY INTO statement.
With Greenplum/Postgres, we could feed the file through STDIN into the COPY command.

@everpcpc
Member

everpcpc commented Dec 19, 2024

If you just want to load files, you can try bendsql:

bendsql --query='INSERT INTO http_books_02 VALUES;' --format=csv --data=@cli/tests/data/books.csv

# or with STDIN

bendsql --query='INSERT INTO http_books_01 VALUES;' --format=csv --data=@- <cli/tests/data/books.csv

# or with more options

bendsql \
    --query='INSERT INTO http_ontime_03 VALUES;' \
    --format=csv \
    --format-opt="compression=gzip" \
    --format-opt="skip_header=1" \
    --data=@cli/tests/data/ontime_200.csv.gz
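
Until the driver has a file-based method, one stopgap from Python could be to shell out to bendsql with the same flags as above. This is only a sketch: it assumes bendsql is on PATH and already configured to reach the target Databend instance, and the helper names build_bendsql_load_cmd and load_csv_with_bendsql are hypothetical.

```python
import subprocess
from typing import Dict, List, Optional


def build_bendsql_load_cmd(
    table: str,
    csv_path: str,
    format_opts: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build the bendsql argument list for loading a local CSV file.

    Only the flags shown in the commands above are used:
    --query, --format, --format-opt and --data.
    """
    cmd = ["bendsql", f"--query=INSERT INTO {table} VALUES;", "--format=csv"]
    for key, value in (format_opts or {}).items():
        cmd.append(f"--format-opt={key}={value}")
    cmd.append(f"--data=@{csv_path}")
    return cmd


def load_csv_with_bendsql(
    table: str,
    csv_path: str,
    format_opts: Optional[Dict[str, str]] = None,
) -> None:
    """Run bendsql as a subprocess; raises CalledProcessError on failure."""
    # Assumes bendsql is installed and its connection settings are configured.
    subprocess.run(build_bendsql_load_cmd(table, csv_path, format_opts), check=True)
```

For example, load_csv_with_bendsql("http_ontime_03", "cli/tests/data/ontime_200.csv.gz", {"compression": "gzip", "skip_header": "1"}) would run the equivalent of the last command above.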

@rad-pat
Author

rad-pat commented Dec 19, 2024

Yep, that's good for a test. How can I replicate this with the Python driver, passing the file name?

@everpcpc
Member

We will add support for this in the Python driver in the next release.
