Support the brand new Pandas Dataframe alternatives #620

Open
juarezr opened this issue Apr 25, 2022 · 0 comments
Labels
Feature: A nice to have thing that we don't have yet
Help Wanted: We are volunteers. We'll be happy if you join us.


juarezr commented Apr 25, 2022

Problem description

It would be nice to support the newer DataFrame libraries in addition to pandas.

Two interesting candidates would be:

Modin Overview

Scale your pandas workflow by changing a single line of code

Modin uses Ray or Dask to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical.

Polars Overview

Lightning-fast DataFrame library for Rust and Python

Polars is a lightning fast DataFrame library/in-memory query engine. Its embarrassingly parallel execution, cache efficient algorithms and expressive API makes it perfect for efficient data wrangling, data pipelines, snappy APIs and so much more.

Problem Description

Currently petl supports pandas through the functions petl.io.pandas.fromdataframe and petl.io.pandas.todataframe.

Before evolving this feature, it would be important to research:

  • How these libraries fit into petl use cases.
  • Which APIs would be the most ergonomic, whether that means adding new functions or adding support to existing ones.
  • What additional burden supporting them properly would bring. For example:
    • CI: acceptance tests
    • CD: impact on the releases
    • Documentation: details on API, caveats, proper setup, FAQ, and troubleshooting
  • What happens when the upstream projects break compatibility between versions.
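
To make the API question more concrete, here is one possible shape, sketched under loud assumptions: a single function with a `backend` argument that imports the chosen library lazily, so none of them becomes a hard dependency. The names `to_dataframe`, `table_to_columns` and `_BACKENDS` are hypothetical and not part of petl. The only upstream behaviour relied on is that each library's `DataFrame` constructor accepts a dict of column lists.

```python
import importlib
from typing import Any, Dict, Iterable, List, Sequence

# Hypothetical registry: backend name -> importable module exposing DataFrame.
# This registry is an illustration only; it does not exist in petl.
_BACKENDS: Dict[str, str] = {
    "pandas": "pandas",
    "modin": "modin.pandas",
    "polars": "polars",
}


def table_to_columns(table: Iterable[Sequence[Any]]) -> Dict[str, List[Any]]:
    """Turn a petl-style table (header row first) into a dict of columns.

    Short rows are padded with None. Duplicate header names are not handled.
    """
    it = iter(table)
    try:
        header = [str(name) for name in next(it)]
    except StopIteration:
        return {}
    columns: Dict[str, List[Any]] = {name: [] for name in header}
    for row in it:
        for i, name in enumerate(header):
            columns[name].append(row[i] if i < len(row) else None)
    return columns


def to_dataframe(table: Iterable[Sequence[Any]], backend: str = "pandas") -> Any:
    """Build a DataFrame with the requested backend, importing it lazily."""
    if backend not in _BACKENDS:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(_BACKENDS)}"
        )
    module_name = _BACKENDS[backend]
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Surface a clear message instead of a bare import failure.
        raise ImportError(
            f"backend {backend!r} requires the optional package {module_name!r}"
        ) from exc
    return module.DataFrame(table_to_columns(table))
```

Materializing every column in memory gives up petl's lazy row streaming, so this is a starting point for discussion rather than a proposal. It also does nothing about version drift in the upstream libraries, which would still have to be covered by acceptance tests per backend.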
juarezr added the Feature and Help Wanted labels on Apr 25, 2022