Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] single join fully implemented in numba #1304

Merged
merged 20 commits into from
Dec 13, 2023

Conversation

samukweku
Copy link
Collaborator

@samukweku samukweku commented Nov 3, 2023

PR Description

Please describe the changes proposed in the pull request:

  • implement single conditional join fully in numba (for less than and greater than conditions)
  • not equal operator implemented with numpy (no numba implementation)

This PR resolves #1302 .

PR Checklist

Please ensure that you have done the following:

  1. PR in from a fork off your branch. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.
  1. If you're not on the contributors list, add yourself to AUTHORS.md.
  1. Add a line to CHANGELOG.md under the latest version header (i.e. the one that is "on deck") describing the contribution.
    • Do use some discretion here; if there are multiple PRs that are related, keep them in a single line.

Automatic checks

There will be automatic checks run on the PR. These include:

  • Building a preview of the docs on Netlify
  • Automatically linting the code
  • Making sure the code is documented
  • Making sure that all tests are passed
  • Making sure that code coverage doesn't go down.

Relevant Reviewers

Please tag maintainers to review.

@samukweku samukweku added the enhancement New feature or request label Nov 3, 2023
@samukweku samukweku self-assigned this Nov 3, 2023
@ericmjl
Copy link
Member

ericmjl commented Nov 3, 2023

Copy link

codecov bot commented Nov 3, 2023

Codecov Report

Merging #1304 (8ed2462) into dev (04fa118) will increase coverage by 1.76%.
Report is 4 commits behind head on dev.
The diff coverage is 96.58%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1304      +/-   ##
==========================================
+ Coverage   92.85%   94.61%   +1.76%     
==========================================
  Files          78       78              
  Lines        4142     4254     +112     
==========================================
+ Hits         3846     4025     +179     
+ Misses        296      229      -67     

@samukweku samukweku force-pushed the samukweku/numba_cond_join_single branch from 36f3eeb to c8a366b Compare November 5, 2023 04:42
Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-approving pending docstrings!

@settings(deadline=None, max_examples=10)
@given(df=conditional_df(), right=conditional_right())
def test_single_condition_greater_than_floats_keep_last_numba(df, right):
"""Test output for a single condition. "<"."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samukweku I took the liberty of asking ChatGPT for a better docstring on this test.

@pytest.mark.turtle
@settings(deadline=None, max_examples=10)
@given(df=conditional_df(), right=conditional_right())
def test_single_condition_greater_than_floats_keep_last_numba(df, right):
    """
    Test the functionality of conditional_join with a single 'greater than' condition on floating-point data, while keeping the last match using Numba.

    This test sorts and filters dataframes 'df' and 'right' by columns 'B' and 'Numeric' respectively, removing NaN values. It then performs a backward merge_asof operation on these sorted dataframes. The expected outcome is a dataframe where each row from 'df' is merged with the last row from 'right' where 'Numeric' is greater than 'B'.

    The actual outcome is produced by the conditional_join method with a 'greater than' condition, left join type, sorted by appearance, keeping the last match, and utilizing Numba for performance optimization. The test asserts that the actual dataframe matches the expected dataframe, ensuring correct functionality of the conditional_join under these specific parameters.
    """
    # Test implementation continues...

Is it accurate? If so, I might begin writing a template for testing! Also, if it is accurate, could you update the docstrings for these two tests please? (I will get them generated for the other one.)

@settings(deadline=None, max_examples=10)
@given(df=conditional_df(), right=conditional_right())
def test_single_condition_greater_than_floats_keep_last(df, right):
"""Test output for a single condition. "<"."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pytest.mark.turtle
@settings(deadline=None, max_examples=10)
@given(df=conditional_df(), right=conditional_right())
def test_single_condition_greater_than_floats_keep_last_numba(df, right):
    """
    Test the functionality of conditional_join with a single 'greater than' condition on floating-point data, while keeping the last match using Numba.

    This test sorts and filters dataframes 'df' and 'right' by columns 'B' and 'Numeric' respectively, removing NaN values. It then performs a backward merge_asof operation on these sorted dataframes. The expected outcome is a dataframe where each row from 'df' is merged with the last row from 'right' where 'Numeric' is greater than 'B'.

    The actual outcome is produced by the conditional_join method with a 'greater than' condition, left join type, sorted by appearance, keeping the last match, without utilizing Numba for performance optimization. The test asserts that the actual dataframe matches the expected dataframe, ensuring correct functionality of the conditional_join under these specific parameters.
    """
    # Test implementation continues...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like a lot of docstring info for tests, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a good point, actually. I'm still a bit conflicted on whether to be verbose on test docstrings.

@ericmjl
Copy link
Member

ericmjl commented Dec 13, 2023

@samukweku are we good to merge? I think we should, please let me know.

@samukweku
Copy link
Collaborator Author

@ericmjl yes it is ok to merge

@ericmjl ericmjl merged commit 44152a2 into dev Dec 13, 2023
6 checks passed
@ericmjl
Copy link
Member

ericmjl commented Dec 13, 2023

Thank you very much, @samukweku!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] implement full numba version of a single conditional_join
2 participants