[ENH] single join fully implemented in numba #1304

samukweku · 2023-11-03T13:20:25Z

PR Description

Please describe the changes proposed in the pull request:

Implement the single conditional join fully in numba (for the less-than and greater-than conditions).
The not-equal operator is implemented with numpy (there is no numba implementation).
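A rough sketch of what a single non-equi condition computes may help reviewers. It is not the numba code from this PR. It assumes plain Python lists, a hypothetical helper name `single_condition_join`, and output as (left position, right position) pairs. For `<` and `>`, the right values are sorted once and each left value is located by binary search, so its matches form one contiguous block of the sorted order. `!=` has no such block and is checked pair by pair.

```python
from bisect import bisect_left, bisect_right


def single_condition_join(left, right, op):
    """Hypothetical sketch: return (left_pos, right_pos) pairs where
    left[i] <op> right[j] holds, for op in {"<", ">", "!="}."""
    # Sort right positions by value once; each left lookup is then a binary search.
    order = sorted(range(len(right)), key=lambda j: right[j])
    sorted_vals = [right[j] for j in order]
    pairs = []
    for i, x in enumerate(left):
        if op == "<":
            # right values strictly greater than x sit after bisect_right(x)
            matches = order[bisect_right(sorted_vals, x):]
        elif op == ">":
            # right values strictly less than x sit before bisect_left(x)
            matches = order[:bisect_left(sorted_vals, x)]
        elif op == "!=":
            # no contiguous block exists, so compare against every right value
            matches = [j for j in range(len(right)) if right[j] != x]
        else:
            raise ValueError(f"unsupported operator: {op}")
        # report matches in original right-row order
        pairs.extend((i, j) for j in sorted(matches))
    return pairs
```

For example, `single_condition_join([1, 3], [2, 0, 3], ">")` returns `[(0, 1), (1, 0), (1, 1)]`.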

This PR resolves #1302.

PR Checklist

Please ensure that you have done the following:

Open the PR from a fork, off your feature branch. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.

If you're not on the contributors list, add yourself to AUTHORS.md.

Add a line to CHANGELOG.md under the latest version header (i.e. the one that is "on deck") describing the contribution.
- Do use some discretion here; if there are multiple PRs that are related, keep them in a single line.

Automatic checks

There will be automatic checks run on the PR. These include:

Building a preview of the docs on Netlify
Automatically linting the code
Making sure the code is documented
Making sure that all tests are passed
Making sure that code coverage doesn't go down.

Relevant Reviewers

Please tag maintainers to review.

@ericmjl

ericmjl · 2023-11-03T13:23:04Z

🚀 Deployed on https://deploy-preview-1304--pyjanitor.netlify.app

codecov · 2023-11-03T13:25:21Z

Codecov Report

Merging #1304 (8ed2462) into dev (04fa118) will increase coverage by 1.76%.
Report is 4 commits behind head on dev.
The diff coverage is 96.58%.

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1304      +/-   ##
==========================================
+ Coverage   92.85%   94.61%   +1.76%     
==========================================
  Files          78       78              
  Lines        4142     4254     +112     
==========================================
+ Hits         3846     4025     +179     
+ Misses        296      229      -67

ericmjl

Pre-approving pending docstrings!

ericmjl · 2023-11-11T13:19:37Z

tests/functions/test_conditional_join.py

+@settings(deadline=None, max_examples=10)
+@given(df=conditional_df(), right=conditional_right())
+def test_single_condition_greater_than_floats_keep_last_numba(df, right):
+    """Test output for a single condition. "<"."""


@samukweku I took the liberty of asking ChatGPT for a better docstring on this test.

@pytest.mark.turtle
@settings(deadline=None, max_examples=10)
@given(df=conditional_df(), right=conditional_right())
def test_single_condition_greater_than_floats_keep_last_numba(df, right):
    """
    Test the functionality of conditional_join with a single 'greater than'
    condition on floating-point data, while keeping the last match using Numba.

    This test sorts and filters dataframes 'df' and 'right' by columns 'B' and
    'Numeric' respectively, removing NaN values. It then performs a backward
    merge_asof operation on these sorted dataframes. The expected outcome is a
    dataframe where each row from 'df' is merged with the last row from 'right'
    where 'Numeric' is greater than 'B'.

    The actual outcome is produced by the conditional_join method with a
    'greater than' condition, left join type, sorted by appearance, keeping the
    last match, and utilizing Numba for performance optimization. The test
    asserts that the actual dataframe matches the expected dataframe, ensuring
    correct functionality of the conditional_join under these specific
    parameters.
    """
    # Test implementation continues...

Is it accurate? If so, I might begin writing a template for testing! Also, if it is accurate, could you update the docstrings for these two tests please? (I will get them generated for the other one.)
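To check the docstring against the test, it may help to spell out the expected side. According to the pandas documentation, a backward `merge_asof` selects the last right row whose key is less than or equal to the left key, and `allow_exact_matches=False` makes that strictly less. Assuming the test's `>` condition means `B > Numeric` and that the test disables exact matches (neither detail appears in the excerpt above), each `B` should pair with the largest `Numeric` strictly below it. With `right` sorted ascending, that is also the last qualifying row, which is what keeping the last match would return. The helper name `last_match_greater_than` is hypothetical, and the sketch uses plain lists rather than dataframes.

```python
from bisect import bisect_left


def last_match_greater_than(left_vals, right_vals):
    """Hypothetical sketch: for each x in left_vals, return the largest v in
    right_vals with x > v (the last match once right is sorted ascending),
    or None when no right value qualifies, mirroring a left join."""
    sorted_right = sorted(right_vals)
    out = []
    for x in left_vals:
        # bisect_left returns how many right values are strictly less than x
        pos = bisect_left(sorted_right, x)
        out.append(sorted_right[pos - 1] if pos > 0 else None)
    return out
```

Under those assumptions, the matched `Numeric` is smaller than `B`, so the phrase "'Numeric' is greater than 'B'" may have the comparison reversed.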

ericmjl · 2023-11-11T13:20:07Z

tests/functions/test_conditional_join.py

+@settings(deadline=None, max_examples=10)
+@given(df=conditional_df(), right=conditional_right())
+def test_single_condition_greater_than_floats_keep_last(df, right):
+    """Test output for a single condition. "<"."""


@pytest.mark.turtle
@settings(deadline=None, max_examples=10)
@given(df=conditional_df(), right=conditional_right())
def test_single_condition_greater_than_floats_keep_last(df, right):
    """
    Test the functionality of conditional_join with a single 'greater than'
    condition on floating-point data, while keeping the last match, without Numba.

    This test sorts and filters dataframes 'df' and 'right' by columns 'B' and
    'Numeric' respectively, removing NaN values. It then performs a backward
    merge_asof operation on these sorted dataframes. The expected outcome is a
    dataframe where each row from 'df' is merged with the last row from 'right'
    where 'Numeric' is greater than 'B'.

    The actual outcome is produced by the conditional_join method with a
    'greater than' condition, left join type, sorted by appearance, keeping the
    last match, without utilizing Numba for performance optimization. The test
    asserts that the actual dataframe matches the expected dataframe, ensuring
    correct functionality of the conditional_join under these specific
    parameters.
    """
    # Test implementation continues...

feels like a lot of docstring info for tests, no?

You have a good point, actually. I'm still a bit conflicted on whether to be verbose on test docstrings.

ericmjl · 2023-12-13T01:44:13Z

@samukweku are we good to merge? I think we should, please let me know.

samukweku · 2023-12-13T06:07:42Z

@ericmjl yes it is ok to merge

ericmjl · 2023-12-13T16:33:22Z

Thank you very much, @samukweku!

single join fully implemented in numba

d2d2ede

samukweku added the enhancement (New feature or request) label Nov 3, 2023

samukweku requested review from ericmjl, hectormz, thatlittleboy and apatao November 3, 2023 13:20

samukweku self-assigned this Nov 3, 2023

ericmjl and others added 2 commits November 4, 2023 19:59

Merge dev into samukweku/numba_cond_join_single

d3d30ff

get single equi join in numba; check for monotonicity before sorting

c8a366b
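The commit message above only says the monotonicity check happens before sorting; presumably the point is to skip the sort when the join column is already ordered. The plain-Python sketch below illustrates that idea with hypothetical helper names (`is_monotonic_increasing`, `sorted_positions`) and makes no claim to match the numba code: one O(n) pass decides whether the O(n log n) sort is needed at all.

```python
def is_monotonic_increasing(values):
    """Return True if values never decrease (a single O(n) pass)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def sorted_positions(values):
    """Hypothetical helper: return positions that order `values` ascending,
    skipping the sort entirely when the data is already monotonic."""
    if is_monotonic_increasing(values):
        # already ordered: the identity permutation is the sort order
        return list(range(len(values)))
    return sorted(range(len(values)), key=values.__getitem__)
```

In pandas, the same check is exposed as `Series.is_monotonic_increasing`.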

samukweku force-pushed the samukweku/numba_cond_join_single branch from 36f3eeb to c8a366b November 5, 2023 04:42

ericmjl and others added 5 commits November 5, 2023 14:41

Merge dev into samukweku/numba_cond_join_single

7cfa959

improve logic for monotonicity in numba equi join

172f216

Merge dev into samukweku/numba_cond_join_single

c087018

add tests

67188f9

simplify boolean capture logic

34b3bb5

ericmjl approved these changes Nov 11, 2023

ericmjl and others added 11 commits November 17, 2023 21:03

Merge dev into samukweku/numba_cond_join_single

6092f7d

restrict column check to only non-equi joins, or if use_numba=True for equi joins

f8d80e7

add docstrings to tests

22153d8

Merge dev into samukweku/numba_cond_join_single

6ea4ce1

update dtype check

450e129

simplify dtype check further

c817290

Update conditional_join.ipynb

8f09ef8

add validate keyword for equi joins

5d2aa11

fix docstrings

31baa3d

further fix docstrings

a36dba9

remove validate keyword

2a7c004

Merge dev into samukweku/numba_cond_join_single

8ed2462

ericmjl merged commit 44152a2 into dev Dec 13, 2023
6 checks passed

ericmjl left a comment

ericmjl Nov 11, 2023

ericmjl Nov 11, 2023

samukweku Nov 13, 2023

ericmjl Dec 13, 2023

ericmjl commented Dec 13, 2023

samukweku commented Dec 13, 2023

ericmjl commented Dec 13, 2023
