Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numpy array default hasher #43

Open
filyp opened this issue Feb 8, 2021 · 4 comments
Open

Numpy array default hasher #43

filyp opened this issue Feb 8, 2021 · 4 comments

Comments

@filyp
Copy link

filyp commented Feb 8, 2021

I cache a function which is called with large numpy arrays. There is an efficient way to hash them with https://github.com/ifduyue/python-xxhash

I think it's a common use case, so I can merge my method to the default hasher in this library.

The only problem is that hash collisions are possible, and the cache would then work incorrectly, but it's very rare, and also the existing functionality isn't broken. I can also add an option to not hash them but just convert to string with array.tobytes(), which prevents collisions but is only is suited for small arrays.

The same can be done for regular python lists (with xxhash or str(list)).

If you want, I can make a PR with this.

@shaypal5 shaypal5 assigned shaypal5 and filyp and unassigned shaypal5 Feb 9, 2021
@shaypal5
Copy link
Collaborator

shaypal5 commented Feb 9, 2021

Sounds great! Please find a way to benchmark the improvement, though!

Thank you for suggesting a PR!

@Borda
Copy link
Contributor

Borda commented Jan 30, 2024

How about converting the list to a tuple? #136

@Borda
Copy link
Contributor

Borda commented Feb 15, 2024

I think this is resolved with #183; see:

import numpy as np
import cachier

count = 0

@cachier.cachier()
def dummy_func(a, b=np.array([2])):
    global count
    count += 1
    return a + b

assert count == 0
dummy_func(np.array([1]))
dummy_func(np.array([1]), np.array([2]))
dummy_func(np.array([1]), b=np.array([2]))
dummy_func(a=np.array([1]), b=np.array([2]))
assert count == 1

@Borda Borda unassigned filyp Feb 15, 2024
@shaypal5
Copy link
Collaborator

I think we need to have to things to close this confidently:

  1. A test covering this case.
  2. A benchmark making sure performance, time-wise, for very large numpy array using the current solution are comparable to using hpython-xxhash.

Is 1 already in place?
If so, a contribution of 2 will enable me to feel comfortable with closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants