Lazy arrays for asymptotically better performance #480
Replies: 9 comments
-
Here's my proposal: We declare 0.x a backwards-compatible API, and consider 1.x to be based on TACO. We will emit a FutureWarning in 0.* about the API change.
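For concreteness, here is one way the 0.x deprecation notice could be emitted, a minimal sketch only: the helper name `_warn_api_change` and the message text are illustrative, not actual pydata/sparse code.

```python
# Sketch: emitting a FutureWarning about the upcoming API change.
# The function name and message are illustrative assumptions.
import warnings


def _warn_api_change():
    warnings.warn(
        "The 0.x API is deprecated; 1.x will be based on "
        "TACO-style lazy evaluation.",
        FutureWarning,
        stacklevel=2,
    )


# Demonstrate that the warning is raised, without polluting stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _warn_api_change()
```

`FutureWarning` is the conventional category here because, unlike `DeprecationWarning`, it is shown to end users by default.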
-
I think that there is value in having a fully immediate library for multi-dimensional sparse arrays. Not everyone is going to want laziness, even given the performance that it brings. I would like to suggest that some version of this library remain non-lazy and publicly targetable without a version modifier. Perhaps it would make sense to have a different sparse_lazy library that depends strongly on this one?

> On Mon, Feb 17, 2020 at 2:13 AM Hameer Abbasi wrote:
> Here's my proposal: We declare 0.x a backwards-compatible API, and consider 1.x to be based on TACO. We will emit a FutureWarning in 0.* about the API change.
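The split suggested above, an eager core with a separate lazy layer on top, can be sketched in a few lines. Everything below is hypothetical: `Deferred` and `lazy` are invented names, and NumPy stands in for the eager sparse library; this is not the API of any existing package.

```python
# A minimal sketch of the "separate sparse_lazy library" idea: the lazy
# layer only records calls into a small graph and delegates all actual
# work to the eager library (NumPy stands in for it here).
import numpy as np


class Deferred:
    """A node in a deferred-computation graph."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    def compute(self):
        # Resolve nested deferred nodes, then call the eager function.
        resolved = [a.compute() if isinstance(a, Deferred) else a
                    for a in self.args]
        return self.func(*resolved)


def lazy(func):
    """Wrap any eager function so it returns a Deferred node instead."""
    return lambda *args: Deferred(func, args)


add = lazy(np.add)             # lazy counterparts of eager operations
multiply = lazy(np.multiply)

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
expr = add(multiply(x, y), x)  # builds a graph; no arithmetic yet
result = expr.compute()        # [5, 12, 21]
```

Because the lazy layer never touches storage formats itself, it can depend on the eager library without constraining it, which is the appeal of keeping the two packages separate.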
-
I think if we were to do this, it'd be the other way around: immediate operations would just do …
-
That sounds nicer to me from an integration perspective. Also, to be clear, I'm very excited about seeing what this can do in terms of performance. My original hesitation is mostly around thinking about other libraries depending on pydata/sparse. My guess is that folks will be more hesitant to depend on anything that has any sort of non-traditional behavior.
-
I think there's also a good amount of evidence that eager is better as a default; see e.g. PyTorch's modes and TensorFlow moving to eager by default. So +1 for eager by default, and advertising lazy mode prominently as a potential speedup method.
-
So I think the final decision is: …
-
Potentially off-topic, but I'm curious how this relates to …
-
@dhirschfeld An early version of …
-
That makes sense!
-
I've already brought this up in another thread (dask/dask#5879), but I was planning to make sparse collections within this library lazy, for asymptotically better performance in certain situations. See the following research papers for details:

And the following talks:

These research papers define a method to generate efficient kernels for a broad range of storage formats. They can handle anything composed of element-wise operations (with broadcasting) and reductions, but they can't do things like (for example) eigendecompositions, which we intend to do with SciPy wrappers for LAPACK et al.

With this in mind, would it make sense to make sparse collections lazy, with the caveat of an API break? These would have an API similar to Dask, requiring `arr.compute()` for the final result. As discussed in dask/dask#5879, they would also follow the protocols for Dask custom collections. If we manage to do this right, adding GPU support shouldn't be difficult either. But the question arises: is it worth breaking API compatibility to do this?
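As a concrete picture of the proposed user-facing API, here is a minimal sketch: `LazyArray` and `from_numpy` are invented names, and dense NumPy arrays stand in for the sparse backend, so this is not the actual pydata/sparse implementation.

```python
# Sketch: operations build up an expression, and arr.compute()
# materializes the result, much like Dask.
import numpy as np


class LazyArray:
    """Records operations instead of executing them immediately."""

    def __init__(self, thunk):
        self._thunk = thunk  # zero-argument callable producing an ndarray

    @classmethod
    def from_numpy(cls, arr):
        return cls(lambda: arr)

    def __mul__(self, other):
        return LazyArray(lambda: self._thunk() * other._thunk())

    def sum(self):
        return LazyArray(lambda: self._thunk().sum())

    def compute(self):
        # In a real implementation, this is where a fused (e.g.
        # TACO-generated) kernel could evaluate the whole expression in
        # one pass, never materializing intermediates like a * b.
        return self._thunk()


a = LazyArray.from_numpy(np.array([1.0, 0.0, 2.0]))
b = LazyArray.from_numpy(np.array([0.0, 3.0, 4.0]))
expr = (a * b).sum()     # nothing has been computed yet
result = expr.compute()  # 8.0
```

Deferring evaluation until `compute()` is exactly what lets a code generator fuse `(a * b).sum()` into a single iteration over the intersection of nonzeros, which is where the asymptotic wins come from.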