Lazy arrays for asymptotically better performance #480
Replies: 9 comments
-
Here's my proposal: We declare 0.x a backwards-compatible API, and consider 1.x to be based on TACO. We will emit a FutureWarning in 0.* about the API change.
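For concreteness, here is one way the 0.x deprecation notice could be emitted, a minimal sketch only: the helper name `_warn_api_change` and the message text are illustrative, not actual pydata/sparse code.

```python
# Sketch: emitting a FutureWarning about the upcoming API change.
# The function name and message are illustrative assumptions.
import warnings


def _warn_api_change():
    warnings.warn(
        "The 0.x API is deprecated; 1.x will be based on "
        "TACO-style lazy evaluation.",
        FutureWarning,
        stacklevel=2,
    )


# Demonstrate that the warning is raised, without polluting stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _warn_api_change()
```

`FutureWarning` is the conventional category here because, unlike `DeprecationWarning`, it is shown to end users by default.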
-
I think that there is value in having a fully immediate library for multi-dimensional sparse arrays. Not everyone is going to want laziness, even given the performance that it brings. I would like to suggest that some version of this library remain non-lazy and publicly targetable without a version modifier. Perhaps it would make sense to have a different sparse_lazy library that depends strongly on this one?

> On Mon, Feb 17, 2020 at 2:13 AM Hameer Abbasi wrote:
> Here's my proposal: We declare 0.x a backwards-compatible API, and consider 1.x to be based on TACO. We will emit a FutureWarning in 0.* about the API change.
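The split suggested above, an eager core with a separate lazy layer on top, can be sketched in a few lines. Everything below is hypothetical: `Deferred` and `lazy` are invented names, and NumPy stands in for the eager sparse library; this is not the API of any existing package.

```python
# A minimal sketch of the "separate sparse_lazy library" idea: the lazy
# layer only records calls into a small graph and delegates all actual
# work to the eager library (NumPy stands in for it here).
import numpy as np


class Deferred:
    """A node in a deferred-computation graph."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    def compute(self):
        # Resolve nested deferred nodes, then call the eager function.
        resolved = [a.compute() if isinstance(a, Deferred) else a
                    for a in self.args]
        return self.func(*resolved)


def lazy(func):
    """Wrap any eager function so it returns a Deferred node instead."""
    return lambda *args: Deferred(func, args)


add = lazy(np.add)             # lazy counterparts of eager operations
multiply = lazy(np.multiply)

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
expr = add(multiply(x, y), x)  # builds a graph; no arithmetic yet
result = expr.compute()        # [5, 12, 21]
```

Because the lazy layer never touches storage formats itself, it can depend on the eager library without constraining it, which is the appeal of keeping the two packages separate.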
-
I think if we were to do this, it'd be the other way around: immediate operations would just do …
-
That sounds nicer to me from an integration perspective. Also, to be clear, I'm very excited about seeing what this can do in terms of performance. My original hesitation is mostly around thinking about other libraries depending on pydata/sparse. My guess is that folks will be more hesitant to depend on anything that has any sort of non-traditional behavior.
-
I think there's also a good amount of evidence that eager is better as a default; see e.g. PyTorch's modes and TensorFlow moving to eager by default. So +1 for eager by default, and advertising lazy mode prominently as a potential speedup method.
-
So I think the final decision is: …
-
Potentially off-topic, but I'm curious how this relates to …
-
@dhirschfeld An early version of …
-
That makes sense!
-
I've already brought this up in another thread (dask/dask#5879), but I was planning to make sparse collections within this library lazy, for asymptotically better performance in certain situations. See the following research papers for details:

And the following talks:

These research papers define a method to generate efficient kernels for a broad range of storage formats. They can handle anything composed of element-wise operations (with broadcasting) and reductions, but they can't do things like (for example) eigendecompositions, which we intend to do with SciPy wrappers for LAPACK et al.

With this in mind, would it make sense to make sparse collections lazy, with the caveat of an API break? These would have an API similar to Dask, requiring `arr.compute()` for the final result. As discussed in dask/dask#5879, they would also follow the protocols for Dask custom collections. If we manage to do this right, adding GPU support shouldn't be difficult either. But the question arises: is it worth breaking API compatibility to do this?
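As a concrete picture of the proposed user-facing API, here is a minimal sketch: `LazyArray` and `from_numpy` are invented names, and dense NumPy arrays stand in for the sparse backend, so this is not the actual pydata/sparse implementation.

```python
# Sketch: operations build up an expression, and arr.compute()
# materializes the result, much like Dask.
import numpy as np


class LazyArray:
    """Records operations instead of executing them immediately."""

    def __init__(self, thunk):
        self._thunk = thunk  # zero-argument callable producing an ndarray

    @classmethod
    def from_numpy(cls, arr):
        return cls(lambda: arr)

    def __mul__(self, other):
        return LazyArray(lambda: self._thunk() * other._thunk())

    def sum(self):
        return LazyArray(lambda: self._thunk().sum())

    def compute(self):
        # In a real implementation, this is where a fused (e.g.
        # TACO-generated) kernel could evaluate the whole expression in
        # one pass, never materializing intermediates like a * b.
        return self._thunk()


a = LazyArray.from_numpy(np.array([1.0, 0.0, 2.0]))
b = LazyArray.from_numpy(np.array([0.0, 3.0, 4.0]))
expr = (a * b).sum()     # nothing has been computed yet
result = expr.compute()  # 8.0
```

Deferring evaluation until `compute()` is exactly what lets a code generator fuse `(a * b).sum()` into a single iteration over the intersection of nonzeros, which is where the asymptotic wins come from.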