[WIP] better performance in `collect_similar` #200

asinghvi17 · 2024-10-23T03:52:09Z

Currently this has a runtime of ~80 microseconds on my test case. Built on top of #198. Feel free to ignore the first two commits.

rafaqz · 2024-10-23T08:29:57Z

src/generator.jl

```diff
+    # If the array is chunked, read each chunk and apply the function
+    # via broadcasting.
+    if DiskArrays.haschunks(input) isa DiskArrays.Chunked
+        # TODO: change this if DiskArrays ever supports uneven chunks
```


It does already, but what would need to change?

This comment can be disregarded, I think, at least for now. I need to do more testing to determine the best approach here.

For more context: in an earlier iteration, I pre-allocated a single array to hold the data read from the DiskArray. Each chunk could then be read with `readblock!` directly into that buffer, keeping allocations to a minimum.

However, that approach seemed to be slower than the current one, and I'm not sure why.
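The two strategies discussed above can be sketched in plain Julia. This is a hypothetical illustration, not the PR's code: a `Vector` stands in for a chunked DiskArray, `copyto!` stands in for reading a chunk (e.g. via `readblock!`), and the helper names `chunk_ranges`, `map_alloc_per_chunk`, and `map_prealloc` are made up. The chunk grid allows a shorter trailing chunk, so it also covers the uneven case. The sketch does not settle which version is faster; as noted above, the preallocated version was slower in the author's tests.

```julia
# Hypothetical sketch using Base Julia only. A plain Vector stands in for a
# chunked DiskArray; the helper names below are illustrative, not DiskArrays API.

# Split 1:len into contiguous chunks; the last chunk may be shorter (uneven grid).
chunk_ranges(len::Int, chunksize::Int) =
    [i:min(i + chunksize - 1, len) for i in 1:chunksize:len]

# Approach in the diff: read each chunk into a fresh array, then broadcast `f`.
function map_alloc_per_chunk(f, data::AbstractVector, chunksize::Int)
    out = similar(data, typeof(f(first(data))))
    for r in chunk_ranges(length(data), chunksize)
        block = data[r]        # indexing with a range allocates a new array per chunk
        out[r] .= f.(block)    # fused broadcast writes straight into `out`
    end
    return out
end

# Earlier approach: allocate one buffer up front and read every chunk into
# (a view of) it, reusing the buffer instead of allocating per chunk.
function map_prealloc(f, data::AbstractVector, chunksize::Int)
    out = similar(data, typeof(f(first(data))))
    buf = similar(data, chunksize)
    for r in chunk_ranges(length(data), chunksize)
        b = view(buf, 1:length(r))   # shrink the view for a short trailing chunk
        copyto!(b, view(data, r))    # stand-in for reading the chunk from disk
        out[r] .= f.(b)
    end
    return out
end
```

For example, with `data = collect(1.0:10.0)` and `chunksize = 4`, both functions visit the chunks `1:4`, `5:8`, and `9:10` and return `2 .* data` for `f = x -> 2x`.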

asinghvi17 added 4 commits October 22, 2024 17:17

- e97ab9b: Implement collect_similar like collect for DiskGenerators
- 9a33e9a: Add a test
- 2df5c29: Add a flexible approach that should work with irregular chunks (.1ms)
- f7d69aa: cut runtime and alloc amount (but NOT allocs) (.09ms)

meggart mentioned this pull request Oct 23, 2024 in #198, "Implement collect_similar like collect for DiskGenerators" (Merged)

rafaqz reviewed Oct 23, 2024

rafaqz mentioned this pull request Nov 12, 2024 in #199, "map on a diskarray is very very slow, compared to a regular array" (Open)
