Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readblock! outputs values in wrong positions #192

Closed
lazarusA opened this issue Sep 15, 2024 · 11 comments
Closed

readblock! outputs values in wrong positions #192

lazarusA opened this issue Sep 15, 2024 · 11 comments

Comments

@lazarusA
Copy link
Contributor

lazarusA commented Sep 15, 2024

As of this version, the output of this looks wrong.
using the file from here

But, it also happens for other .tif and .nc chunked files.

This looks to happened only when the data is stored into multi chunks, then the reading of those gets corrupted. If is only one, then no issue, or if we use ds["DQF"].data[1:2000, 1:2000] is also fine.

DiskArrays 0.4.4

using YAXArrays
using NetCDF
using GLMakie

ds = open_dataset("noaa.nc")
heatmap(ds["DQF"])
Screenshot 2024-09-15 at 10 56 11

DiskArrays 0.3.23 Outputs the correct array

Screenshot 2024-09-15 at 10 46 43

see also, rafaqz/Rasters.jl#735

@lazarusA lazarusA changed the title roadblock! outputs values in wrong positions readblock! outputs values in wrong positions Sep 15, 2024
@asinghvi17
Copy link
Member

asinghvi17 commented Sep 17, 2024

ooh interesting, thanks Lazaro! Good to know that it's a DiskArrays thing.

Curious that NetCDF.jl also has the same problem with the _Unsigned attribute. (The random high value areas are actually sign errors from UInt16 to Int16)...

@meggart
Copy link
Collaborator

meggart commented Oct 22, 2024

The main issue is that heatmap seems to fall back to reading a DiskArray through the iterator interface. That means the data is looped through chunk by chunk while Makie is assuming it goes through normal julia iteration order, A small reproducer would be this one:

using DiskArrays.TestTypes
using GLMakie

data = [i+j for i in 1:200, j in 1:100]
da = AccessCountDiskArray(data, chunksize=(10,10))


heatmap(da)

while the following plot would be correct:

heatmap(da[:,:])

A solution to the problem at hand would be to read the data into memory before plotting heatmap(ds["DQF"][:,:]).

@asinghvi17
Copy link
Member

This is a more general issue with collect(diskarray) then? Good to know, but a lot of people will use it naively.

Should collect specifically return what one would expect on a diskarray, instead of looping chunk-wise?

@rafaqz
Copy link
Collaborator

rafaqz commented Oct 22, 2024

No collect does the right thing. Makie must be actually iterating with iterate somewhere. Would be good to find that and switch it to a broadcast.

iterate and eachindex are both out of order (in chunk order). But if you use them together you get the right answers. Of course that can break in places.

@asinghvi17
Copy link
Member

Hmm, looks like heatmap's convert_args call el32convert which calls elconvert. Have a look at these lines:
https://github.com/MakieOrg/Makie.jl/blob/6dfb420445065c56f5af7944c87ecdfc22eff507/src/conversions.jl#L707C1-L718C4
elconvert is calling map if it finds an abstract array with missings, or convert(AbstractArray{NewType}, input) otherwise. Either of these might be failing, I'm trying to narrow down where that might be happening.

@asinghvi17
Copy link
Member

asinghvi17 commented Oct 22, 2024

OK, I have a smaller MWE:

map(identity, parent(ds["DQF"]))

gives me a bad heatmap.

map(f, A::AbstractArray) = collect_similar(A, Generator(f,A)) is the implementation that Julia points me to, and I can confirm that calling Base.collect_similar(parent(ds["DQF"]), Base.Generator(identity, parent(ds["DQF"]))) also has the same striping problem.

This is a more general problem, because map on a DiskArray may not work like you'd think it would. Maybe Base.collect_similar needs to be adjusted here, so that it iterates in the correct order?

@rafaqz
Copy link
Collaborator

rafaqz commented Oct 22, 2024

Shouldn't DiskGenerator fix that? Maybe it's not being used somehow. I thought we had fixed this 😅

@asinghvi17
Copy link
Member

According to @which at least, it's not being used...

@asinghvi17
Copy link
Member

Ah this is #144 again, I hadn't seen that issue

@rafaqz
Copy link
Collaborator

rafaqz commented Oct 22, 2024

Oh right we didn't actually fix that 😭

@asinghvi17
Copy link
Member

asinghvi17 commented Nov 7, 2024

I don't have access to YAXArrays on my current system, but I tested with Rasters/DD/DiskArrays latest, and this is indeed working. Probably fixed by #198.

@rafaqz rafaqz closed this as completed Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants