JSON not properly decoded by backends #415
Comments
This is exactly what zarr expects: JSON data encoded as (ascii) bytestrings. When you load this reference set with zarr, that's when the decoding happens.
Huh, thanks. So is there an un-translated version accessible from within kerchunk that can actually be treated as a dictionary?
SingleHDF actually uses zarr to fill in the JSON metadata into the references dict, so no: it converts the data immediately. You could of course open the dataset with zarr, and use its .attrs to get dicts back.
does
It does, but this issue was more of a complaint that as-is the references are in the wrong form to be traversable and manipulatable.
I personally think that kerchunk should create a useful full internal model of zarr (similar to the Zarr Object Models idea), manipulate that, then at the end encode it before serializing, rather than carrying around just some part of the zarr info as dictionaries in encoded form.
Would it not be fine to just open the created reference set with zarr? The thing is, the bytestrings are written by zarr in the format you see, so we would have to decode them, only to encode them again at the time of writing to a file.
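For illustration, the decode, edit, re-encode round trip mentioned above can be sketched with the standard library alone. This is a minimal sketch, not kerchunk's own code: the key `air/.zattrs` and the attribute values are made up, and the entry is assumed to hold its JSON as a string in the way this thread describes.

```python
import json

# Hypothetical reference entry: the attributes are kept as a JSON string.
refs = {"air/.zattrs": '{"units": "K"}'}

# Decode the string so it can be handled as an ordinary dict.
attrs = json.loads(refs["air/.zattrs"])

# Manipulate it like any other Python data.
attrs["long_name"] = "air temperature"

# Encode it again before the references are written out.
refs["air/.zattrs"] = json.dumps(attrs)
```

The extra decode and encode steps are the cost being weighed here; the data itself comes through unchanged apart from the edit.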
Kerchunk doesn't properly decode the JSON for zarr array-level attributes, instead leaving dictionaries as long strings. For example:
Notice that this is only partially decoded - the top two levels are nested Python dictionaries, but below that the various zarr attributes are stored as long strings, e.g.:
This seems silly: why not just decode the whole thing properly at the beginning so you can always treat it like a nested Python dictionary? (Or, even better, use a dedicated abstraction as suggested in #375.)
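As a rough sketch of what "decode the whole thing" could look like on the user side, the helper below walks a flat references mapping and parses only the zarr metadata entries. The function name `decode_metadata`, the example keys, and the chunk reference `["data.nc", 100, 8]` are all hypothetical; the only assumption about the real layout is that metadata lives under zarr v2 key names such as `.zarray` and `.zattrs` and is stored as JSON text or bytes.

```python
import json

# Zarr v2 metadata key names; any other key is treated as a chunk reference.
METADATA_KEYS = (".zgroup", ".zarray", ".zattrs", ".zmetadata")


def decode_metadata(refs):
    """Return a copy of refs with JSON-encoded metadata parsed into dicts."""
    decoded = {}
    for key, value in refs.items():
        is_metadata = key.split("/")[-1] in METADATA_KEYS
        if is_metadata and isinstance(value, (str, bytes)):
            # json.loads accepts both str and bytes input.
            decoded[key] = json.loads(value)
        else:
            # Chunk references and already-decoded entries pass through.
            decoded[key] = value
    return decoded


# Made-up references in the shape discussed in this issue.
example = {
    ".zgroup": '{"zarr_format": 2}',
    "air/.zarray": b'{"shape": [4], "chunks": [2], "dtype": "<f4"}',
    "air/0": ["data.nc", 100, 8],
}
decoded = decode_metadata(example)
```

After this, `decoded["air/.zarray"]["shape"]` can be read directly, at the price of having to re-encode the entries before handing the references back to zarr.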