-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Zarr Encoder #38
base: develop
Are you sure you want to change the base?
Add Zarr Encoder #38
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of comments on documentation but otherwise looks good.
General design comment / question. Can this be generalized such that one can do:
BUFR to IODA
BUFR to ZARR
ZARR to IODA
ZARR to BUFR
IODA to BUFR
IODA to ZARR
?
Just a thought. I'm not sure if all of those combos would be needed but I imagine there will be a need to write BUFR for some purpose to disseminate to WMO? @emilyhcliu any thoughts?
@@ -134,13 +133,11 @@ It has the following sub-sections: | |||
Encoder Description | |||
~~~~~~~~~~~~~~~~ | |||
|
|||
The **ioda** section defines the ObsGroup objects that will be created. Here is an example: | |||
The **encoder** section defines the ObsGroup objects that will be created. Here is an example: | |||
|
|||
.. code-block:: yaml | |||
|
|||
encoder: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you not specify in the YAML which encoder you want to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question.
encoder:
type: netcdf
dimensions:
globals:
variables:
Currently the type
does not have any effect for netcdf or ioda format.
Do we need to specify type to zarr when the output is zarr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description for the encoder is currently not dependent on the encoder type. This is partially due to the fact that all the output formats are similar to HDF5. Currently no need to indicate the output type in the YAML file (the spec is the same).
There is no ability to output BUFR files of any kind... This would be a HUGE deal, probably leading us to dump the dependency on NCEPLib-bufr.
The IODA Encoder for bufr is part of the IODA project, and is not documented here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emilyhcliu The type attribute is not used or needed at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thanks @rmclaren for the clarification
@@ -134,13 +133,11 @@ It has the following sub-sections: | |||
Encoder Description | |||
~~~~~~~~~~~~~~~~ | |||
|
|||
The **ioda** section defines the ObsGroup objects that will be created. Here is an example: | |||
The **encoder** section defines the ObsGroup objects that will be created. Here is an example: | |||
|
|||
.. code-block:: yaml | |||
|
|||
encoder: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question.
encoder:
type: netcdf
dimensions:
globals:
variables:
Currently the type
does not have any effect for netcdf or ioda format.
Do we need to specify type to zarr when the output is zarr?
@@ -16,13 +20,140 @@ namespace py = pybind11; | |||
|
|||
using bufr::encoders::Description; | |||
|
|||
|
|||
template<typename T> | |||
class PyGlobalWriter : public bufr::encoders::GlobalWriter<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rmclaren Here, are you adding a Python interface to write global attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emilyhcliu No need too specify zarr, as the description between the different outputs is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emilyhcliu Yes exactly. The global attributes are objects with different types (specified in the yaml file). You need to supply it with a writer object so it can write its value. This implementation of the writer writes the value into a python dictionary. This machinery allows the C++ compiler to determine the correct types at compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are mostly to understand the code.
Cloned it but couldn't run -
source /home/Emily.Liu/modules/env_jedi.sh
python bufr2zarr.py /TEMP/bufr-query/tools/bufr
Traceback (most recent call last):
File "/scratch1/NCEPDEV/da/Iliana.Genkova/TEMP/bufr-query/tools/bufr2zarr/bufr2zarr.py", line 8, in
import bufr
ModuleNotFoundError: No module named 'bufr'
Likely my environment is not set right.
import re | ||
from typing import Union | ||
|
||
import zarr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "zarr" off-the-shelf/standard Python package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It must be installed separatly (pip install zarr). The python package is the official implementation for that format.
@@ -45,5 +45,6 @@ void setupMpi(py::module& m) | |||
py::class_<bufr::mpi::Comm>(m, "Comm") | |||
.def(py::init<const std::string&>()) | |||
.def("name", &bufr::mpi::Comm::name) | |||
.def("rank", &bufr::mpi::Comm::rank); | |||
.def("rank", &bufr::mpi::Comm::rank) | |||
.def("size", &bufr::mpi::Comm::size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How was "size" not necessary before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just never ended up using it, so never noticed it was missing. Should have been there all along..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know where it could be done, but some documentations of the difference between "rank" and "size" might be helpful.
else: | ||
container = bufr.Parser(data_path, mapping_path).parse() | ||
|
||
if comm.rank() == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rmclaren No executable for bufr2zarr. The data will be encoded through zarr.Encoder
in Python, correct?
dataset = next(iter(zarr.Encoder(YAML_PATH).encode(container, OUTPUT_PATH).values()))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bufr2zarr.py is installed in the bin directory and can be treated just like bufr2netcdf.x. It even takes the same arguments. The zarr encoder is implemented in python as the zarr library has no c/c++ interface, that is true.
|
@emilyhcliu I was thinking more like will there be a need to have something 'like prepBUFR' written out, I guess I misspoke with the WMO comment. |
@ilianagenkova It sounds like you need to extend your python path too point at the correct directory. So something like this: export PYTHONPATH=<my project dir>/bufr-query/build/lib/python3.12/site-packages:$PYTHONPATH |
@CoryMartin-NOAA I hope there is no need for this in the future :-(. I will ask Daryl if writing out data (obs, analysis, ....etc) to BUFR again is required. |
I believe the BUFR format is related to disseminating data via GTS. |
This pull request adds a new encoder for the ZARR format. Here is an example of what the structure of the ZARR file looks like:
The zarr file has the following basic structure:
The global information is added to the root groups attributes.
The special attribute _ARRAY_DIMENSIONS is added for each dataset which maps the data to the associated dimension arrays.