Refactor cubewrite() workflow steps into process() #145

Open · 8 tasks · truth-quark opened this issue Oct 9, 2024 · 4 comments

truth-quark (Collaborator) commented Oct 9, 2024

Early October 2024 work has simplified cubewrite() enough to move its contents into process() and unify the top-level workflow under a single function.

This is dependent on merging #114, #118 and #144.

Some tasks for this include:

  • Move the cube modification functions before the netCDF writer context manager
  • Move cube writing directly into the netCDF writer context manager
  • Refactor tests
    • Discard mock.patch("umpost.um2netcdf.cubewrite")
    • Discard mock.Mock(name="mock_sman")

Tasks:

  • Some functions return new cubes, which causes the custom item_code attribute to be lost
    • This breaks some unit tests that expect the same cube object as output
    • For asserts, how do we compare test fixture cubes against copies from process_cubes()? (What assertions are needed for the output cubes?)
  • Rename sman to dest (e.g. dest.write(cube, ...))
  • Can fill_value be grabbed from the cube for sman.write()? (see the sketch after this list)
  • Is unlimited_dimensions stored in the cube?
  • Generate netCDF from aiihca.paa1jan with Martin's original version
  • Expand the binary test script to optionally run tests against aiihca.paa1jan (larger data tests)
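
On the fill_value question, a minimal sketch of one possibility (guess_fill_value is a hypothetical helper; it assumes missing data arrives as a numpy masked array, which records its own fill_value):

import numpy as np

def guess_fill_value(cube, default=None):
    data = cube.data  # realises lazy data; fine for a sketch
    if np.ma.isMaskedArray(data):
        return data.fill_value  # numpy masked arrays carry a fill_value
    return default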
truth-quark self-assigned this Oct 9, 2024
truth-quark (Collaborator, Author) commented Oct 9, 2024

The above can be combined with splitting process() into a functional core / imperative shell design.

Generator design: split it like this:

def process(infile, outfile, args):
    # this is the imperative shell
    ff = mule.load_umfile(str(infile))  # I/O
    mv = process_mule_vars(ff)
    cubes = sorted(iris.load(infile), key=something)  # I/O

    with iris.fileformats.netcdf.Saver(outfile, ...) as dest:  # this is I/O
        # write global vars

        for cube in process_cubes(cubes, mv, ...):
            dest.write(cube, zlib=True, complevel=compression, ...)  # also I/O

def process_cubes(cubes, mv, ...):
    # this is the top of the functional core
    for cube in cubes:
        # workflow steps go here, e.g.
        fix_var_name(cube, ...)  # steps that modify the cube in place
        cube = modified_cube(cube)  # steps that return a new cube
        yield cube  # modified cubes are returned to the caller

The test_process_...() functions will need to:

  • be renamed
  • discard all I/O mocks
  • call process_cubes()
  • handle process_cubes() as a generator (see the sketch after this list)
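
A minimal sketch of a reworked test (the fixture names and the item_code assertion are assumptions based on this issue, not existing code):

def test_process_cubes(fixture_cubes, mule_vars, args):
    # drain the generator so every workflow step runs
    results = list(process_cubes(fixture_cubes, mule_vars, args))

    assert len(results) == len(fixture_cubes)
    for cube in results:
        # output cubes may be new copies, so assert on attributes
        # rather than object identity
        assert cube.item_code is not None  # hypothetical check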

truth-quark (Collaborator, Author) commented Oct 9, 2024

Alternative Coroutine Design:

def process(infile, outfile, args):
    # this is the imperative shell
    ff = mule.load_umfile(str(infile))  # I/O
    mv = process_mule_vars(ff)
    cubes = sorted(iris.load(infile), key=something)  # I/O
    con = consumer(outfile, ...)
    next(con)  # prime the coroutine, advancing it to the first yield
    process_cubes(cubes, mv, con, ...)
    con.close()  # unwinds the consumer, exiting the Saver context manager

def consumer(outfile, ...):
    # part of the imperative shell
    with iris.fileformats.netcdf.Saver(outfile, ...) as dest:  # I/O
        # write global vars here, then loop until closed by the caller
        while True:
            cube = (yield)
            dest.write(cube, zlib=True, complevel=compression, ...)  # I/O

# testing this is fiddlier due to the con/coroutine dependency
def process_cubes(cubes, mv, con, ...):
    # this is the top level of the functional core
    for cube in cubes:
        # workflow func calls go here, e.g.
        fix_var_name(cube, ...)  # steps that modify the cube in place
        cube = modified_cube(cube)  # steps that return a new cube
        con.send(cube)  # generic coroutine mechanism, separates this from I/O

    # also return cubes???
    return ...

This pattern is fiddlier, but it may provide an alternative if the generator version isn't feasible. There are potential benefits if multiple forms of consumer are required (e.g. for different library versions).
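
For instance, a hypothetical test double is one such alternative consumer, letting process_cubes() run without any I/O:

def collecting_consumer(store):
    # test double for consumer(): accumulates cubes instead of writing netCDF
    while True:
        store.append((yield))

# usage in a test:
#   cubes_out = []
#   con = collecting_consumer(cubes_out)
#   next(con)  # prime the coroutine
#   process_cubes(cubes, mv, con)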

aidanheerdegen (Member) commented Oct 10, 2024

The generator design looks clear and clean to me.

Could that structure be easily adapted for one-variable-per-file output (#89)?

Is there a 1:1 relationship between cubes and variables?

Would contextlib.ExitStack work to switch between single-file and multiple-file output? (See the sketch below.)
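
A minimal sketch of how that might look (contextlib.ExitStack and iris.fileformats.netcdf.Saver are real APIs; the args.split flag and the per-variable filename scheme are hypothetical):

import contextlib

import iris.fileformats.netcdf

def process(infile, outfile, args):
    # imperative shell, as in the generator sketch above
    # ... load ff, mv and cubes as before ...

    with contextlib.ExitStack() as stack:
        if not args.split:
            # single-file mode: one Saver shared by every cube
            dest = stack.enter_context(
                iris.fileformats.netcdf.Saver(outfile, "NETCDF4"))

        for cube in process_cubes(cubes, mv, ...):
            if args.split:
                # one-variable-per-file mode: a fresh Saver per cube, all
                # closed automatically when the ExitStack unwinds
                path = f"{outfile}.{cube.var_name}.nc"  # hypothetical naming
                dest = stack.enter_context(
                    iris.fileformats.netcdf.Saver(path, "NETCDF4"))

            dest.write(cube, zlib=True, complevel=compression, ...)

In split mode this keeps every output file open until the stack unwinds; if that's a concern, each per-cube Saver could instead be a short-lived with block inside the loop.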

blimlim (Collaborator) commented Oct 16, 2024

Because there are a few separate tasks involved, I'll try to address them in bite-sized pieces and make small PRs into the 145/fcis-architecture branch – hopefully this will be easier for the reviewers. Once all the changes are in 145/fcis-architecture, I'll make a PR into main.
