Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a geoSeries class as a subclass of Series. Same for MultipleSeries/EnsembleSeries. #326

Closed
khider opened this issue Feb 13, 2023 · 12 comments
Assignees

Comments

@khider
Copy link
Member

khider commented Feb 13, 2023

Is your feature request related to a problem? Please describe.
Adding location is important for some applications but not others. Current implementation only allows geo information through LipdSeries. The development branch has an implementation of Series that would take lat/lon arg as optional since most of our methods do not require geolocation. For the methods that do, we return an error if lat/lon are not specified, which is not ideal.

Describe the solution you'd like
We consider two solutions:

  • Make lat/lon mandatory. There are obvious downsides to this. First, it will break most of our existing tutorials/examples/workflows. Second, since lat/lon is now required, it will force users to enter more metadata that they may not need for most of the applications.
  • [Preferred] Create a child of Series, called GeoSeries that would require the use of lat/lon for mapping functionaries but reuse the existing functions in Series class for all other applications. This schema would need to be extended to GeoMultipleSeries to allow for EOF/PCA analysis.
@CommonClimate
Copy link
Collaborator

I like the GeoSeries idea, and it poses no problem to extend it to GeoMultipleSeries (child of MultipleSeries). The only thing to work out is the requirements of a geoEnsembleSeries class. It can only be a child of EnsembleSeries if we have a constraint that all the lat/lon are the same in __init __.

This poses no particular problem and it should not break anything (famous last words). However, we do need to decide where this falls in terms of development. I'd like to do this after pandas integration is complete, what do you think?

@khider
Copy link
Member Author

khider commented Feb 13, 2023

Definitely after the pandas integration is complete. Since LiPD/LiPDSeries will become deprecated after the pylipd integration, I suggest we do it then.

@CommonClimate CommonClimate modified the milestones: Changes due to new implementation with PyLiPD, geolocated Series Feb 13, 2023
@CommonClimate
Copy link
Collaborator

CommonClimate commented Feb 20, 2023

TODO
geoSeries

  • make geoSeries a child of Series (in series.py or geoseries.py)
  • implement check on lat/lon range
  • move over lipdSeries.dashboard and Series.map to this class
  • update conftest.py with metadata for Series class (remove lat/lon)
  • update metadata.yml to remove lat/lon from all but one of the Series
  • write new CI tests
  • add rich metadata discussed May 10 at the team hackathon (depth, sensorType, observationType, etc)

MultipleGeoSeries

  • make MultipleSeries a child of MultipleSeries (in geomultipleseries.py)
  • create fixture for CI tests
  • Enable mapping (repurpose MapAllArchive)
  • Enable hue semantics: by archives/elevation/Median Resolution (use resolution() class)
  • enable mapping for PCA on geomultipleseries: use superclass to call MultipleSeries.PCA via super, then pass locs as a matrix made from self.lon, self.lat. See Enable mapping for Decompositions #422
  • create Euro2k.json file in data folder and use for docstrings and CI

EnsembleGeoSeries

  • make EnsembleGeoSeries a child of EnsembleSeries (geoseries.py or geoensembleseries.py)
  • endow it with a dashboard() method like GeoSeries, but with ensemble ts plot and ensemble spectrum plot
  • paragon: either IODP846 graphdb endpoint (open with pyLipd) (heavily truncated for computational convenience in CI tests) for docstring or a SISAL record.
  • add parameters lat/lon (floats), set to None by defaults.
  • implement check that all lat/lon in object are identical (error if not). if lat/lon are None, lift them from self.series_list[0]

MultipleEnsembleGeoSeries

  • enable mapping for MC-PCA

@CommonClimate
Copy link
Collaborator

For GeoSeries

  • make lat/lon mandatory (after time/value)
  • keep Marco's metadata, but lat/lon cannot be None
    For MultipleGeoseries
  • check that all constituents are GeoSeries
    For MultipleSeries
  • check that all constituents are either Series or geoSeries

@khider
Copy link
Member Author

khider commented May 9, 2023

For GeoSeries, the following methods should be moved:

  • map
  • dashboard
  • mapNearRecord

Map and dashboard implemented in 9e1322e. mapNearRecord requires MulitpleGeoSeries

@CommonClimate
Copy link
Collaborator

Update: by popular demand, GeoSeries now includes as optional arguments elevation (float), depth (array), SensorType(str), ObservationType(str).

@CommonClimate
Copy link
Collaborator

geoseries done as of #420. Now onto the rest...

@khider
Copy link
Member Author

khider commented May 30, 2023

One of the mapping is missing from this list: mapNearRecord. It was attached to LipdSeries but required to pass a Lipd object to get the other information.

Should we attach it to GeoSeries and pass a MulitpleGeoSeries object to it (which would be the closest to what we have with Lipd/LipdSeries or should we attach it to MultipleGeoSeries and ask the use to pass the name of the GeoSeries to use as reference. The first solution might be the easiest on a user.

Either way, needs to be added to the to-do list.

@khider
Copy link
Member Author

khider commented May 3, 2024

EnsembleGeoSeries:

  • Need a helper function to map a numpy array to GeoSeries axis. - need one for EnsembleSeries

@alexkjames
Copy link
Contributor

alexkjames commented May 10, 2024

EnsembleSeries helper function from_AgeEnsembleArray (use @classfunction):

Needs four things:

  • Values [vector + metadata] (extracted from GeoSeries or Series so we have access to the metadata)
  • Depths associated with values [vector] (should be same length as values)
  • Age ensemble matrix [matrix] (values x realizations)
  • Depths associated with age ensemble matrix [vector] (Same number of values as rows in age ensemble matrix)

A: For Series, two main cases to handle:

  1. Age ensemble matrix already on same depth vector as value vector: Check that shapes align correctly, both depth variables should be None in this case. (this will be the default behavior).
  2. Age ensemble matrix is not aligned to the value vector: Check that all of the shapes align appropriately, then map age ensemble to value vector via depth vectors. Both depth variables should be filled in this case (list or numpy.array, has to be vector).

B: For GeoSeries, four main cases to handle:

Same as above, only now depth can be present in the GeoSeries object.

  1. If depth is present in GeoSeries, but not given for the ensemble, assume Case A1. If depth is present in both GeoSeries and passed as an argument, default to the argument, but print warning (Case A2).
  2. If depth is not present in GeoSeries, and not given elsewhere we assume Case A1. Otherwise, Case A2.

For GeoSeries also accept a depth unit argument and, if depth is being pulled from the GeoSeries, check that the units match (assuming GeoSeries has an associated depth unit).

Check list:

  • Check dimensions
  • Check presence of necessary arguments
  • When appropriate, check that the original time axis and the ensemble time axis look similar-ish, and/or that the depth axes look similar-ish
  • Calculate and return Ensemble, or throw error

Note: Update the lipdutils function that does this with the notebook function and use that for the machinery (do error handling and whatnot separately).

Second Note: Need to think about extrapolation handling at some point. extrap=True should be default, this will be normal np.interp behavior, extrap=False will truncate (in this case set left=np.nan and right=np.nan).

@khider
Copy link
Member Author

khider commented May 14, 2024

@alexkjames I think your latest pull request took care of the majority of the issues (I checked the boxes). One thing: the doctoring example use made up time series, which is fine for this. But one thing that may be missing: we are not checking that all the lat/lon are the same; we are imposing the first one. Is that what we want to do?

@CommonClimate
Copy link
Collaborator

closing this as only #594 is left, and it not does hinder functionality

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In progress
Development

No branches or pull requests

3 participants