Generic import/export function for Obs and ObsCollection classes #124

Open
tdmeij opened this issue Apr 25, 2023 · 9 comments

@tdmeij

tdmeij commented Apr 25, 2023

Would it be possible to add generic import/export functionality to the Obs and ObsCollection classes? This would make temporarily saving intermediate results easier for users who would like to avoid the technical challenges of setting up a pystore workflow.

Currently, hydropandas supports importing from many different external data sources. However, given the creativity of software developers, who seem to invent a new data format every week, unsupported data formats will probably keep popping up forever. For instance, the current Hydropandas version 0.7.3 doesn't seem to support the WaterWeb, Dawaco or HydroMonitor csv export formats.
Fortunately, it is relatively easy for Hydropandas users to write their own import classes, create Obs and ObsCollection instances from raw data files, and process their data. However, depending on the size and format of the data files, reading them can take quite some time. In the current Hydropandas version 0.7.3, I can save data to a json format (using the inherited Pandas method), but I cannot read this data directly back into an Obs or ObsCollection instance because an import method is missing from the Obs and ObsCollection classes.

Therefore, it would be convenient to be able to import raw data into an Obs or ObsCollection class, resample the data to a more manageable frequency, and save these intermediate results to a temporary file that can be read back directly using Obs or ObsCollection methods.

@dbrakenhoff
Collaborator

You can pickle the ObsCollection/Observations using the to_pickle() method. Loading these using pandas (pd.read_pickle()) will give you back the original ObsCollection or Observation. Maybe that solves your issue somewhat?
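
As a rough illustration of why the original type comes back: a pickle records which class an object belongs to, so unpickling rebuilds an instance of that class rather than a plain container. The sketch below uses only the standard library `pickle` module and a hypothetical stand-in class (`ObsCollectionStandIn` is not part of hydropandas); it assumes the pandas pickle helpers rely on the same mechanism.

```python
import pickle


class ObsCollectionStandIn(dict):
    """Hypothetical stand-in for a collection class; not the hydropandas class."""

    def n_observations(self):
        return len(self)


# Build a small collection and serialize it to bytes.
original = ObsCollectionStandIn(well_1=[1.2, 1.3], well_2=[0.8, 0.9])
payload = pickle.dumps(original)

# Unpickling looks up the recorded class and rebuilds an instance of it,
# so the subclass and its methods survive the round trip.
restored = pickle.loads(payload)

print(type(restored).__name__)
print(restored.n_observations())
```

Note that this only works when the class definition is available at load time, and that pickles should only be loaded from trusted sources.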

I'm all for a generic human-readable export format for Observations. I'd suggest some kind of CSV format that includes some information about its Obs type(?). Then I guess we need to define some kind of header format and then write the time series data below that. If we want to attempt to maintain data types on import, that will be a bit of a challenge. An ObsCollection could then just use that Observation export format to write CSV files for each Observation in the collection.
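
One possible shape for such a file, purely as a sketch: metadata written as `# key=value` header lines, followed by the time series as regular CSV rows. The header convention, the function names and the metadata keys below are all assumptions made for illustration, not an existing hydropandas format.

```python
import csv
import io


def write_obs_csv(metadata, rows):
    """Serialize metadata as '# key=value' header lines, then the time series."""
    buffer = io.StringIO()
    for key, value in metadata.items():
        buffer.write(f"# {key}={value}\n")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["datetime", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


def read_obs_csv(text):
    """Parse the header lines into a dict and the remaining lines into rows."""
    metadata, data_lines = {}, []
    for line in text.splitlines():
        if line.startswith("# "):
            key, _, value = line[2:].partition("=")
            metadata[key] = value  # every metadata value comes back as a string
        else:
            data_lines.append(line)
    reader = csv.reader(data_lines)
    next(reader)  # skip the column-name row
    rows = [(stamp, float(value)) for stamp, value in reader]
    return metadata, rows


# Illustrative metadata; the keys are hypothetical.
meta = {"obs_type": "GroundwaterObs", "name": "well_1", "x": "100.0"}
series = [("2023-04-25T00:00:00", 1.25), ("2023-04-26T00:00:00", 1.31)]

text = write_obs_csv(meta, series)
meta_back, series_back = read_obs_csv(text)
```

The reader returns every metadata value as a string, which is exactly the data type problem mentioned above: restoring numbers, dates or booleans would need either type hints in the header or a per-key conversion table.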

If anyone else has any suggestions regarding this topic, feel free to post them here.

@tdmeij
Author

tdmeij commented Apr 25, 2023

Thank you David, this answers my question. After reading back the pickled object, I even get an ObsCollection object instead of the DataFrame I had expected. Magic still happens, apparently.

@OnnoEbbens
Collaborator

Hahaha, I had the same first reaction when the pd.read_pickle() returned an ObsCollection object. It is magic!

There is also a to_excel() method for an ObsCollection. This will create an Excel file with one tab containing all the metadata and a separate tab for each observation object with its measurement time series. This is imo the best way to export to a human-readable format. Unfortunately we don't have a read_excel() method yet for an ObsCollection, but I think it is not too hard to create one.

@martinvonk
Collaborator

Maybe we can create a simple hpd.ObsCollection.from_pickle() method that calls pandas.read_pickle()? That would make the functionality easier to find.
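
A minimal sketch of what such a convenience method could look like, using a hypothetical stand-in class and the standard library `pickle` module in place of pandas.read_pickle(). The type check is an addition for illustration and was not part of the suggestion.

```python
import os
import pickle
import tempfile


class CollectionStandIn(list):
    """Hypothetical stand-in for a collection class; not the hydropandas class."""

    @classmethod
    def from_pickle(cls, path):
        """Load a pickled collection and check that it has the expected type."""
        with open(path, "rb") as handle:
            loaded = pickle.load(handle)
        if not isinstance(loaded, cls):
            raise TypeError(f"expected {cls.__name__}, got {type(loaded).__name__}")
        return loaded


# Write a small collection to a temporary file and read it back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "collection.pkl")
    with open(path, "wb") as handle:
        pickle.dump(CollectionStandIn(["well_1", "well_2"]), handle)
    loaded = CollectionStandIn.from_pickle(path)

print(type(loaded).__name__, list(loaded))
```

Putting the loader on the class keeps reading and writing in one place, so users do not need to know that the pickle was produced through pandas.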

@OnnoEbbens
Collaborator

I've added a read_excel and a read_pickle function to hydropandas. I also updated the example notebook 01_groundwater_observations with calls to the Excel and pickle functions for an ObsCollection.

@TomHottentot

Rather than open a new issue, I feel it fits better here.

> I'm all for a generic human-readable export format for Observations. I'd suggest some kind of CSV format that includes some information about its Obs type(?). Then I guess we need to define some kind of header format and then write the time series data below that. If we want to attempt to maintain data types on import, that will be a bit of a challenge. An ObsCollection could then just use that Observation export format to write CSV files for each Observation in the collection.
>
> If anyone else has any suggestions regarding this topic, feel free to post them here.

I would like to export an ObsCollection in the HydroMonitor format. It's a format that fits the above wishes, is used quite widely and is easily human-readable. Is this a feature you would consider adding to HydroPandas, or is it maybe already possible?

At the moment HydroPandas is an amazing tool to use and saves me quite a lot of time. However, when I need to get the data out of a Python environment, I struggle to find the best way. I have, however, managed to get an ObsCollection to IPF conversion working.

@dbrakenhoff
Collaborator

Hi @TomHottentot,

Glad to hear you enjoy using hydropandas! As for extra export functionality, we'd absolutely welcome contributions to export ObsCollections to IPF files or HydroMonitor files. This is not something we're likely to implement ourselves soon as we haven't run into a need for that as of yet and we're quite busy at the moment, but we're open to contributions from others!

So if you feel comfortable adding ObsCollection.to_hydromonitor() or ObsCollection.to_ipf() methods, we can take a look and get those features added to hydropandas. If you're less comfortable with GitHub and submitting Pull Requests, you can also attach some script/code (that writes those files based on multiple time series) in this issue and we can work that into hydropandas.

Let us know if any of the above options work for you. I'll reopen this issue as a reminder for this work.

@dbrakenhoff reopened this Aug 2, 2024

@TomHottentot

Hi, @dbrakenhoff

I would be happy to give it a go, but I'm still quite new to both Python and GitHub, so it might take a while. I will start by rewriting my to_ipf Python code as a function. If I get that working, I'll try the HydroMonitor format as well.

@dbrakenhoff
Collaborator

Hi @TomHottentot, no worries, feel free to post your code or questions here if you could use some help. Once you have it in function form, with the data in pandas Series or DataFrames, the step to getting it implemented in hydropandas isn't too big :).


5 participants