add geopackage driver #36

Open
amsnyder opened this issue Dec 21, 2023 · 7 comments
Comments

@amsnyder

I would like to store some geopackage files in an intake catalog, but I don't currently see any drivers for this file type. A pangeo colleague suggested I open an issue here. Do you all have any plans to add this driver? Thanks!

@martindurant
Member

What is geopackage, please? I am looking at https://www.geopackage.org/spec/#_sqlite_container (an sqlite3 file with specific conventions and file extension).

Yes, a driver would be fine, but I would rather do it for "v2", currently in development. How do you currently read these data?

@amsnyder
Author

I'm not sure how to answer the question about what a geopackage is - I don't know the details of the file format. I can try to help dig up information if I know what you're looking for.

Here is an example of how I would open one:

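import fsspec
import geopandas as gpd

# Anonymous (no-credentials) access to the S3-compatible OSN endpoint
fs_read = fsspec.filesystem(
    "s3",
    anon=True,
    client_kwargs={"endpoint_url": "https://usgs.osn.mghpcc.org"},
)

# Open the remote GeoPackage and read its "huc12" layer into a GeoDataFrame
with fs_read.open("hytest/wbd/huc12/huc12.gpkg", mode="rb") as f:
    huc12_basins_geopackage = gpd.read_file(f, layer="huc12", driver="GPKG")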
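
Since a GeoPackage is an SQLite container, the layer names accepted by `layer=` should, as far as the spec describes it, be listed in the file's `gpkg_contents` table. The sketch below shows that lookup with the standard library only. The in-memory table is a toy stand-in with a subset of the real columns, and `list_layers` and `make_toy_contents` are hypothetical helper names. For a real file, `sqlite3` needs a local path, so a remote object would have to be downloaded first.

```python
import sqlite3


def make_toy_contents() -> sqlite3.Connection:
    """Build an in-memory stand-in for a GeoPackage's gpkg_contents table.

    Assumption: only three of the spec's columns are reproduced here.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE gpkg_contents ("
        "table_name TEXT NOT NULL PRIMARY KEY, "
        "data_type TEXT NOT NULL, "
        "identifier TEXT UNIQUE)"
    )
    con.execute(
        "INSERT INTO gpkg_contents VALUES ('huc12', 'features', 'huc12')"
    )
    con.commit()
    return con


def list_layers(con: sqlite3.Connection) -> list:
    """Return (table_name, data_type) pairs registered in gpkg_contents."""
    rows = con.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    ).fetchall()
    return [(name, dtype) for name, dtype in rows]


con = make_toy_contents()
print(list_layers(con))
```

With a downloaded file, the same query could be run via `sqlite3.connect("huc12.gpkg")` in place of the toy connection.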

@martindurant
Member

I added the following to the Intake Take2 (v2) branch:

class Geopackage(SQLite):
    filepattern = "gpkg$"

and this allows

In [2]: import intake

In [3]: intake.datatypes.recommend(u, storage_options={'endpoint_url': 'https://usgs.osn.mghpcc.org', 'anon': True}, head=None)
Out[3]: [intake.readers.datatypes.Geopackage]

In [4]: data = intake.readers.datatypes.Geopackage(u, storage_options={'endpoint_url': 'https://usgs.osn.mghpcc.org', 'anon': True})

In [5]: reader = data.to_reader()

In [6]: reader.read()
Out[6]:
                                         TNMID                            METASOURCEID  ... SHAPE_Area                                           geometry
0       {B1EF0C55-72ED-4FF6-A3BA-97A87C6A6C47}                                     NaN  ...   0.004859  MULTIPOLYGON (((-86.15784 31.42164, -86.15783 ...
1       {F0D9874D-52BA-4FDC-A5E6-E259B627764D}                                     NaN  ...   0.014214  MULTIPOLYGON (((-86.18406 31.53503, -86.18406 ...
2       {2E0CB201-5672-45B5-8CA7-A60070122697}                                     NaN  ...   0.009979  MULTIPOLYGON (((-86.29029 31.27059, -86.29089 ...
3       {9D39E120-C6DF-401F-AA8F-1748E9423AA0}                                     NaN  ...   0.009897  MULTIPOLYGON (((-86.30253 31.45077, -86.30251 ...

Making readers is much simpler in V2! This reader object can then be put into a catalog and saved as YAML.

Note on "anon": we trialed having s3fs "fall back" to trying anon in the case that credentials were missing or invalid, but this caused problems for everyone, so it's better to explicitly label datasets that need no creds.
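
On the `filepattern = "gpkg$"` attribute in the class above: it is a regular expression anchored at the end of the URL. The following sketch shows how name-based matching of that kind could work. It is not Intake's actual lookup code. `FILEPATTERNS` and `match_by_name` are hypothetical, and only the `gpkg$` pattern comes from this thread.

```python
import re

# Hypothetical registry: only "gpkg$" is taken from the thread; the other
# entries are made-up examples for contrast.
FILEPATTERNS = {
    "Geopackage": r"gpkg$",
    "CSV": r"csv$",
    "Parquet": r"parquet$",
}


def match_by_name(url: str) -> list:
    """Return the names of all datatypes whose pattern matches the URL."""
    return [
        name
        for name, pattern in FILEPATTERNS.items()
        if re.search(pattern, url)
    ]


print(match_by_name("s3://hytest/wbd/huc12/huc12.gpkg"))
```

Because the pattern is anchored with `$`, a name such as `huc12.gpkg.zip` would not match in this sketch.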

@martindurant
Member

Note on head= in recommend(): if this is True (the default), the start of the file gets scanned, and the possible datatypes then include SQLite.
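
As a rough illustration of what scanning the start of a file can reveal (this is not Intake's implementation): an SQLite file begins with a fixed 16-byte magic string, and the GeoPackage spec stores an application id in the header at byte offset 68. The sketch builds two small local SQLite files and inspects their headers. `looks_like_sqlite`, `looks_like_geopackage`, and `make_demo_file` are hypothetical helper names.

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"
# Application ids used by GeoPackage: "GP10" (1.0), "GP11" (1.1), "GPKG" (1.2+)
GPKG_APPLICATION_IDS = {b"GP10", b"GP11", b"GPKG"}


def looks_like_sqlite(header: bytes) -> bool:
    """True if the bytes start with the SQLite 3 magic string."""
    return header.startswith(SQLITE_MAGIC)


def looks_like_geopackage(header: bytes) -> bool:
    """True if the header is SQLite and carries a GeoPackage application id."""
    if len(header) < 72 or not looks_like_sqlite(header):
        return False
    return header[68:72] in GPKG_APPLICATION_IDS


def make_demo_file(path: str, application_id=None) -> bytes:
    """Create a tiny SQLite file and return its first 100 bytes."""
    con = sqlite3.connect(path)
    if application_id is not None:
        # PRAGMA values cannot be bound as parameters, so format the integer in
        con.execute(f"PRAGMA application_id = {int(application_id)}")
    con.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY)")
    con.commit()
    con.close()
    with open(path, "rb") as f:
        return f.read(100)


with tempfile.TemporaryDirectory() as tmp:
    plain = make_demo_file(os.path.join(tmp, "plain.sqlite"))
    gpkg = make_demo_file(
        os.path.join(tmp, "demo.gpkg"),
        application_id=int.from_bytes(b"GPKG", "big"),
    )

print(looks_like_sqlite(plain), looks_like_geopackage(plain))
print(looks_like_sqlite(gpkg), looks_like_geopackage(gpkg))
```

A header check like this would explain why a `.gpkg` file can be reported as SQLite when the start of the file is read, and as Geopackage from the file name alone.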

@amsnyder
Author

Awesome, thanks @martindurant. Is there a timeline for when intake v2 will be released?

@martindurant
Member

A very early alpha is available now as 2.0.0a2 (or .aX, as I have time). I was planning for a beta/RC release at the new year, and then a full release depending on feedback. I might call the package "intake2" or "take2" for a transitional time (but not until release).

@ian-r-rose
Collaborator

ian-r-rose commented Dec 21, 2023 via email
