Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to integrate intake-stac and geopandas #36

Open
scottyhq opened this issue Dec 16, 2019 · 3 comments
Open

How to integrate intake-stac and geopandas #36

scottyhq opened this issue Dec 16, 2019 · 3 comments
Labels
enhancement New feature or request

Comments

@scottyhq
Copy link
Collaborator

Converting STAC catalogs (JSON) to intake catalogs is great for facilitating browsing images with intake.gui and loading remote data directly into xarray objects. When loading data returned from a dynamic API like sat-search (radiantearth/stac-spec#691), we could also take advantage of Geopandas for querying the catalog and visualizing metadata, but currently (0.2.2) the integration is a bit awkward:

properties =  ["eo:row=027",
               "eo:column=047",
               "landsat:tier=T1"] 
results = satsearch.Search.search(collection='landsat-8-l1', 
                        sort=['<datetime'], #earliest scene first
                        property=properties)
items = results.items()

It would be great to easily load results.items() into a geodataframe directly

results.items().geojson() # returns a python dictionary
#gf = gpd.GeoDataFrame(results.items().geojson()) #ValueError: arrays must all be same length

This works and provides a very convenient tabular HTML display

items.save('subset.geojson')
gf = gpd.read_file('subset.geojson')
display(gf)

Screenshot 2019-12-15 17 04 20

We could consider adding the same geopandas HTML view directly to the catalog and LocalCatalogEntry objects

# prints <Intake catalog: <class 'satstac.itemcollection.ItemCollection'>>
cat = intake.open_stac_item_collection(results.items())
display(cat)

# also HTML table?
display(cat[sceneid].metadata)

Screenshot 2019-12-15 17 05 47

Enabling geopandas-like methods on an Intake catalog: <class 'satstac.itemcollection.ItemCollection would be very useful! Maybe it's best to just add a cat.to_geopandas() function to enable things like the following?:

gf = cat.to_geopandas()
gf.iloc[0]
gf[gf["eo:cloud_cover"] < 50]
#gf.query("eo:cloud_cover < 50") # Geopandas doesn't like ":" in column names
gf.rename(columns={"eo:cloud_cover": "cloud_cover"}).query("cloud_cover < 50")
gf.plot() #plots footprint polygons of STAC items
@jhamman
Copy link
Collaborator

jhamman commented Dec 16, 2019

I think this is a great idea. Intake-esm does something similar, providing a pandas DataFrame as an attribute on the catalog. This allows intake-esm to use Pandas query logic for search/subsetting of the catalog. The awkward part is figuring out how to move between the two forms of the catalog (intake and DataFrame).

Perhaps a good starting point would be to_/from_geopandas methods for intake-stac.

@andersy005
Copy link
Member

What is the expected behavior of from_geopandas() method? As I was working on implementing this method, it occurred to me that when the following is performed:

items.save('subset.geojson')
gf = gpd.read_file('subset.geojson')

the created GeoDataframe doesn't include all the information present in the subset.geojson file, specifically the extra information in the assets sections of the GeoJSON:

"assets": {
            "index": {
                "type": "text/html",
                "title": "HTML index page",
                "href": "https://landsat-pds.s3.amazonaws.com/c1/L8/162/068/LC08_L1TP_162068_20190530_20190605_01_T1/LC08_L1TP_162068_20190530_20190605_01_T1_MTL.txt"
            },
            "thumbnail": {
                "title": "Thumbnail image",
                "type": "image/jpeg",
                "href": "https://landsat-pds.s3.amazonaws.com/c1/L8/162/068/LC08_L1TP_162068_20190530_20190605_01_T1/LC08_L1TP_162068_20190530_20190605_01_T1_thumb_large.jpg"
            },

I presume that this is extra information (useful in STAC context), but unrecognizable to GeoPandas. So, how do we construct the <Intake catalog: <class 'satstac.itemcollection.ItemCollection'>> from a GeoDataFrame that doesn't have all necessary information? Am I missing something?

@jhamman jhamman added the enhancement New feature or request label Mar 3, 2020
@scottyhq
Copy link
Collaborator Author

scottyhq commented Nov 20, 2020

Just wanted to note the current syntax here is giving a futurewarning:

if crs is None:
crs = {'init': 'epsg:4326'}

FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))

it's an easy fix, we just need to chage to crs='EPSG:4326'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants