Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for satstac.ItemCollection #29

Merged
merged 7 commits into from
Dec 6, 2019
Merged

add support for satstac.ItemCollection #29

merged 7 commits into from
Dec 6, 2019

Conversation

jhamman
Copy link
Collaborator

@jhamman jhamman commented Dec 4, 2019

This will allow intake-stac to accept ItemCollection objects as inputs. Importantly, this will facilitate integration with sat-search tooling.

cc @scottyhq & @matthewhanson

try:
name = kwargs.pop("name", self._stac_obj.id)
except AttributeError:
name = str(type(self._stac_obj))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewhanson - here is another place where we were tripped up while using a ItemCollection. There is not an id attribute on the class object. Not sure if it is possible, but it would be nice if ItemCollection inherited from satstac.Thing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhamman In sat-stac an ItemCollection isn't really a STAC ItemCollection, it's more of a Single File STAC as per the extension:
https://github.com/radiantearth/stac-spec/tree/dev/extensions/single-file-stac

The idea behind this was that a self-contained STAC catalog would be like a regular STAC catalog, except as you pointed out there's no id, nor are there any other catalog fields:
https://github.com/radiantearth/stac-spec/blob/dev/catalog-spec/catalog-spec.md#catalog-fields

I've posted an issue for STAC recommending that all the Catalog fields get added to the Single File STAC extension:
radiantearth/stac-spec#691

# use github/sat-utils/sat-stac/master/test/items.json
return satstac.ItemCollection.load(
os.path.join(os.path.dirname(__file__), "items.json") # noqa: E501
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temporary work around here. See sat-utils/sat-stac#52 for more detail.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""
Load the STAC Item Collection.
"""
for item in self._stac_obj:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewhanson - I could use a sanity check on this section.

self._stac_obj is a satstac.ItemsCollection. Is this the best way to unpack this into individual items?

Alternatively, we can/should we be grouping these items by collection?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure grouping is necessary. The most common use case I think is a catalog with a single collection of data. Additionally, a user might want to group them according to any of the other metadata fields, so I think we can leave it to the user to decide how to sort/use them.

@jhamman
Copy link
Collaborator Author

jhamman commented Dec 5, 2019

This is basically done, pending some review from @matthewhanson. The failing test is due to a futurewarning in pandas. I will fix that shortly. I'd like to get this in asap, ahead of AGU so I will likely merge this by COB tomorrow if I don't hear any objections.

@scottyhq
Copy link
Collaborator

scottyhq commented Dec 6, 2019

Looking good, but I'm running into some issues that I think are related to our assumption that satstac.ItemCollection behaves like the previous satstac.Items that @matthewhanson can help clarify

First some test code

properties =  ["eo:row=027",
               "eo:column=047",
               "landsat:tier=T1"] 
results = Search.search(collection='landsat-8-l1', 
                        sort=['<datetime'], #earliest scene first
                        property=properties)

itemCollection = results.items() #this was previously a satstac.Items object, now it is satstac.ItemCollection
item = itemCollection[0] #satstac.Item

intake_catalog = intake.open_stac_item_collection(itemCollection)
sceneid = list(intake_catalog)[0] #LC80470272019096 (https://earth-search.aws.element84.com/collections/landsat-8-l1/items/LC80470272019096)
entry = intake_catalog[sceneid] #intake.catalog.local.LocalCatalogEntry

@scottyhq
Copy link
Collaborator

scottyhq commented Dec 6, 2019

intake_catalog[sceneid].stack_bands(['B1', 'B2'])

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-62-4c68fdc7051d> in <module>
----> 1 stacked = catalog[sceneid].stack_bands(['B1', 'B2'])

~/Documents/GitHub/intake-stac/intake_stac/catalog.py in stack_bands(self, bands, regrid)
    230         titles = []
    231         assets = self._stac_obj.assets
--> 232         band_info = self._stac_obj.collection().properties.get("eo:bands")
    233 
    234         for band in bands:

AttributeError: 'NoneType' object has no attribute 'properties'

You can see that the collection is defined in itemCollection.save('my-collection.geojson') but within properties: ("properties":{"collection":"landsat-8-l1","eo:gsd":15,"eo:platform":"landsat-8","eo:instrument":"OLI_TIRS","eo:off_nadir":0,"eo:bands": ...)

@matthewhanson - I'm also noticing the returned ItemCollection also has a lot of duplicated 'collection' information. Could the ItemCollection reference a 'collection' key at the top level -
https://earth-search.aws.element84.com/collections/landsat-8-l1)

Specifically the band definitions:

"eo:platform":"landsat-8","eo:instrument":"OLI_TIRS","eo:off_nadir":0,"eo:bands":[{"name":"B1","common_name":"coastal","gsd":30,"center_wavelength":0.44,"full_width_half_max":0.02},{"name":"B2","common_name":"blue","gsd":30,"center_wavelength":0.48,"full_width_half_max":0.06},{"name":"B3","common_name":"green","gsd":30,"center_wavelength":0.56,"full_width_half_max":0.06},{"name":"B4","common_name":"red","gsd":30,"center_wavelength":0.65,"full_width_half_max":0.04},{"name":"B5","common_name":"nir","gsd":30,"center_wavelength":0.86,"full_width_half_max":0.03},{"name":"B6","common_name":"swir16","gsd":30,"center_wavelength":1.6,"full_width_half_max":0.08},{"name":"B7","common_name":"swir22","gsd":30,"center_wavelength":2.2,"full_width_half_max":0.2},{"name":"B8","common_name":"pan","gsd":15,"center_wavelength":0.59,"full_width_half_max":0.18},{"name":"B9","common_name":"cirrus","gsd":30,"center_wavelength":1.37,"full_width_half_max":0.02},{"name":"B10","common_name":"lwir11","gsd":100,"center_wavelength":10.9,"full_width_half_max":0.8},{"name":"B11","common_name":"lwir12","gsd":100,"center_wavelength":12,"full_width_half_max":1}]

This might be intentional to keep every Item/Feature stand-alone...

@jhamman
Copy link
Collaborator Author

jhamman commented Dec 6, 2019

Thanks @scottyhq. This is a useful review. It sounds like a (temporary?) workaround to the band retrieval step could look like this (not tested):

try:
    band_info = self._stac_obj.collection().properties.get("eo:bands")
except AttributeError:
    band_info = self._stac_obj.properties.get("eo:bands")

Ideally though, an satstac.Item should always have the collection/properties features so we don't have to fall back like this.

@scottyhq
Copy link
Collaborator

scottyhq commented Dec 6, 2019

@jhamman - that fix works. A couple other thoughts/questions on how this is structured.

Every local catalog entry is getting the assets assigned as attributes - for example entry.B*? :

entry.B1
entry.B10
entry.B11
entry.B2
entry.B3
entry.B4
...

You can do list(entry), maybe this list should be returned from display(entry) or accessible more transparently as entry.assets? Along those lines, stack_bands can take the common band names defined in STAC metadata (see #32), so it would be great if users could access entry['red']

Also, how can a user specify dask chunks when loading an image from a link? I expected this to work: entry.B1.to_dask(chunks=dict(band=1,x=2048,y=2048))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-52-73443d1479e9> in <module>
----> 1 entry.B1.to_dask(chunks=dict(band=1,x=2048,y=2048))

TypeError: to_dask() got an unexpected keyword argument 'chunks'

@jhamman
Copy link
Collaborator Author

jhamman commented Dec 6, 2019

@scottyhq - to set chunk size in intake, you (somewhat unintuitively ) invoke the entry. So your example would be:

entry.B1(chunks=dict(band=1,x=2048,y=2048)).to_dask()

@scottyhq
Copy link
Collaborator

scottyhq commented Dec 6, 2019

@jhamman - looks like a simple formatting failure ./intake_stac/catalog.py:236:81: E501 line too long (81 > 80 characters)

I'm for merging this and doing a 0.2.2 release. The other points can be addressed in separate PRs

@jhamman jhamman merged commit 6ce38bf into intake:master Dec 6, 2019
@matthewhanson
Copy link
Collaborator

@jhamman @scottyhq

Regarding the duplicated fields in the Item from the Collection. At the last STAC sprint this was discussed and decided was that best practice would be that a dynamic API should return complete Items, with all the properties that are common across all Items in a Collection. A client can always remove the duplicated information if desired.

Static catalogs on the other hand can benefit from not duplicating info across multiple files so best practice is to not duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants