-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for satstac.ItemCollection #29
Conversation
try: | ||
name = kwargs.pop("name", self._stac_obj.id) | ||
except AttributeError: | ||
name = str(type(self._stac_obj)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matthewhanson - here is another place where we were tripped up while using a ItemCollection
. There is not an id
attribute on the class object. Not sure if it is possible, but it would be nice if ItemCollection
inherited from satstac.Thing
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhamman In sat-stac an ItemCollection isn't really a STAC ItemCollection, it's more of a Single File STAC as per the extension:
https://github.com/radiantearth/stac-spec/tree/dev/extensions/single-file-stac
The idea behind this was that a self-contained STAC catalog would be like a regular STAC catalog, except as you pointed out there's no id
, nor are there any other catalog fields:
https://github.com/radiantearth/stac-spec/blob/dev/catalog-spec/catalog-spec.md#catalog-fields
I've posted an issue for STAC recommending that all the Catalog fields get added to the Single File STAC extension:
radiantearth/stac-spec#691
# use github/sat-utils/sat-stac/master/test/items.json | ||
return satstac.ItemCollection.load( | ||
os.path.join(os.path.dirname(__file__), "items.json") # noqa: E501 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
temporary work around here. See sat-utils/sat-stac#52 for more detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this and release 0.3.1:
https://github.com/sat-utils/sat-stac/blob/master/CHANGELOG.md#v031---2019-12-06
""" | ||
Load the STAC Item Collection. | ||
""" | ||
for item in self._stac_obj: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matthewhanson - I could use a sanity check on this section.
self._stac_obj
is a satstac.ItemsCollection
. Is this the best way to unpack this into individual items?
Alternatively, we can/should we be grouping these items by collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure grouping is necessary. The most common use case I think is a catalog with a single collection of data. Additionally, a user might want to group them according to any of the other metadata fields, so I think we can leave it to the user to decide how to sort/use them.
This is basically done, pending some review from @matthewhanson. The failing test is due to a futurewarning in pandas. I will fix that shortly. I'd like to get this in asap, ahead of AGU so I will likely merge this by COB tomorrow if I don't hear any objections. |
Looking good, but I'm running into some issues that I think are related to our assumption that First some test code properties = ["eo:row=027",
"eo:column=047",
"landsat:tier=T1"]
results = Search.search(collection='landsat-8-l1',
sort=['<datetime'], #earliest scene first
property=properties)
itemCollection = results.items() #this was previously a satstac.Items object, now it is satstac.ItemCollection
item = itemCollection[0] #satstac.Item
intake_catalog = intake.open_stac_item_collection(itemCollection)
sceneid = list(intake_catalog)[0] #LC80470272019096 (https://earth-search.aws.element84.com/collections/landsat-8-l1/items/LC80470272019096)
entry = intake_catalog[sceneid] #intake.catalog.local.LocalCatalogEntry |
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-62-4c68fdc7051d> in <module>
----> 1 stacked = catalog[sceneid].stack_bands(['B1', 'B2'])
~/Documents/GitHub/intake-stac/intake_stac/catalog.py in stack_bands(self, bands, regrid)
230 titles = []
231 assets = self._stac_obj.assets
--> 232 band_info = self._stac_obj.collection().properties.get("eo:bands")
233
234 for band in bands:
AttributeError: 'NoneType' object has no attribute 'properties' You can see that the collection is defined in @matthewhanson - I'm also noticing the returned ItemCollection also has a lot of duplicated 'collection' information. Could the ItemCollection reference a 'collection' key at the top level - Specifically the band definitions:
This might be intentional to keep every Item/Feature stand-alone... |
Thanks @scottyhq. This is a useful review. It sounds like a (temporary?) workaround to the band retrieval step could look like this (not tested): try:
band_info = self._stac_obj.collection().properties.get("eo:bands")
except AttributeError:
band_info = self._stac_obj.properties.get("eo:bands") Ideally though, an satstac.Item should always have the collection/properties features so we don't have to fall back like this. |
@jhamman - that fix works. A couple other thoughts/questions on how this is structured. Every local catalog entry is getting the assets assigned as attributes - for example
You can do list(entry), maybe this list should be returned from display(entry) or accessible more transparently as Also, how can a user specify dask chunks when loading an image from a link? I expected this to work: ---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-52-73443d1479e9> in <module>
----> 1 entry.B1.to_dask(chunks=dict(band=1,x=2048,y=2048))
TypeError: to_dask() got an unexpected keyword argument 'chunks' |
@scottyhq - to set chunk size in intake, you (somewhat unintuitively ) invoke the entry. So your example would be: entry.B1(chunks=dict(band=1,x=2048,y=2048)).to_dask() |
@jhamman - looks like a simple formatting failure I'm for merging this and doing a 0.2.2 release. The other points can be addressed in separate PRs |
Regarding the duplicated fields in the Item from the Collection. At the last STAC sprint this was discussed and decided was that best practice would be that a dynamic API should return complete Items, with all the properties that are common across all Items in a Collection. A client can always remove the duplicated information if desired. Static catalogs on the other hand can benefit from not duplicating info across multiple files so best practice is to not duplicate. |
This will allow intake-stac to accept ItemCollection objects as inputs. Importantly, this will facilitate integration with sat-search tooling.
cc @scottyhq & @matthewhanson