Intake-STAC with NASA CMR STAC proxy: Authentication #60
Since there will be lots of valuable data like this that is not in a cloud-optimized data store and format, I think it makes sense to have a
Thoughts @matthewhanson @jhamman @apawloski @martindurant?
What is contained in the .netrc file? Is it the user/password for the HTTP call? In general, you can use fsspec.open_local with a caching URL (or a local path) and get an experience on par with other fsspec operations. Parallel downloading of multiple files should not be far off either.
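Building on the fsspec.open_local suggestion, here is a minimal sketch of the caching route. It assumes fsspec's URL chaining (the `simplecache::` prefix) and its convention of passing per-layer options keyed by protocol name. The helper names `cached_open_args` and `open_local_cached` are hypothetical, and the argument builder is kept separate from the network call so it can be checked offline.

```python
def cached_open_args(url, cache_dir, **http_options):
    """Hypothetical helper: build a chained fsspec URL plus storage options
    so that `url` is downloaded once into `cache_dir` and then read locally."""
    protocol = url.split("://", 1)[0]  # "https" or "http"
    chained_url = f"simplecache::{url}"
    storage_options = {
        # options for the caching layer: where the local copies live
        "simplecache": {"cache_storage": cache_dir},
        # options forwarded to the HTTP layer (e.g. auth or client_kwargs)
        protocol: dict(http_options),
    }
    return chained_url, storage_options


def open_local_cached(url, cache_dir, **http_options):
    """Download `url` (once) and return the path of the local copy."""
    import fsspec  # imported lazily so the builder above has no dependency

    chained_url, storage_options = cached_open_args(url, cache_dir, **http_options)
    return fsspec.open_local(chained_url, **storage_options)
```

For example, `open_local_cached(url, "/tmp/nc-cache")` should return a local path that xarray can open directly, and a repeated call with the same URL should reuse the cached file rather than download it again.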
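`cat ~/.netrc` looks like this:

It looks like the

```python
import requests

url = item['data'].urlpath
# requests falls back to ~/.netrc credentials by default (trust_env=True)
resp = requests.get(url)
resp.raise_for_status()
with open('test.nc', 'wb') as f:
    f.write(resp.content)
```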
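The reason `requests` succeeds is that, with its default `trust_env=True`, it looks up the request's host in ~/.netrc and attaches those credentials, and it repeats that lookup for each redirect target. A minimal stdlib sketch of the per-host lookup, assuming the netrc path can be passed explicitly; `netrc_auth_for_url` is a hypothetical name:

```python
import netrc
from urllib.parse import urlparse


def netrc_auth_for_url(url, netrc_path=None):
    """Return (login, password) for the URL's host from a netrc file, or None.

    With netrc_path=None the stdlib reads ~/.netrc and, on POSIX, may reject
    that file if its permissions are too permissive.
    """
    host = urlparse(url).hostname
    if host is None:
        return None
    try:
        entries = netrc.netrc(netrc_path)
    except FileNotFoundError:
        return None  # no netrc file at all: no credentials
    creds = entries.authenticators(host)
    if creds is None:
        return None  # no entry for this host (and no "default" entry)
    login, _account, password = creds
    return login, password
```

Because the lookup is keyed on the host, a URL on the data server (grfn.asf.alaska.edu) finds nothing, while the Earthdata Login URL it redirects to does.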
fsspec uses aiohttp, not requests, so maybe that's why it's not getting picked up automatically. In this case, it should work like
or
Actually, after a little reading, it seems that aiohttp does support this, if the client is passed
Related PR over in sat-stac: sat-utils/sat-stac#62
Hi @martindurant, after trying a few other approaches to see how this works behind the scenes, I'm a bit confused. The following code works using aiohttp directly:

```python
import aiohttp

url = item['data'].urlpath
auth = aiohttp.BasicAuth(username, password)
# top-level `async with` works in Jupyter/IPython; in a script, wrap this
# in an `async def` and run it with asyncio.run()
async with aiohttp.ClientSession(auth=auth) as session:
    async with session.get(url) as resp:
        print(resp.status)
        with open('local.nc', 'wb') as f:
            f.write(await resp.read())
```

I can't seem to get the ~/.netrc picked up. Reading the PR you linked to and the docs, maybe there is a separate workflow dealing with proxies that this gets into, because the following returns

```python
async with aiohttp.ClientSession(trust_env=True) as session:
    async with session.get(url) as resp:
        print(resp.status)
        with open('local.nc', 'wb') as f:
            f.write(await resp.read())
```

If I use fsspec as you suggested with the following, I get a

```python
import netrc
import fsspec

(username, account, password) = netrc.netrc().authenticators("urs.earthdata.nasa.gov")
# aiohttp expects an aiohttp.BasicAuth object for `auth`, not a plain tuple
of = fsspec.open(url, "rb", auth=(username, password))
with of as remote:
    with open('local.nc', 'wb') as local:
        local.write(remote.read())
```

Finally, I thought this might work, but I get a

```python
import aiohttp
import fsspec

fs = fsspec.filesystem("http", auth=aiohttp.BasicAuth(username, password))
with fs.open(url, "rb") as remote:
    with open('local.nc', 'wb') as local:
        local.write(remote.read())
```

Interestingly, for the last case the traceback provides a link, and if I click on it in my browser the download works!?

```
ClientResponseError: 401, message='Unauthorized', url=URL('https://urs.earthdata.nasa.gov/oauth/authorize?app_type=401&client_id=iwntGSgHy9yoog7Mjag0dQ&response_type=code&redirect_uri=https://grfn.asf.alaska.edu/door/oauth&state=aHR0cDovL2dyZm4uYXNmLmFsYXNrYS5lZHUvZG9vci9kb3dubG9hZC9TMS1HVU5XLUEtUi0wODctdG9wcy0yMDE0MTAyM18yMDE0MTAxMS0xNTM4NTYtMjc1NDVOXzI1NDY0Ti1QUC0xYTFhLXYyXzBfMi5uYw')
```

Could you please advise on how to use fsspec directly? And where would it be best to implement reading the credentials from ~/.netrc (intake, fsspec, aiohttp, intake-stac?) so that a user doesn't have to write code to load them?
The HTTPFileSystem ought to have an option so that you can pass the trust_env parameter, although it seems that isn't working for you. I've never heard of .netrc before, but it doesn't sound STAC-specific. If we can't get aiohttp to find and use it automatically, then fsspec would be the place to handle it. Is there any chance you can share some creds privately so that I can test what works?
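A minimal sketch of what handling .netrc on the fsspec side could look like, assuming fsspec's HTTP filesystem forwards `client_kwargs` to `aiohttp.ClientSession` (whose constructor accepts `auth=`). `netrc_http_options` is a hypothetical helper, not an existing fsspec function, and the `auth_factory` parameter exists only so the logic can be exercised without aiohttp installed. Putting the credentials on the session, as in the aiohttp snippet that worked earlier in the thread, rather than on each request may matter, since per-request auth can be dropped when a redirect crosses to a different host.

```python
import netrc


def netrc_http_options(host, netrc_path=None, auth_factory=None):
    """Hypothetical helper: build storage options for fsspec's HTTP filesystem
    from the netrc entry for `host` (e.g. "urs.earthdata.nasa.gov")."""
    creds = netrc.netrc(netrc_path).authenticators(host)
    if creds is None:
        raise KeyError(f"no netrc entry for {host!r}")
    login, _account, password = creds
    if auth_factory is None:
        import aiohttp  # default: aiohttp's BasicAuth, as used in the thread

        auth_factory = aiohttp.BasicAuth
    # client_kwargs are handed to aiohttp.ClientSession, so the credentials
    # become the session's default auth rather than a per-request argument
    return {"client_kwargs": {"auth": auth_factory(login, password)}}
```

Usage would then be along the lines of `fs = fsspec.filesystem("http", **netrc_http_options("urs.earthdata.nasa.gov"))`.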
Thanks for your help @martindurant! There are definitely two things to figure out: 1) how to correctly pass the username and password explicitly to HTTPFileSystem (the last code block seems close!) and 2) getting the netrc read correctly behind the scenes. I can send you creds via Keybase or however you prefer. It's also easy to register (https://urs.earthdata.nasa.gov/home); this is NASA's standard login, which anyone can sign up for with some basic info.
OK, I can sign up, but I won't get to this until next week now.
It turns out that if you manually follow the redirect (i.e., apply the auth again to the generated URL), you can get the file. I feel like I'm getting somewhere.
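The manual-redirect workaround can be sketched with aiohttp as follows. One plausible reason the plain fsspec call fails is that credentials attached to a single request are not forwarded when a redirect crosses to another host (here, urs.earthdata.nasa.gov). Assumptions in this sketch: credentials are sent only to the Earthdata Login host (a choice made here so they are not leaked to every server in the chain), the `session` is created without default auth, the data server's session cookie is kept by the session's cookie jar, and the helper names are hypothetical.

```python
from urllib.parse import urljoin, urlparse

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def should_send_auth(url, auth_host):
    """Only attach credentials when talking to the login host itself."""
    return urlparse(url).hostname == auth_host


def next_url(current_url, location):
    """Resolve a (possibly relative) Location header against the current URL."""
    return urljoin(current_url, location)


async def fetch_with_auth(session, url, auth, auth_host="urs.earthdata.nasa.gov", max_hops=10):
    """Follow redirects by hand, re-applying `auth` on every hop to `auth_host`.

    `session` is an aiohttp.ClientSession without default auth; cookies set
    along the way stay in its cookie jar and are sent on later hops.
    """
    for _ in range(max_hops):
        hop_auth = auth if should_send_auth(url, auth_host) else None
        async with session.get(url, auth=hop_auth, allow_redirects=False) as resp:
            if resp.status in REDIRECT_STATUSES:
                url = next_url(url, resp.headers["Location"])
                continue
            resp.raise_for_status()
            return await resp.read()
    raise RuntimeError(f"more than {max_hops} redirects, last URL {url}")
```

In a notebook this might be called as `data = await fetch_with_auth(session, url, aiohttp.BasicAuth(username, password))` inside `async with aiohttp.ClientSession() as session:`.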
With fsspec/filesystem_spec#400, you can do
I don't know why passing in the
Thanks @martindurant!
There definitely is something odd with how aiohttp handles the netrc auth. Short of opening an issue upstream, I'm wondering if in fsspec we could have an option that generates the

I'm still unclear about how to get this into intake-stac/intake_stac/catalog.py (line 15 in 0fcde70). Something like:

```python
from intake import open_stac_catalog

catalog_url = 'https://raw.githubusercontent.com/cholmes/sample-stac/master/stac/catalog.json'
# netrc_auth is the proposed keyword; it is not an existing intake-stac option
cat = open_stac_catalog(catalog_url, netrc_auth="urs.earthdata.nasa.gov")
```

so that whenever a user opens a file, the auth settings are in place:

```python
item = cat['myitem']
da = item['data'].to_dask()
```
Seems like it needs to migrate to this line, where we know the URL and can do the login lookup. That should be the default, but the user should probably be able to override it.
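One way to get "netrc lookup by default, user can override" is to build default storage options from the netrc lookup and then let anything the user passes explicitly win, including nested keys such as `client_kwargs`. A minimal sketch with a hypothetical `merge_options` helper; the option names in the test are the fsspec/aiohttp ones used earlier in the thread.

```python
def merge_options(defaults, overrides=None):
    """Hypothetical helper: recursively merge storage options so that values
    the user passes explicitly take precedence over netrc-derived defaults.

    Neither input dict is modified; nested dicts are merged key by key.
    """
    merged = dict(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_options(merged[key], value)
        else:
            merged[key] = value
    return merged
```

Merging recursively rather than with a plain `dict.update` means a user who only overrides `client_kwargs["auth"]` keeps any other default client settings.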
The newer fsspec 0.8.0 uses aiohttp for HTTP requests, and that breaks the netrc authentication to the Earthdata site. Using fsspec 0.8.2 helps a bit, but it still throws an error like "ClientResponseError: 401, message='Unauthorized', url=URL('https://urs.earthdata.nasa.gov/oauth/authorize?app_type=401...". We need to figure out how to inject the credentials into the intake_xarray/intake/fsspec/aiohttp stack somehow, but for now we need to downgrade temporarily. See also the relevant discussion in intake/intake-stac#60.
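Pinning below 0.8.0 in the environment file is the simplest form of the temporary downgrade. A runtime guard like the one below could additionally flag environments that already have the aiohttp-based HTTP implementation. This is a sketch: the 0.8.0 cut-off comes from the note above, `fsspec_uses_aiohttp` and `parse_release` are hypothetical names, and only the leading numeric release part of the version string is compared.

```python
import re

# per the note above: fsspec 0.8.0 switched its HTTP filesystem to aiohttp
AIOHTTP_BACKEND_SINCE = (0, 8, 0)


def parse_release(version):
    """Return the leading numeric release of a version string as a 3-tuple."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # missing minor/patch components count as 0; suffixes like "rc1" are ignored
    return tuple(int(part or 0) for part in match.groups())


def fsspec_uses_aiohttp(version):
    """True if this fsspec version is expected to use aiohttp for HTTP."""
    return parse_release(version) >= AIOHTTP_BACKEND_SINCE
```

For example, `if fsspec_uses_aiohttp(fsspec.__version__): warnings.warn(...)` would warn before the 401 shows up deep inside `.to_dask()`.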
The
As part of STAC-sprint 6 I was trying out intake-stac with https://github.com/nasa/cmr-stac. It would be absolutely amazing to integrate intake-stac with that endpoint to facilitate working with NASA datasets! But there are multiple things to work out, and first and foremost is how to deal with authentication.
Unlike cloud credentials handled by boto3, NASA uses an 'Earthdata Login' (https://urs.earthdata.nasa.gov/documentation). Typically, science users keep their username and password in a ~/.netrc file, which is used any time they retrieve a file. This mechanism doesn't currently work with the intake-stac .to_dask() method. For example:
Leads to a big traceback:
Full example here: https://gist.github.com/scottyhq/04fe1e2d0b946b97228f6922cf001bbd
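For readers setting this up, an Earthdata Login entry in ~/.netrc is a single line of the form `machine urs.earthdata.nasa.gov login <username> password <password>`, and the file should be readable only by its owner (Python's own netrc module may reject a ~/.netrc with looser permissions on POSIX). A sketch that appends such an entry; `add_netrc_entry` is a hypothetical helper that writes to whatever path it is given, so it is worth trying on a scratch file first.

```python
import os
import stat


def add_netrc_entry(path, machine, login, password):
    """Append a netrc entry and restrict the file to owner read/write (0600)."""
    # create the file with mode 0600 from the start (umask can only remove bits)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    with os.fdopen(fd, "a") as f:
        f.write(f"machine {machine} login {login} password {password}\n")
    # tighten permissions in case the file already existed with a looser mode
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

For example, `add_netrc_entry(os.path.expanduser("~/.netrc"), "urs.earthdata.nasa.gov", username, password)`.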