🔍 Documentation update (#409)
* update landing page of documentation
* revise hurdat2 documentation
* Include version in generated docs

---------

Co-authored-by: Kevin Santana <[email protected]>
selipot and kevinsantana11 authored Apr 18, 2024
1 parent ec1284a commit cfe2300
Showing 7 changed files with 39 additions and 21 deletions.
7 changes: 4 additions & 3 deletions .vscode/tasks.json
@@ -161,7 +161,7 @@
{
"label": "[TASK] Build distribution package",
"type": "shell",
- "command": "conda run -n clouddrift python -m build",
+ "command": "rm -rf dist/ && conda run -n clouddrift python -m build",
"group": {
"kind": "none",
"isDefault": true
@@ -178,7 +178,8 @@
"label": "[TASK] Install distribution package",
"type": "shell",
"dependsOn": ["[TASK] Build distribution package"],
- "command": "conda run -n clouddrift pip install dist/clouddrift-*.whl",
+ // No cache dir to prevent caching previous builds using the same version number.
+ "command": "conda run -n clouddrift pip uninstall clouddrift -y && conda run -n clouddrift pip install --no-cache-dir dist/clouddrift-*.whl",
"group": {
"kind": "none",
"isDefault": true
@@ -195,7 +196,7 @@
"label": "[TASK] Generate documentation site",
"type": "shell",
"dependsOn": ["[TASK] Install distribution package"],
- "command": "cd docs && make html",
+ "command": "cd docs && make clean && make html",
"group": {
"kind": "none",
"isDefault": true
5 changes: 5 additions & 0 deletions clouddrift/adapters/hurdat2.py
@@ -1,3 +1,8 @@
+ """
+ This module defines functions used to adapt the HURDAT2 cyclone track data as
+ a ragged-array dataset.
+ """
+
import enum
import os
import re
5 changes: 4 additions & 1 deletion clouddrift/datasets.py
@@ -236,6 +236,8 @@ def hurdat2(basin: _BasinOption = "both", decode_times: bool = True) -> xr.DataA
xarray.Dataset
HURDAT2 dataset as a ragged array.
+ Standard usage of the dataset:
>>> from clouddrift.datasets import hurdat2
>>> ds = hurdat2()
>>> ds
@@ -268,7 +270,8 @@ def hurdat2(basin: _BasinOption = "both", decode_times: bool = True) -> xr.DataA
summary: The National Hurricane Center (NHC) conducts a post-sto...
...
- If you would like to select a specific ocean basin like the Atlantic Ocean you would do so like this:
+ To retrieve only the records for the Atlantic Ocean basin:
>>> from clouddrift.datasets import hurdat2
>>> ds = hurdat2(basin="atlantic")
>>> ds
9 changes: 1 addition & 8 deletions docs/api.rst
@@ -10,14 +10,7 @@ Auto-generated summary of CloudDrift's API. For more details and examples, refer
:template: module.rst
:recursive:

- adapters.andro
- adapters.gdp
- adapters.gdp1h
- adapters.gdp6h
- adapters.glad
- adapters.mosaic
- adapters.subsurface_floats
- adapters.yomaha
+ adapters
datasets
kinematics
pairs
3 changes: 3 additions & 0 deletions docs/conf.py
@@ -12,13 +12,16 @@
import os
import sys

+ import clouddrift
+
sys.path.insert(0, os.path.abspath("../.."))

# -- Project information -----------------------------------------------------

project = "CloudDrift"
copyright = "2022-2023, CloudDrift"
author = "Philippe Miron"
+ version = clouddrift.version

# -- General configuration ---------------------------------------------------

2 changes: 2 additions & 0 deletions docs/datasets.rst
@@ -71,6 +71,8 @@ Currently available datasets are:
- :func:`clouddrift.datasets.yomaha`: The YoMaHa'07 dataset as a ragged array
processed from the upstream dataset hosted at the `Asia-Pacific Data-Research
Center (APDRC) <http://apdrc.soest.hawaii.edu/projects/yomaha/>`_.
+ - :func:`clouddrift.datasets.hurdat2`: The HURricane DATa 2nd generation (HURDAT2)
+   dataset processed from the upstream dataset hosted at the `NOAA AOML Hurricane Research Division <https://www.aoml.noaa.gov/hrd/hurdat/Data_Storm.html>`_.

The GDP and the Spotters datasets are accessed lazily, so the data is only downloaded when
specific array values are referenced. The ANDRO, GLAD, MOSAiC, Subsurface Floats, and YoMaHa'07
29 changes: 20 additions & 9 deletions docs/index.rst
@@ -1,28 +1,39 @@
CloudDrift, a platform for accelerating research with Lagrangian climate data
=============================================================================

+ Version: |version|
+ ----------------------

Lagrangian data typically refers to oceanic and atmospheric information acquired by observing platforms drifting with the flow they are embedded within, but also refers more broadly to the data originating from uncrewed platforms, vehicles, and animals that gather data along their unrestricted and often complex paths. Because such paths traverse both spatial and temporal dimensions, Lagrangian data can convolve spatial and temporal information that cannot always readily be organized in common data structures and stored in standard file formats with the help of common libraries and standards.

As such, for both originators and users, Lagrangian data present challenges that the CloudDrift project aims to overcome. This project is funded by the `NSF EarthCube program <https://www.earthcube.org/info>`_ through `EarthCube Capabilities Grant No. 2126413 <https://www.nsf.gov/awardsearch/showAward?AWD_ID=2126413>`_.

- Motivations
- -----------
+ Scope and Key Features
+ ----------------------

- The `Global Drifter Program (GDP) <https://www.aoml.noaa.gov/phod/gdp/>`_ of the US National Oceanic and Atmospheric Administration has released to date nearly 25,000 drifting buoys, or drifters, with the goal of obtaining observations of oceanic velocity, sea surface temperature, and sea level pressure. From these drifter observations, the GDP generates two data products: one of oceanic variables estimated along drifter trajectories at `hourly <https://www.aoml.noaa.gov/phod/gdp/interpolated/data/all.php>`_ time steps, and one at `six-hourly <https://www.aoml.noaa.gov/phod/gdp/hourly_data.php>`_ steps.
+ The scope of the CloudDrift library includes:

- There are a few ways to retrieve the data, but all typically require time-consuming preprocessing steps in order to prepare the data for analysis. As an example, the datasets can be retrieved through an `ERDDAP server <https://data.pmel.noaa.gov/generic/erddap/tabledap/gdp_hourly_velocities.html>`_, but requests are limited in size. The latest `6-hourly dataset <https://www.aoml.noaa.gov/ftp/pub/phod/lumpkin/netcdf/>`_ is distributed as a collection of thousands of individual NetCDF files or as a series of `ASCII files <https://www.aoml.noaa.gov/phod/gdp/>`_. Until recently, the `hourly dataset <https://www.aoml.noaa.gov/ftp/pub/phod/lumpkin/hourly/v2.00/netcdf/>`_ was distributed as a collection of individual NetCDF files (17,324 for version 1.04c) but is now distributed by NOAA NCEI as a `single NetCDF file <https://doi.org/10.25921/x46c-3620>`_ containing a series of ragged arrays, thanks to the work of CloudDrift. A single file simplifies data distribution, decreases metadata redundancies, and efficiently stores a Lagrangian data collection of uneven lengths.
+ 1. **Working with contiguous ragged array representations of data, whether they originate from geosciences or any other field**. Ragged array representations are typically useful when the lengths of the instances of a feature (variable) are not all equal. With such representations, the data for each feature are stored contiguously in memory, and the number of elements of each feature is recorded in a count variable which CloudDrift calls *rowsize*.

+ 2. **Delivering functions and methods to perform scientific analysis of Lagrangian data, oceanographic or otherwise, whether or not structured as ragged arrays**. A straightforward example of Lagrangian analysis provided by CloudDrift is the derivation of Lagrangian velocities from a sequence of Lagrangian positions, and vice versa. Another more involved example is the discovery of pairs of Lagrangian data prescribed by distances in space and time. Both of these methods are currently available in CloudDrift.

+ 3. **Processing publicly available Lagrangian datasets into the common ragged array data structure and format**. Through data *adapters*, this type of processing includes not only converting Lagrangian data from typically regular arrays to ragged arrays but also aggregating data and metadata from multiple data files into a single data file. The canonical example for the CloudDrift library is the dataset of the NOAA Global Drifter Program (see Motivations below).

- CloudDrift's analysis functions are centered around the ragged-array data
- structure:
+ 4. **Making cloud-optimized ragged array datasets easily accessible**. This involves opening, in a local computing environment and without unnecessary downloads, Lagrangian datasets available from cloud servers, as well as opening Lagrangian datasets which have been seamlessly processed by the CloudDrift data *adapters*.
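The contiguous ragged-array layout described in item 1, with a *rowsize* count per feature, can be sketched in a few lines of NumPy. The names below are illustrative only and are not the CloudDrift API:

```python
import numpy as np

# Three "features" (e.g. drifter trajectories) of unequal length.
rows = [np.array([1.0, 2.0, 3.0]), np.array([4.0]), np.array([5.0, 6.0])]

obs = np.concatenate(rows)                  # contiguous data along the "obs" dimension
rowsize = np.array([len(r) for r in rows])  # count variable: [3, 1, 2]

# Start index of each feature, recovered from the cumulative row sizes.
starts = np.insert(np.cumsum(rowsize)[:-1], 0, 0)

def row(i: int) -> np.ndarray:
    """Return the i-th feature as a view into the contiguous array."""
    return obs[starts[i] : starts[i] + rowsize[i]]

# Per-feature reductions work without unpacking the ragged array.
means = np.add.reduceat(obs, starts) / rowsize  # [2.0, 4.0, 5.5]

# A minimal instance of item 2: centered-difference velocities from the
# positions of the first feature, sampled at unit time steps (plain NumPy,
# not a CloudDrift call).
velocities = np.gradient(row(0), np.arange(3.0))  # [1.0, 1.0, 1.0]
```

The same bookkeeping, contiguous values plus a count variable, is what the library's data *adapters* produce at scale for real datasets.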

+ CloudDrift's analysis functions are principally centered around the ragged-array data structure:

.. image:: img/ragged_array.png
:width: 800
:align: center
:alt: Ragged array schematic

- CloudDrift's goals are to simplify the necessary steps to get started with
- Lagrangian datasets and to provide a cloud-ready library to accelerate
- Lagrangian analysis.
+ Motivations
+ -----------

+ The `Global Drifter Program (GDP) <https://www.aoml.noaa.gov/phod/gdp/>`_ of the US National Oceanic and Atmospheric Administration has released to date nearly 25,000 drifting buoys, or drifters, with the goal of obtaining observations of oceanic velocity, sea surface temperature, and sea level pressure. From these drifter observations, the GDP generates two data products: one of oceanic variables estimated along drifter trajectories at `hourly <https://www.aoml.noaa.gov/phod/gdp/interpolated/data/all.php>`_ time steps, and one at `six-hourly <https://www.aoml.noaa.gov/phod/gdp/hourly_data.php>`_ steps.

+ There are a few ways to retrieve the data, but all typically require time-consuming preprocessing steps in order to prepare the data for analysis. As an example, the datasets can be retrieved through an `ERDDAP server <https://data.pmel.noaa.gov/generic/erddap/tabledap/gdp_hourly_velocities.html>`_, but requests are limited in size. The latest `6-hourly dataset <https://www.aoml.noaa.gov/ftp/pub/phod/lumpkin/netcdf/>`_ is distributed as a collection of thousands of individual NetCDF files or as a series of `ASCII files <https://www.aoml.noaa.gov/phod/gdp/>`_. Until recently, the `hourly dataset <https://www.aoml.noaa.gov/ftp/pub/phod/lumpkin/hourly/v2.00/netcdf/>`_ was distributed as a collection of individual NetCDF files (17,324 for version 1.04c) but is now distributed by NOAA NCEI as a `single NetCDF file <https://doi.org/10.25921/x46c-3620>`_ containing a series of ragged arrays, thanks to the work of CloudDrift. A single file simplifies data distribution, decreases metadata redundancies, and efficiently stores a Lagrangian data collection of uneven lengths.

Getting started
---------------
