🔍 Documentation update (#409)
* update landing page of documentation
* revise hurdat2 documentation
* Include version in generated docs

---------

Co-authored-by: Kevin Santana <[email protected]>
selipot and kevinsantana11 authored Apr 18, 2024
1 parent ec1284a commit cfe2300
Showing 7 changed files with 39 additions and 21 deletions.
7 changes: 4 additions & 3 deletions .vscode/tasks.json
@@ -161,7 +161,7 @@
{
"label": "[TASK] Build distribution package",
"type": "shell",
- "command": "conda run -n clouddrift python -m build",
+ "command": "rm -rf dist/ && conda run -n clouddrift python -m build",
"group": {
"kind": "none",
"isDefault": true
@@ -178,7 +178,8 @@
"label": "[TASK] Install distribution package",
"type": "shell",
"dependsOn": ["[TASK] Build distribution package"],
- "command": "conda run -n clouddrift pip install dist/clouddrift-*.whl",
+ // No cache dir to prevent caching previous builds using the same version number.
+ "command": "conda run -n clouddrift pip uninstall clouddrift -y && conda run -n clouddrift pip install --no-cache-dir dist/clouddrift-*.whl",
"group": {
"kind": "none",
"isDefault": true
@@ -195,7 +196,7 @@
"label": "[TASK] Generate documentation site",
"type": "shell",
"dependsOn": ["[TASK] Install distribution package"],
- "command": "cd docs && make html",
+ "command": "cd docs && make clean && make html",
"group": {
"kind": "none",
"isDefault": true
5 changes: 5 additions & 0 deletions clouddrift/adapters/hurdat2.py
@@ -1,3 +1,8 @@
+ """
+ This module defines functions used to adapt the HURDAT2 cyclone track data as
+ a ragged-array dataset.
+ """
+
import enum
import os
import re
5 changes: 4 additions & 1 deletion clouddrift/datasets.py
@@ -236,6 +236,8 @@ def hurdat2(basin: _BasinOption = "both", decode_times: bool = True) -> xr.DataA
xarray.Dataset
HURDAT2 dataset as a ragged array.
+ Standard usage of the dataset:
>>> from clouddrift.datasets import hurdat2
>>> ds = hurdat2()
>>> ds
@@ -268,7 +270,8 @@ def hurdat2(basin: _BasinOption = "both", decode_times: bool = True) -> xr.DataA
summary: The National Hurricane Center (NHC) conducts a post-sto...
...
- If you would like to select a specific ocean basin like the Atlantic Ocean you would do so like this:
+ To retrieve only the records for the Atlantic Ocean basin:
>>> from clouddrift.datasets import hurdat2
>>> ds = hurdat2(basin="atlantic")
>>> ds
9 changes: 1 addition & 8 deletions docs/api.rst
@@ -10,14 +10,7 @@ Auto-generated summary of CloudDrift's API. For more details and examples, refer
:template: module.rst
:recursive:

- adapters.andro
- adapters.gdp
- adapters.gdp1h
- adapters.gdp6h
- adapters.glad
- adapters.mosaic
- adapters.subsurface_floats
- adapters.yomaha
+ adapters
datasets
kinematics
pairs
3 changes: 3 additions & 0 deletions docs/conf.py
@@ -12,13 +12,16 @@
import os
import sys

+ import clouddrift
+
sys.path.insert(0, os.path.abspath("../.."))

# -- Project information -----------------------------------------------------

project = "CloudDrift"
copyright = "2022-2023, CloudDrift"
author = "Philippe Miron"
+ version = clouddrift.version

# -- General configuration ---------------------------------------------------

2 changes: 2 additions & 0 deletions docs/datasets.rst
@@ -71,6 +71,8 @@ Currently available datasets are:
- :func:`clouddrift.datasets.yomaha`: The YoMaHa'07 dataset as a ragged array
processed from the upstream dataset hosted at the `Asia-Pacific Data-Research
Center (APDRC) <http://apdrc.soest.hawaii.edu/projects/yomaha/>`_.
+ - :func:`clouddrift.datasets.hurdat2`: The HURricane DATa 2nd generation (HURDAT2)
+   dataset processed from the upstream dataset hosted at the `NOAA AOML Hurricane Research Division <https://www.aoml.noaa.gov/hrd/hurdat/Data_Storm.html>`_.

The GDP and the Spotters datasets are accessed lazily, so the data is only downloaded when
specific array values are referenced. The ANDRO, GLAD, MOSAiC, Subsurface Floats, and YoMaHa'07
29 changes: 20 additions & 9 deletions docs/index.rst
@@ -1,28 +1,39 @@
CloudDrift, a platform for accelerating research with Lagrangian climate data
=============================================================================

+ Version: |version|
+ ----------------------

Lagrangian data typically refers to oceanic and atmospheric information acquired by observing platforms drifting with the flow they are embedded within, but also refers more broadly to the data originating from uncrewed platforms, vehicles, and animals that gather data along their unrestricted and often complex paths. Because such paths traverse both spatial and temporal dimensions, Lagrangian data can convolve spatial and temporal information that cannot always readily be organized in common data structures and stored in standard file formats with the help of common libraries and standards.

As such, for both originators and users, Lagrangian data present challenges that the CloudDrift project aims to overcome. This project is funded by the `NSF EarthCube program <https://www.earthcube.org/info>`_ through `EarthCube Capabilities Grant No. 2126413 <https://www.nsf.gov/awardsearch/showAward?AWD_ID=2126413>`_.

- Motivations
- -----------
+ Scope and Key Features
+ ----------------------

- The `Global Drifter Program (GDP) <https://www.aoml.noaa.gov/phod/gdp/>`_ of the US National Oceanic and Atmospheric Administration has released to date nearly 25,000 drifting buoys, or drifters, with the goal of obtaining observations of oceanic velocity, sea surface temperature, and sea level pressure. From these drifter observations, the GDP generates two data products: one of oceanic variables estimated along drifter trajectories at `hourly <https://www.aoml.noaa.gov/phod/gdp/interpolated/data/all.php>`_ time steps, and one at `six-hourly <https://www.aoml.noaa.gov/phod/gdp/hourly_data.php>`_ steps.
+ The scope of the CloudDrift library includes:

- There are a few ways to retrieve the data, but all typically require time-consuming preprocessing steps in order to prepare the data for analysis. As an example, the datasets can be retrieved through an `ERDDAP server <https://data.pmel.noaa.gov/generic/erddap/tabledap/gdp_hourly_velocities.html>`_, but requests are limited in size. The latest `6-hourly dataset <https://www.aoml.noaa.gov/ftp/pub/phod/lumpkin/netcdf/>`_ is distributed as a collection of thousands of individual NetCDF files or as a series of `ASCII files <https://www.aoml.noaa.gov/phod/gdp/>`_. Until recently, the `hourly dataset <https://www.aoml.noaa.gov/ftp/pub/phod/lumpkin/hourly/v2.00/netcdf/>`_ was distributed as a collection of individual NetCDF files (17,324 for version 1.04c) but is now distributed by NOAA NCEI as a `single NetCDF file <https://doi.org/10.25921/x46c-3620>`_ containing a series of ragged arrays, thanks to the work of CloudDrift. A single file simplifies data distribution, decreases metadata redundancies, and efficiently stores a Lagrangian data collection of uneven lengths.
+ 1. **Working with contiguous ragged array representations of data, whether they originate from geosciences or any other field**. Ragged array representations are typically useful when the lengths of the instances of a feature (variable) are not all equal. With such representations, the data for each feature are stored contiguously in memory, and the number of elements of each feature is recorded in a count variable which CloudDrift calls *rowsize*.

+ 2. **Delivering functions and methods to perform scientific analysis of Lagrangian data, oceanographic or otherwise, whether or not structured as ragged arrays**. A straightforward example of Lagrangian analysis provided by CloudDrift is the derivation of Lagrangian velocities from a sequence of Lagrangian positions, and vice versa. Another more involved example is the discovery of pairs of Lagrangian data prescribed by distances in space and time. Both of these methods are currently available in CloudDrift.

+ 3. **Processing publicly available Lagrangian datasets into the common ragged array data structure and format**. Through data *adapters*, this type of processing includes not only converting Lagrangian data from typically regular arrays to ragged arrays but also aggregating data and metadata from multiple data files into a single data file. The canonical example for the CloudDrift library is the dataset of the NOAA Global Drifter Program (see Motivations below).

- CloudDrift's analysis functions are centered around the ragged-array data
- structure:
+ 4. **Making cloud-optimized ragged array datasets easily accessible**. This involves opening, in a local computing environment and without unnecessary downloads, Lagrangian datasets available from cloud servers, as well as opening Lagrangian datasets which have been seamlessly processed by the CloudDrift data *adapters*.
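The contiguous ragged-array layout described in item 1, with a *rowsize* count per feature, can be sketched in a few lines of NumPy. The names below are illustrative only and are not the CloudDrift API:

```python
import numpy as np

# Three "features" (e.g. drifter trajectories) of unequal length.
rows = [np.array([1.0, 2.0, 3.0]), np.array([4.0]), np.array([5.0, 6.0])]

obs = np.concatenate(rows)                  # contiguous data along the "obs" dimension
rowsize = np.array([len(r) for r in rows])  # count variable: [3, 1, 2]

# Start index of each feature, recovered from the cumulative row sizes.
starts = np.insert(np.cumsum(rowsize)[:-1], 0, 0)

def row(i: int) -> np.ndarray:
    """Return the i-th feature as a view into the contiguous array."""
    return obs[starts[i] : starts[i] + rowsize[i]]

# Per-feature reductions work without unpacking the ragged array.
means = np.add.reduceat(obs, starts) / rowsize  # [2.0, 4.0, 5.5]

# A minimal instance of item 2: centered-difference velocities from the
# positions of the first feature, sampled at unit time steps (plain NumPy,
# not a CloudDrift call).
velocities = np.gradient(row(0), np.arange(3.0))  # [1.0, 1.0, 1.0]
```

The same bookkeeping, contiguous values plus a count variable, is what the library's data *adapters* produce at scale for real datasets.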

+ CloudDrift's analysis functions are principally centered around the ragged-array data structure:

.. image:: img/ragged_array.png
:width: 800
:align: center
:alt: Ragged array schematic

- CloudDrift's goals are to simplify the necessary steps to get started with
- Lagrangian datasets and to provide a cloud-ready library to accelerate
- Lagrangian analysis.
+ Motivations
+ -----------

+ The `Global Drifter Program (GDP) <https://www.aoml.noaa.gov/phod/gdp/>`_ of the US National Oceanic and Atmospheric Administration has released to date nearly 25,000 drifting buoys, or drifters, with the goal of obtaining observations of oceanic velocity, sea surface temperature, and sea level pressure. From these drifter observations, the GDP generates two data products: one of oceanic variables estimated along drifter trajectories at `hourly <https://www.aoml.noaa.gov/phod/gdp/interpolated/data/all.php>`_ time steps, and one at `six-hourly <https://www.aoml.noaa.gov/phod/gdp/hourly_data.php>`_ steps.

+ There are a few ways to retrieve the data, but all typically require time-consuming preprocessing steps in order to prepare the data for analysis. As an example, the datasets can be retrieved through an `ERDDAP server <https://data.pmel.noaa.gov/generic/erddap/tabledap/gdp_hourly_velocities.html>`_, but requests are limited in size. The latest `6-hourly dataset <https://www.aoml.noaa.gov/ftp/pub/phod/lumpkin/netcdf/>`_ is distributed as a collection of thousands of individual NetCDF files or as a series of `ASCII files <https://www.aoml.noaa.gov/phod/gdp/>`_. Until recently, the `hourly dataset <https://www.aoml.noaa.gov/ftp/pub/phod/lumpkin/hourly/v2.00/netcdf/>`_ was distributed as a collection of individual NetCDF files (17,324 for version 1.04c) but is now distributed by NOAA NCEI as a `single NetCDF file <https://doi.org/10.25921/x46c-3620>`_ containing a series of ragged arrays, thanks to the work of CloudDrift. A single file simplifies data distribution, decreases metadata redundancies, and efficiently stores a Lagrangian data collection of uneven lengths.

Getting started
---------------
