This repository is a placeholder for the class practice and tutorial sessions, and for projects developed by students.
- To get familiar with data mining concepts
  e.g. clustering, classification, regression, reduction
- To learn about classic methods and specific vocabulary
  e.g. KMeans, ANN, features, training sets
- To practice the standard analysis workflow
  e.g. scale, reduce, fit, predict, cross-validate
- To learn how to handle data mining of large datasets
  e.g. xarray/dask-ml, pyspark, tensorflow
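
The workflow steps named in the list above (scale, fit, predict, cross-validate) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the toy data set, the nearest-centroid classifier and the helper names (`make_blobs`, `fit_scaler`, `cross_validate`, ...) are hypothetical and are not taken from the class material. The "reduce" step is left out for brevity. In practice one would more likely rely on scikit-learn or dask-ml than on hand-written code like this.

```python
import random
import statistics


def make_blobs(n_per_class=30, seed=0):
    """Toy 2-D data set (hypothetical): two Gaussian blobs whose
    features live on very different scales."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (4.0, 400.0)]):
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 100.0)])
            y.append(label)
    return X, y


def fit_scaler(X):
    """'scale' (part 1): learn the mean and standard deviation of each feature."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return means, stds


def scale(X, means, stds):
    """'scale' (part 2): standardize each feature to zero mean, unit variance."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def fit_centroids(X, y):
    """'fit': compute one centroid per class (nearest-centroid classifier)."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*rows)]
    return centroids


def predict(X, centroids):
    """'predict': assign each sample to the class of its closest centroid."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(row, centroids[lab])) for row in X]


def cross_validate(X, y, k=5, seed=0):
    """'cross-validate': k-fold accuracy, refitting scaler and model per fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_train, y_train = [X[i] for i in train], [y[i] for i in train]
        X_test, y_test = [X[i] for i in fold], [y[i] for i in fold]
        # The scaler is fitted on the training fold only, so that no
        # information from the held-out samples leaks into the model.
        means, stds = fit_scaler(X_train)
        centroids = fit_centroids(scale(X_train, means, stds), y_train)
        y_pred = predict(scale(X_test, means, stds), centroids)
        correct = sum(p == t for p, t in zip(y_pred, y_test))
        scores.append(correct / len(y_test))
    return scores


X, y = make_blobs()
scores = cross_validate(X, y, k=5)
print("accuracy per fold:", scores)
```

Refitting the scaler inside each fold is a deliberate choice here: scaling the full data set before splitting would let the held-out samples influence the training statistics.
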
- Class 1: “Introduction to big data mining for oceanography” (4h00)
  Jan. 21st, D104, 9h00-12h00 / 13h30-14h30
- Class 2: “Identifying patterns: one method in details” (2h00)
  Jan. 21st, D104, 14h30-16h30
- Class 3: Tutorials (6h00)
  Jan. 25th, B014, 9h00-12h00 / 13h30-16h30
- Class 4: Projects (12h00)
  Jan. 11th, D109, 13h30-16h30 (3h00)
  Feb. 13th, D109, 9h00-12h00 (3h00)
  Feb. 15th, D109, 9h00-12h00 / 13h30-16h30 (6h00)
Elements of this class were taken from the xarray, Dask and scikit-learn documentation.
The practice material on methods for handling big data and the binder configuration folder are largely based on, and inspired by, material already published elsewhere (R. Abernathey, pangeo-tutorial-agu-2018).
The amazing machinery that lets us conduct our projects in a friendly and effective environment comes from the Pangeo community.
This work is licensed under a Creative Commons Attribution 4.0 International License.