This project provides a pipeline to build rainfall forecast models. The pipeline can be configured with different meteorological data sources.
In the root directory of this repository, run the following command (conda must be installed on your system):
./setup.sh
The project pipeline is a sequence of three steps: (1) data retrieval, (2) data preprocessing, and (3) model training. These steps are implemented as Python scripts in the ./src directory.
All datasets retrieved and/or generated by the scripts are stored in the ./data folder.
- retrieve_ws_cor.py: This script retrieves observations from a user-provided COR weather station.
- retrieve_ws_inmet.py: This script retrieves observations from a user-provided INMET weather station.
- retrieve_as.py: This script retrieves atmospheric sounding data.
- retrieve_ERA5.py: This script retrieves numerical simulation data from the ERA5 portal.
This script generates atmospheric instability indices for the data retrieved by the script retrieve_as.py. Data from the SBGL sounding station (located at Galeão Airport, Rio de Janeiro, Brazil) are used to calculate the indices, producing a new dataset with one entry per sounding probe. The SBGL station launches two probes per day (at 00:00 and 12:00 UTC), so each entry contains the values of the computed instability indices for one probe. The following instability indices are computed:
- CAPE (Convective Available Potential Energy)
- CIN (Convective Inhibition)
- Lifted index
- K index
- Total Totals
- Showalter index
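The indices above are standard thermodynamic diagnostics computed from sounding levels. As an illustration, the K index can be derived directly from temperature and dewpoint at the 850, 700, and 500 hPa mandatory levels; the sketch below uses the standard formula, and the sample values are made up rather than taken from a real SBGL probe:

```python
def k_index(t850, td850, t700, td700, t500):
    """K index (deg C): (T850 - T500) + Td850 - (T700 - Td700).

    Combines low-level moisture and mid-level lapse rate; values
    above roughly 30 suggest increased thunderstorm potential.
    """
    return (t850 - t500) + td850 - (t700 - td700)

# Hypothetical sounding values in deg C (illustrative only).
ki = k_index(t850=20.0, td850=15.0, t700=8.0, td700=4.0, t500=-8.0)
print(ki)  # (20 - (-8)) + 15 - (8 - 4) = 39.0
```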
The preprocessing scripts perform several operations on the original datasets, such as deriving new variables or aggregating data, which can benefit model training and the final model's performance.
These scripts build the training, validation, and test datasets from the time series produced in the previous steps. These datasets are the inputs to the model training step.
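The split proportions used by the project are not stated here; a minimal sketch of a chronological train/validation/test split, with fractions chosen only for illustration:

```python
def chronological_split(records, train_frac=0.7, val_frac=0.15):
    """Split time-ordered records into train/validation/test partitions.

    A chronological split (rather than a random shuffle) avoids
    leaking future observations into the training set, which matters
    for time-series forecasting.
    """
    n = len(records)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (records[:n_train],
            records[n_train:n_train + n_val],
            records[n_train + n_val:])

train, val, test = chronological_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```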
The model generation script performs the training and exports the results obtained by the model on the test set.
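The actual model architecture and output format are not described here; the sketch below shows only the train-evaluate-export flow, with a simple persistence baseline standing in for the real model and a made-up results file path:

```python
import json

def persistence_forecast(series):
    """Forecast each step as the previous observation (persistence baseline)."""
    return series[:-1]

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observations and forecasts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical hourly rainfall test series (mm); not real project data.
test_series = [0.0, 0.5, 1.2, 0.8, 0.0]
preds = persistence_forecast(test_series)
mae = mean_absolute_error(test_series[1:], preds)

# Export the evaluation results, e.g. to a JSON file under ./data.
results = {"model": "persistence_baseline", "test_mae": round(mae, 3)}
print(json.dumps(results))  # {"model": "persistence_baseline", "test_mae": 0.6}
```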
There are several Jupyter notebooks in the notebooks directory. They were used for initial experiments and exploratory data analysis. These notebooks are not guaranteed to run correctly due to subsequent code refactoring.