In this data competition, the goal was to predict rice yield from satellite images taken by the Sentinel-2 mission. The training data covered the following locations, with the white dots representing the test data:
First, the images were downloaded through a STAC (SpatioTemporal Asset Catalog) API, using a custom bounding box for each area and a cloud-cover filter to keep only usable scenes:
To extract features from the images, the median of each satellite band was calculated for each image (the mean, max, or min could be used instead). Because the satellite does not pass over the area at regular intervals, the data was resampled to a regular 5-day step. Values missing because of cloudy days were then interpolated, resulting in the following dataset:
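The resampling and interpolation step can be sketched with pandas. This is a minimal illustration, not the competition code: the band names, dates, and values are made up, and the choices of a per-bin mean and time-weighted linear interpolation are assumptions, since the text does not say which were used.

```python
import numpy as np
import pandas as pd

# Hypothetical per-image band medians for one field at irregular acquisition dates.
# NaN marks an acquisition that was discarded because of cloud cover.
dates = pd.to_datetime(
    ["2022-01-01", "2022-01-04", "2022-01-11", "2022-01-16", "2022-01-24"]
)
medians = pd.DataFrame(
    {
        "red": [0.10, 0.12, np.nan, 0.08, 0.06],
        "nir": [0.30, 0.32, np.nan, 0.45, 0.50],
    },
    index=dates,
)

# Put the observations on a regular 5-day grid. Bins with no valid
# acquisition come out as NaN (assumption: bins are averaged).
regular = medians.resample("5D").mean()

# Fill the gaps by linear interpolation along the time axis
# (assumption: the original interpolation method is not stated).
regular = regular.interpolate(method="time", limit_direction="both")

print(regular)
```

Every field then has the same number of time steps, which is what lets the series be stacked into a fixed-shape array later on.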
In addition, various spectral indices were calculated from the original bands: NDVI (Normalized Difference Vegetation Index), NDWI (Normalized Difference Water Index), NDBSI (Normalized Difference Bare Soil Index), and NDRE (Normalized Difference Red Edge Index). Each of these metrics provides different information about the crop's status:
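Most of these indices share the normalized-difference form (a - b) / (a + b) and differ only in which two bands are used. The sketch below uses common definitions of NDVI (near-infrared vs. red), NDWI (green vs. near-infrared), and NDRE (near-infrared vs. red edge). The exact bands chosen in the competition are not stated, so the pairings and sample reflectances here are assumptions. NDBSI is left out because its band combination is not given in the text.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


# Hypothetical band medians for three time steps (reflectance, 0 to 1).
# The last step is all zeros to show the zero-denominator case.
green = np.array([0.08, 0.07, 0.0])
red = np.array([0.10, 0.06, 0.0])
red_edge = np.array([0.20, 0.18, 0.0])
nir = np.array([0.50, 0.54, 0.0])

ndvi = normalized_difference(nir, red)       # vegetation vigor
ndwi = normalized_difference(green, nir)     # surface water
ndre = normalized_difference(nir, red_edge)  # chlorophyll in denser canopies

print(ndvi, ndwi, ndre)
```

Each index is bounded between -1 and 1, so the resulting features are on a comparable scale regardless of overall scene brightness.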
Rice yield is predicted as a regression task, with the time series as input. Because each data sample is 2D, with shape (sequence length, satellite_bands), the task suits a neural network. Both a CNN and a Transformer architecture were implemented, with the idea of finding patterns in the time series. Aggregated data in tabular form was also fed into the neural network in parallel with the time series. It consisted of statistical descriptors of each time series, such as standard deviation, skewness, and kurtosis.
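The tabular branch can be illustrated by collapsing one (sequence length, satellite_bands) sample into a flat vector of descriptors. This is a sketch only: the function name `describe_series` and the exact set of statistics are assumptions, since the text lists only standard deviation, skewness, and kurtosis "etc."

```python
import numpy as np


def describe_series(series):
    """Collapse a (sequence_length, n_features) array into a 1D descriptor vector.

    Returns, per feature and in this order: mean, standard deviation,
    skewness, excess kurtosis, min, max.
    """
    x = np.asarray(series, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)  # population standard deviation

    # Standardize, guarding against constant features (std == 0).
    safe_std = np.where(std > 0, std, 1.0)
    z = (x - mean) / safe_std

    skew = (z ** 3).mean(axis=0)
    # Excess kurtosis; set to 0 for constant features by convention here.
    kurt = np.where(std > 0, (z ** 4).mean(axis=0) - 3.0, 0.0)

    return np.concatenate([mean, std, skew, kurt, x.min(axis=0), x.max(axis=0)])


# Hypothetical sample: 20 time steps, 4 bands or indices.
rng = np.random.default_rng(0)
sample = rng.normal(size=(20, 4))
features = describe_series(sample)
print(features.shape)
```

Applying the function to every sample gives one row per field, which is the tabular form that any model expecting fixed-length feature vectors can use.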
However, the aggregated tabular data can also be used on its own with a gradient boosting algorithm such as CatBoost or LightGBM (the two best performers in this case). Surprisingly, this method turned out to be more effective. One possible explanation is that the images, and therefore the time series, are too noisy to be useful. Another is simply that the evolution of these metrics over time is not a determining factor of rice yield.