---
title: "Chapter 8 | Conformal Prediction for Time Series and Forecasting"
author: "frankiethull"
format: gfm
---
## Chapter 8 of Practical Guide to Applied Conformal Prediction in **R**:
The following code is based on the recent book release *Practical Guide to Applied Conformal Prediction in Python*. After I posted a fuzzy GIF on X & received a lot of requests for a blog or GitHub repo, I put together Chapter 8 of the practical guide with applications in R instead of Python.
While the book is not free, the Python code is open-source and located at the following GitHub repo:
*https://github.com/PacktPublishing/Practical-Guide-to-Applied-Conformal-Prediction/blob/main/Chapter_08_NixtlaStatsforecastipynb*
This is not a direct copy/paste replica of the Python notebook or the book; it is a lite, supplemental R guide & documentation for R users.
We will follow the time series forecasting example, fitting forecasts with fable and producing conformal prediction intervals with the modeltime package.
### R setup for fable & modeltime:
```{r}
# using the tidymodels framework:
library(tidymodels) # ml modeling api
library(modeltime) # tidy time series modeling
library(fable) # tidy time series forecasting
library(timetk) # temporal kit
library(tsibble) # tidy temporal data frames
library(dplyr) # pliers keep it tidy
library(ggplot2) # data viz
library(reticulate) # optional: not needed below, the data is read with read.csv()
library(doParallel) # model tuning made fast (no tuning is done in this chapter)
```
### Load the dataset
```{r}
# M4 hourly data in long format: unique_id (series id), ds (time index), y (target)
train <- read.csv('https://auto-arima-results.s3.amazonaws.com/M4-Hourly.csv')
test <- read.csv('https://auto-arima-results.s3.amazonaws.com/M4-Hourly-test.csv')
```
```{r}
train |> head()
```
### Train the models
We will only use the first 4 series of the dataset to reduce the total computation time.
```{r}
n_series <- 4
uids <- paste0("H", seq_len(n_series)) # "H1" "H2" "H3" "H4"
train <- train |> filter(unique_id %in% uids) |> group_by(unique_id)
test <- test |> filter(unique_id %in% uids)
```
```{r}
train |>
  ggplot() +
  geom_line(aes(x = ds, y = y, color = "train")) +
  geom_line(inherit.aes = FALSE,
            data = test,
            aes(x = ds, y = y, color = "test")) +
  facet_wrap(~unique_id, scales = "free") +
  theme_minimal() +
  theme(
    legend.position = "top"
  ) +
  labs(subtitle = "data split")
```
#### Create a list of models using fable
For this example we are using the fable library.
fable is a 'tidy' version of the forecast package.
Both are user-friendly & have accompanying books (fpp2 & fpp3, *Forecasting: Principles and Practice* by Rob Hyndman & George Athanasopoulos).
##### plot prediction intervals
```{r}
train_fbl <- train |> tsibble::as_tsibble(index = ds, key = unique_id)
test_fbl <- test |> tsibble::as_tsibble(index = ds, key = unique_id)

train_fbl |>
  model(
    ets = ETS(y),
    naive = NAIVE(y),
    rw = RW(y),                  # no drift(): same forecasts as NAIVE()
    snaive = SNAIVE(y ~ lag(24)) # hourly data: seasonal lag of 24 (daily cycle)
  ) |>
  forecast(new_data = test_fbl) |>
  autoplot() +
  geom_line(inherit.aes = FALSE,
            data = train_fbl,
            aes(x = ds, y = y, color = "train")) +
  theme_minimal() +
  labs(subtitle = "{fable} predictions")
```
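To make the benchmark models above concrete, here is a minimal base-R sketch of the point forecasts that `NAIVE()` and `SNAIVE()` produce. The helpers `naive_forecast()` and `snaive_forecast()` are hypothetical and not part of fable, and the sketch ignores the prediction intervals that fable also returns.

```r
# Hypothetical helpers (not part of {fable}) that reproduce the point
# forecasts of the NAIVE() and SNAIVE() benchmarks.

# Naive: every future value equals the last observation.
naive_forecast <- function(y, h) {
  rep(y[length(y)], h)
}

# Seasonal naive: each future value equals the observation one season
# earlier, i.e. the last full season is repeated across the horizon.
snaive_forecast <- function(y, h, m) {
  stopifnot(length(y) >= m)
  last_season <- y[(length(y) - m + 1):length(y)]
  rep_len(last_season, h)
}

y <- c(10, 12, 14, 11, 13, 15)     # toy series with period m = 3
naive_forecast(y, h = 4)           # 15 15 15 15
snaive_forecast(y, h = 4, m = 3)   # 11 13 15 11
```

For the hourly M4 series, the daily cycle corresponds to `m = 24`.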
```{r}
train_fbl |>
  model(
    auto_arima = ARIMA(y)
  ) |>
  forecast(new_data = test_fbl) |>
  autoplot() +
  geom_line(inherit.aes = FALSE,
            data = train_fbl,
            aes(x = ds, y = y, color = "train")) +
  theme_minimal() +
  labs(subtitle = "AutoARIMA via {fable}")
```
The next section switches to a modeltime workflow; modeltime extends the tidymodels framework to time series.
#### Conformal Prediction with modeltime
There are two methods for conformal prediction in modeltime. It is the only tidy time series library I know of that supports conformal prediction intervals internally and by default.
The default method, `conf_method = "conformal_default"`, computes the interval from quantiles of the out-of-sample (calibration) residuals; there is also a split conformal option, `conf_method = "conformal_split"`.
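For intuition about the split method, below is a minimal base-R sketch of split conformal prediction intervals, assuming absolute residuals as the nonconformity score. It is not modeltime's internal implementation: `split_conformal_interval()` is a hypothetical helper, and its output columns are only named to mirror modeltime's `.value`, `.conf_lo` and `.conf_hi`. The interval half-width is the `ceiling((n + 1) * (1 - alpha))`-th smallest calibration residual, so `alpha = 0.2` corresponds to the `conf_interval = 0.80` used in the modeltime code. The usual coverage guarantee assumes exchangeable data, which time series residuals rarely are, so coverage here should be treated as approximate.

```r
# Minimal split conformal sketch (hypothetical helper, not modeltime's code).
# calib_actual / calib_pred: actuals and point forecasts on a calibration set
# that the model was NOT trained on; pred: new point forecasts to wrap.
split_conformal_interval <- function(pred, calib_actual, calib_pred, alpha = 0.2) {
  stopifnot(alpha > 0, alpha < 1, length(calib_actual) == length(calib_pred))
  scores <- abs(calib_actual - calib_pred)   # nonconformity scores
  n <- length(scores)
  # finite-sample corrected rank; the tiny tolerance guards against
  # floating-point error when (n + 1) * (1 - alpha) is a whole number
  k <- ceiling((n + 1) * (1 - alpha) - 1e-9)
  q_hat <- if (k > n) Inf else sort(scores)[k]  # too few points: unbounded
  data.frame(.value = pred, .conf_lo = pred - q_hat, .conf_hi = pred + q_hat)
}

# toy calibration data (illustrative numbers, not from the M4 series)
calib_actual <- c(10, 12, 9, 11, 14, 13, 8, 10, 12, 11)
calib_pred   <- c(11, 11, 10, 11, 12, 12, 10, 10, 11, 12)

split_conformal_interval(pred = c(12, 13), calib_actual, calib_pred, alpha = 0.2)
# k = ceiling(11 * 0.8) = 9, so q_hat = 2:
# .conf_lo = 10, 11 and .conf_hi = 14, 15
```

Using a single symmetric half-width keeps the sketch close to the textbook split method; asymmetric or horizon-specific variants would compute separate quantiles.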
##### train models
```{r}
# let's use one series (location):
mt_train <- train |>
  ungroup() |>
  filter(unique_id == uids[[1]]) |>
  # ds is a numeric index; as.Date() maps it to days since 1970-01-01
  mutate(ds = as.Date(ds, origin = "1970-01-01"))
mt_test <- test |>
  filter(unique_id == uids[[1]]) |>
  mutate(ds = as.Date(ds, origin = "1970-01-01"))

# ETS
ets_fit <- exp_smoothing(seasonal_period = 24) |>
  set_engine("ets") |>
  fit(y ~ ds, data = mt_train)

# Auto ARIMA
arima_fit <- arima_reg(seasonal_period = 24) |>
  set_engine("auto_arima") |>
  fit(y ~ ds, data = mt_train)

# XGB: with only the date as a predictor, trees cannot extrapolate,
# so forecasts beyond the training range will be flat
xgb_fit <- boost_tree("regression") |>
  set_engine("xgboost") |>
  fit(y ~ ds, data = mt_train)

# modeltime workflow: calibrate on the test set, then forecast with
# conformal intervals computed from the calibration residuals
modtime_fcst <-
  modeltime_calibrate(
    modeltime_table(
      xgb_fit,
      arima_fit,
      ets_fit
    ),
    new_data = mt_test,
    quiet = FALSE,
    id = "unique_id"
  ) |>
  modeltime_forecast(
    new_data = mt_test,
    conf_interval = 0.80,
    conf_method = "conformal_default",
    conf_by_id = TRUE,
    keep_data = TRUE
  )
```
##### plot prediction intervals
```{r}
modtime_fcst |>
  ggplot() +
  geom_ribbon(aes(x = ds, ymin = .conf_lo, ymax = .conf_hi, fill = .model_desc),
              alpha = 0.5) +
  geom_line(aes(x = ds, y = .value, color = .model_desc)) +
  geom_line(inherit.aes = FALSE,
            data = mt_train,
            aes(x = ds, y = y, color = "train")) +
  facet_wrap(~unique_id, scales = "free") +
  theme_minimal() +
  theme(legend.position = "top") +
  labs(subtitle = "{modeltime} Default Conformal Prediction Intervals")
```
```{r}
modtime_fcst |>
  filter(stringr::str_detect(.model_desc, "ARIMA")) |>
  ggplot() +
  geom_ribbon(aes(x = ds, ymin = .conf_lo, ymax = .conf_hi, fill = "ARIMA"),
              alpha = 0.5) +
  geom_line(aes(x = ds, y = .value, color = "ARIMA")) +
  geom_line(inherit.aes = FALSE,
            # tail(-500) drops the first 500 training rows to zoom in on recent history
            data = mt_train |> tail(-500),
            aes(x = ds, y = y, color = "train")) +
  facet_wrap(~unique_id, scales = "free") +
  theme_minimal() +
  theme(legend.position = "top") +
  labs(subtitle = "{modeltime} Default Conformal Prediction Intervals with ARIMA")
```
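A natural follow-up to the interval plots is to check how often the actual values fall inside the intervals, and how wide the intervals are. The sketch below uses a hypothetical base-R helper, `interval_coverage()`, which is not a modeltime function, and demonstrates it on toy vectors.

```r
# Hypothetical helper (not part of {modeltime}): empirical coverage and
# mean width of prediction intervals [lo, hi].
interval_coverage <- function(actual, lo, hi) {
  stopifnot(length(actual) == length(lo), length(lo) == length(hi))
  inside <- actual >= lo & actual <= hi   # TRUE when the actual is covered
  c(coverage = mean(inside), mean_width = mean(hi - lo))
}

# toy values (illustrative only)
actual <- c(11, 15, 9, 13)
lo     <- c(10, 11, 10, 11)
hi     <- c(14, 15, 14, 15)
interval_coverage(actual, lo, hi)   # coverage 0.75, mean_width 4
```

Assuming `keep_data = TRUE` carries the `y` column from `mt_test` into `modtime_fcst`, the same helper could be applied to one `.model_desc` at a time. Because the intervals were calibrated on `mt_test` itself, that would be an in-sample sanity check rather than an independent estimate of coverage, and the 80% target should not be expected to hold exactly.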