-
Notifications
You must be signed in to change notification settings - Fork 4
/
how_much_testing.Rmd
271 lines (231 loc) · 10 KB
/
how_much_testing.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
# How much testing?
```{r include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE, cache=TRUE)
```
```{r init}
library(sars2pack)
library(ggplot2)
library(dplyr)
```
## Outline of the problem
The COVID-19 pandemic has disrupted daily life throughout the
world. Without a vaccine to confer immunity and lacking effective
therapies once infected, public health measures such as social
distancing, contact tracing, and case surveillance rule the day with
respect to mitigating impacts of the disease on communities. As
individual countries emerge from variable levels of lockdown,
community testing to detect cases as quickly and thoroughly as
possible is a recognized component of controlling the pandemic.
There is considerable agreement that widespread testing is a required
component of moving beyond stay-at-home orders. The World Health
Organization (WHO) has highlighted the need for extensive and
widespread testing. Tedros Ghebreyesus, the chief executive of WHO,
has suggested “You cannot fight a fire blindfolded. Our key message is
test, test, test” [@who-bh]. Robert Gallo, director of the Institute
of Human Virology at the University of Maryland School of Medicine "is
absolutely essential to control the epidemic"
[@Brumfiel2020-qe]. Emily Gurley, an associate scientist at the Johns
Hopkins Bloomberg School of Public Health told NPR [@Brumfiel2020-qe],
"Everyone staying home is just a very blunt measure. That's what you
say when you've got really nothing else. Being able to test folks is
really the linchpin in getting beyond what we're doing now." Philip
J. Rosenthal describes how early application of diagnostic testing
lead to strong disease control in some countries [@Rosenthal2020-zs].
So, how much testing is enough? Michael Ryan, executive director of
the WHO Health Emergencies Program suggests that, "We would certainly
like to see countries testing at the level of ten negative tests to
one positive, as a general benchmark of a system that's doing enough
testing to pick up all cases." [@Huang2020-iz] For particularly
high-risk communities such as the elderly or those who are expected to
come into contact with others regularly, aiming for a much lower
proportion of positive test results is appropriate so as to capture
the highest possible proportion of infected and infectious
individuals.
Here, we present an intuitive and principled approach to visualizing
comparative testing data for multiple geographic areas that visually
presents:
- Quantity of testing across several orders-of-magnitude
- Proportion of positive test results
- Changes in testing and proportion positive tests over time
- Identifiable trends, including outlier behavior
- Progress toward meeting target proportion of positive testing
## Motivation for visualization
```{r data_setup}
# International dataset
owid = owid_data() %>% dplyr::rename(positive='confirmed')
# US states dataset
ct = covidtracker_data() %>%
dplyr::filter(positive>0 & as.integer(fips)<60) %>%
dplyr::mutate(tests=positive+negative,positive=positive)
```
We collected longitudinal testing datasets from Our World in Data
(OWID) [@owidcoronavirus] and the COVID Tracking Project
(covidtracking) [@covidtracking-vg] as provided by the R package,
`sars2pack` [@sars2pack-cx]. The OWID collection tracks glogal test
reporting at the national level, though test reporting level (sample,
person, case, etc.) varies somewhat by country. The covidtracking
resource tracks state-level testing in the United States, again with
various definitions for what constitutes a test.
Each dataset is composed of one row per observation:
- Location
- Positive test results
- Total tests performed
- Date (in one-day increments)
One path of evolution for visualization approach is given in Figure
\@ref(fig:priorApproach) with a representative subset of states in the
United States over 28 days ending `r format(Sys.Date(),'%Y-%m-%d')`.
Figure \@ref(fig:priorApproach)A depicts the proportion
of positive tests on one day but does not provide any visual prompt of
size of testing efforts. Figure \@ref(fig:priorApproach)B uses a
scatterplot approach where the threshold for positive tests is a
line. Let $y$ be the number of positive tests and $x$ be the total
number of tests.
\begin{equation}
y = mx + b
\end{equation}
In equation (1), $b$ is the y-intercept. Assuming that $b = 0$ (since
when no tests are done, $x=0$ and $y=0$). The threshold for "enough"
testing is when the slope, $m$, is equal to the desired proportion of
positive tests. Points that fall below the line given by equation (1)
are doing adequate testing while those above should strive for
more. The dashed line in Figure \@ref(fig:priorApproach)B is for
$m=0.1$ and the dotted line for $m=0.2$. Interpreting results near the
origin in Figure \@ref(fig:priorApproach)B is challenging given to the
scale.
$$
\log_{10} y = \log_{10} x + \log_{10} m
$$
```{r}
format_date = function(d) {
format(d, '%Y-%m-%d')
}
current_date = Sys.Date()
cap = sprintf("Testing and proportion of positive tests for several states in the United States over the past 28 days (%s to %s). Included in all panels for orientation, dashed line represents 10 threshold for positive tests and dotted line represents 20. Bar chart of proportion of positive tests at a single time point on the last day of the 28-day window (A) gives no sense of number of tests performed. Positive tests vs total number of tests (B) is hard to interpret near the origin. A log-log plot of positive tests vs total tests (C) deals with visualizing more clearly, but note that .... ",
format_date(current_date-28),
format_date(current_date))
```
```{r priorApproach, fig.height=8, fig.cap = cap}
interest_states = c(
"NY",
"AK",
"NE",
"MN",
"WA",
"LA",
"DC",
"SD",
"MD",
"TX",
"CA",
"NJ",
"TN",
"KY",
"UT"
)
linecolor = alpha('grey40',0.7)
lty1 = 'dashed'
lty2 = 'dotted'
ct = ct %>% dplyr::filter(state %in% interest_states)
library(cowplot)
p1 = ct %>% dplyr::group_by(state) %>%
dplyr::filter(date==max(date)) %>%
ungroup() %>%
ggplot(aes(x=reorder(state, -positive/tests),y=positive/tests)) +
xlab('State') + ylab('Prop. Positive') +
geom_hline(yintercept=0.1, linetype='dashed', color=linecolor) +
geom_hline(yintercept=0.2, linetype='dotdash', color=linecolor) +
geom_bar(stat='identity') + coord_flip() +
theme_light()
p2 = ct %>% dplyr::group_by(state) %>%
dplyr::filter(date==max(date)) %>%
ungroup() %>%
ggplot(aes(x=tests, y=positive,color=state)) +
geom_point(alpha=0.5) +
geom_abline(intercept=0, slope=0.1, color=linecolor, linetype = lty1) +
geom_abline(intercept=0, slope=0.2, color=linecolor, linetype = lty2) +
theme_light()
p3 = ct %>% dplyr::group_by(state) %>%
dplyr::filter(date==max(date)) %>%
ungroup() %>%
ggplot(aes(x=tests, y=positive,color=state)) +
geom_point(alpha=0.7) + scale_y_log10() + scale_x_log10() +
theme_light() +
geom_abline(intercept=log10(0.1), slope=1, color=linecolor, linetype = lty1) +
geom_abline(intercept=log10(0.2), slope=1, color=linecolor, linetype = lty2) +
theme(legend.position = 'none')
p4 = ct %>% dplyr::group_by(state) %>%
dplyr::filter(date>max(date)-28) %>%
ungroup() %>%
ggplot(aes(x=tests, y=positive,color=state)) +
geom_line(alpha=0.7) +
geom_abline(intercept=0, slope=0.1, color=linecolor, linetype = lty1) +
geom_abline(intercept=0, slope=0.2, color=linecolor, linetype = lty2) +
theme_light() + theme(legend.position = 'none')
p5 = ct %>% dplyr::group_by(state) %>%
dplyr::filter(date>max(date)-28) %>%
ungroup() %>%
ggplot(aes(x=tests, y=positive,color=state)) +
geom_line(alpha=0.7) +
geom_abline(intercept=log10(0.1), slope=1, color=linecolor, linetype = lty1) +
geom_abline(intercept=log10(0.2), slope=1, color=linecolor, linetype = lty2) +
scale_y_log10() + scale_x_log10() +
theme_light() + theme(legend.position = 'none')
p6 = ct %>% dplyr::group_by(state) %>%
dplyr::filter(date>max(date)-28) %>%
ungroup() %>%
ggplot(aes(x=tests, y=positive/tests,color=state)) +
geom_line(alpha=0.7) + scale_y_log10() + scale_x_log10() +
theme_light() + ylab('Prop. Positive') +
geom_hline(yintercept=0.1, color=linecolor, linetype = lty1) +
geom_hline(yintercept=0.2, color=linecolor, linetype = lty2) +
theme(legend.position = 'none')
legend_b <- get_legend(
p2 +
guides(color = guide_legend(nrow = 2)) +
theme(legend.position = "bottom")
)
pdat = plot_grid(p1,p2 + theme(legend.position='none'),p3,p4,p5,p6, ncol=2, labels='AUTO')
plot_grid(pdat, legend_b, ncol = 1, rel_heights = c(1, 0.1))
```
\begin{equation}
y = mx + b
\end{equation}
\begin{align}
b = 0\\
y = mx \\
\log_{10}y = \log_{10}xm \\
\log_{10}y = \log_{10}x + \log_{10}m \\
X = log_{10}x, Y=log_{10}y, M=log_{10}m \\
Y = X + M \\
X=0, Y=M=log_{10}m
\end{align}
## Intuitive visualization of amount of testing
See Figure \@ref(fig:states).
```{r states, fig.cap='United States testing results.', cache.vars='ct'}
p = ct %>% dplyr::filter(positive>0 & as.integer(fips)<60) %>% dplyr::group_by(state) %>%
dplyr::filter(date > max(date)-28) %>%
dplyr::ungroup() %>%
dplyr::mutate(tests=positive+negative,positive=positive) %>%
ggplot(aes(x=tests,y=positive/tests, color=state,group=state)) +
geom_line(alpha=0.7) + theme_light() +
scale_y_log10() + scale_x_log10() +
#xlab(expression(log[10](tests))) + ylab(expression(log[10]('proportion positive tests'))) +
theme(legend.position='none')
#library(plotly)
#ggplotly(p)
p
```
```{r owid, fig.cap='Worldwide testing results.'}
p = owid %>% dplyr::filter(positive>0 & tests>500) %>% dplyr::group_by(country) %>%
dplyr::filter(date > max(date)-28) %>%
dplyr::ungroup() %>%
ggplot(aes(x=tests,y=positive/tests, color=country,group=country)) +
geom_line(alpha=0.7) + theme_light() +
scale_y_log10() + scale_x_log10() +
xlab(expression(log[10](tests))) + ylab(expression('proportion positive tests',log[10])) +
theme(legend.position='none')
#library(plotly)
#ggplotly(p)
p
```