how_much_testing.Rmd

# How much testing?


```{r include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE, cache=TRUE)
```

```{r init}
library(sars2pack)
library(ggplot2)
library(dplyr)
```


## Outline of the problem

The COVID-19 pandemic has disrupted daily life throughout the
world. Without a vaccine to confer immunity and lacking effective
therapies once infected, public health measures such as social
distancing, contact tracing, and case surveillance rule the day with
respect to mitigating impacts of the disease on communities. As
individual countries emerge from variable levels of lockdown,
community testing to detect cases as quickly and thoroughly as
possible is a recognized component of controlling the pandemic.

There is considerable agreement that widespread testing is a required
component of moving beyond stay-at-home orders. The World Health
Organization (WHO) has highlighted the need for extensive and
widespread testing. Tedros Ghebreyesus, the chief executive of WHO,
has suggested “You cannot fight a fire blindfolded. Our key message is
test, test, test” [@who-bh]. Robert Gallo, director of the Institute
of Human Virology at the University of Maryland School of Medicine "is
absolutely essential to control the epidemic"
[@Brumfiel2020-qe]. Emily Gurley, an associate scientist at the Johns
Hopkins Bloomberg School of Public Health told NPR [@Brumfiel2020-qe],
"Everyone staying home is just a very blunt measure. That's what you
say when you've got really nothing else. Being able to test folks is
really the linchpin in getting beyond what we're doing now." Philip
J. Rosenthal describes how early application of diagnostic testing
lead to strong disease control in some countries [@Rosenthal2020-zs].

So, how much testing is enough? Michael Ryan, executive director of
the WHO Health Emergencies Program suggests that, "We would certainly
like to see countries testing at the level of ten negative tests to
one positive, as a general benchmark of a system that's doing enough
testing to pick up all cases." [@Huang2020-iz] For particularly
high-risk communities such as the elderly or those who are expected to
come into contact with others regularly, aiming for a much lower
proportion of positive test results is appropriate so as to capture
the highest possible proportion of infected and infectious
individuals.

Here, we present an intuitive and principled approach to visualizing
comparative testing data for multiple geographic areas that visually
presents:

- Quantity of testing across several orders-of-magnitude
- Proportion of positive test results
- Changes in testing and proportion positive tests over time
- Identifiable trends, including outlier behavior
- Progress toward meeting target proportion of positive testing

## Motivation for visualization

```{r data_setup}
# International dataset
owid = owid_data() %>% dplyr::rename(positive='confirmed')

# US states dataset
ct = covidtracker_data() %>%
    dplyr::filter(positive>0 & as.integer(fips)<60) %>%
    dplyr::mutate(tests=positive+negative,positive=positive)
```

We collected longitudinal testing datasets from Our World in Data
(OWID) [@owidcoronavirus] and the COVID Tracking Project
(covidtracking) [@covidtracking-vg] as provided by the R package,
`sars2pack` [@sars2pack-cx]. The OWID collection tracks glogal test
reporting at the national level, though test reporting level (sample,
person, case, etc.) varies somewhat by country. The covidtracking
resource tracks state-level testing in the United States, again with
various definitions for what constitutes a test.

Each dataset is composed of one row per observation:

- Location
- Positive test results
- Total tests performed
- Date (in one-day increments)

One path of evolution for visualization approach is given in Figure
\@ref(fig:priorApproach) with a representative subset of states in the
United States over 28 days ending `r format(Sys.Date(),'%Y-%m-%d')`. 
Figure \@ref(fig:priorApproach)A depicts the proportion
of positive tests on one day but does not provide any visual prompt of
size of testing efforts. Figure \@ref(fig:priorApproach)B uses a
scatterplot approach where the threshold for positive tests is a
line. Let $y$ be the number of positive tests and $x$ be the total
number of tests.

\begin{equation}
y = mx + b
\end{equation}

In equation (1), $b$ is the y-intercept. Assuming that $b = 0$ (since
when no tests are done, $x=0$ and $y=0$).  The threshold for "enough"
testing is when the slope, $m$, is equal to the desired proportion of
positive tests. Points that fall below the line given by equation (1)
are doing adequate testing while those above should strive for
more. The dashed line in Figure \@ref(fig:priorApproach)B is for
$m=0.1$ and the dotted line for $m=0.2$. Interpreting results near the
origin in Figure \@ref(fig:priorApproach)B is challenging given to the
scale.

$$
\log_{10} y = \log_{10} x + \log_{10} m
$$


```{r}
format_date = function(d) {
    format(d, '%Y-%m-%d')
}
current_date = Sys.Date()

cap = sprintf("Testing and proportion of positive tests for several states in the United States over the past 28 days (%s to %s). Included in all panels for orientation, dashed line represents 10 threshold for positive tests and dotted line represents 20. Bar chart of proportion of positive tests at a single time point on the last day of the 28-day window (A) gives no sense of number of tests performed. Positive tests vs total number of tests (B) is hard to interpret near the origin. A log-log plot of positive tests vs total tests (C) deals with visualizing more clearly, but note that .... ", 
              format_date(current_date-28),
              format_date(current_date))
```

```{r priorApproach, fig.height=8, fig.cap = cap}
interest_states = c(
    "NY",
    "AK",
    "NE",
    "MN",
    "WA",
    "LA",
    "DC",
    "SD",
    "MD",
    "TX",
    "CA",
    "NJ",
    "TN",
    "KY",
    "UT"
)
linecolor = alpha('grey40',0.7)
lty1 = 'dashed'
lty2 = 'dotted'

ct = ct %>% dplyr::filter(state %in% interest_states)
library(cowplot)
p1 = ct %>% dplyr::group_by(state) %>%
    dplyr::filter(date==max(date)) %>%
    ungroup() %>%
    ggplot(aes(x=reorder(state, -positive/tests),y=positive/tests)) +
    xlab('State') + ylab('Prop. Positive') + 
    geom_hline(yintercept=0.1, linetype='dashed', color=linecolor) +
    geom_hline(yintercept=0.2, linetype='dotdash', color=linecolor) + 
    geom_bar(stat='identity') + coord_flip() +
    theme_light()

p2 = ct %>% dplyr::group_by(state) %>%
    dplyr::filter(date==max(date)) %>%
    ungroup() %>%
    ggplot(aes(x=tests, y=positive,color=state)) +
    geom_point(alpha=0.5) +
    geom_abline(intercept=0, slope=0.1, color=linecolor, linetype = lty1) + 
    geom_abline(intercept=0, slope=0.2, color=linecolor, linetype = lty2) + 
    theme_light()

p3 = ct %>% dplyr::group_by(state) %>%
    dplyr::filter(date==max(date)) %>%
    ungroup() %>%
    ggplot(aes(x=tests, y=positive,color=state)) +
    geom_point(alpha=0.7) + scale_y_log10() + scale_x_log10() +
    theme_light() + 
    geom_abline(intercept=log10(0.1), slope=1, color=linecolor, linetype = lty1) + 
    geom_abline(intercept=log10(0.2), slope=1, color=linecolor, linetype = lty2) +
    theme(legend.position = 'none') 

p4 = ct %>% dplyr::group_by(state) %>%
    dplyr::filter(date>max(date)-28) %>%
    ungroup() %>%
    ggplot(aes(x=tests, y=positive,color=state)) +
    geom_line(alpha=0.7) +
    geom_abline(intercept=0, slope=0.1, color=linecolor, linetype = lty1) + 
    geom_abline(intercept=0, slope=0.2, color=linecolor, linetype = lty2) + 
    theme_light() + theme(legend.position = 'none')

p5 = ct %>% dplyr::group_by(state) %>%
    dplyr::filter(date>max(date)-28) %>%
    ungroup() %>%
    ggplot(aes(x=tests, y=positive,color=state)) +
    geom_line(alpha=0.7) + 
    geom_abline(intercept=log10(0.1), slope=1, color=linecolor, linetype = lty1) + 
    geom_abline(intercept=log10(0.2), slope=1, color=linecolor, linetype = lty2) +
    scale_y_log10() + scale_x_log10() +
    theme_light() + theme(legend.position = 'none')

p6 = ct %>% dplyr::group_by(state) %>%
    dplyr::filter(date>max(date)-28) %>%
    ungroup() %>%
    ggplot(aes(x=tests, y=positive/tests,color=state)) +
    geom_line(alpha=0.7) + scale_y_log10() + scale_x_log10() +
    theme_light() + ylab('Prop. Positive') +
    geom_hline(yintercept=0.1, color=linecolor, linetype = lty1) +
    geom_hline(yintercept=0.2, color=linecolor, linetype = lty2) +
    theme(legend.position = 'none') 
legend_b <- get_legend(
  p2 + 
    guides(color = guide_legend(nrow = 2)) +
    theme(legend.position = "bottom")
)
pdat = plot_grid(p1,p2 + theme(legend.position='none'),p3,p4,p5,p6, ncol=2, labels='AUTO')
plot_grid(pdat, legend_b, ncol = 1, rel_heights = c(1, 0.1))
```

\begin{equation}
y = mx + b
\end{equation}

\begin{align}
b = 0\\
y = mx \\
\log_{10}y = \log_{10}xm \\
\log_{10}y = \log_{10}x + \log_{10}m \\
X = log_{10}x, Y=log_{10}y, M=log_{10}m \\
Y = X + M \\
X=0, Y=M=log_{10}m
\end{align}


## Intuitive visualization of amount of testing

See Figure \@ref(fig:states).

```{r states, fig.cap='United States testing results.', cache.vars='ct'}
p = ct %>% dplyr::filter(positive>0 & as.integer(fips)<60) %>% dplyr::group_by(state) %>%
    dplyr::filter(date > max(date)-28) %>% 
    dplyr::ungroup() %>%
    dplyr::mutate(tests=positive+negative,positive=positive) %>%
    ggplot(aes(x=tests,y=positive/tests, color=state,group=state)) + 
    geom_line(alpha=0.7) + theme_light() +
    scale_y_log10() + scale_x_log10() +
    #xlab(expression(log[10](tests))) + ylab(expression(log[10]('proportion positive tests'))) +
    theme(legend.position='none')
#library(plotly)
#ggplotly(p)
p
```

```{r owid, fig.cap='Worldwide testing results.'}
p = owid %>% dplyr::filter(positive>0 & tests>500) %>% dplyr::group_by(country) %>%
    dplyr::filter(date > max(date)-28) %>% 
    dplyr::ungroup() %>%
    ggplot(aes(x=tests,y=positive/tests, color=country,group=country)) + 
    geom_line(alpha=0.7) + theme_light() +
    scale_y_log10() + scale_x_log10() +
    xlab(expression(log[10](tests))) + ylab(expression('proportion positive tests',log[10])) +
    theme(legend.position='none')
#library(plotly)
#ggplotly(p)
p
```