
# (PART) Research design  {-}


# Types of study designs {#ResearchDesign}


<!-- Introductions; easier to separate by format -->
```{r, child = if (knitr::is_html_output()){'./introductions/03-ResearchDesign-TypesOfDesigns-HTML.Rmd'} else {'./introductions/03-ResearchDesign-TypesOfDesigns-LaTeX.Rmd'}}
```


## Three types of study designs {#Three-Research-Designs}

The RQ implies what data *must* be collected from the individuals in the study (the response and explanatory variables)...
but *how* are the data obtained?
After all, the data are the means by which the RQ is answered.

Different types of studies are used for different types of RQs:

* *Descriptive* studies (Sect.\ \@ref(DescriptiveStudies)) answer descriptive RQs;
* *Observational* studies (Sect.\ \@ref(ObservationalStudies)) answer relational RQs; or 
* *Experimental* studies (Sect.\ \@ref(ExperimentalStudies)) answer interventional RQs.

Observational and experimental studies are sometimes called *analytical studies*.


## Descriptive studies {#DescriptiveStudies}

*Descriptive studies* answer descriptive RQs (Fig.\ \@ref(fig:POCIDescriptive)), which specify a population and outcome.


::: {.definition #DescriptiveStudy name="Descriptive study"}
*Descriptive studies* answer descriptive research questions, and do not study relationships between variables.
:::


```{r POCIDescriptive, fig.cap="A descriptive study, used to answer a descriptive RQ", fig.align="center",fig.height=2, fig.width=6, out.width='55%'}
showPOCI(addArrows = TRUE,  
         addY = TRUE) 
```


<div style="float:right; width: 222px; border: 1px; padding:10px">
<img src="Illustrations/pexels-kaboompics-com-6346.jpg" width="200px"/>
</div>


::: {.example #ResearchDesignWeightLoss name="Descriptive study"}
Consider this RQ:

> For obese men over 60, what is the average increase in heart rate after walking 400 metres?

The *outcome* is the average *increase* in heart rate.
The *response variable* is the *increase* in heart rate for each individual man, found by measuring each man's heart rate *before* and *after* the walk (measured *within-individuals*).

The *increase* in heart rate for each man would be computed as the *after* heart rate minus the *before* heart rate.
Some differences might be positive numbers (heart rate *increased*), and some may be negative numbers (heart rate *decreased*).

No between-individuals *comparison* is being made: every man in the study is treated in the same way.
This is a *descriptive* RQ, which can be answered by a *descriptive* study.
:::
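
The computation in this example can be sketched in R. The heart rates below are hypothetical values invented for illustration, not data from any study. The sketch finds each man's within-individual increase (*after* minus *before*), then averages these increases to give the outcome.

```r
# Hypothetical heart rates (beats per minute) for five men; illustration only
before <- c(72, 80, 68, 75, 90)
after  <- c(95, 84, 66, 99, 104)

# Response variable: the increase for each man (after minus before).
# A negative value means that man's heart rate decreased.
increase <- after - before
increase          # 23 4 -2 24 14

# Outcome: the average increase across the men studied
mean(increase)    # 12.6
```

The third man's value is negative because his heart rate decreased; such values are kept when averaging, since they are genuine observations of the response variable.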

We do not explicitly discuss descriptive studies further, as the necessary ideas are present in the discussion of observational and experimental studies.


## Observational studies {#ObservationalStudies}

*Observational studies* (Fig.\ \@ref(fig:POCIObservational)) answer *relational RQs* to study relationships.
They are commonly used, and are sometimes the only possible study design.


::: {.definition #ObservationalStudy name="Observational study"}
*Observational studies* answer relational research questions.
:::


```{r POCIObservational, fig.cap="An observational study, used to answer a relational RQ", fig.align="center", fig.width=6, fig.height=2, out.width='60%'}
showPOCI(addC = TRUE, 
         addI = FALSE,
         addArrows = TRUE,  
         addY = TRUE,
         addX = TRUE)
```


::: {.definition #Conditions name="Condition"}
The *conditions* are the values of the comparison or connection that the individuals in an observational study experience, but that are not imposed by the researchers.
:::


<div style="float:right; width: 222px; border: 1px; padding:10px">
<img src="Illustrations/pexels-andrea-piacquadio-3807629.jpg" width="200px"/>
</div>


::: {.example #ObservationalRelationalEchinacea name="Observational study"}
Consider again this RQ [@data:barrett:echinacea]:

> Among Australian teens with a common cold, is the *average* duration of cold symptoms shorter for teens taking a daily dose of echinacea compared to teens taking no medication?

This would be a relational RQ if the researchers do not impose the taking of echinacea (that is, the individuals make this decision themselves).
The two *conditions* are 'taking echinacea', and 'not taking echinacea' (Fig.\ \@ref(fig:ObsStudiesImage)).
:::


```{r ObsStudiesImage, fig.cap="Observational studies. The dashed lines indicate steps not under the control of the researchers", fig.align="center", fig.width=7, fig.height=3, out.width='75%', cache=FALSE}

showStudyDesign(studyType = "Obs",  
                addIndividuals = TRUE,     
                addCNames = c("Echinacea",       
                              "No echinacea")) 

```


## Experimental studies {#ExperimentalStudies}

*Experimental studies* (Fig.\ \@ref(fig:POCIExperiment)), or *experiments*, are commonly used to study relationships.
Well-designed experimental studies can establish a *cause-and-effect relationship* between the response and explanatory variables.
However, using experimental studies is not always possible.
Experiments have an [*intervention*](#def:Intervention), and so *experimental studies answer interventional RQs*.


::: {.definition #Experiment name="Experiment"}
*Experimental studies* (or *experiments*) answer interventional research questions.
:::


::: {.definition #Treatments name="Treatments"}
The *treatments* are the values of the comparison or connection that the researchers impose upon the individuals in the *experimental* study.
:::


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
In an **experimental study**, the unit of analysis (Def.\ \@ref(def:UnitOfAnalysis)) is the smallest collection of units of observation that can be randomly allocated to separate treatments.
:::


```{r POCIExperiment, fig.cap="An experimental study, used to answer interventional RQs", fig.align="center", fig.width=6, fig.height=2, out.width='65%'}
showPOCI(addC = TRUE,
         addI = TRUE,
         addY = TRUE, 
         addX = TRUE,  
         addArrows = TRUE)
```


Two types of experimental studies (Table\ \@ref(tab:ExperimentalStudyDesigns)) are [*true experiments*](#TrueExperiments) and [*quasi-experiments*](#QuasiExperiments).


```{r ExperimentalStudyDesigns}
ExpStudies <- array( dim = c(3, 4) )
colnames(ExpStudies) <- c("Study type",
                          "Do researchers determine who is in each group?", 
                          "Do researchers allocate treatments to the groups?", 
                          "Reference")

if( knitr::is_latex_output() ) {
   ExpStudies[1, ] <- c("True experiment",  
                        "Yes", 
                        "Yes", 
                        "Sect. \\ref{TrueExperiments}")
   ExpStudies[2, ] <- c("Quasi-experiment", 
                        "No",  
                        "Yes", 
                        "Sect. \\ref{QuasiExperiments}")
   ExpStudies[3, ] <- c("Observational",    
                        "No",  
                        "No",  
                        "Sect. \\ref{ObservationalStudies}")
  
   kable(ExpStudies[ c(3, 1, 2), ], # Change order
        format = "latex",
        longtable = FALSE,
        booktabs = TRUE,
        escape = FALSE, # Do not escape LaTeX, so the \ref{} commands work
        linesep = c( "\\addlinespace"), # Add a bit of space between all rows. 
        caption = "Comparing analytical designs (descriptive studies do not have any comparison or connection (C))",
        align = c("r", "c", "c", "l"))   %>%
   kable_styling(full_width = FALSE, font_size = 10) %>%
   row_spec(0, bold = TRUE) %>% # Columns headings in bold
   column_spec(column = 1, width = "27mm") %>% 
   column_spec(column = 2, width = "50mm") %>%
   column_spec(column = 3, width = "30mm") %>%
   column_spec(column = 4, width = "20mm")
}

if( knitr::is_html_output() ) {
   ExpStudies[1, ] <- c("True experiment",  
                        "Yes", 
                        "Yes", 
                        "Sect. \\@ref(TrueExperiments)")
   ExpStudies[2, ] <- c("Quasi-experiment", 
                        "No",  
                        "Yes", 
                        "Sect. \\@ref(QuasiExperiments)")
   ExpStudies[3, ] <- c("Observational",    
                        "No",  
                        "No",  
                        "Sect. \\@ref(ObservationalStudies)")
  
  kable(ExpStudies[ c(3, 1, 2), ], # Change order
        format = "html",
        align = c("r", "c", "c", "l"),
        longtable = FALSE,
        caption = "Comparing analytical designs (descriptive studies do not have any comparison or connection (C))",
        booktabs = TRUE)    
}
```
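
The logic of this table can be sketched as a small R function. The function `classifyDesign()` and its argument names are hypothetical, introduced here only for illustration; it encodes two questions: do the researchers determine who is in each group, and do the researchers allocate the treatments?

```r
# Hypothetical helper (illustration only): classify an analytical study
# from two yes/no questions about what the researchers control.
classifyDesign <- function(chooseGroups, allocateTreatments) {
  if (!allocateTreatments) {
    # Researchers do not impose the comparison/connection: no intervention
    return("Observational")
  }
  if (chooseGroups) {
    "True experiment"    # researchers form the groups *and* allocate treatments
  } else {
    "Quasi-experiment"   # researchers allocate treatments to existing groups
  }
}

classifyDesign(chooseGroups = FALSE, allocateTreatments = FALSE)  # "Observational"
classifyDesign(chooseGroups = TRUE,  allocateTreatments = TRUE)   # "True experiment"
classifyDesign(chooseGroups = FALSE, allocateTreatments = TRUE)   # "Quasi-experiment"
```

Treating "researchers choose the groups but impose no treatment" as observational is an assumption of this sketch (the table does not list that case); it follows from observational studies having no intervention.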


### True experimental studies {#TrueExperiments}

*True experiments* are commonly used, but are not always possible.
An example of a true experiment is a *randomised controlled trial*, often used in drug trials.


::: {.definition #TrueExperiment name="True experiment"}
In a *true experiment*, the researchers:

1. allocate treatments to groups of individuals (i.e., determine the values of the explanatory variable for the individuals), *and*
2. determine who or what individuals are in those groups.

These steps may not happen *explicitly*, but they happen at least *conceptually*.
:::


<div style="float:right; width: 222px; border: 1px; padding:10px">
<img src="Illustrations/pexels-andrea-piacquadio-3807629.jpg" width="200px"/>
</div>


::: {.example name="True experiment"}
The echinacea study (Sect.\ \@ref(Writing-RQs)) could be designed as a *true experiment*.
The researchers would allocate individuals to one of two groups, and then decide which group took echinacea and which group did not (Fig.\ \@ref(fig:TrueExpStudiesImage)).

These steps may happen implicitly: researchers may allocate each person at random to one of the two groups (echinacea; no echinacea).
This is still a true experiment, since the researchers could decide to switch which group receives echinacea; ultimately, the decision is still made by the researchers.
:::


```{r TrueExpStudiesImage, fig.cap="True experimental studies", fig.align="center", fig.width=7, fig.height=3, out.width='75%'}
showStudyDesign(studyType = "TrueExp", 
                addIndividuals = TRUE,
                addCNames = c("Echinacea",   
                              "No echinacea"))
```
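
Random allocation, as described in this example, can be sketched in R. The sketch assumes twenty hypothetical participants labelled `P01` to `P20` (invented for illustration). Base R's `sample()` shuffles ten 'Echinacea' and ten 'No echinacea' labels, so the researchers (through the random shuffle) determine who is in each group. The seed is arbitrary, and is set only so the allocation can be reproduced.

```r
set.seed(2024)   # arbitrary seed, so the allocation can be reproduced

# Hypothetical participant IDs (illustration only)
people <- sprintf("P%02d", 1:20)

# Ten 'Echinacea' and ten 'No echinacea' labels, shuffled at random,
# so chance (controlled by the researchers) decides who is in each group
group <- sample(rep(c("Echinacea", "No echinacea"), each = 10))

allocation <- data.frame(person = people, group = group)
table(allocation$group)   # 10 in each group
head(allocation)
```

Swapping which shuffled group receives echinacea would still leave the decision with the researchers, which is why the study remains a true experiment.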


### Quasi-experimental studies {#QuasiExperiments}

Quasi-experiments are similar to true experiments, but treatments are *allocated* to groups that *already exist* (i.e., may be naturally occurring).

::: {.definition #QuasiExperiment name="Quasi-experiment"}
In a *quasi-experiment*, the researchers:
  
* allocate treatments to groups of individuals (i.e., allocate the values of the explanatory variable to the individuals), but
* do **not** determine who or what individuals are in those groups.
:::
      

::: {.example #QuasiEchinacea name="Quasi-experiments"}
The echinacea study (Sect.\ \@ref(Writing-RQs)) could be designed as a quasi-experiment.
The researchers could *find* (not *create*) two existing groups of people (say, from Suburbs A and B), then decide to allocate people in Suburb A to take echinacea, and people in Suburb B to *not* take echinacea (Fig.\ \@ref(fig:QuasiExpStudiesImage)).
:::


```{r QuasiExpStudiesImage, fig.cap="Quasi-experimental studies. The dashed lines indicate steps not under the control of the researchers", fig.align="center", fig.width=7, fig.height=3, out.width='75%'}
showStudyDesign(studyType = "QuasiExp",
                addIndividuals = TRUE,  
                addCNames = c("Echinacea",   
                              "No echinacea"))
```


::: {.example #QuasiAlcoholAwareness name="Quasi-experiments"}
A researcher wants to examine the effect of an alcohol awareness program [@macdonald2008enough] on the average amount of alcohol consumed per student in a university Orientation Week.
She runs the program at University A only, then compares the average amount of alcohol consumed per person at two universities (A and B).

This study is a *quasi-experiment* since the researcher did not (and cannot) determine the groups: the students (not the researcher) would have chosen University A or University B for many reasons.
However, the researcher *did* decide whether to allocate the program to University A or University B.
:::


## Comparing study types {#CompareStudyTypes}

Different RQs require different study designs (Table\ \@ref(tab:StudyTypes)).
In *experimental* studies, researchers *create* differences in the explanatory variable through allocation, and note the effect this has on the response variable.
In *observational* studies, researchers *observe* differences in the explanatory variable, and observe the values of the response variable.

Importantly, *only well-designed true experiments can show cause-and-effect*.
Nonetheless, well-designed observational and quasi-experimental studies can provide evidence to support cause-and-effect conclusions, especially when supported by other evidence.
However, true experiments are often not possible for ethical, financial, practical or logistical reasons.

The advantages and disadvantages of each study type are considered later (Sect.\ \@ref(InterpretStudyDesign)), after each type is described in greater detail in the following chapters. 


```{r StudyTypes}
StudyRQ <- array( dim = c(3, 6) )
colnames(StudyRQ) <- c("RQ type", 
                       "P", 
                       "O", 
                       "C", 
                       "I", 
                       "Study type")

StudyRQ[1, ] <- c("Descriptive",    
                  "Yes", 
                  "Yes",    
                  "",    
                  "", 
                  "Descriptive")
StudyRQ[2, ] <- c("Relational",     
                  "Yes", 
                  "Yes", 
                  "Yes",    
                  "", 
                  "Observational")
StudyRQ[3, ] <- c("Interventional", 
                  "Yes", 
                  "Yes", 
                  "Yes", 
                  "Yes", 
                  "Experimental")

if( knitr::is_latex_output() ) {
  kable(StudyRQ,
        format = "latex",
        longtable = FALSE,
        booktabs = TRUE,
        linesep = c( "\\addlinespace"), # Add a bit of space between all rows. 
        caption = "Study types and research questions",
        align = c("r", "c", "c", "c", "c", "l"))   %>%
    kable_styling(full_width = FALSE, font_size = 10) %>%
    row_spec(0, bold = TRUE) # Columns headings in bold
}

if( knitr::is_html_output() ) {
  
  out <- kable(StudyRQ,
               format = "html",
               align = c("r", "c", "c", "c", "c", "l"),
               longtable = FALSE,
               caption = "Study types and research questions",
               booktabs = TRUE) 
  
    kable_styling(out, 
                  full_width = FALSE) %>%
      row_spec(row = 0, 
               bold = TRUE)  
}
```


::: {.example #Autism name="Cause and effect"}
Many studies report that the bacteria in the gut of people on the autism spectrum are different from the bacteria in the gut of people *not* on the autism spectrum [@kang2019long; @ho2020gut], and suggest the bacteria may contribute to whether a person is autistic.
These studies were observational, so the 
`r if (knitr::is_latex_output()) {
   'suggestion of a cause-and-effect relationship may be inaccurate.'
} else {
  '[suggestion of a cause-and-effect relationship may be inaccurate](https://theconversation.com/gut-bacteria-dont-cause-autism-autistic-kids-microbiome-differences-are-due-to-picky-eating-170366).'
}`

Other research [@yap2021autism] suggests that people on the autism spectrum are more likely to be "picky eaters", which contributes to the differences in gut bacteria.
:::


`r if (knitr::is_html_output()) {
   'The animation below compares observational, quasi-experimental and true experimental designs.'
}`

```{r StudyDesignsMovie, animation.hook="gifski",  interval=3.0, fig.align="center", fig.cap="The three main study designs", dev=if (is_latex_output()){"pdf"}else{"png"}}
if (knitr::is_html_output()) {

  for (i in (1:4)){

    par( mar = c(0.5, 0.5, 0.5, 0.5),
         pin = c(5, 3))
    
    if ( i == 1 ) {
      title.text <- "Observational study"
      sub.text <- expression( atop("Researchers "*bold(do~not)*" choose groups",
                                   "Researchers "*bold(do~not)*" choose what happens to groups"))
    }
    if ( i == 2 ) {
      title.text <- "Quasi-experimental study"
      sub.text <- expression( atop("Researchers "*bold(do~not)*" choose groups",
                                   "Researchers "*bold(do)*" choose what happens to groups"))
    }
    if ( i >= 3 ) {
      title.text <- "True experimental study"
      sub.text <- expression(atop("Researchers "*bold(do)*" choose groups",
                                  "Researchers "*bold(do)*" choose what happens to groups"))
    }
    
    openplotmat()
    title(main = title.text)
    title(sub = sub.text)
    
    pos <- array(NA, 
                 dim = c(6, 2))
    
    pos[1, ] <- c(0.25, 0.15)   # Group 1
    pos[2, ] <- c(0.25, 0.85)   # Group 2
    pos[3, ] <- c(0.65, 0.85)   # No echinacea
    pos[4, ] <- c(0.65, 0.15)   # Echinacea
    pos[5, ] <- c(0.50, 0.50)   # Compare
    pos[6, ] <- c(0.10, 0.50)   # People
    
    
    if ( i == 1 ){
      straightarrow(from = pos[1,], 
                    to = pos[4,], 
                    lcol = "grey",
                    arr.pos = 0.3,
                    lty = 1)
      straightarrow(from = pos[2,], 
                    to = pos[3,], 
                    lcol = "grey",
                    arr.pos = 0.3,
                    lty = 1)
      
    }
    if ( i >= 3 ){
      straightarrow(from = pos[6,], 
                    to = pos[1,], 
                    lty = 1)
      straightarrow(from = pos[6,], 
                    to = pos[2,], 
                    lty = 1)
      
    }
    
    if ( i > 1 ){
      straightarrow(from = pos[4,], 
                    to = pos[1,], 
                    arr.pos = 0.7,
                    lcol = "black",
                    lty = 1)
      straightarrow(from = pos[3,], 
                    to = pos[2,], 
                    arr.pos = 0.7,
                    lcol = "black",
                    lty = 1)
    }
    
    # All plots needs arrow to the "Compare" box:
    straightarrow(from = pos[2,], 
                  to = pos[5,], 
                  lty = 1)
    straightarrow(from = pos[1,], 
                  to = pos[5,], 
                  lty = 1)
    
    
    # TEXT
    textrect( pos[1,], 
              lab = "Group 1",
              radx = 0.065,
              rady = 0.1,
              shadow.size = 0,
              lcol = "darkseagreen1",
              box.col = "darkseagreen1")
    textrect( pos[2,], 
              lab = "Group 2", 
              radx = 0.065,
              rady = 0.1,
              shadow.size = 0,
              lcol = "darkseagreen1",
              box.col = "darkseagreen1")
    textrect( pos[3,], 
              box.col = "white",
              lcol = "white",
              shadow.size = 0,
              radx = 0.18,
              rady = 0.075,
              lab = "Chose not to\nuse echinacea")
    textrect( pos[4,], 
              box.col = "white",
              lcol = "white",
              shadow.size = 0,
              radx = 0.18,
              rady = 0.075,
              lab = "Chose to\nuse echinacea")
    
    textrect( pos[5,], 
              lab = "Compare", 
              radx = 0.075, 
              rady = 0.1, 
              shadow.size = 0,
              box.col = "antiquewhite",
              lcol = "antiquewhite")
    
    if ( i>= 3 ){
      textrect( pos[6,], 
                box.col = "white",
                lcol = "white",
                shadow.size = 0,
                radx = 0.16,
                rady = 0.075,
                lab = "People")
    }
  }
}
```


## Directionality {#Directionality}

Analytical research studies (observational; experimental) can be classified by their *directionality* (Table\ \@ref(tab:TypesOfObsStudies)):

* [*Forward direction*](#Forward): 
  The values of the explanatory variable are obtained, and the study determines what values of the response variable occur in the future.
  *All experimental studies have a forward direction.*
* [*Backward direction*](#Backward): 
  The values of the response variable are obtained, then the study determines what values of the explanatory variable occurred in the past.
* [*No direction*](#NonDirectional): 
  The values of the response and explanatory variables are obtained at the same time.

Directionality is important for understanding cause-and-effect relationships.
If the comparison/connection occurs *before* the outcome is observed, a cause-and-effect relationship *may* be possible.
That is, studies with a forward direction are more likely to provide evidence of causality.


```{r TypesOfObsStudies}
ObsStudies <- array( dim = c(3, 3) )
colnames(ObsStudies) <- c("Type", 
                          "Explanatory variable",
                          "Response variable")

ObsStudies[1, ] <- c("Forward direction", 
                     "When study begins", 
                     "Determined in the future")
ObsStudies[2, ] <- c("Backward direction", 
                     "Determined from the past", 
                     "When study begins")
ObsStudies[3, ] <- c("No direction", 
                     "When study begins", 
                     "When study begins")

if( knitr::is_latex_output() ) {
  
  kable(ObsStudies,
        format = "latex",
        longtable = FALSE,
        booktabs = TRUE,
        escape = FALSE, # Do not escape any LaTeX in the table
        caption = "Classifying observational studies",
        align = c("r", "c", "c"))   %>%
    kable_styling(full_width = FALSE, font_size = 10) %>%
    row_spec(0, bold = TRUE) 
}

if( knitr::is_html_output() ) {
  
  kable(ObsStudies,
        format = "html",
        align = c("r", "c","c"),
        longtable = FALSE,
        caption = "Classifying observational studies",
        booktabs = TRUE) 
}
```
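
The three rows of this table can also be expressed as a small R function. `studyDirection()` is a hypothetical helper written only for illustration; its inputs describe *when* the values of each variable are obtained (`"past"`, `"start"` of the study, or `"future"`), and any combination not in the table returns `NA`.

```r
# Hypothetical helper (illustration only): classify the directionality of an
# analytical study from when each variable's values are obtained.
# Each argument is one of "past", "start" (when the study begins) or "future".
studyDirection <- function(explanatory, response) {
  if (explanatory == "start" && response == "future") return("Forward direction")
  if (explanatory == "past"  && response == "start")  return("Backward direction")
  if (explanatory == "start" && response == "start")  return("No direction")
  NA_character_   # combination not covered by the table
}

studyDirection(explanatory = "start", response = "future")  # "Forward direction"
studyDirection(explanatory = "past",  response = "start")   # "Backward direction"
studyDirection(explanatory = "start", response = "start")   # "No direction"
```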


<!-- <iframe src="https://docs.google.com/forms/d/e/1FAIpQLScUGUtuPBTArcmQo36tb3iYH49xGiJSl0Z_9XJdnQdV6Ej4ZQ/viewform?embedded=true" width="640" height="601" frameborder="0" marginheight="0" marginwidth="0"></iframe> -->

     
::: {.thinkBox .think data-latex="{iconmonstr-light-bulb-2-240.png}"}
In South Australia in 1988--1989, 25 cases of legionella infections (an unusually high number) were investigated [@data:oconnor:pottingmix].
All 25 cases were gardeners.\label{thinkBox:GardenersDirection}

Researchers compared 25 people who had legionella infections with 75 similar people without the infection.
Using potting mix in the previous four weeks was associated with about 4.7 times the risk of contracting the illness.

What *direction* does this observational study have?

`r if (!knitr::is_html_output()) '<!--'`
`r webexercises::hide()`
*Backward directionality*: people were identified with an infection, and then the researchers looked *back* at past activities.
`r webexercises::unhide()`
`r if (!knitr::is_html_output()) '-->'`
:::


<iframe src="https://learningapps.org/watch?v=p3i692osc22" style="border:0px;width:100%;height:500px" allowfullscreen="true" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe>


Research studies are sometimes described as 'prospective' or 'retrospective', but these terms can be misleading [@ranganathan2018study] and their use is not recommended [@VANDENBROUCKE20141500].


*Experimental studies always have a forward direction.*
Observational studies may have any directionality, and are sometimes given different names accordingly.


### Forward-directional studies {#Forward}

All experimental studies have a forward direction, and include *randomised controlled trials* (RCTs) and *clinical trials*.

Observational studies with a *forward* direction are often called *cohort studies*.
Both experimental studies and cohort studies can be expensive and tricky: tracking a group of individuals (a *cohort*) into the future is not always easy, and individuals may be lost from the study (*drop-outs*) as people move, die, decide to no longer participate, and so on.
Forward-directional observational studies:

* may add support to cause-and-effect conclusions, since the comparison/connection occurs *before* the outcome (only well-designed experimental studies can establish cause-and-effect).
* can examine many different outcomes in one study, since the outcome(s) occur in the future. 
* can be problematic for rare outcomes, as the outcome of interest may not appear (or may appear rarely) in the future.


::: {.example name="Forward study"}
@chih2018incidence studied dogs and cats who had been recommended to receive intermittent nasogastric tube (NGT) aspiration for up to 36 hours.
Some pet owners did not give permission for NGT, while some did; thus, whether the animal received NGT was *not* determined by the researchers (so this study is observational).
The researchers then observed whether the animals developed hypochloremic metabolic alkalosis (HCMA) in the next 36 hours.

Since the explanatory variable (whether NGT was used or not) was recorded at the start of the study, and the response variable (whether HCMA was observed or not) was determined within the following 36 hours, this study has a *forward direction*.
:::


### Backward-directional studies {#Backward}

Observational studies with a *backward* direction are often called *case-control* studies.
Researchers find individuals with specific values of the response variable (the cases and the controls), and determine values of the explanatory variable from the past.
Case-control studies:

* only allow one outcome to be studied, since individuals are chosen to be in the study based on the value of the response variable of interest.
* are useful for rare outcomes: researchers can deliberately select large numbers of individuals with the rare outcome of interest.
* do not effectively eliminate other explanations for the relationship between the response and explanatory variables (called *confounding*; Def.\ \@ref(def:Confounding)).
* may suffer from *selection bias* (Sect.\ \@ref(SelectionBias)), as researchers try to locate individuals with a rare outcome.
* may suffer from *recall bias* (Sect.\ \@ref(Biases)) when the individuals are people, since people's recall of the past can be unreliable.

::: {.example name="Backward study"}
A study [@data:Pamphlett:toxins] examined patients with and without sporadic motor neurone disease (SMND), and asked about *past* exposure to metals.

The response variable (whether or not the respondent had SMND) is assessed when the study begins, and whether or not they had exposure to metals (explanatory variable) is determined from the *past*.
This observational study has a *backward* direction.
:::


### Non-directional studies {#NonDirectional}

*Non-directional* observational studies are called *cross-sectional* studies.
Cross-sectional studies:

* are good for finding associations between variables (which may or may not be causal).
* are generally quicker and cheaper than other types of studies.
* are not useful for studying rare outcomes.
* do not effectively eliminate other explanations for the relationship between the response and explanatory variables (called *confounding*; Def.\ \@ref(def:Confounding)).


::: {.example name="Non-directional study"}
A study [@data:Russell2014:FoodInsecurity] asked older Australians their opinions of their own food security, and recorded their living arrangements.
Individuals' responses for both the response variable and the explanatory variable were gathered when the study began.
This observational study is *non-directional*.
:::


## Internal validity {#IntroInternalValidity}

*Internal validity* refers to how reasonable and logical it is to conclude that changes in the value of the response variable can be attributed to changes in the value of the explanatory variable; that is, it refers to the strength of the *inferences* made from those studied.
Internally valid studies are generally *accurate* and *repeatable*.

Studies with *high* internal validity show that changes in the response variable can confidently be related to changes in the explanatory variable *in the group that was studied*; the possibility of other explanations has been minimised.

In contrast, studies with *low* internal validity leave open other possibilities, apart from changes in the value of the explanatory variable, to explain changes in the value of the response variable.
Experimental studies usually have higher internal validity than observational studies.

Ideally, all studies should be designed to be *internally valid* (Chaps.\ \@ref(DesignExperiment) and \@ref(DesignObservational)).


<div style="float:right; width: 75px; padding:10px">
<img src="Pics/iconmonstr-door-7-240.png" width="50px"/>
</div>


::: {.definition #InternalValidity name="Internal validity"}
*Internal validity* refers to how reasonable and logical it is to conclude that changes in the value of the response variable can be attributed to changes in the value of the explanatory variable; that is, the strength of the *inferences* made from those studied.

A study with *high* internal validity shows that the changes in the response variable can be attributed to changes in the explanatory variable; other explanations have been ruled out.
:::


::: {.example #LowInternal name="Low internal validity"}
In a review of studies that used double-fortified salt to manage iodine and iron deficiencies [@larson2021can], one conclusion was (p. 265):

> Internal validity of the efficacy trials was generally weak [...] because of issues around selection bias, unaccounted confounders, and participant withdrawals.
:::


One of many potential threats to internal validity is that the groups being compared are initially different; for example, if the group receiving echinacea is younger (on average) than the group receiving no medication.
This is a form of *confounding* (Def.\ \@ref(def:Confounding)).

To check this, the *baseline characteristics* of the individuals in the groups can be compared: the groups being compared should be as similar as possible, so that any differences in the outcome cannot be attributed to pre-existing differences between the groups.


::: {.example name="Baseline characteristics"}
In a study of treating depression in adults [@data:Danielsson2014:Depression], three treatments were compared: exercise, basic body awareness therapy, and advice.

If differences in outcomes were found between the treatment groups, the researchers would need to be confident that these differences were due to the treatments.
For this reason, the three groups were compared to ensure the groups were similar in terms of average ages, percentage of women, taking of anti-depressants, and many other aspects.
::: 
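
Comparing baseline characteristics can be sketched in R. The data below are a small hypothetical data set invented for illustration (they are *not* the data from the depression study); base R's `aggregate()` computes the average age and the percentage of women in each treatment group. Similar summaries across the groups suggest the groups were similar before treatment began, while large differences would flag possible confounding.

```r
# Hypothetical baseline data for nine patients (invented for illustration)
baseline <- data.frame(
  treatment = rep(c("Exercise", "Body awareness", "Advice"), each = 3),
  age       = c(41, 38, 45,   40, 44, 39,   43, 37, 42),
  female    = c( 1,  0,  1,    1,  1,  0,    0,  1,  1)   # 1 = woman, 0 = man
)

# Average age in each treatment group
aggregate(age ~ treatment, data = baseline, FUN = mean)

# Percentage of women in each treatment group
aggregate(female ~ treatment, data = baseline,
          FUN = function(x) 100 * mean(x))
```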


Achieving internal validity requires careful study design, as discussed in Chaps.\ \@ref(DesignExperiment) and\ \@ref(DesignObservational).
In general, well-designed experimental studies are more likely to be internally valid than observational studies.


## External validity {#IntroExternalValidity}

A study is *externally valid* if the results of the study are likely to generalise to the rest of the *population*, beyond just those studied in the sample.
To be *externally* valid, a study first needs to be *internally* valid, since the results must at least be sound for the group under study before being extended to other members of the population.

Using a *random sample* helps ensure external validity.
In addition, the use of [*inclusion* and *exclusion criteria*](#def:InclusionExclusionCriteria) (Sect.\ \@ref(Population)) helps clarify to whom or what the results may apply outside of the sample being studied.


<div style="float:right; width: 75px; padding:10px">
<img src="Pics/iconmonstr-share-11-240.png" width="50px"/>
</div>


::: {.definition #ExternalValidity name="External validity"}
*External validity* refers to the ability to generalise the results to the rest of the population, beyond just those in the sample studied.
For a study to be truly externally valid, the sample must be a random sample (Chap.\ \@ref(Sampling)) from the population.
:::


*External validity* does *not* mean that the results apply more widely than the intended population.


::: {.example #ExternalValidPop name="External validity"}
Suppose the *population* in a study is *Californian university students*.
The sample comprises the Californian university students actually studied.
The study is externally valid if the sample is a random sample from the population of all Californian university students.

The results will not necessarily apply to university students outside California (though they may), or to all Californian residents.
However, this *is irrelevant to external validity*.
External validity concerns how the *sample* represents the intended population in the RQ, which is *Californian university students*.
The study is not concerned with all Californian residents, or with non-Californian university students.
:::


## The importance of design {#DesignImportance}

Choosing the *type* of study is only one part of research design.
The data collection process must still be planned, and the data actually collected.
Sometimes data are already available (called *secondary data*); otherwise, the data must be collected (called *primary data*).

Either way, knowing *how* the data are obtained is important.
The design phase is concerned with planning the best approach to obtaining the data, to ensure the study is *internally* and *externally* valid, as far as possible.

*Internal validity* considerations include:

* *What else* might influence the values of the response variable, apart from the explanatory variable? (Chap.\ \@ref(FactorsInfluenceY))
* How can the study be designed *effectively* to maximise internal validity? (Chaps.\ \@ref(DesignExperiment) and\ \@ref(DesignObservational))

*External* validity considerations include:

* Sampling: Since the whole population cannot be studied, *who* or *what* do we study in the population (Chap.\ \@ref(Sampling))?
  And *how many* do we need to study?
  (We need to learn more before we can answer this critical question in Chap.\ \@ref(EstimatingSampleSize).)

The details of how the data will be *collected* (Chap.\ \@ref(CollectingDataProcedures)) and *ethical* issues (Chap.\ \@ref(Ethics)) must also be considered.
Furthermore, the limitations of the study must be communicated (Chap.\ \@ref(Interpretation)).


`r if (knitr::is_html_output()){
  'The following short (humorous) video demonstrates the importance of understanding the design!'
}`

<div style="text-align:center;">
<iframe width="560" height="315" src="https://www.youtube.com/embed/BKorP55Aqvg" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture"></iframe>
</div>


## Summary {#Chap3-Summary}

Three types of research studies are *descriptive studies* (for descriptive RQs), *observational studies* (for relational RQs), and *experimental studies* (for interventional RQs).

Observational studies can usually be classified as having a *forward direction* (cohort studies), *backward direction* (case-control studies), or *no direction* (cross-sectional studies).
Experimental studies always have a forward direction, and can be classified as *true experiments* or *quasi-experiments*.
Cause-and-effect conclusions can only be made from well-designed *true experiments*.

Ideally studies should be designed to be *internally* and *externally* valid.
In general, experimental studies have better internal validity than observational studies.


```{r Chap3Summary, animation.hook="gifski",  interval=1.5, fig.cap="Chapter 3 summary", fig.height = 3, fig.align="center", dev=if (is_latex_output()){"pdf"}else{"png"}}
if (knitr::is_html_output()) {

for (i in (1:9)){
  
  par( mar = c(0.1, 0.1, 0.1, 0.1) ) # Number of margin lines on each side

  diagram::openplotmat()
  pos <- array(NA, 
               dim = c(4, 2))
  pos[1, ] <- c(0.35, 0.6) # P 
  pos[2, ] <- c(0.45, 0.6) # O
  pos[3, ] <- c(0.55, 0.6)   # C
  pos[4, ] <- c(0.65, 0.6)   # I

  
  if (i <=  2){
    textrect( colMeans( pos[1:2, ] )  + c(0, 0.15),
              lab = "Descriptive RQ",
              radx = 0.15,
              rady = 0.04,
              shadow.size = 0,
              lcol = "azure",
              box.col = "azure",
              cex = 1.0)
    if (i == 2) {
      textrect( colMeans( pos[1:2, ] ) + c(0, 0.30),
               lab = "Answer using a Descriptive study",
               lcol = "beige",
               box.col = "beige",
               radx = 0.25,
               rady = 0.04,
               shadow.size = 0,
               cex = 0.85)
      
    }
  }
  
  if ( (i == 3) | ( i == 4) ){
    textrect( colMeans( pos[1:3, ] ) + c(0, 0.15),
               lab = "Relational RQ",
              radx = 0.15,
              rady = 0.04,
              shadow.size = 0,
              lcol = "azure",
              box.col = "azure",
              cex = 1.0)
    if (i == 4) {
      textrect( colMeans( pos[1:3, ] ) + c(0, 0.30),
                 lab = "Answer using an Observational study",
                lcol = "beige",
                box.col = "beige",
                radx = 0.25,
                rady = 0.04,
                shadow.size = 0,
                cex = 0.85)
      
    }
    
  }
  
  if ( (i >= 5) & ( i <= 7) ){
    textrect( colMeans( pos[1:4, ] ) + c(0, 0.15),
               lab = "Interventional RQ",
              radx = 0.15,
              rady = 0.04,
              shadow.size = 0,
              lcol = "azure",
              box.col = "azure",
              cex = 1.0)
    if (i >= 6) {
      textrect( colMeans( pos[1:4, ] ) + c(0, 0.30),
                 lab = "Answer using an Experimental study",
                lcol = "beige",
                box.col = "beige",
                radx = 0.25,
                rady = 0.04,
                shadow.size = 0,
                cex = 0.85)
    }
    if (i == 7) {
      textplain( colMeans( pos[1:4, ] ) - c(0, 0.1),
                 lab = "(Intervention: when C can be manipulated by the researchers)",
                 cex = 0.85)
      
    }
  }
 
  # Draw an arrow, then overwrite its starting point with a label box

  if ( i == 8 ) {
    straightarrow( from = colMeans( pos[1:4, ] ) + c(-0.1, -0.3),
                   to = pos[2, ],
                   lwd = 2)
  }
  if ( i == 9 ) {
    straightarrow( from = colMeans( pos[1:4, ] ) + c(0.1, -0.3),
                   to = pos[3, ],
                   lwd = 2)
  }
  
  if ( i == 8 ) {
    textrect( colMeans( pos[1:4, ] ) + c(-0.1, -0.3),
              lab = "Response variable",
              box.col = "white",
              lcol = "white",
              shadow.size = 0,
              radx = 0.05,
              rady = 0.02,
              cex = 0.85)
  }
  if ( i == 9 ) {
    textrect( colMeans( pos[1:4, ] ) + c(0.1, -0.3),
              lab = "Explanatory variable",
              box.col = "white",
              lcol = "white",
              shadow.size = 0,
              radx = 0.05,
              rady = 0.02,
              cex = 0.85)
  } 

  
  # Always need P and O
  textrect( pos[1,], 
            lab = "P",
            radx = 0.05,
            rady = 0.025,
            shadow.size = 0,
            lcol = "white",
            box.col = "white",
            cex = 2)
  textrect( pos[2,], 
            lab = "O", 
            radx = 0.05,
            rady = 0.05,
            shadow.size = 0,
            lcol = "white",
            box.col = "white",
            cex = 2)
  if (i >= 3) { # C
    textrect( pos[3,], 
              box.col = "white",
              lcol = "white",
              shadow.size = 0,
              radx = 0.05,
              rady = 0.05,
              lab = "C",
              cex = 2)
  }
  if (i >= 5) { # I
    textrect( pos[4,], 
              box.col = "white",
              lcol = "white",
              shadow.size = 0,
              radx = 0.05,
              rady = 0.05,
              lab = "I", 
              cex = 2)
  }

} 
  
}
  
```
  

`r if (knitr::is_html_output()){
  'The following short videos may help explain some of these concepts:'
}`

<div style="text-align:center;">
<iframe width="560" height="315" src="https://www.youtube.com/embed/eic_LjXT4qc" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture"></iframe>
</div>

<div style="text-align:center;">
<iframe width="560" height="315" src="https://www.youtube.com/embed/2N_bkiyTiXU" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture"></iframe>
</div>


## Quick review questions {#Chap3-QuickReview}

::: {.webex-check .webex-box}
1. A study [@fraboni2018red] examined the 'red-light running behaviour of cyclists in Italy'. \tightlist
This study is most likely to be: `r if (!knitr::is_html_output()){
'  (a) An observational study; (b) A quasi-experimental study; or (c) An experimental study.'
}`
`r if( knitr::is_html_output() ) {
	longmcq( c(answer = "an observational study",
             "a quasi-experimental study",
             "an experimental study"))}`
1. When the results of studying a sample apply to the wider population of interest, the study is called:  `r if( !knitr::is_html_output() ) {' (a) Internally valid; or (b) Externally valid.'}`
`r if(knitr::is_html_output()){
	longmcq( c("internally valid",
              answer = "externally valid") ) }`
1. In a quasi-experiment, the researchers allocate treatments to groups that they cannot manipulate. 
True or false?
`r if(knitr::is_html_output())	torf(answer=TRUE)`
1. What is the difference between a true experiment and a quasi-experiment?\tightlist
 `r if (!knitr::is_html_output()){
'  a. In a true experiment, the researchers apply treatments to groups. In a quasi-experiment, the researchers do not apply treatments to groups. 
   b. In a true experiment, the researchers apply treatments to groups that they have determined. In a quasi-experiment, the researchers apply treatments to groups that they have not determined.
   c. In a true experiment, the RQ has a comparison; in a quasi-experiment, the RQ has a connection.'
}`
`r if( knitr::is_html_output() ) {
   longmcq( c(
      "In a true experiment, the researchers apply treatments to groups. In a quasi-experiment, the researchers do not apply treatments to groups", 
      answer = "In a true experiment, the researchers apply treatments to groups that they have determined. In a quasi-experiment, the researchers apply treatments to groups that they have not determined",
      "In a true experiment, the RQ has a comparison; in a quasi-experiment, the RQ has a connection"))}` 
1. A research study compared the use of two different education programs to reduce the percentage of patients experiencing ventilator-associated pneumonia (VAP).
Paramedics from two cities were chosen to participate. Paramedics in City A were chosen to receive Program 1, and paramedics in City B to receive Program 2.
What type of study is this? `r if( !knitr::is_html_output() ) {' (a) Observational; (b) Quasi-experiment; (c) True experiment; or (d) Descriptive.'}`
`r if( knitr::is_html_output() ) {
   longmcq( c(
      "Observational", 
      answer = "Quasi-experiment",
      "True experiment",
      "Descriptive"))}` 
1. Which of the following are true?

   a. True experiments have higher internal validity than observational studies.  \tightlist
`r if( knitr::is_html_output() ) {torf( answer = TRUE)}`
   b. Internal validity refers to the strength of the inferences made from the study.  
`r if( knitr::is_html_output() ) {torf( answer = TRUE)}`
   c. External validity refers to the ability to generalise the results to other groups apart from those studied.  
`r if( knitr::is_html_output() ) {torf( answer = TRUE)}`
   d. Inclusion and exclusion criteria can be used to clarify the internal validity.  
`r if( knitr::is_html_output() ) {torf( answer = FALSE)}`
   e. Observational studies have higher external validity than experimental studies.  
`r if( knitr::is_html_output() ) {torf( answer = FALSE)}`
:::


## Exercises {#ResearchDesignsExercises}

Selected answers are available in Sect.\ \@ref(ResearchDesignAnswer).

::: {.exercise #ResearchDesignConcreteBeams}
In a study on the shear strength of recycled concrete beams [@gonzalez2007shear], beams were divided into three groups.
Different loads were then applied to each group, and the shear strength needed to fracture the beams was measured.
Is this a *quasi-experiment* or a *true experiment*? 
Explain.
:::


::: {.exercise #ResearchDesignMatresses}
A nursing study aimed to compare "the effectiveness of alternating pressure air mattresses vs. overlays, to prevent pressure ulcers" (@data:Manzano2013:Matresses, p. 2099).
Patients were *provided* with either alternating pressure air overlays (in 2001) or alternating pressure air mattresses (in 2006).
The number of pressure ulcers was recorded.

This study is experimental, because the researchers *provided* the mattresses.
Is this a *true* experiment or *quasi*-experiment?
Explain.
:::


::: {.exercise #ResearchDesignPetsAndHealth}
Consider this initial RQ (based on @friedmann1985health), which clearly requires considerable refinement: "Are people with pets healthier?"
To answer this RQ:

1. Describe useful and practical definitions for P, O and C. 
2. Describe an *experimental* study to answer the RQ.
3. Describe an *observational* study to answer the RQ.
:::


::: {.exercise #ResearchDesignDietsForWeightLoss}
Consider this journal article extract (@data:sacks:weightloss, p. 859):

> We randomly assigned 811 overweight adults to one of four diets [...]
> The diets consisted of similar foods and met guidelines for cardiovascular health [...]
> The primary outcome was the change in body weight after 2 years in [...] comparisons of low fat versus high fat and average protein versus high protein and in the comparison of highest and lowest carbohydrate content.

1. Define POCI.
2. Is this study observational or experimental?
   Why?
3. Is this study a quasi-experiment or a true experiment?
   Why?
4. What are the units of analysis?
5. What are the units of observation?
6. What is the response variable?
7. What is the explanatory variable?
:::


<!-- QUICK REVIEW ANSWERS -->
`r if (knitr::is_html_output()) '<!--'`
::: {.EOCanswerBox .EOCanswer data-latex="{iconmonstr-check-mark-14-240.png}"}
**Answers to in-chapter questions:**

- Sect. \ref{thinkBox:GardenersDirection}: *Backward directionality*: people were identified with an infection (response), and then the researchers looked at past activities (explanatory).
- \textbf{\textit{Quick Revision:}}
**1.** Observational.
**2.** Externally valid.
**3.** True.
**4.** In a true experiment, the researchers apply treatments to groups that they have determined. In a quasi-experiment, the researchers apply treatments to groups that they have not determined.
**5.** Quasi-experiment.
**6.** Only the first three statements are true.
:::
`r if (knitr::is_html_output()) '-->'`