36-Read.Rmd


# Reading and critiquing research {#Reading}

<!-- Introductions; easier to separate by format -->
```{r, child = if (knitr::is_html_output()) {'./introductions/36-Read-HTML.Rmd'} else {'./introductions/36-Read-LaTeX.Rmd'}}
```


<!-- Define colours as appropriate -->
```{r, child = if (knitr::is_html_output()) {'./children/coloursHTML.Rmd'} else {'./children/coloursLaTeX.Rmd'}}
```


## Introduction {#Chap36-Intro}


<div style="float:right; width: 222x; border: 1px; padding:10px">
<img src="Illustrations/pexels-pixabay-258353.jpg" width="200px"/>
</div>

All academic disciplines change and adapt.
Staying current in your discipline requires reading, critiquing and understanding the research of others, as communicated in *journal articles* or *presentations* (Chap.\ \@ref(WritingResearch)).
(*Critiquing* means to evaluate: identifying what is good, and what can be improved.)

At some time during your studies or employment, you will need to read research articles: 

* to understand current practices in your discipline; 
* to know *why* your discipline uses the current procedures and practices; 
* to learn about new procedures and practices that may be adopted;
* to critique the evidence for current or new practices; and 
* to identify open or unresolved questions in your discipline.

Familiarity with the language and concepts of research is important for understanding these articles, even if you will not be conducting your own research.

Reading research articles can be challenging.
Rather than reading articles thoroughly from start to finish, first read the *Abstract* (Sect,\ \@ref(WritingAbstract))) to obtain an overview of the whole article, without becoming lost in the details.
Then, read the *Discussion*, which highlights the important findings.
Next, skim the rest of the article (perhaps focusing on graphs and tables of results).
Finally, if necessary, read the article for details.


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
Terminology and notation varies widely in research (Sect.\ \@ref(LexicalAmbiguity)).
When reading research, check the terminology and notation being used if you are unsure!
:::


The six steps of the research process (Sect.\ \@ref(SixStepsOfResearch)) can guide the research critique.\index{Research!six steps}

1. *Asking the RQ*:
* What research question is the research answering?
* Why is this RQ important?\index{RQ}
* To what population will the results apply?
* What are the units of analysis\index{Units of analysis} and units of observation?\index{Units of observation}
* How are important terms defined?
2. *Designing the study*:\index{Research design}
* Is the study observational or experimental?
* Is the study well-designed?
What is not explained or clear?
* What design features and used, and why?
* How many individuals are in the study?
* How was the sample obtained? 
What are the implications for external validity?\index{External validity}
* Is the study designed to maximize internal validity?\index{Internal validity}
* How is confounding managed?\index{Confounding}
* What are the design limitations?
* Are there ethical concerns?\index{Ethics}
* What is the source of funding?
3. *Collecting the data*:
* How were the data collected?\index{Data collection}
* Are the necessary details provided so the study can be approximately replicated?
4. *Classifying and summarising the data*:
* Is the data summary appropriate, complete and clear?
* What does the data summary reveal about the data?
* What do the tables and graphs reveal about the data and relationships?
5. *Analysing the data*:
* What types of confidence intervals and/or hypothesis tests were used?
* Is the analysis appropriate, accurate, valid and clear?
* What do the results mean?
* What software was used?
* Are the results statistically valid?
6. *Reporting the results*:
* What are the main conclusions, and how do they answer the RQ?
* Are the conclusion consistent with the results?
* Are the results accurate, appropriate and well-reported?
* Are the results of practical importance?
* Are the study limitations acknowledged, and their implications discussed? 
* What other questions have emerged?


## Example: walking while texting {#ReadWalkingTexting}

@sajewicz2023texting studied the impact of texting (using a smartphone) on how students walk.
In this section, the article will be briefly discussed.


### The Abstract {#ReadWalkingTextingAbstract}

Part of the unstructured *Abstract* for the article reads:

> The aim of this experiment was to investigate whether using a cell phone while walking affects walking velocity [...] in young people. 
> Forty-two subjects ($20$\ males, $22$\ females; mean age: $20.74\pm 1.34$ years; mean height: $173.21\pm 8.07$\ cm; mean weight: $69.05\pm 14.07$\ kg) participated in the study. 
> The subjects were asked to walk on an FDM--1.5 dynamometer platform four times at a constant comfortable velocity and a fast velocity of their choice. 
> They were asked to continuously type one sentence on a cell phone while walking at the same velocity. 
> The results showed that texting while walking led to a significant reduction in velocity compared to walking without the phone. 

As this is the *Abstract*, many details are absent (but explained in the article itself).
Nonetheless, a lot can be learnt about the study from the *Abstract*:

* *Asking* the RQ:\index{Research question!repeated-measures}\tightlist
* This is a repeated-measures RQ: data are collected from the same students 'four times' (which are explained more fully in the article).
* The *population* is 'young people'.
* The numbers that follow the $\pm$ are not explained: are they confidence interval limits, standard deviations, IQRs, ranges, standard errors?
* The *units of analysis* are the individual students in the study:\index{Units of analysis}\index{Units of observation} each person has four measurements.
* The main outcome is the (average) 'walking velocity'.
* *Designing* the study:
* The sampling method is not stated, but likely to be voluntary-response.
* The sample size is $n = 42$.
* *Analyse* the data:
* A quantitative variable (walking velocity) is being compared *within* individuals, so paired $t$-tests are the likely method of analysis (Chap.\ \@ref(AnalysisPaired)).
* *Report the results*:
* Details of the analysis are not given (e.g., $P$-values or CIs).
* Nonetheless, the conclusion is that 'walking led to a significant reduction in velocity compared to walking without the phone'. 


### Introduction {#ReadWalkingTextingIntroduction}

The *Introduction* introduces the context for the study, and establishes what is known about the topic.
The *aim* of the study is (p.\ 1):

> ... to analyze how the use of a cell phone while walking at different velocities affects gait parameters, i.e., velocity, cadence, stride width, and stride length.

(Cadence refers to the tempo or rhythm of the walking, measure in steps per minute.)


### Materials and Methods {#ReadWalkingTextingMethods}

The *Material and Methods* section provides details of the study, including:

* The *sample*\index{Sample} comprises students from the University School of Physical Education (Poland) studying a course in gait analysis.\tightlist
This sample may not represent any general population, though the conclusions may possibly apply to non-students and not-Poles.\index{External validity}
* Exclusion criteria\index{Exclusion criteria} (e.g., using lower limb prostheses) and inclusion criteria\index{Inclusion criteria} (e.g., daily use of a cell phone while walking) were given.
* Extraneous variables\index{Variables!extraneous} collected included age, height, weight and sex. 
* The study was ethical,\index{Ethics} with permission sought from the students, and approval given by the Senate Committee on Research Ethics at the university.
* Control variables\index{Variables!control} included the temperature and humidity: 'the air temperature was constant at $22$\ degrees Celsius, and the air humidity was $47$%' (p.\ 3).
* Details of the specialised equipment used was given: 'The experiment was conducted using an FDM--1.5 Zebris dynamography platform'.
* Further details of the protocol\index{Protocol} used were given.
* Each subject participated in four tasks.
In each, the subject (without footwear) made as many passes on the platform as possible in one minute:
1. The subject walked at a constant *comfortable* velocity (i.e., the velocity, chosen by the subject, that the
person walks most naturally).
2. The subject walked at a constant *fast* velocity (i.e.,  as fast as the subject could comfortably walk, chosen by the subject).
3. Task\ 1 was repeated, with subjects continuously typing a sentence on a cell phone. 
4. Task\ 2 was repeated, with subjects continuously typing a sentence on a cell phone.
* Five response variables were used: left-side stride length, right-side stride length, cadence, stride width, and walking velocity.
These appear to have been measured *objectively*.\index{Objective data}
* The 'sentences' to be typed were defined as 'tongue twisters... not used in everyday conversation', but were not provided.
*  The analysis was given as 'paired Student’s $t$-test'\index{Data!paired}\index{Hypothesis testing!mean difference} in most cases, or the Wilcoxon test\index{Wilcoxon signed ranks test} if the statistical validity conditions were not satisfied.
* The software (and version) used was stated: TIBCO Statistica\textregistered\ 13.3.0 (StatSoft Poland).


### Results {#ReadWalkingTextingResults}

The *Results* section provides the results of the analyses, including:

* The data are not immediately available (probably due to ethics concerns) but 'are available upon request' (p.\ 7).
Details of the analysis were not available.\tightlist
* Case-profile plots\index{Graphs!case-profile plot} were produced for the five response variable, showing the means for each task rather than for all\ $42$ individuals (much like Fig.\ \@ref(fig:RunningRM)). 
* Correlations were computed between the five response variable.
The correlations between stride width and the other variables were negative; all other correlations were positive.
Since the relationship were non-linear, Spearman correlations\index{Correlation coefficient!Spearman} were used rather than the Pearson correlations studied in Chap.\ \@ref(CorrelationRegression).
All the corresponding $P$-values were all less than $0.05$ apart from the correlation between stride width and cadence.
* The following comparisons were made for each response variable:
1. Task\ 1 and Task\ 3 were compared: this explored the impact of texting on a smartphone when walking at a comfortable velocity.
2. Task\ 2 and Task\ 4 were compared: this explored the impact of texting on a smartphone when walking at a fast velocity.
3. Task\ 3 and Task\ 4 were compared: this explored the difference between texting and not texting when walking at a fast velocity.
* The mean and standard deviation of the three response variables was provided for each task. 
However, the numerical summary for the *differences* were not provided; a $P$-value only was provided.
* Fifteen hypothesis tests were conducted: the three between-tasks comparison listed above, for the five response variables.
* The results are summarised as (p.\ 5):

> ... right and left single step length and gait velocity, were found to be statistically significant in each comparison.
> The value of the change in step width was statistically significant only when comparing trials\ 1 and\ 3, and cadence showed statistical significance when comparing trials\ 2 and\ 4, as well as\ 3 and\ 4.

* 'Statistically significant'\index{Statistical significance} was defined as a $P$-value smaller than $0.017$, rather than the commonly-used $P < 0.05$.
The reason was to reduce the chance of making a Type\ I error (Sect.\ \@ref(TypeErrors)),\index{Type\ I error} but the details are beyond the scope of this book.
* The researchers also made qualitative observations.\index{Research!qualitative} 
For example: they observed 'moments when the subject took their eyes off the phone in order to assess the direction of the path' (p.\ 5).


### Discussion {#ReadWalkingTextingDiscussion}

The *Discussion* includes the following statements:

* The researchers concluded that 'the use of a cell phone while walking significantly affects gait parameters, causing a decrease in walking velocity and a reduction in stride length' (p.\ 6), thus providing answers to the RQs. 
* The researchers state (p.\ 6) that 'This proves that texting on a cell phone has a major impact on gait'.
However, the research does *not* prove anything based on one of countless possible samples.
* The researchers listed limitations\index{Limitations!research design (effectiveness)} of the study, including that the sentence typed by the subjects was unchanged in both Trials\ 3 and\ 4; hence, Trial\ 4 may have been easier than the previous trial as the sentence was familiar.
* The researchers made recommendations: 'that the selection of the order of trials and the sentence to be typed on the cell phone be randomized' (p.\ 7).\index{Confounding!random allocation}


## Chapter summary {#Read-ChapterSummary}

The six steps of research can be used as a scaffold for critiquing research articles.
Starting by reading the *Abstract* (or *Summary*, or *Overview*) for an overview, then the *Discussion*, and then skim the rest of the article (perhaps focusing on graphs and tables of results).
If necessary, read the article for details if needed.


## Quick review questions {#Read-QuickReview}


::: {.webex-check .webex-box}
Are these statements true or false?

1. Reading an article thoroughly, from start to finish, is the best approach.\tightlist
`r if( knitr::is_html_output() ) {torf( FALSE)}`
1. The six steps of research are a useful scaffold for critiquing an article.
`r if( knitr::is_html_output() ) {torf( TRUE) }`
1. Critiquing an article means to focus on finding all the problems.
`r if( knitr::is_html_output() ) {torf( FALSE)}`
:::


## Exercises {#ReadExercises}

[Answers to odd-numbered exercises] are given at the end of the book. 

`r if( knitr::is_latex_output() ) "\\captionsetup{font=small}"`

::: {.exercise #ReadExerciseiPhoneStepCounts}
@duncan2018walk examined the accuracy of step counts, as recorded on iPhones.
The article states that participants

> ... were recruited through word of mouth and posters displayed around the [researcher's] university.
> Participants were eligible if they were ambulatory, $\ge 18$ years of age, and owned an iPhone 6 [...] or newer model.

1. How would you describe the sampling method?
What is the implication?
2. How would you describe  the information given about the subjects needing to be ambulatory and 18 years of age or over?

Although $33$ participants were selected, the authors note some parts of the study used a smaller sample size because one subject lost their phone, while others chose to withdraw from the study.

3. Why did the authors discuss these changes in sample size for some parts of the study?

The article notes that previous studies have been able to:

> ... demonstrate the accuracy of the iPhone pedometer function in laboratory test conditions. However, no studies have attempted to evaluate evidence [...] in the field.

4. Describe the issue that the authors raise with previous studies, using the language in this book.
5. Among many other things, the researchers compared the *mean difference* between the number of step counts recorded by manually counting steps (mean: $92.6$) and the iPhone-recorded number of steps (mean: $85.4$). 
What statistical test would be appropriate?
6. What hypotheses are being tested?
7. While walking at $2.5\kms$.h^$-1$^, the above statistical test resulted in $t = 2.95$.
What is the approximate $P$-value?
Interpret the results.
8. The sample size for the analysis mentioned above was $n = 32$. 
Is the test statistically valid?
:::


::: {.exercise #ReadExerciseHeadphones}
@mohammadpoorasl2019prevalence studied the relationship between hearing loss, and headphone and earphone use in Iranian students, using a non-directional study.\index{Study types!directionality!non-directional}
The article states:

> ... $890$ students were randomly selected from five schools at <span style="font-variant:small-caps;">qums</span> [...] using a proportional cluster sampling method...

Only $866$ of the $890$ students agreed to participated in the study; of these, $745$ used *ear*phones.
The participants completed a hearing test and a Hearing Loss Questionnaire (HLQ; values between $17$ and $34$; higher scores indicating more severe hearing loss).

1. What is the population?
2. Is this an observational or experimental study?
3. Critique the sampling method.
What is the implication for interpreting the results of the study?

One question in the HLQ is:

> Does a hearing problem cause you difficulty when listening to TV or radio?

4. What is a potential problem with this question?
5. Compute the $95$% confidence interval for the proportion of students who had used earphones.

Some results are presented in Table\ \@ref(tab:HearingLossTable).

6. What statistical test was appropriate for comparing the mean scores for males and females?
7. What are the hypotheses being tested?
8. What is the standard error for the *difference* between the means?
9. Perform the hypothesis tests; what do the results mean?
10. Compute the approximate $95$% confidence interval for the difference between the means.
11. Are the test and the CI statistically valid?

Table\ \@ref(tab:HearingLossTable) also compares the HLQ scores for the frequency of *earphone* use specifically.

12. What are the hypotheses being tested?
13. Why is the sample size for this comparison only $791$ and not $845$?
14. Interpret the $P$-value for this test; what do the results mean?

Table\ \@ref(tab:HearingLossTable) also compares the HLQ scores for those who use and do not use *earphones*.

15. Form an approximate $95$%\ CI for the mean hearing loss score for students who use earphones. 
16. Compute the standard error of the *difference* between the mean hearing loss score for students who use and do earphones.
17. Perform a hypothesis test to compare the *difference* between the mean hearing loss score for students who use and do not use earphones, and confirm that the $P$-value is indeed very small. 
:::

<!-- http://jhealthscope.com/en/articles/65901.html -->


```{r HearingLossTable}
HearingTable <- array( dim = c(7, 5) )

HearingTable[, 1] <- c("Female", 
                       "Male", 
                       "Yes", 
                       "No", 
                       "$0$, $1$ times/day", 
                       "$2$ to $3$ times/day", 
                       "More than $3$ times/day")
HearingTable[, 2] <- c(543, 302, 745, 100, 194, 319, 278)
HearingTable[, 3] <- c(19.37, 19.99, 19.8, 19.0, 19.2, 19.6, 20.2)
HearingTable[, 4] <- c(2.91, 3.51, 3.08, 1.71, 2.87, 2.66, 3.54)
HearingTable[, 5] <- c("$0.009$", NA, "$< 0.001$", NA, "$0.001$", NA, NA)
tab.cap.HL <- "The Hearing Loss Questionnaire scores for various demographic variables."

# Swap rows around so the space inserted by kable is in a useful place
HearingTable[ c(1, 2, 6, 7, 3, 4, 5), ] <- HearingTable

colnames(HearingTable) <- c("Levels", 
                            "Sample size", 
                            "Mean", 
                            "Std dev.", 
                            "$P$-value")

if( knitr::is_latex_output() ) {
  kable(pad(HearingTable, 
            surroundMaths = TRUE, 
            targetLength = c(0 , 3 , 5 , 4 , 0), 
            decDigits = c(0 , 0 , 2 , 2 , 0)),
        format = "latex",
        longtable = FALSE,
        booktabs = TRUE,
        escape = FALSE,
        align = c("r", "c", "c", "c", "c"),
        linesep = c("", "\\addlinespace", 
                    "", "", "\\addlinespace",
                    "",""), # Otherwise adds a space after five lines... which looks odd when there are only six
        col.names = colnames(HearingTable),
        caption = tab.cap.HL) %>%
    add_header_above( c(" " = 2,
                        "HLQ" = 2,
                        " " = 1),
                      bold = TRUE,
                      line = TRUE) %>%
    row_spec(row = 0, 
             bold = TRUE) %>%
    kable_styling(font_size = 8) %>%
    pack_rows("Sex",
              start_row = 1,
              end_row = 2) %>%
    pack_rows("Frequency of earphone use",
              start_row = 3,
              end_row = 5) %>%
    pack_rows("Earphone use",
              start_row = 6,
              end_row = 7)
}

if( knitr::is_html_output() ) {
  HearingTable[6, 5] <- "less than $0.001$"
  
  kable(pad(HearingTable, 
            surroundMaths = TRUE, 
            targetLength = c(0 , 3 , 5 , 4 , 0), 
            decDigits = c(0 , 0 , 2 , 2 , 0)),
        format = "html",
        longtable = FALSE,
        booktabs = TRUE,
        align = "c",
        escape = FALSE,
        col.names = colnames(HearingTable),
        caption = tab.cap.HL ) %>%
    add_header_above( c(" " = 2,
                        "HLQ" = 2,
                        " " = 1),
                      bold = TRUE,
                      line = TRUE) %>%
    column_spec(column = 1, 
                bold = TRUE) %>%
    row_spec(row = 0, 
             bold = TRUE) %>%
    pack_rows("Sex",
              start_row = 1,
              end_row = 2) %>%
    pack_rows("Frequency of earphone use",
              start_row = 3,
              end_row = 5) %>%
    pack_rows("Earphone use",
              start_row = 6,
              end_row = 7)
}
```


::: {.exercise #ReadExerciseQuakes}
@mesrkanlou2023effect studied the effect of an earthquake on pregnant mothers in Varzaghan, Iran (p.\ 2), using:

> ... $1000$ cases of pregnant women living in urban and rural areas of Varzaghan city that consisted of $550$ pre-earthquake and $450$ post-earthquake cases.

The researchers compared the mothers in the two groups (pre- and post-earthquake) on various measurements.
For example, the mean age of mothers in the *pre*-group was $25.82$\ y, and the mean age of the mothers in the *post*-group was $26.71$\ y; the difference has a $P$-value of $0.084$.

1. What does this result *mean*?
2. *Why* did the researchers make this comparison between the mothers' ages in two groups?
3. What type of hypothesis test was used to make this conclusion?

The researchers also compared the mean birth weights of the babies born to the mothers in the two groups.
In the *pre*-group, the mean birth weight was $3.25\kgs$ ($s = 0.52$) and in the *post*-group the mean birth weight was $3.18\kgs$ ($s = 0.54$).

4. Compute the standard error for comparing the *difference* between the two means.
5. Perform a hypothesis test to compare the mean birthweights.
Interpret the results.
6. The two-tailed $P$-value for this test as given as $0.001$.
Is this consistent with your calculations?

The researchers also compared the percentage of babies with a Low Birth Weight (LBW; less than $2.5\kgs$).
For the *pre*-group, the percentage was $6.01$%; for the *post*-group, the percentage was $8.92$%.
<!-- The $P$-value is given as $< 0.001$. -->

7. What *type* of definition is given for LBW?
8. Construct the $2\times 2$ table for displaying these data.
9. What type of test was probably used for this comparison?
10. For the test, $\chi^2 = 3.052$.
Deduce the equivalent $z$-score and the approximate $P$-value.
11. What limitations can you identify for this study?
:::


::: {.exercise #ReadExerciseSelinium}
@tracy1990selenium studied the selenium (Se) concentration in irrigation and stock water sources in California.
For drinking water, the maximum recommended concentration was $10$\ $\mu$g.L^$-1$^; for irrigation water, the maximum recommended concentration was $20$\ $\mu$g.L^$-1$^

Part of the study examined the area within $5\kms$ of wells.
When Pliocene rocks were within this radius, the relationship between the Se concentration $y$ in the water and the electrical conductivity of the water $x$ (in deciSiemens per meter, dS.m^$-1$^) was $\hat{y} = -3.1 + 7.0x$, where $R^2 = 27$%.

1. Interpret the meaning of $R^2$.
2. What is the value of the correlation coefficient?
3. The $P$-value for testing the slope is $P < 0.001$. 
Interpret what this means in this context.
4. What are the measurement units of the slope?

For the $n = 151$ wells in the study, Table\ \@ref(tab:seleniumTable) shows the selenium concentration of the water and the geology within $5\kms$ of the well.

5. What hypotheses are being tested by the table?
6. The article states that $\chi^2 = 31.5$.
What is the equivalent $z$-score for the test?
7. What is the approximate $P$-value for the test?
Interpret what this means.
:::

```{r seleniumTable}
seleniumTab <- array(dim = c(2, 2))

seleniumTab[1, ] <- c(78, 15)
seleniumTab[2, ] <- c(23, 35)

rownames(seleniumTab) <- c("Se concentration $\\le 2$ $\\mu$g.L$^{-1}$",
                           "Se concentration $> 2$ $\\mu$g.L$^{-1}$")
colnames(seleniumTab) <- c("No", 
                           "Yes")

if( knitr::is_latex_output() ) {
  kable(pad(seleniumTab,
            surroundMaths = TRUE,
            targetLength = c(2, 2),
            decDigits = 0),
        format = "latex",
        booktabs = TRUE,
        col.names = c("Plioscene rocks present",
                      "Plioscene rocks not present"),
        longtable = FALSE,
        escape = FALSE,
        align = "c",
        caption = "Number of wells with dissolved Se above $2$ $\\mu$g.L$^{-1}$, and the geology within $5$\\kms."
  ) %>%
    column_spec(1, bold = TRUE)  %>%
    row_spec(0, bold = TRUE) %>%
    kable_styling(font_size = 8)
}


if( knitr::is_html_output() ) {
  rownames(seleniumTab) <- c("Se concentration $\\le 2$ $\\mu$g.L$^{-1}$",
                             "Se concentration greater than $2$ $\\mu$g.L$^{-1}$")

  kable(pad(seleniumTab,
            surroundMaths = TRUE,
            targetLength = c(2, 2),
            decDigits = 0),
        format = "html",
        booktabs = TRUE,
        longtable = FALSE,
        align ="c",
        caption = "Number of wells with dissolved Se above $2$ $\\mu$g.L$^{-1}$, and the geology within $5$\\kms radius of the well.")
}   
```

```{r}
# seleniumBars <- c(69, 27, 29, 15, 5, 2, 0, 1)
# 
# barplot( seleniumBars,
#          space = 0,
#          width = c(1, 2, 7, 10, 30, 30, 70, 120),
#          names.arg = c("0 to under 1",
#                        "1 to under 3",
#                        "3 to under 10",
#                        "10 to under 20",
#                        "20 to under 50",
#                        "50 to under 80",
#                        "80 to under 150",
#                        "150 to under 270")
# )

```


::: {.exercise #ReadExerciseLarva}
@russell2023difference compared the larvae of two types of mosquitoes: *Ae. albopictus* (an invasive specie) and *Cx. pipiens* (a native species).
One study compared the survival rates of the larvae at two temperatures (p.\ 4).
At $15$^o^C and $25$^o^C, the survival rates were $86.8$% and $86.1$%, respectively.
The papers states that these survival rates 'did not differ significantly', and quoted a $P$-value of $P =  0.8076$.

1. What type of test was probably used?
2. Interpret what the $P$-value means in this context.

The researchers also compared the size of the surviving larvae (p.\ 4 and\ 5) in the two control groups, using a two sample $t$-test.
They found that control larvae were 'significantly larger' for *Cx. pipiens* compared to *Ae. albopictus* larvae.
The paper then gives this information:

> $\text{mean}\pm\text{SD}$: *Cx. pipiens*${} = 1.64 \pm 0.18\mms$, *Ae. albopictus*${} = 1.36 \pm 0.13\mms$; $\text{$p$-value} =< .0001$.

The two sample sizes are $n = 410$ and $n = 498$ respectively.

3. How would these results be interpreted?
4. What type of test would probably have been used?
5. Compute the standard error for the difference between the two types of mosquitoes.
6. Compute the $t$-score and approximate $P$-value for the test. 
What does the mean?
7. Is the $P$-value in the article consistent with your calculations?
8. Is the test statistically valid?

The length of the surviving larvae from both species were compared for the two temperatures also (p.\ 5):
For surviving *Cx. pipiens*, control larvae were larger at $15$^o^C compared to $25$^o^C. 
The paper reports:
  
> $\text{mean} \pm \text{SD}$: $15$^o^C ${}= 1.66 \pm 0.01\mms$, $25$^o^C ${} = 1.60 \pm 0.02\mms$; $\text{$p$-value} = .0065$. 

For surviving *Ae. albopictus*, control larvae were larger at $15$^o^C compared to $25$^o^C.
The paper reports:
  
> $\text{mean} \pm \text{SD}$: $15$^o^C ${}= 1.66 \pm 0.01\mms$, $25$^o^C ${} = 1.60 \pm 0.02\mms$; $\text{$p$-value} = .0065$. 

9. How would these results be interpreted?
10. What type of test would probably have been used?

*Megacyclops viridis* (a copepod) preys on the larvae.
The linear association between predation efficiency ($y$; as a percentage) and predator--prey size-ratio ($x$; no units) was found (using $n = 45$) to be $\hat{y} = -19.56 + 31.64x$.
The standard errors of the two regression coefficients were $17.92$ (intercept) and $13.88$ (slope).

11. Find an approximate $95$% confidence interval for each regression parameter.
12. Estimate the $P$-value for testing if the population slope is zero. 
Interpret what this means.
13. Is this test statistically valid?
14. Interpret the meaning of the slope.
15. The value of $R^2$ was given as $0.087$ (i.e., $8.7$%).
Interpret this value.
16. Find the value of the correlation coefficient, $r$.
:::


::: {.exercise #ReadExerciseMouth}
@li2017normal studied the maximum mouth opening (MMO; in mm) for $452$ Chinese adults aged from $20$ to\ $35$.

1. Would the individuals in the study have been blinded?
Explain.
What are the implications?

The correlation between height and MMO was given as $r = 0.54$ with $P < 0.001$.

2. What does this *mean*?
3. Compute and interpret the value of $R^2$.

The regression equation relating the height $x$ (in cm) and MMO $y$ was given as $\hat{y} = 0.36x - 10.15$.

4. Interpret the estimates of the regression parameters.
5. Use the regression equation to predict the MMO for a person $179\cms$ tall.

The mean MMO of males was $54.18\mms$ ($s = 5.21$), and for females was $49.62\mms$ ($s = 3.69$).

6. What *type* of hypothesis tests was used to compare the mean MMO for males and females?
7. The $t$-score for comparing MMO for males and females is $t = 10.63$.
What is the $P$-value?
8. Is this result statistically valid?
9. What is the meaning of this comparison?
10. Is gender likely to be a confounding variable in this regression analysis?
Explain carefully.

The authors state one of the limitations as:

> First, participants were recruited from a pool of people who were undergoing regular medical examinations in our hospital [...]

11. What does this mean?
What are the implications?
Are there other limitations?
:::


::: {.exercise #ReadExerciseTomatoes}
@drinkwater1995fundamental compared tomatoes growing on conventional (CNV; $n = 14$) and organic (ORG; $n = 17$) farms.
Between\ 1989 and\ 1990, the researchers sampled tomato fields during April and September (p.\ $1\,100$).
An area between $0.04$ and\ $0.1\has$ was set aside within each field for collecting data.
Each area was divided into $20$\ sections then, a $1\ms$ row was selected at random within each section to be sampled.

1. Explain what type of sampling is being used.

One important measure of soil health is the number of actinomycetes.
When comparing ORG and CNV, the researchers found that the (p.\ $1\,103$):

> ... total numbers of actinomycetes [...] were significantly larger in the ORG soils [...] (Student's $t$ test, $t = 5.4$, $P = 0.006$)...

2. What type of test would probably have been used to reach this conclusion?
3. Explain what the results mean.
4. Are the results statistically valid?

The researchers also found that (p.\ $1\,103$):

> ... starch hydrolyzing actinomycetes were more numerous in CNV [...] (Student's $t$ test, $t = 4.0$, $P = 0.005$).

5. What type of test would probably have been used to reach this conclusion?
6. Explain what these results mean.

They also found that (p.\ $1\,103$):

> Total actinomycete abundance [was] negatively correlated with corky root [a disease] severity ($r = -0.76$, $P = 0.08$;...).

7. Explain what these results mean.
8. Compute and interpret the value of $R^2$.
:::


::: {.exercise #ReadExerciseEyeMasks}
@teo2022eye studied pregnant Malaysian women with sleeping disruptions in the last month of pregnancy.
The $56$ patients were (p.\ 1):

> ... randomized to the use of eye-mask and earplugs or "sham" headbands during night sleep (both introduced as sleep aids). 

Thus, two groups were used: one using eye-masks and earplugs (treatment group, T; $n = 29$) and one using sham headbands (control or placebo group, P; $n = 27$).

1. What was the purpose of using 'sham' headbands if it was an ineffective intervention?
2. What type of study is this: experimental or observational?
Explain.

Sleep duration was measured in Week\ 1 (no intervention) and again in Week\ 2 (with the allocated intervention) for each subject, using a 'wrist actigraphy monitor'.

3. Why is using a 'wrist actigraphy monitor' better than self-reported sleep duration?

The women in the two groups were compared.
For example, the mean age of the women was $30.6$\ y ($s = 3.6$) (T) and $30.1$\ y ($s = 3.3$) (P); the $P$-value for the comparison was given as $P = 0.56$.

4. *Why* was this comparison made?
5. Compute the standard error for the difference between the two mean sleep durations.
6. Compute the $t$-score for the test.
7. Is the quoted $P$-value consistent with your calculations?
What do these results mean?
8. Is the result statistically valid?

Another comparison was the room 'condition' where the women slept: in the treatment group, $13$ had a room with a fan ($16$ had air conditioning), while in the control group $10$ women had a fan (and $17$ air conditioning).
The $P$-value for the comparison was given as $P = 0.60$.

9. *Why* was this comparison made?
10. Construct the $2\times 2$ table summarising the data.
11. The $\chi^2$-score for the test is $0.35064$.
<!-- chisq.test( matrix( c(13,10,16,17), ncol=2), correct=FALSE)-->
Compute the equivalent $z$-score.
Interpret the results.
12. Is the quoted $P$-value consistent with your calculations?
13. Is the result statistically valid?

In the *treatment* group, the mean sleep duration in Week\ 1 was $279.0\mins$ ($s = 18.9$) and in Week\ 2 was $303.6\mins$ ($s = 18.8$).
The *increase* was $24.7\mins$ ($s = 14.9$).

14. Test if sleep duration *increased* in the treatment group.
Interpret the results mean.

In the *control* group, the mean sleep duration in Week\ 1 was $286.3\mins$ ($s = 20.9$) and in Week\ 2 was $301.9\mins$ ($s = 21.8$; $n = 26$).
The *increase* was $18.1\mins$ ($s = 17.3$).

15. Test if sleep duration *increased* in the control group.
Interpret the results.
16. Why would sleep duration *increase*, if the control group used an ineffective intervention? 

The *increase* in sleep duration can be compared for the two groups.

17. Compute the standard error for difference between the mean increases $\text{s.e.}( \bar{x}_T - \bar{x}_T)$.
18. Compare the increase in sleep duration for the two groups.
Interpret the results.
19. Is the test statistically valid?
:::


<!-- QUICK REVIEW ANSWERS -->
`r if (knitr::is_html_output()) '<!--'`
::: {.EOCanswerBox .EOCanswer data-latex="{iconmonstr-check-mark-14-240.png}"}
**Answers to *Quick Revision* questions:**
**1.** False.
**2.** True.
**3.** False.
:::
`r if (knitr::is_html_output()) '-->'`