
# Selecting a test {#SelectTest}

Selecting the correct hypothesis test (or confidence interval) can be tricky, and this book describes only a small number of the possible scenarios.
For the situations studied in this book, determining if the response and explanatory *variables* are qualitative or quantitative is important (Table \@ref(tab:InferenceTestCI)).

So far, only situations with a *qualitative* explanatory variable have been considered.
In the next three chapters, cases where both the response and explanatory variables are *quantitative* are studied.
**Appendix \@ref(StatisticsAndParameters)** may also prove useful.


```{r InferenceTestCI}
Scenarios <- array( dim = c(5, 4) )
colnames(Scenarios) <- c("Graphical summary",
                         "Numerical summary",
                         "Hypothesis test",
                         "Confidence interval")

if( knitr::is_latex_output() ) {
  Scenarios[1, ] <- c("\\stackunder{Bar chart;}{pie chart}",
                      "\\stackunder{Counts;}{\\stackunder{percentages;}{odds}}",
                      "One-sample $z$-test",
                      "CI for one proportion")
  Scenarios[2, ] <- c("\\stackunder{Histogram;}{\\stackunder{stemplot;}{dot chart}}",
                      "\\stackunder{Means, medians;}{\\stackunder{Std. dev., IQR;}{outliers, etc.}}",
                      "One-sample $t$-test",
                      "CI for one mean")
  Scenarios[3, ] <- c("\\stackunder{Histogram of \\emph{differences};}{case-profile}",
                      "\\stackunder{Mean, median of differences;}{\\stackunder{Std. dev., IQR of differences;}{outliers, etc.}}",
                      "$t$-test for mean differences",
                      "CI for mean difference")
  Scenarios[4, ] <- c("\\stackunder{Boxplot;}{error bar chart}",
                      "\\stackunder{Difference between means;}{\\stackunder{SE of difference;}{Summary of both groups}}",
#                      "Mean and std. error of the difference; mean, std. dev. etc. of \\emph{each} group",
                      "$t$-test for the difference between two means",
                      "CI of the difference between two means")
  Scenarios[5, ] <- c("\\stackunder{Side-by-side bar chart;}{stacked bar chart}",
                      "\\stackunder{Odds;}{\\stackunder{OR;}{percentages}}",
                      "Chi-square test",
                      "CI for ORs")

  kable(Scenarios,
        booktabs = TRUE,
        longtable = FALSE,
        escape = FALSE,
        caption = "Five different scenarios studied so far",
        format = "latex") %>%
     kable_styling("striped", 
                   full_width = TRUE, 
                   font_size = 9) %>%
    
     pack_rows("Proportion in one sample",                         
                start_row = 1, 
                end_row = 1, 
                bold = FALSE, 
                italic = TRUE) %>%
     pack_rows("Mean of one sample",                         
                start_row = 2, 
                end_row = 2, 
                bold = FALSE, 
                italic = TRUE,
                hline_before = TRUE) %>%
     pack_rows("Mean of differences (paired data)",          
               start_row = 3, 
               end_row = 3, 
               bold = FALSE, 
               italic = TRUE, 
               hline_before = TRUE) %>%
     pack_rows("Comparing means in two groups",   
               start_row = 4, 
               end_row = 4, 
               bold = FALSE, 
               italic = TRUE, 
               hline_before = TRUE) %>%
     pack_rows("Comparing odds/percentages in two groups",              
               start_row = 5, 
               end_row = 5, 
               bold = FALSE,
               italic = TRUE,
               hline_before = TRUE) %>%
     row_spec(0, bold = TRUE)

}                 
                    
if( knitr::is_html_output() ) {
  
  Scenarios[1, ] <- c("[Bar chart; pie chart](#GraphsOneQual)",
                      "[Counts; percentages; odds](#NumericalQual)",
                      "[One-sample $z$](#TestOneProportion)",
                      "[CI for one proportion](#CIOneProportion)")
  Scenarios[2, ] <- c("[Histogram; stemplot; dot chart](#GraphsOneQuant)",
                      "[Means, medians; Std. dev., IQR; etc.](#NumericalQuant)",
                      "[One-sample $t$](#TestOneMean)",
                      "[CI for one mean](#OneMeanConfInterval)")
  Scenarios[3, ] <- c("[Histogram of *differences*](#HistoDiffPlot); [case-profile](#CaseProfilePlot)",
                      "[Mean, std. dev. etc. of *differences*](#NumericalQuant)",
                      "[$t$-test for mean differences](#TestPairedMeans)",
                      "[CI for mean difference](#PairedCI)")
  Scenarios[4, ] <- c("[Error bar chart](#ErrorBarCharts)",
                     "[Mean and std. error of the difference; mean, std. dev. etc. of *each* group](#NumericalQuant)",
                     "[$t$-test for the difference between two means](#TestTwoMeans)",
                     "[CI of the difference between two means](#CITwoMeans)")
  Scenarios[5, ] <- c("[Side-by-side bar chart; stacked bar chart](#TwoQualVars)",
                      "[Odds; OR; percentages](#NumericalQual)",
                      "[Chi-square test](#TestsOddsRatio)",
                      "[CI for ORs](#OddsRatiosCI)")

  
  kable(Scenarios,
               booktabs = TRUE,
               longtable = FALSE,
               escape = FALSE,
               caption = "Five different scenarios studied so far",
               format = "html") %>%
      kable_styling("striped", 
                    full_width = TRUE) %>%
      column_spec(column = 1, width = "25mm") %>%
      column_spec(column = 2, width = "25mm") %>%
      column_spec(column = 3, width = "25mm") %>%
      column_spec(column = 4, width = "25mm") %>%
     pack_rows("Proportion in one sample",                         
                start_row = 1, 
                end_row = 1, 
                bold = FALSE, 
                italic = TRUE) %>%
      pack_rows("Mean of one sample",                         
                start_row = 2, 
                end_row = 2, 
                bold = FALSE, 
                italic = TRUE) %>%
      pack_rows("Mean of differences (paired data)",          
                start_row = 3, 
                end_row = 3, 
                bold = FALSE, 
                italic = TRUE, 
                hline_before = TRUE) %>%
      pack_rows("Comparing means in two groups",   
                start_row = 4, 
                end_row = 4, 
                bold = FALSE, 
                italic = TRUE, 
                hline_before = TRUE) %>%
      pack_rows("Comparing odds/percentages in two groups",              
                start_row = 5, 
                end_row = 5, 
                bold = FALSE, 
                italic = TRUE, 
                hline_before = TRUE)
}

```
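
For readers who use R, the five scenarios in Table \@ref(tab:InferenceTestCI) can be sketched with base R's **stats** functions. This is an illustration only, not the software approach used elsewhere in this book, and all of the data are hypothetical values invented for the example. Note that `prop.test()` reports a chi-squared statistic rather than a $z$-score; with `correct = FALSE` and one sample, that statistic is the square of the $z$-score.

```r
# All data below are hypothetical, invented purely for illustration.
heights <- c(1.62, 1.75, 1.68, 1.81, 1.59, 1.70, 1.66, 1.77)  # quantitative
group   <- rep(c("A", "B"), each = 4)                         # qualitative, two levels
before  <- c(72, 80, 65, 90, 77, 84, 69, 75)                  # paired measurements...
after   <- c(70, 78, 66, 85, 74, 83, 65, 72)                  # ...on the same units

# Counts for two qualitative variables (rows: groups; columns: outcome)
counts <- matrix(c(30, 15, 20, 35),
                 nrow = 2,
                 dimnames = list(Group   = c("A", "B"),
                                 Outcome = c("Yes", "No")))

# 1. Proportion in one sample: 12 'successes' from 40, testing p = 0.5
test_prop <- prop.test(x = 12, n = 40, p = 0.5, correct = FALSE)

# 2. Mean of one sample: one-sample t-test, testing a mean of 1.65
test_mean <- t.test(heights, mu = 1.65)

# 3. Mean of differences (paired data): t-test on after - before
test_paired <- t.test(after, before, paired = TRUE)

# 4. Comparing means in two groups: two-sample t-test
#    (R uses the Welch version, which does not assume equal variances, by default)
test_two <- t.test(heights ~ group)

# 5. Comparing odds/percentages in two groups: chi-square test
test_chisq <- chisq.test(counts, correct = FALSE)

# Each object holds the test statistic, P-value and (where relevant) a CI
test_paired$conf.int
```

Each call returns an object of class `htest`; the confidence interval, where one is produced, is stored in the `conf.int` element.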


::: {.thinkBox .think data-latex="{iconmonstr-light-bulb-2-240.png}"}
1. Suppose researchers compare the average number of hours of exercise per week for office workers, both in summer and in winter, to see if the averages are different.  
What would be a suitable test?
`r if( knitr::is_html_output() ) {
	 mcq( c(answer = "A paired samples t-test (for a mean difference)",
 	        "A chi-squared test (to compare two proportions)",
  	      "A two-sample t-test (to compare the means of the two groups)"))}`   


2. Suppose we wish to compare the number of hours of sunlight exposure per day for female and male teachers.  
What would be a suitable test?
`r if( knitr::is_html_output() ) {
	 mcq( c("A paired samples t-test (for a mean difference)",
 	        "A chi-squared test (to compare two proportions)",
  	      answer = "A two-sample t-test (to compare the means of the two groups)"))}`   

3. Suppose researchers wish to compare the proportion of trees with koalas in them, comparing trees more than 10 metres tall with trees 10 metres or shorter.  
What would be a suitable test?
`r if( knitr::is_html_output() ) {
	 mcq( c("A paired samples t-test (for a mean difference)",
 	        answer = "A chi-squared test (to compare two proportions)",
  	      "A two-sample t-test (to compare the means of the two groups)"))}`   

4. Suppose researchers wish to compare the number of hours spent on social media by people aged over 30 and by people aged 30 and under.  
What would be a suitable test?
`r if( knitr::is_html_output() ) {
	 mcq( c("A paired samples t-test (for a mean difference)",
 	        "A chi-squared test (to compare two proportions)",
  	      answer = "A two-sample t-test (to compare the means of the two groups)"))}`   

`r if (!knitr::is_html_output()) '<!--'`
`r webexercises::hide()`
To select the correct test, it is important to know how many 
`r if( knitr::is_html_output() ) { mcq( c("observations", answer = "variables"))}`
are measured, observed, or recorded on each unit of 
`r if( knitr::is_html_output() ) { mcq( c("observation", answer = "analysis"))}`, and what type they are.  
If one quantitative variable is recorded, we can conduct a test about the 
`r if( knitr::is_html_output() ) { mcq( c(answer = "mean", "proportions"))}`.

If two variables are recorded, many options are possible.

If **both** variables are qualitative, we could use a
`r if( knitr::is_html_output() ) { mcq( c("t-test", answer = "chi-squared test"))}`
to compare the odds (or the proportions) in the two groups.

If one variable is qualitative and one is quantitative, we could use a
`r if( knitr::is_html_output() ) { mcq( c(answer = "t-test", "chi-squared test"))}` to compare the 
`r if( knitr::is_html_output() ) { mcq( c(answer = "means", "odds"))}` in both groups.

If the **change** in the value of a quantitative variable is of interest, we have **paired** data so we could use a $t$-test, based on the 
`r if( knitr::is_html_output() ) { mcq( c( "means", answer = "mean difference", "odds"))}`.
`r if (!knitr::is_html_output()) '-->'`
`r webexercises::unhide()`
:::
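
The reasoning used in the questions above can be written as a small decision rule. The sketch below assumes only the five scenarios covered so far; the function `select_test()` and its argument names are hypothetical (they do not come from any package), and real studies often need tests beyond these.

```r
# Hypothetical helper, for illustration only: suggests a test from the
# types of the variables, covering only the five scenarios studied so far.
select_test <- function(response, explanatory = "none", paired = FALSE) {
  response    <- match.arg(response,    c("qualitative", "quantitative"))
  explanatory <- match.arg(explanatory, c("none", "qualitative"))

  if (paired) {
    # Paired data: the change in a quantitative variable is of interest
    if (response != "quantitative") {
      stop("Paired qualitative data are not covered by these scenarios.")
    }
    return("paired t-test (mean difference)")
  }

  if (explanatory == "none") {
    # Only one variable is recorded on each unit of analysis
    if (response == "quantitative") {
      return("one-sample t-test (one mean)")
    }
    return("one-sample z-test (one proportion)")
  }

  # Two variables are recorded, and the explanatory variable is qualitative
  if (response == "quantitative") {
    return("two-sample t-test (difference between two means)")
  }
  "chi-square test (comparing odds or proportions)"
}

# Exercise hours for the same workers in summer and in winter
select_test("quantitative", paired = TRUE)

# Hours of sunlight exposure for female and male teachers
select_test("quantitative", explanatory = "qualitative")

# Presence of koalas in taller and shorter trees
select_test("qualitative", explanatory = "qualitative")
```

The `paired` argument is checked first because pairing changes the analysis even though the same quantitative variable is measured: the test is then based on the differences within each unit rather than on a comparison between two separate groups.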


`r if (knitr::is_html_output()){
  'The following short video may help explain some of these concepts. Note that the tests for correlation and regression have not yet been covered in this book (but they will be in the next few chapters).'
}`


<div style="text-align:center;">
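```{r, eval = knitr::is_html_output()}
# Embed the video only in HTML output, matching the conditional text above
htmltools::tags$video(src = "./videos/SelectTest.mp4", 
                      width = "550", 
                      controls = "controls", 
                      loop = "loop", 
                      style = "padding:5px; border: 2px solid gray;")
```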
</div>