# Selecting a test {#SelectTest}
Selecting the correct hypothesis test (or confidence interval) can be tricky, and this book describes only a small number of possible scenarios.
For the situations studied in this book, a key step is determining whether the response and explanatory *variables* are qualitative or quantitative (Table \@ref(tab:InferenceTestCI)).
So far, only situations with a *qualitative* explanatory variable (or no explanatory variable) have been considered.
The next three chapters study cases where both the response and explanatory variables are *quantitative*.
**Appendix \@ref(StatisticsAndParameters)** may also prove useful.
```{r InferenceTestCI}
Scenarios <- array( dim = c(5, 4) )
colnames(Scenarios) <- c("Graphical summary",
"Numerical summary",
"Hypothesis test",
"Confidence interval")
if( knitr::is_latex_output() ) {
Scenarios[1, ] <- c("\\stackunder{Bar chart;}{pie chart}",
"\\stackunder{Counts;}{\\stackunder{percentages;}{odds}}",
"One-sample $z$-test",
"CI for one proportion")
Scenarios[2, ] <- c("\\stackunder{Histogram;}{\\stackunder{stemplot;}{dot chart}}",
"\\stackunder{Means, medians;}{\\stackunder{Std. dev., IQR;}{outliers, etc.}}",
"One-sample $t$-test",
"CI for one mean")
Scenarios[3, ] <- c("\\stackunder{Histogram of \\emph{differences};}{case-profile}",
"\\stackunder{Mean, median of differences;}{\\stackunder{Std. dev., IQR of differences;}{outliers, etc.}}",
"$t$-test for mean differences",
"CI for mean difference")
Scenarios[4, ] <- c("\\stackunder{Boxplot;}{error bar chart}",
"\\stackunder{Difference between means;}{\\stackunder{SE of difference;}{Summary of both groups}}",
# "Mean and std. error of the difference; mean, std. dev. etc. of \\emph{each} group",
"$t$-test for the difference between two means",
"CI of the difference between two means")
Scenarios[5, ] <- c("\\stackunder{Side-by-side bar chart;}{stacked bar chart}",
"\\stackunder{Odds;}{\\stackunder{OR;}{percentages}}",
"Chi-square test",
"CI for ORs")
kable(Scenarios,
booktabs = TRUE,
longtable = FALSE,
escape = FALSE,
caption = "Five different scenarios studied so far",
format = "latex") %>%
kable_styling("striped",
full_width = TRUE,
font_size = 9) %>%
pack_rows("Proportion in one sample",
start_row = 1,
end_row = 1,
bold = FALSE,
italic = TRUE) %>%
pack_rows("Mean of one sample",
start_row = 2,
end_row = 2,
bold = FALSE,
italic = TRUE,
hline_before = TRUE) %>%
pack_rows("Mean of differences (paired data)",
start_row = 3,
end_row = 3,
bold = FALSE,
italic = TRUE,
hline_before = TRUE) %>%
pack_rows("Comparing means in two groups",
start_row = 4,
end_row = 4,
bold = FALSE,
italic = TRUE,
hline_before = TRUE) %>%
pack_rows("Comparing odds/percentages in two groups",
start_row = 5,
end_row = 5,
bold = FALSE,
italic = TRUE,
hline_before = TRUE) %>%
row_spec(0, bold = TRUE)
}
if( knitr::is_html_output() ) {
Scenarios[1, ] <- c("[Bar chart; pie chart](#GraphsOneQual)",
"[Counts; percentages; odds](#NumericalQual)",
"[One-sample $z$-test](#TestOneProportion)",
"[CI for one proportion](#CIOneProportion)")
Scenarios[2, ] <- c("[Histogram; stemplot; dot chart](#GraphsOneQuant)",
"[Means, medians; Std. dev., IQR; etc.](#NumericalQuant)",
"[One-sample $t$-test](#TestOneMean)",
"[CI for one mean](#OneMeanConfInterval)")
Scenarios[3, ] <- c("[Histogram of *differences*](#HistoDiffPlot); [case-profile](#CaseProfilePlot)",
"[Mean, std. dev. etc. of *differences*](#NumericalQuant)",
"[$t$-test for mean differences](#TestPairedMeans)",
"[CI for mean difference](#PairedCI)")
Scenarios[4, ] <- c("[Error bar chart](#ErrorBarCharts)",
"[Mean and std. error of the difference; mean, std. dev. etc. of *each* group](#NumericalQuant)",
"[$t$-test for the difference between two means](#TestTwoMeans)",
"[CI of the difference between two means](#CITwoMeans)")
Scenarios[5, ] <- c("[Side-by-side bar chart; stacked bar chart](#TwoQualVars)",
"[Odds; OR; percentages](#NumericalQual)",
"[Chi-square test](#TestsOddsRatio)",
"[CI for ORs](#OddsRatiosCI)")
kable(Scenarios,
booktabs = TRUE,
longtable = FALSE,
escape = FALSE,
caption = "Five different scenarios studied so far",
format = "html") %>%
kable_styling("striped",
full_width = TRUE) %>%
column_spec(column = 1, width = "25mm") %>%
column_spec(column = 2, width = "25mm") %>%
column_spec(column = 3, width = "25mm") %>%
column_spec(column = 4, width = "25mm") %>%
pack_rows("Proportion in one sample",
start_row = 1,
end_row = 1,
bold = FALSE,
italic = TRUE) %>%
pack_rows("Mean of one sample",
start_row = 2,
end_row = 2,
bold = FALSE,
italic = TRUE) %>%
pack_rows("Mean of differences (paired data)",
start_row = 3,
end_row = 3,
bold = FALSE,
italic = TRUE,
hline_before = TRUE) %>%
pack_rows("Comparing means in two groups",
start_row = 4,
end_row = 4,
bold = FALSE,
italic = TRUE,
hline_before = TRUE) %>%
pack_rows("Comparing odds/percentages in two groups",
start_row = 5,
end_row = 5,
bold = FALSE,
italic = TRUE,
hline_before = TRUE)
}
```
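The rows of Table \@ref(tab:InferenceTestCI) can be read as a simple decision rule based on the type of each variable. The R code below is a minimal sketch of that rule, for illustration only: the function `select_test()` and its arguments are hypothetical (they are not part of R or of any package), and the sketch covers only the five scenarios in the table.

```r
# Hypothetical helper (illustration only): encodes the five scenarios in the
# table as a decision rule based on variable types.
#   response:    "qualitative" or "quantitative"
#   explanatory: "none", "qualitative" or "quantitative"
#   paired:      TRUE when each unit of analysis is measured twice
#                (only used when the response is quantitative)
select_test <- function(response, explanatory = "none", paired = FALSE) {
  response    <- match.arg(response, c("qualitative", "quantitative"))
  explanatory <- match.arg(explanatory, c("none", "qualitative", "quantitative"))

  if (explanatory == "quantitative") {
    stop("Quantitative explanatory variables are studied in later chapters")
  }

  if (response == "qualitative") {
    # One proportion, or comparing odds/percentages in two groups
    if (explanatory == "none") return("One-sample z-test")
    return("Chi-square test")
  }

  # From here on, the response is quantitative
  if (paired) return("t-test for mean differences")
  if (explanatory == "none") return("One-sample t-test")
  "t-test for the difference between two means"
}

# Usage
select_test("quantitative", "qualitative", paired = TRUE)
# "t-test for mean differences"
select_test("quantitative", "qualitative")
# "t-test for the difference between two means"
select_test("qualitative", "qualitative")
# "Chi-square test"
```

The sketch deliberately stops with an error for a quantitative explanatory variable, since those cases are not yet covered.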
::: {.thinkBox .think data-latex="{iconmonstr-light-bulb-2-240.png}"}
1. Suppose researchers compare the average number of hours of exercise per week for the same office workers, both in summer and in winter, to see if the averages are different.
What would be a suitable test?
`r if( knitr::is_html_output() ) {
mcq( c(answer = "A paired samples t-test (for a mean difference)",
"A chi-squared test (to compare two proportions)",
"A two-sample t-test (to compare the means of the two groups)"))}`
2. Suppose we wish to compare the number of hours of sunlight exposure per day for female and male teachers.
What would be a suitable test?
`r if( knitr::is_html_output() ) {
mcq( c("A paired samples t-test (for a mean difference)",
"A chi-squared test (to compare two proportions)",
answer = "A two-sample t-test (to compare the means of the two groups)"))}`
3. Suppose researchers wish to compare the proportion of trees with koalas in them for trees more than 10 metres tall and trees 10 metres tall or shorter.
What would be a suitable test?
`r if( knitr::is_html_output() ) {
mcq( c("A paired samples t-test (for a mean difference)",
answer = "A chi-squared test (to compare two proportions)",
"A two-sample t-test (to compare the means of the two groups)"))}`
4. Suppose researchers want to compare the number of hours spent on social media by people aged over 30 with people aged 30 and under.
What would be a suitable test?
`r if( knitr::is_html_output() ) {
mcq( c("A paired samples t-test (for a mean difference)",
"A chi-squared test (to compare two proportions)",
answer = "A two-sample t-test (to compare the means of the two groups)"))}`
`r if (!knitr::is_html_output()) '<!--'`
`r webexercises::hide()`
To select the correct test, it is important to know how many
`r if( knitr::is_html_output() ) { mcq( c("observations", answer = "variables"))}`
are measured, observed, or recorded on each unit of
`r if( knitr::is_html_output() ) { mcq( c("observation", answer = "analysis"))}`, and what type they are.
If one quantitative variable is recorded, we can conduct a test about the
`r if( knitr::is_html_output() ) { mcq( c(answer = "mean", "proportions"))}`.
If two variables are recorded, several options are possible.
If **both** variables are qualitative, we could use a
`r if( knitr::is_html_output() ) { mcq( c("t-test", answer = "chi-squared test"))}`
to compare the odds (or the proportions) in the two groups.
If one variable is qualitative and one is quantitative, we could use a
`r if( knitr::is_html_output() ) { mcq( c(answer = "t-test", "chi-squared test"))}` to compare the
`r if( knitr::is_html_output() ) { mcq( c(answer = "means", "odds"))}` in the two groups.
If the **change** in the value of a quantitative variable is of interest, the data are **paired**, so we could use a $t$-test based on the
`r if( knitr::is_html_output() ) { mcq( c( "means", answer = "mean difference", "odds"))}`.
`r if (!knitr::is_html_output()) '-->'`
`r webexercises::unhide()`
:::
`r if (knitr::is_html_output()){
'The following short video may help explain some of these concepts. Note that the tests for correlation and regression have not yet been covered in this book; they are covered in the next few chapters.'
}`
<div style="text-align:center;">
```{r}
htmltools::tags$video(src = "./videos/SelectTest.mp4",
width = "550",
controls = "controls",
loop = "loop",
style = "padding:5px; border: 2px solid gray;")
```
</div>