
# Hypothesis tests: one proportion {#TestOneProportion}


<!-- Introductions; easier to separate by format -->
```{r, child = if (knitr::is_html_output()) {'./introductions/26-Testing-OneProportion-HTML.Rmd'} else {'./introductions/26-Testing-OneProportion-LaTeX.Rmd'}}
```


<!-- Define colours as appropriate -->
```{r, child = if (knitr::is_html_output()) {'./children/coloursHTML.Rmd'} else {'./children/coloursLaTeX.Rmd'}}
```


## Introduction: rolling dice {#ProportionTestIntro}
\index{Hypothesis testing!one proportion|(}

<div style="float:right; width: 222px; border: 1px; padding:10px"><img src="OtherImages/SmiffyDice-Rotated.png" width="200px"/></div>


`r if (knitr::is_html_output()) '<!--'`
\begin{wrapfigure}[3]{R}{.30\textwidth} % The first optional input is the number of lines allowed for the image to be placed in
  \centering%
  \vspace{-16pt}% This removes some white space
  \includegraphics[width=.27\textwidth]{OtherImages/SmiffyDice-Rotated.png}%
\end{wrapfigure}
`r if (knitr::is_html_output()) '-->'`


When in a toy store one day (for my children, of course), I saw 'loaded dice' for sale.
The packaging claimed <span style="font-variant:small-caps;">one loaded \& one normal</span>.
I bought two sets!
However, there was no indication as to *which* die was the loaded die.
How could I determine which of the dice was loaded?
That is, how could I make a *decision* about which die was loaded?

For a die that is *not* loaded, the population proportion of rolling any face of the die is $p = 1/6$.
So, for example, the population proportion of rolls that show a
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
is $p = 1/6$, using the classical approach to probability.\index{Probability!classical approach}
In any *sample* of rolls, however, the proportion of rolls showing a
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
would vary due to sampling variation, but would be approximately $\hat{p} = 1/6$ with a fair die.

Suppose I rolled one die a certain number of times (say, $n = 50$\ times), then determined the value of the sample proportion.
The sample proportion of rolls that show a 
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
is unlikely to be *exactly* $1/6$ (the population proportion).
If the observed value of $\hat{p}$ was not exactly $1/6$, two possible reasons could explain this discrepancy:

* I was rolling the *fair* die (with $p = 1/6$), and the discrepancy between the *population* and *sample* proportions was simply due to sampling variation.
* I was rolling the *loaded* die  (with $p \ne 1/6$), and the discrepancy between the *population* and *sample* proportion simply reflected this.

If I observed an unusually small or unusually large sample proportion of rolls that showed a 
`r if (knitr::is_latex_output()) {
   '\\largedice{1},'
} else {
   '<span class="larger-die">&#9856;</span>,'
}`
I would suspect that I had the loaded die, since I would be observing something unusual for a fair die.
This is exactly the decision-making process seen in Chap.\ \@ref(MakingDecisions).

More formally then, the decision-making process (Chap.\ \@ref(MakingDecisions)) could proceed as follows:

* Make an *assumption* about the parameter (Sect.\ \@ref(Assumption)): assume I have a fair die, so that $p = 1/6$, where $p$ is the population proportion of rolls that show a 
`r if (knitr::is_latex_output()) {
   '\\largedice{1}.'
} else {
   '<span class="larger-die">&#9856;</span>.'
}`
* Describe the *expectations* of the statistic (Sect.\ \@ref(ExpectationOf)): describe what value of the *sample* proportion $\hat{p}$ could reasonably be expected from a fair die.
* Take the sample *observations* (Sect.\ \@ref(Observation)): roll the die many times to find a value of $\hat{p}$.
* Make a *decision* (Sect.\ \@ref(MakeDecision)) based on what is observed in the sample.

Using this decision-making process (Fig.\ \@ref(fig:DecisionFlowDice)), I could decide if the die I had rolled seemed to be the fair die.
For one specific die, I am asking the decision-making RQ:

> For this die, is the population proportion of rolls that show a
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
equal to\ $1/6$?


```{r DecisionFlowDice, fig.cap = "A way to make decisions for the dice example.", fig.align="center", out.width='100%', fig.width = 9.5, fig.height = 4}
source("R/showDecisionMaking.R")

showDecisionMaking(populationText = expression( atop(bold(Assume)~the,
                                                     die~is~fair)),
                   expectationText = expression(atop(bold(Expect)~to~find,
                                                     about~1/6~rolls~show~a~1)),
                   oneSampleText = expression( atop(Roll~die,
                                                    many~times) ),
                   oneStatisticText = expression( atop(One~observed,
                                                       value~of~hat(italic(p))) ),
                   showQuestionMark = TRUE
)

```


Answering a decision-making RQ such as this requires a *hypothesis test*.
The process requires being able to describe what value of the *sample* proportion $\hat{p}$ could reasonably be expected from a fair die, with $p = 1/6$ (that is, Step\ 2 of the decision-making process).


::: {.tipBox .tip data-latex="{iconmonstr-info-6-240.png}"}
$p$ refers to the *population* proportion, and\ $\hat{p}$ refers to a *sample* proportion.
:::


## Rolling dice: the sampling distribution of $\hat{p}$ {#SamplingDistributionKnownpHT}
\index{Sampling distribution!one proportion, known\ $p$ (CI)}


<div style="float:right; width: 222px; border: 1px; padding:10px">
<img src="Illustrations/pexels-skitterphoto-705171.jpg" width="200px"/>
</div>


When a fair, six-sided die is rolled $50$\ times, what proportion of the rolls will produce a
`r if (knitr::is_latex_output()) {
   '\\largedice{1}?'
} else {
   '<span class="larger-die">&#9856;</span>?'
}`
That is, what will be the value of the *sample proportion* $\hat{p}$?
Of course, no-one knows, because the sample proportion will not be the same for every sample of $50$\ rolls.
The sample proportion *varies* from sample to sample: *sampling variation* exists and is described by the *sampling distribution*.

As seen in Chap.\ \@ref(SamplingVariation), the sample statistic often varies with a normal distribution (whose standard deviation is called the *standard error*).
However, being more specific about the details of this sampling distribution (i.e., the mean and standard deviation describing the normal model) is useful.


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
Remember: studying a sample leads to the following observations:
\vspace{-2ex}

* Every sample is likely to be different.
* We observe just one of the many possible samples.
* Every sample is likely to yield a different value for the statistic.
* We observe just one of the many possible values for the statistic.
\vspace{-2ex}

Since many values for the sample proportion are possible, the values of the sample proportion vary (called *sampling variation*) and have a *distribution* (called a *sampling distribution*).
:::


To better understand the sampling distribution for the proportion of rolls that show a 
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
in $50$\ rolls of a die, statistical theory could be used, or thousands of repetitions of a sample of\ $50$ rolls could be performed, or a computer could *simulate* many samples of $50$\ rolls (as for a roulette wheel in Sect.\ \@ref(SamplingDistributionProportions)).
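
The simulation approach can be sketched in a few lines of R.
This is a minimal illustration only, not the code used to produce the figures in this chapter; the names `n_rolls`, `n_sims` and `p_hat`, the seed, and the number of simulated samples are arbitrary choices made for this sketch.

```r
# A minimal sketch: simulate many samples of 50 rolls of a fair die
set.seed(2024)    # arbitrary seed, so the sketch is reproducible
n_rolls <- 50     # number of rolls in each sample
n_sims  <- 5000   # number of simulated samples (an arbitrary choice)

# For each simulated sample, find the proportion of rolls showing a one
p_hat <- replicate(n_sims,
                   mean(sample(1:6, size = n_rolls, replace = TRUE) == 1))

mean(p_hat)   # the centre of the simulated sample proportions
sd(p_hat)     # how much the sample proportions vary from sample to sample
```

The simulated sample proportions should be centred close to $1/6$, though each individual value of `p_hat` differs from sample to sample.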

Here, the *population proportion* of rolls showing a 
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
is $p = 1/6$ (using the classical approach to probability).\index{Probability!classical approach}
Each sample of $n = 50$ rolls produces a *sample* proportion, denoted by\ $\hat{p}$, which varies from sample to sample.
<!-- For these ten samples, the proportion of even rolls ranged from $\hat{p} = 0.32$ to $\hat{p} = 0.60$. -->

These sample proportions would be expected to vary around $p = 1/6$ (the *population proportion*): some values of $\hat{p}$ would be larger than\ $p$ and some smaller than\ $p$.
The value of the sample proportion in\ $50$ rolls could be *very* small or *very* large by chance, but we wouldn't expect to see that very often.
The sample proportions exhibit sampling variation, and the *amount* of sampling variation is quantified using a *standard error*.

Suppose a fair die was rolled $50$\ times, and this random procedure \index{Random procedure} was repeated *thousands* of times, and the proportion of rolls that showed a
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
was recorded for every one of those thousands of sets of $50$\ rolls.
These thousands of sample proportions $\hat{p}$ (one from every sample of $n = 50$\ rolls) could be shown using a
`r if (knitr::is_latex_output()) {
   'histogram (Fig.\\ \\@ref(fig:RollDiceHistFigHT)).'
} else {
   'histogram; see the animation below.'
}`


<center>
```{r fig.show="animate", RollDiceHistHTML, animation.hook="gifski", interval=0.4, loop=FALSE, dev=if (knitr::is_latex_output()){"pdf"}else{"png"}}
if (knitr::is_html_output()){
  set.seed(99100991)
  num.rolls <- 50
  num.sims <- 1000
  
  print_Histo <- rep(FALSE, num.sims)
  print_Histo[ c( 1:10,
                  seq(24, num.sims, 25) + 1,
                  num.sims) ] <- TRUE
  
  prop.even <- array(dim = num.sims)
  
  p.die <- 1/6
  se.die <- sqrt( p.die * (1 - p.die) / num.rolls)

  for (i in 1:num.sims){
    
    roll <- sample(1:6, num.rolls, 
                   replace = TRUE)
    prop.even[i] <- sum( roll == 1 )/num.rolls # Proportion of rolls showing a one
    
    #Print every nth histogram only
    if (print_Histo[i]){
      out <- hist( prop.even,
                   breaks = seq(0.05, 0.95, by = 0.02) - 0.03,
                   las = 1,
                   ylim = c(0, 250),
                   xlim = c(0, 1),
                   col = plot.colour,
                   main = paste("Histogram of sample proportions\nSet number:", i),
                   xlab = "Proportion of the 50 rolls showing a one",
                   sub = paste("(For this sample: proportion of rolls showing a one is ", 
                               format(round(prop.even[i], 2), nsmall = 2),
                               ")", 
                               sep = "" ),
                   ylab = "",
                   right = FALSE,
                   axes = FALSE)
      axis(side = 1)
      #axis(side = 2, 
      #     las = 1)
      
      points(prop.even[i], 0,
             pch = 19,
             col = plot.colour0)
      
      xx <- seq(0, 1, 
                length = 500)
      yy <- dnorm(xx, 
                  mean = p.die, 
                  sd = se.die )
      yy <- yy/max(yy) * max(out$count)
      
      lines(yy ~ xx, 
            col = "grey", 
            lwd = 2)  
    }
  }
}
```
</center>

(ref:DieRollSamplingDist) The proportion of rolls that show a `r if (knitr::is_latex_output()) {
   '\\largedice{1},'
} else {
   '<span class="larger-die">&#9856;</span>,'
}` $\hat{p}$, is not the same for every sample of $50$ rolls; it varies around a mean of $p = 1/6$ (shown by the dot). The solid line is the normal distribution used to model the sampling distribution. The sampling distribution is an approximate normal distribution; it shows a model of how the proportion of rolls showing a `r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}` varies, when a die is rolled $50$ times.


```{r RollDiceHistFigHT, fig.align="center", fig.height=3, fig.width=7.5, out.width='85%', fig.cap="(ref:DieRollSamplingDist)" }
if (knitr::is_latex_output()){
  set.seed(99100991)
  num.rolls <- 50
  num.sims <- 1000
  prop.even <- array(dim = num.sims)
  
  p.die <- 1/6
  se.die <- sqrt( p.die * (1 - p.die) / num.rolls)
    
  for (i in 1:num.sims){
    
    roll <- sample(1:6, num.rolls,  
                   replace = TRUE)
    prop.even[i] <- sum(roll == 1) / num.rolls
  }
  
  
  par( mar = c(4, 1, 4, 1) )
  out <- hist( prop.even,
               breaks = seq(0.05, 0.95, by = 0.02) - 2/15,
               las = 1,
               ylim = c(0, 250),
               xlim = c(0, 1),
               col = plot.colour,
               main = paste("Histogram of sample proportions\nfrom thousands of simulations of",
                            num.rolls,
                            "rolls"),
               xlab = "Proportion of the 50 rolls showing a one",
               ylab = "",
               right = FALSE,
               axes = FALSE)
  axis(side = 1,
       at = seq(0, 1, by = 0.1))
  #axis(side = 2, 
  #     las = 1)
  
  xx <- seq(0, 1, 
            length = 500)
  yy <- dnorm(xx, 
              mean = p.die, 
              sd = se.die )
  yy <- yy/max(yy) * max(out$count)
  
  lines(yy ~ xx, 
        col = grey(0.3), 
        lwd = 2)  
  
  points(x = 1/6,
         y = 0,
         pch = 19)
  
}
```


In fact, the sampling distribution of\ $\hat{p}$ was described in Def.\ \@ref(def:SamplingDistPropCI) (and repeated in Def.\ \@ref(def:SamplingDistPropHT)).
The sample proportions are described by

* an approximate normal distribution,
* centred around the *sampling mean*, with a value of $p = 1/6$,
* with a standard deviation, called the *standard error* $\text{s.e.}(\hat{p})$, of
  \begin{equation}
     \text{s.e.}(\hat{p}) 
     = \sqrt{ \frac{p\times(1 - p)}{n} }
     = \sqrt{ \frac{1/6\times(1 - 1/6)}{50} }
     = `r round(se.die, 5)`.
    (\#eq:StdErrorExampleDieHT)
  \end{equation}


`r if (knitr::is_html_output()) '<!--'`
::: {.definition #SamplingDistPropHT name="Sampling distribution of a sample proportion with $p$ known"}
`r if (knitr::is_html_output()) '-->'`
`r if (knitr::is_latex_output()) '<!--'`
::: {.definition #SamplingDistPropHT name="Sampling distribution of a sample proportion with the population proportion known"}
`r if (knitr::is_latex_output()) '-->'`
For a known value of $p$, the *sampling distribution of the sample proportion* is (when certain conditions are met; Sect.\ \@ref(ValidityProportions)) described by

* an approximate normal distribution,
* centred around the sampling mean whose value is\ $p$,
* with a standard deviation (called the *standard error* of\ $\hat{p}$), denoted $\text{s.e.}(\hat{p})$, whose value is
\begin{equation}
   \text{s.e.}(\hat{p}) = \sqrt{\frac{ p \times (1 - p)}{n}},
   (\#eq:StdErrorPknownHT)
\end{equation}
where\ $n$ is the size of the sample used to compute\ $\hat{p}$, and\ $p$ is the population proportion.
:::
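
As a quick check of the arithmetic, the standard error for the die example can be computed directly.
This is a sketch only: the helper name `se_prop()` is invented for this illustration, and is not a function from any package.

```r
# Standard error of a sample proportion when the population proportion p is known.
# se_prop() is a hypothetical helper written for this sketch.
se_prop <- function(p, n) {
  sqrt(p * (1 - p) / n)
}

se_die <- se_prop(p = 1/6, n = 50)
round(se_die, 5)   # agrees with the value 0.05270 given above
```

Because $n$ appears in the denominator under the square root, quadrupling the number of rolls halves the standard error: `se_prop(1/6, 200)` is half of `se_prop(1/6, 50)`.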


A picture of this normal distribution can be drawn (Fig.\ \@ref(fig:NormalDieTheoryHT)).
The standard error is the standard deviation of the normal distribution in Fig.\ \@ref(fig:NormalDieTheoryHT).
While we still don't know *exactly* what value of $\hat{p}$ the next set of $50$\ rolls will produce, we have some idea of *how* the sample proportion varies in samples of $50$\ rolls.
For instance, values of\ $\hat{p}$  greater than about\ $0.35$ are unlikely to be observed from a fair die (with $p = 1/6$).


(ref:DieRollStdError) The sampling distribution is an approximate normal distribution; it shows a model of how the proportion of rolls showing a `r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}` varies, when a die is rolled $50$ times. The cross represents the observed sample proportion.

```{r NormalDieTheoryHT, fig.cap="(ref:DieRollStdError)", fig.align="center", fig.width=8.25, fig.height=3.0, out.width='100%'}
pop.p <- 1/6
n <- 50
se.p <- sqrt( pop.p * (1 - pop.p) / n )

par( mar = c(5, 0.25, 0.5, 0.25)) 
out <- plotNormal(mu = pop.p, 
                  sd = se.p, 
                  xlab = expression( Values~of~hat(italic(p))*","~the~sample~proportion~of~rolls~showing~a~one~out~of~50), 
                  round.dec = 2,
                  xlim.hi = 0.55,
                  xlim.lo = 0,
                  showX = c(0, 0.1, 1/6, 0.2, 0.3, 0.4, 0.5),
                  showXlabels = c("0",
                                  "0.1", 
                                  expression(1/6), 
                                  "0.2", 
                                  "0.3", 
                                  "0.4", 
                                  "0.5"),
                  ylim = c(0, 11), # To allow room for "Sampling mean"
                  showZ = TRUE) # Vertical lines at z = -3:3

arrows(x0 = pop.p,
       x1 = pop.p,
       y0 = 1.2 * max(out$y),
       y1 = max(out$y),
       lwd = 2,
       length = 0.15,
       angle = 15)
text(x = pop.p,
     y = 1.2 * max(out$y),
     pos = 3,
     labels = expression(Sampling~mean*":"~italic(p)) )

arrows(x0 = pop.p,
       x1 = pop.p + se.p,
       y0 = 0.3 * max(out$y),
       y1 = 0.3 * max(out$y),
       lwd = 2,
       code = 3,
       length = 0.15,
       angle = 15)

locateText <- mean( c( pop.p,
                       pop.p + se.p) )
text(x = locateText,
     y = 0.28 * max(out$y),
     pos = 3,
     labels = expression(Std~error) )
text(x = locateText,
     y = 0.265 * max(out$y),
     pos = 1,
     labels = expression(plain(s.e.)(hat(italic(p)))))

# Explanations at left and right
text(x = 0.05,
     y = 1.05 * max(out$y),
     pos = 1,
     labels = expression( atop(Values~of~hat(italic(p)),
                               smaller~than~italic(p)) ) )
text(x = 0.35,
     y = 1.05 * max(out$y),
     pos = 1,
     labels = expression(Values~of~hat(italic(p))~larger~than~italic(p)) )


### Show the sample proportion
phat <- 19/50
# arrows( x0 = phat,
#         x1 = phat,
#         y0 = 0.2 * max(out$y),
#         y1 = 0,
#         angle = 15,
#         length = 0.1)
# text(x = phat,
#      y = 0.2 * max(out$y),
#      pos = 3,
#      cex = 0.9,
#      label = expression(hat(p) == 0.38))
points(x = phat,
       y = 0,
       cex = 1.25,
       pch = 4)

```


## Rolling dice: making a decision {#TestpObsDecision}

Figure\ \@ref(fig:NormalDieTheoryHT) shows what values of the sample proportion $\hat{p}$ are expected when a fair die is rolled.
Step\ 3 of the decision-making process (Fig.\ \@ref(fig:DecisionFlowDice)) is to now roll the die.

When I rolled the die, a
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
appeared $19$\ times in my $50$\ rolls, a sample proportion of
$$
\hat{p} = \frac{19}{50} = 0.38.
$$
Is this unusual or unexpected?
Locating this value of $\hat{p}$ on the sampling distribution in Fig.\ \@ref(fig:NormalDieTheoryHT) shows that a sample proportion of $\hat{p} = 0.38$ is *highly* unusual from a fair die with $p = 1/6$.
More specifically, since the sampling distribution has a normal distribution, the $z$-score is
$$
  z 
  = \frac{\text{statistic} - \text{mean of the distribution}}{\text{std dev. of the distribution}}
  = \frac{0.38 - (1/6)}{0.05270}
  = 4.05,
$$
which is a *very* large $z$-score (based on the $68$--$95$--$99.7$ rule).\index{68@$68$--$95$--$99.7$ rule}
Using a fair die, observing $\hat{p} = 0.38$ would almost never occur.
But I *did* observe $\hat{p} = 0.38$, which suggests that the die I was rolling was *not* the fair die.
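
The $z$-score calculation above can be reproduced with a short sketch in R.
The variable names (`p0`, `n`, `p_hat`, `se`, `z`) are chosen for this illustration only.

```r
# z-score for the observed sample proportion, assuming the die is fair
p0    <- 1/6      # population proportion for a fair die
n     <- 50       # number of rolls
p_hat <- 19/50    # observed sample proportion of rolls showing a one

se <- sqrt(p0 * (1 - p0) / n)   # standard error of the sample proportion
z  <- (p_hat - p0) / se         # how many standard errors p_hat is from p0
round(z, 2)                     # about 4.05
```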

I concluded that the die I was rolling was loaded (that is, $p \ne 1/6$).
I may be incorrect (after all, it is not *impossible* to observe $\hat{p} = 0.38$), but the evidence is certainly convincing.
Using the decision-making process, a decision has been made about the dice.

The process described above is called *hypothesis testing*.\index{Hypothesis testing}
Hypothesis testing is used to make decisions about a population after observing just one of the countless possible samples.
Formally, the hypothesis test above proceeds as described in the following sections. 


## The process of hypothesis testing: assumption {#TestpObsDecisionHypothesis}
\index{Test statistic!z@$z$-score}

**Step\ 1** in the decision-making process is to make an assumption about the parameter.
For the die example, the parameter is\ $p$, the population proportion of rolls that show a
`r if (knitr::is_latex_output()) {
   '\\largedice{1}.'
} else {
   '<span class="larger-die">&#9856;</span>.'
}`
The assumption is that $p = 1/6$.
This is called the *null hypothesis*,\index{Hypotheses!null} denoted by $H_0$:
$$
  \text{$H_0$:\ } p = 1/6.
$$
The null hypothesis states the value of\ $p$ is $1/6$; in other words, if the sample proportion\ $\hat{p}$ is not equal to $1/6$, the discrepancy is explained by sampling variation.
The null hypothesis is always the 'sampling variation' explanation for the discrepancy between the values of the statistic and the parameter.

The other explanation for why the value of the sample proportion $\hat{p}$ is not equal to $1/6$ is called the *alternative hypothesis* (denoted $H_1$):\index{Hypotheses!alternative} that the population proportion is *not* $1/6$, and this is the cause of the discrepancy:
$$
  \text{$H_1$:\ } p \ne 1/6.
$$
These two hypotheses offer different explanations for the discrepancy between the values of the population proportion (the parameter) and the sample proportion (the statistic).
The null hypothesis $H_0$ states that $p = 1/6$ and the discrepancy is due to sampling variation.
The alternative hypothesis $H_1$ states that $p \ne 1/6$, which explains the discrepancy.

Here, the RQ is open to the value of $p$ being smaller *or* larger than\ $1/6$; that is, two possibilities are considered.
Hence, we write $p\ne 1/6$, which is called a *two-tailed* alternative hypothesis.
Alternative hypotheses like $p > 1/6$ (the population proportion is *larger* than $1/6$) or $p < 1/6$ (the population proportion is *smaller* than $1/6$) are *one-tailed* hypotheses.


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
The form of the alternative hypothesis (either one- or two-tailed) depends on what the research question asks, *not the data*.
:::


## The process of hypothesis testing: expectation {#TestpObsDecisionSamplingDist}

**Step\ 2** in the decision-making process is to describe what values of the statistic (i.e., $\hat{p}$) could be expected under the assumption about the parameter (i.e., *when the null hypothesis is true*).
Hypothesis testing *always* begins by assuming the null hypothesis is true.


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
The decision-making process begins by assuming the *null hypothesis* is true.
Thus, *the onus is on the data to refute the null hypothesis, the initial assumption*.

That is, the null hypothesis is retained unless persuasive evidence emerges to change our mind.
:::


Effectively, this step requires describing the sampling distribution of the statistic.
For the die example, the sampling distribution for $\hat{p}$ is (see Def.\ \@ref(def:SamplingDistPropHT))

* an approximate normal distribution,
* centred around the sampling mean whose value is\ $p = 1/6$,
* with a standard deviation, whose value is $\text{s.e.}(\hat{p}) = 0.05270\dots$

Drawing the picture of the sampling distribution (like Fig.\ \@ref(fig:NormalDieTheoryHT)) using this information is not necessary, but may be helpful.


## The process of hypothesis testing: observation {#TestpObsDecisiontestStat}

**Step\ 3** in the decision-making process is to make the observations.
As noted above, a
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
was observed in\ $19$ of the $50$\ rolls, so $\hat{p} = 0.38$.
Since the sampling distribution has a normal distribution, the corresponding $z$-score was computed as $z = 4.05$.

In hypothesis testing, the $z$-score is called the *test statistic*.\index{Test statistic}\index{Test statistic!z@$z$-score}
The test statistic measures how far, in relative terms, the sample proportion is from the assumed value of the parameter.


## The process of hypothesis testing: decision {#TestpObsDecisionPvalues}

**Step\ 4** of the decision-making process is to use the information to make a decision: is the sample statistic *consistent* with what was expected under the assumption that $p = 1/6$, or does it *contradict* what was expected?

For the die example, the decision is reasonably easy: $z = 4.05$ is *very* large and *very* unlikely to be observed if $p = 1/6$.
This means the sample evidence *contradicts* what was expected if the assumption was true: persuasive evidence exists that the die is loaded.

More generally, evidence is evaluated using a $P$-value.\index{P@$P$-values}
$P$-values refer to the area *more extreme* than the calculated test statistic in the sampling distribution.
For this situation, $P$-values refer to the area *more extreme* than the calculated $z$-score (the statistic)\index{Statistic} in the normal distribution (the sampling distribution); that is, the area in the *tails* of the distribution (see Fig.\ \@ref(fig:OnePropTestP)).
This is a way to measure how unusual the calculated $z$-score is.

For *two-tailed* alternative hypotheses, the $P$-value is the combined area in the lower and upper tails that correspond to the positive *and* negative values of the test statistic.
For *one-tailed* alternative hypotheses, the $P$-value is the area in one tail only.
Clearly, since the $P$-value is a probability, its value is always between\ $0$ and\ $1$.

$P$-values can be approximated using the $68$--$95$--$99.7$ rule and a diagram (Sect.\ \@ref(ApproxProbs); Sect.\ \@ref(OnePropTestP6895997)), or more precisely using the $z$-tables
`r if (knitr::is_latex_output()) {
   'in Appendices\\ \\@ref(ZTablesNEG) and \\@ref(ZTablesPOS)'
} else {
   'in App.\\ \\@ref(ZTablesOnline)'
}`
(Sect.\ \@ref(ZScoreForestry); Sect.\ \@ref(OnePropTestPTables)).
$P$-values are also reported by software for most statistical tests.
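
Software typically computes these tail areas directly.
As a minimal sketch, base R's `pnorm()` (which gives the area to the left of a value in a normal distribution) could be used to find the $P$-values for the die example, where $z = 4.05$; the variable names are arbitrary choices for this illustration.

```r
# P-values for the die example, using the standard normal distribution
z <- 4.05                            # test statistic for the die example

p_one_tailed <- pnorm(-abs(z))       # area in one tail beyond |z|
p_two_tailed <- 2 * pnorm(-abs(z))   # combined area in both tails

p_two_tailed                         # a very small value, well below 0.001
```

Using `-abs(z)` means the same code gives the tail area whether the $z$-score is positive or negative.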


### Approximating $P$-values using the $68$--$95$--$99.7$ rule {#OnePropTestP6895997}
\index{68@$68$--$95$--$99.7$ rule}\index{P@$P$-values!using $68$--$95$--$99.7$ rule}

The $68$--$95$--$99.7$ rule can be used to determine *approximate* $P$-values.
To demonstrate, suppose the computed $z$-score was $z = 1$.
Then, the two-tailed $P$-value is the shaded tail-area in Fig.\ \@ref(fig:OnePropTestP) (top left panel): about\ $32$%, based on the $68$--$95$--$99.7$ rule.
The two-tailed $P$-value would be the same if $z = -1$.
The *one-tailed* $P$-value would be the area in one-tail (Fig.\ \@ref(fig:OnePropTestP), bottom left panel): about\ $16$%, based on the $68$--$95$--$99.7$ rule.

As another example, suppose the calculated $z$-score was $z = -2$.
Then, the two-tailed $P$-value is the shaded area shown in Fig.\ \@ref(fig:OnePropTestP) (top right panel): about\ $5$%, based on the $68$--$95$--$99.7$ rule.
The two-tailed $P$-value would be the same if $z = 2$.
The *one-tailed* $P$-value would be the area in one tail only (Fig.\ \@ref(fig:OnePropTestP), bottom right panel): about\ $2.5$%, based on the $68$--$95$--$99.7$ rule.


```{r, OnePropTestP, fig.cap="The two-tailed $P$-value is the combined area in the two tails of the distribution. Top left panel: if $z = 1$ (or $z = -1$), the two-tailed $P$-value is approximately $0.32$. Top right panel: if $z = 2$ (or $z = -2$), the two-tailed $P$-value is approximately $0.05$. The corresponding one-tailed $P$-values are half the two-tailed $P$-values, and are shown in the bottom panels.", fig.width=9.5, fig.height=5.25, out.width='100%', fig.align="center"}
par(mfrow = c(2, 2), 
    mar = c(4, 1, 4, 1) + 0.1)


######### TWO-TAILED

out <- plotNormal(mu = 0,
           sd = 1,
           main = expression( atop( The~italic(P)*"-value"~"if"~italic(z)==1~or~italic(z)==-1*":",
                                    approx.~bold(two)*"-"*bold(tailed)~italic(P)*"-"*value*":"~0.32) ),
           xlab = expression(italic(z)*"-score")
           )

shadeNormal(out$x, out$y,
            lo = -5, 
            hi = -1,
            col = plot.colour)
shadeNormal(out$x, out$y,
            lo = 1, 
            hi = 5,
            col = plot.colour)
polygon(x = c(-0.9, -0.9, 0.9, 0.9), # White-ish background for above text
        y = c(0.05, 0.14, 0.14, 0.05),
        border = NA,
        col = "white")
arrows(x0 = -1, 
       x1 = 1,
       y0 = 0.04,
       y1 = 0.04,
       angle = 15,
       length = 0.15,
       code = 3) # BOTH ENDS
text(0,
     y = 0.07,
     label = "Area: 68%")

text(x = -1.5,
     y = 0.05,
     label = "16%")
text(x = 1.5,
     y = 0.05,
     label = "16%")

###

out <- plotNormal(mu = 0,
           sd = 1,
           main = expression( atop( The~italic(P)*"-value"~"if"~italic(z)==2~or~italic(z)==-2*":",
                                    approx.~bold(two)*"-"*bold(tailed)~italic(P)*"-"*value*":"~0.05) ),
           xlab = expression(italic(z)*"-score")
           )
shadeNormal(out$x, out$y,
            lo = -5, 
            hi = -2,
            col = plot.colour)
shadeNormal(out$x, out$y,
            lo = 2, 
            hi = 5,
            col = plot.colour)

polygon(x = c(-1.4, -1.4, 1.4, 1.4), # White-ish background for above text
        y = c(0.05, 0.14, 0.14, 0.05),
        border = NA,
        col = "white")
arrows(x0 = -2, 
       x1 = 2,
       y0 = 0.04,
       y1 = 0.04,
       angle = 15,
       length = 0.15,
       code = 3) # BOTH ENDS
text(0,
     y = 0.07,
     label = "Area: 95%")


######### ONE-TAILED


out <- plotNormal(mu = 0,
           sd = 1,
           main = expression( atop( The~italic(P)*"-value"~"if"~italic(z)==1*":",
                                    approx.~bold(one)*"-"*bold(tailed)~italic(P)*"-"*value*":"~0.16) ),
           xlab = expression(italic(z)*"-score")
           )


shadeNormal(out$x, out$y,
            lo = 1, 
            hi = 5,
            col = plot.colour)
polygon(x = c(-0.9, -0.9, 0.9, 0.9), # White-ish background for above text
        y = c(0.05, 0.14, 0.14, 0.05),
        border = NA,
        col = "white")
arrows(x0 = -1, 
       x1 = 1,
       y0 = 0.04,
       y1 = 0.04,
       angle = 15,
       length = 0.15,
       code = 3) # BOTH ENDS
text(0,
     y = 0.07,
     label = "Area: 68%")
text(x = 1.5,
     y = 0.05,
     label = "16%")


###

out <- plotNormal(mu = 0,
           sd = 1,
           main = expression( atop( The~italic(P)*"-value"~"if"~italic(z)==-2*":",
                                    approx.~bold(one)*"-"*bold(tailed)~italic(P)*"-"*value*":"~0.025) ),
           xlab = expression(italic(z)*"-score")
           )
shadeNormal(out$x, out$y,
            lo = -5, 
            hi = -2,
            col = plot.colour)


polygon(x = c(-1.4, -1.4, 1.4, 1.4), # White-ish background for above text
        y = c(0.05, 0.14, 0.14, 0.05),
        border = NA,
        col = "white")
arrows(x0 = -2, 
       x1 = 2,
       y0 = 0.04,
       y1 = 0.04,
       angle = 15,
       length = 0.15,
       code = 3) # BOTH ENDS
text(0,
     y = 0.07,
     label = "Area: 95%")
```

Of course, calculated $z$-scores are unlikely to be exactly $z = 1$ or $z = -2$.
Suppose the $z$-score is a little *larger* than $z = 1$; say $z = 1.2$.
Then, the two-tailed area will be a little *smaller* than the tail area when $z = 1$ (Fig.\ \@ref(fig:OnePropTestP2), left panel).
The two-tailed $P$-value is a little *smaller* than\ $0.32$.

Similarly, suppose the $z$-score is not quite equal to $z = -2$; say $z = -1.9$.
Then, the two-tailed area will be a little *larger* than the tail area when $z = -2$ (Fig.\ \@ref(fig:OnePropTestP2), right panel).
The two-tailed $P$-value is a little *larger* than\ $0.05$.
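
These comparisons can be checked against the normal model itself.
The sketch below (with variable names chosen only for this illustration) uses base R's `pnorm()` to compute two-tailed $P$-values for the four $z$-scores discussed in this section.

```r
# Two-tailed P-values from the standard normal distribution
z_scores <- c(1, 1.2, 1.9, 2)
p_values <- 2 * pnorm(-abs(z_scores))   # combined area in both tails

# Show the z-scores beside their two-tailed P-values
round(data.frame(z = z_scores, two_tailed_P = p_values), 3)
```

The values for $z = 1$ and $z = 2$ are close to the approximations of $0.32$ and $0.05$ from the $68$--$95$--$99.7$ rule, while the value for $z = 1.2$ is smaller than that for $z = 1$, and the value for $z = 1.9$ is larger than that for $z = 2$.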


```{r OnePropTestP2, fig.cap="The two-tailed $P$-value for $z$-scores not aligned with the $68$--$95$--$99.7$ rule. Left panel: when $z = 1.2$ (or $z = -1.2$). Right panel: when $z = 1.9$ (or $z = -1.9$).", fig.align="center", fig.width=10, fig.height=2.75, out.width='95%'}
par(mfrow = c(1, 2), 
    mar = c(4, 1, 4, 1) + 0.1)


out <- plotNormal(mu = 0,
           sd = 1,
           main = expression( atop(The~two*"-"*tailed~italic(P)*"-value"~when~italic(z)==1.2*".",
                                   italic(P)*"-"*value~a~bit~smaller~than~0.32)),
           xlab = expression(italic(z)*"-score")
           )
shadeNormal(out$x, out$y,
            lo = -5, 
            hi = -1.2,
            col = plot.colour)
shadeNormal(out$x, out$y,
            lo = 1.2, 
            hi = 5,
            col = plot.colour)

lines( x = c(-1, -1), 
       y = c(0, 1.36 * dnorm(-1)), 
       lwd = 2)
lines( x = c(1, 1), 
       y = c(0, 1.36 * dnorm(1)), 
       lwd = 2)
text(x = -1, 
     y = 1.36 * dnorm(-1), 
     pos = 3, 
     label = expression(italic(z) == -1))
text(x = 1, 
     y = 1.36 * dnorm(1), 
     pos = 3,
     label = expression(italic(z) == 1))

arrows(x0 = -1, 
       x1 = 1,
       y0 = 0.04,
       y1 = 0.04,
       angle = 15,
       length = 0.15,
       code = 3) # BOTH ENDS
text(0,
     y = 0.07,
     label = "Area: 68%")

#Arrows pointing to z = 1.2 and z = -1.2
arrows(x0 = 2.25, 
       x1 = 1.2,
       y0 = 0.15,
       y1 = 0.08,
       angle = 15,
       length = 0.15)
text(x = 2.25,
     y = 0.15,
     pos = 4,
     label = expression(italic(z) == 1.2) )

arrows(x0 = -2.25, 
       x1 = -1.2,
       y0 = 0.15,
       y1 = 0.08,
       angle = 15,
       length = 0.15)
text(x = -2.25,
     y = 0.15,
     pos = 2,
     label = expression(italic(z) == -1.2) )

###

out <- plotNormal(mu = 0,
           sd = 1,
           main = expression( atop(The~two*"-"*tailed~italic(P)*"-value"~when~italic(z)==1.9*".",
                                   italic(P)*"-"*value~a~bit~larger~than~0.05)),
           xlab = expression(italic(z)*"-score")
           )
shadeNormal(out$x, out$y,
            lo = -5, 
            hi = -1.9,
            col = plot.colour)
shadeNormal(out$x, out$y,
            lo = 1.9, 
            hi = 5,
            col = plot.colour)

lines( x = c(-2, -2), 
       y = c(0, 1.3 * dnorm(-1)), 
       lwd = 2)
lines( x = c(2, 2), 
       y = c(0, 1.3 * dnorm(1)), 
       lwd = 2)
text(x = -2,  
     y = 1.3 * dnorm(1), 
     pos = 3, 
     label = expression(italic(z) == -2))
text(x = 2, 
     y =  1.3 * dnorm(1),  
     pos = 3,
     label = expression(italic(z) == 2))

arrows(x0 = -2, 
       x1 = 2,
       y0 = 2.5 * dnorm(2),
       y1 = 2.5 * dnorm(2),
       angle = 15,
       length = 0.15,
       code = 3) # BOTH ENDS
text(0,
     y = 2.5 * dnorm(2),
     pos = 3,
     label = "Area: 95%")

```


### More precise $P$-values using tables {#OnePropTestPTables}
\index{P@$P$-values!using tables}

Using the tables of areas under normal distributions (`r if ( knitr::is_html_output()) { 'Appendix\\ \\@ref(ZTablesOnline)'} else {'Appendices\\ \\@ref(ZTablesNEG) and \\@ref(ZTablesPOS)'}`), more precise $P$-values can be found using the ideas from Sect.\ \@ref(ExactAreasUsingTables).
For instance (see Fig.\ \@ref(fig:OnePropTestP2)):

* For $z = 1.2$: the area to the *left* of $z = -1.2$ is\ $0.1151$, and the area to the *right* of $z = 1.2$ is\ $0.1151$, so the *two-tailed* $P$-value is $0.1151 + 0.1151 = 0.2302$.
  This is a little smaller than\ $0.32$, as estimated above.
* For $z = 1.9$: the area to the *left* of $z = -1.9$ is\ $0.0287$, and the area to the *right* of $z = 1.9$ is\ $0.0287$, so the *two-tailed* $P$-value is $0.0287 + 0.0287 = 0.0574$.
  This is a little larger than\ $0.05$, as estimated above.
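
As a cross-check on these table look-ups, the same areas can be computed with software.
The sketch below assumes R is available (the tables alone are sufficient for this chapter).
The function `pnorm()` returns the area to the *left* of a $z$-score under the standard normal distribution; the helper name `two_tailed_p` is introduced here purely for illustration.

```r
# Area to the left of a z-score under the standard normal distribution
left_1.2 <- pnorm(-1.2)   # about 0.1151
left_1.9 <- pnorm(-1.9)   # about 0.0287

# Hypothetical helper: the two-tailed P-value doubles the area beyond |z|
two_tailed_p <- function(z) {
  2 * pnorm(-abs(z))
}

two_tailed_p(1.2)    # about 0.23: a little smaller than 0.32
two_tailed_p(-1.9)   # about 0.057: a little larger than 0.05
```

Software does not round the one-tailed area before doubling, so the results may differ from the table-based values in the fourth decimal place.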

In this die-rolling example, where $z = 4.05$, the tail area is *very* small (using 
`r if ( knitr::is_html_output()) { 
'Appendix\\ \\@ref(ZTablesOnline)'} else {
'Appendices\\ \\@ref(ZTablesNEG) and\\ \\@ref(ZTablesPOS)'}`),
and is zero when rounded to four decimal places.
$P$-values are never exactly zero, so we write $P < 0.0001$ (that is, the $P$-value is *less than*\ $0.0001$).

A $P$-value tells us the probability of observing the sample statistic (or a value even more extreme), assuming the null hypothesis is true.
In the die-rolling example, the $P$-value is the probability of observing the value of $\hat{p} = 0.38$ (or more extreme), just through sampling variation if $p = 1/6$.
Then `r if( knitr::is_html_output() ) {
   "(see the animation below):"
}`
`r if( knitr::is_latex_output() ) {
   "(see Fig.\\ \\@ref(fig:PvaluesBigSmall)):"
}`

* 'Big' $P$-values mean the sample statistic (i.e., $\hat{p}$) could reasonably have occurred through sampling variation in one of the many possible samples, if the assumption made about the parameter (stated in $H_0$) was true: 
   the data *do not* contradict the assumption in\ $H_0$.
   There *is no* persuasive evidence to support the alternative hypothesis.
* 'Small' $P$-values mean the sample statistic (i.e., $\hat{p}$) is *unlikely* to have occurred through sampling variation in one of the many possible samples, if the assumption made about the parameter (stated in\ $H_0$) was true: 
   the data *do* contradict the assumption in $H_0$.
   There *is* persuasive evidence to support the alternative hypothesis.


```{r PvaluesAnimation, animation.hook="gifski", dev=if (is_latex_output()){"pdf"}else{"png"}}
zList <- c( seq(0.5,
                1,
                by = 0.1),
            seq(1, 3.5, 
                by = 0.05) ) 
pMeaning <- function(pValue){
  if (pValue >= 0.10) Meaning <- "Insufficient" # '>=' so that exactly 0.10 is classified
  if ( (pValue >= 0.05)  & (pValue < 0.10)) Meaning <- "Slight"
  if ( (pValue >= 0.01)  & (pValue < 0.05)) Meaning <- "Moderate"
  if ( (pValue >= 0.001) & (pValue < 0.01)) Meaning <- "Strong"
  if (pValue < 0.001) Meaning <- "Very strong"
  Meaning
}

pColours <- viridis( length(zList), 
                     begin = 0.5 ,
                     end = 1,
                     option = "H")

if (knitr::is_html_output()) {
  for (i in (1:length(zList))) {

    zScore <- zList[i]
    pValue <- 2 * pnorm( -zScore ) # Two-tailed P-value: both tails are shaded and labelled
    pValue2 <- ifelse( pValue < 0.001, 
                       "< 0.001",
                       round(pValue, 4) )
    
   par( mar = c(0.1, 0.1, 2.5, 0.1) ) # Number of margin lines on each side
    out <- plotNormal(mu = 0,
                      sd = 1,
                      xlab = expression(italic(z)*"-score"),
                      main = paste("Evidence to support alternative hypothesis:\n",
                                   pMeaning(pValue)),
                      round.dec = 0)
    shadeNormal(out$x,
                out$y,
                col = pColours[i],
                lo = zScore,
                hi = 6)
    shadeNormal(out$x,
                out$y,
                col = pColours[i],
                hi = -zScore,
                lo = -6)

    abline(v = zScore,
           col = "grey")
    abline(v = -zScore,
           col = "grey")

    polygon(x = c(-1.4, -1.4, 1.4, 1.4), # White-ish background for above text
            y = c(0.02, 0.10, 0.10, 0.02),
            border = NA,
            col = "white")
    text(0,
         y = 0.06,
         label = paste("Two-tailed P-value:", pValue2 ) )
  }  
  
}
```


```{r PvaluesBigSmall, fig.cap="The strength of evidence: $P$-values. As the $z$-score becomes larger, the $P$-value becomes smaller, and it is more likely that the evidence contradicts the null hypothesis.", fig.height = 2.75, fig.width=10, out.width='100%', fig.align="center", dev=if (is_latex_output()){"pdf"}else{"png"}}
if (knitr::is_latex_output()) {
  
  par(mfrow = c(1, 2),
      mar = c(4.1, 0.25, 4.1, 0.25) )
#  par( mar = c(0.1, 0.1, 0.1, 0.1) ) # Number of margin lines on each side

  #zList <- c( 1.15, # Two-tailed P-value: 10% -1.645
  #            2.2 ) # Two-tailed P-value: 1% -2.576
 
  
  zList <- c( qnorm(0.20 / 2),   # z = -1.28; two-tailed P-value: 0.200
              qnorm(0.015 / 2) ) # z = -2.43; two-tailed P-value: 0.015
  pMeaning <- function(pValue){
    if (  pValue >= 0.10)                     Meaning <- "Insufficient"
    if ( (pValue >= 0.05)  & (pValue < 0.10)) Meaning <- "Slight"
    if ( (pValue >= 0.01)  & (pValue < 0.05)) Meaning <- "Moderate"
    if ( (pValue >= 0.001) & (pValue < 0.01)) Meaning <- "Strong"
    if (  pValue <  0.001)                    Meaning <- "Very strong"
    Meaning
    }
  
  pColours <- c(BlockColour,
                ResponseColour)
  
  for (i in (1:length(zList))){
    zScore <- zList[i]            # Negative z-score
    pValue <- 2 * pnorm( zScore ) # Two-tailed P-value: twice the lower-tail area
    
  
    pValue2 <- ifelse( pValue < 0.001, 
                       "< 0.001",
                       round(pValue, 4) )
    
    out <- plotNormal(mu = 0,
                      sd = 1,
                      xlab = expression(italic(z)*"-score"),
                      round.dec = 0,
                      main = ifelse( abs(zScore) < 2,
                                     expression( atop(Slight~evidence~to~support~italic(H)[1],
                                                      italic(z)==1.28*";"~italic(P)*"-"*value=="0.200") ),
                                     expression( atop(Moderate~evidence~to~support~italic(H)[1],
                                                      italic(z)==2.43*";"~italic(P)*"-"*value=="0.015") )
                              )
                      )
    shadeNormal(out$x,
                out$y,
                col = pColours[i],
                lo = -zScore,
                hi = 10)
    shadeNormal(out$x,
                out$y,
                col = pColours[i],
                hi = zScore,
                lo = -10)
    
    abline(v = zScore,
           col = "grey")
    abline(v = -zScore,
           col = "grey")
    
    polygon(x = c(-2.1, -2.1, 2.1, 2.1), # White-ish background for the above text
           y = c(0.09, 0.21, 0.21, 0.09),
           border = NA,
           col = rgb(255, 255, 255, max = 255, alpha = 240) ) # Translucent white
    text(0,
         y = 0.14,
         label = paste("Two-tailed P-value:", format(pValue2,
                                                     nsmall = 3) ) )
  }  
  
}

```


What is meant by 'small' and 'big' in this context?
What represents persuasive evidence to support the alternative hypothesis?
A $P$-value smaller than\ $5$% (or\ $0.05$) is usually considered 'small', and persuasive evidence to support the alternative hypothesis.
In contrast, a $P$-value larger than\ $5$% (or\ $0.05$) is usually considered 'big', and *not* persuasive evidence to support the alternative hypothesis.

The value of\ $0.05$ given here is *arbitrary*, and in some disciplines the distinction is made when $P = 0.01$ or $P = 0.10$ instead.
Rather than having an arbitrary boundary between 'big' and 'small', a more sensible approach is to qualify the strength of the evidence that supports the alternative hypothesis, which is discussed in Sect.\ \@ref(AboutPvalues).

In this die-rolling example, where the $P$-value is *very* small, the data contradict the null hypothesis (that $p = 1/6$): the evidence supports the alternative hypothesis that $p \ne 1/6$.
This suggests that the die is very likely *not* fair.


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
*Be careful interpreting the results!*
We cannot be *sure* that the die is unfair.
*A small $P$-value is not proof that the die is loaded.*
The die may be fair but, due to sampling variation, the sample we observed may simply have produced an unusually high proportion of rolls that show a
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
by chance.

The result is interpreted as 'there is evidence that the die is unfair'.
Remember: *the onus is on the data to refute the null hypothesis, the initial assumption*.
:::


`r if (knitr::is_html_output()) '<!--'`
::: {.example #PvaluesInterpret name="Interpreting $P$-values"}
`r if (knitr::is_html_output()) '-->'`
`r if (knitr::is_latex_output()) '<!--'`
::: {.example #PvaluesInterpret name="Interpreting P-values"}
`r if (knitr::is_latex_output()) '-->'`
In the die example, suppose we found the two-tailed $P$-value as\ $0.26$.
This is relatively 'large' (i.e., much larger than\ $0.05$).
Then the observed value of\ $\hat{p}$ could easily be explained by chance, and is *not* persuasive evidence to support the alternative hypothesis (that the die is unfair).
There is no evidence that\ $p$ is not\ $1/6$.
:::


## Writing conclusions {#OnePropTestCommunicate}

In general, communicating the results of any hypothesis test requires:

* an answer to the RQ, worded in terms of how much evidence exists to support the *alternative* hypothesis.
* a summary of the evidence used to reach that conclusion (such as the $z$-score and $P$-value, including if the $P$-value is one- or two-tailed).
* sample summary information (see Chap.\ \@ref(CIOneProportion)), summarising the data used to make the decision (which usually includes a confidence interval for the parameter).

So for the die-rolling example, write:

> The sample provides very strong evidence ($z = 4.05$; two-tailed $P < 0.001$) that the proportion of rolls that show a
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
is not\ $1/6$ ($\hat{p} = 0.38$; approx.\ $95$% CI: $0.243$ to\ $0.517$; $n = 50$ rolls) in the population.

This statement includes the three necessary components:

* an answer to the RQ: 'The sample provides very strong evidence... that the population proportion is not\ $1/6$'.
* the evidence used to reach the conclusion: '$z = 4.05$; two-tailed $P < 0.001$'.
* sample summary information (including a CI).


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
Since the *null* hypothesis is initially assumed to be true, *the onus is on the evidence to refute the null hypothesis*. 
That is, we retain the null hypothesis unless there is persuasive evidence to stop doing so.
Hence, conclusions are worded in terms of how strongly the evidence (i.e., sample data) supports the alternative hypothesis.  

The alternative hypothesis *may* or *may not* be true, but we report how strongly the evidence (data) supports the alternative hypothesis.
Conclusions are *not* worded in terms of how much evidence supports the null hypothesis.
:::


## Process overview {#OnePropTestOverview}

Let's recap the decision-making process, in this context about rolling a 
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`:

1. *Assumption*: 
   Write the *null hypothesis* and *alternative hypothesis* about the *parameter* (based on the RQ), where\ $p$ is the population proportion of rolls that are a
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`: 
   * $H_0$: $p = 1/6$ (i.e., sampling variation explains the discrepancy between\ $p$ and\ $\hat{p}$);
   * $H_1$: $p \ne 1/6$ (this is a two-tailed alternative hypothesis).
2. *Expectation*: 
   The sampling distribution describes what values to reasonably expect from the sample statistic across all possible samples, *if* the null hypothesis is true.
   In this situation, the sampling distribution has an approximate normal distribution.
3. *Observation*: 
   Compute the $z$-score ($z = 4.05$), a measure of the discrepancy between the assumed population value and the observed sample value.
4. *Decision*: 
   Determine if the data are consistent with the assumption, by computing the $P$-value.
   Here, the $P$-value is (much) less than\ $0.0001$, so very strong evidence exists that\ $p$ is *not*\ $1/6$.
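
The four steps can be traced numerically.
The following is a minimal sketch in R, assuming software is used in place of the tables; the variable names are illustrative only, and the values $n = 50$ and $\hat{p} = 0.38$ are those of the die-rolling example.

```r
# Step 1 (Assumption): the value of p stated in the null hypothesis
p0 <- 1 / 6

# Sample information from the die-rolling example
n     <- 50     # number of rolls
p_hat <- 0.38   # sample proportion of rolls showing a one

# Step 2 (Expectation): standard error of the sample proportion, if H0 is true
se <- sqrt(p0 * (1 - p0) / n)

# Step 3 (Observation): the z-score
z <- (p_hat - p0) / se

# Step 4 (Decision): the two-tailed P-value
p_value <- 2 * pnorm(-abs(z))

round(z, 2)        # 4.05
p_value < 0.0001   # TRUE
```

The standard error uses the hypothesised value of $p$ rather than $\hat{p}$, because the sampling distribution is described under the assumption that the null hypothesis is true.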


## Statistical validity conditions {#ValidityProportionsTest}
\index{Statistical validity (for inference)!one proportion}

The hypothesis test conducted in this chapter assumes the sampling distribution is approximately a normal distribution (and so, for example, the $68$--$95$--$99.7$ rule can be applied).
This is only true if certain conditions are met.

The *statistical validity conditions* for a test for a single proportion are that the *expected* number of individuals in the group of interest (i.e., $n\times p$) and in the group *not* of interest (i.e., $n\times (1 - p)$) both exceed five; that is:

* Both $n\times p > 5$, *and* $n\times (1 - p) > 5$.

The value of\ $5$ here is a rough figure; some books give other values (such as\ $10$).
This condition ensures that the *sampling distribution of the sample proportions has an approximate normal distribution* (so that, for example, the $68$--$95$--$99.7$ rule can be used).
The units of analysis are also assumed to be *independent* (e.g., from a simple random sample).
For a test for one proportion, these conditions are similar to those for the CI for one proportion (Sect.\ \@ref(CIOneProportion)).

If the statistical validity conditions are not met, other similar options include using a binomial test\index{Non-parametric statistics} [@conover2003practical].
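
As a hedged illustration of that alternative, the sketch below applies R's `binom.test()` function to the first die, where $\hat{p} = 0.38$ in $50$\ rolls corresponds to $19$\ rolls showing a one.
The validity conditions are met for these data, so the binomial test is shown here only for comparison with the $z$-test; it does not rely on the normal approximation.

```r
# Exact binomial test of H0: p = 1/6 (two-sided by default)
# 19 of the 50 rolls showed a one, since 0.38 * 50 = 19
result <- binom.test(x = 19, n = 50, p = 1 / 6)

result$estimate   # the sample proportion, 0.38
result$p.value    # a very small P-value, in line with the z-test
```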


::: {.example #StatisticalValidityDice name="Statistical validity"}
The hypothesis test regarding the dice is statistically valid.
Firstly, $n\times p = 50 \times (1/6) = 8.333\dots$ (i.e., expect about $8.3$\ rolls to show a `r if (knitr::is_latex_output()) {
   '\\largedice{1})'
} else {
   '<span class="larger-die">&#9856;</span>)'
}`,
and $n\times (1 - p) = 41.666\dots$ (i.e., expect about $41.7$\ rolls to *not* show a 
`r if (knitr::is_latex_output()) {
   '\\largedice{1})'
} else {
   '<span class="larger-die">&#9856;</span>)'
}`.
*Both* comfortably exceed five, so the normal distribution will be a good approximation for the sampling distribution.
This is what we observe from simulation (Fig.\ \@ref(fig:StatValidp), left panel).
:::


::: {.example #StatisticalValidityDice2 name="Statistical validity"}
Suppose the die was rolled $10$\ times rather than $50$\ times.
Then, $n\times p = 10 \times (1/6) = 1.666\dots$ and $n\times (1 - p) = 10 \times (1 - 1/6) = 8.333\dots$.
These do not *both* exceed five, so the normal distribution may be a poor approximation for the sampling distribution.

This is what we observe from simulating the situation (Fig.\ \@ref(fig:StatValidp), right panel).
The normal model is poor: the simulation shows that the sample proportions are not even symmetrically distributed.
:::
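
The checks in these two examples are simple enough to automate.
The helper below is a sketch only: the name `check_validity` and the default threshold of five are assumptions made for illustration (recall that some books use other values, such as $10$).

```r
# Hypothetical helper: do n * p and n * (1 - p) both exceed the threshold?
check_validity <- function(n, p, threshold = 5) {
  expected <- c(of_interest     = n * p,
                not_of_interest = n * (1 - p))
  list(expected = expected,
       valid    = all(expected > threshold))
}

check_validity(n = 50, p = 1 / 6)$valid   # TRUE
check_validity(n = 10, p = 1 / 6)$valid   # FALSE
```

The value of\ $p$ supplied is the value stated in the null hypothesis, not the sample proportion.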


(ref:StatValidp) The sampling distributions for two situations for rolling a die. Left: for sets of $50$ rolls, the sampling distribution does have an approximate normal distribution. Right: for sets of $10$ rolls, the sampling distribution does not have a normal distribution. The solid lines show the approximate normal distributions, and the histograms show the distribution of the sample proportions over many sets of rolls. The solid dots are the value $p = 1/6$, the population proportion of rolls that show a `r if (knitr::is_latex_output()) {
   '\\largedice{1}.'
} else {
   '<span class="larger-die">&#9856;</span>.'
}`
\index{Hypothesis testing!one proportion|)}


```{r StatValidp, fig.cap="(ref:StatValidp)", fig.align="center", fig.width=10, fig.height=2.5, out.width='95%'}

set.seed(301182)

p <- 1/6
n1 <- 10
n2 <- 50
numSims <- 5000

se1 <- sqrt( p * (1 - p) / n1)
se2 <- sqrt( p * (1 - p) / n2)

rolls1 <- rolls2 <- array( dim = numSims)

for (i in 1:numSims){
  x1 <- sample( x = 1:6,
                size = n1, 
                replace=TRUE)
  rolls1[i] <- sum(x1 == 1)
  
  x2 <- sample( x = 1:6,
                size = n2, 
                replace=TRUE)
  rolls2[i] <- sum(x2 == 1)
  
}

rolls1 <- rolls1/n1
rolls2 <- rolls2/n2

###

par(mfrow = c(1, 2), 
    mar = c(4, 2, 4, 1) + 0.1)


xNormal <- seq(-0.3, 0.8, 
               length = 200)


###
out2 <- hist(rolls2,
             xlab = "Proportion of ones rolled",
             main = paste("Proportion of ones rolled\nin sets of", n2, "rolls"),
             axes = FALSE,
             las = 1,
             xlim = c(0, 0.4),
             right = FALSE,
#             breaks = seq(0, 0.8, by = 0.05)
             breaks = seq( 1/6 - 13 * 0.02, 
                           1/6 + 12 * 0.02, 
                           by = 0.02) + 1/12
             )
axis(side = 1)
points(x = 1/6,
       y = 0,
       pch = 19)

yNormal2 <- dnorm(x = xNormal,
                  mean = 1/6,
                  sd = se2)
yNormal2 <- yNormal2 / max(yNormal2) * max(out2$counts)
lines(yNormal2 ~ xNormal,
      lwd = 2,
      col = "black")

abline(h = 0, 
       lwd = 2)


###


out1 <- hist(rolls1,
             xlab = "Proportion of ones rolled",
             main = paste("Proportion of ones rolled\nin sets of", n1, "rolls"),
             axes = FALSE,
             las = 1,
             right = FALSE,
             xlim = c(-0.25, 0.65),
#             breaks = seq(0, 0.8, by = 1/10)
             breaks = seq(0, 0.8,
                          by = 1/9)             )
axis(side = 1)
points(x = 1/6,
       y = 0,
       pch = 19)

yNormal1 <- dnorm(x = xNormal,
                  mean = 1/6,
                  sd = se1)
yNormal1 <- yNormal1 / max(yNormal1) * max(out1$counts)
lines(yNormal1[xNormal > 0] ~ xNormal[xNormal > 0],
      lwd = 2,
      col = "black")
lines(yNormal1[xNormal < 0] ~ xNormal[xNormal < 0],
      lwd = 2,
      lty = 2,
      col = "black")
lines( x = c(0, 1),
       y = c(0, 0),
       lwd = 2)
lines( x = c(-0.2, 0),
       y = c(0, 0),
       lty = 2,
       lwd = 2)

```

## Example: rolling the other die {#OneProportiontestRollOtherDie}

In $50$ rolls of the *other* die, I found a 
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
on $7$\ rolls, so that $\hat{p} = 7/50 = 0.14$.
To determine if this die appears loaded, the hypotheses are the same as before:
$$
  \text{$H_0$:\ } p = 1/6  \qquad\text{and}\qquad  \text{$H_1$:\ } p \ne 1/6.
$$
Following the procedures above (check!) and using the same hypotheses, $z = -0.506$ and (using tables) the two-tailed $P$-value is $2\times 0.3061 = 0.6122$.
This means that the sample result was not unusual if $p = 1/6$, and is certainly not persuasive evidence to support the alternative hypothesis.
There is *no evidence* to suggest the second die is loaded.

This all implies the first die was the loaded die.
Now I need to decide how to distinguish the two dice\dots


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
*A large $P$-value does not prove that the die is fair!*
It only means that the proportion of rolls that produce a 
`r if (knitr::is_latex_output()) {
   '\\largedice{1}'
} else {
   '<span class="larger-die">&#9856;</span>'
}`
is not unusual... but perhaps the die is loaded in some other way (e.g., to produce more-than-expected rolls of a 
`r if (knitr::is_latex_output()) {
   '\\largedice{5}).'
} else {
   '<span class="larger-die">&#9860;</span>).'
}`


*A large $P$-value does not necessarily mean that the die is fair!*
The die may indeed be loaded to produce a larger-than-expected number of rolls that show a 
`r if (knitr::is_latex_output()) {
   '\\largedice{1},'
} else {
   '<span class="larger-die">&#9856;</span>,'
}`
but (due to sampling variation) the sample we observed simply did not provide evidence to make that conclusion.

The result is interpreted in terms of how much evidence exists to support the alternative hypothesis.
The onus is on the data (i.e., evidence) to refute the assumption made in the null hypothesis.
:::


## Example: dominance of birds {#OneProportiontestBirds}

@barve2017elevational compared two types of birds (male green-backed tits; male cinereous tits) to see which was more behaviourally dominant over winter.
If the species were equally dominant, then about\ $50$% of the interactions would be won by each species.
If we define\ $p$ as the proportion of interactions won by green-backed tits, then we would expect $p = 0.50$.
However, in the $45$\ interactions observed between the two species, green-backed tits won $37$\ interactions (i.e., $\hat{p} = 37/45 = 0.82222$).
A discrepancy exists between the sample proportion ($\hat{p} = 0.8222$) and the expected population proportion $p = 0.50$.

Of course, every sample of $45$\ interactions would produce a different value of\ $\hat{p}$.
To test if the population proportion of interaction wins could be equally shared, the hypotheses are:
$$
   \text{$H_0$: } p = 0.5\quad\text{and}\quad\text{$H_1$: } p \ne 0.5 \text{ (two-tailed)}.
$$
The test is statistically valid, since both $n\times p = 45\times 0.5 = 22.5$ and $n\times (1 - p) = 22.5$ exceed five.
The *standard error* is
$$
   \text{s.e.}(\hat{p}) 
   = \sqrt{\frac{p \times (1 - p)}{n}} 
   = \sqrt{\frac{0.50 \times (1 - 0.50)}{45}} 
   = 0.0745356...
$$
Then, the value of the *test statistic* is:
$$
   z 
   = \frac{\hat{p} - p}{\text{s.e.}(\hat{p})}
   = \frac{0.82222 - 0.50}{0.0745356}
   = 4.323.
$$
This is a *very* large $z$-score, so the $P$-value will be very small, using the $68$--$95$--$99.7$ rule, or using tables.
This is persuasive evidence to support the alternative hypothesis.
We write:

> *Very* strong evidence exists in the sample ($P < 0.0001$; $z = 4.32$) that the interactions were not won equally by each species ($\hat{p} = 0.8222$ won by green-backed tits; approx.\ $95$% CI: $0.708$ to\ $0.936$; $n = 45$) in the population.
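
The numbers in this conclusion can be reproduced with a few lines of R.
This is a sketch under two stated assumptions: the variable names are illustrative, and the approximate $95$% CI uses a multiplier of\ $2$ (from the $68$--$95$--$99.7$ rule), which matches the interval reported above.

```r
n     <- 45    # interactions observed
wins  <- 37    # interactions won by green-backed tits
p0    <- 0.5   # value of p in the null hypothesis
p_hat <- wins / n

# For the test: the standard error uses the hypothesised value of p
se_test <- sqrt(p0 * (1 - p0) / n)
z       <- (p_hat - p0) / se_test
p_value <- 2 * pnorm(-abs(z))

# For the approximate 95% CI: the standard error uses the sample proportion
se_ci <- sqrt(p_hat * (1 - p_hat) / n)
ci    <- p_hat + c(-2, 2) * se_ci

round(z, 2)        # 4.32
p_value < 0.0001   # TRUE
round(ci, 3)       # 0.708 0.936
```

Two different standard errors appear: the test assumes the null hypothesis is true and so uses $p = 0.5$, while the CI makes no assumption about\ $p$ and so uses\ $\hat{p}$.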


## Chapter summary {#Chap28Summary}

To test a hypothesis about a population proportion $p$:

* Write the null hypothesis ($H_0$; the sampling variation explanation) and the alternative hypothesis ($H_1$).
* Initially *assume* the value of\ $p$ in the null hypothesis to be true.
* Then, describe the *sampling distribution*, which describes what to *expect* from the sample statistic across all possible samples, based on this assumption: under certain statistical validity conditions, the sample proportion varies with:
   *  an approximate normal distribution,
   *  with a sampling mean whose value is the value of\ $p$,
   *  with a standard deviation of $\displaystyle \text{s.e.}(\hat{p}) = \sqrt{\frac{p \times (1 - p)}{n}}$, where\ $p$ is the hypothesised value given in the null hypothesis, and\ $n$ is the sample size.
* Compute the value of the *test statistic*:
$$
   z = \frac{ \hat{p} - p}{\text{s.e.}(\hat{p})}.
$$
* Compute an approximate *$P$-value* using the $68$--$95$--$99.7$ rule, or using tables.
* Make a decision, and write a conclusion.
* Check the statistical validity conditions.
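
The steps in this summary can be gathered into one function.
The sketch below is illustrative rather than definitive: `one_prop_z_test` is a hypothetical name, the test is two-tailed, and the validity check uses the threshold of five.
It is applied to the second die ($7$\ ones in $50$\ rolls) and compared with R's built-in `prop.test()`, which, without a continuity correction, reports the *square* of the $z$-score.

```r
# Hypothetical helper: two-tailed z-test for one proportion
one_prop_z_test <- function(x, n, p0) {
  p_hat <- x / n                       # sample proportion
  se    <- sqrt(p0 * (1 - p0) / n)     # standard error, assuming H0 is true
  z     <- (p_hat - p0) / se           # test statistic
  list(p_hat   = p_hat,
       se      = se,
       z       = z,
       p_value = 2 * pnorm(-abs(z)),   # two-tailed P-value
       valid   = (n * p0 > 5) && (n * (1 - p0) > 5))
}

# The second die: 7 ones in 50 rolls
die2 <- one_prop_z_test(x = 7, n = 50, p0 = 1 / 6)
round(die2$z, 3)   # -0.506
die2$p_value       # about 0.61

# Built-in comparison (no continuity correction)
builtin <- prop.test(x = 7, n = 50, p = 1 / 6, correct = FALSE)
all.equal(unname(builtin$statistic), die2$z^2)       # TRUE
all.equal(unname(builtin$p.value),   die2$p_value)   # TRUE
```

By default, `prop.test()` applies a continuity correction, so `correct = FALSE` is needed to match the hand calculation.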


## Quick review questions  {#Chap31-QuickReview}

::: {.webex-check .webex-box}
A study of diseases in Native Americans [@kizer2006digestive] found $381$\ obese or overweight patients in $449$\ patients.
In the general population of the USA, the percentage obese or overweight was\ $65$%.
The researchers wanted to determine if the percentage of obese/overweight Native Americans was *greater* than that of the general population.

Are the following statements *true* or *false*?

1. The sample size is $n = 381$. \tightlist
`r if( knitr::is_html_output() ) {torf(answer=FALSE)}`
1. The value of the *sample* proportion is\ $\hat{p} = 381$.
`r if( knitr::is_html_output() ) {torf(answer=FALSE)}`
1. The *null* hypothesis is\ $H_0$: $p = 0.65$.
`r if( knitr::is_html_output() ) {torf(answer=TRUE)}`
1. The *alternative* hypothesis is\ $H_1$: $p = 0.8486$.
`r if( knitr::is_html_output() ) {torf(answer=FALSE)}`
1. We initially assume the *population* proportion of overweight/obese Native Americans is\ $0.65$.
`r if( knitr::is_html_output() ) {torf(answer=TRUE)}`
1. The *alternative* hypothesis is *one*-tailed.
`r if( knitr::is_html_output() ) {torf(answer=TRUE)}`
1. In a one-sample test of proportion, the $z$-score is always large.
`r if( knitr::is_html_output() ) {torf(answer=FALSE)}`
1. The value of the $z$-score for this example is $8.82$.
`r if( knitr::is_html_output() ) {torf(answer=TRUE)}`
1. We have evidence to support the alternative hypothesis in this example.
`r if( knitr::is_html_output() ) {torf(answer=TRUE)}`
1. We always accept the *null* hypothesis.
`r if( knitr::is_html_output() ) {torf(answer=FALSE)}`
:::


## Exercises {#OneProportionTestExercises}

[Answers to odd-numbered exercises] are given at the end of the book. 

`r if( knitr::is_latex_output() ) "\\captionsetup{font=small}"`


:::{.exercise #sepPWhy1}
Explain *why* the standard error is computed using\ $p$ for hypothesis testing, but using\ $\hat{p}$ for confidence intervals.
:::


:::{.exercise #sepPWhy2}
Explain why describing the sampling distribution is difficult if we *assume* $p \ne 1/6$.
:::


:::{.exercise #OneProportionTestExplainA}
In the die example, the observed proportion is $0.38$.
Explain why we cannot simply state that the proportion clearly is not $1/6 = 0.1666\dots$ (as it is clearly $0.38$).
:::


:::{.exercise #OneProportionTestExplainB}
Explain why we compute $\text{s.e.}(\hat{p})$ and not\ $\text{s.e.}(p)$.
:::


:::{.exercise #OneProportionTestExercisesDodgyA}
What is wrong with the following statement, after testing $H_0$: $p = 0.25$:

> There is very strong evidence that the sample proportion is greater than\ $0.25$.
:::


:::{.exercise #OneProportionTestExercisesDodgyB}
Explain what is wrong with this statement from @davis2024higher, that appears under their Table\ 2:

> One proportion $z$-test with $H_0 = 0.076$, the proportion of UDT in our sample
:::


::: {.exercise #OneProportionTestExercisesPlacebos}
The study of herbal medicines is complicated, as *blinding* subjects is difficult: placebos are often easily identifiable by eye, by taste, or by smell.

@loyeung2018experimental studied if subjects could identify potential placebos at a *better* rate than just guessing.
The $81$ subjects were each presented with a choice of five different supplements, four of which were placebos.
Subjects were asked to select which one was the legitimate herbal supplement based on the *taste*.
Of these, $50$\ correctly selected the true herbal supplement.

1. If the subjects were selecting the true herbal supplement randomly, what proportion of subjects would be expected to select the correct supplement as the true herbal medicine?
2. Write the hypotheses for addressing the aims of the study.
3. Is this a one- or two-tailed test? 
   Explain.
4. Sketch the *sampling distribution* of the sample proportion, assuming $H_0$ is correct.
5. Is there evidence that people can identify the true supplement by taste?
6. Are the statistical validity conditions satisfied?
:::


::: {.exercise #POnePropTestMeasles}
@kim2004sero studied the measles-rubella vaccination rates in Korea, comparing the proportion of children susceptible to measles with the *World Health Organization* target proportion (for children aged\ $5$ to\ $9$ years old: $10$%).

The aim was to test if the proportion of Korean children susceptible to measles in the *population* was $10$%\ or *lower* (i.e., better).
In the study, $55$\ children out of\ $972$ were susceptible to measles.

1. Compute the sample proportion\ $\hat{p}$ of children susceptible to measles.
2. Write the hypotheses for the test.
   Is the test one- or two-tailed?
3. Compute the standard error for the test.
4. Compute the $z$-score and determine the $P$-value.
5. Write a conclusion.
6. Are the statistical validity conditions satisfied?
:::


::: {.exercise #OneProportionTestTurtleSex}
@streeting2022optimising studied western saw-shelled turtles.
When eggs were incubated at $27$^o^C, they observed that $29$\ males and $44$\ females hatched.
Are the proportions of male and female turtles that hatch at this temperature equal?
:::


::: {.exercise #OneProportionTestExercisesEPL}
[*Dataset*: `PremierL`]
In the 2019/2020 English Premier League (EPL), the home team won $91$\ games, and the away team won $67$\ games.
(Another $50$\ games were draws.)

Use the $n = 158$\ games with a result to determine if there is evidence that the home team wins more often than\ $50$% (i.e., that there is a home-side advantage).
:::


::: {.exercise #OneProportionTestExercisesPedalMachines}
@maeda2013introducing introduced pedal machines on the first floor of the Joyner Library for use by students at East Carolina University (ECU) to increase activity in library users.
At ECU, $60.2$%\ of all students were females (i.e., in the population).
Students were observed using the machine on $589$\ occasions, of which\ $295$ were by females.

Is there evidence that the proportion of female users of the machines was *lower* than the overall female proportion at the university?
What would you conclude?
:::


::: {.exercise #OneProportionTestExercisesCasinos}
@koenen1995analysis found that $88$\ of the $357$\ visitors to Las Vegas casinos in 1995 were smokers.
At the time, $25.5$%\ of the general US population were smokers (based on data from the US *National Center for Health Statistics*).
Is the proportion of smokers among casino-goers the same as for the general US population?
:::


::: {.exercise #OneProportionBreadfruitPasta}
@nochera2019development developed gluten-free pasta made from breadfruit.
In the study sample, $57$\ of the $71$\ participants stated that they liked the pasta.
Do the researchers have sufficient evidence to claim that the 'majority of people like breadfruit pasta'?
:::


::: {.exercise #OneProportionTestExercisesCTS}
Carpal Tunnel Syndrome (CTS) is a painful condition in the wrists.
@boltuch2020palmaris were interested in whether 'a relationship exists between the palmaris tendon [and] carpal tunnel syndrome (CTS)' (p.\ 493).
The palmaris longus (PL) tendon is visually absent in about\ $15$% of the population.
The researchers found PL was visually absent in\ $33$ of\ $516$ CTS wrists in their sample.
Is there evidence to suggest that the rate of PL absence is *different* in CTS cases, compared to the general population?
:::


::: {.exercise #OneProportionTestExercisesBorers}
@siegfried2014estimating studied resistance of some commercial corn varieties to the European corn borer. 
Borers were collected from corn in Iowa and Nebraska.

Researchers aimed to estimate the frequency of resistance to the toxin in the corn.
By mating borers collected from the field with various resistant laboratory individuals, they could determine what proportion of resistant individuals to expect in the second generation offspring.
In one study of $n = 172$ second-generation individuals, $24$\ were found to be resistant. 
The expectation was that $1$-in-$16$ of the second-generation borers would be resistant if the field borers were resistant.
Perform a hypothesis test to determine if the data suggest that the field borers were resistant, as expected (that is, if the population proportion is\ $1/16$).
:::


::: {.exercise #OneProportionTestExercisesLEDlights}
@davidovic2019drivers studied street-light preferences of drivers.
Drivers were asked to conduct a series of manoeuvres under $3\,000$K\ LED lights and then under $4\,000$K\ LED lights.
They were then asked to decide which street light they preferred.
Out of the $52$\ subjects, $29$\ preferred the $3\,000$K\ LED lights.
Is there evidence that the choice between the two street lights is random, or is there evidence of a preference for one over the other?
:::


::: {.exercise #OneProportionTestExercisesCoinSpin}
The euro was introduced as a currency on 1\ January 1999.
According to a report by the 
`r if (knitr::is_latex_output()) {
   '*New Scientist*'
} else {
   '[*New Scientist*](https://www.newscientist.com/article/dn1748-euro-coin-accused-of-unfair-flipping/)'
}`,
students in Poland spun a Belgian one-euro coin $250$\ times, and found $140$\ heads (as reported by @data:Gelman2002:DiceCoins).
This resulted in an 'accusation of bias' in the *New Scientist* article.
However, every set of $250$\ spins can produce a different proportion of heads, so perhaps the result is just due to randomness.
Does this sample of $250$\ spins suggest that the Belgian one-euro coin is biased?
:::


::: {.exercise #OneProportionTestExercisesBirths}
As noted in Sect.\ \@ref(ProbRelFreq), the 
`r if (knitr::is_latex_output()) {
   '*Australian Bureau of Statistics* (ABS)'
} else {
   '[*Australian Bureau of Statistics* (ABS)](http://www.abs.gov.au/ausstats/abs@.nsf/0/B8865D71D84F5210CA2579330016754C?opendocument)'
}`
stated that:

> The sex ratio for all births registered in Australia generally fluctuates around $105.5$\ male births per $100$\ female births.

(This statistic does not use births registered as 'other' or 'not stated'.)

1. The value of\ $105.5$ is effectively the population odds of a male birth (expressed per\ $100$ female births).
   Show that this is equivalent to a population proportion of male births of\ $0.51338$ (not including 'other' or 'not stated').
2. In\ 2021, there were $148\,636$\ male births and $140\,944$\ female births.
   Compute the *sample* proportion of male births in\ 2021 (to five decimal places).
<!--   (Another $23$\ births were registered as 'other' or 'not stated', but are not used.) -->
3. Conduct a test to determine if the 2021\ data appear different to the long-term proportion.
:::
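
The exercises above all rest on the same two calculations.
As a reminder (a sketch of the general form only, where $p$ is the proportion assumed in the null hypothesis, $\hat{p}$ is the sample proportion, and $n$ is the sample size), the standard error is computed assuming the null hypothesis is true, and the test statistic follows from it:
\[
   \text{s.e.}(\hat{p})
   = \sqrt{\frac{p (1 - p)}{n}}
   \quad\text{and}\quad
   z
   = \frac{\hat{p} - p}{\text{s.e.}(\hat{p})}.
\]
The $P$-value is then found from the $z$-score, using one or both tails as the alternative hypothesis requires.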


`r if( knitr::is_latex_output() ) "\\captionsetup{font=normalsize}"`


<!-- QUICK REVIEW ANSWERS -->
`r if (knitr::is_html_output()) '<!--'`
::: {.EOCanswerBox .EOCanswer data-latex="{iconmonstr-check-mark-14-240.png}"}
\textbf{Answers to \textit{Quick Revision} questions:}
**1.** False.
**2.** False; $\hat{p} = 381/449 = 0.84855$.
**3.** True.
**4.** False.
**5.** True.
**6.** True.
**7.** False.
**8.** True.
**9.** True.
**10.** False.
:::
`r if (knitr::is_html_output()) '-->'`


<!-- ::: {.exercise #OneProportionTestExercisesIguanas} -->
<!-- @avery2014invasive studied black spiny-tailed iguanas in Florida (an invasive species). -->
<!-- They measured the iguanas' snout--vent length (SVL). -->
<!-- Of the $275$ iguanas with a SVL between $100$ and $149$\ mm, $146$ were female. -->

<!-- Assuming female and male iguanas were equally present in the population, is there evidence that female and male iguanas were equally likely to be found with SVL in this range? -->
<!-- ::: -->


<!-- ::: {.exercise #OneProportionTestExercisesPenguins} -->
<!-- @vanstreels2013female studied Magellanic penguins found dead or stranded on the southern Brazilian coast. Of the $73$ adult penguins found, $47$ were female. -->

<!-- Assuming female and male penguins were equally present in the population, we would expect about half the dead or stranded penguins to be female. -->
<!-- Is this what the data suggest? -->
<!-- ::: -->

<!-- ## Example: obesity REMOVE {#OneProportiontestObesity} -->

<!-- @kolanska2010high compared the rate of obesity in $n = 143$ Polish patients with adrenal tumours to that of the general population of Poland ($p = 0.125$), to test if those with adrenal tumours were *more likely* to be obese than the general population. -->
<!-- The hypotheses are:   -->
<!-- \[ -->
<!--    \text{$H_0$: } p = 0.125\quad\text{and}\quad\text{$H_1$: } p > 0.125\text{ (one-tailed)}. -->
<!-- \] -->
<!-- Assuming the null hypothesis is true, the standard error is (remembering to use $p$):   -->
<!-- \[ -->
<!--    \text{s.e.}(\hat{p})  -->
<!--    = \sqrt{\frac{p (1 - p)}{n}}  -->
<!--    = \sqrt{\frac{0.125 \times (1 - 0.125)}{143}}  -->
<!--    = 0.027656... -->
<!-- \] -->
<!-- In their sample, $57$ were obese, so $\hat{p} = 57/143 = 0.3986...$. -->
<!-- Then, the value of the *test statistic* is:   -->
<!-- \[ -->
<!--    z  -->
<!--    = \frac{\hat{p} - p}{\text{s.e.}(\hat{p})} -->
<!--    = \frac{0.3986 - 0.125}{0.027656} -->
<!--    = 9.89. -->
<!-- \] -->
<!-- This is an *extremely* large $z$-score, so expect a very small $P$-value using the 68--95--99.7 rule. -->

<!-- The $95$%\ CI for the proportion requires the standard error computed from the *sample* proportion:   -->
<!-- \[ -->
<!--    \text{s.e.}(\hat{p})  -->
<!--    = \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}  -->
<!--    = \sqrt{\frac{0.3986 \times (1 - 0.3986)}{143}}  -->
<!--    = 0.040943... -->
<!-- \] -->
<!-- The approximate $95$%\ CI is $0.3986 \pm(2 \times 0.040943...)$. -->
<!-- We write: -->

<!-- > *Very* strong evidence exists in the sample (one-tailed $P < 0.001$; $z = 9.89$) that the rate of obesity in patients with adrenal tumours ($\hat{p} = 0.3986$; $n = 143$; approximate 95% CI: 0.317 to 0.480) is higher than the general Polish population. -->