---
title: "Hierarchical Linear Modeling"
output:
  html_document:
    code_folding: show
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  error = TRUE,
  comment = "",
  class.source = "fold-show")
```

# Preamble

## Install Libraries

```{r, class.source = "fold-hide"}
#install.packages("remotes")
#remotes::install_github("DevPsyLab/petersenlab")
```

## Load Libraries

```{r, message = FALSE, warning = FALSE, class.source = "fold-hide"}
library("petersenlab")
library("lme4")
library("nlme")
library("lmerTest")
library("MASS")
library("MCMCglmm")
library("performance")
library("ggplot2")
```

## Import Data

```{r, eval = FALSE, class.source = "fold-hide"}
mydata <- read.csv("https://osf.io/cqn3d/download")
```

```{r, include = FALSE}
mydata <- read.csv("./data/nlsy_math_long.csv") #https://osf.io/cqn3d/download
```

## Simulate Data

```{r, class.source = "fold-hide"}
set.seed(52242)

mydata$outcome <- rpois(nrow(mydata), 4)
```

# Terms

These models go by a variety of different terms:

- hierarchical linear model (HLM)
- multilevel model (MLM)
- mixed effects model
- mixed model

# Overview

https://isaactpetersen.github.io/Principles-Psychological-Assessment/reliability.html#mixedModels

# Pre-Model Computation

It can be helpful to center the age/time variable so that the intercept in a growth curve model is interpretable.
For instance, subtracting the youngest participant's age from every age makes the intercept the expected score at the earliest age in the sample.

```{r}
mydata$ageYears <- mydata$age / 12
mydata$ageMonthsCentered <- mydata$age - min(mydata$age, na.rm = TRUE)

mydata$ageYearsCentered <- mydata$ageMonthsCentered / 12
```
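
As a minimal sketch of what centering does, the example below uses an ordinary regression on the `Orthodont` data that ship with `nlme` (not `mydata`; the object names `orthodont`, `fitRawAge`, and `fitCenteredAge` are hypothetical).
Centering age at its minimum should leave the slope unchanged and move the intercept to the expected value at the youngest age.

```{r}
library("nlme")

# Orthodont ships with nlme; ages range from 8 to 14 years
orthodont <- as.data.frame(Orthodont)
orthodont$ageCentered <- orthodont$age - min(orthodont$age, na.rm = TRUE)

fitRawAge <- lm(distance ~ age, data = orthodont)
fitCenteredAge <- lm(distance ~ ageCentered, data = orthodont)

# The slope is identical; only the intercept changes
coef(fitRawAge)
coef(fitCenteredAge)

# The centered intercept equals the raw model's prediction at the youngest age
predict(fitRawAge, newdata = data.frame(age = min(orthodont$age)))
```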

# Estimator: ML or REML

For small sample sizes, restricted maximum likelihood (REML) is preferred over maximum likelihood (ML).
ML is preferred when there is a small number (< 4) of fixed effects; REML is preferred when there are more (> 4) fixed effects.
The greater the number of fixed effects, the greater the difference between REML and ML estimates.
Likelihood ratio (LR) tests based on REML require exactly the same fixed effects specification in both models.
So, to compare models with different fixed effects using an LR test (e.g., to determine whether to include a particular fixed effect), ML must be used.
In contrast to ML, REML can produce unbiased estimates of variance and covariance parameters; variance estimates are larger under REML than under ML.
To compare whether an effect should be fixed or random (i.e., models that differ only in their random effects), use REML.
To simultaneously compare fixed and random effects, use ML.
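
The sketch below illustrates these points with the `sleepstudy` data that ship with `lme4` (not `mydata`; the object names are hypothetical).
It fits the same random-intercept model under each estimator to compare the variance estimates, and then uses ML fits for an LR test of a fixed effect.

```{r}
library("lme4")

# Same model under each estimator (sleepstudy ships with lme4)
fitREML <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = TRUE)
fitML <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

# Variance components: the REML estimates are expected to be somewhat larger
varianceComparison <- data.frame(
  component = c("Subject (Intercept)", "Residual"),
  REML = as.data.frame(VarCorr(fitREML))$vcov,
  ML = as.data.frame(VarCorr(fitML))$vcov)
varianceComparison

# Models that differ in their fixed effects are compared using ML fits
fitML_noDays <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy, REML = FALSE)
lrtFixedEffect <- anova(fitML_noDays, fitML)
lrtFixedEffect
```

Note that when `anova()` is given `lmer()` models fit with REML, it refits them with ML by default (`refit = TRUE`) before computing the LR test.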

# Linear Mixed Models {#linear}

The following models are fit in a linear mixed modeling framework.

## Growth Curve Models {#gcm}

### Plot Observed Growth Curves

```{r}
ggplot(
  data = mydata,
  mapping = aes(
    x = ageYears,
    y = math,
    group = id)) +
  geom_line() +
  scale_x_continuous(
    name = "Age (Years)") +
  scale_y_continuous(
    name = "Math Score")
```

### `lme4`

```{r}
linearMixedModel <- lmer(
  math ~ female + ageYearsCentered + (ageYearsCentered | id),
  data = mydata,
  REML = FALSE, #for ML
  na.action = na.exclude,
  control = lmerControl(optimizer = "bobyqa"))

summary(linearMixedModel)
```

#### Prototypical Growth Curve

```{r}
newData <- expand.grid(
  female = c(0, 1),
  ageYears = c(
    min(mydata$ageYears, na.rm = TRUE),
    max(mydata$ageYears, na.rm = TRUE))
)

newData$ageYearsCentered <- newData$ageYears - min(newData$ageYears)

newData$sex <- NA
newData$sex[which(newData$female == 0)] <- "male"
newData$sex[which(newData$female == 1)] <- "female"
newData$sex <- as.factor(newData$sex)

newData$predictedValue <- predict(
  linearMixedModel,
  newdata = newData,
  re.form = NA
)

ggplot(
  data = newData,
  mapping = aes(x = ageYears, y = predictedValue, color = sex)) +
  xlab("Age (Years)") +
  ylab("Math Score") +
  geom_line()
```

#### Individuals' Growth Curves

```{r}
mydata$predictedValue <- predict(
  linearMixedModel,
  newdata = mydata,
  re.form = NULL
)

ggplot(
  data = mydata,
  mapping = aes(x = ageYears, y = predictedValue, group = factor(id))) +
  xlab("Age (Years)") +
  ylab("Math Score") +
  geom_line()
```

#### Individuals' Trajectories Overlaid with Prototypical Trajectory

```{r}
ggplot(
  data = mydata,
  mapping = aes(x = ageYears, y = predictedValue, group = factor(id))) +
  xlab("Age (Years)") +
  ylab("Math Score") +
  geom_line() +
  geom_line(
    data = newData,
    mapping = aes(x = ageYears, y = predictedValue, group = sex, color = sex),
    linewidth = 2)
```

### `nlme`

```{r}
linearMixedModel_nlme <- lme(
  math ~ female + ageYearsCentered,
  random = ~ 1 + ageYearsCentered|id,
  data = mydata,
  method = "ML",
  na.action = na.exclude)

summary(linearMixedModel_nlme)
```

## Intraclass Correlation Coefficient {#icc}

```{r}
icc(linearMixedModel)
icc(linearMixedModel_nlme)
```
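
For a model with only a random intercept, the ICC is the between-group variance divided by the sum of the between-group and residual variances.
The sketch below computes it by hand for an unconditional model fit to the `sleepstudy` data that ship with `lme4` (not `mydata`; the object names are hypothetical).
This simple ratio applies only to random-intercept models; it is not the calculation for models with random slopes, such as `linearMixedModel`.

```{r}
library("lme4")

# Unconditional random-intercept model (sleepstudy ships with lme4)
fitICC <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)

varianceComponents <- as.data.frame(VarCorr(fitICC))
betweenVariance <- varianceComponents$vcov[varianceComponents$grp == "Subject"]
withinVariance <- varianceComponents$vcov[varianceComponents$grp == "Residual"]

# ICC = between-group variance / (between-group variance + residual variance)
iccByHand <- betweenVariance / (betweenVariance + withinVariance)
iccByHand
```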

# Generalized Linear Mixed Models {#generalized}

https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html (archived at https://perma.cc/9RFS-BCE7; source code: https://github.com/bbolker/mixedmodels-misc/blob/master/glmmFAQ.rmd)

## `lme4`

```{r}
generalizedLinearMixedModel <- glmer(
  outcome ~ female + ageYearsCentered + (ageYearsCentered | id),
  family = poisson(link = "log"),
  data = mydata,
  na.action = na.exclude)

summary(generalizedLinearMixedModel)
```

## `MASS`

```{r}
glmmPQLmodel <- glmmPQL(
  outcome ~ female + ageYearsCentered,
  random = ~ 1 + ageYearsCentered|id,
  family = poisson(link = "log"),
  data = mydata)

summary(glmmPQLmodel)
```

## `MCMCglmm`

```{r}
MCMCglmmModel <- MCMCglmm(
  outcome ~ female + ageYearsCentered,
  random = ~ us(ageYearsCentered):id,
  family = "poisson",
  data = na.omit(mydata[,c("id","outcome","female","ageYearsCentered")]))

summary(MCMCglmmModel)
```

# Nonlinear Mixed Models {#nonlinear}

```{r}
nonlinearModel <- nlme(
  height ~ SSasymp(age, Asym, R0, lrc),
  data = Loblolly,
  fixed = Asym + R0 + lrc ~ 1,
  random = Asym ~ 1)

summary(nonlinearModel)
```

# Robust Mixed Models

There are a number of ways to evaluate the extent to which a finding could be driven by outliers, such as:

- identifying and excluding influential observations based on DFBETAS or Cook’s distance (Nieuwenhuis, Grotenhuis, & Pelzer, 2012)
- fitting mixed models using rank-based estimation (Bilgic & Susmann, 2013; Finch, 2017) or robust estimating equations (Koller, 2016)
- estimating robust standard errors using a sandwich estimator (Wang & Merkle, 2018)
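
As a minimal sketch of the first approach, the example below uses the `influence()` method that `lme4` provides for fitted models, applied to the `sleepstudy` data that ship with `lme4` (not `mydata`; the object names are hypothetical).
This is a swapped-in illustration, not necessarily the tooling used in the cited papers.

```{r}
library("lme4")

# lme4::lmer() is called explicitly so that the result is a plain lme4 fit
# even when lmerTest is loaded
fitInfluence <- lme4::lmer(
  Reaction ~ Days + (Days | Subject),
  data = sleepstudy)

# Refit the model leaving out one Subject at a time
influenceBySubject <- influence(fitInfluence, groups = "Subject")

# Cook's distance: overall influence of each Subject on the fixed effects
cooksD <- cooks.distance(influenceBySubject)
cooksD

# DFBETAS: standardized change in each fixed effect when a Subject is removed
dfbetasBySubject <- dfbetas(influenceBySubject)
dfbetasBySubject
```

Groups with comparatively large values could then be excluded in a sensitivity analysis to see whether the substantive conclusions change.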

# Assumptions

The within-group errors:

1. are independent
2. are identically normally distributed
3. have mean zero and variance sigma-squared
4. are independent of the random effects

The random effects:

5. are normally distributed
6. have mean zero and covariance matrix Psi (not depending on the group)
7. are independent for different groups
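
In the notation of Pinheiro and Bates (2000), these assumptions can be summarized for group $i$ as follows (the symbols $y_i$, $X_i$, $Z_i$, $\beta$, $b_i$, and $\varepsilon_i$ are introduced here for illustration: the outcome vector, the fixed- and random-effects design matrices, the fixed effects, the random effects, and the within-group errors, respectively):

$$
y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad
b_i \sim N(0, \Psi), \qquad
\varepsilon_i \sim N(0, \sigma^2 I)
$$

where $b_i$ and $\varepsilon_i$ are independent of each other and independent across groups.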

# Examining Model Assumptions

## Resources

Pinheiro and Bates (2000) book (p. 174, section 4.3.1)

https://stats.stackexchange.com/questions/77891/checking-assumptions-lmer-lme-mixed-models-in-r (archived at https://perma.cc/J5GC-PCUT)

## QQ Plots

Make QQ plots for each level of the random effects.
Vary `level` across 0, 1, and 2 so that you can check the between- and within-subject residuals.

```{r}
qqnorm(linearMixedModel_nlme,
       ~ ranef(., level = 1))
```

## PP Plots

```{r}
ppPlot(linearMixedModel)
```

## QQ Plot of residuals

```{r}
qqnorm(resid(linearMixedModel))
qqline(resid(linearMixedModel))
```

## Plot residuals

```{r}
plot(linearMixedModel)
```

## Plot residuals by group (in the example below, level 2 represents the individual)

```{r}
plot(linearMixedModel,
     as.factor(id) ~ resid(.),
     abline = 0,
     xlab = "Residuals")
```

## Plot residuals by levels of a predictor

```{r}
plot(linearMixedModel_nlme,
     resid(., type = "p") ~ fitted(.) | female) #type = "p" specifies standardized residuals
```

## Model heteroscedasticity of the within-group error with the `weights` argument

```{r}
linearMixedModel_nlmeVarStructure <- lme(
  math ~ female + ageYearsCentered,
  random = ~ 1 + ageYearsCentered|id,
  weights = varIdent(form = ~ 1 | female),
  method = "ML",
  data = mydata,
  na.action = na.exclude)

summary(linearMixedModel_nlmeVarStructure)
```

## Plot observed and fitted values

```{r}
plot(linearMixedModel,
     math ~ fitted(.))
```

## Plot QQ plot of residuals by levels of a predictor

```{r}
qqnorm(linearMixedModel_nlme, ~ resid(.) | female)
qqnorm(linearMixedModel_nlme, ~ resid(.))
```

## QQ plot of random effects

Make QQ plots for each level of the random effects.
Vary `level` across 0, 1, and 2 so that you can check the between- and within-subject residuals.

```{r}
qqnorm(linearMixedModel_nlme,
       ~ ranef(., level = 0))
qqnorm(linearMixedModel_nlme,
       ~ ranef(., level = 1))
qqnorm(linearMixedModel_nlme,
       ~ ranef(., level = 2))
```

## QQ plot of random effects by levels of a predictor

```{r}
qqnorm(linearMixedModel_nlme, 
       ~ ranef(., level = 1) | female)
```

## Pairs plot

```{r}
pairs(linearMixedModel_nlme)
pairs(linearMixedModel_nlme,
      ~ ranef(., level = 1) | female)
```

## Variance functions for modeling heteroscedasticity

- `varFixed`: fixed variance
- `varIdent`: different variances per stratum
- `varPower`: power of covariate
- `varExp`: exponential of covariate
- `varConstPower`: constant plus power of covariate
- `varComb`: combination of variance functions
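
Any of these variance functions is passed to the `weights` argument of `lme()`.
As a minimal sketch, the example below uses the `Orthodont` data that ship with `nlme` (not `mydata`; the object names are hypothetical) to compare a model with constant residual variance against one that allows a different residual variance for each sex.

```{r}
library("nlme")

# Orthodont ships with nlme: dental growth measured at ages 8 to 14
fitHomoscedastic <- lme(
  distance ~ age + Sex,
  random = ~ 1 | Subject,
  data = Orthodont)

# varIdent: a separate residual variance for each sex
fitVarIdent <- update(
  fitHomoscedastic,
  weights = varIdent(form = ~ 1 | Sex))

# Both models have the same fixed effects, so the REML fits can be compared
anova(fitHomoscedastic, fitVarIdent)

# Estimated variance function parameters
fitVarIdent$modelStruct$varStruct
```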

## Correlation structures for modeling dependence

- `corCompSymm`: compound symmetry
- `corSymm`: general
- `corAR1`: autoregressive of order 1
- `corCAR1`: continuous-time AR(1)
- `corARMA`: autoregressive-moving average
- `corExp`: exponential
- `corGaus`: Gaussian
- `corLin`: linear
- `corRatio`: rational quadratic
- `corSpher`: spherical

# Power Analysis {#powerAnalysis}

- https://aguinis.shinyapps.io/ml_power/
- https://www.causalevaluation.org/power-analysis.html
  - https://powerupr.shinyapps.io/index/
- https://koumurayama.shinyapps.io/tmethod_mlm/
- https://webpower.psychstat.org/wiki/models/index

# Session Info

```{r, class.source = "fold-hide"}
sessionInfo()
```