Adaptive Bayesian Updates

Current Suggested Citation

Boonstra, Philip S. and Barbaro, Ryan P., "Incorporating Historical Models with Adaptive Bayesian Updates" (2018) In Press at Biostatistics

Authors' Copy

Executive Summary

The functions glm_nab and glm_sab contained in the file Functions.R represent the primary statistical contribution from this manuscript. With these functions, plus the mean and variance of the coefficients from a historical regression model and the usual ingredients for fitting the current model of interest, a user can fit a Bayesian logistic regression with the adaptive priors that are described in the manuscript.
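As a minimal sketch of where "the mean and variance of the coefficients from a historical regression model" might come from, the base R code below fits a logistic regression to simulated historical data and extracts the coefficient estimates and their covariance matrix. The data, the object names (hist_data, hist_fit, hist_mean, hist_var), and the true coefficient values are all illustrative assumptions; how these quantities are passed to glm_nab or glm_sab is defined in Functions.R and is not shown here.

```r
# Hypothetical historical data: a binary outcome and two covariates.
set.seed(1)
n_hist <- 500
hist_data <- data.frame(x1 = rnorm(n_hist), x2 = rnorm(n_hist))
hist_data$y <- rbinom(
  n_hist,
  size = 1,
  prob = plogis(-0.5 + 1.0 * hist_data$x1 - 0.5 * hist_data$x2)
)

# Fit the historical logistic regression with base R.
hist_fit <- glm(y ~ x1 + x2, data = hist_data, family = binomial())

# Coefficient estimates (the "mean") and their estimated covariance matrix
# (the "variance") from the historical model.
hist_mean <- coef(hist_fit)
hist_var <- vcov(hist_fit)

print(hist_mean)
print(hist_var)
```

In practice the historical model would usually be a published one, so these two objects would be taken from the publication rather than re-estimated.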

Further details

In more detail, there are twelve files included in this repository (in addition to this README): one text file (ending in .txt), five R scripts (ending in .R), and six STAN functions (ending in .stan). The simulation studies reported in Boonstra and Barbaro were run using commit 22.

Text file

runABUSims.txt is the script for submitting parallel runs of runABUSims.R (described below) to a cluster that is running SLURM. The following command does this:

sbatch runABUSims.txt

R files

Functions.R provides all of the necessary functions to fit the methods described in the paper.

Exemplar.R creates a single simulated dataset and walks through how to fit the methods described in the manuscript.

GenParams.R constructs inputs for running the simulation study. As described in the script's documentation and the language below, these inputs can be overwritten by the user.

runABUSims.R is the script to conduct the large-scale simulation study described in the manuscript. On a local machine, the user may choose a specific array_id (as described in this script's documentation) and run the code directly. On a cluster running SLURM, the user can use this script to submit multiple jobs simultaneously (as described above).
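One way to support both modes is sketched below, assuming the cluster jobs are submitted as a SLURM job array, in which case SLURM sets the environment variable SLURM_ARRAY_TASK_ID for each task. The helper get_array_id and its local_default argument are hypothetical names used for illustration; runABUSims.R may set array_id differently.

```r
# Return the SLURM array task index if one is set; otherwise fall back to a
# value chosen by the user for a local run.
get_array_id <- function(local_default = 1L) {
  slurm_id <- Sys.getenv("SLURM_ARRAY_TASK_ID", unset = NA)
  if (is.na(slurm_id) || !nzchar(slurm_id)) {
    return(as.integer(local_default))
  }
  as.integer(slurm_id)
}

array_id <- get_array_id(local_default = 1L)
```

With this pattern the same script runs unchanged on a laptop and on the cluster, and only the source of array_id differs.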

makeFigures.R gives the code to create the figures and tables in the manuscript and supplementary material reporting on the simulation study.

STAN files

The STAN files are described below. Note that these currently all implement a logistic link, but changing to a non-logistic link (e.g. log or probit) would be relatively easy. Upon using these for the first time, R will need to compile these programs, creating an R data object file (ending in .rds) in the current working directory. Re-compilation of the STAN files is not necessary as long as they are unchanged.

RegHS_stable.stan implements the regularized horseshoe prior, using the settings described in Boonstra and Barbaro, applied to a logistic regression. An R user calls this with glm_standard in Functions.R.

NAB_stable.stan and NAB_dev.stan both implement the 'naive adaptive Bayesian' prior, as described in Boonstra and Barbaro, applied to a logistic regression. The '_dev' modifier was initially used for testing development versions of the prior against the current stable version. For the results reported in Boonstra and Barbaro, the only difference between the two is the hyperprior distribution on η (eta): in NAB_stable.stan it is Inv-Gamma(2.5, 2.5), and in NAB_dev.stan it is Inv-Gamma(25, 25). The 'stable' versions are reported in the main manuscript. An R user calls either file with the function glm_nab in Functions.R.
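To get a feel for how different these two hyperpriors are, the sketch below summarizes each one using base R only. It assumes the (shape, scale) parameterization of the inverse-gamma distribution, which is the one Stan's inv_gamma uses; the helper inv_gamma_summary is a hypothetical name and is not part of Functions.R.

```r
# Summaries of an Inv-Gamma(shape, scale) distribution.
# Assumption: (shape, scale) parameterization, so that the mean is
# scale / (shape - 1) and the variance is
# scale^2 / ((shape - 1)^2 * (shape - 2)), which requires shape > 2.
inv_gamma_summary <- function(shape, scale) {
  stopifnot(shape > 2)
  prior_mean <- scale / (shape - 1)
  prior_sd <- sqrt(scale^2 / ((shape - 1)^2 * (shape - 2)))
  # If X ~ Gamma(shape, rate = scale), then 1 / X ~ Inv-Gamma(shape, scale),
  # so quantiles of 1 / X are reciprocals of the opposite-tail quantiles of X.
  q <- 1 / qgamma(c(0.975, 0.5, 0.025), shape = shape, rate = scale)
  c(mean = prior_mean, sd = prior_sd,
    q2.5 = q[1], median = q[2], q97.5 = q[3])
}

hyperpriors <- rbind(
  stable = inv_gamma_summary(2.5, 2.5),
  dev = inv_gamma_summary(25, 25)
)
print(round(hyperpriors, 3))
```

Under this parameterization the prior means are 5/3 and 25/24, and the prior standard deviations are roughly 2.36 and 0.22, so the '_dev' hyperprior concentrates η much more tightly near 1.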

SAB_stable.stan and SAB_dev.stan are analogous versions of the 'sensible adaptive Bayesian' prior. An R user calls either file with the function glm_sab in Functions.R.

RegStudT.stan implements a regularized Student-t prior applied to a logistic regression. This is not considered in the simulation study but is used in the data analysis ('PedRESC2'). A Student-t prior is applied to each regression coefficient using a normal-inverse-gamma distribution, but the latent inverse-gamma scale has a smooth upper bound provided by the user, so as to constrain very large scale values. An R user calls this with glm_studt in Functions.R.
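The normal-inverse-gamma representation of a Student-t distribution can be checked by simulation, as in the sketch below. It shows only the standard, unbounded scale mixture; the smooth upper bound implemented in RegStudT.stan is not reproduced here. The degrees of freedom, scale, and all object names are illustrative assumptions.

```r
set.seed(2018)
n_draws <- 200000
nu <- 3  # degrees of freedom (illustrative value)
s <- 1   # scale of the Student-t (illustrative value)

# sigma2 ~ Inv-Gamma(nu / 2, nu * s^2 / 2), drawn as the reciprocal of a
# gamma random variable with the same shape and with rate nu * s^2 / 2.
sigma2 <- 1 / rgamma(n_draws, shape = nu / 2, rate = nu * s^2 / 2)

# beta | sigma2 ~ Normal(0, sigma2)
beta <- rnorm(n_draws, mean = 0, sd = sqrt(sigma2))

# Marginally, beta should follow a Student-t with nu degrees of freedom and
# scale s, so its empirical quantiles should be close to the exact ones.
mixture_q <- quantile(beta, probs = c(0.75, 0.9), names = FALSE)
student_q <- s * qt(c(0.75, 0.9), df = nu)
print(rbind(mixture = mixture_q, student_t = student_q))
```

Bounding the latent scale, as the STAN file does, would cap how large sigma2 can be and therefore thin the extreme tails relative to this unbounded version.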

Divergent transitions

Built into each glm_*** function is a check for divergent transitions (http://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html), which are simultaneously very helpful, very mysterious, and very frustrating. The function will re-run if any divergent transitions are detected, up to a user-specified number of times (ntries), and return the results in which the fewest divergent transitions were encountered. By virtue of the way this check is constructed, the user will see the following warning each time divergent transitions are encountered:

Warning message: In glm_sab(stan_path = paste0(stan_file_path, sab_stan_filename), : NAs introduced by coercion
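The re-run logic described above can be sketched in a few lines of R. This is not the code in Functions.R: fit_with_retries, fit_once, and the n_divergent element are hypothetical names, the sampler is replaced by a toy function with made-up divergence counts, and ntries is assumed here to mean the total number of attempts.

```r
# fit_once: a hypothetical function that runs the sampler once and returns a
# list with at least an element `n_divergent` (number of divergent
# transitions in that run).
fit_with_retries <- function(fit_once, ntries = 2) {
  best_fit <- NULL
  for (attempt in seq_len(ntries)) {
    current_fit <- fit_once(attempt)
    # Keep whichever run has had the fewest divergent transitions so far.
    if (is.null(best_fit) || current_fit$n_divergent < best_fit$n_divergent) {
      best_fit <- current_fit
    }
    # Stop early once a run finishes with no divergent transitions.
    if (best_fit$n_divergent == 0) break
  }
  best_fit
}

# Toy stand-in for a sampler: the divergence counts are made up.
toy_counts <- c(5, 2, 0, 3)
toy_fit <- function(attempt) {
  list(attempt = attempt, n_divergent = toy_counts[attempt])
}

result_two_tries <- fit_with_retries(toy_fit, ntries = 2)
result_four_tries <- fit_with_retries(toy_fit, ntries = 4)
```

With two attempts the sketch returns the second run, which has the fewer divergent transitions of the two; with four attempts it stops at the third run, which has none.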

Compilation warning

STAN is smart enough to recognize the need for the normalizing constant and so, upon compilation, will give the following warning:

DIAGNOSTIC(S) FROM PARSER: Warning (non-fatal): Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable. If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform. Left-hand-side of sampling statement: normalized_beta ~ normal(...)

This warning can be safely ignored because we do, in fact, calculate the normalizing constant.

Note, 10-Jul-2018:

After updating to R version 3.5.0, R occasionally throws the following 'error':

Error in x$.self$finalize() : attempt to apply non-function

'Error' is in quotes because it does not interrupt any processes and does not seem to affect any results. Others have reported the same message online, and it seems to be related to garbage collection:

http://discourse.mc-stan.org/t/very-mysterious-debug-error-when-running-rstanarm-rstan-chains-error-in-x-self-finalize-attempt-to-apply-non-function/4746
