Consider rate of outcome in non-exposed group in interpret_oddsratio #657

KohlRaphael · 2024-11-20T10:15:51Z

The {interpret_oddsratio} function derives its rules (1.68, 3.47 and 6.71) from Chen et al. (2010). As Chen et al. explain, these thresholds depend on the outcome rate in the unexposed group. The current rules ignore this and implicitly assume an outcome rate of 1% in that group.
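For reference, the conversion behind these thresholds can be written as below. It assumes a latent normal outcome whose mean is shifted by Cohen's d in the exposed group, which is the model used by calculate_thresholds() in the code below. Here $\Phi$ is the standard normal CDF and $p_1$ the implied outcome rate in the exposed group. Plugging in $p_0 = 0.01$ and $d = 0.2, 0.5, 0.8$ gives values within about 0.01 of 1.68, 3.47 and 6.71:

$$
p_1 = \Phi\big(\Phi^{-1}(p_0) + d\big),
\qquad
\mathrm{OR}(p_0, d) = \frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)}
$$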

The code below demonstrates that the current thresholds are overly conservative for baseline rates between roughly 0.1 and 0.9 and insufficiently conservative at the extremes. It calculates the thresholds across the full range of possible baseline rates (0 < p0 < 1). Chen et al. provide odds ratio (OR) equivalents of Cohen's d only for baseline rates between 0.01 and 0.1, whereas Kraemer (2004) covers the entire range. The calculated values differ slightly from those reported by Chen et al. and Kraemer, typically in the second decimal place. These differences grow larger at the extreme margins (e.g., around 0.99999).
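A minimal base-R sketch checks the direction of the discrepancy at one mid-range and one extreme baseline rate. It assumes the same latent-normal conversion, and `or_thresholds()` is a hypothetical helper (equivalent to calculate_thresholds()). At p0 = 0.5 all three computed thresholds fall below the fixed rules, and at p0 = 0.00001 they all lie above them:

```r
# Hypothetical helper: OR equivalents of Cohen's d for a baseline rate p0,
# assuming a latent normal variable shifted by d in the exposed group.
or_thresholds <- function(p0, d = c(0.2, 0.5, 0.8)) {
  p1 <- pnorm(qnorm(p0) + d)
  (p1 / (1 - p1)) / (p0 / (1 - p0))
}

fixed_rules <- c(1.68, 3.47, 6.71)  # current interpret_oddsratio() defaults

mid <- or_thresholds(0.5)       # roughly 1.38, 2.24, 3.72
extreme <- or_thresholds(1e-5)  # very rare outcome in the unexposed group

# TRUE TRUE TRUE: fixed rules demand larger ORs than needed (too conservative)
mid < fixed_rules
# TRUE TRUE TRUE: fixed rules accept ORs that are too small (too lenient)
extreme > fixed_rules
```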

To address this, I implemented a possible solution that integrates the calculate_thresholds function into {interpret_oddsratio}. By default, the baseline rate remains 0.01, so nothing changes unless the p0 argument is explicitly provided. I also updated the tests for {interpret_oddsratio}, because rules is no longer its second argument. Since this is my first code contribution, I would appreciate a check for anything I missed (e.g., maybe calculate_thresholds should not live inside the function). I also have one question: my changes to the manual do not show up after the build. How do I fix that?
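The idea of computing the thresholds on the OR scale from p0 can be sketched as a stand-alone function. `interpret_or_p0()` is hypothetical and is not the actual patch. It assumes OR >= 1 and borrows Cohen's labels ("very small", "small", "medium", "large"). With the default p0 = 0.01 it reproduces the current behaviour up to rounding of the thresholds:

```r
# Hypothetical sketch: interpret an odds ratio against thresholds derived
# from the baseline rate p0 (default 0.01, matching the current fixed rules).
# Assumes or >= 1; odds ratios below 1 would need to be inverted first.
interpret_or_p0 <- function(or, p0 = 0.01, d = c(0.2, 0.5, 0.8)) {
  stopifnot(length(p0) == 1, p0 > 0, p0 < 1)
  # OR equivalents of d under a latent normal model
  p1 <- pnorm(qnorm(p0) + d)
  thresholds <- (p1 / (1 - p1)) / (p0 / (1 - p0))
  # right = FALSE: a value exactly at a threshold gets the larger label
  labels <- cut(or,
                breaks = c(-Inf, thresholds, Inf),
                labels = c("very small", "small", "medium", "large"),
                right = FALSE)
  as.character(labels)
}

interpret_or_p0(c(1.5, 3, 5))            # "very small" "small" "medium"
interpret_or_p0(c(1.5, 3, 5), p0 = 0.5)  # "small" "medium" "large"
```

The same ORs move up one category when the baseline rate is 50% instead of 1%, which is the behaviour the fixed rules cannot express.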

```r
library(dplyr)
library(ggplot2)
options(scipen = 999)  # print tiny baseline rates in fixed notation

# OR equivalents of Cohen's d (default 0.2, 0.5, 0.8) for a baseline outcome
# rate p0 in the unexposed group: shift the probit of p0 by d, map back to
# p1, and form the odds ratio of p1 vs. p0.
calculate_thresholds <- function(p0, d = c(0.2, 0.5, 0.8)) {
  z0 <- qnorm(p0)
  z1 <- z0 + d
  p1 <- pnorm(z1)
  or <- (p1 * (1 - p0)) / (p0 * (1 - p1))
  return(or)
}

# Baseline rates tabulated by Chen et al. (2010) and by Kraemer (2004)
chen_2010 <- seq(0.01, 0.1, 0.01)
kraemer_2004 <- c(0.00001, 0.0001, 0.001, 0.01,
                  seq(0.1, 0.9, 0.1),
                  0.99, 0.999, 0.9999, 0.99999)

# Wide tables: one row per baseline rate, one column per effect size
data_chen <- tibble()
for (i in chen_2010) {
  or <- calculate_thresholds(i)
  data_chen <- bind_rows(
    data_chen,
    tibble(
      p0 = format(i),
      effect_size_02 = or[1],
      effect_size_05 = or[2],
      effect_size_08 = or[3]
    )
  )
}

data_kraemer <- tibble()
data_plot <- tibble()  # long format for plotting
for (i in kraemer_2004) {
  or <- calculate_thresholds(i)
  data_kraemer <- bind_rows(
    data_kraemer,
    tibble(
      p0 = format(i),
      effect_size_02 = or[1],
      effect_size_05 = or[2],
      effect_size_08 = or[3]
    )
  )

  data_plot <- bind_rows(
    data_plot,
    tibble(
      p0 = format(i, nsmall = 3),
      effect_size = c(0.2, 0.5, 0.8),
      odds_ratio = or
    )
  )
}

# Computed thresholds per baseline rate vs. the fixed Chen et al. (2010)
# rules (dashed lines, which correspond to p0 = 0.01)
ggplot(data_plot) +
  geom_point(aes(x = p0, y = odds_ratio, color = factor(effect_size))) +
  geom_hline(yintercept = c(1.68, 3.47, 6.71), linetype = "dashed",
             color = c("#F8766D", "#00BA38", "#619CFF")) +
  theme_bw() +
  labs(x = "Baseline outcome rate (p0)", y = "Odds ratio", color = "Effect size")
```

mattansb · 2024-12-04T17:27:52Z

I’m not quite sure what the plot you generated is meant to demonstrate.

Here is a similar plot of the recovered d as a function of $p_0$ and the true d:

```r
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)

library(effectsize)

generate_or <- function(p0, d) {
  p1 <- pnorm(qnorm(p0) + d)
  probs_to_odds(p1)/probs_to_odds(p0)
}

data_plot <- expand.grid(
  p0 = c(
    0.00001, 0.0001, 0.001, 0.01,
    seq(0.1, 0.9, 0.1),
    0.99, 0.999, 0.9999, 0.99999
  ),
  true_d = c(0.2, 0.5, 0.8)
) |>
  mutate(
    or = generate_or(p0, true_d), 
    recovered_d = oddsratio_to_d(or)
  )

ggplot(data_plot, aes(p0, recovered_d - true_d, color = factor(true_d))) +
  geom_point() + 
  geom_hline(yintercept = 0) + 
  theme_bw() +
  scale_y_continuous(breaks = seq(-2, 2, by = 0.2)) + 
  coord_cartesian(ylim = c(-0.1, 1.5)) + 
  labs(x = expression(p[0]), 
       y = expression(Delta[recovered-true]), 
       color = "True Cohen's d")
```

Here it is clear that we only get such a large deviation at the extreme edges (when the rate is below 1% in either direction, i.e. p0 < 0.01 or p0 > 0.99).

This deviation at the extremes is caused by the inverse-logit to probit approximation ($z \approx L \times \sqrt{3} / \pi$) breaking down in the tails.
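To put numbers on this, here is a small self-contained sketch. It reimplements the approximation $d \approx \ln(\mathrm{OR}) \sqrt{3} / \pi$ by hand instead of calling oddsratio_to_d(), and `generate_or_base()` and `approx_d()` are hypothetical base-R helpers. It generates the OR for a true d = 0.2 at a mid-range, a 1% and an extreme baseline rate, then converts back with the approximation:

```r
# Hypothetical base-R version of generate_or(): OR implied by a latent
# normal shift of d at baseline rate p0
generate_or_base <- function(p0, d) {
  p1 <- pnorm(qnorm(p0) + d)
  (p1 / (1 - p1)) / (p0 / (1 - p0))
}

# Logistic approximation: log-odds scaled by sqrt(3) / pi
approx_d <- function(or) log(or) * sqrt(3) / pi

true_d <- 0.2
p0 <- c(0.5, 0.01, 1e-5)
bias <- approx_d(generate_or_base(p0, true_d)) - true_d
round(bias, 3)  # error grows as the baseline rate moves into the tail
```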

Let’s try the exact method:

```r
oddsratio_to_d_exact <- function(or, p0) {
  odds1 <- or * probs_to_odds(p0)
  p1 <- odds_to_probs(odds1)
  qnorm(p1) - qnorm(p0)
}

data_plot |> 
  mutate(
    recovered_d_exact = oddsratio_to_d_exact(or, p0)
  ) |> 
  ggplot(aes(p0, recovered_d_exact - true_d, color = factor(true_d))) +
  geom_point(aes(shape = "Exact")) + 
  geom_point(aes(y = recovered_d - true_d, shape = "Approx."), alpha = 0.2) + 
  geom_hline(yintercept = 0) + 
  theme_bw() +
  scale_y_continuous(breaks = seq(-2, 2, by = 0.2)) + 
  scale_shape_manual(NULL, values = c(21, 16)) + 
  coord_cartesian(ylim = c(-0.1, 1.5)) + 
  labs(x = expression(p[0]), 
       y = expression(Delta[recovered-true]), 
       color = "True Cohen's d")
```

Created on 2024-12-04 with reprex v2.1.1

Looks good!
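Written out, the two conversions compared in the plot are as follows, with $o_0 = p_0 / (1 - p_0)$ the baseline odds and $\Phi$ the standard normal CDF. The exact version inverts the model used to generate the ORs (odds1, then p1, then the probit difference), so it recovers $d$ for any $p_0$:

$$
d_{\text{approx}} = \ln(\mathrm{OR}) \times \frac{\sqrt{3}}{\pi},
\qquad
d_{\text{exact}} = \Phi^{-1}\!\left(\frac{\mathrm{OR} \cdot o_0}{1 + \mathrm{OR} \cdot o_0}\right) - \Phi^{-1}(p_0)
$$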

This can now be implemented in several places:

- Converters:
  - (log)oddsratio_to_d()
  - d_to_(log)oddsratio()
  - (log)oddsratio_to_r()
  - t_to_(log)oddsratio()
- interpret_oddsratio(), which can become a wrapper around interpret_cohens_d(oddsratio_to_d(...))

(somewhat related to #568, #593)
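The plan above (an exact conversion when p0 is known, and interpretation on the d scale) can be sketched in base R. The function names are hypothetical and this is not the effectsize implementation. Without p0 the sketch falls back to the logistic approximation, and applying Cohen's cut-offs to |d| (so that ORs below 1 are handled symmetrically) is an assumption:

```r
# Hypothetical converters: exact latent-normal model when p0 is given,
# logistic approximation d = log(OR) * sqrt(3) / pi otherwise.
or_to_d_sketch <- function(or, p0 = NULL) {
  if (is.null(p0)) return(log(or) * sqrt(3) / pi)
  odds1 <- or * p0 / (1 - p0)
  qnorm(odds1 / (1 + odds1)) - qnorm(p0)
}

d_to_or_sketch <- function(d, p0 = NULL) {
  if (is.null(p0)) return(exp(d * pi / sqrt(3)))
  p1 <- pnorm(qnorm(p0) + d)
  (p1 / (1 - p1)) / (p0 / (1 - p0))
}

# Hypothetical wrapper: convert to d, then apply Cohen's (1988)
# cut-offs of 0.2, 0.5 and 0.8 to |d|.
interpret_or_via_d <- function(or, p0 = NULL) {
  d <- abs(or_to_d_sketch(or, p0))
  as.character(cut(d,
                   breaks = c(-Inf, 0.2, 0.5, 0.8, Inf),
                   labels = c("very small", "small", "medium", "large"),
                   right = FALSE))
}

# Round trip: with p0 supplied, the conversion recovers d
d_to_or_sketch(0.5, p0 = 0.01) |> or_to_d_sketch(p0 = 0.01)
interpret_or_via_d(3, p0 = 0.01)  # "small"
interpret_or_via_d(3)             # "medium" (approximation, no p0)
```

The last two calls show why p0 matters: the same OR of 3 is "small" at a 1% baseline rate but "medium" under the approximation.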


KohlRaphael · 2024-12-05T10:29:37Z

Hey @mattansb,

Thank you for your detailed response and all the changes; it's clear there's a lot I hadn't considered! My initial intention with the plot was simply to replicate the rules outlined by Chen (and Kraemer) and to visualize the discrepancies between those rules and the fixed thresholds currently used in interpret_oddsratio(). I attempted to solve the issue by calculating the thresholds directly within the interpret_oddsratio() function (link). You spotted much more than I did, and I am curious how that will affect the interpret_oddsratio() function in the end.

Also, apologies for the confusion caused by the closing and reopening of the issue—my hand slipped onto my mouse, and I accidentally closed it. Definitely a bit embarrassing!

Thank you!

mattansb · 2024-12-07T18:59:16Z

So effectively, the chen2010 rule is dropped: the default is now to transform the OR to d (optionally using p0) and interpret the result according to Cohen's 0.2, 0.5 and 0.8 thresholds. See #659

Thanks for the suggestions and pointing this out!

mattansb self-assigned this Dec 4, 2024

mattansb added the enhancement 🔥 New feature or request label Dec 4, 2024

mattansb added a commit that referenced this issue Dec 5, 2024

add p0 to oddsratio_to_d

31b5c94

#657

KohlRaphael closed this as completed Dec 5, 2024

KohlRaphael reopened this Dec 5, 2024

mattansb closed this as completed Dec 7, 2024
