
Consider rate of outcome in non-exposed group in interpret_oddsratio #657

Closed
KohlRaphael opened this issue Nov 20, 2024 · 3 comments
Labels
enhancement 🔥 New feature or request


@KohlRaphael

KohlRaphael commented Nov 20, 2024

The {interpret_oddsratio} function derives its rules (1.68, 3.47 and 6.71) from Chen et al. (2010). As Chen et al. explain, these thresholds depend on the outcome rate in the unexposed group; the current rules ignore this and assume an outcome rate of 1% in that group.
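To make the dependence on the baseline rate concrete, here is a minimal sketch assuming the probit-shift model behind Chen et al.'s table (the exposed group's outcome rate is `pnorm(qnorm(p0) + d)`). `or_threshold()` is a hypothetical helper name used only for illustration. At `p0 = 0.01` it should land within a few hundredths of the fixed thresholds, while at `p0 = 0.5` the same d values map to noticeably smaller odds ratios.

```r
# Hypothetical helper: odds ratio implied by a Cohen's d shift on the
# latent probit scale, given the outcome rate p0 in the unexposed group.
or_threshold <- function(p0, d = c(0.2, 0.5, 0.8)) {
  p1 <- pnorm(qnorm(p0) + d)          # outcome rate in the exposed group
  (p1 / (1 - p1)) / (p0 / (1 - p0))   # ratio of the two odds
}

round(or_threshold(0.01), 2)  # compare with the fixed 1.68, 3.47, 6.71
round(or_threshold(0.50), 2)  # same d values, much smaller ORs
```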

The code below demonstrates that the current thresholds are overly conservative for baseline rates in the middle of the range (roughly >0.1 to ~0.9) and insufficiently conservative at the extremes. It calculates thresholds across the full spectrum of possible baseline rates (>0 to <1). While Chen et al. provide odds ratio (OR) equivalents for Cohen's d only for baseline rates of 0.01 to 0.1, Kraemer (2004) covers the entire range. There are minor discrepancies between the calculated values and those reported by Chen et al. and Kraemer, typically in the second decimal place, but they grow larger at the extreme margins (e.g., around 0.99999).

To address this, I implemented a possible solution that integrates the calculate_thresholds function into the {interpret_oddsratio} function. By default, the baseline rate remains 0.01, so nothing changes unless the p0 parameter is explicitly provided. I also updated the tests for the {interpret_oddsratio} function, since the rules argument is no longer the second parameter. As this is my first code contribution, I would appreciate a check for things I missed (e.g. whether calculate_thresholds should be defined outside the function). I also have one question: my changes to the manual do not show up after the build. How do I fix that?
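The idea of a p0-aware interpretation can be sketched in a few lines. This is not the code from the linked branch; `interpret_or_p0()` is a hypothetical name, the label set is an assumption, and mirroring ORs below 1 to `1 / OR` is a simplifying assumption (with p0 ≠ 0.5 the mapping is not exactly symmetric). The thresholds for d = 0.2, 0.5 and 0.8 are recomputed from p0 instead of being fixed.

```r
# Hypothetical sketch: interpret an odds ratio against thresholds
# recomputed for the given baseline rate p0 (default 0.01, as before).
interpret_or_p0 <- function(or, p0 = 0.01) {
  p1 <- pnorm(qnorm(p0) + c(0.2, 0.5, 0.8))        # exposed-group rates
  thresholds <- (p1 / (1 - p1)) / (p0 / (1 - p0))  # OR for each d
  # Assumption: protective effects (OR < 1) are mirrored to 1 / OR
  or <- ifelse(or < 1, 1 / or, or)
  labels <- c("very small", "small", "medium", "large")
  labels[findInterval(or, thresholds) + 1]
}

interpret_or_p0(3)            # with p0 = 0.01
interpret_or_p0(3, p0 = 0.5)  # same OR, higher baseline rate
```

With the default `p0 = 0.01` the labels agree with the fixed rule up to rounding, which keeps existing behaviour unchanged unless `p0` is supplied.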

library(dplyr)
library(ggplot2)
options(scipen=999)

# Odds ratio implied by Cohen's d values on the latent (probit) scale,
# given the outcome rate p0 in the unexposed group
calculate_thresholds <- function(p0, d = c(0.2, 0.5, 0.8)) {
  z0 <- qnorm(p0)   # baseline rate on the probit scale
  z1 <- z0 + d      # shifted by d for the exposed group
  p1 <- pnorm(z1)   # outcome rate in the exposed group
  or <- (p1 * (1 - p0)) / (p0 * (1 - p1))
  return(or)
}

# Baseline rates tabulated by Chen et al. (2010) and Kraemer (2004)
chen_2010 <- seq(0.01, 0.1, 0.01)
kraemer_2004 <- c(0.00001, 0.0001, 0.001, 0.01,
                  seq(0.1, 0.9, 0.1),
                  0.99, 0.999, 0.9999, 0.99999)

# One row per baseline rate: OR thresholds for d = 0.2, 0.5, 0.8
data_chen <- tibble()
for (i in chen_2010) {
  or <- calculate_thresholds(i)
  data_chen <- bind_rows(
    data_chen,
    bind_cols(
      p0 = format(i),
      effect_size_02 = or[1],
      effect_size_05 = or[2],
      effect_size_08 = or[3]
    )
  )
}

# Same table for Kraemer's range, plus a long-format version for plotting
data_kraemer <- tibble()
data_plot <- tibble()
for (i in kraemer_2004) {
  or <- calculate_thresholds(i)
  data_kraemer <- bind_rows(
    data_kraemer,
    bind_cols(
      p0 = format(i),
      effect_size_02 = or[1],
      effect_size_05 = or[2],
      effect_size_08 = or[3]
    )
  )

  data_plot <- bind_rows(
    data_plot,
    bind_cols(p0 = format(i, nsmall = 3), effect_size = 0.2, odds_ratio = or[1]),
    bind_cols(p0 = format(i, nsmall = 3), effect_size = 0.5, odds_ratio = or[2]),
    bind_cols(p0 = format(i, nsmall = 3), effect_size = 0.8, odds_ratio = or[3])
  )
}

# Dashed lines: the fixed thresholds currently used by interpret_oddsratio()
ggplot(data_plot) +
  geom_point(aes(x = p0, y = odds_ratio, color = factor(effect_size))) +
  geom_hline(yintercept = c(1.68, 3.47, 6.71), linetype = "dashed",
             color = c("#F8766D", "#00BA38", "#619CFF")) +
  theme_bw() +
  labs(x = "Baseline rate (p0)", y = "Odds ratio", color = "Cohen's d")

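[Plot: odds-ratio thresholds for d = 0.2, 0.5 and 0.8 at each baseline rate, with the fixed thresholds 1.68, 3.47 and 6.71 as dashed lines]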

@mattansb
Member

mattansb commented Dec 4, 2024

I’m not quite sure what the plot you generated is meant to demonstrate.

Here is a similar plot of the recovered d as a function of $p_0$ and the true d:

library(dplyr)
library(ggplot2)

library(effectsize)

generate_or <- function(p0, d) {
  p1 <- pnorm(qnorm(p0) + d)
  probs_to_odds(p1)/probs_to_odds(p0)
}

data_plot <- expand.grid(
  p0 = c(
    0.00001, 0.0001, 0.001, 0.01,
    seq(0.1, 0.9, 0.1),
    0.99, 0.999, 0.9999, 0.99999
  ),
  true_d = c(0.2, 0.5, 0.8)
) |>
  mutate(
    or = generate_or(p0, true_d), 
    recovered_d = oddsratio_to_d(or)
  )

ggplot(data_plot, aes(p0, recovered_d - true_d, color = factor(true_d))) +
  geom_point() + 
  geom_hline(yintercept = 0) + 
  theme_bw() +
  scale_y_continuous(breaks = seq(-2, 2, by = 0.2)) + 
  coord_cartesian(ylim = c(-0.1, 1.5)) + 
  labs(x = expression(p[0]), 
       y = expression(Delta[recovered-true]), 
       color = "True Cohen's d")

Here it is clear that only at the extreme edges (when the rate is below 1% in either direction, i.e. $p_0 < 0.01$ or $p_0 > 0.99$) do we get such a large deviation.

This deviation at the extremes is caused by the inverse-logit to probit approximation ($z \approx L \times \sqrt{3} / \pi$) being off at the tails.
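The size of that error can be checked without any package. A minimal base-R sketch, assuming the approximation quoted above (d ≈ log(OR) × √3 / π) and the same probit-shift model used to generate the ORs; `approx_d()` and `or_from_d()` are hypothetical helper names. The error for a true d of 0.2 should be small near p0 = 0.5 and much larger at the tails.

```r
# Logit-to-probit approximation: d ~ log(OR) * sqrt(3) / pi
approx_d <- function(or) log(or) * sqrt(3) / pi

# OR implied by a true d at baseline rate p0 (probit-shift model)
or_from_d <- function(d, p0) {
  p1 <- pnorm(qnorm(p0) + d)
  (p1 / (1 - p1)) / (p0 / (1 - p0))
}

p0 <- c(0.00001, 0.01, 0.5, 0.99, 0.99999)
# Approximation error (recovered minus true d) for a true d of 0.2
round(approx_d(or_from_d(0.2, p0)) - 0.2, 3)
```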

Let’s try the exact method:

oddsratio_to_d_exact <- function(or, p0) {
  odds1 <- or * probs_to_odds(p0)
  p1 <- odds_to_probs(odds1)
  qnorm(p1) - qnorm(p0)
}

data_plot |> 
  mutate(
    recovered_d_exact = oddsratio_to_d_exact(or, p0)
  ) |> 
  ggplot(aes(p0, recovered_d_exact - true_d, color = factor(true_d))) +
  geom_point(aes(shape = "Exact")) + 
  geom_point(aes(y = recovered_d - true_d, shape = "Approx."), alpha = 0.2) + 
  geom_hline(yintercept = 0) + 
  theme_bw() +
  scale_y_continuous(breaks = seq(-2, 2, by = 0.2)) + 
  scale_shape_manual(NULL, values = c(21, 16)) + 
  coord_cartesian(ylim = c(-0.1, 1.5)) + 
  labs(x = expression(p[0]), 
       y = expression(Delta[recovered-true]), 
       color = "True Cohen's d")

Created on 2024-12-04 with reprex v2.1.1

Looks good!

This can now be implemented in several places:

  • Converters:
    • (log)oddsratio_to_d()
    • d_to_(log)oddsratio()
    • (log)oddsratio_to_r()
    • t_to_(log)oddsratio()
  • interpret_oddsratio() that can be a wrapper around interpret_cohens_d(oddsratio_to_d(...))

(somewhat related to #568, #593)
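The wrapper idea in the last bullet can be sketched in base R. This is a hypothetical illustration, not the effectsize implementation: `or_to_d_exact()`, `interpret_d()` and `interpret_or_exact()` are made-up names, and the labels follow Cohen's 0.2 / 0.5 / 0.8 cut-offs. The OR is first converted to d exactly (given p0), then the d thresholds are applied.

```r
# Exact OR -> d conversion on the probit scale, given baseline rate p0
or_to_d_exact <- function(or, p0 = 0.01) {
  odds1 <- or * p0 / (1 - p0)   # odds in the exposed group
  p1 <- odds1 / (1 + odds1)     # back to a probability
  qnorm(p1) - qnorm(p0)         # difference of the two probits
}

# Cohen's thresholds; abs() classifies a negative d (OR < 1) by magnitude
interpret_d <- function(d) {
  labels <- c("very small", "small", "medium", "large")
  labels[findInterval(abs(d), c(0.2, 0.5, 0.8)) + 1]
}

# Hypothetical wrapper: interpret_cohens_d(oddsratio_to_d(...)) in spirit
interpret_or_exact <- function(or, p0 = 0.01) interpret_d(or_to_d_exact(or, p0))

interpret_or_exact(c(1.2, 2, 5, 10))
interpret_or_exact(3, p0 = 0.5)
```

Because the conversion is monotonic in the OR, thresholding d in this way is equivalent to thresholding the OR with p0-specific cut-offs.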

@mattansb mattansb self-assigned this Dec 4, 2024
@mattansb mattansb added the enhancement 🔥 New feature or request label Dec 4, 2024
mattansb added a commit that referenced this issue Dec 5, 2024
@KohlRaphael KohlRaphael reopened this Dec 5, 2024
@KohlRaphael
Author

Hey @mattansb,

Thank you for your detailed response and all the changes; it's clear there's a lot I hadn't considered! My initial intention with the plot was simply to replicate the rules outlined by Chen (and Kraemer) and visualize the discrepancies between those rules and the fixed thresholds currently used in interpret_oddsratio(). I attempted to solve the issue by calculating the thresholds directly within the interpret_oddsratio() function (link). You saw a lot more, and I am curious how that will affect the interpret_oddsratio() function in the end.

Also, apologies for the confusion caused by the closing and reopening of the issue—my hand slipped onto my mouse, and I accidentally closed it. Definitely a bit embarrassing!

Thank you!

@mattansb
Member

mattansb commented Dec 7, 2024

So effectively, the chen2010 rule is dropped: the default now is to use an OR-to-d transformation (optionally with p0 provided), interpreted according to Cohen's 0.2, 0.5 and 0.8 thresholds. See #659.
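A quick sanity check on why dropping the rule loses nothing at the old default: converting Chen et al.'s thresholds back to d with the exact method from earlier in this thread, at the baseline rate they assumed (p0 = 0.01), should give values very close to 0.2, 0.5 and 0.8. This is a sketch; `or_to_d_exact()` is a hypothetical name for that conversion.

```r
# Exact OR -> d conversion (probit scale) given the baseline rate p0
or_to_d_exact <- function(or, p0) {
  odds1 <- or * p0 / (1 - p0)               # odds in the exposed group
  qnorm(odds1 / (1 + odds1)) - qnorm(p0)    # difference of probits
}

# Chen et al.'s thresholds, converted at the 1% baseline they assumed
round(or_to_d_exact(c(1.68, 3.47, 6.71), p0 = 0.01), 2)
```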

Thanks for the suggestions and pointing this out!

@mattansb mattansb closed this as completed Dec 7, 2024