How to export data from grouped extract_stats efficiently? #917

lxsteiner · 2024-03-14T13:08:34Z

I'd like to export the individual statistics from a grouped_ggbetweenstats analysis as a .csv or .xlsx file, or any other format really. Ideally you'd get the individual $caption_data, $subtitle_data, $pairwise_comparisons_data and other sections in a single file/sheet, but no matter what solution I come up with, I run into a problem with the "expression" columns in the tibbles, which are list-columns holding statistical expressions.

e.g.

library(ggstatsplot) # provides grouped_ggbetweenstats() and extract_stats()
library(PMCMRplus)
p <- grouped_ggbetweenstats(data = mtcars, x = gear, y = mpg, grouping.var = am)
p

All columns inside the individual tibbles are fine except for "expression":

extract_stats(p[[1]])$caption_data
# A tibble: 1 × 16
  term       effectsize      estimate conf.level conf.low conf.high    pd prior.distribution prior.location
  <chr>      <chr>              <dbl>      <dbl>    <dbl>     <dbl> <dbl> <chr>                       <dbl>
1 Difference Bayesian t-test    -3.69       0.95    -7.63    0.0293 0.974 cauchy                          0
  prior.scale  bf10 method          conf.method log_e_bf10 n.obs expression
        <dbl> <dbl> <chr>           <chr>            <dbl> <int> <list>    
1       0.707  3.43 Bayesian t-test ETI               1.23    19 <language>

Trying to write to an .xlsx for example with single sheets containing all the statistics for a group:

library(do)
for (i in 1:length(p)) {
  subplot <- extract_stats(p[[i]])
  sheetname <- paste0("group", i)
  do::write_xlsx(subplot$subtitle_data, file = "stats.xlsx", sheet = sheetname)
  do::write_xlsx(subplot$caption_data, file = "stats.xlsx", sheet = sheetname, append = TRUE)
}

but there's always an error in most functions that export the data objects (also with e.g. openxlsx::writeData):

Error in FUN(X[[i]], ...) : 
  argument `...` should be a character vector (or an object coercible to)
In addition: Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'language'

Any suggestions or recommendations for better practices on getting all the statistics into some type of delimited file? I guess one option could be to drop the "expression" column beforehand, but I don't think all the tibbles returned by extract_stats have that column, which could be an issue.
Also, is there an easy way to access the label of each group, so that it can be written out along with the exported data, other than relying on how the list is indexed and ordered? (In the above example the label comes from grouping.var = am, which is just "0" or "1".)

Any ideas or suggestions would really be welcome. Thank you.


oranwutang · 2024-07-25T17:38:53Z

Hi, @lxsteiner!

I ran into exactly the same problem while trying to do exactly what you're trying to do!

It turns out that the 'expression' column is a list-column containing language objects, which is not an atomic vector, so most export functions cannot write it. You can do something like:

subplot$subtitle_data$expression <- as.character(subplot$subtitle_data$expression)

Your code should look like:

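library(do)
for (i in seq_along(p)) {
  subplot <- extract_stats(p[[i]])
  sheetname <- paste0("group", i)
  # Convert the list-columns of language objects to plain character vectors;
  # both tibbles that get written below carry an 'expression' column
  subplot$subtitle_data$expression <- as.character(subplot$subtitle_data$expression)
  subplot$caption_data$expression <- as.character(subplot$caption_data$expression)
  do::write_xlsx(subplot$subtitle_data, file = "stats.xlsx", sheet = sheetname)
  do::write_xlsx(subplot$caption_data, file = "stats.xlsx", sheet = sheetname, append = TRUE)
}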

This converts the expression column to an atomic character vector. The resulting data frame can then be passed safely to the exporting function, in my case openxlsx::write.xlsx().
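
Since not every tibble has an 'expression' column, the same conversion can be written so that it applies to whatever list-columns happen to be present. The following is only a minimal sketch in base R: `flatten_list_cols()` is a hypothetical helper name, not part of ggstatsplot, and the small `caption` data frame is a hand-built stand-in for a real `$caption_data` tibble.

```r
# Hypothetical helper (not part of ggstatsplot): turn every list-column of a
# data frame into a character column by deparsing each element.
flatten_list_cols <- function(df) {
  is_list_col <- vapply(df, is.list, logical(1))
  if (!any(is_list_col)) {
    return(df) # nothing to convert, e.g. a tibble without 'expression'
  }
  df[is_list_col] <- lapply(df[is_list_col], function(col) {
    vapply(col, function(x) paste(deparse(x), collapse = " "), character(1))
  })
  df
}

# Stand-in for extract_stats(p[[1]])$caption_data (assumed shape: ordinary
# columns plus a list-column of language objects named 'expression').
caption <- data.frame(bf10 = 3.43, method = "Bayesian t-test")
caption$expression <- list(quote(log[e] * (BF["01"]) == "-1.23"))

flat <- flatten_list_cols(caption)

# With only atomic columns left, base write.csv() accepts the data frame.
out_file <- tempfile(fileext = ".csv")
write.csv(flat, out_file, row.names = FALSE)
```

Because the helper returns data frames without list-columns unchanged, it can be applied to every non-NULL element of the extract_stats() result without checking column names first.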

IndrajeetPatil · 2024-07-27T19:16:45Z

library(ggstatsplot)

p <- grouped_ggpiestats(mtcars, x = cyl, grouping.var = am)
extract_stats(p)
#> [[1]]
#> $subtitle_data
#> # A tibble: 1 × 13
#>   statistic    df p.value method                                   effectsize 
#>       <dbl> <dbl>   <dbl> <chr>                                    <chr>      
#> 1      7.68     2  0.0214 Chi-squared test for given probabilities Pearson's C
#>   estimate conf.level conf.low conf.high conf.method conf.distribution n.obs
#>      <dbl>      <dbl>    <dbl>     <dbl> <chr>       <chr>             <int>
#> 1    0.537       0.95   0.0666     0.725 ncp         chisq                19
#>   expression
#>   <list>    
#> 1 <language>
#> 
#> $caption_data
#> # A tibble: 1 × 4
#>    bf10 prior.scale method                                      expression
#>   <dbl>       <dbl> <chr>                                       <list>    
#> 1  1.15           1 Bayesian one-way contingency table analysis <language>
#> 
#> $pairwise_comparisons_data
#> NULL
#> 
#> $descriptive_data
#> # A tibble: 3 × 4
#>   cyl   counts  perc .label
#>   <fct>  <int> <dbl> <chr> 
#> 1 8         12  63.2 63%   
#> 2 6          4  21.1 21%   
#> 3 4          3  15.8 16%   
#> 
#> $one_sample_data
#> NULL
#> 
#> $tidy_data
#> NULL
#> 
#> $glance_data
#> NULL
#> 
#> attr(,"class")
#> [1] "ggstatsplot_stats" "list"             
#> 
#> [[2]]
#> $subtitle_data
#> # A tibble: 1 × 13
#>   statistic    df p.value method                                   effectsize 
#>       <dbl> <dbl>   <dbl> <chr>                                    <chr>      
#> 1      4.77     2  0.0921 Chi-squared test for given probabilities Pearson's C
#>   estimate conf.level conf.low conf.high conf.method conf.distribution n.obs
#>      <dbl>      <dbl>    <dbl>     <dbl> <chr>       <chr>             <int>
#> 1    0.518       0.95        0     0.741 ncp         chisq                13
#>   expression
#>   <list>    
#> 1 <language>
#> 
#> $caption_data
#> # A tibble: 1 × 4
#>    bf10 prior.scale method                                      expression
#>   <dbl>       <dbl> <chr>                                       <list>    
#> 1 0.434           1 Bayesian one-way contingency table analysis <language>
#> 
#> $pairwise_comparisons_data
#> NULL
#> 
#> $descriptive_data
#> # A tibble: 3 × 4
#>   cyl   counts  perc .label
#>   <fct>  <int> <dbl> <chr> 
#> 1 8          2  15.4 15%   
#> 2 6          3  23.1 23%   
#> 3 4          8  61.5 62%   
#> 
#> $one_sample_data
#> NULL
#> 
#> $tidy_data
#> NULL
#> 
#> $glance_data
#> NULL
#> 
#> attr(,"class")
#> [1] "ggstatsplot_stats" "list"
extract_subtitle(p)
#> [[1]]
#> list(chi["gof"]^2 * "(" * 2 * ")" == "7.68", italic(p) == "0.02", 
#>     widehat(italic("C"))["Pearson"] == "0.54", CI["95%"] ~ "[" * 
#>         "0.07", "0.73" * "]", italic("n")["obs"] == "19")
#> 
#> [[2]]
#> list(chi["gof"]^2 * "(" * 2 * ")" == "4.77", italic(p) == "0.09", 
#>     widehat(italic("C"))["Pearson"] == "0.52", CI["95%"] ~ "[" * 
#>         "0.00", "0.74" * "]", italic("n")["obs"] == "13")
extract_caption(p)
#> [[1]]
#> list(log[e] * (BF["01"]) == "-0.14", italic("a")["Gunel-Dickey"] == 
#>     "1.00")
#> 
#> [[2]]
#> list(log[e] * (BF["01"]) == "0.83", italic("a")["Gunel-Dickey"] == 
#>     "1.00")

Created on 2024-07-27 with reprex v2.1.1
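
For the original export question, a list like the one above could be written to delimited files roughly as follows. This is a sketch under stated assumptions, not a ggstatsplot feature: `stats` is a hand-built stand-in for `extract_stats(p)` with a few values copied from the output above, `make_group()` and `collect_section()` are hypothetical helper names, and `group_labels` assumes the list follows the sorted values of the grouping variable (consistent with n.obs being 19 and then 13, the counts for am = 0 and am = 1 in mtcars). It takes the other route mentioned in the question and drops list-columns instead of converting them.

```r
# Stand-in for extract_stats(p) on a grouped plot: one element per group,
# each a list of data frames or NULL (assumed shape, values from the output above).
make_group <- function(bf10, counts) {
  caption <- data.frame(bf10 = bf10, prior.scale = 1)
  caption$expression <- list(quote(log[e] * (BF["01"]))) # list-column
  list(
    caption_data = caption,
    pairwise_comparisons_data = NULL,
    descriptive_data = data.frame(cyl = c("8", "6", "4"), counts = counts)
  )
}
stats <- list(make_group(1.15, c(12L, 4L, 3L)), make_group(0.434, c(2L, 3L, 8L)))

# Assumption: groups are ordered like the sorted values of grouping.var (am).
group_labels <- c("0", "1")

# Stack one section across groups: skip NULLs, drop list-columns, tag each row.
collect_section <- function(stats, section, labels) {
  pieces <- Map(function(group, label) {
    df <- group[[section]]
    if (is.null(df)) {
      return(NULL)
    }
    df <- as.data.frame(df[!vapply(df, is.list, logical(1))])
    cbind(group = label, df)
  }, stats, labels)
  do.call(rbind, Filter(Negate(is.null), pieces)) # NULL if nothing is left
}

# Write one CSV per section that has data in at least one group.
out_dir <- tempdir()
for (section in names(stats[[1]])) {
  combined <- collect_section(stats, section, group_labels)
  if (is.null(combined)) next # e.g. pairwise_comparisons_data here
  write.csv(combined, file.path(out_dir, paste0(section, ".csv")), row.names = FALSE)
}
```

Dropping list-columns by type rather than by name means tibbles without an 'expression' column, such as descriptive_data, pass through untouched.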

oranwutang · 2024-07-27T21:15:43Z

So cool!

oranwutang mentioned this issue Jul 25, 2024

Efficiently (and easily) extract stats from grouped plots #953

Closed

IndrajeetPatil linked a pull request Jul 27, 2024 that will close this issue

Make sure extract_stats() and cousins work out of the box with grouped plots #955

Merged

IndrajeetPatil closed this as completed in #955 Jul 27, 2024
