folk

folk provides easy access to datasets that can be used to benchmark machine learning algorithms. The goal of folk is to facilitate and encourage work on fair machine learning among R users.

The folk package has three key features:

Feature	Description
`get_()`	The `get_()` functions provide easy access to data. Currently, there is only one `get_()` function, `get_acs()`, which provides access to the US Census Bureau’s American Community Survey (ACS) Public Use Microdata Sample.
`set_task()`	The `set_task()` function preprocesses data for pre-defined prediction tasks. Pre-defined tasks can be viewed with `show_tasks()`.
`new_task()`	The `new_task()` function lets users create custom tasks. It returns an object consistent with the one returned by `set_task()`.

Installation

Install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("george-wood/folk")

Usage

library(folk)

Easy access to data via folk’s API: get_acs(), …

# download the 2014 one-year ACS person-level sample for Delaware
# optionally, set a path to write to
delaware <- get_acs(state = "de", year = 2014, period = 1, survey = "person")

Show pre-defined prediction tasks for data accessed through the API: show_tasks()

show_tasks(delaware)

#> $income
#> function(
#>     features = c("AGEP",
#>                  "COW",
#>                  "SCHL",
#>                  "MAR",
#>                  "OCCP",
#>                  "POBP",
#>                  "RELP",
#>                  "WKHP",
#>                  "SEX",
#>                  "RAC1P"),
#>     target = "PINCP",
#>     group = "RAC1P",
#>     filter = filter_adult,
#>     target_transform = function(y) binary_target_(y > 50000),
#>     group_transform = NULL,
#>     preprocess = NULL,
#>     postprocess = function(x) replace_na_(x, value = -1L)
#> ) {
#>   invisible(FALSE)
#> }
#> 
#> ...
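
To make the definition above concrete, here is a minimal base R sketch of the same kinds of steps on a small made-up data frame: filter rows, binarise the target at 50000, and replace missing feature values with -1. The helper name `apply_income_task_sketch()`, the toy data, and the age-only filter are assumptions for illustration. They are not folk's internals, and `filter_adult` may apply different or additional conditions.

```r
# Toy stand-in for ACS person records (values are made up).
toy <- data.frame(
  AGEP  = c(15L, 25L, 40L, 62L),
  WKHP  = c(NA, 40L, NA, 20L),
  RAC1P = c(1L, 2L, 1L, 2L),
  PINCP = c(0, 30000, 85000, 50000)
)

# Hypothetical helper mirroring the shape of the income task definition.
apply_income_task_sketch <- function(data,
                                     features = c("AGEP", "WKHP", "RAC1P"),
                                     target = "PINCP",
                                     threshold = 50000) {
  # Assumed filter: keep people older than 16 (folk's filter_adult may differ).
  data <- data[data$AGEP > 16, , drop = FALSE]
  # Target transform: 1 if income is strictly above the threshold, else 0.
  y <- as.integer(data[[target]] > threshold)
  # Postprocess: replace missing feature values with -1.
  x <- data[, features, drop = FALSE]
  x[is.na(x)] <- -1L
  # Put the target first, followed by the features.
  out <- cbind(y, x)
  names(out)[1] <- target
  rownames(out) <- NULL
  out
}

income_toy <- apply_income_task_sketch(toy)
income_toy
```

Because the comparison is strict, a person with an income of exactly 50000 is labelled 0 in this sketch.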

Set a pre-defined prediction task: set_task()

delaware_income <- set_task(delaware, task = "income")
#> ℹ Setting income prediction task. See `folk::show_definition()` for details.
head(delaware_income)
#>   PINCP RAC1P AGEP COW SCHL MAR OCCP POBP RELP WKHP SEX
#> 1     0     1   25   1   16   5 5400   17   16   40   2
#> 2     0     1   37   1   21   1 3255   34    0   40   2
#> 3     0     1   36   2   19   5  110   40    0   40   1
#> 4     0     1   59   2   20   1 5120   54    0   40   2
#> 5     0     1   21   1   19   5 5240   10    2   36   2
#> 6     1     1   51   1   16   3 7150   24    0   40   1
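
The printed rows put the target (`PINCP`) first and the group (`RAC1P`) second, followed by the remaining features. As a rough sketch of how such a table might be split for modelling, the snippet below re-enters the six rows shown above by hand and separates target, group, and features. The column types are an assumption (plain numerics here); the object returned by `set_task()` may store them differently.

```r
# The six rows printed by head(delaware_income) above, re-entered by hand.
income_head <- data.frame(
  PINCP = c(0, 0, 0, 0, 0, 1),
  RAC1P = c(1, 1, 1, 1, 1, 1),
  AGEP  = c(25, 37, 36, 59, 21, 51),
  COW   = c(1, 1, 2, 2, 1, 1),
  SCHL  = c(16, 21, 19, 20, 19, 16),
  MAR   = c(5, 1, 5, 1, 5, 3),
  OCCP  = c(5400, 3255, 110, 5120, 5240, 7150),
  POBP  = c(17, 34, 40, 54, 10, 24),
  RELP  = c(16, 0, 0, 0, 2, 0),
  WKHP  = c(40, 40, 40, 40, 36, 40),
  SEX   = c(2, 2, 1, 2, 2, 1)
)

y     <- income_head$PINCP                       # binary target
group <- income_head$RAC1P                       # group membership
x     <- income_head[, setdiff(names(income_head), "PINCP"), drop = FALSE]

# Share of positive labels within each group.
positive_rate <- tapply(y, group, mean)
positive_rate
```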

Example

library(tidymodels)

delaware <- get_acs(state = "de", year = 2014, period = 1, survey = "person")
delaware_income <- set_task(delaware, task = "income")
#> ℹ Setting income prediction task. See `folk::task_income()` for details.

set.seed(0)
split <- initial_split(delaware_income, prop = 0.8)
train <- training(split)
test  <- testing(split)

income_recipe <-
  recipe(PINCP ~ ., data = train) |>
  # centre and scale the numeric predictors
  step_normalize(all_numeric_predictors())

income_model <-
  logistic_reg(mode = "classification", engine = "glm")

income_flow <-
  workflow() |>
  add_recipe(income_recipe) |>
  add_model(income_model)

yhat <- 
  fit(income_flow, data = train) |>
  predict(new_data = test, type = "class")

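# convert the predicted class labels ("0"/"1") to numeric 0/1
yhat <- as.numeric(as.character(yhat$.pred_class))
# true and false positive rates by group; RAC1P == 2 is Black, RAC1P == 1 is White
black_tpr <- mean(yhat[test$PINCP == 1 & test$RAC1P == 2])
black_fpr <- mean(yhat[test$PINCP == 0 & test$RAC1P == 2])
white_tpr <- mean(yhat[test$PINCP == 1 & test$RAC1P == 1])
white_fpr <- mean(yhat[test$PINCP == 0 & test$RAC1P == 1])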

black_tpr
#> [1] 0.3414634
black_fpr
#> [1] 0.1025641

white_tpr
#> [1] 0.5992063
white_fpr
#> [1] 0.1648352

# equalized odds difference:
max(abs(black_tpr - white_tpr), abs(black_fpr - white_fpr))
#> [1] 0.2577429
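
The equalized odds difference above is the larger of the two absolute gaps between groups: the gap in true positive rates and the gap in false positive rates. The base R sketch below wraps that calculation in two small helpers and runs it on made-up labels and predictions. The names `group_rates()` and `equalized_odds_difference()` are hypothetical and are not part of folk or tidymodels.

```r
# Hypothetical helper: true and false positive rates for each group.
group_rates <- function(truth, pred, group) {
  groups <- sort(unique(group))
  tpr <- sapply(groups, function(g) mean(pred[truth == 1 & group == g]))
  fpr <- sapply(groups, function(g) mean(pred[truth == 0 & group == g]))
  data.frame(group = groups, tpr = tpr, fpr = fpr)
}

# Hypothetical helper: largest absolute gap in TPR or FPR between groups a and b.
equalized_odds_difference <- function(rates, a, b) {
  ra <- rates[rates$group == a, ]
  rb <- rates[rates$group == b, ]
  max(abs(ra$tpr - rb$tpr), abs(ra$fpr - rb$fpr))
}

# Made-up 0/1 labels and predictions for two groups.
truth <- c(1, 1, 0, 0, 1, 1, 0, 0)
pred  <- c(1, 0, 0, 0, 1, 1, 1, 0)
group <- c(1, 1, 1, 1, 2, 2, 2, 2)

rates <- group_rates(truth, pred, group)
rates
equalized_odds_difference(rates, a = 1, b = 2)
```

With real predictions, `truth`, `pred`, and `group` would correspond to `test$PINCP`, `yhat`, and `test$RAC1P` from the example, provided the labels are first converted to numeric 0/1.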

Acknowledgements

The folk package is inspired by the folktables Python package. For more information on folktables, see Ding, Hardt, Miller, and Schmidt (2022), "Retiring Adult: New Datasets for Fair Machine Learning". The pre-defined prediction tasks for the American Community Survey data are implementations of the tasks introduced in that paper.
