folk

folk provides easy access to datasets that can be used to benchmark machine learning algorithms. The goal of folk is to facilitate and encourage work on fair machine learning among R users.

The folk package has three key features:

Feature	Description
`get_()`	The `get_()` functions provide easy access to data. Currently, there is only one `get_()` function, `get_acs()`, which provides access to the US Census Bureau’s American Community Survey (ACS) Public Use Microdata Sample.
`set_task()`	The `set_task()` function preprocesses data for pre-defined prediction tasks. Pre-defined tasks can be viewed with `show_tasks()`.
`new_task()`	The `new_task()` function allows users to create custom tasks. A custom task created via `new_task()` returns an object consistent with that returned by `set_task()`.

Installation

Install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("george-wood/folk")

Usage

library(folk)

Easy access to data via folk’s API: get_acs(), …

# optionally, set a path to write the downloaded data to (see ?get_acs)
delaware <- get_acs(state = "de", year = 2014, period = 1, survey = "person")

Show pre-defined prediction tasks for data accessed through the API: show_tasks()

show_tasks(delaware)

#> $income
#> function(
#>     features = c("AGEP",
#>                  "COW",
#>                  "SCHL",
#>                  "MAR",
#>                  "OCCP",
#>                  "POBP",
#>                  "RELP",
#>                  "WKHP",
#>                  "SEX",
#>                  "RAC1P"),
#>     target = "PINCP",
#>     group = "RAC1P",
#>     filter = filter_adult,
#>     target_transform = function(y) binary_target_(y > 50000),
#>     group_transform = NULL,
#>     preprocess = NULL,
#>     postprocess = function(x) replace_na_(x, value = -1L)
#> ) {
#>   invisible(FALSE)
#> }
#> 
#> ...
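The task definition printed above reads like a preprocessing recipe: keep the listed features, filter the rows, turn `PINCP` into a binary target at $50,000, and replace missing values with -1. As a rough illustration only, the base-R sketch below applies those steps to a raw person-level ACS data frame. `sketch_income_task()` is a hypothetical name, not a folk function. The row filter is an assumption: it copies the adult filter used by the folktables Python package (AGEP > 16, PINCP > 100, WKHP > 0, PWGTP >= 1), which folk's `filter_adult` may not match exactly. Storing the target as a factor with levels 0 and 1 is also an assumption about what `binary_target_()` returns. Use `set_task()` for real work.

```r
# Hypothetical base-R sketch of the "income" task preprocessing; not folk's code.
# Assumption: filter_adult mirrors the folktables adult filter.
sketch_income_task <- function(acs) {
  features <- c("AGEP", "COW", "SCHL", "MAR", "OCCP",
                "POBP", "RELP", "WKHP", "SEX", "RAC1P")

  # Assumed adult filter; rows where any condition is NA are dropped
  keep <- with(acs, AGEP > 16 & PINCP > 100 & WKHP > 0 & PWGTP >= 1)
  acs <- acs[!is.na(keep) & keep, , drop = FALSE]
  rownames(acs) <- NULL

  # Target: 1 if income exceeds $50,000, else 0 (stored as a factor, assumed)
  out <- data.frame(
    PINCP = factor(as.integer(acs$PINCP > 50000), levels = c(0, 1)),
    RAC1P = acs$RAC1P
  )

  # Remaining features, with missing values replaced by -1
  x <- acs[, setdiff(features, "RAC1P"), drop = FALSE]
  x[is.na(x)] <- -1L

  # Same column order as the set_task() output: target, group, features
  cbind(out, x)
}
```

Comparing `head(sketch_income_task(delaware))` with `head(delaware_income)` is one way to see where folk's definition departs from these assumptions.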

Set a pre-defined prediction task: set_task()

delaware_income <- set_task(delaware, task = "income")
#> ℹ Setting income prediction task. See `folk::task_income()` for details.
head(delaware_income)
#>   PINCP RAC1P AGEP COW SCHL MAR OCCP POBP RELP WKHP SEX
#> 1     0     1   25   1   16   5 5400   17   16   40   2
#> 2     0     1   37   1   21   1 3255   34    0   40   2
#> 3     0     1   36   2   19   5  110   40    0   40   1
#> 4     0     1   59   2   20   1 5120   54    0   40   2
#> 5     0     1   21   1   19   5 5240   10    2   36   2
#> 6     1     1   51   1   16   3 7150   24    0   40   1
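Per-group error rates are noisy when a group has few observations or few positive labels, so it can help to check group sizes and base rates before fitting a model. The base-R sketch below does that; `group_summary()` is a hypothetical helper, not part of folk, and it assumes the target is coded 0/1 as in the `head()` output above.

```r
# Hypothetical helper (not part of folk): per-group sample size and
# share of positive labels in a task data frame.
group_summary <- function(data, target, group) {
  # TRUE where the label is "1" (works for 0/1 numeric or factor targets)
  y <- as.character(data[[target]]) == "1"
  g <- data[[group]]
  n <- tapply(y, g, length)
  pos_rate <- tapply(y, g, mean)
  data.frame(group = names(n), n = as.vector(n), pos_rate = as.vector(pos_rate))
}
```

For the task above, this would be called as `group_summary(delaware_income, target = "PINCP", group = "RAC1P")`.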

Example

library(tidymodels)

delaware <- get_acs(state = "de", year = 2014, period = 1, survey = "person")
delaware_income <- set_task(delaware, task = "income")
#> ℹ Setting income prediction task. See `folk::task_income()` for details.

set.seed(0)
split <- initial_split(delaware_income, prop = 0.8)
train <- training(split)
test  <- testing(split)

income_recipe <-
  recipe(PINCP ~ ., data = train) |>
  # center and scale all numeric predictors
  step_normalize(all_numeric_predictors())

income_model <-
  logistic_reg(mode = "classification", engine = "glm")

income_flow <-
  workflow() |>
  add_recipe(income_recipe) |>
  add_model(income_model)

yhat <- 
  fit(income_flow, data = train) |>
  predict(new_data = test, type = "class")

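# convert predicted classes ("0"/"1") to numeric 0/1
yhat <- as.numeric(as.character(yhat$.pred_class))
# TPR: share of actual positives (PINCP == 1) predicted positive
# FPR: share of actual negatives (PINCP == 0) predicted positive
# RAC1P == 1 is White alone; RAC1P == 2 is Black or African American alone
black_tpr <- mean(yhat[test$PINCP == 1 & test$RAC1P == 2])
black_fpr <- mean(yhat[test$PINCP == 0 & test$RAC1P == 2])
white_tpr <- mean(yhat[test$PINCP == 1 & test$RAC1P == 1])
white_fpr <- mean(yhat[test$PINCP == 0 & test$RAC1P == 1])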

black_tpr
#> [1] 0.3414634
black_fpr
#> [1] 0.1025641

white_tpr
#> [1] 0.5992063
white_fpr
#> [1] 0.1648352

# equalized odds difference: the larger of the TPR gap and the FPR gap
max(abs(black_tpr - white_tpr), abs(black_fpr - white_fpr))
#> [1] 0.2577429
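The rates above are computed one group at a time. With the numbers shown, max(|0.5992063 - 0.3414634|, |0.1648352 - 0.1025641|) = max(0.2577429, 0.0622711) = 0.2577429, so the TPR gap drives the result. A small helper can compute the rates for every value of `RAC1P` at once. This is a minimal sketch under stated assumptions: `group_rates()` and `equalized_odds_difference()` are hypothetical names (not folk functions), labels and predictions are coded 0/1 (factors with those levels also work), groups with no positives or no negatives produce `NaN` rates that are skipped, and with more than two groups the difference is taken as the largest gap between any two groups.

```r
# Hypothetical helpers (not part of folk): per-group true/false positive
# rates and the equalized odds difference for 0/1 labels and predictions.
group_rates <- function(truth, pred, group) {
  truth <- as.numeric(as.character(truth))
  pred  <- as.numeric(as.character(pred))
  groups <- sort(unique(group))
  data.frame(
    group = groups,
    # TPR: share of actual positives predicted positive
    tpr = vapply(groups, function(g) mean(pred[truth == 1 & group == g]), numeric(1)),
    # FPR: share of actual negatives predicted positive
    fpr = vapply(groups, function(g) mean(pred[truth == 0 & group == g]), numeric(1))
  )
}

equalized_odds_difference <- function(rates) {
  # Largest gap in TPR or FPR between any two groups; NaN rates are ignored
  max(diff(range(rates$tpr, na.rm = TRUE)),
      diff(range(rates$fpr, na.rm = TRUE)))
}
```

With the objects from the example, `rates <- group_rates(test$PINCP, yhat, test$RAC1P)` followed by `equalized_odds_difference(rates[rates$group %in% c(1, 2), ])` should reproduce the two-group value computed above; dropping the subset compares all race groups instead.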

Acknowledgements

The folk package is inspired by the folktables Python package. For more information on folktables, see Ding, Hardt, Miller, and Schmidt (2022), "Retiring Adult: New Datasets for Fair Machine Learning." The pre-defined prediction tasks for the American Community Survey data are implementations of the tasks introduced in that paper.
