federatedml

Federated Machine Learning

Federatedml includes implementations of many common machine learning algorithms as well as the necessary utility tools. All modules are developed in a decoupled, modular way to enhance scalability. Specifically, we provide:

  1. FML Algorithms: Federated machine learning algorithms for DataIO, data preprocessing, feature engineering, and modeling. More details are listed below.

  2. Utilities: Tools that enable federated learning, such as encryption tools, statistics modules, parameter definitions, and the transfer variable autogenerator.

  3. Framework: Kits and base models for developing new algorithm modules. Framework provides reusable functions to standardize modules and make them compact.

  4. Secure Protocol: Provides multiple security protocols for secure multi-party interactive computation.

Figure 1: Federated Machine Learning Framework

Algorithm List

This component is typically the first component of a modeling task. It transforms user-uploaded data into Instance objects that can be used by the following components.

  • Corresponding module name: DataIO

  • Data Input: DTable, values are raw data.

  • Data Output: Transformed DTable, values are data instances defined in federatedml/feature/instance.py
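
As a rough illustration of this transformation, the sketch below parses delimited raw rows into simple instance objects. The `SimpleInstance` class, the `to_instance` function, and the assumed column layout (label first, then features) are hypothetical stand-ins for illustration; they are not the actual class defined in federatedml/feature/instance.py.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SimpleInstance:
    """Hypothetical stand-in for a data instance: features plus an optional label."""
    features: List[float]
    label: Optional[int] = None


def to_instance(raw_value: str, with_label: bool = True, delimiter: str = ",") -> SimpleInstance:
    """Parse one raw delimited value. Assumption: the label, if present, comes first."""
    fields = raw_value.split(delimiter)
    if with_label:
        return SimpleInstance(features=[float(x) for x in fields[1:]], label=int(fields[0]))
    return SimpleInstance(features=[float(x) for x in fields])


# A plain dict stands in for a DTable here: key -> raw value.
raw_table: Dict[str, str] = {"id_1": "1,0.5,2.0", "id_2": "0,1.5,3.0"}
instances = {key: to_instance(value) for key, value in raw_table.items()}
```

A host party without labels would call `to_instance(value, with_label=False)` in this sketch.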

Compute the intersection of two parties' data sets without leaking any information about the difference set. Mainly used in hetero scenario tasks.

  • Corresponding module name: Intersection

  • Data Input: DTable

  • Data Output: DTable whose keys occur in both parties.
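
The shape of the output can be pictured with the plaintext sketch below. It only shows which keys survive; it does not implement any privacy-preserving protocol, whereas the real module must avoid revealing non-overlapping keys. The `intersect_tables` helper and the use of dicts in place of DTables are assumptions made for illustration.

```python
from typing import Dict, Tuple


def intersect_tables(
    guest: Dict[str, list], host: Dict[str, list]
) -> Tuple[Dict[str, list], Dict[str, list]]:
    """Keep only the keys present in both tables (plaintext, NOT privacy-preserving)."""
    common = sorted(guest.keys() & host.keys())
    guest_out = {key: guest[key] for key in common}
    host_out = {key: host[key] for key in common}
    return guest_out, host_out


guest_table = {"u1": [0.1], "u2": [0.2], "u3": [0.3]}
host_table = {"u2": [5.0], "u3": [6.0], "u4": [7.0]}
guest_common, host_common = intersect_tables(guest_table, host_table)
```

Each party keeps its own values; only the key set is aligned.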

Sample data in a federated way so that its distribution becomes balanced in each party. This module supports both federated and standalone versions.

  • Corresponding module name: FederatedSample

  • Data Input: DTable

  • Data Output: The sampled data. Both random and stratified sampling are supported.
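
The difference between the two sampling modes can be sketched locally as follows. This is a standalone illustration under assumed names (`random_sample`, `stratified_sample`, and the `(key, label)` row layout are hypothetical); it does not show how parties coordinate sampled keys in the federated version.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (key, label); hypothetical simplified record


def random_sample(rows: List[Row], fraction: float, seed: int = 0) -> List[Row]:
    """Draw round(fraction * n) rows uniformly, without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * fraction))


def stratified_sample(rows: List[Row], fraction: float, seed: int = 0) -> List[Row]:
    """Sample the same fraction from each label group, preserving label proportions."""
    rng = random.Random(seed)
    groups: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    sampled: List[Row] = []
    for label in sorted(groups):
        group = groups[label]
        sampled.extend(rng.sample(group, round(len(group) * fraction)))
    return sampled


# 20 positive and 80 negative rows.
rows = [(f"id_{i}", 1 if i < 20 else 0) for i in range(100)]
subset = stratified_sample(rows, fraction=0.5)
```

With stratified sampling the 1:4 label ratio of the input is kept in the subset, while plain random sampling only preserves it on average.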

Module for feature scaling and standardization.

  • Corresponding module name: FeatureScale

  • Data Input: DTable, whose values are instances.

  • Data Output: Transformed DTable.

  • Model Output: Transform factors like min/max, mean/std.
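
A minimal sketch of the two kinds of transform factors mentioned above, applied to a single column. The function names are illustrative assumptions, and the population standard deviation is an arbitrary choice for this example.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def min_max_scale(column: List[float]) -> Tuple[List[float], Tuple[float, float]]:
    """Scale values to [0, 1]; returns the scaled column and the (min, max) factors."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(x - lo) / span for x in column], (lo, hi)


def standard_scale(column: List[float]) -> Tuple[List[float], Tuple[float, float]]:
    """Center to zero mean and unit std; returns the column and the (mean, std) factors."""
    mu = mean(column)
    sigma = pstdev(column) or 1.0  # avoid division by zero for constant columns
    return [(x - mu) / sigma for x in column], (mu, sigma)


values = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled, (col_min, col_max) = min_max_scale(values)
standardized, (col_mean, col_std) = standard_scale(values)
```

The returned factors play the role of the model output: they are what would be needed to apply the same transform to new data.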

Bins the input data, calculates each column's IV and WOE, and transforms the data according to the binning information.

  • Corresponding module name: HeteroFeatureBinning

  • Data Input: DTable with label y on the guest side and without y on the host side.

  • Data Output: Transformed DTable.

  • Model Output: IV/WOE, split points, event counts, non-event counts, etc. for each column.
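
The quantities in the model output can be illustrated for one column as below. This is a local sketch only: it assumes the convention WOE = ln(event rate / non-event rate) (some references use the inverse ratio), assumes every bin contains at least one event and one non-event, and uses hypothetical helper names. It does not show how the guest and host exchange information.

```python
import math
from bisect import bisect_left
from typing import List, Sequence, Tuple


def bin_counts(
    values: Sequence[float], labels: Sequence[int], split_points: Sequence[float]
) -> List[Tuple[int, int]]:
    """Count (events, non-events) per bin; bin i holds values <= split_points[i]."""
    counts = [[0, 0] for _ in range(len(split_points) + 1)]
    for value, label in zip(values, labels):
        idx = bisect_left(split_points, value)
        counts[idx][0 if label == 1 else 1] += 1
    return [(event, non_event) for event, non_event in counts]


def woe_iv(counts: Sequence[Tuple[int, int]]) -> Tuple[List[float], float]:
    """Return per-bin WOE and the column's IV (assumes no empty event/non-event cells)."""
    total_event = sum(event for event, _ in counts)
    total_non_event = sum(non_event for _, non_event in counts)
    woes, iv = [], 0.0
    for event, non_event in counts:
        event_rate = event / total_event
        non_event_rate = non_event / total_non_event
        woe = math.log(event_rate / non_event_rate)
        woes.append(woe)
        iv += (event_rate - non_event_rate) * woe
    return woes, iv


values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
counts = bin_counts(values, labels, split_points=[4])
woes, iv = woe_iv(counts)
```

Here the single split point at 4 yields two bins with (1, 3) and (3, 1) events and non-events, so the two WOE values have equal magnitude and opposite sign.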

Transform a column into one-hot format.

  • Corresponding module name: OneHotEncoder
  • Data Input: Input DTable.
  • Data Output: Transformed DTable with new headers.
  • Model Output: Mapping from the original header and feature values to the new headers.
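
The sketch below shows one way such a transform can produce new headers and a header map. The `one_hot_encode` helper and the `<column>_<value>` naming scheme are assumptions for illustration; the module's actual header format may differ.

```python
from typing import Dict, List, Tuple


def one_hot_encode(
    header: List[str], rows: List[list], column: str
) -> Tuple[List[str], List[list], Dict[str, List[str]]]:
    """Replace `column` with one 0/1 column per distinct value."""
    col_idx = header.index(column)
    categories = sorted({row[col_idx] for row in rows})
    new_cols = [f"{column}_{value}" for value in categories]  # assumed naming scheme
    new_header = header[:col_idx] + new_cols + header[col_idx + 1:]
    new_rows = []
    for row in rows:
        encoded = [1 if row[col_idx] == value else 0 for value in categories]
        new_rows.append(row[:col_idx] + encoded + row[col_idx + 1:])
    header_map = {column: new_cols}  # original header -> new headers
    return new_header, new_rows, header_map


header = ["age", "color"]
rows = [[30, "red"], [41, "blue"], [25, "red"]]
new_header, new_rows, header_map = one_hot_encode(header, rows, "color")
```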

Provides 5 types of filters. Each filter selects columns according to the user's configuration.

  • Corresponding module name: HeteroFeatureSelection
  • Data Input: Input DTable.
  • Model Input: If IV filters are used, a hetero_binning model is needed.
  • Data Output: Transformed DTable with new headers and filtered data instances.
  • Model Output: Whether each column is kept or not.
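
As an example of one filter type, the sketch below keeps the columns whose IV meets a threshold. The `iv_value_filter` helper, the threshold semantics (keep if IV >= threshold), and the dict of IV values standing in for a binning model are all assumptions for illustration.

```python
from typing import Dict, List, Tuple


def iv_value_filter(
    header: List[str],
    rows: List[list],
    iv_by_column: Dict[str, float],
    threshold: float,
) -> Tuple[List[str], List[list], Dict[str, bool]]:
    """Keep columns with IV >= threshold; columns without an IV value are dropped."""
    kept = {name: iv_by_column.get(name, 0.0) >= threshold for name in header}
    keep_idx = [i for i, name in enumerate(header) if kept[name]]
    new_header = [header[i] for i in keep_idx]
    new_rows = [[row[i] for i in keep_idx] for row in rows]
    return new_header, new_rows, kept


header = ["x0", "x1", "x2"]
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
iv_values = {"x0": 0.5, "x1": 0.01, "x2": 0.2}  # would come from a binning model
new_header, new_rows, kept = iv_value_filter(header, rows, iv_values, threshold=0.1)
```

The `kept` dict corresponds to the model output (one flag per column), and `new_header` with `new_rows` to the data output.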

Combine multiple data tables into one.

  • Corresponding module name: Union
  • Data Input: Input DTable(s).
  • Data Output: One DTable with combined values from the input DTables.
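
A minimal sketch of combining tables, with dicts standing in for DTables. How duplicate keys are handled is an assumption of this example (the first table that contains a key wins); the `union_tables` name is hypothetical.

```python
from typing import Dict


def union_tables(*tables: Dict[str, list]) -> Dict[str, list]:
    """Merge tables into one; on duplicate keys the earliest table's value is kept."""
    combined: Dict[str, list] = {}
    for table in tables:
        for key, value in table.items():
            combined.setdefault(key, value)
    return combined


table_a = {"id_1": [0.1], "id_2": [0.2]}
table_b = {"id_2": [9.9], "id_3": [0.3]}
merged = union_tables(table_a, table_b)
```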

Build a hetero logistic regression model jointly across multiple parties.

  • Corresponding module name: HeteroLR
  • Data Input: Input DTable.
  • Model Output: Logistic Regression model.
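
To give a feel for the hetero (vertically partitioned) setting, the sketch below computes a logistic regression prediction when each party holds weights only for its own features. It is a plaintext illustration under assumed names and values; it does not show training or any of the encryption a real multi-party run would apply to the exchanged intermediate values.

```python
import math
from typing import Sequence


def partial_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score computed by one party on the features it holds."""
    return sum(w * x for w, x in zip(weights, features))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values: the guest holds two features, the host holds one.
guest_weights, guest_features = [0.5, -0.25], [2.0, 4.0]
host_weights, host_features = [1.0], [0.0]
bias = 0.0

# The combined score is the sum of both parties' partial scores plus the bias.
score = partial_score(guest_weights, guest_features) + partial_score(host_weights, host_features) + bias
probability = sigmoid(score)
```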

Wrapper that runs an sklearn Logistic Regression model on local data.

  • Corresponding module name: LocalBaseline
  • Data Input: Input DTable.
  • Model Output: Logistic Regression.

Build a hetero linear regression model jointly across multiple parties.

  • Corresponding module name: HeteroLinR
  • Data Input: Input DTable.
  • Model Output: Linear Regression model.

Build a hetero Poisson regression model jointly across multiple parties.

  • Corresponding module name: HeteroPoisson
  • Data Input: Input DTable.
  • Model Output: Poisson Regression model.

Build a homo logistic regression model jointly across multiple parties.

  • Corresponding module name: HomoLR
  • Data Input: Input DTable.
  • Model Output: Logistic Regression model.

Build a homo neural network model jointly across multiple parties.

  • Corresponding module name: HomoNN
  • Data Input: Input DTable.
  • Model Output: Neural Network model.

Build a hetero secure boosting model jointly across multiple parties.

  • Corresponding module name: HeteroSecureBoost
  • Data Input: DTable, values are instances.
  • Model Output: SecureBoost model, consisting of model-meta and model-param.

Output model evaluation metrics for the user.

  • Corresponding module name: Evaluation

Calculate the correlation between features held by different parties in the hetero setting.

  • Corresponding module name: HeteroPearson
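
Going by the module name, the statistic involved is the Pearson correlation coefficient. The sketch below computes it in plaintext for one guest column and one host column; in a real hetero run the raw columns would not be shared like this, and the `pearson` helper and sample values are assumptions for illustration.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns (plaintext, local sketch)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical aligned columns: row i refers to the same sample on both sides.
guest_feature = [1.0, 2.0, 3.0, 4.0]
host_feature = [2.0, 4.0, 6.0, 8.0]
r = pearson(guest_feature, host_feature)
```

The rows must already be aligned on the same sample keys, for example by the Intersection module, for the coefficient to be meaningful.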

Build a hetero neural network model.

  • Corresponding module name: HeteroNN
  • Data Input: Input DTable.
  • Model Output: Hetero Neural Network model.