docs update
György Kovács committed Oct 2, 2023
1 parent 2bb0548 commit 1465967
Showing 1 changed file with 28 additions and 27 deletions.
55 changes: 28 additions & 27 deletions docs/examples.rst
@@ -9,36 +9,36 @@ Oversampling can be carried out by importing any oversampler from the ``smote_va
.. code-block:: Python
import smote_variants as sv
oversampler= sv.SMOTE_ENN()
# supposing that X and y contain the feature and target data of some dataset
X_samp, y_samp= oversampler.sample(X, y)
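The interpolation idea underlying SMOTE-style oversamplers can be sketched in a few lines. This is an illustrative sketch only, not the library's implementation: real SMOTE interpolates between a minority point and one of its k nearest minority neighbors, whereas here pairs are chosen at random.

```python
import random

def smote_sketch(X_min, n_new, seed=0):
    """Generate n_new synthetic points by interpolating between
    randomly chosen pairs of minority samples (illustration only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(X_min, 2)   # two distinct minority points
        lam = rng.random()            # interpolation factor in [0, 1]
        synthetic.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_sketch(X_min, n_new=4)
```

Every synthetic point lies on a segment between two original minority points, so the new samples stay inside the convex hull of the minority class.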
Using the ``datasets`` package of ``sklearn`` to import some data:

.. code-block:: Python
import smote_variants as sv
import sklearn.datasets as datasets
dataset= datasets.load_breast_cancer()
oversampler= sv.KernelADASYN()
X_samp, y_samp= oversampler.sample(dataset['data'], dataset['target'])
Using the imbalanced datasets available in the ``imbalanced_datasets`` package:

.. code-block:: Python
import smote_variants as sv
import imbalanced_datasets as imbd
dataset= imbd.load_iris0()
oversampler= sv.SMOTE_OUT()
X_samp, y_samp= oversampler.sample(dataset['data'], dataset['target'])
Oversampling with random, reasonable parameters
@@ -52,13 +52,13 @@ In order to facilitate model selection, each oversampler class is able to genera
import smote_variants as sv
import numpy as np
import imbalanced_datasets as imbd
dataset= imbd.load_yeast1()
par_combs= sv.SMOTE_Cosine.parameter_combinations()
oversampler= sv.SMOTE_Cosine(**np.random.choice(par_combs))
X_samp, y_samp= oversampler.sample(dataset['data'], dataset['target'])
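The mechanics of picking a random parameter combination can be sketched with the standard library alone. The parameter dictionaries and the ``DummyOversampler`` class below are hypothetical stand-ins for what ``parameter_combinations()`` returns and for an oversampler constructor; they are not the library's actual values or API.

```python
import random

# hypothetical stand-in for SMOTE_Cosine.parameter_combinations()
par_combs = [{'n_neighbors': 3, 'proportion': 1.0},
             {'n_neighbors': 5, 'proportion': 1.0},
             {'n_neighbors': 5, 'proportion': 1.5}]

params = random.choice(par_combs)   # pick one combination at random

class DummyOversampler:
    """Stand-in accepting the sampled parameters as keyword arguments."""
    def __init__(self, n_neighbors=5, proportion=1.0):
        self.n_neighbors = n_neighbors
        self.proportion = proportion

# unpacking the chosen dict as keyword arguments configures the object
oversampler = DummyOversampler(**params)
```

The ``**params`` unpacking is the key step: each combination is a plain dict of constructor keyword arguments, so any sampled combination configures the oversampler directly.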
Multiclass oversampling
@@ -70,11 +70,12 @@ Multiclass oversampling is a highly ambiguous task, as balancing various classes m
import smote_variants as sv
import sklearn.datasets as datasets
dataset= datasets.load_wine()
-    oversampler= sv.MulticlassOversampling(sv.distance_SMOTE)
+    oversampler= sv.MulticlassOversampling(oversampler='distance_SMOTE',
+                                           oversampler_params={})
X_samp, y_samp= oversampler.sample(dataset['data'], dataset['target'])
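One common way to resolve the ambiguity of multiclass balancing is to bring every smaller class up to the size of the largest one. The sketch below does this by crude random duplication — an assumption used for illustration only; the library's multiclass strategy applies a proper binary oversampler class by class instead of duplicating points.

```python
import random
from collections import Counter

def balance_by_duplication(X, y, seed=0):
    """Equalize class counts by randomly duplicating samples of every
    smaller class (a crude stand-in for class-by-class oversampling)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())          # size of the largest class
    X_out, y_out = list(X), list(y)
    for label, cnt in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - cnt):
            i = rng.choice(idx)            # resample an existing point
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 2]
X_b, y_b = balance_by_duplication(X, y)
```

After balancing, all three classes have the same cardinality as the originally largest class.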
Model selection
@@ -83,24 +84,24 @@ Model selection
When facing an imbalanced dataset, model selection is crucial to find the right oversampling approach and the right classifier. The best performing oversampling technique depends on the subsequent classification; thus, the model selection of oversampler and classifier needs to be carried out hand in hand. This is facilitated by the ``model_selection`` function of the package. One must specify a set of oversamplers, a set of classifiers, and a score function (in this case 'AUC') to optimize in cross-validation, and the ``model_selection`` function does all the work:

.. code-block:: Python
import smote_variants as sv
import imbalanced_datasets as imbd
datasets = [imbd.load_glass2]
oversamplers = sv.get_all_oversamplers(n_quickest=5)
oversamplers = sv.generate_parameter_combinations(oversamplers,
n_max_comb=5)
classifiers = [('sklearn.neighbors', 'KNeighborsClassifier', {'n_neighbors': 3}),
('sklearn.neighbors', 'KNeighborsClassifier', {'n_neighbors': 5}),
('sklearn.tree', 'DecisionTreeClassifier', {})]
sampler, classifier= sv.model_selection(datasets=datasets,
oversamplers=oversamplers,
classifiers=classifiers)
The function call returns the best performing oversampler object and the corresponding best performing classifier object with respect to the 'glass2' dataset.
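The essence of this joint selection can be sketched as an exhaustive search over all (oversampler, classifier) pairs, scoring each and keeping the best. The pair names and the ``scores`` dictionary below are hypothetical stand-ins for cross-validated AUC values; the real function fits and evaluates actual models.

```python
import itertools

def select_model(oversamplers, classifiers, score):
    """Exhaustively score every (oversampler, classifier) pair and
    return the best combination -- the essence of joint model selection."""
    return max(itertools.product(oversamplers, classifiers),
               key=lambda pair: score(*pair))

# hypothetical scores standing in for cross-validated AUC values
scores = {('SMOTE', 'kNN'): 0.81, ('SMOTE', 'tree'): 0.78,
          ('ADASYN', 'kNN'): 0.84, ('ADASYN', 'tree'): 0.80}

sampler, classifier = select_model(['SMOTE', 'ADASYN'], ['kNN', 'tree'],
                                   lambda o, c: scores[(o, c)])
```

Searching the pairs jointly, rather than picking an oversampler and a classifier independently, is exactly why oversamplers and classifiers must be selected hand in hand.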

Thorough evaluation involving multiple datasets
===============================================

@@ -110,18 +111,18 @@ Another scenario is the comparison and evaluation of a new oversampler to conven
import smote_variants as sv
import imbalanced_datasets as imbd
datasets= [imbd.load_glass2, imbd.load_ecoli4]
oversamplers = sv.get_all_oversamplers(n_quickest=5)
oversamplers = sv.generate_parameter_combinations(oversamplers,
n_max_comb=5)
classifiers = [('sklearn.neighbors', 'KNeighborsClassifier', {'n_neighbors': 3}),
('sklearn.neighbors', 'KNeighborsClassifier', {'n_neighbors': 5}),
('sklearn.tree', 'DecisionTreeClassifier', {})]
results= sv.evaluate_oversamplers(datasets=datasets,
oversamplers=oversamplers,
classifiers=classifiers,
