Credit Card Fraud Detection - Imbalanced Dataset Classification using Near Miss and SMOTE techniques
https://www.kaggle.com/mlg-ulb/creditcardfraud
The dataset contains transactions made with credit cards by European cardholders in September 2013. It covers two days of activity, with 492 frauds out of 284,807 transactions; the dataset is highly imbalanced, with the positive class (frauds) accounting for only 0.172% of all transactions.
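As a quick sanity check, the imbalance can be inspected directly; a minimal sketch, assuming the Kaggle CSV has been saved locally as `creditcard.csv` (the label column in this dataset is `Class`, with 1 marking fraud):

```python
import pandas as pd

# Load the Kaggle dataset (download it from the URL above first)
df = pd.read_csv("creditcard.csv")

# Class 1 = fraud, Class 0 = legitimate
counts = df["Class"].value_counts()
print(counts)                                      # 0: 284315, 1: 492
print(f"Fraud ratio: {counts[1] / len(df):.3%}")   # ~0.172%
```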
Because of this imbalance, a classifier that labels every observation as the majority class achieves roughly 99.8% accuracy here, so raw accuracy is a misleading metric, and relying on it leads to poor model selection and evaluation.
Here we show how to use two different methods to handle such imbalanced datasets (a code sketch follows the list):
- NearMiss - an undersampling technique that balances the dataset by shrinking the majority class down to the size of the minority class
- SMOTE - an oversampling technique that synthesizes new minority-class samples to produce a dataset with equally distributed classes
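Both resamplers ship with the `imbalanced-learn` package. The sketch below, continuing from the loading snippet above, shows one plausible way to apply them; resampling is done only on the training split so that no synthetic or dropped samples leak into the test set:

```python
from sklearn.model_selection import train_test_split
from imblearn.under_sampling import NearMiss
from imblearn.over_sampling import SMOTE

X = df.drop(columns="Class")
y = df["Class"]

# Stratified split keeps the 0.172% fraud ratio in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# NearMiss: keep only the majority samples closest to minority samples
X_nm, y_nm = NearMiss().fit_resample(X_train, y_train)

# SMOTE: synthesize new minority samples by interpolating between neighbours
X_sm, y_sm = SMOTE(random_state=42).fit_resample(X_train, y_train)
```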
The first method discards information, so accuracy will be lower and we risk underfitting the majority class. The second method loses no information and is preferable in that respect, but training and evaluating the models takes longer.
Here we use four types of classifiers to build the predictive models:
- SVC - Support Vector Classifier
- Logistic Regression
- k-NN
- Decision Tree
We tune each model's hyperparameters using grid-search cross-validation (a sketch is shown below).
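A sketch of the tuning step for one of the four classifiers (logistic regression); the parameter grid here is illustrative, not necessarily the one used in the notebook:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    scoring="roc_auc",   # rank candidates by ROC-AUC rather than raw accuracy
)
grid.fit(X_sm, y_sm)     # here: the SMOTE-resampled training data
print(grid.best_params_, grid.best_score_)
```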
Using ROC scores on the held-out test set, we choose the best model for the given dataset.
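Evaluation on the untouched, still-imbalanced test split might then look like this, assuming the fitted grid search from the sketch above:

```python
from sklearn.metrics import classification_report, roc_auc_score

best_model = grid.best_estimator_
y_pred = best_model.predict(X_test)
y_score = best_model.predict_proba(X_test)[:, 1]

# ROC-AUC on the original, imbalanced test set is the selection criterion
print("ROC-AUC:", roc_auc_score(y_test, y_score))
print(classification_report(y_test, y_pred, digits=3))
```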
We found that Logistic Regression performs well for both the undersampling (NearMiss) and oversampling (SMOTE) techniques, with an accuracy of about 95%.