New feature: association rules mining using scikit-learn pipelines #1105

Open
josejub opened this issue Sep 21, 2024 · 0 comments

josejub commented Sep 21, 2024

Describe the workflow you want to enable

I've developed an addition to mlxtend that makes encoding transactions, mining rules, and filtering the extracted rules seamless by using sklearn pipelines. The input is a clean pandas dataset; it passes through each step, and the output is the set of rules filtered by the specified values.

Describe your proposed solution

I've developed a new TransactionEncoder class, which first discretizes numerical variables and then encodes numerical values as intervals and categorical variables by their discrete values.
Another class, RuleExtractor, encapsulates frequent itemset and rule extraction. It takes a one-hot-encoded DataFrame and produces rules for the desired support and metric values.
As a last step, there are two classes: one for filtering extracted rules by the items in the consequent or antecedent, and another for filtering rules based on their metric values.
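
To illustrate the encoding step described above, here is a minimal sketch using only pandas. It is not the proposed TransactionEncoder itself: the function name `encode_transactions`, the `n_bins` parameter, the equal-width binning, and the `column=value` naming are all assumptions made for this example.

```python
import pandas as pd


def encode_transactions(df: pd.DataFrame, n_bins: int = 2) -> pd.DataFrame:
    """Hypothetical helper: bin numeric columns, then one-hot encode everything."""
    binned = df.copy()
    # Replace each numeric column by the label of the equal-width interval
    # that each value falls into (assumption: equal-width bins are acceptable).
    for col in binned.select_dtypes(include="number").columns:
        binned[col] = pd.cut(binned[col], bins=n_bins).astype(str)
    # One boolean column per (column, value) pair, e.g. "colour=red".
    return pd.get_dummies(binned, prefix_sep="=").astype(bool)


data = pd.DataFrame(
    {"age": [22, 35, 58, 41], "colour": ["red", "blue", "red", "red"]}
)
onehot = encode_transactions(data)
print(onehot.columns.tolist())
```

Each row ends up with exactly one active item per original column, which is the boolean layout that frequent itemset miners such as those in mlxtend expect as input.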

As an independent module, there is a class for generating negated transactions. It takes a one-hot-encoded transaction DataFrame and, for each specified column, generates a new column for the negated variable.
Usually, association rules describe relations between items, A -> B (presence of A is associated with presence of B). By including negated items, we can study groups of rules that involve them, such as:

  • Cases where both A -> B and ¬A -> ¬B have high confidence: these would be strong rules, because presence of the items is associated and their absence is associated as well.
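
The idea of negated items can be sketched with plain pandas as follows. The helper names `add_negated_items` and `confidence`, the `not_` prefix, and the toy data are assumptions for illustration only, not the API of the proposed class.

```python
import pandas as pd


def add_negated_items(onehot: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: add a 'not_<col>' column for each requested item."""
    out = onehot.copy()
    for col in columns:
        out[f"not_{col}"] = ~out[col].astype(bool)
    return out


def confidence(df: pd.DataFrame, antecedent: str, consequent: str) -> float:
    """Confidence of antecedent -> consequent: P(consequent | antecedent)."""
    has_antecedent = df[antecedent]
    return float((has_antecedent & df[consequent]).sum() / has_antecedent.sum())


baskets = pd.DataFrame(
    {
        "A": [True, True, True, False, False, False],
        "B": [True, True, False, False, False, True],
    }
)
with_neg = add_negated_items(baskets, ["A", "B"])

conf_pos = confidence(with_neg, "A", "B")          # A -> B
conf_neg = confidence(with_neg, "not_A", "not_B")  # not A -> not B
print(conf_pos, conf_neg)
```

In this toy data both rules have confidence 2/3, so neither direction would count as strong under a high threshold; the point is only that the negated columns let the same confidence computation cover both A -> B and ¬A -> ¬B.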

All the classes mentioned conform to the sklearn fit/transform convention, so they can be integrated seamlessly into sklearn pipelines alongside other algorithms.
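
As a rough sketch of what conforming to the fit/transform convention looks like, the two transformers below are chained in a `Pipeline`. They are hypothetical stand-ins written for this example (`OneHotItems` and `MinSupportFilter` are not the proposed classes), and the second step filters single items by support rather than mining rules.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class OneHotItems(BaseEstimator, TransformerMixin):
    """Hypothetical encoding step: one-hot encode categorical columns."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return pd.get_dummies(X, prefix_sep="=").astype(bool)


class MinSupportFilter(BaseEstimator, TransformerMixin):
    """Hypothetical filtering step: keep items whose support meets a threshold."""

    def __init__(self, min_support=0.5):
        self.min_support = min_support

    def fit(self, X, y=None):
        # Support of an item = fraction of transactions that contain it.
        support = X.mean(axis=0)
        self.kept_items_ = support[support >= self.min_support].index.tolist()
        return self

    def transform(self, X):
        return X[self.kept_items_]


pipe = Pipeline(
    [("encode", OneHotItems()), ("filter", MinSupportFilter(min_support=0.5))]
)
data = pd.DataFrame(
    {"colour": ["red", "blue", "red", "red"], "size": ["S", "S", "M", "L"]}
)
frequent = pipe.fit_transform(data)
print(frequent.columns.tolist())
```

Because each step only needs `fit` and `transform`, the rule extraction and rule filtering classes described above could presumably be added as further steps in the same way.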

Describe alternatives you've considered, if relevant

Additional context
