New feature: association rules mining using scikit-learn pipelines #1105

josejub · 2024-09-21T09:32:44Z

Describe the workflow you want to enable

I've developed an addition to mlxtend that lets you encode transactions, mine rules, and filter the extracted rules seamlessly within scikit-learn pipelines. The input is a clean pandas DataFrame, which passes through each step and comes out as a set of rules filtered by user-specified values.

Describe your proposed solution

I've developed a new TransactionEncoder class. It first discretizes numerical variables and then one-hot encodes the data: numerical variables are encoded by interval and categorical variables by their discrete values.
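One way such an encoder could be written as a scikit-learn transformer is sketched below. It assumes equal-width binning learned in `fit`, with open-ended outer bins, and a `column=value` naming scheme for items. The class name `IntervalTransactionEncoder` and the `n_bins` parameter are hypothetical. A different name is used to avoid confusion with the existing `mlxtend.preprocessing.TransactionEncoder`, which encodes lists of transactions rather than tabular data.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class IntervalTransactionEncoder(TransformerMixin, BaseEstimator):
    """Hypothetical sketch: bin numeric columns, then one-hot encode every column."""

    def __init__(self, n_bins=3):
        self.n_bins = n_bins

    def fit(self, X, y=None):
        X = pd.DataFrame(X)
        self.bin_edges_ = {}
        for col in X.select_dtypes(include="number").columns:
            # Equal-width edges learned from the training data.
            edges = np.linspace(X[col].min(), X[col].max(), self.n_bins + 1)
            # Open-ended outer bins so unseen values still fall into an interval.
            edges[0], edges[-1] = -np.inf, np.inf
            self.bin_edges_[col] = np.unique(edges)  # drop duplicate edges (constant columns)
        self.feature_names_ = self._encode(X).columns
        return self

    def transform(self, X):
        # Align to the items seen in fit so train and test outputs share the same columns.
        return self._encode(X).reindex(columns=self.feature_names_, fill_value=False)

    def _encode(self, X):
        X = pd.DataFrame(X).copy()
        for col, edges in self.bin_edges_.items():
            # Replace each numeric value by the label of its interval, e.g. "(37.5, inf]".
            X[col] = pd.cut(X[col], bins=edges).astype(str)
        # Every "column=value" pair becomes one boolean item column.
        return pd.get_dummies(X.astype(str), prefix_sep="=").astype(bool)


df = pd.DataFrame({"age": [23, 35, 47, 52], "city": ["Madrid", "Paris", "Madrid", "Rome"]})
onehot = IntervalTransactionEncoder(n_bins=2).fit_transform(df)
# Columns: two age intervals plus city=Madrid, city=Paris, city=Rome
```

Storing the item columns seen in `fit` and reindexing in `transform` keeps the output schema stable between training data and new data, which later pipeline steps depend on.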
Another class, RuleExtractor, encapsulates frequent itemset and rule extraction. It takes a one-hot-encoded DataFrame and produces the rules that satisfy the desired minimum support and metric values.
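Here is a rough sketch of the idea, assuming a boolean one-hot DataFrame as input. To keep the example self-contained, it enumerates itemsets exhaustively instead of calling mlxtend's `apriori` and `association_rules`, so it is only practical for small inputs. The parameter names mirror those functions (`min_support`, `max_len`, `metric`, `min_threshold`), but the class itself is hypothetical.

```python
from itertools import combinations

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class RuleExtractor(TransformerMixin, BaseEstimator):
    """Sketch: frequent itemsets and association rules from a boolean one-hot DataFrame.

    Exhaustive enumeration stands in for mlxtend's apriori/association_rules.
    """

    def __init__(self, min_support=0.5, metric="confidence", min_threshold=0.8, max_len=3):
        self.min_support = min_support
        self.metric = metric
        self.min_threshold = min_threshold
        self.max_len = max_len

    def fit(self, X, y=None):
        return self  # stateless: all mining happens in transform

    def transform(self, X):
        X = pd.DataFrame(X).astype(bool)
        # 1) Frequent itemsets: fraction of rows containing every item of the set.
        support = {}
        for k in range(1, self.max_len + 1):
            for itemset in combinations(X.columns, k):
                s = X[list(itemset)].all(axis=1).mean()
                if s >= self.min_support:
                    support[frozenset(itemset)] = s
        # 2) Rules: split each frequent itemset into antecedent -> consequent.
        rows = []
        for itemset, s in support.items():
            for r in range(1, len(itemset)):
                for ante in combinations(sorted(itemset), r):
                    a = frozenset(ante)
                    c = itemset - a
                    conf = s / support[a]  # subsets of frequent itemsets are frequent too
                    rows.append((a, c, s, conf, conf / support[c]))
        rules = pd.DataFrame(
            rows, columns=["antecedents", "consequents", "support", "confidence", "lift"]
        )
        # 3) Keep only rules whose chosen metric reaches the threshold.
        return rules[rules[self.metric] >= self.min_threshold].reset_index(drop=True)


onehot = pd.DataFrame({
    "bread": [True, True, True, False],
    "butter": [True, True, False, False],
    "milk": [False, True, True, True],
})
rules = RuleExtractor(min_support=0.5, min_threshold=0.8).fit_transform(onehot)
# Only butter -> bread survives: support 0.5, confidence 1.0
```

A real implementation would replace the enumeration with mlxtend's `apriori` (or `fpgrowth`) followed by `association_rules`. The frozenset `antecedents`/`consequents` columns here follow the convention of `association_rules`, so later steps could consume either output.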
As a last step, there are two classes: one filters the extracted rules by the items in the consequent or antecedent, and the other filters rules based on their metric values.
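The two filters could be sketched as follows, assuming rules are stored the way mlxtend's `association_rules` returns them: frozensets in `antecedents`/`consequents` and one column per metric. `ItemRuleFilter`, `MetricRuleFilter`, and their parameters (`items`, `side`, `min_values`) are hypothetical names, and the rules table is toy data.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ItemRuleFilter(TransformerMixin, BaseEstimator):
    """Hypothetical sketch: keep rules whose antecedent/consequent mentions given items."""

    def __init__(self, items=(), side="consequents"):
        self.items = items
        self.side = side  # "antecedents", "consequents", or "any"

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        wanted = frozenset(self.items)
        sides = ["antecedents", "consequents"] if self.side == "any" else [self.side]
        mask = pd.Series(False, index=X.index)
        for side in sides:
            # A rule matches if that side shares at least one wanted item.
            mask |= X[side].map(lambda s: not wanted.isdisjoint(s)).astype(bool)
        return X[mask].reset_index(drop=True)


class MetricRuleFilter(TransformerMixin, BaseEstimator):
    """Hypothetical sketch: keep rules that meet every minimum metric threshold."""

    def __init__(self, min_values=None):
        self.min_values = min_values  # e.g. {"confidence": 0.8, "lift": 1.0}

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        mask = pd.Series(True, index=X.index)
        for metric, threshold in (self.min_values or {}).items():
            mask &= X[metric] >= threshold
        return X[mask].reset_index(drop=True)


rules = pd.DataFrame({
    "antecedents": [frozenset({"butter"}), frozenset({"bread"}), frozenset({"milk"})],
    "consequents": [frozenset({"bread"}), frozenset({"milk"}), frozenset({"bread"})],
    "confidence": [1.0, 0.67, 0.67],
    "lift": [1.33, 0.89, 0.89],
})
bread_rules = ItemRuleFilter(items=["bread"], side="consequents").fit_transform(rules)
strong = MetricRuleFilter(min_values={"confidence": 0.8, "lift": 1.0}).fit_transform(rules)
```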

As an independent module, there is a class for generating negative transactions. It takes a one-hot-encoded transaction DataFrame and, for each specified column, generates a new column containing the negated item.
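A minimal sketch of such a generator is shown below. It assumes a boolean one-hot DataFrame and a `¬` prefix for the negated column names. `NegatedItemGenerator`, `columns`, and `prefix` are hypothetical names. When `columns` is omitted, every column is negated.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class NegatedItemGenerator(TransformerMixin, BaseEstimator):
    """Hypothetical sketch: append a negated copy of selected one-hot columns."""

    def __init__(self, columns=None, prefix="¬"):
        self.columns = columns
        self.prefix = prefix

    def fit(self, X, y=None):
        X = pd.DataFrame(X)
        # Remember which columns to negate (all of them by default).
        self.columns_ = list(X.columns) if self.columns is None else list(self.columns)
        return self

    def transform(self, X):
        X = pd.DataFrame(X).astype(bool)
        negated = ~X[self.columns_]  # True where the item is absent from the transaction
        negated.columns = [f"{self.prefix}{c}" for c in self.columns_]
        return pd.concat([X, negated], axis=1)


onehot = pd.DataFrame({"bread": [True, True, False], "milk": [False, True, True]})
with_neg = NegatedItemGenerator(columns=["bread"]).fit_transform(onehot)
# Columns: bread, milk, ¬bread
```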
Association rules usually describe relations between items, A -> B (the presence of A is associated with the presence of B). Including negated items lets us study groups of rules that involve them, for example:

Rules where both A -> B and ¬A -> ¬B have high confidence would be strong rules, since the presence of the items is associated and so is their absence.
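This criterion can be checked directly using conf(X -> Y) = support(X ∪ Y) / support(X). For boolean columns, this reduces to a ratio of row counts. The toy data and the 0.6 threshold below are arbitrary assumptions for illustration, not values from the proposal.

```python
import pandas as pd


def confidence(df, antecedent, consequent):
    """conf(X -> Y) = support(X ∪ Y) / support(X); the row count cancels out."""
    has_x = df[antecedent]
    return (has_x & df[consequent]).sum() / has_x.sum()


df = pd.DataFrame({
    "A": [True, True, True, False, False, False],
    "B": [True, True, True, False, False, True],
})
df["¬A"] = ~df["A"]
df["¬B"] = ~df["B"]

conf_pos = confidence(df, "A", "B")    # presence of A -> presence of B
conf_neg = confidence(df, "¬A", "¬B")  # absence of A -> absence of B

min_conf = 0.6  # hypothetical threshold for "high confidence"
is_strong = bool(conf_pos >= min_conf and conf_neg >= min_conf)
```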

All of these classes follow the scikit-learn transformer API (fit/transform, and therefore fit_transform), so they can be integrated into scikit-learn pipelines alongside other algorithms.
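The fit/transform convention matters because scikit-learn's `Pipeline` passes each step's output directly to the next step. That lets DataFrames, including a rules table, flow through unchanged. The sketch below chains a `FunctionTransformer` for one-hot encoding, a simplified miner that only produces single-item rules, and a confidence filter. `PairRuleMiner` and `MinConfidence` are hypothetical stand-ins for the proposed classes, not their implementation.

```python
from itertools import permutations

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def one_hot(df):
    # Every "column=value" pair becomes a boolean item column.
    return pd.get_dummies(df.astype(str), prefix_sep="=").astype(bool)


class PairRuleMiner(TransformerMixin, BaseEstimator):
    """Simplified stand-in: mines only single-item rules X -> Y."""

    def __init__(self, min_support=0.3):
        self.min_support = min_support

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        rows = []
        for a, c in permutations(X.columns, 2):
            support = (X[a] & X[c]).mean()  # fraction of rows containing both items
            if support >= self.min_support:
                rows.append((frozenset([a]), frozenset([c]), support, support / X[a].mean()))
        return pd.DataFrame(rows, columns=["antecedents", "consequents", "support", "confidence"])


class MinConfidence(TransformerMixin, BaseEstimator):
    """Simplified stand-in for a metric-based rule filter."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[X["confidence"] >= self.threshold].reset_index(drop=True)


pipe = Pipeline([
    ("encode", FunctionTransformer(one_hot)),
    ("mine", PairRuleMiner(min_support=0.5)),
    ("filter", MinConfidence(threshold=0.8)),
])

data = pd.DataFrame({"bread": ["yes", "yes", "yes", "no"], "butter": ["yes", "yes", "no", "no"]})
rules = pipe.fit_transform(data)
# One rule remains: butter=yes -> bread=yes (support 0.5, confidence 1.0)
```

Because every step only needs `fit` and `transform`, any of the stand-ins could be swapped for a fuller implementation without changing the pipeline definition.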

Describe alternatives you've considered, if relevant

Additional context

josejub added the New Feature label Sep 21, 2024
