This package is an implementation of the adaptive partitioning mutual information algorithm [1]. The project is written in standard C++ and packaged for use in Python via Pybind11.

The adaptive partitioning algorithm is a single-parameter estimator of the mutual information between two continuous variables. This package is untested on discrete (categorical) data.
The package is installed via pip in the traditional way:

```
pip install aminfo
```
First, you should have two continuous variables `X` and `Y` between which you seek to estimate the mutual information. These variables should be `NaN`-free.
```python
from aminfo import adaptive_mutual_information

X, Y = generate_some_data(...)
MI = adaptive_mutual_information(X, Y)
```
The function accepts a third positional argument for the only parameter used in this estimator: the critical chi-squared value. It defaults to 8.0.
```python
MI = adaptive_mutual_information(X, Y, 8.0)
```
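For a concrete end-to-end sketch (assuming `numpy` is available and that `adaptive_mutual_information` accepts 1-D NumPy arrays, as the snippet above suggests), one can estimate the MI of correlated Gaussian samples:

```python
import numpy as np
from aminfo import adaptive_mutual_information

# Draw correlated samples from a bivariate standard normal.
rng = np.random.default_rng(0)
rho = 0.8
cov = [[1.0, rho], [rho, 1.0]]
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=10_000).T

MI = adaptive_mutual_information(X, Y)             # default cutoff of 8.0
MI_loose = adaptive_mutual_information(X, Y, 4.0)  # more permissive splitting
print(MI, MI_loose)
```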
The adaptive partitioning algorithm is an improvement on the "histogram" method of calculating mutual information: rather than using fixed bins, the bin sizes are driven by the information latent in the distribution. The algorithm works by recursively partitioning the distribution in a fashion very similar to quadtree traversal. Each "child" cell containing $N$ points is tested for distributional uniformity through a chi-squared test over the counts $N_1, \dots, N_4$ of its four candidate sub-cells:

$$\chi^2 = \sum_{i=1}^{4} \frac{\left(N_i - N/4\right)^2}{N/4}$$

The child cell splits into grandchildren if the statistic exceeds some critical value:

$$\chi^2 > \chi^2_{\mathrm{crit}}$$
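The internals of this package are C++, but a minimal Python sketch of the uniformity test (my illustration of the idea, not the library's actual code) might look like:

```python
import numpy as np

def should_split(x, y, x_mid, y_mid, chi_crit=8.0):
    """Chi-squared uniformity test for one cell of the partition.

    x, y hold the points inside the current cell; (x_mid, y_mid) is
    the point at which the cell would be split into four sub-cells.
    """
    n = len(x)
    right = x >= x_mid
    top = y >= y_mid
    # Occupancy of the four candidate sub-cells.
    counts = np.array([
        np.sum(right & top),
        np.sum(right & ~top),
        np.sum(~right & top),
        np.sum(~right & ~top),
    ])
    expected = n / 4.0  # uniform occupancy under the null hypothesis
    chi2 = np.sum((counts - expected) ** 2) / expected
    return chi2 > chi_crit
```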
Typical values to try for $\chi^2_{\mathrm{crit}}$ are in the range of 4.0-8.0 for most distributions. A higher $\chi^2_{\mathrm{crit}}$ makes it harder for each child cell to split, so less of the relationship between the variables is resolved and the estimator tends to underestimate the true MI. A lower $\chi^2_{\mathrm{crit}}$ makes it easier for each child cell to split and may find more information than is really there, causing a tendency to overestimate the true MI.
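To see this trade-off directly, one can sweep the cutoff on synthetic data (a hedged sketch; the dataset here is purely illustrative):

```python
import numpy as np
from aminfo import adaptive_mutual_information

rng = np.random.default_rng(0)
cov = [[1.0, 0.8], [0.8, 1.0]]
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=10_000).T

# Lower cutoffs split aggressively (risking overestimation);
# higher cutoffs split conservatively (risking underestimation).
for chi_crit in (2.0, 4.0, 8.0, 16.0):
    print(chi_crit, adaptive_mutual_information(X, Y, chi_crit))
```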
I cannot speak to the "general" accuracy of the adaptive partitioning algorithm. I wrote this implementation to replace `sklearn`'s estimator `mutual_info_regression`, which is based on a KNN algorithm and is quite slow. Regardless, we can test the accuracy of the adaptive partitioning estimator against a known analytical mutual information. For two standard normal Gaussians with correlation $\rho$, the mutual information is

$$I(X;Y) = -\frac{1}{2}\ln\left(1 - \rho^2\right)$$
We can plot the difference between the estimator and the analytical value for various correlations and numbers of datapoints.
As expected, the estimator's accuracy falls apart for low values of $N$.
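A sketch of that accuracy check (my own test harness, not something shipped with the package) could look like:

```python
import numpy as np
from aminfo import adaptive_mutual_information

def analytic_mi(rho):
    # MI of a bivariate standard normal with correlation rho.
    return -0.5 * np.log(1.0 - rho ** 2)

rng = np.random.default_rng(0)
for n in (1_000, 10_000, 100_000):
    for rho in (0.1, 0.5, 0.9):
        cov = [[1.0, rho], [rho, 1.0]]
        X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        err = adaptive_mutual_information(X, Y) - analytic_mi(rho)
        print(f"n={n:>7}  rho={rho:.1f}  error={err:+.4f}")
```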
I am a bit of a novice at C++, so this code could enjoy some further optimizations. In the following chart, I plot the performance speedup over `sklearn`'s `mutual_info_regression`, which is based on a KNN estimator of the joint density.
The baseline performance increase is around 10x for small datasets, and rises as high as 20x for datasets near 100k points. This benchmark is performed on a joint Gaussian with correlation 0.9, with `n_neighbors=3` for the `mutual_info_regression` estimator. It should also be noted that this joint Gaussian (0.9 correlation) represents roughly a worst case for this algorithm's speed. The recursion depth depends on the strength of the relationship between the variables, so at weaker relationships (correlation 0.0-0.3) the algorithm runs much faster.
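A rough version of that benchmark (an illustrative harness, not the one used for the chart; timings will vary by machine) might be:

```python
import time
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from aminfo import adaptive_mutual_information

rng = np.random.default_rng(0)
cov = [[1.0, 0.9], [0.9, 1.0]]
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

t0 = time.perf_counter()
adaptive_mutual_information(X, Y)
t1 = time.perf_counter()
mutual_info_regression(X.reshape(-1, 1), Y, n_neighbors=3)
t2 = time.perf_counter()

print(f"aminfo: {t1 - t0:.3f}s  sklearn: {t2 - t1:.3f}s  "
      f"speedup: {(t2 - t1) / (t1 - t0):.1f}x")
```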
Should time or necessity call for it, I may implement both a Parzen window approach to the joint density estimation and the copula entropy (parameter-free) algorithm for continuous mutual information. However, I will likely only do this if I find some issue during my work with the adaptive partitioning method.
MIT License
- [1] G. A. Darbellay and I. Vajda, "Estimation of the information by an adaptive partitioning of the observation space," IEEE Transactions on Information Theory, 1999.
- [2] T. Masters, Assessing and Improving Prediction and Classification.
- [3] Some inspiration from this implementation: minfo