Optimal bandwidth dependent on dimensionality? #17

Open
redhog opened this issue Feb 13, 2023 · 2 comments
Labels
help wanted (Extra attention is needed), research needed (Needs research to determine how to handle)

redhog commented Feb 13, 2023

It seems the output is highly dependent on the dimensionality. That is, if you run a KDE using only the first 2 dimensions, versus using the first 3 and averaging over the last dimension, you get very different density distributions (but the same general patterns):

image

taobrienlbl added the help wanted and research needed labels Nov 5, 2023
taobrienlbl (Collaborator) commented

Hi @redhog - apologies for taking so long to get back to you. Thanks for raising this. The main difference between the two plots looks to be the amount of high-frequency variability in the 3D version in comparison to the 2D version.

Intuitively, this behavior makes sense to me for complicated PDFs. When doing the KDE, the spectral representation of the empirical characteristic function (ECF) ends up being filtered based on contiguous regions that are above a data-dependent threshold; the filter essentially retains low-frequency variability in the ECF. Regions that aren't contiguous in 2D might end up being contiguous in 3D, meaning it's possible for high-frequency regions of the ECF to appear in the 3D KDE that aren't present in the 2D version.
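To make the contiguity argument concrete, here is a toy sketch in plain Python. It is not the library's code. It assumes the filter keeps only the above-threshold frequencies that are connected to the zero-frequency origin through axis-aligned neighbours, and it assumes the same threshold mask applies in 2D and 3D (in practice the threshold is data-dependent). The helper `connected_to_origin` and the masks are hypothetical. The 2D mask is taken as the `k3 = 0` slice of the 3D mask, since the characteristic function of a marginal is the joint characteristic function evaluated at zero in the dropped frequency.

```python
from collections import deque


def connected_to_origin(mask, origin):
    """Return the above-threshold grid points reachable from `origin`.

    `mask` is a set of integer frequency-index tuples that passed the
    threshold; neighbours differ by +/-1 along exactly one axis.
    """
    if origin not in mask:
        return set()
    seen = {origin}
    queue = deque([origin])
    while queue:
        point = queue.popleft()
        for axis in range(len(origin)):
            for step in (-1, 1):
                neighbour = (
                    point[:axis] + (point[axis] + step,) + point[axis + 1:]
                )
                if neighbour in mask and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return seen


# Hypothetical above-threshold frequencies (k1, k2, k3) of a 3D ECF.
mask_3d = {
    (0, 0, 0), (1, 0, 0),             # low-frequency region at the origin
    (3, 0, 0), (4, 0, 0),             # high-frequency island in the k3 = 0 plane
    (1, 0, 1), (2, 0, 1), (3, 0, 1),  # bridge that only exists at k3 = 1
}

# The 2D problem only sees the k3 = 0 plane.
mask_2d = {(k1, k2) for (k1, k2, k3) in mask_3d if k3 == 0}

kept_2d = connected_to_origin(mask_2d, (0, 0))
kept_3d = connected_to_origin(mask_3d, (0, 0, 0))

# Averaging the 3D estimate over the last dimension keeps the k3 = 0 plane
# of whatever survived the 3D filter.
kept_3d_slice = {(k1, k2) for (k1, k2, k3) in kept_3d if k3 == 0}

print("kept in 2D:           ", sorted(kept_2d))
print("kept in 3D, k3 = 0:   ", sorted(kept_3d_slice))
```

In this toy case the island at `k1 = 3, 4` is cut off from the origin in 2D, but it is reachable in 3D through the `k3 = 1` bridge, so it survives the 3D filter and shows up as extra high-frequency content after averaging over the last dimension.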

This specification of the filter is somewhat arbitrary as long as it follows some mathematical guidelines defined by Bernacchia and Pigolotti (2011); how best to specify the filter is an open research question. The issue you raise here makes me wonder whether there might be a way to specify the filter such that results are consistent between high- and low-dimensional versions of the PDF.

That said, this is beyond the scope of a simple bug fix, as I think there might be a paper to write on this topic. And I'm not even sure whether it is in fact a bug or a feature.

I'm going to leave this open for now in hopes that someone might be able to provide some insight.


redhog commented Nov 6, 2023

Heyas! Thanks for taking the time to look into this, even if it took such a long time, @taobrienlbl. I don't actually remember what exactly I used this for or any other details, but I think this is an interesting issue in general, since it might surprise users just like it did me. If nothing else, your explanation is a good start for a note in the documentation.
