Separate plotting functionality from kde estimation for better interoperability with visualization libraries #37
Comments
Thanks for this comment! Do you have specific thoughts on how the interface might change to facilitate this? Also, I'll note that I happily welcome pull requests!
Hmm, let's see. I'm less familiar with your specific KDE implementation, but I think the last example can be easily adapted to act as a general numerical tool. Adapting this example
PDF = fastkde.pdf(x, y, var_names = ['x', 'y'])
would just require either adding a simple keyword argument
pdf_meshgrid = fastkde.pdf(x, y, data_only=True)
or perhaps a separate function:
pdf_meshgrid = fastkde.pdf_array(x, y, **kwargs)
All that is required is that the PDF be returned as a scalar mesh field, plus the coordinate data if it's not the same as the input coordinates. If the coordinates are rectilinear, they can simply be the 1-D arrays for each axis. If they are scattered, then perhaps an intermediate function to regrid the data?
pdf_meshgrid, xx, yy = fastkde.pdf_array(x, y, return_coords=[??], **kwargs)
For reference, see the pcolormesh and contour docs. Both calls require the same input data: X, Y, and Z. Does this seem easy to do?
Hi @k-a-mendoza, thanks for clarifying. That is in fact what fastKDE does, so fortunately the change I need to make is one of documentation rather than interface/code. Part of the confusion might originate in a new feature that I introduced in v2.0 that hasn't received much feedback. If the
Expanding on this, here is some sample code to extract the arrays you are referring to.
If xarray is installed:
# if xarray is installed
PDF = fastkde.pdf(x, y, var_names = ['x', 'y'])
pdf_meshgrid = PDF.values
# get the axis arrays
xx = PDF.x
yy = PDF.y
If xarray is not installed:
# if xarray is not installed
pdf_meshgrid, (xx, yy) = fastkde.pdf(x, y)
# if xarray is in fact installed, the above behavior can be forced with the use_xarray=False option
# pdf_meshgrid, (xx, yy) = fastkde.pdf(x, y, use_xarray=False)
Does this address your original comment?
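For anyone landing here from a plotting library, the arrays above can be handed straight to matplotlib's pcolormesh and contour. The following is only a minimal sketch: it uses a synthetic Gaussian as a stand-in for the fastKDE output, and it assumes the density array has shape (len(yy), len(xx)), which should be checked against the actual return value.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Stand-in for the arrays described above (assumption: 1-D axis arrays and
# a 2-D density whose shape is (len(yy), len(xx))).
xx = np.linspace(-4.0, 4.0, 129)
yy = np.linspace(-3.0, 3.0, 65)
pdf_meshgrid = np.exp(
    -0.5 * (xx[np.newaxis, :] ** 2 + yy[:, np.newaxis] ** 2)
) / (2.0 * np.pi)

fig, ax = plt.subplots()
# pcolormesh and contour both accept 1-D X and Y with a 2-D Z.
mesh = ax.pcolormesh(xx, yy, pdf_meshgrid, shading="auto")
contours = ax.contour(xx, yy, pdf_meshgrid, levels=5, colors="white")
fig.colorbar(mesh, ax=ax, label="PDF")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

The same three arrays are what plotly, bokeh, or pyvista would need as well, so nothing here is specific to matplotlib beyond the two plotting calls.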
You know what, after playing with the package a bit, I don't think this suggestion applies anymore. The returned xarray object is already useful and expressive enough for users to make their own plots. However, there are a few things that I think would be appropriate for a readthedocs page or perhaps a follow-up publication. As far as I can tell, there are two other popular KDE implementations in the Python ecosystem:
One thing that isn't clear from fastkde's readme file is what numerical options are available. I'm not sure if the FFT process under the hood allows for different bandwidths or kernels, but if it's easy to implement I think that would be quite useful. Seaborn also has an option to specify contour levels. From the documentation: """
levels : int or vector
Number of contour levels or values to draw contours at. A vector argument must have increasing values in [0, 1].
Levels correspond to iso-proportions of the density: e.g., 20% of the probability mass will lie below the
contour drawn for 0.2. Only relevant with bivariate data.
"""
Playing around with assigning manual contours from the returned xarray object on some geoscience data, I'm finding that there are edge values in the pdf near zero that are quite large, although the majority of points lie in the 0 - 0.1 range. So perhaps more documentation on how to normalize this to iso-proportions, or options to enforce zero amplitude at zero frequencies, would be nice. Actually, now that I think of it, an option to return the FFT transform and perform frequency data augmentation or other signal processing tricks could really help; this would have to be paired with an option to ingest the transformed FFT back into data coordinates.
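As a rough illustration of the iso-proportion idea quoted from the seaborn docs, the density thresholds can be computed from any gridded PDF with plain NumPy. This is a sketch under assumptions: the helper name iso_proportion_levels is made up for this example, the grid cells are assumed to have equal area, and negative values (for example from ringing) are clipped to zero before accumulating mass.

```python
import numpy as np


def iso_proportion_levels(pdf, proportions):
    """Return density thresholds such that a fraction p of the total
    probability mass lies below the contour drawn at each threshold.

    Hypothetical helper; assumes equal-area grid cells.
    """
    # Sort densities from low to high; clipping keeps the order intact.
    values = np.clip(np.sort(np.ravel(pdf)), 0.0, None)
    # Fraction of total mass contained in cells at or below each value.
    mass_below = np.cumsum(values) / values.sum()
    idx = np.searchsorted(mass_below, np.asarray(proportions))
    idx = np.clip(idx, 0, values.size - 1)
    return values[idx]


# Check on a standard bivariate normal, where the threshold for a
# proportion p is known analytically to be p / (2 * pi).
x = np.linspace(-6.0, 6.0, 401)
y = np.linspace(-6.0, 6.0, 401)
pdf = np.exp(-0.5 * (x[np.newaxis, :] ** 2 + y[:, np.newaxis] ** 2)) / (2.0 * np.pi)

proportions = np.array([0.2, 0.5, 0.8])
levels = iso_proportion_levels(pdf, proportions)
```

The resulting levels array is increasing, so it can be passed as the levels argument of a contour call on the same grid.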
These are all good comments. For the most part, they seem to point toward the need for better documentation, which is a known deficiency in this package. (It's one of the downsides of producing code as a scientist: it's taken me a long time to get fastKDE to an even remotely professional point, and there's still a lot more to go.) I'd reemphasize that I welcome pull requests large and small if you see easy ways to improve things! I'll go through your comments point-by-point below to indicate why I think they're mainly related to poor documentation.
Yes, the documentation is unclear on this. There's only one available in fastKDE, and it's the Bernacchia and Pigolotti algorithm, which specifies both the bandwidth and kernel shape based on the data. Bandwidth isn't a controllable parameter in this algorithm.
This is a challenge that I'm not yet certain how to handle. There are certain distribution types that are challenging for fastKDE: heavy-tailed distributions (these are hard for any KDE scheme) and bounded data (causes spectral ringing). I haven't yet figured out a way to give good, general guidance on how to deal with challenging distributions. I'd love thoughts here.
This is actually possible, but it's not a documented feature! The object-oriented interface to fastKDE (
So thanks again for all these comments. And it looks like you're doing some cool research: maybe we'll cross paths at AGU this year! Cheers
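To make the "return the FFT, modify it, and map it back" idea from the earlier comment concrete, here is a generic NumPy round trip on a gridded PDF. It is not fastKDE's internal code or its object-oriented interface; the function name lowpass_pdf and the Gaussian damping are assumptions chosen only to show the mechanics on a uniform, effectively periodic grid.

```python
import numpy as np


def lowpass_pdf(pdf, dx, dy, sigma):
    """Damp high frequencies of a gridded PDF and transform back.

    Hypothetical helper: multiplying the spectrum by exp(-sigma^2 k^2 / 2)
    is equivalent to convolving with a Gaussian of standard deviation sigma.
    """
    ny, nx = pdf.shape
    # Angular frequencies matching the layout of np.fft.fft2.
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dy)
    KX, KY = np.meshgrid(kx, ky)
    spectrum = np.fft.fft2(pdf)
    damping = np.exp(-0.5 * sigma**2 * (KX**2 + KY**2))
    # The zero-frequency term is left untouched, so total mass is preserved.
    return np.fft.ifft2(spectrum * damping).real


# Unit-variance Gaussian on a uniform grid that includes the origin.
x = np.linspace(-8.0, 8.0, 256, endpoint=False)
dx = x[1] - x[0]
pdf = np.exp(-0.5 * (x[np.newaxis, :] ** 2 + x[:, np.newaxis] ** 2)) / (2.0 * np.pi)

smoothed = lowpass_pdf(pdf, dx, dx, sigma=1.0)
```

Smoothing a unit-variance Gaussian with sigma=1.0 should give a Gaussian with variance 2 in each direction, which is a convenient sanity check for the frequency-axis bookkeeping.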
Ahh man. I would love to go to AGU. Unfortunately, with graduating and employment sorta up in the air, I'm not sure if that will be possible. We'll see.
Hm. I wonder how easy it would be to transform the data into a normal-like distribution, do the KDE, then invert the coordinates when plotting? If it works, then maybe that's worthy of a simple readthedocs blurb rather than a reworking of the internals. The data I'm working with is a little strange in that it is both somewhat linear and very much Poissonian (two geophysical images with wildly varying resolution as a function of depth). As for the bandwidth, maybe it's possible to "trick" the FFT into having a specific kind of bandwidth through frequency data augmentation. I'll think on it more. I'd be happy to contribute to a readthedocs when I finish my defense. I've got a number of projects that I want to publish with similar readthedocs formatting, and it would be good to get some experience in that domain.
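A small sketch of the transform-then-invert idea, for a 1-D positive, skewed variable: estimate the density of u = log(x), then map it back with the change-of-variables factor p_x(x) = p_u(log x) / x. A plain histogram stands in for the density estimator here so the example only needs NumPy; swapping in a KDE of u is the assumed real use, and the log transform is just one possible choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic heavy-tailed, strictly positive data (illustrative only).
samples = rng.lognormal(mean=0.0, sigma=0.75, size=20_000)

# 1) Transform to a more normal-like variable.
u = np.log(samples)

# 2) Estimate the density in the transformed space
#    (histogram used as a stand-in for a KDE).
pdf_u, u_edges = np.histogram(u, bins=64, density=True)
u_centers = 0.5 * (u_edges[:-1] + u_edges[1:])

# 3) Map back to data coordinates, including the Jacobian |du/dx| = 1/x.
x_centers = np.exp(u_centers)
pdf_x = pdf_u / x_centers

# The back-transformed density should still integrate to about 1
# over the (now non-uniform) bins in x.
x_widths = np.diff(np.exp(u_edges))
total_mass = np.sum(pdf_x * x_widths)
```

Forgetting the 1/x factor is the usual pitfall: without it the curve has the right shape in log space but no longer integrates to one in the original coordinates.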
That's what the If you can send some sample data my way, I might be able to help determine a way to get fastKDE to work with it. (The part of me that's trying to procrastinate on grading is offering that.)
That should definitely be possible. The
If you get the chance, I'd definitely appreciate it. Until then, best of luck on finishing up your dissertation and determining what comes next!
It would be nice to not have matplotlib so tightly integrated into the plot.py functionality, so that we can exercise tighter control over the output graphs. Doing this would also result in faster adoption by alternate plotting libraries like plotly, matplotlib, bokeh, pyvista, etc.