-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
inconsistent behavior when using custom grid #15
Comments
Thanks for raising the Issue @blasern . I agree that something is wrong here. I hope to look into this bug very soon, this weekend or the next. I will report my findings here and patch it up with a regression test. |
I believe I've found the problem. Notice first that the following code produces incorrect results# imports
import numpy as np
import matplotlib.pyplot as plt
import KDEpy
# Create bimodal 2D data
data = np.array([[-1, 1], [0, 1], [1, 1], [-1, -1], [0, -1], [1, -1]])
# Create 2D grid
grid_x = np.linspace(-2, 2, 2 ** 5)
grid_y = np.linspace(-2, 2, 2 ** 4)
grid = np.stack(np.meshgrid(grid_x, grid_y), -1).reshape(-1, 2)
# density estimates
y1 = KDEpy.FFTKDE(bw=0.2).fit(data).evaluate(grid)
y2 = KDEpy.TreeKDE(bw=0.2).fit(data).evaluate(grid)
y3 = KDEpy.NaiveKDE(bw=0.2).fit(data).evaluate(grid)
# plots
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes[0, 0].set_title("Data")
axes[0, 0].scatter(data[:, 0], data[:, 1], marker="x", color="red")
axes[0, 1].set_title("FFTKDE")
axes[0, 1].scatter(grid[:, 0], grid[:, 1], c=y1)
axes[0, 1].scatter(data[:, 0], data[:, 1], marker="x", color="red")
axes[1, 0].set_title("TreeKDE")
axes[1, 0].scatter(grid[:, 0], grid[:, 1], c=y2)
axes[1, 0].scatter(data[:, 0], data[:, 1], marker="x", color="red")
axes[1, 1].set_title("Naive")
axes[1, 1].scatter(grid[:, 0], grid[:, 1], c=y3)
axes[1, 1].scatter(data[:, 0], data[:, 1], marker="x", color="red")
print(np.mean(np.abs(y2 - y3))) # 2.3899e-05
print(np.mean(np.abs(y1 - y3))) # 0.0945866
print(np.mean(np.abs(y1 - y2))) # 0.0945963 Changing grid creation slightly produces another errorgrid = np.stack(np.meshgrid(grid_y, grid_x), -1).reshape(-1, 2) Everything works with the following gridgrid = np.stack(np.meshgrid(grid_y, grid_x), -1).reshape(-1, 2)
grid[:, [0, 1]] = grid[:, [1, 0]] # Swap indices ExplanationThe last code works because the grid values are sorted in order of dimensionality. I.e. it's sorted by
It's required that the grid has this structure to achieve fast linear binning. If I remember correctly: the algorithm must know where to place a data point in the grid in constant time, so it must be able to compute the index immediately. This requires the above grid structure. The (somewhat cryptic) lines of code which perform this is found here: Line 241 in 9e15675
SolutionWe should verify that user-defined grids obey the above structure, i.e. that they are sorted by I propose to add such a check to user-specified grids. I will also perform some more testing. Thoughts on this @blasern ? |
I still find it slightly inconvenient that I cannot choose the grid I want. As I've said before, I may not even want to use an axis-aligned grid. On the other hand, as long as this requirement is documented and you check the input grids (perhaps with an option to turn this check off, for speed) I think that's fine. |
Great. 👍 I agree that it's inconvenient to have to use an axis-aligned grid, the discussion about non-axis aligned (yet equally spaced) fast algorithms could continue at Issue #6 . Arbitrary grids are possible using the other algorithms. Thanks again for your valuable input. I'll leave this issue open until I've patched the software. |
FFTKDE
seems to change the grid internally when providing a custom grid in 2d. Results do not seem to match with the results fromNaiveKDE
orTreeKDE
. See below for an exampleThe text was updated successfully, but these errors were encountered: