-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bandwidth selection for 2d data #81
Comments
Hi Jeff, This is not implemented, since it's not obvious (to me) how to generalize bandwidth to several dimensions for all kernels in an elegant way. In higher dimensions, the bandwidth is a matrix and not a number. That being said, you could scale the data and use from KDEpy.bw_selection import silvermans_rule
import KDEpy
import numpy as np
np.random.seed(1)
# 1D example
data = np.random.randn(100, 1)
# Compute the BW, scale, then scale back
bw = silvermans_rule(data[:, [0]])
data_scaled = data / np.array([bw])
x_scaled, y_scaled = KDEpy.FFTKDE(bw=1).fit(data_scaled).evaluate()
x = x_scaled * np.array([bw])
y = y_scaled / (bw)
# Use Silverman directly without scaling
x2, y2 = KDEpy.FFTKDE(bw="silverman").fit(data).evaluate()
plt.plot(x, y, label="scaling_method"); plt.plot(x2, y2, label="direct method");plt.legend(); If you are comfortable with a diagonal bandwidth matrix (scaling along each axis independently), you could generalize this to 2D like so: from KDEpy.bw_selection import silvermans_rule
import KDEpy
import numpy as np
np.random.seed(1)
# 2D example
data = np.random.randn(100, 2)
# Compute the BW, scale, then scale back
bw1 = silvermans_rule(data[:, [0]])
bw2 = silvermans_rule(data[:, [1]])
data_scaled = data / np.array([bw1, bw2])
x_scaled, y_scaled = KDEpy.FFTKDE(bw=1).fit(data_scaled).evaluate((32, 32))
x = x_scaled * np.array([bw1, bw2])
y = y_scaled / (bw1 * bw2)
# Verify that the integral is 1
dx = (x[-1, 0] - x[0, 0] ) / (32 - 1) # 32 grid points by default in each direction
dy = (x[-1, 1] - x[0, 1] ) / (32 - 1)
np.sum(y * dx * dy) # 0.9999712583635577 If you want a full bandwidth matrix with off diagonal entries not equal to zero, you could scale and rotate the input data (using principal components), fit the I appreciate thoughts and questions on this issue. Basically multidimensional kernels either have the bandwidth matrix H equal to either (1) identity * h, (2) diagonal H (as seen above), or (3) non-diagonal H (PCA). Maybe I could implemented it in an elegant way, but I am not sure. |
Hi Tommy, This is a nice trick! I know it would be hard to generalize to multiple dimensions. I don't even fully understand the mathematics of bandwidth in 1D, let alone higher dimension. I am working on geographical data, where x and y are projected to equal distance coordinate reference system. So I guess a diagonal bandwidth matrix should be fine, most certainly better than me assigning an arbitrary bandwidth. By the way, improved_sheather_jones is the same as KDEpy.FFTKDE(bw="ISJ"), right? I tried your code with ISJ instead for 1d and the plot looks identical. Thanks again for this highly efficient KDE package! Best, |
Yea, it should do the trick for practical purposes at least. I hope to implement some better support, but I don't want to implement too much magic either. At least tricks like these force the user the think explicitly about what they want. 👍 Yup, |
In KDEpy |
@Kaiyangshi-Ito Not implemented. See Line 152 in 73d421e
|
@tommyod This workaround is so elegant!! thank you so much!! |
Hi,
How can I apply automatic bandwidth selection for 2d data? I tried 'silverman', 'scott', 'ISJ' and they all say "rule is only available for 1D data".
I also think about cross-validation but I don't know what and how to compute some error function for grid search. Any suggestion?
Best,
Jeff
The text was updated successfully, but these errors were encountered: