You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, models that do not have built-in scoring cannot be used with dask_ml wrappers, unless custom scoring is specified. For example, using this code, taken mostly from the sklearn example here
import numpy as np
import dask.array as da
from sklearn.cluster import Birch
from dask_ml.wrappers import Incremental
from sklearn.datasets import make_blobs
xx = da.linspace(-22, 22, 10)
yy = da.linspace(-22, 22, 10)
xx, yy = da.meshgrid(xx, yy)
n_centers = da.hstack((da.ravel(xx)[:, np.newaxis], da.ravel(yy)[:, np.newaxis]))
X, y = make_blobs(n_samples=25000, centers=n_centers, random_state=0)
birch = Birch(threshold=1.7, n_clusters=None)
inc = Incremental(birch)
inc.fit(X)
This code currently, raises a TypeError because: If no scoring is specified, the estimator passed should have a 'score' method. The estimator Birch(n_clusters=None, threshold=1.7) does not.
I fixed it within Pull Request #923 by allowing for None when calling check_scoring within the fit method. If there is a reason to not allow a loosening of this check, I am open to other solutions. Other potential solutions I saw at a glance include:
adding a parameter to fit that propagates to the allow_none value of check_scoring - defaulted to False
removing the line, as I do not see scoring being used directly in _fit_for_estimator, and other areas of the code explicitly check for/handle scoring = None.
Hello,
I hope you are all doing well!
Currently, models that do not have built-in scoring cannot be used with dask_ml wrappers, unless custom scoring is specified. For example, using this code, taken mostly from the sklearn example here
This code currently, raises a TypeError because: If no scoring is specified, the estimator passed should have a 'score' method. The estimator Birch(n_clusters=None, threshold=1.7) does not.
I fixed it within Pull Request #923 by allowing for None when calling check_scoring within the fit method. If there is a reason to not allow a loosening of this check, I am open to other solutions. Other potential solutions I saw at a glance include:
Environment
OS - CentOS-7
python - v3.8.2
dask - v2022.04.1
dask-ml - v2022.01.22
sklearn - v1.0.2
Thanks!
Nick
The text was updated successfully, but these errors were encountered: