Feature request: Support for metric='precomputed' #5
There are some quirks about sparse precomputed distance matrices that can
make things a little tricky for corner cases. I'll see if I can get
something done though. I can't promise any timeframes.
On Sun, May 7, 2023 at 11:27 AM Richard Hakim wrote:
I'm currently using vanilla HDBSCAN to cluster a precomputed sparse
distance matrix being input as a scipy.sparse.csr_matrix object. I'm very
eager to use fast_hdbscan due primarily to its easier compilation
requirements as I'm attempting to ship out a tool that uses hdbscan as a
step in a pipeline.
Currently, I believe clustering on precomputed sparse distance matrices is
not supported in fast_hdbscan. I think it would require the porting of some
of the following functions:
- hdbscan_._hdbscan_sparse_distance_matrix
- _hdbscan_reachability.sparse_mutual_reachability
- _hdbscan_linkage.label
Unfortunately, I don't think I'm able to figure out how to implement this
one myself. Though, I'm happy to help out in testing any PRs with basic
implementations.
Thank you for the great package, and I really hope I'll be able to use it soon!
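For readers unfamiliar with the mutual reachability step named in the list above, here is a minimal sketch of the idea on a sparse matrix. It is not the code from hdbscan or fast_hdbscan. The function name, the `min_samples` convention (core distance = the `min_samples`-th smallest stored distance in a row), and the treatment of unstored entries as "unknown" are all assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp


def sparse_mutual_reachability_sketch(dist, min_samples=2):
    """Hypothetical helper: max(core(i), core(j), d(i, j)) on stored entries.

    Assumes `dist` is symmetric and that unstored entries mean "unknown",
    not "zero distance".
    """
    dist = sp.csr_matrix(dist)
    n = dist.shape[0]
    core = np.full(n, np.inf)  # rows with too few neighbours keep core = inf
    for i in range(n):
        row = dist.data[dist.indptr[i]:dist.indptr[i + 1]]
        if row.size >= min_samples:
            # Distance to the min_samples-th nearest stored neighbour.
            core[i] = np.partition(row, min_samples - 1)[min_samples - 1]
    coo = dist.tocoo()
    data = np.maximum(coo.data, np.maximum(core[coo.row], core[coo.col]))
    return sp.csr_matrix((data, (coo.row, coo.col)), shape=dist.shape)


# Toy input: zeros in the dense array are dropped and become unstored entries.
toy = sp.csr_matrix(np.array([
    [0.0, 1.0, 2.0, 0.0],
    [1.0, 0.0, 3.0, 0.0],
    [2.0, 3.0, 0.0, 5.0],
    [0.0, 0.0, 5.0, 0.0],
]))
toy_mr = sparse_mutual_reachability_sketch(toy, min_samples=2)
```

Point 3 has only one stored neighbour here, so under this sketch's convention its core distance is infinite. That is one example of the sparse corner cases mentioned in the reply above.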
Thank you so much for looking into this. I am very motivated to help if you think it's possible to delegate anything. For what it's worth, this is how hdbscan is being used in the project I'm working on: https://github.com/RichieHakim/ROICaT/blob/dev/roicat/tracking/clustering.py#L420
Perhaps bringing up the tricks/hacks that are being used to get the desired behavior would be of interest.
1) I'm using a very custom sparse distance matrix as input.
2) Since the graph has multiple disjoint components, I need to add a fully connected node before clustering.
3) Since there are sample pairs that are known to be disconnected a priori, clusters containing these pairs ('pair violations') are split up by walking down the cutting distance until the pair violations are gone.
Playing with the
Thanks again, I'm a big fan of all your projects.
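Trick 2 above (adding a fully connected node so the graph has a single component) can be sketched roughly as follows. This is an illustration, not the ROICaT code. The helper name `add_hub_node` and the choice of `hub_distance` are assumptions; a value larger than every real distance is used here so that the extra node is less likely to pull real samples together.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components


def add_hub_node(dist, hub_distance):
    """Hypothetical helper: append one node linked to every sample."""
    n = dist.shape[0]
    col = sp.csr_matrix(np.full((n, 1), float(hub_distance)))
    # Block layout [[dist, col], [col.T, None]]; None leaves the hub's
    # self-distance unstored.
    return sp.bmat([[dist, col], [col.T, None]], format="csr")


# Two disconnected pairs: {0, 1} and {2, 3}.
pairs = sp.csr_matrix(np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 2.0, 0.0],
]))
n_before, _ = connected_components(pairs, directed=False)
augmented = add_hub_node(pairs, hub_distance=10.0)
n_after, _ = connected_components(augmented, directed=False)
```

After clustering the augmented matrix, the label of the last row (the hub) would be discarded, for example with `labels[:-1]`.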
@lmcinnes I will look into existing semi-supervised methods for vanilla HDBSCAN, and I will look into approaches to recover / convert to embedding vectors from sparse distance matrices so that we can try fast_hdbscan. If there is a way to achieve both in one library, we are very interested. Please let me know if either would benefit from further conversation or resources. Thanks again for these amazing resources.
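As a point of reference for the semi-supervised angle, the 'pair violations' idea from earlier in this thread amounts to a cannot-link constraint. The sketch below is a simplified, global variant. It lowers one cut distance for the whole single-linkage tree, whereas the approach described earlier splits only the offending clusters. It uses SciPy's hierarchy tools rather than hdbscan, and the function name is an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cut_without_violations(Z, cannot_link):
    """Hypothetical helper: lower a global cut until no cannot-link pair
    shares a flat cluster. Z is a SciPy linkage matrix."""
    # Candidate cuts: merge heights from highest to lowest, then 0.
    candidates = list(np.unique(Z[:, 2])[::-1]) + [0.0]
    for cut in candidates:
        labels = fcluster(Z, t=cut, criterion="distance")
        if all(labels[i] != labels[j] for i, j in cannot_link):
            return float(cut), labels
    raise ValueError("a cannot-link pair could not be separated")


# Four 1-D points forming two tight pairs; samples 0 and 2 must not co-cluster.
points = np.array([[0.0], [1.0], [5.0], [6.0]])
Z = linkage(points, method="single")
cut, labels = cut_without_violations(Z, cannot_link=[(0, 2)])
```

In this toy case the cut at the top merge height puts everything in one cluster, so the loop steps down to the next merge height, where the two pairs separate.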