Distance to clusters #22

Open
firmai opened this issue Jul 16, 2024 · 4 comments

Comments

firmai commented Jul 16, 2024

Is there any way to obtain this directly from the library? This would be an extremely valuable piece of information for me, and given the speed of the algorithm it could be a nice addition.

hamelin commented Jul 19, 2024

One key question your suggestion leaves open is: which distance? :-) A cluster is a cloud of points, and your question does not make clear whether you want the distance between a given point and the various clusters, or the distance between the clusters themselves. In both cases there are further questions to ask.

  1. Distance between point $P$ and clusters: do you mean the distance to the centroid (vector mean) of the clusters? To the medoid (vector median)? To the nearest or furthest point belonging to the cluster?
  2. Distance between clusters: do you mean the distance between single representative points (again, centroid, medoid, and so on), or a collective similarity such as Wasserstein distance?
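For item 1, here is a minimal NumPy sketch of the four point-to-cluster distances. It assumes Euclidean distance, that `X` is the `(n, d)` array of vectors that were clustered, and that `labels` is the integer label array from the clustering with `-1` marking noise (the HDBSCAN convention; this may not match this library's output). The function name `point_to_cluster_distances` is hypothetical, not part of any library API.

```python
import numpy as np


def point_to_cluster_distances(p, X, labels):
    """Hypothetical helper: distance from point p to every cluster.

    Assumes X is an (n, d) array and labels an (n,) integer array,
    with -1 marking noise points. Uses Euclidean distance throughout.
    """
    p = np.asarray(p, dtype=float)
    result = {}
    for c in np.unique(labels):
        if c == -1:
            continue  # assumption: -1 is noise, not a cluster
        members = X[labels == c]
        # Distance from p to each member of the cluster
        d = np.linalg.norm(members - p, axis=1)
        # Centroid: the vector mean of the members
        centroid = members.mean(axis=0)
        # Medoid: the member with the smallest summed distance to all members
        pairwise = np.linalg.norm(members[:, None, :] - members[None, :, :], axis=-1)
        medoid = members[pairwise.sum(axis=1).argmin()]
        result[int(c)] = {
            "centroid": float(np.linalg.norm(p - centroid)),
            "medoid": float(np.linalg.norm(p - medoid)),
            "nearest": float(d.min()),
            "furthest": float(d.max()),
        }
    return result
```

The medoid step builds an `m × m` distance matrix per cluster, so it is only practical for clusters of moderate size.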

Some of these use cases are easily covered by a little NumPy/scikit-learn-fu; others are very tricky. Let us know what would help you most.
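As an illustration of the "easy" end of item 2, the sketch below computes a cluster-to-cluster distance matrix using either centroids or the closest/furthest pair of members (single and complete linkage). The same assumptions apply: Euclidean distance, `-1` as the noise label, and a hypothetical function name. Collective measures such as the Wasserstein distance are not covered here; in more than one dimension they need an optimal-transport solver.

```python
import numpy as np


def cluster_distance_matrix(X, labels, method="centroid"):
    """Hypothetical helper: pairwise distances between clusters.

    method="centroid": distance between cluster means.
    method="single":   distance between the closest pair of members.
    method="complete": distance between the furthest pair of members.
    Assumes -1 marks noise points, which are excluded.
    """
    if method not in ("centroid", "single", "complete"):
        raise ValueError(f"unknown method: {method}")
    ids = [int(c) for c in np.unique(labels) if c != -1]
    groups = [X[labels == c] for c in ids]
    k = len(ids)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            if method == "centroid":
                dist = np.linalg.norm(groups[i].mean(axis=0) - groups[j].mean(axis=0))
            else:
                # All member-to-member distances between the two clusters
                cross = np.linalg.norm(
                    groups[i][:, None, :] - groups[j][None, :, :], axis=-1
                )
                dist = cross.min() if method == "single" else cross.max()
            D[i, j] = D[j, i] = dist
    return ids, D
```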

firmai commented Jul 22, 2024

What would be great is some functionality giving access to the vectors needed to calculate (1) and (2). My specific use case is measuring belonging, i.e. the distance of a single point to all cluster centroids (median, mean, etc.). I am interested in tracking movements from one cluster to another. Would the implementation allow for that?
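Assuming the clustered vectors and their labels are available, the distance from any set of points (including new observations) to every cluster center takes only a few lines of NumPy. This is a sketch under the same assumptions as before (Euclidean distance, `-1` as noise, hypothetical function name). Note that `center="median"` here means the coordinate-wise median, which is not the same as the geometric median or the medoid.

```python
import numpy as np


def distances_to_centers(queries, X, labels, center="mean"):
    """Hypothetical helper: distance from each query point to each cluster center.

    Centers are computed from the clustered data X and its labels
    (-1 assumed to be noise). Returns (ids, D) where D has shape
    (n_queries, n_clusters).
    """
    summarize = {"mean": np.mean, "median": np.median}[center]
    ids = [int(c) for c in np.unique(labels) if c != -1]
    centers = np.stack([summarize(X[labels == c], axis=0) for c in ids])
    q = np.atleast_2d(np.asarray(queries, dtype=float))
    # Broadcast (n, 1, d) - (1, k, d) -> (n, k, d), then take norms over d
    D = np.linalg.norm(q[:, None, :] - centers[None, :, :], axis=-1)
    return ids, D
```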

Specifically, I am looking at stocks, and I've noticed that the cluster label flips every so often; it would be good to track that transition as a continuous value.
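One way to turn a flipping hard label into a continuous value is to convert each row of point-to-center distances into membership weights that sum to 1. The sketch below uses a softmax over negative distances; this is an illustrative choice on our part, not a method provided by the library, and the `temperature` parameter is hypothetical and must be tuned to the scale of the distances. Each row of `D` is one observation, e.g. the same stock on successive dates.

```python
import numpy as np


def soft_membership(D, temperature=1.0):
    """Hypothetical helper: softmax of negative distances, row by row.

    Smaller temperature pushes the weights toward a hard assignment;
    larger temperature spreads them more evenly across clusters.
    """
    logits = -np.asarray(D, dtype=float) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)
```

For example, a stock whose distances to two centroids go from `[1, 9]` to `[7, 3]` over four dates shows its weight on the first cluster falling smoothly through 0.5, rather than jumping from one label to the other.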

firmai commented Jul 26, 2024

Let me know what you think, @hamelin.

hamelin commented Jul 26, 2024

Sorry I've been slow to respond; I'm on vacation. Let me get back to you once I've checked a few things.
