-
-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Algorithmic complexity of healpix_rangesearch() is way too high #104
Comments
cc @dstndstn |
This code was written for small Nside (=1 or 2!), so there were never a large number of pixels in our application. Without changing the algorithm, you could try: change ll_contains to ll_sorted_contains this "ll" data structure is a block-list: linked-list of arrays, so the sorted inserts and checks can jump by the array size (256 here) rather than iterating through the whole list. Technically doesn't improve the complexity, but it should improve the speed! The Astrometry.net code does also include the "bt.c" code, which implement a similar API but uses an AVL tree under the hood, so should yield fast inserts and searches. However, it is derived from GNU libavl, so probably won't work for your licensing situation. More broadly: the algorithm here is expanding a frontier, finding healpix neighbours and checking them individually. Obviously, as Nside increases, this is not a good idea. As Nside increases, it becomes more like a regular grid, and you could use a different approach, say finding the start and end pixels within each row inside each base healpixel. |
Oh, I had assumed that this code was supposed to be the equivalent of query_disc()! The proposed change to sorted lists will actually reduce the complexity, as far as I understand: from O(n**2) to O( n log(n)), if n is the number of returned pixels. However it might be worth thinking about going all the way and reach a complexity of O(sqrt(n)). Here is a rough sketch for the NESTED scheme:
|
No, complexity is still O(N**2) because the (block-list versions of) insert_ascending and sorted_contains are still O(N); the top-level structure is a linked list, so can't be binary searched. |
For the RING scheme, you already proposed an efficient way: go through all Healpix rings touching the disk, determine which range of pixels in the ring overlaps with the disk, and append it to the output. These range-based approaches are important for catalog cross-matching etc. In these applications one often needs cone searches at very high resolutions at not-too-small radii, so storing explicit pixel lists becomes prohibitively expensive. |
Sorry, you are of course correct about the complexity staying O(n**2)! |
I just tried to run this simple cone search:
When compiling the C library with "-O2", this computation took 40 seconds on my laptop, which is way off the scale; expected run times should be below a millisecond.
From looking at the sources, I fear that the run time of
healpix_rangesearch
is not only scaling with the number of returned pixels, but actually with its square (this is caused by the calls toll_contains
, which have linear complexity by themselves). This won't do for any application where the cone search returns more than a handful of pixels.The text was updated successfully, but these errors were encountered: