I recently read this article: https://probablydance.com/2023/04/27/beautiful-branchless-binary-search/, and I was inspired. First I implemented the same algorithm in pure Python:
Lines 1 to 15 in 81fa805
And then I compared it against sortedcontainers
's implementation of bisect_left across a large range of array sizes:
Pretty handily beats it! However, most people using Python would probably be using bisect_left
from the bisect
library in the stdlib.
To try to beat that implementation, though, I will have to descend into C-land. Here's my implementation as a Python C-extension:
branchless_bisect/submodule/main.c
Lines 14 to 36 in 81fa805
That beats it as well! Admittedly this is only for arrays of size up to 2**29
, but still pretty cool.
Now, you might be asking, how does it perform on non-powers-of-two? I made a graph using the following parameters:
sizes = [i for i in range(0, 2**15)]
I also checked whether it successfully compiled with a CMOVE
instruction:
branchless_bisect/submodule/objdump_output.txt
Lines 67 to 71 in 81fa805
It does! You can see the full compiled dump in objdump_output.txt
.
Now, here are all of them combined (what you will get if you run main.py
):
This repo contains the submodule and benchmarking code I used to obtain these results.