Skip to content

Beating the `bisect` module's implementation using C-extensions.

Notifications You must be signed in to change notification settings

juliusgeo/branchless_bisect

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Beating Bisect With Branchless Binary Search

I recently read this article: https://probablydance.com/2023/04/27/beautiful-branchless-binary-search/, and I was inspired. First I implemented the same algorithm in pure Python:

def branchless_bisect_left(arr, value):
begin, end=0,len(arr)
length = end - begin
if length == 0:
return end
step = 1 << (length.bit_length() - 1)
if step != length and arr[step]<value:
length -= step + 1
if length == 0:
return end
begin = end - step
step >>= 1
for s in (step:=step >> 1 for _ in range(step.bit_length())):
begin += s*(arr[s+begin]<value)
return begin+int(arr[begin]<value)

And then I compared it against sortedcontainers's implementation of bisect_left across a large range of array sizes:

image

Pretty handily beats it! However, most people using Python would probably be using bisect_left from the bisect library in the stdlib. To try to beat that implementation, though, I will have to descend into C-land. Here's my implementation as a Python C-extension:

static PyObject* bisect(PyObject *list_obj, PyObject *value, PyObject *compare){
long size = PyList_Size(list_obj);
long begin = 0, end = size;
long length = end - begin;
if (length == 0) {
return PyLong_FromLong(end);
}
long step = bit_floor_shifted(length);
if (step != length && PyObject_RichCompareBool(PyList_GetItem(list_obj, begin + step - 1), value, compare)) {
begin += step;
length -= step;
}
long next = 0;
for (step >>= 1; step != 0; step >>=1) {
if ((next = begin + step) < size) {
begin += PyObject_RichCompareBool(PyList_GetItem(list_obj, next), value, compare) * step;
}
}

image

That beats it as well! Admittedly this is only for arrays of size up to 2**29, but still pretty cool.

Now, you might be asking, how does it perform on non-powers-of-two? I made a graph using the following parameters:

sizes = [i for i in range(0, 2**15)]

image

I also checked whether it successfully compiled with a CMOVE instruction:

bf: e8 00 00 00 00 call c4 <_bl_bisect_left+0xa4>
c4: 83 f8 01 cmp $0x1,%eax
c7: 4c 0f 44 fb cmove %rbx,%r15
cb: 48 83 fb 02 cmp $0x2,%rbx
cf: 73 35 jae 106 <_bl_bisect_left+0xe6>

It does! You can see the full compiled dump in objdump_output.txt. Now, here are all of them combined (what you will get if you run main.py):

image

This repo contains the submodule and benchmarking code I used to obtain these results.

About

Beating the `bisect` module's implementation using C-extensions.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published