Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Accelerate string matching #159

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Andersama
Copy link
Contributor

@Andersama Andersama commented Dec 28, 2020

Optimizes string matching by allowing memcmp like functionality (even on utf8 sequences)

comparison: https://compiler-explorer.com/z/Tz3KhG

Rough idea is that ordinarily per character we're doing two comparisons, one to check iterators and one to validate the character and then a branch. Presumably we can improve on this by reducing comparisons and jumps if we know enough about the iterator to work out the remaining characters left, then some trickery to do comparisons in as branchless manner as possible. Vaguely implements idea off of #147, just for string<> at the moment.

Appears clang and msvc pick up on this change and optimize with sse, presumably gcc might as well.

The fold expression responsible for validating the match can definitely be played around with.
We could count the characters matched, count the differences, or like here accumulate bitwise differences in a register.

	return { current + bump, (count >= sizeof...(String)) && (((current[Idx] ^ static_cast<std::remove_cvref_t<char_type>>(String)) | ... | static_cast<char_type>(0)) == 0) };

@Andersama Andersama force-pushed the match_utf8_sequence branch 2 times, most recently from 09e933a to d6e093b Compare December 28, 2020 17:20
@Andersama Andersama force-pushed the match_utf8_sequence branch 10 times, most recently from f9cdaa8 to 9c9d99f Compare December 29, 2020 08:50
@Andersama
Copy link
Contributor Author

Not bench-marked things, but been playing around since the tests passed. It seems like combining == and & operators generates the same assembly that clang uses for std::memcmp.

@Andersama
Copy link
Contributor Author

Andersama commented Dec 31, 2020

I'm getting pretty significant improvements to matching / searching from this. When using == and & operators for example against the pattern 0123456789ABCDEF turns into a 16 byte comparison sse instruction.
https://compiler-explorer.com/z/1bzKKd

match_str(char*, char*):                       # @match_str(char*, char*)
        mov     rax, rsi
        sub     rax, rdi
        cmp     rax, 16
        jb      .LBB0_1
        movdqu  xmm0, xmmword ptr [rdi]
        pcmpeqb xmm0, xmmword ptr [rip + .LCPI0_0]
        pmovmskb        eax, xmm0
        cmp     ax, -1
        sete    cl
        add     rdi, 16
        cmp     rdi, rsi
        sete    al
        and     al, cl
        ret
.LBB0_1:
        xor     eax, eax
        ret

vs...

        cmp     rdi, rsi
        je      .LBB0_33
        lea     rcx, [rdi + 1]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi], 48
        jne     .LBB0_32
        lea     rcx, [rdi + 2]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi + 1], 49
        jne     .LBB0_32
        lea     rcx, [rdi + 3]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi + 2], 50
        jne     .LBB0_32
        lea     rcx, [rdi + 4]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi + 3], 51
        jne     .LBB0_32
        
        #... (this for a quite a while)
        
        cmp     byte ptr [rdi + 15], 70
        sete    cl
        add     rdi, 16
        cmp     rdi, rsi
        sete    al
        and     al, cl
        jmp     .LBB0_32
.LBB0_33:
        xor     eax, eax
.LBB0_32:
        ret

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant