Use _mm512_popcnt_epi64 to speed up Hamming distance evaluation. #4020
base: main
@mulugetam Sure,
@mulugetam FYI, we are also working with other Intel collaborators (@guangzegu and @xtangxtang) to incorporate AMX into FAISS (#3266). @asadoughi suggested that we have a meeting to sync up on the collaboration with Intel.
Thanks for your contributions :) Question: What is the performance of Hamming distance calculations when
VPOPCNTDQ: Is there any downside to using the scheme below? I check whether the compiler and the underlying machine support VPOPCNTDQ and, based on that, add the -mavx512vpopcntdq flag to target_compile_options.
@mengdilin If I benchmark the AVX-512 build without VPOPCNTDQ against the AVX2 build, I see a speedup of about 1% to 5% for AVX-512, depending on the code_size.
@mengdilin
@mulugetam Technically, a CMake-based check for VPOPCNTDQ may lead to the following situation. Say a build is performed on a recent CPU, and the resulting Python package is uploaded to a conda / pip repository. The package is then distributed worldwide and causes problems for everyone with Intel Cascade Lake CPUs, which do not support VPOPCNTDQ.
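One common way to sidestep that distribution problem is runtime dispatch: compile the VPOPCNTDQ kernel unconditionally with a per-function target attribute and pick it only after checking the CPU at startup. The sketch below is illustrative only and is not how Faiss resolves the issue (the thread moves toward a separate architecture mode instead). It assumes GCC or Clang on x86-64, and the names hamming_scalar, hamming_avx512_vpopcntdq and select_hamming are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

// Hypothetical portable fallback: XOR 8 bytes at a time and count set bits.
static int hamming_scalar(const uint8_t* a, const uint8_t* b, size_t code_size) {
    int dist = 0;
    size_t i = 0;
    for (; i + 8 <= code_size; i += 8) {
        uint64_t x, y;
        std::memcpy(&x, a + i, 8);  // memcpy avoids unaligned/aliasing issues
        std::memcpy(&y, b + i, 8);
        dist += __builtin_popcountll(x ^ y);
    }
    for (; i < code_size; ++i) {  // leftover bytes
        dist += __builtin_popcount(static_cast<unsigned>(a[i] ^ b[i]));
    }
    return dist;
}

#ifdef HAVE_X86_DISPATCH
// Hypothetical AVX-512 kernel. The target attribute lets this one function use
// VPOPCNTDQ without adding -mavx512vpopcntdq to the whole build.
__attribute__((target("avx512f,avx512vpopcntdq")))
static int hamming_avx512_vpopcntdq(const uint8_t* a, const uint8_t* b, size_t code_size) {
    __m512i acc = _mm512_setzero_si512();
    size_t i = 0;
    for (; i + 64 <= code_size; i += 64) {  // 512 bits per iteration
        __m512i va = _mm512_loadu_si512(a + i);
        __m512i vb = _mm512_loadu_si512(b + i);
        // Per-64-bit-lane popcount of the XOR, accumulated lane-wise.
        acc = _mm512_add_epi64(acc, _mm512_popcnt_epi64(_mm512_xor_si512(va, vb)));
    }
    int dist = static_cast<int>(_mm512_reduce_add_epi64(acc));
    return dist + hamming_scalar(a + i, b + i, code_size - i);  // tail < 64 bytes
}
#endif

using HammingFn = int (*)(const uint8_t*, const uint8_t*, size_t);

// Chooses the kernel once, based on the CPU the code is actually running on.
static HammingFn select_hamming() {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vpopcntdq")) {
        return hamming_avx512_vpopcntdq;
    }
#endif
    return hamming_scalar;  // e.g. Cascade Lake, AVX2-only or non-x86 machines
}
```

A package built this way on a Sapphire Rapids machine would still run on Cascade Lake, because the VPOPCNTDQ instructions are only reached after the runtime check passes. The cost is an indirect call per distance, which prevents the kernel from being inlined into the caller's loop.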
@alexanderguzhva Thanks! In addition to VPOPCNTDQ, we plan to submit further PRs that speed up the scalar quantizer using FP16 instructions. I will modify this PR and introduce an 'avx512_advanced' mode. Hopefully, the team will accept the changes, given the performance boost they offer.
@mulugetam
@mulugetam Well, there are two more problems then.
Nevertheless, please write and benchmark the code, and then we'll decide how to integrate it into Faiss properly. :) Thanks.
@alexanderguzhva @mengdilin I have created a new PR, #4025, that adds an 'avx512-sr' architecture mode, and marked this PR as dependent on it.
Summary: This PR adds a new architecture mode to support the new extensions to AVX512, namely [AVX512-FP16](https://networkbuilders.intel.com/solutionslibrary/intel-avx-512-fp16-instruction-set-for-intel-xeon-processor-based-products-technology-guide), which has been available since Intel® Sapphire Rapids. This PR is a prerequisite for [PR#4020](#4020), which speeds up Hamming distance evaluations.
Pull Request resolved: #4025
Reviewed By: pankajsingh88
Differential Revision: D67524575
Pulled By: mengdilin
fbshipit-source-id: f3a09943b062d720b241f95aef2f390923ffd779
@mulugetam The dependent PR was just merged. Do you mind rebasing this PR on top of it to kick off the new CI?
Signed-off-by: Mulugeta Mammo <[email protected]>
Branch updated from 953112e to 5a9a98f.
Thank you @mengdilin
@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
#ifndef HAMMING_AVX512_INL_H
#define HAMMING_AVX512_INL_H

// AVX512 version
Can you add a comment here summarizing the discussions of the two PRs:
The _mm512_popcnt_epi64 intrinsic is used to accelerate Hamming distance calculations in HammingComputerDefault and HammingComputer64. It is not available in the default FAISS avx512 build mode and is only available on Intel Sapphire Rapids.
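For 64-byte codes, the accelerated path boils down to one load, one XOR, one per-lane popcount and one horizontal add, guarded by a compile-time check so that a plain avx512 build falls back to scalar popcounts. This is a minimal sketch, not the Faiss code: Hamming64Sketch is a hypothetical stand-in for HammingComputer64, and it assumes GCC or Clang, which define __AVX512VPOPCNTDQ__ only when VPOPCNTDQ code generation is enabled (for example with -mavx512vpopcntdq).

```cpp
#include <cstdint>
#include <cstring>

#if defined(__AVX512F__) && defined(__AVX512VPOPCNTDQ__)
#include <immintrin.h>
#endif

// Hypothetical stand-in for a Hamming computer over 64-byte (512-bit) codes.
struct Hamming64Sketch {
    uint8_t query[64];

    explicit Hamming64Sketch(const uint8_t* q) { std::memcpy(query, q, 64); }

    int distance(const uint8_t* code) const {
#if defined(__AVX512F__) && defined(__AVX512VPOPCNTDQ__)
        // VPOPCNTDQ build: the whole code fits in a single 512-bit register.
        __m512i q = _mm512_loadu_si512(query);
        __m512i c = _mm512_loadu_si512(code);
        __m512i cnt = _mm512_popcnt_epi64(_mm512_xor_si512(q, c));  // 8 lane counts
        return static_cast<int>(_mm512_reduce_add_epi64(cnt));
#else
        // Default (e.g. plain -mavx512f) build: eight scalar 64-bit popcounts.
        int dist = 0;
        for (int i = 0; i < 64; i += 8) {
            uint64_t x, y;
            std::memcpy(&x, query + i, 8);
            std::memcpy(&y, code + i, 8);
            dist += __builtin_popcountll(x ^ y);
        }
        return dist;
#endif
    }
};
```

Because the path is chosen at compile time, a binary built with the macro defined must only run on CPUs that support VPOPCNTDQ, which is the distribution concern raised earlier in the thread.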
Added.
The _mm512_popcnt_epi64 intrinsic is used to accelerate Hamming distance calculations in HammingComputerDefault and HammingComputer64. Benchmarking with bench_hamming_computer on an AWS r7i instance shows a performance improvement of up to 30% compared to AVX2.
This PR depends on PR#4025.