Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove broken and unused code path from L2 normalization #166

Merged
merged 2 commits into from
Aug 14, 2024

Conversation

griwodz
Copy link
Member

@griwodz griwodz commented Aug 13, 2024

Description

Remove an optional and apparently unused code path that relies on normf() to perform L2 normalization of the descriptors.

First, that code is buggy. Second, that code would be slower than the more "old-fashioned" code.

Features list

  • normf code along with CMake variable PopSift_USE_NORMF removed.
  • The PR has 2 commits. The first commit fixes the bug but leaves the code in PopSift. Interested people could dig it up and use it. The second commit removes it entirely.

Implementation remarks

There was a bug in src/popsift/s_desc_norm_l2.h in an ifdef'd code path where the L2 norm incorrectly is computed.

It was actually a double bug. First, normf(array) is computed, which is sqrt(sum(array[n]²)) instead of rnormf(array), which would be 1/sqrt(sum(array[n]²)). After that, the min value is multiplied with it instead of the descriptor value. Really stupid bug that would be easily fixed.

However, nobody ever reported the bug because the code wasn't active by default. normf() is only used in PopSift if it is manually selected in CMake, and if the CC is >=7.5.

After learning more and more about CUDA, we know also that the code path would have removed parallelism. It was written under the belief the normf() uses several CUDA warps in the background, forming the equivalent of the reduce() operation that is performed with shuffle_down in the main code path. However, this is not the case.

@griwodz griwodz linked an issue Aug 13, 2024 that may be closed by this pull request
@griwodz griwodz self-assigned this Aug 13, 2024
@griwodz griwodz added prio:minor cuda issues related to cuda versions bugfix labels Aug 13, 2024
@griwodz griwodz force-pushed the dev/fixL2Normalization branch from bd4d957 to 94a2c64 Compare August 13, 2024 13:17
@griwodz griwodz changed the title Dev/fix l2 normalization [WIP] Remove broken and unused code path from L2 normalization Aug 13, 2024
@griwodz griwodz requested a review from simogasp August 14, 2024 11:29
@griwodz griwodz changed the title [WIP] Remove broken and unused code path from L2 normalization Remove broken and unused code path from L2 normalization Aug 14, 2024
@griwodz griwodz requested a review from fabiencastan August 14, 2024 11:41
Copy link
Member

@simogasp simogasp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a minor comment

Comment on lines 53 to 57
float norm;

if( threadIdx.x == 0 ) {
norm = normf( 128, src_desc );
}
__syncthreads();
norm = popsift::shuffle( norm, 0 );

descr.x = min( descr.x, 0.2f*norm );
descr.y = min( descr.y, 0.2f*norm );
descr.z = min( descr.z, 0.2f*norm );
descr.w = min( descr.w, 0.2f*norm );

// 32 threads compute 4 squares each, then shuffle to performing a addition by
// reduction for the sum of 128 squares, result in thread 0
norm = descr.x * descr.x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

float norm = descr.x * descr.x ...

and remove the previous declaration

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that I can do it because the first assignment to norm is inside the if (line 61). Thread 0 is initialized inside the if (lines 60-62), and the shuffle in line 64 initializes the other 31 threads.

The thing that actually happens underneath the C-like syntax is that the result of normf() is stored in the lowest 32 bits of a 1024-bit SIMD register (line 61). Shuffle is a single SIMD instruction that copies the lowest 32 bits of the SIMD register into every other set of 32 bits of the same register.

Technically, I could write
float norm = ( threadIdx.x == 0 ) ? normf( 128, src_desc ) : 0;
That looks nicer, but if would actually waste time.

@griwodz griwodz merged commit 8623b69 into develop Aug 14, 2024
6 checks passed
@griwodz griwodz deleted the dev/fixL2Normalization branch August 14, 2024 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix cuda issues related to cuda versions prio:minor ready
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug] incorrect L2 normalization if CUDA normf() is in use
2 participants