Always place vectorized domains innermost #2473

Closed

naoyam (Collaborator) commented Feb 16, 2023

Closes #2464

All tests are green.

Checking benchmark performance.

naoyam (Collaborator, Author) commented Feb 16, 2023

I didn't expect this, but the change resulted in a large performance loss:

[image: benchmark performance comparison]

One extreme case is:

NvFuserScheduler_Softmax_Outer_fp16___GRAPH/NvFuserScheduler_Softmax_Outer_fp16/8/33554432/manual_time 1.323790e+12 4.683680e+11


Bandwidth was around 1.3 TB/s and is just 468 GB/s with this PR. This seems to be due to increased register usage, since the vectorized domains are now located inside the CA positions. This particular case uses 68 registers, whereas it previously used 58. While the difference is not large, it was enough to cross an occupancy cliff, halving the original occupancy.
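To illustrate how a 10-register increase can halve occupancy, here is a rough sketch of register-limited occupancy. Everything except the 58 and 68 register counts is an assumption for illustration: 65536 registers and 64 resident warps per SM, registers allocated per warp in units of 256, and a hypothetical block size of 512 threads (the actual launch configuration of the benchmark is not stated here). Shared memory and other limits are ignored, so this is a simplification and not the real occupancy calculation.

```python
def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def occupancy(regs_per_thread, threads_per_block,
              regs_per_sm=65536,      # assumed register file size per SM
              max_warps_per_sm=64,    # assumed resident-warp limit per SM
              warp_size=32,
              reg_alloc_unit=256):    # assumed per-warp allocation granularity
    """Fraction of the SM's warp slots that can be resident,
    considering only the register and warp-count limits."""
    warps_per_block = ceil_div(threads_per_block, warp_size)
    # Registers are reserved per warp, rounded up to the allocation unit.
    regs_per_warp = ceil_div(regs_per_thread * warp_size, reg_alloc_unit) * reg_alloc_unit
    warps_by_regs = regs_per_sm // regs_per_warp
    # Whole blocks only: a block is resident only if all of its warps fit.
    blocks = min(warps_by_regs // warps_per_block,
                 max_warps_per_sm // warps_per_block)
    return blocks * warps_per_block / max_warps_per_sm


if __name__ == "__main__":
    for regs in (58, 68):
        # 512 threads per block is a hypothetical value, not taken from the benchmark.
        print(regs, occupancy(regs, threads_per_block=512))
    # prints:
    # 58 0.5
    # 68 0.25
```

Under these assumptions, 58 registers per thread leaves room for 32 warps (two 16-warp blocks), while 68 registers leaves room for only 28 warps, so just one block fits. Because blocks are scheduled whole, a small register increase can drop a full block from each SM.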

It's also interesting that most of the perf loss happens only with the softmax benchmarks, so it probably has something to do with the structure of persistent reductions.

Since this is a non-trivial heuristic decision, and it seems mostly negative as is, I'm going to close the PR for now.

naoyam closed this Feb 16, 2023

Successfully merging this pull request may close these issues.

Move vectorized domains innermost always