In #387, we have to do a

```C++
int64_t swizzle_period = std::gcd(n_rows / repeated_pattern_size, tile_size_y / n_cols);
```

in order to make our swizzling algorithm work for the epilogue. This looks more like an empirical hack whose only goal is to create a square block. Although it worked empirically, I struggled to find a first-principles explanation for this approach. So I read through my original PR #155 multiple times and thought things through carefully. But the more I read and think, the more I feel that the original implementation in #155 does not make sense.

The problem is that #155 tries to interleave the entire `ldmatrix_rows / repeated_pattern_size` with an equal-size split on the tile y dimension. This is overkill: we just need to distribute rows evenly across the different megabanks, and as long as we do so, the number of rows can be arbitrarily large and we can still be bank-conflict free. So we should be swizzling on a `(g, g)` block instead of a (potentially much larger) `(ldmatrix_rows / repeated_pattern_size, ldmatrix_rows / repeated_pattern_size)` block.
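The claim that a `(g, g)` block is enough however many rows there are can be illustrated with a small sketch. It makes several assumptions that are not taken from the PR: `g` is treated as the swizzle period, a "column" stands in for one megabank-sized slot, and a cyclic shift is used as a stand-in swizzle (the actual implementation may use a different function, such as XOR). The helper names `swizzlePeriod`, `swizzleColumn` and `rowsSpreadEvenly` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <set>

// Hypothetical helper: mirrors the gcd expression quoted from #387.
int64_t swizzlePeriod(int64_t n_rows, int64_t repeated_pattern_size,
                      int64_t tile_size_y, int64_t n_cols) {
  return std::gcd(n_rows / repeated_pattern_size, tile_size_y / n_cols);
}

// Illustrative cyclic-shift swizzle confined to a (g, g) block:
// the column is rotated by (row % g) inside its own group of g columns.
// This is an assumed stand-in, not necessarily the swizzle used in the PR.
int64_t swizzleColumn(int64_t row, int64_t col, int64_t g) {
  assert(g > 0);
  int64_t block_start = (col / g) * g;
  return block_start + (col % g + row % g) % g;
}

// For a fixed logical column, check that every aligned group of g rows
// lands in g distinct physical columns, no matter how large n_rows is.
bool rowsSpreadEvenly(int64_t n_rows, int64_t col, int64_t g) {
  for (int64_t r0 = 0; r0 + g <= n_rows; r0 += g) {
    std::set<int64_t> seen;
    for (int64_t r = r0; r < r0 + g; ++r) {
      seen.insert(swizzleColumn(r, col, g));
    }
    if (static_cast<int64_t>(seen.size()) != g) {
      return false;
    }
  }
  return true;
}
```

Because the shift depends only on `row % g`, the pattern repeats every `g` rows, so the check passes for any row count that is a multiple of `g`. Under these assumptions, that is why the block does not need to grow with `ldmatrix_rows / repeated_pattern_size`.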
Showing 1 changed file with 68 additions and 28 deletions.