Clang JIT CPU Backend #1239
Comments
Certainly interesting, but do note that we have a limited number of combinations in tensor contractions, so this is more of a solution to a finding that compile-time constant sizes are a huge benefit and that we can't pare down that combinatorial space to do ahead-of-time specialization. A different use might be to use JIT to build single-precision versions of select kernels.
Right, I'd expect that if we enumerated a bunch of kernels ahead of time across combos of sizes. WRT performance, I just mean that my gut expects the performance of such a backend to be between AVX and LIBXSMM, but without the need for a user to build LIBXSMM, so we might get a little better performance in our upcoming Ratel + Enzyme container. I agree that single-precision kernels would be an interesting avenue to explore too, so it's easier to get mixed-precision capabilities.
It's a low-effort test to see if specializing one particular size has much benefit. Like just drop in some integer literals and run a benchmark using matching sizes. If it's a lot faster, we can see if specializing all the values is important or, say, just one matters. If it's about the same, we don't need to pursue the idea (at least until we learn more). |
That's a good point. It's an easy test to check if someone finds time. I don't see this as a particular priority - 50% of why I created this issue was so we don't lose track of this as an option.
Clang 16 now supports JIT. An interesting small project could be to create a /cpu/self/clang-jit backend that provides JITed tensor contraction kernels. If we see performance in the neighborhood of AVX or libXSMM, this could be a way to ship a faster CPU backend with fewer dependencies.

See Serac for reference:
https://github.com/LLNL/serac/blob/prototype/adjoints_with_internal_variables/tests/jit/basic_jit.cpp
https://github.com/LLNL/serac/blob/prototype/adjoints_with_internal_variables/include/JIT.hpp
(This repo comes from a member of Jamie Smith's team)