Why are the KI results slower than the SA results? #3
Figure 8 of the article "Offloading communication control logic in GPU accelerated applications" shows the KI model as faster than the SA model. However, I ran the libmp benchmark mp_pingpong_all on Ubuntu with a P4 GPU and an mlx5 NIC, and my results show KI with almost double the latency of SA. Were the article's results not measured with the libmp benchmarks? If so, which test programs does the article use?
Comments
I didn't specify any parameters or environment variables for mp_pingpong_all. It uses KERNEL_TIME=20 by default, and each process uses one P4 GPU on my server.