Echo test is very slow at 60ms for 99% between 2 powerful VMs #54
Comments
If you can post AeronStat output from both nodes and run with Also make sure that you test a UDP connection between the nodes (not ping), i.e. run something like |
Hi Vyazelenko, I don't have iperf3, so I will create simple Java UDP/TCP socket echo apps and provide a benchmark. Here are the Aeron stats: CLIENT aeron stat:
Server:
|
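For reference, a simple Java UDP echo check of the kind mentioned above could look roughly like the sketch below. It is not part of the Aeron benchmarks; the class name `UdpEchoProbe`, the 50-byte payload, the 1000-message count and the 1-second timeout are all assumptions chosen for illustration. It measures raw UDP round trips between two hosts so they can be compared with the Aeron numbers.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.Arrays;

// Hypothetical helper, not part of Aeron or the benchmarks project.
final class UdpEchoProbe {
    /** Binds a UDP socket and echoes every datagram back to its sender on a daemon thread. */
    static DatagramSocket startEchoServer(int port) throws Exception {
        DatagramSocket socket = new DatagramSocket(port);
        Thread echo = new Thread(() -> {
            byte[] buf = new byte[2048];
            try {
                while (!socket.isClosed()) {
                    DatagramPacket in = new DatagramPacket(buf, buf.length);
                    socket.receive(in);
                    socket.send(new DatagramPacket(in.getData(), in.getLength(), in.getSocketAddress()));
                }
            } catch (Exception e) {
                // The socket was closed or failed; stop echoing.
            }
        });
        echo.setDaemon(true);
        echo.start();
        return socket;
    }

    /** Sends `count` datagrams one at a time and returns the round-trip times in ns, sorted ascending. */
    static long[] measureRoundTripsNs(InetAddress host, int port, int count, int length) throws Exception {
        long[] rtts = new long[count];
        byte[] payload = new byte[length];
        byte[] reply = new byte[length];
        try (DatagramSocket client = new DatagramSocket()) {
            client.setSoTimeout(1000); // assumption: treat >1s without a reply as a failure
            for (int i = 0; i < count; i++) {
                long start = System.nanoTime();
                client.send(new DatagramPacket(payload, length, host, port));
                client.receive(new DatagramPacket(reply, reply.length));
                rtts[i] = System.nanoTime() - start;
            }
        }
        Arrays.sort(rtts);
        return rtts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2 && args[0].equals("server")) {
            startEchoServer(Integer.parseInt(args[1]));
            Thread.currentThread().join(); // keep the JVM alive for the daemon echo thread
        } else if (args.length == 2) {
            int count = 1000;
            long[] rtts = measureRoundTripsNs(InetAddress.getByName(args[0]), Integer.parseInt(args[1]), count, 50);
            int p99Index = Math.min(count - 1, (int) Math.ceil(0.99 * count) - 1);
            System.out.println("median us: " + rtts[count / 2] / 1000.0);
            System.out.println("p99 us:    " + rtts[p99Index] / 1000.0);
        } else {
            System.out.println("usage: UdpEchoProbe server <port> | UdpEchoProbe <host> <port>");
        }
    }
}
```

Run it with `server <port>` on one VM and `<host> <port>` on the other. Because it sends one message at a time and waits for the reply, it gives a baseline for network round-trip latency only, not for throughput.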
@Gipxy From the provided AeronStats I can see that there is packet loss (i.e. NAK sent/received are not zero). Try running your test with lower message rates: 1K, 10K, 100K. And see if the loss disappears and the latency numbers improve. You haven't shared the configuration output when running with |
Yeah, but I don't have admin permission on that server, so I can't install that: |
Server log:
Client log:
|
Changed to 1K: no messages lost, but a similar result:
|
FYI, I'm comparing messaging libs. I also wrote simple Java code for gRPC and an HTTP REST API between these 2 servers:
Sending an FX rate object of about 50 bytes, the result is a lot better than Aeron Transport |
@Gipxy Have you seen this warning:
It means that the socket buffers could not be configured as desired (i.e. were not set to 2MB). You need to fix your OS configuration if you want to run with benchmark defaults. Otherwise, configure smaller socket buffer sizes and receiver window for your channels:
|
@vyazelenko I applied this to the server and the client, but it is not working: I still get the warning on both client and server. Results are similar |
On a properly configured system Aeron can achieve several orders of magnitude higher throughput and lower latency than gRPC. Here are some figures from our own testing with 288-byte messages at 100K msgs/sec:
|
@Gipxy You are seeing the warning, because MediaDriver settings are still referring to 2MB (see low-latency-driver.properties). That should not affect the results though, i.e. the settings from the channel URIs have precedence. |
@Gipxy If you want some professional help with your benchmarks please contact [email protected]. |
Yeah, I just found that as well and changed it too, but still the same result of avg: 23ms :( |
@Gipxy You must ensure that the system has enough CPU cores to run those benchmarks. In the default config media driver will run in You can try running with |
You are right, I didn't notice that there were existing test runs which had not exited yet! I killed all of them and ran again. Before the run the CPU of both machines was at 99%; while running it is about 45-50% (both of them have 8 cores), and the new result is a lot better:
|
I tried with 10k messages, with 100 iterations, and with the message length changed to "32"; all give a similar result of 3ms. So for a 99% comparison of sending about 32 bytes with all 3 of these:
Still slower than gRPC. Is there any quick thing we can do to help improve this @vyazelenko ? |
@Gipxy your Aeron results are 20 to 150 times worse than in our tests (see #54 (comment)). This is most likely caused by the OS configuration and/or machine setup (i.e. number of CPU cores, memory, networking capabilities etc.). For instance, you have mentioned that both VMs have 8 cores each but are those real CPU cores or virtual CPU cores? In the latter case you won't have enough CPU cores to run media driver in
How were the above results obtained, i.e. did you run Aeron benchmarks or is it all custom code? Do you know that we have gRPC benchmarks included (see https://github.com/real-logic/benchmarks/tree/master/scripts/grpc)? If you want to compare Aeron and gRPC please use the built-in benchmarks. |
|
|
|
@Gipxy Yes, you really do not have enough CPU cores. Let me clarify, benchmarks configure Because your current setup is oversubscribed, the OS will force frequent context switches between threads to get access to those 4 CPU cores. Reducing the number of active benchmark threads from 4 to 2 will help, as there will be fewer busy-spinning threads. What you can do in addition is to use different idle strategies. For a full list of supported strategies see Configuration#agentIdleStrategy. |
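To make the oversubscription point concrete, here is a minimal sketch of a core-budget check. It is an illustration only: the class `CoreBudget`, the one-core reserve for the OS and JVM housekeeping threads, and the default thread count of 4 (taken from the comment above) are assumptions, not anything the benchmarks provide. Note that `Runtime.availableProcessors()` reports logical processors, so on a VM with hyper-threaded vCPUs the number of physical cores may be lower than what it prints.

```java
// Hypothetical helper, not part of Aeron or the benchmarks project.
final class CoreBudget {
    /**
     * Returns true when the busy-spinning threads plus a reserve for the OS and
     * JVM housekeeping do not fit on the given number of cores.
     */
    static boolean isOversubscribed(int busySpinThreads, int cores, int reservedCores) {
        return busySpinThreads + reservedCores > cores;
    }

    public static void main(String[] args) {
        // Assumption: 4 busy-spinning threads unless another count is passed in.
        int busySpinThreads = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        int logicalCpus = Runtime.getRuntime().availableProcessors();
        System.out.println("logical CPUs reported by the JVM: " + logicalCpus);
        System.out.println("oversubscribed with " + busySpinThreads + " spinning threads: "
            + isOversubscribed(busySpinThreads, logicalCpus, 1));
    }
}
```

If the check comes out true for the number of physical cores, either reduce the number of busy-spinning threads or switch to an idle strategy that yields or parks instead of spinning.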
Tried with different idle strategies:
If I change to SHARED mode, the result is worse, almost double! |
This means that so far the best Aeron option is only about 30% better than gRPC on these 2 VMs with 8 cores each. |
@Gipxy In order to change the threading mode of the media driver you need to set the |
@Gipxy please answer the following:
|
Hi @vyazelenko:
|
Just re-run both of them, here are running log and aeron stat: |
Hi, I ran the ECHO benchmark between 2 VMs and the result is not good, 10-60ms. Can you advise:
12034.047 0.400000000000 44011 1.67
15859.711 0.500000000000 55006 2.00
17891.327 0.550000000000 60524
....
53084.159 0.978125000000 107599 45.71
55345.151 0.981250000000 107942 53.33
...
64258.047 0.989062500000 108797 91.43
66486.271 0.990625000000 108969 106.67
...
#[Mean = 18838.771, StdDeviation = 14477.590]
#[Max = 95092.735, Total count = 110000]
#[Buckets = 32, SubBuckets = 2048]
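A note on reading the histogram above: the columns are value, percentile, total count and 1/(1 - percentile). The sketch below recomputes the last column and converts the value column to milliseconds. It assumes the values are in microseconds, which matches the 10-60ms range reported in this post; the class and method names are made up for illustration.

```java
// Hypothetical helper for reading the percentile output above.
final class HdrColumns {
    /** Last column of the output: 1 / (1 - percentile). */
    static double inverseTail(double percentile) {
        return 1.0 / (1.0 - percentile);
    }

    /** Assumption: the value column is in microseconds. */
    static double microsToMillis(double micros) {
        return micros / 1000.0;
    }

    public static void main(String[] args) {
        // Rows copied from the output above: {value, percentile}.
        double[][] rows = {
            {12034.047, 0.4},
            {15859.711, 0.5},
            {53084.159, 0.978125},
            {66486.271, 0.990625},
        };
        for (double[] row : rows) {
            System.out.println(microsToMillis(row[0]) + " ms at percentile " + row[1]
                + ", 1/(1-p) = " + inverseTail(row[1]));
        }
    }
}
```

Under that assumption the row at percentile 0.990625 is about 66 ms, which is the "60ms for 99%" in the title, and the mean of 18838.771 is about 18.8 ms.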
Detail env:
#-- server
JVM_OPTS="
-Duk.co.real_logic.benchmarks.remote.output.directory=./server-output
-Duk.co.real_logic.benchmarks.aeron.remote.embedded.media.driver=true
-Duk.co.real_logic.benchmarks.aeron.remote.source.channel=aeron:udp?endpoint=10.52.19.65:13000
-Duk.co.real_logic.benchmarks.aeron.remote.destination.channel=aeron:udp?endpoint=10.52.19.5:13001" aeron/echo-server
#-- client
JVM_OPTS="
-Duk.co.real_logic.benchmarks.aeron.remote.embedded.media.driver=true
-Duk.co.real_logic.benchmarks.aeron.remote.source.channel=aeron:udp?endpoint=10.52.19.65:13000
-Duk.co.real_logic.benchmarks.aeron.remote.destination.channel=aeron:udp?endpoint=10.52.19.5:13001"
./benchmark-runner --output-file "aeron-echo-test" --messages "100K" --message-length "288" --iterations 10 "aeron/echo-client"
Please note, I had to change the number of messages from 501k to 10k, as it reports message loss with 501k
Thanks