VMA Basic Usage
This guide gathers best-practice steps for using VMA. It's recommended to review the Installation Guide and the User Manual, available here: http://www.mellanox.com/page/software_vma?mtag=vma
These are the most up-to-date documents and provide all the necessary information on how to use VMA.
The measurements were taken on two HPE ProLiant DL360 Gen9 servers (Intel Xeon E5-2697 v3 @ 2.60GHz, max turbo 3.6GHz) running CentOS 7.2 x86_64, with two Mellanox ConnectX-4 adapters connected back to back. The Ethernet speed was configured to 10GbE.
Prerequisites:
- It's very important to use a tuned machine and the correct NUMA node and cores (more on that in the VMA Tuning Guide)
- Two machines: one for the server role and a second one as the client
- Management interfaces configured with an IP address, and the machines can ping each other
- A Mellanox card physically installed in each machine
- Verify with "lspci | grep Mellanox" that your system recognizes the Mellanox HCA
Example:
$ lspci | grep Mellanox
03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
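To go one step beyond the grep above, the following POSIX shell sketch extracts the PCI address of each Mellanox device from lspci output and looks up the network interface the kernel created for it under /sys/bus/pci/devices/<address>/net/. The function names and the SYSFS_ROOT override (there only so the functions can be exercised against a fake directory tree) are illustrative assumptions, not part of VMA or MLNX_OFED.

```shell
#!/bin/sh
# Print the PCI address of each Mellanox device listed in lspci output read
# from stdin, in the domain:bus:device.function form used under
# /sys/bus/pci/devices. Plain lspci omits the domain, so 0000 is assumed in
# that case; "lspci -D" prints the real domain.
mlx_pci_addrs() {
    awk '/Mellanox/ {
        addr = $1
        if (split(addr, parts, ":") == 2)
            addr = "0000:" addr
        print addr
    }'
}

# List the network interface name(s) the kernel registered for a PCI
# device such as 0000:03:00.0. SYSFS_ROOT is a hypothetical override used
# only for testing; leave it unset on a real system.
pci_netdevs() {
    for dev in "${SYSFS_ROOT:-}/sys/bus/pci/devices/$1/net/"*; do
        [ -e "$dev" ] || continue   # the glob matched nothing
        basename "$dev"
    done
}

# Intended usage (assumes lspci is installed):
#   for addr in $(lspci | mlx_pci_addrs); do
#       echo "$addr: $(pci_netdevs "$addr")"
#   done
```

The interface name found this way is the one whose IP address you pass to sockperf with --ip.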
Configuration used in this guide:
- VMA version: 8.2.10
- OFED version: MLNX_OFED_LINUX-4.0-2.0.0.1
- sockperf version: 2.8-0
- NUMA node 1 and cores 13,19
- For installation, refer to the VMA installation guide here: http://www.mellanox.com/page/software_vma?mtag=vma
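The NUMA node and cores passed to numactl and taskset should be local to the adapter. A minimal sketch, assuming the adapter's PCI address is known (for example 0000:03:00.0 for the device in the lspci example, with the default domain 0000 assumed): the kernel exposes the device's NUMA node in /sys/bus/pci/devices/<address>/numa_node and each node's CPUs in /sys/devices/system/node/node<N>/cpulist. The helper names and the SYSFS_ROOT test override are hypothetical; "numactl --hardware" gives a similar per-node CPU overview.

```shell
#!/bin/sh
# Print the NUMA node the kernel associates with a PCI device such as
# 0000:03:00.0. A value of -1 means the platform reported no locality.
# SYSFS_ROOT is a hypothetical override used only for testing.
pci_numa_node() {
    cat "${SYSFS_ROOT:-}/sys/bus/pci/devices/$1/numa_node"
}

# Print the CPUs of a NUMA node in the kernel's list format
# (for example "0-13,28-41").
numa_cpulist() {
    cat "${SYSFS_ROOT:-}/sys/devices/system/node/node$1/cpulist"
}

# Intended usage: use the node for numactl --cpunodebind and pick
# cores from its list for taskset -c.
#   node=$(pci_numa_node 0000:03:00.0)
#   echo "NUMA node $node, CPUs $(numa_cpulist "$node")"
```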
Baseline test with the kernel network stack (without VMA):
First machine (Server side):
$ numactl --cpunodebind=1 taskset -c 19,13 sockperf sr --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
Server side example output:
sockperf: == version #2.8-0.git3dd5971d7d7a ==
sockperf: [SERVER] listen on:
[0] IP = 11.4.3.3 PORT = 19140 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
Second machine (Client side):
$ numactl --cpunodebind=1 taskset -c 19,13 sockperf pp --time 4 --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
Client side example output:
sockperf: == version #2.8-0.git3dd5971d7d7a ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 11.4.3.3 PORT = 19140 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=4.100 sec; SentMessages=469124; ReceivedMessages=469123
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=4.000 sec; SentMessages=457948; ReceivedMessages=457948
sockperf: ====> avg-lat= 4.349 (std-dev=0.336)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 4.349 usec
sockperf: Total 457948 observations; each percentile contains 4579.48 observations
sockperf: ---> <MAX> observation = 42.027
sockperf: ---> percentile 99.999 = 6.944
sockperf: ---> percentile 99.990 = 6.493
sockperf: ---> percentile 99.900 = 5.543
sockperf: ---> percentile 99.000 = 5.027
sockperf: ---> percentile 90.000 = 4.753
sockperf: ---> percentile 75.000 = 4.604
sockperf: ---> percentile 50.000 = 4.401
sockperf: ---> percentile 25.000 = 4.041
sockperf: ---> <MIN> observation = 3.576
VMA performance was measured by running the same sockperf test with VMA preloaded through LD_PRELOAD and the VMA_SPEC=latency environment variable set.
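The VMA commands below preload the library through $VMA_LOAD, which this guide does not define; it is presumably the path to the VMA library, libvma.so. One way to set it, assuming libvma is registered in the dynamic linker cache, is to look it up in the output of "ldconfig -p". The function name is illustrative. Setting VMA_LOAD=libvma.so also works when the library sits in a standard search directory, since the dynamic linker searches for LD_PRELOAD names without a slash.

```shell
#!/bin/sh
# Print the path of the first libvma entry in "ldconfig -p" output read
# from stdin (the path is the last field of each cache line).
find_libvma() {
    awk '/libvma\.so/ { print $NF; exit }'
}

# Intended usage:
#   VMA_LOAD=$(ldconfig -p | find_libvma)
#   export VMA_LOAD
#   [ -n "$VMA_LOAD" ] || echo "libvma not found in the linker cache" >&2
```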
First machine (Server side):
$ VMA_SPEC=latency LD_PRELOAD=$VMA_LOAD numactl --cpunodebind=1 taskset -c 19,13 sockperf sr --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
Second machine (Client side):
$ VMA_SPEC=latency LD_PRELOAD=$VMA_LOAD numactl --cpunodebind=1 taskset -c 19,13 sockperf pp --time 4 --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
Client side example output (trimmed):
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 8.2.10-0 Release built on Mar 28 2017 03:35:42
VMA INFO: Cmd Line: taskset -c 19,13 sockperf pp --time 4 --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
VMA INFO: OFED Version: MLNX_OFED_LINUX-4.0-2.0.0.1:
VMA INFO: Spec Latency [VMA_SPEC]
VMA INFO: ---------------------------------------------------------------------------
.
.
.
sockperf: == version #2.8-0.git3dd5971d7d7a ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 11.4.3.3 PORT = 19140 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=4.100 sec; SentMessages=1492229; ReceivedMessages=1492228
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=4.000 sec; SentMessages=1455879; ReceivedMessages=1455879
sockperf: ====> avg-lat= 1.359 (std-dev=0.031)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 1.359 usec
sockperf: Total 1455879 observations; each percentile contains 14558.79 observations
sockperf: ---> <MAX> observation = 6.271
sockperf: ---> percentile 99.999 = 2.085
sockperf: ---> percentile 99.990 = 1.569
sockperf: ---> percentile 99.900 = 1.463
sockperf: ---> percentile 99.000 = 1.428
sockperf: ---> percentile 90.000 = 1.396
sockperf: ---> percentile 75.000 = 1.378
sockperf: ---> percentile 50.000 = 1.359
sockperf: ---> percentile 25.000 = 1.338
sockperf: ---> <MIN> observation = 1.253
Note that some additional VMA and sockperf headers were trimmed from the output on both the client and the server.
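With VMA, the average latency is about 3.2x lower than with the kernel (roughly a 69% reduction):
Average latency (avg-lat):
- Using the kernel: 4.349 usec
- Using VMA: 1.359 usec
Minimum latency:
- Using the kernel: 3.576 usec
- Using VMA: 1.253 usec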
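The comparison above can be reproduced from saved sockperf client output. This sketch reads the value from the "====> avg-lat=" line and divides the kernel figure by the VMA figure; for the runs shown, 4.349 / 1.359 gives 3.20. The function names and the log file names are illustrative assumptions.

```shell
#!/bin/sh
# Print the average latency (usec) from sockperf ping-pong client output
# read from stdin, taken from the "====> avg-lat= X (std-dev=Y)" line.
sockperf_avg_lat() {
    awk '/avg-lat=/ { sub(/.*avg-lat= */, ""); print $1; exit }'
}

# Print how many times lower the second latency is than the first.
latency_ratio() {
    awk -v baseline="$1" -v improved="$2" 'BEGIN { printf "%.2f\n", baseline / improved }'
}

# Hypothetical usage, assuming each client run was saved with
# "... 2>&1 | tee kernel.log" and "... 2>&1 | tee vma.log":
#   latency_ratio "$(sockperf_avg_lat < kernel.log)" "$(sockperf_avg_lat < vma.log)"
```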
Now that VMA is working, it's important to apply the server manufacturer's and the Linux distribution's tuning recommendations to achieve the lowest latency.
A few examples:
- HP server guide: http://h10032.www1.hp.com/ctg/Manual/c01804533.pdf
- VMA Performance Tuning Guide in the Mellanox community: https://community.mellanox.com/docs/DOC-2797
- Understanding BIOS Configuration for Performance Tuning in the Mellanox community: https://community.mellanox.com/docs/DOC-2488