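I followed the installation instructions and it all worked with no hiccups, but when I ran 'make check' it failed two tests: modelnet-prio-sched-test.sh and modelnet-test-dragonfly-synthetic.sh. Upon running 'cat ./test-suite.log' I got the following result:
=================================
codes 1.2: ./test-suite.log
TOTAL: 22
PASS: 20
SKIP: 0
XFAIL: 0
FAIL: 2
XPASS: 0
ERROR: 0
.. contents:: :depth: 2
FAIL: tests/modelnet-test-dragonfly-synthetic.sh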
credit_size not specified, using default: 8
no credit_delay specified - all credit delays set to 1.42
Within-node eager limit (node_eager_limit) not specified, setting to 16000
../codes/tests/modelnet-test-dragonfly-synthetic.sh: line 3: 17976 Killed src/network-workloads/model-net-synthetic --sync=1 --num_messages=1 -- $srcdir/src/network-workloads/conf/modelnet-synthetic-dragonfly.conf
FAIL tests/modelnet-test-dragonfly-synthetic.sh (exit status: 137)
FAIL: tests/modelnet-prio-sched-test.sh
Bandwidth of compute node channels not specified, setting to 20.000000
Within-node eager limit (node_eager_limit) not specified, setting to 16000
/home/shahm/codes-dev/build-codes/tests/.libs/modelnet-prio-sched-test --sync=1 -- tests/conf/modelnet-prio-sched-test.conf
Thu Aug 6 17:54:43 2020
ROSS Version: v7.2.0
tw_net_start: Found world size to be 1
NIC num injection port not specified, setting to 1
NIC seq delay not specified, setting to 10.000000
NIC num copy queues not specified, setting to 1
within node transfer per byte delay is 0.050000
ROSS Core Configuration:
Total PEs 1
Total KPs [Nodes (1) x KPs (16)] 16
Total LPs 4
Simulation End Time 31536000000000000.00
LP-to-PE Mapping model defined
ROSS Event Memory Allocation:
Model events 1025
Network events 16
Total events 1040
*** START SEQUENTIAL SIMULATION ***
Set num_servers per router 1, servers per injection queue per router 1, servers per node copy queue per node 1, num nics 1
*** END SIMULATION ***
: Running Time = 0.0002 seconds
TW Library Statistics:
Total Events Processed 511
Events Aborted (part of RBs) 0
Events Rolled Back 0
Event Ties Detected in PE Queues 0
Efficiency 100.00 %
Total Remote (shared mem) Events Processed 0
Percent Remote Events 0.00 %
Total Remote (network) Events Processed 0
Percent Remote Events 0.00 %
Total Roll Backs 0
Primary Roll Backs 0
Secondary Roll Backs 0
Fossil Collect Attempts 0
Total GVT Computations 0
Net Events Processed 511
Event Rate (events/sec) 2823204.4
Total Events Scheduled Past End Time 0
TW Memory Statistics:
Events Allocated 1041
Memory Allocated 618
Memory Wasted 454
TW Data Structure sizes in bytes (sizeof):
PE struct 624
KP struct 144
LP struct 136
LP Model struct 96
LP RNGs 80
Total LP 312
Event struct 152
Event struct with Model 552
TW Clock Cycle Statistics (MAX values in secs at 1.0000 GHz):
Initialization 0.7451
Priority Queue (enq/deq) 0.0000
AVL Tree (insert/delete) 0.0000
LZ4 (de)compression 0.0000
Buddy system 0.0000
Event Processing 0.0000
Event Cancel 0.0000
Event Abort 0.0000
GVT 0.0000
Fossil Collect 0.0000
Primary Rollbacks 0.0000
Network Read 0.0000
Other Network 0.0000
Instrumentation (computation) 0.0000
Instrumentation (write) 0.0000
Total Time (Note: Using Running Time above for Speedup) 0.0005
TW GVT Statistics: MPI AllReduce
GVT Interval 16
GVT Real Time Interval (cycles) 0
GVT Real Time Interval (sec) 0.00000000
Batch Size 16
Forced GVT 0
Total GVT Computations 0
Total All Reduce Calls 0
Average Reduction / GVT -nan
mpirun has detected an attempt to run as root.
Running at root is strongly discouraged as any mistake (e.g., in
defining TMPDIR) or bug can result in catastrophic damage to the OS
file system, leaving your system in an unusable state.
You can override this protection by adding the --allow-run-as-root
option to your cmd line. However, we reiterate our strong advice
against doing so - please do so at your own risk.
FAIL tests/modelnet-prio-sched-test.sh (exit status: 1)
Any help would be much appreciated!
Interesting. Are you, by any chance, using Docker or some other container system?
- The exit status 137 failure in that specific dragonfly test has been seen before when someone was using containers (#198). Since it didn't seem to affect regular usage of CODES, it was put on the back burner at the time due to some tight deadlines on my end, followed by the rest of 2020's events!
- Some cursory googling about the mpirun-as-root warning suggests that this also happens with Docker containers and Open MPI. As also noted in #198, adding a user (appuser) to the container and running as that user avoids invoking mpirun as root.
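For reference, the two exit statuses in the log can be decoded with a few lines of shell. By shell convention, a status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL), while status 1 is an ordinary non-zero exit. The helper name `decode_exit_status` is made up for this sketch. Whether the SIGKILL came from a container memory limit is only a guess; nothing in this thread confirms it.

```shell
#!/bin/sh
# Hypothetical helper: explain what a shell exit status means.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
decode_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        # `kill -l N` prints the signal name without the SIG prefix.
        echo "killed by signal $signum (SIG$(kill -l "$signum"))"
    else
        echo "exited normally with code $status"
    fi
}

decode_exit_status 137   # the dragonfly synthetic test
decode_exit_status 1     # the prio-sched test (mpirun refused to run as root)
```

If the kill does turn out to be memory related, checking the container's memory limit and the kernel log on the host would be a reasonable next step.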
I'll spend some time this weekend looking into a "building CODES with Docker" workflow. In the meantime, I'd suggest ignoring these failed tests. Let me know if other errors pop up during your usage of CODES.
I’m trying to have the ROSS CI run the CODES tests and just found a hang on modelnet-test-dragonfly-synthetic.sh. It’s most likely related, since the Travis CI tests run in a container. Is there a way to skip certain tests with the make check command?
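One possible approach, assuming the CODES test suite uses the stock Automake test harness (the test-suite.log format above suggests it does): Automake lets you override the `TESTS` variable on the command line, so `make check TESTS='...'` runs only the listed tests. The sketch below builds such a command by filtering a test list. The helper `filter_tests` and the entry `tests/example-test.sh` are hypothetical; the real list would have to be copied from the `TESTS` variable in the generated Makefile.

```shell
#!/bin/sh
# Hypothetical helper: print every test in $1 that is not listed in $2.
# Both arguments are space-separated lists of test paths.
filter_tests() {
    all_tests=$1
    skip_tests=$2
    kept=""
    for t in $all_tests; do
        case " $skip_tests " in
            *" $t "*) ;;                 # listed in skip_tests: drop it
            *) kept="$kept $t" ;;        # otherwise keep it
        esac
    done
    echo "${kept# }"                     # strip the leading space
}

# Shortened, partly hypothetical list; take the real one from the Makefile.
ALL_TESTS="tests/modelnet-test-dragonfly-synthetic.sh tests/modelnet-prio-sched-test.sh tests/example-test.sh"
SKIP="tests/modelnet-test-dragonfly-synthetic.sh"

# Print the command rather than running it, so the sketch is safe anywhere.
echo "make check TESTS='$(filter_tests "$ALL_TESTS" "$SKIP")'"
```

Running the printed command from the build directory should then execute only the remaining tests.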