# CODES DUMPI workload
Our primary network workload generator is built on the DUMPI tool (http://sst.sandia.gov/about_dumpi.html). DUMPI collects and reads events from MPI applications. See the DUMPI documentation for how to generate traces. Publicly available traces can also be found at http://portal.nersc.gov/project/CAL/designforward.htm.
Note on trace reading: the input file prefix passed to the DUMPI workload generator should be everything up to the rank number. For example, if the DUMPI files are named "dumpi-YYYY.MM.DD.HH.MM.SS-XXXX.bin", the input should be "dumpi-YYYY.MM.DD.HH.MM.SS-".
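The prefix rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `dumpi_prefix` is hypothetical, and it assumes the rank number is the run of digits immediately before the ".bin" extension, as in the file name pattern shown above.

```python
import re

# Assumption: the rank number is the run of digits right before ".bin",
# as in "dumpi-YYYY.MM.DD.HH.MM.SS-XXXX.bin".
_RANK_SUFFIX = re.compile(r"\d+\.bin$")


def dumpi_prefix(trace_file: str) -> str:
    """Return everything up to (but not including) the rank number."""
    prefix, replaced = _RANK_SUFFIX.subn("", trace_file)
    if replaced == 0:
        raise ValueError(f"not a DUMPI rank file name: {trace_file!r}")
    return prefix


print(dumpi_prefix("dumpi-2014.03.03.15.09.03-0042.bin"))
# prints: dumpi-2014.03.03.15.09.03-
```

The returned string is what would be passed as the trace prefix to the workload generator.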
- Download and untar the DUMPI AMG application trace for 216 MPI ranks using the following download link:
wget http://portal.nersc.gov/project/CAL/doe-miniapps-mpi-traces/AMG/df_AMG_n216_dumpi.tar.gz
- Configure the model-net config file (for this example, the config file is available at src/network-workloads/conf/modelnet-mpi-test-dfly-amg-216.conf).
- Run the DUMPI trace replay simulation on top of model-net using the command below. The value of --workload_file is the prefix of the DUMPI trace files (dumpi-2014.03.03.15.09.03- in this command), that is, the trace file name without its trailing 4-digit rank number.
./src/network-workloads/model-net-mpi-replay --sync=1 \
  --num_net_traces=216 --workload_file=/path/to/dumpi/trace/directory/dumpi-2014.03.03.15.09.03- \
  --workload_type="dumpi" --lp-io-dir=amg-216-trace --lp-io-use-suffix=1 \
  -- ../src/network-workloads/conf/modelnet-mpi-test-dfly-amg-216.conf
The simulation runs in ROSS serial, conservative and optimistic modes.
Note: Dragonfly and torus networks may have more nodes than there are
network traces (some network nodes only forward packets and never load a
trace). When the number of network nodes and the number of traces do not
match, use the --num_net_traces argument to specify the exact number of
traces available in the DUMPI directory.
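One way to obtain the value for --num_net_traces is to count the rank files that share the trace prefix. The sketch below is an illustration only: `count_traces` is a hypothetical helper, and it assumes each rank is stored as `<prefix><digits>.bin`.

```python
import glob
import os


def count_traces(prefix: str) -> int:
    """Count files named <prefix><digits>.bin (one per MPI rank)."""
    base = os.path.basename(prefix)
    count = 0
    # glob.escape protects any glob metacharacters in the directory or prefix.
    for path in glob.glob(glob.escape(prefix) + "*.bin"):
        # Keep only the part between the prefix and the ".bin" extension.
        rank = os.path.basename(path)[len(base):-len(".bin")]
        if rank.isdigit():
            count += 1
    return count
```

For the AMG example, calling `count_traces` with the trace prefix should return 216 if the archive was extracted completely.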
* Running the simulation in optimistic mode
mpirun -np 4 ./src/network-workloads/model-net-mpi-replay --sync=3 \
  --num_net_traces=216 --workload_type="dumpi" --lp-io-dir=amg_216-trace \
  --lp-io-use-suffix=1 \
  --workload_file=/projects/radix-io/mubarak/df_traces/directory/dumpi-2014.03.03.15.09.03- \
  -- src/network-workloads/conf/modelnet-mpi-test-dfly-amg-216.conf
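When the replay is launched from scripts, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the flags shown in the commands above. The function name `replay_command`, its parameters, and the default binary path are assumptions made for illustration, and the `shlex.join` call needs Python 3.8 or newer.

```python
import shlex


def replay_command(conf, prefix, num_traces, lp_io_dir, sync=1, mpi_ranks=None,
                   binary="./src/network-workloads/model-net-mpi-replay"):
    """Build the argv list for a single-trace DUMPI replay run."""
    args = [
        binary,
        f"--sync={sync}",                  # 1 and 3 are the values used above
        f"--num_net_traces={num_traces}",
        f"--workload_file={prefix}",       # trace prefix, without rank number
        "--workload_type=dumpi",
        f"--lp-io-dir={lp_io_dir}",
        "--lp-io-use-suffix=1",
        "--",
        conf,                              # model-net config file
    ]
    if mpi_ranks is not None:
        # Parallel runs are launched through mpirun, as in the example above.
        args = ["mpirun", "-np", str(mpi_ranks)] + args
    return args


cmd = replay_command(
    conf="src/network-workloads/conf/modelnet-mpi-test-dfly-amg-216.conf",
    prefix="/path/to/dumpi/trace/directory/dumpi-2014.03.03.15.09.03-",
    num_traces=216,
    lp_io_dir="amg-216-trace",
    sync=3,
    mpi_ranks=4,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` directly. Keeping it as a list avoids shell quoting problems with paths that contain spaces.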
## Replaying Multiple Application Traces on Network Models
* Generate a job allocation file (random or contiguous) using Python scripts.
Allocation options:
- Random allocation assigns a set of randomly selected network nodes to each
job.
- Contiguous allocation assigns a set of contiguous network nodes to each
job.
See [TODO] for instructions on how to generate a job
allocation file using Python. Example allocation files are
src/network-workloads/conf/allocation-rand.conf and allocation-cont.conf.
* Run the simulation with multiple job allocations
./src/network-workloads//model-net-mpi-replay --sync=1 --workload_conf_file=../src/network-workloads/workloads.conf --alloc_file=../src/network-workloads/conf/allocation-rand.conf --workload_type="dumpi" -- ../src/network-workloads/conf/modelnet-mpi-test-dfly-amg-216.conf
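The two allocation strategies described above can be sketched in Python as follows. This is not the project's own script: the function names are hypothetical, and the output layout (one line per job, holding space-separated network node IDs) is an assumption that should be checked against the example files in src/network-workloads/conf before use.

```python
import random


def contiguous_allocation(job_sizes):
    """Give each job the next block of consecutive node IDs."""
    jobs, start = [], 0
    for size in job_sizes:
        jobs.append(list(range(start, start + size)))
        start += size
    return jobs


def random_allocation(job_sizes, num_nodes, seed=None):
    """Give each job a disjoint set of randomly selected node IDs."""
    total = sum(job_sizes)
    if total > num_nodes:
        raise ValueError("jobs need more nodes than the network has")
    rng = random.Random(seed)
    # Sampling without replacement keeps the jobs from sharing a node.
    nodes = rng.sample(range(num_nodes), total)
    jobs, start = [], 0
    for size in job_sizes:
        jobs.append(nodes[start:start + size])
        start += size
    return jobs


def format_allocation(jobs):
    """Assumed file layout: one line per job, node IDs separated by spaces."""
    return "\n".join(" ".join(map(str, job)) for job in jobs) + "\n"


print(format_allocation(contiguous_allocation([2, 3])), end="")
# prints:
# 0 1
# 2 3 4
```

Passing a fixed `seed` to `random_allocation` makes a random placement reproducible across simulation runs.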