codes dumpi workload

Misbah Mubarak edited this page Dec 13, 2016 · 24 revisions

Replaying Application Traces on Network Models

Our primary network workload generator uses the DUMPI tool (http://sst.sandia.gov/about_dumpi.html). DUMPI collects and reads events from MPI applications. See the DUMPI documentation for how to generate traces. Publicly available traces can also be found at http://portal.nersc.gov/project/CAL/designforward.htm

Note on trace reading: the input file prefix passed to the dumpi workload generator should be everything up to the rank number. For example, if the dumpi files are of the form "dumpi-YYYY.MM.DD.HH.MM.SS-XXXX.bin", then the input should be "dumpi-YYYY.MM.DD.HH.MM.SS-".
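The prefix rule above can be sketched in shell. This is only an illustration: the file name below is a hypothetical example that follows the stated pattern, and the snippet assumes the rank number is whatever follows the last hyphen.

```shell
# Hypothetical trace file name following the pattern described above.
trace_file="dumpi-2014.03.03.14.55.23-0000.bin"

# Remove the shortest trailing match of "-*" (the rank number and ".bin"),
# then put the hyphen back so the prefix ends right before the rank number.
prefix="${trace_file%-*}-"

echo "$prefix"
```

The resulting value is what would be passed to --workload_file, with the trace directory prepended.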

Replaying Application Trace on CODES MPI Simulation Layer

  • Download and untar the DUMPI AMG application trace for 216 MPI ranks using the following download link:

wget http://portal.nersc.gov/project/CAL/doe-miniapps-mpi-traces/AMG/df_AMG_n216_dumpi.tar.gz
tar -xzf df_AMG_n216_dumpi.tar.gz

  • Configure the model-net config file (for this example, a config file is available at src/network-workloads/conf/modelnet-mpi-test-dfly-amg-216.conf).

  • Run the DUMPI trace replay simulation on top of model-net using the command below. Here dumpi-2014.03.03.15.09.03- is the prefix of the DUMPI trace files; the trailing 4-digit rank number and the .bin extension are omitted.

./src/network-workloads/model-net-mpi-replay --sync=1 --num_net_traces=216 --workload_file=/path/to/dumpi/trace/directory/dumpi-2014.03.03.15.09.03- \
--workload_type="dumpi" --lp-io-dir=amg-216-trace --lp-io-use-suffix=1 -- ../src/network-workloads/conf/modelnet-mpi-test-dfly-amg-216.conf


  The simulation runs in ROSS serial (--sync=1), conservative (--sync=2) and optimistic (--sync=3) modes.

   Note: Dragonfly and torus networks may have more nodes than there are
   network traces (some network nodes only forward packets and never load a
   trace). When the number of network nodes and the number of traces do not
   match, use the --num_net_traces argument to specify the exact number of
   traces available in the DUMPI directory.
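One way to find the value for --num_net_traces is to count the trace files that share the prefix. The helper below is a minimal sketch, not part of CODES: count_traces is a hypothetical name, and it assumes there is exactly one "<prefix>XXXX.bin" file per MPI rank, as in the naming pattern described earlier.

```shell
# count_traces PREFIX: print how many files match "<PREFIX>*.bin".
# Hypothetical helper; assumes one .bin trace file per MPI rank.
count_traces() {
    # Expand the glob into the function's positional parameters.
    set -- "$1"*.bin
    # If nothing matched, the pattern stays unexpanded and names no file.
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}
```

For example, `--num_net_traces="$(count_traces /path/to/dumpi/trace/directory/dumpi-2014.03.03.15.09.03-)"` would fill in the argument from the directory contents.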

* Running the simulation in optimistic mode 
    
mpirun -np 4 ./src/network-workloads/model-net-mpi-replay --sync=3 \
--num_net_traces=216 --workload_type="dumpi" --lp-io-dir=amg_216-trace \
--lp-io-use-suffix=1 \
--workload_file=/home/df_traces/directory/dumpi-2014.03.03.15.09.03- \
-- src/network-workloads/conf/modelnet-mpi-test-dfly-amg-216.conf

## Replaying Multiple Application Traces on Network Models
* Generate a job allocation file (random or contiguous) using the Python scripts.

Allocation options:
1. Random allocation assigns a set of randomly selected network nodes to each job.
1. Contiguous allocation assigns a set of contiguous network nodes to each job.

See [TODO] for instructions on how to generate a job
allocation file using Python. Example allocation files are
src/network-workloads/conf/allocation-rand.conf and allocation-cont.conf.
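To make the two allocation policies concrete, here is a rough shell sketch. It is not the CODES Python script: contiguous_alloc and random_alloc are hypothetical helpers, the one-line-per-job, space-separated output is an assumption that should be checked against the example allocation files, and random_alloc relies on shuf from GNU coreutils.

```shell
# contiguous_alloc TOTAL SIZE...: one line of consecutive node IDs per job.
# Output layout (one line per job, space-separated IDs) is an assumption.
contiguous_alloc() {
    shift                      # TOTAL is unused here; kept for a uniform interface
    next=0
    for size in "$@"; do
        seq "$next" $((next + size - 1)) | paste -s -d ' ' -
        next=$((next + size))
    done
}

# random_alloc TOTAL SIZE...: one line of randomly chosen, non-overlapping
# node IDs per job, taken from a single shuffle of 0..TOTAL-1.
random_alloc() {
    total=$1
    shift
    perm=$(seq 0 $((total - 1)) | shuf)
    start=1
    for size in "$@"; do
        # Take the next SIZE entries of the shuffled list for this job.
        echo "$perm" | sed -n "${start},$((start + size - 1))p" | paste -s -d ' ' -
        start=$((start + size))
    done
}
```

For instance, `contiguous_alloc 10 3 4` describes two jobs of 3 and 4 ranks on a hypothetical 10-node network, while `random_alloc 10 3 4` draws the same job sizes from a shuffled node list. Neither helper checks that the job sizes fit within TOTAL.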

* Run the simulation with multiple job allocations

./src/network-workloads/model-net-mpi-replay --sync=1 --workload_conf_file=../src/network-workloads/workloads.conf --alloc_file=../src/network-workloads/conf/allocation-rand.conf --workload_type="dumpi" -- ../src/network-workloads/conf/modelnet-mpi-test-dfly-amg-216.conf
