title | tags | authors | bibliography | affiliations | date | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ronswanson: Building Table Models for 3ML |
|
|
paper.bib |
|
13 October 2022 |
ronswanson
provides a simple-to-use framework for building so-called table or
template models for astromodels
[@astromodels] the modeling package for
multi-messenger astrophysical data-analysis framework, 3ML
[@threeml]. With
astromodels
and 3ML
one can build the interpolation table of a physical model
result of an expensive computer simulation. This then enables efficient
reevaluation of the model while, for example, fitting it to a
dataset. While 3ML
and astromodels
provide factories for building table
models, the construction of pipelines for models that must be run on
high-performance computing (HPC) systems can be cumbersome. ronswanson
removes
this complexity with a simple, reproducible templating system. Users can easily
prototype their pipeline on multi-core workstations and then switch to a
multi-node HPC system. ronswanson
automatically generates the required
Python
and SLURM
scripts to scale the execution of 3ML
with astromodel
's
table models on an HPC system.
Spatio-spectral fitting of astrophysical data typically requires iterative
evaluation of a complex physical model obtained from a computationally expensive
simulation. In these situations, the evaluation of the likelihood is computationally intractable
even on HPC systems. To circumvent this issue, one can create a so-called
template or table model by evaluating the simulation on a grid of
parameter values, then interpolating this output.
Several spectral fitting packages, including XSPEC
[@xspec], 3ML
,
and gammapy
[@gammapy;@acero], implement frameworks that allow for the reading these
template models in various file formats. However, none of these libraries provide
tools for uniformly generating the data from which these
templates are built. ronswanson
builds table models for astromodels
, the
modeling language of the multi-messenger data analysis framework 3ML
in an
attempt to solve this problem. astromodels
stores its table models as
HDF5
[@hdf5] files. While astromodels
provides a set of user-friendly factories for
constructing table models, the workflow for using these factories on desktop
workstations or HPC systems can be complex. However, these workflows are easily
abstracted to a templating system that can be user-friendly and reproducible.
Once the user selects a simulation from which they would like to create a table
model, the first task is to create an interface class that tells ronswanson
how to run the simulation and collect its output. This is achieved by inheriting
a class from the package called Simulation
and defining its virtual run
member function. With this function, the user specifies how the model parameters
for each point in the simulation grid are fed to the simulation software. The
outputs from the simulation are passed to a dictionary with a
key for each different output. Finally, this
dictionary is returned from the run
function. This is all the programming that
is required as ronswanson
uses this subclass to run the simulations on the
user's specified architecture.
With the interface to the simulation defined, the user must specify the grid of
parameters on which to compute the output of the simulation. This is achieved by
specifying the grid points of each parameter in a YAML file. Parameter grids can
either be custom, or specified with ranges and a specific number of evaluation
points. Additionally, the energy grid corresponding to the evaluation of each of
the simulation outputs must be specified in this file. The final step is to
create a YAML configuration file telling ronswanson
how to create the
table. This includes specifying the name of the output HDF5 database, where to
find the simulation subclass created in the first step, the name of the
parameter YAML file, and details on the compute architecture on which the
simulation grid is to be run.
With these two configuration files defined, the user runs the command line
program simulation_build
on the main configuration file. This automatically
generates all the required Python
and SLURM
scripts required for the
construction of the table model. If running on a workstation, the user then
executes the run_simulation.py
script. If, instead, the simulation is run on
an HPC cluster, the user runs sbatch run_simulation.sh
. In the case of running
on an HPC system, the final step to build the database requires running sbatch gather_results.sh
which uses MPI
[@mpi] to gather the individual pieces of the
simulations into the main database.
The created HDF5
database can be loaded with utilities in ronswanson
to then
construct a table model in the astromodels
format (see here for
details). This
intermediate step allows the user to select subsets of the parameters from which
to construct the table model. This is useful as large interpolation tables can
consume a lot of computer memory, and it is possible that certain fits may only
need a limited parameter range. Additionally, utilities are provided that allow
adding parameter sets onto the primary database to extend the interpolation
range. Moreover, the database stores information such as the runtime of each
grid point of the simulation. Utilities are provided to view this
metadata. With future interfacing of 3ML
and gammapy
, these table models could
even be used to fit data from optical to very high energy gamma-rays. More
details and examples can be found in the
documentation.
This project was inspired by earlier works of Elisa Schoesser and Francesco Berlato.