Exercise A: The Event Loop
An event loop reads independent neutrino interactions and reduces them to some simplified format. Every data preservation analysis starts with one or more event loops over so-called AnaTuples. There are actually other event loops that were used to produce these AnaTuples, but they have already been run by the production team.
Our macro stage event loop will reduce AnaTuples to the ingredients we need for a cross section. We're going to learn to run and edit a simple 1-stage macro stage event loop. Analyses that produce multi-dimensional cross sections or have other unique computing needs might split their event loops into more stages to manage memory requirements or run time. At the end of this exercise, you'll have a .root file with cross section ingredients from the `runEventLoop` program you updated.
Alex's talk yesterday explained what a cross section is, why they're important to measure, and one common procedure for extracting a cross section from our data using a Monte Carlo simulation. How do we turn his formula into a program for reducing AnaTuples to the histograms we'll need to extract a cross section?
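Alex's figure is not reproduced on this page. As a reminder, a commonly used form of the formula looks like the sketch below. The symbol names are my own assumptions and may differ from the ones on his slide: $U_{ij}$ stands for the unfolding derived from the migration matrix, $\epsilon_i$ for the efficiency * acceptance correction, $\Phi$ for the integrated flux, $T$ for the number of target nucleons, and $\Delta x_i$ for the width of true bin $i$.

```latex
\left(\frac{d\sigma}{dx}\right)_i
  = \frac{\sum_j U_{ij}\left(N^{\mathrm{data}}_j - N^{\mathrm{bkg}}_j\right)}
         {\epsilon_i \, \Phi \, T \, \Delta x_i},
\qquad
\epsilon_i = \frac{N^{\mathrm{eff.\ numerator}}_i}{N^{\mathrm{eff.\ denominator}}_i}
```

Here $i$ runs over true bins and $j$ over reco bins.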
Since i and j are true and reco bin indices, each symbol in Alex's figure is a histogram. The efficiency * acceptance correction will turn out to be the ratio of two different histograms. We can sort these histograms along two axes:
- reco/true variables on their axes
- which cuts are applied
The DATA histogram will also be unique because it comes from the data sample. All other cross section ingredients in this tutorial will be derived from a Monte Carlo simulation of the MINERvA experiment.
- The data can only be measured and selected in reco variables.
- The backgrounds are subtracted from the data, so they must also be binned in reco variables and pass the reco selection.
- The efficiency numerator characterizes signal events that pass the reco selection. It is applied to the data after the migration matrix, so it must be calculated in true variables.
- The efficiency denominator counts all events that pass the signal definition, even if they fail the reco selection. The AnaTool that produced the MasterAnaDev AnaTuples we'll be using already threw out some signal events, so the efficiency denominator has its own larger tuple of events that we could have detected.
- The migration matrix converts reco variables to true variables, so it is a 2D histogram with one axis of each type. We're going to apply it before the efficiency * acceptance correction.
- The flux is simulated and constrained independently from our analysis. It is provided in true variables.
`runEventLoop` calculates the cross section ingredients in one pass over the data and MC samples. The flux our detector received changed throughout data taking, so we split our data sample, and our MC sample to match, into flux periods called playlists. We're going to be analyzing the minervame1A playlist throughout this tutorial. `runEventLoop` is designed to process only 1 playlist at a time, and we have to tell it which files to process on the command line like this:
runEventLoop data.txt mc.txt
If you hit the TAB key after typing `runEventLoop`, the shell will list suitable file lists. This is a feature of the bash shell called auto-completion. If you run `runEventLoop` with no arguments, it will tell you about its input and output. This is typical behavior for programs in UNIX-based operating systems. If you read its "help text" closely, you'll notice that `runEventLoop` also looks for some environment variables.
The event loop itself is split into loops over 3 chains of `TTree`s for data, Monte Carlo, and the so-called "Truth Tree" that's used for the efficiency denominator. The event loop is really a loop over systematic Universes from the MINERvA Analysis Toolkit. Each Universe triggers a separate analysis with an assumption about our detector or our reconstruction changed. The event selections made are controlled by a `PlotUtils::Cutter`, and the physics model is reweighted to MnvTunev1 with a `PlotUtils::Model`. Histograms are mapped to physics quantities by `PlotUtils::Variable`s. The `main()` function sets all of them up and delegates the event loops to 1 function for each chain. All of these things are more or less ready for you.
Check out the "Exercise-A" branch:
#from opt/build...
cd ../../MINERvA-101-Cross-Section
git checkout Exercise-A
cd ../opt/build
make install #Compile the modified code to start this tutorial
You need to install the histograms we'll need to extract a cross section. The histograms need to be created for each physics observable, so make them member variables in `Variable.h`. You'll need to do these things for each histogram:
- Initialize it in `InitializeMCHists()` or `InitializeDataHists()`
- `Write()` it in `WriteMC()` or `WriteData()` so it can be plotted in another program
- Call its `SyncCVHistos()` in `SyncCVHistos()` so that systematics work correctly
- `Fill()` it in `runEventLoop.cxx` in a function like `LoopAndFillData()` after a Universe passes the necessary Cuts.
There are a few example histograms that are not cross section ingredients to get you started.
Once you have the cross section ingredients installed, you're ready to use `runEventLoop`. In a new shell:
- Set up ROOT. This might be automatic on your personal laptop. On the GPVMs, `source /path/to/opt/bin/setupROOT6OnGPVMs.sh`.
- `source /path/to/opt/bin/setup.sh` #Do this every time you open a new terminal or log into a GPVM to work on this project
- Create a working directory that is not in the source code directory or the build area. On my laptop, I put this in `Documents/MINERvA101`. On the GPVMs, I use `/minerva/data/users/$USER/MINERvA101`.
- Create a "playlist" file that will tell `runEventLoop` where to find anaTuple files. On the GPVMs, this is already done for you. On your laptop, plug in your MINERvA101 USB drive:
  - On MacOS, new USB drives end up in `/Volumes`. Run `find /Volumes/ -name "MAD_MINERvA101_2021" -type d`. That should print a directory name I'll call DIR. Now, run: `find DIR/mc -name "*.root" > MAD_USB_MC.txt` and `find DIR/data -name "*.root" > MAD_USB_Data.txt`
  - On Linux, new USB drives end up in `/media/$USER/<a bunch of numbers>`. Look for a new directory here that I'll call DIR. Then run: `find DIR/mc -name "*.root" > MAD_USB_MC.txt` and `find DIR/data -name "*.root" > MAD_USB_Data.txt`
  - On the GPVMs, just use `/minerva/app/users/$USER/MINERvA101/opt/etc/playlists/MAD_minervame1A_MC_andrewsGPVM.txt`
- `runEventLoop --help` to get a summary of how it works.
- Test your event loop with shorter file lists and systematics off:
  - Turn off systematics with `export MNV101_SKIP_SYST=1`
  - Generate truncated playlists with `tail -n 5 MAD_USB_MC.txt > shortMC.txt` and `tail -n 5 MAD_USB_Data.txt > shortData.txt`. On the GPVMs, instead do `tail -n 5 /minerva/app/users/$USER/MINERvA101/opt/etc/playlists/MAD_minervame1A_MC_andrewsGPVM.txt > shortMC.txt` and `tail -n 5 /minerva/app/users/$USER/MINERvA101/opt/etc/playlists/MAD_minervame1A_DATA_andrewsGPVM.txt > shortData.txt`
- Now `runEventLoop shortData.txt shortMC.txt`. Do you get a message that says "Success" at the end? If not, ask for help from the instructor.
Now, let's look at some of the histograms you produced to make sure they're not empty. They're in .root files, so we're going to open them interactively using ROOT's C++ interpreter:
root -l runEventLoopMC.root
.ls #Lists ROOT objects like histograms in the current TDirectory
TFile**  runEventLoopMC.root
 TFile*  runEventLoopMC.root
  KEY: PlotUtils::MnvH1D  pTmu_background_Wrong_Sign;1  Wrong_Sign
  KEY: PlotUtils::MnvH1D  pTmu_background_NC;1  NC
  KEY: PlotUtils::MnvH1D  pTmu_background_Other;1  Other
  KEY: PlotUtils::MnvH1D  pTmu_efficiency_numerator;1  pTmu
  KEY: PlotUtils::MnvH1D  pTmu_efficiency_denominator;1  pTmu
  KEY: PlotUtils::MnvH2D  pTmu_migration;1  pTmu
  KEY: PlotUtils::MnvH1D  pTmu_selected_signal_reco;1  pTmu
  KEY: PlotUtils::MnvH1D  pTmu_data;1  pTmu
  KEY: TParameter<double>  POTUsed;1
  KEY: PlotUtils::MnvH1D  pTmu_reweightedflux_integrated;1  pTmu
  KEY: TParameter<double>  pTmu_fiducial_nucleons;1
pTmu_efficiency_numerator->SetLineWidth(3) //Make histogram line easier to see
pTmu_efficiency_numerator->Draw("HIST")
You should see something like this:
The "main" branch combines the code for the solutions to all exercises. Compare its `runEventLoop.cpp` and `utils/Variable.h` to your own with `git diff`. Compare the histograms in `runEventLoopMC.root` and `runEventLoopData.root` to example data and example MC.
`runEventLoop` takes a lot longer when it's accounting for our standard set of systematic uncertainties. Repeat the instructions to run it, but `git checkout main` and do not set `MNV101_SKIP_SYST` this time (if you already exported it in the shell you're using, run `unset MNV101_SKIP_SYST` first). This will take 1-2.5 hours to complete. If you have problems with your laptop disconnecting from a GPVM while the tutorial is running, read about using GNU `screen` to wrap your interactive session.