
Exercise A: The Event Loop

aolivier23 edited this page Jun 10, 2021 · 23 revisions

An event loop reads independent neutrino interactions and reduces them to some simplified format. Every data preservation analysis starts with one or more event loops over so-called AnaTuples. There are actually other event loops that were used to produce these AnaTuples, but they have already been run by the production team.

Our macro stage event loop will reduce AnaTuples to the ingredients we need for a cross section. We're going to learn to run and edit a simple 1-stage macro stage event loop. Analyses that produce multi-dimensional cross sections or have other unique computing needs might split their event loops into more stages, divided by memory requirements or run time. At the end of this exercise, you'll have a .root file with cross section ingredients from the runEventLoop program you updated.

The Cross Section Ingredients

Alex's talk yesterday explained what a cross section is, why cross sections are important to measure, and one common procedure for extracting a cross section from our data using a Monte Carlo simulation. How do we turn his formula into a program for reducing AnaTuples to the histograms we'll need to extract a cross section?

Since i and j are true and reco bin indices, each symbol in Alex's figure is a histogram. The efficiency * acceptance correction will turn out to be the ratio of two different histograms. We can sort these histograms along two axes:

  • reco/true variables on their axes
  • which cuts are applied

The DATA histogram will also be unique because it comes from the data sample. All other cross section ingredients in this tutorial will be derived from a Monte Carlo simulation of the MINERvA experiment.

  • The data can only be measured and selected in reco variables.
  • The backgrounds are subtracted from the data, so they must also be binned in reco variables and pass the reco selection.
  • The efficiency numerator characterizes signal events that pass the reco selection. It is applied to the data after the migration matrix, so it must be calculated in true variables.
  • The efficiency denominator counts all events that pass the signal definition, even if they fail the reco selection. The AnaTool that produced the MasterAnaDev AnaTuples we'll be using already threw out some signal events, so the efficiency denominator has its own larger tuple of events that we could have detected.
  • The migration matrix converts reco variables to true variables, so it is a 2D histogram with one axis of each type. We're going to apply it before the efficiency * acceptance correction.
  • The flux is simulated and constrained independently from our analysis. It is provided in true variables.

Whirlwind Tour of runEventLoop

runEventLoop calculates the cross section ingredients in one pass over the data and MC samples. The flux our detector receives changed throughout data taking, so we split our data sample, and our MC sample to match, into flux periods called playlists. We're going to be analyzing the minervame1A playlist throughout this tutorial. runEventLoop is designed to process only 1 playlist at a time, and we have to tell it which files to process on the command line like this: runEventLoop data.txt mc.txt

If you hit the TAB key after typing runEventLoop , the shell will list suitable file lists. This is a feature of the bash shell called auto-completion. If you run runEventLoop with no arguments, it will tell you about its input and output. This is typical behavior for programs in UNIX-based operating systems. If you read its "help text" closely, you'll notice that runEventLoop also looks for some environment variables.

The event loop itself is split into loops over 3 chains of TTrees for data, Monte Carlo, and the so-called "Truth Tree" that's used for the efficiency denominator. The event loop is really a loop over systematic Universes from the MINERvA Analysis Toolkit. Each Universe triggers a separate analysis with an assumption about our detector or our reconstruction changed. The event selections made are controlled by a PlotUtils::Cutter, and the physics model is reweighted to MnvTunev1 with a PlotUtils::Model. Histograms are mapped to physics quantities by PlotUtils::Variables. The main() function sets all of them up and delegates the event loops to 1 function for each chain. All of these things are more or less ready for you.
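
The shape of "a loop over Universes" can be sketched with toy types. Everything below is a hypothetical stand-in written for this page: Event, Universe, Histogram, passesCuts, and loopAndFill are not the MINERvA Analysis Toolkit or PlotUtils classes, and the single "muon momentum scale" shift is an invented example of a changed assumption. The point is only that each event is tested and filled once per Universe, so a shifted Universe can select different events and fill different bins.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only.
struct Event {
  double recoPt;  // reconstructed muon transverse momentum
  double weight;  // model weight for this event
};

struct Universe {
  std::string name;
  double ptScale;  // this universe's assumption about the momentum scale
  double recoPt(const Event& evt) const { return evt.recoPt * ptScale; }
};

struct Histogram {
  double low, high;
  std::vector<double> bins;
  Histogram(std::size_t nBins, double lo, double hi) : low(lo), high(hi), bins(nBins, 0.0) {}
  void fill(double x, double w) {
    if (x < low || x >= high) return;  // this toy ignores under/overflow
    std::size_t bin = static_cast<std::size_t>((x - low) / (high - low) * bins.size());
    bin = std::min(bin, bins.size() - 1);  // guard against rounding at the upper edge
    bins[bin] += w;
  }
};

// A single toy cut, evaluated with the universe's shifted value.
bool passesCuts(const Universe& univ, const Event& evt, double minPt) {
  return univ.recoPt(evt) >= minPt;
}

// One pass over the events; every universe gets its own histogram.
std::map<std::string, Histogram> loopAndFill(const std::vector<Event>& events,
                                             const std::vector<Universe>& universes,
                                             double minPt) {
  std::map<std::string, Histogram> hists;
  for (const auto& univ : universes) hists.emplace(univ.name, Histogram(4, 0.0, 2.0));
  for (const auto& evt : events) {
    for (const auto& univ : universes) {
      if (!passesCuts(univ, evt, minPt)) continue;
      hists.at(univ.name).fill(univ.recoPt(evt), evt.weight);
    }
  }
  return hists;
}
```

In this toy, an event just below the cut in the central-value universe can pass in a universe that scales its momentum up, which is why the cuts are evaluated inside the loop over universes rather than once per event.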

Your Task

Check out the "Exercise-A" branch:

#from opt/build...
cd ../../MINERvA-101-Cross-Section
git checkout Exercise-A
cd ../opt/build
make install #Compile the modified code to start this tutorial

You need to install the histograms we'll need to extract a cross section. The histograms need to be created for each physics observable, so make them member variables in Variable.h. You'll need to do these things for each histogram:

  1. Initialize it in InitializeMCHists() or InitializeDataHists()
  2. Write() it in WriteMC() or WriteData() so it can be plotted in another program
  3. Call SyncCVHistos() on it in SyncCVHistos() so that systematics work correctly
  4. Fill() it in runEventLoop.cxx in a function like LoopAndFillData() after a Universe passes the necessary Cuts.

There are a few example histograms that are not cross section ingredients to get you started.

Once you have the cross section ingredients installed, you're ready to use runEventLoop. In a new shell:

  1. Set up ROOT. This might be automatic on your personal laptop. On the GPVMs, source /path/to/opt/bin/setupROOT6OnGPVMs.sh.
  2. source /path/to/opt/bin/setup.sh #Do this every time you open a new terminal or log into a GPVM to work on this project
  3. Create a working directory that is not in the source code directory or the build area. On my laptop, I put this in Documents/MINERvA101. On the GPVMs, I use /minerva/data/users/$USER/MINERvA101.
  4. Create a "playlist" file that will tell runEventLoop where to find anaTuple files. On the GPVMs, this is already done for you. On your laptop, plug in your MINERvA101 USB drive:
    • On MacOS, new USB drives end up in /Volumes. Run find /Volumes/ -name "MAD_MINERvA101_2021" -type d. That should print a directory name I'll call DIR. Now, run: find DIR/mc -name "*.root" > MAD_USB_MC.txt and find DIR/data -name "*.root" > MAD_USB_Data.txt
    • On Linux, new USB drives end up in /media/$USER/<a bunch of numbers>. Look for a new directory here that I'll call DIR. Then run: find DIR/mc -name "*.root" > MAD_USB_MC.txt and find DIR/data -name "*.root" > MAD_USB_Data.txt
    • On the GPVMs, just use /minerva/app/users/$USER/MINERvA101/opt/etc/playlists/MAD_minervame1A_MC_andrewsGPVM.txt
  5. runEventLoop --help to get a summary of how it works.
  6. Test your event loop with shorter file lists and systematics off:
    • Turn off systematics with export MNV101_SKIP_SYST=1
    • Generate truncated playlists with tail -n 5 MAD_USB_MC.txt > shortMC.txt and tail -n 5 MAD_USB_Data.txt > shortData.txt. On the GPVMs, instead do tail -n 5 /minerva/app/users/$USER/MINERvA101/opt/etc/playlists/MAD_minervame1A_MC_andrewsGPVM.txt > shortMC.txt and tail -n 5 /minerva/app/users/$USER/MINERvA101/opt/etc/playlists/MAD_minervame1A_DATA_andrewsGPVM.txt > shortData.txt
  7. Now runEventLoop shortData.txt shortMC.txt. Do you get a message that says "Success" at the end? If not, ask for help from the instructor.

Now, let's look at some of the histograms you produced to make sure they're not empty. They're in .root files, so we're going to open them interactively using ROOT's C++ interpreter:

  1. root -l runEventLoopMC.root
  2. .ls #Lists ROOT objects like histograms in the current TDirectory
    TFile**		runEventLoopMC.root	
     TFile*		runEventLoopMC.root	
      KEY: PlotUtils::MnvH1D	pTmu_background_Wrong_Sign;1	Wrong_Sign
      KEY: PlotUtils::MnvH1D	pTmu_background_NC;1	NC
      KEY: PlotUtils::MnvH1D	pTmu_background_Other;1	Other
      KEY: PlotUtils::MnvH1D	pTmu_efficiency_numerator;1	pTmu
      KEY: PlotUtils::MnvH1D	pTmu_efficiency_denominator;1	pTmu
      KEY: PlotUtils::MnvH2D	pTmu_migration;1	pTmu
      KEY: PlotUtils::MnvH1D	pTmu_selected_signal_reco;1	pTmu
      KEY: PlotUtils::MnvH1D	pTmu_data;1	pTmu
      KEY: TParameter<double>	POTUsed;1	
      KEY: PlotUtils::MnvH1D	pTmu_reweightedflux_integrated;1	pTmu
      KEY: TParameter<double>	pTmu_fiducial_nucleons;1	
    
  3. pTmu_efficiency_numerator->SetLineWidth(3) #Make histogram line easier to see
  4. pTmu_efficiency_numerator->Draw("HIST")

You should see something like this:

The Solution

The "main" branch combines the code for the solutions to all exercises. Compare its runEventLoop.cpp and utils/Variable.h to your own with git diff. Compare the histograms in runEventLoopMC.root and runEventLoopData.root to example data and example MC.

Homework

runEventLoop takes a lot longer when it's accounting for our standard set of systematic uncertainties. Repeat the instructions to run it, but git checkout main and do not set MNV101_SKIP_SYST this time. This will take 1-2.5 hours to complete. If you have problems with your laptop disconnecting from a GPVM while the tutorial is running, read about using GNU screen to wrap your interactive session.