Training and Testing Scripts

Typical CNN Training/Testing Procedure

Submit CNN_LoadMultipleFiles.py to train for many epochs (hundreds)
Check progress using plot_loss_from_column.py during training
Use CNN_TestOnly.py to check results at any time (using oscnext test)
- When to check results: when the validation curve is leveling off, or when you want results at a specific epoch
- Save PegLeg or Retro test once you have settled on final model

Description of Scripts

CNN_LoadMultipleFiles.py - used for training the CNN
- Takes in multiple training files (matching a given file pattern); loads one file and trains on it for an epoch, then loads the next file for the next epoch
  - Makes sure not too much data is stored at once (~30G)
  - Shuffles within each file between full passes over the set of files
  - Expects train, test, and validation sets in the data it loads
- Learning rate adjustable with parser args
- Batch size and dropout currently constant
- Loss functions:
  - Energy = mean_absolute_percentage_error
  - Zenith = mean_squared_error
  - Track Length = mean_squared_error (NOT optimized)
- Loads model architecture from cnn_model.py
- Functionality:
  - Can train for energy or zenith alone
    - parser arg option --variables 1 and --first_variable "zenith" or "energy"
  - Can train for energy, zenith, and/or track at the same time
    - parser arg option --variables 2 or 3
    - Variables can only be used in the order energy, then zenith, then track
  - Starts at the given epoch, runs for the number of epochs specified
    - Helps to continue training model if killed (loads weights from given model)
    - Helps to kill and reload tensorflow to avoid memory leak
  - Can plot "test", comparing to oscnext flat test sample
- Appends loss to saveloss_currentepoch.txt file in output directory
- Look at make_jobs/run_CNN/ for slurm submission examples and make_jobs_condor/run_CNN/ for HTCondor examples
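
The one-file-per-epoch rotation described above can be sketched as follows. This is a minimal illustration, not the code in CNN_LoadMultipleFiles.py: the function names, the file names, the 0-based epoch counting, and the choice to reseed the shuffle once per full pass over the files are all assumptions made for the example.

```python
import numpy as np

def file_for_epoch(epoch, num_files):
    """Index of the training file used at a given epoch (0-based, assumed)."""
    return epoch % num_files

def shuffled_indices(num_events, epoch, num_files, seed=0):
    """Permutation of event indices within a file.

    The permutation changes once per full pass over all files, so each
    revisit of a file sees its events in a different order (assumed scheme).
    """
    full_pass = epoch // num_files
    rng = np.random.default_rng(seed + full_pass)
    return rng.permutation(num_events)

def schedule(file_names, start_epoch, num_epochs):
    """(epoch, file name) pairs for a job that starts or resumes at start_epoch."""
    return [
        (epoch, file_names[file_for_epoch(epoch, len(file_names))])
        for epoch in range(start_epoch, start_epoch + num_epochs)
    ]

# Hypothetical file names; only one file would be held in memory per epoch.
files = ["train_file00.hdf5", "train_file01.hdf5", "train_file02.hdf5"]
for epoch, name in schedule(files, start_epoch=4, num_epochs=3):
    print(epoch, name)
```

Because the file index depends only on the epoch number, a job that is killed and restarted at a given epoch picks up the rotation where it left off.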

plot_loss_from_column.py - plot loss from column sorted saveloss txt file
- CNN_LoadMultipleFiles.py outputs the column sorted saveloss txt file
- File also stores the time to train per epoch, and the time for loading the data file plus training per epoch
- Order of loss, validation loss, etc. depends on the number of variables being trained for (uses dict keys to pull the correct values)
- Functionality:
  - Can set ylim with ymin and ymax parser args
  - Can specify which epoch to plot until, to shorten the x axis (parser arg)
  - Can manually change the number of files to average over and the epoch to start at
    - Set to 7 files to average over
    - Set to start plotting avg plots at epoch 49
- Outputs plots to outdir folder
  - TrainingTimePerEpoch.png
  - loss_vs_epochs.png
  - AvgLossVsEpoch.png
  - AvgRangeVsEpoch.png
- Look at make_jobs/plot_CNN/ for slurm submission examples and make_jobs_condor/plot_CNN/ for HTCondor examples
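
One way the averaged plots could be computed is sketched below. It is an assumption-laden illustration rather than the logic in plot_loss_from_column.py: it assumes non-overlapping blocks of 7 epochs (one pass over 7 files), 1-based epoch numbering, and that the "range" is the maximum minus the minimum loss within each block. The function name `block_average` is hypothetical.

```python
import numpy as np

def block_average(loss, files_to_average=7, start_epoch=49):
    """Average the loss over consecutive blocks of `files_to_average` epochs.

    loss[0] is taken to be epoch 1 (assumption). Returns the first epoch of
    each block, the block means, and the block ranges (max - min).
    Incomplete blocks at the end are dropped.
    """
    loss = np.asarray(loss, dtype=float)
    tail = loss[start_epoch - 1:]  # skip epochs before start_epoch
    n_blocks = len(tail) // files_to_average
    blocks = tail[: n_blocks * files_to_average].reshape(n_blocks, files_to_average)
    first_epochs = start_epoch + files_to_average * np.arange(n_blocks)
    means = blocks.mean(axis=1)
    ranges = blocks.max(axis=1) - blocks.min(axis=1)
    return first_epochs, means, ranges

# Toy input: the loss at epoch i is simply i, for 70 epochs.
epochs, means, ranges = block_average(np.arange(1, 71))
print(epochs, means, ranges)
```

Averaging over a full pass through the files smooths out file-to-file differences in the loss, which is presumably why the number of files is the natural averaging window.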

CNN_TestOnly.py - used for testing the CNN
- Takes in one file
  - Use make_test_file.py to make multiple files into one testonly set
  - See Processing Scripts section of README for more information
- Evaluates the network at a given model:
  - Parser arg for the directory name where the model is stored
  - Parser arg for the epoch number of the model to grab
- Can compare to old reco
  - Parser arg boolean flag --compare_reco
  - Give test name --test "PegLeg" or "Retro". Use "oscnext" for no comparison
- Needs to load in the same model as used in training (cnn_model.py)
- Creates many plots and outputs to model directory, with subfolder that has the test name and epoch number (gives ability to perform multiple test types on multiple epoch stages)
- Look at make_jobs/plot_CNN/ for slurm submission examples and make_jobs_condor/plot_CNN/ for HTCondor examples
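
The test-name handling and per-test output subfolder described above could look roughly like this. The test names come from the description; the helper names and the exact subfolder naming pattern are hypothetical and may differ from what CNN_TestOnly.py actually writes.

```python
import os

# Test names from the description; "oscnext" means no comparison to an old reco.
RECO_TESTS = ("PegLeg", "Retro")
ALL_TESTS = RECO_TESTS + ("oscnext",)

def compares_to_reco(test_name):
    """True when the chosen test implies a comparison to an old reconstruction."""
    return test_name in RECO_TESTS

def output_subfolder(model_dir, test_name, epoch):
    """Build a subfolder path that encodes the test name and epoch number.

    The "<test>_<epoch>epochs" pattern is an assumed naming scheme.
    """
    if test_name not in ALL_TESTS:
        raise ValueError(f"unknown test name: {test_name}")
    return os.path.join(model_dir, f"{test_name}_{epoch}epochs")

# Hypothetical model directory name.
print(output_subfolder("my_model", "PegLeg", 120))
print(compares_to_reco("oscnext"))
```

Keeping one subfolder per test name and epoch means plots from different tests, or from different training stages of the same model, do not overwrite each other.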

FLERCNN by J. Micallef
