Training and Testing Scripts
Jessie Micallef edited this page May 22, 2020
- Submit `CNN_LoadMultipleFiles.py` to train for many epochs (100s)
- Check progress using `plot_loss_from_column.py` during training
- Use `CNN_TestOnly.py` to check results at any time (using oscnext test)
  - When to check results: if validation curve leveling off or want to check results at specific epoch
- Save PegLeg or Retro test once you have settled on final model
- `CNN_LoadMultipleFiles.py` - used for training the CNN
  - Takes in multiple training files (of a certain file pattern), loads one and trains for an epoch before loading the next file for the next epoch
    - Makes sure not too much data is stored at once (~30G)
    - Shuffles within the file between full file pass sets
    - Expects a train, test, validate set to load data in
  - Learning rate adjustable with parser args
  - Batch size and dropout currently constant
  - Loss functions:
    - Energy = mean_absolute_percentage_error
    - Zenith = mean_squared_error
    - Track Length = mean_squared_error (NOT optimized)
  - Loads model architecture from `cnn_model.py`
  - Functionality:
    - Can train for energy or zenith alone
      - parser arg option `--variables 1` and `--first_variable "zenith"` or `"energy"`
    - Can train for energy, zenith, and/or track at the same time
      - parser arg option `--variables 2` or `3`
      - Can only do order energy then zenith then track
    - Starts at the given epoch, runs for the number of epochs specified
      - Helps to continue training model if killed (loads weights from given model)
      - Helps to kill and reload tensorflow to avoid memory leak
    - Can plot "test", comparing to oscnext flat test sample
  - Appends loss to `saveloss_currentepoch.txt` file in output directory
  - Look at `make_jobs/run_CNN/` for slurm submission examples and `make_jobs_condor/run_CNN/` for HTCondor examples
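
As a rough illustration of the file rotation described above (not the script's actual code), the sketch below picks one training file per epoch by cycling through the list, and changes the in-file shuffle only after a full pass over all files. The file names, the zero-based epoch numbering, the seed handling, and the helper names `schedule` and `shuffled_indices` are all assumptions made for this example.

```python
import random


def schedule(files, start_epoch, num_epochs):
    """Yield (epoch, filename, pass_number) for each epoch to run.

    Assumes epochs count from 0 and one file is used per epoch.
    """
    for epoch in range(start_epoch, start_epoch + num_epochs):
        file_index = epoch % len(files)    # cycle through the files
        pass_number = epoch // len(files)  # completed full passes so far
        yield epoch, files[file_index], pass_number


def shuffled_indices(n_events, pass_number, seed=0):
    """Event order within one file; it changes only when pass_number changes."""
    order = list(range(n_events))
    random.Random(seed + pass_number).shuffle(order)
    return order


# Hypothetical file names, for illustration only.
files = ["train_file00.hdf5", "train_file01.hdf5", "train_file02.hdf5"]

# Restarting at epoch 4 for 4 more epochs, e.g. after a job was killed.
plan = list(schedule(files, start_epoch=4, num_epochs=4))
for epoch, name, pass_number in plan:
    print(epoch, name, pass_number)
```

Because the file is chosen from the epoch number alone, a restarted job picks up the rotation where the previous one stopped, which matches the start-epoch behaviour listed under Functionality.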
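
The variable and loss options above can be summarised in a small sketch. It uses only the two flags named on this page (`--variables`, `--first_variable`); the defaults, the `choices` restrictions, and the helper `select_outputs` are assumptions for illustration. The two loss functions are written in plain Python following the standard definitions of the names listed above (assumed here to be the Keras losses of the same name), so the arithmetic is visible.

```python
import argparse

# Fixed output order stated above: energy, then zenith, then track.
ORDER = ["energy", "zenith", "track"]
LOSS_NAMES = {
    "energy": "mean_absolute_percentage_error",
    "zenith": "mean_squared_error",
    "track": "mean_squared_error",  # noted above as NOT optimized
}


def select_outputs(variables, first_variable="energy"):
    """Return the list of outputs trained for a given flag combination."""
    if variables == 1:
        if first_variable not in ("energy", "zenith"):
            raise ValueError("first_variable must be 'energy' or 'zenith'")
        return [first_variable]
    if variables in (2, 3):
        return ORDER[:variables]
    raise ValueError("variables must be 1, 2, or 3")


def mean_absolute_percentage_error(y_true, y_pred, eps=1e-7):
    """100 * mean(|true - pred| / |true|), guarding against division by zero."""
    terms = [abs(t - p) / max(abs(t), eps) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(terms) / len(terms)


def mean_squared_error(y_true, y_pred):
    """mean((true - pred)^2)."""
    terms = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


parser = argparse.ArgumentParser()
# Defaults and choices below are assumptions, not taken from the script.
parser.add_argument("--variables", type=int, default=1, choices=[1, 2, 3])
parser.add_argument("--first_variable", default="energy",
                    choices=["energy", "zenith"])

args = parser.parse_args(["--variables", "2"])
outputs = select_outputs(args.variables, args.first_variable)
for name in outputs:
    print(name, LOSS_NAMES[name])
```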
- `plot_loss_from_column.py` - plots loss from the column-sorted saveloss txt file
  - `CNN_LoadMultipleFiles.py` outputs the column-sorted saveloss txt file
    - File also stores time to train per epoch and per loading data file + training per epoch
    - Order of loss, validation loss, etc. depends on the number of variables being trained for (uses dict keys to pull correct values)
  - Functionality:
    - Can give ylim as ymin and ymax, parser args
    - Can specify which epoch to plot until, to shorten x axis (parser arg)
    - Can manually change the number of files to average over and the epoch to start at
      - Set to 7 files to average over
      - Set to start plotting avg plots at epoch 49
  - Outputs plots to outdir folder
    - `TrainingTimePerEpoch.png`
    - `loss_vs_epochs.png`
    - `AvgLossVsEpoch.png`
    - `AvgRangeVsEpoch.png`
  - Look at `make_jobs/plot_CNN/` for slurm submission examples and `make_jobs_condor/plot_CNN/` for HTCondor examples
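
To make the averaging settings above concrete, here is a minimal sketch of one plausible reading: a rolling mean of the per-epoch loss over a window of 7 epochs (one epoch per file), reported only from epoch 49 onward. Whether the script uses a rolling or a block average is not stated on this page, so the approach, the one-based epoch labels, and the function names `rolling_mean` and `averaged_points` are assumptions.

```python
def rolling_mean(losses, window=7):
    """Mean of each run of `window` consecutive loss values."""
    means = []
    for end in range(window, len(losses) + 1):
        chunk = losses[end - window:end]
        means.append(sum(chunk) / window)
    return means


def averaged_points(losses, window=7, start_epoch=49):
    """Return (epoch, mean) pairs for epochs >= start_epoch.

    Epochs are counted from 1 here, and each mean is labelled with the
    last epoch in its window.
    """
    means = rolling_mean(losses, window)
    points = [(i + window, m) for i, m in enumerate(means)]
    return [(epoch, m) for epoch, m in points if epoch >= start_epoch]


# Made-up loss curve for 60 epochs, for illustration only.
losses = [1.0 / epoch for epoch in range(1, 61)]
points = averaged_points(losses)
print(points[0][0], len(points))
```

Averaging over as many epochs as there are training files smooths out file-to-file differences in the loss, since each window then covers one full pass over the data.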
- `CNN_TestOnly.py` - used for testing the CNN
  - Takes in one file
    - Use `make_test_file.py` to make multiple files into one `testonly` set
      - See Processing Scripts section of README for more information
  - Evaluates network at given model:
    - Parser arg the directory name where model is stored
    - Parser arg the epoch number of the model to grab
  - Can compare to old reco
    - Parser arg boolean flag `--compare_reco`
      - Give test name `--test PegLeg` or "Retro". Use "oscnext" for no comparison
  - Need to load in same model as training (`cnn_model.py`)
  - Creates many plots and outputs to model directory, with subfolder that has the test name and epoch number (gives ability to perform multiple test types on multiple epoch stages)
  - Look at `make_jobs/plot_CNN/` for slurm submission examples and `make_jobs_condor/plot_CNN/` for HTCondor examples
FLERCNN by J. Micallef