Project journal

Implementing binary single output channel

| Start Date | End Date   |
|------------|------------|
| 2020-04-25 | 2020-04-26 |

Description

Implemented binary prediction for the existing multi-class attention-gated U-Net, i.e. the U-Net can now also output a single binary channel.
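A minimal sketch of what the single-channel setup amounts to; the layer sizes, tensor shapes (2D case shown) and the exact Dice formulation are illustrative, not necessarily those used in this repository:

```python
import torch
import torch.nn as nn

def binary_dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for a single-channel (binary) output."""
    # logits: raw network output of shape (N, 1, H, W); target: binary mask of the same shape
    probs = torch.sigmoid(logits)
    dims = tuple(range(1, logits.dim()))
    intersection = (probs * target).sum(dim=dims)
    union = probs.sum(dim=dims) + target.sum(dim=dims)
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()

# Single binary output channel instead of one channel per class
# (the in_channels value is a placeholder for the last decoder feature depth):
final_conv = nn.Conv2d(in_channels=64, out_channels=1, kernel_size=1)
```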

Deliverables

  • Dice loss with binary channel
  • Visualisation, prediction and plotting for binary channels
*(Figures: Dice loss over training for the 2-output-channel vs. the 1-output-channel model.)*

Conclusion

  • The U-Net is correctly implemented
  • Multi-channel prediction remains superior in terms of convergence

Focal Tversky loss

| Start Date | End Date   |
|------------|------------|
| 2020-04-26 | 2020-04-26 |

Implemented focal Tversky (FT) loss function with multi-channel output.
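A minimal PyTorch sketch of a focal Tversky loss in the spirit of Abraham et al.; the α, β, γ defaults and the reduction are assumptions, not necessarily the values used in these experiments:

```python
import torch

def focal_tversky_loss(probs, target, alpha=0.7, beta=0.3, gamma=4 / 3, eps=1e-6):
    """probs/target: (N, C, ...) class probabilities and one-hot ground truth."""
    dims = tuple(range(2, probs.dim()))            # sum over spatial dimensions
    tp = (probs * target).sum(dim=dims)            # true positives per class
    fn = ((1 - probs) * target).sum(dim=dims)      # false negatives, weighted by alpha
    fp = (probs * (1 - target)).sum(dim=dims)      # false positives, weighted by beta
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return ((1 - tversky) ** (1 / gamma)).mean()   # focal exponent 1/gamma
```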

*(Figures: FT loss over time and Dice over time with FT as the training loss.)*

Conclusion

  • Focal Tversky loss performs well
  • Overall performance is similar to Dice loss with 2 output channels
  • Possibly more accurate on small segments
  • Possibly more prone to overfitting

Standardisation

| Start Date | End Date   |
|------------|------------|
| 2020-06-01 | 2020-06-30 |

Contributor: Quentin Uhl

Implemented image-wise standardisation to obtain zero mean and unit standard deviation.
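A minimal sketch of what image-wise standardisation amounts to (the function name is illustrative):

```python
import numpy as np

def standardise(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Rescale a single image to zero mean and unit standard deviation."""
    return (image - image.mean()) / (image.std() + eps)
```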

| Dice without standardisation | Dice with standardisation |
|------------------------------|---------------------------|
| Best Validation Dice: 0.211099805 | Best Validation Dice: 0.222637893 |

*(Dice-over-time plots for both runs not shown.)*

Conclusion

  • Standardisation seems to improve performance slightly.
  • However, training stability (and possibly resistance to overfitting) seems to be worse.

Optimisation algorithm

| Start Date | End Date   |
|------------|------------|
| 2020-07-10 | 2020-07-13 |

Tested the Adam optimizer and the "Plateau" learning-rate adaptation policy, as this combination has been reported to perform particularly well with our architecture on the MLEBE dataset. We thus compare stochastic gradient descent (SGD) with Adam as optimizer, as well as a "Step" vs. a "Plateau" learning-rate adaptation strategy.

Definitions of learning rate strategies:

```python
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau

# 'optimizer' is the optimizer instance under test (SGD or Adam, see below)
# Step: halve the learning rate every 250 scheduler steps
StepLR(optimizer, step_size=250, gamma=0.5)
# Plateau: divide the learning rate by 10 once the monitored metric stops improving
ReduceLROnPlateau(optimizer, mode='min', factor=0.1, threshold=0.01, patience=5)
```
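For completeness, the two optimizers under comparison could be constructed roughly as follows; the learning-rate and momentum values are placeholders, not the settings used in these runs:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 2, kernel_size=3)  # stand-in for the attention-gated U-Net

# SGD vs. Adam, each to be paired with one of the schedulers above
sgd_optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
adam_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```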
| Optimizer & LR policy | SGD | Adam |
|-----------------------|-----|------|
| Step LR | Best Validation Dice: 0.222637893 (epoch 186) | Best Validation Dice: 0.201341671 (epoch 115) |
| Reduce LR on plateau | Best Validation Dice: 0.080664843 (epoch 67) | Best Validation Dice: 0.20092963 (epoch 123) |

*(Training curves for the four runs not shown.)*

Conclusion

  • SGD with the LR step policy still seems to perform best on the GSD dataset.
  • Adam seems to train faster.
  • The LR plateau policy seems to offer more stability for Adam.
  • The LR plateau policy performs terribly with SGD.

As future architectural changes might change the outcome of this comparison, SGD with LR step vs. Adam with LR plateau should be included in a future hyperparameter optimisation.

Weight decay

| Start Date | End Date   |
|------------|------------|
| 2020-07-25 | 2020-07-26 |

Evaluated weight decay to reduce overfitting.
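In PyTorch, the weight-decay strength λ is passed directly to the optimizer; the model, learning rate and momentum below are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 2, kernel_size=3)  # stand-in for the segmentation network

# L2 weight decay with lambda = 1e-2 (one of the values evaluated below)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-2)
```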

|  | λ = 10⁻⁴ | λ = 10⁻² | λ = 10⁻¹ |
|---|---------|---------|---------|
| Best validation class 1 Dice | 0.224319922 | 0.22760875 | 0.200507353 |
| Best validation epoch | 120 | 219 | 99 |

*(Dice-loss curves for each λ not shown.)*

Conclusion

  • Adjusting weight decay is effective in reducing overfitting.
  • However, weight decay slows down overall learning.
  • At λ = 10⁻¹, weight decay begins to hurt performance.
  • Weight decay should be integrated in a future grid search.

Combined loss

| Start Date | End Date   |
|------------|------------|
| 2020-07-17 | 2020-07-31 |

Inspired by the work by Yu et al., we evaluated a combined loss defined as follows:

Loss = Weighted binary cross-entropy + L1 loss + 0.5 × (1 − DSC) + 0.25 × Volume loss
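As a sketch, once the individual terms are available as scalar tensors (computed as outlined in the subsections below), the combination is a plain weighted sum:

```python
def combined_loss(wbce, l1, dsc, volume):
    """WBCE + L1 + 0.5 * (1 - DSC) + 0.25 * volume loss, all terms pre-computed."""
    return wbce + l1 + 0.5 * (1 - dsc) + 0.25 * volume
```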

Losses

Class-wise implementations:

  • Single class: computed only for the stroke-presence channel
  • Multi class: mean computed over all classes (i.e. stroke presence and absence)

Weighted binary cross-entropy loss

*(Figures: WBCE loss definition; R0 and R1 weights.)*
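A possible sketch of a weighted binary cross-entropy; the per-voxel weights for the two classes (the w_lesion/w_background values below) are placeholders and are presumably derived from the class imbalance in practice:

```python
import torch
import torch.nn.functional as F

def weighted_bce_loss(logits, target, w_lesion=10.0, w_background=1.0):
    """Binary cross-entropy with different weights for lesion and background voxels."""
    weights = torch.where(target > 0.5,
                          torch.full_like(target, w_lesion),
                          torch.full_like(target, w_background))
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)
```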

Volume loss


  • The volume loss is not differentiable (as max/threshold operations are not differentiable) and thus has to be combined with another loss.
  • The high spikes in the single-class volume loss are due to augmented training volumes with no visible lesion.
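A rough sketch of a volume term consistent with the remarks above (threshold-based, hence not differentiable on its own); the exact definition and normalisation used in the project may differ:

```python
import torch

def volume_loss(probs, target, threshold=0.5, eps=1e-6):
    """Relative difference between predicted and ground-truth lesion volume (in voxels)."""
    pred_volume = (probs > threshold).float().sum()
    true_volume = target.float().sum()
    return torch.abs(pred_volume - true_volume) / (true_volume + eps)
```

With a normalisation like this, an augmented volume containing no visible lesion drives the denominator towards eps, which would explain the spikes mentioned above.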

L1 loss


Tversky and focal Tversky loss

Original work by Salehi et al. and Abraham et al.

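As a reference, the standard formulations from Salehi et al. (Tversky index) and Abraham et al. (focal Tversky loss) are roughly as follows, where p_ic is the predicted probability and g_ic the ground truth for voxel i and class c; the exact α, β and γ used here are not restated in this journal:

```latex
\[
TI_c = \frac{\sum_i p_{ic}\, g_{ic}}
            {\sum_i p_{ic}\, g_{ic} + \alpha \sum_i (1 - p_{ic})\, g_{ic} + \beta \sum_i p_{ic}\, (1 - g_{ic})}
\qquad
\mathrm{FTL} = \sum_c \left(1 - TI_c\right)^{1/\gamma}
\]
```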

| Loss | Best validation Dice | Best validation epoch |
|------|----------------------|-----------------------|
| Multi class Dice loss | 0.222637893 | 186 |
| Single class Dice loss | 0.213009384 | 301 |
| L1 loss | 0.005964876 | 14 |
| WBCE loss | 0.127579561 | 172 |
| Single class Volume + Dice loss | 0.210912979 | 176 |
| Combined loss | 0.20187068 | 106 |
| Multi class Volume + Dice loss | 0.207735219 | 182 |
| Multi class focal Tversky loss | 0.209134249 | 251 |
| Single class focal Tversky loss | 0.225328473 | 201 |

*(Loss-evolution and Dice-over-time plots for each run not shown.)*

Conclusion

  • The combined loss with single-class volume loss seems to converge slightly faster.
  • The combined loss yields worse validation results than the Dice loss alone, though probably with a smaller standard deviation.

TODO

  • Implement augmentation
  • Implement combined loss