Start Date | End Date |
---|---|
2020-04-25 | 2020-04-26 |
Implemented binary prediction for the existing multi-class attention-gated U-Net, i.e. the U-Net can now also output a single binary channel.
- Dice loss with a binary channel (a minimal sketch follows this list)
- Visualisation, prediction and plotting for binary channels
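A minimal sketch of a Dice loss for the single binary output channel, assuming a sigmoid-activated logit map and a binary ground-truth mask; the function name and the smoothing constant `eps` are illustrative, not necessarily the repository's implementation:

```python
import torch


def binary_dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for a single binary output channel.

    logits: (N, 1, H, W[, D]) raw network outputs
    target: same shape, binary ground-truth mask
    """
    probs = torch.sigmoid(logits)
    dims = tuple(range(1, probs.dim()))          # reduce over all but the batch dim
    intersection = (probs * target).sum(dim=dims)
    denominator = probs.sum(dim=dims) + target.sum(dim=dims)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice.mean()                     # loss = 1 - soft Dice
```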
(Figures: results with 2 output channels vs. 1 output channel)
- The U-Net is correctly implemented
- Multi-channel prediction remains superior in terms of convergence
Start Date | End Date |
---|---|
2020-04-26 | 2020-04-26 |
Implemented the focal Tversky (FT) loss function with multi-channel output.
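A minimal sketch of a focal Tversky loss for a softmax multi-channel output, in the style of Abraham et al.; the default values of `alpha`, `beta` and `gamma` are illustrative and may differ from the repository's implementation:

```python
import torch


def focal_tversky_loss(logits, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    """Focal Tversky loss averaged over output channels.

    logits: (N, C, ...) raw network outputs
    target: (N, C, ...) one-hot ground truth
    alpha/beta weight false negatives/positives; gamma focuses on hard examples.
    """
    probs = torch.softmax(logits, dim=1)
    dims = (0,) + tuple(range(2, probs.dim()))   # reduce over batch and spatial dims
    tp = (probs * target).sum(dim=dims)
    fn = ((1 - probs) * target).sum(dim=dims)
    fp = (probs * (1 - target)).sum(dim=dims)
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return torch.pow(1.0 - tversky, gamma).mean()
```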
(Figures: FT loss over time; Dice over time with FT as loss)
- The focal Tversky loss performs well
- Overall performance is similar to the Dice loss with 2 output channels
- Maybe more accurate on small segments
- Maybe more prone to overfitting
Start Date | End Date |
---|---|
2020-06-01 | 2020-06-30 |
Contributor: Quentin Uhl
Implemented image-wise standardisation to obtain zero mean and unit standard deviation.
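A minimal sketch of the image-wise standardisation, assuming it is applied per image/volume before augmentation; the function name and `eps` guard are illustrative:

```python
import numpy as np


def standardise(volume, eps=1e-8):
    """Rescale a single image/volume to zero mean and unit standard deviation."""
    volume = volume.astype(np.float32)
    return (volume - volume.mean()) / (volume.std() + eps)
```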
Dice without standardisation | Dice with standardisation |
---|---|
Best Validation Dice: 0.211099805 | Best Validation Dice: 0.222637893 |
- Standardisation seems to improve performance slightly.
- However, stability (and resistance to overfitting?) seems to be worse.
Start Date | End Date |
---|---|
2020-07-10 | 2020-07-13 |
Tested the Adam optimizer and the "Plateau" learning rate adaptation policy, as this combination has been reported to perform particularly well with our architecture on the MLEBE dataset. We thus compare stochastic gradient descent (SGD) with Adam as optimizer, as well as a "Step" vs. a "Plateau" learning rate adaptation strategy.
Definitions of the learning rate strategies:

```python
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau

# Step: halve the learning rate every 250 scheduler steps (typically epochs)
StepLR(optimizer, step_size=250, gamma=0.5)

# Plateau: reduce the learning rate by 10x once the monitored metric stops improving
ReduceLROnPlateau(optimizer, mode='min', factor=0.1, threshold=0.01, patience=5)
```
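The two schedulers are also stepped differently in the training loop; a minimal, self-contained sketch (the model, learning rates and `val_loss` value are placeholders, not the actual training configuration):

```python
import torch
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau

model = torch.nn.Conv2d(1, 2, kernel_size=3)     # placeholder model

# The two optimizers under comparison (learning rates are illustrative).
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
adam = torch.optim.Adam(model.parameters(), lr=1e-3)

step_scheduler = StepLR(sgd, step_size=250, gamma=0.5)
plateau_scheduler = ReduceLROnPlateau(adam, mode='min', factor=0.1,
                                      threshold=0.01, patience=5)

# Once per epoch (after optimizer.step() in the real training loop):
val_loss = 0.5                                   # placeholder validation loss
step_scheduler.step()                            # Step: advances on a fixed schedule
plateau_scheduler.step(val_loss)                 # Plateau: needs the monitored metric
```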
- SGD with the LR step policy still seems to perform best on the GSD dataset.
- Adam seems to train faster.
- The LR plateau policy seems to offer more stability for Adam.
- The LR plateau policy performs very poorly with SGD.
As future architectural changes might change the outcome of this comparison, SGD with the LR step policy vs. Adam with the plateau policy should be included in a future hyperparameter optimisation.
Start Date | End Date |
---|---|
2020-07-25 | 2020-07-26 |
Evaluated weight decay to reduce overfitting.
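Weight decay is controlled via the optimizer's `weight_decay` argument (the λ in the table below); a minimal sketch, assuming SGD and an illustrative learning rate:

```python
import torch

model = torch.nn.Conv2d(1, 2, kernel_size=3)     # placeholder model

# L2 weight decay with lambda = 1e-2 (the middle value in the table below).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-2)
```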
Model | λ = 10⁻⁴ | λ = 10⁻² | λ = 10⁻¹ |
---|---|---|---|
Dice Loss | | | |
Best validation class 1 Dice | 0.224319922 | 0.22760875 | 0.200507353 |
Best validation epoch | 120 | 219 | 99 |
- Adjusting weight decay is effective in reducing overfitting.
- However, weight decay slows down overall learning.
- At λ = 10⁻¹, weight decay begins to hurt performance.
- Weight decay should be included in a future grid search.
Start Date | End Date |
---|---|
2020-07-17 | 2020-07-31 |
Inspired by the work of Yu et al., we evaluated a combined loss (a minimal sketch is given below) defined as follows:
Loss = weighted binary cross entropy + L1 loss + 0.5 × (1 − DSC) + 0.25 × volume loss
Class-wise implementations:
- Single class: computed only for stroke presence
- Multi-class: mean computed over all classes (i.e. stroke presence and absence)
- The volume loss is not differentiable (as max/thresholding is not differentiable) and therefore has to be combined with another loss.
- The high spikes in the single-class volume loss are due to augmented training volumes with no visible lesion.
Original work by Salehi et al. and Abraham et al.
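A minimal single-class sketch of this combined loss, assuming a sigmoid output channel; the `pos_weight` argument, the threshold of 0.5 and the normalisation of the volume term by the ground-truth volume are assumptions, not necessarily the implementation used here:

```python
import torch
import torch.nn.functional as F


def combined_loss(logits, target, pos_weight=None, threshold=0.5, eps=1e-6):
    """Combined loss: weighted BCE + L1 + 0.5 * (1 - DSC) + 0.25 * volume loss.

    logits, target: (N, 1, ...) single-channel prediction and binary ground truth.
    """
    probs = torch.sigmoid(logits)
    dims = tuple(range(1, probs.dim()))

    # Weighted binary cross entropy on the raw logits.
    bce = F.binary_cross_entropy_with_logits(logits, target, pos_weight=pos_weight)

    # L1 distance between probabilities and ground truth.
    l1 = F.l1_loss(probs, target)

    # Soft Dice term.
    intersection = (probs * target).sum(dim=dims)
    dsc = (2 * intersection + eps) / (probs.sum(dim=dims) + target.sum(dim=dims) + eps)
    dice_term = (1 - dsc).mean()

    # Volume term: absolute difference of thresholded lesion volumes, here normalised
    # by the ground-truth volume. The threshold blocks gradients, which is why this
    # term has to be combined with the differentiable terms above.
    pred_vol = (probs > threshold).float().sum(dim=dims)
    true_vol = target.sum(dim=dims)
    volume_term = (torch.abs(pred_vol - true_vol) / (true_vol + eps)).mean()

    return bce + l1 + 0.5 * dice_term + 0.25 * volume_term
```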
- The combined loss with single-class volume loss seems to converge slightly faster.
- The combined loss yields worse validation results than the Dice loss alone, although probably with a smaller standard deviation.
- Implement augmentation
- Implement combined loss