Start Date | End Date |
---|---|
2020-04-25 | 2020-04-26 |
Implemented binary prediction for the existing multi-class attention-gated U-Net, i.e. the U-Net can now also output a single binary channel.
- Dice loss with a binary channel (a minimal sketch follows this list)
- Visualisation, prediction and plotting for binary channels
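A minimal sketch of a Dice loss for the single binary output channel, assuming a sigmoid-activated logit map and a binary ground-truth mask; the function name and the smoothing constant `eps` are illustrative, not necessarily the repository's implementation:

```python
import torch


def binary_dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for a single binary output channel.

    logits: (N, 1, H, W[, D]) raw network outputs
    target: same shape, binary ground-truth mask
    """
    probs = torch.sigmoid(logits)
    dims = tuple(range(1, probs.dim()))          # reduce over all but the batch dim
    intersection = (probs * target).sum(dim=dims)
    denominator = probs.sum(dim=dims) + target.sum(dim=dims)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice.mean()                     # loss = 1 - soft Dice
```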
(Figures: results with 2 output channels vs. 1 output channel)
- The U-Net is correctly implemented
- Multi-channel prediction remains superior in terms of convergence
Start Date | End Date |
---|---|
2020-04-26 | 2020-04-26 |
Implemented the focal Tversky (FT) loss function with multi-channel output.
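A minimal sketch of a focal Tversky loss for a softmax multi-channel output, in the style of Abraham et al.; the default values of `alpha`, `beta` and `gamma` are illustrative and may differ from the repository's implementation:

```python
import torch


def focal_tversky_loss(logits, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    """Focal Tversky loss averaged over output channels.

    logits: (N, C, ...) raw network outputs
    target: (N, C, ...) one-hot ground truth
    alpha/beta weight false negatives/positives; gamma focuses on hard examples.
    """
    probs = torch.softmax(logits, dim=1)
    dims = (0,) + tuple(range(2, probs.dim()))   # reduce over batch and spatial dims
    tp = (probs * target).sum(dim=dims)
    fn = ((1 - probs) * target).sum(dim=dims)
    fp = (probs * (1 - target)).sum(dim=dims)
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return torch.pow(1.0 - tversky, gamma).mean()
```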
(Figures: FT loss over time; Dice over time with FT as loss)
- The focal Tversky loss performs well
- Overall performance is similar to the Dice loss with 2 output channels
- Maybe more accurate on small segments
- Maybe more prone to overfitting
Start Date | End Date |
---|---|
2020-06-01 | 2020-06-30 |
Contributor: Quentin Uhl
Implemented image-wise standardisation to obtain zero mean and unit standard deviation.
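A minimal sketch of the image-wise standardisation, assuming it is applied per image/volume before augmentation; the function name and `eps` guard are illustrative:

```python
import numpy as np


def standardise(volume, eps=1e-8):
    """Rescale a single image/volume to zero mean and unit standard deviation."""
    volume = volume.astype(np.float32)
    return (volume - volume.mean()) / (volume.std() + eps)
```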
Dice without standardisation | Dice with standardisation |
---|---|
Best Validation Dice: 0.211099805 | Best Validation Dice: 0.222637893 |
- Standardisation seems to improve performance slightly.
- However, stability (and resistance to overfitting?) seems to be worse.
Start Date | End Date |
---|---|
2020-07-10 | 2020-07-13 |
Tested the Adam optimizer and the "Plateau" learning rate adaptation policy, as this combination has been reported to perform particularly well with our architecture on the MLEBE dataset. We thus compare stochastic gradient descent (SGD) with Adam as optimizer, as well as a "Step" vs. a "Plateau" learning rate adaptation strategy.
Definitions of the learning rate strategies:

```python
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau

# Step: halve the learning rate every 250 scheduler steps (typically epochs)
StepLR(optimizer, step_size=250, gamma=0.5)

# Plateau: reduce the learning rate by 10x once the monitored metric stops improving
ReduceLROnPlateau(optimizer, mode='min', factor=0.1, threshold=0.01, patience=5)
```
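The two schedulers are also stepped differently in the training loop; a minimal, self-contained sketch (the model, learning rates and `val_loss` value are placeholders, not the actual training configuration):

```python
import torch
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau

model = torch.nn.Conv2d(1, 2, kernel_size=3)     # placeholder model

# The two optimizers under comparison (learning rates are illustrative).
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
adam = torch.optim.Adam(model.parameters(), lr=1e-3)

step_scheduler = StepLR(sgd, step_size=250, gamma=0.5)
plateau_scheduler = ReduceLROnPlateau(adam, mode='min', factor=0.1,
                                      threshold=0.01, patience=5)

# Once per epoch (after optimizer.step() in the real training loop):
val_loss = 0.5                                   # placeholder validation loss
step_scheduler.step()                            # Step: advances on a fixed schedule
plateau_scheduler.step(val_loss)                 # Plateau: needs the monitored metric
```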
- SGD with the LR step policy still seems to perform best on the GSD dataset.
- Adam seems to train faster.
- The LR plateau policy seems to offer more stability for Adam.
- The LR plateau policy performs very poorly with SGD.
As future architectural changes might change the outcome of this comparison, SGD with the LR step policy vs. Adam with the plateau policy should be included in a future hyperparameter optimisation.
Start Date | End Date |
---|---|
2020-07-25 | 2020-07-26 |
Evaluated weight decay to reduce overfitting.
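Weight decay is controlled via the optimizer's `weight_decay` argument (the λ in the table below); a minimal sketch, assuming SGD and an illustrative learning rate:

```python
import torch

model = torch.nn.Conv2d(1, 2, kernel_size=3)     # placeholder model

# L2 weight decay with lambda = 1e-2 (the middle value in the table below).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-2)
```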
Model | λ = 10⁻⁴ | λ = 10⁻² | λ = 10⁻¹ |
---|---|---|---|
Dice Loss | | | |
Best validation class 1 Dice | 0.224319922 | 0.22760875 | 0.200507353 |
Best validation epoch | 120 | 219 | 99 |
- Adjusting weight decay is effective in reducing overfitting.
- However, weight decay slows down overall learning.
- At λ = 10⁻¹, weight decay begins to hurt performance.
- Weight decay should be included in a future grid search.
Start Date | End Date |
---|---|
2020-07-17 | 2020-07-31 |
Inspired by the work of Yu et al., we evaluated a combined loss (a minimal sketch is given below) defined as follows:
Loss = weighted binary cross entropy + L1 loss + 0.5 × (1 − DSC) + 0.25 × volume loss
Class-wise implementations:
- Single class: computed only for stroke presence
- Multi-class: mean computed over all classes (i.e. stroke presence and absence)
- The volume loss is not differentiable (as max/thresholding is not differentiable) and therefore has to be combined with another loss.
- The high spikes in the single-class volume loss are due to augmented training volumes with no visible lesion.
Original work by Salehi et al. and Abraham et al.
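A minimal single-class sketch of this combined loss, assuming a sigmoid output channel; the `pos_weight` argument, the threshold of 0.5 and the normalisation of the volume term by the ground-truth volume are assumptions, not necessarily the implementation used here:

```python
import torch
import torch.nn.functional as F


def combined_loss(logits, target, pos_weight=None, threshold=0.5, eps=1e-6):
    """Combined loss: weighted BCE + L1 + 0.5 * (1 - DSC) + 0.25 * volume loss.

    logits, target: (N, 1, ...) single-channel prediction and binary ground truth.
    """
    probs = torch.sigmoid(logits)
    dims = tuple(range(1, probs.dim()))

    # Weighted binary cross entropy on the raw logits.
    bce = F.binary_cross_entropy_with_logits(logits, target, pos_weight=pos_weight)

    # L1 distance between probabilities and ground truth.
    l1 = F.l1_loss(probs, target)

    # Soft Dice term.
    intersection = (probs * target).sum(dim=dims)
    dsc = (2 * intersection + eps) / (probs.sum(dim=dims) + target.sum(dim=dims) + eps)
    dice_term = (1 - dsc).mean()

    # Volume term: absolute difference of thresholded lesion volumes, here normalised
    # by the ground-truth volume. The threshold blocks gradients, which is why this
    # term has to be combined with the differentiable terms above.
    pred_vol = (probs > threshold).float().sum(dim=dims)
    true_vol = target.sum(dim=dims)
    volume_term = (torch.abs(pred_vol - true_vol) / (true_vol + eps)).mean()

    return bce + l1 + 0.5 * dice_term + 0.25 * volume_term
```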
- The combined loss with single-class volume loss seems to converge slightly faster.
- The combined loss yields worse validation results than the Dice loss alone, although probably with a smaller standard deviation.
- Implement augmentation
- Implement combined loss