Lab 11 References: Generalization

- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
- Train longer, generalize better: closing the generalization gap in large batch training of neural networks
- Sharpness-Aware Minimization for Efficiently Improving Generalization
- SAM GitHub

Benchmarks:

- CIFAR-100: https://paperswithcode.com/sota/image-classification-on-cifar-100
- CIFAR-100-Noisy: https://paperswithcode.com/sota/learning-with-noisy-labels-on-cifar-100n
- 94% on CIFAR-10 in 3.29 Seconds on a Single GPU

- Mamba: GitHub, https://arxiv.org/abs/2405.21060
- xLSTM: GitHub, https://arxiv.org/abs/2405.04517