- Writer: Dongmin You
- Title: (cs231n) Lecture 9 : CNN Architectures
- Link: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture9.pdf
- Keywords: AlexNet, VGGNet, GoogLeNet, ResNet, Network in Network, Wide ResNet, ResNeXt, Stochastic Depth, DenseNet, FractalNet, SqueezeNet
*Details (AlexNet) *First use of ReLU *Network spread across 2 GPUs, with half the neurons on each GPU because of limited GPU memory (the GPUs communicate only at conv3 and the FC layers)
- VGGNet : small filters (3x3 conv), deeper networks
- Why use small filters? -> A stack of three 3x3 conv (stride 1) layers has the same "effective receptive field" as one 7x7 conv layer, but is deeper, has more non-linearities, and uses fewer parameters : 3 * (3^2 * C^2) = 27C^2 vs 7^2 * C^2 = 49C^2 for C channels per layer
- Problems : heavy computation and a very large number of parameters
*Details *FC7 features generalize well to other tasks
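The small-filter argument can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the lecture: the helper names and the channel count `C = 64` are assumptions for illustration, and biases are ignored.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Effective receptive field of `num_layers` stacked stride-1 convs."""
    # Each extra stride-1 layer widens the field by (kernel_size - 1).
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels, num_layers=1):
    """Weights (no biases) when every layer maps `channels` -> `channels`."""
    return num_layers * kernel_size * kernel_size * channels * channels


C = 64  # assumed channel count, for illustration only
print(stacked_receptive_field(3, 3))     # 7, same as a single 7x7 conv
print(conv_weights(3, C, num_layers=3))  # 27 * C^2
print(conv_weights(7, C))                # 49 * C^2
```

The ratio 27 : 49 does not depend on `C`, as long as every layer keeps the same number of input and output channels.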
- GoogLeNet : computational efficiency, Inception modules, bottleneck layers, auxiliary classification outputs
- Apply parallel filter operations to the input from the previous layer : 1x1, 3x3, and 5x5 convolutions plus max pooling (adds a small amount of translation invariance) -> concatenate the outputs depth-wise
- Problem with the naive Inception module : computation becomes heavy, because the depth after concatenation keeps growing from module to module
- Solution : dimension reduction by adding 'bottleneck layers' (1x1 conv) to control the depth
*Details *Add auxiliary classification outputs to inject additional gradient at lower layers and mitigate vanishing gradients
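The effect of the 1x1 bottleneck layers on computation can be sketched by counting multiply-accumulates for one Inception module. This is a rough illustration only: the 28x28x256 input, the filter counts, and the reduced depth of 64 are assumed values, and `conv_macs` is a hypothetical helper.

```python
def conv_macs(height, width, in_ch, out_ch, kernel):
    """Multiply-accumulates of a stride-1, same-padded conv layer."""
    return height * width * out_ch * kernel * kernel * in_ch


H = W = 28     # assumed spatial size of the input
C_IN = 256     # assumed input depth
REDUCED = 64   # assumed depth after each 1x1 bottleneck

# Naive module: 1x1, 3x3 and 5x5 convs applied directly to the input.
naive = (
    conv_macs(H, W, C_IN, 128, 1)
    + conv_macs(H, W, C_IN, 192, 3)
    + conv_macs(H, W, C_IN, 96, 5)
)

# With bottlenecks: 1x1 reductions before the 3x3 and 5x5 convs,
# and a 1x1 conv after the pooling branch.
bottleneck = (
    conv_macs(H, W, C_IN, 128, 1)
    + conv_macs(H, W, C_IN, REDUCED, 1) + conv_macs(H, W, REDUCED, 192, 3)
    + conv_macs(H, W, C_IN, REDUCED, 1) + conv_macs(H, W, REDUCED, 96, 5)
    + conv_macs(H, W, C_IN, REDUCED, 1)
)

print(naive, bottleneck, round(naive / bottleneck, 2))
```

The 3x3 and 5x5 branches dominate the cost, so shrinking their input depth first gives most of the saving. The bottleneck also caps the depth that the pooling branch contributes to the concatenated output.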
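- ResNet : very deep networks (152 layers) using residual connections
- Previous problem : simply stacking more layers on a "plain" convolutional neural network makes the network harder to optimize
- Solution : add identity shortcut connections, so each block fits a residual F(x) = H(x) - x and outputs F(x) + x instead of fitting the desired mapping H(x) directly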
*Details *Xavier/2 initialization
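The residual idea and the "Xavier/2" initialization can be illustrated with a toy sketch. It works on plain Python lists rather than feature maps, leaves out the ReLU that is applied after the addition in the actual network, and uses hypothetical names (`residual_block`, `he_std`, `zero_residual`).

```python
import math


def residual_block(x, residual_fn):
    """Return F(x) + x, where F is `residual_fn` and x is a list of floats."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]


def he_std(fan_in):
    """Std for 'Xavier/2' (He) initialization: sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)


def zero_residual(x):
    # A block whose learned residual is exactly zero.
    return [0.0 for _ in x]


x = [1.0, -2.0, 3.0]
print(residual_block(x, zero_residual))  # [1.0, -2.0, 3.0]: the block acts as identity
print(he_std(fan_in=3 * 3 * 64))         # 3x3 conv with 64 input channels (assumed sizes)
```

When the residual is zero the block reduces to the identity, which is why adding such blocks should not make a deeper network harder to fit than a shallower one.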
- Stochastic Depth : randomly drop a subset of layers during each training pass, bypassing them with the identity function; use the full network at test time
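Randomly dropping layers can be sketched as a forward pass over residual blocks in which each block is skipped with some probability during training. This is a toy version on scalars. The per-block survival probabilities and the test-time scaling of each residual by its survival probability are assumptions based on the stochastic depth method as generally described, not details stated in these notes.

```python
import random


def stochastic_depth_forward(x, blocks, survival_probs, training, rng=random):
    """Run residual blocks, randomly bypassing some of them when training."""
    for block, p in zip(blocks, survival_probs):
        if training:
            if rng.random() < p:
                x = x + block(x)  # block survives this pass
            # otherwise the block is dropped and x passes through unchanged
        else:
            # Full depth at test time, residual scaled by its survival probability.
            x = x + p * block(x)
    return x


blocks = [lambda v: 1.0, lambda v: 1.0, lambda v: 1.0]  # toy residual functions
probs = [1.0, 0.8, 0.5]                                 # assumed survival probabilities

print(stochastic_depth_forward(0.0, blocks, probs, training=True))   # varies per pass
print(stochastic_depth_forward(0.0, blocks, probs, training=False))  # 0 + 1.0 + 0.8 + 0.5
```

Each training pass therefore runs a shorter network, while inference always uses every block.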
- FractalNet : residual representations are not necessary; the key is transitioning effectively from shallow to deep. Trained by dropping out sub-paths
- SqueezeNet : compresses the network with 'fire modules', each consisting of a 'squeeze' layer of 1x1 filters feeding an 'expand' layer of 1x1 and 3x3 filters
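The saving from the squeeze/expand design can be estimated by counting weights. In this minimal sketch the channel counts are assumed values chosen for illustration, biases are ignored, and the helper names are hypothetical.

```python
def conv_weights(in_ch, out_ch, kernel):
    """Weights (no biases) of a conv layer with a square kernel."""
    return in_ch * out_ch * kernel * kernel


def fire_module_weights(in_ch, squeeze_ch, expand1_ch, expand3_ch):
    """Weights of a squeeze (1x1) layer feeding 1x1 and 3x3 expand filters."""
    squeeze = conv_weights(in_ch, squeeze_ch, 1)
    expand = (conv_weights(squeeze_ch, expand1_ch, 1)
              + conv_weights(squeeze_ch, expand3_ch, 3))
    return squeeze + expand


# Assumed sizes: 96 input channels, squeezed to 16, expanded to 64 + 64.
IN_CH, SQUEEZE, EXPAND1, EXPAND3 = 96, 16, 64, 64

fire = fire_module_weights(IN_CH, SQUEEZE, EXPAND1, EXPAND3)
# Baseline: one plain 3x3 conv producing the same number of output channels.
plain = conv_weights(IN_CH, EXPAND1 + EXPAND3, 3)

print(fire, plain)
```

Most of the reduction comes from the 3x3 filters seeing only the squeezed depth instead of the full input depth.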
