This is a quick evaluation of the impact of conv1 depth on ImageNet-2012 performance.
The architecture is similar to CaffeNet, but with the following differences:
- Images are resized so that the smaller side = 128, for speed.
- fc6 and fc7 layers have 2048 neurons instead of 4096.
- Networks are initialized with LSUV-init.
- No LRN layers.
Default augmentation: random 128x128 crop from a 144xN image, 50% random horizontal flip.

Here I check how much improvement can be gained by making conv1 "deeper", i.e. more complex, without changing its receptive field. To do this, I add a series of 96-channel 1x1 convolution + ReLU layers after conv1, before pool1.
For comparison, I also add 96-channel 3x3 convolutions, which have more parameters and, more importantly, increase the receptive field.
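As a reference, here is a minimal prototxt sketch of what one inserted NiN block looks like between conv1 and pool1. The layer names (conv1_nin1, relu1_nin1) and the pooling parameters (standard CaffeNet pool1) are illustrative only; the exact prototxt used in each run is at the beginning of the corresponding log file.

```
# Hypothetical layer names; see the training logs for the exact definitions.
layer {
  name: "conv1_nin1"
  type: "Convolution"
  bottom: "conv1"
  top: "conv1_nin1"
  convolution_param {
    num_output: 96
    kernel_size: 1   # for the 3x3 comparison: kernel_size: 3 with pad: 1
  }
}
layer {
  name: "relu1_nin1"
  type: "ReLU"
  bottom: "conv1_nin1"
  top: "conv1_nin1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1_nin1"   # pool1 now consumes the NiN output instead of conv1
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
```

Deeper variants simply repeat the convolution + ReLU pair 3, 5, 7, or 9 times before pool1.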
Name | Accuracy | LogLoss | Comments |
---|---|---|---|
Default, no 1x1 or 3x3 | 0.471 | 2.36 | conv1 -> pool1 |
+ 1x1x96 NiN | 0.490 | 2.24 | conv1 -> 96C1 -> pool1 |
+ 3x (1x1x96 NiN) | 0.509 | 2.10 | conv1 -> 3x(96C1) -> pool1 |
+ 5x (1x1x96 NiN) | 0.514 | 2.11 | conv1 -> 5x(96C1) -> pool1 |
+ 7x (1x1x96 NiN) | 0.514 | 2.11 | conv1 -> 7x(96C1) -> pool1 |
+ 9x (1x1x96 NiN) | 0.516 | 2.10 | conv1 -> 9x(96C1) -> pool1 |
+ 9x (1x1x96 NiN)R | 0.509 | 2.13 | conv1 -> Residual9x(96C1) -> pool1. 276k iters |
+ 1x (3x3x96 NiN) | 0.500 | 2.19 | conv1 -> 1x(96C3) -> pool1 |
+ 3x (3x3x96 NiN) | 0.538 | 1.99 | conv1 -> 3x(96C3) -> pool1 |
+ 5x (3x3x96 NiN) | 0.551 | 1.91 | conv1 -> 5x(96C3) -> pool1 |
So the impact of a "more complex" conv1 without increasing the receptive field quickly saturates, while for 3x3 convolutions it does not. See the prototxt at the beginning of the log files in logs.
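For the residual row (9x (1x1x96 NiN)R) in the table above, one possible wiring is a single identity skip from conv1 over the 1x1 stack, summed with Caffe's Eltwise layer. This is only a sketch of that reading (the layer names are hypothetical); the actual structure is in the logs.

```
# Hypothetical sketch: identity branch from conv1 added to the output of the
# 9x (1x1x96 + ReLU) stack (called conv1_nin9 here); both carry 96 channels.
layer {
  name: "conv1_res"
  type: "Eltwise"
  bottom: "conv1"
  bottom: "conv1_nin9"
  top: "conv1_res"
  eltwise_param { operation: SUM }
}
# pool1 then takes conv1_res as its bottom instead of the last 1x1 conv.
```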