Benchmark of ResNet on CIFAR-10

This is a TensorFlow replication of the CIFAR-10 experiments described in the ResNet paper (K. He et al., Deep Residual Learning for Image Recognition). My code:

  • Adapts Keras’s example code of ResNet for CIFAR-10 (note that this is a simpler version specially designed for CIFAR-10);
  • Applies the SENet module to the ResNet;
  • Re-trains the ResNet without SENet on CIFAR-10 for benchmark evaluation.

For statistical validation, each group of experiments has been run 5 times.

Firstly, I try to reproduce the results with exactly the same model code provided by Keras. The batch size is set to 32. The optimizer is Adam with an initial learning rate of 0.001. The validation split is 0, so the whole training set is used as training data. The data augmentation can be found at keras_augmentation, and the learning rate scheduler at keras_lr_scheduler. Our results outperform Keras’s on ResNet44v1_CIFAR10, but are slightly worse on the other models.

| Model | Author | best test accuracy |
|---|---|---|
| ResNet20v1_CIFAR10 | Keras | 0.9216 |
| ResNet20v1_CIFAR10 | Kan | 0.9183 |
| ResNet32v1_CIFAR10 | Keras | 0.9246 |
| ResNet32v1_CIFAR10 | Kan | 0.9227 |
| ResNet44v1_CIFAR10 | Keras | 0.9250 |
| ResNet44v1_CIFAR10 | Kan | 0.9252 |
| ResNet56v1_CIFAR10 | Keras | 0.9271 |
| ResNet56v1_CIFAR10 | Kan | 0.9236 |
| ResNet110v1_CIFAR10 | Keras | 0.9265 |
| ResNet110v1_CIFAR10 | Kan | 0.9260 |
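
For reference, here is a minimal sketch of the training setup described above (Adam with an initial learning rate of 0.001, batch size 32, no validation split). keras_augmentation and keras_lr_scheduler refer to the repository's own code; the definitions below only approximate the Keras CIFAR-10 ResNet example, and `model`, `x_train`, and `y_train` in the commented usage are placeholders.

```python
# Approximate reconstruction of the Keras-example setup; the exact
# keras_augmentation / keras_lr_scheduler live in the repository.
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 32

def lr_schedule(epoch):
    """Step-wise decay in the style of the Keras CIFAR-10 ResNet example."""
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    return lr

# Light augmentation: small random shifts plus horizontal flips.
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

# `model` is assumed to be one of the ResNetXXv1 builders from the repo:
# model.compile(optimizer=Adam(learning_rate=lr_schedule(0)),
#               loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
#           epochs=200, callbacks=[LearningRateScheduler(lr_schedule)])
```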

Later, I follow the same training configuration as K. He et al. and use standard normalization as pre-processing. Random translation (padding, then cropping, together with a horizontal flip) is also applied. The validation split changes from 0 to 0.1, which results in a 45k/5k train/val split. We follow the same optimizer settings: SGD with an initial learning rate of 0.1, a momentum of 0.9, a weight decay of 0.0001, and a mini-batch size of 128. cifar10_scheduler is applied. In the original paper, the learning rate is reduced by a factor of 10 at certain steps, namely the 32k and 48k steps. Here we convert these step indices into epoch indices via

$$ \text{steps\_per\_epoch} = \left\lceil \frac{45000}{\text{batch\_size}} \right\rceil, $$

$$ \text{epoch\_to\_reduce\_lr} = \left\lceil \frac{\text{step\_to\_reduce\_lr}}{\text{steps\_per\_epoch}} \right\rceil, \quad \text{step\_to\_reduce\_lr} \in \{32000, 48000\}. $$

Thus we have the adapted schedule:

| steps (batch size 128) | epoch | LR (SGD) |
|---|---|---|
| 32k | 1~91 | 0.1 |
| 48k | 92~137 | 0.01 |
| 64k | 138~182 | 0.001 |
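
The conversion can be checked with a few lines of Python. The cifar10_scheduler shown here is only a hypothetical epoch-based equivalent of the repository's scheduler, and the SGD call merely illustrates the optimizer settings above.

```python
# Sanity check of the step-to-epoch conversion (45k training images,
# batch size 128), plus a hypothetical epoch-based scheduler; the
# repository's cifar10_scheduler may differ in detail.
import math
import tensorflow as tf

train_size, batch_size = 45000, 128
steps_per_epoch = math.ceil(train_size / batch_size)      # 352

for step in (32000, 48000, 64000):                         # 64k ends training
    print(step, math.ceil(step / steps_per_epoch))         # -> 91, 137, 182

def cifar10_scheduler(epoch):
    """Drop the LR by 10x at the epoch boundaries derived above."""
    if epoch < 91:
        return 0.1
    if epoch < 137:
        return 0.01
    return 0.001

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
# The 1e-4 weight decay is typically applied in Keras as an L2 kernel
# regularizer on the conv/dense layers rather than as an optimizer flag.
```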

The results are listed in the table below:

| Model | pre-processing | data_augmentation | Author | best test accuracy |
|---|---|---|---|---|
| ResNet20v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | 91.25% |
| ResNet20v1_CIFAR10 | std_norm | random_translation | Kan | 91.30% |
| ResNet32v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | 92.49% |
| ResNet32v1_CIFAR10 | std_norm | random_translation | Kan | 92.16% |
| ResNet44v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | 92.83% |
| ResNet44v1_CIFAR10 | std_norm | random_translation | Kan | N/A |
| ResNet56v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | 93.03% |
| ResNet56v1_CIFAR10 | std_norm | random_translation | Kan | N/A |
| ResNet110v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | 93.39±0.16% |
| ResNet110v1_CIFAR10 | std_norm | random_translation | Kan | 92.10% |
| ResNet164v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | N/A |
| ResNet164v1_CIFAR10 | std_norm | random_translation | Kan | 91.74% |
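
For reference, the following is a minimal sketch of what the std_norm pre-processing and random_translation augmentation in the Kan rows might look like, assuming Keras's ImageDataGenerator; the exact arguments live in the repository and may differ.

```python
# Hypothetical reconstruction of std_norm + random_translation; the
# repository's actual pre-processing pipeline may differ in detail.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    featurewise_center=True,             # subtract the training-set mean
    featurewise_std_normalization=True,  # divide by the training-set std
    width_shift_range=4,                 # shift by up to 4 pixels, mimicking
    height_shift_range=4,                # the 4-pixel pad-and-crop scheme
    fill_mode='nearest',                 # border handling (see below)
    horizontal_flip=True)

# datagen.fit(x_train) must be called first so that the feature-wise
# mean and std are computed from the training split.
```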

The main difference between random_translation and pad_crop_flip is that the former uses fill_mode='nearest' to fill the pixels exposed by the shift. Among the models for which both He et al. and I have results, ours outperforms theirs only on ResNet20v1_CIFAR10, and falls short of the reported accuracy on ResNet32v1_CIFAR10 and ResNet110v1_CIFAR10. The models listed below are further used as benchmarks for their SENet counterparts (a sketch of an SE block follows the list):

  • ResNet20v1_CIFAR10
  • ResNet32v1_CIFAR10
  • ResNet110v1_CIFAR10
  • ResNet164v1_CIFAR10
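
The SENet counterparts attach a squeeze-and-excitation block to each residual block. Below is a minimal sketch of such a block, assuming the standard SE design with a reduction ratio of 16; the repository's SENet module may differ.

```python
# Standard squeeze-and-excitation block; the reduction ratio of 16 is
# the value from the SENet paper and is an assumption here.
from tensorflow.keras import layers

def se_block(x, reduction=16):
    channels = x.shape[-1]
    # Squeeze: global average pooling to one descriptor per channel.
    s = layers.GlobalAveragePooling2D()(x)
    # Excitation: bottleneck MLP producing per-channel weights in (0, 1).
    s = layers.Dense(channels // reduction, activation='relu')(s)
    s = layers.Dense(channels, activation='sigmoid')(s)
    s = layers.Reshape((1, 1, channels))(s)
    # Recalibrate the feature map channel-wise.
    return layers.Multiply()([x, s])
```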