This is a TensorFlow replication of the CIFAR-10 experiments described in ResNet (K. He, et al., Deep Residual Learning for Image Recognition). My code:
- Adapts Keras’s example ResNet code for CIFAR-10 (note that this is a simpler version specially designed for CIFAR-10);
- Applies the SENet module to the ResNet;
- Re-trains the ResNet without SENet on CIFAR-10 for benchmark evaluation.
For statistical validation, each group of experiments has been run 5 times.
First, I try to reproduce the results with exactly the same model code provided by Keras. The batch size is set to 32. The optimizer is Adam with an initial learning rate of 0.001. The validation split is 0, so the whole training set is used for training. The data augmentation can be found at keras_augmentation, and the learning rate scheduler at keras_lr_scheduler. Our result outperforms Keras’s on ResNet44v1_CIFAR10, but is slightly worse on the other models.
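For reference, a minimal sketch of this baseline setup is shown below. The schedule values, shift range, and epoch count follow Keras’s CIFAR-10 ResNet example and are assumptions; the exact keras_augmentation and keras_lr_scheduler definitions live in this repo and may differ in detail.

```python
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def keras_lr_scheduler(epoch):
    """Start at 1e-3 and decay at fixed epochs (values taken from Keras's CIFAR-10 ResNet example)."""
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    return lr

def train_baseline(model, x_train, y_train, x_test, y_test):
    """Train with Adam (initial LR 1e-3), batch size 32, no validation split."""
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=keras_lr_scheduler(0)),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    # keras_augmentation (simplified): small random shifts and horizontal flips.
    datagen = ImageDataGenerator(width_shift_range=0.1,
                                 height_shift_range=0.1,
                                 horizontal_flip=True)
    datagen.fit(x_train)
    model.fit(datagen.flow(x_train, y_train, batch_size=32),  # whole training set used for training
              epochs=200,
              validation_data=(x_test, y_test),
              callbacks=[keras.callbacks.LearningRateScheduler(keras_lr_scheduler)])
```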
Model | Author | best test accuracy |
---|---|---|
ResNet20v1_CIFAR10 | Keras | 0.9216 |
ResNet20v1_CIFAR10 | Kan | 0.9183 |
ResNet32v1_CIFAR10 | Keras | 0.9246 |
ResNet32v1_CIFAR10 | Kan | 0.9227 |
ResNet44v1_CIFAR10 | Keras | 0.9250 |
ResNet44v1_CIFAR10 | Kan | 0.9252 |
ResNet56v1_CIFAR10 | Keras | 0.9271 |
ResNet56v1_CIFAR10 | Kan | 0.9236 |
ResNet110v1_CIFAR10 | Keras | 0.9265 |
ResNet110v1_CIFAR10 | Kan | 0.9260 |
Later, I follow the same configuration as K. He, et al., and use standard normalization. Random translation (padding, then cropping, plus a horizontal flip) is also applied. The validation split changes from 0 to 0.1, resulting in a 45k/5k train/val split. We follow the same optimizer settings: SGD with an initial learning rate of 0.1, a momentum of 0.9, a weight decay of 0.0001, and a mini-batch size of 128.
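A minimal sketch of this preprocessing and optimizer setup is given below; the per-channel statistics and the L2-as-weight-decay choice are assumptions about how std_norm and the 1e-4 weight decay are implemented here.

```python
import numpy as np
from tensorflow import keras

def std_norm(x_train, x_test):
    """Standard normalization with per-channel statistics computed on the training set."""
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
    std = x_train.std(axis=(0, 1, 2), keepdims=True)
    return (x_train - mean) / std, (x_test - mean) / std

# SGD with momentum 0.9; the 1e-4 weight decay is assumed to be applied as L2 kernel
# regularization inside the model (kernel_regularizer=keras.regularizers.l2(1e-4)),
# as in Keras's ResNet example, rather than as an optimizer argument.
optimizer = keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
```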
cifar10_scheduler is applied. In the original paper, the learning rate is divided by 10 at certain training steps, namely the 32k and 48k steps. Here we convert these step indices into epoch indices by:
$$ \text{steps\_per\_epoch} = \left\lceil \frac{45000}{\text{batch\_size}} \right\rceil , $$
$$ \text{epoch\_to\_reduce\_lr} = \left\lceil \frac{32000 \text{ or } 48000}{\text{steps\_per\_epoch}} \right\rceil . $$
Thus we have an adapted schedule:
Step (batch size 128) | Epochs | LR (SGD) |
---|---|---|
32k | 1~91 | 0.1 |
48k | 92~136 | 0.01 |
64k | 137~182 | 0.001 |
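The conversion and the resulting schedule can be written out as below (a sketch; the repo’s cifar10_scheduler may be written differently):

```python
import math

BATCH_SIZE = 128
TRAIN_SIZE = 45000  # 45k/5k train/val split
steps_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)  # 352

def step_to_epoch(step):
    """Convert a global step index into the epoch index it falls in."""
    return math.ceil(step / steps_per_epoch)

print(step_to_epoch(32000), step_to_epoch(48000), step_to_epoch(64000))  # 91 137 182

def cifar10_scheduler(epoch):
    """Piecewise-constant LR matching the adapted schedule above (epoch is 0-based in Keras)."""
    if epoch < 91:       # epochs 1-91
        return 0.1
    if epoch < 136:      # epochs 92-136
        return 0.01
    return 0.001         # epochs 137 onwards
```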
The results are listed in the table below:
Model | pre-processing | data_augmentation | Author | best test accuracy |
---|---|---|---|---|
ResNet20v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | 91.25% |
ResNet20v1_CIFAR10 | std_norm | random_translation | Kan | 91.30% |
ResNet32v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | 92.49% |
ResNet32v1_CIFAR10 | std_norm | random_translation | Kan | 92.16% |
ResNet44v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | 92.83% |
ResNet44v1_CIFAR10 | std_norm | random_translation | Kan | N/A |
ResNet56v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | 93.03% |
ResNet56v1_CIFAR10 | std_norm | random_translation | Kan | N/A |
ResNet110v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | 93.39±0.16% |
ResNet110v1_CIFAR10 | std_norm | random_translation | Kan | 92.10% |
ResNet164v1_CIFAR10 | subtract_mean | pad_crop_flip | He et al. | N/A |
ResNet164v1_CIFAR10 | std_norm | random_translation | Kan | 91.74% |
The main difference between random_translation and pad_crop_flip is that the former uses fill_mode='nearest' to fill the border pixels instead of zero padding.
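To make the difference concrete, here is a hedged sketch of the two augmentation schemes; the pad size and shift range are assumptions, not necessarily the repo’s exact settings.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# random_translation (this repo, assumed parameters): shift by a few pixels and
# flip horizontally, filling the border with the nearest pixel values.
random_translation = ImageDataGenerator(width_shift_range=4,
                                        height_shift_range=4,
                                        horizontal_flip=True,
                                        fill_mode='nearest')

def pad_crop_flip(img, pad=4):
    """He et al. style augmentation: zero-pad 4 px per side, random 32x32 crop, random horizontal flip."""
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)))  # zero padding
    top = np.random.randint(0, 2 * pad + 1)
    left = np.random.randint(0, 2 * pad + 1)
    crop = padded[top:top + 32, left:left + 32]
    if np.random.rand() < 0.5:
        crop = crop[:, ::-1]
    return crop
```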
Among the models for which both He et al. and I report results, ours outperforms theirs only on ResNet20v1_CIFAR10, and falls short of their reported accuracy on ResNet32v1_CIFAR10 and ResNet110v1_CIFAR10.
The models listed below are further used as benchmarks for their SENet counterparts:
- ResNet20v1_CIFAR10
- ResNet32v1_CIFAR10
- ResNet110v1_CIFAR10
- ResNet164v1_CIFAR10
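For context, a generic Squeeze-and-Excitation block in Keras looks roughly like the sketch below (reduction ratio 16 is the common default from Hu et al.; the module actually used in this repo may differ):

```python
from tensorflow.keras import layers

def se_block(x, reduction=16):
    """Squeeze-and-Excitation block: global pooling, bottleneck FC layers, channel-wise rescaling."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                          # squeeze
    s = layers.Dense(channels // reduction, activation='relu')(s)   # excitation (bottleneck)
    s = layers.Dense(channels, activation='sigmoid')(s)
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])                                # rescale input feature maps
```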