Experiment using state of the art activation functions (#297) · Issue · PaddlePaddle / ERNIE

Experiment using state of the art activation functions

Created by: LifeIsStrange

This looks like interesting, low-hanging fruit that would be easy to try.

The experiments show that Mish tends to work better than both ReLU and Swish along with other standard activation functions in many deep networks across challenging datasets. For instance, in Squeeze Excite Net-18 for CIFAR 100 classification, the network with Mish had an increase in Top-1 test accuracy by 0.494% and 1.671% as compared to the same network with Swish and ReLU respectively. The similarity to Swish along with providing a boost in performance and its simplicity in implementation makes it easier for researchers and developers to use Mish in their Neural Network Models.

https://arxiv.org/abs/1908.08681v1
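
For anyone who wants to try this, here is a minimal sketch of the three activations compared in the abstract above, in plain Python. Mish is defined in the paper as x * tanh(softplus(x)); Swish is shown with beta = 1. The helper names are only illustrative and are not part of ERNIE or PaddlePaddle, so wiring this into the model is left open.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable ln(1 + e^x); avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    # Branch on the sign so exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x))
    return x * math.tanh(softplus(x))


def swish(x: float) -> float:
    # Swish with beta = 1 (assumption): x * sigmoid(x)
    return x * sigmoid(x)


def relu(x: float) -> float:
    return max(x, 0.0)


if __name__ == "__main__":
    # Print the three activations side by side on a few sample inputs.
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):+.4f}  swish={swish(x):+.4f}  mish={mish(x):+.4f}")
```

Like Swish, Mish is smooth and lets small negative values through instead of clamping them to zero as ReLU does, so in principle it could be swapped in wherever the model's activation is configurable.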

