Bad convergence when using momentum optimizer
Created by: kuke
A comparative training experiment shows that the DeepASR model converges well when the Adam optimizer is used #676 (closed). But after switching to the momentum optimizer, convergence becomes poor. There may be a problem in the implementation of the momentum optimizer in Fluid.
Here is the comparison of training accuracy on 4 GPUs between Fluid and Houyi under the same settings:
Parameters:
batch_size: 128
device: GPU
hidden_dim: 1024
learning_rate: 0.00016
minimum_batch_size: 1
parallel: True
proj_dim: 512
stacked_num: 5
momentum: 0.9
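For reference, a minimal pure-Python sketch of the standard (heavy-ball) momentum update rule, which the Fluid momentum optimizer is expected to follow. The toy quadratic objective, parameter values, and step count are illustrative assumptions only, not part of the experiment; the `learning_rate` and `momentum` values mirror the settings above. If Fluid's implementation deviates from this update (e.g. in how the velocity is accumulated or scaled by the learning rate), that could explain the convergence gap:

```python
def momentum_step(params, grads, velocities, lr=0.00016, momentum=0.9):
    """One in-place momentum update:
        velocity = momentum * velocity + gradient
        param    = param - lr * velocity
    """
    for i in range(len(params)):
        velocities[i] = momentum * velocities[i] + grads[i]
        params[i] -= lr * velocities[i]

# Sanity check on a toy quadratic f(w) = 0.5 * w^2 (gradient is w);
# with these hyperparameters w should decay toward 0.
w = [10.0]   # single scalar parameter, arbitrary start
v = [0.0]    # velocity buffer, initialized to zero
for _ in range(20000):
    momentum_step(w, [w[0]], v)
print(abs(w[0]))  # should be very close to 0
```

A quick check like this against Fluid's optimizer on the same toy problem would isolate whether the update rule itself is wrong or the issue lies elsewhere (e.g. in multi-GPU gradient aggregation).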