For classification tasks, the Fluid version trains slower than the v2 version with the same learning rate
Created by: TheodoreG
We found that with the same input, the same network structure, the same Momentum optimizer, and identical hyperparameters, the Fluid classification model reaches 1% to 2% lower accuracy than the v2 model and trains more slowly. When we set the learning rate 10x larger for the Fluid version, its loss curve looks much closer to v2's. Is there an explanation for this observation?
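One way to frame the 10x observation: for plain momentum SGD, multiplying the learning rate by a constant k gives the same parameter trajectory as multiplying every gradient by k. A constant gradient scale difference between two setups can therefore look like a learning-rate difference. One possible source of such a factor is a loss summed over the batch rather than averaged. Whether either Fluid or v2 actually does this is not established here. The sketch below is a generic momentum update (`v = mu * v + g; w = w - lr * v`), assumed for illustration. It is not the Fluid or v2 implementation, and `momentum_sgd`, `grad_scale`, and the toy quadratic loss are hypothetical.

```python
def momentum_sgd(grad_fn, w0, lr, mu=0.9, steps=50, grad_scale=1.0):
    """Generic momentum SGD (assumed formulation, not Paddle's code).

    grad_scale multiplies every gradient, mimicking a loss that differs
    from another setup by a constant factor.
    """
    w, v = w0, 0.0
    history = []
    for _ in range(steps):
        g = grad_scale * grad_fn(w)  # possibly rescaled gradient
        v = mu * v + g               # accumulate velocity
        w = w - lr * v               # parameter step
        history.append(w)
    return history


def grad(w):
    # Gradient of a hypothetical 1-D loss 0.5 * (w - 3)^2
    return w - 3.0


# Gradients 10x larger with lr=0.01 versus normal gradients with lr=0.1:
a = momentum_sgd(grad, w0=0.0, lr=0.01, grad_scale=10.0)
b = momentum_sgd(grad, w0=0.0, lr=0.1)
# a and b agree up to floating-point rounding at every step.
```

The equivalence holds because the velocity is linear in the gradients, so a constant factor on the gradients carries straight through to each step `lr * v`.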