Created by: qingqing01
Fix #6842 (closed)
This PR mainly optimizes the broadcast in Eigen. The timing changes after the optimization are as follows:
Experiment environment:
- config: a network of 3 stacked LSTM layers, hidden size 64
- 2 epochs
Total time for 2 epochs:
- CPU: 345.54137s -> 304.96511s (about 11.7% less time).
- GPU: 89.72162s -> 89.22058s. The optimization does not meaningfully change the execution time on GPU.
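
For readers unfamiliar with what a broadcast computes, here is a minimal sketch in plain C++. It is not the code changed in this PR: the function name `AddBiasRowwise`, the row-major layout, and the bias-add use case are all assumptions made for illustration. It writes out, as explicit loops, the operation a row-wise broadcast add performs (adding one vector to every row of a matrix).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the PR.
// Adds a bias vector of length `cols` to every row of a row-major
// `rows x cols` matrix, in place. This is the same result a row-wise
// broadcast add would produce, written as explicit loops.
void AddBiasRowwise(std::vector<float>& mat, const std::vector<float>& bias,
                    std::size_t rows, std::size_t cols) {
  for (std::size_t r = 0; r < rows; ++r) {
    float* row = mat.data() + r * cols;  // start of row r
    for (std::size_t c = 0; c < cols; ++c) {
      row[c] += bias[c];
    }
  }
}
```

Calling `AddBiasRowwise(m, b, 2, 3)` on a 2 x 3 matrix `m` adds `b[0..2]` to each of the two rows. Whether an explicit loop like this is faster than an Eigen broadcast expression depends on the compiler, the tensor shapes, and the device, so it should be measured as in the timings above.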