Created by: Xreki
For elementwise_op, the computation can be greatly simplified when the two inputs have the same dims.
This PR uses Eigen to optimize this case. Detailed profiling results on the PaddingRNN large model are shown below.
- Before

| Event | Calls | Total | CPU Time (Ratio) | GPU Time (Ratio) | Min. | Max. | Ave. | Ratio. |
|---|---|---|---|---|---|---|---|---|
| thread0::elementwise_mul | 2170 | 69.5541 | 59.471037 (0.855033) | 10.083067 (0.144967) | 0.022085 | 0.225423 | 0.0320526 | 0.0551425 |
| thread0::elementwise_add | 1410 | 40.7991 | 36.844589 (0.903073) | 3.954532 (0.096927) | 0.020992 | 0.143524 | 0.0289355 | 0.0323455 |

- After

| Event | Calls | Total | CPU Time (Ratio) | GPU Time (Ratio) | Min. | Max. | Ave. | Ratio. |
|---|---|---|---|---|---|---|---|---|
| thread0::elementwise_mul | 2170 | 51.895 | 42.160885 (0.812427) | 9.734098 (0.187573) | 0.01669 | 0.21234 | 0.0239147 | 0.0419772 |
| thread0::elementwise_add | 1410 | 33.9144 | 29.955723 (0.883274) | 3.958685 (0.116726) | 0.01461 | 0.116554 | 0.0240528 | 0.0274329 |
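
For readers unfamiliar with why the same-dims case is cheaper, here is a minimal sketch of the idea. It is not the code in this PR: the actual change uses Eigen inside the operator kernels, while this sketch is dependency-free and uses plain loops. The names `MulSameDims`, `MulBroadcastRows`, and `ElementwiseMul`, and the restriction to a 2-D `x` with a 1-D `y`, are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fast path: x and y have identical dims, so both can be
// treated as flat arrays and combined in one contiguous pass with no
// per-element index arithmetic.
std::vector<float> MulSameDims(const std::vector<float>& x,
                               const std::vector<float>& y) {
  assert(x.size() == y.size());
  std::vector<float> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = x[i] * y[i];
  }
  return out;
}

// Hypothetical general path: y has shape [cols] and is broadcast across
// the rows of x, which has shape [rows, cols]. Each output element needs
// a separate index computation for y.
std::vector<float> MulBroadcastRows(const std::vector<float>& x,
                                    const std::vector<float>& y,
                                    std::size_t rows, std::size_t cols) {
  assert(x.size() == rows * cols);
  assert(y.size() == cols);
  std::vector<float> out(rows * cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      out[i * cols + j] = x[i * cols + j] * y[j];
    }
  }
  return out;
}

// Hypothetical dispatcher: take the flat fast path when the dims match,
// otherwise fall back to the broadcasting path.
std::vector<float> ElementwiseMul(const std::vector<float>& x,
                                  const std::vector<std::size_t>& x_dims,
                                  const std::vector<float>& y,
                                  const std::vector<std::size_t>& y_dims) {
  if (x_dims == y_dims) {
    return MulSameDims(x, y);
  }
  // This sketch only handles x: [rows, cols] with y: [cols].
  assert(x_dims.size() == 2 && y_dims.size() == 1);
  assert(x_dims[1] == y_dims[0]);
  return MulBroadcastRows(x, y, x_dims[0], x_dims[1]);
}
```

In the fast path the loop body is a single multiply over contiguous memory, which is the kind of expression a library such as Eigen can vectorize directly. The broadcasting path has to map every output index back to an index in the smaller input.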