Created by: jianhang-liu
This PR builds on Tensor Tang's PR 13118 (fused LSTM op). It has 3 commits:
- Refactor the fused LSTM op for readability
- Add peephole support in seq mode, plus a unit test for peephole
- Add peephole support in batch mode, and change the default mode to batch mode
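
For reviewers unfamiliar with peepholes, here is a minimal sketch of what a peephole connection adds to one LSTM step. It is not the code in this PR. It assumes the common formulation in which the input and forget gates see the previous cell state and the output gate sees the new cell state, each through diagonal (element-wise) weights. The function name, the `gates` layout, and the `use_peepholes` flag are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def peephole_lstm_step(gates, c_prev, w_ic, w_fc, w_oc, use_peepholes=True):
    """One LSTM time step for a single sample (illustrative, not the PR's kernel).

    gates: dict with keys "i", "f", "c", "o"; each value is a list of length D
           holding the pre-activation W*x + W*h_prev + b (assumed layout).
    c_prev: previous cell state, list of length D.
    w_ic, w_fc, w_oc: diagonal peephole weights, lists of length D.
    Returns (h, c), the new hidden state and cell state.
    """
    h, c = [], []
    for d in range(len(c_prev)):
        # Input and forget gates peek at the previous cell state.
        peep_i = w_ic[d] * c_prev[d] if use_peepholes else 0.0
        peep_f = w_fc[d] * c_prev[d] if use_peepholes else 0.0
        i = sigmoid(gates["i"][d] + peep_i)
        f = sigmoid(gates["f"][d] + peep_f)
        c_t = f * c_prev[d] + i * math.tanh(gates["c"][d])
        # Output gate peeks at the freshly computed cell state.
        peep_o = w_oc[d] * c_t if use_peepholes else 0.0
        o = sigmoid(gates["o"][d] + peep_o)
        c.append(c_t)
        h.append(o * math.tanh(c_t))
    return h, c
```

In this sketch the only difference between seq mode and batch mode would be whether the step is applied to one sequence at a time or to a batch of samples at the same time step; the per-element math is the same.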