For DS2, padding data should be excluded when computing the loss and gradients
Created by: pkuyym
To solve the problem, the following should be done:
- Expose SubSequenceLayer (including bug fixes) https://github.com/PaddlePaddle/Paddle/issues/5335
- Add a ScaleSubRegionLayer (supports setting values for given indices) https://github.com/PaddlePaddle/Paddle/issues/5416
- Modify the network configuration. https://github.com/PaddlePaddle/models/pull/444
- Adapt tune.py https://github.com/PaddlePaddle/models/pull/447
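
To illustrate the goal behind the tasks above, here is a minimal plain-Python sketch of excluding padded frames from a loss average. It does not use the Paddle API or the layers mentioned above; `masked_mean_loss`, `batch`, and `lengths` are hypothetical names chosen for illustration, and the per-frame loss values are made up.

```python
def masked_mean_loss(frame_losses, lengths):
    """Average per-frame losses over valid (non-padded) frames only.

    frame_losses: list of equal-length lists, one per padded sequence.
    lengths: true (unpadded) length of each sequence.
    """
    total = 0.0
    count = 0
    for losses, length in zip(frame_losses, lengths):
        # Frames at index >= length are padding and are skipped, so they
        # contribute nothing to the loss (and hence nothing to the gradient).
        total += sum(losses[:length])
        count += length
    return total / count if count else 0.0


# Two sequences padded to 4 frames; the second has only 2 real frames.
batch = [[1.0, 2.0, 3.0, 4.0],
         [1.0, 1.0, 9.0, 9.0]]
lengths = [4, 2]

naive = sum(sum(row) for row in batch) / 8      # padding included
masked = masked_mean_loss(batch, lengths)       # padding excluded
print(naive, masked)
```

In this toy batch the naive average is skewed by the padded frames of the second sequence, while the masked average only reflects the six real frames.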