Fork自 PaddlePaddle / PaddleDetection
* add split lod tensor operator * add more test cast * clean code * add merge lod tensor operator * fix bug * clean code * add grad operator * make mask support GPU * add comments