Forked from PaddlePaddle / Paddle
* [NPU] Support executor with NPU
* Fix code according to reviews
* Fix code
* Add unittest for sub op npu

* support setting xpu place
* add ut, test=kunlun