Previously, the CVM op could only run on CPU. This PR implements its GPU kernel and also improves the unit tests for the CVM op.
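For reviewers unfamiliar with the op, here is a minimal CPU-side sketch of the element-wise mapping a GPU kernel could use, with one output element per thread. It is not the code in this PR. It assumes the usual CVM forward semantics: each input row starts with `show` and `click`; with `use_cvm` enabled the row keeps its width and the first two columns become `log(show + 1)` and `log(click + 1) - log(show + 1)`; with it disabled the first two columns are dropped. The names `cvm_element` and `cvm_forward` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes a single output element, the way one GPU
// thread would in an element-wise kernel. Assumed layout: x is row-major with
// `width` floats per row, and columns 0 and 1 hold show and click.
inline float cvm_element(const std::vector<float>& x, std::size_t width,
                         bool use_cvm, std::size_t out_index) {
  const std::size_t offset = use_cvm ? 0 : 2;    // columns skipped in the input
  const std::size_t out_width = width - offset;  // columns per output row
  const std::size_t row = out_index / out_width;
  const std::size_t col = out_index % out_width;
  const float* in_row = x.data() + row * width;

  if (use_cvm && col == 0) {
    return std::log(in_row[0] + 1.0f);
  }
  if (use_cvm && col == 1) {
    return std::log(in_row[1] + 1.0f) - std::log(in_row[0] + 1.0f);
  }
  // Every other element is a plain copy from the shifted input column.
  return in_row[col + offset];
}

// Hypothetical reference driver: the loop stands in for the GPU grid, with
// each iteration playing the role of one thread.
inline std::vector<float> cvm_forward(const std::vector<float>& x,
                                      std::size_t width, bool use_cvm) {
  assert(width > 2 && x.size() % width == 0);
  const std::size_t rows = x.size() / width;
  const std::size_t out_width = use_cvm ? width : width - 2;
  std::vector<float> y(rows * out_width);
  for (std::size_t i = 0; i < y.size(); ++i) {
    y[i] = cvm_element(x, width, use_cvm, i);
  }
  return y;
}
```

Because each output element depends only on its own input row, a CPU reference like this can also serve as the expected value when a unit test compares CPU and GPU results.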