Forked from PaddlePaddle / PaddleDetection
1. Finish the lookup table CPU and GPU kernels
2. Add some CUDA helpers
3. Add some math functors
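A lookup table kernel is an embedding lookup: for each index in the input, it copies the matching row of a weight table into the output. The sketch below shows the CPU forward pass under stated assumptions. The function name `lookup_table_forward`, the row-major `std::vector` layout, and the out-of-range handling are hypothetical illustrations and are not taken from the commit or from PaddlePaddle's actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical CPU forward pass for a lookup table (embedding) op.
// table: row-major [rows x width] matrix; ids: indices into its rows.
// Returns a row-major [ids.size() x width] matrix where output row i
// is a copy of table row ids[i].
std::vector<float> lookup_table_forward(const std::vector<float>& table,
                                        int64_t rows, int64_t width,
                                        const std::vector<int64_t>& ids) {
  // The flat table must hold exactly rows * width values.
  assert(static_cast<int64_t>(table.size()) == rows * width);
  const int64_t n = static_cast<int64_t>(ids.size());
  std::vector<float> out(ids.size() * static_cast<size_t>(width));
  for (int64_t i = 0; i < n; ++i) {
    const int64_t id = ids[i];
    // Assumption: invalid indices are reported as an error, not clamped.
    if (id < 0 || id >= rows) {
      throw std::out_of_range("lookup id out of range");
    }
    // Copy row `id` of the table into row `i` of the output.
    for (int64_t j = 0; j < width; ++j) {
      out[i * width + j] = table[id * width + j];
    }
  }
  return out;
}
```

Each output row depends only on its own index, so a GPU version can copy the rows in parallel across threads. That variant is not shown here.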