* enable fast test compilation and push OpenCL kernels via run.py
* merge CL kernels into the .so
* restore pre-commit