* [CustomDevice] add profiler apis
* migrate CalculateEstOccupancy into cuda_tracer
* update
* add ut