* support npu profiler
* add python api
* fix bugs
* add wrapper for incomplete type
* update profile proto
* record npu wait
* add xpu placeholder