* fix some op bugs
* fix some bugs
* follow comments
* fix log level
* add ut

* add npu kernel for concat op
* add npu kernel for concat op
* refine code
* update
* refine concat_grad