- 10 May, 2023 1 commit
Committed by Bo Zhang
* Support different dtypes of inputs for broadcast for dropout optimization (#52093)
* change judgement for DropoutGradGPUKernelDriver
* add UnrollerWithoutVecSize and after this Loaddata to be refined
* pass unittest
* use same unroller with XPU
* BroadcastWithInt64Index
* BroadcastDataLoader template partial specialization
* fix compile errs in ROCms
* PR comment
* dropout_nd_optimization (#51479)
* with printf
* add DropOutNdForwardKernel
* PR comment
* Dropout optimize & clean broadcast inT and ElementwiseType (#52969)
* change judgement for DropoutGradGPUKernelDriver
* add UnrollerWithoutVecSize and after this Loaddata to be refined
* pass unittest
* use same unroller with XPU
* BroadcastWithInt64Index
* BroadcastDataLoader template partial specialization
* fix compile errs in ROCms
* clean ElementwiseT and InT for BroadcastKernel
* default axis and clean inT
* remove redundant fast divmod computation
* optimize drop_nd & drop_nd_grad
* optimize BroadcastDataLoader bf16 fp16
* rm InT etc. after merge develop
* delete constexpr for windows ci
* fix conflict
* fix conflic with develop
* fix conflic
* new clean
* clean
* Fix xpu2 kp compile error (#53548)
* fix conflict
* conflict
 
- 16 Mar, 2023 1 commit
Committed by Huang Jiyi
* remove contexts in tensor_utils
* update from_blob
* update from_blob
* update from_blob
* fix bug
* fix bug
 
- 08 Mar, 2023 1 commit
Committed by FlyingQianMM
 
- 21 Nov, 2022 1 commit
Committed by wanghuancoder
* refine reduce_all
 
- 08 Aug, 2022 1 commit
Committed by hong
* move reduce_all_flag from python to c++
* fix infer shape bug
* fix bug
* fix sum infer meta bug
* fix reduce sum grad gpu bug
* fix amin amax bug
 
- 01 Aug, 2022 1 commit
Committed by Xiaoxu Chen
 