- 30 Apr, 2021 2 commits
- 29 Apr, 2021 4 commits
-
-
Committed by Leo Chen
-
Committed by LielinJiang
* add op read_file and decode_jpeg
-
Committed by WeiXin
-
Committed by joanna.wozna.intel
* Add bf16 uniform random initializer * Remove duplicated section * Change UT to CPU place only * Put detail functions into anonymous namespace
-
- 28 Apr, 2021 5 commits
-
-
Committed by Leo Chen
* add input EpsilonTensor for adam * update python api * add unit test * add npu test * add more ut
-
Committed by arlesniak
-
Committed by Kqnonrime
* fix two error message * fix two error message * fix error * fix error * fix error * fix error * fix some error message * fix some error * fix error * fix some error * fix some error * fix some error * fix one error * fix some error * fix seven error message * fix error * fix error * fix error * fix error * fix some error message * fix error * fix some error * fix some error
-
Committed by Jacek Czaja
* - Added clearing oneDNN per executor * - Executor does not always have FLAGS_use_mkldnn set to true
-
Committed by jiangcheng
* optimize update_loss_scaling_op by fusing the for loop into one kernel, test=develop * remove useless while loop and optimize variable name, test=develop * optimize variable name from out_addrs_tensor to out_addrs_mem, test=develop * optimize variable names for readability by changing the prefix identifier from t_ to local_
-
- 27 Apr, 2021 5 commits
-
-
Committed by lilong12
* add alltoall api, test=develop
-
Committed by Zhong Hui
* [OPs] Bug fix, fix the segment mean for illegal syncthreads usage.
-
Committed by Zhang Zheng
-
Committed by Baibaifan
-
Committed by Aurelius84
-
- 26 Apr, 2021 5 commits
-
-
Committed by jiangcheng
* new optimization for where_index_op with a prefix sum version. * write a scan prefix sum kernel with stream for where index op. * optimize where_index by using cub::DeviceScan::InclusiveSum instead of the imperfect self-written kernel. * remove CheckTrue struct and rename stide_array for readability. * optimize variable names for readability. * optimize function name and annotation.
-
Committed by WangXi
-
Committed by ShenLiang
* fix model parallel * rm parallel_help.py * add embedding
-
Committed by WeiXin
* support backward return None. * edit unittest. * edit code according to CI * Improve error information
-
Committed by jiangcheng
* optimize slice op and slice grad op, test=develop * optimize variable name and annotation information, test=develop
-
- 25 Apr, 2021 9 commits
-
-
Committed by liym27
-
Committed by Baibaifan
-
Committed by Zhang Ting
-
Committed by Qi Li
-
Committed by minghaoBD
-
Committed by wawltor
* fix bug: when x.dim < y.dim, the result of compare_op is the inverse of the expected result * add CUDA support for the compare broadcast bug fix
-
Committed by Chen Weihang
-
Committed by Leo Chen
* use ZerosLike instead of NPUMemsetAsync * fix compile
-
Committed by denglin-github
* Add dlnne engine runtime * Fix log * Remove <const_cast> and remove unrelated modifications with dlnne, +clang-format * Fix CMakeList format error * Add copyright message * Fix dlnne CMakeList.txt * Add some paddlepaddle_pass to support more networks * Fix some format bugs
-
- 23 Apr, 2021 8 commits
-
-
Committed by lilong12
* add c_identity op, test=develop
-
Committed by Leo Chen
* refactor_check_finite_and_scale_npu_kernel * fix compile * add alloc_float_status op * add alloc_float_status op * add FloatStatus for check_finite_and_unscale * refine code * remove unnecessary logic * refine for fleet
-
Committed by Baibaifan
solve hccl communication conflict (#32447)
-
Committed by lilong12
* add c_concat op
-
Committed by shanliang1992
-
Committed by ronnywang
-
Committed by Leo Chen
-
Committed by Kqnonrime
* fix two error message * fix two error message * fix error * fix error * fix error * fix error * fix some error message * fix some error * fix error * fix some error * fix some error * fix some error * fix one error * fix some error * fix seven error message * fix error * fix error * fix error * fix error
-
- 22 Apr, 2021 2 commits
-
-
Committed by wuyefeilin
support int32 and int64 kernel for clip operator
-
Committed by Zhang Zheng
-