- 28 Apr, 2021 8 commits
-
-
Committed by Leo Chen
* add input EpsilonTensor for adam
* update python api
* add unit test
* add npu test
* add more ut
-
Committed by arlesniak
-
Committed by denglin-github
* Add dlnne engine runtime
* Fix log
* Remove <const_cast> and remove unrelated modifications to dlnne, +clang-format
* Fix CMakeList format error
* Add copyright message
* Fix dlnne CMakeList.txt
* Add some paddlepaddle_pass to support more networks
* Fix some format bug
* Add delete dropout_op pass
* Fix some format bug
* Fix format bug
-
Committed by Thunderbrook
* Revert "Revert "[PsCore] optimize performance of large kv (#32535)" (#32599)" This reverts commit 809ac036.
* brpc dep
-
Committed by Kqnonrime
* fix two error message
* fix two error message
* fix error
* fix error
* fix error
* fix error
* fix some error message
* fix some error
* fix error
* fix some error
* fix some error
* fix some error
* fix one error
* fix some error
* fix seven error message
* fix error
* fix error
* fix error
* fix error
* fix some error message
* fix error
* fix some error
* fix some error
-
Committed by zhulei
-
Committed by Jacek Czaja
* Added clearing oneDNN per executor
* Executor does not always have FLAGS_use_mkldnn set to true
-
Committed by jiangcheng
* optimize update_loss_scaling_op by fusing the for loop into one kernel, test=develop
* remove useless while loop and optimize variable name, test=develop
* optimize variable name from out_addrs_tensor to out_addrs_mem, test=develop
* optimize variable names for readability by changing the prefix identifier from t_ to local_
-
- 27 Apr, 2021 10 commits
-
-
Committed by lilong12
* add alltoall api, test=develop
-
Committed by WeiXin
* clear 'BasicEngine' when an exception occurs in the backward.
* deal with conflict.
* deal with conflict.
-
Committed by wenbin
-
Committed by Zhong Hui
* [OPs] Bug fix, fix the segment mean for illegal syncthreads usage.
-
Committed by Zhang Zheng
-
Committed by Baibaifan
-
Committed by Pei Yang
-
Committed by tianshuo78520a
This reverts commit 4b7242b0.
-
Committed by Aurelius84
-
Committed by XiangGao
Co-authored-by: NYang Zhang <yangzhang@live.com>
-
- 26 Apr, 2021 13 commits
-
-
Committed by lilong12
* add sendrecv, test=develop
-
Committed by Zhou Wei
* clear CUDA compile environment on windows
* fix Windows CI
* fix Windows CI
* fix Windows CI
-
Committed by jiangcheng
* new optimization for where_index_op with a prefix sum version.
* write a scan prefix sum kernel with stream for where index op.
* optimize where_index by using cub::DeviceScan::InclusiveSum instead of the imperfect self-written kernel.
* remove CheckTrue struct and rename stide_array for readability.
* optimize variable names for readability.
* optimize function name and annotation.
-
Committed by Thunderbrook
* optimize pull sparse
* optimize pull sparse
* change macro
* format
-
Committed by Yiqun Liu
* Unset ReserveSpace for inference program.
* Support training from an inference program.
-
Committed by WangXi
-
Committed by ShenLiang
* fix model parallel
* rm parallel_help.py
* add embedding
-
Committed by 石晓伟
-
Committed by WeiXin
* support backward return None.
* edit unittest.
* edit code according to CI
* Improve error information
-
Committed by jiangcheng
* optimize slice op and slice grad op, test=develop
* optimize variable name and annotation information, test=develop
-
Committed by Leo Chen
* skip ops that have no fp16 kernel
* add ut
-
Committed by Leo Chen
-
Committed by Shang Zhizhou
-
- 25 Apr, 2021 9 commits
-
-
Committed by Pei Yang
* fix airank bert emb order
* move input num check to converter
* add input num check
* add unused var check white list
-
Committed by liym27
-
Committed by liym27
-
Committed by Baibaifan
-
Committed by Pei Yang
* add trt runtime version check
* use different wrap, and change to major version check
-
Committed by Pei Yang
-
Committed by Zhang Ting
-
Committed by Leo Chen
-
Committed by Wilber
-