- 30 4月, 2021 1 次提交
-
-
由 jakpiase 提交于
-
- 29 4月, 2021 10 次提交
-
-
由 liuyuhui 提交于
-
由 Leo Chen 提交于
-
由 LielinJiang 提交于
* add op read_file and decode_jpeg
-
由 Chen Weihang 提交于
-
由 WeiXin 提交于
-
由 cc 提交于
-
由 Pei Yang 提交于
-
由 joanna.wozna.intel 提交于
* Add bf16 uniform random initializer * Remove duplicated section * Change UT to CPU place only * Put detail functions into anonymous namespace
-
由 Wilber 提交于
-
由 zlsh80826 提交于
* implement MHA order same as training * fix fp16 compile issue on old architecture * fix format * fix format
-
- 28 4月, 2021 8 次提交
-
-
由 Leo Chen 提交于
* add input EpsilonTensor for adam * update python api * add unit test * add npu test * add more ut
-
由 arlesniak 提交于
-
由 denglin-github 提交于
* Add dlnne engine runtime * Fix log * Remove <const_cast> and remove unrelated modify with dlnne, +clang-format * Fix CMakeList format error * Add copyright message * Fix dlnne CMakeList.txt * Add some paddlepaddle_pass to support more networks * Fix some format bug * Add delete dropout_op pass * Fix some format bug * Fix format bug
-
由 Thunderbrook 提交于
* Revert "Revert "[PsCore] optimize performance of large kv (#32535)" (#32599)" This reverts commit 809ac036. * brpc dep
-
由 Kqnonrime 提交于
* fix two error message * fix two error message * fix error * fix error * fix error * fix error * fix some error message * fix some error * fix error * fix some error * fix some error * fix some error * fix one error * fix some error * fix seven error message * fix error * fix error * fix error * fix error * fix some error message * fix error * fix some error * fix some error
-
由 zhulei 提交于
-
由 Jacek Czaja 提交于
* - Added clearing oneDNN per executor * - Executor is nt always having FLAGS_use_mkldnn set to true
-
由 jiangcheng 提交于
* optimize update_loss_scaling_op by fused for loop to one kernel, test=develop * remove useless while loop and optimize variable name, test=develop * optimize variable name from out_addrs_tensor to out_addrs_mem, test=develop * optimize variable name for readable by change prefix identifier from t_ to local_
-
- 27 4月, 2021 10 次提交
-
-
由 lilong12 提交于
* add alltoall api, test=develop
-
由 WeiXin 提交于
* clear 'BasicEngine' when an exception occurs in the backward. * deal with conflict. * deal with conflict.
-
由 wenbin 提交于
-
由 Zhong Hui 提交于
* [OPs] Bug fix, fix the segment mean for illegal syncthreads usage.
-
由 Zhang Zheng 提交于
-
由 Baibaifan 提交于
-
由 Pei Yang 提交于
-
由 tianshuo78520a 提交于
This reverts commit 4b7242b0.
-
由 Aurelius84 提交于
-
由 XiangGao 提交于
Co-authored-by: NYang Zhang <yangzhang@live.com>
-
- 26 4月, 2021 11 次提交
-
-
由 lilong12 提交于
* add sendrecv, test=develop
-
由 Zhou Wei 提交于
* clear CUDA compile environment on windows * fix Windows CI * fix Windows CI * fix Windows CI
-
由 jiangcheng 提交于
* new optimize for where_index_op with prefix sum version. * write a scan prefix sum kernel with stream for where index op. * optimize where_index by using cub::DeviceScan::InclusiveSum instead of imperfect self-kernel. * remove CheckTrue struct and rename stide_array for readable. * optimize variable name for readable. * optimize function name and annotation.
-
由 Thunderbrook 提交于
* optimize pull sparse * optimize pull sparse * change macro * format
-
由 Yiqun Liu 提交于
* Unset ReserveSpace for inference program. * Support training from an inference program.
-
由 WangXi 提交于
-
由 ShenLiang 提交于
* fix model parallel * rm parallel_help.py * add embedding
-
由 石晓伟 提交于
-
由 WeiXin 提交于
* support backward return None. * edit unittest. * edit code according to CI * Improve error information
-
由 jiangcheng 提交于
* optimize slice op and slice grad op, test=develop * optimize variable name and annotation information, test=develop
-
由 Leo Chen 提交于
* skip op has no fp16 kernel * add ut
-