- 30 Apr, 2021 7 commits
-
-
Committed by Pei Yang
-
Committed by Zhou Wei
-
Committed by pangyoki
* add relu6_ hardsigmoid_ leaky_relu_ Inplace APIs
* add softmax_with_cross_entropy_ Inplace API
* add clip_ scale_ add_ subtract_ Inplace APIs
* add wlist
* fix parameter of scale api
* add add_n_ Inplace API and remove log_ Inplace API
* fix elementwise_add_ and elementwise_sub_ broadcast problem
* elementwise inplace api give error message before run the op
* use broadcast_shape in elementwise inplace op
* add 8 inplace apis that are auto generated
* add unittest for all inplace apis
* add decorator for inplace apis in static mode
* fix windows blas fail of exp inplace api, change array_equal to allclose
* add flatten inplace api
* add flatten unittest
* fix flatten unittest
* add decorator
* fix grad.numpy in test_pylayer_op
* unsupport softmax_with_cross_entropy_
* add test_inplace_softmax_with_cross_entropy to static_mode_white_list
* delete __all__ in inplace_utils
* delete activation inplace function and add Tensor.inplace_func
* change paddle.inplace_ to Tensor.inplace_
* fix little problem
* add paddle in inplace_utils
-
Committed by ceci3
-
Committed by 123malin
-
Committed by Baibaifan
-
Committed by jakpiase
-
- 29 Apr, 2021 10 commits
-
-
Committed by liuyuhui
-
Committed by Leo Chen
-
Committed by LielinJiang
* add op read_file and decode_jpeg
-
Committed by Chen Weihang
-
Committed by WeiXin
-
Committed by cc
-
Committed by Pei Yang
-
Committed by joanna.wozna.intel
* Add bf16 uniform random initializer
* Remove duplicated section
* Change UT to CPU place only
* Put detail functions into anonymous namespace
-
Committed by Wilber
-
Committed by zlsh80826
* implement MHA order same as training
* fix fp16 compile issue on old architecture
* fix format
* fix format
-
- 28 Apr, 2021 8 commits
-
-
Committed by Leo Chen
* add input EpsilonTensor for adam
* update python api
* add unit test
* add npu test
* add more ut
-
Committed by arlesniak
-
Committed by denglin-github
* Add dlnne engine runtime
* Fix log
* Remove <const_cast> and modifications unrelated to dlnne; run clang-format
* Fix CMakeList format error
* Add copyright message
* Fix dlnne CMakeList.txt
* Add some paddlepaddle_pass to support more networks
* Fix some format bug
* Add delete dropout_op pass
* Fix some format bug
* Fix format bug
-
Committed by Thunderbrook
* Revert "Revert "[PsCore] optimize performance of large kv (#32535)" (#32599)"
  This reverts commit 809ac036.
* brpc dep
-
Committed by Kqnonrime
* fix two error message
* fix two error message
* fix error
* fix error
* fix error
* fix error
* fix some error message
* fix some error
* fix error
* fix some error
* fix some error
* fix some error
* fix one error
* fix some error
* fix seven error message
* fix error
* fix error
* fix error
* fix error
* fix some error message
* fix error
* fix some error
* fix some error
-
Committed by zhulei
-
Committed by Jacek Czaja
* Added clearing oneDNN per executor
* Executor does not always have FLAGS_use_mkldnn set to true
-
Committed by jiangcheng
* optimize update_loss_scaling_op by fusing the for loop into one kernel, test=develop
* remove useless while loop and optimize variable name, test=develop
* optimize variable name from out_addrs_tensor to out_addrs_mem, test=develop
* optimize variable names for readability by changing the prefix identifier from t_ to local_
-
- 27 Apr, 2021 10 commits
-
-
Committed by lilong12
* add alltoall api, test=develop
-
Committed by WeiXin
* clear 'BasicEngine' when an exception occurs in the backward.
* deal with conflict.
* deal with conflict.
-
Committed by wenbin
-
Committed by Zhong Hui
* [OPs] Bug fix, fix the segment mean for illegal syncthreads usage.
-
Committed by Zhang Zheng
-
Committed by Baibaifan
-
Committed by Pei Yang
-
Committed by tianshuo78520a
This reverts commit 4b7242b0.
-
Committed by Aurelius84
-
Committed by XiangGao
Co-authored-by: NYang Zhang <yangzhang@live.com>
-
- 26 Apr, 2021 5 commits
-
-
Committed by lilong12
* add sendrecv, test=develop
-
Committed by Zhou Wei
* clear CUDA compile environment on windows
* fix Windows CI
* fix Windows CI
* fix Windows CI
-
Committed by jiangcheng
* new optimization for where_index_op using a prefix sum.
* write a scan prefix sum kernel with stream for where index op.
* optimize where_index by using cub::DeviceScan::InclusiveSum instead of imperfect self-kernel.
* remove CheckTrue struct and rename stide_array for readability.
* optimize variable names for readability.
* optimize function name and annotation.
-
Committed by Thunderbrook
* optimize pull sparse
* optimize pull sparse
* change macro
* format
-
Committed by Yiqun Liu
* Unset ReserveSpace for inference program.
* Support training from an inference program.
-
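
The where_index_op commit from 26 Apr mentions replacing a hand-written kernel with an inclusive prefix sum (cub::DeviceScan::InclusiveSum). As a rough illustration of that general technique only, and not of the actual Paddle kernel, here is a minimal CPU sketch in Python. The function name where_index and the flat list input are assumptions made for this example.

```python
from itertools import accumulate


def where_index(flags):
    """Return the indices of truthy entries, computed via an inclusive prefix sum.

    Hypothetical helper for illustration; it is not Paddle's API.
    """
    # 1 for every selected element, 0 otherwise.
    mask = [1 if f else 0 for f in flags]
    # Inclusive scan: scan[i] counts the selected elements in flags[0..i].
    scan = list(accumulate(mask))
    total = scan[-1] if scan else 0
    out = [0] * total
    # A selected element at position i owns output slot scan[i] - 1.
    # The slots are disjoint, so on a GPU each iteration could be
    # handled by an independent thread.
    for i, selected in enumerate(mask):
        if selected:
            out[scan[i] - 1] = i
    return out


if __name__ == "__main__":
    print(where_index([0, 1, 0, 1, 1]))
```

The scan turns the compaction step into independent writes, which is presumably why a library scan primitive is attractive here; the real operator also handles multi-dimensional inputs, which this sketch ignores.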