- 28 Apr, 2021 1 commit
-
-
Committed by jiangcheng
* optimize update_loss_scaling_op by fusing the for loop into one kernel, test=develop
* remove useless while loop and optimize variable name, test=develop
* rename variable from out_addrs_tensor to out_addrs_mem, test=develop
* rename variables for readability by changing the prefix from t_ to local_
-
- 27 Apr, 2021 5 commits
-
-
Committed by Zhong Hui
* [OPs] Bug fix: fix illegal syncthreads usage in segment mean.
-
Committed by Pei Yang
-
Committed by tianshuo78520a
This reverts commit 4b7242b0.
-
Committed by Aurelius84
-
Committed by XiangGao
Co-authored-by: NYang Zhang <yangzhang@live.com>
-
- 26 Apr, 2021 12 commits
-
-
Committed by lilong12
* add sendrecv, test=develop
-
Committed by jiangcheng
* new optimization for where_index_op using a prefix sum.
* write a scan prefix sum kernel with stream for where index op.
* optimize where_index by using cub::DeviceScan::InclusiveSum instead of the imperfect hand-written kernel.
* remove CheckTrue struct and rename stide_array for readability.
* optimize variable names for readability.
* optimize function names and comments.
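Note on the prefix-sum approach named in this commit: the following is a minimal CPU sketch in Python, not the Paddle CUDA kernel. It only illustrates how an inclusive prefix sum over the nonzero flags gives each selected element its output slot, which is the role `cub::DeviceScan::InclusiveSum` plays on the GPU. The function names `where_index` and `unravel` are hypothetical and chosen for illustration only.

```python
from itertools import accumulate


def where_index(values):
    """Return flat indices of truthy elements, using an inclusive prefix sum.

    Illustrative sketch only; names are assumptions, not Paddle's API.
    """
    # 1 where the element is selected, 0 otherwise.
    flags = [1 if v else 0 for v in values]
    # inclusive[i] = number of selected elements in values[0..i].
    inclusive = list(accumulate(flags))
    total = inclusive[-1] if inclusive else 0
    out = [0] * total
    # Each selected element writes to slot inclusive[i] - 1. On a GPU every
    # element can do this independently once the scan is available.
    for i, flag in enumerate(flags):
        if flag:
            out[inclusive[i] - 1] = i
    return out


def unravel(flat, shape):
    """Convert a flat row-major index into per-dimension coordinates."""
    coords = []
    for dim in reversed(shape):
        flat, rem = divmod(flat, dim)
        coords.append(rem)
    return tuple(reversed(coords))
```

For example, `where_index([0, 3, 0, 5, 1])` selects flat positions 1, 3 and 4, and `unravel(5, (2, 3))` maps flat index 5 of a 2x3 tensor to coordinates `(1, 2)`.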
-
Committed by Thunderbrook
* optimize pull sparse
* optimize pull sparse
* change macro
* format
-
Committed by Yiqun Liu
* Unset ReserveSpace for inference program.
* Support training from an inference program.
-
Committed by WangXi
-
Committed by ShenLiang
* fix model parallel
* rm parallel_help.py
* add embedding
-
Committed by 石晓伟
-
Committed by WeiXin
* support backward returning None.
* edit unittest.
* edit code according to CI
* Improve error information
-
Committed by jiangcheng
* optimize slice op and slice grad op, test=develop
* optimize variable names and comments, test=develop
-
Committed by Leo Chen
* skip ops that have no fp16 kernel
* add ut
-
Committed by Leo Chen
-
Committed by Shang Zhizhou
-
- 25 Apr, 2021 17 commits
-
-
Committed by Pei Yang
* fix airank bert emb order
* move input num check to converter
* add input num check
* add unused var check white list
-
Committed by liym27
-
Committed by liym27
-
Committed by Baibaifan
-
Committed by Pei Yang
* add trt runtime version check
* use different wrap, and change to major version check
-
Committed by Pei Yang
-
Committed by Zhang Ting
-
Committed by Leo Chen
-
Committed by Wilber
-
Committed by Qi Li
-
Committed by minghaoBD
-
Committed by lilong12
* update
-
Committed by wawltor
* fix bug: when x.dim < y.dim, the result of compare_op is the inverse of the expected result
* apply the compare broadcast bug fix to the CUDA kernel as well
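Note on the expected behaviour behind this fix: the log does not state the root cause, so the following is only a hedged Python sketch of what a correct broadcast comparison should return when `x` has fewer dimensions than `y`. The function name `less_than` and the restriction to a 1-D `x` against a 2-D `y` are assumptions made for illustration.

```python
def less_than(x, y):
    """Elementwise x < y, broadcasting a 1-D x across the rows of a 2-D y.

    Illustrative sketch only; not the Paddle compare_op implementation.
    """
    for row in y:
        if len(row) != len(x):
            raise ValueError("last dimension of y must match len(x)")
    # Operand order is preserved: x stays on the left of '<' even though
    # it is the lower-rank (broadcast) operand.
    return [[xv < yv for xv, yv in zip(x, row)] for row in y]
```

An implementation that internally swaps the operands so the higher-rank tensor comes first must also flip the comparison (`x < y` becomes `y > x`); otherwise the result is inverted for every element pair that is not equal.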
-
Committed by Shang Zhizhou
* fix tc trt shape
* fix fc dynamic shape
* add fc shape assert
* update
-
Committed by Chen Weihang
-
Committed by Leo Chen
* use ZerosLike instead of NPUMemsetAsync
* fix compile
-
Committed by denglin-github
* Add dlnne engine runtime
* Fix log
* Remove <const_cast> and remove modifications unrelated to dlnne, +clang-format
* Fix CMakeList format error
* Add copyright message
* Fix dlnne CMakeList.txt
* Add some paddlepaddle_pass to support more networks
* Fix some format bugs
-
- 24 Apr, 2021 1 commit
-
-
Committed by winter-wang
-
- 23 Apr, 2021 4 commits
-
-
Committed by lilong12
* add c_identity op, test=develop
-
Committed by Aurelius84
* Refine Constructor logic of ParallelExecutor
* refine function name
* refine code comment
-
Committed by Leo Chen
* refactor_check_finite_and_scale_npu_kernel
* fix compile
* add alloc_float_status op
* add alloc_float_status op
* add FloatStatus for check_finite_and_unscale
* refine code
* remove unnecessary logic
* refine for fleet
-
Committed by ceci3
-