- 27 Jan, 2021 1 commit
-
-
Committed by Wojciech Uss
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
-
- 20 Jan, 2021 3 commits
-
-
Committed by AshburnLee
* Add TF32 support for A100 tensor core acceleration for cuBLAS (#28732)
* Fixed an error
* Fixed an error
-
Committed by AshburnLee
Cherry-picked from PR #29192. Function: adds a TF32 switch for cuDNN. The switch is on by default and is turned off when users set it to False.
-
Committed by Wilber
-
- 19 Jan, 2021 1 commit
-
-
Committed by liuyuhui
-
- 14 Jan, 2021 1 commit
-
-
Committed by QingshuChen
* optimize memcpy perf for kunlun (#30291)
* optimize memcpy perf for kunlun
* remove useless unittest for kunlun mean
* minor
* fix bug that can't find mkldnn (kunlun) (#30394)
-
- 13 Jan, 2021 1 commit
-
-
Committed by Chen Weihang
[Cherry-pick] Remove C++ stacktrace open hint; cherry-pick of #30325
-
- 11 Jan, 2021 1 commit
-
-
Committed by WeiXin
[cherry pick] Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161) (#30280)
Adds detailed error messages for curandStatus_t, cublasStatus_t, and cusolverStatus_t. Original PR: #30161
-
- 29 Dec, 2020 5 commits
-
-
Committed by liuyuhui
* [Kunlun] PR1: Support one Kunlun card training in parallel executor (#29337)
* [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)
* add bkcl.so in whl for kunlun (#29947)
* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29961)
Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>
-
Committed by Chen Weihang
* [Complex] Add support for complex grad accumulated (#29889)
* add support for complex grad accumulated
* add unittest for coverage
* update test dtype
* remove useless blank line
* [Complex] Handle complex to real after type promotion (#29855)
* try to add fwd op input dtypes
* refactor base impl
* return tmp_ins after dygraph prepare data
* fix typo found in debug
* polish comment & add complex net test
* revert detail change
* fix unittest failed
* add complex kernel condition control
* fix xpu test failed & polish comment
* polish details by review comments
* Complex op test (#29753)
* delete no need to calculate inputs in dygraph op_test
* delete no need to calculate inputs in dygraph op_test
* change grad elementwise_mul for complex types (#29757)
* add conj op for complex types
* add conj for complex types
* add more test case
* add conj_op test
* modify conj api and impl
* add complex type for fill_constant_op xpu
* add setConstant for complex type
* remove complex conj test file
* user define grad for test_conj_op
* add test case for static mode of conj api
* modify conj doc
* change input args name to x
* remove useless codes
* conj support real types
* add conj test case for real number
* delete no need to calculate inputs in dygraph op_test
* delete no need to calculate inputs in dygraph op_test
* modify grad of mul for complex types
* fix the grads of inputs args order not match bug
* change the grad of div when complex types (#29804)
* change the grad of div when complex types
* fix the grads of inputs args order not match bug
Co-authored-by: chentianyu03 <chentianyu03@baidu.com>
-
Committed by 石晓伟
-
Committed by Wilber
-
Committed by Wilber
* [Inference] FLAGS_call_statck is turned on by default when ON_INFER=ON
* cherry-pick 29828
-
- 28 Dec, 2020 1 commit
-
-
Committed by Huihuang Zheng
* [Dy2stat] Enable jit.save to Save Without Running (#29579)
  Enable jit.save to save without running.
* Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617)
  Modify CublasHandleHolder to use PADDLE_RETRY_CUDA_SUCCESS instead of PADDLE_ENFORCE_CUDA_SUCCESS to fix a random unittest failure. The unittest log showed a CUDA allocation error at this file, which may be due to insufficient GPU resources. Similar failures were fixed this way in the past, so PADDLE_RETRY_CUDA_SUCCESS is applied here as well.
-
- 21 Dec, 2020 1 commit
-
-
Committed by Jacek Czaja
-
- 17 Dec, 2020 1 commit
-
-
Committed by arlesniak
fix #27935 (comment) by QA @OliverLPH (Could you add some MKLDNN-related print log when using FLAGS_use_mkldnn?)
-
- 15 Dec, 2020 1 commit
-
-
Committed by QingshuChen
* support mobilenet for kunlun (#29458)
* add xpu ops for training transformer in kunlun (#29539)
* 1. fix matmul bug 2. add one hot
* add xpu error msg
Co-authored-by: procr <procrboo@gmail.com>
Co-authored-by: taixiurong <taixiurong@126.com>
-
- 08 Dec, 2020 1 commit
-
-
Committed by liuyuhui
* add deformable_conv op on xpu (#29234)
* rebase develop
* update deformable_conv op on xpu
* update deformable_conv op on xpu
* update kunlun conv2d/softmax/elementwise implementation (#29229)
* update conv2d & softmax to new xpu api
* test=kunlun
* remove useless comments
* test=kunlun
* remote softmax xpu op
* test=kunlun
* update kunlun softmax
* test=kunlun
* update xpu unittest
* test=kunlun
* fix elementwise_grad bug for kunlun *test=kunlun
* support global pooling for kunlun (#29293)
* test=kunlun
* update reduce_sum op on xpu (#29367)
* update reduce_sum op on xpu
* update reduce_sum op on xpu
* support running on xpu
* fix expand/uniform_random && concat/transpose to new api on xpu (#29280)
* fix expand && concat/transpose to new api
* update uniform_random_op
* update xpu_header
* 1. fix elementwise ops' bug 2. fix softmax_with_cross_entropy_op 3. add biliner_interp_op (#29448)
Co-authored-by: root <root@bjhw-sys-rpm0223.bjhw.baidu.com>
Co-authored-by: 卖鱼的哲学 <tangzhiyi11@users.noreply.github.com>
Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>
Co-authored-by: taixiurong <taixiurong@126.com>
-
- 05 Dec, 2020 1 commit
-
-
Committed by chentianyu03
* fix random failure of complex matmul
* Make transpose, trace, kron, reshape, sum op support complex type (#29321)
* add complex64 and complex128 type; add +-*/@ and slice operator for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
* kron, reshape, transpose support complex types
* sum and trace op support complex types
* add test case of sum and trace op
* fix the bug of imag part of complex not initialized
* format file
* format code style
* kron support type promotion; modify test cases
-
- 04 Dec, 2020 2 commits
-
-
Committed by lilong12
-
Committed by Chen Weihang
* basic impl of type promote
* add comment & another testcase
* fix complex bugs & support python op promote type
* fix failed unittests & polish code
* add unittest for coverage
* change to only promote complex type
* polish code details
* polish several comments
-
- 01 Dec, 2020 1 commit
-
-
Committed by chentianyu03
* add complex64 and complex128 type; add +-*/@ and slice operator for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
-
- 27 Nov, 2020 5 commits
-
-
Committed by ShenLiang
* add reducer
* refine event for memorycopy
* add concat&split for allreduce
* apply concat & split for fuse tensor
* fix nccl dep
* fix the unittest, compile problem and ddp initialize problem
* fix unittest for mac & add some comments & solve the repeated param in sublayers
* fix unittest for windows & fix document
-
Committed by Zhou Wei
-
Committed by arlesniak
-
Committed by Shang Zhizhou
* remove -DSUPPORTS_CUDA_FP16 in cuda.cmake
* compile with cuda9
* add some unittest
* notest;test=coverage
* add unittest for trt plugin swish && split
* update ernie unittest
* fix some error message
* remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter
* fix compile error when CUDA_ARCH_NAME < Pascal
* fix compile error
* update unittest timeout
* compile with cuda9
* update error msg
* fix code style
* add some comments
* add define IF_CUDA_ARCH_SUPPORT_FP16
* rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED
-
Committed by Leo Chen
-
- 26 Nov, 2020 1 commit
-
-
Committed by Aurelius84
-
- 25 Nov, 2020 2 commits
-
-
Committed by Chen Weihang
* do not show cpp stack by default & add hint
* fix failed unittest
* fix failed unittests
-
Committed by wawltor
remove eigen threadpool to speed up
-
- 23 Nov, 2020 2 commits
-
-
Committed by Jacek Czaja
-
Committed by Pei Yang
* change avg pooling and global pooling to trt layer
* add support for static shape global pooling
* modify trt errmsg
-
- 20 Nov, 2020 2 commits
-
-
Committed by gongweibao
-
Committed by QingshuChen
* adjust kunlun header file *test=kunlun
* update kunlun unittest *test=kunlun
* update xpu unittest
* test=kunlun
* update xpu unittest
* test=kunlun
* update xpu unittest
* test=kunlun
-
- 17 Nov, 2020 2 commits
-
-
Committed by Jacek Czaja
-
Committed by lilong12
-
- 13 Nov, 2020 1 commit
-
-
Committed by Zhou Wei
-
- 04 Nov, 2020 1 commit
-
-
Committed by Chen Weihang
-
- 03 Nov, 2020 2 commits
-
-
Committed by Shang Zhizhou
* fp16 result ok
* change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS
* auto detect special slice op converter for ernie with trt oss
* ernie oss only support fp16
* fix special_slice_plugin serialize bug
* matmul in tensorrt ok
* ernie unittest ok
* add matmul tensorrt unittest
* remove demo code
-
Committed by Jacek Czaja
-