1. 13 Jan, 2021 · 1 commit
  2. 11 Jan, 2021 · 1 commit
  3. 29 Dec, 2020 · 5 commits
    • [Kunlun] 2.0 cherry-pick: Support for Baidu Kunlun XPU multi card training (#29713) · 847aa172
      liuyuhui committed
      * [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
      
      * [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
      
      * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29926)
      
      * add bkcl.so in whl for kunlun (#29947)
      
      * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29961)
      Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>
    • [Cherry-pick] Complex network execute support (#29905) · 91ebc460
      Chen Weihang committed
      * [Complex] Add support for complex grad accumulated (#29889)
      
      * add support for complex grad accumulated
      
      * add unittest for coverage
      
      * update test dtype
      
      * remove useless blank line
      
      * [Complex] Handle complex to real after type promotion (#29855)
      
      * try to add fwd op input dtypes
      
      * refactor base impl
      
      * return tmp_ins after dygraph prepare data
      
      * fix typo found in debug
      
      * polish comment & add complex net test
      
      * revert detail change
      
      * fix unittest failed
      
      * add complex kernel condition control
      
      * fix xpu test failed & polish comment
      
      * polish details by review comments
      
      * Complex op test (#29753)
      
      * delete no need to calculate inputs in dygraph op_test
      
      * delete no need to calculate inputs in dygraph op_test
      
      * change grad elementwise_mul for complex types (#29757)
      
      * add conj op for complex types
      
      * add conj for complex types
      
      * add more test case
      
      * add conj_op test
      
      * modify conj api and impl
      
      * add complex type for fill_constant_op xpu
      
      * add setConstant for complex type
      
      * remove complex conj test file
      
      * user define grad for test_conj_op
      
      * add test case for static mode of conj api
      
      * modify conj doc
      
      * change input args name to x
      
      * remove useless codes
      
      * conj support real types
      
      * add conj test case for real number
      
      * delete no need to calculate inputs in dygraph op_test
      
      * delete no need to calculate inputs in dygraph op_test
      
      * modify grad of mul for complex types
      
      * fix the grads of inputs args order not match bug
      
      * change the grad of div when complex types (#29804)
      
      * change the grad of div when complex types
      
      * fix the grads of inputs args order not match bug
      Co-authored-by: chentianyu03 <chentianyu03@baidu.com>
    • [cherry-pick] #26920 , #22924 (#29948) · bea300dd
      石晓伟 committed
    • Support mips (#29943) · 5a8d43bb
      Wilber committed
    • [Inference] FLAGS_call_statck is turned on default when ON_INFER=ON (#29800) · fae406ae
      Wilber committed
      * [Inference] FLAGS_call_statck is turned on default when ON_INFER=ON
      
      * cherry-pick 29828
  4. 28 Dec, 2020 · 1 commit
    • [Cherry-pick] Cherry-pick of PR#29579 and PR#29617 (#29904) · 63939597
      Huihuang Zheng committed
      * [Dy2stat] Enable jit.save to Save Without Running (#29579)
      
      Enable jit.save to Save Without Running.
      
      * Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617)
      
      Modify CublasHandleHolder to use PADDLE_RETRY_CUDA_SUCCESS instead of PADDLE_ENFORCE_CUDA_SUCCESS to fix a random unittest failure. The unittest log showed a CUDA allocation error in this file, which may be due to insufficient GPU resources. We fixed a similar failure in the past, so we applied PADDLE_RETRY_CUDA_SUCCESS here.
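The commit above swaps a fail-fast check for a retrying one. As a rough illustration of that general idea only, here is a minimal Python sketch of retrying a call that may fail transiently. It is not the `PADDLE_RETRY_CUDA_SUCCESS` macro: `TransientError`, `call_with_retry`, `make_flaky`, and the attempt count and delay are all hypothetical names and values chosen for this example.

```python
import time


class TransientError(RuntimeError):
    """Hypothetical stand-in for a recoverable failure (e.g. a temporary allocation error)."""


def call_with_retry(fn, max_attempts=3, delay_seconds=0.0):
    """Call fn(); on TransientError, wait and retry, making at most max_attempts calls."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError as err:
            last_error = err
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the last error instead of hiding it.
    raise last_error


def make_flaky(failures_before_success):
    """Build a callable that fails a fixed number of times, then succeeds (for demonstration)."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("resource temporarily unavailable")
        return "handle"

    return flaky, state
```

A fail-fast check corresponds to `max_attempts=1`; larger values tolerate short-lived resource pressure at the cost of a slower failure when the error is permanent.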
  5. 21 Dec, 2020 · 1 commit
  6. 17 Dec, 2020 · 1 commit
  7. 15 Dec, 2020 · 1 commit
  8. 08 Dec, 2020 · 1 commit
  9. 05 Dec, 2020 · 1 commit
    • Release/2.0 rc1 (#29388) · fbb6cd70
      chentianyu03 committed
      * fix random failure of complex matmul
      
      * Make transpose, trace, kron, reshape, sum op support complex type (#29321)
      
      * add complex64 and complex128 type; add +-*/@ and slice operator for complex types
      
      * add test cases for complex elementwise, matmul and getitem unittest
      
      * add test cases for complex types
      
      * add test cases for complex matmul unittest
      
      * kron, reshape, transpose support complex types
      
      * sum and trace op support complex types
      
      * add test case of sum and trace op
      
      * fix the bug of imag part of complex not initialized
      
      * format file
      
      * format code style
      
      * kron support type promotion; modify test cases
  10. 04 Dec, 2020 · 2 commits
  11. 01 Dec, 2020 · 1 commit
  12. 27 Nov, 2020 · 5 commits
    • Support dynamic graph distributed (#28997) · e2d01eb6
      ShenLiang committed
      * add reducer
      
      * refine event for memorycopy
      
      * add concat&split for allreduce
      
      * apply concat & split for fuse tensor
      
      * fix nccl dep
      
      * fix the unittest, compile problem and DDP initialization problem
      
      * fix unittest for mac & add some comments & solve the repeated param in sublayers
      
      * fix unittest for windows & fix document
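The "concat & split for allreduce" and "fuse tensor" items above refer to a common pattern: pack several gradient tensors into one flat buffer, run a single all-reduce, then split the result back into the original shapes. The sketch below illustrates that pattern in plain Python under stated assumptions. Tensors are modelled as lists of floats, the all-reduce is simulated in-process as an element-wise sum, and every function name is hypothetical rather than taken from Paddle's reducer.

```python
def flatten(tensors):
    """Concatenate a list of 1-D tensors into one flat buffer and record their sizes."""
    flat, sizes = [], []
    for t in tensors:
        sizes.append(len(t))
        flat.extend(t)
    return flat, sizes


def split(flat, sizes):
    """Inverse of flatten: cut the flat buffer back into tensors of the recorded sizes."""
    out, offset = [], 0
    for n in sizes:
        out.append(flat[offset:offset + n])
        offset += n
    return out


def allreduce_sum(buffers):
    """Simulated all-reduce: every rank receives the element-wise sum over all ranks."""
    total = [sum(values) for values in zip(*buffers)]
    return [list(total) for _ in buffers]


def fused_allreduce(per_rank_grads):
    """Run one all-reduce over the fused buffer instead of one per tensor."""
    flats, sizes = [], None
    for grads in per_rank_grads:
        flat, sizes = flatten(grads)  # assumes every rank holds tensors of the same sizes
        flats.append(flat)
    reduced = allreduce_sum(flats)
    return [split(flat, sizes) for flat in reduced]
```

With two simulated ranks holding `[[1.0, 2.0], [3.0]]` and `[[10.0, 20.0], [30.0]]`, `fused_allreduce` returns `[[11.0, 22.0], [33.0]]` for each rank. Fusing trades extra copies for fewer communication calls, which tends to help when there are many small tensors.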
    • fix CUDA 11 error on windows (#29101) · e668cb07
      Zhou Wei committed
    • bc902044
    • detect tensorRT plugin fp16 in runtime (#27933) · b9e76a01
      Shang Zhizhou committed
      * remove -DSUPPORTS_CUDA_FP16 in cuda.cmake
      
      * compile with cuda9
      
      * add some unittest
      
      * notest;test=coverage
      
      * add unittest for trt plugin swish && split
      
      * update ernie unittest
      
      * fix some error message
      
      * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter
      
      * fix compile error when CUDA_ARCH_NAME < Pascal
      
      * fix compile error
      
      * update unittest timeout
      
      * compile with cuda9
      
      * update error msg
      
      * fix code style
      
      * add some comments
      
      * add define IF_CUDA_ARCH_SUPPORT_FP16
      
      * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED
    • fix typo of flag name (#29154) · fd3fcb05
      Leo Chen committed
  13. 26 Nov, 2020 · 1 commit
  14. 25 Nov, 2020 · 2 commits
  15. 23 Nov, 2020 · 2 commits
  16. 20 Nov, 2020 · 2 commits
  17. 17 Nov, 2020 · 2 commits
  18. 13 Nov, 2020 · 1 commit
  19. 04 Nov, 2020 · 1 commit
  20. 03 Nov, 2020 · 4 commits
  21. 02 Nov, 2020 · 2 commits
  22. 30 Oct, 2020 · 1 commit
  23. 28 Oct, 2020 · 1 commit