1. 11 Jan, 2021 7 commits
    • [cherry-pick]Elementwise add grad GPU kernel optimization (#30276) · e59524f8
      wangchaochaohu committed
      * elementwise_add_grad Op optimization  (#29575)
      
      * optimize for long width for elementwise (#29602)
      
      * refine (#29622)
      
      * delete the code for fp16 optimization because it is not faster than common template code (#29715)
      
      * fix the shape choose of vectorize for cuda
      
      * optimization for fp16 elementwise add (#29744)
      
      * Fix the compiler error for half type (#29799)
      
      * refine the compiler error for half2 operation (#29816)
      
      * fix the compiler error when gcc4 cuda9.0 (#29997)
    • [Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54
      Zhen Wang committed
      * Support pure fp16 training for AMP API. (#29544)
      
      * add cast ops before and after unsupported fp16 ops.
      
      * Keep partial net in FP32 pattern.
      
      * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
      
      * Add fp16 support for adam op.
      
      * add multi precision attr for adam.
      
      * Fix the bug of test_multi_precision_fp16_train UT.
      
      * Code format for CI.
      
      * Fix the redefine error about MPTypeTrait on windows.
      
      * fix bugs of the _create_accumulators func in Momentum.
      
      * fix bug when inserting post cast op.
      
      * Add the update_loss_scaling op in allow_set of UnusedVarCheck.
      
      * Update for ci coverage.
      
      * Add some doc for OptimizerWithMixedPrecision.
      
      * Fix the code style.
      
      * Improve the doc of `amp_init`.
      
      * Change for fp16 testing if users have the infer program defined in separate way.
      
      * Remove tensor copy in the update_loss_scaling op. (#29426)
      
      * remove tensor copy in the update_loss_scaling op
      
      * not use thrust.
      
      * fix some cuda memory access error.
    • [Cherry pick] improve dropout (#30260) · b4931ab1
      Zhang Ting committed
      * improve dropout (#29465)
      
      * improve drop out
      
      * add VectorizedRandomGeneratorWithGenerator
      
      * fix bug
      
      * modify according to comments
      
      * improve dropout grad (#29605)
      
      * improve grad perf
      
      * fix the bug of dropout_grad (#29813)
    • [cherry-pick] softmax optimize (#30279) · b80beb16
      GaoWei8 committed
      * Softmax vectorization (#29404)
      
      * vec softmax fw
      
      * vec softmax bw
      
      * add a message argument for compiler compatibility
      
      * optimize softmax forward (#30217)
      
      * optimize softmax forward
      Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
    • Cherry-pick 30194 30164 30201(#30202) · 36de178a
      Wilber committed
    • [cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f
      WangXi committed
      * Optimization grad merge performance (#29784)
      
      * [fleet] combine amp and gradient merge, test=develop (#30086)
      
      * fix assign_op_xpu concat_op_xpu warning (#30120)
      Co-authored-by: liuyuhui <liuyuhui@baidu.com>
    • add aarch64 and sunway kunlun lib (#30027) (#30237) · eacbd488
      QingshuChen committed
      * add aarch64 and sunway kunlun lib
      
      * minor
      
      * optimize elementwise_add for kunlun
      
      * update kunlun dependence
      
      * minor
      
      * minor
  2. 08 Jan, 2021 3 commits
  3. 07 Jan, 2021 3 commits
  4. 05 Jan, 2021 1 commit
  5. 29 Dec, 2020 3 commits
    • [Cherry-pick] Complex network execute support (#29905) · 91ebc460
      Chen Weihang committed
      * [Complex] Add support for complex grad accumulated (#29889)
      
      * add support for complex grad accumulated
      
      * add unittest for coverage
      
      * update test dtype
      
      * remove useless blank line
      
      * [Complex] Handle complex to real after type promotion (#29855)
      
      * try to add fwd op input dtypes
      
      * refactor base impl
      
      * return tmp_ins after dygraph prepare data
      
      * fix typo found in debug
      
      * polish comment & add complex net test
      
      * revert detail change
      
      * fix unittest failed
      
      * add complex kernel condition control
      
      * fix xpu test failed & polish comment
      
      * polish details by review comments
      
      * Complex op test (#29753)
      
      * delete no need to calculate inputs in dygraph op_test
      
      * delete no need to calculate inputs in dygraph op_test
      
      * change grad elementwise_mul for complex types (#29757)
      
      * add conj op for complex types
      
      * add conj for complex types
      
      * add more test case
      
      * add conj_op test
      
      * modify conj api and impl
      
      * add complex type for fill_constant_op xpu
      
      * add setConstant for complex type
      
      * remove complex conj test file
      
      * user define grad for test_conj_op
      
      * add test case for static mode of conj api
      
      * modify conj doc
      
      * change input args name to x
      
      * remove useless codes
      
      * conj support real types
      
      * add conj test case for real number
      
      * delete no need to calculate inputs in dygraph op_test
      
      * delete no need to calculate inputs in dygraph op_test
      
      * modify grad of mul for complex types
      
      * fix the grads of inputs args order not match bug
      
      * change the grad of div when complex types (#29804)
      
      * change the grad of div when complex types
      
      * fix the grads of inputs args order not match bug
      Co-authored-by: chentianyu03 <chentianyu03@baidu.com>
    • Support mips (#29943) · 5a8d43bb
      Wilber committed
    • cherry pick heter ps (#29955) · a839ddca
      Thunderbrook committed
      * cherry pick heter ps
      
      * CMakeList
  6. 28 Dec, 2020 1 commit
  7. 25 Dec, 2020 2 commits
    • feat: support check_nan_inf for kunlun/xpu device (#29694) (#29898) · 41917fb5
      QingshuChen committed
      * feat: support check_nan_inf for kunlun device
      
      * support kunlun stack
      
      * minor
    • 2 0 ps core 2 (#29894) · f781ab08
      tangwei12 committed
      * add ps table (#29463)
      
      * add ps table
      
      Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178
      
      * add service (#29560)
      
      * add service, remove ut on mac
      
      * fix heter_profiler & add heter stop method
      
      * fix code style
      
      * merge pscore
      
      Change-Id: Ie7f60d1cdde6755a0c29db26863c6283e9843d57
      
      * fix cmake
      
      Change-Id: I6773509a7b4ca79139ecc40b7bf3eb318ceff8bb
      
      * fix conflict
      
      Change-Id: I35575be0c96a8520f9d756ea7f1ff0b904a165ba
      
      * fix conflict
      
      Change-Id: Ic926ea0b0d67803226d51241397ba3b510226bfa
  8. 22 Dec, 2020 2 commits
  9. 21 Dec, 2020 1 commit
  10. 18 Dec, 2020 1 commit
    • [Cherry-pick] Add complex api conj, real and imag (#29750) · ab5cc042
      Chen Weihang committed
      * Add complex dtype op (add) test example (#29603)
      
      
      * add op test case for complex
      
      * polish code details
      
      * add xpu set constant support
      
      * fix argument error
      
      * remove useless pyc file
      
      * [Complex] Add real & imag op and api for complex tensor (#29672)
      
      * add complex real op & api & unittest
      
      * add imag op & api & unittest
      
      * refactor op impl
      
      * revert simplify writing due to compile failed
      
      * polish details
      
      * polish grad op code
      
      * add conj op for complex types (#29527)
      
      * add conj op for complex types
      
      * add conj for complex types
      
      * add more test case
      
      * add conj_op test
      
      * modify conj api and impl
      
      * add complex type for fill_constant_op xpu
      
      * add setConstant for complex type
      
      * remove complex conj test file
      
      * user define grad for test_conj_op
      
      * add test case for static mode of conj api
      
      * modify conj doc
      
      * change input args name to x
      
      * remove useless codes
      
      * conj support real types
      
      * add conj test case for real number
      Co-authored-by: chentianyu03 <chentianyu03@baidu.com>
  11. 17 Dec, 2020 3 commits
    • [cherry-pick]fix matmulv2 bug & add rebuild group & fix bug of download (#29726) · df0430dc
      ShenLiang committed
      * Fix the download bug in the case of multiple machines (#29551)
      
      * fix the download bug
      * add sort for ips
      
      * Fix bug of matmul_v2 for broadcast case (#29599)
      
      * fix bug of matmul_v2 for broadcast
      
      * Rebuild group automatically in dynamic graph distributed (#29255)
      
      * add tensor_indices in AssignGroupBySize
      
      * add rebuild group in reducer
      
      * fix error message of gather nd (#29521)
    • [bug fix] Added verbose oneDNN lib version (#29671) · ef04d3d3
      arlesniak committed
      fix #27935 (comment) by QA @OliverLPH (Could you add some MKLDNN-related print log when use FLAGS_use_mkldnn?)
    • update activation op on kunlun (#29577) (#29717) · e82efc0c
      TTerror committed
      * fix expand && concat/transpose to new api
      
      * update xpu_header
      
      * update activation op on kunlun
      
      * update activation op on kunlun
      
      * update activation op on kunlun
      
      * update activation op on kunlun
      
      * update activation op on kunlun
      
      * add nearest_interp on kunlun
      
      * update error message
  12. 16 Dec, 2020 1 commit
  13. 15 Dec, 2020 1 commit
  14. 08 Dec, 2020 3 commits
  15. 07 Dec, 2020 2 commits
  16. 05 Dec, 2020 1 commit
    • Release/2.0 rc1 (#29388) · fbb6cd70
      chentianyu03 committed
      * fix random failed of complex matmul
      
      * Make transpose, trace, kron, reshape, sum op support complex type (#29321)
      
      * add complex64 and complex128 type; add +-*/@ and slice operator for complex types
      
      * add test cases for complex elementwise, matmul and getitem unittest
      
      * add test cases for complex types
      
      * add test cases for complex matmul unittest
      
      * kron, reshape, transpose support complex types
      
      * sum and trace op support complex types
      
      * add test case of sum and trace op
      
      * fix the bug of imag part of complex not initialized
      
      * format file
      
      * format code style
      
      * kron support type promotion; modify test cases
  17. 04 Dec, 2020 2 commits
  18. 03 Dec, 2020 2 commits
    • fix shape of tile_grad op (#29289) (#29324) · 8cd8cd53
      Leo Chen committed
    • [Cherry-pick] Add pure fp16 training with master weights. (#29301) · d8ea8a06
      Zhen Wang committed
      * Add pure fp16 training with master weights. (#27712)
      
      * add the weight decay func for the momentum op
      
      * Add the multi_precision function in Momentum Optimizer.
      
      * Make sure that the initial value of master weights are same with the fp16 weights.
      
      * add static loss scaling.
      
      * add the rescale_grad function in the pure fp16 training.
      
      * use the original momentum updating method.
      
      * Polish some codes, such as variable names.
      
      * add docstring for apis.
      
      * update the var creation details of _create_master_weight.
      
      * not modify codes about imperative momentum updating.
      
      * Fix the error of test_dist_sparse_tensor_load_momentum UT.
      
      * add unit test for multi precision fp16 training.
      
      * add more unit tests for CI.
      
      * Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
  19. 01 Dec, 2020 1 commit