1. 15 Jan 2021 — 5 commits
  2. 14 Jan 2021 — 8 commits
  3. 13 Jan 2021 — 6 commits
  4. 12 Jan 2021 — 8 commits
  5. 11 Jan 2021 — 13 commits
    • [Cherry-Pick] Support vector<double> as type of op attribute and op set_value support vector<double> as value (#30126) (#30305) · d839761e
      Committed by liym27

      Cherry-pick of #30126:
      1. Support vector<double> as the type of an op attribute.
      2. The set_value op supports a float64 numpy.array as the value. (A usage sketch follows this entry.)
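      A minimal sketch of what the picked change enables at the Python level, assuming a Paddle 2.0+ dynamic-graph session where slice assignment on a Tensor lowers to the set_value op; tensor names and values are illustrative only:

          import numpy as np
          import paddle

          # Slice assignment on a Tensor is backed by the set_value op.
          x = paddle.ones([3, 4], dtype='float64')

          # The point of #30126: a float64 numpy.ndarray is accepted as the value.
          x[0:2, :] = np.full([2, 4], 3.14, dtype='float64')
          print(x.numpy())
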
    • [cherry-pick] Async drop scope in executor (#29714) #30285 · 93ce7f69
      Committed by Leo Chen

      Cherry-pick of #29714: drop scopes asynchronously in the executor.
    • [Cherry-Pick 2.0] Check the rank of input in kernel of set_value op (#30147) (#30301) · a2bbd06a
      Committed by liym27

      Cherry-pick of #30147: for the set_value op, check that the input's rank is less than 7.
    • [cherry-pick] Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161) (#30280) · 04cc659c
      Committed by WeiXin

      Add detailed error messages for curandStatus_t, cublasStatus_t, and cusolverStatus_t.
      Original PR: #30161
    • [Cherry-pick PR 29913] Add View (reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) (#30258) · 7c943a65
      Committed by pangyoki

      (A sketch of the affected Python APIs follows this entry.)

      * add View strategy on squeeze, unsqueeze, reshape, flatten
      * add squeeze unittest
      * add unittests
      * use "View" as the strategy name rather than "Reuse Allocation"
      * fix view API doc
      * fix format
      * use core.ops when the input of reshape2 is a Tensor
      * fix test_cross_entropy_loss error caused by reshape2
      * delete selected_rows
      * change op_function
      * minor change
      * solve HandleViewBetweenInputAndOutput
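      The View strategy itself is an internal memory optimization, so there is nothing new to call from Python; the sketch below only lists the four affected ops in dynamic graph, where their outputs may now reuse the input's allocation instead of copying (an assumption about the runtime behavior, transparent to user code):

          import paddle

          x = paddle.rand([2, 3, 1, 4])

          # The four ops that gained the View (reuse allocation) strategy.
          a = paddle.squeeze(x, axis=2)     # shape [2, 3, 4]
          b = paddle.unsqueeze(a, axis=0)   # shape [1, 2, 3, 4]
          c = paddle.reshape(x, [6, 4])     # shape [6, 4]
          d = paddle.flatten(x)             # shape [24]
          print(a.shape, b.shape, c.shape, d.shape)
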
    • [cherry-pick] Add cast CUDA kernel (#29352) #30263 · afbc6367
      Committed by Zhang Ting

      Cherry-pick of #29352: add a CUDA kernel for the cast op. (A usage sketch follows this entry.)
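      For context, paddle.cast is the user-facing API this kernel serves; a minimal sketch (dispatching to the new CUDA kernel assumes a GPU build and a GPU-resident tensor):

          import paddle

          x = paddle.to_tensor([1.2, 2.7, 3.1], dtype='float32')
          # cast runs the CUDA kernel from #29352 when x lives on a GPU.
          y = paddle.cast(x, 'int32')
          z = paddle.cast(x, 'float64')
          print(y.numpy(), z.dtype)
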
    • [cherry-pick] Add support for place string representation #30264 · fb66355e
      Committed by wangchaochaohu

      Cherry-pick of #28769: add support for a string representation of places. (A usage sketch follows this entry.)
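      A sketch of what the place string representation looks like from Python; the exact printed format can vary between releases, so the comments show examples rather than guaranteed output:

          import paddle

          cpu = paddle.CPUPlace()
          print(str(cpu))        # e.g. CPUPlace

          if paddle.is_compiled_with_cuda():
              gpu = paddle.CUDAPlace(0)
              print(str(gpu))    # e.g. CUDAPlace(0)
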
    • [cherry-pick] Elementwise add grad GPU kernel optimization (#30276) · e59524f8
      Committed by wangchaochaohu

      * elementwise_add_grad op optimization (#29575)
      * optimize long-width cases for elementwise (#29602)
      * refine (#29622)
      * delete the fp16 optimization code because it is not faster than the common template code (#29715)
      * fix the shape choice of vectorization for CUDA
      * optimization for fp16 elementwise add (#29744)
      * fix the compiler error for the half type (#29799)
      * refine the compiler error for half2 operations (#29816)
      * fix the compiler error with gcc4 + CUDA 9.0 (#29997)
    • [Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54
      Committed by Zhen Wang

      (A training-setup sketch follows this entry.)

      * Support pure fp16 training for the AMP API. (#29544)
      * Add cast ops before and after fp16-unsupported ops.
      * Keep part of the net in FP32.
      * Support check_finite_and_unscale and update_loss_scaling for the FP16 calculation mode.
      * Add fp16 support for the adam op.
      * Add a multi_precision attribute for adam.
      * Fix the bug in the test_multi_precision_fp16_train UT.
      * Code format for CI.
      * Fix the redefinition error of MPTypeTrait on Windows.
      * Fix bugs in the _create_accumulators func in Momentum.
      * Fix a bug when inserting the post cast op.
      * Add the update_loss_scaling op to the allow set of UnusedVarCheck.
      * Update for CI coverage.
      * Add some docs for OptimizerWithMixedPrecision.
      * Fix the code style.
      * Improve the doc of `amp_init`.
      * Adjust fp16 testing for users who define the inference program separately.
      * Remove tensor copy in the update_loss_scaling op. (#29426)
      * remove tensor copy in the update_loss_scaling op
      * do not use thrust
      * fix some CUDA memory access errors
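      A static-graph setup sketch for pure fp16 training, pieced together from the names in the commit messages above (OptimizerWithMixedPrecision, amp_init, the multi_precision attribute); the module path paddle.static.amp, the use_pure_fp16 flag, and the exact signatures are assumptions that may differ between releases, so treat this as a sketch rather than canonical usage:

          import paddle

          paddle.enable_static()
          main_prog, startup_prog = paddle.static.Program(), paddle.static.Program()

          with paddle.static.program_guard(main_prog, startup_prog):
              x = paddle.static.data(name='x', shape=[None, 16], dtype='float32')
              out = paddle.static.nn.fc(x, size=1)
              loss = paddle.mean(out)

              opt = paddle.optimizer.Momentum(learning_rate=0.01, momentum=0.9,
                                              multi_precision=True)  # keep fp32 master weights (assumed flag)
              # decorate() wraps the optimizer in OptimizerWithMixedPrecision;
              # use_pure_fp16=True (assumed flag name) selects the pure fp16 mode.
              opt = paddle.static.amp.decorate(opt, use_pure_fp16=True)
              opt.minimize(loss)

          place = paddle.CUDAPlace(0)
          exe = paddle.static.Executor(place)
          exe.run(startup_prog)
          # amp_init() casts the initialized fp32 parameters to fp16 before training.
          opt.amp_init(place)
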
    • [Cherry-pick] Improve dropout (#30260) · b4931ab1
      Committed by Zhang Ting

      * improve dropout (#29465)
      * improve dropout
      * add VectorizedRandomGeneratorWithGenerator
      * fix bug
      * modify according to review comments
      * improve dropout grad (#29605)
      * improve grad perf
      * fix the bug of dropout_grad (#29813)
    • [cherry-pick] Softmax optimization (#30279) · b80beb16
      Committed by GaoWei8

      * Softmax vectorization (#29404)
      * vectorize softmax forward
      * vectorize softmax backward
      * add a message argument for compiler compatibility
      * optimize softmax forward (#30217)
      Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
    • Cherry-pick 30194, 30164, 30201 (#30202) · 36de178a
      Committed by Wilber
    • [cherry-pick 2.0] Optimize gradient merge (#30185) · e283dc6f
      Committed by WangXi

      (A configuration sketch follows this entry.)

      * Optimize gradient merge performance (#29784)
      * [fleet] combine amp and gradient merge, test=develop (#30086)
      * fix assign_op_xpu and concat_op_xpu warnings (#30120)
      Co-authored-by: liuyuhui <liuyuhui@baidu.com>
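      A configuration sketch of turning on gradient merge, and combining it with AMP as #30086 does, through fleet's DistributedStrategy; the config keys shown (k_steps, avg) are assumptions based on the usual strategy options rather than values taken from these commits:

          import paddle
          import paddle.distributed.fleet as fleet

          paddle.enable_static()
          fleet.init(is_collective=True)

          strategy = fleet.DistributedStrategy()
          # Gradient merge: accumulate gradients over k_steps micro-batches per update.
          strategy.gradient_merge = True
          strategy.gradient_merge_configs = {"k_steps": 4, "avg": True}
          # #30086 makes gradient merge work together with AMP.
          strategy.amp = True

          opt = paddle.optimizer.SGD(learning_rate=0.01)
          opt = fleet.distributed_optimizer(opt, strategy=strategy)
          # opt.minimize(loss) is then called on a program that defines `loss`.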