1. 14 Jan, 2021 9 commits
    • Q
      optimize memcpy perf for kunlun (#30291) (#30382) · 9de42be2
      QingshuChen committed
      * optimize memcpy perf for kunlun (#30291)
      
      * optimize memcpy perf for kunlun
      
      * remove useless unit test for kunlun mean
      
      * minor
      
      * fix bug that can't find mkldnn(kunlun) (#30394)
      9de42be2
    • L
      [cherrypick 2.0] add double grad for conv_transpose and depthwise_conv (#30429) · 1552343a
      LielinJiang committed
      * Add double grad for conv_transpose (#29706)
      
      * add double grad for conv_transpose
      
      * register cudnn conv double grad for depthwise conv (#29807)
      1552343a
    • Z
      ac70275a
    • A
    • B
      cherry-pick 30354 (#30407) · 5d30d072
      Bai Yifan committed
      5d30d072
    • C
      fix bug of celoss when using ignore_index and reduction (#30395) · c22ee575
      chajchaj committed
      * fix bug of celoss when using ignore_index and reduction (#30180)
      
      * fix bug of using ignore_index and reduction,test=develop
      
      * fix bug of celoss when using ignore_index and reduction, test=develop
      
      * improve performance when ignore_index=-100, test=develop
      
      * add test in test_cross_entropy_loss.py for coverage rate, test=develop
      
      * rm comment in test_cross_entropy_loss.py, test=develop
      
      * del  hard code of "float64" in python/paddle/nn/functional/loss.py, test=develop
      
      * change mask to a more simplified implementation, test=develop
      
      * del comment in python/paddle/nn/functional/loss.py, test=develop
      
      * del hard code and change mask to a more simplified implementation, test=develop
      
      * change mask to a more simplified implementation, test=develop
      
      * change mask to a more simplified implementation, test=develop
      c22ee575
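The commit above concerns cross-entropy loss when `ignore_index` and `reduction` are used together. The log does not say what the bug was, so the following is only a minimal pure-Python sketch of the conventional semantics, under the assumption that `reduction="mean"` should average over the non-ignored samples only. The function name and signature are hypothetical and are not taken from `python/paddle/nn/functional/loss.py`.

```python
import math


def cross_entropy(logits, labels, ignore_index=-100, reduction="mean"):
    """Softmax cross-entropy over a batch (hypothetical helper, not Paddle's API).

    Samples whose label equals ignore_index contribute zero loss and are
    excluded from the denominator of the mean (an assumed convention).
    """
    losses = []
    kept = 0  # number of samples that are not ignored
    for row, label in zip(logits, labels):
        if label == ignore_index:
            losses.append(0.0)
            continue
        # Numerically stable log-sum-exp of the logits.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum_exp - row[label])
        kept += 1

    if reduction == "none":
        return losses
    total = sum(losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        # Divide by the count of kept samples, not by the batch size.
        return total / kept if kept else 0.0
    raise ValueError("reduction must be 'none', 'sum' or 'mean'")


batch_logits = [[0.0, 0.0], [1.0, 3.0], [2.0, 0.0]]
batch_labels = [0, -100, 0]  # the second sample is ignored
mean_loss = cross_entropy(batch_logits, batch_labels)
```

With this convention, the mean over the three-sample batch divides by 2 rather than 3, because one label equals `ignore_index`.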
    • C
      fix (#30399) · e1bad4d7
      Chengmo committed
      Co-authored-by: seiriosPlus <tangwei12@baidu.com>
      e1bad4d7
    • W
      fix compile error on ARM (#30390) · 14b60947
      Wilber committed
      14b60947
    • G
      Softmax backward optimize (#30249) (#30400) · 4cc0337f
      GaoWei8 committed
      * softmax backward optimize
      4cc0337f
  2. 13 Jan, 2021 9 commits
  3. 12 Jan, 2021 9 commits
  4. 11 Jan, 2021 13 commits
    • L
      [Cherry-Pick] Support vector<double> as type of op attribute and op set_value... · d839761e
      liym27 committed
      [Cherry-Pick] Support vector<double> as type of op attribute and op set_value support vector<double> as value (#30126) (#30305)
      
      Cherry-Pick #30126
      1. Support vector<float64> as type of op attribute.
      2. op set_value supports float64 numpy.array
      d839761e
    • L
      [cherry-pick] Async drop scope in executor (#29714) #30285 · 93ce7f69
      Leo Chen committed
      [cherry-pick] Async drop scope in executor (#29714)
      93ce7f69
    • L
      [Cherry-Pick 2.0] Check the rank of input in kernel of set_value op (#30147) (#30301) · a2bbd06a
      liym27 committed
      Cherry-pick #30147: for op set_value, check that the input's rank is < 7
      a2bbd06a
    • W
      [cherry pick] Add detailed error message for curandStatus_t, cublasStatus_t,... · 04cc659c
      WeiXin committed
      [cherry pick] Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161) (#30280)
      
      Add detailed error messages for curandStatus_t, cublasStatus_t, and cusolverStatus_t.
      Original PR: #30161
      04cc659c
    • W
      [cherry pick] Fix bug for 'save multiple method' (#30218) (#30278) · d9c70217
      WeiXin committed
      * Fix bug for 'save multiple method'
      
      * To pass coverage.
      
      * edit code to pass coverage.
      
      * edit code to pass coverage.
      
      * add unittest for coverage.
      
      * change for coverage.
      
      * edit for coverage.
      d9c70217
    • P
      [Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze,... · 7c943a65
      pangyoki committed
      [Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) (#30258)
      
      * add view strategy on squeeze,unsqueeze,reshape,flatten
      
      * add squeeze unittest
      
      * add unittests
      
      * use View strategy as name rather than Reuse Allocation
      
      * fix view api doc
      
      * fix format
      
      * use core.ops when input of reshape2 is Tensor
      
      * fix test_cross_entropy_loss error because of reshape2
      
      * delete selected_rows
      
      * change op_function
      
      * little change
      
      * solve HandleViewBetweenInputAndOutput
      7c943a65
    • Z
      [cherry-pick]add cast cuda kernel (#29352) #30263 · afbc6367
      Zhang Ting committed
      add cast cuda kernel
      
      cherry-pick #29352
      afbc6367
    • H
      [Cherry-pick] Add Static Variable Clone (#30208) #30270 · 6dd70b9b
      Huihuang Zheng committed
      Cherry-pick of PR #30208. This PR added a clone method for static Variable so that the interface matches dygraph. It fixed some bugs in dy2stat where users called clone on a dygraph Tensor.
      6dd70b9b
    • W
      [cherry-pick]add support for place string representation #30264 · fb66355e
      wangchaochaohu committed
      cherry-pick #28769, add support for place string representation 
      fb66355e
    • W
      [cherry-pick]Elementwise add grad GPU kernel optimization (#30276) · e59524f8
      wangchaochaohu committed
      * elementwise_add_grad Op optimization  (#29575)
      
      * optimize for long width for elementwise (#29602)
      
      * refine (#29622)
      
      * delete the code for fp16 optimization because it is not faster than common template code (#29715)
      
      * fix the shape choose of vectorize for cuda
      
      * optimization for fp16 elementwise add (#29744)
      
      * Fix the compiler error for half type (#29799)
      
      * refine the compiler error for half2 operation (#29816)
      
      * fix the compiler error when gcc4 cuda9.0 (#29997)
      e59524f8
    • Z
      [Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54
      Zhen Wang committed
      * Support pure fp16 training for AMP API. (#29544)
      
      * add cast ops before and after unsupported fp16 ops.
      
      * Keep partial net in FP32 pattern.
      
      * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
      
      * Add fp16 support for adam op.
      
      * add multi precision attr for adam.
      
      * Fix the bug of test_multi_precision_fp16_train UT.
      
      * Code format for CI.
      
      * Fix the redefine error about MPTypeTrait on windows.
      
      * fix bugs of the _create_accumulators func in Momentum.
      
      * fix bug when inserting post cast op.
      
      * Add the update_loss_scaling op in allow_set of UnusedVarCheck.
      
      * Update for ci coverage.
      
      * Add some doc for OptimizerWithMixedPrecision.
      
      * Fix the code style.
      
      * Improve the doc of `amp_init`.
      
      * Change for fp16 testing if users have the infer program defined in separate way.
      
      * Remove tensor copy in the update_loss_scaling op. (#29426)
      
      * remove tensor copy in the update_loss_scaling op
      
      * not use thrust.
      
      * fix some cuda memory access error.
      d8dfef54
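The commit above mentions the `check_finite_and_unscale` and `update_loss_scaling` ops used in fp16 training. As a rough illustration of the general dynamic loss scaling technique those op names suggest, here is a minimal pure-Python sketch. The class, method names, parameter names, and default values are all assumptions chosen for illustration; the actual ops work on tensors and may behave differently.

```python
import math


class DynamicLossScaler:
    """Hypothetical sketch of dynamic loss scaling (not Paddle's implementation)."""

    def __init__(self, init_scale=2.0 ** 15, incr_every_n_steps=1000,
                 decr_every_n_nan_or_inf=2, incr_ratio=2.0, decr_ratio=0.5):
        self.scale = init_scale
        self.incr_every_n_steps = incr_every_n_steps
        self.decr_every_n_nan_or_inf = decr_every_n_nan_or_inf
        self.incr_ratio = incr_ratio
        self.decr_ratio = decr_ratio
        self.good_steps = 0  # consecutive steps with finite gradients
        self.bad_steps = 0   # consecutive steps with inf/nan gradients

    def check_finite_and_unscale(self, grads):
        """Return (grads / scale, found_inf) for a flat list of floats."""
        found_inf = any(not math.isfinite(g) for g in grads)
        return [g / self.scale for g in grads], found_inf

    def update_loss_scaling(self, found_inf):
        """Shrink the scale after repeated overflows, grow it after a clean run."""
        if found_inf:
            self.good_steps = 0
            self.bad_steps += 1
            if self.bad_steps >= self.decr_every_n_nan_or_inf:
                # Never let the scale drop below 1.0 in this sketch.
                self.scale = max(self.scale * self.decr_ratio, 1.0)
                self.bad_steps = 0
        else:
            self.bad_steps = 0
            self.good_steps += 1
            if self.good_steps >= self.incr_every_n_steps:
                self.scale *= self.incr_ratio
                self.good_steps = 0


scaler = DynamicLossScaler(init_scale=8.0, incr_every_n_steps=2,
                           decr_every_n_nan_or_inf=1)
unscaled, overflow = scaler.check_finite_and_unscale([16.0, 8.0])
```

In a training loop, the optimizer step would be skipped whenever the overflow flag is set, and `update_loss_scaling` would be called once per step with that flag.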
    • Z
      [Cherry pick] improve dropout (#30260) · b4931ab1
      Zhang Ting committed
      * improve dropout (#29465)
      
      * improve drop out
      
      * add VectorizedRandomGeneratorWithGenerator
      
      * fix bug
      
      * modify according to comments
      
      * improve dropout grad (#29605)
      
      * improve grad perf
      
      * fix the bug of dropout_grad (#29813)
      b4931ab1
    • G
      [cherry-pick] softmax optimize (#30279) · b80beb16
      GaoWei8 committed
      * Softmax vectorization (#29404)
      
      * vec softmax fw
      
      * vec softmax bw
      
      * add a message argument for compiler compatibility
      
      * optimize softmax forward (#30217)
      
      * optimize softmax forward
      Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
      b80beb16