1. 15 Jan, 2021 5 commits
  2. 14 Jan, 2021 9 commits
    • skip quantizing ops in cpu inference (#30342) (#30405) · 2f16e0c6
      cc committed
      2f16e0c6
    • [Cherry-pick] Fix prune input bug of jit.save #30425 · 2cdc36f4
      Chen Weihang committed
      [Cherry-pick] Fix prune input bug of jit.save
      
      cherry-pick of #30384
      2cdc36f4
    • optimize memcpy perf for kunlun (#30291) (#30382) · 9de42be2
      QingshuChen committed
      * optimize memcpy perf for kunlun (#30291)
      
      * optimize memcpy perf for kunlun
      
      * remove useless unittest for kunlun mean
      
      * minor
      
      * fix bug that can't find mkldnn(kunlun) (#30394)
      9de42be2
    • [cherrypick 2.0] add double grad for conv_transpose and depthwise_conv (#30429) · 1552343a
      LielinJiang committed
      * Add double grad for conv_transpose (#29706)
      
      * add double grad for conv_transpose
      
      * register cudnn conv double grad for depthwise conv (#29807)
      1552343a
    • cherry-pick 30354 (#30407) · 5d30d072
      Bai Yifan committed
      5d30d072
    • fix bug of celoss when using ignore_index and reduction (#30395) · c22ee575
      chajchaj committed
      * fix bug of celoss when using ignore_index and reduction (#30180)
      
      * fix bug of using ignore_index and reduction,test=develop
      
      * fix bug of celoss when using ignore_index and reduction, test=develop
      
      * improve performance when ignore_index=-100, test=develop
      
      * add test in test_cross_entropy_loss.py for coverage rate, test=develop
      
      * rm comment in test_cross_entropy_loss.py, test=develop
      
      * del  hard code of "float64" in python/paddle/nn/functional/loss.py, test=develop
      
      * change mask to a more simplified implementation, test=develop
      
      * del comment in python/paddle/nn/functional/loss.py, test=develop
      
      * del hard code and change mask to a more simplified implementation, test=develop
      
      * change mask to a more simplified implementation, test=develop
      
      * change mask to a more simplified implementation, test=develop
      
      c22ee575
    • fix (#30399) · e1bad4d7
      Chengmo committed
      Co-authored-by: seiriosPlus <tangwei12@baidu.com>
      e1bad4d7
    • fix compile error on ARM (#30390) · 14b60947
      Wilber committed
      14b60947
  3. 13 Jan, 2021 7 commits
  4. 12 Jan, 2021 8 commits
  5. 11 Jan, 2021 11 commits
    • [Cherry-Pick] Support vector<double> as type of op attribute and op set_value... · d839761e
      liym27 committed
      [Cherry-Pick] Support vector<double> as type of op attribute and op set_value support vector<double> as value (#30126) (#30305)
      
      Cherry-Pick #30126
      1. Support vector<float64> as type of op attribute.
      2. op set_value supports float64 numpy.array
      d839761e
    • [cherry pick] Fix bug for 'save multiple method' (#30218) (#30278) · d9c70217
      WeiXin committed
      * Fix bug for 'save multiple method'
      
      * To pass coverage.
      
      * edit code to pass coverage.
      
      * edit code to pass coverage.
      
      * add unittest for coverage.
      
      * change for coverage.
      
      * edit for coverage.
      d9c70217
    • [Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze,... · 7c943a65
      pangyoki committed
      [Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) (#30258)
      
      * add view strategy on squeeze,unsqueeze,reshape,flatten
      
      * add squeeze unittest
      
      * add unittests
      
      * use View strategy as name rather than Reuse Allocation
      
      * fix view api doc
      
      * fix format
      
      * use core.ops when input of reshape2 is Tensor
      
      * fix test_cross_entropy_loss error because of reshape2
      
      * delete selected_rows
      
      * change op_function
      
      * little change
      
      * solve HandleViewBetweenInputAndOutput
      7c943a65
    • [Cherry-pick] Add Static Variable Clone (#30208) #30270 · 6dd70b9b
      Huihuang Zheng committed
      Cherry-pick of PR #30208. This PR adds a clone method to static Variable so that its interface matches dygraph. It fixes some bugs in dy2stat where users called clone on a dygraph Tensor.
      6dd70b9b
    • [cherry-pick]add support for place string representation #30264 · fb66355e
      wangchaochaohu committed
      cherry-pick #28769, add support for place string representation 
      fb66355e
    • [cherry-pick]Elementwise add grad GPU kernel optimization (#30276) · e59524f8
      wangchaochaohu committed
      * elementwise_add_grad Op optimization  (#29575)
      
      * optimize for long width for elementwise (#29602)
      
      * refine (#29622)
      
      * delete the code for fp16 optimization because it is not faster than common template code (#29715)
      
      * fix the shape choose of vectorize for cuda
      
      * optimization for fp16 elementwise add (#29744)
      
      * Fix the compiler error for half type (#29799)
      
      * refine the compiler error for half2 operation (#29816)
      
      * fix the compiler error when gcc4 cuda9.0 (#29997)
      e59524f8
    • [Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54
      Zhen Wang committed
      * Support pure fp16 training for AMP API. (#29544)
      
      * add cast ops before and after unsupported fp16 ops.
      
      * Keep partial net in FP32 pattern.
      
      * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
      
      * Add fp16 support for adam op.
      
      * add multi precision attr for adam.
      
      * Fix the bug of test_multi_precision_fp16_train UT.
      
      * Code format for CI.
      
      * Fix the redefine error about MPTypeTrait on windows.
      
      * fix bugs of the _create_accumulators func in Momentum.
      
      * fix bug when inserting post cast op.
      
      * Add the update_loss_scaling op in allow_set of UnusedVarCheck.
      
      * Update for ci coverage.
      
      * Add some doc for OptimizerWithMixedPrecision.
      
      * Fix the code style.
      
      * Improve the doc of `amp_init`.
      
      * Change for fp16 testing if users have the infer program defined in separate way.
      
      * Remove tensor copy in the update_loss_scaling op. (#29426)
      
      * remove tensor copy in the update_loss_scaling op
      
      * not use thrust.
      
      * fix some cuda memory access error.
      d8dfef54
    • Skip convert tensor shape while using Paddle.shape (#30223) (#30239) · 55604248
      Aurelius84 committed
      * fix tensor shape bug
      
      * fix op_num
      
      * clean code
      55604248
    • Quantization supports 2.0 APIs (#30036) (#30257) · 393a91f1
      guofei committed
      * Quantization supports 2.0 APIs
      
      * Fix the error of save_quantized_model
      393a91f1
    • [cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f
      WangXi committed
      * Optimization grad merge performance (#29784)
      
      * [fleet] combine amp and gradient merge, test=develop (#30086)
      
      * fix assign_op_xpu concat_op_xpu warning (#30120)
      Co-authored-by: liuyuhui <liuyuhui@baidu.com>
      e283dc6f
    • [Cherry-pick] remove distributed prepare context (#30219) (#30256) · 1fa98c5d
      Chen Weihang committed
      As the title says; cherry-pick of #30219
      1fa98c5d