1. 11 1月, 2021 7 次提交
    • W
      [cherry-pick]Elementwise add grad GPU kernel optimization (#30276) · e59524f8
      wangchaochaohu 提交于
      * elementwise_add_grad Op optimization  (#29575)
      
      * optimize for long width for elementwise (#29602)
      
      * refine (#29622)
      
      * delete the code for fp16 optimization because it is not faster than common template code (#29715)
      
      * fix the shape choose of vectorize for cuda
      
      * optimization for fp16 elementwise add (#29744)
      
      * Fix the compiler error for half type (#29799)
      
      * refine the compiler error for half2 operation (#29816)
      
      * fix the compiler error when gcc4 cuda9.0 (#29997)
      e59524f8
    • Z
      [Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54
      Zhen Wang 提交于
      * Support pure fp16 training for AMP API. (#29544)
      
      * add cast ops before and after unsupported fp16 ops.
      
      * Keep partial net in FP32 pattern.
      
      * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
      
      * Add fp16 support for adam op.
      
      * add multi precision attr for adam.
      
      * Fix the bug of test_multi_precision_fp16_train UT.
      
      * Code format for CI.
      
      * Fix the redefine error about MPTypeTrait on windows.
      
      * fix bugs of the _create_accumulators func in Momentum.
      
      * fix bug when inserting post cast op.
      
      * Add the update_loss_scaling op in allow_set of UnusedVarCheck.
      
      * Update for ci coverage.
      
      * Add some doc for OptimizerWithMixedPrecision.
      
      * Fix the code style.
      
      * Imporve the doc of `amp_init`.
      
      * Change for fp16 testing if users have the infer program defined in separate way.
      
      * Remove tensor copy in the update_loss_scaling op. (#29426)
      
      * remove tensor copy in the update_loss_scaling op
      
      * not use thrust.
      
      * fix some cuda memory access error.
      d8dfef54
    • A
      Skip convert tensor shape while using Paddle.shape (#30223) (#30239) · 55604248
      Aurelius84 提交于
      * fix tensor shape bug
      
      * fix op_num
      
      * clean code
      55604248
    • G
      Quantization supports 2.0 APIs (#30036) (#30257) · 393a91f1
      guofei 提交于
      * Quantization supports 2.0 APIs
      
      * Fix the error of save_quantized_model
      393a91f1
    • W
      [cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f
      WangXi 提交于
      * Optimization grad merge performance (#29784)
      
      * [fleet] combine amp and gradient merge, test=develop (#30086)
      
      * fix assign_op_xpu concat_op_xpu warining (#30120)
      Co-authored-by: Nliuyuhui <liuyuhui@baidu.com>
      e283dc6f
    • C
      [Cherry-pick] remove distributed prepare context (#30219) (#30256) · 1fa98c5d
      Chen Weihang 提交于
      att, cherry-pick of #30219
      1fa98c5d
    • X
      [cherry-pick] clean redundant API alias in 2.0 - part 2 (#30244) · 70cbde83
      XiaoguangHu 提交于
      * fix dynamic to static error
      
      * delete paddle.nn.functional.assign
      70cbde83
  2. 10 1月, 2021 1 次提交
  3. 09 1月, 2021 1 次提交
  4. 08 1月, 2021 11 次提交
  5. 07 1月, 2021 5 次提交
    • W
      [cherry pick] paddle.save/load ,paddle.static.save/load 保存大文件的bug (#30170) · bfb6f613
      WeiXin 提交于
      * Support storage of large parameters (#29988)
      
      * Support storage of large parameters
      
      * Reduce the complexity of the unittest
      
      * Reduce the complexity of the unittest,commented out unittest for
      
      * add unittest for static.save/load
      
      * Increase the timeout threshold of 'test_static_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'
      
      * Extend the timeout for the (#30151)
      bfb6f613
    • L
      [cherry pick] Some optimizations of elementwise_add, gelu and dropout for AMP (#30152) · 07f68fad
      Leo Chen 提交于
      * Improve performance of elementwise_add grad op (#29187)
      
      * pass stop_gradient for cast op
      
      * improve performance of elementwise_add grad
      
      * use tensor copy async
      
      * dygraph branch
      
      * fix dygraph branch
      
      * add ut
      
      * make gelu fp16 computing more robust (#29484)
      
      * Add fast path for dropout when p == 0  (#29553)
      
      * add fast path for p == 0 in dropout
      
      * add ut
      07f68fad
    • F
      [Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110) · 44b81e63
      furnace 提交于
      * Layer norm fp16 (#29169)
      
      * add fp16 for layer_norm op
      
      * revert layernorm api
      
      * fix forward
      
      * fix forward
      
      * fix backward for layernorm with fp16
      
      * fix unit test for layernorm with fp16
      
      * fix with_mkldnn compile error for layernorm with fp16
      
      * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
      
      * fix with_mkldnn compile error for layernorm with fp16
      
      * fix with_mkldnn compile error for layernorm with fp16
      Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>
      
      * fix layer_norm accuracy (#29434)
      
      * Layernorm opt (#29522)
      
      * layernorm fw opt
      
      * layernorm bw opt
      
      * fix typo, test=develop
      
      * remove const dim3 for windows CI compatibility
      
      * merge develop
      Co-authored-by: Nzlsh80826 <zlsh80826@gmail.com>
      
      * Fix compile problem when cuda_arch < 6000 (#29576)
      
      * fix compile problem when cuda_arch < 6000
      
      * refine code
      
      * refine code
      Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>
      Co-authored-by: Nzlsh80826 <zlsh80826@gmail.com>
      44b81e63
    • T
      pre padding in dygraph (#30179) · a2b0357d
      tangwei12 提交于
      Change-Id: Ia5279b0cbb6a5b3970aff66e9510e0d85efa70ce
      a2b0357d
    • C
      Cherry pick bn (#30136) · 157ff094
      ceci3 提交于
      * fix bn docs (#30096)
      
      * add attribute for batch_norm (#29950)
      
      * add attribute for batch_norm
      157ff094
  6. 06 1月, 2021 3 次提交
  7. 05 1月, 2021 6 次提交
  8. 04 1月, 2021 1 次提交
  9. 31 12月, 2020 5 次提交