1. 11 Jan, 2021: 7 commits
    • [cherry-pick] Elementwise add grad GPU kernel optimization (#30276) · e59524f8
      wangchaochaohu committed
      * elementwise_add_grad Op optimization  (#29575)
      
      * optimize for long width for elementwise (#29602)
      
      * refine (#29622)
      
      * delete the code for fp16 optimization because it is not faster than common template code (#29715)
      
      * fix the shape choice for vectorization in CUDA
      
      * optimization for fp16 elementwise add (#29744)
      
      * Fix the compiler error for half type (#29799)
      
      * refine the compiler error for half2 operation (#29816)
      
      * fix the compiler error with GCC 4 and CUDA 9.0 (#29997)
    • [Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54
      Zhen Wang committed
      * Support pure fp16 training for AMP API. (#29544)
      
      * add cast ops before and after unsupported fp16 ops.
      
      * Keep part of the network in FP32 mode.
      
      * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
      
      * Add fp16 support for adam op.
      
      * add multi precision attr for adam.
      
      * Fix the bug of test_multi_precision_fp16_train UT.
      
      * Code format for CI.
      
      * Fix the redefine error about MPTypeTrait on windows.
      
      * fix bugs of the _create_accumulators func in Momentum.
      
      * fix bug when inserting post cast op.
      
      * Add the update_loss_scaling op in allow_set of UnusedVarCheck.
      
      * Update for ci coverage.
      
      * Add some doc for OptimizerWithMixedPrecision.
      
      * Fix the code style.
      
      * Improve the doc of `amp_init`.
      
      * Adjust fp16 testing for users who define the inference program separately.
      
      * Remove tensor copy in the update_loss_scaling op. (#29426)
      
      * remove tensor copy in the update_loss_scaling op
      
      * do not use thrust.
      
      * fix some CUDA memory access errors.
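The check_finite_and_unscale and update_loss_scaling steps mentioned above follow the usual dynamic loss-scaling scheme for FP16 training. The sketch below shows that scheme in plain Python under stated assumptions: the class name LossScaler, the scalar gradients, the default values and the clamp of the scale at 1.0 are illustrative choices, and the attribute names only echo the common incr/decr terminology. It is not Paddle's operator code.

```python
import math

# Sketch of dynamic loss scaling (hypothetical LossScaler, not Paddle's ops).
class LossScaler:
    def __init__(self, init_scale=2.0 ** 15, incr_every_n_steps=1000,
                 decr_every_n_nan_or_inf=2, incr_ratio=2.0, decr_ratio=0.5):
        self.scale = init_scale
        self.incr_every_n_steps = incr_every_n_steps
        self.decr_every_n_nan_or_inf = decr_every_n_nan_or_inf
        self.incr_ratio = incr_ratio
        self.decr_ratio = decr_ratio
        self.good_steps = 0  # consecutive steps without inf/nan
        self.bad_steps = 0   # consecutive steps with inf/nan

    def check_finite_and_unscale(self, grads):
        """Divide each gradient by the loss scale and report whether any is inf/nan."""
        unscaled = [g / self.scale for g in grads]
        found_inf = not all(math.isfinite(g) for g in unscaled)
        return unscaled, found_inf

    def update(self, found_inf):
        """Shrink the scale after repeated overflows; grow it after a run of clean steps."""
        if found_inf:
            self.good_steps = 0
            self.bad_steps += 1
            if self.bad_steps == self.decr_every_n_nan_or_inf:
                # Assumption of this sketch: never let the scale drop below 1.0.
                self.scale = max(self.scale * self.decr_ratio, 1.0)
                self.bad_steps = 0
        else:
            self.bad_steps = 0
            self.good_steps += 1
            if self.good_steps == self.incr_every_n_steps:
                self.scale *= self.incr_ratio
                self.good_steps = 0
```

A step whose gradients overflow is normally skipped, and the scale is lowered only after several consecutive overflows, so a single outlier batch does not shrink it.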
    • Skip converting tensor shape when using paddle.shape (#30223) (#30239) · 55604248
      Aurelius84 committed
      * fix tensor shape bug
      
      * fix op_num
      
      * clean code
    • Quantization supports 2.0 APIs (#30036) (#30257) · 393a91f1
      guofei committed
      * Quantization supports 2.0 APIs
      
      * Fix the error of save_quantized_model
    • [cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f
      WangXi committed
      * Optimize grad merge performance (#29784)
      
      * [fleet] combine amp and gradient merge, test=develop (#30086)
      
      * fix assign_op_xpu concat_op_xpu warning (#30120)
      Co-authored-by: liuyuhui <liuyuhui@baidu.com>
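Gradient merge accumulates gradients over several micro-batches and applies the optimizer once every k steps, which emulates a larger batch. The sketch below illustrates only that idea; the class name GradientMerge, the plain-SGD update and the scalar parameters are assumptions for brevity, and it does not reflect how the fleet implementation or the optimization in #29784 works internally.

```python
# Sketch of gradient merge (gradient accumulation); GradientMerge is hypothetical.
class GradientMerge:
    def __init__(self, params, lr, k_steps, avg=True):
        self.params = list(params)
        self.lr = lr
        self.k_steps = k_steps
        self.avg = avg  # average instead of sum the accumulated gradients
        self.acc = [0.0] * len(self.params)
        self.step_count = 0

    def step(self, grads):
        """Accumulate grads; apply an SGD update every k_steps. Returns True if applied."""
        for i, g in enumerate(grads):
            self.acc[i] += g
        self.step_count += 1
        if self.step_count % self.k_steps != 0:
            return False
        for i, a in enumerate(self.acc):
            g = a / self.k_steps if self.avg else a
            self.params[i] -= self.lr * g
        self.acc = [0.0] * len(self.params)  # reset the accumulators
        return True
```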
    • [Cherry-pick] remove distributed prepare context (#30219) (#30256) · 1fa98c5d
      Chen Weihang committed
      As titled; cherry-pick of #30219.
    • [cherry-pick] clean redundant API alias in 2.0 - part 2 (#30244) · 70cbde83
      XiaoguangHu committed
      * fix dynamic to static error
      
      * delete paddle.nn.functional.assign
  2. 10 Jan, 2021: 1 commit
  3. 08 Jan, 2021: 9 commits
    • [cherry-pick] [Dy2Stat] Don't convert to paddle.shape if var_x.shape is not... · 2ba9bdd7
      liym27 committed
      [cherry-pick] [Dy2Stat] Don't convert to paddle.shape if var_x.shape is not negative #29965 (#30235)
      
      * [Cherry-Pick 2.0] [Dy2Stat] Don't convert to paddle.shape if var_x.shape is not negative (#29965)
      
      1. When x is a Variable, call nn.shape(x) only in the following cases:
       1) the shape of x is used in a control-flow condition;
       2) the dim to be used is negative.
      2. When x is a Variable but x.shape or x.shape[idx] contains no negative value, do not convert to paddle.shape().
      
      * [Cherry-Pick 2.0] [Dy2Stat] Use Paddle2.0 api paddle.tensor.array_* (#30156)
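The two rules in this commit message amount to a small decision function. A minimal sketch, assuming that an unknown dimension is stored as a negative value such as -1 in the static shape; the helper name should_use_runtime_shape is hypothetical and is not part of the Dy2Stat transformer.

```python
# Hypothetical helper: decide whether x.shape / x.shape[idx] must be read at run time.
def should_use_runtime_shape(static_shape, idx=None, in_control_flow=False):
    if in_control_flow:
        # Rule 1.1: the shape feeds a control-flow condition.
        return True
    dims = list(static_shape) if idx is None else [static_shape[idx]]
    # Rule 1.2: a dimension that is actually used is negative (unknown statically).
    return any(d < 0 for d in dims)
```

With a static shape of [-1, 3], reading x.shape[1] can stay a Python constant, while reading the whole x.shape cannot.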
    • [Cherry-pick] amp related PR cherry pick into Release/2.0 (#30212) · 9f7c66b4
      huangxu96 committed
      * Optimizer trans momentum (#29597)
      
      * merge amp related function in Momentum from paddle.fluid.contrib.optimizer into paddle.optimizer.
      
      * Add unittest for 2.0 Momentum API.
      
      * fix some bugs in weight_decay.
      
      * add alias for fluid.contrib.mixed_precision (#29562)
      
      * add alias for fluid.contrib.mixed_precision
      
      * add static.amp into setup.py.in (#29621)
      
      * add static.amp into setup.py.in
      
      * add unittest for api
      
      * fix a bug in multi_precision_fp16 unittest. (#29756)
    • [cherry-pick 2.0] Fix bug: In dynamic mode, if start or end is negative,... · 5fe3da39
      liym27 committed
      [cherry-pick 2.0] Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns wrong result (#30003) (#30146)
      
      1. When slice_item is a slice:
       1) the start of __getitem__ should be std::max(start, 0);
       2) the end of __getitem__ should be std::min(end, dim).
      2. When slice_item is an integer, it should be in [-dim_len, dim_len).
      3. Fix the error message to use accurate data.
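Rules 1 and 2 can be sketched as two normalization helpers for an axis of length dim. This sketch assumes that a negative start or end is first shifted by dim (Python semantics) before the clamping of rule 1 is applied, and that the selected positions are range(start, end), which is empty when start >= end. The names normalize_slice and normalize_index are hypothetical; this is plain Python, not the C++ code of the fix.

```python
# Hypothetical helpers for a step-1 slice or an integer index on an axis of length dim.
def normalize_slice(start, end, dim):
    """Return (start, end) such that range(start, end) lists the selected positions."""
    if start < 0:
        start += dim
    if end < 0:
        end += dim
    start = max(start, 0)  # rule 1.1
    end = min(end, dim)    # rule 1.2
    return start, end


def normalize_index(i, dim):
    """Rule 2: an integer index must lie in [-dim, dim)."""
    if not -dim <= i < dim:
        raise IndexError(f"index {i} is out of range [{-dim}, {dim})")
    return i + dim if i < 0 else i
```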
    • [Cherry-Pick 2.0][setitem] Support Tensor setitem in static mode (#29708) (#30104) · f46ddc0e
      liym27 committed
      1. Type of index: int, slice (step must be 1).
      
      2. Type of value:
       (1) int32, int64, float32, bool;
       (2) numpy.array (int32, int64, float32, bool; float64 is not supported);
       (3) paddle.Tensor (int32, int64, float32, float64, bool).
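The supported index and value types listed above can be expressed as a validation step. A hypothetical sketch, assuming dtypes are passed as strings; check_setitem_index and check_value_dtype are illustrative names, not Paddle APIs.

```python
# Hypothetical validators based only on the type lists in the commit message.
NUMPY_VALUE_DTYPES = {"int32", "int64", "float32", "bool"}  # float64 not supported
TENSOR_VALUE_DTYPES = {"int32", "int64", "float32", "float64", "bool"}


def check_setitem_index(index):
    """Accept an int, or a slice whose step is 1 (or omitted)."""
    if isinstance(index, int):
        return
    if isinstance(index, slice):
        if index.step not in (None, 1):
            raise ValueError(f"slice step must be 1, got {index.step}")
        return
    raise TypeError(f"unsupported index type: {type(index).__name__}")


def check_value_dtype(dtype_name, is_tensor):
    """Check the dtype of a numpy.array or paddle.Tensor value."""
    allowed = TENSOR_VALUE_DTYPES if is_tensor else NUMPY_VALUE_DTYPES
    if dtype_name not in allowed:
        kind = "Tensor" if is_tensor else "numpy.array"
        raise TypeError(f"{kind} value of dtype {dtype_name} is not supported")
```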
    • Fix beam search bug (#29824) (#30140) · b2ca2cad
      Jiaqi Liu committed
      * fix beam search bug
      
      * add dygraph unittest
      
      * update dynamic_decode argument doc
      
      * add warning info for state which has no lengths attribute
    • [Cherry-pick] [Complex] Simplify prepared op impl to improve performance (#30153) (#30215) · 0e3a1d35
      Chen Weihang committed
      * simplify prepared op impl to improve performance
      
      * fix kunlun compile error
      
      * continue fix kunlun compile error
      
      * only transform diff place when dtype diff
      
      * fix failed unittests
      
      * remove useless file
      
      * polish impl by review comment
    • [2.0 API CherryPick] LookAhead, ModelAverage, IndexSelect (#30205) · 3ce4d34d
      123malin committed
      * Add Lookahead and ModelAverage Optimizer (#30004)
      
      * test=develop, add model_average and lookahead
      
      * Improve Index select cuda kernel (#30139)
      
      * test=develop, add index_select_cuda kernel
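LookAhead keeps a set of slow weights and lets an inner optimizer update fast weights; every k steps the slow weights move a fraction alpha toward the fast weights, and the fast weights are reset to them. The sketch below uses plain SGD as the inner optimizer and scalar weights; the class name and the defaults (alpha=0.5, k=5) are assumptions for illustration, not Paddle's LookAhead implementation.

```python
# Sketch of the LookAhead update rule wrapped around plain SGD (illustrative only).
class Lookahead:
    def __init__(self, params, lr, alpha=0.5, k=5):
        self.fast = list(params)
        self.slow = list(params)
        self.lr, self.alpha, self.k = lr, alpha, k
        self.steps = 0

    def step(self, grads):
        # Inner optimizer: one SGD step on the fast weights.
        self.fast = [w - self.lr * g for w, g in zip(self.fast, grads)]
        self.steps += 1
        if self.steps % self.k == 0:
            # Every k steps: pull slow weights toward fast weights, then sync fast to slow.
            self.slow = [s + self.alpha * (f - s) for s, f in zip(self.slow, self.fast)]
            self.fast = list(self.slow)
        return self.fast
```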
    • fix syncbn convert (#30158) (#30176) · 030d678c
      ceci3 committed
      * fix syncbn convert
      
      * add unittest
    • [Cherry-pick] Simplify the options of spawn based on fleetrun (#30144) (#30197) · 39204d56
      Chen Weihang committed
      * Simplify the options of spawn based on fleetrun (#30144)
      
      * Simplify the options of spawn based on fleetrun
      
      * polish details
      
      * polish doc details
      
      * cleanup enum test=develop (#29294)
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
  4. 07 Jan, 2021: 5 commits
    • [cherry pick] Fix the bug in paddle.save/load and paddle.static.save/load when saving large files (#30170) · bfb6f613
      WeiXin committed
      * Support storage of large parameters (#29988)
      
      * Support storage of large parameters
      
      * Reduce the complexity of the unittest
      
      * Reduce the complexity of the unittest, commented out unittest for
      
      * add unittest for static.save/load
      
      * Increase the timeout threshold of 'test_static_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'
      
      * Extend the timeout for the (#30151)
    • [cherry pick] Some optimizations of elementwise_add, gelu and dropout for AMP (#30152) · 07f68fad
      Leo Chen committed
      * Improve performance of elementwise_add grad op (#29187)
      
      * pass stop_gradient for cast op
      
      * improve performance of elementwise_add grad
      
      * use tensor copy async
      
      * dygraph branch
      
      * fix dygraph branch
      
      * add ut
      
      * make gelu fp16 computing more robust (#29484)
      
      * Add fast path for dropout when p == 0  (#29553)
      
      * add fast path for p == 0 in dropout
      
      * add ut
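A fast path for p == 0 works because no element can be dropped: the mask is all ones and the output equals the input, so random-number generation can be skipped. A sketch of that idea on plain Python lists, assuming the upscale-in-train dropout mode; the function signature is hypothetical and does not mirror the CUDA kernel.

```python
import random


# Hypothetical upscale-in-train dropout on a list of floats; returns (out, mask).
def dropout(x, p, training=True, rng=random):
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not training or p == 0.0:
        # Fast path: nothing is dropped, so skip mask sampling entirely.
        return list(x), [1] * len(x)
    if p == 1.0:
        return [0.0] * len(x), [0] * len(x)
    mask = [1 if rng.random() >= p else 0 for _ in x]
    scale = 1.0 / (1.0 - p)  # keep the expected value unchanged during training
    return [v * m * scale for v, m in zip(x, mask)], mask
```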
    • [Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110) · 44b81e63
      furnace committed
      * Layer norm fp16 (#29169)
      
      * add fp16 for layer_norm op
      
      * revert layernorm api
      
      * fix forward
      
      * fix forward
      
      * fix backward for layernorm with fp16
      
      * fix unit test for layernorm with fp16
      
      * fix with_mkldnn compile error for layernorm with fp16
      
      * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
      
      * fix with_mkldnn compile error for layernorm with fp16
      
      * fix with_mkldnn compile error for layernorm with fp16
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      
      * fix layer_norm accuracy (#29434)
      
      * Layernorm opt (#29522)
      
      * layernorm fw opt
      
      * layernorm bw opt
      
      * fix typo, test=develop
      
      * remove const dim3 for windows CI compatibility
      
      * merge develop
      Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
      
      * Fix compile problem when cuda_arch < 6000 (#29576)
      
      * fix compile problem when cuda_arch < 6000
      
      * refine code
      
      * refine code
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
    • pre padding in dygraph (#30179) · a2b0357d
      tangwei12 committed
      Change-Id: Ia5279b0cbb6a5b3970aff66e9510e0d85efa70ce
    • Cherry pick bn (#30136) · 157ff094
      ceci3 committed
      * fix bn docs (#30096)
      
      * add attribute for batch_norm (#29950)
      
      * add attribute for batch_norm
  5. 06 Jan, 2021: 2 commits
  6. 05 Jan, 2021: 4 commits
  7. 04 Jan, 2021: 1 commit
  8. 31 Dec, 2020: 3 commits
  9. 30 Dec, 2020: 1 commit
  10. 29 Dec, 2020: 5 commits
    • [Kunlun] 2.0 cherry-pick: Support for Baidu Kunlun XPU multi-card training (#29713) · 847aa172
      liuyuhui committed
      * [Kunlun] PR1: Support one Kunlun card training in parallel executor (#29337)
      
      * [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
      
      * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)
      
      * add bkcl.so in whl for kunlun (#29947)
      
      * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29961)
      Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>
    • [Cherry-pick] Complex network execute support (#29905) · 91ebc460
      Chen Weihang committed
      * [Complex] Add support for complex grad accumulated (#29889)
      
      * add support for complex grad accumulated
      
      * add unittest for coverage
      
      * update test dtype
      
      * remove useless blank line
      
      * [Complex] Handle complex to real after type promotion (#29855)
      
      * try to add fwd op input dtypes
      
      * refactor base impl
      
      * return tmp_ins after dygraph prepare data
      
      * fix typo found in debug
      
      * polish comment & add complex net test
      
      * revert detail change
      
      * fix unittest failed
      
      * add complex kernel condition control
      
      * fix xpu test failed & polish comment
      
      * polish details by review comments
      
      * Complex op test (#29753)
      
      * delete inputs that need not be calculated in dygraph op_test
      
      * delete inputs that need not be calculated in dygraph op_test
      
      * change the grad of elementwise_mul for complex types (#29757)
      
      * add conj op for complex types
      
      * add conj for complex types
      
      * add more test case
      
      * add conj_op test
      
      * modify conj api and impl
      
      * add complex type for fill_constant_op xpu
      
      * add setConstant for complex type
      
      * remove complex conj test file
      
      * user define grad for test_conj_op
      
      * add test case for static mode of conj api
      
      * modify conj doc
      
      * change input args name to x
      
      * remove useless codes
      
      * conj support real types
      
      * add conj test case for real number
      
      * delete inputs that need not be calculated in dygraph op_test
      
      * delete inputs that need not be calculated in dygraph op_test
      
      * modify grad of mul for complex types
      
      * fix the bug where the grads did not match the order of the input args
      
      * change the grad of div for complex types (#29804)
      
      * change the grad of div for complex types
      
      * fix the bug where the grads did not match the order of the input args
      Co-authored-by: chentianyu03 <chentianyu03@baidu.com>
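For complex inputs, the gradient of out = x * y involves conjugates: dx = dout * conj(y) and dy = dout * conj(x). This assumes the convention in which the gradient of a real-valued loss L with respect to a complex z is dL/dRe(z) + i * dL/dIm(z); under that convention the conjugate appears naturally, which is why a conj op is needed for the complex grads. mul_grad is a hypothetical helper, not Paddle's kernel.

```python
# Hypothetical gradient of elementwise out = x * y for complex lists (conjugate convention).
def mul_grad(dout, x, y):
    dx = [g * b.conjugate() for g, b in zip(dout, y)]
    dy = [g * a.conjugate() for g, a in zip(dout, x)]
    return dx, dy
```

For real inputs conjugate() is the identity, so the same rule reduces to the familiar dx = dout * y.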
    • cherry pick heter ps (#29955) · a839ddca
      Thunderbrook committed
      * cherry pick heter ps
      
      *  CMakeList
    • Fix Conv2DTranspose bug when padding='same' (#29915) (#29936) · acb29ff8
      LielinJiang committed
      * fix conv_transpose bug when padding=same
    • [cherry-pick] clean redundant API alias in 2.0 - part 1 #29928 (#29960) · c9c835b5
      XiaoguangHu committed
      * [cherry-pick] cherry-pick of PR#29928
      
      * delete paddle.metric.chunk_eval and paddle.metric.mean_iou
      
      * delete paddle.nn.clip and paddle.nn.clip_by_norm
      
      * delete paddle.nn.functional.activation.hard_sigmoid and paddle.nn.functional.activation.hard_swish
      
      * [cherry-pick] cherry-pick of PR#29928
      
      * fix extension import error
  11. 28 Dec, 2020: 2 commits
    • [Cherry-Pick 2.0][Dy2Stat] 1. Fix bug of for-range stmts. 2. Support that step... · a8b6dd86
      liym27 committed
      [Cherry-Pick 2.0][Dy2Stat] 1. Fix bug of for-range stmts. 2. Support negative step values in for-range stmts (#29519) (#29874)
      
      1. Fix error in _build_cond_stmt of for-range stmts.
      
      2. Support negative step values in for-range stmts.
      
      3. Fix code for the differences between Py2 and Py3.
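Item 2 matters because a for-range loop rewritten as a while loop needs a condition that depends on the sign of step: i < stop for a positive step, i > stop for a negative one. A sketch of that condition, assuming a plain Python rewrite; range_as_while is a hypothetical name and this is not the code of _build_cond_stmt.

```python
# Hypothetical for-range -> while rewrite showing the sign-dependent loop condition.
def range_as_while(start, stop, step=1):
    if step == 0:
        raise ValueError("step must not be zero")
    out = []
    i = start
    # A fixed "i < stop" condition would be wrong (never true) for a negative step.
    while (i < stop) if step > 0 else (i > stop):
        out.append(i)
        i += step
    return out
```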
    • [Cherry-pick] Cherry-pick of PR#29579 and PR#29617 (#29904) · 63939597
      Huihuang Zheng committed
      * [Dy2stat] Enable jit.save to Save Without Running (#29579)
      
      Enable jit.save to Save Without Running.
      
      * Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617)
      
      Modify CublasHandleHolder to use PADDLE_RETRY_CUDA_SUCCESS instead of PADDLE_ENFORCE_CUDA_SUCCESS to fix a random unittest failure. The unittest log showed a CUDA allocation error in this file, which may be caused by insufficient GPU memory. Similar failures were fixed this way in the past, so PADDLE_RETRY_CUDA_SUCCESS is applied here as well.
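The retry approach described above can be sketched as a bounded retry loop around a call that may fail transiently, for example when GPU memory is briefly exhausted. A minimal sketch; retry_call, its parameters and the retry budget are hypothetical and do not reproduce the PADDLE_RETRY_CUDA_SUCCESS macro, whose retry count and delay are not given here.

```python
import time


# Hypothetical bounded retry: call fn() until is_success(result) or retries run out.
def retry_call(fn, is_success, max_retries=3, delay_s=0.0):
    result = fn()
    retries = 0
    while not is_success(result) and retries < max_retries:
        time.sleep(delay_s)  # give transient resource pressure a chance to clear
        result = fn()
        retries += 1
    if not is_success(result):
        raise RuntimeError(f"call failed after {retries} retries: {result!r}")
    return result
```

Unlike an enforce-style check, which fails on the first error status, this only raises once every retry has also failed.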