1. 11 Jan, 2021 5 commits
  2. 10 Jan, 2021 1 commit
  3. 08 Jan, 2021 9 commits
    • [cherry-pick] [Dy2Stat] Don't convert to paddle.shape if var_x.shape is not negative #29965 (#30235) · 2ba9bdd7
      liym27 committed
      
      * [Cherry-Pick 2.0] [Dy2Stat] Don't convert to paddle.shape if var_x.shape is not negative (#29965)

      1. When x is a Variable, call nn.shape(x) only in the following cases:
       1) The shape of x is used in a control flow condition.
       2) The dim to be used is negative.
      2. When x is a Variable but x.shape or x.shape[idx] doesn't contain a negative value, don't convert to paddle.shape().
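The rule in this commit message can be read as a small predicate. The sketch below is only an illustration in plain Python, not the Dy2Stat transformer code: the helper name `needs_runtime_shape` and its parameters are hypothetical, and it assumes a static shape is given as a list of ints in which a negative entry marks a dimension unknown at compile time.

```python
def needs_runtime_shape(static_shape, idx=None, used_in_condition=False):
    """Decide whether a shape access must be computed at run time.

    static_shape: list of ints; a negative entry means "unknown until run time"
                  (an assumption of this sketch).
    idx: optional index, for accesses of the form x.shape[idx].
    used_in_condition: True when the shape feeds a control flow condition.
    """
    if used_in_condition:
        # Case 1: shapes used in control flow conditions are always converted.
        return True
    dims = static_shape if idx is None else [static_shape[idx]]
    # Case 2: convert only if a dimension actually being read is negative.
    return any(d < 0 for d in dims)
```

For example, with a static shape of `[-1, 3]`, reading `x.shape[1]` needs no conversion because that dimension is already known, while reading `x.shape[0]` does.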
      
      * [Cherry-Pick 2.0] [Dy2Stat] Use Paddle2.0 api paddle.tensor.array_* (#30156)
    • [Cherry-pick] amp related PR cherry pick into Release/2.0 (#30212) · 9f7c66b4
      huangxu96 committed
      * Optimizer trans momentum (#29597)
      
      * merge amp related function in Momentum from paddle.fluid.contrib.optimizer into paddle.optimizer.
      
      * Add unittest for 2.0  Momentum API.
      
      * fix some bugs in weight_decay.
      
      * add alias for fluid.contrib.mixed_precision (#29562)
      
      * add alias for fluid.contrib.mixed_precision
      
      * add static.amp into setup.py.in (#29621)

      * add static.amp into setup.py.in
      
      * add unittest for api
      
      * fix a bug in multi_precision_fp16 unittest. (#29756)
    • [cherry-pick 2.0] Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns wrong result (#30003) (#30146) · 5fe3da39
      liym27 committed
      
      1. When slice_item is a slice:
       1) the start of __getitem__ should be std::max(start, 0)
       2) the end of __getitem__ should be std::min(end, dim)
      2. When slice_item is an integer, it should be in [-dim_len, dim_len)
      3. Fix the error message to use accurate data
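The bounds handling listed in this commit message can be sketched in plain Python. This is an illustration under stated assumptions, not the C++ code of the fix: the function names are hypothetical, the slice step is taken to be 1, and negative values are assumed to be offset by the axis length before clamping, as in ordinary Python indexing.

```python
def normalize_slice(start, end, dim):
    """Clamp a step-1 slice [start:end] to an axis of length dim."""
    if start < 0:
        start += dim           # negative values count from the end of the axis
    if end < 0:
        end += dim
    start = max(start, 0)      # mirrors std::max(start, 0)
    end = min(end, dim)        # mirrors std::min(end, dim)
    return start, end


def normalize_int_index(index, dim_len):
    """Validate an integer index against [-dim_len, dim_len) and make it non-negative."""
    if not -dim_len <= index < dim_len:
        raise IndexError(
            f"index {index} is out of range [{-dim_len}, {dim_len})"
        )
    return index + dim_len if index < 0 else index
```

With an axis of length 5, a slice such as `[-10:3]` is normalized to `(0, 3)`, which selects the same elements as the equivalent Python list slice.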
    • [Cherry-Pick 2.0][setitem] Support Tensor setitem in static mode (#29708) (#30104) · f46ddc0e
      liym27 committed
      1. Type of index: int, slice (step must be 1).

      2. Type of value:
       (1) int32, int64, float32, bool;
       (2) numpy.array (int32, int64, float32, bool; float64 is not supported);
       (3) paddle.Tensor (int32, int64, float32, float64, bool).
    • Fix beam search bug (#29824) (#30140) · b2ca2cad
      Jiaqi Liu committed
      * fix beam search bug
      
      * add dygraph unittest
      
      * update dynamic_decode argument doc
      
      * add warning info for state which has no lengths attribute
    • [Cherry-pick] [Complex] Simplify prepared op impl to improve performance (#30153) (#30215) · 0e3a1d35
      Chen Weihang committed
      * simplify prepared op impl to improve performance
      
      * fix kunlun compile error
      
      * continue fix kunlun compile error
      
      * only transform diff place when dtype diff
      
      * fix failed unittests
      
      * remove useless file
      
      * polish impl by review comment
    • [2.0 API CherryPick] LookAhead, ModelAverage, IndexSelect (#30205) · 3ce4d34d
      123malin committed
      * Add Lookahead and ModelAverage Optimizer (#30004)
      
      * test=develop, add model_average and lookahead
      
      * Improve Index select cuda kernel (#30139)
      
      * test=develop, add index_select_cuda kernel
    • fix syncbn convert (#30158) (#30176) · 030d678c
      ceci3 committed
      * fix syncbn convert
      
      * add unittest
    • [Cherry-pick] Simplify the options of spawn based on fleetrun (#30144) (#30197) · 39204d56
      Chen Weihang committed
      * Simplify the options of spawn based on fleetrun (#30144)
      
      * Simplify the options of spawn based on fleetrun
      
      * polish details
      
      * polish doc details
      
      * cleanup enum test=develop (#29294)
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
  4. 07 Jan, 2021 5 commits
    • [cherry pick] paddle.save/load, paddle.static.save/load: fix bug when saving large files (#30170) · bfb6f613
      WeiXin committed
      * Support storage of large parameters (#29988)
      
      * Support storage of large parameters
      
      * Reduce the complexity of the unittest
      
      * Reduce the complexity of the unittest,commented out unittest for
      
      * add unittest for static.save/load
      
      * Increase the timeout threshold of 'test_static_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'
      
      * Extend the timeout for the (#30151)
    • [cherry pick] Some optimizations of elementwise_add, gelu and dropout for AMP (#30152) · 07f68fad
      Leo Chen committed
      * Improve performance of elementwise_add grad op (#29187)
      
      * pass stop_gradient for cast op
      
      * improve performance of elementwise_add grad
      
      * use tensor copy async
      
      * dygraph branch
      
      * fix dygraph branch
      
      * add ut
      
      * make gelu fp16 computing more robust (#29484)
      
      * Add fast path for dropout when p == 0  (#29553)
      
      * add fast path for p == 0 in dropout
      
      * add ut
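The idea of a fast path for `p == 0` can be shown with a plain-Python sketch. It is not the Paddle dropout kernel: the function below is hypothetical, works on a list of floats, and assumes the common "scale kept values by 1 / (1 - p)" formulation of dropout during training.

```python
import random


def dropout(values, p, rng=None):
    """Zero each value with probability p and rescale the survivors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p == 0.0:
        # Fast path: nothing is dropped, so skip mask generation and scaling.
        return list(values)
    if p == 1.0:
        # Everything is dropped; also avoids dividing by zero below.
        return [0.0 for _ in values]
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [v * scale if rng.random() >= p else 0.0 for v in values]
```

With `p == 0` the input comes back unchanged without drawing any random numbers, which is the case the commit optimizes.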
    • [Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110) · 44b81e63
      furnace committed
      * Layer norm fp16 (#29169)
      
      * add fp16 for layer_norm op
      
      * revert layernorm api
      
      * fix forward
      
      * fix forward
      
      * fix backward for layernorm with fp16
      
      * fix unit test for layernorm with fp16
      
      * fix with_mkldnn compile error for layernorm with fp16
      
      * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
      
      * fix with_mkldnn compile error for layernorm with fp16
      
      * fix with_mkldnn compile error for layernorm with fp16
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      
      * fix layer_norm accuracy (#29434)
      
      * Layernorm opt (#29522)
      
      * layernorm fw opt
      
      * layernorm bw opt
      
      * fix typo, test=develop
      
      * remove const dim3 for windows CI compatibility
      
      * merge develop
      Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
      
      * Fix compile problem when cuda_arch < 6000 (#29576)
      
      * fix compile problem when cuda_arch < 6000
      
      * refine code
      
      * refine code
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
    • pre padding in dygraph (#30179) · a2b0357d
      tangwei12 committed
      Change-Id: Ia5279b0cbb6a5b3970aff66e9510e0d85efa70ce
    • Cherry pick bn (#30136) · 157ff094
      ceci3 committed
      * fix bn docs (#30096)
      
      * add attribute for batch_norm (#29950)
      
      * add attribute for batch_norm
  5. 06 Jan, 2021 2 commits
  6. 05 Jan, 2021 4 commits
  7. 04 Jan, 2021 1 commit
  8. 31 Dec, 2020 3 commits
  9. 30 Dec, 2020 1 commit
  10. 29 Dec, 2020 5 commits
    • [Kunlun] 2.0 cherry-pick: Support for Baidu Kunlun XPU multi card training (#29713) · 847aa172
      liuyuhui committed
      * [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
      
      * [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
      
      * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29926)
      
      * add bkcl.so in whl for kunlun (#29947)
      
      * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29961)
      Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>
    • [Cherry-pick] Complex network execute support (#29905) · 91ebc460
      Chen Weihang committed
      * [Complex] Add support for complex grad accumulated (#29889)
      
      * add support for complex grad accumulated
      
      * add unittest for coverage
      
      * update test dtype
      
      * remove useless blank line
      
      * [Complex] Handle complex to real after type promotion (#29855)
      
      * try to add fwd op input dtypes
      
      * refactor base impl
      
      * return tmp_ins after dygraph prepare data
      
      * fix typo found in debug
      
      * polish comment & add complex net test
      
      * revert detail change
      
      * fix unittest failed
      
      * add complex kernel condition control
      
      * fix xpu test failed & polish comment
      
      * polish details by review comments
      
      * Complex op test (#29753)
      
      * delete no need to calculate inputs in dygraph op_test
      
      * delete no need to calculate inputs in dygraph op_test
      
      * change grad elementwise_mul for complex types (#29757)
      
      * add conj op for complex types
      
      * add conj for complex types
      
      * add more test case
      
      * add conj_op test
      
      * modify conj api and impl
      
      * add complex type for fill_constant_op xpu
      
      * add setConstant for complex type
      
      * remove complex conj test file
      
      * user define grad for test_conj_op
      
      * add test case for static mode of conj api
      
      * modify conj doc
      
      * change input args name to x
      
      * remove useless codes
      
      * conj support real types
      
      * add conj test case for real number
      
      * delete no need to calculate inputs in dygraph op_test
      
      * delete no need to calculate inputs in dygraph op_test
      
      * modify grad of mul for complex types
      
      * fix the grads of inputs args order not match bug
      
      * change the grad of div when complex types (#29804)
      
      * change the grad of div when complex types
      
      * fix the grads of inputs args order not match bug
      Co-authored-by: chentianyu03 <chentianyu03@baidu.com>
    • cherry pick heter ps (#29955) · a839ddca
      Thunderbrook committed
      * cherry pick heter ps
      
      *  CMakeList
    • Fix Conv2DTranspose bug when padding='same' (#29915) (#29936) · acb29ff8
      LielinJiang committed
      * fix conv_transpose bug when padding=same
    • [cherry-pick] clean redundant API alias in 2.0 - part 1 #29928 (#29960) · c9c835b5
      XiaoguangHu committed
      * [cherry-pick] cherry-pick of PR#29928
      
      * delete paddle.metric.chunk_eval and paddle.metric.mean_iou
      
      * delete paddle.nn.clip and paddle.nn.clip_by_norm
      
      * delete paddle.nn.functional.activation.hard_sigmoid and paddle.nn.functional.activation.hard_swish
      
      * [cherry-pick] cherry-pick of PR#29928
      
      * fix extension import error
  11. 28 Dec, 2020 2 commits
    • [Cherry-Pick 2.0][Dy2Stat] 1. Fix bug of for-range stmts. 2. Support that step value is negative in for-range stmts (#29519) (#29874) · a8b6dd86
      liym27 committed
      
      1. Fix error in _build_cond_stmt of for-range stmts.
      
      2. Support that step value is negative in for-range stmts
      
      3. Fix code because of the diff between Py2 and Py3
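Why a negative step needs its own loop condition can be seen by rewriting a for-range loop as a while loop. The sketch below is plain Python with a hypothetical function name; it illustrates the condition logic only and is not the `_build_cond_stmt` implementation.

```python
def range_as_while(start, stop, step):
    """Collect the values of range(start, stop, step) using an explicit while loop."""
    if step == 0:
        raise ValueError("step must not be zero")
    out = []
    i = start
    # The comparison direction depends on the sign of step:
    # count up while i < stop, count down while i > stop.
    while (i < stop) if step > 0 else (i > stop):
        out.append(i)
        i += step
    return out
```

Using `i < stop` unconditionally would make a loop such as `range(5, 0, -2)` exit immediately, which is the kind of case a negative step has to handle.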
    • [Cherry-pick] Cherry-pick of PR#29579 and PR#29617 (#29904) · 63939597
      Huihuang Zheng committed
      * [Dy2stat] Enable jit.save to Save Without Running (#29579)
      
      Enable jit.save to Save Without Running.
      
      * Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617)
      
      Modify CublasHandleHolder to use PADDLE_RETRY_CUDA_SUCCESS instead of PADDLE_ENFORCE_CUDA_SUCCESS to fix a random unittest failure. The unittest log showed a CUDA allocation error in this file, which may be due to insufficient GPU resources. Similar failures were fixed this way in the past, so PADDLE_RETRY_CUDA_SUCCESS is applied here as well.
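The difference between failing on the first error and retrying a transient one can be sketched generically. The Python below is an illustration of the retry pattern only, not the PADDLE_RETRY_CUDA_SUCCESS macro: the helper name, the attempt count, and the delay are all assumptions chosen for the example.

```python
import time


def retry_call(fn, max_attempts=5, delay_seconds=0.1, retry_on=(RuntimeError,)):
    """Call fn(); on a retryable error, wait and try again.

    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
```

A call that fails because a resource is briefly unavailable gets a few more chances before the error is surfaced, while a persistent failure is still reported.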
  12. 25 Dec, 2020 2 commits
    • feat: support check_nan_inf for kunlun/xpu device (#29694) (#29898) · 41917fb5
      QingshuChen committed
      * feat: support check_nan_inf for kunlun device
      
      * support kunlun stack
      
      * minor
    • 2 0 ps core 2 (#29894) · f781ab08
      tangwei12 committed
      * add ps table (#29463)
      
      * add ps table
      
      Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178
      
      * add service (#29560)
      
      * add service, remove ut on mac
      
      * fix heter_profiler & add heter stop method
      
      * fix code style
      
      * merge pscore
      
      Change-Id: Ie7f60d1cdde6755a0c29db26863c6283e9843d57
      
      * fix cmake
      
      Change-Id: I6773509a7b4ca79139ecc40b7bf3eb318ceff8bb
      
      * fix conflict

      Change-Id: I35575be0c96a8520f9d756ea7f1ff0b904a165ba

      * fix conflict
      
      Change-Id: Ic926ea0b0d67803226d51241397ba3b510226bfa