1. 08 1月, 2021 11 次提交
  2. 07 1月, 2021 5 次提交
    • W
      [cherry pick] paddle.save/load ,paddle.static.save/load 保存大文件的bug (#30170) · bfb6f613
      WeiXin 提交于
      * Support storage of large parameters (#29988)
      
      * Support storage of large parameters
      
      * Reduce the complexity of the unittest
      
      * Reduce the complexity of the unittest,commented out unittest for
      
      * add unittest for static.save/load
      
      * Increase the timeout threshold of 'test_static_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'
      
      * Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'
      
      * Extend the timeout for the (#30151)
      bfb6f613
    • L
      [cherry pick] Some optimizations of elementwise_add, gelu and dropout for AMP (#30152) · 07f68fad
      Leo Chen 提交于
      * Improve performance of elementwise_add grad op (#29187)
      
      * pass stop_gradient for cast op
      
      * improve performance of elementwise_add grad
      
      * use tensor copy async
      
      * dygraph branch
      
      * fix dygraph branch
      
      * add ut
      
      * make gelu fp16 computing more robust (#29484)
      
      * Add fast path for dropout when p == 0  (#29553)
      
      * add fast path for p == 0 in dropout
      
      * add ut
      07f68fad
    • F
      [Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110) · 44b81e63
      furnace 提交于
      * Layer norm fp16 (#29169)
      
      * add fp16 for layer_norm op
      
      * revert layernorm api
      
      * fix forward
      
      * fix forward
      
      * fix backward for layernorm with fp16
      
      * fix unit test for layernorm with fp16
      
      * fix with_mkldnn compile error for layernorm with fp16
      
      * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
      
      * fix with_mkldnn compile error for layernorm with fp16
      
      * fix with_mkldnn compile error for layernorm with fp16
      Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>
      
      * fix layer_norm accuracy (#29434)
      
      * Layernorm opt (#29522)
      
      * layernorm fw opt
      
      * layernorm bw opt
      
      * fix typo, test=develop
      
      * remove const dim3 for windows CI compatibility
      
      * merge develop
      Co-authored-by: Nzlsh80826 <zlsh80826@gmail.com>
      
      * Fix compile problem when cuda_arch < 6000 (#29576)
      
      * fix compile problem when cuda_arch < 6000
      
      * refine code
      
      * refine code
      Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>
      Co-authored-by: Nzlsh80826 <zlsh80826@gmail.com>
      44b81e63
    • T
      pre padding in dygraph (#30179) · a2b0357d
      tangwei12 提交于
      Change-Id: Ia5279b0cbb6a5b3970aff66e9510e0d85efa70ce
      a2b0357d
    • C
      Cherry pick bn (#30136) · 157ff094
      ceci3 提交于
      * fix bn docs (#30096)
      
      * add attribute for batch_norm (#29950)
      
      * add attribute for batch_norm
      157ff094
  3. 06 1月, 2021 3 次提交
  4. 05 1月, 2021 6 次提交
  5. 04 1月, 2021 1 次提交
  6. 31 12月, 2020 5 次提交
  7. 30 12月, 2020 3 次提交
  8. 29 12月, 2020 5 次提交
    • L
      [Kunlun] 2.0 cherry-pick:Support for Baidu Kunlun XPU multi card training (#29713) · 847aa172
      liuyuhui 提交于
      * [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
      
      * [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
      
      * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29926)
      
      * add bkcl.so in whl for kunlun (#29947)
      
      * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29961)
      Co-authored-by: NQingshuChen <qingshu.chen714@gmail.com>
      847aa172
    • C
      [Cherry-pick] Complex network execute support (#29905) · 91ebc460
      Chen Weihang 提交于
      * [Complex] Add support for complex grad accumulated (#29889)
      
      * add support for complex grad accumulated
      
      * add unittest for coverage
      
      * update test dtype
      
      * remove useless blank line
      
      * [Complex] Handle complex to real after type promotion (#29855)
      
      * try to add fwd op input dtypes
      
      * refactor base impl
      
      * return tmp_ins after dygraph prepare data
      
      * fix typo found in debug
      
      * polish comment & add complex net test
      
      * revert detail change
      
      * fix unittest failed
      
      * add complex kernel condition control
      
      * fix xpu test failed & polish comment
      
      * polish details by review comments
      
      * Complex op test (#29753)
      
      * delete no need to calculate inputs in dygraph op_test
      
      * delete no need to calculate inputs in dygraph op_test
      
      * change grad elementwise_mul for complex types (#29757)
      
      * add conj op for complex types
      
      * add conj for complex types
      
      * add more test case
      
      * add conj_op test
      
      * modify conj api and impl
      
      * add complex type for fill_constant_op xpu
      
      * add setConstant for complex type
      
      * remove complex conj test file
      
      * user define grad for test_conj_op
      
      * add test case for static mode of conj api
      
      * modify conj doc
      
      * change input args name to x
      
      * remove useless codes
      
      * conj support real types
      
      * add conj test case for real number
      
      * delete no need to calculate inputs in dygraph op_test
      
      * delete no need to calculate inputs in dygraph op_test
      
      * modify grad of mul for complex types
      
      * fix the grads of inputs args order not match bug
      
      * change the grad of div when complex types (#29804)
      
      * change the grad of div when complex types
      
      * fix the grads of inputs args order not match bug
      Co-authored-by: Nchentianyu03 <chentianyu03@baidu.com>
      91ebc460
    • T
      cherry pick heter ps (#29955) · a839ddca
      Thunderbrook 提交于
      * cherry pick heter ps
      
      *  CMakeList
      a839ddca
    • L
      Fix Conv2DTanspose bug when padding='same' (#29915) (#29936) · acb29ff8
      LielinJiang 提交于
      * fix conv_transpose bug when padding=same
      acb29ff8
    • X
      [cherry-pick] clean redundant API alias in 2.0 - part 1 #29928 (#29960) · c9c835b5
      XiaoguangHu 提交于
      * [cherry-pick] cherry-pick of PR#29928
      
      * delete paddle.metric.chunk_eval and paddle.metric.mean_iou
      
      * delete paddle.nn.clip and paddle.nn.clip_by_norm
      
      * delete paddle.nn.functional.activation.hard_sigmoid and paddle.nn.functional.activation.hard_swish
      
      * [cherry-pick] cherry-pick of PR#29928
      
      * fix extension import error
      c9c835b5
  9. 28 12月, 2020 1 次提交