Commits · a8dfff991122ee208bd8b33010b38cccec27cf9b · 机器未来 / Paddle

02 Feb, 2021 1 commit

add DLA support：C++&&Python api (#30165) (#30810) · a8dfff99

Committed by Shang Zhizhou on Feb 02, 2021

* add dla

* add python api
Co-authored-by: shangzhizhou <root@szth-rp-fanyi-opera49.szth.baidu.com>

a8dfff99

27 Jan, 2021 1 commit
- Disabling oneDNN inplace pass (#30588) (#30710) · 5d604a6b
  Committed by Wojciech Uss on Jan 27, 2021
```
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
```
  5d604a6b
22 Jan, 2021 1 commit
- extend trt ut timeout threshold (#30633) · 02af1a62
  Committed by Pei Yang on Jan 22, 2021
  
  02af1a62
21 Jan, 2021 1 commit
- fix softmax bug for multi_card in kunlun (#30600) (#30614) · c173887e
  Committed by QingshuChen on Jan 21, 2021
  
  c173887e
20 Jan, 2021 3 commits
- [cherry-pick]Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732) (#30612) · fd9d6fda
  Committed by AshburnLee on Jan 20, 2021
```
* Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)

* Fixed an error

* Fixed an error
```
  fd9d6fda
- Add tf32 switch for cuDNN (#29192) (#30574) · 138a71b7
  Committed by AshburnLee on Jan 20, 2021
```
This PR is cherry-picked from PR #29192.
Function: adds a TF32 switch for cuDNN. TF32 is enabled by default and is disabled when the user sets the switch to False.
```
  138a71b7
- fix compile error on sw and mips (#30584) · 619869bd
  Committed by Wilber on Jan 20, 2021
  
  619869bd
19 Jan, 2021 11 commits
- [Cherry-pick] PR 30520. fix error message of Inplace strategy (#30520) (#30568) · 40b3e752
  Committed by pangyoki on Jan 19, 2021
```
Cherry-pick PR #30520.
Fix the error message of the Inplace strategy.
```
  40b3e752
- [cherry-pick] support layer_norm fp16 in dygraph amp (#30430) #30566 · 0ea41e62
  Committed by Leo Chen on Jan 19, 2021
```
[cherry-pick] support layer_norm fp16 in dygraph amp (#30430)
```
  0ea41e62
- fix bug of multicard grad ncclAllReduce (#30554) · 96058384
  Committed by Zhou Wei on Jan 19, 2021
```
cherry-pick #30553
Fix bug of multicard grad ncclAllReduce: the gradient accumulators of parameters should keep their order; otherwise, the multicard ncclAllReduce of gradients is affected.
```
  96058384
- [Cherry-Pick] Fix bug: GetAttrValue should deal with attr with attrType vector<double> (#30564) · f15bed11
  Committed by liym27 on Jan 19, 2021
```
cherry-pick #30536
```
  f15bed11
- [Cherry-pick]Fix the compiling error of update_loss_scaling when using cuda9.(#30538) #30539 · e114f892
  Committed by Zhen Wang on Jan 19, 2021
```
Fix the compiling error of update_loss_scaling when using cuda9.
```
  e114f892
- Ascend Framework Part1: OP & Wrapper (#30281) (#30546) · 6f563ace
  Committed by hutuxian on Jan 19, 2021
  
  6f563ace
- Ascend Framework Part2: pybind files (#30410) (#30547) · 9b1031f3
  Committed by hutuxian on Jan 19, 2021
  
  9b1031f3
- 【Cherry-Pick】add trainer number for pserver (#30524) · 3bdf1544
  Committed by tangwei12 on Jan 19, 2021
```
* add trainers for pserver

Change-Id: I99c0ab1cc427318f1f9bf8f8f5faff2b8890645d

* add trainers for pserver

Change-Id: I1a75793ec81ce126d07f4c47cae09b95d530bbc8
```
  3bdf1544
- Pd2.0 (#30532) · 1323e5e7
  Committed by taixiurong on Jan 19, 2021
```
* support transformer v2.0

* fix range op crash in dygraph xpu place
```
  1323e5e7
- [Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317) (#30535) · 420fdbb2
  Committed by liuyuhui on Jan 19, 2021
  
  420fdbb2
- Recompute Offload: fixed bug in memcpy (#30484) (#30517) · 7a4ccf59
  Committed by JZ-LIANG on Jan 19, 2021
  
  7a4ccf59
18 Jan, 2021 5 commits

fix cache key for inplaced elementwise ops (#30404) (#30478) · c2a4a50e
Committed by lidanqing on Jan 18, 2021
```
Co-authored-by: Wojciech Uss <wojciech.uss@intel.com>
```
c2a4a50e

[cherry-pick]Modify the calculation logic of LambOptimizer (#29313) (#30510) · b3fa899b

Committed by guofei on Jan 18, 2021

* Modify the calculation logic of LambOptimizer (#29313)

* Modify the calculation logic of LambOptimizer

* Modify the calculation logic of LambOptimizer

* Modify the calculation logic of LambOptimizer

b3fa899b

[cherry-pick] add pad and concat double grad #29549 (#30432) · 5e4d54a1
Committed by ceci3 on Jan 18, 2021
```
* add pad and concat double grad

* resolve conflict
```
5e4d54a1

[cherry-pick] improve perfomance of cast and tril op (#30498) · de003cee

Committed by Zhang Ting on Jan 18, 2021

* add fp16 support for tril_triu op (#30186)

* add VecCastCUDAKernel (#30296)
Co-authored-by: furnace <34057289+windstamp@users.noreply.github.com>

de003cee

Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in... · 27c2f1ea

Committed by pangyoki on Jan 18, 2021

Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103) (#30496)

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allocation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* fix test_cross_entropy_loss error because of reshape2

* add inplace strategy

* add elementwise_add sub

* let backward op not use inplace

* grad op do not use inplace

* fix memory increase error and add leaf error message

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput

* add unittest and leaf error message

* merge view error

* optimize op_function_generator format and support sum inplace op

* fix format of basic_engine

* fix format for framework

* little change of variable wrapper

* add reshape, squeeze, unsqueeze, scatter api

* add relu elu tanh softmax inplace api

* fix test_squeeze_op unittest

* fix test_relu_op unittest

* fix comment problems

* delete sample code of inplace api

* add reference of grad_pending_nodes in basic_engine

* fix unittest name

* add inplace apis into wlist

* fix error message

* add PADDLE_ENFORCE for set grad op twice

* fix head file error

27c2f1ea

15 Jan, 2021 6 commits

Cherry pick 30072 (#30499) · 590e718b

Committed by pangyoki on Jan 15, 2021

* Cherry-pick 30072, add dispensable input for core.ops.reshape2/expand/slice (#30072)

* add dispensable input 'shape' for core.ops.reshape2

* add dispensable inputs for core.ops.reshape2/expand/slice

* add ut

* save reshape update in pr 30180

* save reshape update v2 in pr 30180
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>

590e718b

Fix float64 bug in layer norm (#30454) · c9d26423
Committed by Yang Zhang on Jan 15, 2021
```
built-in `rsqrt` is shadowed
```
c9d26423

add transpose double grad , cherry-pick from #29600 (#30435) · badc6f22

Committed by lijianshe02 on Jan 15, 2021

* add transpose double grad test=develop (#29600)

* add transpose double grad test=develop

* cherry-pick test=develop

badc6f22

Support double backward rsqrt (#29589) (#30431) · 71ab8ae9
Committed by whs on Jan 15, 2021

71ab8ae9

【Cherry-Pick】add distributed_infer (#30300) (#30427) · ae75affd

Committed by 123malin on Jan 15, 2021

* test=develop, add distributed_infer (#30300)

* test=develop, add distributed_infer

* test=develop, fix unittest cmakefile conflict

* test=develop, fix test_dist_fleet_base

ae75affd

Cherrypick fix rnn batch size diff (#30462) · e0e98627
Committed by wawltor on Jan 15, 2021
```
* fix the out-of-bounds read memory bug in the rnn mask

* update the code for the rnn
```
e0e98627

14 Jan, 2021 8 commits
- fix flatten api grad (#30426) (#30441) · 8b5307bf
  Committed by ShenLiang on Jan 14, 2021
  
  8b5307bf
- [cherry-pick] correct the allowed dimension size (#30326) (#30433) · 35c8eaf5
  Committed by lidanqing on Jan 14, 2021
  
  35c8eaf5
- skip quantizing ops in cpu inference (#30342) (#30405) · 2f16e0c6
  Committed by cc on Jan 14, 2021
  
  2f16e0c6
- optimize memcpy perf for kunlun (#30291) (#30382) · 9de42be2
  Committed by QingshuChen on Jan 14, 2021
```
* optimize memcpy perf for kunlun (#30291)

* optimize memcpy perf for kunlun

* remove useless unittest for kunlun mean

* minor

* fix bug that can't find mkldnn(kunlun) (#30394)
```
  9de42be2
- [cherrypick 2.0] add double grad for conv_transpose and depthwise_conv (#30429) · 1552343a
  Committed by LielinJiang on Jan 14, 2021
```
* Add double grad for conv_transpose (#29706)

* add double grad for conv_transpose

* register cudnn conv double grad for depthwise conv (#29807)
```
  1552343a
- [cherry-pick 2.0]enable MakeCipher api for inference (#30389) · ac70275a
  Committed by Zhang Jun on Jan 14, 2021
  
  ac70275a
- Added support for inference using quantization aware trained dygraph (#30288) (#30402) · 38faed7f
  Committed by alncat on Jan 14, 2021
  
  38faed7f
- Softmax backward optimize (#30249) (#30400) · 4cc0337f
  Committed by GaoWei8 on Jan 14, 2021
```
* softmax backward optimize
```
  4cc0337f
13 Jan, 2021 3 commits
- [cherry-pick] Set expected place in child thread for dataloader #30383 · 9fb5a3e5
  Committed by Leo Chen on Jan 13, 2021
```
* set expected place in child thread for dataloader

* set device id when set tensor from numpy

* revert tensor_py change

* add compile guard

* fix ci

* fix bug
```
  9fb5a3e5
- Recompute Offload (#30233) (#30372) · 3fbc3cf4
  Committed by JZ-LIANG on Jan 13, 2021
  
  3fbc3cf4
- Support unused parameters in dynamic graph distributed (#30224) (#30374) · 020e2431
  Committed by ShenLiang on Jan 13, 2021
  
  020e2431

机器未来 / Paddle is in sync with the fork source project