Commits · ff4612a3435e903eaf2c331c9820430ac554e88d · 机器未来 / Paddle

01 Mar, 2021 · 1 commit

[Cherry pick] cherry-pick #31102 #30750 #30626 (#31336) · ff4612a3

Committed by Thunderbrook on Mar 01, 2021

* solve build gpu task core (#30626)

* build gpu task core

* format

* dump to cpu (#30750)

* dump to cpu

* format

* format

* format

* support multi node in heterps (#31102)

* push multi node

* multi node

* MultiThread

* remove log

* solve bug in 30829

* optimizer

ff4612a3

23 Feb, 2021 · 2 commits

[CustomOp] New custom operator extension mechanism in 2.0.1 (#31097) · a19154ca

Committed by Chen Weihang on Feb 23, 2021

[CustomOp] New custom operator extension mechanism in 2.0.1

Cherry-pick the PRs related to the basic implementation of the new custom operator

a19154ca

test=develop, save/load, shrink (#30625) (#31107) · 36710ebc

Committed by tangwei12 on Feb 23, 2021

* test=develop, save/load, shrink
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
Co-authored-by: 123malin <malin10@baidu.com>

36710ebc

19 Feb, 2021 · 1 commit

cherry-pick pr (#31043) · 656124da

Committed by Wilber on Feb 19, 2021

656124da

04 Feb, 2021 · 1 commit

support xpu with analysis predictor, test=develop (#30832) (#30863) · d199edd8

Committed by 石晓伟 on Feb 04, 2021

d199edd8

02 Feb, 2021 · 1 commit

add DLA support：C++&&Python api (#30165) (#30810) · a8dfff99

Committed by Shang Zhizhou on Feb 02, 2021

* add dla

* add python api
Co-authored-by: shangzhizhou <root@szth-rp-fanyi-opera49.szth.baidu.com>

a8dfff99

20 Jan, 2021 · 2 commits

[cherry-pick]Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732) (#30612) · fd9d6fda

Committed by AshburnLee on Jan 20, 2021
```
* Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)

* Fixed an error

* Fixed an error
```
fd9d6fda

Add tf32 switch for cuDNN (#29192) (#30574) · 138a71b7

Committed by AshburnLee on Jan 20, 2021

This PR is cherry-picked from PR: #29192
Function: adds a TF32 switch for cuDNN. TF32 is on by default and is turned off when the user sets the switch to False.

138a71b7

19 Jan, 2021 · 2 commits

[cherry-pick] support layer_norm fp16 in dygraph amp (#30430) #30566 · 0ea41e62

Committed by Leo Chen on Jan 19, 2021

```
[cherry-pick] support layer_norm fp16 in dygraph amp (#30430)
```

0ea41e62

Ascend Framework Part2: pybind files (#30410) (#30547) · 9b1031f3

Committed by hutuxian on Jan 19, 2021

9b1031f3

18 Jan, 2021 · 2 commits

[cherry-pick]Modify the calculation logic of LambOptimizer (#29313) (#30510) · b3fa899b

Committed by guofei on Jan 18, 2021

* Modify the calculation logic of LambOptimizer (#29313)

* Modify the calculation logic of LambOptimizer

* Modify the calculation logic of LambOptimizer

* Modify the calculation logic of LambOptimizer

b3fa899b

Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in... · 27c2f1ea

Committed by pangyoki on Jan 18, 2021

Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103) (#30496)

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allocation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* fix test_cross_entropy_loss error because of reshape2

* add inplace strategy

* add elementwise_add sub

* let backward op not use inplace

* grad op do not use inplace

* fix memory increase error and add leaf error message

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput

* add unittest and leaf error message

* merge view error

* optimize op_function_generator format and support sum inplace op

* fix format of basic_engine

* fix format for framework

* little change of variable wrapper

* add reshape, squeeze, unsqueeze, scatter api

* add relu elu tanh softmax inplace api

* fix test_squeeze_op unittest

* fix test_relu_op unittest

* fix comment problems

* delete sample code of inplace api

* add reference of grad_pending_nodes in basic_engine

* fix unittest name

* add inplace apis into wlist

* fix error message

* add PADDLE_ENFORCE for set grad op twice

* fix head file error

27c2f1ea

15 Jan, 2021 · 1 commit

Cherry pick 30072 (#30499) · 590e718b

Committed by pangyoki on Jan 15, 2021

* Cherry-pick 30072, add dispensable input for core.ops.reshape2/expand/slice (#30072)

* add dispensable input 'shape' for core.ops.reshape2

* add dispensable inputs for core.ops.reshape2/expand/slice

* add ut

* save reshape update in pr 30180

* save reshape update v2 in pr 30180
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>

590e718b

13 Jan, 2021 · 3 commits

[cherry-pick] Set expected place in child thread for dataloader #30383 · 9fb5a3e5

Committed by Leo Chen on Jan 13, 2021

```
* set expected place in child thread for dataloader

* set device id when set tensor from numpy

* revert tensor_py change

* add compile guard

* fix ci

* fix bug
```

9fb5a3e5

Support unused parameters in dynamic graph distributed (#30224) (#30374) · 020e2431

Committed by ShenLiang on Jan 13, 2021

020e2431

split ps with distributed (#30337) · a97ca56a

Committed by tangwei12 on Jan 13, 2021

```
Change-Id: I3c788e7576688e63181e7f01562529b85a09cc59
```

a97ca56a

12 Jan, 2021 · 3 commits

[cherry]Add callback after TensorCopy (#30123) (#30268) · 9d0a1eb4

Committed by Leo Chen on Jan 12, 2021

* change to tensor copy sync

* change to tensor copy sync

* make copy_to safe when use TensorCopy

* refine code

* add ut

* add cudapinned garbagecollector

* add testcase: cpu place -> cuda pinned place

9d0a1eb4

【Cherry-Pick】Fix device_context & Save Tensor & Gloo (#30336) · 284bae99

Committed by Chengmo on Jan 12, 2021

* Fix server.h include device_context (#30243)

* fix cmake
Co-authored-by: seiriosPlus <tangwei12@baidu.com>

* 【Paddle.Fleet】Support local save sparse param (#30175)

* add save tensor support
Co-authored-by: seiriosPlus <tangwei12@baidu.com>

* add sparse embedding & load vars for 2.0 & gloo bug fix (#30306)

* add sparse embedding & load vars for 2.0

Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b

* fix hdfs gloo

Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6

* fix gloo hdfs

Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e

* move loadvar/sparse embedding from incubate to static

Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0
Co-authored-by: tangwei12 <tangwei12@baidu.com>

284bae99

cherry pick tensor table (#30221) · 330aea6e

Committed by Chengmo on Jan 12, 2021

330aea6e

11 Jan, 2021 · 1 commit

[Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze,... · 7c943a65

Committed by pangyoki on Jan 11, 2021

[Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) (#30258)

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allocation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput

7c943a65

08 Jan, 2021 · 2 commits

[cherry-pick 2.0] Fix bug: In dynamic mode, if start or end is negative,... · 5fe3da39

Committed by liym27 on Jan 08, 2021

[cherry-pick 2.0] Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns wrong result (#30003) (#30146)

1. when slice_item is a slice:
 1) the start of __getitem__ should be std::max(start, 0)
 2) the end of __getitem__ should be std::min(end, dim)
2. when slice_item is an integer, it should be in [-dim_len, dim_len)
3. Fix error message to use accurate data
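
The rules above can be sketched in plain Python. This is an illustration only, not Paddle's C++ implementation: the helper names `normalize_slice_bounds` and `check_int_index` are hypothetical, a step of 1 is assumed, negative bounds are assumed to be shifted by the dimension length before clamping, and the second clamp is read as applying to the end bound.

```python
def normalize_slice_bounds(start, end, dim_len):
    """Clamp slice bounds for one axis (step assumed to be 1)."""
    # Shift negative bounds so that they count from the end of the axis.
    if start < 0:
        start += dim_len
    if end < 0:
        end += dim_len
    # Rule 1: start is at least 0, end is at most dim_len.
    start = max(start, 0)
    end = min(end, dim_len)
    return start, end


def check_int_index(index, dim_len):
    """Rule 2: an integer index must lie in [-dim_len, dim_len)."""
    if not -dim_len <= index < dim_len:
        raise IndexError(
            f"index {index} is out of range [-{dim_len}, {dim_len})")
    # Return the equivalent non-negative index.
    return index + dim_len if index < 0 else index
```

For instance, `normalize_slice_bounds(-10, 3, 5)` gives `(0, 3)`, which matches how `list(range(5))[-10:3]` behaves in Python.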

5fe3da39

【2.0API CherryPick】LookAhead, ModelAverage, IndexSelect (#30205) · 3ce4d34d

Committed by 123malin on Jan 08, 2021

* Add Lookahead and ModelAverage Optimizer (#30004)

* test=develop, add model_average and lookahead

* Improve Index select cuda kernel (#30139)

* test=develop, add index_select_cuda kernel

3ce4d34d

06 Jan, 2021 · 1 commit

[Cherry-Pick 2.0][Dynamic Inplace] Support ShareInplaceVersionCounterWith for... · 743649b5

Committed by liym27 on Jan 06, 2021

[Cherry-Pick 2.0][Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842) (#30105)

Before this PR, SharePlaceHolderWith shared a Tensor between different C++ Variables, which means sharing the data, shape, and inplace_version_counter_ of the Tensor.
But in some cases, sharing the data and inplace_version_counter_ without sharing the shape is needed. For example, the inplace op reshape can't share the shape.

This PR discards SharePlaceHolderWith and exposes ShareInplaceVersionCounterWith for C++ Tensor.
This reverts commit b10ecd9d.

* Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase
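
The idea of sharing the data and the version counter while keeping a separate shape can be pictured with a small toy model. Everything here (`VersionCounter`, `ToyTensor`, `reshape_view`, `fill_`) is a hypothetical stand-in written for illustration; it does not mirror Paddle's C++ classes or signatures.

```python
class VersionCounter:
    """Toy stand-in for a shared inplace version counter."""

    def __init__(self):
        self.version = 0

    def bump(self):
        self.version += 1


class ToyTensor:
    def __init__(self, data, shape, counter=None):
        self.data = data      # buffer, shared between views (a plain list)
        self.shape = shape    # owned by each tensor, never shared
        self.counter = VersionCounter() if counter is None else counter

    def reshape_view(self, new_shape):
        # Share the buffer and the counter, but give the view its own shape.
        return ToyTensor(self.data, new_shape, self.counter)

    def fill_(self, value):
        # In-place write: mutate the shared buffer, then bump the counter.
        for i in range(len(self.data)):
            self.data[i] = value
        self.counter.bump()


x = ToyTensor([0] * 6, (2, 3))
y = x.reshape_view((3, 2))
y.fill_(1)
# x sees the new data and the bumped version, yet keeps its own shape.
print(x.data, x.shape, x.counter.version)
```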

743649b5

05 Jan, 2021 · 1 commit

add topo-aware in heter-ps (#30087) (#30117) · 7fc2ce50

Committed by Thunderbrook on Jan 05, 2021

```
* add topo aware

* resource.h

* topo aware

* format
```

7fc2ce50

04 Jan, 2021 · 1 commit

[cherry pick 2.0]support deepcopy for Layer/Tensor/Paramerbase (#29387) (#29873) · c06350c9

Committed by Zhou Wei on Jan 04, 2021

```
* support deepcopy for Layer/Tensor/Paramerbase

* fix some code
```

c06350c9

29 Dec, 2020 · 2 commits

[Kunlun] 2.0 cherry-pick:Support for Baidu Kunlun XPU multi card training (#29713) · 847aa172

Committed by liuyuhui on Dec 29, 2020

* [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)

* [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)

* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29926)

* add bkcl.so in whl for kunlun (#29947)

* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29961)
Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>

847aa172

cherry pick heter ps (#29955) · a839ddca

Committed by Thunderbrook on Dec 29, 2020
```
* cherry pick heter ps

* CMakeList
```
a839ddca

25 Dec, 2020 · 1 commit

2 0 ps core 2 (#29894) · f781ab08

Committed by tangwei12 on Dec 25, 2020

* add ps table (#29463)

* add ps table

Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178

* add service (#29560)

* add service, remove ut on mac

* fix heter_profiler & add heter stop method

* fix code style

* merge pscore

Change-Id: Ie7f60d1cdde6755a0c29db26863c6283e9843d57

* fix cmake

Change-Id: I6773509a7b4ca79139ecc40b7bf3eb318ceff8bb

* fix conflict

Change-Id: I35575be0c96a8520f9d756ea7f1ff0b904a165ba

* fix conflict

Change-Id: Ic926ea0b0d67803226d51241397ba3b510226bfa

f781ab08

22 Dec, 2020 · 1 commit

Support multi-stream communication for dynamic graph distributed (#29525) (#29821) · f7a598fa

Committed by ShenLiang on Dec 22, 2020

```
* fix fleet for multi-stream

* fix memcpy for ncclid

* use sync to solve move operation
```

f7a598fa

21 Dec, 2020 · 1 commit

Fix non-contiguous bug for python api (#29616) · 147fa621

Committed by Wilber on Dec 21, 2020

147fa621

17 Dec, 2020 · 1 commit

[cherry-pick]fix matmulv2 bug & add rebuild group & fix bug of download (#29726) · df0430dc

Committed by ShenLiang on Dec 17, 2020

* Fix the download bug in the case of multiple machines (#29551)

* fix the download bug
* add sort for ips

* Fix bug of matmul_v2 for broadcast case (#29599)

* fix bug of matmul_v2 for broadcast

* Rebuild group automatically in dynamic graph distributed (#29255)

* add tensor_indices in AssignGroupBySize

* add rebuild group in reducer

* fix error message of gather nd (#29521)

df0430dc

05 Dec, 2020 · 1 commit

update unbind norm add CUDAPlace api doc information (#29322) (#29391) · 7e322b3c

Committed by myq406450149 on Dec 05, 2020

* enhance array_to_lod_tensor_op lod_tensor_to_array_op errors information. test=develop

* fix format. test=develop

* format fix. test=develop

* add lod_rank_table. test=develop

* fix format. test=develop

* fix doc info. test=develop

* fix np error

* add unbind dygraph api. test=develop

* fix unbind doc.test=develop

7e322b3c

04 Dec, 2020 · 2 commits

[cherry-pick 2.0rc1][inplace] Add ShareHolderWith for class Variable and... · efb5ad62

Committed by liym27 on Dec 04, 2020

[cherry-pick 2.0rc1][inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267) (#29359)

efb5ad62

Support type promote for basic math ops (quantum required) (#29265) (#29354) · 0e7539e7

Committed by Chen Weihang on Dec 04, 2020

* basic impl of type promote

* add comment & another testcase

* fix complex bugs & support python op promote type

* fix failed unittests & polish code

* add unittest for coverage

* change to only promote complex type

* polish code details

* polish several comments

0e7539e7

03 Dec, 2020 · 1 commit

[Cherry-pick] Add pure fp16 training with master weights. (#29301) · d8ea8a06

Committed by Zhen Wang on Dec 03, 2020

* Add pure fp16 training with master weights. (#27712)

* add the weight decay func for the momentum op

* Add the multi_precision function in Momentum Optimizer.

* Make sure that the initial values of the master weights are the same as the fp16 weights.

* add static loss scaling.

* add the rescale_grad function in the pure fp16 training.

* use the original momentum updating method.

* Polish some codes, such as variable names.

* add docstring for apis.

* update the var creation details of _create_master_weight.

* not modify codes about imperative momentum updating.

* Fix the error of test_dist_sparse_tensor_load_momentum UT.

* add unit test for multi precision fp16 training.

* add more unit tests for CI.

* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
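
As a rough illustration of why pure fp16 training keeps master weights, the sketch below compares an update applied directly to an fp16 weight with one applied to a higher-precision master copy that is cast back to fp16 afterwards. It is not Paddle's Momentum implementation: `to_fp16` and `train` are hypothetical helpers, the update is plain SGD, and a Python float (fp64) stands in for the fp32 master weight.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def train(steps, lr, grad, use_master_weights):
    weight_fp16 = to_fp16(1.0)
    # The master copy starts from the same value as the fp16 weight.
    master = weight_fp16
    for _ in range(steps):
        if use_master_weights:
            master -= lr * grad            # update in higher precision
            weight_fp16 = to_fp16(master)  # cast back for the next forward pass
        else:
            # Update directly in fp16: small steps can round away entirely.
            weight_fp16 = to_fp16(weight_fp16 - lr * grad)
    return weight_fp16


print(train(10, 0.1, 1e-3, use_master_weights=False))
print(train(10, 0.1, 1e-3, use_master_weights=True))
```

With these numbers each step is about 1e-4, smaller than half the fp16 spacing just below 1.0, so the fp16-only weight never moves, while the master-weight variant accumulates the ten steps and ends below 1.0.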

d8ea8a06

01 Dec, 2020 · 2 commits

add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

8f45d142

accumulate gradient for leaf tensor with previous graph and expose leaf tensor concept (#28429) · c0a991c8

Committed by Zhou Wei on Dec 01, 2020

* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor

* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor

* fix coverage

* fix api doc

* fix CI unittest

* fix CI unittest

* fix unittest

* empty tensor doesn't need inner_var_

* fix some error message

c0a991c8

30 Nov, 2020 · 1 commit

Check whether there is any inplace operation affecting gradient calculation. (#27901) · 865a4598

Committed by liym27 on Nov 30, 2020

* Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable.

* Add a new attribute `_inplace_version` for VarBase.

* Raise exception if an inplace operation can result in incorrect gradient computation.

* Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation.

* For api assign, call _bump_inplace_version() when it's an inplace operation in dynamic mode.

* Use original var_wrapper if the inplace_version is not changed.

* Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performance.
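
A minimal sketch of the version check described in this commit, under stated assumptions: `ToyVar` and `SquareNode` are hypothetical classes invented for illustration, and only the names `_inplace_version` and `_bump_inplace_version` are borrowed from the commit message. The node snapshots the version at forward time and refuses to run backward if the input was modified in place afterwards.

```python
class ToyVar:
    def __init__(self, value):
        self.value = value
        self._inplace_version = 0

    def _bump_inplace_version(self):
        self._inplace_version += 1

    def assign_(self, value):
        # In-place modification: overwrite the value and bump the version.
        self.value = value
        self._bump_inplace_version()


class SquareNode:
    """Records y = x * x; backward needs the forward-time value of x."""

    def __init__(self, x):
        self.x = x
        self.saved_version = x._inplace_version  # snapshot at forward time

    def backward(self, grad_out):
        if self.x._inplace_version != self.saved_version:
            raise RuntimeError(
                "input was modified by an inplace operation after forward: "
                f"expected version {self.saved_version}, "
                f"got {self.x._inplace_version}")
        # d(x * x)/dx = 2 * x
        return 2 * self.x.value * grad_out
```

Calling `backward` before any in-place write returns the usual gradient; calling it after `assign_` raises instead of silently using the overwritten value.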

865a4598

27 Nov, 2020 · 1 commit

Support dynamic graph distributed (#28997) · e2d01eb6

Committed by ShenLiang on Nov 27, 2020

* add reducer

* refine event for memorycopy

* add concat&split for allreduce

* apply concat & split for fuse tensor

* fix nccl dep

* fix the unittest, compile problem and ddp initialize problem

* fix unittest for mac & add some comments & solve the repeated param in sublayers

* fix unittest for windows & fix document
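
The "concat & split for allreduce" items can be pictured as follows: gradients are packed into size-limited groups, each group is flattened into one buffer, reduced once, and cut back into the original tensors. The sketch below simulates this with plain lists. `group_by_size` and `fused_allreduce` are hypothetical helpers, the allreduce is simulated as an element-wise sum over ranks, and none of this is Paddle's reducer code.

```python
def group_by_size(sizes, limit):
    """Greedily pack tensor indices into groups of at most `limit` elements."""
    groups, current, used = [], [], 0
    for idx, size in enumerate(sizes):
        if current and used + size > limit:
            groups.append(current)
            current, used = [], 0
        current.append(idx)
        used += size
    if current:
        groups.append(current)
    return groups


def fused_allreduce(rank_grads, limit):
    """rank_grads[r][i] is gradient i (a list of floats) on rank r."""
    sizes = [len(g) for g in rank_grads[0]]
    result = [None] * len(sizes)
    for group in group_by_size(sizes, limit):
        # Concat: flatten the group's gradients into one buffer per rank.
        buffers = [[v for i in group for v in grads[i]] for grads in rank_grads]
        # Allreduce (simulated): element-wise sum across ranks.
        reduced = [sum(vals) for vals in zip(*buffers)]
        # Split: cut the fused buffer back into the original tensors.
        offset = 0
        for i in group:
            result[i] = reduced[offset:offset + sizes[i]]
            offset += sizes[i]
    return result
```

Fusing small tensors this way trades a little copying for fewer communication calls, which is the usual motivation for bucketing gradients.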

e2d01eb6

26 Nov, 2020 · 1 commit

Split train_mode and has_grad for tracer (#29064) · 770395cb

Committed by Leo Chen on Nov 26, 2020

```
* split train_mode and has_grad

* fix format

* fix ci problems

* fix sample code
```

770395cb

机器未来 / Paddle is in sync with the fork source project