Commits · 5618f14047250f1325e7d544b4c147bf0a98c5a8 · BaiXuePrincess / Paddle

25 Feb, 2021 · 1 commit
- refactor npu device manager (#31154) · ff4654e2
  Committed by Leo Chen on Feb 25, 2021
23 Feb, 2021 · 1 commit

[NPU] Support executor with NPU (#31057) · 1435b4c0

Committed by liym27 on Feb 23, 2021

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

09 Feb, 2021 · 1 commit
- [feature] support npu allocator (#30840) · 81138239
  Committed by Leo Chen on Feb 09, 2021
08 Feb, 2021 · 1 commit
- Destroy session first. (#30954) · ebef6601
  Committed by gongweibao on Feb 08, 2021
28 Jan, 2021 · 1 commit
- Dev/fix ascend string (#30749) · 88dfd067
  Committed by Leo Chen on Jan 28, 2021
27 Jan, 2021 · 1 commit
- fix compilation on ascend-20.1 (#30722) · 6eabbc80
  Committed by Leo Chen on Jan 27, 2021
21 Jan, 2021 · 1 commit
- Add distribution supported (#30578) · f9c97dd7
  Committed by gongweibao on Jan 21, 2021
15 Jan, 2021 · 2 commits
- Fix compilation on CANN20.1 and older (#30494) · 1882f2ce
  Committed by gongweibao on Jan 15, 2021
- Ascend rc (#30483) · 6dd52c5b
  Committed by hutuxian on Jan 15, 2021
14 Jan, 2021 · 1 commit
- Heter ps new (#30198) · 6e0da01c
  Committed by yaoxuefeng on Jan 14, 2021
13 Jan, 2021 · 2 commits

Set expected place in child thread for dataloader to avoid costing cuda memory... · 3d015f1c

Committed by Leo Chen on Jan 13, 2021

Set expected place in child thread for dataloader to avoid costing cuda memory on other card (#30338)

* set expected place in child thread for dataloader

* set device id when set tensor from numpy

* revert tensor_py change

* add compile guard

* fix ci

* fix bug

Support unused parameters in dynamic graph distributed (#30224) · a60f17b8
Committed by ShenLiang on Jan 13, 2021

12 Jan, 2021 · 2 commits
- Fix/distributed proto (#29981) · 25f80fd3
  Committed by tangwei12 on Jan 12, 2021
```
* rename sendrecv.proto to namespace paddle.distributed

* split ps with distributed
```
- 【Paddle.Fleet】Support local save sparse param (#30175) · d479ae17
  Committed by Chengmo on Jan 12, 2021
```
* add save tensor support
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
```
11 Jan, 2021 · 1 commit
- Add tf32 switch for cuDNN (#29192) · 924aac22
  Committed by AshburnLee on Jan 11, 2021
09 Jan, 2021 · 1 commit

add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) · da16b33f

Committed by pangyoki on Jan 09, 2021

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allocation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput
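
The "View (reuse allocation)" idea in this commit can be illustrated with a minimal sketch: a reshape that returns a new shape over the same memory instead of copying it. This is not Paddle's implementation. It uses only the Python standard library, and `storage` and `reshape_as_view` are hypothetical names introduced for illustration.

```python
from array import array

# Hypothetical stand-in for a tensor's allocation: six float32 values.
storage = array("f", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])


def reshape_as_view(buf, shape):
    """Return a reshaped view that shares memory with `buf` (no copy)."""
    # memoryview.cast can only go from 1-D to N-D through a byte format,
    # so hop through "B" before applying the target shape.
    return memoryview(buf).cast("B").cast(buf.typecode, shape)


view = reshape_as_view(storage, (2, 3))  # shares the allocation
copied = array("f", storage)             # independent copy, for contrast

# Writing through the original buffer is visible through the view,
# but not through the copy.
storage[5] = 42.0
print(view.tolist())
print(copied[5])
```

The trade-off is the usual one for views: no extra allocation, but writes through either handle affect both, which is why an op using this strategy has to track the relationship between its input and output.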


08 Jan, 2021 · 3 commits

Fix dtype of ungenerated grad var (#28511) · 8696335f

Committed by Leo Chen on Jan 08, 2021

* fix dtype of ungenerated grad var

* update ut

* refine code

* set default dtype

* fix could_use_cudnn bug

* remove debug code

* re-implement

* fix bug

Add callback after TensorCopy (#30123) · 1f97d61c

Committed by Leo Chen on Jan 08, 2021

* change to tensor copy sync

* change to tensor copy sync

* make copy_to safe when use TensorCopy

* refine code

* add ut

* add cudapinned garbagecollector

* add testcase: cpu place -> cuda pinned place

【Paddle.Fleet】Fix tensor table (#30075) · 528e03fc
Committed by Chengmo on Jan 08, 2021
```
* add tensor table
```

07 Jan, 2021 · 1 commit
- Add Lookahead and ModelAverage Optimizer (#30004) · 198fbdfb
  Committed by 123malin on Jan 07, 2021
```
* test=develop, add model_average and lookahead
```
06 Jan, 2021 · 2 commits

add dispensable input for core.ops.reshape2/expand/slice (#30072) · adac38c5

Committed by Leo Chen on Jan 06, 2021

* add dispensable input 'shape' for core.ops.reshape2

* add dispensable inputs for core.ops.reshape2/expand/slice

* add ut

Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns wrong result (#30003) · 9922bd41

Committed by liym27 on Jan 06, 2021

1. When slice_item is a slice:
 1) the start of __getitem__ should be std::max(start, 0)
 2) the end of __getitem__ should be std::min(end, dim)
2. When slice_item is an integer, it should be in [-dim_len, dim_len)
3. Fix error message to use accurate data
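
The rules listed above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the commit: `normalize_slice` and `normalize_index` are hypothetical helper names, and the step of adding `dim_len` to negative values before clamping is an assumption borrowed from ordinary Python indexing.

```python
def normalize_slice(start, end, dim_len):
    """Clamp slice bounds for one dimension of length `dim_len`."""
    # Assumption: negative bounds count from the end, as in Python.
    if start < 0:
        start += dim_len
    if end < 0:
        end += dim_len
    # Rule 1: clamp start from below and end from above.
    start = max(start, 0)
    end = min(end, dim_len)
    # A result with end <= start denotes an empty slice.
    return start, end


def normalize_index(index, dim_len):
    """Validate an integer index and map it into [0, dim_len)."""
    # Rule 2: an integer index must lie in [-dim_len, dim_len).
    if not -dim_len <= index < dim_len:
        # Rule 3: report the actual offending values.
        raise IndexError(
            f"index {index} is out of range for dimension of size {dim_len}"
        )
    return index + dim_len if index < 0 else index


print(normalize_slice(-3, 10, 5))   # start wraps to 2, end clamps to 5
print(normalize_slice(-10, 3, 5))   # start clamps to 0
print(normalize_index(-1, 5))       # last element
```

Under these assumptions the results match Python list slicing, for example `list(range(5))[-3:10]` selects indices 2 through 4.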


05 Jan, 2021 · 1 commit
- add topo-aware in heter-ps (#30087) · 0b8e1fad
  Committed by Thunderbrook on Jan 05, 2021
```
* add topo aware

* resource.h

* topo aware

* format
```
04 Jan, 2021 · 1 commit
- [Inference] zero_copy_tensor supports int8_t (#30053) · 68398abc
  Committed by cc on Jan 04, 2021
27 Dec, 2020 · 1 commit

[Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842) · 9602a182

Committed by liym27 on Dec 27, 2020

* Revert "[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267)"

This reverts commit b10ecd9d.

* Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase


26 Dec, 2020 · 1 commit
- [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574) · 4427df37
  Committed by liuyuhui on Dec 26, 2020
24 Dec, 2020 · 1 commit

[Feature] one ps (3/4) (#29604) · 032414ca

Committed by tangwei12 on Dec 24, 2020

* oneps (3/4)
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: malin10 <malin10@baidu.com>
Co-authored-by: chengmo <chengmo@baidu.com>

23 Dec, 2020 · 1 commit

heter box (#29734) · 09b6e719

Committed by Thunderbrook on Dec 23, 2020

* add heter box

* add trainer, worker, wrapper...

* format

* for ci

* format

* remove boost get

* boost & copyright

* rename

* rename

* format

* format

* format
Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>

22 Dec, 2020 · 1 commit
- Support multi-stream communication for dynamic graph distributed (#29525) · 01e2874a
  Committed by ShenLiang on Dec 22, 2020
```
* fix fleet for multi-stream

* fix memcpy for ncclid

* use sync to solve move operation
```
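16 Dec, 2020 · 2 commits

[Kunlun] PR1: Support one Kunlun card training in parallel executor (#29337) · f13c3a9c
Committed by liuyuhui on Dec 16, 2020

Add ROCm platform support code (#29342) · 76738504

Committed by Y_Xuan on Dec 16, 2020

* Add ROCm platform support code

* Fix some issues

* Resolve some ambiguities and add comments

* Fix code formatting

* Update code after resolving conflicts

* Update operators.cmake

* Fix formatting

* Correct errors

* Unify interfaces

* Update dates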

15 Dec, 2020 · 2 commits
- Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732) · efea540c
  Committed by AshburnLee on Dec 15, 2020
- fix non-contiguous bug for python api. (#29615) · 78dad786
  Committed by Wilber on Dec 15, 2020
09 Dec, 2020 · 2 commits
- support deepcopy for Layer/Tensor/ParamBase (#29387) · e74e1a22
  Committed by Zhou Wei on Dec 09, 2020
```
* support deepcopy for Layer/Tensor/ParamBase

* fix some code
```
- Rebuild group automatically in dynamic graph distributed (#29255) · 2ef9e0e2
  Committed by ShenLiang on Dec 09, 2020
```
* add tensor_indices in AssignGroupBySize

* add rebuild group in reducer
```
05 Dec, 2020 · 1 commit

update unbind norm add CUDAPlace api doc information (#29322) · 7c508d86

Committed by myq406450149 on Dec 05, 2020

* enhance array_to_lod_tensor_op lod_tensor_to_array_op errors information. test=develop

* fix format. test=develop

* format fix. test=develop

* add lod_rank_table. test=develop

* fix format. test=develop

* fix doc info. test=develop

* fix np error

* add unbind dygraph api. test=develop

* fix unbind doc.test=develop


04 Dec, 2020 · 2 commits

[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in... · b10ecd9d

Committed by liym27 on Dec 04, 2020

[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267)

Support type promote for basic math ops (quantum required) (#29265) · 9ad800eb

Committed by Chen Weihang on Dec 04, 2020

* basic impl of type promote

* add comment & another testcase

* fix complex bugs & support python op promote type

* fix failed unittests & polish code

* add unittest for coverage

* change to only promote complex type

* polish code details

* polish several comments

02 Dec, 2020 · 1 commit

Add pure fp16 training with master weights. (#27712) · be3777a5

Committed by Zhen Wang on Dec 02, 2020

* add the weight decay func for the momentum op

* Add the multi_precision function in Momentum Optimizer.

* Make sure that the initial values of the master weights are the same as the fp16 weights.

* add static loss scaling.

* add the rescale_grad function in the pure fp16 training.

* use the original momentum updating method.

* Polish some codes, such as variable names.

* add docstring for apis.

* update the var creation details of _create_master_weight.

* not modify codes about imperative momentum updating.

* Fix the error of test_dist_sparse_tensor_load_momentum UT.

* add unit test for multi precision fp16 training.

* add more unit tests for CI.

* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.

* For CI Coverage Checking.
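
To show why the items above pair fp16 weights with higher-precision master weights, here is a toy sketch using only the standard library. It is not Paddle's Momentum kernel: `to_fp16`, `to_fp32`, `momentum_step`, and `train` are hypothetical names, the gradient is fixed at 1.0, and the velocity is kept in full precision for simplicity. The `rescale_grad` argument undoes a static loss scale, as the commit notes describe.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def momentum_step(weight, velocity, grad, lr, mu, rescale_grad):
    """One plain momentum update; rescale_grad undoes loss scaling."""
    grad = grad * rescale_grad
    velocity = mu * velocity + grad
    return weight - lr * velocity, velocity


def train(steps, use_master_weight, lr=1e-5, mu=0.9, loss_scale=128.0):
    fp16_weight = to_fp16(1.0)
    # The master weight starts out equal to the fp16 weight.
    master_weight = to_fp32(fp16_weight)
    velocity = 0.0
    for _ in range(steps):
        # Gradient of the scaled loss, as it would arrive in fp16.
        scaled_grad = to_fp16(1.0 * loss_scale)
        if use_master_weight:
            # Update in fp32, then refresh the fp16 copy from it.
            master_weight, velocity = momentum_step(
                master_weight, velocity, scaled_grad, lr, mu, 1.0 / loss_scale
            )
            master_weight = to_fp32(master_weight)
            fp16_weight = to_fp16(master_weight)
        else:
            # Update the fp16 weight directly; tiny steps round away.
            fp16_weight, velocity = momentum_step(
                fp16_weight, velocity, scaled_grad, lr, mu, 1.0 / loss_scale
            )
            fp16_weight = to_fp16(fp16_weight)
    return fp16_weight


fp16_only = train(200, use_master_weight=False)
with_master = train(200, use_master_weight=True)
print(fp16_only, with_master)
```

In this setup every individual update is smaller than half the fp16 spacing just below 1.0, so the fp16-only weight never leaves 1.0, while the master weight accumulates the updates and the fp16 copy follows it.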

01 Dec, 2020 · 1 commit

add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest


BaiXuePrincess / Paddle is in sync with its fork source project
