Commits · 2a672f68e8079be10d0a34dd2a43fe2367d5de0b · PaddlePaddle / Paddle

01 Apr, 2021 1 commit

[NPU] enable async copy and add wait before sync operation (#31956) · 2a672f68

Committed by Leo Chen on Apr 01, 2021

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

2a672f68

30 Mar, 2021 1 commit
- [NPU] support npu for memcpy op (#31808) · a6343afc
  Committed by Leo Chen on Mar 30, 2021
```
* support npu for memcpy op

* add ut

* fix ut

* fix typo
```
  a6343afc
29 Mar, 2021 1 commit
- adapter npu (#31926) · 3ab39705
  Committed by An Improved PeleeNet Algorithm with Feature Pyramid Networks for Image Detection on Mar 29, 2021
```
Co-authored-by: baiyangfan <baiyangfan@baidu.com>
```
  3ab39705
26 Mar, 2021 1 commit

[NPU] support GarbageCollector for npu (#31874) · ac89174e

Committed by Leo Chen on Mar 26, 2021

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

ac89174e

23 Mar, 2021 1 commit

Add 3d parallelism (#31796) · 228bce12

Committed by lilong12 on Mar 23, 2021

Add 3d Parallelism
Co-authored-by: WangXi <wangxi16@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: root <root@yq01-sys-hic-k8s-v100-box-a225-0562.yq01.baidu.com>

228bce12

10 Mar, 2021 1 commit
- Support TensorFormVector, TensorToVector of bool type (#31518) · 3f206e97
  Committed by Leo Chen on Mar 10, 2021
```
* support TensorFormVector, TensorToVector of bool type

* add ut

* fix compile problem
```
  3f206e97
01 Mar, 2021 1 commit
- support list of list attribute for NPU (#31299) · d23bf89c
  Committed by Leo Chen on Mar 01, 2021
```
* support list of list attribute for NPU

* fix compile problem

* fix reference
```
  d23bf89c
23 Feb, 2021 2 commits
- [NPU] Support executor with NPU (#31057) · 1435b4c0
  Committed by liym27 on Feb 23, 2021
```
* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu
```
  1435b4c0
- Fix compilation problem (#31100) · 85cbd556
  Committed by Leo Chen on Feb 23, 2021
```
Fix compilation problem (#31100)
```
  85cbd556
22 Feb, 2021 1 commit

add npu kernel for elementwise_sub and elementwise_sub_grad (#30973) · 5cb20f30

Committed by Leo Chen on Feb 22, 2021

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

5cb20f30

09 Feb, 2021 3 commits

[feature] support npu allocator, part 2 (#30972) · 1201cd2e

Committed by Leo Chen on Feb 09, 2021

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocatot test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

1201cd2e

[feature] support npu operator (#30951) · 7e049108
Committed by Leo Chen on Feb 09, 2021
```
[feature] support npu operator
```
7e049108
[feature] support npu allocator (#30840) · 81138239
Committed by Leo Chen on Feb 09, 2021
```
[feature] support npu allocator
```
81138239

08 Feb, 2021 1 commit
- Destroy session first. (#30954) · ebef6601
  Committed by gongweibao on Feb 08, 2021
```
Destroy session first.
```
  ebef6601
28 Jan, 2021 1 commit
- Dev/fix ascend string (#30749) · 88dfd067
  Committed by Leo Chen on Jan 28, 2021
```
Dev/fix ascend string
```
  88dfd067
27 Jan, 2021 1 commit
- fix compilation on ascend-20.1 (#30722) · 6eabbc80
  Committed by Leo Chen on Jan 27, 2021
```
fix compilation on ascend-20.1
```
  6eabbc80
15 Jan, 2021 2 commits
- Fix compilcation on CANN20.1 and older (#30494) · 1882f2ce
  Committed by gongweibao on Jan 15, 2021
```
Fix compilcation on CANN20.1 and older 
```
  1882f2ce
- Ascend rc (#30483) · 6dd52c5b
  Committed by hutuxian on Jan 15, 2021
  
  6dd52c5b
14 Jan, 2021 1 commit
- Heter ps new (#30198) · 6e0da01c
  Committed by yaoxuefeng on Jan 14, 2021
  
  6e0da01c
13 Jan, 2021 3 commits

skip quantizing ops in cpu inference (#30342) · 8e3a2940
Committed by cc on Jan 13, 2021
```
* skip quantizing ops in cpu inference, test=develop
```
8e3a2940

Added support for inference using quantization aware trained dygraph (#30288) · 7bbf3ac5

Committed by alncat on Jan 13, 2021

* added support for inference using qunatization aware trained dygraph

* added support for inference using qunatization aware trained dygraph
correct boost get usage

* Delete incorrect warning message (#30196)

* fix warning and no grad

* clean redundant API alias in 2.0 - part 2 (#30013)

* delete paddle.nn.functional.assign

* fix dynamic to static error

* just add the op error message for the matmul xpu (#30246)

 add the op error message for the matmul xpu

* Add Static Variable Clone (#30208)

Add clone method for static Variable so that this interface will be same as dygraph. It fixed some bugs in dy2stat

* use wget to replace curl to download the lcov file (#30229)

* use wget to replace curl to download the lcov file

* add cache for lcov

* fix test_pool3d_op timeout issue (#30248)

* Fix unittests bugs. (#30250)

* modify error message based on comments (#30189)

* modify error message based on comments

* edit code according to review.

* Correct spelling according to review.

* Fix bug for 'save mutiple method' (#30218)

* Fix bug for 'save mutiple method'

* To pass coverage.

* edit code to pass coverage.

* edit code to pass coverage.

* add unittest for coverage.

* change for coverage.

* edit for coverage.

* added support for inference using qunatization aware trained dygraph

* Alias from  paddle.fluid.layers.auc to paddle.static.auc (#30206)

* add alias from  fluid.layers.auc to static.auc

* Update __init__.py

* added support for inference using qunatization aware trained dygraph
correct boost get usage

* corrected boost get usage

* corrected naming issues and enforcing zero check

* correct paddle enforce message

* added more error checkings

* corrected error report message and optimized code

* corrected findvar usage

* corrected paddle_enforce in scope

* correct error messages

* correct error reporting format
Co-authored-by: LielinJiang <50691816+LielinJiang@users.noreply.github.com>
Co-authored-by: XiaoguangHu <46782768+XiaoguangHu01@users.noreply.github.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
Co-authored-by: Huihuang Zheng <zhhsplendid@gmail.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
Co-authored-by: Bai Yifan <me@ethanbai.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: WeiXin <weixin10@baidu.com>
Co-authored-by: Jiaqi Liu <liujiaqi06@baidu.com>

7bbf3ac5

fix bug on compiling inference shared lib with crypto;test=develop (#30269) · 10a8f3e5

Committed by Zhang Jun on Jan 13, 2021

* fix bug on compiling inference shared lib with crypto;test=develop

* fix cmake bug when build inference lib using -DWITH_CRYPTO=OFF

* update cmake

* remove unnecessary enforce message

10a8f3e5

12 Jan, 2021 3 commits

Recompute Offload (#30233) · 75936d83
Committed by JZ-LIANG on Jan 12, 2021

75936d83

add sparse embedding & load vars for 2.0 & gloo bug fix (#30306) · 5e839e4d

Committed by tangwei12 on Jan 12, 2021

* add sparse embedding & load vars for 2.0

Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b

* fix hdfs gloo

Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6

* fix gloo hdfs

Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e

* move loadvar/sparse embedding from incubute to static

Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0

5e839e4d

Fix/distributed proto (#29981) · 25f80fd3

Committed by tangwei12 on Jan 12, 2021

* rename sendrecv.proto to namespace paddle.distributed

* split ps with distributed

25f80fd3

11 Jan, 2021 2 commits
- Support vector<double> as type of op attribute and op set_value suppport... · b4989fb7
  Committed by liym27 on Jan 11, 2021
```
Support vector<double> as type of op attribute and op set_value suppport vector<double> as value (#30126)
```
  b4989fb7
- fix header file paths of gflags, commit 1, test=develop (#30271) · 8ce2482b
  Committed by 石晓伟 on Jan 11, 2021
  
  8ce2482b
10 Jan, 2021 1 commit
- reduce the occupied size of memory for the fused pattern of elementwise_add... · af80859d
  Committed by wangchaochaohu on Jan 10, 2021
```
reduce the  occupied size  of memory for the fused pattern of elementwise_add Op and activation Op(relu Op for example) (#29885)
```
  af80859d
08 Jan, 2021 4 commits

Support pure fp16 training for AMP API. (#29544) · 7f7dfccf

Committed by Zhen Wang on Jan 08, 2021

* add cast ops before and after unsupported fp16 ops.

* Keep partial net in FP32 pattern.

* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.

* Add fp16 support for adam op.

* add multi precision attr for adam.

* Fix the bug of test_multi_precision_fp16_train UT.

* Code format for CI.

* Fix the redefine error about MPTypeTrait on windows.

* fix bugs of the _create_accumulators func in Momentum.

* fix bug when inserting post cast op.

* Add the update_loss_scaling op in allow_set of UnusedVarCheck.

* Update for ci coverage.

* Add some doc for OptimizerWithMixedPrecision.

* Fix the code style.

* Imporve the doc of `amp_init`.

* Change for fp16 testing if users have the infer program defined in separate way.

7f7dfccf

use cuda generator in bernoulli cuda kernel (#30199) · 789743e1
Committed by Leo Chen on Jan 08, 2021

789743e1

Add callback after TensorCopy (#30123) · 1f97d61c

Committed by Leo Chen on Jan 08, 2021

* change to tensor copy sync

* change to tensor copy sync

* make copy_to safe when use TensorCopy

* refine code

* add ut

* add cudapinned garbagecollector

* add testcase: cpu place -> cuda pinned place

1f97d61c

【Paddle.Fleet】Fix tensor table (#30075) · 528e03fc
Committed by Chengmo on Jan 08, 2021
```
* add tensor table
```
528e03fc

07 Jan, 2021 3 commits
- Refine PADDLE_ENFORCE Error Messages. test=develop (#30149) · 54bf3f5a
  Committed by Huihuang Zheng on Jan 07, 2021
```
Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc
```
  54bf3f5a
- [Complex] Simplify prepared op impl to improve performance (#30153) · d0fb06b2
  Committed by Chen Weihang on Jan 07, 2021
```
* simplify prepared op impl to improve performance

* fix kunlun compile error

* continue fix kunlun compile error

* only transform diff place when dtype diff

* fix failed unittests

* remove useless file

* polish impl by review comment
```
  d0fb06b2
- fix assign_op_xpu concat_op_xpu warining (#30120) · 15fac5e7
  Committed by liuyuhui on Jan 07, 2021
  
  15fac5e7
06 Jan, 2021 1 commit
- fix a bug in op_version_registry, test=develop, test=op_version (#29994) · 53bb1265
  Committed by 石晓伟 on Jan 06, 2021
  
  53bb1265
05 Jan, 2021 2 commits
- fix xpu pe sync, test=notest (#30095) · 254ad619
  Committed by liuyuhui on Jan 05, 2021
  
  254ad619
- add topo-aware in heter-ps (#30087) · 0b8e1fad
  Committed by Thunderbrook on Jan 05, 2021
```
* add topo aware

* resource.h

* topo aware

* format
```
  0b8e1fad
04 Jan, 2021 2 commits
- Optimization grad merge performance (#29784) · ee16006b
  Committed by WangXi on Jan 04, 2021
  
  ee16006b
- fix op version checker of pass bug (#30028) · 08dc5bc2
  Committed by Shang Zhizhou on Jan 04, 2021
```
* fix op version checker of pass bug

* fix code style

* update  pass version
```
  08dc5bc2

PaddlePaddle / Paddle synced successfully more than 1 year ago
