提交 · caf9d39839b6afd00ab457964055082b94842f62 · wmsofts / Paddle

18 2月, 2021 1 次提交

Add Conv Transpose BF16 (#30877) · caf9d398

由 joanna.wozna.intel 提交于 2月 18, 2021

* Add conv transpose BF16

* Share function GetWeightsTz

* Adjust to review and fix op compatibility

* Add bias to unique handler name

* Remove errors related to paddle enforce

* Add conv2d_transpose to bf16 list and kernel refator

caf9d398

10 2月, 2021 1 次提交

New custom operator extension mechanism (#30690) · f649442d

由 Chen Weihang 提交于 2月 09, 2021

* initial commit: simple demo

* polish copyright format

* add grap op simple demo

* adapt uncertain number of argument

* change trait marco name

* add place & dtype support for add kernel

* add dispath and infershape func

* poish code & add notes

* add dynamic_loader dep for paddle_framework

* add new custom op test dir

* polish impl details

* add unittest for new custom op

* fix failed unittest

* Costum op (#1)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* Remove ShareData from user && Change CustomTensor to Tensor && Support more data type (#2)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* hid share data from and to

* rename CustomTensor to Tensor

* refactor register design & add test

* change op_funtion to op_meta_info

* split op meta info into .h and .cc

* move get methods into friend class

* move OpMetaInfoHelper into framework space

* move CustomTensorUtils into framework space

* change pybind api name

* move PD C API into op meta info

* add register custom op api

* remove inference cmake change

* refactor copy to api && change Reshape to lowercase && support more dtype && add more test (#3)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* hid share data from and to

* rename CustomTensor to Tensor

* support multi dtype

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* fix copy to error

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* polish detail & error message

* polish test details

* Add cast api && Change copy related api to copy_to && add more test (#4)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* hid share data from and to

* rename CustomTensor to Tensor

* support multi dtype

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* fix copy to error

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add type cast

* add cast and make copy to api

* add cast and make copy to api

* add cast and make copy to api

* add cast and make copy to api

* merge cwh code

* merge cwh code

* merge cwh code

* merge cwh code

* merge cwh code

* add more error log

* add more error log

* polish code

* used for test

* remove test comment

* remove test comment

* fix uint8 type error

* fix lost uint8 type error

* add test for coverage

* polish details by reviewer comments

* add prefix for DISABLE_COPY_AND_ASSIGN
Co-authored-by: NJiabin Yang <360788950@qq.com>

f649442d

09 2月, 2021 2 次提交
- W
  
  Fix the problem that the number of ops executed by xpu is wrong (#30961) · 14d039e4
  由 WangXi 提交于 2月 09, 2021
  
  14d039e4
- A
  
  Fix LayerNorm tester for gcc4.8 (#30962) · 3ba69809
  由 Adam Osewski 提交于 2月 09, 2021
  
  3ba69809
05 2月, 2021 2 次提交
- W
  
  add include for heterbox_trainer.cc, develop=test (#30910) · aab3a301
  由 wanghuancoder 提交于 2月 05, 2021
  
  aab3a301
- A
  More UT for LayerNormFuse pass (#30891) · 092a2b14
  由 Adam Osewski 提交于 2月 05, 2021
```
* Additionally change to not throw error from inside pass.
```
  092a2b14
04 2月, 2021 1 次提交
- W
  use iwyu clean include second time, test=develop (#30829) · 35c5b23f
  由 wanghuancoder 提交于 2月 04, 2021
```
* use iwyu clean include second time, test=develop
```
  35c5b23f
03 2月, 2021 1 次提交
- A
  
  Layer normalization fuse pass. (#30721) · 4f066e31
  由 Adam Osewski 提交于 2月 03, 2021
  
  4f066e31
01 2月, 2021 2 次提交
- T
  dump to cpu (#30750) · cb66c53c
  由 Thunderbrook 提交于 2月 01, 2021
```
* dump to cpu

* format

* format

* format
```
  cb66c53c
- W
  
  Fleet distributed strategy support pure fp16 (#30754) · 31ed9c9e
  由 WangXi 提交于 2月 01, 2021
  
  31ed9c9e
28 1月, 2021 1 次提交
- A
  
  fixed compilation error on gcc 4.8.x due to the usage of isfinite (#30733) · 5b59499e
  由 alncat 提交于 1月 28, 2021
  
  5b59499e
27 1月, 2021 2 次提交
- L
  
  [Kunlun] fix dead lock for exec_op_count_ (#30718) · 67abfc15
  由 liuyuhui 提交于 1月 27, 2021
  
  67abfc15
- A
  
  modified conv+bn fuse pass to fix wrong mask in mask rcnn (#30704) · 5ace20fc
  由 alncat 提交于 1月 27, 2021
  
  5ace20fc
26 1月, 2021 1 次提交
- L
  
  update, test=develop (#30692) · 7fbc68a2
  由 lilong12 提交于 1月 26, 2021
  
  7fbc68a2
25 1月, 2021 2 次提交
- A
  More precise mkldnn kernel rules in GetExpectedKernelType (#29840) · 5bf25d1e
  由 arlesniak 提交于 1月 25, 2021
```
* More precise mkldnn kernel choice in GetExpectedKernelType

* Fixes after review

* Refresh develop for CI

* CI experiment

* get back from CI exper
```
  5bf25d1e
- J
  
  [oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358) · 173660be
  由 Jacek Czaja 提交于 1月 25, 2021
  
  173660be
21 1月, 2021 1 次提交
- T
  solve build gpu task core (#30626) · 1bebc092
  由 Thunderbrook 提交于 1月 21, 2021
```
* build gpu task core

* format
```
  1bebc092
20 1月, 2021 1 次提交
- L
  
  [Kunlun] Add condition_variable and notify() in BindThreadedSSAGraphExecutor (#30586) · e5b0d9e1
  由 liuyuhui 提交于 1月 20, 2021
  
  e5b0d9e1
19 1月, 2021 2 次提交
- L
  
  Fix bug: GetAttrValue should deal with attr with attrType vector<double> (#30536) · ff25c5b3
  由 liym27 提交于 1月 19, 2021
  
  ff25c5b3
- L
  unify calling cudaSetDevice (#30470) · 81217a94
  由 Leo Chen 提交于 1月 19, 2021
```
* unify calling cudaSetDevice

* fix compile
```
  81217a94
18 1月, 2021 2 次提交
- H
  
  Ascend Framework Part1: OP & Wrapper (#30281) · 40ede126
  由 hutuxian 提交于 1月 18, 2021
  
  40ede126
- L
  
  [Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317) · 843dc3cd
  由 liuyuhui 提交于 1月 18, 2021
  
  843dc3cd
16 1月, 2021 1 次提交
- A
  [oneDNN] Refactor fuse pass helper functions to one place. (#30460) · c5ffad12
  由 Adam Osewski 提交于 1月 16, 2021
```
* Move pass tester helper functions to single common place.

* Use helper functions in two more fuse pass tests.
```
  c5ffad12
15 1月, 2021 1 次提交

Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103) · 13d75736

由 pangyoki 提交于 1月 15, 2021

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allacation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* fix test_cross_entropy_loss error because of reshape2

* add inplace strategy

* add elementwise_add sub

* let backward op not use inplace

* grad op do not use inplace

* fix memory increase error and add leaf error message

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput

* add unittest and leaf error message

* merge view error

* optimize op_function_generator format and support sum inplace op

* fix format of basic_engine

* fix format for framework

* little change of variable wrapper

* add reshape, squeeze, unsqueeze, scatter api

* add relu elu tanh softmax inplace api

* fix test_squeeze_op unittest

* fix test_relu_op unittest

* fix comment problems

* delete sample code of inplace api

* add reference of grad_pending_nodes in basic_engine

* fix unittest name

* add inplace apis into wlist

* fix error message

* add PADDLE_ENFORCE for set grad op twice

* fix head file error

13d75736

14 1月, 2021 1 次提交
- Y
  
  Heter ps new (#30198) · 6e0da01c
  由 yaoxuefeng 提交于 1月 14, 2021
  
  6e0da01c
13 1月, 2021 3 次提交

C
skip quantizing ops in cpu inference (#30342) · 8e3a2940
由 cc 提交于 1月 13, 2021
```
* skip quantizing ops in cpu inference, test=develop
```
8e3a2940

Added support for inference using quantization aware trained dygraph (#30288) · 7bbf3ac5

由 alncat 提交于 1月 13, 2021

* added support for inference using qunatization aware trained dygraph

* added support for inference using qunatization aware trained dygraph
correct boost get usage

* Delete incorrect warning message (#30196)

* fix warning and no grad

* clean redundant API alias in 2.0 - part 2 (#30013)

* delete paddle.nn.functional.assign

* fix dynamic to static error

* just add the op error message for the matmul xpu (#30246)

 add the op error message for the matmul xpu

* Add Static Variable Clone (#30208)

Add clone method for static Variable so that this interface will be same as dygraph. It fixed some bugs in dy2stat

* use wget to replace curl to download the lcov file (#30229)

* use wget to replace curl to download the lcov file

* add cache for lcov

* fix test_pool3d_op timeout issue (#30248)

* Fix unittests bugs. (#30250)

* modify error message based on comments (#30189)

* modify error message based on comments

* edit code according to review.

* Correct spelling according to review.

* Fix bug for 'save mutiple method' (#30218)

* Fix bug for 'save mutiple method'

* To pass coverage.

* edit code to pass coverage.

* edit code to pass coverage.

* add unittest for coverage.

* change for coverage.

* edit for coverage.

* added support for inference using qunatization aware trained dygraph

* Alias from  paddle.fluid.layers.auc to paddle.static.auc (#30206)

* add alias from  fluid.layers.auc to static.auc

* Update __init__.py

* added support for inference using qunatization aware trained dygraph
correct boost get usage

* corrected boost get usage

* corrected naming issues and enforcing zero check

* correct paddle enforce message

* added more error checkings

* corrected error report message and optimized code

* corrected findvar usage

* corrected paddle_enforce in scope

* correct error messages

* correct error reporting format
Co-authored-by: NLielinJiang <50691816+LielinJiang@users.noreply.github.com>
Co-authored-by: NXiaoguangHu <46782768+XiaoguangHu01@users.noreply.github.com>
Co-authored-by: Nwawltor <fangzeyang0904@hotmail.com>
Co-authored-by: NHuihuang Zheng <zhhsplendid@gmail.com>
Co-authored-by: NYUNSHEN XIE <1084314248@qq.com>
Co-authored-by: NBai Yifan <me@ethanbai.com>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: NWeiXin <weixin10@baidu.com>
Co-authored-by: NJiaqi Liu <liujiaqi06@baidu.com>

7bbf3ac5

fix bug on compiling inference shared lib with crypto;test=develop (#30269) · 10a8f3e5

由 Zhang Jun 提交于 1月 13, 2021

* fix bug on compiling inference shared lib with crypto;test=develop

* fix cmake bug when build inference lib using -DWITH_CRYPTO=OFF

* update cmake

* remove unnecessary enforce message

10a8f3e5

12 1月, 2021 3 次提交

J

Recompute Offload (#30233) · 75936d83
由 JZ-LIANG 提交于 1月 12, 2021

75936d83

add sparse embedding & load vars for 2.0 & gloo bug fix (#30306) · 5e839e4d

由 tangwei12 提交于 1月 12, 2021

* add sparse embedding & load vars for 2.0

Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b

* fix hdfs gloo

Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6

* fix gloo hdfs

Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e

* move loadvar/sparse embedding from incubute to static

Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0

5e839e4d

Fix/distributed proto (#29981) · 25f80fd3

由 tangwei12 提交于 1月 12, 2021

* rename sendrecv.proto to namespace paddle.distributed

* split ps with distributed

25f80fd3

11 1月, 2021 2 次提交
- L
  Support vector<double> as type of op attribute and op set_value suppport... · b4989fb7
  由 liym27 提交于 1月 11, 2021
```
Support vector<double> as type of op attribute and op set_value suppport vector<double> as value (#30126)
```
  b4989fb7
- 石
  
  fix header file paths of gflags, commit 1, test=develop (#30271) · 8ce2482b
  由石晓伟提交于 1月 11, 2021
  
  8ce2482b
10 1月, 2021 1 次提交
- W
  reduce the occupied size of memory for the fused pattern of elementwise_add... · af80859d
  由 wangchaochaohu 提交于 1月 10, 2021
```
reduce the  occupied size  of memory for the fused pattern of elementwise_add Op and activation Op(relu Op for example) (#29885)
```
  af80859d
08 1月, 2021 4 次提交

Support pure fp16 training for AMP API. (#29544) · 7f7dfccf

由 Zhen Wang 提交于 1月 08, 2021

* add cast ops before and after unsupported fp16 ops.

* Keep partial net in FP32 pattern.

* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.

* Add fp16 support for adam op.

* add multi precision attr for adam.

* Fix the bug of test_multi_precision_fp16_train UT.

* Code format for CI.

* Fix the redefine error about MPTypeTrait on windows.

* fix bugs of the _create_accumulators func in Momentum.

* fix bug when inserting post cast op.

* Add the update_loss_scaling op in allow_set of UnusedVarCheck.

* Update for ci coverage.

* Add some doc for OptimizerWithMixedPrecision.

* Fix the code style.

* Imporve the doc of `amp_init`.

* Change for fp16 testing if users have the infer program defined in separate way.

7f7dfccf

L

use cuda generator in bernoulli cuda kernel (#30199) · 789743e1
由 Leo Chen 提交于 1月 08, 2021

789743e1

Add callback after TensorCopy (#30123) · 1f97d61c

由 Leo Chen 提交于 1月 08, 2021

* change to tensor copy sync

* change to tensor copy sync

* make copy_to safe when use TensorCopy

* refine code

* add ut

* add cudapinned garbagecollector

* add testcase: cpu place -> cuda pinned place

1f97d61c

C
【Paddle.Fleet】Fix tensor table (#30075) · 528e03fc
由 Chengmo 提交于 1月 08, 2021
```
* add tensor table
```
528e03fc

07 1月, 2021 2 次提交

H
Refine PADDLE_ENFORCE Error Messages. test=develop (#30149) · 54bf3f5a
由 Huihuang Zheng 提交于 1月 07, 2021
```
Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc
```
54bf3f5a

[Complex] Simplify prepared op impl to improve performance (#30153) · d0fb06b2

由 Chen Weihang 提交于 1月 07, 2021

* simplify prepared op impl to improve performance

* fix kunlun compile error

* continue fix kunlun compile error

* only transform diff place when dtype diff

* fix failed unittests

* remove useless file

* polish impl by review comment

d0fb06b2

wmsofts / Paddle 与 Fork 源项目一致

wmsofts / Paddle
与 Fork 源项目一致