Commits · 8140485aee3ac18b70b6abc8209fd853a70e48ce · 机器未来 / Paddle

02 Apr, 2021 1 commit

[Cherry-Pick] logclean & embedding doc (#32009) · 8140485a

Committed by tangwei12 on Apr 02, 2021

* fix en doc for emb (#31980)

* fix en doc for emb, test=document_fix;
Change-Id: I4757e67caacd7189f068493ed45a7445f87ffb40

* LOG CLEAN (#31819)

* upgrade vlog

* train from dataset fetch optimize

8140485a

31 Mar, 2021 2 commits

OneDNN hardswish integration (#30211) (#31870) · b934d0b8

Committed by lidanqing on Mar 31, 2021

* OneDNN hardswish integration (#30211)

* keep only conv + hardswish in this PR
Co-authored-by: jakpiase <62569058+jakpiase@users.noreply.github.com>

b934d0b8

Cherry pick bert transformer 2.0 support (#31959) · 967f4c2e

Committed by Pei Yang on Mar 31, 2021

* [Paddle-TRT] TRT inference support for BERT/Transformer in paddle 2.0 api (#31744)

* support multihead_matmul_fuse_pass_v3

* fix compile problems

* embedding_eltwise_ln pass support lookup_table_v2

* support matmul and matmul_v2 in qkv matmul

* map_matmul_to_mul_pass support 3dim

967f4c2e

02 Mar, 2021 1 commit

[CP] align fleet param (#31220) · d15e73b0

Committed by lilong12 on Mar 02, 2021

* update, test=develop (#30692)

* align the default value of some configuration for fleet to that of single cards (#30740)

* update, test=develop

d15e73b0

01 Mar, 2021 5 commits

[Cherry pick] cherry-pick #31102 #30750 #30626 (#31336) · ff4612a3

Committed by Thunderbrook on Mar 01, 2021

* solve build gpu task core (#30626)

* build gpu task core

* format

* dump to cpu (#30750)

* dump to cpu

* format

* format

* format

* support multi node in heterps (#31102)

* push multi node

* multi node

* MultiThread

* remove log

* solve bug in 30829

* optimizer

ff4612a3

[Cherry-pick] Fix dtype unmatched in custom op API #31306 · a891032f
Committed by Chen Weihang on Mar 01, 2021
```
[Cherry-pick] Fix dtype unmatched in custom op API

cherry-pick of #31305
```
a891032f

[Cherry-pick] inference modification for custom operator (#31283) (#31300) · 628f0856
Committed by 石晓伟 on Mar 01, 2021

628f0856

cherry-pick (#31279) · 6330fc94
Committed by Wilber on Mar 01, 2021

6330fc94

[Cherry-pick] The 4th part of new custom op (#31282) · 777d1a45

Committed by Chen Weihang on Mar 01, 2021

* modify custom op dependent from paddle_framework to paddle_custom_op (#31195)

* [Custom Op] Remove unsupport dtypes (#31232)

* remove remove_unsupport_dtype

* remove remove_unsupport_dtype

* remove test dtype

* add more include

* change dtype.h's enum as enum class to avoid conflict with inference lib

* make enum as enum class

* remove additional test

* merge develop

* polish code

* [Custom OP] Support stream set on Custom Op (#31257)

* [Custom OP] change the user header file format, test=develop (#31274)

* [Custom OP]add PD_THROW and PD_CHECK for User Error message (#31253)

* [Custom OP]add PD_THROW and PD_CHECK for User error message

* PD_THROW and PD_CHECK, fix comment

* fix Windows error message

* fix Windows error message

* fix CI

* [Custom OP]add MSVC compile check on Windows (#31265)

* fix test_check_abi
Co-authored-by: Zhou Wei <52485244+zhouwei25@users.noreply.github.com>
Co-authored-by: Jiabin Yang <marsyang199376@gmail.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: zhouwei25 <zhouwei25@baidu.com>

777d1a45

27 Feb, 2021 1 commit

[Cherry-Pick] Split Macros and Add modeling unittest (#31266) · 52f7e773

Committed by Aurelius84 on Feb 27, 2021

* [CustomOp] Add Modeling with Custom op unittest (#31218)

* add unittest for static/dygraph/dy2stat

* add PE unittest

* remove useless code

* add unittest in CMakeList.txt

* [CustomOp] Split build op macro & polish details (#31229)

* split build op macro & polish details

* revert register api del

* fix other unittest

* [CustomOP]Support Incremental compilation and Add Version management (#31228)

* Support Incremental compilation and Add Version management

* replace hash with hashlib

* fix test_op_num unittest

* Revert "fix test_op_num unittest"

This reverts commit 2f78de976e1d7ca60915b2310717b38a32ae204a.
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

52f7e773

26 Feb, 2021 2 commits
- [Cherry-pick] The Second part of new custom op extension in 2.0.1 (#31237) · d3e60959
  Committed by Chen Weihang on Feb 26, 2021
```
[Cherry-pick] The Second part of new custom op extension in 2.0.1
```
  d3e60959
- Fleet distributed strategy support pure fp16 (#30754) (#31238) · 03babe17
  Committed by WangXi on Feb 26, 2021
  
  03babe17
24 Feb, 2021 1 commit

added support for fake_quantize_dequantize_abs_max op in quantization… (#30896) (#31162) · 011a6a51

Committed by alncat on Feb 24, 2021

* added support for fake_quantize_dequantize_abs_max op in quantization inference pass

* remove const_cast to pass ci

* remove compare operator to pass ci-coverage

* added detailed error message for unregistered tensorrt_subgrah_pass

011a6a51

23 Feb, 2021 2 commits

[CustomOp] New custom operator extension mechanism in 2.0.1 (#31097) · a19154ca

Committed by Chen Weihang on Feb 23, 2021

[CustomOp] New custom operator extension mechanism in 2.0.1

Cherry-pick New custom operator basic implementation related PRs

a19154ca

[cherry-pick 2.0.1] [kunlun] fix xpu bind threaded executor (#31116) · 29467060

Committed by WangXi on Feb 23, 2021

* [Kunlun] Add condition_variable and notify() in BindThreadedSSAGraphExecutor (#30586)

* [Kunlun] fix dead lock for exec_op_count_ (#30718)

* Fix the problem that the number of ops executed by xpu is wrong (#30961)
Co-authored-by: liuyuhui <liuyuhui@baidu.com>

29467060

02 Feb, 2021 1 commit

Conv bn fuse fix (#30830) · b4be9717

Committed by alncat on Feb 02, 2021

* fixed compilation error on gcc 4.8.x due to the usage of isfinite (#30733)

* modified conv+bn fuse pass to fix wrong mask in mask rcnn (#30704)

b4be9717

19 Jan, 2021 3 commits
- [Cherry-Pick] Fix bug: GetAttrValue should deal with attr with attrType vector<double> (#30564) · f15bed11
  Committed by liym27 on Jan 19, 2021
```
cherry-pick #30536
```
  f15bed11
- Ascend Framework Part1: OP & Wrapper (#30281) (#30546) · 6f563ace
  Committed by hutuxian on Jan 19, 2021
  
  6f563ace
- [Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317) (#30535) · 420fdbb2
  Committed by liuyuhui on Jan 19, 2021
  
  420fdbb2
18 Jan, 2021 1 commit

Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in... · 27c2f1ea

Committed by pangyoki on Jan 18, 2021

Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103) (#30496)

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allocation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* fix test_cross_entropy_loss error because of reshape2

* add inplace strategy

* add elementwise_add sub

* let backward op not use inplace

* grad op do not use inplace

* fix memory increase error and add leaf error message

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput

* add unittest and leaf error message

* merge view error

* optimize op_function_generator format and support sum inplace op

* fix format of basic_engine

* fix format for framework

* little change of variable wrapper

* add reshape, squeeze, unsqueeze, scatter api

* add relu elu tanh softmax inplace api

* fix test_squeeze_op unittest

* fix test_relu_op unittest

* fix comment problems

* delete sample code of inplace api

* add reference of grad_pending_nodes in basic_engine

* fix unittest name

* add inplace apis into wlist

* fix error message

* add PADDLE_ENFORCE for set grad op twice

* fix head file error

27c2f1ea

14 Jan, 2021 3 commits
- skip quantizing ops in cpu inference (#30342) (#30405) · 2f16e0c6
  Committed by cc on Jan 14, 2021
  
  2f16e0c6
- [cherry-pick 2.0]enable MakeCipher api for inference (#30389) · ac70275a
  Committed by Zhang Jun on Jan 14, 2021
  
  ac70275a
- Added support for inference using quantization aware trained dygraph (#30288) (#30402) · 38faed7f
  Committed by alncat on Jan 14, 2021
  
  38faed7f
13 Jan, 2021 3 commits


Recompute Offload (#30233) (#30372) · 3fbc3cf4
Committed by JZ-LIANG on Jan 13, 2021

3fbc3cf4

split ps with distributed (#30337) · a97ca56a
Committed by tangwei12 on Jan 13, 2021
```
Change-Id: I3c788e7576688e63181e7f01562529b85a09cc59
```
a97ca56a


git cherry-pick the commits of operator version registries, test=release/2.0 (#30292) · 5eab1a38

Committed by 石晓伟 on Jan 13, 2021

* Register op version for grid_sampler, test=op_version (#29916)

* add op version for fake_quant and fake_dequant ops, test=op_version (#29923)

* Register op version for print, test=op_version (#29945)

* add gru op_register_version; test=op_version; (#29931)

* Register op version for coalesce_tensor. (#29940)

* register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version (#29937)

* add op_register_version for allclose op; test=op_version (#29968)

* register ModifyAttr for instance_norm, test=op_version (#29938)

* add op_version for flip op [test=op_version] (#30019)

* add the op version check for the elementwise ops, test=op_version (#30010)

* add the support the op version check for matmul, test=op_version (#30011)

* Revert "register ModifyAttr for instance_norm, test=op_version (#29938)"

* add REGISTER_OP_VERSION for generate_proposals, roi_align, roi_pool test=op_version (#30034)

* Fix rank_attention op_version, test=op_version (#30006)

* fix rank_attention, test=op_version

* Register op version for linspace,test=op_version (#30025)

* fix op_register_version for compare ops, test=op_version (#30007)
Co-authored-by: zhoushunjie <zhoushunjie@baidu.com>

* register ModifyAttr for instance_norm, test=op_version (#30065)

* register instance norm, test=op_version

* add trace op_register_version and fix version bug; test=op_version (#30000)

* fix a bug in op_version_registry, test=develop, test=op_version (#29994)

* Add version checking, test=op_version (#30129)

* fix a bug in gaussian_random_op version, test=release/2.0
Co-authored-by: LielinJiang <50691816+LielinJiang@users.noreply.github.com>
Co-authored-by: cc <52520497+juncaipeng@users.noreply.github.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: Jack Zhou <zhoushunjie@baidu.com>
Co-authored-by: Guo Sheng <whucsgs@163.com>
Co-authored-by: wangxinxin08 <69842442+wangxinxin08@users.noreply.github.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
Co-authored-by: FlyingQianMM <245467267@qq.com>
Co-authored-by: ceci3 <ceci3@users.noreply.github.com>
Co-authored-by: hutuxian <hutuxian2011@sina.cn>
Co-authored-by: chalsliu <45041955+chalsliu@users.noreply.github.com>
Co-authored-by: wangguanzhong <jerrywgz@126.com>
Co-authored-by: ShenLiang <shenliang03@baidu.com>
Co-authored-by: yinhaofeng <66763551+yinhaofeng@users.noreply.github.com>
Co-authored-by: channings <chenlingchi@baidu.com>
Co-authored-by: chentianyu03 <chentianyu03@baidu.com>
Co-authored-by: ruri <shipeng1108@163.com>

5eab1a38

12 Jan, 2021 6 commits

[cherry]Add callback after TensorCopy (#30123) (#30268) · 9d0a1eb4

Committed by Leo Chen on Jan 12, 2021

* change to tensor copy sync

* change to tensor copy sync

* make copy_to safe when use TensorCopy

* refine code

* add ut

* add cudapinned garbagecollector

* add testcase: cpu place -> cuda pinned place

9d0a1eb4

【Cherry-Pick】Fix device_context & Save Tensor & Gloo (#30336) · 284bae99

Committed by Chengmo on Jan 12, 2021

* Fix server.h include device_context (#30243)

* fix cmake
Co-authored-by: seiriosPlus <tangwei12@baidu.com>

* 【Paddle.Fleet】Support local save sparse param (#30175)

* add save tensor support
Co-authored-by: seiriosPlus <tangwei12@baidu.com>

* add sparse embedding & load vars for 2.0 & gloo bug fix (#30306)

* add sparse embedding & load vars for 2.0

Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b

* fix hdfs gloo

Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6

* fix gloo hdfs

Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e

* move loadvar/sparse embedding from incubate to static

Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0
Co-authored-by: tangwei12 <tangwei12@baidu.com>

284bae99

[2.0 Cherry-pick]fix 2.0 error message (#30332) · df67b317

Committed by swtkiwi on Jan 12, 2021

* fix datanorm error msg (#30294)

* Optimize the error message of framework. (#30134)

* modify error message based on comments (#30189)

* modify error message based on comments

* edit code according to review.

* Correct spelling according to review.

* fix enforce msg of sum xpu op (#30113)

* enhance error info for py_func (#30138)

* enhance error info for py_func

* update

* fix elugradgrad test fail & error message opt (#30171)

* fix elugradgrad test fail and error message opt

* fix unittest,test=develop

* Update prroi_pool_op.h

fix error message

* opt message,test=develop

* fix ci fail,test=develop

* Refine PADDLE_ENFORCE Error Messages. test=develop (#30149)

Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc

* enhance error message, test=develop (#30220)

* fix error message for distribute_fpn_proposals_op (#30116)

* enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop (#30240)

* just add the op error message for the matmul xpu (#30246)

 add the op error message for the matmul xpu

* enhance error message of nll_loss op test=develop (#30125)

* enhance error message of nll_loss op test=develop
Co-authored-by: yaoxuefeng <yaoxuefeng@baidu.com>
Co-authored-by: xiemoyuan <71377852+xiemoyuan@users.noreply.github.com>
Co-authored-by: WeiXin <weixin10@baidu.com>
Co-authored-by: Jack Zhou <zhoushunjie@baidu.com>
Co-authored-by: Wilber <jiweibo@baidu.com>
Co-authored-by: Double_V <liuvv0203@163.com>
Co-authored-by: Huihuang Zheng <zhhsplendid@gmail.com>
Co-authored-by: zhang wenhui <frankwhzhang@126.com>
Co-authored-by: wangguanzhong <jerrywgz@126.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
Co-authored-by: lijianshe02 <48898730+lijianshe02@users.noreply.github.com>

df67b317

[cherry-pick] use cuda generator in bernoulli cuda kernel (#30199) #30286 · e7cbc43f
Committed by Leo Chen on Jan 12, 2021
```
[cherry-pick] use cuda generator in bernoulli cuda kernel (#30199)
```
e7cbc43f

cherry pick tensor table (#30221) · 330aea6e
Committed by Chengmo on Jan 12, 2021

330aea6e

[cherry-pick]memory optimization for fuse pattern of elemwise_add + act (#30303) · b207b8a7

Committed by wangchaochaohu on Jan 12, 2021

* reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op (relu Op for example) (#29885)

* register OPMaker and Infer Shape Check for fused_elementwise_add (#30259)

b207b8a7

11 Jan, 2021 4 commits

[Cherry-Pick] Support vector<double> as type of op attribute and op set_value... · d839761e

Committed by liym27 on Jan 11, 2021

[Cherry-Pick] Support vector<double> as type of op attribute and op set_value support vector<double> as value (#30126) (#30305)

Cherry-Pick #30126
1. Support vector<float64> as type of op attribute.
2. op set_value supports float64 numpy.array

d839761e

[cherry-pick] Async drop scope in executor (#29714) #30285 · 93ce7f69
Committed by Leo Chen on Jan 11, 2021
```
[cherry-pick] Async drop scope in executor (#29714)
```
93ce7f69

[Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54

Committed by Zhen Wang on Jan 11, 2021

* Support pure fp16 training for AMP API. (#29544)

* add cast ops before and after unsupported fp16 ops.

* Keep partial net in FP32 pattern.

* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.

* Add fp16 support for adam op.

* add multi precision attr for adam.

* Fix the bug of test_multi_precision_fp16_train UT.

* Code format for CI.

* Fix the redefine error about MPTypeTrait on windows.

* fix bugs of the _create_accumulators func in Momentum.

* fix bug when inserting post cast op.

* Add the update_loss_scaling op in allow_set of UnusedVarCheck.

* Update for ci coverage.

* Add some doc for OptimizerWithMixedPrecision.

* Fix the code style.

* Improve the doc of `amp_init`.

* Change for fp16 testing if users have the infer program defined in separate way.

* Remove tensor copy in the update_loss_scaling op. (#29426)

* remove tensor copy in the update_loss_scaling op

* not use thrust.

* fix some cuda memory access error.

d8dfef54

[cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f

Committed by WangXi on Jan 11, 2021

* Optimization grad merge performance (#29784)

* [fleet] combine amp and gradient merge, test=develop (#30086)

* fix assign_op_xpu concat_op_xpu warning (#30120)
Co-authored-by: liuyuhui <liuyuhui@baidu.com>

e283dc6f

08 Jan, 2021 1 commit

[Cherry-pick] [Complex] Simplify prepared op impl to improve performance (#30153) (#30215) · 0e3a1d35

Committed by Chen Weihang on Jan 08, 2021

* simplify prepared op impl to improve performance

* fix kunlun compile error

* continue fix kunlun compile error

* only transform diff place when dtype diff

* fix failed unittests

* remove useless file

* polish impl by review comment

0e3a1d35

07 Jan, 2021 1 commit
- fix xpu pe sync, test=notest (#30095) (#30114) · 85545bbc
  Committed by liuyuhui on Jan 07, 2021
  
  85545bbc
06 Jan, 2021 1 commit

[Cherry-Pick 2.0][Dynamic Inplace] Support ShareInplaceVersionCounterWith for... · 743649b5

Committed by liym27 on Jan 06, 2021

[Cherry-Pick 2.0][Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842) (#30105)

Before this PR, SharePlaceHolderWith shared a Tensor between different C++ Variables, which means sharing the data, shape, and inplace_version_counter_ of the Tensor.
But in some cases, sharing the data and inplace_version_counter_ without sharing the shape is needed. For example, the inplace op reshape can't share the shape.

This PR discards SharePlaceHolderWith and exposes ShareInplaceVersionCounterWith for C++ Tensor.
This reverts commit b10ecd9d.

* Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase

743649b5

05 Jan, 2021 1 commit
- add topo-aware in heter-ps (#30087) (#30117) · 7fc2ce50
  Committed by Thunderbrook on Jan 05, 2021
```
* add topo aware

* resource.h

* topo aware

* format
```
  7fc2ce50

机器未来 / Paddle is up to date with its fork source project