Commits · a224019056b9767e827a48c72b2e8a9270e04d79 · Crayon鑫 / Paddle

19 Jul, 2022 · 1 commit

Record op shape data for profiler [cherry-pick PR43405 43578 43822] (#44384) · a2240190

Committed by chenjian on Jul 19, 2022

* add serialization for new field in event node (#43405)

* add serialization for new field in event node

* fix a bug

* add more field to memory record (#43578)

* Add infer shape in dygraph (#43822)

* record memory and op supplement info

* update

* update

* fix a bug

* fix memory recording

* fix a bug

* update

* update

* fix a bug

* update

* fix a bug

* fix a bug

* fix a bug

* update dygraph record

* add infer shape record

* fix

* fix

* fix

* add comments

* fix a bug

* fix

* fix

* add record op info

* fix file mode

* add op input shape info

* fix dependency

a2240190

12 Jul, 2022 · 1 commit

add new field for event node (#43223) (#44245) · 94271bc2

Committed by chenjian on Jul 12, 2022

* add new field for event node

* fix

* fix bug

* fix bug

* fix clang

* fix clang format

* fix code format

94271bc2

01 Jul, 2022 · 1 commit
- make only win32 and 11.6 use external/cub (#44005) · 3cc6ae69
  Committed by Sing_chan on Jul 01, 2022
  
  3cc6ae69
30 Jun, 2022 · 4 commits
- [Cherry-pick] Apply IOU to test_parallel_executor_seresnext_base_gpu … (#43925) · fde34eb8
  Committed by Huihuang Zheng on Jun 30, 2022
```
* [Cherry-pick] Apply IOU to test_parallel_executor_seresnext_base_gpu (#43812)
1. Fix the conflict between #43812 and current release/2.3 branch
2. test_parallel_executor_seresnext_base_gpu failed on 2 P100 GPUs with `470.82` driver.
```
  fde34eb8
- cherry pick 43934 and not format (#43935) · 83520fd2
  Committed by Sing_chan on Jun 30, 2022
  
  83520fd2
- [Paddle Inference ]Fix emb pass for ernie3.0 (#43948) · 35abeda7
  Committed by Wangzheee on Jun 30, 2022
```
* fix emb pass for ernie3.0

* fix emb pass for ernie3.0

* fix emb pass for ernie3.0
```
  35abeda7
- modify graph_pattern to thread_local (#43945) · 1ea9971a
  Committed by JingZhuangzhuang on Jun 30, 2022
  
  1ea9971a
29 Jun, 2022 · 2 commits

Fix elementwise_div UT by providing user defined gradients (#43536) (#43909) · 26187c27

Committed by Qi Li on Jun 29, 2022

Cherry-pick of #43536

Background in #43262

In the elementwise_div UT, the numeric gradient (used for validation) has a large relative error compared with the analytic gradient (computed by the Paddle OP).

The default rtol for UTs is 0.005.
The rtol for the float32 and float64 elementwise_div OP is set to 0.05.
The rtol for the float16 and bfloat16 elementwise_div OP is set to 1.0.

The relative error is too large, so this PR tests elementwise_div with user-defined gradients that follow the analytic method.
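As a rough illustration of the two kinds of gradient compared above, here is a minimal NumPy sketch. It is not the Paddle unit test: the function names (`div_forward`, `div_analytic_grad`, `numeric_grad`, `max_relative_error`), the step size `delta`, and the error metric are all assumptions made for this example.

```python
import numpy as np


def div_forward(x, y):
    """Elementwise division, the operation under test."""
    return x / y


def div_analytic_grad(x, y, dout):
    """Analytic gradients of out = x / y given the upstream gradient dout.

    d(x/y)/dx = 1 / y and d(x/y)/dy = -x / y**2.
    """
    return dout / y, -dout * x / (y * y)


def numeric_grad(f, x, y, dout, wrt, delta=1e-3):
    """Central finite-difference gradient of sum(f(x, y) * dout).

    `wrt` selects the input to perturb ("x" or "y"). Inputs must be
    contiguous float arrays; they are restored after each perturbation.
    """
    target = x if wrt == "x" else y
    grad = np.zeros_like(target)
    flat = target.reshape(-1)  # view onto the contiguous input
    grad_flat = grad.reshape(-1)
    for i in range(flat.size):
        original = flat[i]
        flat[i] = original + delta
        upper = np.sum(f(x, y) * dout)
        flat[i] = original - delta
        lower = np.sum(f(x, y) * dout)
        flat[i] = original
        grad_flat[i] = (upper - lower) / (2 * delta)
    return grad


def max_relative_error(expected, actual):
    """Largest elementwise relative error (hypothetical metric)."""
    denom = np.maximum(np.abs(expected), 1e-3)
    return float(np.max(np.abs(expected - actual) / denom))
```

Supplying `div_analytic_grad` as the reference instead of the finite-difference estimate is the idea behind a user-defined gradient: the check no longer inherits the approximation error of the numeric method, which is most visible in low-precision dtypes.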

26187c27

cherry pick 43890 (#43892) · 69e82d83

Committed by ronnywang on Jun 29, 2022
```
* cherry pick 43890
```
69e82d83

28 Jun, 2022 · 4 commits

[cherry-pick] Fix code examples (#43904) · dc12605d

Committed by Chen Long on Jun 28, 2022

* Update api docs (#42725)

* Fix max_pool3d doc, test=document_fix (#42715)

* fix pooling doc

* fix typo test=document_fix

* fix doc typo, test=document_fix

* fix adaptive_avg_pool1d doc bug (#42721)

* fix adaptive_avg_pool1d doc bug

* fix adaptive_avg_pool1d doc bug

* fix spectral_norm en doc (#42728)

* Fix example code bugs (#42739)

* update readme test=document_fix

* fix api docs bugs test=document_fix

* fix code example bugs;test=document_fix
Co-authored-by: Linjie Chen <40840292+linjieccc@users.noreply.github.com>
Co-authored-by: Wei Shengyu <weisy11@163.com>
Co-authored-by: Walter <dongshl1226@hotmail.com>
Co-authored-by: wangna11BD <79366697+wangna11BD@users.noreply.github.com>

dc12605d

[Docs] Fix doc of kaiming initializer (#43823) (#43827) · 63458e5b

Committed by Jackwaterveg on Jun 28, 2022

* Update kaiming.py

* Update initializer.py

* fix doc bug;test=document_fix

* fix doc;test=document_fix

* Update initializer.py

* Update kaiming.py

* for ci;test=document_fix
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>

63458e5b

Cherry-pick PR43834, support mac m1 arm compile in paddle_build (#43834) (#43872) · 61bededd

Committed by pangyoki on Jun 28, 2022

61bededd

[Inference TRT] elementwise layer support (#43851) · 17a2003d

Committed by zhoutianzi666 on Jun 28, 2022
```
* elementwise support

* commit
```
17a2003d

27 Jun, 2022 · 2 commits

[cherry-pick]Update quantization round and clip calculation methods (#43829) · ff70a269

Committed by Guanghua Yu on Jun 27, 2022
```
* update quantization clip and round

* fix quantization clip and round Attribute

* fix typo
```
ff70a269

[Cherry-pick] Fix incompatible error for place type (#43830) · 9e776f62

Committed by Chen Weihang on Jun 27, 2022

* Create Tensor by paddle::empty  in custom operator (#41840)

* create tensor by empty in custom op

* fix some bug

* update relu custom op demo (#43173)

* Fix incompatible error for custom op Placetype (#43749)

* fix incompatible error

* remove default constructor

* add macro

* fix cpu make error

* add DefaultGPUPlace api
Co-authored-by: zyfncg <zhangyunfei07@baidu.com>

9e776f62

25 Jun, 2022 · 2 commits
- Upgrade onnxruntime to 1.11.1 (#43797) · 51240331
  Committed by heliqi on Jun 25, 2022
  
  51240331
- [new-exec] lazy creating work queue (#43551) (#43768) · 0c44dd64
  Committed by Leo Chen on Jun 25, 2022
```
* lazy creating work queue

* fix dry_run
```
  0c44dd64
24 Jun, 2022 · 3 commits

[cherry-pick] NVIDIA fixes (#43780) · 9edbe4aa

Committed by Aganlengzi on Jun 24, 2022

* Use all sitepackages path as the library/include path (#42940)

* Fix several unit tests and increase the unit tests stability (#43670)

* Reduce gather op unit tests size and increase the timeout

* Add NVIDIA_TF32_OVERRIDE for multi-processes environment

* Remove record test for device event ut

* Fix 3 unittest errors (#43532)

* Fix test_fuse_resnet_unit failure

* Fix test_imperative_auto_mixed_precision failure

* Fix sparse_attention_op error

* Fix sparse_attention_op error

* Use fixed random seed (#43659)

* for CI test_collective_sendrecv_api
Co-authored-by: zlsh80826 <rewang@nvidia.com>
Co-authored-by: Shijie <505749828@qq.com>

9edbe4aa

[cherry-pick] fix the cumsum big shape and random bug (#43777) · edff59b1

Committed by wawltor on Jun 24, 2022

edff59b1

[cherry pick] fix structure infos conflict in static return_list mode (#43691) · e700ffdc

Committed by Kaipeng Deng on Jun 24, 2022
```
* fix structure infos conflict in static return_list mode. test=develop

* fix format. test=develop

* fix format. test=develop
```
e700ffdc

23 Jun, 2022 · 6 commits
- remove slowing down pass (#43750) · 096eb801
  Committed by lidanqing on Jun 23, 2022
  
  096eb801
- fix set_value (#43694) (#43783) · 9d12e70c
  Committed by zyfncg on Jun 23, 2022
  
  9d12e70c
- Upgrade paddle2onnx to 0.9.9 (#43774) (#43775) · 4aa0515d
  Committed by heliqi on Jun 23, 2022
  
  4aa0515d
- [cherry pick][Inference]Enhance gpu multihead matmul v3 fuse pass (#43765) · 94bacb47
  Committed by WJJ1995 on Jun 23, 2022
  
  94bacb47
- [cherry pick 2.3][Inference]Fix the ort Backend multiple input bug(#43621 #43742) (#43739) · babba557
  Committed by heliqi on Jun 22, 2022
```
* cherry pick from develop 43621

* code format

* paddle2onnx update to 0.9.8
```
  babba557
- [cherry-pick] release/2.3 elementwise_mul and matmul mkldnn fix (#43725) · a7e0cdea
  Committed by lidanqing on Jun 23, 2022
```
* Correct elementwise quantization (#43693)

* [Bug fix] Do not quantize weights Y when matmul X and Y both other ops outputs (#43297)

* fix some matmul that X and Y both other ops outputs, do not dequantize the Y.

* fix CI format

* fix according to review
Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>
```
  a7e0cdea
22 Jun, 2022 · 12 commits

Cherry pick 43307 (#43618) · d0bbf46c

Committed by ccrrong on Jun 22, 2022

* add bilinear_interp_v2 converter

* update op_teller.cc

* add unittest for bilinear_interp_v2 converter

* code format

* bug fix

* code format and add unittest

* remove merged modify in op_teller.cc

* code format

* code format

* fix scale init error

d0bbf46c

gpu_context (#43661) · 90ae3533

Committed by xiaoxiaohehe001 on Jun 22, 2022

90ae3533

[Cherry-pick]to Release/2.3, Improve MSRAInitializer (#43721) · 1aafc31b

Committed by Jackwaterveg on Jun 22, 2022
```
* fix conflict

* improve the doc
```
1aafc31b

Optimize linspace to avoid GPU -> CPU copy. (#42750) (#43746) · 4dcfc6df

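Committed by Yiqun Liu on Jun 22, 2022

Cherry-pick of #42750.

QA reported that the optimization in #42750 can improve the performance of the solov2 model by 6%, so it is cherry-picked to 2.3. Because #41096 moved the Python implementation of linspace from fluid.layers.tensor to paddle.tensor.creation, and that PR is not in the release/2.3 branch, the Python changes from #42750 are applied to fluid.layers.tensor.linspace instead.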

4dcfc6df

Cherry-pick PR#43237 from deveop (#43685) · e90dfaf7

Committed by shiyutang on Jun 22, 2022

* merge_release_and_dev

* merge_release_dev

* update

* Use tempfile to place the temporary files (#43237)

* tempfile_fix

* update

* fix_CI

* update_word2vec.inference.model

* remove_change_in_word2vec_book

* fix_word2vec_book

* rm_affine

* update

e90dfaf7

fix the bug that _DataLoaderIterMultiProcess use time to generate the seed (#43318) (#43702) · f4c42389

Committed by Zhang Ting on Jun 22, 2022
```
 fix the bug that _DataLoaderIterMultiProcess use time to generate the seed

cherry-pick #43318
```
f4c42389

[cherry pick] Support optional residual add in fused ops and slice large... · 0660d5f2

Committed by Zhang Ting on Jun 22, 2022

[cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax (#43719)

 [cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax

cherry-pick #43635 #43681 #43474

0660d5f2

test=document_fix;cherry pick code format check upgrade to release/2.3 (#43732) · 8e6a1945

Committed by Sing_chan on Jun 22, 2022

Only the format tool upgrades (clang-format, yapf, cmake-format) are cherry-picked to release/2.3. Lint tools such as cpplint are not moved, because we are not going to fix cpplint errors in release/2.3.
pre_commit.sh is also moved to release/2.3 so that both PR-CI-pre-commit and PR-CI-pre-commit-23 can work.
clang-format is pre-installed to avoid repeated installation caused by pre-commit's multi-threaded execution.

8e6a1945

fix tensor copy bug (#43299) (#43728) · 8760817a

Committed by zyfncg on Jun 22, 2022

8760817a

[Cherrypick 2.3] fix decode jpeg example code (#42752) · a4c898cf

Committed by LielinJiang on Jun 22, 2022
```
* fix decode_jpeg example code

* fix decode_jpeg example code
```
a4c898cf

set_state_dict not use state_dict hook (#43407) (#43711) · 0fb66355

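Committed by zhangbo9674 on Jun 22, 2022

During development of the amp-o2 feature, a state_dict hook was added to support specifying the data type in which a network is saved. However, Layer's set_state_dict obtains the network parameters through state_dict and then loads values into them, so the presence of the hook prevents set_state_dict from loading into the original network parameters.
This PR fixes the problem by adding a switch that controls the hook and disabling the hook inside set_state_dict.

See #43407 for details.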
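The interaction described above can be sketched in plain Python. This is a simplified model, not Paddle's `Layer`: the classes `Param` and `TinyLayer`, the `use_hook` argument, and the `cast_copy_hook` function are hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class Param:
    """Hypothetical mutable parameter holding a single value."""

    def __init__(self, value):
        self.value = value

    def set_value(self, value):
        self.value = value


class TinyLayer:
    """Simplified stand-in for a layer with state_dict hooks."""

    def __init__(self, params):
        self._params = OrderedDict(params)
        self._state_dict_hooks = []

    def register_state_dict_hook(self, hook):
        self._state_dict_hooks.append(hook)

    def state_dict(self, use_hook=True):
        state = OrderedDict(self._params)
        if use_hook:
            for hook in self._state_dict_hooks:
                hook(state)
        return state

    def set_state_dict(self, new_values, use_hook=False):
        # With use_hook=False the targets are the layer's own parameters;
        # with use_hook=True a hook may have swapped in converted copies.
        targets = self.state_dict(use_hook=use_hook)
        for name, param in targets.items():
            if name in new_values:
                param.set_value(new_values[name])


def cast_copy_hook(state):
    """Hook that replaces each entry with a converted copy (assumed behavior)."""
    for name, param in list(state.items()):
        state[name] = Param(float(param.value))
```

With the hook enabled, `set_state_dict` writes into the copies produced by `cast_copy_hook` and the layer's own parameters keep their old values; with the hook disabled, the writes reach the original parameters.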

0fb66355

[FIx bug]layer to 'NoneType' object has no attribute 'place' (#43597) (#43717) · 0b879318

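Committed by zhangbo9674 on Jun 22, 2022

Bug:
When an entry in the _buffers of class Layer is None, calling the to() method raises the error: layer to 'NoneType' object has no attribute 'place'.
Fix:
The to() method now checks for None entries in _buffers and skips any entry that is None.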
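A minimal sketch of the guard described above, in plain Python rather than Paddle code. The `Buffer` class and the `move_buffers` helper are hypothetical and only model the idea of skipping `None` entries before touching `.place`.

```python
class Buffer:
    """Hypothetical buffer that records which device it lives on."""

    def __init__(self, data, place="cpu"):
        self.data = data
        self.place = place

    def to(self, place):
        return Buffer(self.data, place)


def move_buffers(buffers, place):
    """Return a dict of buffers moved to `place`, leaving None entries as None."""
    moved = {}
    for name, buf in buffers.items():
        if buf is None:
            # Without this check, buf.place below would raise
            # AttributeError: 'NoneType' object has no attribute 'place'.
            moved[name] = None
            continue
        moved[name] = buf if buf.place == place else buf.to(place)
    return moved
```

For example, `move_buffers({"mean": Buffer([0.0]), "mask": None}, "gpu:0")` moves `mean` and keeps `mask` as `None` instead of failing.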

0b879318

21 Jun, 2022 · 2 commits
- [Cherry-pick ] to Release/2.3, Add prefetch_factor in dataloader (#43674) · af415bc2
  Committed by Jackwaterveg on Jun 21, 2022
```
* fix usage of prefetch_factor

* add assert

* add docstring and change prefetch_factor when num_workers=0

* fix doc
```
  af415bc2
- [cherry pick #43088 #40664] Add float16 to fake quantize/dequantize OP (#43689) · 9783e887
  Committed by Guanghua Yu on Jun 21, 2022
```
* cherry pick #43088 #40664

* fix clang format
```
  9783e887

Crayon鑫 / Paddle is in sync with its fork source project