Commits · 2676281769984944f2ca447a16fe9991b534be76 · 机器未来 / Paddle

09 Aug, 2022 · 2 commits

[Cherry-pick] Several bugs fix (#44991) · e00aa903

Committed by Chen Weihang on Aug 08, 2022

* fix device context init error (#43910)

* Fix core so name mismatch error (#43977)

* fix core avx soname error

* remove print info

* add clip_extra (#44008)

* fix tensor stream error in custom op (#44500)

* fix custom op attr names size error (#44938)

add post layer norm (#44931) · c5f4a9cc

Committed by carryyu on Aug 09, 2022

05 Aug, 2022 · 2 commits
- fix conflict (#44891) · 30b66f03
  Committed by zhaoyingli on Aug 05, 2022
- commit (#44887) · 247002ec
  Committed by zhoutianzi666 on Aug 05, 2022
04 Aug, 2022 · 3 commits
- [cherry-pick] fix QuantizeLinear pass and support reduce_max in quantization (#44872) · 24b3bbde
  Committed by Guanghua Yu on Aug 04, 2022
```
* fix QuantizeLinear kernel and pass in QAT (#44784)

* Add Reduce Max in Quant (#44825)
Co-authored-by: Chang Xu <molixu7@gmail.com>
```
- [Paddle-TRT][cherry pick] Slice to 2.3 (#44757) · 245005d4
  Committed by zhoutianzi666 on Aug 04, 2022
```
* slice_to_2.3
```
- [cherry pick] add cast trt convert (#44837) · 7cdce09b
  Committed by ccrrong on Aug 04, 2022
```
* add cast trt convert

* skip cast trt convert when input dtype is bool

* code format

* fix bug

* update unittest

* fix bug
```
03 Aug, 2022 · 1 commit
- Adjust the relative error of QR's grad (#44785) · 627e5bd5
  Committed by Yulong Ao on Aug 03, 2022
```
* Adjust the relative error of QR's grad (#42221)

* Fix the format
```
02 Aug, 2022 · 2 commits
- Pass NVIDIA_TF32_OVERRIDE to internal (#43646) (#44796) · e7547ca7
  Committed by Yuang Liu on Aug 02, 2022
```
Co-authored-by: gongweibao <gongweibao@baidu.com>
```
- Fix operator type record in profiler [cherry-pick PR44582] (#44654) · 6de20581
  Committed by chenjian on Aug 02, 2022
```
* fix record event for operator type in new dygraph (#44582)

* fix new dygraph record event for op

* update unit test

* fix file mode
```
01 Aug, 2022 · 1 commit

[UT]fix test_poisson op random fail (#44763) · b71833ea

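Committed by zhouweiwei2014 on Aug 01, 2022

Fix random failures of the poisson op unit test.

Cause: the numerical correctness of a random op cannot be verified directly, so this unit test draws 1 million random samples, counts how many fall into each histogram bin, derives a rough probability density function from those counts, and compares it with the reference probability density function. This way of testing carries some error.
The smaller the sample count, the larger the error, so this PR increases the number of samples (1 million -> 2 million), which reduces the error further to within rtol.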
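
The histogram check described above can be sketched in plain Python. This is an illustrative sketch only, not the code of the poisson unit test: the helper names (`sample_poisson`, `poisson_pmf`, `max_relative_error`), the rate `lam`, the bin range `max_k`, and the sample counts are all assumptions, and Knuth's multiplication method stands in for the Paddle op.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def poisson_pmf(k, lam):
    """Reference probability of observing k under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_relative_error(lam, n_samples, seed=0, max_k=4):
    """Histogram n_samples draws and compare each bin with the reference."""
    rng = random.Random(seed)
    counts = [0] * (max_k + 1)
    for _ in range(n_samples):
        k = sample_poisson(lam, rng)
        if k <= max_k:
            counts[k] += 1
    errors = []
    for k in range(max_k + 1):
        expected = poisson_pmf(k, lam)
        observed = counts[k] / n_samples
        errors.append(abs(observed - expected) / expected)
    return max(errors)
```

Calling `max_relative_error(lam=2.0, n_samples=n)` for increasing `n` shows the trend the PR relies on: the sampling error of each bin shrinks roughly as 1/sqrt(n), so a larger sample count makes a fixed rtol easier to meet.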

30 Jun, 2022 · 2 commits

[Cherry-pick] Apply IOU to test_parallel_executor_seresnext_base_gpu … (#43925) · fde34eb8

Committed by Huihuang Zheng on Jun 30, 2022

* [Cherry-pick] Apply IOU to test_parallel_executor_seresnext_base_gpu (#43812)
1. Fix the conflict between #43812 and current release/2.3 branch
2. test_parallel_executor_seresnext_base_gpu failed on 2 P100 GPUs with `470.82` driver.

[Paddle Inference ]Fix emb pass for ernie3.0 (#43948) · 35abeda7
Committed by Wangzheee on Jun 30, 2022
```
* fix emb pass for ernie3.0

* fix emb pass for ernie3.0

* fix emb pass for ernie3.0
```
29 Jun, 2022 · 1 commit

Fix elementwise_div UT by providing user defined gradients (#43536) (#43909) · 26187c27

Committed by Qi Li on Jun 29, 2022

Cherry-pick of #43536

Background in #43262

In elementwise_div UT, the numeric gradient (validation) has large relative error in comparison to analytic gradient (Paddle OP).

The default rtol for UTs is 0.005
The rtol for float32 and float64 elementwise_div OP is set to be 0.05
The rtol for float16 and bfloat16 elementwise_div OP is set to be 1.0

The relative error is too large, so this PR provides user defined gradients to test elementwise_div followed by the analytic method.
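
For reference, the analytic gradients of z = x / y are dz/dx = 1/y and dz/dy = -x/y^2. The sketch below is a plain-Python illustration of supplying those gradients directly, with a finite-difference estimate as a cross-check; it is not Paddle's OpTest code, and `div_grads` and `numeric_grad` are hypothetical helper names.

```python
def div_grads(x, y, dout=1.0):
    """Analytic gradients of z = x / y, scaled by the upstream gradient dout."""
    dx = dout / y
    dy = -dout * x / (y * y)
    return dx, dy


def numeric_grad(f, v, eps=1e-6):
    """Central finite-difference estimate of df/dv, used only as a cross-check."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


x, y = 3.0, 0.5
dx, dy = div_grads(x, y)
print(dx, dy)  # 2.0 -12.0
print(numeric_grad(lambda v: v / y, x), numeric_grad(lambda v: x / v, y))
```

At this well-conditioned float64 point the two estimates agree closely. The commit message reports that in the op's unit tests the numeric estimate was far enough off to need rtol values of 0.05 to 1.0, which is the motivation for testing against user-defined gradients instead.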

28 Jun, 2022 · 1 commit

[Docs] Fix doc of kaiming initializer (#43823) (#43827) · 63458e5b

Committed by Jackwaterveg on Jun 28, 2022

* Update kaiming.py

* Update initializer.py

* fix doc bug;test=document_fix

* fix doc;test=document_fix

* Update initializer.py

* Update kaiming.py

* for ci;test=document_fix
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>

27 Jun, 2022 · 2 commits

[cherry-pick]Update quantization round and clip calculation methods (#43829) · ff70a269
Committed by Guanghua Yu on Jun 27, 2022
```
* update quantization clip and round

* fix quantization clip and round Attribute

* fix typo
```
[Cherry-pick] Fix incompatible error for place type (#43830) · 9e776f62

Committed by Chen Weihang on Jun 27, 2022

* Create Tensor by paddle::empty  in custom operator (#41840)

* create tensor by empty in custom op

* fix some bug

* update relu custom op demo (#43173)

* Fix incompatible error for custom op Placetype (#43749)

* fix incompatible error

* remove default constructor

* add macro

* fix cpu make error

* add DefaultGPUPlace api
Co-authored-by: zyfncg <zhangyunfei07@baidu.com>

24 Jun, 2022 · 2 commits

[cherry-pick] NVIDIA fixes (#43780) · 9edbe4aa

Committed by Aganlengzi on Jun 24, 2022

* Use all sitepackages path as the library/include path (#42940)

* Fix several unit tests and increase the unit tests stability (#43670)

* Reduce gather op unit tests size and increase the timeout

* Add NVIDIA_TF32_OVERRIDE for multi-processes environment

* Remove record test for device event ut

* Fix 3 unittest errors (#43532)

* Fix test_fuse_resnet_unit failure

* Fix test_imperative_auto_mixed_precision failure

* Fix sparse_attention_op error

* Fix sparse_attention_op error

* Use fixed random seed (#43659)

* for CI test_collective_sendrecv_api
Co-authored-by: zlsh80826 <rewang@nvidia.com>
Co-authored-by: Shijie <505749828@qq.com>

[cherry pick] fix structure infos conflict in static return_list mode (#43691) · e700ffdc
Committed by Kaipeng Deng on Jun 24, 2022
```
* fix structure infos conflict in static return_list mode. test=develop

* fix format. test=develop

* fix format. test=develop
```
23 Jun, 2022 · 2 commits
- remove slowing down pass (#43750) · 096eb801
  Committed by lidanqing on Jun 23, 2022
- [cherry pick][Inference]Enhance gpu multihead matmul v3 fuse pass (#43765) · 94bacb47
  Committed by WJJ1995 on Jun 23, 2022
22 Jun, 2022 · 7 commits

Cherry pick 43307 (#43618) · d0bbf46c

Committed by ccrrong on Jun 22, 2022

* add bilinear_interp_v2 converter

* update op_teller.cc

* add unittest for bilinear_interp_v2 converter

* code format

* bug fix

* code format and add unittest

* remove merged modify in op_teller.cc

* code format

* code format

* fix scale init error

[Cherry-pick]to Release/2.3, Improve MSRAInitializer (#43721) · 1aafc31b
Committed by Jackwaterveg on Jun 22, 2022
```
* fix conflict

* improve the doc
```
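Optimize linspace to avoid GPU -> CPU copy. (#42750) (#43746) · 4dcfc6df

Committed by Yiqun Liu on Jun 22, 2022

Cherry-pick of #42750.

QA reported that with the optimization in #42750 the performance of the solov2 model can improve by 6%, so it is cherry-picked to 2.3. Because #41096, which moved the Python implementation of linspace from fluid.layers.tensor to paddle.tensor.creation, is not in the release/2.3 branch, the Python changes of #42750 are applied to fluid.layers.tensor.linspace instead.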

Cherry-pick PR#43237 from develop (#43685) · e90dfaf7

Committed by shiyutang on Jun 22, 2022

* merge_release_and_dev

* merge_release_dev

* update

* Use tempfile to place the temporary files (#43237)

* tempfile_fix

* update

* fix_CI

* update_word2vec.inference.model

* remove_change_in_word2vec_book

* fix_word2vec_book

* rm_affine

* update

fix the bug that _DataLoaderIterMultiProcess use time to generate the seed (#43318) (#43702) · f4c42389
Committed by Zhang Ting on Jun 22, 2022
```
 fix the bug that _DataLoaderIterMultiProcess use time to generate the seed

cherry-pick #43318
```
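set_state_dict not use state_dict hook (#43407) (#43711) · 0fb66355

Committed by zhangbo9674 on Jun 22, 2022

During development of the amp-o2 feature, a state_dict hook was added to support specifying the data type in which a network is saved. However, Layer's set_state_dict obtains the network parameters through state_dict and loads into them, so with the hook in place set_state_dict could not load into the network's original parameters.
This PR adds a switch that controls the hook and disables the hook inside set_state_dict to solve the problem.

See PR43407 for details.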
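
The interaction described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Paddle's Layer code: `ToyLayer`, `rounding_hook`, and the `use_hook` parameter name are hypothetical, and plain lists stand in for parameters.

```python
class ToyLayer:
    """Hypothetical stand-in for a layer with state_dict hooks."""

    def __init__(self):
        self._params = {"weight": [1.0, 2.0]}
        self._state_dict_hooks = []

    def register_state_dict_hook(self, hook):
        self._state_dict_hooks.append(hook)

    def state_dict(self, use_hook=True):
        # Shallow copy: the values are the live parameter objects.
        state = dict(self._params)
        if use_hook:
            for hook in self._state_dict_hooks:
                hook(state)
        return state

    def set_state_dict(self, new_state):
        # Hooks are disabled so the writes below land in the live parameters
        # rather than in converted copies produced by a hook.
        own = self.state_dict(use_hook=False)
        for name, value in new_state.items():
            if name in own:
                own[name][:] = value


def rounding_hook(state):
    """Toy hook: replaces every entry with a converted copy for saving."""
    for name in list(state):
        state[name] = [round(v, 1) for v in state[name]]


layer = ToyLayer()
layer.register_state_dict_hook(rounding_hook)
layer.set_state_dict({"weight": [5.0, 6.0]})
print(layer._params["weight"])  # [5.0, 6.0]
```

If `set_state_dict` called `state_dict()` with the hook enabled, the in-place write would go into the hook's copy and the live parameters would keep their old values, which is the failure mode the commit message describes.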

[Fix bug]layer to 'NoneType' object has no attribute 'place' (#43597) (#43717) · 0b879318

Committed by zhangbo9674 on Jun 22, 2022

Bug:
When the _buffers of a Layer contain an entry that is None, calling to() raises the error: layer to 'NoneType' object has no attribute 'place'.
Fix:
to() now checks for None entries in _buffers and skips any entry that is None.
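
The guard described above amounts to a None check before touching the buffer. A minimal sketch, assuming a hypothetical `ToyBuffer` class and `move_buffers` helper rather than Paddle's actual to() implementation:

```python
class ToyBuffer:
    """Hypothetical stand-in for a tensor that records its device."""

    def __init__(self, place):
        self.place = place


def move_buffers(buffers, place):
    """Move every non-None buffer to `place`, skipping None entries."""
    for name, buf in buffers.items():
        if buf is None:
            continue  # reading buf.place here would raise AttributeError
        buf.place = place


buffers = {"running_mean": ToyBuffer("cpu"), "unused": None}
move_buffers(buffers, "gpu:0")
print(buffers["running_mean"].place)  # gpu:0
```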

21 Jun, 2022 · 2 commits
- [Cherry-pick ] to Release/2.3, Add prefetch_factor in dataloader (#43674) · af415bc2
  Committed by Jackwaterveg on Jun 21, 2022
```
* fix usage of prefetch_factor

* add assert

* add docstring and change prefetch_factor when num_workers=0

* fix doc
```
- [cherry pick #43088 #40664] Add float16 to fake quantize/dequantize OP (#43689) · 9783e887
  Committed by Guanghua Yu on Jun 21, 2022
```
* cherry pick #43088 #40664

* fix clang format
```
20 Jun, 2022 · 5 commits
- [cherry-pick]to Release/2.3,modify scale op xpu unittest (#43657) · 6262efb5
  Committed by z8hanghuan on Jun 20, 2022
```
* modify xpu.cmake,*test=kunlun (#41832)

* modify xpu.cmake,*test=kunlun

* modify xpu.cmake,*test=kunlun

* modify xpu.cmake,*test=kunlun

* modify xpu.cmake,*test=kunlun

* support bilstm,*test=kunlun

* [cherry-pick]support multi_layer of bilstm,*test=kunlun

* [cherry-pick]refactor sum unit test,*test=kunlun (#43561)
```
- [Cherry pick] Einsum memory optimization PR #43397 (#43554) · 638b69dc
  Committed by xiongkun on Jun 20, 2022
```
* cherry pick from #43397

* fix code
```
- fix unittest (#43609) (#43617) · 68d5c12b
  Committed by Shang Zhizhou on Jun 20, 2022
- place all save/load path into temporary directory (#43652) · a5ccc713
  Committed by zhaoyingli on Jun 20, 2022
- [Cherry-Pick] place all save/load path into temporary directory (#43316) (#43651) · 0f16ccf5
  Committed by zhaoyingli on Jun 20, 2022
```
* place all save/load path into temporary directory

* rm no need unittest
```
17 Jun, 2022 · 2 commits

cherry pick 43581 (#43596) · 2eb60ddb
Committed by YuanRisheng on Jun 17, 2022

[cherry-pick 2.3] Cherry parallel fused transformer api (#43505) · 19b87aec

Committed by WangXi on Jun 17, 2022

* Rename dropout is test (#43098)

* replace dropout_is_test with is_test.
* improve atol on a100.

* fused_attention fused_feedforward api support Model Tensor Parallel (#42985)

* fix is_test bug in fused_feedforward. (#43508)
Co-authored-by: Li Min <11663212+limin2021@users.noreply.github.com>

16 Jun, 2022 · 3 commits

[cherry pick] Unit test with tempfile to place the temporary files (#43522) · 1a660c8a

Committed by zhangbopd on Jun 16, 2022

Use tempfile in unit tests and custom op tests to hold temporary files, so that every temporary file is deleted once a test finishes and no files are left on disk.
The PR only modifies unit tests and op tests and does not affect existing functionality.
The release/2.3 branch is modified in PR43521.
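
The pattern described above can be sketched with the standard library alone. This is a generic illustration, not a test from the Paddle repository: the class name `SaveLoadTest` and the file name `model.params` are hypothetical.

```python
import os
import tempfile
import unittest


class SaveLoadTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own directory, created under the system temp dir.
        self.temp_dir = tempfile.TemporaryDirectory()

    def tearDown(self):
        # Removes the directory and everything written into it.
        self.temp_dir.cleanup()

    def test_roundtrip(self):
        path = os.path.join(self.temp_dir.name, "model.params")
        with open(path, "w") as f:
            f.write("weights")
        with open(path) as f:
            self.assertEqual(f.read(), "weights")
```

Because cleanup happens in `tearDown`, the files are removed whether the test passes or fails, and nothing is written into the source tree.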

[Cherry-pick] Fix ut tempfile v23 (#43387) · 24843fcb
Committed by Qi Li on Jun 16, 2022
```
* fix unit test temp file, test=develop (#43155)

* add cleanup code, test=develop (#43305)
```
[Cherry-pick] Fix numpy 1.20+ deprecation warnings (#43513) · 689e0999

Committed by Qi Li on Jun 16, 2022

* Fix numpy 1.20+ deprecation warnings (#42929)

* Replace np.bool/np.bool8 with np.bool_

* Replace np.object with np.object_

* Replace np.complex with np.complex128

* Replace np.float with np.float64

* Replace np.int with np.int_

* Rerun pre-commit for newer pre-commit configuration

* Use builtin bool instead of np.bool_ based on the context

* fix mode dtype
Co-authored-by: zlsh80826 <rewang@nvidia.com>
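
The replacements listed above look like this in user code. The snippet is an illustrative sketch that assumes NumPy is installed; the variable names are made up and none of it is taken from the Paddle sources.

```python
import numpy as np

# Deprecated alias -> replacement, as listed in the commit message.
REPLACEMENTS = {
    "np.bool": "np.bool_",
    "np.bool8": "np.bool_",
    "np.object": "np.object_",
    "np.complex": "np.complex128",
    "np.float": "np.float64",
    "np.int": "np.int_",
}

mask = np.zeros(3, dtype=np.bool_)        # was: dtype=np.bool
values = np.ones(3, dtype=np.float64)     # was: dtype=np.float
index = np.arange(3, dtype=np.int_)       # was: dtype=np.int
phase = np.zeros(3, dtype=np.complex128)  # was: dtype=np.complex
items = np.empty(3, dtype=np.object_)     # was: dtype=np.object

# Where a plain Python flag is wanted, the builtin bool is enough.
any_set = bool(mask.any())
print(mask.dtype, values.dtype, index.dtype, phase.dtype, items.dtype, any_set)
```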
