Commits · c0cf5cb735261d4bd48a951758dd78a745a3b9ca · PaddlePaddle / Paddle

28 Jun, 2022 6 commits

Apply IOU to test_parallel_executor_seresnext_base_gpu (#43812) · c0cf5cb7

Authored by Ming-Xu Huang on Jun 28, 2022

1. test_parallel_executor_seresnext_base_gpu failed on 2 P100 GPUs with `470.82` driver.
```
======================================================================
FAIL: test_seresnext_with_learning_rate_decay (test_parallel_executor_seresnext_base_gpu.TestResnetGPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/paddle/paddle/build/python/paddle/fluid/tests/unittests/test_parallel_executor_seresnext_base_gpu.py", line 32, in test_seresnext_with_learning_rate_decay
    self._compare_result_with_origin_model(
  File "/opt/paddle/paddle/build/python/paddle/fluid/tests/unittests/seresnext_test_base.py", line 56, in _compare_result_with_origin_model
    self.assertAlmostEquals(
AssertionError: 6.8825445 != 6.882531 within 1e-05 delta (1.335144e-05 difference)
----------------------------------------------------------------------
```
2. To evaluate loss convergence more accurately, we propose using IOU as the metric instead of comparing only the first and last loss values.
3. Following an offline discussion, we also evaluated convergence on P100 and A100 over 1000 iterations to make sure this UT has the same convergence behavior on both devices. The curves are shown below.
![A100-Single, P100-Single and Diff (1)](https://user-images.githubusercontent.com/13541238/175461920-25df6101-6dd8-4387-862c-d1c8e9299c57.png)
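
The commit message does not spell out how the IOU is computed. As a minimal sketch, assuming "IOU" means the area under the pointwise minimum of two loss curves divided by the area under their pointwise maximum, the comparison could look like the following. The function name `loss_curve_iou` and the sample curves are hypothetical and are not taken from the Paddle test code; only the final pair of loss values comes from the failure log above.

```python
def loss_curve_iou(losses_a, losses_b):
    """Return an IOU-style overlap in [0, 1] for two non-negative loss curves.

    Assumption: "intersection" is the area under the pointwise minimum and
    "union" is the area under the pointwise maximum (unit step between points).
    """
    if len(losses_a) != len(losses_b):
        raise ValueError("loss curves must have the same length")
    if any(x < 0 for x in losses_a) or any(x < 0 for x in losses_b):
        raise ValueError("loss values must be non-negative")
    intersection = sum(min(a, b) for a, b in zip(losses_a, losses_b))
    union = sum(max(a, b) for a, b in zip(losses_a, losses_b))
    # Two all-zero curves overlap perfectly.
    return 1.0 if union == 0 else intersection / union


# Illustrative curves: the first two points are made up, the last pair is
# the pair of loss values reported in the failing assertion.
curve_a = [9.0, 7.5, 6.8825445]
curve_b = [9.0, 7.5, 6.882531]

# Check whether the last points differ by more than the 1e-05 delta.
print(abs(curve_a[-1] - curve_b[-1]) > 1e-5)
# Check whether the curves as a whole still overlap almost perfectly.
print(loss_curve_iou(curve_a, curve_b) > 0.99999)
```

A whole-curve metric of this kind is less sensitive to a small deviation at a single iteration than a strict comparison of the final loss value, which is the motivation stated in item 2.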

c0cf5cb7

[MLU]add mlu kernel for where_index op (#43720) · ec5f8cfd
Authored by fuyou765 on Jun 28, 2022

ec5f8cfd
【Sparse】add SparseTensor mv kernel(csr*dense_vec->dence_vec, coo*dense_vec->dense_vec) (#43668) · 5161a047
Authored by zhouweiwei2014 on Jun 28, 2022
```
* [Sparse]add SparseTensor mv kernel(csr*dense_vec->dence_vec, coo*dense_vec->dense_vec)

* fix CI
```
5161a047

[ASP] fix some bugs of asp (#43853) · 6aeb60aa
Authored by minghaoBD on Jun 28, 2022

6aeb60aa

fix squeeze2/unsqueeze2 unittest, *test=kunlun (#43859) · b34b54db
Authored by zhangxiaoci on Jun 28, 2022

b34b54db

Add forward_gradients api and enable high-order differentiation for Jacobian/Hessian (#43354) · a97a8dd1

Authored by Xiaoxu Chen on Jun 28, 2022

* enable Jacobian,Hessian supporting new autograd

* fix prim mode failed in PR-CI-Windows

* add forward_gradients api

* add forward_gradients api

* skip test_autograd_functional_prim in windows ci

* fix test_autograd_funciton_prim timeouot

* remove the block parameter in prim2orig method

* remove duplicate to_tensors code snippet # test=allcases

a97a8dd1

27 Jun, 2022 4 commits
- [Dy2Stat]Refactor convert_shape transformer logic (#43846) · d82d5b8c
  Authored by Aurelius84 on Jun 27, 2022
```
* [Dy2Stat]Refactor convert_shape transformer logic

* clean usless unittest
```
  d82d5b8c
- [Eager] Rename EagerPyLayer to PyLayer (#43696) · a5dc0a79
  Authored by wanghuancoder on Jun 27, 2022
```
* rename eagerpylayer
```
  a5dc0a79
- [CustomDevice]add custom place supports (#43813) · 7f22ef54
  Authored by Aganlengzi on Jun 27, 2022
```
* [CustomDevice]add custom place supports

* sync format
```
  7f22ef54
- [Dy2Stat]Enhance nonlocal machanism while nonlocal vars is empty (#43848) · 40a77319
  Authored by Aurelius84 on Jun 27, 2022
  
  40a77319
24 Jun, 2022 12 commits

Fix hang bug of TCPStore (#43724) · 4c9330d6

Authored by gongweibao on Jun 24, 2022

* tmp fix

* init

* compile ok

* compile ok

* add vlogs

* add test

* fix termination error

* add testfile

* add

* fix window compile

* fix window compile

* fix windows compile

* fix windows compile

* fix windows compile

* fix windows compile

* fix windows compile

* fix windows compile

* fix kunlun compile

* fix compilation

* fix compilation

* fix compilation

* tmp fix

* add windows

* add windows

* add more logs

* change timeout to protected

* SB

* add

* add

* fix timeout

* add

* fix test

* fix test

* fix test

* fix ut

* fix ut

* fix ut

4c9330d6

fix quantization clip and round Attribute (#43764) · 491b87b4
Authored by Guanghua Yu on Jun 24, 2022

491b87b4

[ Dy2Static ] Add closure analysis for control flow and add some unittest (#43713) · 69717717

Authored by xiongkun on Jun 24, 2022

* add closure analysis for control flow and add some unittest

* finetune the design of FunctionScopeVisitor

* fix

* fix python check

* fix code by code review

69717717

add slice plugin int32 support (#43808) · af97b310
Authored by ccrrong on Jun 24, 2022
```
* add slice plugin int32 support
```
af97b310
[Sparse] support batch compute of SparseTensor matmul/masked_matmul/softmax (#43703) · eec4e034
Authored by zhouweiwei2014 on Jun 24, 2022

eec4e034

[MLU]add mlu kernel for set_value op (#43687) · fa9586a7
Authored by fuyou765 on Jun 24, 2022

fa9586a7

modify xpu unittest to support fp64, *test=kunlun (#43772) · 89c783db

Authored by z8hanghuan on Jun 24, 2022

* modify xpu unittest to support fp64, *test=kunlun

* modify xpu unittest to support fp64 for KL2, *test=kunlun

* modify xpu unittest to support fp64, *test=kunlun

* modify xpu unittest to support fp64, *test=kunlun

89c783db

add mlu label_smooth kernel (#43743) · 7d15f930
Authored by cifar10 on Jun 24, 2022

7d15f930

Fix incompatible error for custom op Placetype (#43749) · 03972d5a

Authored by Chen Weihang on Jun 24, 2022

* fix incompatible error

* rmeove default constructor

* add macro

* fix cpu make error

* add DefaultGPUPlace api

03972d5a

[MLU]add mlu kernel for tril_triu (#43444) · 73e3fc96
Authored by 光明和真理 on Jun 24, 2022

73e3fc96

add UTs for mlu interp_v2(nearest). (#43709) · d1a53649
Authored by Chenxiao Niu on Jun 24, 2022

d1a53649

[Auto Parallel] Use a fast completion for data parallelism (#43585) · e64823c1

Authored by Yulong Ao on Jun 24, 2022

* [Auto Parallel] Use a fast completion for data parallelism

* remove unuse cuSparse function

* [Auto Parallel] Fix some bugs of the fast dp completion

* [Auto Parallel] Add the cmake statements

* [Auto Parallel] Make the unittest adapt to the new interface

* [Auto Parallel] Modify the timeout of the unittest

* [Auto Parallel] Remove unnecessary comments
Co-authored-by: zhouwei25 <zhouwei25@baidu.com>

e64823c1

23 Jun, 2022 12 commits

improve LayoutAutoTune for NCHW and NHWC (#43158) · 69e99cc7
Authored by niuliling123 on Jun 23, 2022

69e99cc7

【Hackathon No.56 57 58 59】sparse elementwise add sub mul div (#41857) · e3d94fc5
Authored by Matsumoto Ruko on Jun 23, 2022

e3d94fc5

xpu-paddlepaddle-30 [Task] dropout paddle unit test, test=kunlun (#43716) · cefbf800
Authored by taixiurong on Jun 23, 2022

cefbf800

Fix elementwise_div UT by providing user defined gradients (#43536) · d4b44015
Authored by Leo Chen on Jun 23, 2022

d4b44015
Support setting version for api in yaml (#43771) · 766f4dcb
Authored by zyfncg on Jun 23, 2022
```
* move trace into api.yaml

* add trace unittest

* fix trace test

* fix generate op
```
766f4dcb

add float_only for layer_to (#43760) · f80cee11
Authored by zhangbo9674 on Jun 23, 2022

f80cee11

[Dy2Stat]Support nonlocal mechanism in IF ast transformer (#43666) · f9198372

Authored by Aurelius84 on Jun 23, 2022

* [Dy2Stat]Support nonlocal mechanism in IF ast transformer

* support prune return vars in cond

* fix unittest

* fix unittest

* fix static check

f9198372

add cast trt converter (#43447) · b6bf8994
Authored by ccrrong on Jun 23, 2022
```
* add cast trt converter
```
b6bf8994

Use fixed random seed (#43659) · 8902a414
Authored by Shijie on Jun 23, 2022

8902a414

Fix 3 unittest errors (#43532) · a9134dc2

Authored by Shijie on Jun 23, 2022

* Fix test_fuse_resnet_unit failure

* Fix test_imperative_auto_mixed_precision failure

* Fix sparse_attention_op error

* Fix sparse_attention_op error

a9134dc2

Fix several unit tests and increase the unit tests stability (#43670) · c41c5e63

Authored by zlsh80826 on Jun 23, 2022

* Reduce gather op unit tests size and increase the timeout

* Add NVIDIA_TF32_OVERRIDE for multi-processes environment

* Remove record test for device event ut

c41c5e63

[external reviewing] Params to int8 pass (#42625) · b8b2d6a9

Authored by Sylwester Fraczek on Jun 22, 2022

* sylwek

prototype params to int8 pass

* trying to make warmup work

* wip

* wip

* change test to cpp test

* review fixes, refactoring

* more refactoring

* add erasevars

* change test to fixture

* rename pass

and reorder erasevars and graphsaferemovenodes

* fix

* more refactoring and fixed bug

* formatting

* remove scale count

* enfroce message too short

* remove erasevars

erasevars couldbe cauuse of memory issues

some other fixes

* add count of successfull fuses to name of new nodes

* FindVar -> GetVar and use ConvResidual pattern

* use tensor->clear() instead of new variable

* Update paddle/fluid/framework/ir/mkldnn/params_quantization_mkldnn_pass_tester.cc
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

* Update paddle/fluid/framework/ir/mkldnn/params_quantization_mkldnn_pass_tester.cc
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

* Update paddle/fluid/inference/tests/api/analyzer_lexical_analysis_gru_tester.cc
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

* add log (review fix)c

* review fix (2 functions to one)

* code review: Conv->QuantizeConv

* revert

* fix formatting

* remove unused functions

* add paddle enforce
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

b8b2d6a9

22 Jun, 2022 6 commits
- fix dist lamb acc issue (#43712) · 5f33dbb6
  Authored by sneaxiy on Jun 22, 2022
  
  5f33dbb6
- add arg max trt converter support dynamic shape mode (#43473) · 292b7254
  Authored by ccrrong on Jun 22, 2022
```
* fix arg_max converter
```
  292b7254
- Enhance gpu multihead matmul v3 fuse pass (#43529) · 561d09b9
  Authored by WJJ1995 on Jun 22, 2022
```
* fixed multihead matmul fuse pass

* Add unittests

* rm scale op

* fixed code style

* fixed code style

* resolve testcase falied

* add note
```
  561d09b9
- [inference] add slice trt layer (#43648) · fcc8a87b
  Authored by zhoutianzi666 on Jun 22, 2022
```
* add fc, multihead_mul, shape tensor infer, slice
```
  fcc8a87b
- Fix batch csr (#43708) · d41a9373
  Authored by zhangkaihuo on Jun 22, 2022
  
  d41a9373
- Add gpups test in PR-CI-GpuPS (#43592) · fc5a85b0
  Authored by tianshuo78520a on Jun 22, 2022
```
* test=gpups
```
  fc5a85b0

PaddlePaddle / Paddle synced successfully about 1 year ago