Commits · 930163316e17b198ae44bbf51b775789e3a1672f · PaddlePaddle / Paddle

Feb 21, 2022 (5 commits)

[bf16] add bf16 kernel: elementwise_max (#39461) · 93016331

Committed by zhangbo9674 on Feb 21, 2022

* add elementwise_max & unittest

* refine cuda register and unittest

* refine unittest

* refine uinttest for bf16

* refine optest

* refine code

* refine unittest

* refine unittest

update unittests for activation ops on xpu test=kunlun (#39677) · 83dd7e47

Committed by houj04 on Feb 21, 2022

* update unittests for activation ops on xpu. test=kunlun

* update input data range. test=kunlun

* update input data range. test=kunlun

[PTen]Remove infershape of Reshape OP (#39631) · 45dd4a5f

Committed by YuanRisheng on Feb 21, 2022

* remove infershape and Xshape

* add xshape

* fix bugs when run ci

* fix bugs when run ci

* fix bugs when run infrt test

* pass converage

update unittest for reduce_prod op on xpu. test=kunlun (#39671) · f858b645

Committed by houj04 on Feb 21, 2022

* update unittest for reduce_prod op on xpu. test=kunlun

* update unittest for reduce_prod op on xpu. test=kunlun

* bugfix: use dtype instead of float32. test=kunlun

fix alignment bug (#39747) · 65ced1fa

Committed by sneaxiy on Feb 21, 2022

Feb 20, 2022 (2 commits)

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

Committed by Chen Weihang on Feb 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

Add int16 support for several ops (#39636) · 267275d9

Committed by sneaxiy on Feb 20, 2022

* add more op int16 support

* fix xpu ci

Feb 19, 2022 (3 commits)

Enabled test_matmul_v2_op for final state Eager Dygraph (#39504) · 77625d7d

Committed by Zhanlue Yang on Feb 19, 2022

* Enabled test_matmul_v2_op for final state Eager Dygraph

* Fixed minor issue

* Fixed format issue

[Pten] Adjust the params of creation kernel for inference (#39573) · 4e5d6743

Committed by zyfncg on Feb 19, 2022

* remove manual_api

* change sig map of full and empty

* fix fill_any_like_xpu_op

* fix fill_any_like_xpu_op

* fix problem of fill_any_like_xpu_op

* fix conflict

* polish code

Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61

Committed by sneaxiy on Feb 19, 2022

* add DistributedFusedLamb op

* polish code

* fix compile error

* compatible with pten changement

* fix rocm compile error

* improve converage

* update upstream/develop

* fix cast_with_ptr.h

* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1

* fix clip before allreduce

* add use_master_param_norm

* code polish

* fix bug

* fix ROCM ci

Feb 18, 2022 (9 commits)

bug fix (#39630) · bbf31a4e

Committed by zhaoyingli on Feb 18, 2022

[AMP] support GPU BF16 amp for dygraph (#39029) · 7d6d3848

Committed by zhangbo9674 on Feb 18, 2022

* support dtype param for auto_cast

* add amp_dtype for tracer

* add unsupported bf16 list

* support bf16 amp for O2

* refine python interface for bfloat16

* refine code

* refine code

* refine unittest

* refine code

* refine code

* add bf16 o1

* refine code by comment

* add gradient accumulator

* add recompute

Fix sharding group (#39668) · bc3ca678

Committed by Baibaifan on Feb 18, 2022

* fix_sharding_group

* fix_sharding_group
  
refactor the forward implementation of shape npu op (#39613) · e674af23

Committed by baoachun on Feb 18, 2022

new way of unit test , *test=kunlun (#39650) · c5179772

Committed by z8hanghuan on Feb 18, 2022

* new way of unit test , *test=kunlun

* new way of ut, *test=kunlun

[Pten] Support inplace and intermediate in C++ API (#39651) · 638aab6e

Committed by zyfncg on Feb 18, 2022

* support inplace and intermediate in yaml

* add cmake for dygraph_api

[MLU]add matmul and matmul_v2 op (#39539) · 229ec32a

Committed by qipengh on Feb 18, 2022

* [MLU]add matmul and matmul_v2 op

* [MLU] fix data_type and del matmul

* [MLU] fix compile error

* [MLU] fix ci_check error
  
add flatten op pytest (#39534) · 50fb57c9

Committed by joeqiao12 on Feb 18, 2022

[MLU]add sync stream ops and broadcast pytest (#39518) · d2bd05b9

Committed by zn on Feb 18, 2022

* [MLU]add sync stream ops and broadcast pytest

* [MLU]fix broadcast pytest to add data type

Feb 17, 2022 (8 commits)
  
add reshape2 op for mlu (#39562) · 2d2f11d1

Committed by joeqiao12 on Feb 17, 2022

fix selected_rows bug in C++ API (#39658) · b72d4cb4

Committed by zyfncg on Feb 17, 2022
  
refactoring where/where_index/scatter unittests for kunlun, *test=kunlun (#39619) · a3247ab5

Committed by TTerror on Feb 17, 2022

add softplus op for kunlun2. test=kunlun (#39555) · 9f99b591

Committed by houj04 on Feb 17, 2022

* add softplus op for kunlun2. test=kunlun

* add softplus op for kunlun2. test=kunlun

* fix code style. test=kunlun

* fix code style. test=kunlun

* add more test cases. test=kunlun

adaptive pool2d pass fix (#39600) · c1c5c1fc

Committed by wenbin on Feb 17, 2022

* first commit

* teller fix

* bug fix

* enable for pool2d only

* fix global_pooling issue

* pooling_type

* fix test
  
optimizer sharding paramters (#39581) · 18c6f40b

Committed by Baibaifan on Feb 17, 2022

update kunlun label_smooth unitest (#39611) · 1f7f8561

Committed by QingshuChen on Feb 17, 2022

* update kunlun label_smooth unitest
*test=kunlun

* minor
*test=kunlun

update inference ut to support nhwc format (#39551) · b4d3597a

Committed by baoachun on Feb 17, 2022

* update inference ut to support nhwc format

* update ut and pass OpCompat

* update ut

* update ut

Feb 16, 2022 (13 commits)

[Eager] Support eager hook_for_layer (#39531) · a909bdf1

Committed by Weilong Wu on Feb 16, 2022

* Update comment

* [Eager] Support test_imperative_hook_for_layer with _test_eager_guard()

* Polish code name style

* Fix a error name

* Polish code, make it clear and simple

[MLU] fix TensorAdd for mlu (#39523) · 24b8f63e

Committed by fwenguang on Feb 16, 2022

optimize prior_box for kunlun, *test=kunlun (#39477) · e254e7c6

Committed by TTerror on Feb 16, 2022

[MLU] support adative pooling (#39500) · f138371c

Committed by fwenguang on Feb 16, 2022

[Dy2St]Refine AnnAssign in static_analysis (#39572) · eb3c7d00

Committed by 0x45f on Feb 16, 2022

Add ConditionalBlockGradInferVarType (#39585) · ff7e3590

Committed by Aurelius84 on Feb 16, 2022

[bf16] pten matmul cuda kernel support bf16 (#39485) · d5a0d31a

Committed by Leo Chen on Feb 16, 2022

* pten matmul cuda kernel support bf16

* fix pten kernel name

* add matmul_grad bf16 kernel

* add emptylike bf16 kernel

* fix compile

* suppport rocm

* fix error

* fix rocm

* add bf16 header file

* fix compile

EagerTensor to EagerVariable (#39447) · 831fd86e

Committed by Jiabin Yang on Feb 16, 2022

* merge legacy to fluid

* Remove legacy code

* Remove legacy code

* Remove DataType test

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

* refine code

* Rename all EagerTensor to Tensor

* Rename some EagerTensor to Tensor

* rename EagerTensor to EagerVariable

* add more test

* merge develop and refine code

refactor huber_loss/argsor unittests for kunlun, *test=kunlun (#39527) · f21d7957

Committed by TTerror on Feb 16, 2022

Test only trt group norm (#39561) · ac894ced

Committed by zlsh80826 on Feb 16, 2022

sync/geo test ok & fix heter_worker program ok (#39511) · b2986bab

Committed by ziyoujiyi on Feb 16, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unitest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .

* cpu-async-ps minimize test ok & gpu minimize test ok

* add heter 2stage unittest

* add heter 2stage unittest

* add heter 2stage unittest

* sync/geo test ok & fix heter_worker program ok

* .
Co-authored-by: Nzkh2016 <zhangkaihuo@baidu.com>

fix ut for pinv (#39566) · 0bcf1365

Committed by andyjpaddle on Feb 16, 2022

Update test_linalg_cond.py (#39574) · 644a894d

Committed by Haohongxiang on Feb 16, 2022

add control of random seed in UT of cond

PaddlePaddle / Paddle · last synced successfully more than 1 year ago
