Commits · a1174973032cd84619ff342f695077132c5f5b53 · PaddlePaddle / Paddle

11 Feb, 2022 13 commits

Optimize bilinear interpolation foward (#39243) · a1174973
Committed by Lijunhui on Feb 11, 2022
```
* bilinear_fw init

* optimize code

* pre-compute linear_interp input index
```

fix prelu trt convert (#39389) · c86765ed
Committed by JingZhuangzhuang on Feb 11, 2022

get build time (#39368) · 72ad280b
Committed by zhangchunle on Feb 11, 2022

fix compilation warning on mac (#39438) · be8ab0ea
Committed by zhangkaihuo on Feb 11, 2022

[PTen] Move grad GetExpectedPtenKernelArgs into pten (#39418) · 667bd962

Committed by Chen Weihang on Feb 11, 2022

* move grad get expected pten kernel args

* fix reduce sum error

* fix element_sub_grad failed

* revert kernel judge change

Unify ps development - python (#39431) · 22c67d14

Committed by ziyoujiyi on Feb 11, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unitest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .

* cpu-async-ps minimize test ok & gpu minimize test ok

Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>

[Paddle Inference] support ernie quant model with interleaved (#39424) · 1c44d3e2

Committed by Wangzheee on Feb 11, 2022

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

* support ernie quant model with interleaved

Add log for executor (#39459) · 7e52beae

Committed by liutiexing on Feb 11, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* add log for Executor

Co-authored-by: liutiexing <liutiexing@google.com>

[new-exec] set type of op-kernel op by place (#39458) · 7392578d
Committed by Leo Chen on Feb 11, 2022

add print pten kernel tool (#39371) · 8803f6bb

Committed by Shang Zhizhou on Feb 11, 2022

* test=document_fix;add print pten kernel tool

* test=document_fix

* test=document_fix

* test=document_fix

* test=document_fix

* add print_pten_kernels tool

* add print_pten_kernels tool

* fix windows complie

* notest,test=rocm_ci

* add merge tool

* add comments

Add profiler node tree implementation (#39316) · f38c2e5c

Committed by chenjian on Feb 11, 2022

* add event node implementation

* modify profiler.stop interface

* fix according to review

* fix file mode

* modify class method name in event_node.cc

* modify LLONG_MAX to ULLONG_MAX

* fix ci error

* fix ci error

Support different dtypes of inputs for elementwise ops (#38859) · bf305033
Committed by Zhang Ting on Feb 11, 2022
```
* improve backward performance

* support different dtypes for elementwise ops
```

【Pten】Auto-Generate InterMeta register (#39436) · 7d6096ff

Committed by zyfncg on Feb 11, 2022

* fix code conflict

* generate inter_meta register

* clear cache

* just try

* add sign c++ api

* polish some code

10 Feb, 2022 21 commits
- [Dy2St]Handle `a, b = paddle.shape(x)` in Static Analysis (#39245) · 1252f4bb
  Committed by 0x45f on Feb 10, 2022
```
* refine Assign

* add UT
```
- [MLU] add mlu kernel for accuracy op (#39337) · 383de295
  Committed by fwenguang on Feb 10, 2022
```
* [MLU] add mlu kernel for accuracy op

* fix license format

* fix error message
```
- [NPU] add reduce_min (#39019) · 2b8b16d7
  Committed by furnace on Feb 10, 2022
```
[NPU] add reduce_min
```
- share MemOptVarInfos of external variables into cinn_launch subgraph (#39209) · 35b03e1c
  Committed by TeFeng Chen on Feb 10, 2022
```
* add a graph pass to share MemOptVarInfos of external variables into subgraph

* update pass name

* fix compile failed

* add share_mem_opt_info_to_subgraph_pass test

* share_mem_opt_info_to_subgraph_pass_test pass

* modify some codes for better style and more robust

* update cmake
```
- change dtype of pooling mask to 'int32' for Paddle2ONNX (#39314) · 29d31606
  Committed by Wei Shengyu on Feb 10, 2022
```
* change dtype of pooling mask to 'int32' for Paddle2ONNX

* empty commit to rerun ci

* fix format
```
- Added python-c code generation for final state Eager Dygraph (#39233) · 43f84d0f
  Committed by Zhanlue Yang on Feb 10, 2022
```
* Removed debug info

* Added automatic code generation for final state Eager Dygraph

* Modified backward yaml

* Added EagerUtils helper functions for final state CodeGen

* Adjusted CMakeFiles to support compilation for final state auto generated codes

* Added python-c code generation for final state Eager Dygraph

* Fixed minor issue

* Fixed yaml.load() method failure

* Fixed minor issues

* Refactored Python-C Attributes Parsing Functions

* Fixed minor issue with Python-C AddFunctions

* Fixed issues from merge

* Fixed merge issues
```
- fix check error of ResetHolder (#39439) · f7a3389e
  Committed by zyfncg on Feb 10, 2022
- 【PaddlePaddle Hackathon】31. Add Java frontend for Paddle Inference (#37162) · 238f3c8e
  Committed by chenyanlann on Feb 10, 2022
- move Masked select to pten (#39193) · e2ad433b
  Committed by hong on Feb 10, 2022
```
* move masked select cpu kernel

* add masked selected gpu kernel; test=develop

* fix bugs; test=develop

* bug fix; test=develop

* bug fix; test=develop

* add namespace to set mask array; test=develop

* fix bug; test=develop

* fix bugs; test=develop

* fix ddim bug; test=develop

* fix npu op bug; test=develop

* fix xpu dependecy bug; test=develop

* move kernel args to sig.cc; test=develop
```
- fix compile error on jetson (#39441) · 8b58862a
  Committed by Wilber on Feb 10, 2022
- mkldnn layout issue fix (#39422) · 52d6b306
  Committed by wenbin on Feb 10, 2022
```
* mkldnn conv fix

* definetion
```
- Add _get_parameter method to Lamb optimizer (#39416) · c47d6729
  Committed by sneaxiy on Feb 10, 2022
```
* add _get_parameter func to lamb

* remove duplicate code
```
- Refactored Python-C Attributes Parsing Functions (#39328) · 32d79bb9
  Committed by Zhanlue Yang on Feb 10, 2022
- 【Pten】Refactor C++ API code-gen (#39408) · 7b70b792
  Committed by zyfncg on Feb 10, 2022
```
* refactor C++ API code-gen

* fix windows problem of C++ API
```
- Modify the unsqueeze dimension of input data in conv1d NCL And NLC format (#38425) · 224bc511
  Committed by crystal on Feb 10, 2022
```
* optimize conv1d forward

* add conv opt

* Optimize memory copy

* delete share data with

* set num_filters=512

* add nlc optimize

* Optimize num_filter=512 data on A100 and V100

* Fix the workspace_size size setting of filter
```
- [bf16] add bf16 kernel: squeeze & unsqueeze & stack (#39402) · 59c7aea5
  Committed by zhangbo9674 on Feb 10, 2022
```
* add squeeze unsqueeze stack

* add unittest

* add cpu kernel
```
- [bf16] add bf16 kernel: dropout & reshape & slice (#39395) · e8ac7fc3
  Committed by zhangbo9674 on Feb 10, 2022
```
* add dropout

* add reshape

* add slice

* refien slice unittest

* refine slice unittest

* add cpu bf16 kernel
```
- [pten] update isnan registration (#39419) · 14ed2f54
  Committed by Leo Chen on Feb 10, 2022
```
* update isnan registration

* fix compile
```
- [PTen] Add standard kernel suffix set (#39404) · c7c1db33
  Committed by Chen Weihang on Feb 10, 2022
```
* add standard_suffix_set_and_remove_reshape_with_xshape

* revert reshape change

* polish reduce name
```
- [PluggableDevice] custom kernel supports multi cpp_dtype registering (#39385) · 63d2333e
  Committed by Aganlengzi on Feb 10, 2022
- Fix code conflict of empty dev_api (#39430) · 2a5d858c
  Committed by zyfncg on Feb 10, 2022
```
* fix code conflict

* clear cache

* just try
```
09 Feb, 2022 6 commits

【Pten】Adjust the Empyt dev_api (#39143) · 9d4d0c3b
Committed by zyfncg on Feb 09, 2022
```
* adjust the Empyt dev_api

* fix merge conflict

* fix sparse_utils_kernel
```

Fix trace conflict (#39421) · 87f4a681

Committed by hong on Feb 09, 2022

* add trace op

* bug fix

* bug fix; test=develop

* thrust bug fix; test=develop

* remove useless register; test=develop

* fix bug; test=develop

* update trace kernel; test=develop

* move kernel args to trace_sig; test=develop

* try to fix trace kernel conflict; test=develop

Optimize performance of softmax_fwd when axis!=-1 (#38602) · 8e1b0204
Committed by Zhang Zheng on Feb 09, 2022
```
* Optimize performence of softmax_fwd when axis!=-1

* use functor

* support hip

* fix functor
```

optimize sharding stage3 offload (#39397) · b292dfb8
Committed by Baibaifan on Feb 09, 2022

[pten] fit pten for amp (#39403) · c5affb78
Committed by Leo Chen on Feb 09, 2022
```
* fit pten for amp

* fix typo
```

[Paddle-Inference] rebuild matmul pass: trt and gpu_cpu (#39369) · db7d129e

Committed by Wangzheee on Feb 09, 2022

* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu

PaddlePaddle / Paddle, synced successfully over 1 year ago
