Commits · c7c1db33a4491dfb1833ea83379fef1d134e195f · PaddlePaddle / Paddle

Feb 10, 2022 · 3 commits
- [PTen] Add standard kernel suffix set (#39404) · c7c1db33
  Committed by Chen Weihang on Feb 10, 2022
```
* add standard_suffix_set_and_remove_reshape_with_xshape

* revert reshape change

* polish reduce name
```
- [PluggableDevice] custom kernel supports multi cpp_dtype registering (#39385) · 63d2333e
  Committed by Aganlengzi on Feb 10, 2022
- Fix code conflict of empty dev_api (#39430) · 2a5d858c
  Committed by zyfncg on Feb 10, 2022
```
* fix code conflict

* clear cache

* just try
```
Feb 09, 2022 · 28 commits
- 【Pten】Adjust the Empyt dev_api (#39143) · 9d4d0c3b
  Committed by zyfncg on Feb 09, 2022
```
* adjust the Empty dev_api

* fix merge conflict

* fix sparse_utils_kernel
```
- Fix trace conflict (#39421) · 87f4a681
  Committed by hong on Feb 09, 2022
```
* add trace op

* bug fix

* bug fix; test=develop

* thrust bug fix; test=develop

* remove useless register; test=develop

* fix bug; test=develop

* update trace kernel; test=develop

* move kernel args to trace_sig; test=develop

* try to fix trace kernel conflict; test=develop
```
- Optimize performance of softmax_fwd when axis!=-1 (#38602) · 8e1b0204
  Committed by Zhang Zheng on Feb 09, 2022
```
* Optimize performance of softmax_fwd when axis!=-1

* use functor

* support hip

* fix functor
```
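With `axis != -1`, the values normalized together by one softmax are strided in memory rather than contiguous. A pure-Python sketch of the semantics (not the PR's CUDA/HIP functor), assuming a row-major input viewed as `[outer, axis_dim, inner]`, where `inner` is the product of the dimensions after the softmax axis (so `inner == 1` when `axis == -1`); `softmax_strided` is a hypothetical helper name:

```python
import math
from typing import List


def softmax_strided(data: List[float], outer: int, axis_dim: int, inner: int) -> List[float]:
    """Softmax over the middle dimension of a row-major [outer, axis_dim, inner] buffer.

    Elements of one softmax group sit `inner` positions apart in memory.
    """
    if len(data) != outer * axis_dim * inner:
        raise ValueError("buffer size does not match shape")
    out = [0.0] * len(data)
    for o in range(outer):
        for i in range(inner):
            base = o * axis_dim * inner + i
            group = [base + k * inner for k in range(axis_dim)]  # strided indices
            m = max(data[j] for j in group)  # subtract the max for numerical stability
            exps = [math.exp(data[j] - m) for j in group]
            total = sum(exps)
            for j, e in zip(group, exps):
                out[j] = e / total
    return out


# A [2, 3] tensor, softmax along axis 0: outer=1, axis_dim=2, inner=3.
x = [1.0, 2.0, 3.0,
     1.0, 0.0, 5.0]
y = softmax_strided(x, outer=1, axis_dim=2, inner=3)
```

Each column of `y` sums to 1; the stride `inner` is what a GPU kernel has to handle efficiently in this case.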
- optimize sharding stage3 offload (#39397) · b292dfb8
  Committed by Baibaifan on Feb 09, 2022
- [pten] fit pten for amp (#39403) · c5affb78
  Committed by Leo Chen on Feb 09, 2022
```
* fit pten for amp

* fix typo
```
- [Paddle-Inference] rebuild matmul pass: trt and gpu_cpu (#39369) · db7d129e
  Committed by Wangzheee on Feb 09, 2022
```
* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu
```
- Replace EigenBroadcast with ElementwiseBroadcast in ReduceGrad (#39255) · 772be4f5
  Committed by niuliling123 on Feb 09, 2022
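A broadcast appears in `ReduceGrad` because the gradient of a reduction is the upstream gradient broadcast back over the reduced axes (scaled by `1/N` for a mean). A pure-Python sketch of that math for a 2-D input reduced over `axis=1`; which reductions this PR touches is not listed here, and the helper names are hypothetical:

```python
from typing import List


def reduce_sum_grad_axis1(dy: List[float], cols: int) -> List[List[float]]:
    """Gradient of y = sum(x, axis=1) for x of shape [len(dy), cols].

    Each dy[r] is broadcast across the reduced axis.
    """
    return [[g] * cols for g in dy]


def reduce_mean_grad_axis1(dy: List[float], cols: int) -> List[List[float]]:
    """Gradient of y = mean(x, axis=1): the broadcast gradient scaled by 1/cols."""
    return [[g / cols] * cols for g in dy]


dx = reduce_sum_grad_axis1([1.0, 2.0], cols=3)
# dx == [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
```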
- infershaped autogen (PR #1 ), test=develop (#39405) · b3e049f8
  Committed by 石晓伟 on Feb 09, 2022
- [MLU] add mlu kernel for c_comm_init op (#39364) · 1bd7a143
  Committed by mhhhh1 on Feb 09, 2022
- [MLU] add gaussian_random mlu kernel (#39338) · c35b4b8e
  Committed by fwenguang on Feb 09, 2022
- [mlu] add mlu kernel for momentum op (#39331) · f8ba12e5
  Committed by fwenguang on Feb 09, 2022
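For reference, the update rule documented for Paddle's `momentum` op (as we read the op's docs; the MLU kernel itself is not shown in this log) is `velocity = mu * velocity + grad`, followed by `param -= lr * velocity`, or `param -= lr * (grad + mu * velocity)` with Nesterov momentum. A pure-Python sketch with a hypothetical `momentum_step` helper:

```python
from typing import List, Tuple


def momentum_step(param: List[float], grad: List[float], velocity: List[float],
                  lr: float, mu: float, use_nesterov: bool = False
                  ) -> Tuple[List[float], List[float]]:
    """One momentum update; returns (new_param, new_velocity)."""
    new_v = [mu * v + g for v, g in zip(velocity, grad)]
    if use_nesterov:
        new_p = [p - lr * (g + mu * v) for p, g, v in zip(param, grad, new_v)]
    else:
        new_p = [p - lr * v for p, v in zip(param, new_v)]
    return new_p, new_v


# Two plain-momentum steps with a constant gradient.
p, v = momentum_step([1.0], [0.5], [0.0], lr=0.1, mu=0.9)
p, v = momentum_step(p, [0.5], v, lr=0.1, mu=0.9)
```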
- [mlu] add mlu kernel for elementwise_add (#39313) · d47a511a
  Committed by fwenguang on Feb 09, 2022
- Replace EagerTensor with Tensor (#39376) · 945a3ce9
  Committed by Jiabin Yang on Feb 09, 2022
```
* merge legacy to fluid

* Remove legacy code

* Remove legacy code

* Remove DataType test

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer
```
- Add a Sparse Op to_dense (#39335) · aca86470
  Committed by zhangkaihuo on Feb 09, 2022
```
* implement AllocateFrom

* dense_to_sparse_coo

* optimize unit testing; support rocm

* 1. delete fluid related header file
2. update the copyright

* fix hipMemcpy

* update dense_to_sparsecoo

* add namespace sparse

* sparse_csr_to_dense

* test to_sparse_coo: csr_to_coo

* fix writing error

* to_sparse_csr: dense_to_sparse_csr and sparse_coo_to_csr

* fix check shape

* fix unit test

* to_dense: sparse_coo_to_dense, sparse_csr_to_dense

* replace CUDADeviceContext by GPUContext
```
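`to_dense` inverts the two standard sparse layouts: COO stores one (row, col) coordinate per non-zero, while CSR stores column indices plus a row-offset array. A pure-Python sketch for the 2-D case; the function and argument names are illustrative, not the signatures of Paddle's `sparse_coo_to_dense` / `sparse_csr_to_dense` kernels:

```python
from typing import List, Sequence, Tuple


def coo_to_dense(rows: Sequence[int], cols: Sequence[int], values: Sequence[float],
                 shape: Tuple[int, int]) -> List[List[float]]:
    """Scatter (row, col, value) triplets into a zero-filled 2-D matrix."""
    n_rows, n_cols = shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, values):
        dense[r][c] = v  # assumes no duplicate coordinates
    return dense


def csr_to_dense(crows: Sequence[int], cols: Sequence[int], values: Sequence[float],
                 shape: Tuple[int, int]) -> List[List[float]]:
    """Row r owns the non-zeros in the half-open slice crows[r]:crows[r + 1]."""
    n_rows, n_cols = shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for k in range(crows[r], crows[r + 1]):
            dense[r][cols[k]] = values[k]
    return dense


# [[0, 1, 0],
#  [2, 0, 3]] stored in both formats
a = coo_to_dense([0, 1, 1], [1, 0, 2], [1.0, 2.0, 3.0], (2, 3))
b = csr_to_dense([0, 1, 3], [1, 0, 2], [1.0, 2.0, 3.0], (2, 3))
```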
- Rename partial function name TensorReduceFunctorImpl to TensorReduceImpl. (#39387) · 6354f81c
  Committed by Yiqun Liu on Feb 09, 2022
- Move trace op to pten (#39227) · d7dddf94
  Committed by hong on Feb 09, 2022
```
* add trace op

* bug fix

* bug fix; test=develop

* thrust bug fix; test=develop

* remove useless register; test=develop

* fix bug; test=develop

* update trace kernel; test=develop

* move kernel args to trace_sig; test=develop
```
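The trace op sums one diagonal of its input, optionally shifted by an `offset`. A pure-Python sketch of those semantics for a 2-D input (positive `offset` selects a diagonal above the main one, as in NumPy's `trace`); Paddle's Python API additionally takes `axis1`/`axis2` to choose the plane of a higher-rank input, which this hypothetical `trace2d` helper omits:

```python
from typing import List


def trace2d(x: List[List[float]], offset: int = 0) -> float:
    """Sum of x[i][i + offset]; positive offsets select diagonals above the main one."""
    total = 0.0
    n_cols = len(x[0]) if x else 0
    for i, row in enumerate(x):
        j = i + offset
        if 0 <= j < n_cols:  # skip rows whose shifted column falls outside the matrix
            total += row[j]
    return total


m = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
```

With this `m`, the main diagonal sums to 15, `offset=1` gives 2 + 6, and `offset=-1` gives 4 + 8.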
- [CustomOp] Fix slice bug of custom op (#39393) · 91b074a2
  Committed by Chen Weihang on Feb 09, 2022
```
* fix slice bug of custom op

* add offset in check
```
- [pten] fix typo, muliply_raw -> multiply_raw (#39391) · f810d755
  Committed by Leo Chen on Feb 09, 2022
- move stream into pten (#39392) · 266955a9
  Committed by Chen Weihang on Feb 09, 2022
- update basic infrastructure (#39383) · b12e7a17
  Committed by hong on Feb 09, 2022
```
* update basic infrastructure; support string, support vector<int>, add tensor args type index; test=develop

* remove useless code; test=develop

* fix bug; test=develop

* polish code; test=develop
```
- add more int type support for softmax_with_cross_entropy (#39409) · eaa3fd45
  Committed by sneaxiy on Feb 09, 2022
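The hard-label path of softmax cross-entropy uses the label only as an index, so widening the accepted integer label dtypes leaves the math unchanged: `loss = logsumexp(z) - z[label]`. A pure-Python sketch (which dtypes the PR adds is not listed in this log; `softmax_cross_entropy` here is a hypothetical helper, not Paddle's API):

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """-log(softmax(logits)[label]) computed as logsumexp(logits) - logits[label]."""
    if not 0 <= label < len(logits):
        raise ValueError(f"label {label} out of range [0, {len(logits)})")
    m = max(logits)  # shift by the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]
```

For two equal logits the loss is `log(2)` regardless of the label.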
- Modify the implementation of BlockYReduce to fit more scenes (#39170) · 8d87b3bc
  Committed by Zhang Zheng on Feb 09, 2022
- Delete BASE_SIZE in elementwise_base.h (#39390) · b007a031
  Committed by niuliling123 on Feb 09, 2022
- convert paddle model to mlir paddle dialect (#39216) · 2be20e20
  Committed by huzhiqiang on Feb 08, 2022
- [MLU]fix compile and add cncl (#39394) · a7d08db9
  Committed by qipengh on Feb 09, 2022
- Add a Sparse Op: to_sparse_csr (#39333) · 76d527e1
  Committed by zhangkaihuo on Feb 09, 2022
```
* implement AllocateFrom

* dense_to_sparse_coo

* optimize unit testing; support rocm

* 1. delete fluid related header file
2. update the copyright

* fix hipMemcpy

* update dense_to_sparsecoo

* add namespace sparse

* sparse_csr_to_dense

* test to_sparse_coo: csr_to_coo

* fix writing error

* to_sparse_csr: dense_to_sparse_csr and sparse_coo_to_csr

* fix check shape

* fix unit test

* replace CUDADeviceContext by GPUContext
```
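`to_sparse_csr` covers two conversions named in the commit message: dense to CSR and sparse COO to CSR. A pure-Python sketch of both for the 2-D case; the names are illustrative, and the COO path assumes entries are already sorted by (row, col):

```python
from typing import List, Sequence, Tuple


def dense_to_csr(dense: Sequence[Sequence[float]]) -> Tuple[List[int], List[int], List[float]]:
    """Return (crows, cols, values); crows[r]:crows[r + 1] spans row r's non-zeros."""
    crows, cols, values = [0], [], []
    for row in dense:
        for c, v in enumerate(row):
            if v != 0:
                cols.append(c)
                values.append(v)
        crows.append(len(values))
    return crows, cols, values


def coo_to_csr(rows: Sequence[int], cols: Sequence[int], values: Sequence[float],
               n_rows: int) -> Tuple[List[int], List[int], List[float]]:
    """Assumes COO entries are already sorted by (row, col)."""
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1
    crows = [0]
    for n in counts:  # prefix sum of per-row counts gives the row offsets
        crows.append(crows[-1] + n)
    return crows, list(cols), list(values)
```

Because sorted COO already lists column indices and values in row-major order, only the row array changes; it is compressed into offsets.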
- Fix operator== for float16 (#39400) · e606b44a
  Committed by Tomasz Socha on Feb 09, 2022
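The title doesn't say what was wrong with `operator==`. Whatever the fix, IEEE 754 binary16 equality must report `+0 == -0` as true and any comparison involving NaN as false, which a raw bit-pattern comparison gets wrong. A Python illustration using the `struct` module's `'e'` (binary16) format code, not Paddle's C++ `float16` type; all helper names are hypothetical:

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to IEEE 754 binary16 and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_bits(x: float) -> int:
    """Raw 16-bit pattern of x after rounding to binary16."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_eq_value(a: float, b: float) -> bool:
    """IEEE equality: +0 == -0, and NaN compares unequal to everything."""
    return to_half(a) == to_half(b)


def half_eq_bits(a: float, b: float) -> bool:
    """Bitwise equality, which disagrees with IEEE for signed zeros and NaN."""
    return half_bits(a) == half_bits(b)
```

Values are compared after rounding to binary16, so `1.0` and `1.0001` compare equal at half precision.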
- Move norm to pten (#39324) · ece200b3
  Committed by hong on Feb 09, 2022
```
* add norm cpu

* update code;

* norm bug fix

* move norm op to pten; test=develop

* move norm op to pten; test=develop

* add norm util; test=develop

* fix norm npu bug; test=develop

* fix norm kernel bug; test=develop

* move kernel args to pten; test=develop

* move kernel args to pten sig; test=develop
```
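As we read the docs of Paddle's `norm` op (to our understanding, the op behind `l2_normalize`), it normalizes along an axis as `y = x / sqrt(sum(x^2) + epsilon)` and also outputs the norm itself. A 1-D pure-Python sketch of that formula, not the kernel moved to pten; the helper name and default `epsilon` are assumptions:

```python
import math
from typing import List, Tuple


def l2_normalize(x: List[float], epsilon: float = 1e-10) -> Tuple[List[float], float]:
    """Return (x / norm, norm) with norm = sqrt(sum(x_i^2) + epsilon)."""
    norm = math.sqrt(sum(v * v for v in x) + epsilon)
    return [v / norm for v in x], norm


y, n = l2_normalize([3.0, 4.0], epsilon=0.0)
```

`epsilon` keeps the division finite for an all-zero input.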
Feb 08, 2022 · 9 commits
- INFRT/Add pten dialect (4th PR) (#39374) · 3990e0bb
  Committed by Yan Chunwei on Feb 08, 2022
- Make Embedding layer support more int ids type (#39381) · 60f1461a
  Committed by sneaxiy on Feb 08, 2022
```
* add more int id type support for embedding

* add ut

* add more ut

* fix ci error
```
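The commit message doesn't list the newly accepted id dtypes. Conceptually, an embedding lookup only uses each id as a row index, so it is independent of the id's integer width. The pure-Python sketch below makes that point by feeding ids from `array` buffers with different integer typecodes (`'h'` is C `short`, `'q'` is C `long long`); it is not Paddle's kernel:

```python
from array import array
from typing import Iterable, List, Sequence


def embedding_lookup(table: Sequence[Sequence[float]], ids: Iterable[int]) -> List[List[float]]:
    """Gather one table row per id; the integer width of `ids` plays no role."""
    out = []
    for i in ids:
        if not 0 <= i < len(table):
            raise IndexError(f"id {i} out of range for vocab size {len(table)}")
        out.append(list(table[i]))
    return out


table = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
short_ids = array("h", [2, 0])  # C short ids
long_ids = array("q", [1])      # C long long ids
```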

- Add FuseOptimizerPass and test_dist_fuse_adam_pass unittest. (#39208) · ccdcfa2d
  Committed by hlygit66666 on Feb 08, 2022

* add fuse_relu_depthwise_conv_pass unittest

* fix atol and rtol

* fix according to review

* Add FuseOptimizerPass and fuse_adam_pass unittest

* add sgd and momentum unittest

* add fuse_optimizer_pass

* close amp

* close amp

* update

* fix run on two cards

* Update test_dist_fuse_adam_pass.py

* Update test_dist_fuse_momentum_pass.py

* Update test_dist_fuse_sgd_pass.py

* Create test_dist_fuse_sgd_pass.py

* Create test_dist_fuse_sgd_pass.py

* Create test_dist_fuse_sgd_pass.py

* Update test_dist_fuse_adam_pass.py

* Update test_dist_fuse_momentum_pass.py

* Update test_dist_fuse_sgd_pass.py

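In general terms, a fuse-optimizer pass replaces one optimizer op per parameter with a single op over a coalesced buffer. The fused result must match the unfused one, which is the kind of property unittests like the SGD, momentum, and Adam tests above can compare. A minimal pure-Python SGD illustration of that equivalence, not the pass's actual graph rewrite:

```python
from typing import List


def sgd_per_param(params: List[List[float]], grads: List[List[float]], lr: float) -> List[List[float]]:
    """Unfused: one update per parameter tensor."""
    return [[p - lr * g for p, g in zip(ps, gs)] for ps, gs in zip(params, grads)]


def sgd_fused(params: List[List[float]], grads: List[List[float]], lr: float) -> List[List[float]]:
    """Fused: coalesce everything into one flat buffer, update once, split back."""
    sizes = [len(ps) for ps in params]
    flat_p = [p for ps in params for p in ps]
    flat_g = [g for gs in grads for g in gs]
    flat_p = [p - lr * g for p, g in zip(flat_p, flat_g)]  # a single update "op"
    out, start = [], 0
    for n in sizes:
        out.append(flat_p[start:start + n])
        start += n
    return out
```

The fused version does the same element-wise arithmetic, so results match exactly; the gain in a real pass comes from launching one kernel instead of many.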

- Rename partial function name TensorReduceFunctorImpl to TensorReduceImpl. (#39388) · f71241b9
  Committed by Yiqun Liu on Feb 08, 2022
- [Bug fix] Fixed handling of one of the cases in the quantization process (#39342) · e4d475ea
  Committed by joanna.wozna.intel on Feb 08, 2022
```
* Fix quantization next op findings

* Corrections according to the review
```
- Fix to #38126 (#39097) · f884edb9
  Committed by Jacek Czaja on Feb 08, 2022

* - 38126 potential fix

* - fix

* - build fix

* - another candidate fix

* - compilation fix

* - another fix

* - Fix to activation of NHWC being first oneDNN op in chain on oneDNN ops

* - compilation fix

* - added NHWC rotating for elementwise being first op

* - compilation fix

* - compilation fix

* - Added UT

* - cosmetic fixes

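The commits above mention rotating NHWC data when an activation or elementwise op is the first oneDNN op in a chain. Whatever the exact fix, it rests on the index mapping between the two dense layouts, sketched here in pure Python (`nchw_to_nhwc` is illustrative, not a Paddle or oneDNN API):

```python
from typing import List, Sequence


def nchw_to_nhwc(data: Sequence[float], n: int, c: int, h: int, w: int) -> List[float]:
    """Reorder a flat NCHW buffer into NHWC (channels become the fastest-varying dim)."""
    if len(data) != n * c * h * w:
        raise ValueError("buffer size does not match shape")
    out = [0.0] * len(data)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi
                    dst = ((ni * h + hi) * w + wi) * c + ci
                    out[dst] = data[src]
    return out
```

For a `1x2x1x2` input `[0, 1, 2, 3]` (channel 0 holds 0 and 1), the NHWC buffer interleaves channels per pixel: `[0, 2, 1, 3]`.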

- Update op support gpu impl (#39386) · ba882657
  Committed by hong on Feb 08, 2022

* find gpu kernel in pten factory; test=develop

* check in functional kernel first; test=develop

- ps optimize refactor (#38982) · 196dbfc2
  Committed by ziyoujiyi on Feb 08, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .
Co-authored-by: Nzkh2016 <zhangkaihuo@baidu.com>

- [bf16] add bf16 cuda kernel: concat and split (#39380) · de0bad2a
  Committed by zhangbo9674 on Feb 08, 2022
```
* add concat & split

* add concat kernel

* add concat unittest

* add split unittest
```
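bfloat16 keeps float32's sign bit and 8-bit exponent but only 7 mantissa bits, so a bf16 value is the upper half of a float32. Concat and split only move such 16-bit values around, but getting data into bf16 in the first place needs a rounding step. A pure-Python sketch of one common conversion, round-to-nearest-even (we don't claim it is the routine Paddle uses; the helper names are hypothetical):

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round a value to bfloat16 (round-to-nearest-even) and return its 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040  # NaN: keep it a (quiet) NaN after truncation
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even upper half
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen bfloat16 bits back to float32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

Widening is exact, so concat and split on bf16 data never change values; only the float32 to bf16 step rounds.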

PaddlePaddle / Paddle: synced successfully over a year ago