Commits · 76d527e17f8b9f1d22c5381964cb9d50206a9907 · BaiXuePrincess / Paddle

09 Feb, 2022 3 commits

Add a Sparse Op: to_sparse_csr (#39333) · 76d527e1

Committed by zhangkaihuo on Feb 09, 2022

* implement AllocateFrom

* dense_to_sparse_coo

* optimize unit testing; support rocm

* 1. delete fluid related header file
2. update the copyright

* fix hipMemcpy

* update dense_to_sparsecoo

* add namespace sparse

* sparse_csr_to_dense

* test to_sparse_coo: csr_to_coo

* fix writing error

* to_sparse_csr: dense_to_sparse_csr and sparse_coo_to_csr

* fix check shape

* fix unit test

* replace CUDADeviceContext by GPUContext

76d527e1

Fix operator== for float16 (#39400) · e606b44a
Committed by Tomasz Socha on Feb 09, 2022

e606b44a

Move norm to pten (#39324) · ece200b3

Committed by hong on Feb 09, 2022

* add norm cpu

* update code;

* norm bug fix

* move norm op to pten; test=develop

* move norm op to pten; test=develop

* add norm util; test=develop

* fix norm npu bug; test=develop

* fix norm kernel bug; test=develop

* move kernel args to pten; test=develop

* move kernel args to pten sig; test=develop

ece200b3

08 Feb, 2022 20 commits

INFRT/Add pten dialect (4th PR) (#39374) · 3990e0bb
Committed by Yan Chunwei on Feb 08, 2022

3990e0bb
Make Embedding layer support more int ids type (#39381) · 60f1461a
Committed by sneaxiy on Feb 08, 2022
```
* add more int id type support for embedding

* add ut

* add more ut

* fix ci error
```
60f1461a

Add FuseOptimizerPass and test_dist_fuse_adam_pass unittest. (#39208) · ccdcfa2d

Committed by hlygit66666 on Feb 08, 2022

* add fuse_relu_depthwise_conv_pass unittest

* fix atol and rtol

* fix according to review

* Add FuseOptimizerPass and fuse_adam_pass unittest

* add sgd and momentum unittest

* add fuse_optimizer_pass

* close amp

* close amp

* update

* fix run on two cards

* Update test_dist_fuse_adam_pass.py

* Update test_dist_fuse_momentum_pass.py

* Update test_dist_fuse_sgd_pass.py

* Create test_dist_fuse_sgd_pass.py

* Create test_dist_fuse_sgd_pass.py

* Create test_dist_fuse_sgd_pass.py

* Update test_dist_fuse_adam_pass.py

* Update test_dist_fuse_momentum_pass.py

* Update test_dist_fuse_sgd_pass.py

ccdcfa2d

Rename partial function name TensorReduceFunctorImpl to TensorReduceImpl. (#39388) · f71241b9
Committed by Yiqun Liu on Feb 08, 2022

f71241b9
[Bug fix] Fixed handling of one of the cases in the quantization process (#39342) · e4d475ea
Committed by joanna.wozna.intel on Feb 08, 2022
```
* Fix quantization next op findings

* Corrections according to the review
```
e4d475ea

Fix to #38126 (#39097) · f884edb9

Committed by Jacek Czaja on Feb 08, 2022

* - 38126 potential fix

* - fix

* - build fix

* - another candidate fix

* - compilation fix

* - another fix

* - Fix to activation of NHWC being first oneDNN op in chain on oneDNN ops

* - compilation fix

* - added NHWC rotating for elementwise being first op

* - compilation fix

* - compilation fix

* - Added UT

* - cosmetic fixes

f884edb9

Update op support gpu impl (#39386) · ba882657

Committed by hong on Feb 08, 2022

* find gpu kernel in pten factory; test=develop

* check in functional kernel first; test=develop

ba882657

ps optimize refactor (#38982) · 196dbfc2

Committed by ziyoujiyi on Feb 08, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .

Co-authored-by: Nzkh2016 <zhangkaihuo@baidu.com>

196dbfc2

[bf16] add bf16 cuda kernel: concat and split (#39380) · de0bad2a
Committed by zhangbo9674 on Feb 08, 2022
```
* add concat & split

* add concat kernel

* add concat unittest

* add split unittest
```
de0bad2a
[PTEN] Update gpu_context. (#39359) · 24103cbb
Committed by Wilber on Feb 08, 2022
```
* gpu_context..

* update

* update

* update
```
24103cbb
add downpour_ctr_accessor (#39341) · 6d4d774d
Committed by yaoxuefeng on Feb 08, 2022
```
add downpour_ctr_accessor
```
6d4d774d
add ctr_double_accessor (#39377) · 6097aefb
Committed by yaoxuefeng on Feb 08, 2022

6097aefb
Replace clip, bce_loss, full and full_like with elementwise (#39197) · 424700ff
Committed by niuliling123 on Feb 08, 2022
```
* Replace clip, bce_loss, full and full_like with elementwise
```
424700ff

[PTen] Support SelectedRows in execution and remove scale OpKernel and InferShape (#39351) · 41eb2595

Committed by Chen Weihang on Feb 08, 2022

* adapt selectedrows in execution

* impl selected rows branch

* support selectedrow in infershape utils

* fix device compile failed

* fix new exe test failed

* revert some changes

41eb2595

Support allocate CUDA managed memory (#39075) · 42910361

Committed by From00 on Feb 08, 2022

* Rough implementation for experiment

* Support allocate cuda managed memory

* Fix CI error

* Modify UT

* Check whether support memory oversubscription

* Fix ROCM Compile error

* Fix ROCM Compile error

* Fix UT cuda_managed_memory_test

* Set UT timeout to 40

* Add UT OOMExceptionTest

* Set UT timeout to 50

42910361

Fix reduce_sum dtype dispatch bug on gpu (#39349) · 4d7ad277
Committed by Chen Weihang on Feb 08, 2022
```
* fix pten reduce dispatch bug

* add cast before reduce

* fix test failed
```
4d7ad277
[bf16] change bf16 print behavior (#39370) · 96964ff8
Committed by Leo Chen on Feb 08, 2022

96964ff8
INFRT/Add infershape schedule (3rd PR) (#39290) · eacfc1eb
Committed by Yan Chunwei on Feb 08, 2022

eacfc1eb
Fixed automatic codegen issues with grad attr_map (#39358) · 65805227
Committed by Zhanlue Yang on Feb 08, 2022

65805227
Add __PD_DEFINE_RAW_OP_KERNEL_FUNC for registering custom op kernel with ExecutionContext (#39352) · 5c3873f6
Committed by sneaxiy on Feb 08, 2022
```
* hack custom op

* add ut

* skip windows ci
```
5c3873f6

07 Feb, 2022 6 commits
- add sequence_conv op in xpu place (#39025) · fee4316d
  Committed by tanzhipeng on Feb 07, 2022
  
  fee4316d
- add _ptr for tensor (#39357) · 24b2e8e6
  Committed by sneaxiy on Feb 07, 2022
  
  24b2e8e6
- INFRT/Refine TensorMap (2nd PR) (#39262) · ed0990e7
  Committed by Yan Chunwei on Feb 07, 2022
  
  ed0990e7
- Added Adam FP32 JIT assembly kernel (#39158) · ebd14743
  Committed by jakpiase on Feb 07, 2022
```
* Added adam kernel

* CI rerun
```
  ebd14743
- Enabled final state Eager Dygraph Codegen (#39355) · e15e4ed0
  Committed by Zhanlue Yang on Feb 07, 2022
  
  e15e4ed0
- [CustomOp] Support output as input argument of kernel func (#39353) · f1f74e9e
  Committed by Chen Weihang on Feb 07, 2022
```
* refactor custom op kernel func and utils

* add output sync

* adapt tensor* in utils

* fix windows symbol error
```
  f1f74e9e
06 Feb, 2022 1 commit
- [PTEN] Add Gpu context (#39305) · a821c4a9
  Committed by Wilber on Feb 06, 2022
  
  a821c4a9
04 Feb, 2022 2 commits
- 【Pten】Support data transform in C++ API (#39263) · dcff7fa8
  Committed by zyfncg on Feb 04, 2022
```
* add data_transform in pten api

* support GetKernelTypeForVar

* fix compile problem of bfloat16

* change error namespace

* add complex type transform unittest

* fix merge conflict
```
  dcff7fa8
- remove unchanged infermeta new (#39343) · 0dccdee0
  Committed by Chen Weihang on Feb 04, 2022
  
  0dccdee0
02 Feb, 2022 3 commits
- Fix fc_mkldnn format issue (#38890) · 633c71c2
  Committed by Zuza on Feb 02, 2022
  
  633c71c2
- [PTen] Remove kernel alias name (#39321) · 5dc20c27
  Committed by Chen Weihang on Feb 02, 2022
```
* remove kernel alias name

* fix deprecated error

* fix deprecated failed

* fix mean error

* resolve conflict

* fix windows failed
```
  5dc20c27
- Merge legacy to fluid (#39318) · 34cce62f
  Committed by Jiabin Yang on Feb 02, 2022
  
  34cce62f
30 Jan, 2022 5 commits

geo memory sparse table (#39250) · 9b3b53ba
Committed by zhaocaibei123 on Jan 30, 2022
```
* geo depends

* add memory geo table

* fix
```
9b3b53ba

Add a Sparse OP:sparse_csr_to_coo (#39266) · bafea65c

Committed by zhangkaihuo on Jan 30, 2022

* dense_to_sparse_coo

* optimize unit testing; support rocm

* 1. delete fluid related header file
2. update the copyright

* fix hipMemcpy

* update dense_to_sparsecoo

* add namespace sparse

* sparse_csr_to_dense

* test to_sparse_coo: csr_to_coo

* fix writing error

bafea65c

[PTen] Change all InferMeta functions (#39222) · 7e29cea9

Committed by Chen Weihang on Jan 30, 2022

* change unary infermeta

* change other infermeta

* change all infermeta format

* resolve conflict

* fix test failed

* resolve reshape conflict

* fix compile failed

* adapt auto api gen

* fix reshape failed

* fix concat failed

* resolve conflict

7e29cea9

Add a Sparse OP : to_sparse_coo (#39264) · 78132fe1

Committed by zhangkaihuo on Jan 30, 2022

* dense_to_sparse_coo

* optimize unit testing; support rocm

* 1. delete fluid related header file
2. update the copyright

* fix hipMemcpy

* update dense_to_sparsecoo

* add namespace sparse

78132fe1

delete FLAGS_run_pten_kernel (#39330) · 2d6d6fa1
Committed by Leo Chen on Jan 30, 2022

2d6d6fa1

BaiXuePrincess / Paddle: in sync with the fork's source project