Commits · 945a3ce9e7ba4d39578aa196861da0290edfeea2 · PaddlePaddle / Paddle

09 Feb, 2022 5 commits

Replace EagerTensor with Tensor (#39376) · 945a3ce9

Committed by Jiabin Yang on Feb 09, 2022

* merge legacy to fluid

* Remove legacy code

* Remove legacy code

* Remove DataType test

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

945a3ce9

Move trace op to pten (#39227) · d7dddf94

Committed by hong on Feb 09, 2022

* add trace op

* bug fix

* bug fix; test=develop

* thrust bug fix; test=develop

* remove useless register; test=develop

* fix bug; test=develop

* update trace kernel; test=develop

* move kernel args to trace_sig; test=develop

d7dddf94

[CustomOp] Fix slice bug of custom op (#39393) · 91b074a2
Committed by Chen Weihang on Feb 09, 2022
```
* fix slice bug of custom op

* add offset in check
```
91b074a2

add more int type support for softmax_with_cross_entropy (#39409) · eaa3fd45
Committed by sneaxiy on Feb 09, 2022

eaa3fd45

Move norm to pten (#39324) · ece200b3

Committed by hong on Feb 09, 2022

* add norm cpu

* update code;

* norm bug fix

* move norm op to pten; test=develop

* move norm op to pten; test=develop

* add norm util; test=develop

* fix norm npu bug; test=develop

* fix norm kernel bug; test=develop

* move kernel args to pten; test=develop

* move kernel args to pten sig; test=develop

ece200b3

08 Feb, 2022 9 commits

Make Embedding layer support more int ids type (#39381) · 60f1461a
Committed by sneaxiy on Feb 08, 2022
```
* add more int id type support for embedding

* add ut

* add more ut

* fix ci error
```
60f1461a

Add FuseOptimizerPass and test_dist_fuse_adam_pass unittest. (#39208) · ccdcfa2d

Committed by hlygit66666 on Feb 08, 2022

* add fuse_relu_depthwise_conv_pass unittest

* fix atol and rtol

* fix according to review

* Add FuseOptimizerPass and fuse_adam_pass unittest

* add sgd and momentum unittest

* add fuse_optimizer_pass

* close amp

* close amp

* update

* fix run on two cards

* Update test_dist_fuse_adam_pass.py

* Update test_dist_fuse_momentum_pass.py

* Update test_dist_fuse_sgd_pass.py

* Create test_dist_fuse_sgd_pass.py

* Create test_dist_fuse_sgd_pass.py

* Create test_dist_fuse_sgd_pass.py

* Update test_dist_fuse_adam_pass.py

* Update test_dist_fuse_momentum_pass.py

* Update test_dist_fuse_sgd_pass.py

ccdcfa2d

ps optimize refactor (#38982) · 196dbfc2

Committed by ziyoujiyi on Feb 08, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .
Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>

196dbfc2

[bf16] add bf16 cuda kernel: concat and split (#39380) · de0bad2a
Committed by zhangbo9674 on Feb 08, 2022
```
* add concat & split

* add concat kernel

* add concat unittest

* add split unittest
```
de0bad2a

fix sequential bug about key error, test=develop (#39372) · 0fee0044
Committed by wanghuancoder on Feb 08, 2022

0fee0044

optimize sharding stage3 (#39334) · 23d559dd
Committed by Baibaifan on Feb 08, 2022

23d559dd
Fix reduce_sum dtype dispatch bug on gpu (#39349) · 4d7ad277
Committed by Chen Weihang on Feb 08, 2022
```
* fix pten reduce dispatch bug

* add cast before reduce

* fix test failed
```
4d7ad277

[bf16] support printing bf16 tensor (#39375) · f57b21e6
Committed by Leo Chen on Feb 08, 2022

f57b21e6
Add __PD_DEFINE_RAW_OP_KERNEL_FUNC for registering custom op kernel with ExecutionContext (#39352) · 5c3873f6
Committed by sneaxiy on Feb 08, 2022
```
* hack custom op

* add ut

* skip windows ci
```
5c3873f6

07 Feb, 2022 5 commits
- add sequence_conv op in xpu place (#39025) · fee4316d
  Committed by tanzhipeng on Feb 07, 2022
  
  fee4316d
- add _ptr for tensor (#39357) · 24b2e8e6
  Committed by sneaxiy on Feb 07, 2022
  
  24b2e8e6
- Update BF16 amp list (#39304) · 0c43ce22
  Committed by arlesniak on Feb 07, 2022
```
* amp list updated

* tests updated

* gray list updated

* amp list updated

* test updated
```
  0c43ce22
- Added Adam FP32 JIT assembly kernel (#39158) · ebd14743
  Committed by jakpiase on Feb 07, 2022
```
* Added adam kernel

* CI rerun
```
  ebd14743
- [CustomOp] Support output as input argument of kernel func (#39353) · f1f74e9e
  Committed by Chen Weihang on Feb 07, 2022
```
* refactor custom op kernel func and utils

* add output sync

* adapt tensor* in utils

* fix windows symbol error
```
  f1f74e9e
04 Feb, 2022 1 commit

【Pten】Support data transform in C++ API (#39263) · dcff7fa8

Committed by zyfncg on Feb 04, 2022

* add data_transform in pten api

* support GetKernelTypeForVar

* fix compile problem of bfloat16

* change error namespace

* add complex type transform unittest

* fix merge conflict

dcff7fa8

30 Jan, 2022 5 commits

add multinomial probability distribution (#38820) · 01f606b4

Committed by Xiaoxu Chen on Jan 30, 2022

* add multinomial probability distribution
* fix categorical sample bug when logits less than zero
* fix categorical sample can't pass hypothesis test and entropy shape error bug

01f606b4

geo memory sparse table (#39250) · 9b3b53ba
Committed by zhaocaibei123 on Jan 30, 2022
```
* geo depends

* add memory geo table

* fix
```
9b3b53ba

[PTen] Change all InferMeta functions (#39222) · 7e29cea9

Committed by Chen Weihang on Jan 30, 2022

* change unary infermeta

* change other infermeta

* change all infermeta format

* resolve conflict

* fix test failed

* resolve reshape conflict

* fix compile failed

* adapt auto api gen

* fix reshape failed

* fix concat failed

* resolve conflict

7e29cea9

[MLU] add softmax_with_cross_entropy mlu kernel (#39260) · aecf9967
Committed by fwenguang on Jan 30, 2022

aecf9967

[pten] fit get all register op kernels (#39288) · eefe5feb

Committed by Leo Chen on Jan 30, 2022

* upgrade _get_all_register_op_kernels

* add ut

* support xpu/npu

* fix device id

* enhance TransToFluidPlace

* fix compile

eefe5feb

29 Jan, 2022 6 commits

fix paddle.where broadcast bug (#39182) · 92253f11
Committed by ronnywang on Jan 29, 2022

92253f11

Symbolic Hessian (#39221) · 64e7c715

Committed by Tongxin Bai on Jan 29, 2022

* [autograd] static Jacobian pass tests.

* [autograd] apply CR suggested changes.

* [autograd] more tests.

* [autograd] add CPUPlace in tests.

* [autograd] bug fixes.

* [autograd] reformatted.

* [autograd] adding Hessian, in progress.

* [autograd] Hessian passes. A double grad bug fixed.

* [autograd] fix renaming conflict in double backward pass.

* [autograd] polish tests.

* fix a bug when using brackets

* debug for ci

* [autograd] fixing Hessian test.

* polish format.
Co-authored-by: levi131 <83750468+levi131@users.noreply.github.com>
Co-authored-by: levi131 <limaolin01@baidu.com>

64e7c715

fix FakeQuantAbsMax in QAT (#39307) · 34d97c57
Committed by Guanghua Yu on Jan 29, 2022

34d97c57

Auto parallel/qkv fuse (#39080) · fdedf909

Committed by JZ-LIANG on Jan 29, 2022

* support qkv fuse

* support qkv fuse

* update completion

* update completion

* update dist_split

* rerun ci

* is_auto_compatible added

* is_auto_compatible added

fdedf909

fix kunlun2 softmax unittest bug (#39274) · 23bb2836
Committed by QingshuChen on Jan 29, 2022
```
* fix kunlun2 softmax unittest bug
*test=kunlun

* minor
```
23bb2836

Add FuseReluDepthwiseConvPass and unittest (#39105) · 9d6e8202

Committed by hlygit66666 on Jan 29, 2022

* add fuse_relu_depthwise_conv_pass unittest

* fix atol and rtol

* fix according to review

* Update test_dist_fuse_relu_depthwise_conv_pass.py

9d6e8202

28 Jan, 2022 7 commits
- Added Eager Dygraph support for user_defined_grads (#39309) · 76103c88
  Committed by Zhanlue Yang on Jan 28, 2022
  
  76103c88
- recovery code (#39287) · 45f9c9eb
  Committed by zhangkaihuo on Jan 28, 2022
  
  45f9c9eb
- fix optimizer docs (#39297) · ffa7ff9c
  Committed by Roc on Jan 28, 2022
  
  ffa7ff9c
- [PSLIB] Add Metrics Module, Support User-defined Add Metric (#38789) · 2e6be886
  Committed by Fan Zhang on Jan 28, 2022
```
* [PSLIB] Add Metrics Module, Support User-defined Add Metric

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI Coverage

* modify role_maker

* update CMakeLists.txt
```
  2e6be886
- fix_stage2_minimize (#39285) · 90f44c6f
  Committed by Baibaifan on Jan 28, 2022
  
  90f44c6f
- Auto-generate kernel signature in C++ API (#39281) · fc5fa0de
  Committed by zyfncg on Jan 28, 2022
  
  fc5fa0de
- Resolve unit-test timeout issues (#39292) · 543f3dea
  Committed by Weilong Wu on Jan 28, 2022
```
* implement AllocateFrom

* fix PR-CI-Coverage timeout in 120s
Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>
```
  543f3dea
27 Jan, 2022 2 commits

Add Khop Graph Sampler API (#39146) · 35f949b5

Committed by Siming Dai on Jan 27, 2022

* add the test case for the UVA

* add the context load for the uva

* Add graph_sample kernel

* Add graph_sample commit

* add new commit for graph_sample

* add unsigned long long int

* delete some remarks

* add cpu version

* add cuda eids

* add cpu eids

* delete _uva

* optimize speed: emplace_back, last_layer

* add to_uva_tensor

* add cpu return_eids choice

* add gpu return_eids choice

* add cpu reindex_nodes

* add gpu reindex_nodes

* rename op and add OMP for cpu

* add incubate api

* fix the compile problem for the PADDLE_ENFORCE and different device

* fix the rocm and windows compile problem

* add unittest for graph_sample_neighbors

* fix cpu unittest and unique problem

* fix uva unittest, fix cuda unique problem

* fix the windows compile problem

* fix the windows rand_r compile problem

* add correct unittest, add src_eids dispensable

* delete black

* combine uva unittest

* mv Sample_index to Sample_Index; check input shape; fix random sample func

* delete memset & cudaMemset

* fix according to PR comments

* fix rocm ci

* modify function names according to the specification

* fix windows_openblas ci

* refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors

* fix rocm ci

* rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc

* add data type

* fix conflict
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>

35f949b5

[PTen]Support AllocateFrom in Tensor and Alloc/HostAlloc in Context (#39022) · 5631da9c

Committed by Aurelius84 on Jan 27, 2022

* Support allocate_from in Tensor and allocate_data in Context

* fix #ifdef CUDA

* fix cycle depends

* fix test_xxx_dev_api failed

* fix windows compiling error

* fix unittest

* modify into PImpl

* fix selected rows

* add TODO comment

* refine interface according to reviewer

5631da9c

PaddlePaddle / Paddle synced successfully about 1 year ago