Commits · 196dbfc282908d036c5aed28a0cdcc796de0e2b4 · OPTHREE / Paddle

08 Feb, 2022 · 7 commits

ps optimize refactor (#38982) · 196dbfc2

Committed by ziyoujiyi on Feb 08, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .

Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>

[bf16] add bf16 cuda kernel: concat and split (#39380) · de0bad2a
Committed by zhangbo9674 on Feb 08, 2022
```
* add concat & split

* add concat kernel

* add concat unittest

* add split unittest
```

fix sequential bug about key error, test=develop (#39372) · 0fee0044
Committed by wanghuancoder on Feb 08, 2022

optimize sharding stage3 (#39334) · 23d559dd
Committed by Baibaifan on Feb 08, 2022

Fix reduce_sum dtype dispatch bug on gpu (#39349) · 4d7ad277
Committed by Chen Weihang on Feb 08, 2022
```
* fix pten reduce dispatch bug

* add cast before reduce

* fix test failed
```

[bf16] support printing bf16 tensor (#39375) · f57b21e6
Committed by Leo Chen on Feb 08, 2022

Add __PD_DEFINE_RAW_OP_KERNEL_FUNC for registering custom op kernel with ExecutionContext (#39352) · 5c3873f6
Committed by sneaxiy on Feb 08, 2022
```
* hack custom op

* add ut

* skip windows ci
```

07 Feb, 2022 · 5 commits
- add sequence_conv op in xpu place (#39025) · fee4316d
  Committed by tanzhipeng on Feb 07, 2022
- add _ptr for tensor (#39357) · 24b2e8e6
  Committed by sneaxiy on Feb 07, 2022
- Update BF16 amp list (#39304) · 0c43ce22
  Committed by arlesniak on Feb 07, 2022
```
* amp list updated

* tests updated

* gray list updated

* amp list updated

* test updated
```
- Added Adam FP32 JIT assembly kernel (#39158) · ebd14743
  Committed by jakpiase on Feb 07, 2022
```
* Added adam kernel

* CI rerun
```
- [CustomOp] Support output as input argument of kernel func (#39353) · f1f74e9e
  Committed by Chen Weihang on Feb 07, 2022
```
* refactor custom op kernel func and utils

* add output sync

* adapt tensor* in utils

* fix windows symbol error
```
04 Feb, 2022 · 1 commit

【Pten】Support data transform in C++ API (#39263) · dcff7fa8

Committed by zyfncg on Feb 04, 2022

* add data_transform in pten api

* support GetKernelTypeForVar

* fix compile problem of bfloat16

* change error namespace

* add complex type transform unittest

* fix merge conflict

30 Jan, 2022 · 5 commits

add multinomial probability distribution (#38820) · 01f606b4

Committed by Xiaoxu Chen on Jan 30, 2022

* add multinomial probability distribution
* fix categorical sample bug when logits less than zero
* fix categorical sample can't pass hypothesis test and entropy shape error bug

geo memory sparse table (#39250) · 9b3b53ba
Committed by zhaocaibei123 on Jan 30, 2022
```
* geo depends

* add memory geo table

* fix
```

[PTen] Change all InferMeta functions (#39222) · 7e29cea9

Committed by Chen Weihang on Jan 30, 2022

* change unary infermeta

* change other infermeta

* change all infermeta format

* resolve conflict

* fix test failed

* resolve reshape conflict

* fix compile failed

* adapt auto api gen

* fix reshape failed

* fix concat failed

* resolve conflict

[MLU] add softmax_with_cross_entropy mlu kernel (#39260) · aecf9967
Committed by fwenguang on Jan 30, 2022

[pten] fit get all register op kernels (#39288) · eefe5feb

Committed by Leo Chen on Jan 30, 2022

* upgrade _get_all_register_op_kernels

* add ut

* support xpu/npu

* fix device id

* enhance TransToFluidPlace

* fix compile

29 Jan, 2022 · 7 commits

fix paddle.where broadcast bug (#39182) · 92253f11
Committed by ronnywang on Jan 29, 2022

[PTen] Tidy pten core headers (#39188) · dd990981

Committed by Chen Weihang on Jan 29, 2022

* open header for custom kernel

* add core utils

* tidy core code

* tidy header

* tidy include

* tidy namespace

* resolve conflict

* fix unittest and coverage

* remove platform using

* resolve conflict

* resolve conflict

* fix digamma namespace error

* fix xpu full kernel error

* fix xpu full kernel error

* polish details

* add place for lib storage

Symbolic Hessian (#39221) · 64e7c715

Committed by Tongxin Bai on Jan 29, 2022

* [autograd] static Jacobian pass tests.

* [autograd] apply CR suggested changes.

* [autograd] more tests.

* [autograd] add CPUPlace in tests.

* [autograd] bug fixes.

* [autograd] reformatted.

* [autograd] adding Hessian, in progress.

* [autograd] Hessian passes. A double grad bug fixed.

* [autograd] fix renaming conflict in double backward pass.

* [autograd] polish tests.

* fix a bug when using brackets

* debug for ci

* [autograd] fixing Hessian test.

* polish format.

Co-authored-by: levi131 <83750468+levi131@users.noreply.github.com>
Co-authored-by: levi131 <limaolin01@baidu.com>

fix FakeQuantAbsMax in QAT (#39307) · 34d97c57
Committed by Guanghua Yu on Jan 29, 2022

Auto parallel/qkv fuse (#39080) · fdedf909

Committed by JZ-LIANG on Jan 29, 2022

* support qkv fuse

* support qkv fuse

* update completion

* update completion

* update dist_split

* rerun ci

* is_auto_compatible added

* is_auto_compatible added

fix kunlun2 softmax unitest bug (#39274) · 23bb2836
Committed by QingshuChen on Jan 29, 2022
```
* fix kunlun2 softmax unitest bug
*test=kunlun

* minor
```

Add FuseReluDepthwiseConvPass and unittest (#39105) · 9d6e8202

Committed by hlygit66666 on Jan 29, 2022

* add fuse_relu_depthwise_conv_pass unittest

* fix atol and rtol

* fix according to review

* Update test_dist_fuse_relu_depthwise_conv_pass.py

28 Jan, 2022 · 7 commits
- Added Eager Dygraph support for user_defined_grads (#39309) · 76103c88
  Committed by Zhanlue Yang on Jan 28, 2022
- recovery code (#39287) · 45f9c9eb
  Committed by zhangkaihuo on Jan 28, 2022
- fix optimizer docs (#39297) · ffa7ff9c
  Committed by Roc on Jan 28, 2022
- [PSLIB] Add Metrics Module, Support User-defined Add Metric (#38789) · 2e6be886
  Committed by Fan Zhang on Jan 28, 2022
```
* [PSLIB] Add Metrics Module, Support User-defined Add Metric

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI Coverage

* modify role_maker

* update CMakeLists.txt
```
- fix_stage2_minimize (#39285) · 90f44c6f
  Committed by Baibaifan on Jan 28, 2022
- Auto-geneate kernel signature in C++ API (#39281) · fc5fa0de
  Committed by zyfncg on Jan 28, 2022
- Resolve unit-test timeout issues (#39292) · 543f3dea
  Committed by Weilong Wu on Jan 28, 2022
```
* implement AllocateFrom

* fix PR-CI-Coverage timeout in 120s

Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>
```
27 Jan, 2022 · 8 commits

Add Khop Graph Sampler API (#39146) · 35f949b5

Committed by Siming Dai on Jan 27, 2022

* add the test case for the UVA

* add the context load for the uva

* Add graph_sample kernel

* Add graph_sample commit

* add new commit for graph_sample

* add unsigned long long int

* delete some remarks

* add cpu version

* add cuda eids

* add cpu eids

* delete _uva

* optimize speed: emplace_back, last_layer

* add to_uva_tensor

* add cpu return_eids choice

* add gpu return_eids choice

* add cpu reindex_nodes

* add gpu reindex_nodes

* rename op and add OMP for cpu

* add incubate api

* fix the compile problem for the PADDLE_ENFORCE and different device

* fix the rocm and windows compile problem

* add unittest for graph_sample_neighbors

* fix cpu unittest and unique problem

* fix uva unittest, fix cuda unique problem

* fix the windows compile problem

* fix the windows rand_r compile problem

* add correct unittest, add src_eids dispensable

* delete black

* combine uva unittest

* mv Sample_index to Sample_Index; check input shape; fix random sample func

* delete memset & cudaMemset

* fix according to PR comments

* fix rocm ci

* modify function names according to the specification

* fix windows_openblas ci

* refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors

* fix rocm ci

* rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc

* add data type

* fix conflict

Co-authored-by: wawltor <fangzeyang0904@hotmail.com>

[PTen]Support AllocateFrom in Tensor and Alloc/HostAlloc in Context (#39022) · 5631da9c

Committed by Aurelius84 on Jan 27, 2022

* Support allocate_from in Tensor and allocate_data in Context

* fix #ifdef CUDA

* fix cycle depends

* fix test_xxx_dev_api failed

* fix windows compiling error

* fix unittest

* modify into PImpl

* fix selected rows

* add TODO comment

* refine interface according to reviewer

[PluggableDevice] Add custom kernel support based on pten kernel management (#38848) · a8879215

Committed by Aganlengzi on Jan 27, 2022

* [Demo] custom kernel based on pten kernel

* merge and npu custom work well

* del comments

* delete other code

* fix CUDAContext

* fix not found small_vector.h

* support NPU

* fix NPUContext

* fix DeviceContext support

* add UT

* fix call

* add UT

* fix

* fix for comments and ut

* add MACRO control

* fix multi input output

* support env CUSTOM_DEVICE_ROOT

* deal with special cases

* fix for Windows

* try coverage with test_custom_kernel_dot.py

* fix test_custom_kernel_dot

* fix test_custom_kernel_dot

* fix merge

* fix merge

* fix CI

* update

* merge and fix

* remove WITH_CUSTOM_KERNEL

* fix merge

* merge and fix

* fix ut

* fix ut for mac

* add more UT

* add more UT

* fix

fix UT test_lr_scheduler random fail (#39254) · 7e6a2190
Committed by zhouweiwei2014 on Jan 27, 2022

Update passes in quant2_int8_mkldnn_pass (#38912) · 0e235e58

Committed by joanna.wozna.intel on Jan 27, 2022

* Update pass in quant2_int8_mkldnn_pass

* Back to the previous scale_matmul order

* Change place of cpu_quantize_placement_pass

fix shuffle_channel_detect_pass (#39242) · af9ddeb7
Committed by wenbin on Jan 27, 2022
```
* shuffle channel pass

* add ut

* timeout fix

* makefile fix
```
【Auto Parallel】Update Planner (#39201) · f2226441
Committed by caozhou on Jan 27, 2022
```
* update planner

* update unittest

* update dist matmul

* update auto converter
```

optimize kunlun/xpu softmax_with_cross_entropy add add unitest (#39180) · 2b9bb8bb

Committed by QingshuChen on Jan 27, 2022

* optimize kunlun/xpu softmax_with_cross_entropy add add unitest
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun
