Commits · 35f949b5ca6dbc984a53fe99b78f9f3973a7aaf2 · Crayon鑫 / Paddle

Jan 27, 2022 · 10 commits

Add Khop Graph Sampler API (#39146) · 35f949b5

Committed by Siming Dai on Jan 27, 2022

* add the test case for the UVA

* add the context load for the uva

* Add graph_sample kernel

* Add graph_sample commit

* add new commit for graph_sample

* add unsigned long long int

* delete some remarks

* add cpu version

* add cuda eids

* add cpu eids

* delete _uva

* optimize speed: emplace_back, last_layer

* add to_uva_tensor

* add cpu return_eids choice

* add gpu return_eids choice

* add cpu reindex_nodes

* add gpu reindex_nodes

* rename op and add OMP for cpu

* add incubate api

* fix the compile problem for the PADDLE_ENFORCE and different device

* fix the ROCm and windows compile problem

* add unittest for graph_sample_neighbors

* fix cpu unittest and unique problem

* fix uva unittest, fix cuda unique problem

* fix the windows compile problem

* fix the windows rand_r compile problem

* add correct unittest, add src_eids dispensable

* delete black

* combine uva unittest

* mv Sample_index to Sample_Index; check input shape; fix random sample func

* delete memset & cudaMemset

* fix according to PR comments

* fix rocm ci

* modify function names according to the specification

* fix windows_openblas ci

* refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors

* fix rocm ci

* rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc

* add data type

* fix conflict
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>

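
The commit adds a k-hop neighbor sampler, later renamed `graph_khop_sampler`, with CPU/GPU kernels, optional edge ids and node reindexing. The pure-Python sketch below illustrates what k-hop sampling with reindexing involves. It rests on several assumptions. The graph is stored in CSC form as `row`/`colptr`, so the neighbors of node `v` are `row[colptr[v]:colptr[v + 1]]`. A sample size of `-1` means "keep all neighbors". `khop_sample` is a hypothetical helper, not Paddle's kernel or its exact API.

```python
import random
from typing import List, Sequence, Tuple


def khop_sample(
    row: Sequence[int],
    colptr: Sequence[int],
    input_nodes: Sequence[int],
    sample_sizes: Sequence[int],
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical k-hop neighbor sampler with reindexing (illustrative only)."""
    rng = random.Random(seed)
    src: List[int] = []
    dst: List[int] = []
    frontier = list(dict.fromkeys(input_nodes))  # de-duplicated seed nodes
    for size in sample_sizes:
        next_frontier: List[int] = []
        for v in frontier:
            neighbors = list(row[colptr[v]:colptr[v + 1]])  # may be empty
            if 0 <= size < len(neighbors):
                neighbors = rng.sample(neighbors, size)  # without replacement
            for u in neighbors:
                src.append(u)
                dst.append(v)
                next_frontier.append(u)
        frontier = list(dict.fromkeys(next_frontier))  # expand each node once per hop
    # Reindex: input nodes first, then newly reached nodes in the order first seen.
    mapping = {}
    for n in list(input_nodes) + src:
        mapping.setdefault(n, len(mapping))
    reindex_src = [mapping[u] for u in src]
    reindex_dst = [mapping[v] for v in dst]
    return reindex_src, reindex_dst, list(mapping)
```

Consider `row = [1, 2, 0, 2, 0, 1]`, `colptr = [0, 2, 4, 6, 6]`, seeds `[0, 3]` and `sample_sizes=[-1]`. The sketch returns `src=[2, 3]`, `dst=[0, 0]` and nodes `[0, 3, 1, 2]`. Node 3 has no neighbors and simply contributes no edges. The log's "input nodes with empty neighbors" fix targets that case in the real kernel.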

[PTen]Support AllocateFrom in Tensor and Alloc/HostAlloc in Context (#39022) · 5631da9c

Committed by Aurelius84 on Jan 27, 2022

* Support allocate_from in Tensor and allocate_data in Context

* fix #ifdef CUDA

* fix cycle depends

* fix test_xxx_dev_api failed

* fix windows compiling error

* fix unittest

* modify into PImpl

* fix selected rows

* add TODO comment

* refine interface according reviewer

fix UT test_lr_scheduler random fail (#39254) · 7e6a2190
Committed by zhouweiwei2014 on Jan 27, 2022

fix shuffle_channel_detect_pass (#39242) · af9ddeb7
Committed by wenbin on Jan 27, 2022
```
* shuffle channel pass

* add ut

* timeout fix

* makefile fix
```
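
A `shuffle_channel_detect_pass` presumably looks for the reshape, transpose, reshape pattern used for ShuffleNet-style channel shuffling and replaces it with a single `shuffle_channel` op. That description is an assumption and is not stated in the commit. The channel permutation the pattern applies can be sketched as follows. `channel_shuffle_order` and `shuffle_channels` are hypothetical helpers.

```python
from typing import List


def channel_shuffle_order(num_channels: int, group: int) -> List[int]:
    """Input-channel index read by each output channel after a channel shuffle.

    Equivalent to reshaping the channel axis to (group, C // group),
    transposing to (C // group, group), and flattening back to C.
    """
    if num_channels % group != 0:
        raise ValueError("num_channels must be divisible by group")
    per_group = num_channels // group
    return [(c % group) * per_group + c // group for c in range(num_channels)]


def shuffle_channels(channels: List[str], group: int) -> List[str]:
    """Apply the shuffle to a list of per-channel items."""
    order = channel_shuffle_order(len(channels), group)
    return [channels[i] for i in order]
```

For 6 channels in 2 groups the order is `[0, 3, 1, 4, 2, 5]`, so channels from the two groups end up interleaved.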
【Auto Parallel】Update Planner (#39201) · f2226441
Committed by caozhou on Jan 27, 2022
```
* update planner

* update unitest

* update dist matmul

* update auto converter
```

optimize kunlun/xpu softmax_with_cross_entropy add add unitest (#39180) · 2b9bb8bb

Committed by QingshuChen on Jan 27, 2022

* optimize kunlun/xpu softmax_with_cross_entropy add add unitest
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun

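
`softmax_with_cross_entropy` fuses a softmax with the cross-entropy loss. For reference only (this is not the Kunlun/XPU kernel), the per-sample loss with a hard label `y` is `logsumexp(z) - z[y]`. Subtracting the maximum logit before exponentiating keeps `exp` from overflowing. A pure-Python sketch:

```python
import math
from typing import List, Sequence


def softmax_with_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Hard-label cross entropy of softmax(logits), computed stably."""
    m = max(logits)  # shift so the largest exponent is exp(0) = 1
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def softmax(logits: Sequence[float]) -> List[float]:
    """Stable softmax, used here only to cross-check the loss."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With logits `[1000.0, 0.0]` and label 1 the function returns 1000.0. A naive `math.exp(1000.0)` would overflow.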

【Auto Parallel】update dist param grad for pass (#38941) · cac6f408
Committed by caozhou on Jan 27, 2022
```
* update dist param grad for pass

* update unitest

* update unitests

* fix conflict
```

[Paddle-Inference]: fix concat slice (#39096) · f080e8d5

Committed by Wangzheee on Jan 27, 2022

* Paddle-Inference:fix_concat_slice

* Paddle-Inference:fix_concat_slice

* Paddle-Inference:fix_concat_slice

* Paddle-Inference:fix_concat_slice

* [Paddle-Inference]: fix concat slice

* [Paddle-Inference]: fix concat slice

* [Paddle-Inference]: fix concat slice

Take/Put_along_axis more input size support (#39072) · 41a64351
Committed by huangxu96 on Jan 27, 2022
```
Support the cases that the indices shape size is larger than the arr shape size
```
[Optimizer] Add master weight for opt state_dict (#39121) · 3e6950d5
Committed by zhangbo9674 on Jan 27, 2022
```
* add master weight for opt state_dict

* check empty of master weight

* strict gpu test

* refine unittest
```
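
In mixed-precision (e.g. fp16) training, an optimizer typically keeps a higher-precision "master" copy of each parameter, updates that copy, and casts it back to low precision. The commit adds these master weights to the optimizer `state_dict`, presumably so that saving and restoring keeps the accumulated precision. The sketch below only illustrates why the master copy matters. It rounds to IEEE half precision with `struct`'s `'e'` format and uses a Python float (double) as the master copy. `to_fp16` and the dict keys are hypothetical and are not Paddle's `state_dict` layout.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


lr_times_grad = 1e-4  # per-step update, smaller than half an fp16 ulp near 1.0
steps = 10

# Updating the fp16 parameter directly: every step rounds back to 1.0.
param_fp16 = to_fp16(1.0)
for _ in range(steps):
    param_fp16 = to_fp16(param_fp16 + lr_times_grad)

# Updating a high-precision master copy, then casting for the forward pass.
master = 1.0
for _ in range(steps):
    master += lr_times_grad
param_from_master = to_fp16(master)

# Hypothetical checkpoint: storing the master weight keeps the accumulated update.
state_dict = {"param_fp16": param_from_master, "master_weight": master}
```

Here `param_fp16` stays at 1.0, while `param_from_master` becomes 1.0009765625, the nearest half-precision value to 1.001.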

Jan 26, 2022 · 9 commits

Add FuseBatchNormAddActPass and unittest. (#39178) · 801159ce

Committed by hlygit66666 on Jan 26, 2022

* add fuse_relu_depthwise_conv_pass unittest

* fix atol and rtol

* fix according to review

* add FuseBatchNormAddActPass and unittest

* Update test_dist_fuse_bn_add_act_pass.py

* solve conflict

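
`FuseBatchNormAddActPass` fuses a batch_norm, an elementwise add and an activation into one op. As an unfused reference for what such a fused op computes per element, here is a pure-Python sketch for a single channel. It makes several assumptions: it uses given mean/variance statistics, applies ReLU as the activation, and adds the second operand `z` before the activation. The helper name `bn_add_relu` is hypothetical.

```python
import math
from typing import List, Sequence


def bn_add_relu(
    x: Sequence[float],
    z: Sequence[float],
    mean: float,
    var: float,
    gamma: float,
    beta: float,
    eps: float = 1e-5,
) -> List[float]:
    """relu(batch_norm(x) + z) for the values of a single channel."""
    inv_std = 1.0 / math.sqrt(var + eps)
    out = []
    for xi, zi in zip(x, z):
        bn = gamma * (xi - mean) * inv_std + beta  # normalize, scale, shift
        out.append(max(bn + zi, 0.0))  # residual add, then ReLU
    return out
```

A fused kernel avoids writing the two intermediate tensors (`bn` and `bn + z`) to memory. That saving is the usual motivation for this kind of pass.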

[Eager] Support imperative selected_rows_to_lod_tensor and the opposite case (#39223) · 787980b1

Committed by Weilong Wu on Jan 26, 2022

* Added selected_rows and rw_lock to pten

* Renamed the unit test target to fix CI

* Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid

* Remove rw_lock.h,rw_lock_test.cc in fluid

* Use pten::RWLock and pten::AutoRDLock, fix CI

* Use pten::SelectedRows

* Use pten::SelectedRows

* Fix to pass NPU CI

* Selected_Rows inherits from TensorBase

* Use pten::SelectedRows, to pass NPU CI

* To fix NPU CI

* To fix NPU CI again

* Use paddle/pten/core/enforce and polish code

* Support imperative selected_rows_to_lod_tensor

* Polish code

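
A `SelectedRows` value stores only some rows of a conceptually dense `[height, width]` tensor: a list of row indices plus a block with one value row per index. Converting it to a dense tensor scatters each stored row into place. The pure-Python sketch below uses hypothetical names. Summing duplicate row indices is an assumption of this sketch, not a claim about Paddle's kernel.

```python
from typing import List, Sequence


def selected_rows_to_dense(
    rows: Sequence[int],
    values: Sequence[Sequence[float]],
    height: int,
) -> List[List[float]]:
    """Scatter a sparse row set into a dense height x width matrix."""
    if len(rows) != len(values):
        raise ValueError("need one value row per row index")
    width = len(values[0]) if values else 0
    dense = [[0.0] * width for _ in range(height)]
    for r, row_values in zip(rows, values):
        if not 0 <= r < height:
            raise IndexError(f"row index {r} out of range for height {height}")
        for j, v in enumerate(row_values):
            dense[r][j] += v  # duplicate row indices accumulate (assumption)
    return dense
```

Rows that are not listed stay zero. This is why `SelectedRows` is a compact format for gradients that touch only a few embedding rows.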

[MLU]Add conv2d op (#39110) · 71634a61
Committed by qipengh on Jan 26, 2022
```
* [MLU]Add conv2d op

* [MLU]fix comment

* [MLU]adapt NCHW of conv2d op
```

update uts p2 (#39232) · 83d0d853
Committed by yaozhixin on Jan 26, 2022

update uts p3 (#39214) · eb45bb4e
Committed by yaozhixin on Jan 26, 2022

Optimize layer norm forward when cols is 1024. (#39167) · 01d04be6
Committed by Li Min on Jan 26, 2022
```
* Optimize layer_norm fwd when cols is 1024.
```
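
Layer normalization normalizes each row (here, a row of `cols = 1024` values) using that row's own mean and variance, then applies a per-column scale and bias. The commit specializes the forward path for that row width. Below is a scalar pure-Python reference of the forward computation, which could be used to check a kernel's output. The names are illustrative. The sketch also returns the mean and variance on the assumption that a backward pass would reuse them.

```python
import math
from typing import List, Sequence, Tuple


def layer_norm_row(
    x: Sequence[float],
    scale: Sequence[float],
    bias: Sequence[float],
    eps: float = 1e-5,
) -> Tuple[List[float], float, float]:
    """Forward layer norm over one row; returns (y, mean, variance)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)
    y = [(v - mean) * inv_std * g + b for v, g, b in zip(x, scale, bias)]
    return y, mean, var
```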

update uts p1 (#39210) · 6efb9f59
Committed by yaozhixin on Jan 26, 2022

add sigmoid cross entropy with logits to kl2 (#38915) · fd44de58

Committed by houj04 on Jan 26, 2022

* add sigmoid cross entropy with logits to kl2. test=kunlun

* add sigmoid cross entropy with logits to kl2. test=kunlun

* follow comments. test=kunlun

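
`sigmoid_cross_entropy_with_logits` computes the binary cross entropy of `sigmoid(x)` against a label `z` directly from the logit `x`. The usual numerically stable form is `max(x, 0) - x * z + log(1 + exp(-|x|))`. The pure-Python sketch below is not the KL2 kernel. It implements that form and a naive version for comparison.

```python
import math


def sigmoid_ce_with_logits(x: float, z: float) -> float:
    """Stable binary cross entropy between sigmoid(x) and a label z in [0, 1]."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


def naive_sigmoid_ce(x: float, z: float) -> float:
    """Direct formula; fails for large |x| because p rounds to 0.0 or 1.0."""
    p = 1.0 / (1.0 + math.exp(-x))
    return -(z * math.log(p) + (1.0 - z) * math.log(1.0 - p))
```

For moderate logits the two agree. At `x = 1000.0, z = 0.0` the stable form returns 1000.0, while the naive one calls `math.log(0.0)` and raises `ValueError`.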

sum op (#39165) · 55d6b87c
Committed by joeqiao12 on Jan 26, 2022

Jan 25, 2022 · 12 commits
- Add FuseBatchNormActPass and unittest. (#39176) · 09104d02
  Committed by hlygit66666 on Jan 25, 2022
```
* add fuse_relu_depthwise_conv_pass unittest

* fix atol and rtol

* fix according to review

* Add fuse_bn_act_pass unittest

* rm others

* add fuse_bn_act_pass
```
- [inference] update trt convert reduce op&ut,test=develop (#39088) · 80753755
  Committed by Zhang Jun on Jan 25, 2022
```
* [inference] update convert reduce op&ut,test=develop

* update

* update

* update

* add int32 support

* add int32 support

* add comments

* trt < 7.0 do not support int32

* test=develop

* update

* test=develop
```
- [MLU]add mlu kernel for fill_constant op (#39069) · 6e871dbc
  Committed by joeqiao12 on Jan 25, 2022
```
* [MLU]add mlu kernel for fill_constant op

* delete device_context DEPS
```
- fix:the axis must be 1(channel), when the dims of bias is 1 (#39052) · f07b8cbe
  Committed by feng_shuai on Jan 25, 2022
- [MLU]add mlu batch_norm kernel pytest (#39071) · 55164761
  Committed by fwenguang on Jan 25, 2022
- [MLU]add mlu kernel for split and concat (#39020) · ac3dc0bb
  Committed by joeqiao12 on Jan 25, 2022
```
* [MLU]add mlu kernel for concat and split op

* delete device_context DEPS
```
- [fleet_executor] Dist model run method Implementation (#39194) · 20e23e1b
  Committed by Yuang Liu on Jan 25, 2022
- [Dygraph] Support param groups in grad_clip (#39175) · b0cca48e
  Committed by Haohongxiang on Jan 25, 2022
```
* support param groups in grad_clip

* update

* modify for review
```
- fix test_refactor_op_xpu, *test=kunlun (#39168) · 55418d3f
  Committed by TTerror on Jan 25, 2022
- [pnorm] fix bug in fp16 & optimize memory (#39011) · 3825b40f
  Committed by Noel on Jan 25, 2022
- 【Auto Parallel】Update reshard for complete (#39073) · 529f1425
  Committed by caozhou on Jan 25, 2022
```
* update reshard for newest completion

* update unitest

* merge newest
```
- Fixed CI failure with test_eager_trace_op (#39164) · 1efeec1d
  Committed by Zhanlue Yang on Jan 25, 2022

Jan 24, 2022 · 8 commits

[autograd] static Jacobian pass tests. (#39007) · d43655ba

Committed by Tongxin Bai on Jan 24, 2022

* [autograd] static Jacobian pass tests.

* [autograd] apply CR suggested changes.

* [autograd] more tests.

* [autograd] add CPUPlace in tests.

* [autograd] bug fixes.

* [autograd] reformatted.

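
The commit adds tests for Jacobians computed in static-graph mode. A common independent oracle for such tests is a central finite-difference Jacobian, `J[i][j] ≈ (f_i(x + h e_j) - f_i(x - h e_j)) / (2h)`. The sketch below implements that generic technique in pure Python. It is not the code in the commit.

```python
from typing import Callable, List, Sequence


def numeric_jacobian(
    f: Callable[[Sequence[float]], Sequence[float]],
    x: Sequence[float],
    h: float = 1e-6,
) -> List[List[float]]:
    """Central-difference Jacobian: rows index outputs, columns index inputs."""
    n = len(x)
    m = len(f(x))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return jac
```

Take `f(a, b) = [a * b, a + b, a ** 2]` at `(2, 3)`. The analytic Jacobian is `[[3, 2], [1, 1], [4, 0]]`, and the numeric estimate matches it to within about `1e-5`.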

fix test allreduce tests (#39166) · c00303ec
Committed by sneaxiy on Jan 24, 2022

supports string var lod_level property (#39077) · f0a40630
Committed by Jiaqi Liu on Jan 24, 2022

unify compare functor (#39024) · def81b4f
Committed by Zhang Ting on Jan 24, 2022

Add sharding stage3 offload (#38989) · 46823104
Committed by Baibaifan on Jan 24, 2022

fix sharding stage2 unittest (#39112) · f4623876
Committed by Baibaifan on Jan 24, 2022

support sparse of adam, *test=kunlun (#38483) · e106901e

Committed by z8hanghuan on Jan 24, 2022

* support sparse of adam, *test=kunlun

* add pre-commit-config.yaml

* support sparse of adam in KL2,*test=kunlun

* support sparse of adam in KL2, *test=kunlun

* modify xpu.cmake, *test=kunlun

* support sparse of adam, rm some wait, *test=kunlun

* support sparse of adam, rm some wait, *test=kunlun

* support sparse of adam, *test=kunlun

* support sparse of adam, *test=kunlun

* support sparse of adam, *test=kunlun

* support sparse of adam, *test=kunlun

* support sparse of adam, *test=kunlun

Refactored python-level trace_op to call through _C_ops instead of... · c3796061

Committed by Zhanlue Yang on Jan 24, 2022

Refactored python-level trace_op to call through _C_ops instead of Tracer::TraceOp, under eager_mode (#38338)

* Replaced core.ops with _C_ops

* Refactored python-level trace_op to call through _C_ops instead of Tracer::TraceOp, under eager_mode

* Modified trace_op interface

* Refactored trace_op logic for eager mode

* Added Eager Dygraph support for OpTest

* Fixed ci issues

* Fixed CI failures

* Fixed Coverage CI Issues

* Fixed XPU CI Issues

Jan 23, 2022 · 1 commit

Support test_imperative apply and Add a setter for EagerTensor (#39016) · 8c5c1046

Committed by Weilong Wu on Jan 23, 2022

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* support more eager tensor api

* fix merge compile error

* fix compile error and fit develop code

* support pure CPU

* fix some logic error in eager_mode

* support _varbase_creator in eager mode

* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs

* for eager mode

* refine

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* eager logic

* refine test in pure cpu

* eager logic

* eager logic

* eager logic, test=develop

* skip core.eager when in inference, test=develop

* refine, test=develop

* refine, test=develop

* call RetainGrad after run forward kernel, test=develop

* refine, test=develop

* support dygraph util, meta, guard test

* eager test case

* support inference test

* refine test and fix initializer failed

* modify eagertensor patch method

* add eagertensor.clear_grandint, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* support create varbase and fix retain grad error

* call monkey_patch_varbase in _test_eager_guard, test=develop

* fix windows error

* split clear_gradient to clear_gradient and zero_grads, test=develop

* refine, test=develop

* refine, test=develop

* support test_imperative_basic test in eager mode

* remove additional log in variable.h

* remove additional log in variable.h

* remove additional code create in merge

* eager

* fix some eager logic, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* patch_tensor_method_func, test=develop

* refine, test=develop

* eager test case, test=develop

* refine, test=develop

* eager, test=develop

* eager, test=develop

* eager optimizer, test=develop

* eager optimizer, test=develop

* eager test_imperative_optimizer_v2, test=develop

* eager, test=develop

* refine, test=develop

* refine, test=develop

* eager, test=develop

* add resize in share buffer to, test=develop

* eager, test=develop

* fix _share_buffer_to, test=develop

* refine, test=develop

* refine, test=develop

* support eager for dataloader,test=develop

* Exposed EagerTensor's set func to implement set_value func

* Rename set to _set_value, Supplement the corresponding test case

* fix test concat dev api build failed

* fix conflict

* fix conflict

* Use extern to Polish code
Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: JiabinYang <360788950@qq.com>
Co-authored-by: Wang Huan <wanghuan29@baidu.com>
Co-authored-by: wanghuancoder <wanghuancoder@163.com>
Co-authored-by: chentianyu03 <chentianyu03@baidu.com>

Crayon鑫 / Paddle is in sync with its fork source project
