Commits · 65f7fa0dbeccc5be8e6f9a6cfad422fff60659ea · BaiXuePrincess / Paddle

27 Dec, 2021 14 commits

Refine clip_by_global_norm (#38209) · 65f7fa0d
Committed by zhangbo9674 on Dec 27, 2021
```
* refine clip

* delete unused code

* refine logic for clip
```
65f7fa0d
[BugFix]Fix bug in pfp16 in DataParallel (#38378) · e8e47581
Committed by ShenLiang on Dec 27, 2021
```
* fix bug in pfp16

* fix hip

* fix hip
```
e8e47581
update mkldnn matmul_transpose_reshape fuse pass ut (#38467) · 9cfdae91
Committed by baoachun on Dec 27, 2021

9cfdae91

add matmulv2_transpose_reshape_pass ut (#37416) · f664a533

Committed by baoachun on Dec 27, 2021

* update mkldnn matmul_v2_transpose_reshape_fuse_pass ut

* update mkldnn matmul_v2_transpose_reshape_fuse_pass ut

* update ut

* update ut

f664a533

fix renorm (#38459) · b0c7144a

Committed by seemingwang on Dec 27, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundent threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redandunt graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neigbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory

* cache optimization

* remove test function

* remove extra fetch function

* graph-engine data transfer optimization

* support graph_split load&query

* remove logs

* change shards to pointer vector

* use inference

* remove test code

* renorm op

* simplify renorm op

* recover local changes

* recover renorm op kernel

* fix init

* add blanklines in renorm doc

* fix import

* fix import

* add renorm to init.py

Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

b0c7144a

add device-agnostic stream class (#38391) · 6b5e33b4
Committed by Leo Chen on Dec 27, 2021
```
* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile
```
6b5e33b4
refine float16 implementation (#38439) · 78375990
Committed by sneaxiy on Dec 27, 2021

78375990
refine CUDA Graph (#38401) · 5f7e4a21
Committed by sneaxiy on Dec 27, 2021

5f7e4a21

Support multi-outputs feature for broadcast ops (#38329) · 89d38f55

Committed by limingshu on Dec 27, 2021

* No harm to KP

* Pass the compile stage

* change the WriteData function

* fix template bugs and pass ctest of current elementwise

* for passing partial template specialization of tempalte function in CI-ROCm

* To make 'WriteData' funtion flexible.

* a less harmful way to support multi-output

* a less harmful way to support multi-output

89d38f55

remove npu related impl (#38428) · f1d56b77
Committed by Chen Weihang on Dec 26, 2021

f1d56b77
[PTen] Move cast kernel impl (#38382) · 1fb734d7
Committed by Chen Weihang on Dec 26, 2021
```
* rename to api to copy_to

* revert needless change

* polish format
```
1fb734d7
add attr check for infer in batch_norm_act mkldnn fuse pass (#38443) · 04527ee3
Committed by baoachun on Dec 27, 2021

04527ee3
gelu using normcdf for cudnn (#38450) · 37022482
Committed by Guoxia Wang on Dec 27, 2021

37022482
[AMP] Fix amp.decorate bug: parameters for non leaf layers cannot be decotated (#38402) · 5d902954
Committed by zhangbo9674 on Dec 27, 2021
```
* fix bug

* refine code

* refine code

* refine code
```
5d902954

26 Dec, 2021 5 commits
- [PTen] Move copy kernel impl (#38421) · 73819658
  Committed by Chen Weihang on Dec 26, 2021
```
* add register general kernel marco

* move copy kernel impl

* revert needless change

* polish details

* fix xpu compil faild

* fix xpu compile failed

* polish format
```
  73819658
- auto parse kernel deps by include (#38438) · e5c7ca48
  Committed by Chen Weihang on Dec 26, 2021
  
  e5c7ca48
- improve forward performace (#38279) · acef85b2
  Committed by Zhang Ting on Dec 26, 2021
  
  acef85b2
- Fix renorm op include error and format error (#38451) · e6c3f64f
  Committed by Chen Weihang on Dec 25, 2021
```
* remove needless header

* remove needless header

* adjust header order
```
  e6c3f64f
- [Unify Tensors PR #2] Replaced pten::LoD with paddle::framework::LoD (#38275) · bbe879fc
  Committed by Zhanlue Yang on Dec 26, 2021
```
* Replaced pten::LoD with paddle::framework::LoD

* Overrided CPUVector with CUDAVector

* Refactored paddle::framework::Vector
```
  bbe879fc
24 Dec, 2021 21 commits

add is dense tensor method (#38424) · 6ff3596e
Committed by Chen Weihang on Dec 24, 2021

6ff3596e

add nansum api to math (#38137) · 6554cc10

Committed by wangguanqun on Dec 24, 2021

* add nansum api

* delete layerhelper

* add nansum to all and tensor_method_func

* update doc

* update doc

* update doc

6554cc10

renorm op (#38130) · 6982871d

Committed by seemingwang on Dec 24, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundent threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redandunt graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neigbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory

* cache optimization

* remove test function

* remove extra fetch function

* graph-engine data transfer optimization

* support graph_split load&query

* remove logs

* change shards to pointer vector

* use inference

* remove test code

* renorm op

* simplify renorm op

* recover local changes

* recover renorm op kernel

* fix init

* add blanklines in renorm doc

* fix import

* fix import

Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

6982871d

add gradient unittest and update code example for max/min (#38393) · ee69f437
Committed by Tao Luo on Dec 24, 2021
```
* add gradient unittest and update code example for max/min

* update docs

* remove _get_reduce_all_value
```
ee69f437
[AMP] Add multi_precision for sgd (#38231) · a4d07bb9
Committed by zhangbo9674 on Dec 24, 2021

a4d07bb9

[pten] combine reduce_cuda codes (#38328) · 08941eda

Committed by chentianyu03 on Dec 24, 2021

* combine reduce_cuda codes

* support float16 in pten redcue_mean

* replace ReduceCudaKernel impl with pten reduce impl

* mv reduce funcs into reduce_cuda_impl

* rm unsed codes and headers

* mv GetReduceDim into reduce_cuda_impl

* recover GetReduceDim in reduce_op.h

* add new dispatch macro

* fix pool op output not inited and cause transform to pten::denseTensor error

* fix output tensor not initialized error

* rename new dispatch macro and format code style

* rm reduce_functor_op.h file

08941eda

set env for test_standalone_executor (#38430) · 5ab6ebaf
Committed by Leo Chen on Dec 24, 2021

5ab6ebaf
[Auto Paralle] partitioner refactor (#37853) · c4fdb057
Committed by JZ-LIANG on Dec 24, 2021

c4fdb057
new API inner&outer (#37706) · b463dff4
Committed by zhiboniu on Dec 24, 2021

b463dff4

[Unify Tensors PR ] Replaced pten::Allocation with... · 42cf2bee

Committed by Zhanlue Yang on Dec 24, 2021

[Unify Tensors PR #1] Replaced pten::Allocation with shared_ptr<memory::Allocation> for Storage (#38301)

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

42cf2bee

[heterps]move pre-init id logic from common_sparse_table to sparse_geo_table (#38173) · 52329f6f
Committed by zmxdream on Dec 24, 2021
```
* remove pre-init id in common_sparse_tabl.cc
```
52329f6f
add new API/OP:paddle.Tensor.exponential_ (#38256) · 33185000
Committed by zhouweiwei2014 on Dec 24, 2021
```
* add new API/OP:paddle.Tensor.exponential_

* fix CI
```
33185000
[MLU]add mlu op interface (#38241) · c396ee65
Committed by 努力努力在努力丶 on Dec 24, 2021
```
* [MLU]add mlu op interface

* [MLU]fix alpha of activation op
```
c396ee65
add pull gpups sparse op (#37124) · 572b3e90
Committed by yaoxuefeng on Dec 24, 2021
```
 add pull gpups sparse op
```
572b3e90
fix share buffer to (#38407) · 9409ff6b
Committed by Baibaifan on Dec 24, 2021

9409ff6b
[infrt] fix infrt script bug and function error. test=develop (#38384) · 4b3d5195
Committed by 王明冬 on Dec 24, 2021

4b3d5195
add register general kernel marco (#38409) · fc0a50aa
Committed by Chen Weihang on Dec 23, 2021

fc0a50aa
Add new API cholesky_solve (#38167) · 39f7c41f
Committed by zhiboniu on Dec 24, 2021

39f7c41f
add new API/OP: paddle.poisson (#38117) · bcf86e5c
Committed by zhouweiwei2014 on Dec 24, 2021
```
* add new API/OP:paddle.poisson

* fix comment
```
bcf86e5c

[Dy2stat]Fix error when calling sublayer's non-forward func in dy2stat (#37296) · 7339a124

Committed by 0x45f on Dec 24, 2021

* fix error when calling sublayer's non-forward func in dy2stat

* fix circular import using an inelegant way

* deal with parameters

* remove param_guard in __call__

* remove comment

* fix error when jit.load

* rename block var

* remove wrong code

* add unit test

7339a124

[Dy2Stat]Consider InputSpec.name to calculate Cachekey hash id (#38273) · 8e6d5d2b
Committed by Aurelius84 on Dec 24, 2021
```
* Consider InputSpec.name to calculate Cachekey hash id

* fix function
```
8e6d5d2b

BaiXuePrincess / Paddle is in sync with the fork's source project