Commits · 5620214e5aba507dbe4084597be3e46eb2cb8705 · BaiXuePrincess / Paddle

30 Dec, 2021 9 commits

[Auto parallel] Make sure the id semantics of every var and op unique (#38132) · 5620214e
Authored by Yulong Ao on Dec 30, 2021
```
* [Auto parallel] Make the id of var and op unique

* [Auto Parallel] Rename back dist_context to distop_context
```

Add cpu kernel of new api : lstsq (#38585) · ccf99b66

Authored by Haohongxiang on Dec 30, 2021

* add cpu kernel of lstsq

* update

* modify code style

* modify unittest

* remove support for complex


Support test imperative basic with fixed retain grad interface (#38548) · 2421a25a

Authored by Jiabin Yang on Dec 30, 2021

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* support more eager tensor api

* fix merge compile error

* fix compile error and fit develop code

* support pure CPU

* fix some logic error in eager_mode

* support _varbase_creator in eager mode

* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs

* for eager mode

* refine

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* eager logic

* refine test in pure cpu

* eager logic

* eager logic

* eager logic, test=develop

* skip core.eager when in inference, test=develop

* refine, test=develop

* refine, test=develop

* call RetainGrad after run forward kernel, test=develop

* refine, test=develop

* support dygraph util, meta, guard test

* support inference test

* refine test and fix initializer failed

* support create varbase and fix retain grad error

* fix windows error

* support test_imperative_basic test in eager mode

* remove additional log in variable.h

* remove additional log in variable.h

* remove additional code create in merge

Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: Wang Huan <wanghuan29@baidu.com>


Added Conv2D BF16 BWD oneDNN kernel (#38507) · ed8ba011

Authored by jakpiase on Dec 30, 2021

* working test for padding only

* added full conv2d grad kernel

* removed some trash

* minor change

* Ci fix

* format fix


[PSCore]Fix test fleet base 2 (#38588) · 04496d89
Authored by zmxdream on Dec 30, 2021


[PTen] Remove offset in storage (#38472) · a504ff3f

Authored by Chen Weihang on Dec 29, 2021

* remove offset in storage

* revert api change

* fix custom op slice bug

* fix mutable_data error


add ExponentialFamily and Dirichlet probability distribution (#38445) · 00cddf07

Authored by Xiaoxu Chen on Dec 30, 2021

* extend Distribution baseclass for supporting multivariant distribution and prob method

* add ExponentialFamily base class and entropy using Bregman divergence

* add dirichlet probability distribution


add dirichlet random sample op in cpu and gpu kernel (#38244) · c5bf09bb

Authored by Xiaoxu Chen on Dec 30, 2021

* add dirichlet sample op and cpu backend kernel

* add Dirichlet op cuda kernel  (#6)

* add dirichlet op hip kernel

Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com>

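As background for the Dirichlet sample op above, one standard way to draw a Dirichlet sample is to draw independent Gamma(alpha_i, 1) variates and normalize them by their sum. The sketch below shows that technique in pure Python. The function name `sample_dirichlet` is hypothetical, and nothing here is taken from the Paddle kernels in this commit, which may be implemented differently.

```python
import random


def sample_dirichlet(alpha, rng=random):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws.

    `alpha` is a sequence of positive concentration parameters.
    `rng` is any object exposing `gammavariate`, e.g. random.Random(seed).
    """
    if not alpha or any(a <= 0 for a in alpha):
        raise ValueError("alpha must be a non-empty sequence of positive numbers")
    # Independent Gamma(alpha_i, scale=1) variates.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    # Normalizing puts the result on the probability simplex.
    return [d / total for d in draws]


sample = sample_dirichlet([0.5, 1.0, 2.0], random.Random(0))
```

The returned components are positive and sum to 1 up to floating-point rounding.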

Fix the bug of batch_norm and batch_norm_grad op. (#38288) · cc83c95f

Authored by Leo Guo on Dec 30, 2021

* Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align" and "roi_align_grad" op in xpu2 op list.

* Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align" and "roi_align_grad" op in xpu2 op list. test=kunlun

Co-authored-by: Zibin <guozibin@baidu.com>


29 Dec, 2021 9 commits
- add _nvprof_range interface (#38572) · ea01e790
  Authored by Leo Chen on Dec 29, 2021
- [BugFix]Fix bug in obtaining parameters_buffers in layers (#38563) · ecb8c184
  Authored by ShenLiang on Dec 29, 2021
```
* fix bug of dp in pfp16

* fix topo
```
- fix random OP failed (#38564) · 2fb1fc0d
  Authored by zhouweiwei2014 on Dec 29, 2021
- [AMP] Add BatchNorm_1D_2D_3D skip for paddle.amp.decorate (#38541) · 2ebc8f77
  Authored by zhangbo9674 on Dec 29, 2021
```
* add bn_1d_2d_3d for fp16 decorate

* add unittest
```
- [Auto Parallel] Sharding Pass (#38502) · e3faf345
  Authored by JZ-LIANG on Dec 29, 2021
```
* auto parallel sharding base

* chmod

* add unitest

* set unitest cmake dist label

* revise code according to review

* chmod
```
- add top k v2 operator, test=kunlun (#38434) · d22f92ad
  Authored by ykkk2333 on Dec 29, 2021
- fix reduce_max/reduce_min bug (#38476) · 995332ef
  Authored by Shang Zhizhou on Dec 29, 2021
- add timeout for matmul_xx_fuse_pass (#38544) · 20403fe9
  Authored by heliqi on Dec 29, 2021
```
* del mkldnn options of baseline

* add timeout for matmul_scale_fuse_pass

* add timeout for matmul
```
- add argsort/scatter for kunlun (#38345) · 4643baa7
  Authored by TTerror on Dec 29, 2021
```
* add argsort/scatter for kunlun

* update test_scatter

* update xpu.cmake

* update xpu.cmake

* fix scatter
```
28 Dec, 2021 12 commits

add new API: paddle.cov (#38392) · 85f5d264
Authored by zhiboniu on Dec 28, 2021

update seq_concat_fc_fuse_pass ut (#38538) · 706d2c08
Authored by baoachun on Dec 28, 2021


Utilize StreamSafeCUDAAllocator to support fast GC in new executor (#37642) · 0c7153a4

Authored by From00 on Dec 28, 2021

* fix reshape move storage error

* remove needless set type

* alloc tensor by shared storage

* Utilize StreamSafeCUDAAllocator to support fast GC in new executor

* Fix compile error for Windows and ROCm

* Fix compile error for Windows

* Modify UT stream_safe_cuda_alloc_test

* Modify UT stream_safe_cuda_alloc_test

* Rewrite fast GC

* Rewrite fast GC

* Fix compile error for BOOST_GET_CONST

* Fix compile error for BOOST_GET_CONST

* Changes default stream for StreamSafeCUDAAllocator

* Fix a small CI error

* Remove some redundant code

* Fix conflict

* Fix compile error for ROCm

* Fix Windows CI error

* Fix CI error

* Remove some unnecessary code

* Fix CI error

* Add UT for fast GC

* Fix CI error

* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile

* Use RWLock in GetAllocator

* Fix CI error

Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>


add matmul_to_mul matmul_v2_to_mul matmul_v2_to_matmul test case (#37645) · bed71992

Authored by heliqi on Dec 28, 2021

* add matmul_to_mul matmul_v2_to_mul matmul_v2_to_matmul test case

* modify skip func to ignore_pass_case func

* rebuild CI

* rebuild CI

* add test_map_xx_pass timeout

* add test_map_xx_pass timeout

* merge from develop

* add timeout notest;test=coverage

* Cmakelist add timeout

* add timeout

* add attr of matmul_v2

* add trt skip

* delete trt config

* add skip,  mul diff on 3080


Support test basic of Var and Layer (#38426) · 1fb80a6a

Authored by Jiabin Yang on Dec 28, 2021

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* support more eager tensor api

* fix merge compile error

* fix compile error and fit develop code

* support pure CPU

* fix some logic error in eager_mode

* support _varbase_creator in eager mode

* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs

* for eager mode

* refine

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* eager logic

* refine test in pure cpu

* eager logic

* eager logic

* eager logic, test=develop

* skip core.eager when in inference, test=develop

* refine, test=develop

* refine, test=develop

* call RetainGrad after run forward kernel, test=develop

* refine, test=develop

* support dygraph util, meta, guard test

* support inference test

* refine test and fix initializer failed

* support create varbase and fix retain grad error

* fix windows error

* support test code coverage

* support test code coverage

* support test code coverage

Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: Wang Huan <wanghuan29@baidu.com>


fix ci problem (#38474) · 2e4cb279
Authored by Wilber on Dec 28, 2021


Add API and op for take_along_axis (#38396) · 3310f519

Authored by huangxu96 on Dec 28, 2021

* add API and op for take_along_axis

* fix compile dependency problem and add example code and doc

* add unitest

* delete some code for CI coverage

* fix code style problem

* fix as review

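For readers unfamiliar with the operation, the gather semantics that the name `take_along_axis` conventionally denotes (as in NumPy's function of the same name) can be sketched in pure Python for the 2-D case. The helper `take_along_axis_2d` is hypothetical and does no broadcasting. It illustrates the idea only and is not the Paddle op added in this commit.

```python
def take_along_axis_2d(arr, indices, axis):
    """Gather values from a 2-D nested list `arr` along `axis`.

    `indices` has the shape of the output; each entry selects a position
    along `axis`, while the other coordinate is kept as-is.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1 for a 2-D input")
    out = []
    for i, idx_row in enumerate(indices):
        out_row = []
        for j, k in enumerate(idx_row):
            if axis == 0:
                out_row.append(arr[k][j])  # k picks the row, column j is fixed
            else:
                out_row.append(arr[i][k])  # k picks the column, row i is fixed
        out.append(out_row)
    return out


data = [[10, 30, 20],
        [60, 40, 50]]
# Per-row indices that happen to sort each row in ascending order.
idx = [[0, 2, 1],
       [1, 2, 0]]
result = take_along_axis_2d(data, idx, axis=1)
```

A typical use is applying the index output of an argsort-style operation back to the data, which is what the example indices do row by row.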

Add Amax and Amin API (#38417) · 340dfb26
Authored by Tao Luo on Dec 28, 2021
```
* add amax/amin

* support axis is list
```
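The "support axis is list" item above refers to reducing over more than one axis at once. A minimal pure-Python sketch of a max/min reduction over a 2-D nested list, where `axis` may be `None`, an int, or a list of ints, is given below. The helper `reduce_2d` is hypothetical, handles only non-negative axes, and is not the Paddle implementation.

```python
def reduce_2d(x, axis=None, fn=max):
    """Reduce a 2-D nested list with `fn` (max or min) over `axis`.

    axis=None or [0, 1] reduces everything to a scalar;
    axis=0 reduces over rows (one value per column);
    axis=1 reduces over columns (one value per row).
    """
    if axis is None:
        axes = {0, 1}
    elif isinstance(axis, int):
        axes = {axis}
    else:
        axes = set(axis)
    if not axes or not axes <= {0, 1}:
        raise ValueError("axis must be None, 0, 1, or a non-empty list of those")
    if axes == {0, 1}:
        return fn(fn(row) for row in x)
    if axes == {0}:
        return [fn(row[j] for row in x) for j in range(len(x[0]))]
    return [fn(row) for row in x]


x = [[1, 5, 3],
     [4, 2, 6]]
col_max = reduce_2d(x, axis=0, fn=max)
row_max = reduce_2d(x, axis=1, fn=max)
all_min = reduce_2d(x, axis=[0, 1], fn=min)
```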

add reduce_prod_xpu. fix reduce_mean_xpu bug. (#38481) · 78836bb7

Authored by houj04 on Dec 28, 2021

* add reduce_prod_xpu. fix reduce_mean_xpu bug.

* iadd reduce_prod_xpu. fix reduce_mean_xpu bug. test=kunlun


add mul_lstm_fuse_pass ut (#37795) · 1db61c3e

Authored by baoachun on Dec 28, 2021

* add mul_lstm_fuse_pass ut

* update mul_lstm_fuse_pass ut

* update ut

* update ut

* update ut

* add CPU ut cmake setting

* update ut


add pass base unittest (#38504) · ee5f3641
Authored by zhaoyingli on Dec 28, 2021
```
* add pass base unittest

* update gpt model
```

Fix scatter_op fp16 perf problem. (#38499) · 33ce249f

Authored by Li Min on Dec 28, 2021

* Fix scatter_op fp16 perf problem.

* Add scatter into black list.

* Add scatter into black list for dygraph.


27 Dec, 2021 7 commits
- fix english doc of some API (#38468) · 5b6b88ab
  Authored by zhouweiwei2014 on Dec 27, 2021
- fix bugs in fp16 for dp (#38405) · 1ab5c511
  Authored by ShenLiang on Dec 27, 2021
- fix accumulator bug when multiple inplace OPs are executed continuously (#38406) · 113c8b93
  Authored by pangyoki on Dec 27, 2021
```
* fix accumulator bug

* fix unittest
```
- Refine clip_by_global_norm (#38209) · 65f7fa0d
  Authored by zhangbo9674 on Dec 27, 2021
```
* refine clip

* delete unused code

* refine logic for clip
```
- update mkldnn matmul_transpose_reshape fuse pass ut (#38467) · 9cfdae91
  Authored by baoachun on Dec 27, 2021
- add matmulv2_transpose_reshape_pass ut (#37416) · f664a533
  Authored by baoachun on Dec 27, 2021
```
* update mkldnn matmul_v2_transpose_reshape_fuse_pass ut

* update mkldnn matmul_v2_transpose_reshape_fuse_pass ut

* update ut

* update ut
```
- [AMP] Fix amp.decorate bug: parameters for non leaf layers cannot be decotated (#38402) · 5d902954
  Authored by zhangbo9674 on Dec 27, 2021
```
* fix bug

* refine code

* refine code

* refine code
```
24 Dec, 2021 3 commits

add nansum api to math (#38137) · 6554cc10

Authored by wangguanqun on Dec 24, 2021

* add nansum api

* delete layerhelper

* add nansum to all and tensor_method_func

* update doc

* update doc

* update doc

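The name `nansum` conventionally denotes a sum that treats NaN entries as zero (as NumPy's `nansum` does). A minimal pure-Python sketch of that behaviour follows. The helper name `nansum_list` is hypothetical and this is not the Paddle API added in this commit.

```python
import math


def nansum_list(values):
    """Sum a flat sequence of floats, skipping NaN entries."""
    return sum(v for v in values if not math.isnan(v))


nan = float("nan")
total = nansum_list([1.0, nan, 2.5])
# An all-NaN input sums to zero rather than NaN.
all_nan_total = nansum_list([nan, nan])
```

By contrast, a plain `sum` over the first list would return NaN, because NaN propagates through addition.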

renorm op (#38130) · 6982871d

Authored by seemingwang on Dec 24, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundent threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redandunt graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neigbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory

* cache optimization

* remove test function

* remove extra fetch function

* graph-engine data transfer optimization

* support graph_split load&query

* remove logs

* change shards to pointer vector

* use inference

* remove test code

* renorm op

* simplify renorm op

* recover local changes

* recover renorm op kernel

* fix init

* add blanklines in renorm doc

* fix import

* fix import

Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>


add gradient unittest and update code example for max/min (#38393) · ee69f437
Authored by Tao Luo on Dec 24, 2021
```
* add gradient unittest and update code example for max/min

* update docs

* remove _get_reduce_all_value
```

BaiXuePrincess / Paddle is up to date with the project it was forked from