提交 · f4d9adc7d08030f806fafb398b9fef8cc21c2d8f · 机器未来 / Paddle

22 4月, 2021 2 次提交

support save/load binary format tensor. (#32211) · f4d9adc7

由 WeiXin 提交于 4月 22, 2021

* support save/load binary format tensor

* Fix error when create cudaplace

* Fix error when create cudaplace

* Fix error when create cudaplace

* get devive context from pool.

* move define of 'SerializeToStream' and 'DeserializeFromStream' to 'lod_tensor.cc' and 'selected_rows.cc'.

* improve coverage.

* improve coverage.

* polish API

* deal with conflict

* disable save/load large file in unnittest

* split unnittest.

f4d9adc7

T

Delete WITH_GRPC flag and Distributed old code (#32383) · e58c705b
由 tianshuo78520a 提交于 4月 22, 2021

e58c705b

21 4月, 2021 10 次提交

A

Add Bfloat16 support on Ampere GPU with CUDA 11 (#32132) · bf0ec9b8
由 AshburnLee 提交于 4月 21, 2021

bf0ec9b8

【NPU】Merge NPU ccl code (#32381) · c3158527

由 zhang wenhui 提交于 4月 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: Nvoid-main <voidmain1313113@gmail.com>
Co-authored-by: Nf2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: Nlw921014 <liuwei921014@yeah.net>
Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
Co-authored-by: Nf2hkop <f2huestc@outlook.com>
Co-authored-by: Nxiayanming <41795079@qq.com>

c3158527

C

Update the error info for quantizaion (#32273) · 3da2c7f3
由 cc 提交于 4月 21, 2021

3da2c7f3

optimize get-feat function of graph engine (#32261) · 2b68d20b

由 seemingwang 提交于 4月 21, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundent threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine
Co-authored-by: NHuang Zhengjie <270018958@qq.com>
Co-authored-by: NWeiyue Su <weiyue.su@gmail.com>
Co-authored-by: Nsuweiyue <suweiyue@baidu.com>
Co-authored-by: Nluobin06 <luobin06@baidu.com>
Co-authored-by: Nliweibin02 <liweibin02@baidu.com>
Co-authored-by: Ntangwei12 <tangwei12@baidu.com>

2b68d20b

L
[NPU] register npu finalize on exit (#32390) · 8e4c1936
由 Leo Chen 提交于 4月 21, 2021
```
* [NPU] register finalize on exit

* fix
```
8e4c1936

remove thrust include files (#32395) · ab6f8745

由 wuhuanzhou 提交于 4月 21, 2021

* remove thrust includes, test=develop

* fix compilation error, test=develop

* fix compilation of truncated_gaussian_random_op, test=develop

ab6f8745

L

[Kunlun]add collective ops for multi XPU cards training and add Kunlun multi XPU cards CI (#32302) · 2194ad15
由 liuyuhui 提交于 4月 21, 2021

2194ad15

石

flush denormal in the tracer op, test=develop (#32350) · 9ff85561

由石晓伟提交于 4月 21, 2021

* flush denormal in the tracer op, test=develop

* add cmake dependencies, test=develop

* add a macro, test=develop

* fix the windows case, test=develop

9ff85561

J

Added bilinear and nearest interp v2 oneDNN FP32 kernels (#32312) · 5d19f8d8
由 jakpiase 提交于 4月 21, 2021

5d19f8d8
J

Added oneDNN reduce_op GRAD kernel (#32280) · ead83422
由 jakpiase 提交于 4月 21, 2021

ead83422

20 4月, 2021 4 次提交
- T
  [Optimize]SparseKV speedup and memory save (#32048) · 5e7e7c9f
  由 tangwei12 提交于 4月 20, 2021
```
Change-Id: Ie35a09772e46f7d90cb68ca82c1d18b9201d1abe

* large scale kv store optimize

Change-Id: I582cc661afdaa20749ec7493eae1b88c32b967f7

* replace std::unorded_map with roundrobin map

Change-Id: I48ee0efef38853876c92d982cdfcac6603c52c88

* remove license

* fix cpp lint

Change-Id: Ia21fafa65adc09bb9094f7dbc987e31d5af2686e
```
  5e7e7c9f
- W
  
  move REGISTER_OP_CUDA_KERNEL into cpp with eigen, test=develop (#32114) · f6f59e50
  由 wuhuanzhou 提交于 4月 20, 2021
  
  f6f59e50
- T
  [heterps] optimize build task (#32358) · c09d6453
  由 Thunderbrook 提交于 4月 20, 2021
```
* build task cost

* return pool
```
  c09d6453
- C
  
  add log to analyse mkldnn models (#32342) · f0cc1883
  由 cc 提交于 4月 20, 2021
  
  f0cc1883
19 4月, 2021 3 次提交

A
add npu check nan and inf (#32340) · 1e3a94be
由 An Improved PeleeNet Algorithm with Feature Pyramid Networks for Image Detection 提交于 4月 19, 2021
```
add npu check nan and inf (#32340)
```
1e3a94be

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

由 Leo Chen 提交于 4月 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* 【NPU】fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and  add wait before sync operation (#31956)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: Npangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: Npangyoki <pangyoki@126.com>

cbe5c9f8

J

Add BF16 Constant Initializer and support for other initializer (#31935) · 76cb83e8
由 joanna.wozna.intel 提交于 4月 19, 2021

76cb83e8

18 4月, 2021 1 次提交
- Z
  
  Unify the implementation of elementwise operation of same dimensions (#32148) · 2c182583
  由 Zhang Zheng 提交于 4月 18, 2021
  
  2c182583
17 4月, 2021 1 次提交
- S
  [Hybrid Parallel] Add model parallel support in dygraph (#32248) · 66d46221
  由 ShenLiang 提交于 4月 17, 2021
```
* add model parallel support in dygraph
```
  66d46221
16 4月, 2021 2 次提交
- 1
  
  test=develop, fix index_wrapper's cmake depends(#32314) · 03c9ecd9
  由 123malin 提交于 4月 16, 2021
  
  03c9ecd9
- C
  support ernie trt-int8 for inference (#32232) · 6da043eb
  由 ceci3 提交于 4月 16, 2021
```
* support ernie trt-int8 for inference

* fix reshape
```
  6da043eb
15 4月, 2021 9 次提交

1
tree-based-model (#31696) · a8c3a902
由 123malin 提交于 4月 15, 2021
```
* add index_dataset and index_sampler for tree-based model
```
a8c3a902
A

Correct typos (#32288) · 825d4957
由 AshburnLee 提交于 4月 15, 2021

825d4957

[ROCM] bugfix for unit tests (#32258) · 90133d24

由 furnace 提交于 4月 15, 2021

* [ROCM] bugfix for test_conv_transpose_nn_grad

* [ROCM] bugfix for test_batch_norm_op_v2

* [ROCM] bugfix for test_empty_like_op

* [ROCM] bugfix for test_conv_transpose_nn_grad

90133d24

heterps support pscore (#32093) · 9f8c8f96

由 Thunderbrook 提交于 4月 15, 2021

* pscore support heterps

* fleet cmake

* fleet wrapper

* macro

* solve conflict

* solve conflict

* add unitest

* paddle enforce

* unitest

* unitest

* unitest

9f8c8f96

X

support int for nearest_interp, test=develop (#32270) · 668a0d3b
由 xiaoting 提交于 4月 15, 2021

668a0d3b
J
【Deepmd Support】add IsInitialized and tanh double grad (#32188) · cfdde0ec
由 Jiabin Yang 提交于 4月 15, 2021
```
* add IsInitialized

* rm additional log and add tanh double grad

* rename is_initialized
```
cfdde0ec

Fix some error message (#32169) · f946ba61

由 Kqnonrime 提交于 4月 15, 2021

* fix two error message

* fix two error message

* fix error

* fix error

* fix error

* fix error

* fix some error message

* fix some error

* fix error

* fix some error

* fix some error

* fix some error

* fix one error

* fix some error

f946ba61

Customizable Python Layer in Dygraph (#32130) · 29f65225

由 WeiXin 提交于 4月 14, 2021

* custom python backward

* polish up the code

* polish up the code

* polish up the code.

* Fix code format and comments.

* Delete redundant files.

* add unnittest.

* edit unnittest.

* edit unnittest.

* Remove redundant header files.

* Improve coverage and remove redundant code.

* support saving for backward.

* polish code according to comments.

* Add support type for PyLayer.

* Modify the DOC.

* polish Doc.

* polish Doc.

* polish Doc.

* polish Doc.

* polish Doc.

* polish Doc.

* polish code and make the code robust.

* Modify the code format.

29f65225

【NPU】Cherry-pick ascendrc ops code by 0325 to develop (#32197) · e6bc358d

由 zhang wenhui 提交于 4月 15, 2021

* merge 31065

* Fix typo of selected_npus (#31230)

* merge 31249

* [NPU] Support npu op pow and pow grad (#31247)

* [NPU] Support npu op: (1) pow (2) pow_grad

* Support fp16

* Fix pow npu fp16 test (#31256)

* support list of list attribute for NPU (#31299)

* support list of list attribute for NPU

* fix compile problem

* fix reference

* [NPU] Support npu op: (1) slice (2) slice_grad (#31275)

* fix reading flags from env (#31329)

* merge 31347

* [NPU] Support npu op layer_norm and layer_norm_grad (#31310)

* init commit, add layer_norm npu kernel

* fix typo

* add unittest

* add unittest

* fix bug

* fix bug

* refine ut

* [NPU] add npu kernel for equal op (#31393)

* add npu kernel for equal op

* refine code

* add more ut

* update year

* [NPU] Support npu kernel for shape op  (#31427)

* add shape npu

* fix

* fix

* fix endif (#31431)

* Fix pow, use fillD instead of broadcast (#31433)

* Fix pow, refine code (#31440)

* fix cmake of cryptopp to avoid downloading every time (#31451)

* [NPU] squeeze and unsqueeze op for ascend (#31452)
Co-authored-by: Nroot <xiayanming@baidu.com>

* Support npu kernel for gather op (#31458)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* 【NPU】add scale op for npu (#31499)

* add scale npu

* fix

* fix

* Support TensorFormVector, TensorToVector of bool type (#31518)

* support TensorFormVector, TensorToVector of bool type

* add ut

* fix compile problem

* 【NPU】support npu kernel for fill_constant op (#31521)

* add fill_constant npu

* add fill_constant npu

* fix

* cherry-pick 31422, solve conflict

* 【NPU】Support npu kernel for matmul op (#31544)

* add matmulv2_npu

* add matmul

* add matmul

* [NPU] Support npu op elementwise_mul and elementwise_mul_grad (#31571)

* [NPU] Support npu op elementwise_max (#31574)

* 【NPU】add relu op for  npu (#31515)

* add relu npu

* fixed

* fix

* 【NPU】Suppert npu kernel for reshape2 op (#31524)

* add reshape2 npu

* add reshpe2

* [NPU] Support npu kernel for gather op fix bug (#31541)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* update gather_grad

* fix bug

* fix bug

* [NPU] Support npu kernel for amp_check_finite_and_unscale_npu op (#31457)

* Support npu kernel for amp_check_finite_and_unscale_npu op

* support EnforceNotMet exception

* fix exception bug

* modify python unittest

* precommit

* update c++ unittest

* fix review

* fix review

* [NPU] accuracy op (#31492)

* accuracy op

* fix license

* fix

* add test and fix bug

* [NPU] add Assign OP (#31561)

* add assign op

* add test assign npu test

* dele if def
Co-authored-by: Noyjxer <1728722986@qq.com>

* [NPU] fix npu op elementwise_mul_grad (#31592)

* 【NPU】Support npu op gelu and gelu_grad (#31530)

* Support npu op gelu and gelu_grad

* Support npu op gelu and gelu_grad

* [NPU] fix assgin cmake (#31595)

* fix gather_grad bug (#31607)

* [NPU] add range op (#31560)

* add range op

* fix codestyle; call GetSize directly
Co-authored-by: Noyjxer <1728722986@qq.com>

* 【NPU】Support npu op elementwise_div and elementwise_div_grad (#31573)

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* [NPU] Support npu op log, log_grad, sqrt, sqrt_grad, square, tanh and tanh_grad (#31600)

* [NPU] Support npu op logicalnot_op (#31534)

* [NPU] Support npu op elementwise_min (#31575)

* [NPU] Support npu op elementwise_pow (#31576)

* [NPU] Support npu op table_lookup_v2 and table_lookup_v2_grad (#31399)

* [npu] support npu kernel `table_lookup_v2`

* clean up

* +python test

* +cmake

* clean up

* remove int8 kernel
+ python unitest for fp16

* clean up

* [NPU] support npu kernel for `less_than` (#31327)

* [npu] support npu kernel for `less than`

* remove int* kernel

* cleanup

* [NPU] Support npu kernel scatter op (#31624)

* Support npu kernel scatter op

* Add more test

* [NPU] fix allocator min chunk size (#31632)

* [NPU] Support NPU kernel cast op (#31635)
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for sgd (#31639)

* 【NPU】Support NPU kernel for reduce_sum op v2 (#31620)

* add reduce_sum

* fix broadcastd

* fix test

* fix

* add unsqueeze in reduce_sum

* add template

* add unittest for keep_dim

* test reduce_all
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for adam (#31644)

* add npu kernel for adam

* refine code

* disable test

* modify atol

* 【NPU】Support npu kernel for mul op (#31584)

* add mul

* add test mul

* [NPU] add npu kernel for softmax_with_cross_entropy (#31656)

* init

* fix bugs

* [NPU] add npu kernel for mean Op (#31562)

* update mean op

* update mean op

* give a better test activation
Co-authored-by: Noyjxer <1728722986@qq.com>

* Revert "[NPU] add npu kernel for mean Op (#31562)" (#31665)

This reverts commit 468ac699.

* 【NPU】Add TensorCopy to NPU kernel for reduce_sum op  (#31667)

* update unittest

* add TensorCopy in npu grad kernel

* [NPU] Support npu op `expand` (#31405)

* [npu] support npu kernel  for `expand`

* [NPU] fix shape of dx in mul_grad (#31675)

* fix shape of dx

* refine code

* [NPU] add Increment op (#31563)

* add increment

* fix

* update test increment op inplace

* update increment op

* increment b = 2
Co-authored-by: Noyjxer <1728722986@qq.com>

* [NPU] add NPU add topk  (#31596)

* add topk op

* add cmake

* update topk npu op

* refactor func

* fix test not go npu TopKD bug

* NPUPlace(4) to NPUPlace(0)

* update comment
Co-authored-by: Noyjxer <1728722986@qq.com>

* [NPU] Support NPU kernel sum op (#31671)

* [NPU] npu support `transpose` (#31486)

* cherry-pick 31564, solve conflict

* [NPU] Fix bug: Fix calculation errors of pow grad npu kernel (#31699)

* [NPU] Support testing grad of NPU ops in OpTest (#31697)

* [NPU] Support NPU kernel of stack op (#31711)

* [NPU] Remove redundant ctest of top_k_op_npu_test (#31718)

* [NPU] fix reshape npu op kernel (#31726)

* rename npu op file

* fix reshape

* [NPU] change transpose to transpose2 (#31734)

* change transpose to transpose2

* fix bug

* [NPU] Support  mean npu kernel (#31729)

* [NPU] fix some bugs of npu op (#31739)

* fix softmax

* fix mean

* fix lookup_table_v2

* 【NPU】Fix npu kernel elementwise_div_grad  (#31753)

* [NPU] fix the grad kernel diff bug of gather op (#31757)

* fix gather grad kernel diff

* fix gather grad kernel diff

* fix gather review bug

* 【NPU】Fix reshape test & add grad test (#31776)

* fix

* fix

* [NPU] support fp16 for npu accuracy op (#31797)

* [NPU] support list of tensor input (#31801)

* support list of tensor as npu input

* add comment

* fix typo

* fix typo

* [NPU] add npu kernel for concat op (#31695)

* add npu kernel for concat op

* add npu kernel for concat op

* refine code

* update

* refine concat_grad

* [NPU] Support npu kernel for op elementwise_floordiv (#31822)

* [NPU] fix bug of lookup_table_v2_grad (#31834)

* [NPU] support default stream (#31510)

* [NPU] support mixed precision input for npu layer norm (#31847)

* support mixed precision input for npu layer norm

* fix layer_norm npu kernel
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

* 【NPU】Support npu kernel for update_loss_scaling op (#31830)

* add update_loss_scaling_npu NPU kernel

* change TensorFromVec to Memset

* fix compile problem (#31850)

* [NPU] support npu for conditional_block op (#31854)

* 【NPU】Add int dtype kernel for reshape2 op (#31864)

* fix

* fix

* [NPU] fix some op bugs (#31855)

* fix some op bugs

* fix some bugs

* follow comments

* fix log level

* add ut

* [NPU] support fp16 of input for api pow (#31871)

* [NPU] add npu kernel for truncated_gaussian_random op (#31654)

* init

* add todo

* add npu kernel for truncated_gaussian_random

* add sync

* fix concat_grad

* fix typo

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix code style

* fix code style

* fix code

* Fix op test (#32231)

* fix conditional block (#32243)

* fix style code
Co-authored-by: Nxiayanming <41795079@qq.com>
Co-authored-by: NLeo Chen <chenqiuliang@baidu.com>
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: NReventon_L <luyuxiang1994@qq.com>
Co-authored-by: Nroot <xiayanming@baidu.com>
Co-authored-by: Noyjxer <1728722986@qq.com>
Co-authored-by: Nyinhaofeng <66763551+yinhaofeng@users.noreply.github.com>
Co-authored-by: NOleNet <olenet@126.com>
Co-authored-by: NMeiyim <chen_xuyi@outlook.com>
Co-authored-by: Noyxuan-11 <963650125@qq.com>
Co-authored-by: Npangyoki <pangyoki@126.com>

e6bc358d

14 4月, 2021 8 次提交
- Y
  
  Optimize the bec_loss op to avoid copy input back to CPU. (#32265) · 69d80274
  由 Yiqun Liu 提交于 4月 14, 2021
  
  69d80274
- A
  
  Optimize of backward of log_softmax when axis is -1 and dim_size <= 1024 (#32180) · 5dc0a6eb
  由 AshburnLee 提交于 4月 14, 2021
  
  5dc0a6eb
- J
  
  Added oneDNN reduce_op FWD kernel (#31816) · 3a804a0e
  由 jakpiase 提交于 4月 14, 2021
  
  3a804a0e
- A
  adds new CPU kernel for SGD op supporting BF16 data type (#32162) · 3ac6c189
  由 Adam Osewski 提交于 4月 14, 2021
```
* Initial draft for SGD BG16 kernel.

* Unit tests for SGD with BF16 data type.

* Add VLOG message to SGD BF16 op CPU kernel.

* Enhance error messages and error types.

* Refactor SGD op kernels to leverage some common code.

* Make easier to add new kerne invoke code.

* Fix SGD op kernel for sparse grad.

* Unify quotes style.

* Fix error for ROCM compilation.

* Use specialized PADDLE_ENFORCE_xx functions.
```
  3ac6c189
- C
  
  add marco cond for multi function (#32239) · 7b9fcaca
  由 Chen Weihang 提交于 4月 14, 2021
  
  7b9fcaca
- X
  
  softmax reconstruction and optimization (#31821) · 63abd500
  由 xingfeng01 提交于 4月 14, 2021
  
  63abd500
- P
  [Paddle-TRT] Add check for TRT runtime dynamic shape (#32155) · 8552a182
  由 Pei Yang 提交于 4月 14, 2021
```
* add check for runtime dynamic shape

* add unittest

* add lower bound case

* adjust timeout of new ut to 120s
```
  8552a182
- C
  Add inner register backward hook method for Tensor (#32171) · 7ba85aca
  由 Chen Weihang 提交于 4月 14, 2021
```
* add register backward hook method

* add leaf grad accumullated test
```
  7ba85aca

机器未来 / Paddle 与 Fork 源项目一致

机器未来 / Paddle
与 Fork 源项目一致