Commits · ccf5709d3f9958dccedf1b79e3c834fc2398b9c2 · PaddlePaddle / Paddle

Apr 09, 2021 · 4 commits

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, but failed to exit

* call aclrtResetDevice before exit

* fix aclFinalize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code style

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

fix unittest timeout (#32161) · a73cb679
Committed by Shang Zhizhou on Apr 09, 2021

a73cb679
[Dy2Stat] Fix undefined var used in For (#32153) · 4636d136
Committed by Aurelius84 on Apr 09, 2021
```
* fix undefined var in For

* fix code style
```
4636d136
[Dy2Stat] Support DictCmp and zip grammar (#32159) · 55730d95
Committed by Aurelius84 on Apr 09, 2021
```
* support DictCmp and zip grammar

* fix code style
```
55730d95

Apr 08, 2021 · 3 commits

Add LayerDict class (#31951) · e45c3fa5
Committed by chentianyu03 on Apr 08, 2021
```
* add layerdict class

* add docs and test cases for LayerDict class

* remove the arguments type in function define

* add update inputs type check
```
e45c3fa5

4D Hybrid Parallelism (#32134) · 54344964
Committed by JZ-LIANG on Apr 08, 2021

54344964

fix bug (#32135) · 72302033
Committed by ShenLiang on Apr 08, 2021

72302033
Apr 07, 2021 · 4 commits

add uint8 type for flatten op (#32120) · 297290a8
Committed by danleifeng on Apr 07, 2021
```
* add uint8 type for flatten;test=develop
```
297290a8

[NPU] Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

Committed by zhang wenhui on Apr 07, 2021

* Ascend rc (#30483)

* Fix compilation on CANN20.1 and older (#30494)

Fix compilation on CANN20.1 and older

* Add distribution support (#30578)

Add distribution support

* Build parser for Hcom* operators (#30627)

Build parser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distributed training support (#30796)

Add paddle ascend distributed training support

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop
Co-authored-by: hutuxian <hutuxian2011@sina.cn>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: dingsiyu <18369187719@163.com>
Co-authored-by: OleNet <olenet@126.com>

8c7c53b3

[3D-parallelism] Hybrid Model Parallelism (#32074) · 1e60a0c4
Committed by JZ-LIANG on Apr 07, 2021

1e60a0c4
update the TraceLayer.save_inference_model method to add the file suffix automatically (#31989) · 10af966a
Committed by CtfGo on Apr 07, 2021
```
As the title
```
10af966a

Apr 06, 2021 · 3 commits

fix test of affine_grid with rocm (#32047) · 78af100c
Committed by zhulei on Apr 06, 2021
```
* fix test of affine_grid with rocm

* fix test of affine_grid with rocm
```
78af100c

[Hybrid Parallel] Add Topology for hybrid communicate (#32011) · 2e82b6c8
Committed by ShenLiang on Apr 06, 2021
```
* support hyparallel, add topology

* fix utest
```
2e82b6c8

[ROCM] fix the backward maxpool (#32030) · a3b08bad
Committed by ronnywang on Apr 06, 2021

a3b08bad
Apr 02, 2021 · 3 commits

[3D-Parallel:Sharding] Optimizations for supporting ERNIE 3.0 training (#31884) · 69c874fd
Committed by JZ-LIANG on Apr 02, 2021

69c874fd

support save/load single tensor (#31756) · 43367e4b

Committed by WeiXin on Apr 02, 2021

* support save/load single tensor

* compatibility modification according to unittest

* Some python2.7 don't have 'copyreg' modules

* Handle a syntax error.

* Dealing with compatibility problems on Mac.

* Dealing with compatibility problems on Mac.

* edit unittest to improve coverage.

* Modify the code according to the review comments

* Reduce redundant code.

* support for static graph loading dygraph state_dict

* edit code according to CI

* edit unittest

* edit unittest

* delete redundant file

* edit code according to Comments

* edit english doc

* edit english doc

* edit English DOC.

* get/set_tensor->get/set_value; return_numpy=False

* get/set_tensor->get/set_value; return_numpy=False

* edit unittest

* edit unittest

* polish code.

43367e4b

graph engine (#31226) · 94736d60

Committed by seemingwang on Apr 02, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>

94736d60

Apr 01, 2021 · 7 commits

Support control flow in DataParallel (#31625) · 8460698b
Committed by ShenLiang on Apr 01, 2021
```
* support control flow

* support sync_parameters_buffers

* fix the bug of sparse embedding
```
8460698b

add custom init grad for backward function (#31540) · 83b953f5

Committed by chentianyu03 on Apr 01, 2021

* add custom init grad for backward function

* add custom init grad for backward function

* handle when the grad_tensor is none

* handle when the grad_tensor is none

* fix the args type error on windows platform

* modify the args order and doc

* format code

* add grad_tensor to xpu

* modify the grad_tensor type check

* add paddle.backward api to support multi tensors gradient compute

* add paddle.backward api to support multi tensors gradient compute

* add paddle.autograd module and backward api

* change tensor.backward func args

* modify tensor backward api

* remove create_graph inputs args

* add doc and example code for backward api

* when have the same tensor, throw error

* modify test Init func args

* modify the execute.Init func args in test files

* add paddle.autograd package in setup.py.in

* modify error msg, remove _run_backward method in class Tensor

* add test cases for backward api

83b953f5

LOG CLEAN (#31819) · 0589ed21
Committed by tangwei12 on Apr 01, 2021
```
* upgrade vlog

* train from dataset fetch optimize
```
0589ed21

[Paddle-TRT] add anchor generator op plugin (#31730) · b807e408

Committed by zlsh80826 on Apr 01, 2021

* add anchor generator op plugin

* add anchor generator unit_test

* remove dbg info

* remove redundant line

* replace assertion with paddle enforce

* dynamic plugin replaces assertion with paddle enforce

* anchor generator support dynamic shape on spatial axis

* anchor generator test with fp16, dynamic shape

* add anchor generator test all

* add back main

* reduce test input size to not exceed the timelimit of ci

* change super to InferencePassTest for python2 compatibility

* reuse paddle operator anchor generator

* move creator construct to header with default

* add cuda ifdef

* reduce line

* change super to InferencePassTest for python2 compatibility

* fix anchor generator fp16 serialize setting

* split unittest from test_all

* restrict anchor generator input format before version 7234

* anchor generator only support greater than trt7.1

* change min_graph_size to 2

* min_graph size to 3 if dynamic shape

* reduce dynamic shape size to avoid trt search tactic too long to exceed time limit

* remove anchor from fetch list

* anchor generator support all trt version

* fix memory not allocated but if serialized

b807e408

Support uint8_t for fill_constant_op (#31911) · 980227f9
Committed by Zhang Zheng on Apr 01, 2021

980227f9
new group (#31682) · 07741593
Committed by kuizhiqing on Apr 01, 2021
```
* new group

* ci compatible fix

* assert nccl
```
07741593

Refactor and simplify hook design & add Tensor.register_hook API (#31775) · dbeb3ea4

Committed by Chen Weihang on Mar 31, 2021

* refactor and simplify hook design

* fix reducer add hook error

* add Tensor.register_hook basic impl

* refine prepare data impl

* revert prepare data change

* support register_hook for Tensor

* add hook test in model

* polish tests and doc example

* fix double grad test failed

* remove reduce hook func

* fix set empty error

* polish code by comments

* change reduce_hook to mutable_hook

* remove useless tmp_ins

* fix shape code format error

* fix shape code format error

dbeb3ea4
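The hook refactor above adds a `Tensor.register_hook` API. As a rough illustration of the general gradient-hook pattern only, here is a plain-Python sketch. It assumes the common contract in which a hook receives the gradient and may return a replacement (or `None` to keep it unchanged), and in which registration returns a handle with a `remove()` method; `GradHookRegistry` and `HookHandle` are hypothetical names, not Paddle classes.

```python
from typing import Any, Callable, Dict, Optional

# A hook takes the current gradient and returns a replacement or None.
Hook = Callable[[Any], Optional[Any]]


class HookHandle:
    """Hypothetical handle that lets the caller unregister one hook later."""

    def __init__(self, hooks: Dict[int, Hook], hook_id: int) -> None:
        self._hooks = hooks
        self._hook_id = hook_id

    def remove(self) -> None:
        # Removing twice is harmless.
        self._hooks.pop(self._hook_id, None)


class GradHookRegistry:
    """Hypothetical stand-in for per-tensor gradient hooks (not Paddle code)."""

    def __init__(self) -> None:
        self._hooks: Dict[int, Hook] = {}
        self._next_id = 0

    def register_hook(self, hook: Hook) -> HookHandle:
        hook_id = self._next_id
        self._next_id += 1
        self._hooks[hook_id] = hook
        return HookHandle(self._hooks, hook_id)

    def run_hooks(self, grad: Any) -> Any:
        # Hooks run in registration order; a hook that returns None
        # leaves the gradient unchanged, otherwise its result replaces it.
        for hook in list(self._hooks.values()):
            result = hook(grad)
            if result is not None:
                grad = result
        return grad
```

Returning a handle instead of exposing the hook list keeps removal local to the caller that registered the hook, which is the usual reason this pattern is chosen.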

Mar 31, 2021 · 4 commits

Update eigen version to f612df27 (#31832) · 495e7f9c

Committed by wuhuanzhou on Mar 31, 2021

* update eigen version to f612df27, test=develop

* fix compilation error, test=develop

* remove patch command in eigen, test=develop

* fix compilation error caused by call Eigen function with float16 and bfloat16, test=develop

* fix unittest error, test=develop

* fix unittest error caused by precision, test=develop

* remove patch files used by old version eigen, test=develop

495e7f9c

fix some bug in transformer training in xpu (#31918) · 52b05bac
Committed by taixiurong on Mar 31, 2021

52b05bac
support minus-int idx to LayerList (#31750) · 5394194e
Committed by Wenyu on Mar 31, 2021
```
* support minus-int idx to LayerList
* update layerlist test
```
5394194e
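The change above lets `LayerList` accept negative integer indices. To illustrate that indexing rule only, here is a pure-Python sketch. `SimpleLayerList` and `normalize_index` are hypothetical names, and storing items under string keys `"0"`, `"1"`, ... is an assumption made for the example, not a description of Paddle's actual implementation.

```python
from typing import Any, Dict, Iterable


def normalize_index(idx: int, length: int) -> int:
    """Map a Python-style (possibly negative) index onto 0..length-1."""
    if not -length <= idx < length:
        raise IndexError(f"index {idx} is out of range for length {length}")
    return idx + length if idx < 0 else idx


class SimpleLayerList:
    """Hypothetical container, NOT paddle.nn.LayerList.

    Items live under string keys "0", "1", ...; every access resolves
    negative indices first, the same way a Python list does.
    """

    def __init__(self, layers: Iterable[Any] = ()) -> None:
        self._sub_layers: Dict[str, Any] = {
            str(i): layer for i, layer in enumerate(layers)
        }

    def __len__(self) -> int:
        return len(self._sub_layers)

    def __getitem__(self, idx: int) -> Any:
        return self._sub_layers[str(normalize_index(idx, len(self)))]

    def __setitem__(self, idx: int, layer: Any) -> None:
        self._sub_layers[str(normalize_index(idx, len(self)))] = layer

    def __delitem__(self, idx: int) -> None:
        del self._sub_layers[str(normalize_index(idx, len(self)))]
        # Re-key the remaining items so indices stay contiguous.
        remaining = list(self._sub_layers.values())
        self._sub_layers = {str(i): layer for i, layer in enumerate(remaining)}
```

Resolving the index once in a shared helper keeps `__getitem__`, `__setitem__` and `__delitem__` consistent with Python list semantics, including raising `IndexError` for out-of-range negative indices.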

[ROCM] Add ROCm support for warpctc op (#31817) · ef8323d4

Committed by furnace on Mar 31, 2021

* bugfix for warpctc

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix WARPCTC_WITH_HIP invalid

* Add logs to find out why can not dlopen libwarpctc.so

* fix warpctc commit id

* fix unit test test_warpctc_op

* Optimize failed log for dlopen

* Optimize failed log for dlopen

* Delete extra changes

* fix warpctc commit id

* fix warpctc commit id

* Add is_compiled_with_rocm for test_warpctc_op

* fix warpctc commit id

* Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed

* Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed

* Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed

* fix code style problems

ef8323d4

Mar 30, 2021 · 8 commits

[dynamic setitem] Fix bug of dynamic setitem: Decrease axes to do right broadcast (#31960) · 57d4288a
Committed by liym27 on Mar 30, 2021

57d4288a

Added int8 kernel for oneDNN LSTM op (#31894) · 6dca7a1d
Committed by jakpiase on Mar 30, 2021

6dca7a1d

fix bug when dtype of to_tensor is core.VarType (#31931) · 245252b8
Committed by Zhou Wei on Mar 30, 2021

245252b8

add exclusive for test_conv2d_op, test=develop (#31936) · fe284868
Committed by wangguanzhong on Mar 30, 2021

fe284868

add deprecated for softmax_with_cross_entropy (#31722) · 73a6fa3e

Committed by chajchaj on Mar 30, 2021

* add deprecated for softmax_with_cross_entropy, test=develop

* test for deprecated in english doc, test=develop

* test deprecated for softmax_with_cross_entropy in english doc, test=develop

* fix readme and English doc for cross_entropy, test=develop

* rm test for softmax_with_cross_entropy deprecated, test=develop

* update readme for CrossEntropyLoss, test=develop

* fix readme format, test=develop

* fix readme format, test=develop

* fix readme format for cross_entropy, test=develop

* add softmax_switch and fix softlabel for cross_entropy, test=develop

* 1)recovery softmax_with_cross_entropy in fluid 2) change softmax_switch to use_softmax 3) add example for softlabel for cross_entropy, test=develop

* fix Example number for cross_entropy, test=develop

* fix code format, test=develop

* fix for CI-Coverage, test=develop

* fix for CI-Coverage, test=develop

* fix ci-coverage for Non-ASCII character '\xe2' in file, test=develop

* fix ci-coverage for Non-ASCII character '\xe2' in nn.layer.loss.py, test=develop

* update description for doc when use_softmax=False, test=develop

* fix some docs and code example for cross_entropy, test=develop

* delete redundant description for soft_label parameter of cross_entropy, test=develop

* fix some comment for test_cross_entropy_loss.py, test=develop

73a6fa3e

fix batchnorm when input dims < 3 (#31933) · 8084b759
Committed by Shang Zhizhou on Mar 30, 2021
```
* fix batchnorm when input dims < 3

* add unittest for batchnorm dims = 2
```
8084b759

[Paddle-TRT] yolobox (#31755) · 64ee255f

Committed by zlsh80826 on Mar 30, 2021

* yolobox converter and plugin

* yolobox unittest

* add dynamic shape restriction

* fix git merge log

64ee255f

Fix segmentation fault from set_value (#31891) · c4b60efa
Committed by Aurelius84 on Mar 30, 2021
```
* Avoid raising warning while import paddle

* fix segment fault of set_value

* fix code style
```
c4b60efa

Mar 29, 2021 · 4 commits

Fix bug of set_value op: Decrease axes to do right broadcast (#31875) · 525c32e3
Committed by liym27 on Mar 29, 2021

525c32e3

[ROCM] added a cudnn switch of conv2d for rocm platform (#31836) · 123949eb
Committed by ronnywang on Mar 29, 2021

123949eb

[Paddle-TRT] roi_align_plugin (#31732) · e3a38d79

Committed by zlsh80826 on Mar 29, 2021

* add roi_align_plugin

* add roi align unit_test

* add roi align serialization

* remove roi align static plugin because of batch dim issue

* refine roi align unittest and add fp16/serialization

* add trt roi align condition to op_teller

* refine error message

* remove unnecessary reshape layer

e3a38d79

[Paddle-TRT] trt affine channel converter (#31628) · bfb5cf55

Committed by zlsh80826 on Mar 29, 2021

* trt affine channel converter

* add trt affine channel base test

* add trt affine channel NHWC

* remove asterisk for python2 compatibility

* trt affine channel converter

* add trt affine channel base test

* add trt affine channel NHWC

* remove asterisk for python2 compatibility

* fix rebase

* move LodTensor to Tensor

* add dbg info

* affine channel converter only support NCHW

* scale,bias are parameters, use create_parameters api

* reduce test input size to not exceed the timelimit of ci

* refine affine channel unittest and add serialization/dynamic test

* change super to InferencePassTest for python2 compatibility

* change super to InferencePassTest for python2 compatibility

* fix affine channel fp16 serialize setting

bfb5cf55

PaddlePaddle / Paddle · synced successfully about 2 years ago
