Commits · de9474309d0559631308727df201825627a8eb3c · 机器未来 / Paddle

23 Apr, 2021 15 commits
- Ut test conv3d op timeout (#32216) · de947430
  Committed by YUNSHEN XIE on Apr 23, 2021
```
* remove ut from parallel_ut_rule caused by timeout

* remove timeout ut from parallel_ut_rule file

* move convert_model2dot_ernie to TWO_PARALLEL_JOB list
```
- Polish ParallelExectuor constructor into small functions (#32191) · faa8c703
  Committed by Aurelius84 on Apr 23, 2021
```
* Refine Constructor logic of ParallelExecutor

* refine function name

* refine code comment
```
- [NPU] refactor check_finite_and_scale npu kernel (#32407) · 39a59dcf
  Committed by Leo Chen on Apr 23, 2021
```
* refactor_check_finite_and_scale_npu_kernel

* fix compile

* add alloc_float_status op

* add alloc_float_status op

* add FloatStatus for check_finite_and_unscale

* refine code

* remove unnecessary logic

* refine for fleet
```
- ernie int8 support trt6 (#32424) · a01b5109
  Committed by ceci3 on Apr 23, 2021
- move semantic checks to op_teller (#32279) · 7c38114f
  Committed by wenbin on Apr 23, 2021
```
* move semantic checks to op_teller

* more ops

* more ops

* revert block related change

* part1

* revert activation

* remove if

* remove const_cast

* resolve conflict

* remove const_cast

* delete useless var

* replace vlog(1) with vlog(3), replace assert with PADDLE_ENFORCE

* down to 19 files
```
- update 2.0 public api in optimizer (#31944) · 1b83de2e
  Committed by zhiboniu on Apr 23, 2021
- fix Windows CI MP compile and environment install script and openblas CI (#32378) · 7a681f0b
  Committed by Zhou Wei on Apr 23, 2021
```
* fix Windows CI MP compile and environment install script

* clear Windows CI environment

* clear Windows CI environment

* clear Windows CI environment
```
- solve hccl communicate conflict (#32447) · 0e74eea2
  Committed by Baibaifan on Apr 23, 2021
```
solve hccl communicate conflict (#32447)
```
- add c_concat and c_split ops (#32486) · 2b108a04
  Committed by lilong12 on Apr 23, 2021
```
* add c_concat op
```
- add lstm support on xpu test=kunlun (#32436) · b6f8ccd2
  Committed by shanliang1992 on Apr 23, 2021
- add WITH_STRIP=ON in paddle_build.sh, test=develop (#32450) · 51bcd97d
  Committed by wuhuanzhou on Apr 23, 2021
- disable utest (#32474) · 1dc83932
  Committed by ShenLiang on Apr 23, 2021
- [ROCM] add cuda kenrel for batch_norm_op (#32393) · 7879477f
  Committed by ronnywang on Apr 23, 2021
- [NPU] Fix bug that epsilon become 0 using power (#32469) · 49773f36
  Committed by Leo Chen on Apr 23, 2021
- Fix seven error message (#32397) · 203ac4f3
  Committed by Kqnonrime on Apr 23, 2021
```
* fix two error message

* fix two error message

* fix error

* fix error

* fix error

* fix error

* fix some error message

* fix some error

* fix error

* fix some error

* fix some error

* fix some error

* fix one error

* fix some error

* fix seven error message

* fix error

* fix error

* fix error

* fix error
```
22 Apr, 2021 15 commits

- Add `paddle.set_grad_enabled` (#31794) · f8ca5a9d
  Committed by Yang Zhang on Apr 22, 2021
- support int32 and int64 kernel for clip operator (#32373) · c3328288
  Committed by wuyefeilin on Apr 22, 2021
```
support int32 and int64 kernel for clip operator 
```
- [NPU] remove ascend_parser for WITH_ASCEND_CL (#32451) · a1a527fb
  Committed by Leo Chen on Apr 22, 2021
- fix doc for adamw (#32438) · c4815707
  Committed by hutuxian on Apr 22, 2021
- Add fleet get_loss_scaling doc and update alert message (#32419) · d03b0b16
  Committed by Yuang Liu on Apr 22, 2021

- import sequence_* API to new namespace (#32089) · f12c943a
  Committed by Feiyu Chan on Apr 22, 2021

* import sequence_* API to new namespace

* fix typos, remove alias marking

* update sample code

* fix sample code

* fix docstring for sequence_mask


- Modify some contents for elementwise op impl (#32414) · 890d6bc0
  Committed by Zhang Zheng on Apr 22, 2021
- modify conv2d_transpose docs (#32410) · 1064f2b8
  Committed by wangxinxin08 on Apr 22, 2021
```
* modify conv2d_transpose docs
```
- fix type(x)=paddle.VarBase to paddle.Tensor (#32364) · bec4b167
  Committed by zhiboniu on Apr 22, 2021
- [HybridParallel] Add ClipGradByGlobalNorm & check_finite_and_unscale in Dygraph (#32354) · 7ea999fd
  Committed by ShenLiang on Apr 22, 2021
```
* add clip/check

* add amp & clip grad in dygraph

* add logging
```
- add glu in nn.functional (#32096) · b2ee8380
  Committed by Feiyu Chan on Apr 22, 2021
```
add glu in nn.functional
```
- strip after compilation (#32145) · e727820d
  Committed by wuhuanzhou on Apr 22, 2021

- fix count problem (#32415) · 73d0b0e9
  Committed by seemingwang on Apr 22, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>


- support save/load binary format tensor. (#32211) · f4d9adc7
  Committed by WeiXin on Apr 22, 2021

* support save/load binary format tensor

* Fix error when create cudaplace

* Fix error when create cudaplace

* Fix error when create cudaplace

* get device context from pool.

* move define of 'SerializeToStream' and 'DeserializeFromStream' to 'lod_tensor.cc' and 'selected_rows.cc'.

* improve coverage.

* improve coverage.

* polish API

* deal with conflict

* disable save/load large file in unittest

* split unittest.


- Delete WITH_GRPC flag and Distributed old code (#32383) · e58c705b
  Committed by tianshuo78520a on Apr 22, 2021

21 Apr, 2021 10 commits

- Add Bfloat16 support on Ampere GPU with CUDA 11 (#32132) · bf0ec9b8
  Committed by AshburnLee on Apr 21, 2021
- [HotFix] Add support for optimizer with varbase input (#32362) · b47dd158
  Committed by Chen Weihang on Apr 21, 2021
```
* add support for optimizer with varbase input

* refine cond

* fix failed unittest

* add test for coverage
```

- 【NPU】Merge NPU ccl code (#32381) · c3158527
  Committed by zhang wenhui on Apr 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: void-main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: lw921014 <liuwei921014@yeah.net>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>
Co-authored-by: xiayanming <41795079@qq.com>


- Do not define and save reserve_space for inference. (#32375) · bc90916e
  Committed by Yiqun Liu on Apr 21, 2021
- fix bug in amp O2 (#32343) · 4be3b057
  Committed by huangxu96 on Apr 21, 2021
- [CustomOp]Fix MAC3-CI random failed with XXX_setup.py(#32369) · 7bae5e9a
  Committed by Aurelius84 on Apr 21, 2021
- [CustomOP]Support find include/c++/v1 include dirs automatically (#32404) · 661a1f6f
  Committed by Aurelius84 on Apr 21, 2021
- Update the error info for quantizaion (#32273) · 3da2c7f3
  Committed by cc on Apr 21, 2021
- add get_loss_scaling to fleet (#32401) · 37bb3342
  Committed by Yuang Liu on Apr 21, 2021

- optimize get-feat function of graph engine (#32261) · 2b68d20b
  Committed by seemingwang on Apr 21, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

