Commits · bc90916e65e4ccc473f9271fd0fe6cb256c2ed12 · BaiXuePrincess / Paddle

21 Apr, 2021 17 commits

Do not define and save reserve_space for inference. (#32375) · bc90916e
Committed by Yiqun Liu on Apr 21, 2021

bc90916e

fix bug in amp O2 (#32343) · 4be3b057
Committed by huangxu96 on Apr 21, 2021

4be3b057

[CustomOp]Fix MAC3-CI random failed with XXX_setup.py (#32369) · 7bae5e9a
Committed by Aurelius84 on Apr 21, 2021

7bae5e9a

[CustomOP]Support find include/c++/v1 include dirs automatically (#32404) · 661a1f6f
Committed by Aurelius84 on Apr 21, 2021

661a1f6f

Update the error info for quantization (#32273) · 3da2c7f3
Committed by cc on Apr 21, 2021

3da2c7f3

add get_loss_scaling to fleet (#32401) · 37bb3342
Committed by Yuang Liu on Apr 21, 2021

37bb3342

optimize get-feat function of graph engine (#32261) · 2b68d20b

Committed by seemingwang on Apr 21, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segmentation fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node & test & change data structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

2b68d20b

[NPU] register npu finalize on exit (#32390) · 8e4c1936
Committed by Leo Chen on Apr 21, 2021
```
* [NPU] register finalize on exit

* fix
```
8e4c1936

remove thrust include files (#32395) · ab6f8745

Committed by wuhuanzhou on Apr 21, 2021

* remove thrust includes, test=develop

* fix compilation error, test=develop

* fix compilation of truncated_gaussian_random_op, test=develop

ab6f8745

[Kunlun]add collective ops for multi XPU cards training and add Kunlun multi XPU cards CI (#32302) · 2194ad15
Committed by liuyuhui on Apr 21, 2021

2194ad15

flush denormal in the tracer op, test=develop (#32350) · 9ff85561

Committed by 石晓伟 on Apr 21, 2021

* flush denormal in the tracer op, test=develop

* add cmake dependencies, test=develop

* add a macro, test=develop

* fix the windows case, test=develop

9ff85561

Added bilinear and nearest interp v2 oneDNN FP32 kernels (#32312) · 5d19f8d8
Committed by jakpiase on Apr 21, 2021

5d19f8d8

add test=develop (#32380) · 4898c38d
Committed by gongweibao on Apr 21, 2021

4898c38d

Modify the exit code of mac CI approval error (#32389) · a2cbbe83
Committed by iducn on Apr 21, 2021

a2cbbe83

add retry on gcda_clean.py (#32318) · 229f9308

Committed by YUNSHEN XIE on Apr 21, 2021

* add retry on gcda_clean.py

* add exit code for paddle_coverage.sh

* fix format error

* fix format error

229f9308
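
The commit above adds a retry around `gcda_clean.py`. The log does not show how the retry is written, so the following is only a minimal sketch of the general idea and not the code from the PR. The helper name `run_with_retry`, the attempt count, and the delay are all assumptions made for illustration.

```python
import subprocess
import sys
import time


def run_with_retry(cmd, attempts=3, delay_seconds=0.0):
    """Run cmd until it exits with 0 or the attempts are used up.

    Returns the exit code of the last run. The name and the defaults are
    hypothetical; they are not taken from the Paddle CI scripts.
    """
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            break
        if attempt < attempts:
            # Wait before the next try; skipped after the final attempt.
            time.sleep(delay_seconds)
    return returncode


# Stand-in command that succeeds on the first try; a real CI step would
# pass the actual script invocation here instead.
exit_code = run_with_retry([sys.executable, "-c", "pass"])
```

Returning the last exit code lets a calling shell script decide whether to fail the job, which matches the spirit of the "add exit code for paddle_coverage.sh" bullet.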

Added oneDNN reduce_op GRAD kernel (#32280) · ead83422
Committed by jakpiase on Apr 21, 2021

ead83422

remove fluid for auto_checkpoint. (#32157) · 1593ee25
Committed by xiemoyuan on Apr 21, 2021
```
* remove fluid for auto_checkpoint.

* fix bug.
```
1593ee25

20 Apr, 2021 9 commits
- [Optimize]SparseKV speedup and memory save (#32048) · 5e7e7c9f
  Committed by tangwei12 on Apr 20, 2021
```
Change-Id: Ie35a09772e46f7d90cb68ca82c1d18b9201d1abe

* large scale kv store optimize

Change-Id: I582cc661afdaa20749ec7493eae1b88c32b967f7

* replace std::unordered_map with roundrobin map

Change-Id: I48ee0efef38853876c92d982cdfcac6603c52c88

* remove license

* fix cpp lint

Change-Id: Ia21fafa65adc09bb9094f7dbc987e31d5af2686e
```
  5e7e7c9f
- add paddle.nn.unfold #32297 (#32298) · 186682fe
  Committed by FNRE on Apr 20, 2021
```
* add paddle.nn.unfold
* update Parameters of Unfold
```
  186682fe
- [Sharding]: update config DOC (#32299) · e3489013
  Committed by JZ-LIANG on Apr 20, 2021
```
* sharding: update config DOC

* update pipeline config

* sharding update doc
```
  e3489013
- save/load program (#32336) · e0a52fd7
  Committed by WeiXin on Apr 20, 2021
  
  e0a52fd7
- move REGISTER_OP_CUDA_KERNEL into cpp with eigen, test=develop (#32114) · f6f59e50
  Committed by wuhuanzhou on Apr 20, 2021
  
  f6f59e50
- [heterps] optimize build task (#32358) · c09d6453
  Committed by Thunderbrook on Apr 20, 2021
```
* build task cost

* return pool
```
  c09d6453
- fix the bug that the error message is not displayed on mac ci (#32367) · 0dd28b8c
  Committed by YUNSHEN XIE on Apr 20, 2021
```
* test for mac task,notest,test=mac_py3

* fix the bug that the error message is not displayed
```
  0dd28b8c
- support `numpy.array/asarray(tensor) -> ndarray`, test=develop (#32300) · 43926c80
  Committed by Wenyu on Apr 20, 2021
  
  43926c80
- add log to analyse mkldnn models (#32342) · f0cc1883
  Committed by cc on Apr 20, 2021
  
  f0cc1883
19 Apr, 2021 6 commits

add npu check nan and inf (#32340) · 1e3a94be
Committed by An Improved PeleeNet Algorithm with Feature Pyramid Networks for Image Detection on Apr 19, 2021
```
add npu check nan and inf (#32340)
```
1e3a94be

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

Committed by Leo Chen on Apr 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* [NPU] fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and add wait before sync operation (#31956)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: pangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>

cbe5c9f8

[Hybrid Parallel] Support dp & mp in dygraph (#32323) · ffd40860
Committed by ShenLiang on Apr 19, 2021
```
* support dp & mp
```
ffd40860

Fix sublayer (#31824) · 4d69eeaa

Committed by Jiabin Yang on Apr 19, 2021

* fix sublayer error with include_sublayers=False

* add ut

* refactor include_sublayers related api

* fix ut

* fix ut of transformer

* fix ut of transformer

* remove useless code

* change sublayer api

* polish code

* add test for include_self=True

4d69eeaa

Add BF16 Constant Initializer and support for other initializer (#31935) · 76cb83e8
Committed by joanna.wozna.intel on Apr 19, 2021

76cb83e8

update `get_api_md5`, using the real api name as the map's key (#32224) · 21dc044a

Committed by Ren Wei (任卫) on Apr 19, 2021

* get_api_md5 should prefer the real name over the alias names

* case for ArgSpec style. update the unittests

test=document_fix

21dc044a
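
The `get_api_md5` change above keys the map by the real API name instead of an alias. The log does not include the implementation, so the sketch below only illustrates that idea under stated assumptions: `real_api_name`, `api_md5_map`, and the example function `scale` are hypothetical names, and hashing the signature string is a stand-in for whatever the actual tool hashes.

```python
import hashlib
import inspect


def real_api_name(obj):
    """Return the defining module plus qualified name, ignoring aliases."""
    return f"{obj.__module__}.{obj.__qualname__}"


def api_md5_map(aliases):
    """Map each real API name to the MD5 of its signature string.

    aliases is a dict of {public alias: callable}. Several aliases may
    point at the same callable; they collapse into a single entry.
    """
    result = {}
    for obj in aliases.values():
        signature = str(inspect.signature(obj))
        digest = hashlib.md5(signature.encode("utf-8")).hexdigest()
        result[real_api_name(obj)] = digest
    return result


def scale(x, factor=1.0):
    """Hypothetical API used only for this example."""
    return x * factor


# Two public aliases for the same function yield one map entry.
md5_by_name = api_md5_map({"pkg.scale": scale, "pkg.ops.scale": scale})
```

Keying by the real name means that adding or removing an alias leaves the entry for the underlying API unchanged.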

18 Apr, 2021 1 commit
- Unify the implementation of elementwise operation of same dimensions (#32148) · 2c182583
  Committed by Zhang Zheng on Apr 18, 2021
  
  2c182583
17 Apr, 2021 1 commit
- [Hybrid Parallel] Add model parallel support in dygraph (#32248) · 66d46221
  Committed by ShenLiang on Apr 17, 2021
```
* add model parallel support in dygraph
```
  66d46221
16 Apr, 2021 2 commits
- test=develop, fix index_wrapper's cmake depends (#32314) · 03c9ecd9
  Committed by 123malin on Apr 16, 2021
  
  03c9ecd9
- support ernie trt-int8 for inference (#32232) · 6da043eb
  Committed by ceci3 on Apr 16, 2021
```
* support ernie trt-int8 for inference

* fix reshape
```
  6da043eb
15 Apr, 2021 4 commits

Update hapi to support AMP (#31417) · fabdb43c

Committed by Jiaqi Liu on Apr 15, 2021

* make hapi support amp, and add unittest

* make unittest only support GPU

* update parameters for amp in hapi.Model

* update hapi.Model.prepare interface, and update unittest

* fix test_model.py unittest bug

* add grad clear in dygraph

* use_fp16_guard defaults to True, which could avoid nan

* add input check, and add internal doc link to low level api

* update doc, and decrease the sample num of dataset to avoid timeout

* make hapi amp param support str 'O1' or 'O2'

* resume calling, modify the code of the check part

* upgrade the usage of Fleet API, and disable 'pure_fp16' param

fabdb43c
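
The bullets above mention that the hapi AMP parameter accepts the strings 'O1' or 'O2' and that an input check was added. The log does not show that check, so the following is a minimal sketch of how such an input could be normalized and validated. The function name `normalize_amp_config`, the `"level"` key, and the choice to return `None` when AMP is not requested are assumptions for illustration, not the hapi implementation.

```python
ALLOWED_LEVELS = ("O1", "O2")


def normalize_amp_config(amp):
    """Turn a level string or a dict into a dict with a validated level.

    Hypothetical helper: None means AMP was not requested and is passed
    through unchanged.
    """
    if amp is None:
        return None
    if isinstance(amp, str):
        amp = {"level": amp}
    elif not isinstance(amp, dict):
        raise TypeError("amp config must be None, a str, or a dict")
    config = dict(amp)  # copy so the caller's dict is not modified
    level = config.get("level")
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unsupported amp level: {level!r}")
    return config


# A bare level string is expanded into a dict.
o1_config = normalize_amp_config("O1")
```

Accepting both forms keeps the short string convenient for the common case while still allowing a dict when extra options are needed.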

tree-based-model (#31696) · a8c3a902
Committed by 123malin on Apr 15, 2021
```
* add index_dataset and index_sampler for tree-based model
```
a8c3a902

Correct typos (#32288) · 825d4957
Committed by AshburnLee on Apr 15, 2021

825d4957

[ROCM] bugfix for unit tests (#32258) · 90133d24

Committed by furnace on Apr 15, 2021

* [ROCM] bugfix for test_conv_transpose_nn_grad

* [ROCM] bugfix for test_batch_norm_op_v2

* [ROCM] bugfix for test_empty_like_op

* [ROCM] bugfix for test_conv_transpose_nn_grad

90133d24

BaiXuePrincess / Paddle is in sync with the fork source project