Commits · 8854786aebd5c6fbf87eaba1d022f11ef40359c5 · Crayon鑫 / Paddle

18 May, 2021 1 commit
- T
  unit double (#32902) · 29bbeb07
  Committed by Thunderbrook on May 18, 2021
```
* unit double

* unit double
```
  29bbeb07
17 May, 2021 2 commits
- S
  [HybridParallel]Fix precision problem of model parallel (#32897) · c809530e
  Committed by ShenLiang on May 17, 2021
```
* fix precision of mp

* fix bug of seed

* fix dp

* print group
```
  c809530e
- A
  BugFix with ParseInputDataType from LodTensorArray (#32918) · 5f1c07da
  Committed by Aurelius84 on May 17, 2021
```
* BugFix with ParseInputDataType from LodTensorArray

* BugFix with ParseInputDataType from LodTensorArray
```
  5f1c07da
13 May, 2021 1 commit
- B
  
  solved some npu bugs (#32793) · c3ae0d40
  Committed by Baibaifan on May 13, 2021
  
  c3ae0d40
12 May, 2021 2 commits
- L
  
  [NPU] Support async copy for TensorFromVector with event (#32563) · 85512d60
  Committed by liym27 on May 12, 2021
  
  85512d60
- L
  
  [NPU] Support npu pinned allocator and manage Tensor on NPUPinnedPlace (#32840) · 6b3bb796
  Committed by liym27 on May 12, 2021
  
  6b3bb796
11 May, 2021 2 commits
- X
  
  fix rccl bug (#32808) · 93fce181
  Committed by xiayanming on May 11, 2021
  
  93fce181
- S
  Support control flow in DataParallel (#32826) · 298f210d
  Committed by ShenLiang on May 11, 2021
```
* fix find_unused_parameters default value
```
  298f210d
10 May, 2021 1 commit
- T
  [pslib] pslib with cmake (#32800) · fbbc3394
  Committed by Thunderbrook on May 10, 2021
```
* pslib with cmake

* heter util

* vlog

* heter server test

* add dtor

* cmake
```
  fbbc3394
08 May, 2021 3 commits
- D
  【heterps】support cuda11 for heterps; add profiler in oneps (#32640) · beab9563
  Committed by danleifeng on May 08, 2021
```
* add trainprofiler for heterps in oneps; test=develop

* add set_use_ps_gpu; test=develop
```
  beab9563
- H
  
  bugfix: parallel_executor for xpu should use BindThreadedSSAGraphExecutor (#32792) · e8e4a9ca
  Committed by houj04 on May 08, 2021
  
  e8e4a9ca
- L
  Add raw program meta optimizer (#32597) · c1c18b08
  Committed by lilong12 on May 08, 2021
```
* add raw program, test=develop
```
  c1c18b08
07 May, 2021 1 commit
- Z
  Remove paddle_custom_op dynamic libraries, and link to FLUID_CORE on Windows (#32583) · 7610c2b4
  Committed by Zhou Wei on May 07, 2021
```
* Remove paddle_custom_op dynamic libraries, change link to FLUID_CORE on windows, and check copy_to

* fix CI
```
  7610c2b4
06 May, 2021 1 commit
- G
  
  Fix bugs of pipeline on ascend. (#32737) · c5ae21f4
  Committed by gongweibao on May 06, 2021
  
  c5ae21f4
30 April, 2021 1 commit
- X
  
  add flag to check_kernel launch (#32692) · 109fdf14
  Committed by XiangGao on April 30, 2021
  
  109fdf14
29 April, 2021 3 commits
- C
  
  normalized custom operator impl (#32666) · 7a73692b
  Committed by Chen Weihang on April 29, 2021
  
  7a73692b
- C
  
  skip fuse repeated fc when the fc with weight padding (#32648) · b7ddd7d7
  Committed by cc on April 29, 2021
  
  b7ddd7d7
- P
  
  specify multihead_matmul_fuse_pass_v3 QK path (#32659) · 8ccf549b
  Committed by Pei Yang on April 29, 2021
  
  8ccf549b
28 April, 2021 3 commits

Committed by denglin-github on April 28, 2021

* Add dlnne engine runtime

* Fix log

* Remove <const_cast> and remove unrelated modify with dlnne, +clang-format

* Fix CMakeList format error

* Add copyright message

* Fix dlnne CMakeList.txt

* Add some paddlepaddle_pass to support more networks

* Fix some format bug

* Add delete dropout_op pass

* Fix some format bug

* Fix format bug

abcb3f54

[PsCore] solve Brpc dep (#32632) · 4ead9a5a

Committed by Thunderbrook on April 28, 2021

* Revert "Revert "[PsCore] optimize performance of large kv (#32535)" (#32599)"

This reverts commit 809ac036.

* brpc dep

4ead9a5a

J
[oneDNN] Added clearing oneDNN cache per executor (#32499) · ba610761
Committed by Jacek Czaja on April 28, 2021
```
* - Added clearing oneDNN per executor

* - Executor is nt always having FLAGS_use_mkldnn set to true
```
ba610761

27 April, 2021 2 commits
- T
  Revert "[PsCore] optimize performance of large kv (#32535)" (#32599) · 809ac036
  Committed by tianshuo78520a on April 27, 2021
```
This reverts commit 4b7242b0.
```
  809ac036
- X
  Check for cuda errors immediately after kernel launch (#32557) · 19eefef4
  Committed by XiangGao on April 27, 2021
```
Co-authored-by: Yang Zhang <yangzhang@live.com>
```
  19eefef4
26 April, 2021 3 commits
- T
  [PsCore] optimize performance of large kv (#32535) · 4b7242b0
  Committed by Thunderbrook on April 26, 2021
```
* optimize pull sparse

* optimize pull sparse

* change macro

* format
```
  4b7242b0
- Y
  Unset ReserveSpace of batch_norm for inference program. (#32493) · 202b0eaf
  Committed by Yiqun Liu on April 26, 2021
```
* Unset ReserveSpace for inference program.

* Support training from an inference program.
```
  202b0eaf
- 石
  
  python inference supports custom operators, test=develop (#32533) · 40e51b25
  Committed by 石晓伟 on April 26, 2021
  
  40e51b25
25 April, 2021 3 commits
- P
  [Paddle-TRT] Fix AI-Rank BERT emb_eltwise_layernorm input order (#32482) · fba46ea3
  Committed by Pei Yang on April 25, 2021
```
* fix airank bert emb order

* move input num check to converter

* add input num check

* add unused var check white list
```
  fba46ea3
- M
  
  add silu op, test=develop (#32384) · 2f351ed5
  Committed by minghaoBD on April 25, 2021
  
  2f351ed5
- L
  Fix the bug in mp (#31996) · 976fe6f9
  Committed by lilong12 on April 25, 2021
```
* update
```
  976fe6f9
23 April, 2021 2 commits
- A
  Polish ParallelExectuor constructor into small functions (#32191) · faa8c703
  Committed by Aurelius84 on April 23, 2021
```
* Refine Constructor logic of ParallelExecutor

* refine function name

* refine code comment
```
  faa8c703
- B
  solve hccl communicate conflict (#32447) · 0e74eea2
  Committed by Baibaifan on April 23, 2021
```
solve hccl communicate conflict (#32447)
```
  0e74eea2
22 April, 2021 1 commit

support save/load binary format tensor. (#32211) · f4d9adc7

Committed by WeiXin on April 22, 2021

* support save/load binary format tensor

* Fix error when create cudaplace

* Fix error when create cudaplace

* Fix error when create cudaplace

* get devive context from pool.

* move define of 'SerializeToStream' and 'DeserializeFromStream' to 'lod_tensor.cc' and 'selected_rows.cc'.

* improve coverage.

* improve coverage.

* polish API

* deal with conflict

* disable save/load large file in unnittest

* split unnittest.

f4d9adc7

21 April, 2021 4 commits

【NPU】Merge NPU ccl code (#32381) · c3158527

Committed by zhang wenhui on April 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: void-main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: lw921014 <liuwei921014@yeah.net>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>
Co-authored-by: xiayanming <41795079@qq.com>

c3158527

C

Update the error info for quantizaion (#32273) · 3da2c7f3
Committed by cc on April 21, 2021

3da2c7f3

remove thrust include files (#32395) · ab6f8745

Committed by wuhuanzhou on April 21, 2021

* remove thrust includes, test=develop

* fix compilation error, test=develop

* fix compilation of truncated_gaussian_random_op, test=develop

ab6f8745

J

Added bilinear and nearest interp v2 oneDNN FP32 kernels (#32312) · 5d19f8d8
Committed by jakpiase on April 21, 2021

5d19f8d8

20 April, 2021 2 commits
- T
  [heterps] optimize build task (#32358) · c09d6453
  Committed by Thunderbrook on April 20, 2021
```
* build task cost

* return pool
```
  c09d6453
- C
  
  add log to analyse mkldnn models (#32342) · f0cc1883
  Committed by cc on April 20, 2021
  
  f0cc1883
19 April, 2021 2 commits

A
add npu check nan and inf (#32340) · 1e3a94be
Committed by An Improved PeleeNet Algorithm with Feature Pyramid Networks for Image Detection on April 19, 2021
```
add npu check nan and inf (#32340)
```
1e3a94be

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

Committed by Leo Chen on April 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* 【NPU】fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and  add wait before sync operation (#31956)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: pangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>

cbe5c9f8

Crayon鑫 / Paddle is in sync with the upstream project it was forked from.