Commits · 6ef5d3436f615908d2be75d09bdca1f3bc2023d8 · PaddlePaddle / Paddle

20 Sep, 2022 1 commit
- [NPU] fix run_program_op, test=develop (#46122) · db97773b
  Committed by ronnywang on Sep 20, 2022
```
* [NPU] fix run_program_op, test=develop

* [NPU] fix matmul_v2 in cann502, test=develop
```
13 Sep, 2022 1 commit
- add log while running New Executor, Old Executor and ParallelExecutor and change log level (#45814) · f639bc69
  Committed by pangyoki on Sep 13, 2022
```
* optimize executor log

* delete log in new exe

* add log for old executor

* use LOG_FIRST_N(INFO, 1)
```
01 Aug, 2022 1 commit

Committed by Leo Chen on Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

86763023

19 Jul, 2022 1 commit
- Rename BOOST_GET macros (#44368) · 4b085c57
  Committed by Ruibiao Chen on Jul 19, 2022
```
* Rename BOOST_GET macros

* Fix conflicts
```
07 Jul, 2022 1 commit

[IPU] support dy2static for IPU merge code (#43770) · 6984fbca

Committed by Allen Guo on Jul 07, 2022

* feat(): dynamic_to_static support for ipu.

* fix(): format fix.

* fix format

* fix cpplint error

* use phi::errors

* fix format

* fix format

* fix(): add api to restore patched function.

* fix(): identity_loss uses cpu place as expected kernel type.

* doc(): add IPU dy2static related docs.

* fix(): combine test cases.

* fix format

* fix comment

* fix format

* apply comment

* fix compiling

* fix(): align docs.

* fix(): fix identity_loss function docs.

* fix(): adjust mean and sum in identity_loss.

* fix(): minor docs.

* move API to paddle.incubate.identity_loss

* fix UT
Co-authored-by: zhaorui chen <zhaoruic@graphcore.ai>

30 Jun, 2022 1 commit
- Remove boost::variant for FetchResultType (#43932) · f720e231
  Committed by Ruibiao Chen on Jun 30, 2022
```
* Remove boost::variant for FetchResultType

* Fix pybind errors
```
26 Jun, 2022 1 commit
- format all files in fluid using new config (#43776) · 576236a0
  Committed by Sing_chan on Jun 26, 2022
  
09 Jun, 2022 1 commit

Add nproc_per_node for DistributedFusedLamb (#43295) · 6678def9

Committed by sneaxiy on Jun 09, 2022

* add nproc_per_node for DistributedFusedLamb

* fix nproc_per_node communicator bug

* fix ring_id = 1 init bug

* fix ci

* fix test_parallel_executor_mnist.py

05 Jun, 2022 1 commit
- 【code format check upgrade】 step2：clang-format (#42840) · a3730dc8
  Committed by Sing_chan on Jun 05, 2022
  
07 Apr, 2022 1 commit
- Profile Executors (#41100) · dfb47986
  Committed by liutiexing on Apr 07, 2022
```
* Profile Executors

* update

* fix ut

* fix names

* update

* update
```
21 Feb, 2022 1 commit

Update record interface using part2 (#39694) · c984cd85

Committed by chenjian on Feb 21, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update record event interface using

* update record event interface using

* update operator.cc

* update part2

* update part1

* fix include profiler.h header in ps server

* fix include profiler.h header in ps server

* fix profiler.h header

15 Feb, 2022 2 commits

[PluggableDevice] Add custom runtime support (#38740) · 3e7825f3

Committed by ronnywang on Feb 15, 2022

* [CustomRuntime] Add DeviceManager

* [CustomRuntime] Add DeviceInterface

* [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager

* [CustomRuntime] Add plug-in device

* [CustomRuntime] Memory module support PluggableDevice

* [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option

* update

* [API] update API doc based on comments, test=develop
Co-authored-by: qili93 <qili93@qq.com>

[PTen]Migrate proto::VarType outside of Pten (#39411) · 7e7e9404

Committed by Aurelius84 on Feb 15, 2022

* #1 migrate dist-related type()-> dtype()

* move datatype function from pten -> fluid/framework

* change type() in imperative into convert(dtype())

* modify xx_tensor->type into xx_tensor->dtype

* change the set_type interface and the caller

* modify xx_tensor.type into xx_tensor.dtype

* fix mutable_data(place, dtype())

* change caller of mutable_data in pten and distributed

* change the caller of mutable_data in fluid/framework

* change the caller of mutable_data in imperative directory

* mutable_data: inference

* update the call of mutable_data

* transfer MakePenScalarArray MakePtenScalar ResetHolderWithType

* pass the compile. the next step is remove VarType in Pten

* fix all and remove VarType from pten. success in linux. Next task is other platform

* fix conflict with develop

* fix compiled error

* Fix reset conversion

* fix conflict

* fix compiled problem

* fix typo

* Fix << in tensor_utils.cc

* fix type->dtype

* fix unittest

* fix tensor init constructor

* fix DataTypeSize for BFloat16

* fix code style

* fix npu compiled error

* fix npu

* compile npu sucessfully

* fix conflict

* fix conflict
Co-authored-by: xiongkun <xiongkun03@baidu.com>

06 Feb, 2022 1 commit
- [PTEN] Add Gpu context (#39305) · a821c4a9
  Committed by Wilber on Feb 06, 2022
  
18 Jan, 2022 1 commit

[Unify Tensors PR #8] Merged Tensor into DenseTensor, test=allcases (#38914) · 2052f1e3

Committed by Zhanlue Yang on Jan 18, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Patched python level LoDTensor

* Merge Tensor into DenseTensor

* Fixed namespace issues,test=allcases

* Fixed merge issues

* Fixed inference issues

* Fixed NPU test issues

* Fixed merge issues

17 Jan, 2022 1 commit

[Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5

Committed by Wilber on Jan 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile

10 Jan, 2022 1 commit

[Unify Tensors PR ] framework::Tensor inherits from DenseTensor,test=allcases (#38632) · 5c73a6ea

Committed by Zhanlue Yang on Jan 10, 2022

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor

* Modified framework::Tensor to inherit from DenseTensor

* Reverted changes too pten_layout() interface

* Removed friend classes

* Rearranged cfunction calls from tensor.data<void>() to tensor.data()

* Fixed CI issues

* Fixed lite issues

* Fixed data() interface issues,test=allcases

* Resolved IsInitialized() issues

* Fixed ResetHolder() issues

* Fixed MKLDNN & Storage issues

* Resolved ShareBufferWith() issues

* Fixed LoD issues

20 Dec, 2021 1 commit
- [MLU]add mlu backend (#38207) · 76514a1f
  Committed by fwenguang on Dec 20, 2021
  
13 Oct, 2021 1 commit
- Remove RunFromCinn in PE because We Will Call CinnRunner in Compute of SubgraphOp (#36385) · e051bba0
  Committed by Huihuang Zheng on Oct 13, 2021
```
Remove RunFromCinn method in PE because We Will Call CinnRunner in Compute method of SubgraphOp
```
11 Oct, 2021 1 commit

Add use_cinn Flag and RunFromCinn in PE (#36107) · 5690666c

Committed by Huihuang Zheng on Oct 11, 2021

Add use_cinn flag and use it to control whether we run PaddlePaddle using CINN.

Also add:

* Replace PaddlePaddle graph with a CINN graph in a pass
* PE Method to feed data and run the graph by CINN

08 Oct, 2021 1 commit

Support CUDA Graph on ParallelExecutor (#36250) · f9591bb1

Committed by Zeng Jinle on Oct 08, 2021

* support CUDA Graph on PE

* add ut, fix CI compile

* reduce memory consumption

* fix CUDA 10 CI

* improve coverage

* improve python coverage

17 Sep, 2021 1 commit
- change to PADDLE_DEFINE_EXPORTED (#35841) · d22914fd
  Committed by Zeng Jinle on Sep 17, 2021
  
29 Jul, 2021 1 commit

add fix op run order pass (#34427) · 79e758c6

Committed by Zeng Jinle on Jul 29, 2021

* add fix op run order pass

* add ut for fix_op_run_order

* fix ci error

* improve coverage

* improve coverge again and fix cpu test case

* follow some comments

15 Jul, 2021 1 commit
- Upgrade Executor into ParallelExcutor to apply Graph Optimization in @to_static (#32283) · 2850391d
  Committed by Aurelius84 on Jul 15, 2021
```
* Refine Constructor logic of ParallelExecutor

* Replace executor into ParallelExecutor in run_program_op
```
08 May, 2021 1 commit
- bugfix: parallel_executor for xpu should use BindThreadedSSAGraphExecutor (#32792) · e8e4a9ca
  Committed by houj04 on May 08, 2021
  
23 Apr, 2021 1 commit
- Polish ParallelExectuor constructor into small functions (#32191) · faa8c703
  Committed by Aurelius84 on Apr 23, 2021
```
* Refine Constructor logic of ParallelExecutor

* refine function name

* refine code comment
```
09 Apr, 2021 1 commit

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocatot test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code stype

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

26 Feb, 2021 1 commit
- [ROCM] update fluid framework for rocm (part5), test=develop (#31014) · c8fac5ee
  Committed by Qi Li on Feb 26, 2021
  
20 Feb, 2021 2 commits

Remove PE special profiler (#30886) · 6b3371e0
Committed by Chengmo on Feb 20, 2021
```
* remove pe special profiler

* add profiler info
```

add squeeze_op/unsqueeze_op on kunlun;fix conv op and parallel... · d5323dab

Committed by TTerror on Feb 20, 2021

add squeeze_op/unsqueeze_op on kunlun;fix conv op and parallel executor;optimize lookup_table op (#31056)

* add squeeze_op/unsqueeze_op on kunlun; fix conv op and parallel executor on kunlun; optimize lookup_table op on kunlun

* update squeeze/unsqueeze op

18 Jan, 2021 1 commit
- [Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317) · 843dc3cd
  Committed by liuyuhui on Jan 18, 2021
  
07 Jan, 2021 1 commit
- Refine PADDLE_ENFORCE Error Messages. test=develop (#30149) · 54bf3f5a
  Committed by Huihuang Zheng on Jan 07, 2021
```
Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc
```
04 Jan, 2021 1 commit
- Optimization grad merge performance (#29784) · ee16006b
  Committed by WangXi on Jan 04, 2021
  
28 Dec, 2020 1 commit
- [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926) · 3d1741b7
  Committed by liuyuhui on Dec 28, 2020
  
26 Dec, 2020 1 commit
- [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574) · 4427df37
  Committed by liuyuhui on Dec 26, 2020
  
16 Dec, 2020 1 commit
- [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337) · f13c3a9c
  Committed by liuyuhui on Dec 16, 2020
  
08 Dec, 2020 1 commit
- fix unittest on windows, test=develop (#29365) · 03b42d9f
  Committed by LoveAn on Dec 08, 2020
  
20 Nov, 2020 1 commit
- Fix gpu memory allocation bug. (#28703) · 1dad8cea
  Committed by gongweibao on Nov 20, 2020
  
21 Sep, 2020 1 commit

[Feature] Enhance inplace addto strategy for gradient accumulation in static graph (#27112) · aba759ba

Committed by Leo Chen on Sep 21, 2020

* support use add instead of sum to do gradient accumulation

* add inplace addto pass

* add grad_add op and inplace addto pass

* remove debug code

* code refine

* fix bug when sereral sum ops inserts at same op_idx

* fix Flags type

* add addto attribute for conv3d

* fix ut

* code clean

* fix type

21 Aug, 2020 1 commit

support Baidu Kunlun AI Accelerator (#25959) · 138ecf24

Committed by QingshuChen on Aug 21, 2020

* support Baidu AI Accelerator
  * test=kunlun

* minor
 * test=kunlun

* support xpu op in separate file
 * test=kunlun

* update XPU error message and remove duplicated code

 * test=kunlun

* minor
 * test=kunlun

* minor
 * test=kunlun


PaddlePaddle / Paddle: synced successfully about 1 year ago
