提交 · 2ea56c9043d065420ad7bc8d2256468ba6603dd8 · BaiXuePrincess / Paddle

28 4月, 2022 1 次提交

[cherry-pick] Optimize performance of dygraph (#42196) (#42329) · 2ea56c90

由 zyfncg 提交于 4月 28, 2022

* Optimize performance of dygraph (v4)  (#42196)

* optimize performance of dygraph

* optimize performance of dygraph and elementwise_add

* optimize the trace op

* fix bug

* fix bug

* fix unittest bug

* fix code format

* fix cherry-pick problem

2ea56c90

05 3月, 2022 1 次提交
- C
  
  support add infershape for no grad op (#40182) · 94f03dc2
  由 Chen Weihang 提交于 3月 05, 2022
  
  94f03dc2
20 12月, 2021 1 次提交
- F
  
  [MLU]add mlu backend (#38207) · 76514a1f
  由 fwenguang 提交于 12月 20, 2021
  
  76514a1f
09 12月, 2021 1 次提交
- J
  
  add ipu device p2 (#37840) · cb636a48
  由 jianghaicheng 提交于 12月 09, 2021
  
  cb636a48
26 5月, 2021 1 次提交

optimize OP's compilation time (#32617) · 78ecb668

由 wuhuanzhou 提交于 5月 26, 2021

* optimize OP's compilation time, test=develop

* add more op and run ci test, test=develop

* CUDA Kernel register in cc file, test=develop

* fix macros, test=develop

* fix undefined symbol error, test=develop

* fix compilation error and undefined symbol, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error on Windows, test=develop

78ecb668

30 4月, 2021 1 次提交
- X
  
  add flag to check_kernel launch (#32692) · 109fdf14
  由 XiangGao 提交于 4月 30, 2021
  
  109fdf14
27 4月, 2021 1 次提交
- X
  Check for cuda errors immediately after kernel launch (#32557) · 19eefef4
  由 XiangGao 提交于 4月 27, 2021
```
Co-authored-by: NYang Zhang <yangzhang@live.com>
```
  19eefef4
19 4月, 2021 1 次提交

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

由 Leo Chen 提交于 4月 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* 【NPU】fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and  add wait before sync operation (#31956)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: Npangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: Npangyoki <pangyoki@126.com>

cbe5c9f8

09 4月, 2021 1 次提交

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

由 Leo Chen 提交于 4月 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocatot test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code stype

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: Nxiayanming <41795079@qq.com>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

26 2月, 2021 1 次提交
- Q
  
  [ROCM] update fluid framework for rocm (part5), test=develop (#31014) · c8fac5ee
  由 Qi Li 提交于 2月 26, 2021
  
  c8fac5ee
19 11月, 2020 1 次提交

use forward declarations for framework.pb.h (#28494) · 5aec7dbe

由 wanghuancoder 提交于 11月 19, 2020

* use forward declarations for framework.pb.h, test=develop

* use forward declarations for framework.pb.h, test=develop

5aec7dbe

24 9月, 2020 1 次提交

use iwyu clean include (#27267) · df43905f

由 wanghuancoder 提交于 9月 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

df43905f

15 9月, 2020 1 次提交

Polish framework error message part 6 (#27257) · dafb0e3b

由 Chen Weihang 提交于 9月 15, 2020

* polish framework error msg part 6

* polish lossed item

* fix failed unittest

* polish by reviewer comments

dafb0e3b

21 8月, 2020 1 次提交

support Baidu Kunlun AI Accelerator (#25959) · 138ecf24

由 QingshuChen 提交于 8月 21, 2020

* support Baidu AI Accelerator
  * test=kunlun

* minor
 * test=kunlun

* support xpu op in separate file
 * test=kunlun

* update XPU error message and remove duplicated code

 * test=kunlun

* minor
 * test=kunlun

* minor
 * test=kunlun

138ecf24

29 11月, 2019 1 次提交

Add dygraph execution context (#20157) · ac854670

由 hong 提交于 11月 29, 2019

* add_dygraph_execution_context

* add dygraph infershape context and execution context; test=develop

* fix imperative bug; test=develop

* remove inputs outputs interface from execution context,
because it have same function with inputNames;
test=develop

* remove tracer_test ctest; test=develop

* fix split op bug; test=develop

* fix unitests bug; test=develop

* fix distribute test bug; test=develop

* fix ngraph compile bug; test=develop

* fix grad maker bug; test=develop

* fix load op bugs; test=develop

* fix operator.cc construct bug; test=develop

* remove useless name find in operator; test=develop

* add tracer_test; test=develop

* fix concat, split bug; test=develop

* remove tracer_test unitest; test=develop

* fix attribute check bug; test=develop

* add test code to fix converage; test=develop

* remove useless code, change check backward input in engin; test=develop

* unlock var type infer shape;test=develop

* add ShareAllLoD api; test=develop

* add dygraph infershape context unitest; test=develop

* remove increase and decrease lod in dygraph; test=develop

* addd override; test=develop

* fix increase descrease lod; test=develop

* fix paddle_enforce; test=develop

* disable lod op dygraph check; test=develop

* fix paddle enforce error; test=develop

* add comment for op_registry and OperatorBase; test=develop

* optimize the comment of op_registry; test=develop

* fix format of comment; test=develop

* fix format of comment; test=develop

* optimize the format of comment; test=develop

* optimize the format of the comment; test=develop

* optimize comment of op_registry; test=develop

ac854670

31 10月, 2019 1 次提交

GradMaker for dygraph (#19706) · 8c4573a3

由 hong 提交于 10月 31, 2019

* refactor dygraph,test=develop

* fix failed unittest,test=develop

* polish code,test=develop

* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop

* polish vlog and profiler, test=develop

* try to fix preceding ops order,test=develop

* test transformer in windows ci, test=develop

* use python c-api to speed up tracer.trace,test=develop

* test=develop, fix docker with paddle nccl problem

* test=develop, add ut for debug string and gradient_accumulator

* test=develop, add tests for layer/gradient_accumulator/prepared_op

* test=develop, fix complie error for test_prepared_op

* test=develop, add more ut for dygraph

* test=develop, create API.spec for dygraph api change

* optimize grad maker; test=develop

* optimize grad maker

* test

* grad make optim; test=develop

* fix unittest bugs; test=develop

* add dygraph grad op maker and split_op

* grad op maker refactor; test=develop

* add dygraph grad maker; test=develop

* fix op deformable_conv_v1_op bug; test=develop

* fix deformable_conv prroi pool bugs;

* fix new op grad op maker bug; test=develop

* fix split by ref bug; test=develop

* fix dygraph auto prune bug; test=develop

* fix test_trace bug; test=develop

* fix fused emb seq pool bug; test=develop

* remove useless code in op_desc file; test=develop

* remove useless code, StrVarBaseNode; test=develop

* fix review issues; test=develop

* fix rank_loss grad maker; test=develop

* remove flag in VarBase; test=develop

* fix distributed_notify_op compile bug ; test=develop

* fix reshape op double grad; test=develop

* fix expand as op; test=develop

* add impertive type_defs.h for demo_train; test=develop

* fix inference lib cmake; test=develop

* fix inference lib; test=develop

* fix infernce_lib; test=develop

* fix inference cmake; test=develop

* fix inference lib; test=develop

* fix inference lib; test=develop

* remove condition dygraph grad maker, modify local name; test=develop

* fix split grad maker bug; test=develop

* fix pyramid_op bug; test=develop

* change travis time out limit; test=develop

* restore travis; test=develop

* change timeout limit; test=develop

8c4573a3

19 8月, 2019 1 次提交
- C
  Fix REGISTER_OP_WITHOUT_GRADIENT (#19251) · 8a89ca94
  由 chengduo 提交于 8月 19, 2019
```
* fix REGISTER_OP_WITHOUT_GRADIENT
test=develop
```
  8a89ca94
25 2月, 2019 1 次提交
- L
  Enable function coverage for U8/S8 ConvMKLDNNOpKernel · 4acc5220
  由 liangan1 提交于 2月 25, 2019
```
test=develop
```
  4acc5220
26 12月, 2018 1 次提交
- P
  fix test issues on windows · 01c00b07
  由 peizhilin 提交于 12月 26, 2018
```
test=develop
```
  01c00b07
18 12月, 2018 2 次提交
- P
  include the mkl fix only · b601f2de
  由 peizhilin 提交于 12月 18, 2018
```
test=develop
```
  b601f2de
- P
  
  add mkl,ctc support for windows · 5a6d7fe2
  由 peizhilin 提交于 12月 18, 2018
  
  5a6d7fe2
11 12月, 2018 1 次提交
- X
  fix clang · 1735022a
  由 Xin Pan 提交于 12月 11, 2018
```
test=develop
```
  1735022a
05 12月, 2018 1 次提交
- X
  allow customize kernel selection · 41c28d54
  由 Xin Pan 提交于 12月 05, 2018
```
test=develop
```
  41c28d54
22 11月, 2018 1 次提交

Windows/online (#14474) · d9a1f3e5

由 wopeizl 提交于 11月 22, 2018

* add recordio support

* disable the openblas multi-thread on windows since no support
adjust the python script

* code style

* code style
test=develop

* add create_recordio_file_reader back

* fix code style
test=develop

* fix the gtest.cmake on windows

* fix cc_test on windows

* fix the win build
test=develop

* remove fused compile support on windows
test=develop

* add the jit support
test=develop

* add the jit support, test=develop

* add the jit support, test=develop

* add the jit back
fix compile error on windows

* rollback test=develop

* test case fix

* disable DSO by default on windows

* exclude warpctc_op on windows

* exclude the dynload_warpctc out on windows
test=develop

* fix the scripts error
test=develop

* disable avx on windows by default
test=develop

* re-organize the cmake file

* disable mkl on windows by default

* add warp_ctc back

* fix the dependency

* fix the dependency

* fix the build issue on windows

* remove unsupported flag on windows

* code style

* code style
test=develop

* fix issue

* add profiler, parallel_executor back

* clean up the pre-definitions on windows

* fix build issue

* test=develop

d9a1f3e5

21 11月, 2018 1 次提交
- P
  
  clean up the pre-definitions on windows · 6e66fadb
  由 peizhilin 提交于 11月 21, 2018
  
  6e66fadb
24 9月, 2018 1 次提交
- D
  
  "fix link error" (#13545) · 97636a9f
  由 dzhwinter 提交于 9月 24, 2018
  
  97636a9f
12 9月, 2018 1 次提交
- D
  
  add demo · c3e1fb5a
  由 dzhwinter 提交于 9月 12, 2018
  
  c3e1fb5a
02 9月, 2018 1 次提交
- D
  
  switch to 9.2 · 75681c0a
  由 dzhwinter 提交于 9月 02, 2018
  
  75681c0a
25 8月, 2018 1 次提交
- D
  
  more platform is done · d7f98f37
  由 dzhwinter 提交于 8月 25, 2018
  
  d7f98f37
03 7月, 2018 1 次提交
- Y
  Remove Op::Clone method · 4e4438a8
  由 yuyang18 提交于 7月 03, 2018
```
It is used by NetOp before.
```
  4e4438a8
02 7月, 2018 3 次提交
- Y
  Add register kernel functor and shrink reshape op · 82866d4a
  由 yuyang18 提交于 7月 02, 2018
```
* Shrink reshape_op library size
* User can register a standard C++ functor as a op kernel
```
  82866d4a
- Y
  
  Make Kernel registed as a function · 3b00ed81
  由 yuyang18 提交于 7月 02, 2018
  
  3b00ed81
- Y
  
  Polish reshape op · 1ce478f1
  由 yuyang18 提交于 7月 02, 2018
  
  1ce478f1
07 6月, 2018 2 次提交

split reduce op into multiple libraries, accelerate the compiling (#11029) · d48172f2

由 dzhwinter 提交于 6月 07, 2018

* "split into multiple .ccl"

* "refine file structure"

* "refine files"

* "remove the cmakelist"

* "fix typo"

* "fix typo"

* fix ci

d48172f2

Mkldnn layout (#11040) · 3ff9ba0e

由 mozga-intel 提交于 6月 07, 2018

* Add MKLDNN layout support in Paddle

Add MKLDNN layout in Paddle so that MKLDNN friendly memory layout
can be used in MKLDNN enabled OP kernel. Before this commit, NCHW
is hardcode to be used in all MKLDNN op kernels. As a result,
non-optimized execution path is selected in MKLDNN primitive which
bring worse performance.
Besides framework change, three MKLDNN OP kernels were updated
for using new MKLDNN layout. They are conv/pool2d/batch_norm.
Other MKLDNN OP kernels need be also updated in similar way to
achieve best performance.

* Add MKLDNN layout support in activation OP

* Don't populate layout from input to output when kMKLDNN in

* Refine pool mkldnn op kernel

* MKLDNN layout

* Remove the inferitance from tensor file

* MKLDNN layout: refactoring

* Remove additional #define to register new operator

* Prepare mkldnn tests to work with layout

3ff9ba0e

18 4月, 2018 1 次提交
- Y
  
  remove REGISTER_OP and REGISTER_OP_EX · 68d96385
  由 Yang Yang 提交于 4月 17, 2018
  
  68d96385
17 4月, 2018 1 次提交
- Y
  
  first commit · dafe06af
  由 Yang Yang 提交于 4月 13, 2018
  
  dafe06af
12 2月, 2018 1 次提交
- Q
  
  Fix the grammar in copyright. (#8403) · 24509f4a
  由 qingqing01 提交于 2月 12, 2018
  
  24509f4a
10 2月, 2018 2 次提交
- Y
  
  Correct #include path · fc374821
  由 Yi Wang 提交于 2月 09, 2018
  
  fc374821
- Y
  
  Move file to fluid/; Edit CMakeLists.txt · 90648f33
  由 Yi Wang 提交于 2月 09, 2018
  
  90648f33

BaiXuePrincess / Paddle 与 Fork 源项目一致

BaiXuePrincess / Paddle
与 Fork 源项目一致