Commits · 049dd853c998405492331a198101d9297236341f · Crayon鑫 / Paddle

Jun 21, 2021 · 1 commit

[NPU] flatten params and grads, fuse grad_clip and optimizer op (#33461) · c269a160

Committed by Leo Chen on Jun 21, 2021

* enable npu alignment

* support flatten_params/grads

* support clip by global norm

* remove memset in coalesce_tensor_op

* fix npu kernel of sum op when input is one tensor

* add ut for flatten_param_grads+regularizer

* fix ut

* fix typo
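
Flattening and clipping by global norm combine well: once all gradients share one contiguous buffer, the global norm is a single reduction and the clip is a single scale over that buffer. The plain-Python sketch below illustrates this under stated assumptions. The helpers `flatten`, `clip_by_global_norm` and `unflatten` are hypothetical, not Paddle APIs, and the sketch omits the device alignment padding that real coalescing adds.

```python
import math
from typing import List, Tuple

def flatten(grads: List[List[float]]) -> Tuple[List[float], List[Tuple[int, int]]]:
    """Pack per-parameter gradients into one contiguous buffer.

    Returns the buffer plus an (offset, length) pair per gradient so each
    original gradient can still be addressed as a view. Alignment padding
    between chunks is deliberately omitted in this sketch.
    """
    flat: List[float] = []
    views: List[Tuple[int, int]] = []
    for g in grads:
        views.append((len(flat), len(g)))
        flat.extend(g)
    return flat, views

def clip_by_global_norm(flat: List[float], clip_norm: float) -> List[float]:
    """Scale every element by clip_norm / max(global_norm, clip_norm).

    global_norm is the L2 norm over ALL gradients, so one pass over the flat
    buffer suffices; values are unchanged when the norm is within bounds.
    """
    global_norm = math.sqrt(sum(x * x for x in flat))
    scale = clip_norm / max(global_norm, clip_norm)
    return [x * scale for x in flat]

def unflatten(flat: List[float], views: List[Tuple[int, int]]) -> List[List[float]]:
    """Slice the flat buffer back into per-parameter gradients."""
    return [flat[off:off + n] for off, n in views]

if __name__ == "__main__":
    grads = [[3.0, 0.0], [0.0, 4.0, 0.0]]  # global norm = 5.0
    flat, views = flatten(grads)
    print(unflatten(clip_by_global_norm(flat, clip_norm=2.5), views))
    # prints [[1.5, 0.0], [0.0, 2.0, 0.0]]
```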

Jun 01, 2021 · 1 commit

replace and remove complex64/128 types in custom OP and other files (#33195) · 06c63ca0

Committed by chentianyu03 on Jun 01, 2021

* replace and remove complex64/128 types in custom OP and other files

* fix custom_tensor_test fail bug

* fix custom_conj_test fail bug

* fix dispatch_test_op build fail bug

May 12, 2021 · 1 commit

[NPU] Support npu pinned allocator and manage Tensor on NPUPinnedPlace (#32840) · 6b3bb796

Committed by liym27 on May 12, 2021

Apr 19, 2021 · 1 commit

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

Committed by Leo Chen on Apr 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* [NPU] fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and add wait before sync operation (#31956)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: pangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>

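Several items above fix the NPU kernels of `check_finite_and_unscale` and `update_loss_scaling`, the two ops behind dynamic loss scaling in mixed-precision training. The plain-Python sketch below shows roughly what that pair computes. It assumes the common scheme in which the scale grows after `incr_every_n_steps` clean steps and shrinks after `decr_every_n_nan_or_inf` overflowing steps. Those names, the defaults, and the 1.0 floor are assumptions for illustration, not the NPU kernels' behavior.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

def check_finite_and_unscale(grads: List[float], scale: float) -> Tuple[List[float], bool]:
    """Divide gradients by the loss scale and flag any inf/nan."""
    found_inf = any(not math.isfinite(g) for g in grads)
    return [g / scale for g in grads], found_inf

@dataclass
class LossScaler:
    """Dynamic loss scaling state (hypothetical sketch)."""
    scale: float = 2.0 ** 15
    incr_every_n_steps: int = 1000
    decr_every_n_nan_or_inf: int = 2
    incr_ratio: float = 2.0
    decr_ratio: float = 0.5
    good_steps: int = 0
    bad_steps: int = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            # Overflow: reset the clean-step counter and shrink the scale
            # after enough consecutive bad steps (never below 1.0 here).
            self.good_steps = 0
            self.bad_steps += 1
            if self.bad_steps >= self.decr_every_n_nan_or_inf:
                self.scale = max(self.scale * self.decr_ratio, 1.0)
                self.bad_steps = 0
        else:
            # Finite step: grow the scale after a run of clean steps.
            self.bad_steps = 0
            self.good_steps += 1
            if self.good_steps >= self.incr_every_n_steps:
                self.scale *= self.incr_ratio
                self.good_steps = 0
```

In a training loop built on this sketch, a step whose `found_inf` is true would skip the optimizer update and only call `update`.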

Apr 09, 2021 · 1 commit

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, but failed to exit

* call aclrtResetDevice before exit

* fix aclFinalize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code style

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

Feb 26, 2021 · 1 commit

[ROCM] update fluid framework for rocm (part6), test=develop (#31015) · 28b356b9

Committed by Qi Li on Feb 26, 2021

Dec 22, 2020 · 1 commit

[oneDNN] Tensor copy fix to oneDNN tensors (#29771) · 7b33720c

Committed by Jacek Czaja on Dec 22, 2020
```
* - Tensor copy fix to oneDNN tensors

* - Fixes after review
```

Dec 04, 2020 · 1 commit

Support type promote for basic math ops (quantum required) (#29265) · 9ad800eb

Committed by Chen Weihang on Dec 04, 2020

* basic impl of type promote

* add comment & another testcase

* fix complex bugs & support python op promote type

* fix failed unittests & polish code

* add unittest for coverage

* change to only promote complex type

* polish code details

* polish several comments
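
The "only promote complex type" rule can be pictured as a small lookup. Mixing a real and a complex operand yields a complex result wide enough for both, while two different real dtypes are not promoted at all. Below is a plain-Python sketch under that reading of the commit message; the dtype strings and the `promote_if_complex` helper are illustrative assumptions, not Paddle's implementation.

```python
from typing import Dict

# Bit width of the real component for each dtype considered here.
_REAL_BITS: Dict[str, int] = {
    "float32": 32, "float64": 64, "complex64": 32, "complex128": 64,
}

def is_complex(dtype: str) -> bool:
    return dtype.startswith("complex")

def promote_if_complex(a: str, b: str) -> str:
    """Result dtype of a binary op under 'only promote complex' (sketch).

    Without a complex operand no promotion happens, so the dtypes must
    already match. With at least one complex operand the result is complex,
    wide enough to hold the real component of both operands.
    """
    if not (is_complex(a) or is_complex(b)):
        if a != b:
            raise TypeError(f"no implicit promotion between {a} and {b}")
        return a
    bits = max(_REAL_BITS[a], _REAL_BITS[b])
    return "complex64" if bits == 32 else "complex128"
```

For example, `promote_if_complex("float64", "complex64")` gives `"complex128"` in this sketch, because a complex64 result could not hold the float64 operand's precision.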

Dec 01, 2020 · 1 commit

add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

Oct 15, 2020 · 1 commit

fix bug of tensor copy of CUDAPinnedPlace (#27966) · 2ac6c6c3

Committed by Zhou Wei on Oct 15, 2020

Oct 13, 2020 · 1 commit

Refine the format of printing tensor (#27673) · 049696bf

Committed by Leo Chen on Oct 13, 2020

* add summary feature

* refine printing tensor

* add sci_mode

* add sample code

* fix indent error

* fix _format_item

* polish code

* support item indent

* add ut

* set place for ut

* fix py2 issue

* fix ut
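
The "summary" and `sci_mode` items refer to abbreviating and formatting large tensors when they are printed. Below is a minimal 1-D sketch of the usual technique: once a `threshold` is exceeded, only the leading and trailing `edgeitems` values are printed, with `...` between them. The parameter names and defaults borrow from NumPy's print options by analogy. They are assumptions here, not Paddle's printing code.

```python
from typing import List

def format_item(x: float, precision: int, sci_mode: bool) -> str:
    """Format one element in fixed-point or scientific notation."""
    return f"{x:.{precision}e}" if sci_mode else f"{x:.{precision}f}"

def summarize(values: List[float], threshold: int = 1000, edgeitems: int = 3,
              precision: int = 4, sci_mode: bool = False) -> str:
    """Render a 1-D sequence, eliding the middle when it is too long.

    If len(values) > threshold, only the first and last `edgeitems` elements
    are printed with '...' between them. Numbers are right-aligned to a
    common width so that columns line up.
    """
    if not values:
        return "[]"
    elide = len(values) > threshold
    shown = values[:edgeitems] + values[-edgeitems:] if elide else values
    cells = [format_item(v, precision, sci_mode) for v in shown]
    width = max(len(c) for c in cells)
    cells = [c.rjust(width) for c in cells]
    if elide:
        cells.insert(edgeitems, "...")
    return "[" + ", ".join(cells) + "]"
```

For instance, `summarize([float(i) for i in range(10)], threshold=5, edgeitems=2, precision=1)` returns `"[0.0, 1.0, ..., 8.0, 9.0]"`.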

Oct 09, 2020 · 1 commit

- Fix to 27398 (#27770) · 631c1f30

Committed by Jacek Czaja on Oct 09, 2020
```
test=develop

- compilation fix

test=develop
```

Sep 24, 2020 · 1 commit

use iwyu clean include (#27267) · df43905f

Committed by wanghuancoder on Sep 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

Sep 16, 2020 · 1 commit

Polish framework error message part 7 (#27266) · 4f9d6529

Committed by Chen Weihang on Sep 16, 2020
```
* polish framework error message part 7

* fix typo

* polish by reviewers' comments
```

Aug 27, 2020 · 1 commit

fix bug that can't print int8_t (#26712) · 8071d230

Committed by Zhou Wei on Aug 27, 2020
```
fix bug that can't print int8_t 
```

Aug 24, 2020 · 1 commit

Add isfinite v2 op (#26344) · 199b0c7c

Committed by Jack Zhou on Aug 24, 2020
```
add the isnan, isfinite, isinf api for the paddle 2.0
```

Aug 21, 2020 · 1 commit

support Baidu Kunlun AI Accelerator (#25959) · 138ecf24

Committed by QingshuChen on Aug 21, 2020

* support Baidu AI Accelerator
  * test=kunlun

* minor
  * test=kunlun

* support xpu op in separate file
  * test=kunlun

* update XPU error message and remove duplicated code
  * test=kunlun

* minor
  * test=kunlun

* minor
  * test=kunlun

Aug 15, 2020 · 1 commit

expose and unify the Tensor concepts to the user (#25978) · 6de463d3

Committed by Zhou Wei on Aug 15, 2020

* expose and unify the Tensor concepts to the user

* expose tensor to user

* add copy place for Tensor

* add copy place for Tensor

* add note

* add macro PADDLE_WITH_CUDA

* remove RUN_TYPE=DIST

* fix some error

Jul 29, 2020 · 1 commit

Simplify BufferedReader to improve DataLoader performance (#25648) · 1b3081b1

Committed by Chen Weihang on Jul 29, 2020

* simplify buffered reader to improve DataLoader performance

* fix 22 failed unittests

* fix cuda pinned context condition

* fix test_reader_reset failed

* fix two failed unittests

* change unittest place

* polish error message

* polish cast op GetExpectedKernelType

* remove debug info in unittest
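
In a buffered reader, a background producer reads ahead into a bounded queue, so the consumer (the training loop) rarely has to wait on data. Below is a minimal sketch using Python's standard `threading` and `queue` modules, assuming any iterable as the data source. `buffered` is a hypothetical helper, not the C++ `BufferedReader` this commit changes. It also does not model the CUDA pinned-memory copies that the real reader performs.

```python
import queue
import threading
from typing import Any, Iterable, Iterator

def buffered(source: Iterable[Any], buffer_size: int = 2) -> Iterator[Any]:
    """Yield items from `source` while a background thread reads ahead.

    At most `buffer_size` items wait in the queue; the producer blocks when
    it is full, which bounds memory use. Errors raised by the source are
    forwarded and re-raised in the consumer.
    """
    q: "queue.Queue[Any]" = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for item in source:
                q.put((True, item))
        except BaseException as exc:  # forward the error to the consumer
            q.put((False, exc))
            return
        q.put((False, None))  # normal end of data

    threading.Thread(target=producer, daemon=True).start()
    while True:
        ok, payload = q.get()
        if ok:
            yield payload
        elif payload is None:
            return
        else:
            raise payload
```

The producer is a daemon thread, so if a consumer stops early, the thread simply stays blocked until the process exits. A production reader would add explicit shutdown.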

May 11, 2020 · 1 commit

Add macro BOOST_GET to enrich the error information of boost::get (#24175) · aa0f254f

Committed by Chen Weihang on May 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

Apr 27, 2020 · 2 commits

[dy2static] Add print transformer and unify print format (#24068) · 9b851ba2

Committed by Chen Weihang on Apr 27, 2020

* add print transformer & unify print format, test=develop

* remove using of dygraph_to_static_func, test=develop

* remove python stdout capture, test=develop

* fix compatibility problems for PY2, test=develop

* fix detail error, test=develop

* fix type analysis bug, test=develop

* fix print tuple compatible error in PY2, test=develop

* replace get_func to declarative, test=develop

* fix detail bug, test=develop

* fix some detail problems, test=develop

* change visit_call in print transformer, test=develop

Add the implementation of inverse (#23310) · ecfddebb

Committed by Yiqun Liu on Apr 27, 2020

Mar 17, 2020 · 1 commit

Revert "Change ShareDataWith() to TensorCopy() in conv_mkldnn (#22695)" (#22985) · 5842ae67

Committed by Adam on Mar 17, 2020

Mar 11, 2020 · 1 commit

Change ShareDataWith() to TensorCopy() in conv_mkldnn (#22695) · 056edf39

Committed by Adam on Mar 11, 2020

Dec 12, 2019 · 1 commit

memory leak for cpu (#21174) · 9ad940fd

Committed by tangwei12 on Dec 12, 2019

* add fake init for the trainer, fix large memory hold in the trainer
* do not merge recv vars from a remote endpoint, test=develop
* add recv and save op, merge slice var in one op, save memory
* remove hsigmoid with pull sparse, test=develop

Dec 02, 2019 · 1 commit

fix the correctness of memcpy profiling result test=develop (#21458) · d4776ec0

Committed by wangchaochaohu on Dec 02, 2019

Nov 28, 2019 · 2 commits

Use system allocator in OpTest (#21335) · 09696d5d

Committed by Zeng Jinle on Nov 28, 2019
```
* use system allocator in unittests, test=develop

* fix op bugs, test=develop

* fix tensor copy bug when src and dst are the same, test=develop
```

fix lod_reset bug, test=develop (#21392) · b97fc16d

Committed by Zeng Jinle on Nov 28, 2019

Oct 14, 2019 · 1 commit

Dlpack support (#20039) · 12e4be03

Committed by 633WHU on Oct 14, 2019

* support dlpack to tensor and implement python interface test=develop

* add unittest for _to_dlpack and from_dlpack test=develop

Sep 03, 2019 · 1 commit

refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603) · 75d15719

Committed by Tao Luo on Sep 03, 2019
```
test=develop
```

Aug 14, 2019 · 1 commit

Use CUDAPinnedPlace in buffered_reader (#19112) · c70a97f4

Committed by chengduo on Aug 14, 2019
```
Use CUDAPinnedPlace in buffered_reader
```

May 24, 2019 · 1 commit

add __str__ method for tensor and lodtensor to support print test=dev… (#17588) · 6724a652

Committed by wopeizl on May 24, 2019
```
* add __str__ method for tensor and lodtensor to support print test=develop
```

Mar 28, 2019 · 1 commit

[MKL-DNN] Tensor modifications revert (#16462) · 26323274

Committed by Jacek Czaja on Mar 28, 2019

* Revert "[MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233)"

This reverts commit 13816dd4.
Apart from enabling transformer for MKL-DNN

* Revert "- MKL-DNN pooling updated to set_prim_desc"

This reverts commit c63f6b20.

Conflicts:
	paddle/fluid/operators/mkldnn/concat_mkldnn_op.cc

* Revert "[MKL-DNN] MKL-DNN specific Tensor modification (#15429)"

test=develop

This reverts commit dec9cf53.

* - concat compilation fix

- lint

test=develop

- Lint fixes

test=develop

- Lint fixes

test=develop

- Fix Transpose MKLDNN op

test=develop

Mar 19, 2019 · 2 commits

add allocator flags · 22715487

Committed by zhhsplendid on Mar 19, 2019
```
test=develop
```

[MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233) · 13816dd4

Committed by Jacek Czaja on Mar 19, 2019

* - Fix to crash of Transformer when mkldnn is to be used

Desc: TensorCopy was not setting MKLDNN primitive descriptor when layout was to be kMKLDNN

test=develop

* - Enable transformer for mkl-dnn

test=develop

* - Compilation fix

test=develop

* - Removed manual selection of MKL-DNN ops to be used in Transformer test

test=develop

Mar 11, 2019 · 1 commit

Revert "Revert "Add Event for TensorCopy"" (#16035) · ad80bde8

Committed by chengduo on Mar 11, 2019

* Revert "Revert "Add Event for TensorCopy" (#16022)"

This reverts commit e2da3a5b.

* use default stream
test=develop

Mar 04, 2019 · 3 commits

Revert "Add Event for TensorCopy" (#16022) · 92438f61

Committed by chengduo on Mar 03, 2019
```
* Revert "Add Event for TensorCopy (#15953)"

This reverts commit 7235fd66.
test=develop

* fix CI
test=develop
```

Add Event for TensorCopy (#15953) · 06f3c857

Committed by chengduo on Mar 01, 2019
```
Add Event for TensorCopy 
```

Revert "Add Event for TensorCopy" (#16022) · e2da3a5b

Committed by chengduo on Mar 03, 2019
```
* Revert "Add Event for TensorCopy (#15953)"

This reverts commit 7235fd66.
test=develop

* fix CI
test=develop
```

Mar 01, 2019 · 1 commit

Add Event for TensorCopy (#15953) · 7235fd66

Committed by chengduo on Mar 01, 2019
```
Add Event for TensorCopy 
```

Crayon鑫 / Paddle is in sync with the upstream repository it was forked from
