Commits · 890638cf7dc47e101d4b78c1b58fa326fc4261c7 · Crayon鑫 / Paddle

08 Sep, 2021 · 1 commit
- [NPU] release gil before op run (#35370) · db6242e9
  Committed by Leo Chen on Sep 08, 2021
```
* release gil before op run

* support npu grad test

* fix op_test
```
26 May, 2021 · 1 commit

optimize OP's compilation time (#32617) · 78ecb668

Committed by wuhuanzhou on May 26, 2021

* optimize OP's compilation time, test=develop

* add more op and run ci test, test=develop

* CUDA Kernel register in cc file, test=develop

* fix macros, test=develop

* fix undefined symbol error, test=develop

* fix compilation error and undefined symbol, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error on Windows, test=develop

21 May, 2021 · 1 commit
- replace complex64/128 with complex template in cast Op (#33019) · 79d918d9
  Committed by chentianyu03 on May 21, 2021
```
* replace complex in set tensor from and to numpy

* replace complex template in cast op
```
20 May, 2021 · 1 commit

Add complex template type (#32857) · 738bf20e

Committed by chentianyu03 on May 20, 2021

* add complex template file

* add numtraits for complex template

* add complex template type register

* modify specify template of complex

* modify specify template of complex

* modify specify template of complex

* modify specify template of complex

* make TensorCheckerVisitor support complex type

* fix operator= error

* add complex template

* add complex template type

* add complex template type to pyarray transform

* add complex template type to pyarray transform

* remove complex type for dlpack register

* set dlpack support complex type

* set dlpack support complex type

* set dlpack support complex type

* remove explicit for complex constructor

* add complex unit test file

19 Apr, 2021 · 1 commit

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

Committed by Leo Chen on Apr 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* [NPU] fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and add wait before sync operation (#31956)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: pangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>

09 Apr, 2021 · 1 commit

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinalize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code style

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

26 Feb, 2021 · 1 commit
- [ROCM] update fluid framework for rocm (part6), test=develop (#31015) · 28b356b9
  Committed by Qi Li on Feb 26, 2021
  
04 Feb, 2021 · 1 commit
- fix xpu dygraph place (#30868) · 6e3856d3
  Committed by WangXi on Feb 04, 2021
  
03 Feb, 2021 · 1 commit
- 【kunlun】dygraph supports multi xpu card training (#30671) · b1026f64
  Committed by WangXi on Feb 03, 2021
  
20 Jan, 2021 · 1 commit

add some RecordEvent, for dygraph timeline (#30299) · d1b25ed9

Committed by wanghuancoder on Jan 20, 2021

* add some RecordEvent, for dygraph timeline, test=develop

* change GpuMemcpySync to memory::Copy, test=develop

* fix compile problem, test=develop

* fix compile problem, test=develop

* fix, test=develop

* fix, test=develop

13 Jan, 2021 · 1 commit

Set expected place in child thread for dataloader to avoid costing cuda memory... · 3d015f1c

Committed by Leo Chen on Jan 13, 2021

Set expected place in child thread for dataloader to avoid costing cuda memory on other card (#30338)

* set expected place in child thread for dataloader

* set device id when set tensor from numpy

* revert tensor_py change

* add compile guard

* fix ci

* fix bug

01 Dec, 2020 · 1 commit

add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

30 Oct, 2020 · 1 commit
- update the version of pybind, test=develop (#28284) · d9b5f126
  Committed by 石晓伟 on Oct 30, 2020
```
* update version pybind to v2.4.3, test=develop

* update unittests, test=develop
```
26 Sep, 2020 · 1 commit
- Add conv2d bfloat16 support (#27325) · b0ee1405
  Committed by joanna.wozna.intel on Sep 26, 2020
  
03 Sep, 2020 · 1 commit
- Add bfloat16 data type (#25402) · 95e1434b
  Committed by joanna.wozna.intel on Sep 03, 2020
  
25 Aug, 2020 · 1 commit

optimized transformation from tensor to numpy (#26447) · c1f5df52

Committed by wanghuancoder on Aug 25, 2020

* optimized transformation from tensor to numpy, test=develop

* optimized transformation from tensor to numpy, pass pre-commit, test=develop

* modify fetchophandle zerocopy to deepcopy in PE&CUP, test=develop

* modify py:array construct, test=develop

* fix _fetch_var to use deep copy, test=develop

21 Aug, 2020 · 1 commit

support Baidu Kunlun AI Accelerator (#25959) · 138ecf24

Committed by QingshuChen on Aug 21, 2020

* support Baidu AI Accelerator
  * test=kunlun

* minor
  * test=kunlun

* support xpu op in separate file
  * test=kunlun

* update XPU error message and remove duplicated code
  * test=kunlun

* minor
  * test=kunlun

* minor
  * test=kunlun

15 Aug, 2020 · 1 commit

expose and unify the Tensor concepts to the user (#25978) · 6de463d3

Committed by Zhou Wei on Aug 15, 2020

* expose and unify the Tensor concepts to the user

* expose tensor to user

* add copy place for Tensor

* add copy place for Tensor

* add note

* add macro PADDLE_WITH_CUDA

* remove RUN_TYPE=DIST

* fix some error

19 Jun, 2020 · 1 commit
- polish tensor set error message, test=develop (#25113) · b23801a2
  Committed by Chen Weihang on Jun 19, 2020
  
08 Jun, 2020 · 1 commit

Refine error message in pybind folder (#24886) · 6190023a

Committed by Leo Chen on Jun 08, 2020

* refine err_msg of pybind.cc, test=develop

* refine err_msg in tensor_py.h, test=develop

* refine error msg, test=develop

* fix test_exception, test=develop

* follow comments, test=develop

11 May, 2020 · 1 commit

Add macro BOOST_GET to enrich the error information of boost::get (#24175) · aa0f254f

Committed by Chen Weihang on May 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

09 Mar, 2020 · 1 commit
- Fix model int8 quant fail, test=develop (#22891) · a020a257
  Committed by zhaoyuchen2018 on Mar 09, 2020
```
The model fails when int8 quantization is enabled, so disable allocating
memory on the CPU for small variables.
```
27 Feb, 2020 · 1 commit

Refine adam op to improve performance, test=develop (#22346) · 72dde4ab

Committed by zhaoyuchen2018 on Feb 27, 2020

* Refine adam op, test=develop

* Fuse kernels together to reduce cpu time.

* Refine paddle enforce, test=develop

* Remove some comments, test=develop

* Refine code,test=develop

* Refine cuda kernel, test=develop

* Refine code according to comments, test=develop

04 Feb, 2020 · 1 commit

Support int16 for Tensor (#22423) · 822e5b36

Committed by Leo Chen on Feb 04, 2020

* add int16 support, test=develop

* add test, test=develop

* fix typo, test=develop

* fix dtype error in slice, test=develop

09 Dec, 2019 · 1 commit

Refine VarBase init function (#21587) · 4f81d1bd

Committed by Leo Chen on Dec 09, 2019

* refine init function, test=develop

* add tests, test=develop

* remove extern, which may cause symbol error in gcc-4.8, test=develop

05 Dec, 2019 · 1 commit

Split VarBase from Python Variable for Dygraph (#21359) · cdd46d7e

Committed by Leo Chen on Dec 05, 2019

* test=develop, fix docker with paddle nccl problem

* don't expose numerous Tensor.set(), test=develop

* fix condition, test=develop

* fix float16 bug, test=develop

* feed should be Tensor or np.array, not Variable or number, test=develop

* use forcecast to copy numpy slice to new array, test=develop

* remove float16-uint16 hacking, test=develop

* add variable method to varbase and refactor to_variable to support return varbase

* support kwargs in varbase constructor

* add VarBase constructor to support default python args

* refine varbase initial method

* reset branch

* fix ut for change VarBase error info to PaddleEnforce

* cherry is parameter change before

* overload isinstance to replace too many change of is_variable

* rm useless files

* rm useless code merged by git

* test=develop, fix some ut failed error

* test=develop, fix test_graph_wrapper

* add some tests, test=develop

* refine __getitem__, test=develop

* add tests, test=develop

* fix err_msg, test=develop

27 Nov, 2019 · 1 commit

Support numpy bridge (enabled by default in dygraph mode) (#20983) · d5ff79e5

Committed by Youwei Song on Nov 27, 2019

* add numpy bridge

* fix template compile

* add unittest, add default
test=develop

* fix unittest
test=develop

* fix unittest
test=develop

* zero_copy=True for to_variable,
test=develop

* bug fix
test=develop

* disable deprecated NumPy API
test=develop

* use better design of NumpyAllocator
test=develop

* fix Py_None check
test=develop

* reset c++ tracer when jump out dygraph guard
test=develop

* refine PADDLE_ENFORCE_xx format
test=develop

* bug fix of tracer switch
test=develop

* update decref
test=develop

01 Nov, 2019 · 2 commits

tensor.set() supports array list and remove unused code, test=develop (#20959) · 2c3c579b

Committed by Leo Chen on Nov 01, 2019

Update Tensor.set() to support float16 (#19964) · 9974e407

Committed by Leo Chen on Nov 01, 2019

* don't expose numerous Tensor.set(), test=develop

* fix condition, test=develop

* fix float16 bug, test=develop

* feed should be Tensor or np.array, not Variable or number, test=develop

* use forcecast to copy numpy slice to new array, test=develop

* remove float16-uint16 hacking, test=develop

10 May, 2019 · 1 commit

Double backward of conv2d. (#17211) · e32c9888

Committed by qingqing01 on May 10, 2019

* Add conv2d_grad_grad_op
* Extract the cuDNN conv algo searching code in conv_cudnn_helper.h.
    - Now use it in conv2d_grad_grad.
    - Will simplify the searching code in conv2d and conv2d_grad in next PR.
* Enhance and fix bug in unit testing of gradient_checker.
* Support to fetch empty variables, return None in Python.

06 May, 2019 · 1 commit
- Fix tensor_py.h (#17195) · c5eeecca
  Committed by Zeng Jinle on May 06, 2019
```
* fix tensor_py,test=develop

* change class name,test=develop
```
30 Apr, 2019 · 1 commit

Fix mem leak when converting Tensor to numpy array (#17182) · 5dfe2ab9

Committed by Zeng Jinle on Apr 30, 2019

* fix mem leak when converting Tensor to numpy array
test=develop

* remove unused unittest,test=develop

* follow comments, test=develop

* fix dygraph bug,test=develop

22 Apr, 2019 · 1 commit

Speed unit testing. (#16978) · ea42e431

Committed by qingqing01 on Apr 22, 2019

* Speed affine_channel_op unit testing
* Add check in tensor_py
* Fix ONLY_CPU Compiling

27 Mar, 2019 · 1 commit
- Tensor index (#16223) · c300b1ba
  Committed by wopeizl on Mar 27, 2019
```
* extend the slice function for python
test=develop
```
12 Dec, 2018 · 1 commit
- Change tensor uses proto::VarType::type · 9bd70a1e
  Committed by Yu Yang on Dec 11, 2018
```
test=develop
```
10 Dec, 2018 · 1 commit
- add HasProtoAttr function in op_desc.h, clean node.h · 067ed70f
  Committed by Tao Luo on Dec 10, 2018
```
test=develop
```
03 Dec, 2018 · 1 commit
- fix bug · c47c451a
  Committed by sneaxiy on Dec 03, 2018
  
24 Nov, 2018 · 1 commit
- Change the include files because the version changes of pybind11 · 81994e84
  Committed by minqiyang on Nov 24, 2018
```
test=develop
```
19 Oct, 2018 · 1 commit
- fix pinned allocator · 2002e71d
  Committed by sneaxiy on Oct 19, 2018
  
02 Oct, 2018 · 1 commit
- Add comments and polish code style · 15076c32
  Committed by Yu Yang on Oct 02, 2018
  