提交 · 6d26b332d9fee77f16a8655c8ead3f21f2805975 · Crayon鑫 / Paddle

01 3月, 2022 1 次提交

[bf16] add bf16 kernel: scale gather sum (#39683) · 6d26b332

由 zhangbo9674 提交于 3月 01, 2022

* add scale gather sum

* refine CUDA_ATOMIC_WRAPPER ADD for bf16

* add gather unittest

* solve conflict

* add scale uinttest

* add sum unittest

* solve conflict

* refine gather unittest

* refine unittest

6d26b332

20 2月, 2022 1 次提交

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

由 Chen Weihang 提交于 2月 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

dcfe1986

11 2月, 2022 1 次提交
- F
  [Pten] move operators/math/math_function_* to pten/kernels/func (#39300) · d25a7f9e
  由 Feiyu Chan 提交于 2月 11, 2022
```
* move operators/math/math_function_* to pten/kernels/func
* namespace from `paddle::operators::math` to `pten::funcs`
```
  d25a7f9e
25 1月, 2022 1 次提交

[Move selected_rows PR #3] Change the relationship of [include/Cmake]. (#39128) · 2bafd338

由 Weilong Wu 提交于 1月 25, 2022

* Added selected_rows and rw_lock to pten

* Renamed the unit test target to fix CI

* Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid

* Remove rw_lock.h,rw_lock_test.cc in fluid

* Use pten::RWLock and pten::AutoRDLock, fix CI

* Use pten::SelectedRows

* Use pten::SelectedRows

* Fix to pass NPU CI

* Use pten::SelectedRows, to pass NPU CI

* To fix NPU CI

* To fix NPU CI again

2bafd338

17 1月, 2022 1 次提交

[Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5

由 Wilber 提交于 1月 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile

c48a9ad5

16 9月, 2020 1 次提交
- W
  update the error message check for the some ops · 4e8582fe
  由 wawltor 提交于 9月 16, 2020
```
update the error message check for the some ops
```
  4e8582fe
11 5月, 2020 1 次提交

Add macro BOOST_GET to enrich the error information of boost :: get (#24175) · aa0f254f

由 Chen Weihang 提交于 5月 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

aa0f254f

16 1月, 2020 1 次提交
- W
  
  remove unused code test=develop (#22327) · 1e932ecc
  由 wangchaochaohu 提交于 1月 17, 2020
  
  1e932ecc
29 11月, 2019 1 次提交

Fix Cond Bug for Nested Control Flow (#21340) · 630be319

由 Huihuang Zheng 提交于 11月 29, 2019

* Commit before merging develop

test=develop

* Backup after working with Huihuang logs

* Commit before deleting Huihuang debug loggings

* Commit before debug

test=develop

* Fix bug commit

test=develop

* Backup of fixing bugs

test=develop

* Clean up code

test=develop

* Fix a bug in sum_op

test=develop

630be319

15 10月, 2019 1 次提交
- Z
  Fix sum op fails as no memory in tensor(#20602) · 8314e64a
  由 zhaoyuchen2018 提交于 10月 15, 2019
```
test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
```
  8314e64a
11 9月, 2019 1 次提交

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

由 Huihuang Zheng 提交于 9月 11, 2019

TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory.

We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton.

Also added data_feed_proto to operator to fix CI in CPU compilation

12542320

22 8月, 2019 1 次提交

Enhance OpTest to check the consistency of operators when using and not using inplace (#19101) · a9d5fc51

由 Leo Chen 提交于 8月 22, 2019

* add pybind interface to get all inplace ops, test=develop

* enhance OpTest to check whether the consistency of operator when using and not using inplace, test=develop

* handle corner cases in op_test, test=develop

* support outputs without tensor holder_, like XShape in reshape_op, test=develop

* fix bug, some op has GradOpMaker, but actually no grad_op in OpInfoMap, test=develop

* use reshape_grad instead of reshape in FlattenGradOp, test=develop

* fix error debug dims info for variables like XShape, test=develop

* change computational order in sum_op to relieve computation difference using inplace, test=develop

* add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop

* follow sneaxiy's comments, test=develop

* remove unused DefaultGradOpDescMaker in mkldnn op, test=develop

a9d5fc51

11 7月, 2019 1 次提交

Feature/buffer_shared_inplace (#17911) · d3003a16

由 Zeng Jinle 提交于 7月 11, 2019

* feature/buffer_shared_inplace, test=develop

* refine code, test=develop

* fix elementwise_add op cpu inplace and sum inplace bug, test=develop

* add unittest and debug log, test=develop

* fix parallel_executor scope bug, polish code, test=develop

* fix sum op, activation op, single_in_place_inference bug, test=develop

* remove kLocalExecScopeName, test=develop

* fix unittest,test=develop

* fix out_var first version bug, test=develop

* follow comments,test=develop

d3003a16

16 6月, 2019 1 次提交

Update backward appending stragety to support double backward and fix some bug. (#18104) · 80d2e66f

由 qingqing01 提交于 6月 16, 2019

* Update backward.py:
     - If there is no input grad var in all outputs of previous ops, do not append this op into graph.
     - Only apply this stragety when double backward.
* Update some double backward op.
* Update sum_op to judge whether a tensor is empty by numel or IsInitialized().

80d2e66f

08 5月, 2019 1 次提交

Optimize the cuda implementation of sum_op (#17283) · 6b84688b

由 Yiqun Liu 提交于 5月 08, 2019

* Optimize the cuda implementation of sum_op, which add two lod_tensors inplace.
test=develop

* Use eigen to add to tensors.
test=develop

6b84688b

07 5月, 2019 1 次提交

optimize sum op (#16820) · 32b62c25

由 zhaoyuchen2018 提交于 5月 07, 2019

* optimize sum op

fuse multi eigen kernel calls into one cuda kernel.
refine code

test=develop.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code according to comments.

test=develop

* refine code

delete sum_op_gpu.h
test=develop

* Fix test error.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code in format.

test=develop.

* refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

32b62c25

11 12月, 2018 1 次提交
- Y
  Fix Eigen macro when using GPU · 7604b1ad
  由 Yu Yang 提交于 12月 11, 2018
```
The macro should be defined by compiler rather than by source.

test=develop
```
  7604b1ad
07 11月, 2018 1 次提交

Add fp16 backward support (#14202) · a9b5d42d

由 chengduo 提交于 11月 07, 2018

* add fp16 backward support
test=develop

* add sum_op fp16 test

* disable test_dist_save_load
test=develop

* add check_grad for sum

* add unit test for softmax_grad fp16
test=develop

* add scale_op unit test

* add mul_grad_op unit test for fp16

* add cross_entropy_grad and eman_grad unit test for fp16
test=develop

* fix cross_entropy unit test

* add pool2d fp16 unit test

* refine conv2d fp16 unit test
test=develop

* refine activation unit test
test=develop

* fix ci
test=develop

* follow zhihong's comment, copy from https://github.com/PaddlePaddle/Paddle/pull/12796
test=develop

a9b5d42d

17 8月, 2018 1 次提交
- D
  Revert ""cherry picked operators changes" (#12184)" (#12747) · 4069262f
  由 dzhwinter 提交于 8月 17, 2018
```
This reverts commit bf3c3496.
```
  4069262f
16 8月, 2018 1 次提交

"cherry picked operators changes" (#12184) · bf3c3496

由 dzhwinter 提交于 8月 16, 2018

* "cherry picked operators changes"

* "remove duplicated code"

* "add constant setter"

* "add get expected kernel"

* "fix ci"

* "add fill constant"

bf3c3496

12 2月, 2018 1 次提交
- Q
  
  Fix the grammar in copyright. (#8403) · 24509f4a
  由 qingqing01 提交于 2月 12, 2018
  
  24509f4a
10 2月, 2018 2 次提交
- Y
  
  Correct #include path · fc374821
  由 Yi Wang 提交于 2月 09, 2018
  
  fc374821
- Y
  
  Move file to fluid/; Edit CMakeLists.txt · 90648f33
  由 Yi Wang 提交于 2月 09, 2018
  
  90648f33
12 12月, 2017 1 次提交

Refine device context (#6433) · 61ec0b95

由 QI JUN 提交于 12月 12, 2017

There are mainly following fixes:

- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class  `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`

61ec0b95

23 11月, 2017 1 次提交
- Y
  Feature/support int64 for sum (#5832) · c077a6d5
  由 Yu Yang 提交于 11月 23, 2017
```
* Support int64 for sum op

* Refine code
```
  c077a6d5
27 10月, 2017 1 次提交

Gradient check use graph (#5027) · be00b0c4

由 Yu Yang 提交于 10月 26, 2017

* Simplize Gradient Check

* Stash

* Extract apply_backward_pass to backward.py

Rename apply_backward_pass to append_backward_ops

* Use graph API to check gradient

* Fix ci

* Fix CI

* Fix backward for double precision

* Stash

* Fix CI

* Fix ci

* Ignore GRU test

* Ignore xe op

* Fix CI

* Fix softmax with xe gradient

The correct equation should be IG = OG * (d_softmax_with_xe())

* Fix typo

* Fix merge error

* Disable LRN

be00b0c4

03 10月, 2017 2 次提交
- Y
  
  Fix CRLF in sum_op.cu · ff1bfded
  由 Yu Yang 提交于 10月 02, 2017
  
  ff1bfded
- Y
  
  Simplify SumOp Kernel · adec0d30
  由 Yu Yang 提交于 10月 02, 2017
  
  adec0d30
05 9月, 2017 1 次提交
- Q
  
  refactor operator python test and add sum operator · f314330c
  由 qijun 提交于 9月 05, 2017
  
  f314330c
04 9月, 2017 1 次提交
- L
  
  remove scatter_op.cu/gather_op.cu as they support only_cpu now · 740c8ba1
  由 Luo Tao 提交于 9月 04, 2017
  
  740c8ba1
26 8月, 2017 1 次提交
- Z
  
  fix problems · bfeecfd3
  由 zchen0211 提交于 8月 25, 2017
  
  bfeecfd3
25 8月, 2017 1 次提交
- Z
  
  scatter check in · c5e28dd1
  由 zchen0211 提交于 8月 24, 2017
  
  c5e28dd1
07 8月, 2017 1 次提交
- D
  
  "remove alias to more operators" · 6b23b91c
  由 dongzhihong 提交于 8月 07, 2017
  
  6b23b91c
04 8月, 2017 1 次提交
- L
  
  Add cpplint for *.h and cuda *.cu · b58725bd
  由 liaogang 提交于 8月 04, 2017
  
  b58725bd
31 7月, 2017 1 次提交
- Q
  
  add EIGEN_USE_GPU macro to op.cu file · 61f94f00
  由 qijun 提交于 7月 31, 2017
  
  61f94f00
25 7月, 2017 1 次提交
- Y
  Add type_alias to import framework into ops · efc119b4
  由 Yu Yang 提交于 7月 25, 2017
```
Make implement an operator less noisy.
```
  efc119b4
19 7月, 2017 1 次提交
- Q
  Add sgd op (#2950) · e3b27d19
  由 Qiao Longfei 提交于 7月 19, 2017
```
* a simplest SGD op
```
  e3b27d19

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致