Commits · 0b54d54fd847da881116f3c8628ec449c5c0d5d3 · Crayon鑫 / Paddle

11 Jul, 2020 1 commit

Fix index overflow bug of the CUDA kernel loop increment (#25435) · 0b54d54f

Committed by Chen Weihang on Jul 11, 2020

* fix softmax_with_cross_entropy cuda kernel overflow bug, test=develop

* replace old macro & for condition, test=develop

* polish details, test=develop

0b54d54f
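
The kind of overflow named in this commit title can be illustrated on the host. The sketch below is not Paddle's kernel code: it simulates a grid-stride loop in plain C++, and `kGridStride`, `Next32BitIndexOverflows` and `CountVisited` are hypothetical names with an assumed launch configuration. If the loop index were a 32-bit `int`, the increment by one grid stride could pass `INT_MAX` on large tensors; a 64-bit index avoids that.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed launch configuration: 1024 threads per block, 65536 blocks,
// so one loop step advances the index by 2^26 elements.
constexpr int64_t kGridStride = 1024LL * 65536LL;

// Returns true if advancing a 32-bit loop index by one grid stride would
// exceed INT_MAX. The sum is formed in 64 bits so the check cannot overflow.
bool Next32BitIndexOverflows(int32_t i) {
  return static_cast<int64_t>(i) + kGridStride >
         std::numeric_limits<int32_t>::max();
}

// Counts the elements one simulated thread visits when the index is 64-bit.
// The loop terminates normally even when n is close to INT_MAX.
int64_t CountVisited(int64_t start, int64_t n) {
  assert(start >= 0);
  int64_t visited = 0;
  for (int64_t i = start; i < n; i += kGridStride) {
    ++visited;
  }
  return visited;
}
```

With these assumed sizes, the last valid index of thread 0 for a tensor of `INT_MAX` elements is `31 * kGridStride`, and the next increment is the one a 32-bit index could not represent.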

03 Jun, 2020 1 commit

Support gradient accumulation of fp16 in imperative mode (#24823) · b67ded04

Committed by Leo Chen on Jun 03, 2020

* support gradient accumulation of fp16 in imperative mode, test=develop

* enhance coverage test, test=develop

* follow comments, test=develop

b67ded04

25 Jun, 2019 1 commit

Sequence mask support tensor (#18249) · df2eee71

Committed by Hongyu Liu on Jun 25, 2019

* sequence mask support max length tensor input; test=develop

* add rnn_impl.py; test=develop

* add basic gru lstm unittest; test=develop

* fix api spec; test=develop

* fix sequence_mask op bug;
test=develop
test=document_preview

* change +-*x to elementwise_op; test=develop

* add mkl flag; test=develop

* fix rnn impl bug; test=develop

* update api spec; test=develop

* fix doc bug; test=develop

* fix lstm bugs; test=develop

df2eee71

12 Dec, 2018 1 commit
- Change tensor uses proto::VarType::type · 9bd70a1e
  Committed by Yu Yang on Dec 11, 2018
```
test=develop
```
  9bd70a1e
11 Dec, 2018 1 commit
- Fix Eigen macro when using GPU · 7604b1ad
  Committed by Yu Yang on Dec 11, 2018
```
The macro should be defined by compiler rather than by source.

test=develop
```
  7604b1ad
31 Aug, 2018 1 commit
- Feature/template (#13093) · ab1097cd
  Committed by dzhwinter on Aug 31, 2018
```
* remove template operator

* "fix compile"

* "fix ci"

* "fix ci"
```
  ab1097cd
27 Aug, 2018 1 commit
- Support data type int8_t. (#12841) · 1f09bc32
  Committed by qingqing01 on Aug 27, 2018
```
* Support int8 type.
```
  1f09bc32
10 May, 2018 1 commit
- matmul support float16/double · 27197290
  Committed by yuyang18 on May 10, 2018
  
  27197290
04 May, 2018 1 commit
- Clean and extract blas · ef6ea790
  Committed by Yu Yang on May 04, 2018
  
  ef6ea790
03 May, 2018 1 commit
- Clean MatMul · 815d8884
  Committed by Yu Yang on May 03, 2018
  
  815d8884
28 Apr, 2018 1 commit
- Refactor GEMM in blas · c888e016
  Committed by Yu Yang on Apr 28, 2018
  
  c888e016
25 Apr, 2018 1 commit
- Fix batch_gemm bugs · 2a06e307
  Committed by Yu Yang on Apr 25, 2018
```
stride should be int64_t, not int
```
  2a06e307
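
The commit message above says the stride should be `int64_t` rather than `int`. A minimal sketch of why this matters, assuming a packed batch of row-major matrices: the helpers `MatrixStride` and `BatchOffset` are hypothetical and are not taken from Paddle's blas code. The per-matrix stride and the offset of a later matrix in the batch can both exceed `INT_MAX`, so the arithmetic is done in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Stride (in elements) between consecutive row-major matrices of shape
// rows x cols. The cast happens before the multiplication, so the product
// is computed in 64-bit arithmetic.
int64_t MatrixStride(int rows, int cols) {
  assert(rows >= 0 && cols >= 0);
  return static_cast<int64_t>(rows) * cols;
}

// Element offset of matrix `batch_id` in a packed batch whose matrices are
// `stride` elements apart. Both operands are 64-bit, so the result can
// exceed INT_MAX without wrapping.
int64_t BatchOffset(int64_t batch_id, int64_t stride) {
  assert(batch_id >= 0 && stride >= 0);
  return batch_id * stride;
}
```

For example, the fourth matrix in a batch of 32768 x 32768 matrices starts at element 3 * 2^30, which does not fit in a 32-bit `int`.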
14 Apr, 2018 1 commit
- fix unused var error (#9908) · 92913027
  Committed by Kexin Zhao on Apr 13, 2018
  
  92913027
13 Apr, 2018 1 commit
- fix cuda 7.5 compile error (#9885) · 617e790a
  Committed by Kexin Zhao on Apr 12, 2018
  
  617e790a
11 Apr, 2018 1 commit
- Fix cuda 7.5 error with cublas GEMM (#9811) · 7ed457e7
  Committed by Kexin Zhao on Apr 10, 2018
```
* fix gemm error for cuda 7.5

* fix version number
```
  7ed457e7
07 Apr, 2018 1 commit
- Update the cuda API and enable tensor core for GEMM (#9622) · d00bd9eb
  Committed by Kexin Zhao on Apr 06, 2018
```
* change from hgemm to gemmEx

* fix cpplint
```
  d00bd9eb
17 Mar, 2018 1 commit
- initial commit · 39c676e2
  Committed by Kexin Zhao on Mar 16, 2018
  
  39c676e2
16 Mar, 2018 1 commit
- Finish adaption for backward. · bf3f56e8
  Committed by yangyaming on Mar 15, 2018
  
  bf3f56e8
12 Mar, 2018 1 commit
- address comments · 3b44b849
  Committed by Kexin Zhao on Mar 11, 2018
  
  3b44b849
09 Mar, 2018 1 commit

Add float16 GEMM math function on GPU (#8695) · 90215b78

Committed by kexinzhao on Mar 08, 2018

* test cpu float16 data transform

* add isnan etc

* small fix

* fix containsNAN test error

* add data_type transform GPU test

* add float16 GPU example

* fix error

* fix GPU test error

* initial commit

* fix error

* small fix

* add more gemm fp16 tests

* fix error

* add utility function

90215b78

12 Feb, 2018 1 commit
- Fix the grammar in copyright. (#8403) · 24509f4a
  Committed by qingqing01 on Feb 12, 2018
  
  24509f4a
10 Feb, 2018 2 commits
- Correct #include path · fc374821
  Committed by Yi Wang on Feb 09, 2018
  
  fc374821
- Move file to fluid/; Edit CMakeLists.txt · 90648f33
  Committed by Yi Wang on Feb 09, 2018
  
  90648f33
03 Feb, 2018 1 commit
- Add layer norm [GPU] · 76e188e5
  Committed by chengduoZH on Feb 02, 2018
  
  76e188e5
27 Dec, 2017 1 commit
- Update the CUDA kernel. · 19367389
  Committed by qingqing01 on Dec 26, 2017
  
  19367389
26 Dec, 2017 1 commit
- Optimize the rowwise add function. · 32d881be
  Committed by qingqing01 on Dec 26, 2017
  
  32d881be
25 Dec, 2017 2 commits
- remove unused place (#6972) · efd37269
  Committed by QI JUN on Dec 25, 2017
```
* remove unused place

* fix ci
```
  efd37269
- GPUPlace to CUDAPlace (#6960) · 0d2235aa
  Committed by dzhwinter on Dec 25, 2017
  
  0d2235aa
18 Dec, 2017 1 commit
- add more place test and rename Cudnn to CUDNN (#6621) · 93a2d9c5
  Committed by QI JUN on Dec 18, 2017
```
* add more place_test and rename Cudnn to CUDNN

* fix ci
```
  93a2d9c5
14 Dec, 2017 1 commit

"derived cudnnDevice context" (#6585) · 0e9b393b

Committed by dzhwinter on Dec 14, 2017

* "derived cudnnDevice context"

* "leave remove cudnn handle from CUDADeviceContext"

* "fix math function error"

0e9b393b

12 Dec, 2017 1 commit

Refine device context (#6433) · 61ec0b95

Committed by QI JUN on Dec 12, 2017

The main changes are:

- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class  `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`

61ec0b95
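
The first item in the list above, taking `DeviceContext` rather than `Place` as the template parameter of math functors, can be sketched as follows. This is a simplified illustration under stated assumptions, not Paddle's actual classes: `CPUDeviceContext` here is an empty stand-in, and `FillFunctor` is a hypothetical functor name.

```cpp
#include <cassert>
#include <vector>

// Empty stand-in for a device context. A real context would carry
// device-specific state such as streams and library handles.
struct CPUDeviceContext {};

// Primary template: a math functor parameterized by the device context
// type and the element type. Each context type gets its own specialization.
template <typename DeviceContext, typename T>
struct FillFunctor;

// CPU specialization: receives the context object itself, so any state
// it carries is available to the functor without a separate lookup.
template <typename T>
struct FillFunctor<CPUDeviceContext, T> {
  void operator()(const CPUDeviceContext& ctx, std::vector<T>* data,
                  T value) const {
    (void)ctx;  // unused in this sketch
    assert(data != nullptr);
    for (T& x : *data) x = value;
  }
};
```

A caller would instantiate `FillFunctor<CPUDeviceContext, float>` and pass the context it already holds; a GPU specialization would be declared the same way for a CUDA context type.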

27 Nov, 2017 1 commit
- implement forward · 1abd3b3a
  Committed by Yancey1989 on Nov 27, 2017
  
  1abd3b3a
23 Nov, 2017 1 commit
- Fix lstm_op and gru_op in debug mode. · 7fb1f7a2
  Committed by dangqingqing on Nov 23, 2017
  
  7fb1f7a2
16 Nov, 2017 1 commit

feature/while_grad_op (#5554) · 18f0c40a

Committed by Yang Yang(Tony) on Nov 16, 2017

* first commit

* Python API for while op

* Python Unittest for simple while_op forward

* fix out to be list

* Fix UT

* VarType

* Fix several bugs

* Fix bug

* Fix bug

* Fix Bug

* Fix bug

* Fix unittest

* Remove debug log

* Add comments

* add PADDLE_ENFORCE

* while_grad_op first commit

* Add `BlockDescBind::FindRecursiveOrCreateVar()` and fix bugs

* not sure how to setdim of while outputs

* push for test

* add executor vlog

* fix bug of while_op cond

* Several enhancements for code

1. Backward always infers shape and var type, because RENAME variables
are created when the backward operator is built, but their shapes and
var types are not otherwise inferred.
2. Never use SomePtr-> directly, since any pointer returned from a
function could be nullptr. Add `detail::Ref` to cast a pointer to a
reference safely.
3. Enhance error messages for backward.
4. Infer the data type of variables in `sum` and `tensor_write`.

* Fix bugs of while_op gradient

* Fix several bugs of while_op grad

* fix fill zeros like

* fix 3 >= 3

* fix place holder shouldn't be null

* fail on sum op

* Fix SumOp of TensorList

* clean up

* pass while test

* fix test_array_write_read

* pass sum op

* Support int/int64 for fill_constant_batch_size_like

* Fix compile

18f0c40a
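
Item 2 of the code enhancements in this commit message describes a helper that casts a pointer to a reference safely. The sketch below shows one way such a helper could look. It is written in the spirit of the `detail::Ref` named above but is not its actual implementation: the signature, the `what` parameter and the use of `std::invalid_argument` are all assumptions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical pointer-to-reference helper: fails loudly on nullptr
// instead of letting a later `ptr->` dereference crash.
template <typename T>
T& Ref(T* ptr, const char* what = "pointer") {
  if (ptr == nullptr) {
    throw std::invalid_argument(std::string(what) + " must not be null");
  }
  return *ptr;
}
```

Call sites then write `Ref(FindVar(name)).Get()` style code (with `FindVar` standing for any function that returns a pointer) and get a descriptive error at the point where the null value first appears.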

14 Nov, 2017 1 commit
- Move RowwiseAdd functor to math_function and Add ColwiseSum functor. · 26736576
  Committed by dangqingqing on Nov 14, 2017
  
  26736576
13 Nov, 2017 1 commit
- Resume unit testing. · e9082bb7
  Committed by dangqingqing on Nov 13, 2017
  
  e9082bb7
11 Nov, 2017 2 commits

Use G++ to compile some cu operators. · f5e36765
Committed by dangqingqing on Nov 11, 2017

f5e36765

Fixing duplicate struct name TensorSetConstant. (#5532) · 58b4c9af

Committed by emailweixu on Nov 10, 2017

TensorSetConstant struct is used both in math_function.cc and math_function.cu. Somehow the release version can correctly handle it. But in debug version, set_constant_with_place() in math_function.cu uses the TensorSetConstant in math_function.cc and causes crash.

58b4c9af
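
The crash described above is typical of two translation units defining different structs under the same name: the linker may keep only one definition of an inline member function, and which one it keeps can differ between build modes. The sketch below illustrates the general remedy of giving each definition a distinct qualified name. It is not the change made in this commit: the namespaces `cpu_impl` and `gpu_impl` are hypothetical stand-ins for the two source files, and the struct bodies are invented for illustration. In separate `.cc` and `.cu` files, an unnamed namespace around each struct would give the same isolation.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the definition in math_function.cc.
namespace cpu_impl {
struct TensorSetConstant {
  float value;
  // Fills the buffer and returns 0 to identify the CPU path.
  int operator()(std::vector<float>* t) const {
    assert(t != nullptr);
    for (float& x : *t) x = value;
    return 0;
  }
};
}  // namespace cpu_impl

// Stand-in for the definition in math_function.cu: same struct name,
// different members. The enclosing namespace keeps the two apart.
namespace gpu_impl {
struct TensorSetConstant {
  float value;
  int device_id;
  // Fills the buffer and returns 1 to identify the GPU path.
  int operator()(std::vector<float>* t) const {
    assert(t != nullptr);
    for (float& x : *t) x = value;
    return 1;
  }
};
}  // namespace gpu_impl
```

Because the two structs now have different fully qualified names, each caller is guaranteed to get the definition it was compiled against.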

08 Nov, 2017 2 commits
- Fix CI · 0708a155
  Committed by Yu Yang on Nov 07, 2017
  
  0708a155
- Add `op::math::set_constant` without template · aadb0981
  Committed by Yu Yang on Nov 07, 2017
  
  aadb0981
