Commits · ff7af219f19a57dd03001767ca34ff7d261ad709 · BaiXuePrincess / Paddle

16 Jun, 2020 · 1 commit

Committed by hutuxian on Jun 16, 2020

* Add a StatValue class in the backend to represent a stat.
* Add a singleton StatRegistry to maintain the collection of stats.
* For the sake of code neatness, only the int and float types are supported, which covers most scenarios.

5822862d
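The commit above describes a `StatValue` class and a singleton `StatRegistry`, restricted to int and float. The following is a minimal C++ sketch of that design under our own assumptions: the template parameterization, the use of `int64_t` for the integer case, and the method names `Increase`, `Decrease`, `Get`, `Instance` and `GetOrCreate` are hypothetical and are not taken from the Paddle source.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <type_traits>

// Hypothetical sketch: one named statistic holding a single int64_t or float.
template <typename T>
class StatValue {
  static_assert(std::is_same<T, int64_t>::value || std::is_same<T, float>::value,
                "StatValue only supports int64_t and float");

 public:
  void Increase(T delta) {
    std::lock_guard<std::mutex> guard(mu_);
    value_ += delta;
  }
  void Decrease(T delta) {
    std::lock_guard<std::mutex> guard(mu_);
    value_ -= delta;
  }
  T Get() const {
    std::lock_guard<std::mutex> guard(mu_);
    return value_;
  }

 private:
  mutable std::mutex mu_;
  T value_{0};
};

// Hypothetical sketch: a process-wide singleton mapping stat names to values.
template <typename T>
class StatRegistry {
 public:
  static StatRegistry& Instance() {
    static StatRegistry registry;  // initialized once; thread-safe since C++11
    return registry;
  }

  // Returns the stat registered under `name`, creating it on first use.
  StatValue<T>* GetOrCreate(const std::string& name) {
    std::lock_guard<std::mutex> guard(mu_);
    std::unique_ptr<StatValue<T>>& slot = stats_[name];
    if (!slot) slot.reset(new StatValue<T>());
    return slot.get();
  }

  StatRegistry(const StatRegistry&) = delete;
  StatRegistry& operator=(const StatRegistry&) = delete;

 private:
  StatRegistry() = default;
  std::mutex mu_;
  std::map<std::string, std::unique_ptr<StatValue<T>>> stats_;
};
```

In this sketch each template instantiation gets its own singleton, so int and float stats live in separate registries; the commit message does not say whether the real implementation shares one map.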

20 Apr, 2020 · 1 commit

Optimize the error messages of paddle CUDA API (#23816) · 78170037

Committed by Zhou Wei on Apr 20, 2020

* Optimize the error messages of paddle CUDA API, test=develop

* fix the error messages of paddle CUDA API, test=develop

* Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL,test=develop

* remove build_ex_string,test=develop

* merge conflict,test=develop

04 Mar, 2020 · 1 commit

Add flags to limit gpu memory (#22793) · d41d802b

Committed by Zeng Jinle on Mar 04, 2020

* add recorded cuda memory apis, fix typo, test=develop

* add more ut, test=develop

* follow comments, test=develop

* fix py35 incompatible issues, test=develop
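This commit adds recorded CUDA memory APIs so that a flag can cap GPU memory usage. One way to picture the bookkeeping is a per-device counter that admits an allocation only if the total stays under the cap. The sketch below is a hypothetical, CUDA-free illustration: the class `RecordedMemoryTracker` and its methods are our own names, and a real implementation would also call the CUDA allocator and handle its failures.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical sketch: per-device bookkeeping of recorded allocations under a
// user-configured byte limit. No CUDA calls are made here.
class RecordedMemoryTracker {
 public:
  explicit RecordedMemoryTracker(size_t limit_bytes) : limit_(limit_bytes) {}

  // Records `bytes` and returns true if the total stays within the limit;
  // otherwise leaves the count unchanged and returns false.
  bool TryRecord(size_t bytes) {
    std::lock_guard<std::mutex> guard(mu_);
    // used_ <= limit_ always holds, so this subtraction cannot underflow, and
    // comparing against the remaining budget avoids overflowing used_ + bytes.
    if (bytes > limit_ - used_) return false;
    used_ += bytes;
    return true;
  }

  // Releases bytes previously recorded with TryRecord.
  void Release(size_t bytes) {
    std::lock_guard<std::mutex> guard(mu_);
    assert(bytes <= used_);
    used_ -= bytes;
  }

  size_t Used() const {
    std::lock_guard<std::mutex> guard(mu_);
    return used_;
  }

 private:
  mutable std::mutex mu_;
  const size_t limit_;
  size_t used_ = 0;
};
```

Checking the request against the remaining budget, rather than computing `used_ + bytes`, keeps the check correct even for very large requests.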

09 Jan, 2020 · 1 commit

[Feature] Lite subgraph (#22114) · ad0dfb17

Committed by 石晓伟 on Jan 09, 2020

08 Jan, 2020 · 1 commit

Refine stack op to improve xlnet performance, test=develop (#22142) · 3d4f2aa6

Committed by zhaoyuchen2018 on Jan 08, 2020

The stack op's wait costs a lot of CPU time; using a CUDA kernel to do the memory copy reduces CPU time.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>

14 Nov, 2019 · 1 commit

Improve topk performance. (#21087) · b93870e6

Committed by zhaoyuchen2018 on Nov 13, 2019

* Improve topk performance.

Computing topk over 200000 elements:
before optimization: 1 s
after optimization: 0.0028 s

* Refine return value.
* Add CUDA util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>

06 Nov, 2019 · 1 commit

refine error message of allocator again, test=develop (#21023) · a710ccc0

Committed by Zeng Jinle on Nov 06, 2019

01 Nov, 2019 · 1 commit

gpu info query refine test=develop (#20904) · 7695b713

Committed by wangchaochaohu on Nov 01, 2019

12 Oct, 2019 · 1 commit

enable cpu machine to run paddle in gpu lib · 751812a6

Committed by Wilber on Oct 12, 2019
```
enable cpu machine to run paddle model in gpu lib
```

16 Aug, 2019 · 1 commit

move_flags_to_unified_files_for_management, test=develop (#19224) · 708bd979

Committed by Zeng Jinle on Aug 16, 2019

01 Aug, 2019 · 1 commit

Fix gpu_info PADDLE_ENFORCE_GT when fraction_of_gpu_memory_to_use=1.0 (#18950) · 08fa98f7

Committed by Zeng Jinle on Aug 01, 2019

* fix gpu_info, test=develop

* fix reserving gpu memory calculation bug, add fraction=1 unittest, test=develop

* fix bug again for reserving size, test=develop

31 Jul, 2019 · 1 commit

GPU allocation uses fraction of available memory (#18896) · ea6ee76f

Committed by Huihuang Zheng on Jul 31, 2019
```
GPU allocation uses fraction of available memory, also fix the GetUsed without lock
```

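This commit sizes the GPU allocation from a fraction of the currently available memory rather than of total memory. The arithmetic can be sketched as below; the function name `ChunkSizeFromFraction` and the minimum-chunk cutoff are hypothetical, and the exact formula in `gpu_info.cc` (including how much memory is held back as a reserve) may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: bytes to request for the first GPU chunk when the flag
// is interpreted as a fraction of *available* (not total) device memory.
size_t ChunkSizeFromFraction(size_t available_bytes, double fraction,
                             size_t min_chunk_bytes) {
  assert(fraction > 0.0 && fraction <= 1.0);
  double scaled = static_cast<double>(available_bytes) * fraction;
  // Clamp before converting: fraction == 1.0 (or rounding up) yields exactly
  // available_bytes, and the double-to-size_t cast below always stays in range.
  size_t bytes = scaled >= static_cast<double>(available_bytes)
                     ? available_bytes
                     : static_cast<size_t>(scaled);
  // Too little memory left to form a useful chunk: request nothing.
  return bytes < min_chunk_bytes ? 0 : bytes;
}
```

With `fraction` equal to 1.0 the sketch hands out all available bytes, which is the edge case that #18950 above had to handle.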
18 Jul, 2019 · 1 commit

Optimize the content of error reporting information, print error code and... · 772e0956

Committed by zhouwei25 on Jul 18, 2019

Optimize the content of error reporting information, print error code and official document web sites (#18671)

optimize the error reporting information of cuda related API
index on develop: 130ac177 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop

16 Jul, 2019 · 1 commit

print out error code of cudaGetDeviceProperties if failed (#18643) · 75953096

Committed by liuwei1031 on Jul 16, 2019

30 Apr, 2019 · 1 commit

Fix a typo in gpu_info.cc (#17175) · e4a53324

Committed by Huihuang Zheng on Apr 30, 2019
```
test=develop
```

21 Mar, 2019 · 1 commit

add more unittest · 953214ad

Committed by sneaxiy on Mar 19, 2019

modify allocator strategy
remove changes of legacy buddy_allocator
test=develop

19 Mar, 2019 · 1 commit

add allocator flags · 22715487

Committed by zhhsplendid on Mar 19, 2019
```
test=develop
```

24 Jan, 2019 · 2 commits

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

Committed by Yiqun Liu on Jan 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the python api of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA GPU, and delete some debugging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the except of is_empty op in PrepareData.
test=develop

lazy_allocator · 51227bd4

Committed by sneaxiy on Jan 23, 2019
```
test=develop
```

04 Dec, 2018 · 1 commit

[Feature] multi process multi gpu dist training, boost v100 performance by 20% (#14661) · 29d9fb53

Committed by Wu Yi on Dec 04, 2018

* wip multi process multi gpu dist training

* workable for p2p

* update test=develop

* change back env name test=develop

* fix alloc init

* fix cpu build test=devlop

* fix mac tests test=develop

* refine code

* refine test=develop

27 Nov, 2018 · 1 commit

minor fix · 38715e6f

Committed by peizhilin on Nov 27, 2018

26 Nov, 2018 · 2 commits

Revert the changes of VLOG · 53433d7f

Committed by minqiyang on Nov 26, 2018
```
test=develop
```

Given the different fraction_of_gpu_memory_to_use depends on platform · b2f8d418

Committed by peizhilin on Nov 26, 2018

22 Nov, 2018 · 2 commits

Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) · 00b9e9a1

Committed by chengduo on Nov 22, 2018

* refine cublas
test=develop

* code refine

* refine cublas

* add GEMME_EX

* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop

* fix CublasCall for cuda version
test=develop

* fix error
test=develop

* fix GEMM_EX to be compatible with gcc 4.8
test=develop

* add GEMM_EX
test=develop

* to be compatible with gcc4.8
test=develop

fix unit test cases · 7c8c9dc9

Committed by peizhilin on Nov 22, 2018

08 Nov, 2018 · 1 commit

Change the origin VLOG level to 10 times · 0c3227a5

Committed by minqiyang on Nov 08, 2018
```
Fix code to support cpplint syntax check

test=develop
```

15 Oct, 2018 · 1 commit

add cuda version display (#13885) · 2c9839c8

Committed by chengduo on Oct 15, 2018
```
test=develop
```

08 Oct, 2018 · 1 commit

clarify the fraction_of_gpu_memory flag · ab798a28

Committed by Xin Pan on Oct 08, 2018
```
test=develop
```

27 Sep, 2018 · 1 commit

Revert "Some trivial optimization (#13530)" · a4f7696a

Committed by typhoonzero on Sep 27, 2018
```
This reverts commit 1d91a49d.
```

26 Sep, 2018 · 1 commit

Some trivial optimization (#13530) · 1d91a49d

Committed by chengduo on Sep 26, 2018

* some trivial opt

* remove the fix of lod_tensor and shrink_rnn_memory_op

* refine ShrinkRNNMemoryOp

test=develop

14 Aug, 2018 · 1 commit

refine by reviewer's advice · da39d84a

Committed by chenweihang on Aug 14, 2018

08 Aug, 2018 · 1 commit

polish high frequency enforce error message · 61052cdb

Committed by chenweihang on Aug 08, 2018

23 Apr, 2018 · 1 commit

Add synchronous TensorCopy and use it in double buffer · 9f11da59

Committed by fengjiayi on Apr 23, 2018

08 Apr, 2018 · 1 commit

Fix cpplint errors with paddle/fluid/platform/gpu_info.* (#9710) · 0c43a376

Committed by Yi Wang on Apr 07, 2018
```
* Fix cpplint errors with paddle/fluid/platform/gpu_info.*

* Update
```

10 Mar, 2018 · 1 commit

add gpu info func to get compute cap · 1998d5af

Committed by Kexin Zhao on Mar 09, 2018

03 Mar, 2018 · 1 commit

get max threads of GPU · 00e596ed

Committed by chengduoZH on Mar 02, 2018

12 Feb, 2018 · 1 commit

Fix the grammar in copyright. (#8403) · 24509f4a

Committed by qingqing01 on Feb 12, 2018

10 Feb, 2018 · 2 commits

Correct #include path · fc374821

Committed by Yi Wang on Feb 09, 2018

Move file to fluid/; Edit CMakeLists.txt · 90648f33

Committed by Yi Wang on Feb 09, 2018

10 Jan, 2018 · 1 commit

"fix CI" · a6edc038

Committed by dzhwinter on Jan 09, 2018

BaiXuePrincess / Paddle is up to date with the upstream project it was forked from.