- 04 Feb, 2021 1 commit
Committed by wanghuancoder
* use iwyu clean include second time, test=develop
- 19 Jan, 2021 1 commit
Committed by Leo Chen
* unify calling cudaSetDevice
* fix compile
- 13 Nov, 2020 1 commit
Committed by Zhou Wei
- 16 Aug, 2020 1 commit
Committed by wangchaochaohu
- 15 Jul, 2020 1 commit
Committed by GaoWei8
* Refine PADDLE_ENFORCE in paddle/fluid/platform test=develop
- 07 Jul, 2020 1 commit
Committed by GaoWei8
* refine PADDLE_ENFORCE test=develop
- 16 Jun, 2020 1 commit
Committed by hutuxian
* Add a StatValue class in the backend to represent a stat.
* Add a singleton StatRegistry to maintain the collection of stats.
* For the sake of code neatness, only the int and float types are supported, which covers most scenarios.
- 20 Apr, 2020 1 commit
Committed by Zhou Wei
* Optimize the error messages of paddle CUDA API, test=develop
* fix the error messages of paddle CUDA API, test=develop
* Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL,test=develop
* remove build_ex_string,test=develop
* merge conflict,test=develop
- 04 Mar, 2020 1 commit
Committed by Zeng Jinle
* add recorded cuda memory apis, fix typo, test=develop
* add more ut, test=develop
* follow comments, test=develop
* fix py35 incompatible issues, test=develop
- 09 Jan, 2020 1 commit
Committed by 石晓伟
- 08 Jan, 2020 1 commit
Committed by zhaoyuchen2018
stack's wait costs a lot of CPU time; using a CUDA kernel to do the memory copy reduces CPU time.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
- 14 Nov, 2019 1 commit
Committed by zhaoyuchen2018
* Improve topk performance. Computing topk over 200000 data items cost 1s before the optimization and 0.0028s after it.
* Refine return value.
* Add cuda util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
- 06 Nov, 2019 1 commit
Committed by Zeng Jinle
- 01 Nov, 2019 1 commit
Committed by wangchaochaohu
- 12 Oct, 2019 1 commit
Committed by Wilber
enable cpu machine to run paddle model in gpu lib
- 16 Aug, 2019 1 commit
Committed by Zeng Jinle
- 01 Aug, 2019 1 commit
Committed by Zeng Jinle
* fix gpu_info, test=develop
* fix reserving gpu memory calculation bug, add fraction=1 unittest, test=develop
* fix bug again for reserving size, test=develop
- 31 Jul, 2019 1 commit
Committed by Huihuang Zheng
GPU allocation uses fraction of available memory, also fix the GetUsed without lock
- 18 Jul, 2019 1 commit
Committed by zhouwei25
Optimize the content of error reporting information, print error code and official document web sites (#18671)
optimize the error reporting information of cuda related API
index on develop: 130ac177 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop
- 16 Jul, 2019 1 commit
Committed by liuwei1031
- 30 Apr, 2019 1 commit
Committed by Huihuang Zheng
test=develop
- 21 Mar, 2019 1 commit
Committed by sneaxiy
modify allocator strategy remove changes of legacy buddy_allocator test=develop
- 19 Mar, 2019 1 commit
Committed by zhhsplendid
test=develop
- 24 Jan, 2019 2 commits
Committed by Yiqun Liu
* Refine the beam_search op and test.
* A basic CUDA implementation of beam_search for small batch_size.
* Implement CUDA kernel for beam_search_op.
* Use multiple CUDA threads in the same block to select the top beam.
* Update the python api of beam_search op.
* Enable extend function in CPU kernel of beam_search op.
* Unify the CUDA codes. test=develop
* Unify the CPU kernel of beam_search op.
* Ensure the selected items of beam_search_op's CPU kernel sorted by scores.
* Update the description of beam_search in API.spec.
* Enable the use of CUDA kernel in beam_search op.
* Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debugging statements. test=develop
* Follow comments. test=develop
* Call the CPU kernel for beam_search op when batch_size > 4. test=develop
* Remove the except of is_empty op in PrepareData. test=develop
Committed by sneaxiy
test=develop
- 04 Dec, 2018 1 commit
Committed by Wu Yi
* wip multi process multi gpu dist training
* workable for p2p
* update test=develop
* change back env name test=develop
* fix alloc init
* fix cpu build test=devlop
* fix mac tests test=develop
* refine code
* refine test=develop
- 27 Nov, 2018 1 commit
Committed by peizhilin
- 26 Nov, 2018 2 commits
- 22 Nov, 2018 2 commits
Committed by chengduo
* refine cublas test=develop
* code refine
* refine cublas
* add GEMME_EX
* add enable_cublas_tensor_op_math doc and add cublasCall test=develop
* fix CublasCall for cuda version test=develop
* fix error test=develop
* fix GEMM_EX to be compatible with gcc 4.8 test=develop
* add GEMM_EX test=develop
* to be compatible with gcc4.8 test=develop
Committed by peizhilin
- 08 Nov, 2018 1 commit
Committed by minqiyang
Fix code to support cpplint syntax check test=develop
- 15 Oct, 2018 1 commit
Committed by chengduo
test=develop
- 08 Oct, 2018 1 commit
Committed by Xin Pan
test=develop
- 27 Sep, 2018 1 commit
Committed by typhoonzero
This reverts commit 1d91a49d.
- 26 Sep, 2018 1 commit
Committed by chengduo
* some trivial opt
* remove the fix of lod_tensor and shrink_rnn_memory_op
* refine ShrinkRNNMemoryOp test=develop
- 14 Aug, 2018 1 commit
Committed by chenweihang
- 08 Aug, 2018 1 commit
Committed by chenweihang
- 23 Apr, 2018 1 commit
Committed by fengjiayi
- 08 Apr, 2018 1 commit
Committed by Yi Wang
* Fix cpplint errors with paddle/fluid/platform/gpu_info.*
* Update