- 05 9月, 2019 1 次提交
-
-
由 Yiqun Liu 提交于
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop
-
- 02 9月, 2019 1 次提交
-
-
由 zhouwei25 提交于
-
- 20 8月, 2019 1 次提交
-
-
由 Yihua Xu 提交于
* Implement the operator with sprase matrix multiply * Update the URL of mklml library. test=develop * Disable MKLML implematation when using no-linux. test=develop * Ignore the deprecated status for windows test=develop
-
- 12 8月, 2019 1 次提交
-
-
由 wopeizl 提交于
* add tensorrt support for windows
-
- 05 8月, 2019 1 次提交
-
-
由 liuwei1031 提交于
* fix warpctc.dll not found issue, test=develop * revert the linux platform change, test=develop * delete warpctc_lib_path.h.in, test=develop * add SetPySitePackagePath function * fix warpctc.dylib not found issue on Mac, test=develop * improve the paddle lib path setting logic, test=develop * fix mac ci issue caused by test_warpctc_op unittest, test=develop * tweak code, test=develop
-
- 29 7月, 2019 1 次提交
-
-
由 Huihuang Zheng 提交于
-
- 27 7月, 2019 1 次提交
-
-
由 Huihuang Zheng 提交于
Also fix a dependency error which may cause compile error
-
- 27 6月, 2019 1 次提交
-
-
由 HaoRen 提交于
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O
-
- 03 6月, 2019 2 次提交
-
-
由 wangchaochaohu 提交于
* revise conv layer cudnn algo choose test=develop * update for code style test=develop * update for code style test=develop
-
由 chengduo 提交于
test=develop
-
- 07 5月, 2019 1 次提交
-
-
由 Tao Luo 提交于
* remove unused FLAGS_warpctc_dir test=develop * remove FLAGS_warpctc_dir test=develop
-
- 03 4月, 2019 1 次提交
-
-
由 Chen Weihang 提交于
test=develop This reverts commit c38c7c56.
-
- 02 4月, 2019 1 次提交
-
-
由 Chen Weihang 提交于
* link the libwbaes.so into paddle * polish detail, test=develop * try fix mac_pr_ci error, test=develop * add compile option, test=develop * fix ci error, test=develop * ignore failed to find mac lib, test=develop * change cdn to bj, cdn can't get the latest version * trigger ci, test=develop * temporary delete win32 lib linking, test=develop * change https to http, test=develop * turn compile option on to off * turn compile option off to on, test=develop * try lib compiled by gcc4.8, test=develop * update lib version, test=develop * link other lib, test=develop * add setup config * delete false, test=develop * delete no_soname, test=develop * recover so name set * fix, test=develop * adjust make config, test=develop * remove link to wbaes, test=develop * remove useless define, test=develop
-
- 04 3月, 2019 2 次提交
- 27 2月, 2019 1 次提交
-
-
由 dzhwinter 提交于
* staged. * polish code * polish code. test=develop * polish code. test=develop * api change. test=develop * fix default value. test=develop * fix default value. test=develop
-
- 26 2月, 2019 1 次提交
-
-
由 Yihua Xu 提交于
test=develop
-
- 22 2月, 2019 2 次提交
-
-
由 tensor-tang 提交于
* Revert "Optimze Gelu with MKL Erf function (#15770)" This reverts commit 676995c8. * test=develop
-
由 Yihua Xu 提交于
* Optimize for gelu operator * Set up the low accuracy mode of MKL ERF function. test=develop * Only enable MKLML ERF when OS is linux * Use the speical mklml version included vmsErf function to verify gelu mkl kernel. test=develop * Add the CUDA macro to avoid NVCC's compile issue. test=develop * Add the TODO comments for mklml library modification. test=develop * Clean Code test=develop * Add the comment of marco for NVCC compiler. test=develop
-
- 28 1月, 2019 1 次提交
-
-
由 tensor-tang 提交于
test=develop
-
- 26 12月, 2018 1 次提交
-
-
由 peizhilin 提交于
test=develop
-
- 19 12月, 2018 1 次提交
-
-
由 peizhilin 提交于
test=develop
-
- 18 12月, 2018 3 次提交
- 13 12月, 2018 1 次提交
-
-
由 Yu Yang 提交于
-
- 05 12月, 2018 1 次提交
-
-
由 liuhongyu 提交于
-
- 04 12月, 2018 1 次提交
-
-
由 liuhongyu 提交于
-
- 27 11月, 2018 2 次提交
-
-
由 Jacek Czaja 提交于
-
由 liuhongyu 提交于
-
- 26 11月, 2018 1 次提交
-
-
由 minqiyang 提交于
test=develop
-
- 23 11月, 2018 1 次提交
-
-
由 chengduozh 提交于
test=develop
-
- 22 11月, 2018 3 次提交
-
-
由 chengduo 提交于
* refine cublase test=develop * code refine * refine cublas * add GEMME_EX * add enable_cublas_tensor_op_math doc and add cublasCall test=develop * fix CublasCall for cuda version test=develop * fix error test=develop * fix GEMM_EX to be compatible with gcc 4.8 test=develop * add GEMM_EX test=develop * to compatiable with gcc4.8 test=develop
-
由 peizhilin 提交于
-
由 wopeizl 提交于
* add recordio support * disable the openblas multi-thread on windows since no support adjust the python script * code style * code style test=develop * add create_recordio_file_reader back * fix code style test=develop * fix the gtest.cmake on windows * fix cc_test on windows * fix the win build test=develop * remove fused compile support on windows test=develop * add the jit support test=develop * add the jit support, test=develop * add the jit support, test=develop * add the jit back fix compile error on windows * rollback test=develop * test case fix * disable DSO by default on windows * exclude warpctc_op on windows * exclude the dynload_warpctc out on windows test=develop * fix the scripts error test=develop * disable avx on windows by default test=develop * re-organize the cmake file * disable mkl on windows by default * add warp_ctc back * fix the dependency * fix the dependency * fix the build issue on windows * remove unsupported flag on windows * code style * code style test=develop * fix issue * add profiler, parallel_executor back * clean up the pre-definitions on windows * fix build issue * test=develop
-
- 21 11月, 2018 1 次提交
-
-
由 peizhilin 提交于
-
- 19 11月, 2018 1 次提交
-
-
由 qingqing01 提交于
* Convolution fusion operator. * Clean code test=develop
-
- 16 11月, 2018 1 次提交
-
-
由 Wu Yi 提交于
* add cudnn ctc loss * wip add test test=develop * wip * wip * done test=develop * move include cudnn test=develop * test test=develop * fix build test=develop * fix build test=develop * fix build on cudnn5 test=develop * fix cudnn5 build test=develop * fix cudnn5 build test=develop * merge develop softmax functor change test=develop
-
- 13 11月, 2018 1 次提交
-
-
由 tensor-tang 提交于
-
- 09 11月, 2018 1 次提交
-
-
由 qingqing01 提交于
* exhaustive search for cuDNN conv. * Refine code and add unit testing. * Fix model load in fluid/inference and unit testing in conv2d * Follow comments. * Fix compiling test=develop
-