- 15 Sep, 2020: 3 commits
- 07 Sep, 2020: 1 commit
Committed by GaoWei8
* add cuDNN LSTM support for padded data and refine cuDNN code
- 03 Sep, 2020: 1 commit
Committed by wangchaochaohu
- 19 Aug, 2020: 1 commit
Committed by GaoWei8
- 07 Aug, 2020: 1 commit
Committed by Pei Yang
* fix trt plugin registry without trt lib * support trt4 * refine code style
- 05 Aug, 2020: 2 commits
Committed by Zhaolong Xing
* cuDNN 8 support test=develop * fix CI error test=develop
Committed by Pei Yang
* develop dynamic shape serialization * add test param for gelu * fix bugs * delete redundant comments * debug * fix conflict. test=develop * fix bug. test=develop * add trt dynamic shape serialized support * fix ernie serialized bug test=develop * fix codestyle test=develop * fix bug test=develop * fix bug. test=develop * modify cmakelist test=develop * fix bug test=develop * fix error message. test=develop * fix trt register plugin based on pr#25003 * add trt dynload * fix deserialization bug of not finding plugin registration * refine code style * recover engine key in tensorrt_subgraph_pass * for ci coverage * add unittest for deserialization Co-authored-by: haozech <chenhaoze94@gmail.com>
- 20 Jul, 2020: 1 commit
Committed by Chen Weihang
* polish install error hint msg, test=develop * fix variable error, test=develop * polish hint message again
- 15 Jul, 2020: 1 commit
Committed by GaoWei8
* Refine PADDLE_ENFORCE in paddle/fluid/platform test=develop
- 09 Jul, 2020: 2 commits
Committed by Chen Weihang
Committed by Zhen Wang
- 07 Jul, 2020: 1 commit
Committed by GaoWei8
* refine PADDLE_ENFORCE test=develop
- 03 Jul, 2020: 1 commit
Committed by GaoWei8
* fix PADDLE_ENFORCE and refine the description test=develop
- 02 Jul, 2020: 1 commit
Committed by Chen Weihang
* refactor dynamic dso search func, test=develop * polish details, test=develop * polish details based on review comments, test=develop * revert string type change, test=develop
- 24 Jun, 2020: 1 commit
Committed by Chen Weihang
* add default cudnn lib path, test=develop * change default path in func, test=develop * move to linux branch, test=develop * fix var error on other platforms, test=develop
- 05 Jun, 2020: 1 commit
Committed by Chen Weihang
* support selectedrows allreduce in multi-card dygraph, test=develop * remove useless import modules in unittests, test=develop * add nccl cmake to get nccl version, test=develop * add if-condition to compile correctly, test=develop * add detailed version parsing for old nccl, test=develop * polish cmake details, test=develop * fix remove test cmake error, test=develop * fix cmake condition, test=develop * change unittest cmake list, test=develop * fix unittest cmake rule, test=develop, test=framep0
- 18 May, 2020: 1 commit
Committed by Yiqun Liu
* Add the check for whether the CUDA Driver and NVRTC are available on the runtime system. * Call cuInit to initialize the CUDA Driver API before all CUDA calls. test=develop * Change the behavior when libnvrtc.so cannot be found, printing a warning instead of exiting. test=develop * Do not initialize CUDA Driver API for windows and macos. test=develop * Remove the call of cuInit when entering paddle and enable the test_code_generator. test=develop * Add some built-in functions for __half. test=develop * Change save_intermediate_out to false in unittest. test=develop * Fix erroneous reference to a temporary variable when setting the include path for device_code. test=develop
- 08 May, 2020: 1 commit
Committed by Guo Sheng
test=develop test=win_gpu
- 30 Apr, 2020: 1 commit
Committed by Guo Sheng
* Fix cusolver loader for Windows in dynamic_loader.cc. test=develop * Fix missing CUSOLVER_ROUTINE_EACH_R1. test=gpu test=develop * Temporarily mark cusolver as unsupported on Windows. test=develop * Fix GetCusolverDsoHandle error message. test=develop
- 27 Apr, 2020: 1 commit
Committed by Yiqun Liu
- 24 Apr, 2020: 1 commit
Committed by Guo Sheng
* Add cholesky_op forward part. test=develop * Complete cholesky_op forward part. test=develop * Add cholesky_op backward part. test=develop * Complete cholesky_op backward part. test=develop * Refine cholesky_op error check and docs. test=develop * Add grad_check unit test for cholesky_op. test=develop * Fix sample code in cholesky doc. test=develop * Refine some error messages of cholesky_op. test=develop * Refine some error messages of cholesky_op. test=develop * Remove unused input in cholesky_grad. test=develop * Remove unused input in cholesky_grad. test=develop * Fix stream for cusolverDnSetStream. test=develop * Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code. test=develop * Add CUSOLVER ERROR in enforce.h test=develop * Fix the missing return value in cholesky. test=develop
- 10 Apr, 2020: 2 commits
Committed by littletomatodonkey
add addmm op
Committed by Tao Luo
- 05 Feb, 2020: 1 commit
Committed by Wilber
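Added a WITH_NCCL option to CMake to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF when WITH_GPU is OFF. Added the PADDLE_WITH_NCCL definition. Single-machine, single-GPU builds can disable NCCL compilation; multi-GPU use requires NCCL to be enabled (the default), and if NCCL is disabled, only a single GPU can be used. Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>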
- 03 Jan, 2020: 1 commit
Committed by Yiqun Liu
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Refine the calling of PADDLE_ENFORCE. test=develop
- 01 Dec, 2019: 1 commit
Committed by Jie Fang
- 30 Sep, 2019: 1 commit
Committed by danleifeng
Improve elementwise operator performance when inputs have the same dimensions
- 28 Sep, 2019: 2 commits
Committed by qingqing01
* How to write custom op needs to follow framework OP spec. * Package fluid_framework.so and headers into whl. * Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir. * Export some C-APIs to merge OpInfo between core.so and custom_op.so. * Add unit testing. * Update API.spec.
Committed by liym27
* fix pool2d pool3d: 1. support asymmetric padding; 2. support padding algorithm: "SAME" and "VALID"; 3. support channel_last: data_format NHWC and NDHWC; 4. support inferring shape when input has negative dims at compile time; 5. change doc of python API and c++; 6. fix bug in cuda kernel when Attr(adaptive) is true. test=develop,test=document_preview * fix 'tensors' to 'Tensors'. test=develop,test=document_preview * add test for coverage of ValueError. test=develop,test=document_preview * resolve conflict in test_pool2d. test=develop
- 14 Sep, 2019: 1 commit
Committed by Yihua Xu
test=develop
- 05 Sep, 2019: 1 commit
Committed by Yiqun Liu
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop
- 02 Sep, 2019: 1 commit
Committed by zhouwei25
- 20 Aug, 2019: 1 commit
Committed by Yihua Xu
* Implement the operator with sparse matrix multiply * Update the URL of mklml library. test=develop * Disable MKLML implementation on non-Linux platforms. test=develop * Ignore the deprecated status for windows test=develop
- 12 Aug, 2019: 1 commit
Committed by wopeizl
* add tensorrt support for windows
- 05 Aug, 2019: 1 commit
Committed by liuwei1031
* fix warpctc.dll not found issue, test=develop * revert the linux platform change, test=develop * delete warpctc_lib_path.h.in, test=develop * add SetPySitePackagePath function * fix warpctc.dylib not found issue on Mac, test=develop * improve the paddle lib path setting logic, test=develop * fix mac ci issue caused by test_warpctc_op unittest, test=develop * tweak code, test=develop
- 29 Jul, 2019: 1 commit
Committed by Huihuang Zheng
- 27 Jul, 2019: 1 commit
Committed by Huihuang Zheng
Also fix a dependency error that may cause a compile error
- 27 Jun, 2019: 1 commit
Committed by HaoRen
* fix prepare context redundant code problem, optimize executor by caching create_variables test=develop * supports collective training in executor * make fetch_list runnable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_variables test=develop * supports collective training in executor * make fetch_list runnable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * macro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dump py35 * test=develop fix divide in slice and add sync_comm_stream * update atol and rtol to 1e-05 * rm shard_index op and test * modify read input from file to read from memory * remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O