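- 05 Feb, 2020: 1 commit
-
-
Committed by Wilber
Add a WITH_NCCL CMake option to explicitly control whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but is turned OFF when WITH_GPU is OFF.
Add the PADDLE_WITH_NCCL definition.
Single-machine, single-GPU builds can disable NCCL compilation. Multi-GPU use requires NCCL to stay enabled; with NCCL disabled, only a single GPU can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>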
-
- 31 Jan, 2020: 1 commit
-
-
Committed by Michał Gallus
* Enable quantize to reorder to nchw as well
* Correct FC MKL-DNN input dim requirements to accept 3D
* Improve DNNL FC format, error and 3D input handling test=develop
* Improve error checking in FC test=develop
* Improve PADDLE_ENFORCE messages in fc-related files
* Remove data layout attribute from obligatory pass args test=develop
* Fix message in fc_mkldnn_pass to be logically correct test=develop
-
- 10 Jan, 2020: 1 commit
-
-
Committed by wangchaochaohu
* Fix the profile update bug test=develop
-
- 09 Jan, 2020: 3 commits
-
-
Committed by 石晓伟
-
Committed by Yiqun Liu
* Polish the PADDLE_ENFORCE in fusion_group pass related code. test=develop
* Correct the unit test following the change to relu_grad's formula. test=develop
-
Committed by wangchaochaohu
* Add support for nested profiling events and printing at different levels
-
- 08 Jan, 2020: 2 commits
-
-
Committed by zhaoyuchen2018
stack's wait costs a lot of CPU time; using a CUDA kernel for the memory copy reduces it.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
-
Committed by Zeng Jinle
-
- 07 Jan, 2020: 2 commits
-
-
Committed by bingyanghuang
-
Committed by Chen Weihang
-
- 06 Jan, 2020: 3 commits
-
-
Committed by Jacek Czaja
-
Committed by Zeng Jinle
-
Committed by Zeng Jinle
-
- 05 Jan, 2020: 1 commit
-
-
Committed by Jacek Czaja
-
- 03 Jan, 2020: 1 commit
-
-
Committed by Yiqun Liu
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver API to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the code to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation of the fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test. test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op. test=develop
* Disable fusion_group op for mac and windows. test=develop
* Make the compiling of device code return status instead of hanging up. test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names. test=develop
* Add the check of CUDA driver library in unittest. test=develop
* Refine the calling of PADDLE_ENFORCE. test=develop
-
- 01 Jan, 2020: 1 commit
-
-
Committed by Chen Weihang
-
- 30 Dec, 2019: 2 commits
-
-
Committed by Chen Weihang
-
Committed by Chen Weihang
-
- 15 Dec, 2019: 1 commit
-
-
Committed by Chen Weihang
* Rename paddle throw error macro, test=develop
* Fix new error use case, test=develop
-
- 10 Dec, 2019: 1 commit
-
-
Committed by Adam
* MKLDNN v1.0 rebase to Paddle 1.6 test=develop
* Add hacky paddle::string::to_string() implementation
* vectorize<int64-t>() -> vectorize() cleanup test=develop
* PADDLE_ENFORCE and void_cast fixes test=develop
* Rebase changes test=develop
* Cosmetics test=develop
* Delete MKL from mkldnn.cmake test=develop
* CMake debug commands test=develop
* Delete MKLDNN_VERBOSE and rebase fixes test=develop
* Rebase fixes test=develop
* Temporarily disable int8 resnet101 vgg16 and vgg19 tests test=develop
* Add libmkldnn.so.1 to python setup test=develop
* Add libmkldnn.so.1 to inference_lib cmake after rebase test=develop
* Post rebase fixes + FC int8 changes test=develop
* Fix LRN NHWC test=develop
* Fix NHWC conv3d test=develop
* Windows build fix + next conv3d fix test=develop
* Fix conv2d on AVX2 machines test=develop
-
- 06 Dec, 2019: 1 commit
-
-
Committed by Zeng Jinle
-
- 05 Dec, 2019: 2 commits
-
-
Committed by Huihuang Zheng
As the title
-
Committed by wangchaochaohu
* Fix profiler warning message in CPU profile mode test=develop
-
- 04 Dec, 2019: 1 commit
-
-
Committed by Pei Yang
* Make DisableGlogInfo able to mute all logs in inference.
-
- 03 Dec, 2019: 2 commits
-
-
Committed by Zhaolong Xing
* Add Jetson compile support test=develop
* Refine the cmake test=develop
-
Committed by Huihuang Zheng
Add a warning message when GLOG initialization fails
-
- 02 Dec, 2019: 1 commit
-
-
Committed by Tao Luo
* Fix -Wno-error=sign-compare warning in gcc8 test=develop
* Fix warning in distributed codes test=develop
-
- 01 Dec, 2019: 1 commit
-
-
Committed by Jie Fang
-
- 29 Nov, 2019: 1 commit
-
-
Committed by Jacek Czaja
-
- 28 Nov, 2019: 2 commits
-
-
Committed by wangchaochaohu
* fix profile api high version test=develop
-
Committed by wangchaochaohu
-
- 25 Nov, 2019: 1 commit
-
-
Committed by zhouwei25
-
- 24 Nov, 2019: 1 commit
-
-
Committed by gongweibao
-
- 18 Nov, 2019: 2 commits
-
-
Committed by Zeng Jinle
* Fix warnings of gcc 8 compilation, test=develop
* Fix boost::bad_get, test=develop
* Refine PADDLE_ENFORCE, test=develop
-
Committed by liuwei1031
cudaStreamSynchronize randomly hangs when used in a multi-threaded environment; replace it with the cudaStreamQuery API on Windows
-
- 14 Nov, 2019: 2 commits
-
-
Committed by zhaoyuchen2018
* Improve topk performance. Computing topk over 200000 data items: before optimization, 1s; after optimization, 0.0028s.
* Refine return value.
* Add CUDA util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
-
Committed by Chen Weihang
-
- 13 Nov, 2019: 1 commit
-
-
Committed by Chen Weihang
-
- 12 Nov, 2019: 1 commit
-
-
Committed by Chen Weihang
* Simplify C++ error stack by rewriting Place, test=develop
* Polish assignment overload func, test=develop
-
- 08 Nov, 2019: 1 commit
-
-
Committed by joanna.wozna.intel
* Add transpose2 INT8 for mkl-dnn test=develop
* Fix test_transpose_int8_mkldnn test=develop
* Revert "Merge branch 'develop' into transpose_int8_mkldnn_2" This reverts commit 34011bdb, reversing changes made to 2ce6473f.
* Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"" This reverts commit 23754dd7.
* Add template to TransposeMKLDNNHandler test=develop
* Resolve conflict test=develop
* Restore get_size and refactor test=develop
-