- 03 Mar, 2020 1 commit
-
-
Committed by Zhang Ting
-
- 02 Mar, 2020 2 commits
-
-
Committed by wangchaochaohu
-
Committed by wangchaochaohu
* add profiler_help.h to refine the code test=develop
-
- 26 Feb, 2020 1 commit
-
-
Committed by Adam
-
- 25 Feb, 2020 1 commit
-
-
Committed by Zhang Ting
* add framework overhead ratio, test=develop
* print GpuMemcpy overhead, test=develop
-
- 24 Feb, 2020 1 commit
-
-
Committed by wangchaochaohu
* add support for the driver API callback and fix the profiler name display bug
-
- 23 Feb, 2020 1 commit
-
-
Committed by tianshuo78520a
-
- 21 Feb, 2020 1 commit
-
-
Committed by Yiqun Liu
-
- 19 Feb, 2020 1 commit
-
-
Committed by wangchaochaohu
* fix the profile print error test=develop
-
- 18 Feb, 2020 1 commit
-
-
Committed by wangchaochaohu
* add python flag to control profile level test=develop
-
- 14 Feb, 2020 2 commits
-
-
Committed by Chen Weihang
-
Committed by Chen Weihang
* reproduce match error, test=develop, test=document_fix
* fix mismatch error, test=develop, test=document_fix
-
- 10 Feb, 2020 1 commit
-
-
Committed by Wilber
Compile without nccl deps. [1/2]
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
- 07 Feb, 2020 1 commit
-
-
Committed by LielinJiang
* optimize interpolate op, test=develop
-
- 06 Feb, 2020 1 commit
-
-
Committed by wangchaochaohu
-
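- 05 Feb, 2020 1 commit
-
-
Committed by Wilber
Added a WITH_NCCL cmake option that explicitly specifies whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned off when WITH_GPU is OFF.
Added the PADDLE_WITH_NCCL definition.
Single-machine, single-GPU builds can disable NCCL compilation. Multi-GPU use requires NCCL to stay enabled (the default); with NCCL disabled, only a single GPU can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>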
-
- 31 Jan, 2020 1 commit
-
-
Committed by Michał Gallus
* Enable quantize to reorder to nchw as well
* Correct FC MKL-DNN input dim requirements to accept 3D
* Improve DNNL FC format, error and 3D input handling test=develop
* Improve error checking in FC test=develop
* Improve PADDLE_ENFORCE messages in fc-related files
* Remove data layout attribute from obligatory pass args test=develop
* Fix message in fc_mkldnn_pass to be logically correct test=develop
-
- 10 Jan, 2020 1 commit
-
-
Committed by wangchaochaohu
* fix the bug of profile update test=develop
-
- 09 Jan, 2020 3 commits
-
-
Committed by 石晓伟
-
Committed by Yiqun Liu
* Polish the PADDLE_ENFORCE in fusion_group pass related codes. test=develop
* Correct the unittest because of the change to relu_grad's formula. test=develop
-
Committed by wangchaochaohu
* add support for nested profiling events and printing at different levels
-
- 08 Jan, 2020 2 commits
-
-
Committed by zhaoyuchen2018
stack's wait costs a lot of CPU time; using a CUDA kernel to do the memory copy reduces CPU time.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
-
Committed by Zeng Jinle
-
- 07 Jan, 2020 2 commits
-
-
Committed by bingyanghuang
-
Committed by Chen Weihang
-
- 06 Jan, 2020 3 commits
-
-
Committed by Jacek Czaja
-
Committed by Zeng Jinle
-
Committed by Zeng Jinle
-
- 05 Jan, 2020 1 commit
-
-
Committed by Jacek Czaja
-
- 03 Jan, 2020 1 commit
-
-
Committed by Yiqun Liu
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test. test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op. test=develop
* Disable fusion_group op for mac and windows. test=develop
* Make the compiling of device code return status instead of hanging up. test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names. test=develop
* Add the check of CUDA driver library in unittest. test=develop
* Refine the calling of PADDLE_ENFORCE. test=develop
-
- 01 Jan, 2020 1 commit
-
-
Committed by Chen Weihang
-
- 30 Dec, 2019 2 commits
-
-
Committed by Chen Weihang
-
Committed by Chen Weihang
-
- 15 Dec, 2019 1 commit
-
-
Committed by Chen Weihang
* rename paddle throw error macro, test=develop
* fix new error use case, test=develop
-
- 10 Dec, 2019 1 commit
-
-
Committed by Adam
* MKLDNN v1.0 rebase to Paddle 1.6 test=develop
* Add hacky paddle::string::to_string() implementation
* vectorize<int64-t>() -> vectorize() cleanup test=develop
* PADDLE_ENFORCE and void_cast fixes test=develop
* Rebase changes test=develop
* Cosmetics test=develop
* Delete MKL from mkldnn.cmake test=develop
* CMake debug commands test=develop
* Delete MKLDNN_VERBOSE and rebase fixes test=develop
* Rebase fixes test=develop
* Temporarily disable int8 resnet101 vgg16 and vgg19 tests test=develop
* Add libmkldnn.so.1 to python setup test=develop
* Add libmkldnn.so.1 to inference_lib cmake after rebase test=develop
* Post rebase fixes + FC int8 changes test=develop
* Fix LRN NHWC test=develop
* Fix NHWC conv3d test=develop
* Windows build fix + next conv3d fix test=develop
* Fix conv2d on AVX2 machines test=develop
-
- 06 Dec, 2019 1 commit
-
-
Committed by Zeng Jinle
-
- 05 Dec, 2019 2 commits
-
-
Committed by Huihuang Zheng
As the title
-
Committed by wangchaochaohu
* fix profiler warning message in cpu profile mode test=develop
-
- 04 Dec, 2019 1 commit
-
-
Committed by Pei Yang
* make DisableGlogInfo able to mute all logs in inference.
-
- 03 Dec, 2019 1 commit
-
-
Committed by Zhaolong Xing
* add Jetson compile support test=develop
* refine the cmake test=develop
-