- 16 Jan, 2020 1 commit
Committed by Wilber
* [cherry-pick] fluid-lite subgraph resnet50 test. test=develop test=release/1.7
* modify lite commit id. test=develop test=release/1.7
- 15 Jan, 2020 1 commit
Committed by zhouwei25
[cherry-pick] faster build by reduce by-product, reduce linking library and fix compile warning of std=c++11 (#22230)
- 14 Jan, 2020 1 commit
Committed by Zhen Wang
- 13 Jan, 2020 1 commit
Committed by Wilber
[cherry-pick] #22191
* Added a unit test that runs resnet through the fluid-lite subgraph path
* Updated the git commit id of the Lite dependency
- 10 Jan, 2020 1 commit
Committed by baojun
- 09 Jan, 2020 2 commits
- 06 Jan, 2020 1 commit
Committed by Adam
- 04 Jan, 2020 1 commit
Committed by Adam
- 03 Jan, 2020 1 commit
Committed by Yiqun Liu
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test. test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op. test=develop
* Disable fusion_group op for mac and windows. test=develop
* Make the compiling of device code return status instead of hanging up. test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names. test=develop
* Add the check of CUDA driver library in unittest. test=develop
* Refine the calling of PADDLE_ENFORCE. test=develop
- 26 Dec, 2019 3 commits
- 25 Dec, 2019 1 commit
Committed by zhouwei25
- 24 Dec, 2019 1 commit
Committed by zhouwei25
- 16 Dec, 2019 2 commits
- 12 Dec, 2019 1 commit
Committed by zhouwei25
- 11 Dec, 2019 1 commit
Committed by baojun
- 10 Dec, 2019 1 commit
Committed by Adam
* MKLDNN v1.0 rebase to Paddle 1.6 test=develop
* Add hacky paddle::string::to_string() implementation
* vectorize<int64_t>() -> vectorize() cleanup test=develop
* PADDLE_ENFORCE and void_cast fixes test=develop
* Rebase changes test=develop
* Cosmetics test=develop
* Delete MKL from mkldnn.cmake test=develop
* CMake debug commands test=develop
* Delete MKLDNN_VERBOSE and rebase fixes test=develop
* Rebase fixes test=develop
* Temporarily disable int8 resnet101 vgg16 and vgg19 tests test=develop
* Add libmkldnn.so.1 to python setup test=develop
* Add libmkldnn.so.1 to inference_lib cmake after rebase test=develop
* Post rebase fixes + FC int8 changes test=develop
* Fix LRN NHWC test=develop
* Fix NHWC conv3d test=develop
* Windows build fix + next conv3d fix test=develop
* Fix conv2d on AVX2 machines test=develop
- 09 Dec, 2019 1 commit
Committed by Leo Chen
* dygraph_grad_maker supports varbase without grad_var, test=develop
* fix compile, test=develop
* fix test_tracer, test=develop
* follow comments, test=develop
- 05 Dec, 2019 1 commit
Committed by Leo Chen
* test=develop, fix docker with paddle nccl problem
* don't expose numerous Tensor.set(), test=develop
* fix condition, test=develop
* fix float16 bug, test=develop
* feed should be Tensor or np.array, not Variable or number, test=develop
* use forcecast to copy numpy slice to new array, test=develop
* remove float16-uint16 hacking, test=develop
* add variable method to varbase and refactor to_variable to support return varbase
* support kwargs in varbase constructor
* add VarBase constructor to support default python args
* refine varbase initial method
* reset branch
* fix ut for change VarBase error info to PaddleEnforce
* cherry is parameter change before
* overload isinstance to replace too many change of is_variable
* rm useless files
* rm useless code merged by git
* test=develop, fix some ut failed error
* test=develop, fix test_graph_wrapper
* add some tests, test=develop
* refine __getitem__, test=develop
* add tests, test=develop
* fix err_msg, test=develop
- 04 Dec, 2019 1 commit
Committed by silingtong123
* modify the repo address of eigen and warpctc
* fix eigen not working on windows
* fix eigen and warpctc failing to recompile
- 03 Dec, 2019 2 commits
Committed by Zhaolong Xing
* add jetson compile support test=develop
* refine the cmake test=develop
- 02 Dec, 2019 2 commits
Committed by gongweibao
Committed by Zhaolong Xing
test=develop
- 30 Nov, 2019 1 commit
Committed by zhouwei25
- 28 Nov, 2019 2 commits
- 27 Nov, 2019 1 commit
Committed by Michał Gallus
* Implement Int8 FC
* Integrate FC into INT8v2 test=develop
* int8 FC: transpose weights before computing scales test=develop
* Add support for activation_type string in FC test=develop
* Disable MKL-DNN's FC in VGG16 and 19 test=develop
* Disable FC quantization when mkldnn FC is disabled test=develop
* Solve PADDLE_ENFORCES in FC int8
* Fix Paddle enforces and remove const cast test=develop
* Fix style changes test=develop
* Fix quantizer_tester test and add fc quantization test=develop
* Fix FC test fail on CUDA
* Remove unnecessary log from quantize placement pass test=develop
* Add Thread ID to FC hash key test=develop
* Add comments to MKL-DNN FC Kernel test=develop
* Refactor quantizer test=develop
* Fix linter issues test=develop
* Fix crash in slim googlenet test=develop
* Fix PADDLE_ENFORCE messages test=develop
- 26 Nov, 2019 2 commits
Committed by Tao Luo
* make CUDA_ARCH_NAME default Auto test=develop
* refine warning test=develop
Committed by silingtong123
- 25 Nov, 2019 2 commits
- 20 Nov, 2019 1 commit
Committed by Zeng Jinle
* make Docker to gcc 8.2, test=develop
* add -std=c11 to grpc.cmake, test=develop
- 19 Nov, 2019 1 commit
Committed by zhouwei25
- 18 Nov, 2019 3 commits
Committed by Zeng Jinle
* fix warnings of gcc 8 compilation, test=develop
* fix boost::bad_get, test=develop
* refine PADDLE_ENFORCE, test=develop
Committed by zhouwei25
fix bug when building openblas on a computer that already has openblas installed, test=develop (#21160)
Committed by Jeng Bai-Cheng
* Fix TensorRT detection bug
  1. Add new search path for TensorRT at tensorrt.cmake
  2. Add better debug message
  3. Fix the bug of detection of TensorRT version
  In the NVIDIA official docker image, TensorRT headers are located at `/usr/include/x86_64-linux-gnu` and TensorRT libraries are located at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` will fail to detect TensorRT. There is no debug/warning message to tell the developer that TensorRT failed to be detected. In later versions of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is defined in `NvInferVersion.h` instead of `NvInfer.h`, so add a compatibility fix.
* Fix TensorRT variables in CMake
  1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}`
  2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}`
  A manually typed path may point to an incorrect TensorRT location. Use the paths detected by the system instead.
* Fix TensorRT library path
  1. Add new variable - `${TENSORRT_LIBRARY_DIR}`
  2. Fix TensorRT library path
  inference_lib.cmake and setup.py.in need the path of the TensorRT library directory instead of the TensorRT library file, so add a new variable to fix it.
* Add more general search rule for TensorRT
  Let the system detect the architecture instead of assigning it manually, so replace `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`.
* Add more general search rule for TensorRT
  Remove duplicate search rules for TensorRT libraries. Use `${TENSORRT_LIBRARY_DIR}` to get the full path of libnvinfer.so
  test=develop
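The header fallback described in the TensorRT commit above (look for `NV_TENSORRT_MAJOR` in `NvInferVersion.h` first, then in `NvInfer.h`) can be sketched roughly as follows. This is only an illustration in Python, not the CMake code from the commit; the function name `find_tensorrt_major` and the regular expression are assumptions made for the example.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical helper: it mirrors the header fallback described in the commit
# message, not the actual logic in Paddle's tensorrt.cmake.
_MAJOR_RE = re.compile(
    r"^\s*#\s*define\s+NV_TENSORRT_MAJOR\s+(\d+)", re.MULTILINE
)


def find_tensorrt_major(include_dir: str) -> Optional[int]:
    """Return NV_TENSORRT_MAJOR from the first header that defines it, else None."""
    # Later releases (e.g. v6) define the macro in NvInferVersion.h,
    # earlier ones in NvInfer.h, so check both in that order.
    for header in ("NvInferVersion.h", "NvInfer.h"):
        path = Path(include_dir) / header
        if not path.is_file():
            continue
        match = _MAJOR_RE.search(path.read_text(errors="ignore"))
        if match:
            return int(match.group(1))
    # Neither header defines the macro: report "not detected" explicitly
    # instead of failing silently.
    return None
```

Returning `None` rather than a default version keeps the "TensorRT was not detected" case visible to the caller, which is the gap the commit message says the missing debug/warning output left open.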