1. 30 11月, 2019 1 次提交
  2. 28 11月, 2019 2 次提交
  3. 27 11月, 2019 1 次提交
    • M
      INT8 Fully-connected (#17641) · 5d7d5482
      Michał Gallus 提交于
      * Implement Int8 FC
      
      * Integrate FC into INT8v2
      
      test=develop
      
      * int8 FC: transpose weights before computing scales
      
      test=develop
      
      * Add support for activation_type string in FC
      
      test=develop
      
      * Disable MKL-DNN's FC in VGG16 and 19
      
      test=develop
      
      * Disable FC quantization when mkldnn FC is disabled
      
      test=develop
      
      * Solve PADDLE_ENFORCES in FC int8
      
      * Fix Paddle enforces and remove const cast
      
      test=develop
      
      * Fix style changes
      
      test=develop
      
      * Fix quantizer_tester test and add fc quantization
      
      test=develop
      
      * Fix FC test fail on CUDA
      
      * Remove unnecessary log from quantize placement pass
      
      test=develop
      
      * Add Thread ID to FC hash key
      
      test=develop
      
      * Add comments to MKL-DNN FC Kernel
      
      test=develop
      
      * Refactor quantizer
      
      test=develop
      
      * Fix linter issues
      
      test=develop
      
      * Fix crash in slim googlenet
      
      test=develop
      
      * Fix PADDLE_ENFORCE messages
      
      test=develop
      5d7d5482
  4. 26 11月, 2019 2 次提交
  5. 25 11月, 2019 2 次提交
  6. 20 11月, 2019 1 次提交
  7. 19 11月, 2019 1 次提交
  8. 18 11月, 2019 3 次提交
    • Z
      Fix warn of gcc8 (#21205) · cdb3d279
      Zeng Jinle 提交于
      * fix warnings oof gcc 8 compilation, test=develop
      
      * fix boost::bad_get, test=develop
      
      * refine PADDLE_ENFORCE, test=develop
      cdb3d279
    • Z
      fix bug when build openblas with a computer that has installed openblas... · 5d821578
      zhouwei25 提交于
      fix bug when build openblas with a computer that has installed openblas before,test=develop (#21160)
      
      5d821578
    • J
      Better TensorRT support (#20858) · 330b173c
      Jeng Bai-Cheng 提交于
      * Fix TensorRT detection bug
      
      1. Add new search path for TensorRT at tensorrt.cmake
      2. Add better debug message
      3. Fix the bug of detection of TensorRT version
      
      In NVIDIA official docker image, TensorRT headers are located at
      `/usr/include/x86_64-linux-gnu` and TensorRT libraries are located
      at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` will
      fail to detect TensorRT.
      
      There is no debug/warning message to tell developer that TensorRT
      is failed to be detected.
      
      In later version of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is
      defined at `NvInferVersion.h` instead of `NvInfer.h`, so add
      compatibility fix.
      
      * Fix TensorRT variables in CMake
      
      1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}`
      2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}`
      
      Manually type path may locate incorrect path of TensorRT. Use the
      paths detected by system instead.
      
      * Fix TensorRT library path
      
      1. Add new variable - `${TENSORRT_LIBRARY_DIR}`
      2. Fix TensorRT library path
      
      inference_lib.cmake and setup.py.in need the path of TensorRT library
      instead of the file of TensorRT library, so add new variable to fix it.
      
      * Add more general search rule for TensoRT
      
      Let system detect architecture instead of manually assign it, so
      replace `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`.
      
      * Add more general search rule for TensorRT
      
      Remove duplicate search rules for TensorRT libraries. Use
      `${TENSORRT_LIBRARY_DIR}` to get full path of libnvinfer.so
      
      test=develop
      330b173c
  9. 12 11月, 2019 1 次提交
  10. 11 11月, 2019 1 次提交
  11. 08 11月, 2019 3 次提交
  12. 05 11月, 2019 1 次提交
    • Z
      Support NoNeedBufferVarsInference in dygraph backward (#20868) · 878a40f5
      Zeng Jinle 提交于
      * support no need buffer vars in dygraph, test=develop
      
      * fix inference compilation error, test=develop
      
      * update no_need_buffer_vars_inference, test=develop
      
      * add unittests for no_need_buffer_vars_context, test=develop
      
      * refine no_need_buffer_vars by return ref, test=develop
      
      * polish some codes, test=develop
      878a40f5
  13. 04 11月, 2019 1 次提交
  14. 31 10月, 2019 2 次提交
    • H
      GradMaker for dygraph (#19706) · 8c4573a3
      hong 提交于
      * refactor dygraph,test=develop
      
      * fix failed unittest,test=develop
      
      * polish code,test=develop
      
      * check windows ci error,test=develop
      try to fix windows ci error by np.allclose,test=develop
      
      * polish vlog and profiler, test=develop
      
      * try to fix preceding ops order,test=develop
      
      * test transformer in windows ci, test=develop
      
      * use python c-api to speed up tracer.trace,test=develop
      
      * test=develop, fix docker with paddle nccl problem
      
      * test=develop, add ut for debug string and gradient_accumulator
      
      * test=develop, add tests for layer/gradient_accumulator/prepared_op
      
      * test=develop, fix complie error for test_prepared_op
      
      * test=develop, add more ut for dygraph
      
      * test=develop, create API.spec for dygraph api change
      
      * optimize grad maker; test=develop
      
      * optimize grad maker
      
      * test
      
      * grad make optim; test=develop
      
      * fix unittest bugs; test=develop
      
      * add dygraph grad op maker and split_op
      
      * grad op maker refactor; test=develop
      
      * add dygraph grad maker; test=develop
      
      * fix op deformable_conv_v1_op bug; test=develop
      
      * fix deformable_conv prroi pool bugs;
      
      * fix new op grad op maker bug; test=develop
      
      * fix split by ref bug; test=develop
      
      * fix dygraph auto prune bug; test=develop
      
      * fix test_trace bug; test=develop
      
      * fix fused emb seq pool bug; test=develop
      
      * remove useless code in op_desc file; test=develop
      
      * remove useless code, StrVarBaseNode; test=develop
      
      * fix review issues; test=develop
      
      * fix rank_loss grad maker; test=develop
      
      * remove flag in VarBase; test=develop
      
      * fix distributed_notify_op compile bug ; test=develop
      
      * fix reshape op double grad; test=develop
      
      * fix expand as op; test=develop
      
      * add impertive type_defs.h for demo_train; test=develop
      
      * fix inference lib cmake; test=develop
      
      * fix inference lib; test=develop
      
      * fix infernce_lib; test=develop
      
      * fix inference cmake; test=develop
      
      * fix inference lib; test=develop
      
      * fix inference lib; test=develop
      
      * remove condition dygraph grad maker, modify local name; test=develop
      
      * fix split grad maker bug; test=develop
      
      * fix pyramid_op bug; test=develop
      
      * change travis time out limit; test=develop
      
      * restore travis; test=develop
      
      * change timeout limit; test=develop
      8c4573a3
    • Z
      Integration of third_party compilation structure (#20887) · b7417610
      zhouwei25 提交于
      b7417610
  15. 28 10月, 2019 1 次提交
  16. 22 10月, 2019 1 次提交
  17. 18 10月, 2019 2 次提交
  18. 15 10月, 2019 1 次提交
  19. 14 10月, 2019 1 次提交
    • 6
      Dlpack support (#20039) · 12e4be03
      633WHU 提交于
      * support dlpack to tensor and implement python interface test=develop
      
      * add unittest for _to_dlpack and from_dlpack test=develop
      12e4be03
  20. 07 10月, 2019 1 次提交
  21. 02 10月, 2019 1 次提交
  22. 29 9月, 2019 1 次提交
    • L
      fix conv2d and conv3d: (#20042) · 3aa331d9
      liym27 提交于
      1.support asymmetric padding;
          2.support padding algorithm:"SAME" and "VALID";
          3.support channel_last: data_format NHWC and NDHWC;
          4.change doc of python API and c++;
      
          test=develop, test=document_preview
      3aa331d9
  23. 27 9月, 2019 1 次提交
    • update operator compatible info, test=develop (#19978) · 01b9d079
      石晓伟 提交于
      * update operator compatible info, test=develop
      
      * revert cmake/version.cmake, test=develop
      
      * add unit_tests and fix bugs, test=develop
      
      * update ../paddle/fluid/framework/framework.proto, test=develop
      
      * fix bug of paddle/fluid/inference/api/analysis_predictor.cc, test=develop
      
      * update paddle/fluid/framework/version_test.cc, test=develop
      
      * add comments and rename interfaces, test=develop
      01b9d079
  24. 20 9月, 2019 1 次提交
  25. 19 9月, 2019 1 次提交
    • Y
      Add a pass to fuse fc+elementwise_add+layernorm (#19776) · 3cd985a6
      Yiqun Liu 提交于
      * Add fc_elementwise_layernorm_fuse pass and unittest.
      
      * Add fused_fc_elementwise_layernorm op and its GPU kernel.
      test=develop
      
      * Apply fc_elementwise_layernorm_fuse_pass to GPU inference.
      
      * Add the setting of attrs in the definition of binary_op.
      test=develop
      
      * Add comment.
      
      * Implement the unittest.
      test=develop
      
      * Change the unittest name of layer_norm.
      test=develop
      3cd985a6
  26. 17 9月, 2019 1 次提交
  27. 16 9月, 2019 1 次提交
  28. 11 9月, 2019 2 次提交
    • H
      Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Huihuang Zheng 提交于
      TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory.
      
      We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton.
      
      Also added data_feed_proto to operator to fix CI in CPU compilation
      12542320
    • Y
      Implement the GPU kernel of fc operator (#19687) · a65c728e
      Yiqun Liu 提交于
      * Refine the codes related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Remove the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
      a65c728e
  29. 10 9月, 2019 1 次提交
  30. 07 9月, 2019 1 次提交