1. 10 Jan, 2020 3 commits
    • fix xception precision problem (#22188) · c7248cda
      liu zhengxi committed
      c7248cda
    • [cherry-pick] Add FC padding, ernie test unit and layernorm parallel (#22198) · 3df38f5c
      GaoWei8 committed
      * Optimize the kernel implementation of layernorm with openmp (#20895)
      
      * Add ernie c++ inference test (#21015)
      
      * Add ernie unit test
      test=develop
      
      * Add ernie unit test
      test=develop
      
      * Add ernie unit test
      test=develop
      
      * remove ngraph
      
      * optimize gpu test
      test=develop
      
      * optimize codes
      test=develop
      
      * fix cmake fails on inference_download_and_uncompress (#21185)
      
      * solve cmake fails on inference_download_and_uncompress
      test=develop
      
      * solve cmake fails on inference_download_and_uncompress
      test=develop
      
      * Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972)
      
      * Add fc padding to solve mkl performance
      test=develop
      
      * fix gpu pass and error information
      test=develop
      
      * fix fc_fuse_pass_test
      test=develop
      
      * fix error information
      test=develop
      
      * fix error information
      test=develop
      
      * fix name and add fc op padding test
      test=develop
      
      * fix attributes
      test=develop
      
      * optimize fc padding
      test=develop
      
      * fix test
      test=develop
      
      * Polish the codes of fc when needs padding (#21378)
      
      test=develop
      
      * Add ernie large c++ inference test (#21365)
      
      * add ernie-large test
      test=develop
      
      * add ernie large c++ inference test
      test=develop
      
      * Modify padding strategy: remove weight copy in fc padding (#21650)
      
      test=develop
      
      * optimize fc jit (#21878)
      
      test=develop
      Co-authored-by: NYihua Xu <yihuaxu@hotmail.com>
      3df38f5c
    • fix multi-thread error of fc_gru_fuse_pass.cc, test=develop (#21841) (#22185) · e8e12499
      石晓伟 committed
      * fix multi-thread error of fc_gru_fuse_pass.cc, test=develop
      
      * export FLAGS and GLOG symbols, test=develop
      e8e12499
  2. 09 Jan, 2020 3 commits
  3. 08 Jan, 2020 2 commits
  4. 07 Jan, 2020 4 commits
  5. 16 Dec, 2019 1 commit
  6. 09 Dec, 2019 2 commits
  7. 08 Dec, 2019 1 commit
  8. 06 Dec, 2019 4 commits
    • cherry-pick MKL-DNN NHWC FWD support fix (#21593) · 1f598dfa
      bingyanghuang committed
      1f598dfa
    • f83254d6
    • e228e707
    • CHERRY_PICK: Better TensorRT support (#20858) (#21578) · 0a4002f5
      Zhaolong Xing committed
      * Fix TensorRT detection bug
      
      1. Add new search path for TensorRT at tensorrt.cmake
      2. Add better debug message
      3. Fix the bug of detection of TensorRT version
      
      In the official NVIDIA Docker image, TensorRT headers are located at
      `/usr/include/x86_64-linux-gnu` and TensorRT libraries are located
      at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` will
      fail to detect TensorRT.
      
      There is no debug/warning message to tell the developer that TensorRT
      failed to be detected.
      
      In later versions of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is
      defined at `NvInferVersion.h` instead of `NvInfer.h`, so add
      compatibility fix.
      
      * Fix TensorRT variables in CMake
      
      1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}`
      2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}`
      
      A manually typed path may point to the wrong TensorRT location. Use the
      paths detected by the system instead.
      
      * Fix TensorRT library path
      
      1. Add new variable - `${TENSORRT_LIBRARY_DIR}`
      2. Fix TensorRT library path
      
      inference_lib.cmake and setup.py.in need the directory containing the TensorRT
      library rather than the library file itself, so add a new variable to fix it.
      
      * Add more general search rule for TensorRT
      
      Let the system detect the architecture instead of assigning it manually, so
      replace `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`.
      
      * Add more general search rule for TensorRT
      
      Remove duplicate search rules for TensorRT libraries. Use
      `${TENSORRT_LIBRARY_DIR}` to get full path of libnvinfer.so
      
      test=release/1.6
      0a4002f5
  9. 05 Dec, 2019 5 commits
  10. 04 Dec, 2019 6 commits
  11. 03 Dec, 2019 9 commits
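
The TensorRT version-detection fix described in commit 0a4002f5 (checking `NvInferVersion.h` as well as `NvInfer.h` for `NV_TENSORRT_MAJOR`) can be illustrated with a small sketch. This is not the project's CMake code; it is a Python approximation of the same idea. The function name `detect_tensorrt_major`, the header search order, and the regular expression are assumptions made for illustration only.

```python
import re
from pathlib import Path
from typing import Optional

# Header names are taken from the commit message. Checking the newer
# header first is an assumption of this sketch, not confirmed project behavior.
VERSION_HEADERS = ("NvInferVersion.h", "NvInfer.h")

# Matches a line such as: #define NV_TENSORRT_MAJOR 6
_MAJOR_RE = re.compile(
    r"^\s*#\s*define\s+NV_TENSORRT_MAJOR\s+(\d+)", re.MULTILINE
)


def detect_tensorrt_major(include_dir: Path) -> Optional[int]:
    """Return the TensorRT major version defined under include_dir.

    Returns None when no candidate header defines NV_TENSORRT_MAJOR,
    so the caller can print a warning instead of failing silently.
    """
    for name in VERSION_HEADERS:
        header = include_dir / name
        if not header.is_file():
            continue
        match = _MAJOR_RE.search(header.read_text(errors="ignore"))
        if match:
            return int(match.group(1))
    return None
```

A caller might pass the include directory found by the build system (for example `/usr/include/x86_64-linux-gnu`, the location the commit message mentions for the NVIDIA image) and emit a warning when the result is `None`.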