1. 10 Jan, 2020 2 commits
    • G
      Cherry pick from #21862 (#22194) · fa7ace7c
      Guo Sheng committed
      * Fix default label dim of label_smooth_op. test=develop (#21862)
      
      * Fix unit tests of label_smooth_op's data size.
      fa7ace7c
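For context on the commit above, label smoothing is usually written as `(1 - epsilon) * label + epsilon / K`, where `K` is the size of the label's last dimension. The sketch below is a plain-Python illustration of that standard formula, not the code of `label_smooth_op`; the function name and the default `epsilon` are assumptions.

```python
def label_smooth(one_hot_label, epsilon=0.1):
    """Smooth a one-hot label toward a uniform prior.

    K is taken from the length of the label vector (its last dimension).
    Illustrative only; not PaddlePaddle's kernel.
    """
    k = len(one_hot_label)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot_label]


# The true class keeps most of the mass; the rest is spread evenly.
smoothed = label_smooth([0.0, 1.0, 0.0, 0.0], epsilon=0.1)
```

The smoothed vector still sums to 1, which is why the label's last dimension has to match the number of classes.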
    • G
      [cherry-pick] Add FC padding, ernie test unit and layernorm parallel (#22198) · 3df38f5c
      GaoWei8 committed
      * Optimize the kernel implementation of layernorm with openmp (#20895)
      
      * Add ernie c++ inference test (#21015)
      
      * Add ernie unit test
      test=develop
      
      * Add ernie unit test
      test=develop
      
      * Add ernie unit test
      test=develop
      
      * remove ngraph
      
      * optimize gpu test
      test=develop
      
      * optimize codes
      test=develop
      
      * fix cmake fails on inference_download_and_uncompress (#21185)
      
      * solve cmake fails on inference_download_and_uncompress
      test=develop
      
      * solve cmake fails on inference_download_and_uncompress
      test=develop
      
      * Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972)
      
      * Add fc padding to solve mkl performance
      test=develop
      
      * fix gpu pass and error information
      test=develop
      
      * fix fc_fuse_pass_test
      test=develop
      
      * fix error information
      test=develop
      
      * fix error information
      test=develop
      
      * fix name and add fc op padding test
      test=develop
      
      * fix attributes
      test=develop
      
      * optimize fc padding
      test=develop
      
      * fix test
      test=develop
      
      * Polish the codes of fc when needs padding (#21378)
      
      test=develop
      
      * Add ernie large c++ inference test (#21365)
      
      * add ernie-large test
      test=develop
      
      * add ernie large c++ inference test
      test=develop
      
      * Modify padding strategy: remove weight copy in fc padding (#21650)
      
      test=develop
      
      * optimize fc jit (#21878)
      
      test=develop
      Co-authored-by: NYihua Xu <yihuaxu@hotmail.com>
      3df38f5c
  2. 09 Jan, 2020 2 commits
  3. 08 Jan, 2020 1 commit
  4. 07 Jan, 2020 3 commits
  5. 09 Dec, 2019 1 commit
  6. 06 Dec, 2019 3 commits
    • B
      cherry-pick MKL-DNN NHWC FWD support fix (#21593) · 1f598dfa
      bingyanghuang committed
      1f598dfa
    • A
      f83254d6
    • Z
      CHERRY_PICK: Better TensorRT support (#20858) (#21578) · 0a4002f5
      Zhaolong Xing committed
      * Fix TensorRT detection bug
      
      1. Add new search path for TensorRT at tensorrt.cmake
      2. Add better debug message
      3. Fix the bug of detection of TensorRT version
      
      In NVIDIA official docker image, TensorRT headers are located at
      `/usr/include/x86_64-linux-gnu` and TensorRT libraries are located
      at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` will
      fail to detect TensorRT.
      
      There is no debug/warning message to tell the developer that
      TensorRT detection has failed.
      
      In later versions of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is
      defined in `NvInferVersion.h` instead of `NvInfer.h`, so add a
      compatibility fix.
      
      * Fix TensorRT variables in CMake
      
      1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}`
      2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}`
      
      A manually typed path may point to the wrong TensorRT location. Use
      the paths detected by the system instead.
      
      * Fix TensorRT library path
      
      1. Add new variable - `${TENSORRT_LIBRARY_DIR}`
      2. Fix TensorRT library path
      
      inference_lib.cmake and setup.py.in need the path of TensorRT library
      instead of the file of TensorRT library, so add new variable to fix it.
      
      * Add more general search rule for TensorRT
      
      Let the system detect the architecture instead of assigning it
      manually, so replace `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`.
      
      * Add more general search rule for TensorRT
      
      Remove duplicate search rules for TensorRT libraries. Use
      `${TENSORRT_LIBRARY_DIR}` to get full path of libnvinfer.so
      
      test=release/1.6
      0a4002f5
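The version-detection fix described in the commit above can be sketched outside CMake as follows. This is a hedged Python illustration, not the logic in `tensorrt.cmake`: the function name `detect_tensorrt_major` and the demo headers are assumptions. It looks for `NV_TENSORRT_MAJOR` in `NvInferVersion.h` first and falls back to `NvInfer.h`.

```python
import os
import re
import tempfile

# Matches lines such as: #define NV_TENSORRT_MAJOR 6 //!< TensorRT major version.
_MAJOR_RE = re.compile(r"^\s*#\s*define\s+NV_TENSORRT_MAJOR\s+(\d+)", re.MULTILINE)


def detect_tensorrt_major(include_dir):
    """Return the TensorRT major version found under include_dir, or None."""
    # Newer releases define the macro in NvInferVersion.h, older ones in NvInfer.h.
    for header in ("NvInferVersion.h", "NvInfer.h"):
        path = os.path.join(include_dir, header)
        if not os.path.isfile(path):
            continue
        with open(path, "r", errors="ignore") as f:
            match = _MAJOR_RE.search(f.read())
        if match:
            return int(match.group(1))
    return None  # caller should print a warning instead of failing silently


# Demo with fake headers that mimic the newer layout (hypothetical content).
with tempfile.TemporaryDirectory() as inc:
    with open(os.path.join(inc, "NvInfer.h"), "w") as f:
        f.write('#include "NvInferVersion.h"\n')
    with open(os.path.join(inc, "NvInferVersion.h"), "w") as f:
        f.write("#define NV_TENSORRT_MAJOR 6 //!< TensorRT major version.\n")
    detected = detect_tensorrt_major(inc)
    missing = detect_tensorrt_major(os.path.join(inc, "no_such_dir"))
```

Returning `None` rather than raising lets the caller decide how to report the failure, which mirrors the commit's point about adding a debug message when detection fails.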
  7. 05 Dec, 2019 2 commits
  8. 04 Dec, 2019 4 commits
  9. 03 Dec, 2019 7 commits
  10. 02 Dec, 2019 1 commit
    • T
      [cherry-pick] find lookup table in order & support dump param (#21347) · 893ea7e0
      Thunderbrook committed
      * support dump param of model into afs (#20302)
      
      * support dump param to afs
      test=develop
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * dump param
      test=develop
      
      * dump param
      test=develop
      
      * dump param
      test=develop
      
      * dump param
      test=develop
      
      * find lookup table in order (#20932)
      
      test=develop
      
      * cherry-pick
      test=develop
      
      * solve pslib core in stop worker
      test=develop
      
      * print table stat info for pslib
      test=develop
      893ea7e0
  11. 29 Nov, 2019 2 commits
  12. 28 Nov, 2019 1 commit
    • X
      cherry-pick1.6 fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21339) · 072eb5b6
      xujiaqi01 committed
      * fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052)
      
      * fix cache table bug
      * add save_paddle_inference_model
      * fix hdfs util bug
      * test=develop
      
      * fix several sparse table issues (#20686)
      
      * no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto.
      * add find_distributed_lookup_table_grads instead of hard code GRAD
      * support embedding stop gradient. push sparse raised an error before this fix.
      * fix fill sparse, skip slots which do not have embedding. before this fix, each slot's embedding in a sparse table had to be used in all training programs.
      * fix pull sparse, skip slots which do not have embedding.
      * fix collect feasign label info, skip slots which do not have embedding.
      * support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables.
      * test=develop
      
      * add copy table (#21086)
      
      * copy some feasigns and corresponding embeddings from one sparse table to another
      * copy all feasigns and corresponding embeddings from one sparse table to another
      * copy all dense params from one table to another
      * copy some local vars to other local vars
      
      * fix fs_client_param bug (#21212)
      
      * fix fs_client_param bug, user can set this config through fleet_desc_file or fleet config
      * test=develop
      
      * fix fleet util bug (#21254)
      
      * fix fleet util bug in save paddle inference model
      * test=develop
      072eb5b6
  13. 26 Nov, 2019 4 commits
  14. 25 Nov, 2019 3 commits
    • L
      cherry-pick error info check of Print_op for release1.6 (#21349) · 9a98d11e
      lijianshe02 committed
      * add input type and input data type check for Print_op test=develop (#21250)
      
      * add input type and input data type check for Print_op test=develop
      
      * cherry-pick error info check of Print_op for release1.6 test=develop
      
      * cherry-pick error info check of Print_op for release1.6 test=develop
      9a98d11e
    • Y
      fix bug of issue #21259 (#21331) · da9752fe
      Yi Liu committed
      * fix bug of issue #21259 (#21287)
      pass the argument `allow_out_of_range` of one_hot op to c++ back end.
      da9752fe
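To illustrate what the `allow_out_of_range` argument in the commit above controls, here is a minimal pure-Python sketch of the documented behavior: an index outside `[0, depth)` either raises an error or yields an all-zero row. It is not the one_hot op's C++ kernel, and the function signature is an assumption.

```python
def one_hot(indices, depth, allow_out_of_range=False):
    """One-hot encode integer indices into rows of length depth."""
    rows = []
    for idx in indices:
        row = [0.0] * depth
        if 0 <= idx < depth:
            row[idx] = 1.0
        elif not allow_out_of_range:
            raise ValueError(
                "index %d is out of range [0, %d)" % (idx, depth))
        # With allow_out_of_range=True the row simply stays all zeros.
        rows.append(row)
    return rows


encoded = one_hot([1, 5], depth=3, allow_out_of_range=True)
```

If the flag set in Python never reaches the back end, the kernel keeps its default, which is the kind of mismatch the commit describes fixing.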
    • Z
      [cherry-pick] fix crop_tensor, maxout and lrn (#21302) · 3848f720
      Zhang Ting committed
      * [cherry-pick] All elements in attr(shape) of crop_tensor can be -1 and int32/64 kernel registered (#20756)
      
      * All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview
      
      * fix the bug that attr(offsets) should be initialized, test=develop
      
      * [cherry-pick] maxout supports channel_last input (#20846)
      
      * maxout support channel_last input, test=develop
      
      * modified details of Input(X) and Attr(groups, axis) in doc, test=develop
      
      * [cherry-pick] lrn supports channel_last input, test=develop (#20954)
      3848f720
  15. 23 Nov, 2019 2 commits
  16. 21 Nov, 2019 1 commit
    • L
      [cherry-pick]fix bug in pool/conv/conv_transpose: UpdatePaddingAndDilation,... · 7ab85396
      liym27 committed
      [cherry-pick]fix bug in pool/conv/conv_transpose: UpdatePaddingAndDilation, _get_padding_with_SAME and conv2dtranspose_forward_naive. (#20997) (#21225)
      
      * fix bug in pool/conv/conv_transpose:
          1. It should be stride[i] not stride[0] in UpdatePaddingAndDilation;
          2. fix bug of func _get_padding_with_SAME in test_conv/conv_transpose_op.py;
          3. fix bug of the computation process in function conv2dtranspose_forward_naive.
          test=release/1.6
      7ab85396
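The first item in the commit above (`stride[i]`, not `stride[0]`) matters whenever the strides differ per dimension. The sketch below shows the usual "SAME" padding computation with a per-dimension stride. It is a simplified Python illustration under that assumption, not the body of `UpdatePaddingAndDilation`, and the function name `same_padding` is hypothetical.

```python
def same_padding(input_sizes, kernel_sizes, strides):
    """Compute (pad_before, pad_after) per spatial dimension for SAME padding."""
    paddings = []
    for i, in_size in enumerate(input_sizes):
        # Use the stride of dimension i; reusing strides[0] here is the
        # kind of bug the commit describes.
        out_size = (in_size + strides[i] - 1) // strides[i]
        pad_sum = max((out_size - 1) * strides[i] + kernel_sizes[i] - in_size, 0)
        pad_before = pad_sum // 2
        pad_after = pad_sum - pad_before
        paddings.append((pad_before, pad_after))
    return paddings


# Height uses stride 1, width uses stride 2.
pads = same_padding([7, 8], [3, 3], [1, 2])
```

With these inputs the width dimension gets `(0, 1)`; computing it with the height's stride would give `(1, 1)` instead, so the bug only shows up with non-uniform strides.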
  17. 14 Nov, 2019 1 commit