1. 10 1月, 2020 1 次提交
    • G
      [cherry-pick] Add FC padding, ernie test unit and layernorm parallel (#22198) · 3df38f5c
      GaoWei8 提交于
      * Optimize the kernel implementation of layernorm with openmp (#20895)
      
      * Add ernie c++ inference test (#21015)
      
      * Add ernie unit test
      test=develop
      
      * Add ernie unit test
      test=develop
      
      * Add ernie unit test
      test=develop
      
      * remove ngraph
      
      * optimize gpu test
      test=develop
      
      * optimize codes
      test=develop
      
      * fix cmake fails on inference_download_and_uncompress (#21185)
      
      * solve cmake fails on inference_download_and_uncompress
      test=develop
      
      * solve cmake fails on inference_download_and_uncompress
      test=develop
      
      * Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972)
      
      * Add fc padding to solve mkl performance
      test=develop
      
      * fix gpu pass and error information
      test=develop
      
      * fix fc_fuse_pass_test
      test=develop
      
      * fix error information
      test=develop
      
      * fix error information
      test=develop
      
      * fix name and add fc op padding test
      test=develop
      
      * fix attributes
      test=develop
      
      * optimize fc padding
      test=develop
      
      * fix test
      test=develop
      
      * Polish the codes of fc when needs padding (#21378)
      
      test=develop
      
      * Add ernie large c++ inference test (#21365)
      
      * add ernie-large test
      test=develop
      
      * add ernie large c++ inference test
      test=develop
      
      * Modify padding strategy: remove weight copy in fc padding (#21650)
      
      test=develop
      
      * optimize fc jit (#21878)
      
      test=develop
      Co-authored-by: NYihua Xu <yihuaxu@hotmail.com>
      3df38f5c
  2. 29 10月, 2019 1 次提交
    • C
      [Cherry-pick to 1.6] Block part of "tensor should not be null" error message (#20845) · d29e9aa4
      Chen Weihang 提交于
      * Add IndicateVarDataType interface to block tensor is not initialized problem in OP GetExceptedKernelType (#20044)
      
      * add indicate_var_data_type inferface, test=develop
      
      * add unittests & polish error message, test=develop
      
      * remove needless include, test=develop
      
      * extract public function & polish message, test=develop
      
      * delete empty var check, test=develop
      
      * change data_type to pointer parameter, test=develop
      
      * polish details, test=develop
      
      * Replace risky GetInputType method with secure IndicateVarDataType interface (#20668)
      
      * replace part of the old implementation, test=develop
      
      * restore concat op, test=develop
      
      * update all ops implemention & delete GetDataTypeOfVar func, test=develop
      
      test=release/1.6
      d29e9aa4
  3. 16 9月, 2019 1 次提交
    • Y
      Enhance fc_fuse_pass to enable fusing relu to fc_op (#19733) · c67c8758
      Yiqun Liu 提交于
      * Refine the codes related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Remove the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
      
      * Enhance fc_fuse_pass to enable fusing relu.
      
      * Allow print the shapes of var_desc in graph.
      test=develop
      
      * Enhance fc_fuse_pass_tester.
      
      * Remove the use of PADDLE_ENFORCE.
      test=develop
      
      * Correct the number of ops after fusing.
      test=develop
      
      * Fix a typo.
      test=develop
      
      * Set activation_type to null when there is no relu in fc.
      test=develop
      
      * Refine fc_fuse_pass's codes.
      
      * Enable the set of shape for tensor.
      
      * Refine repeated_fc_relu_pass and add unittest.
      test=develop
      c67c8758
  4. 11 9月, 2019 1 次提交
    • Y
      Implement the GPU kernel of fc operator (#19687) · a65c728e
      Yiqun Liu 提交于
      * Refine the codes related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Remove the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
      a65c728e
  5. 19 3月, 2019 1 次提交
  6. 18 3月, 2019 1 次提交
  7. 15 3月, 2019 1 次提交
  8. 19 2月, 2019 1 次提交
  9. 18 12月, 2018 1 次提交
  10. 12 12月, 2018 1 次提交
  11. 14 11月, 2018 2 次提交
  12. 21 8月, 2018 1 次提交
  13. 16 8月, 2018 2 次提交
  14. 14 8月, 2018 3 次提交
  15. 13 8月, 2018 1 次提交
  16. 07 6月, 2018 1 次提交
    • M
      Mkldnn layout (#11040) · 3ff9ba0e
      mozga-intel 提交于
      * Add MKLDNN layout support in Paddle
      
      Add MKLDNN layout in Paddle so that MKLDNN friendly memory layout
      can be used in MKLDNN enabled OP kernel. Before this commit, NCHW
      is hardcode to be used in all MKLDNN op kernels. As a result,
      non-optimized execution path is selected in MKLDNN primitive which
      bring worse performance.
      Besides framework change, three MKLDNN OP kernels were updated
      for using new MKLDNN layout. They are conv/pool2d/batch_norm.
      Other MKLDNN OP kernels need be also updated in similar way to
      achieve best performance.
      
      * Add MKLDNN layout support in activation OP
      
      * Don't populate layout from input to output when kMKLDNN in
      
      * Refine pool mkldnn op kernel
      
      * MKLDNN layout
      
      * Remove the inferitance from tensor file
      
      * MKLDNN layout: refactoring
      
      * Remove additional #define to register new operator
      
      * Prepare mkldnn tests to work with layout
      3ff9ba0e
  17. 08 5月, 2018 1 次提交
    • Y
      Clean OpProtoAndCheckerMaker · 0e78cb69
      Yu Yang 提交于
      Do not use ctor
      
      * Reduce line of codes.
      * We can use virtual function for Maker now.
      * The implementation does not care what maker holds, it is easier to
      refactor later.
      0e78cb69
  18. 19 4月, 2018 1 次提交
  19. 17 4月, 2018 1 次提交
  20. 03 4月, 2018 3 次提交