1. 10 Jan, 2020 1 commit
    • G
      [cherry-pick] Add FC padding, ernie test unit and layernorm parallel (#22198) · 3df38f5c
      GaoWei8 committed
      * Optimize the kernel implementation of layernorm with openmp (#20895)
      
      * Add ernie c++ inference test (#21015)
      
      * Add ernie unit test
      test=develop
      
      * Add ernie unit test
      test=develop
      
      * Add ernie unit test
      test=develop
      
      * remove ngraph
      
      * optimize gpu test
      test=develop
      
      * optimize codes
      test=develop
      
      * fix cmake fails on inference_download_and_uncompress (#21185)
      
      * solve cmake fails on inference_download_and_uncompress
      test=develop
      
      * solve cmake fails on inference_download_and_uncompress
      test=develop
      
      * Add fc padding to improve MKL GEMM performance when N and K are multiples of 128. (#20972)
      
      * Add fc padding to solve mkl performance
      test=develop
      
      * fix gpu pass and error information
      test=develop
      
      * fix fc_fuse_pass_test
      test=develop
      
      * fix error information
      test=develop
      
      * fix error information
      test=develop
      
      * fix name and add fc op padding test
      test=develop
      
      * fix attributes
      test=develop
      
      * optimize fc padding
      test=develop
      
      * fix test
      test=develop
      
      * Polish the fc code for the case that needs padding (#21378)
      
      test=develop
      
      * Add ernie large c++ inference test (#21365)
      
      * add ernie-large test
      test=develop
      
      * add ernie large c++ inference test
      test=develop
      
      * Modify padding strategy: remove weight copy in fc padding (#21650)
      
      test=develop
      
      * optimize fc jit (#21878)
      
      test=develop
      Co-authored-by: Yihua Xu <yihuaxu@hotmail.com>
      3df38f5c
  2. 06 Dec, 2019 1 commit
  3. 05 Dec, 2019 1 commit
  4. 02 Dec, 2019 1 commit
  5. 25 Nov, 2019 1 commit
    • Z
      [cherry-pick] fix crop_tensor, maxout and lrn (#21302) · 3848f720
      Zhang Ting committed
      * [cherry-pick] All elements in attr(shape) of crop_tensor can be -1 and int32/64 kernel registered (#20756)
      
      * All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview
      
      * fix the bug that attr(offsets) should be initialized, test=develop
      
      * [cherry-pick] maxout supports channel_last input (#20846)
      
      * maxout support channel_last input, test=develop
      
      * modified details of Input(X) and Attr(groups, axis) in doc, test=develop
      
      * [cherry-pick] lrn supports channel_last input, test=develop (#20954)
      3848f720
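The crop_tensor change above (every element of attr(shape) may be -1, and attr(offsets) gets initialized) can be illustrated with a small shape-inference sketch. This is not the operator's actual code: the function name `crop_output_shape` is hypothetical, and it assumes that -1 means "keep everything from the offset to the end of that dimension" and that missing offsets default to zero.

```python
def crop_output_shape(in_shape, shape, offsets=None):
    """Infer the output shape of a crop (hypothetical helper, not a Paddle API).

    Assumption: a -1 entry in `shape` means in_shape[i] - offsets[i].
    Assumption: offsets default to all zeros when not given.
    """
    if offsets is None:
        offsets = [0] * len(in_shape)
    if not (len(in_shape) == len(shape) == len(offsets)):
        raise ValueError("in_shape, shape and offsets must have the same rank")

    out = []
    for dim, want, off in zip(in_shape, shape, offsets):
        size = dim - off if want == -1 else want
        if size <= 0 or off + size > dim:
            raise ValueError("crop region falls outside the input")
        out.append(size)
    return out


if __name__ == "__main__":
    # All elements of shape are -1, so each output dim is input dim minus offset.
    print(crop_output_shape([3, 5, 4], [-1, -1, -1], [1, 2, 0]))
```

Resolving -1 against the offset, rather than against the full input dimension, keeps the cropped region inside the input for any valid offset.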
  6. 21 Nov, 2019 1 commit
    • C
      Cherry-pick error type support for release1.6 (#21294) · 974b8a83
      Chen Weihang committed
      * delete paddle infershape enforce macro (#20832)
      
      * Polish and arrange code in enforce.h (#20901)
      
      * Enrich the type of error and declare the error type interfaces (#21024)
      
      * Enrich the type of error and declare the error type interfaces, test=develop
      
      * adjust tests to adapt new form, test=develop
      
      * add inference deps with error_codes.pb.h, test=develop
      
      * restore stack iter start pos, test=develop
      
      * polish code based review comments, test=develop
      
      * Add dependency for error_codes.proto (#21084)
      
      * fix activation_functions deps, test=develop, test=document_fix
      
      * add error_codes_proto deps, test=develop, test=document_fix
      
      * try delete enforce.h, test=develop, test=document_fix
      
      * change cuda enforce & add example (#21142)
      test=release/1.6
      974b8a83
  7. 31 Oct, 2019 2 commits
  8. 16 Oct, 2019 1 commit
  9. 14 Oct, 2019 1 commit
  10. 10 Oct, 2019 1 commit
  11. 08 Oct, 2019 1 commit
  12. 03 Oct, 2019 1 commit
    • L
      fix conv2d and conv3d: (#20042) (#20121) · 2faa38cd
      liym27 committed
      1. support asymmetric padding;
      2. support padding algorithm: "SAME" and "VALID";
      3. support channel_last: data_format NHWC and NDHWC;
      4. update the docs of the Python and C++ APIs;
      
      test=release/1.6
      2faa38cd
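The "SAME" and "VALID" padding algorithms named above can be sketched for one spatial dimension. This follows the widely used TensorFlow-style convention and is an assumption about the semantics, not a copy of the Paddle kernel; `resolve_padding` is a hypothetical helper name. It also shows why "SAME" needs asymmetric padding: an odd total pad cannot be split evenly.

```python
def resolve_padding(in_size, kernel, stride, algorithm, dilation=1):
    """Return (pad_before, pad_after, out_size) for one spatial dimension.

    Assumption: "VALID" uses no padding; "SAME" pads so that
    out_size == ceil(in_size / stride), with any odd pad going at the end.
    """
    eff_kernel = dilation * (kernel - 1) + 1  # kernel extent after dilation
    if algorithm == "VALID":
        pad_before = pad_after = 0
    elif algorithm == "SAME":
        target = -(-in_size // stride)  # ceil division
        total = max((target - 1) * stride + eff_kernel - in_size, 0)
        pad_before = total // 2
        pad_after = total - pad_before  # may differ from pad_before
    else:
        raise ValueError("algorithm must be 'SAME' or 'VALID'")
    out_size = (in_size + pad_before + pad_after - eff_kernel) // stride + 1
    return pad_before, pad_after, out_size


if __name__ == "__main__":
    print(resolve_padding(6, 3, 2, "SAME"))   # asymmetric: one extra pad at the end
    print(resolve_padding(6, 3, 2, "VALID"))  # no padding at all
```

With channel_last layouts (NHWC, NDHWC) the same calculation applies per spatial dimension; only the position of the channel axis changes.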
  13. 01 Oct, 2019 1 commit
  14. 28 Sep, 2019 1 commit
    • L
      fix pool2d pool3d, support asymmetric padding and channel_last (#19739) · 24010472
      liym27 committed
      * fix pool2d pool3d:
      1. support asymmetric padding;
      2. support padding algorithm: "SAME" and "VALID";
      3. support channel_last: data_format NHWC and NDHWC;
      4. support inferring shape when the input has negative dims at compile time;
      5. update the docs of the Python and C++ APIs;
      6. fix bug in cuda kernel when Attr(adaptive) is true.
      
      test=develop,test=document_preview
      
      * fix 'tensors' to 'Tensors'. test=develop,test=document_preview
      
      * add test for coverage of ValueError. test=develop,test=document_preview
      
      * resolve conflict in test_pool2d. test=develop
      24010472
  15. 27 Sep, 2019 1 commit
  16. 25 Sep, 2019 1 commit
    • B
      add support of matmul with multiple head even different width and height (#19708) · c670058a
      Bob Zhu committed
      * add support of matmul with multiple head even different width and height
      
      Original matmul with multiple head supports only the mat_a.width == mat_b.height,
      in that case, mat_b will be horizontally split. In this patch, we extend the
      support when mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height,
      in this case, mat_b will be vertically split.
      
      For example, if A is [3, 8], B is [2, 16] and head_number is 4, A will be
      split into 4 matrices of [3, 2] and B will be (vertically) split into 4 matrices
      of [2, 4]. The final result will be 4 matrices of [3, 4], i.e. [3, 16].
      
      test=develop
      
      * add support of matmul with multiple head even different width and height
      
      Original matmul with multiple head supports only the mat_a.width == mat_b.height,
      in that case, mat_b will be horizontally split. In this patch, we extend the
      support when mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height,
      in this case, mat_b will be vertically split.
      
      For example, if A is [3, 8], B is [2, 16] and head_number is 4, A will be
      split into 4 matrices of [3, 2] and B will be (vertically) split into 4 matrices
      of [2, 4]. The final result will be 4 matrices of [3, 4], i.e. [3, 16].
      
      test=develop
      
      * refactor the code of matmul with multiple head even different width and height
      
      test=develop
      c670058a
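The vertically split case described in this commit (mat_a.width / head_number == mat_b.height) can be sketched in plain Python. This is an illustration under the shapes given in the message, not the operator's implementation; the function name `multi_head_matmul_vsplit` is hypothetical.

```python
def multi_head_matmul_vsplit(a, b, head_number):
    """Multi-head product where B is split by columns (hypothetical helper).

    A is [M, a_w], B is [b_h, b_w] with a_w / head_number == b_h.
    Head h multiplies A's h-th column block [M, a_w/head] by
    B's h-th column block [b_h, b_w/head]; results are concatenated by columns.
    """
    a_w, b_h, b_w = len(a[0]), len(b), len(b[0])
    if a_w % head_number or b_w % head_number or a_w // head_number != b_h:
        raise ValueError("shapes do not match the vertically split case")

    a_step, b_step = a_w // head_number, b_w // head_number
    out = [[] for _ in a]
    for h in range(head_number):
        a_blk = [row[h * a_step:(h + 1) * a_step] for row in a]
        b_blk = [row[h * b_step:(h + 1) * b_step] for row in b]
        for out_row, a_row in zip(out, a_blk):
            for col in zip(*b_blk):  # iterate over columns of the B block
                out_row.append(sum(x * y for x, y in zip(a_row, col)))
    return out


if __name__ == "__main__":
    a = [[1] * 8 for _ in range(3)]    # A is [3, 8]
    b = [[1] * 16 for _ in range(2)]   # B is [2, 16]
    result = multi_head_matmul_vsplit(a, b, 4)
    print(len(result), len(result[0]))  # rows and columns of the result
```

Each head contributes a [3, 4] block, and the four blocks side by side give the [3, 16] result mentioned in the commit message.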
  17. 23 Sep, 2019 1 commit
  18. 20 Sep, 2019 1 commit
  19. 16 Sep, 2019 1 commit
  20. 11 Sep, 2019 2 commits
    • H
      Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Huihuang Zheng committed
      TemporaryAllocator is a singleton used for allocating memory for cuDNN. Since it is a singleton, we can remove it to improve memory performance.
      
      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream, so the singleton is no longer needed.
      
      Also added data_feed_proto to operator to fix CI in CPU compilation
      12542320
    • Y
      Implement the GPU kernel of fc operator (#19687) · a65c728e
      Yiqun Liu committed
      * Refine the codes related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Move the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
      a65c728e
  21. 05 Sep, 2019 3 commits
  22. 04 Sep, 2019 1 commit
  23. 03 Sep, 2019 2 commits
  24. 02 Sep, 2019 1 commit
  25. 29 Aug, 2019 1 commit
  26. 20 Aug, 2019 1 commit
  27. 19 Aug, 2019 1 commit
  28. 01 Aug, 2019 1 commit
  29. 24 Jul, 2019 1 commit
    • B
      Extend Matmul to support matrix multiplication with multiple heads (#18570) · 220eef60
      Bob Zhu committed
      * extend matmul op to support multiple head multiplication
      
      With multiple-head support, the multiplication of two big matrices is
      split into multiplications of several (head_number) small matrices. E.g. if
      Mat A is [3, 24] and Mat B is [24, 4], when multiplying A and B with head_number
      as 4, Mat A will be split into 4 matrices of [3, 6] and Mat B into 4 matrices of
      [6, 4]. The final result will be 4 matrices of [3, 4], i.e. [3, 16].
      220eef60
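The multiple-head multiplication described in this commit can be sketched in plain Python. This is a minimal illustration of the splitting rule stated in the message, not the operator's code; `matmul` and `multi_head_matmul` are hypothetical helper names.

```python
def matmul(a, b):
    """Plain matrix product of two row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def multi_head_matmul(a, b, head_number):
    """Split A [M, K] by columns and B [K, N] by rows into head_number blocks,
    multiply block by block, and concatenate the per-head results by columns,
    giving an [M, head_number * N] result."""
    k = len(b)
    if len(a[0]) != k or k % head_number:
        raise ValueError("K must match and be divisible by head_number")

    step = k // head_number
    out = [[] for _ in a]
    for h in range(head_number):
        a_blk = [row[h * step:(h + 1) * step] for row in a]  # [M, K/head]
        b_blk = b[h * step:(h + 1) * step]                   # [K/head, N]
        for out_row, res_row in zip(out, matmul(a_blk, b_blk)):
            out_row.extend(res_row)
    return out


if __name__ == "__main__":
    a = [[1] * 24 for _ in range(3)]   # Mat A is [3, 24]
    b = [[1] * 4 for _ in range(24)]   # Mat B is [24, 4]
    result = multi_head_matmul(a, b, 4)
    print(len(result), len(result[0]))  # rows and columns of the result
```

With head_number set to 1 the function reduces to an ordinary matrix product, which is a convenient sanity check.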
  30. 28 Jun, 2019 1 commit
  31. 25 Jun, 2019 1 commit
    • H
      Sequence mask support tensor (#18249) · df2eee71
      Hongyu Liu committed
      * sequence mask support max length tensor input; test=develop
      
      * add rnn_impl.py; test=develop
      
      * add basic gru lstm unittest; test=develop
      
      * fix api spec; test=develop
      
      * fix sequence_mask op bug;
      test=develop
      test=document_preview
      
      * change +-*x to elementwise_op; test=develop
      
      * add mkl flag; test=develop
      
      * fix rnn impl bug; test=develop
      
      * update api spec; test=develop
      
      * fix doc bug; test=develop
      
      * fix lstm bugs; test=develop
      df2eee71
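For readers unfamiliar with the operation this commit extends, the following is a minimal sketch of what a sequence mask computes. The function below is a hypothetical pure-Python stand-in, not the Paddle operator; it assumes that when no maximum length is given, the longest sequence length is used.

```python
def sequence_mask(lengths, maxlen=None):
    """Build a 0/1 mask with one row per sequence (hypothetical helper).

    Row i has ones in its first lengths[i] positions and zeros elsewhere.
    Assumption: maxlen defaults to the largest length in `lengths`.
    """
    if maxlen is None:
        maxlen = max(lengths) if lengths else 0
    return [[1 if j < n else 0 for j in range(maxlen)] for n in lengths]


if __name__ == "__main__":
    for row in sequence_mask([1, 3, 2]):
        print(row)
```

Passing `maxlen` as a value that is only known at run time is the situation the commit addresses by accepting a tensor for the maximum length.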
  32. 14 Jun, 2019 1 commit
  33. 12 Jun, 2019 1 commit
  34. 10 Jun, 2019 1 commit
    • Y
      Enable seq_pool op to accept len 0 input (#17284) · 33d1e565
      Yibing Liu committed
      * Enable seq_pool op to accept len 0 input
      
      test=develop
      
      * Update sequence_pool's api
      
      test=develop
      
      * Add more unittest cases for seq_pool op
      
      test=develop
      
      * Remove legacy comments
      
      test=develop
      
      * Don't use template in op maker
      
      test=develop
      33d1e565
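What it means for a sequence pooling op to accept length-0 input can be sketched as follows. This is an illustration only: the offset-based layout and the `pad_value` parameter are assumptions made for the example, and `sequence_sum_pool` is a hypothetical helper, not the operator's code.

```python
def sequence_sum_pool(values, offsets, pad_value=0.0):
    """Sum-pool variable-length sequences packed into one flat list.

    Assumption: `offsets` holds cumulative boundaries, so sequence i is
    values[offsets[i]:offsets[i + 1]].
    Assumption: an empty sequence yields `pad_value` instead of an error.
    """
    pooled = []
    for start, end in zip(offsets, offsets[1:]):
        if end > start:
            pooled.append(sum(values[start:end]))
        else:
            pooled.append(pad_value)  # length-0 sequence
    return pooled


if __name__ == "__main__":
    # Three sequences: [1.0, 2.0], [] (empty), [3.0]
    print(sequence_sum_pool([1.0, 2.0, 3.0], [0, 2, 2, 3], pad_value=-1.0))
```

Emitting a caller-chosen fill value keeps the output aligned one-to-one with the input sequences even when some of them are empty.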
  35. 30 May, 2019 1 commit