1. 06 Dec, 2019 1 commit
  2. 05 Dec, 2019 1 commit
  3. 02 Dec, 2019 1 commit
  4. 25 Nov, 2019 1 commit
    • Z
      [cherry-pick] fix crop_tensor, maxout and lrn (#21302) · 3848f720
      Committed by Zhang Ting
      * [cherry-pick] All elements in attr(shape) of crop_tensor can be -1 and int32/64 kernel registered (#20756)
      
      * All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview
      
      * fix the bug that attr(offsets) should be initialized, test=develop
      
      * [cherry-pick] maxout supports channel_last input (#20846)
      
      * maxout supports channel_last input, test=develop
      
      * modified details of Input(X) and Attr(groups, axis) in doc, test=develop
      
      * [cherry-pick] lrn supports channel_last input, test=develop (#20954)
      3848f720
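
To illustrate what channel_last input means for maxout, here is a minimal sketch in plain Python. It assumes that each run of `groups` consecutive channels is reduced to its maximum; the function names `maxout_channels` and `maxout_nhwc` are hypothetical and are not the operator's actual kernel.

```python
def maxout_channels(channels, groups):
    """Reduce each run of `groups` consecutive channels to its maximum.

    Assumption: consecutive channels belong to the same group.
    """
    assert len(channels) % groups == 0, "channel count must be divisible by groups"
    return [max(channels[i:i + groups]) for i in range(0, len(channels), groups)]


def maxout_nhwc(x, groups):
    """Apply maxout to a nested list laid out as x[n][h][w][c] (channel_last)."""
    return [[[maxout_channels(pixel, groups) for pixel in row]
             for row in image]
            for image in x]
```

With a channel_last layout the channel vector of each pixel is the innermost list, so the reduction can run per pixel without any transposition.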
  5. 21 Nov, 2019 1 commit
    • C
      Cherry-pick error type support for release1.6 (#21294) · 974b8a83
      Committed by Chen Weihang
      * delete paddle infershape enforce macro (#20832)
      
      * Polish and arrange code in enforce.h (#20901)
      
      * Enrich the type of error and declare the error type interfaces (#21024)
      
      * Enrich the type of error and declare the error type interfaces, test=develop
      
      * adjust tests to adapt new form, test=develop
      
      * add inference deps with error_codes.pb.h, test=develop
      
      * restore stack iter start pos, test=develop
      
      * polish code based on review comments, test=develop
      
      * Add dependency for error_codes.proto (#21084)
      
      * fix activation_functions deps, test=develop, test=document_fix
      
      * add error_codes_proto deps, test=develop, test=document_fix
      
      * try delete enforce.h, test=develop, test=document_fix
      
      * change cuda enforce & add example (#21142)
      test=release/1.6
      974b8a83
  6. 31 Oct, 2019 2 commits
  7. 16 Oct, 2019 1 commit
  8. 14 Oct, 2019 1 commit
  9. 10 Oct, 2019 1 commit
  10. 08 Oct, 2019 1 commit
  11. 03 Oct, 2019 1 commit
    • L
      fix conv2d and conv3d: (#20042) (#20121) · 2faa38cd
      Committed by liym27
      1. support asymmetric padding;
      2. support padding algorithm: "SAME" and "VALID";
      3. support channel_last: data_format NHWC and NDHWC;
      4. change doc of python API and C++;
      
      test=release/1.6
      2faa38cd
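
The padding options listed above can be sketched for a single spatial dimension as follows. This is an illustration only: it assumes dilation 1 and the widely used convention in which "SAME" targets an output size of ceil(in_size / stride) and puts any odd leftover padding at the end. The helper names `resolve_padding` and `conv_output_size` are hypothetical.

```python
def resolve_padding(in_size, kernel, stride, padding):
    """Return (pad_before, pad_after) for one spatial dimension.

    `padding` may be "VALID", "SAME", an int (symmetric), or a
    (before, after) pair (asymmetric). Dilation is assumed to be 1.
    """
    if padding == "VALID":
        return (0, 0)
    if padding == "SAME":
        out_size = -(-in_size // stride)  # ceil(in_size / stride)
        total = max((out_size - 1) * stride + kernel - in_size, 0)
        before = total // 2
        return (before, total - before)  # the extra element goes at the end
    if isinstance(padding, int):
        return (padding, padding)
    before, after = padding
    return (before, after)


def conv_output_size(in_size, kernel, stride, padding):
    """Output length of a convolution along one dimension."""
    before, after = resolve_padding(in_size, kernel, stride, padding)
    return (in_size + before + after - kernel) // stride + 1
```

For example, an input of length 4 with kernel 3 and stride 2 needs only one padded element under "SAME", which is why symmetric-only padding cannot express it.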
  12. 01 Oct, 2019 1 commit
  13. 28 Sep, 2019 1 commit
    • L
      fix pool2d pool3d,support asymmetric padding and channel_last (#19739) · 24010472
      Committed by liym27
      * fix pool2d pool3d:
      1. support asymmetric padding;
      2. support padding algorithm:"SAME" and "VALID";
      3. support channel_last: data_format NHWC and NDHWC;
      4. support inferring shape when the input has negative dims at compile time;
      5. change doc of python API and C++;
      6. fix bug in cuda kernel when Attr(adaptive) is true.
      
      test=develop,test=document_preview
      
      * fix 'tensors' to 'Tensors'. test=develop,test=document_preview
      
      * add test for coverage of ValueError. test=develop,test=document_preview
      
      * resolve conflict in test_pool2d. test=develop
      24010472
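
As a rough picture of what the channel_last data formats mean, the sketch below converts a nested list from NCHW to NHWC order. It only illustrates the two layouts; the function name `nchw_to_nhwc` is hypothetical and this is not how the pooling kernels handle the format internally.

```python
def nchw_to_nhwc(x):
    """Reorder a nested list x[n][c][h][w] into y[n][h][w][c]."""
    return [[[[sample[c][h][w] for c in range(len(sample))]
              for w in range(len(sample[0][0]))]
             for h in range(len(sample[0]))]
            for sample in x]
```

NDHWC is the same idea with one extra depth dimension ahead of height and width.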
  14. 27 Sep, 2019 1 commit
  15. 25 Sep, 2019 1 commit
    • B
      add support of matmul with multiple head even different width and height (#19708) · c670058a
      Committed by Bob Zhu
      * add support of matmul with multiple head even different width and height
      
      The original matmul with multiple heads supports only mat_a.width == mat_b.height;
      in that case, mat_b is split horizontally. This patch extends the support to
      mat_a.width != mat_b.height, provided that mat_a.width/head_number == mat_b.height;
      in this case, mat_b is split vertically.
      
      For example, if A is [3, 8], B is [2, 16] and head_number is 4, then
      A is split into 4 matrices of [3, 2] and B is (vertically) split into
      4 matrices of [2, 4]. The final result is 4 matrices of [3, 4], i.e. [3, 16].
      
      test=develop
      
      * refactor the code of matmul with multiple head even different width and height
      
      test=develop
      c670058a
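
The extended case, where mat_a.width / head_number == mat_b.height, can be sketched in plain Python as below. It is a minimal illustration with nested lists rather than the operator's real kernel; the names `matmul` and `multi_head_matmul_column_split` are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def multi_head_matmul_column_split(a, b, head_number):
    """A is [M, K * head_number] and B is [K, N * head_number].

    Head h multiplies columns h*K..(h+1)*K of A by columns h*N..(h+1)*N of B;
    the per-head results are concatenated along the column axis.
    """
    k = len(b)  # mat_b.height
    assert len(a[0]) == k * head_number, "mat_a.width / head_number must equal mat_b.height"
    assert len(b[0]) % head_number == 0, "mat_b.width must be divisible by head_number"
    n = len(b[0]) // head_number
    out = [[] for _ in a]
    for h in range(head_number):
        a_h = [row[h * k:(h + 1) * k] for row in a]   # [M, K]
        b_h = [row[h * n:(h + 1) * n] for row in b]   # [K, N]
        for out_row, c_row in zip(out, matmul(a_h, b_h)):
            out_row.extend(c_row)
    return out
```

With A of shape [3, 8], B of shape [2, 16] and head_number 4, each head multiplies a [3, 2] block by a [2, 4] block, and the four [3, 4] results are joined into [3, 16].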
  16. 23 Sep, 2019 1 commit
  17. 20 Sep, 2019 1 commit
  18. 16 Sep, 2019 1 commit
  19. 11 Sep, 2019 2 commits
    • H
      Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Committed by Huihuang Zheng
      TemporaryAllocator is a singleton used for allocating memory for cuDNN. Since it is a singleton, we can delete it for better memory performance.
      
      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream, so no singleton is needed.
      
      Also added data_feed_proto to operator to fix CI in CPU compilation
      12542320
    • Y
      Implement the GPU kernel of fc operator (#19687) · a65c728e
      Committed by Yiqun Liu
      * Refine the codes related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Move the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
      a65c728e
  20. 05 Sep, 2019 3 commits
  21. 04 Sep, 2019 1 commit
  22. 03 Sep, 2019 2 commits
  23. 02 Sep, 2019 1 commit
  24. 29 Aug, 2019 1 commit
  25. 20 Aug, 2019 1 commit
  26. 19 Aug, 2019 1 commit
  27. 01 Aug, 2019 1 commit
  28. 24 Jul, 2019 1 commit
    • B
      Extend Matmul to support matrix multiplication with multiple heads (#18570) · 220eef60
      Committed by Bob Zhu
      * extend matmul op to support multiple head multiplication
      
      With multiple-head support, the multiplication of two big matrices is
      split into the multiplication of several (head_number) small matrices. E.g. if
      Mat A is [3, 24] and Mat B is [24, 4], multiplying A and B with head_number
      set to 4 splits Mat A into 4 matrices of [3, 6] and Mat B into 4 matrices of
      [6, 4]. The final result is 4 matrices of [3, 4], i.e. [3, 16].
      220eef60
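
The split described in this commit message can be sketched in plain Python as follows. It is an illustration with nested lists, not the operator's actual implementation; the names `matmul` and `multi_head_matmul` are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def multi_head_matmul(a, b, head_number):
    """A is [M, K] and B is [K, N], with K divisible by head_number.

    Head h multiplies columns h*k..(h+1)*k of A by rows h*k..(h+1)*k of B;
    the per-head [M, N] results are concatenated along the column axis.
    """
    k_total = len(b)
    assert k_total % head_number == 0, "K must be divisible by head_number"
    k = k_total // head_number
    out = [[] for _ in a]
    for h in range(head_number):
        a_h = [row[h * k:(h + 1) * k] for row in a]  # [M, k]
        b_h = b[h * k:(h + 1) * k]                   # [k, N]
        for out_row, c_row in zip(out, matmul(a_h, b_h)):
            out_row.extend(c_row)
    return out
```

With A of shape [3, 24], B of shape [24, 4] and head_number 4, the output has shape [3, 16], matching the example in the message.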
  29. 28 Jun, 2019 1 commit
  30. 25 Jun, 2019 1 commit
    • H
      Sequence mask support tensor (#18249) · df2eee71
      Committed by Hongyu Liu
      * sequence mask support max length tensor input; test=develop
      
      * add rnn_impl.py; test=develop
      
      * add basic gru lstm unittest; test=develop
      
      * fix api spec; test=develop
      
      * fix sequence_mask op bug;
      test=develop
      test=document_preview
      
      * change +-*x to elementwise_op; test=develop
      
      * add mkl flag; test=develop
      
      * fix rnn impl bug; test=develop
      
      * update api spec; test=develop
      
      * fix doc bug; test=develop
      
      * fix lstm bugs; test=develop
      df2eee71
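
For readers unfamiliar with sequence masks, a minimal sketch of the computation follows. Here `maxlen` is a plain Python int standing in for the maximum length that, per this commit, may be supplied as a tensor at run time; the function below is illustrative and is not the library's API.

```python
def sequence_mask(lengths, maxlen=None):
    """Return a 0/1 mask where mask[i][j] is 1 exactly when j < lengths[i].

    If maxlen is None, the longest sequence length is used.
    """
    if maxlen is None:
        maxlen = max(lengths) if lengths else 0
    return [[1 if j < n else 0 for j in range(maxlen)] for n in lengths]
```

Passing an explicit `maxlen` keeps the mask width fixed even when every sequence in the batch is shorter.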
  31. 14 Jun, 2019 1 commit
  32. 12 Jun, 2019 1 commit
  33. 10 Jun, 2019 1 commit
    • Y
      Enable seq_pool op to accept len 0 input (#17284) · 33d1e565
      Committed by Yibing Liu
      * Enable seq_pool op to accept len 0 input
      
      test=develop
      
      * Update sequence_pool's api
      
      test=develop
      
      * Add more unittest cases for seq_pool op
      
      test=develop
      
      * Remove legacy comments
      
      test=develop
      
      * Don't use template in op maker
      
      test=develop
      33d1e565
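
One way to picture pooling over a batch that contains a length-0 sequence is the sketch below. It assumes sequences are described by cumulative offsets and that an empty sequence yields a fill value; the function name `sequence_sum_pool` and the `pad_value` parameter are hypothetical and used only for illustration.

```python
def sequence_sum_pool(values, offsets, pad_value=0.0):
    """Sum-pool each sequence of a flattened batch.

    Sequence i covers values[offsets[i]:offsets[i + 1]]. An empty
    sequence (equal start and end offsets) produces pad_value.
    """
    pooled = []
    for start, end in zip(offsets, offsets[1:]):
        if start == end:
            pooled.append(pad_value)  # length-0 input: nothing to pool
        else:
            pooled.append(sum(values[start:end]))
    return pooled
```

With offsets [0, 2, 2, 5] the middle sequence is empty, so its output slot is filled instead of raising an error.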
  34. 30 May, 2019 1 commit
  35. 29 May, 2019 1 commit
    • Y
      Optimize the concat and split kernel for special cases when the number of... · 5782ddda
      Committed by Yiqun Liu
      Optimize the concat and split kernel for special cases when the number of inputs/outputs is 2 (#17415)
      
      * Optimize the concat and split kernel for special cases that the number of inputs/outputs is 2.
      test=develop
      
      * Refine codes.
      test=develop
      
      * Correct the condition.
      test=develop
      
      * Move the define of tmp_data outside the if statement.
      
      * Print the cudnn minor version.
      test=develop
      
      * Fix the case when in_num/o_num is 1 in concat/split op.
      test=develop
      
      * Remove const_cast.
      test=develop
      5782ddda