1. 11 Sep, 2019 2 commits
    • H
      Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Committed by Huihuang Zheng
      TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, removing it can improve memory performance.
      
      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream and so avoid the singleton.
      
      Also added data_feed_proto to operator to fix CI in CPU compilation.
      12542320
    • Y
      Implement the GPU kernel of fc operator (#19687) · a65c728e
      Committed by Yiqun Liu
      * Refine the code related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Move the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
      a65c728e
  2. 05 Sep, 2019 3 commits
  3. 04 Sep, 2019 1 commit
  4. 03 Sep, 2019 2 commits
  5. 02 Sep, 2019 1 commit
  6. 29 Aug, 2019 1 commit
  7. 20 Aug, 2019 1 commit
  8. 19 Aug, 2019 1 commit
  9. 01 Aug, 2019 1 commit
  10. 24 Jul, 2019 1 commit
    • B
      Extend Matmul to support matrix multiplication with multiple heads (#18570) · 220eef60
      Committed by Bob Zhu
      * extend matmul op to support multiple head multiplication
      
      With multiple-head support, the multiplication of two big matrices is
      split into the multiplication of several (head_number) small matrices. E.g., if
      Mat A is [3, 24] and Mat B is [24, 4], when multiplying A and B with head_number
      as 4, Mat A will be split into 4 matrices of [3, 6] and Mat B into 4 matrices of
      [6, 4]. The final result will be 4 matrices of [3, 4], i.e. [3, 16].
      220eef60
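The shape arithmetic in the commit message above can be checked with a small NumPy sketch. This is an illustration only, not the Paddle kernel: the function name `multi_head_matmul` is hypothetical, and it assumes that A is split along its columns, B along its rows, and the per-head products are concatenated along the last axis, which is the reading that matches the [3, 16] result quoted in the message.

```python
import numpy as np


def multi_head_matmul(a, b, head_number):
    """Hypothetical helper: multiply a [M, K] by b [K, N] head by head.

    Assumption: a is split column-wise and b row-wise into head_number
    pieces, and the per-head [M, N] products are concatenated along the
    last axis, giving an [M, head_number * N] result.
    """
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError("inner dimensions of a and b must match")
    if k % head_number != 0:
        raise ValueError("inner dimension must be divisible by head_number")

    a_heads = np.split(a, head_number, axis=1)  # each [M, K / head_number]
    b_heads = np.split(b, head_number, axis=0)  # each [K / head_number, N]
    products = [x @ y for x, y in zip(a_heads, b_heads)]  # each [M, N]
    return np.concatenate(products, axis=1)


if __name__ == "__main__":
    a = np.arange(3 * 24, dtype=np.float64).reshape(3, 24)
    b = np.arange(24 * 4, dtype=np.float64).reshape(24, 4)
    out = multi_head_matmul(a, b, head_number=4)
    print(out.shape)
```

Under this reading, summing the four [3, 4] blocks recovers the ordinary product of A and B, so the multi-head form keeps the partial products separate instead of accumulating them.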
  11. 28 Jun, 2019 1 commit
  12. 25 Jun, 2019 1 commit
    • H
      Sequence mask support tensor (#18249) · df2eee71
      Committed by Hongyu Liu
      * sequence mask support max length tensor input; test=develop
      
      * add rnn_impl.py; test=develop
      
      * add basic gru lstm unittest; test=develop
      
      * fix api spec; test=develop
      
      * fix sequence_mask op bug;
      test=develop
      test=document_preview
      
      * change +-*x to elementwise_op; test=develop
      
      * add mkl flag; test=develop
      
      * fix rnn impl bug; test=develop
      
      * update api spec; test=develop
      
      * fix doc bug; test=develop
      
      * fix lstm bugs; test=develop
      df2eee71
  13. 14 Jun, 2019 1 commit
  14. 12 Jun, 2019 1 commit
  15. 10 Jun, 2019 1 commit
    • Y
      Enable seq_pool op to accept len 0 input (#17284) · 33d1e565
      Committed by Yibing Liu
      * Enable seq_pool op to accept len 0 input
      
      test=develop
      
      * Update sequence_pool's api
      
      test=develop
      
      * Add more unittest cases for seq_pool op
      
      test=develop
      
      * Remove legacy comments
      
      test=develop
      
      * Don't use template in op maker
      
      test=develop
      33d1e565
  16. 30 May, 2019 1 commit
  17. 29 May, 2019 1 commit
    • Y
      Optimize the concat and split kernel for special cases when the number of... · 5782ddda
      Committed by Yiqun Liu
      Optimize the concat and split kernel for special cases when the number of inputs/outputs is 2 (#17415)
      
      * Optimize the concat and split kernel for special cases that the number of inputs/outputs is 2.
      test=develop
      
      * Refine codes.
      test=develop
      
      * Correct the condition.
      test=develop
      
      * Move the definition of tmp_data outside the if statement.
      
      * Print the cudnn minor version.
      test=develop
      
      * Fix the case when in_num/o_num is 1 in concat/split op.
      test=develop
      
      * Remove const_cast.
      test=develop
      5782ddda
  18. 24 May, 2019 1 commit
    • T
      [CPU] refine cpu softmax bwd (#17534) · 7ae461eb
      Committed by tensor-tang
      * refine softmax fwd
      
      test=develop
      
      * refine cpu softmax bwd
      
      test=develop
      
      * fix batch size
      
      test=develop
      
      * fix compile issue with gpu
      
      test=develop
      
      * add value clip
      7ae461eb
  19. 23 May, 2019 1 commit
  20. 21 May, 2019 1 commit
  21. 16 May, 2019 1 commit
  22. 15 May, 2019 1 commit
  23. 10 May, 2019 1 commit
  24. 07 May, 2019 1 commit
    • K
      Softmax_cross_entropy op add axis (#16806) · a71d8fdb
      Committed by Kaipeng Deng
      * add attr axis infershape. test=develop
      
      * add CUDA kernel. test=develop
      
      * fix unittest. test=develop
      
      * fix unittest for soft_label. test=develop
      
      * fix fp16 unittest. test=develop
      
      * remove comment code. test=develop
      
      * refine test for axis. test=develop
      
      * add python api. test=develop
      
      * fix doc. test=develop
      
      * fix fp16 unittest. test=develop
      
      * fix ngraph test. test=develop
      
      * fix ENFORCE for test_imperative_transformer. test=develop
      
      * fit for ngraph test. test=develop
      
      * fix after rebase develop. test=develop
      
      * fix doc. test=develop
      
      * fix API.spec. test=develop
      
      * fix test_layers. test=develop
      
      * fix format. test=develop
      a71d8fdb
  25. 20 Apr, 2019 1 commit
  26. 17 Apr, 2019 1 commit
    • K
      fix overflow by int32 mul test=develop (#16794) · c474e7dd
      Committed by Kevin
      * fix overflow by int32 mul test=develop
      
      * fix reference nullptr
      
      * fix codestyle test=develop
      
      * modify to point in ContextProjectFunctor test=develop
      
      * modify to point in ContextProjectFunctor test=develop
      
      * modify . to -> test=develop
      c474e7dd
  27. 12 Apr, 2019 3 commits
  28. 25 Mar, 2019 1 commit
  29. 20 Mar, 2019 2 commits
  30. 18 Mar, 2019 2 commits
  31. 14 Mar, 2019 2 commits