1. 26 10月, 2021 11 次提交
  2. 25 10月, 2021 9 次提交
  3. 24 10月, 2021 1 次提交
    • J
      Add viterbi decode (#35778) (#36615) · 1906c746
      Jack Zhou 提交于
      * add viterbi decode cpu kernel
      
      * add viterbi decoder api in paddle.text
      
      * add a data buffer once to avoid create many small pieces of data buffer frequently
      
      * fix viterbi max_seq_length bug
      
      * fix seq_len=1 bug
      
      * fix device context
      
      * move split out of for loop
      
      * remove INVERSE_SUB
      
      * remove 2 GET_CAST_MASK
      
      * remove 1 loop
      
      * remove Functor
      
      * add to_static deploy code
      
      * use MAX_FUNC instead of ELE_MAX
      
      * add MaxFunctor
      
      * impl max_func
      
      * remove MaxFunctor
      
      * remove cast op
      
      * use REGISTER_OP_WITHOUT_GRADIENT
      
      * add viterbi cuda kernel
      
      * add FIX_BLOCKDIM_CASE macro
      
      * add MKL add, mul; add get data mask
      
      * add arange mkl impl
      
      * add CPU Argmax
      
      * add cpu gather
      
      * use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL
      
      * use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP
      
      * use SAME_DIMS_ELEMENT_BINARY_OP
      
      * add SimpleBroadcastBinaryOP
      
      * use int instead of int64_t to accelerate
      
      * optimize SimpleBroadcastBinaryOP
      
      * optimize SimpleBroadcastBinaryOP
      
      * optimize performance in both single thread and multithread situation
      
      * remove useless line
      
      * remove useless code
      
      * add CREATE_TENSOR_BUFFER macro
      
      * add INIT_REQUIRED_TENSOR macro
      
      * add comment
      
      * fix windows ci
      
      * add viterbi unittest
      
      * remove cuda add functor
      
      * remove cuda equal
      
      * remove a template function
      
      * fix windows ci
      
      * fix windows dtype
      
      * remove some template instance
      
      * remove useless header file
      
      * remove some blockdim
      
      * remove transpose impl
      
      * accelerate cpu performance on single thread situation
      
      * viterbi_decode->crf_decode
      
      * rename crf params name
      
      * add viterbi api test
      
      * remove useless import
      
      * add enable_static
      
      * use viterbi decoder
      
      * fix viterbi len=1
      
      * fix  viterbi unittest
      
      * remove useless comments
      
      * reconstruct viterbi decode
      
      * remove ADD,SUB,MUL structure
      
      * fix coverage
      
      * remove CREATE_TENSOR
      
      * add name args
      
      * crf.py->ops.py; with_start_stop_tag->include_start_end_tag
      
      * update crf_decode en docs
      
      * fix viterbi decode en docs
      
      * fix some review comments
      
      * add FIXED_BLOCK_DIM_CASE in cuda
      
      * push_back->emplace_back
      
      * crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag
      
      * paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode
      
      * fix viterbi_decode en docs
      1906c746
  4. 21 10月, 2021 2 次提交
  5. 20 10月, 2021 2 次提交
  6. 19 10月, 2021 3 次提交
    • L
      [cherry-pick]Add sparse attention cherrypick (#36447) · 36edb0e1
      Liu-xiandong 提交于
          The code of this PR can only support CUDA 11.2. Currently, CI does not have GPU with CUDA 11.2 , and all tests will be skipped automatically.
      
          The new OP is paddle._C_ops.sparse_attention. Regarding the work of the python API, it will be resolved in a follow-up PR.
      
          The code of this PR lacks tests on dynamic graphs and static graphs, and will be added in subsequent PRs.
      36edb0e1
    • C
      quant support matmul_v2 (#36469) (#36499) · b8167ed2
      ceci3 提交于
      * quant support matmul_v2
      
      * fix format
      b8167ed2
    • S
      Add operators for async read & async write (#36333) (#36501) · d65f8af8
      Siming Dai 提交于
      * fix async_read bug
      
      * change index place to cpu
      
      * add tensor size judge
      
      * add async_read & async_write test
      
      * fix bug in async_write
      
      * fix mac py3 ci
      
      * fix bug for cpu version paddle
      
      * fix windows ci bug
      
      * change input argument error type
      
      * change const_cast to mutable_data
      
      * add async_write out-of-bound check and consumate error hint
      
      * fix a small bug for dst_tensor
      
      * add docs and refine codes
      
      * refine docs
      
      * notest,test=windows_ci
      
      * fix windows ci
      
      * fix require
      
      * fix code-block
      
      * add core.is_compiled_with_cuda()
      d65f8af8
  7. 18 10月, 2021 1 次提交
  8. 15 10月, 2021 2 次提交
  9. 14 10月, 2021 1 次提交
  10. 13 10月, 2021 3 次提交
  11. 12 10月, 2021 1 次提交
  12. 11 10月, 2021 2 次提交
  13. 30 9月, 2021 2 次提交