1. 05 Jan, 2022 1 commit
    • implementation of broadcast div backward by reduce (#38044) · 55cd9cb8
      crystal committed
      * add elementwise div
      
      * move mul and div grad functor
      
      * Combine multiple CUDA kernels
      
      * Update the reduce interface call
      
      * add multi-output
      
      * add multi-output div
      
      * add branch judge
      
      * Package branch
      
      * Combine the x and y functions into one
  2. 04 Jan, 2022 1 commit
  3. 31 Dec, 2021 1 commit
  4. 29 Dec, 2021 1 commit
  5. 28 Dec, 2021 1 commit
  6. 21 Dec, 2021 1 commit
  7. 20 Dec, 2021 1 commit
    • Support FP16 for more ops (#38123) · 1f445bf3
      sneaxiy committed
      * support FP16 for more ops
      
      * add amp list tests
      
      * refine reduce_mean_grad
      
      * fix OP benchmark ci
      
      * fix fp16 reduce_mean
      
      * update ut, but still have some problems
      
      * remove mean/reduce_mean fp16 kernel
  8. 18 Dec, 2021 1 commit
  9. 17 Dec, 2021 1 commit
  10. 16 Dec, 2021 3 commits
  11. 15 Dec, 2021 1 commit
  12. 09 Dec, 2021 1 commit
  13. 08 Dec, 2021 2 commits
  14. 03 Dec, 2021 1 commit
  15. 27 Nov, 2021 1 commit
    • [NPU] reorganization for device API abstraction (#37110) · 72241a6a
      Aganlengzi committed
      * [NPU] reorganization for device API abstraction
      
      * [NPU] delete old files
      
      * [NPU] fix npu_collective_helper
      
      * [NPU] fix collective_helper
      
      * [NPU] fix ut
      
      * [NPU] mod memory allocation and hccl_helper
      
      * [NPU] fix place_type
      
      * [NPU] split enforce.h
      
      * move acl* call into npu_info
      
      * merge conflict
      
      * fix merge
      
      * merge conflict
      
      * merge conflict
  16. 24 Nov, 2021 1 commit
    • elementwise_mul refactor (#37471) · c5e857d4
      YuanRisheng committed
      * elementwise_mul refactor
      
      * perfect code in test
      
      * delete redundant code
      
      * fix bugs when run test_multiply
      
      * adjust the location of macro
      
      * fix bugs when run ci
  17. 23 Nov, 2021 1 commit
  18. 22 Nov, 2021 1 commit
  19. 18 Nov, 2021 1 commit
    • [PTen]elementwise_sub kernel refactor (#37260) · 36a95654
      YuanRisheng committed
      * elementwise_add kernel refactor
      
      * fix compile bugs in elementwise_add refactor
      
      * fix compile bugs when run in npu/xpu
      
      * fix bugs when run unit test
      
      * fix bugs when run ci-windows
      
      * modify code as recommended
      
      * code format adjust
      
      * fix bugs when run ci
      
      * fix compile bug when run in ci-windows
      
      * elementwise_sub refactor
      
      * add PD_DLL_DECL for elementwise_sub
      
      * fix bugs when compile
  20. 17 Nov, 2021 1 commit
  21. 15 Nov, 2021 1 commit
  22. 12 Nov, 2021 1 commit
    • [Pten]Refactor the Elementwise_add Kernel (#37043) · c1310343
      YuanRisheng committed
      * elementwise_add kernel refactor
      
      * fix compile bugs in elementwise_add refactor
      
      * fix compile bugs when run in npu/xpu
      
      * fix bugs when run unit test
      
      * fix bugs when run ci-windows
      
      * modify code as recommended
      
      * code format adjust
      
      * fix bugs when run ci
      
      * fix compile bug when run in ci-windows
  23. 02 Nov, 2021 1 commit
  24. 28 Oct, 2021 1 commit
  25. 27 Oct, 2021 1 commit
    • Added fp32 / bf16 forward and backward elementwise_div_mkldnn operator (#36158) · e92e6b06
      piotrekobiIntel committed
      * Add WIP version of elementwise_div_mkldnn without working dy grad
      
      * Add dy gradient calculation implementation, disable broadcast tests
      
      * Readd removed tests from static_mode_white_list
      
      * Add bfloat16 gradient tests, remove int8 and uint8 support
      
      * - Change the way dy grad is calculated to improve performance
      - Refactor BinaryMKLDNNHandler to use a default parameter
      
      * Change copyright year
      
      * Refactor as suggested
      
      * Attempt to bypass CI Approval
      not accepting max_relative_error
      
      * Fix formatting issue
  26. 25 Oct, 2021 1 commit
  27. 22 Oct, 2021 1 commit
  28. 21 Oct, 2021 2 commits
    • Add viterbi decode (#35778) · 6072aecb
      Jack Zhou committed
      * add viterbi decode cpu kernel
      
      * add viterbi decoder api in paddle.text
      
      * allocate a data buffer once to avoid frequently creating many small data buffers
      
      * fix viterbi max_seq_length bug
      
      * fix seq_len=1 bug
      
      * fix device context
      
      * move split out of for loop
      
      * remove INVERSE_SUB
      
      * remove 2 GET_CAST_MASK
      
      * remove 1 loop
      
      * remove Functor
      
      * add to_static deploy code
      
      * use MAX_FUNC instead of ELE_MAX
      
      * add MaxFunctor
      
      * impl max_func
      
      * remove MaxFunctor
      
      * remove cast op
      
      * use REGISTER_OP_WITHOUT_GRADIENT
      
      * add viterbi cuda kernel
      
      * add FIX_BLOCKDIM_CASE macro
      
      * add MKL add, mul; add get data mask
      
      * add arange mkl impl
      
      * add CPU Argmax
      
      * add cpu gather
      
      * use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL
      
      * use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP
      
      * use SAME_DIMS_ELEMENT_BINARY_OP
      
      * add SimpleBroadcastBinaryOP
      
      * use int instead of int64_t to accelerate
      
      * optimize SimpleBroadcastBinaryOP
      
      * optimize SimpleBroadcastBinaryOP
      
      * optimize performance in both single thread and multithread situation
      
      * remove useless line
      
      * remove useless code
      
      * add CREATE_TENSOR_BUFFER macro
      
      * add INIT_REQUIRED_TENSOR macro
      
      * add comment
      
      * fix windows ci
      
      * add viterbi unittest
      
      * remove cuda add functor
      
      * remove cuda equal
      
      * remove a template function
      
      * fix windows ci
      
      * fix windows dtype
      
      * remove some template instance
      
      * remove useless header file
      
      * remove some blockdim
      
      * remove transpose impl
      
      * accelerate cpu performance on single thread situation
      
      * viterbi_decode->crf_decode
      
      * rename crf params name
      
      * add viterbi api test
      
      * remove useless import
      
      * add enable_static
      
      * use viterbi decoder
      
      * fix viterbi len=1
      
      * fix viterbi unittest
      
      * remove useless comments
      
      * reconstruct viterbi decode
      
      * remove ADD,SUB,MUL structure
      
      * fix coverage
      
      * remove CREATE_TENSOR
      
      * add name args
      
      * crf.py->ops.py; with_start_stop_tag->include_start_end_tag
      
      * update crf_decode en docs
      
      * fix viterbi decode en docs
      
      * fix some review comments
      
      * add FIXED_BLOCK_DIM_CASE in cuda
      
      * push_back->emplace_back
      
      * crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag
      
      * paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode
      
      * fix viterbi_decode en docs
    • Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 (#36373) · 921c0917
      niuliling123 committed
      * Update the implementation of reduceAnyKernel according to kernel primitive api
      * Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1
  29. 19 Oct, 2021 1 commit
  30. 18 Oct, 2021 1 commit
  31. 12 Oct, 2021 1 commit
  32. 29 Sep, 2021 1 commit
  33. 24 Sep, 2021 1 commit
    • Added elementwise_sub_mkldnn operator (#35662) · 787273ed
      piotrekobiIntel committed
      * Add elementwise_sub_mkldnn_op without grad
      
      * Add test to static_mode_white_list
      
      * Refactor code, change license years
      
      * Remove invalid grad implementation
      
      * Fix element_wise_sub_op test
      
      * Fix CI Approval error
      
      * Remove unnecessary EltwiseSubMKLDNNGradKernel class
      
      * Fix CI Approval 2
      
      * Fix CI Approval 3
      
      * Fix CI Approval Attempt #4
      
      * Fix CI Approve Attempt #5
      
      * Fix CI Approval Attempt #6
      
      * Fix CI Approval Attempt #7
      
      * Change test names containing add to sub
      
      * Fix old tests testing add instead of sub
      
      * Copy grad implementation from elementwise_add_mkldnn
      
      * CI test fix attempt
      
      * Revert "CI test fix attempt"
      
      This reverts commit c647cacf41e6a87c715385a185de5cbf65fc8900.
      
      * Fix CI attempt 2
      
      * Fix elementwise_sub tests, temporary mkldnn broadcast test disable
      
      * Add working implementation of elementwise_sub grad
      
      * Fix build errors caused by pull
      
      * Fix format error
      
      * Fix format error 2
      
      * Disable elementwise_sub_mkldnn test on GPU
      
      * Apply fix for paddle.fluid import
      
      * Revert changes of test_elementwise_sub and Fix mkldnn test
      
      * Revert "Apply fix for paddle.fluid import"
      
      This reverts commit fc3b122fec8e12f2bcb32928a2685ba4d20fd742.
      
      * fix bug of module 'paddle' has no attribute 'fluid' for python3.6 (#35862)
      
      * Add changes suggested by reviewers
      
      * Change @unittest.skipIf... to @OpTestTool.skip_if_not_cpu_bf16() to satisfy Approval CI
      
      * Remove check_dygraph=False to satisfy CI Approval
      Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>
  34. 23 Sep, 2021 1 commit
  35. 21 Sep, 2021 1 commit
  36. 18 Sep, 2021 1 commit