1. 27 Oct, 2021 · 1 commit
  2. 26 Oct, 2021 · 2 commits
    • fix build of permute_test.cpp (#6608) · 686ac9e8
      committed by Juncheng
    • Dev Batch Permute (#6441) · bca2e098
      committed by ZZK
      * dev torch style permute kernel
      
      * Refine
      
      * fix batch permute launch condition
      
      * fix batch permute dispatch logic
      
      * remove redundant header file
      
      * simplified check logic
      
      * use permute primitives in transpose kernels
      
      * fix batch permute logic and avoid mod
      
      * remove redundant templates
      
      * fix grid step
      
      * add a grid-stride for loop to handle very large element counts
      
      * fix bug when h or w is not divisible by the tile size
      
      * refine format
      
      * add a copy kernel as a baseline
      
      * remove annotation
      
      * add copy kernel
      
      * add sync
      
      * use batch permute for profile
      
      * add copy tile baseline
      
      * simplify params for copy kernel
      
      * add slow copy kernel
      
      * use multiplication instead of mod and remove copy
      
      * use movement size = 4 when h and w are divisible by 2
      
      * add temporary handling for half2
      
      * add half2 specialized kernel
      
      * remove redundant license
      
      * simplified code
      
      * fix format
      
      * fix comment
      
      * fix comment
      
      * use bad for loop condition
      
      * merge half2 in load
      
      * fix bad for loop in batch permute
      
      * refine
      
      * use align storage
      
      * refine
      
      * fix comment
      
      * fix comment
      
      * fix format
      
      * add const and remove redundant header file
      
      * remove register macro
      
      * refine cuda code
      
      * fix guoran comment
      
      * fix format
      
      * fix some details
      
      * remove cuda graph
      
      * fix for 0d tensor
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
  3. 23 Oct, 2021 · 1 commit
  4. 22 Oct, 2021 · 1 commit
  5. 21 Oct, 2021 · 4 commits
  6. 19 Oct, 2021 · 1 commit
  7. 15 Oct, 2021 · 1 commit
  8. 08 Oct, 2021 · 1 commit
  9. 01 Oct, 2021 · 1 commit
  10. 30 Sep, 2021 · 2 commits
  11. 27 Sep, 2021 · 1 commit
    • PermutePrimitive (#6390) · 30bca281
      committed by Juncheng
      * PermutePrimitive
      
      * refine
      
      * refine
      
      * Refine movement size (#6417)
      
      * refine movement size
      
      * fix
      
      * refine
      
      * refine
  12. 23 Sep, 2021 · 1 commit
  13. 18 Sep, 2021 · 2 commits
  14. 17 Sep, 2021 · 1 commit
  15. 13 Sep, 2021 · 1 commit
  16. 11 Sep, 2021 · 1 commit
  17. 09 Sep, 2021 · 1 commit
  18. 08 Sep, 2021 · 1 commit
  19. 07 Sep, 2021 · 1 commit
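The Dev Batch Permute commit (#6441) mentions a tiled kernel and a fix for the case where h or w is not a multiple of the tile size. As a rough illustration only, the Python sketch below shows the index mapping of a batched transpose (B, H, W) to (B, W, H) processed tile by tile, with edge tiles clipped. It assumes "batch permute" means swapping the last two dimensions of a 3-D tensor; it is a CPU reference written for this note, not OneFlow's CUDA kernel, and the function name `batch_transpose_tiled` and its `tile` parameter are hypothetical.

```python
def batch_transpose_tiled(src, batch, rows, cols, tile=4):
    """Hypothetical reference: return dst with dst[b][c][r] == src[b][r][c].

    Both src and dst are flat row-major lists. The work is split into
    tile x tile blocks; blocks at the right/bottom edge are clipped, so
    rows and cols do not need to be multiples of `tile`.
    """
    dst = [None] * (batch * rows * cols)
    for b in range(batch):
        base = b * rows * cols  # each batch occupies the same range in src and dst
        for r0 in range(0, rows, tile):
            for c0 in range(0, cols, tile):
                # Clip the tile at the matrix edge (the "not divisible" case).
                r_end = min(r0 + tile, rows)
                c_end = min(c0 + tile, cols)
                for r in range(r0, r_end):
                    for c in range(c0, c_end):
                        # Source is (rows, cols) row-major; destination is (cols, rows).
                        dst[base + c * rows + r] = src[base + r * cols + c]
    return dst
```

For example, with batch=1, rows=2, cols=2, the input [1, 2, 3, 4] becomes [1, 3, 2, 4]. A GPU kernel would typically stage each tile in shared memory so that both the reads and the writes are coalesced; this sketch only models which element goes where.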