oneflow/core/primitive/common/permute.h · bca2e098a14af3eaf08134fc300f128c57d00690 · Oneflow-Inc / oneflow

Committed by ZZK on Oct 26, 2021

* dev torch style permute kernel

* Refine

* fix batch permute launch condition

* fix batch permute dispatch logic

* remove redundant header file

* simplified check logic

* use permute primitives in transpose kernels

* fix batch permute logic and avoid mod

* remove redundant templates

* fix grid step

* add grid for loop to handle the case where elementnum is too large

* fix bug when hw is not divisible by tile size

* refine format

* add a copy kernel as a baseline

* remove annotation

* add copy kernel

* add sync

* use batch permute for profile

* add copy tile baseline

* simplify params for copy kernel

* add slow copy kernel

* use mul instead of mod and remove copy

* use movement size = 4 when h and w are divisible by 2

* Add temp process for half2

* add half2 specialized kernel

* remove redundant license

* simplified code

* fix format

* fix comment

* fix comment

* use bad for loop condition

* merge half2 in load

* fix bad for loop in batch permute

* refine

* use align storage

* refine

* fix comment

* fix comment

* fix format

* add const and remove redundant header file

* remove register macro

* refine cuda code

* fix guoran comment

* fix format

* fix some details

* remove cuda graph

* fix for 0d tensor

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

permute.h 9.4 KB

Oneflow-Inc / oneflow, last synced more than 2 years ago
