Commits · 6af4e70bd2e7c4ad691551e639f11283b435279a · Oneflow-Inc / oneflow

28 Oct, 2021 · 7 commits

Committed by Shenghang Tsai on Oct 28, 2021

* add todo

* refine

* add attr

* refine

* refine

* add todo

* refine

* add alias c1 for check-oneflow

* fix

* update scripts

* refine

* fix single client env reinit

* add attr

* save and pass mlir module

* fix

* restore module in kernel

* lower in kernel

* refien

* add scf to std

* update lit

* fmt

* add all passes

* add alisas

* refein

* refein

* add check

* fix pass order

* add TODO

* refein

* create jit exe

* refein

* fix arity

* add check and rpint err

* refein

* refein

* refein

* refein

* refein

* refein

* emiit c

* working

* revert

* add err print

* e2e works

* refein

* refein

* refein

* use STATIC_SWITCH_FUNC

* add log

* rename

* use invoke packed

* refein

* add todo

* refein

* rm log

* fix

* refein

* rm

* refein

* add scf to gpu

* add cmake flag for cuda runner

* add CMAKE_CUDA_COMPILER

* refine

* refien

* register gpu kernel

* refein

* add gpu passes

* refein

* add

* refine

* add ptx to cubin pass

* produce cubin

* add gpu to llvm pass

* refein

* add log

* refien

* link mlir cuda runtime lib

* add note

* make gpu runner available in file check

* rm unused

* add to prevent break

* fix with cuda

* edit mlir by hand to have it run on cuda

* rm useless

* add todo

* upgrade llvm

* refein m,irror scripts

* fix for llvm upgrade

* refein cmake

* fix

* fix for llvm upgrade

* remove unused headers

* refeine

* refein

* refactor

* add

* refine

* refine

* cmake first class cuda support

* refine

* refine

* refein

* refine

* refine

* refine

* refein

* add todo

* refine

* pass shared lib path from py

* prevent redef ONEFLOW_CMAKE_BUILD_TYPE

* refine msg

* fix fmt

* fix fmt

* fix fmt

* refine

* refueb

* fix

* refactor jit function outline

* refein

* rm debug log

* rm unnecessary erase

* use 75

* refein

* add allowFoldingUnitDimReshapes

* refine

* Outline JIT func (#6542)

* check in pass impl

* add test

* check in changes

* add todo

* extract func to create attrs

* refine

* refine and mv bert

* refein LLVM_EXTERNAL_LIT

* refine log user_op::AttrValueUtil::ToCppAttrValue

* fix for nd_sbp

* refine log

* fix warnings

* fix

* leverage input_order and output_order

* save lbn_segment_keys as input output order

* refine

* refein

* add CUDATOOLKIT_BIN_ROOT

* finish todo

* finish todo

* finish todo

* add matmul

* rm repetitive code

* add log

* add unary

* add gather

* refine and add gelu

* fix loc

* add mlir conv op (#6559)

* add mlir conv op

* fix conv2d tabelgen bug

* fix merge compile error

* fix comments

* Update mlir-cuda-75.cmake

* add mlir resnet50 test

* add SI32ArrayAttr
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

* backport refactoring of translation

* Add resnet50 mlir dialect part ops (#6607)

* add scalar math ops tablegen

* add pool ops

* add bias_add op

* fix comment

* fix comment

* code format

* add reshape op

* add reduce ops and restruct scalar math ops

* fix bug

* fix typo

* address review

* address review

* rm loggin

* address review

* rm logging

* backport variable rename

* add flag ONEFLOW_MLIR_ENABLE_FUSERS
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

6af4e70b

fix input_op infer_nd_sbp (#6642) · 0de1979d

Committed by guo ran on Oct 28, 2021

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

0de1979d

fix typo PopulateOpAttribute (#6641) · 4bee2ddd

Committed by Shenghang Tsai on Oct 28, 2021

* use git to clean dir

* rm useless to trigger CI

* trigger CI

* refine

* refine

* refine

* refine

* fix typo PopulateOpAttribute

4bee2ddd

fix heap leak (#6636) · bcb16f85

Committed by liufengwei0103 on Oct 28, 2021

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

bcb16f85

remove inplace add (#6628) · f74be394

Committed by Luyang on Oct 28, 2021

f74be394

Feat autograd function impl (#6593) · e96a5259

Committed by Yinggang Wang on Oct 28, 2021

* feat(autograd.Function): add base class define

* format

* feat(autograd.Function): cache FunctionOpExpr in AutogradFunctionBase
                         and pass autograd.Function name to cpp

* feat(autograd.Function): wrapper PyFunction to FType

* fix(autograd.Function): fix wrapper function capture bug

* feat(autograd.Function): support autograd.Function backward

* feat(autograd.Function): refine apply return value

* fix(autograd.Function): fix autograd.Function name bug

* feat(autograd.Function): refine ctx python api

* feat(*): refine apply interface

* test(autograd.Function): fix ctx interface and add test

* feat(autograd.Function): support mark_non_differentiable

* align ctx.saved_tensors interface

* docs(autograd.Function): export documentation

* refine function names

* refine interface

* use py::args instead of py::object

* refine code

* fix(*): fix `func_name` variable conflict with CHECK_JUST

* feat(autograd.Function): support static call

* docs(autograd.Function): update documentation

* refine code

* add JUST
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

e96a5259

Interface primitive::BroadcastElementwiseBinary (#6629) · 1cbefd2d

Committed by Juncheng on Oct 28, 2021

* Interface primitive::BroadcastElementwiseBinary

* refine
Co-authored-by: guo ran <360112263@qq.com>

1cbefd2d

27 Oct, 2021 · 4 commits

support cpu allreduce (#6627) · b9fd067e

Committed by daquexian on Oct 27, 2021

Signed-off-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

b9fd067e

Matmul kernels use primitive (#6589) · 81086cff

Committed by Juncheng on Oct 27, 2021

* Matmul kernels use primitive

* refine

* fix
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

81086cff

fix_naive_eager_boxing_checker (#6624) · a633be98

Committed by qq_22305325 on Oct 27, 2021

* fix_naive_eager_boxing_checker

* refine eager_p_to_b_kerne and eager_p_to_s_kernel

a633be98

adding logical_not operator (#6497) · f21ff9e1

Committed by Zhanghuihong Guan on Oct 27, 2021

* initial commit for adding logical_not operator

* adding logical_not op, debugging Dtype related problems

* finished testing locally, need to add tests

* added tests

* added docs and formatted code

* format file

* format file

* remove python wrapper

* modification based on review

* remove redundant code, format file

* modifications based on reviews

* modifications based on review

* fix duplicate license info

* fix docstring

* fix docstring warning
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

f21ff9e1

26 Oct, 2021 · 8 commits

fix build permute_test.cpp (#6608) · 686ac9e8

Committed by Juncheng on Oct 26, 2021

686ac9e8

Fix Prelu SBP (#6619) · 921bddc2

Committed by ZZK on Oct 26, 2021

* fix sbp for prelu

* auto format by CI
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

921bddc2

Imporve roll op speed (#6618) · eb7d2a01

Committed by Liang Depeng on Oct 26, 2021

* imporve roll speed

* imporve speed of len(dims) > 1 cases
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

eb7d2a01

Dev Batch Permute (#6441) · bca2e098

Committed by ZZK on Oct 26, 2021

* dev torch style permute kernel

* Refine

* fix batch permute launch condition

* fix batch permute dispatch logic

* remove redundant header file

* simplified check logic

* use permute primitives in transpose kernels

* fix batch permute logic and avoid mod

* remove redundant templates

* fix grid step

* add grid for loop to avoid the elementnum is too large

* fix bug when hw is not divided by tile size

* refine format

* add a copy kernel as a baseline

* remove annotation

* add copy kernel

* add sync

* use batch permute for profile

* add copy tile baseline

* simplify params for copy kernel

* add slow copy kernel

* use mul to instead mod and remove copy

* use movement size = 4 when h w is modify by 2

* Add temp process for half2

* add half2 specialized kernel

* remove redundant license

* simplified code

* fix format

* fix comment

* fix comment

* use bad for loop condition

* merge half2 in load

* fix bad for loop in batch permute

* refine

* use align storage

* refine

* fix comment

* fix comment

* fix format

* add const and remove redundant header file

* remove register macro

* refine cuda code

* fix guoran comment

* fix format

* fix some details

* remove cuda graph

* fix for 0d tensor
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

bca2e098

c++ standard: bump version to 14 (#6252) · 38a3746d

Committed by Twice on Oct 26, 2021

* c++ standard: bump to 14

* remove cplusplus_14.h & use cxx14

* fix python test

* fix .clang-format
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

38a3746d

support create opt in graph (#6598) · 9ed5706f

Committed by Xiaoyu Xu on Oct 26, 2021

* support create opt in graph

* add comment
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

9ed5706f

cudnn add bfloat16 (#6615) · a07c5ade

Committed by guo ran on Oct 26, 2021

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

a07c5ade

fix_memory_leak (#6614) · 432efc64

Committed by qq_22305325 on Oct 26, 2021

Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>

432efc64

25 Oct, 2021 · 4 commits

CudaDriverGetPrimaryCtxActive (#6604) · 51fd48a2

Committed by Juncheng on Oct 25, 2021

51fd48a2

faster vm response for instructions send by main thread (#6609) · f7b8bb8a

Committed by Li Xinqi on Oct 25, 2021

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

f7b8bb8a

add roll op (#6573) · a1ddc4ca

Committed by Liang Depeng on Oct 25, 2021

* add roll op

* imporve speed

* improve speed when len(dims) == 1

* move some logic to C++

* fix static analysis error

* refine doc

* add roll doc

* refine codes according to review comments

* remove runcudakernel macro
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>

a1ddc4ca

Reduce usage intrusive macro (#6543) · 225eec2b

Committed by Li Xinqi on Oct 25, 2021

* remove most usage of macros INTRUSIVE_*

* rename most INTRUSVE_XXX macros to REFLECTIVE_XXX

* move intrusive::Base to intrusive/base.h

* 1) remove OFFSET_STRUCT_FIELD; 2) mv test cases of HeadFreeList into head_free_list_test.cpp
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

225eec2b

23 Oct, 2021 · 7 commits

refactor vm preschedule (#6579) · e6540a60

Committed by Li Xinqi on Oct 23, 2021

* refactor vm preschedule

* TryMoveFromWaitingToReady

* revert flying_instruction_cnt

* revert to single position to call DispatchInstruction

* revert several code

* remove is_xxx_hook_empty
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

e6540a60

Add pip index keys (#6606) · cd53d282

Committed by XIE Xuan on Oct 23, 2021

* add commit based pip index.html

* use provision for test

* roll back to release nodes
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

cd53d282

async copy data between blob and python object (#6562) · a728a17a

Committed by Houjiang Chen on Oct 23, 2021

* Async access blob and copy data between python object, and fix deadlock.

* revert

* fix

* refine code style

* optimize treat single as tuple

* Fix tensor numpy api.

* adapt interface to compatiblility test

* auto format by CI

* refine

* Back up the numpy array when copy data from array to tensor async

* fix pybind blob api

* Make sure array is C-style contiguous.

* decrease ref

* fixup

* Move foreign lock helper base into core/common.

* Release GIL before call SpinWaitUntilTimeout
Co-authored-by: Zhanghuihong <garfield.gzhh@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

a728a17a

fix device 0 ctx mem (#6605) · 130932d2

Committed by guo ran on Oct 23, 2021

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

130932d2

Refine functors with sequence function (#6567) · 390ddf38

Committed by Peihong Liu on Oct 23, 2021

* refine sequence_function.h

* refine nn_functor with sequence_function

* refine activation_functor with sequence_function

* refine generator

* refine

* add thne_if

* refine array_functor with sequence_function

* refine

* refine reduce grad funcs with sequence_function

* remove GET_GENERATOR
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

390ddf38

add norm、vector_norm、matrix_norm form python to c++ and add tripletMarginLoss (#5965) · 582c3d9f

Committed by tangnana925 on Oct 23, 2021

* add test file at first

* add tripletMarginLoss py code

* module ok

* add  forward test

* amend test code

* delete import torch

* add autotest ok

* delete numpy test code

* amend docstring

* amend loss.py, delete None

* API transfer to C++

* motify module

* delete cout

* delete cout

* Submit some modified code first

* submit vector_norm functor

* matrix norm

* Refine max/min functor (#6359)

merge to dev_tripletMarginLoss

* replace reducemax and reducemin

* amend code error

* motify code

* delete norm2

* delete print

* delete norm2

* delete print

* motify review code

* add assert to c++

* motify review code

* add else

* motify review problem

* add code

* add test code

* motify code delete dim_check

* delete norm.py code

* delete print

* delete print

* delete pu norm

* delete error code

* motify docsting

* auto format by CI

* delete no use num_dims

* delete import torch lib

* delete CI bug code

* motify clip_grad_norm_ resolve autotest bug

* auto format by CI

* motify loss docstring

* motify norm docstring
Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

582c3d9f

Fix SimplifyPermutation (#6600) · 3bcd09da

Committed by Juncheng on Oct 23, 2021

* Fix SimplifyPermutation

* fix

* fix typo

* fix

* add test

* fix init
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

3bcd09da

22 Oct, 2021 · 9 commits

create cuda context before copying tensor. (#6590) · 37605556

Committed by Zhanghuihong Guan on Oct 22, 2021

* initalizes cuda context in kernel of copy

* debugging

* add call once

* remove redundant code

* changes based on review

* delete redundant code

* fix clang compile error
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

37605556

relax speed threshold, make the fail fatal (#6602) · 2c0e7513

Committed by daquexian on Oct 22, 2021

Signed-off-by: daquexian <daquexian566@gmail.com>

2c0e7513

softmax return cudaError_t (#6596) · 0f89f3b4

Committed by guo ran on Oct 22, 2021

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

0f89f3b4

Accelerate scalar math cud (#6599) · 3dd71f33

Committed by Shijie on Oct 22, 2021

* fix typo

* use cuda elementwise
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

3dd71f33

dev masked fill (#6588) · 09c3a781

Committed by Shijie on Oct 22, 2021

* dev masked fill

* refine

* make static_check happy
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

09c3a781

Remove CopyElem (#6591) · 5640fe80

Committed by Juncheng on Oct 22, 2021

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

5640fe80

Interface primitive::softmax (#6594) · 8d35f739

Committed by guo ran on Oct 22, 2021

* interface primitive::softmax

* refine
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

8d35f739

fix (#6592) · c0e7e119

Committed by Houjiang Chen on Oct 22, 2021

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

c0e7e119

refine docs (#6578) · e3810240

Committed by Luyang on Oct 22, 2021

e3810240

21 Oct, 2021 · 1 commit

Interface primitive::ElementwiseUnary (#6586) · 857e3f0b

Committed by Juncheng on Oct 21, 2021

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

857e3f0b

Oneflow-Inc / oneflow · last synced more than 2 years ago
