Commits · dev_fix_broadcast_binary_ops_grad_func · Oneflow-Inc / oneflow

02 Nov, 2021 7 commits
- Merge branch 'dev_fix_broadcast_binary_ops_grad_func' of... · ebfc7a5a
  Committed by hjchen2 on Nov 02, 2021
```
Merge branch 'dev_fix_broadcast_binary_ops_grad_func' of https://github.com/Oneflow-Inc/oneflow into dev_fix_broadcast_binary_ops_grad_func
```
- remove unused comment · 285c13d6
  Committed by hjchen2 on Nov 02, 2021
- Merge branch 'master' into dev_fix_broadcast_binary_ops_grad_func · 5ce6c8fe
  Committed by Houjiang Chen on Nov 02, 2021
- Reduce memory usage for broadcast binary ops backward · 926666c3
  Committed by hjchen2 on Nov 02, 2021
- adjust GILForeignLockHelper order to avoid glog print to stderr (#6671) · 55d32c33
  Committed by Xiaoyu Xu on Nov 02, 2021
```
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
```
- Fix model update pass adam (#6673) · 22473860
  Committed by ZZK on Nov 02, 2021
```
* add first version of unary primitive op

* fix

* remove redundant file

* Revert

* fix format

* use has input to check
```
- restruct reshape gradient funcs (#6634) · 8b94ac9b
  Committed by Luyang on Nov 02, 2021
```
* restruct

* refine
```
01 Nov, 2021 7 commits

just macro: rename local variables to prevent shadowing (#6667) · 21caffd9
Committed by Twice on Nov 01, 2021
```
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
```

update speed test threshold (#6664) · f88c979a

Committed by daquexian on Nov 01, 2021

Signed-off-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

migrate parital fc op from lazy to functor (#6387) · 2e96920b

Committed by Yao Chi on Nov 01, 2021

* migrate partial_fc

* add test and fix DistributedPariticalFCSample release bug

* fix typos in functional_api.yaml

* initialization

* refine testcase

* skip cpu-only test

* reformat

Co-authored-by: bbuf <1182563586@qq.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

add --inplace (#6661) · 808bf377
Committed by Shenghang Tsai on Nov 01, 2021

Optimize Dropout (#6640) · 0abf1a5d

Committed by ZZK on Nov 01, 2021

* init

* use cuda elementwise template

* Add half version

* try to use different rand init

* try to use copynd primitives to improve efficiency

* use copynd primitive in narrow backward

* remove redundant templates

* revert

* use elementwise templates

* fix comment

* Remove useless header file

* rename copynd_primitive to copy_nd_primitive

* fix format


optional: add monadic operations `map`, `bind` and `or_else` (#6652) · 2f5344c6

Committed by Twice on Nov 01, 2021

* c++ standard: bump to 14

* remove cplusplus_14.h & use cxx14

* optional: add monadic operations `map`, `bind` and `or_else`

* remove useless header

* fix or_else

* fix operator!=

* fix type

* format

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

Change maybe to optional (#6611) · 380d2414

Committed by Zhanghuihong Guan on Nov 01, 2021

* initial commit, add code for async construct tensor from numpy array

* inital commit to change Maybe to Optional

* delete redundant code

* replace Maybe with Optional

* fix compile errors

* format code

* changes based on review

* format code, fix based on review

* format code

* fix multiclient type

* changes based on review

* changes based on review

* unify calling to IsMultiClirnt

* refector multi_client related code

* restore InMultiClient interface

* double check for unnecessary changes

* remove unnecessary changes

* format code

* Update oneflow/api/python/symbol/job_conf_symbol.cpp

* Update oneflow/api/python/symbol/op_conf_symbol.cpp

* Update oneflow/api/python/symbol/op_node_signature_symbol.cpp

* Update oneflow/core/common/optional.h

* Update oneflow/api/python/symbol/string_symbol.cpp

* Update oneflow/api/python/symbol/scope_symbol.cpp

* Update oneflow/api/python/symbol/placement_symbol.cpp

* Update oneflow/api/python/symbol/op_conf_symbol.cpp

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: Twice <i@twice.moe>

30 Oct, 2021 2 commits

update speed test threshold (#6656) · 8c619789
Committed by daquexian on Oct 30, 2021
```
Signed-off-by: daquexian <daquexian566@gmail.com>
```

Refactor oneflow.Size (#6645) · 4be2b0a3

Committed by Houjiang Chen on Oct 30, 2021

* Refactor oneflow.Size

* refine

* add pybind11 caster

* Support Shape cast

* refine

* fix size index

* include size header if need export C++ Shape to Python.


29 Oct, 2021 8 commits

Restruct op part4 (#6587) · b4ec60f7

Committed by Luyang on Oct 29, 2021

* restruct matmul

* restruct narrow

* restruct unsqueeze

* restruct permute

* refine

* refine

* restruct one_hot

* format

* refine

* restruct concat

* refine


Feat: ActorContext (#6644) · f652ce4f

Committed by Juncheng on Oct 29, 2021

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

fix softmax compile error (#6650) · 32fa2e25

Committed by guo ran on Oct 29, 2021

* fix softmax compile error

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

add ReduceLROnPlateau op (#6564) · 4bbf4c71

Committed by QiangX-man on Oct 29, 2021

* add ReduceLROnPlateau op

* add unit test case

* Mod review comments

* mod review comments

* mod review comments

* mod review comments

* mod review comment

* mod test case

* adjust test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

rm NCCL_NVCC_GENCODE (#6651) · fb2947ba
Committed by Shenghang Tsai on Oct 29, 2021

fix_cpu_bn_memory_leak (#6648) · 88c355d5
Committed by qq_22305325 on Oct 29, 2021

Returns py::tuple instead of tensor tuple, and refine split functional api (#6638) · 805b7c4e

Committed by Houjiang Chen on Oct 29, 2021

* Returns py::tuple instead of tensor tuple, and refine split functional api.

* fix and refine code

* fix compile

* modified split_sizes

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: luqiang-guo <702572275@qq.com>

softmax primitive kernel (#6601) · f8a868be

Committed by guo ran on Oct 29, 2021

* softmax primitive kernel

* refine

* fix

* refine

* refine

* refine

* refine

* log softmax

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

28 Oct, 2021 8 commits

Blob::CopyHeaderFrom remove parameter device_ctx (#6643) · f053d5c2
Committed by Juncheng on Oct 28, 2021
```
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
```

IR round trip pass (#4138) · 6af4e70b

Committed by Shenghang Tsai on Oct 28, 2021

* add todo

* refine

* add attr

* refine

* refine

* add todo

* refine

* add alias c1 for check-oneflow

* fix

* update scripts

* refine

* fix single client env reinit

* add attr

* save and pass mlir module

* fix

* restore module in kernel

* lower in kernel

* refien

* add scf to std

* update lit

* fmt

* add all passes

* add alisas

* refein

* refein

* add check

* fix pass order

* add TODO

* refein

* create jit exe

* refein

* fix arity

* add check and rpint err

* refein

* refein

* refein

* refein

* refein

* refein

* emiit c

* working

* revert

* add err print

* e2e works

* refein

* refein

* refein

* use STATIC_SWITCH_FUNC

* add log

* rename

* use invoke packed

* refein

* add todo

* refein

* rm log

* fix

* refein

* rm

* refein

* add scf to gpu

* add cmake flag for cuda runner

* add CMAKE_CUDA_COMPILER

* refine

* refien

* register gpu kernel

* refein

* add gpu passes

* refein

* add

* refine

* add ptx to cubin pass

* produce cubin

* add gpu to llvm pass

* refein

* add log

* refien

* link mlir cuda runtime lib

* add note

* make gpu runner available in file check

* rm unused

* add to prevent break

* fix with cuda

* edit mlir by hand to have it run on cuda

* rm useless

* add todo

* upgrade llvm

* refein m,irror scripts

* fix for llvm upgrade

* refein cmake

* fix

* fix for llvm upgrade

* remove unused headers

* refeine

* refein

* refactor

* add

* refine

* refine

* cmake first class cuda support

* refine

* refine

* refein

* refine

* refine

* refine

* refein

* add todo

* refine

* pass shared lib path from py

* prevent redef ONEFLOW_CMAKE_BUILD_TYPE

* refine msg

* fix fmt

* fix fmt

* fix fmt

* refine

* refueb

* fix

* refactor jit function outline

* refein

* rm debug log

* rm unnecessary erase

* use 75

* refein

* add allowFoldingUnitDimReshapes

* refine

* Outline JIT func (#6542)

* check in pass impl

* add test

* check in changes

* add todo

* extract func to create attrs

* refine

* refine and mv bert

* refein LLVM_EXTERNAL_LIT

* refine log user_op::AttrValueUtil::ToCppAttrValue

* fix for nd_sbp

* refine log

* fix warnings

* fix

* leverage input_order and output_order

* save lbn_segment_keys as input output order

* refine

* refein

* add CUDATOOLKIT_BIN_ROOT

* finish todo

* finish todo

* finish todo

* add matmul

* rm repetitive code

* add log

* add unary

* add gather

* refine and add gelu

* fix loc

* add mlir conv op (#6559)

* add mlir conv op

* fix conv2d tabelgen bug

* fix merge compile error

* fix comments

* Update mlir-cuda-75.cmake

* add mlir resnet50 test

* add SI32ArrayAttr
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

* backport refactoring of translation

* Add resnet50 mlir dialect part ops (#6607)

* add scalar math ops tablegen

* add pool ops

* add bias_add op

* fix comment

* fix comment

* code format

* add reshape op

* add reduce ops and restruct scalar math ops

* fix bug

* fix typo

* address review

* address review

* rm loggin

* address review

* rm logging

* backport variable rename

* add flag ONEFLOW_MLIR_ENABLE_FUSERS

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

fix input_op infer_nd_sbp (#6642) · 0de1979d
Committed by guo ran on Oct 28, 2021
```
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
```

fix typo PopulateOpAttribute (#6641) · 4bee2ddd

Committed by Shenghang Tsai on Oct 28, 2021

* use git to clean dir

* rm useless to trigger CI

* trigger CI

* refine

* refine

* refine

* refine

* fix typo PopulateOpAttribute


fix heap leak (#6636) · bcb16f85

Committed by liufengwei0103 on Oct 28, 2021

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

remove inplace add (#6628) · f74be394
Committed by Luyang on Oct 28, 2021

Feat autograd function impl (#6593) · e96a5259

Committed by Yinggang Wang on Oct 28, 2021

* feat(autograd.Function): add base class define

* format

* feat(autograd.Function): cache FunctionOpExpr in AutogradFunctionBase and pass autograd.Function name to cpp

* feat(autograd.Function): wrapper PyFunction to FType

* fix(autograd.Function): fix wrapper function capture bug

* feat(autograd.Function): support autograd.Function backward

* feat(autograd.Function): refine apply return value

* fix(autograd.Function): fix autograd.Function name bug

* feat(autograd.Function): refine ctx python api

* feat(*): refine apply interface

* test(autograd.Function): fix ctx interface and add test

* feat(autograd.Function): support mark_non_differentiable

* align ctx.saved_tensors interface

* docs(autograd.Function): export documentation

* refine function names

* refine interface

* use py::args instead of py::object

* refine code

* fix(*): fix `func_name` variable conflict with CHECK_JUST

* feat(autograd.Function): support static call

* docs(autograd.Function): update documentation

* refine code

* add JUST

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

Interface primitive::BroadcastElementwiseBinary (#6629) · 1cbefd2d
Committed by Juncheng on Oct 28, 2021
```
* Interface primitive::BroadcastElementwiseBinary

* refine

Co-authored-by: guo ran <360112263@qq.com>
```

27 Oct, 2021 4 commits

support cpu allreduce (#6627) · b9fd067e

Committed by daquexian on Oct 27, 2021

Signed-off-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

Matmul kernels use primitive (#6589) · 81086cff

Committed by Juncheng on Oct 27, 2021

* Matmul kernels use primitive

* refine

* fix

Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

fix_naive_eager_boxing_checker (#6624) · a633be98
Committed by qq_22305325 on Oct 27, 2021
```
* fix_naive_eager_boxing_checker

* refine eager_p_to_b_kerne and eager_p_to_s_kernel
```

adding logical_not operator (#6497) · f21ff9e1

Committed by Zhanghuihong Guan on Oct 27, 2021

* initial commit for adding logical_not operator

* adding logical_not op, debugging Dtype related problems

* finished testing locally, need to add tests

* added tests

* added docs and formatted code

* format file

* format file

* remove python wrapper

* modification based on review

* remove redundant code, format file

* modifications based on reviews

* modifications based on review

* fix duplicate license info

* fix docstring

* fix docstring warning

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

26 Oct, 2021 4 commits

fix build permute_test.cpp (#6608) · 686ac9e8
Committed by Juncheng on Oct 26, 2021

Fix Prelu SBP (#6619) · 921bddc2

Committed by ZZK on Oct 26, 2021

* fix sbp for prelu

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

Imporve roll op speed (#6618) · eb7d2a01

Committed by Liang Depeng on Oct 26, 2021

* imporve roll speed

* imporve speed of len(dims) > 1 cases

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

Dev Batch Permute (#6441) · bca2e098

Committed by ZZK on Oct 26, 2021

* dev torch style permute kernel

* Refine

* fix batch permute launch condition

* fix batch permute dispatch logic

* remove redundant header file

* simplified check logic

* use permute primitives in transpose kernels

* fix batch permute logic and avoid mod

* remove redundant templates

* fix grid step

* add grid for loop to avoid the elementnum is too large

* fix bug when hw is not divided by tile size

* refine format

* add a copy kernel as a baseline

* remove annotation

* add copy kernel

* add sync

* use batch permute for profile

* add copy tile baseline

* simplify params for copy kernel

* add slow copy kernel

* use mul to instead mod and remove copy

* use movement size = 4 when h w is modify by 2

* Add temp process for half2

* add half2 specialized kernel

* remove redundant license

* simplified code

* fix format

* fix comment

* fix comment

* use bad for loop condition

* merge half2 in load

* fix bad for loop in batch permute

* refine

* use align storage

* refine

* fix comment

* fix comment

* fix format

* add const and remove redundant header file

* remove register macro

* refine cuda code

* fix guoran comment

* fix format

* fix some details

* remove cuda graph

* fix for 0d tensor

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

Oneflow-Inc / oneflow · last synced more than 2 years ago
