  1. 02 Nov, 2021 7 commits
  2. 01 Nov, 2021 7 commits
  3. 30 Oct, 2021 2 commits
  4. 29 Oct, 2021 8 commits
  5. 28 Oct, 2021 8 commits
    • J
    • S
      IR round trip pass (#4138) · 6af4e70b
      Shenghang Tsai committed
      * add todo
      
      * refine
      
      * add attr
      
      * refine
      
      * refine
      
      * add todo
      
      * refine
      
      * add alias c1 for check-oneflow
      
      * fix
      
      * update scripts
      
      * refine
      
      * fix single client env reinit
      
      * add attr
      
      * save and pass mlir module
      
      * fix
      
      * restore module in kernel
      
      * lower in kernel
      
      * refine
      
      * add scf to std
      
      * update lit
      
      * fmt
      
      * add all passes
      
      * add alias

      * refine

      * refine
      
      * add check
      
      * fix pass order
      
      * add TODO
      
      * refine

      * create jit exe

      * refine
      
      * fix arity
      
      * add check and print err

      * refine

      * refine

      * refine

      * refine

      * refine

      * refine

      * emit c
      
      * working
      
      * revert
      
      * add err print
      
      * e2e works
      
      * refine

      * refine

      * refine
      
      * use STATIC_SWITCH_FUNC
      
      * add log
      
      * rename
      
      * use invoke packed
      
      * refine

      * add todo

      * refine
      
      * rm log
      
      * fix
      
      * refine

      * rm

      * refine
      
      * add scf to gpu
      
      * add cmake flag for cuda runner
      
      * add CMAKE_CUDA_COMPILER
      
      * refine
      
      * refine

      * register gpu kernel

      * refine

      * add gpu passes

      * refine
      
      * add
      
      * refine
      
      * add ptx to cubin pass
      
      * produce cubin
      
      * add gpu to llvm pass
      
      * refine

      * add log

      * refine
      
      * link mlir cuda runtime lib
      
      * add note
      
      * make gpu runner available in file check
      
      * rm unused
      
      * add to prevent break
      
      * fix with cuda
      
      * edit mlir by hand to have it run on cuda
      
      * rm useless
      
      * add todo
      
      * upgrade llvm
      
      * refine mirror scripts

      * fix for llvm upgrade

      * refine cmake
      
      * fix
      
      * fix for llvm upgrade
      
      * remove unused headers
      
      * refine

      * refine
      
      * refactor
      
      * add
      
      * refine
      
      * refine
      
      * cmake first class cuda support
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * add todo
      
      * refine
      
      * pass shared lib path from py
      
      * prevent redef ONEFLOW_CMAKE_BUILD_TYPE
      
      * refine msg
      
      * fix fmt
      
      * fix fmt
      
      * fix fmt
      
      * refine
      
      * refine
      
      * fix
      
      * refactor jit function outline
      
      * refine
      
      * rm debug log
      
      * rm unnecessary erase
      
      * use 75
      
      * refine
      
      * add allowFoldingUnitDimReshapes
      
      * refine
      
      * Outline JIT func (#6542)
      
      * check in pass impl
      
      * add test
      
      * check in changes
      
      * add todo
      
      * extract func to create attrs
      
      * refine
      
      * refine and mv bert
      
      * refine LLVM_EXTERNAL_LIT
      
      * refine log user_op::AttrValueUtil::ToCppAttrValue
      
      * fix for nd_sbp
      
      * refine log
      
      * fix warnings
      
      * fix
      
      * leverage input_order and output_order
      
      * save lbn_segment_keys as input output order
      
      * refine
      
      * refine
      
      * add CUDATOOLKIT_BIN_ROOT
      
      * finish todo
      
      * finish todo
      
      * finish todo
      
      * add matmul
      
      * rm repetitive code
      
      * add log
      
      * add unary
      
      * add gather
      
      * refine and add gelu
      
      * fix loc
      
      * add mlir conv op (#6559)
      
      * add mlir conv op
      
      * fix conv2d tablegen bug
      
      * fix merge compile error
      
      * fix comments
      
      * Update mlir-cuda-75.cmake
      
      * add mlir resnet50 test
      
      * add SI32ArrayAttr
      Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
      
      * backport refactoring of translation
      
      * Add resnet50 mlir dialect part ops (#6607)
      
      * add scalar math ops tablegen
      
      * add pool ops
      
      * add bias_add op
      
      * fix comment
      
      * fix comment
      
      * code format
      
      * add reshape op
      
      * add reduce ops and restruct scalar math ops
      
      * fix bug
      
      * fix typo
      
      * address review
      
      * address review
      
      * rm logging
      
      * address review
      
      * rm logging
      
      * backport variable rename
      
      * add flag ONEFLOW_MLIR_ENABLE_FUSERS
      Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
      6af4e70b
    • G
      0de1979d
    • S
      fix typo PopulateOpAttribute (#6641) · 4bee2ddd
      Shenghang Tsai committed
      * use git to clean dir
      
      * rm useless to trigger CI
      
      * trigger CI
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * fix typo PopulateOpAttribute
      4bee2ddd
    • L
      bcb16f85
    • L
      remove inplace add (#6628) · f74be394
      Luyang committed
      f74be394
    • Y
      Feat autograd function impl (#6593) · e96a5259
      Yinggang Wang committed
      * feat(autograd.Function): add base class define
      
      * format
      
      * feat(autograd.Function): cache FunctionOpExpr in AutogradFunctionBase
                               and pass autograd.Function name to cpp
      
      * feat(autograd.Function): wrapper PyFunction to FType
      
      * fix(autograd.Function): fix wrapper function capture bug
      
      * feat(autograd.Function): support autograd.Function backward
      
      * feat(autograd.Function): refine apply return value
      
      * fix(autograd.Function): fix autograd.Function name bug
      
      * feat(autograd.Function): refine ctx python api
      
      * feat(*): refine apply interface
      
      * test(autograd.Function): fix ctx interface and add test
      
      * feat(autograd.Function): support mark_non_differentiable
      
      * align ctx.saved_tensors interface
      
      * docs(autograd.Function): export documentation
      
      * refine function names
      
      * refine interface
      
      * use py::args instead of py::object
      
      * refine code
      
      * fix(*): fix `func_name` variable conflict with CHECK_JUST
      
      * feat(autograd.Function): support static call
      
      * docs(autograd.Function): update documentation
      
      * refine code
      
      * add JUST
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      e96a5259
    • J
      Interface primitive::BroadcastElementwiseBinary (#6629) · 1cbefd2d
      Juncheng committed
      * Interface primitive::BroadcastElementwiseBinary

      * refine
      Co-authored-by: guo ran <360112263@qq.com>
      1cbefd2d
  6. 27 Oct, 2021 4 commits
  7. 26 Oct, 2021 4 commits
    • J
      fix build permute_test.cpp (#6608) · 686ac9e8
      Juncheng committed
      686ac9e8
    • Z
      Fix Prelu SBP (#6619) · 921bddc2
      ZZK committed
      * fix sbp for prelu

      * auto format by CI
      Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      921bddc2
    • L
      Improve roll op speed (#6618) · eb7d2a01
      Liang Depeng committed
      * improve roll speed

      * improve speed of len(dims) > 1 cases
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      eb7d2a01
    • Z
      Dev Batch Permute (#6441) · bca2e098
      ZZK committed
      * dev torch style permute kernel
      
      * Refine
      
      * fix batch permute launch condition
      
      * fix batch permute dispatch logic
      
      * remove redundant header file
      
      * simplified check logic
      
      * use permute primitives in transpose kernels
      
      * fix batch permute logic and avoid mod
      
      * remove redundant templates
      
      * fix grid step
      
      * add grid for loop to avoid the elementnum is too large
      
      * fix bug when hw is not divided by tile size
      
      * refine format
      
      * add a copy kernel as a baseline
      
      * remove annotation
      
      * add copy kernel
      
      * add sync
      
      * use batch permute for profile
      
      * add copy tile baseline
      
      * simplify params for copy kernel
      
      * add slow copy kernel
      
      * use mul instead of mod and remove copy
      
      * use movement size = 4 when h w is modify by 2
      
      * Add temp process for half2
      
      * add half2 specialized kernel
      
      * remove redundant license
      
      * simplified code
      
      * fix format
      
      * fix comment
      
      * fix comment
      
      * use bad for loop condition
      
      * merge half2 in load
      
      * fix bad for loop in batch permute
      
      * refine
      
      * use align storage
      
      * refine
      
      * fix comment
      
      * fix comment
      
      * fix format
      
      * add const and remove redundant header file
      
      * remove register macro
      
      * refine cuda code
      
      * fix guoran comment
      
      * fix format
      
      * fix some details
      
      * remove cuda graph
      
      * fix for 0d tensor
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      bca2e098