Commits · 380d2414d2ebd45dbd04b9a22a3241098790aec3 · Oneflow-Inc / oneflow

01 Nov, 2021 1 commit

Change maybe to optional (#6611) · 380d2414

Committed by Zhanghuihong Guan on Nov 01, 2021

* initial commit, add code for async construct tensor from numpy array

* inital commit to change Maybe to Optional

* delete redundant code

* replace Maybe with Optional

* fix compile errors

* format code

* changes based on review

* format code, fix based on review

* format code

* fix multiclient type

* changes based on review

* changes based on review

* unify calling to IsMultiClirnt

* refector multi_client related code

* restore InMultiClient interface

* double check for unnecessary changes

* remove unnecessary changes

* format code

* Update oneflow/api/python/symbol/job_conf_symbol.cpp

* Update oneflow/api/python/symbol/op_conf_symbol.cpp

* Update oneflow/api/python/symbol/op_node_signature_symbol.cpp

* Update oneflow/core/common/optional.h

* Update oneflow/api/python/symbol/string_symbol.cpp

* Update oneflow/api/python/symbol/scope_symbol.cpp

* Update oneflow/api/python/symbol/placement_symbol.cpp

* Update oneflow/api/python/symbol/op_conf_symbol.cpp
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: Twice <i@twice.moe>

15 Sep, 2021 1 commit

Naive Eager S to S (#6186) · 5c079988

Committed by qq_22305325 on Sep 15, 2021

* mv_boxing_folder_to_core

* minor fix

* cpu_eager_s0_to_s0

* refine

* refine

* Update naive_s0_to_s0_boxing.cpp

* refine

* refine

* Update eager_s0_to_s0_op.cpp

* refine

* refine

* minor fix

* refine

* refactor with TensorSliceView && support asymmetric S(x) To S(y)

* refine

* refine

* refine

* minor fix

* rename files

* gpu naive s to s support

* add micro judge

* del outdate head file

* minor fix

* refine

* refine

* refine

* refine

* refine
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

10 Sep, 2021 1 commit

Use attr nd_sbp to check consistent (#6222) · e1a16561

Committed by leaves-zwx on Sep 10, 2021

* change

* ofrecord_dataset compitable

* add comment

* specify 1n1d sbp
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

03 Sep, 2021 1 commit

Decompose nd sbp boxing (#5800) · 9c464a31

Committed by Li Xinqi on Sep 03, 2021

* GetBroadcastGroup

* fix comment typo.

* broadcast shape and dtype

* 1) rm THREAD_LOCAL_CACHED; 2) fix bugs in ThreadLocal

* fix wrong use of LocalRank

* 1) a decorator for disabling recursive boxing call; 2) a decorator for checking consistent tensor meta.

* don't set consistent_id when recursively calling eager consistent op interpreter.

* decompose nd_sbp boxing

* disable checking consistent tensor meta recursively.

* GetDecomposableEquivalent

* fix a unittest case bug

* fix a bug in unittest

* fix compiler complain

* add unitests for CalcDecomposableEquivalentShapeAndNdSbpPair

* InitNdSbpValidTransformationAxisSequence

* DecomposeIntoNaiveTransformations

* fix compiler complains

* move several unitests in parallel_desc_test.cpp into placement_sbp_util_test.cpp

* abstract_consistent_to_consistent_op_expr

* fix compiler complaint

* refactor consistent-to-consistent eager consisitent op interpreter

* fix compiler complaint

* refactor ConsistentToConsistentOpExpr

* lazy interpreter (#5903)

* fix bugs about consistent_id

* refactor functional::ToConsistent

* refactor GetNdSbp

* fix compiler complaints

* upgrade gtest and fix static check error

* update head file index

* fix bug

* modify path of gtest lib

* refactor NaiveNdSbpBoxingInterpreter to BoxingExpr(symmetric-nd-sbp-to-nd-sbp)

* fix compiler complaints

* Update gmock_headers.txt

* Update gtest_headers.txt

* fix bug about disable checking consistent meta in local to consistent functor

* fix include bug
Co-authored-by: clackhan <han_binbin@163.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: liufengwei <2472937968@qq.com>
Co-authored-by: Twice <i@twice.moe>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

31 Aug, 2021 1 commit

Fix sync nccl and async nccl deadlock (#6071) · d51d893b

Committed by liufengwei0103 on Aug 31, 2021

* add loop p2b test

* refine

* refine

* use more data

* use more data

* refactor test_sync_and_async_allreduce.py

* sequential all comm_net ops

* remove unused a bash test file

* reset pool_size of async_launced_nccl to high water mark

* Don't sequentail stream comm_net and stream async_launched_nccl

* sequential nccl if defined(WITH_CUDA)

* EnvGlobalObjectsScope has no responsibility for MakeParallelDesc4Device

* auto format by CI

* Update object_msg_core.h

NOLINT

* auto format by CI

* default op_device

* refactor flow.F.xxx to flow._C.xxx
Co-authored-by: Xinqi Li <lixinqi0703106@163.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

30 Aug, 2021 1 commit

Rename Error::xx to Error::xxError (#6049) · 34b411a7

Committed by ZZK on Aug 30, 2021

* fix error

* fix unimplemented to unimplementederror

* fix
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

20 Aug, 2021 2 commits

Cpu mpi (#5865) · f289c8f5

Committed by Li Xinqi on Aug 20, 2021

* cuda base cpu mpi boxing

* cpu_mpi

* fix conflicts

* add cpu mpi unittests

* more checks and unittests

* abstract_consistent_to_consistent_op_expr

* fix compiler complaint

* refactor consistent-to-consistent eager consisitent op interpreter

* fix compiler complaint

* refactor ConsistentToConsistentOpExpr

* lazy interpreter (#5903)

* fix bugs about consistent_id

* more test_consistent_cast unittests

* refactor functional::ToConsistent

* refactor GetNdSbp

* fix compiler complaints

* refactor GetDevice4CurrentProcessCtx

* fix error
Co-authored-by: clackhan <han_binbin@163.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

extract_consistent_to_consistent_op_expr (#5870) · 585479a6

Committed by qq_22305325 on Aug 20, 2021

* abstract_consistent_to_consistent_op_expr

* fix compiler complaint

* refactor consistent-to-consistent eager consisitent op interpreter

* fix compiler complaint

* refactor ConsistentToConsistentOpExpr

* lazy interpreter (#5903)

* fix bugs about consistent_id

* refactor functional::ToConsistent

* refactor GetNdSbp

* Update eager_consistent_op_interpreter.cpp

* Update eager_mirrored_op_interpreter.cpp

* fix error

* fix error

* auto format by CI

* Update nd_sbp.h

* refine identity boxing

* fix sync checkmeta error

* avoid consistent id check in lazy
Co-authored-by: Xinqi Li <lixinqi0703106@163.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

13 Aug, 2021 1 commit

Decorator 4 disable recursive boxing call (#5796) · c071635f

Committed by Li Xinqi on Aug 13, 2021

* GetBroadcastGroup

* fix comment typo.

* broadcast shape and dtype

* 1) rm THREAD_LOCAL_CACHED; 2) fix bugs in ThreadLocal

* fix wrong use of LocalRank

* 1) a decorator for disabling recursive boxing call; 2) a decorator for checking consistent tensor meta.

* don't set consistent_id when recursively calling eager consistent op interpreter.

* refactor tensor_rpc_util.h

* add GlobalProcessCtx::NodeId and GetParallelId4CurrentProcessCtx

* fix compiler complain

* fix compiler complain

* address pr comments
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: cheng cheng <472491134@qq.com>

11 Aug, 2021 1 commit

stateful local kernel supports consistent (#5789) · 05e40d7f

Committed by daquexian on Aug 11, 2021

* add consistent tensor meta in stateful local opkernel
Signed-off-by: daquexian <daquexian566@gmail.com>

* cache parallel ctx
Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat
Signed-off-by: daquexian <daquexian566@gmail.com>

* refine
Signed-off-by: daquexian <daquexian566@gmail.com>

* delete CachedGetParallelContext4CurrentProcessCtx for now
Signed-off-by: daquexian <daquexian566@gmail.com>

* refine and add tests
Signed-off-by: daquexian <daquexian566@gmail.com>

* ThreadLocalCopiable -> ThreadLocal
Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

10 Aug, 2021 1 commit

Get parallel_id and parallel_num through rank and world size in DDP (#5717) · 963cc01b

Committed by leaves-zwx on Aug 10, 2021

* get parallel_id and parallel_num through rank and world size in dpp

* address review

* coco reader support parallel distribution

* fix that device not set

* test parallel for ofrecord reader

* update test

* update test

* erase illegal check

* test success

* add GPTIndexedBinDataReader module

* test distributed GPTIndexedBinDataReader

* fix that LogicalTensorDesc4ArgNameAndIndex unimplemented in local kernel init context

* graph handle TensorTuple output

* fix COCOReader forward

* update test_gpt_data_loader

* test_coco_reader

* auto format by CI

* fix block call return not supporting tensor tuple

* fix IsMirroredParallelContext

* convert TensorTuple to tuple of Tensor

* check
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

03 Aug, 2021 1 commit

Support tensor.to()/to_local() (#5271) · a72c21d9

Committed by qq_22305325 on Aug 03, 2021

* support_tensor_to/to_local

* export consistent_tensor.to_local()

* refine code

* export tensor.to()...

* refine code

* refine code

* optimize code

* refine code

* refine

* back up

* add tensor.to func

* make of_format

* remove to in pyTensor

* sync gpu data

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* backup

* refine

* rebase

* check in gen py

* merge master and fix bugs

* address pr comments

* address pr comments

* auto format by CI

* remove boxing

* refine

* Fix optional

* remove to in tensor.cpp

* update

* Support symbol placement type in functional.

* add sbp and sbp list arg

* refine

* use functional

* refactor CastConsistentOpExpr

* to_consistent(flow.B) backward

* Cache op expr

* add EagerNcclOpKernelState

* refine

* refine

* refine

* refine

* refine

* refine

* minor fix

* capture OpInterpContext

* unimplemented apply

* add GetNdSbp

* add mutex

* refine

* merge EagerConsistentTensorImpl::NewWithPhyTensor and EagerConsistentTensorImpl::NewWithoutPhyTensor into EagerConsistentTensorImpl::New

* rename functiona SyncData to SyncMetaAndData

* of_format

* add to_local to pybind

* add placement_sbp_util

* minor fix

* sync shape and data when tensor_to_local

* fix rpc_token bugs

* refactor AsyncRpcCtx

* set logical_shape correctly

* simplify implementation of consistent_tensor.to_local

* initialize rpc_token with zero

* refactor grad functions of to_consistent/to_local

* reformat and address pr comment

* reformat

* refactor eager_nccl_reduce lernel
Co-authored-by: tsai <jackalcooper@gmail.com>
Co-authored-by: Xinqi Li <lixinqi0703106@163.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

31 Jul, 2021 1 commit

New sync consistent meta info (#5634) · 0a44b54e

Committed by Li Xinqi on Jul 31, 2021

* rebase

* check in gen py

* merge master and fix bugs

* address pr comments

* address pr comments

* auto format by CI

* rebase

* address pr comments

* auto format by CI

* functional python_arg

* reuse ctrl rpc token for avoiding long time timeout waiting.

* fix compiler complaints

* auto format by CI

* auto format by CI

* remove unused files

* fix return type error on gcc 4.8.5
Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix return type error in xrt
Signed-off-by: daquexian <daquexian566@gmail.com>

* fix tick ibn sbp signature

* auto format by CI
Co-authored-by: tsai <jackalcooper@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: daquexian <daquexian566@gmail.com>

30 Jul, 2021 1 commit

rebase (#5601) · bf4bdd62

Committed by Li Xinqi on Jul 30, 2021

* rebase

* check in gen py

* merge master and fix bugs

* address pr comments

* address pr comments

* auto format by CI

* functional python_arg

* auto format by CI

* remove unused files

* fix return type error on gcc 4.8.5
Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix return type error in xrt
Signed-off-by: daquexian <daquexian566@gmail.com>

* fix tick ibn sbp signature

* auto format by CI
Co-authored-by: tsai <jackalcooper@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: daquexian <daquexian566@gmail.com>

01 Jul, 2021 1 commit

Infer consistent tensor meta (#5118) · c3238bfd

Committed by Li Xinqi on Jul 01, 2021

* Device::compute_dep_object_

* sequantialize instructions in the same stream.

* refactor AttrMap

* refactor Tensor

* Export ConsistentTensor::is_cuda

* remove ConsistentTensor::blob_object

* refactor TensorImpl

* minor fix

* fix compiler' complains

* Implements EagerConsistentTensorImpl::New

* minor fix

* fix compiler complains

* remove unused code

* skip test_creating_consistent_tensor

* backup code

* Symbol::shared_from_symbol

* remove redundant header file includes

* fix bug in Symbol::shared_from_symbol

* symbolize ParallelDesc and ParallelDistribution

* symbolize Scope::GetParallelDesc()

* IsScalarType

* fix compiler complains

* InputConsistentTensorMeta

* refactor Scope with PlacementScope

* fix bug in exporting Scope to python

* backup code

* refactor DType

* fix compiler complains

* backup code

* DType is only allowed to be used in python code

* backup code

* dtype api bugfix

* fix error on exiting
Signed-off-by: daquexian <daquexian566@gmail.com>

* lazily get rank
Signed-off-by: daquexian <daquexian566@gmail.com>

* Export const DType* into python

* minor fix

* fix bug

* refine

* refactor signature of OpExpr::InferLogicalShapeAndDtype

* fix bug

* backup_code

* fix bug

* refactor SbpXXX to cfg::SbpXXX

* merge refactor_sbp_to_cfg_sbp

* fix bug

* Infer ConsistentTensorMeta

* Implement EagerConsistentInterpret::ApplyImpl

* 1) move XXXTensorMeta into the new file tensor_meta.h; 2) add new Class ConsistentTensorInferCache

* add class ConsistentTensorInferResult

* remove unused OpArgMutConsistentTensorMeta::parallel_distribution_

* fix stack-overflow bug in Tensor::mut_eager_mirrored_tensor_impl

* ignore empty parallel distribution constaint

* fix bug

* add explicit of cfg

* fix xla compile bug

* auto format by CI

* fix according comment

* fix bug
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: clackhan <han_binbin@163.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

24 Apr, 2021 1 commit

eager local stateful kernel (#4559) · f1f09f84

Committed by daquexian on Apr 24, 2021

* PhyInstrOperand

* CHECK_NOTNULL

* LocalCallOpKernelUtil

* implement LocalCallOpKernelUtil

* fix WithOpInferContext/WithComputeInferContext

* fix tensor->blob_object() to tensor->eager_blob_object()

* init commit
Signed-off-by: daquexian <daquexian566@gmail.com>

* update
Signed-off-by: daquexian <daquexian566@gmail.com>

* init more ctx in constructor
Signed-off-by: daquexian <daquexian566@gmail.com>

* update
Signed-off-by: daquexian <daquexian566@gmail.com>

* update
Signed-off-by: daquexian <daquexian566@gmail.com>

* test
Signed-off-by: daquexian <daquexian566@gmail.com>

* set parallel_desc according to scope
Signed-off-by: daquexian <daquexian566@gmail.com>

* update
Signed-off-by: daquexian <daquexian566@gmail.com>

* LogicalRun -> PhysicalRun

* refine stateful op kernel

* refine

* refine

* build eager blob object list before calling builder, rename TensorsPtr
Signed-off-by: daquexian <daquexian566@gmail.com>

* Fix device CHECK_EQ

* update

* update
Signed-off-by: daquexian <daquexian566@gmail.com>

* update

* refine

* code style updates

* add const quantifiers
Signed-off-by: daquexian <daquexian566@gmail.com>

* fix comments

* update tests
Signed-off-by: daquexian <daquexian566@gmail.com>

* revert api/python/symbol/placement_symbol.cpp
Signed-off-by: daquexian <daquexian566@gmail.com>

* update ForEachOutputTensor, replace auto with const auto&
Signed-off-by: daquexian <daquexian566@gmail.com>

* add local dep objects in local opkernel
Signed-off-by: daquexian <daquexian566@gmail.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

31 Mar, 2021 1 commit

Feat: NCCL use compute stream support 2D SBP (#4533) · 8abe18ea

Committed by cheng cheng on Mar 31, 2021

* Insert NCCL logical op pass support hierarchy

* Add NCCL logical 2D SBP op/kernel support (*P)->(*B)

* Add NCCL logical 2D SBP op/kernel support (P*)->(B*)

* Fix bug and support (*, S(0)) -> (*, B) [dim1:AllGather] and (*, S(in)) -> (*, S(out)) [dim1:All2All]

* Fix BUG and runnable

* fix hierarchy equal bug

23 Mar, 2021 1 commit

New ci about multi process on single client (#4367) · b7075933

Committed by qq_22305325 on Mar 23, 2021

* add_node_size_config

* add IsThisProcessMaster

* fix error

* add CHECK_EQ

* mutil_process_on_single_client

* fix error

* minor fix

* fix error

* fix bug

* new_ci_about_mutil_process_on_single_clien

* fix unittest.py

* refactor parallel_desc

* fix bug

* backup code

* fix ResourceDesc::ResourceDesc

* make of_format

* fix test bug

* fix bug

* backup

* minor fix

* backup

* minor fix

* del useless log

* optimize code

* add ci test

* Update test.yml

* minor fix

* backup

* run callback_notifier after all critical sections done

* group ticks originated from different critical sections

* minor fix about plan display

* backup

* refine code

* minor fix

* add comment

* fix path error

* add CHECK

* do not use oneflow_worker

* minor fix
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

10 Mar, 2021 1 commit

Parallel conf add hierarchy (#4348) · 30a44a4e

Committed by guo ran on Mar 10, 2021

* parallel_conf add hierarchy

* refine

* fix

* fix

* refine

* fix

* fix
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

06 Jan, 2021 1 commit

Refactor python remote blob (#4081) · d399fd49

Committed by qq_22305325 on Jan 06, 2021

* remove BlobDef

* fix code format

* del uneless line

* del uneless line

* fix mistake

* tmp storage

* fix bug

* refactor ArgBlobDef

* fix bug

* refactor_consist_blob

* fix bug

* fix code format

* fix distribute test bug

* fix op test bug

* refactor lazy consist blob

* fix distribute test bug

* fix according to comment

* remove const

* fix code format

* fix bug

* refactor_mirrored_blob

* fix bug

* fix bug

* fix bug

* fix bug

* rename HAS_NO_SPLIT_AXIS and HAS_NO_BATCH_AXIS

* rename HAS_NO_SPLIT_AXIS and HAS_NO_BATCH_AXIS

* replace oneflow_api with flow in test file

* fix mirrored bug

* fix EagerMirroredBlob init bug

* fix get_dtype bug

26 Dec, 2020 1 commit

Parallel desc with symbol (#4017) · abdc6dea

Committed by qq_22305325 on Dec 25, 2020

* parallel desc with symbol_id

* migrate ParallelDescSymbol

* fix code format

* fix bug in oneflow_testexe

* Make oneflow worker docker stay alive for 6 hours

* exception

* except in pybind11 and python

* finetune api

* print traceback

* fix bug

* fix format

* ParallelDesc::cfg_parallel_conf

* remove traceback in test_checkpoint

* fix python codeformat

* del job_build_and_infer_cfg_error.py

* optimize api struct

* add CompileOptionWrongError

* rename OF_COMPLIE_OPTION_EEEOR
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

10 Oct, 2020 1 commit

Eager transport (#3598) · 5181feed

Committed by Li Xinqi on Oct 10, 2020

* TransportStreamType

* no constexpr specifier for Clamp<T>

* refine signature of Send

* GetTransportInstructionParallelConfs

* Send/Receive multi blobs by one instruction

* refine: static inline -> inline static

* refine comment according to google style guide

* implement Send/Receive by Grpc

* reimplement Send/Receive

* oneflow.eager_assign_121

* BoxingInterNodeOneToOne

* 2node_test_assign

* rename OF_BARRIAER

* add eager_2node_test.py

* InitLazyGlobalSession if eager execution not enabled

* remove Global<LbiDiffWatcherInfo>

* add TODO() comments for OF_SESSION_BARRIER under directory core/comm_network

* import atexit in python/framework/unittest.py

* fix minor bug in test_assign.py

* fix macro name error: PLATFORM_POSIX -> OF_PLATFORM_POSIX

07 Aug, 2020 1 commit

Fix parallel_desc.h include issus for user op (#3434) · 50b467cf

Committed by Juncheng on Aug 07, 2020

26 Jul, 2020 1 commit

refactor ParallelConf (#3268) · dd0786dd

Committed by qq_22305325 on Jul 25, 2020

* refactor ParallelConf

* refactor parallel_conf

* fix parallel_conf init

* refactor opkernel_instruction_type init/compute

* fix test_cpu_only_user_op

* remove notes

* fux non_distributed_optimizer_pass.cpp bug

* fix code style

* fix a small bug
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>

23 Jul, 2020 1 commit

Dev apache2 license (#3266) · d0bdbd5d

Committed by Shenghang Tsai on Jul 23, 2020

* add license at root dir

* check in empty files

* rm space

* check in script

* update script

* fix bug

* add print

* fix

* add exit

* add to of_format

* add CI task

* fix license

* Revert "fix license"

This reverts commit 818b6d7691d3a8b4a25dd41a47ff2c5922b8ec57.

* only add once

* quick fix

* fix script

* dont fmt empty file

* fix

* quick fix

* fix py

* add license

* fix exit

* add license for hpp

* add license

* license new vm files
Co-authored-by: tsai <caishenghang@oneflow.org>

04 Jul, 2020 2 commits

Add function CheckWithResourceDesc (#3117) · c3f95bc5

Committed by OuYang Yu on Jul 04, 2020

* Add function CheckWithResourceDesc

* Modify function CheckWithResourceDesc

* Modify function CheckWithResourceDesc

* Function CheckWithResourceDesc properties private

* remove check

migrate cpp code of eager (#3126) · cb467383

Committed by OuYang Yu on Jul 04, 2020

30 Jun, 2020 1 commit

make op kernel lookup failure message more user friendly (#3099) · 1dc47220

Committed by Liang Depeng on Jun 30, 2020

* make op kernel lookup failure message more user friendly

* make op kernel lookup failure message more user friendly
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>

24 Jun, 2020 2 commits

Modify DeviceTag4DeviceType func return to Maybe<const char*> (#3077) · f652693c

Committed by OuYang Yu on Jun 24, 2020

Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>

Add high order bool implementationfor cpp (#3014) · 91fe8b3d

Committed by Depeng Liang on Jun 24, 2020

* feat: add high order bool implementationfor cpp

* use 'SetIsMatchedHob' to substitute 'SetIsMatchedPred' in user_op kernel registration

23 Jun, 2020 1 commit

Dev migrate vm from dev eager (#3059) · e84b364f

Committed by OuYang Yu on Jun 23, 2020

* Add vm code

* Remove duplicate include

26 Nov, 2019 1 commit

Merge quick dirty from obj detect (#2444) · f5937569

Committed by Li Xinqi on Nov 26, 2019

* cmake find python note when version less 3.14 (#2286)

* fix bug: reduce split kernel inplace (#2297)

* Dev bias add (#2299)

* use bias add

* fix

* bias_add

* bias add half

* fix

* reinterpret_cast

* fix half

* HALF

* fix

* ADD_DEFAULT_KERNEL_CREATOR

* fix

* format

* Fix dev python test (#2294)

* add decode random

* fix decode random actor

* fix dev_python test scripts

* fix batch_size test scripts

* fix

* Memory Version 2.0 Step 2:  MemSharedAndReused between jobs (#2267)

* MemBlockProto and ChunkProto

* create mem block and chunk after improver

* interface merge mem block and chunk between sub plans

* merge chunk between jobs for memory reuse

* using memory zone unique id replace memory case hash

* merge interface op mem block between jobs for mem shared

* gen GlobalCriticalSection by mem block id and chunk id

* check mem block and chunk valid before runtime

* Refactor: RegstMgr ;  allocate memory by mem block and chunk instead of regst

* fix bug; and pass test

* fig bug: init chunk_id_count in id_manager

* reuse copyHd out mem between jobs

* PushPlan and PullPlan for memblock and chunk

* refine merge mem block / chunk in oneflow.cpp

* at(i);

* GetOpName2JobId2TaskProtos functional

* using output ptr; pass test AlexNet and Resnet

* Dev cuda 9 arch 70 (#2318)

* kCudaAlignSize = 256

* always compute_70

* __CUDA_API_VERSION >= 10000

* __CUDA_API_VERSION >= 10000

* disable_all_reduce_sequence

* Fix cuda9 cudnn turing issue (#2329)

* fix cuda 9 issus on turing device

* CUDA_VERSION

* no cuda check

* bias add kernel gpu half (#2330)

* mem_block=>header_mem_block (#2338)

* speedup oneflow compilation

* identity_sbp_conf

* DropOut Version2 (#2355)

* random mask like op conf; refine dropout op in python

* remove useless dropout kernel conf

* implement of random mask like op

* refine dropout op

* refine dropout grad op

* refine generate dropout backward

* random mask like kernel

* refine dropout (grad) kernel

* fix link problem for template separated compile

* fix bug and pass test

* dropout kernel for half

* add check for dropout mask input data type

* bugfixs

* Remove IsOpFloat32() in auto_mixed_precision.cpp (#2358)

* fuse op/kernl to 1 cpp

* refine for review

* fix bug

* Refactor Kernel Registry for more flexible registration (#2363)

* feat: update KernelRegistration and add KernelRegValProto

* Refactor Kernel Registry for more flexible registration

* Remove unused kernel_reg_value.proto

* Memory Version 2.0 Step 3: MemReused in job (#2319)

* use_memory_allocation_algorithm_v2 for switch improver mem block id

* reuse plan task graph and ctrl edge for inferred mem block

* refine interface; InJobMemSharingUtil

* navie merge memory big chain; gen regst apply/release queue; handle for inplace hint regst

* generate regst 2 mutual exclusion regsts

* bugfix: apply should before release

* interface for multi-thread run algorithm get mem block offset result

* selet best algorithm to set mem block id and mem block offset

* set mem block for inplace consumer regst

* 3 algorithm interface

* half implement of algo 1

* implement of algorithm0_OfColorImproved

* runnable in 1 machine 1 device

* Memory Chain

* merge MemoryChain and pass Correctness test of alexnet and resnet50

* bugfixs: continues inplace consume relationship in bert-base fp16

* erase useless info in MemoryChain

* implement of BfcAllocator and Tf_Bfc algorithm

* use bfc algo and fix bug

* only use default algo

* renme in_job_* => intra_job_*

* rename: InJob* => IntraJob*

* rename: 1) apply_regsts_queue => alloc_regsts_queue; 2) release_regsts_queue => free_regsts_queue

* rename function name in job/intra_job_mem_sharing_util.cpp

* rename variable names in job/intra_job_mem_sharing_util.cpp: 1) *apply* => *alloc*; 2) *release* => *free*

* refactor FindFreeOffset => FindFreeOffsetAndNewBufferSize

* rename method: DeallocateRaw => FreeRaw

* rename varable for review

* use enum for mem reused algorithm and add python interface

* fix sbp infer (#2373)

* mv addr calculation out of decoder (#2374)

* use tmp blob for temp storage (#2375)

* INDEX_DATA_TYPE_SEQ (#2381)

* refine include (#2382)

* refine include

* format


format

* element_wise_mul (#2383)

* gather refine (#2384)

* Dev fix sbp (#2388)

* fix sbp

* fix sbp

* remove VirtualGenKernelConf

* rename Read to ReadFully (#2389)

* Dev parallel cast (#2391)

* parallel cast

* op_conf

* refine

* Dev auto zero padding (#2393)

* auto_zero_padding

* auto_zero_padding

* fix

* fix input_mask and token_type_id (#2398)

* fix job launch (#2401)

* fix sbp bug (#2402)

* fix sbp

* fix

* add missing header files (#2410)

* refactor cnn model tests (#2411)

* refactor cnn model tests

* reformat README.md

* reformat README.md

* refactor ndarray_reduce (#2412)

* fix inplace reachability bug (#2413)

* refactor gpu relu (#2414)

* refactor gpu relu

* CHECK_KERNEL_SAFE_INT32

* there may be a subtle cuda bug in ((float) x < 0)

* refactor ndarray_reduce (#2405)

* refactor ndarray_reduce

* refactor relu/bias_add

* refactor relu

* refactor relu

* refactor bias_add

* refactor relu/bias_add

* fix inplace_lbi bug

* refactor addition

* IsKernelSafeInt32

* CUDA_1D_KERNEL_LOOP_T

* CUDA_1D_KERNEL_LOOP_T

* If add (#2415)

* refactor ndarray_reduce

* refactor relu/bias_add

* refactor relu

* refactor relu

* refactor bias_add

* refactor relu/bias_add

* fix inplace_lbi bug

* refactor addition

* IsKernelSafeInt32

* CUDA_1D_KERNEL_LOOP_T

* CUDA_1D_KERNEL_LOOP_T

* add unless oprand is nonzero

* Clear session (#2416)

* oneflow.clear_default_session

* fix bugs in oneflow.config.machine

* refactor function return type (#2417)

* fix for py2 (#2418)

* blob parallel conf

* Pr watch scope (#2419)

* pr oneflow.watch*

* merge more code to pass watch_scope.py

* TODO: input_blob_def.parallel_conf

* fix reexport of identity op

* merge dev_quick_dirty_object_detection

* oneflow.cluster (#2423)

* oneflow.cluster

* no alias for oneflow.cluster.*

* mv cpp_logging_conf from config_proto to cluster_proto

* rename: cluster => env

* rename: Environment => Session

* Free port (#2427)

* oneflow.cluster

* no alias for oneflow.cluster.*

* mv cpp_logging_conf from config_proto to cluster_proto

* rename: cluster => env

* rename: Environment => Session

* auto find a free port for single node environment

* localhost only

* Dev single processor test (#2430)

* oneflow.cluster

* no alias for oneflow.cluster.*

* mv cpp_logging_conf from config_proto to cluster_proto

* rename: cluster => env

* rename: Environment => Session

* auto find a free port for single node environment

* localhost only

* single process test

* Cluster::WorkerLoop

* delete unnecessary OF_BARRIER_ALL

* no longer fork children processes to run tests

* format

* fix align byte size bug (#2436)

* fix align bugs (#2440)

* fix: GetNumOfLoDLevels lack return

* minor script fix and update

* update script

* remove redundant function

f5937569

17 11月, 2019 1 次提交
- L
  
  blob parallel conf · efeccd6f
  由 lixinqi 提交于 11月 17, 2019
  
  efeccd6f
17 10月, 2019 1 次提交
- L
  
  speedup oneflow compilation · 6d50f038
  由 lixinqi 提交于 10月 17, 2019
  
  6d50f038
12 10月, 2019 1 次提交
- L
  
  ParseMachineAndDeviceIdList · 633f2baf
  由 lixinqi 提交于 10月 12, 2019
  
  633f2baf
11 10月, 2019 1 次提交
- L
  
  oneflow.debug.distribute_split · e82ebf10
  由 lixinqi 提交于 10月 11, 2019
  
  e82ebf10
10 10月, 2019 1 次提交
- L
  
  current_placement_scope.parallel_size · 89edbda4
  由 lixinqi 提交于 10月 10, 2019
  
  89edbda4
24 9月, 2019 1 次提交

merge with dev_python (#2249) · 3960d2cb

Committed by Niu Chong on Sep 24, 2019

* Dev actor msg queue (#2225)

* async msg queue

* EnqueueAsyncMsg

* Merge wnd python (#2226)

* not ready yet

* segment fix

* fix segment_sum bugs

* 1st wide_n_deep push

* Fix tick in multi node parallel (#2042)

* check in fixes

* fix by adding boxing method

* register tick op

* move code and add more check

* fix typo

* fix bug when filtering op nodes before adding tick

* fix wheel build not adding .so (#2052)

* color plan dot VERSION-2 (#2045)

* run successfully on single GPU

* fix 121 for tick (#2069)

* delete unnecessary multiply_grad class

* speed up generate time for dot2svg (#2083)

* Add axis conf to bias_add for any axis channel (#2087)

* bias_add completion

* follow comment

* make conf axis required

* Revert "Add axis conf to bias_add for any axis channel (#2087)" (#2091)

This reverts commit 8679ce980ce8570bf927baeab8616ee7b93fac47.

* updated

* fix segment_sum_grad

* fix sbp

* fix segment_sum impl for data parallel

* fix

* remove useless code in segment_kernel_util.h

* add python interface

* fix sigmoid conf

* fix naming error

* fix typo

* temp mod loss sbp

* add LazyAdam

* Merge branch 'dev_python' of https://github.com/Oneflow-Inc/oneflow into dev_python_widedeep

* rm useless code

* unsorted_segment_sum

* refactor sigmoid_cross_entropy_loss_kernel to high performance

* Improve sigmoid cross entropy loss grad (#2207)

* remove for loop called cuda kernel

* minor fix

* ../oneflow/python/ops/data_ops.py (#2209)

* fix lazy_adam

* Merge wnd and python (#2214)

* rm ActivationType from op/kernel (#2205)

* refactor sigmoid_cross_entropy_loss

* fix SigmoidGrad::InferBatchAxis

* support part_name_prefix and part_name_suffix_length (#2208)

* rename: OutRemoteBlobsResultBox => OutRemoteBlobsStatus

* oneflow.watch for debug

* Dev decode batch size (#2206)

* rm batch_size and piece_size

* merge dev_python

* Update reshape_like_op.cpp (#2213)

* oneflow.parallel (#2211)

* oneflow.parallel

* refactor split_axis => parallel

* rename parallel => distribute

* fix typo: *Parallel => *Distribute

* add blob_desc.with_split_distribute(axis) and blob_desc.with_broadcast_distribute()

* merge dev_python

* fix boxing: P->S(0)

* check in docker build scripts (#2216)

* Dev python widedeep docker (#2218)

* check in docker build scripts

* check in .dockerignore

* rm oneflow.segment_sum

* remove segment_sum

* rm unused file

* rm debug code

* rm debug code

* rm double empty lines

* remove useless comments

* fix send msg (#2227)

* fix reduction_coefficient (#2228)

* refactor ndarray for eq/ne/...

* Dev kernel launch synchronized (#2230)

* IsKernelLaunchSynchronized

* virtual

* refine

* refine

* separate LOGICAL_BINARY_FUNC from ARITHMETIC_BINARY_FUNC

* more static_assert

* remove unused task related dot function (#2236)

* remove unused task related dot function

* do not output dot rank info

* Dev non distributed optimizer js (#2234)

* op&kernel&actor

* job

* job_completer

* graph

* format

* fix pd

* fix

* ignore DelPlacementByOpName

* fix auto tick

* JobBuilder

* fix

* config util

* fix

* fix opgrade

* broadcast tick

* fix allreduce

* balance by model size

* GetSoleOutBlobSize

* async_actor_msg_deque

* group

* AddOrMutOpsOnlyOnce

* fix NcclTupleBroadcastGrad

* order

* set nccl order hint

* op_conf

* grad hint

* NcclTupleBroadcastReduceSequencePass

* add missed mutops

* order fix

* try kMdUpdtArea

* fix nccl_order_hint

* fix

* add ti

* tuple_identity_op

* remove useless

* group

* fix dead lock

* force ctrl in

* sc broadcast

* sort obn

* group nccl

* config group_size_mbyte

* non_distributed_optimizer_group_size_mbyte

* format

* stop check

* rm message sending optimization

* refine lazy adam (#2244)

* refine lazy adam

* update

* memory version 2 step 1: replace original concept about mem sharing (#2242)

* mem_shared_id -> mem_block_id;  mem_shared_off_set -> mem_block_offset; enable_mem_sharing->enable_reuse_mem

* memory version 2 step 1: replace original concept about mem sharing

* record reader multi thread (#2246)

* multi thread

* ComputeThreadPoolSize

* python api

3960d2cb

20 Sep, 2019 1 commit

Dev non distributed optimizer js (#2234) · 2b7c50b0

Committed by Juncheng on Sep 20, 2019

* op&kernel&actor

* job

* job_completer

* graph

* format

* fix pd

* fix

* ignore DelPlacementByOpName

* fix auto tick

* JobBuilder

* fix

* config util

* fix

* fix opgrade

* broadcast tick

* fix allreduce

* balance by model size

* GetSoleOutBlobSize

* async_actor_msg_deque

* group

* AddOrMutOpsOnlyOnce

* fix NcclTupleBroadcastGrad

* order

* set nccl order hint

* op_conf

* grad hint

* NcclTupleBroadcastReduceSequencePass

* add missed mutops

* order fix

* try kMdUpdtArea

* fix nccl_order_hint

* fix

* add ti

* tuple_identity_op

* remove useless

* group

* fix dead lock

* force ctrl in

* sc broadcast

* sort obn

* group nccl

* config group_size_mbyte

* non_distributed_optimizer_group_size_mbyte

* format

* stop check

* rm message sending optimization

2b7c50b0

17 Sep, 2019 1 commit

remove parallel policy; rm FC/rnn/embedding look up op/kernel (#2215) · c014bf3e

Committed by cheng cheng on Sep 17, 2019

* remove parallel policy

* rm FC/rnn/embedding_look_up op/kernel

* add check data parallel for conv/layer_norm op

* bugfix: bias add + use math_add when batch size = 1

c014bf3e

Oneflow-Inc / oneflow, last synced more than 2 years ago
