- 01 Nov, 2021 1 commit
-
-
Committed by Zhanghuihong Guan
* initial commit, add code for async construct tensor from numpy array * inital commit to change Maybe to Optional * delete redundant code * replace Maybe with Optional * fix compile errors * format code * changes based on review * format code, fix based on review * format code * fix multiclient type * changes based on review * changes based on review * unify calling to IsMultiClirnt * refector multi_client related code * restore InMultiClient interface * double check for unnecessary changes * remove unnecessary changes * format code * Update oneflow/api/python/symbol/job_conf_symbol.cpp * Update oneflow/api/python/symbol/op_conf_symbol.cpp * Update oneflow/api/python/symbol/op_node_signature_symbol.cpp * Update oneflow/core/common/optional.h * Update oneflow/api/python/symbol/string_symbol.cpp * Update oneflow/api/python/symbol/scope_symbol.cpp * Update oneflow/api/python/symbol/placement_symbol.cpp * Update oneflow/api/python/symbol/op_conf_symbol.cpp Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: Twice <i@twice.moe>
-
- 15 Sep, 2021 1 commit
-
-
Committed by qq_22305325
* mv_boxing_folder_to_core * minor fix * cpu_eager_s0_to_s0 * refine * refine * Update naive_s0_to_s0_boxing.cpp * refine * refine * Update eager_s0_to_s0_op.cpp * refine * refine * minor fix * refine * refactor with TensorSliceView && support asymmetric S(x) To S(y) * refine * refine * refine * minor fix * rename files * gpu naive s to s support * add micro judge * del outdate head file * minor fix * refine * refine * refine * refine * refine Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
-
- 10 Sep, 2021 1 commit
-
-
Committed by leaves-zwx
* change * ofrecord_dataset compitable * add comment * specify 1n1d sbp Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
-
- 03 Sep, 2021 1 commit
-
-
Committed by Li Xinqi
* GetBroadcastGroup * fix comment typo. * broadcast shape and dtype * 1) rm THREAD_LOCAL_CACHED; 2) fix bugs in ThreadLocal * fix wrong use of LocalRank * 1) a decorator for disabling recursive boxing call; 2) a decorator for checking consistent tensor meta. * don't set consistent_id when recursively calling eager consistent op interpreter. * decompose nd_sbp boxing * disable checking consistent tensor meta recursively. * GetDecomposableEquivalent * fix a unittest case bug * fix a bug in unittest * fix compiler complain * add unitests for CalcDecomposableEquivalentShapeAndNdSbpPair * InitNdSbpValidTransformationAxisSequence * DecomposeIntoNaiveTransformations * fix compiler complains * move several unitests in parallel_desc_test.cpp into placement_sbp_util_test.cpp * abstract_consistent_to_consistent_op_expr * fix compiler complaint * refactor consistent-to-consistent eager consisitent op interpreter * fix compiler complaint * refactor ConsistentToConsistentOpExpr * lazy interpreter (#5903) * fix bugs about consistent_id * refactor functional::ToConsistent * refactor GetNdSbp * fix compiler complaints * upgrade gtest and fix static check error * update head file index * fix bug * modify path of gtest lib * refactor NaiveNdSbpBoxingInterpreter to BoxingExpr(symmetric-nd-sbp-to-nd-sbp) * fix compiler complaints * Update gmock_headers.txt * Update gtest_headers.txt * fix bug about disable checking consistent meta in local to consistent functor * fix include bug Co-authored-by: clackhan <han_binbin@163.com> Co-authored-by: Nleaves-zwx <kunta0932@gmail.com> Co-authored-by: Noneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: Nliufengwei <2472937968@qq.com> Co-authored-by: NTwice <i@twice.moe> Co-authored-by: NShenghang Tsai <jackalcooper@gmail.com>
-
- 31 Aug, 2021 1 commit
-
-
Committed by liufengwei0103
* add loop p2b test * refine * refine * use more data * use more data * refactor test_sync_and_async_allreduce.py * sequential all comm_net ops * remove unused a bash test file * reset pool_size of async_launced_nccl to high water mark * Don't sequentail stream comm_net and stream async_launched_nccl * sequential nccl if defined(WITH_CUDA) * EnvGlobalObjectsScope has no responsibility for MakeParallelDesc4Device * auto format by CI * Update object_msg_core.h NOLINT * auto format by CI * default op_device * refactor flow.F.xxx to flow._C.xxx Co-authored-by: Xinqi Li <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
-
- 30 Aug, 2021 1 commit
-
-
Committed by ZZK
* fix error * fix unimplemented to unimplementederror * fix Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
-
- 20 Aug, 2021 2 commits
-
-
Committed by Li Xinqi
* cuda base cpu mpi boxing * cpu_mpi * fix conflicts * add cpu mpi unittests * more checks and unittests * abstract_consistent_to_consistent_op_expr * fix compiler complaint * refactor consistent-to-consistent eager consisitent op interpreter * fix compiler complaint * refactor ConsistentToConsistentOpExpr * lazy interpreter (#5903) * fix bugs about consistent_id * more test_consistent_cast unittests * refactor functional::ToConsistent * refactor GetNdSbp * fix compiler complaints * refactor GetDevice4CurrentProcessCtx * fix error Co-authored-by: clackhan <han_binbin@163.com> Co-authored-by: leaves-zwx <kunta0932@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
-
Committed by qq_22305325
* abstract_consistent_to_consistent_op_expr * fix compiler complaint * refactor consistent-to-consistent eager consisitent op interpreter * fix compiler complaint * refactor ConsistentToConsistentOpExpr * lazy interpreter (#5903) * fix bugs about consistent_id * refactor functional::ToConsistent * refactor GetNdSbp * Update eager_consistent_op_interpreter.cpp * Update eager_mirrored_op_interpreter.cpp * fix error * fix error * auto format by CI * Update nd_sbp.h * refine identity boxing * fix sync checkmeta error * avoid consistent id check in lazy Co-authored-by: Xinqi Li <lixinqi0703106@163.com> Co-authored-by: leaves-zwx <kunta0932@gmail.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
-
- 13 Aug, 2021 1 commit
-
-
Committed by Li Xinqi
* GetBroadcastGroup * fix comment typo. * broadcast shape and dtype * 1) rm THREAD_LOCAL_CACHED; 2) fix bugs in ThreadLocal * fix wrong use of LocalRank * 1) a decorator for disabling recursive boxing call; 2) a decorator for checking consistent tensor meta. * don't set consistent_id when recursively calling eager consistent op interpreter. * refactor tensor_rpc_util.h * add GlobalProcessCtx::NodeId and GetParallelId4CurrentProcessCtx * fix compiler complain * fix compiler complain * address pr comments Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: cheng cheng <472491134@qq.com>
-
- 11 Aug, 2021 1 commit
-
-
Committed by daquexian
* add consistent tensor meta in stateful local opkernel Signed-off-by: daquexian <daquexian566@gmail.com> * cache parallel ctx Signed-off-by: daquexian <daquexian566@gmail.com> * reformat Signed-off-by: daquexian <daquexian566@gmail.com> * refine Signed-off-by: daquexian <daquexian566@gmail.com> * delete CachedGetParallelContext4CurrentProcessCtx for now Signed-off-by: daquexian <daquexian566@gmail.com> * refine and add tests Signed-off-by: daquexian <daquexian566@gmail.com> * ThreadLocalCopiable -> ThreadLocal Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
-
- 10 Aug, 2021 1 commit
-
-
Committed by leaves-zwx
* get parallel_id and parallel_num through rank and world size in dpp * address review * coco reader support parallel distribution * fix that device not set * test parallel for ofrecord reader * update test * update test * erase illegal check * test success * add GPTIndexedBinDataReader module * test distributed GPTIndexedBinDataReader * fix that LogicalTensorDesc4ArgNameAndIndex unimplemented in local kernel init context * graph handle TensorTuple output * fix COCOReader forward * update test_gpt_data_loader * test_coco_reader * auto format by CI * fix block call return not supporting tensor tuple * fix IsMirroredParallelContext * convert TensorTuple to tuple of Tensor * check Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
-
- 03 Aug, 2021 1 commit
-
-
Committed by qq_22305325
* support_tensor_to/to_local * export consistent_tensor.to_local() * refine code * export tensor.to()... * refine code * refine code * optimize code * refine code * refine * back up * add tensor.to func * make of_format * remove to in pyTensor * sync gpu data * refine * refine * refine * refine * refine * refine * refine * refine * refine * backup * refine * rebase * check in gen py * merge master and fix bugs * address pr comments * address pr comments * auto format by CI * remove boxing * refine * Fix optional * remove to in tensor.cpp * update * Support symbol placement type in functional. * add sbp and sbp list arg * refine * use functional * refactor CastConsistentOpExpr * to_consistent(flow.B) backward * Cache op expr * add EagerNcclOpKernelState * refine * refine * refine * refine * refine * refine * minor fix * capture OpInterpContext * unimplemented apply * add GetNdSbp * add mutex * refine * merge EagerConsistentTensorImpl::NewWithPhyTensor and EagerConsistentTensorImpl::NewWithoutPhyTensor into EagerConsistentTensorImpl::New * rename functiona SyncData to SyncMetaAndData * of_format * add to_local to pybind * add placement_sbp_util * minor fix * sync shape and data when tensor_to_local * fix rpc_token bugs * refactor AsyncRpcCtx * set logical_shape correctly * simplify implementation of consistent_tensor.to_local * initialize rpc_token with zero * refactor grad functions of to_consistent/to_local * reformat and address pr comment * reformat * refactor eager_nccl_reduce lernel Co-authored-by: Ntsai <jackalcooper@gmail.com> Co-authored-by: NXinqi Li <lixinqi0703106@163.com> Co-authored-by: NLi Xinqi <lixinqi2010@gmail.com> Co-authored-by: Noneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Nhjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: Noneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
-
- 31 Jul, 2021 1 commit
-
-
Committed by Li Xinqi
* rebase * check in gen py * merge master and fix bugs * address pr comments * address pr comments * auto format by CI * rebase * address pr comments * auto format by CI * functional python_arg * reuse ctrl rpc token for avoiding long time timeout waiting. * fix compiler complaints * auto format by CI * auto format by CI * remove unused files * fix return type error on gcc 4.8.5 Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix return type error in xrt Signed-off-by: daquexian <daquexian566@gmail.com> * fix tick ibn sbp signature * auto format by CI Co-authored-by: tsai <jackalcooper@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: daquexian <daquexian566@gmail.com>
-
- 30 Jul, 2021 1 commit
-
-
Committed by Li Xinqi
* rebase * check in gen py * merge master and fix bugs * address pr comments * address pr comments * auto format by CI * functional python_arg * auto format by CI * remove unused files * fix return type error on gcc 4.8.5 Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix return type error in xrt Signed-off-by: daquexian <daquexian566@gmail.com> * fix tick ibn sbp signature * auto format by CI Co-authored-by: tsai <jackalcooper@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: daquexian <daquexian566@gmail.com>
-
- 01 Jul, 2021 1 commit
-
-
Committed by Li Xinqi
* Device::compute_dep_object_ * sequantialize instructions in the same stream. * refactor AttrMap * refactor Tensor * Export ConsistentTensor::is_cuda * remove ConsistentTensor::blob_object * refactor TensorImpl * minor fix * fix compiler' complains * Implements EagerConsistentTensorImpl::New * minor fix * fix compiler complains * remove unused code * skip test_creating_consistent_tensor * backup code * Symbol::shared_from_symbol * remove redundant header file includes * fix bug in Symbol::shared_from_symbol * symbolize ParallelDesc and ParallelDistribution * symbolize Scope::GetParallelDesc() * IsScalarType * fix compiler complains * InputConsistentTensorMeta * refactor Scope with PlacementScope * fix bug in exporting Scope to python * backup code * refactor DType * fix compiler complains * backup code * DType is only allowed to be used in python code * backup code * dtype api bugfix * fix error on exiting Signed-off-by: Ndaquexian <daquexian566@gmail.com> * lazily get rank Signed-off-by: Ndaquexian <daquexian566@gmail.com> * Export const DType* into python * minor fix * fix bug * refine * refactor signature of OpExpr::InferLogicalShapeAndDtype * fix bug * backup_code * fix bug * refactor SbpXXX to cfg::SbpXXX * merge refactor_sbp_to_cfg_sbp * fix bug * Infer ConsistentTensorMeta * Implement EagerConsistentInterpret::ApplyImpl * 1) move XXXTensorMeta into the new file tensor_meta.h; 2) add new Class ConsistentTensorInferCache * add class ConsistentTensorInferResult * remove unused OpArgMutConsistentTensorMeta::parallel_distribution_ * fix stack-overflow bug in Tensor::mut_eager_mirrored_tensor_impl * ignore empty parallel distribution constaint * fix bug * add explicit of cfg * fix xla compile bug * auto format by CI * fix according comment * fix bug Co-authored-by: Noneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: clackhan <han_binbin@163.com> Co-authored-by: Ndaquexian <daquexian566@gmail.com> Co-authored-by: NShenghang Tsai 
<jackalcooper@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
-
- 24 Apr, 2021 1 commit
-
-
Committed by daquexian
* PhyInstrOperand * CHECK_NOTNULL * LocalCallOpKernelUtil * implement LocalCallOpKernelUtil * fix WithOpInferContext/WithComputeInferContext * fix tensor->blob_object() to tensor->eager_blob_object() * init commit Signed-off-by: Ndaquexian <daquexian566@gmail.com> * update Signed-off-by: Ndaquexian <daquexian566@gmail.com> * init more ctx in constructor Signed-off-by: Ndaquexian <daquexian566@gmail.com> * update Signed-off-by: Ndaquexian <daquexian566@gmail.com> * update Signed-off-by: Ndaquexian <daquexian566@gmail.com> * test Signed-off-by: Ndaquexian <daquexian566@gmail.com> * set parallel_desc according to scope Signed-off-by: Ndaquexian <daquexian566@gmail.com> * update Signed-off-by: Ndaquexian <daquexian566@gmail.com> * LogicalRun -> PhysicalRun * refine stateful op kernel * refine * refine * build eager blob object list before calling builder, rename TensorsPtr Signed-off-by: Ndaquexian <daquexian566@gmail.com> * Fix device CHECK_EQ * update * update Signed-off-by: Ndaquexian <daquexian566@gmail.com> * update * refine * code style updates * add const quantifiers Signed-off-by: Ndaquexian <daquexian566@gmail.com> * fix comments * update tests Signed-off-by: Ndaquexian <daquexian566@gmail.com> * revert api/python/symbol/placement_symbol.cpp Signed-off-by: Ndaquexian <daquexian566@gmail.com> * update ForEachOutputTensor, replace auto with const auto& Signed-off-by: Ndaquexian <daquexian566@gmail.com> * add local dep objects in local opkernel Signed-off-by: Ndaquexian <daquexian566@gmail.com> Co-authored-by: Nlixinqi <lixinqi0703106@163.com> Co-authored-by: Noneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
-
- 31 Mar, 2021 1 commit
-
-
Committed by cheng cheng
* Insert NCCL logical op pass support hierarchy * Add NCCL logical 2D SBP op/kernel support (*P)->(*B) * Add NCCL logical 2D SBP op/kernel support (P*)->(B*) * Fix bug and support (*, S(0)) -> (*, B) [dim1:AllGather] and (*, S(in)) -> (*, S(out)) [dim1:All2All] * Fix BUG and runnable * fix hierarchy equal bug
-
- 23 Mar, 2021 1 commit
-
-
Committed by qq_22305325
* add_node_size_config * add IsThisProcessMaster * fix error * add CHECK_EQ * mutil_process_on_single_client * fix error * minor fix * fix error * fix bug * new_ci_about_mutil_process_on_single_clien * fix unittest.py * refactor parallel_desc * fix bug * backup code * fix ResourceDesc::ResourceDesc * make of_format * fix test bug * fix bug * backup * minor fix * backup * minor fix * del useless log * optimize code * add ci test * Update test.yml * minor fix * backup * run callback_notifier after all critical sections done * group ticks originated from different critical sections * minor fix about plan display * backup * refine code * minor fix * add comment * fix path error * add CHECK * do not use oneflow_worker * minor fix Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
-
- 10 Mar, 2021 1 commit
-
-
Committed by guo ran
* parallel_conf add hierarchy * refine * fix * fix * refine * fix * fix Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
-
- 06 Jan, 2021 1 commit
-
-
Committed by qq_22305325
* remove BlobDef * fix code format * del uneless line * del uneless line * fix mistake * tmp storage * fix bug * refactor ArgBlobDef * fix bug * refactor_consist_blob * fix bug * fix code format * fix distribute test bug * fix op test bug * refactor lazy consist blob * fix distribute test bug * fix according to comment * remove const * fix code format * fix bug * refactor_mirrored_blob * fix bug * fix bug * fix bug * fix bug * rename HAS_NO_SPLIT_AXIS and HAS_NO_BATCH_AXIS * rename HAS_NO_SPLIT_AXIS and HAS_NO_BATCH_AXIS * replace oneflow_api with flow in test file * fix mirrored bug * fix EagerMirroredBlob init bug * fix get_dtype bug
-
- 26 Dec, 2020 1 commit
-
-
Committed by qq_22305325
* parallel desc with symbol_id * migrate ParallelDescSymbol * fix code format * fix bug in oneflow_testexe * Make oneflow worker docker stay alive for 6 hours * exception * except in pybind11 and python * finetune api * print traceback * fix bug * fix format * ParallelDesc::cfg_parallel_conf * remove traceback in test_checkpoint * fix python codeformat * del job_build_and_infer_cfg_error.py * optimize api struct * add CompileOptionWrongError * rename OF_COMPLIE_OPTION_EEEOR Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
-
- 10 Oct, 2020 1 commit
-
-
Committed by Li Xinqi
* TransportStreamType * no constexpr specifier for Clamp<T> * refine signature of Send * GetTransportInstructionParallelConfs * Send/Receive multi blobs by one instruction * refine: static inline -> inline static * refine comment according to google style guide * implement Send/Receive by Grpc * reimplement Send/Receive * oneflow.eager_assign_121 * BoxingInterNodeOneToOne * 2node_test_assign * rename OF_BARRIAER * add eager_2node_test.py * InitLazyGlobalSession if eager execution not enabled * remove Global<LbiDiffWatcherInfo> * add TODO() comments for OF_SESSION_BARRIER under directory core/comm_network * import atexit in python/framework/unittest.py * fix minor bug in test_assign.py * fix macro name error: PLATFORM_POSIX -> OF_PLATFORM_POSIX
-
- 07 Aug, 2020 1 commit
-
-
Committed by Juncheng
-
- 26 Jul, 2020 1 commit
-
-
Committed by qq_22305325
* refactor ParallelConf * refactor parallel_conf * fix parallel_conf init * refactor opkernel_instruction_type init/compute * fix test_cpu_only_user_op * remove notes * fux non_distributed_optimizer_pass.cpp bug * fix code style * fix a small bug Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
-
- 23 Jul, 2020 1 commit
-
-
Committed by Shenghang Tsai
* add license at root dir * check in empty files * rm space * check in script * update script * fix bug * add print * fix * add exit * add to of_format * add CI task * fix license * Revert "fix license" This reverts commit 818b6d7691d3a8b4a25dd41a47ff2c5922b8ec57. * only add once * quick fix * fix script * dont fmt empty file * fix * quick fix * fix py * add license * fix exit * add license for hpp * add license * license new vm files Co-authored-by: tsai <caishenghang@oneflow.org>
-
- 04 Jul, 2020 2 commits
- 30 Jun, 2020 1 commit
-
-
Committed by Liang Depeng
* make op kernel lookup failure message more user friendly * make op kernel lookup failure message more user friendly Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
-
- 24 Jun, 2020 2 commits
-
-
Committed by OuYang Yu
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
-
Committed by Depeng Liang
* feat: add high order bool implementationfor cpp * use 'SetIsMatchedHob' to substitute 'SetIsMatchedPred' in user_op kernel registration
-
- 23 Jun, 2020 1 commit
-
-
Committed by OuYang Yu
* Add vm code * Remove duplicate include
-
- 26 Nov, 2019 1 commit
-
-
Committed by Li Xinqi
* cmake find python note when version less 3.14 (#2286) * fix bug: reduce split kernel inplace (#2297) * Dev bias add (#2299) * use bias add * fix * bias_add * bias add half * fix * reinterpret_cast * fix half * HALF * fix * ADD_DEFAULT_KERNEL_CREATOR * fix * format * Fix dev python test (#2294) * add decode random * fix decode random actor * fix dev_python test scripts * fix batch_size test scripts * fix * Memory Version 2.0 Step 2: MemSharedAndReused between jobs (#2267) * MemBlockProto and ChunkProto * create mem block and chunk after improver * interface merge mem block and chunk between sub plans * merge chunk between jobs for memory reuse * using memory zone unique id replace memory case hash * merge interface op mem block between jobs for mem shared * gen GlobalCriticalSection by mem block id and chunk id * check mem block and chunk valid before runtime * Refactor: RegstMgr ; allocate memory by mem block and chunk instead of regst * fix bug; and pass test * fig bug: init chunk_id_count in id_manager * reuse copyHd out mem between jobs * PushPlan and PullPlan for memblock and chunk * refine merge mem block / chunk in oneflow.cpp * at(i); * GetOpName2JobId2TaskProtos functional * using output ptr; pass test AlexNet and Resnet * Dev cuda 9 arch 70 (#2318) * kCudaAlignSize = 256 * always compute_70 * __CUDA_API_VERSION >= 10000 * __CUDA_API_VERSION >= 10000 * disable_all_reduce_sequence * Fix cuda9 cudnn turing issue (#2329) * fix cuda 9 issus on turing device * CUDA_VERSION * no cuda check * bias add kernel gpu half (#2330) * mem_block=>header_mem_block (#2338) * speedup oneflow compilation * identity_sbp_conf * DropOut Version2 (#2355) * random mask like op conf; refine dropout op in python * remove useless dropout kernel conf * implement of random mask like op * refine dropout op * refine dropout grad op * refine generate dropout backward * random mask like kernel * refine dropout (grad) kernel * fix link problem for template separated compile * fix bug and 
pass test * dropout kernel for half * add check for dropout mask input data type * bugfixs * Remove IsOpFloat32() in auto_mixed_precision.cpp (#2358) * fuse op/kernl to 1 cpp * refine for review * fix bug * Refactor Kernel Registry for more flexible registration (#2363) * feat: update KernelRegistration and add KernelRegValProto * Refactor Kernel Registry for more flexible registration * Remove unused kernel_reg_value.proto * Memory Version 2.0 Step 3: MemReused in job (#2319) * use_memory_allocation_algorithm_v2 for switch improver mem block id * reuse plan task graph and ctrl edge for inferred mem block * refine interface; InJobMemSharingUtil * navie merge memory big chain; gen regst apply/release queue; handle for inplace hint regst * generate regst 2 mutual exclusion regsts * bugfix: apply should before release * interface for multi-thread run algorithm get mem block offset result * selet best algorithm to set mem block id and mem block offset * set mem block for inplace consumer regst * 3 algorithm interface * half implement of algo 1 * implement of algorithm0_OfColorImproved * runnable in 1 machine 1 device * Memory Chain * merge MemoryChain and pass Correctness test of alexnet and resnet50 * bugfixs: continues inplace consume relationship in bert-base fp16 * erase useless info in MemoryChain * implement of BfcAllocator and Tf_Bfc algorithm * use bfc algo and fix bug * only use default algo * renme in_job_* => intra_job_* * rename: InJob* => IntraJob* * rename: 1) apply_regsts_queue => alloc_regsts_queue; 2) release_regsts_queue => free_regsts_queue * rename function name in job/intra_job_mem_sharing_util.cpp * rename variable names in job/intra_job_mem_sharing_util.cpp: 1) *apply* => *alloc*; 2) *release* => *free* * refactor FindFreeOffset => FindFreeOffsetAndNewBufferSize * rename method: DeallocateRaw => FreeRaw * rename varable for review * use enum for mem reused algorithm and add python interface * fix sbp infer (#2373) * mv addr calculation out of 
decoder (#2374) * use tmp blob for temp storage (#2375) * INDEX_DATA_TYPE_SEQ (#2381) * refine include (#2382) * refine include * format format * element_wise_mul (#2383) * gather refine (#2384) * Dev fix sbp (#2388) * fix sbp * fix sbp * remove VirtualGenKernelConf * rename Read to ReadFully (#2389) * Dev parallel cast (#2391) * parallel cast * op_conf * refine * Dev auto zero padding (#2393) * auto_zero_padding * auto_zero_padding * fix * fix input_mask and token_type_id (#2398) * fix job launch (#2401) * fix sbp bug (#2402) * fix sbp * fix * add missing header files (#2410) * refactor cnn model tests (#2411) * refactor cnn model tests * reformat README.md * reformat README.md * refactor ndarray_reduce (#2412) * fix inplace reachability bug (#2413) * refactor gpu relu (#2414) * refactor gpu relu * CHECK_KERNEL_SAFE_INT32 * there may be a subtle cuda bug in ((float) x < 0) * refactor ndarray_reduce (#2405) * refactor ndarray_reduce * refactor relu/bias_add * refactor relu * refactor relu * refactor bias_add * refactor relu/bias_add * fix inplace_lbi bug * refactor addition * IsKernelSafeInt32 * CUDA_1D_KERNEL_LOOP_T * CUDA_1D_KERNEL_LOOP_T * If add (#2415) * refactor ndarray_reduce * refactor relu/bias_add * refactor relu * refactor relu * refactor bias_add * refactor relu/bias_add * fix inplace_lbi bug * refactor addition * IsKernelSafeInt32 * CUDA_1D_KERNEL_LOOP_T * CUDA_1D_KERNEL_LOOP_T * add unless oprand is nonzero * Clear session (#2416) * oneflow.clear_default_session * fix bugs in oneflow.config.machine * refactor function return type (#2417) * fix for py2 (#2418) * blob parallel conf * Pr watch scope (#2419) * pr oneflow.watch* * merge more code to pass watch_scope.py * TODO: input_blob_def.parallel_conf * fix reexport of identity op * merge dev_quick_dirty_object_detection * oneflow.cluster (#2423) * oneflow.cluster * no alias for oneflow.cluster.* * mv cpp_logging_conf from config_proto to cluster_proto * rename: cluster => env * rename: Environment => 
Session * Free port (#2427) * oneflow.cluster * no alias for oneflow.cluster.* * mv cpp_logging_conf from config_proto to cluster_proto * rename: cluster => env * rename: Environment => Session * auto find a free port for single node environment * localhost only * Dev single processor test (#2430) * oneflow.cluster * no alias for oneflow.cluster.* * mv cpp_logging_conf from config_proto to cluster_proto * rename: cluster => env * rename: Environment => Session * auto find a free port for single node environment * localhost only * single process test * Cluster::WorkerLoop * delete unnecessary OF_BARRIER_ALL * no longer fork children processes to run tests * format * fix align byte size bug (#2436) * fix align bugs (#2440) * fix: GetNumOfLoDLevels lack return * minor script fix and update * update script * remove redundant function
-
- 17 Nov, 2019 1 commit
-
-
Committed by lixinqi
-
- 17 Oct, 2019 1 commit
-
-
Committed by lixinqi
-
- 12 Oct, 2019 1 commit
-
-
Committed by lixinqi
-
- 11 Oct, 2019 1 commit
-
-
Committed by lixinqi
-
- 10 Oct, 2019 1 commit
-
-
Committed by lixinqi
-
- 24 Sep, 2019 1 commit
-
-
Committed by Niu Chong
* Dev actor msg queue (#2225) * async msg queue * EnqueueAsyncMsg * Merge wnd python (#2226) * not ready yet * segment fix * fix segment_sum bugs * 1st wide_n_deep push * Fix tick in multi node parallel (#2042) * check in fixes * fix by adding boxing method * register tick op * move code and add more check * fix typo * fix bug when filtering op nodes before adding tick * fix wheel build not adding .so (#2052) * color plan dot VERSION-2 (#2045) * run sucessfully on single GPU * fix 121 for tick (#2069) * delete unncessary multiply_grad class * speed up generate time for dot2svg (#2083) * Add axis conf to bias_add for any axis channel (#2087) * bias_add completion * follow comment * make conf axis required * Revert "Add axis conf to bias_add for any axis channel (#2087)" (#2091) This reverts commit 8679ce980ce8570bf927baeab8616ee7b93fac47. * updated * fix segment_sum_grad * fix sbp * fix segment_sum impl for data parallel * fix * remove useless code in segment_kernel_util.h * add python interface * fix sigmoid conf * fix naming error * fix typo * temp mod loss sbp * add LazyAdam * Merge branch 'dev_python' of https://github.com/Oneflow-Inc/oneflow into dev_python_widedeep * rm useless code * unsorted_segment_sum * refactor sigmoid_cross_entropy_loss_kernel to high performance * Improve sigmoid cross entropy loss grad (#2207) * remove for loop called cuda kernel * minor fix * ../oneflow/python/ops/data_ops.py (#2209) * fix lazy_adam * Merge wnd and python (#2214) * rm ActivationType from op/kernel (#2205) * refactor sigmoid_cross_entropy_loss * fix SigmoidGrad::InferBatchAxis * support part_name_prefix and part_name_suffix_length (#2208) * rename: OutRemoteBlobsResultBox => OutRemoteBlobsStatus * oneflow.watch for debug * Dev decode batch size (#2206) * rm batch_size and piece_size * merge dev_python * Update reshape_like_op.cpp (#2213) * oneflow.parallel (#2211) * oneflow.parallel * refactor split_axis => parallel * rename parallel => distribute * fix typo: *Parallel 
=> *Distribute * add blob_desc.with_split_distribute(axis) and blob_desc.with_broadcast_distribute() * merge dev_python * fix boxing: P->S(0) * check in docker build scripts (#2216) * Dev python widedeep docker (#2218) * check in docker build scripts * check in .dockerignore * rm oneflow.segment_sum * remove segment_sum * rm unused file * rm debug code * rm debug code * rm double empty lines * remove useless comments * fix send msg (#2227) * fix reduction_coefficient (#2228) * refactor ndarray for eq/ne/... * Dev kernel launch synchronized (#2230) * IsKernelLaunchSynchronized * virtual * refine * refine * seperate LOGICAL_BINARY_FUNC from ARITHMETIC_BINARY_FUNC * more static_assert * remove unused task related dot function (#2236) * remove unused task related dot function * do not output dot rank info * Dev non distributed optimizer js (#2234) * op&kernel&actor * job * job_completer * graph * format * fix pd * fix * ignore DelPlacementByOpName * fix auto tick * JobBuilder * fix * config util * fix * fix opgrade * broadcast tick * fix allreduce * balance by model size * GetSoleOutBlobSize * async_actor_msg_deque * group * AddOrMutOpsOnlyOnce * fix NcclTupleBroadcastGrad * order * set nccl order hint * op_conf * grad hint * NcclTupleBroadcastReduceSequencePass * add missed mutops * order fix * try kMdUpdtArea * fix nccl_order_hint * fix * add ti * tuple_identity_op * remove useless * group * fix dead lock * force ctrl in * sc broadcast * sort obn * group nccl * config group_size_mbyte * non_distributed_optimizer_group_size_mbyte * format * stop check * rm message sending optimization * refine lazy adam (#2244) * refine lazy adam * update * memory version 2 step 1: replace original concept about mem sharing (#2242) * mem_shared_id -> mem_block_id; mem_shared_off_set -> mem_block_offset; enable_mem_sharing->enable_reuse_mem * memory version 2 step 1: replace original concept about mem sharing * record reader multi thread (#2246) * multi thread * ComputeThreadPoolSize * 
python api
-
- 20 Sep, 2019 1 commit
-
-
Committed by Juncheng
* op&kernel&actor * job * job_completer * graph * format * fix pd * fix * ignore DelPlacementByOpName * fix auto tick * JobBuilder * fix * config util * fix * fix opgrade * broadcast tick * fix allreduce * balance by model size * GetSoleOutBlobSize * async_actor_msg_deque * group * AddOrMutOpsOnlyOnce * fix NcclTupleBroadcastGrad * order * set nccl order hint * op_conf * grad hint * NcclTupleBroadcastReduceSequencePass * add missed mutops * order fix * try kMdUpdtArea * fix nccl_order_hint * fix * add ti * tuple_identity_op * remove useless * group * fix dead lock * force ctrl in * sc broadcast * sort obn * group nccl * config group_size_mbyte * non_distributed_optimizer_group_size_mbyte * format * stop check * rm message sending optimization
-
- 17 Sep, 2019 1 commit
-
-
Committed by cheng cheng
* remove parallel policy * rm FC/rnn/embedding_look_up op/kernel * add check data parallel for conv/layer_norm op * bugfix: bias add + use math_add when batch size = 1
-