1. 01 Nov, 2021 · 1 commit
    • Change maybe to optional (#6611) · 380d2414
      Zhanghuihong Guan committed
      * initial commit, add code for async construct tensor from numpy array
      
      * initial commit to change Maybe to Optional
      
      * delete redundant code
      
      * replace Maybe with Optional
      
      * fix compile errors
      
      * format code
      
      * changes based on review
      
      * format code, fix based on review
      
      * format code
      
      * fix multiclient type
      
      * changes based on review
      
      * changes based on review
      
      * unify calling to IsMultiClirnt
      
      * refactor multi_client related code
      
      * restore InMultiClient interface
      
      * double check for unnecessary changes
      
      * remove unnecessary changes
      
      * format code
      
      * Update oneflow/api/python/symbol/job_conf_symbol.cpp
      
      * Update oneflow/api/python/symbol/op_conf_symbol.cpp
      
      * Update oneflow/api/python/symbol/op_node_signature_symbol.cpp
      
      * Update oneflow/core/common/optional.h
      
      * Update oneflow/api/python/symbol/string_symbol.cpp
      
      * Update oneflow/api/python/symbol/scope_symbol.cpp
      
      * Update oneflow/api/python/symbol/placement_symbol.cpp
      
      * Update oneflow/api/python/symbol/op_conf_symbol.cpp
      Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
      Co-authored-by: Twice <i@twice.moe>
      380d2414
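The PR above swaps `Maybe` for `Optional` in files such as `oneflow/core/common/optional.h`. The distinction it rests on is a value-or-nothing container, as opposed to a value-or-error `Maybe`. Here is a minimal Python sketch of that semantics; the class and method names are hypothetical and do not mirror OneFlow's actual C++ API.

```python
class Optional:
    """Minimal sketch of a value-or-nothing container (hypothetical, not
    OneFlow's Optional<T>). A sentinel distinguishes "no value" from a
    stored None, so Optional(None) still counts as holding a value."""
    _NONE = object()

    def __init__(self, value=_NONE):
        self._value = value

    def has_value(self):
        return self._value is not Optional._NONE

    def value_or(self, default):
        # Return the held value, or the caller-supplied default when empty.
        return self._value if self.has_value() else default
```

Unlike a `Maybe`, an empty `Optional` carries no error payload; it simply reports absence.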
  2. 15 Sep, 2021 · 1 commit
    • Naive Eager S to S (#6186) · 5c079988
      qq_22305325 committed
      * mv_boxing_folder_to_core
      
      * minor fix
      
      * cpu_eager_s0_to_s0
      
      * refine
      
      * refine
      
      * Update naive_s0_to_s0_boxing.cpp
      
      * refine
      
      * refine
      
      * Update eager_s0_to_s0_op.cpp
      
      * refine
      
      * refine
      
      * minor fix
      
      * refine
      
      * refactor with TensorSliceView && support asymmetric S(x) To S(y)
      
      * refine
      
      * refine
      
      * refine
      
      * minor fix
      
      * rename files
      
      * gpu naive s to s support
      
      * add micro judge
      
      * delete outdated header file
      
      * minor fix
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      5c079988
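The log above mentions a refactor with `TensorSliceView` to support asymmetric S(x)-to-S(y) boxing. The core of such boxing is slice arithmetic: each source rank sends each destination rank the overlap of their slices. A sketch under the assumption of a balanced split with larger shards first (function names are illustrative, not OneFlow's):

```python
def split_ranges(dim_size, num_ranks):
    # Balanced split of one dimension; larger shards come first (an
    # assumption -- OneFlow's splitter may differ in detail).
    base, rem = divmod(dim_size, num_ranks)
    ranges, start = [], 0
    for r in range(num_ranks):
        size = base + (1 if r < rem else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def s_to_s_transfers(dim_size, src_ranks, dst_ranks):
    # For S-to-S boxing on one axis, each (src, dst) pair exchanges the
    # intersection of their slices, mirroring the TensorSliceView idea.
    src = split_ranges(dim_size, src_ranks)
    dst = split_ranges(dim_size, dst_ranks)
    transfers = []
    for i, (sb, se) in enumerate(src):
        for j, (db, de) in enumerate(dst):
            lo, hi = max(sb, db), min(se, de)
            if lo < hi:
                transfers.append((i, j, (lo, hi)))
    return transfers
```

When the source and destination parallel sizes match and the split axis is the same, every transfer degenerates to a local copy.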
  3. 10 Sep, 2021 · 1 commit
  4. 03 Sep, 2021 · 1 commit
    • Decompose nd sbp boxing (#5800) · 9c464a31
      Li Xinqi committed
      * GetBroadcastGroup
      
      * fix comment typo.
      
      * broadcast shape and dtype
      
      * 1) rm THREAD_LOCAL_CACHED; 2) fix bugs in ThreadLocal
      
      * fix wrong use of LocalRank
      
      * 1) a decorator for disabling recursive boxing call; 2) a decorator for checking consistent tensor meta.
      
      * don't set consistent_id when recursively calling eager consistent op interpreter.
      
      * decompose nd_sbp boxing
      
      * disable checking consistent tensor meta recursively.
      
      * GetDecomposableEquivalent
      
      * fix a unittest case bug
      
      * fix a bug in unittest
      
      * fix compiler complain
      
      * add unitests for CalcDecomposableEquivalentShapeAndNdSbpPair
      
      * InitNdSbpValidTransformationAxisSequence
      
      * DecomposeIntoNaiveTransformations
      
      * fix compiler complains
      
      * move several unitests in parallel_desc_test.cpp into placement_sbp_util_test.cpp
      
      * abstract_consistent_to_consistent_op_expr
      
      * fix compiler complaint
      
      * refactor consistent-to-consistent eager consistent op interpreter
      
      * fix compiler complaint
      
      * refactor ConsistentToConsistentOpExpr
      
      * lazy interpreter (#5903)
      
      * fix bugs about consistent_id
      
      * refactor functional::ToConsistent
      
      * refactor GetNdSbp
      
      * fix compiler complaints
      
      * upgrade gtest and fix static check error
      
      * update header file index
      
      * fix bug
      
      * modify path of gtest lib
      
      * refactor NaiveNdSbpBoxingInterpreter to BoxingExpr(symmetric-nd-sbp-to-nd-sbp)
      
      * fix compiler complaints
      
      * Update gmock_headers.txt
      
      * Update gtest_headers.txt
      
      * fix bug about disable checking consistent meta in local to consistent functor
      
      * fix include bug
      Co-authored-by: clackhan <han_binbin@163.com>
      Co-authored-by: leaves-zwx <kunta0932@gmail.com>
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      Co-authored-by: liufengwei <2472937968@qq.com>
      Co-authored-by: Twice <i@twice.moe>
      Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
      9c464a31
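The PR title and bullets such as `DecomposeIntoNaiveTransformations` suggest breaking an nd-sbp change into single-axis steps, each of which a naive 1D boxing interpreter can handle. A hedged sketch of that idea (the real decomposition also validates axis orderings and looks for decomposable equivalents, which this omits):

```python
def decompose_nd_sbp(src_nd_sbp, dst_nd_sbp):
    # Change one hierarchy axis at a time, yielding a chain of naive
    # transformations. Sbp values are plain strings here ("S0", "B", "P"),
    # an illustrative notation rather than OneFlow's cfg::SbpParallel.
    steps, cur = [], list(src_nd_sbp)
    for axis, dst in enumerate(dst_nd_sbp):
        if cur[axis] != dst:
            nxt = cur.copy()
            nxt[axis] = dst
            steps.append((tuple(cur), tuple(nxt)))
            cur = nxt
    return steps
```

For example, ("S0", "B") to ("B", "S1") becomes two naive steps, first on axis 0 and then on axis 1.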
  5. 31 Aug, 2021 · 1 commit
  6. 30 Aug, 2021 · 1 commit
  7. 20 Aug, 2021 · 2 commits
  8. 13 Aug, 2021 · 1 commit
    • Decorator 4 disable recursive boxing call (#5796) · c071635f
      Li Xinqi committed
      * GetBroadcastGroup
      
      * fix comment typo.
      
      * broadcast shape and dtype
      
      * 1) rm THREAD_LOCAL_CACHED; 2) fix bugs in ThreadLocal
      
      * fix wrong use of LocalRank
      
      * 1) a decorator for disabling recursive boxing call; 2) a decorator for checking consistent tensor meta.
      
      * don't set consistent_id when recursively calling eager consistent op interpreter.
      
      * refactor tensor_rpc_util.h
      
      * add GlobalProcessCtx::NodeId and GetParallelId4CurrentProcessCtx
      
      * fix compiler complain
      
      * fix compiler complain
      
      * address pr comments
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      Co-authored-by: cheng cheng <472491134@qq.com>
      c071635f
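A thread-local guard is one plausible shape for "a decorator for disabling recursive boxing call": if the wrapped function is re-entered on the same thread, the body is skipped. This is a sketch of the pattern only; the names are made up and OneFlow's C++ decorator may behave differently (for example, by raising instead of returning nothing).

```python
import functools
import threading

_tls = threading.local()

def disable_recursion(fn):
    # Suppress re-entrant calls on the same thread: the outermost call runs
    # normally, any nested call returns None without executing the body.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if getattr(_tls, "active", False):
            return None  # recursive call: do nothing
        _tls.active = True
        try:
            return fn(*args, **kwargs)
        finally:
            _tls.active = False
    return wrapper
```

The same thread-local technique fits the sibling decorator mentioned above, checking consistent tensor meta only at the outermost call.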
  9. 11 Aug, 2021 · 1 commit
  10. 10 Aug, 2021 · 1 commit
    • Get parallel_id and parallel_num through rank and world size in DDP (#5717) · 963cc01b
      leaves-zwx committed
      * get parallel_id and parallel_num through rank and world size in DDP
      
      * address review
      
      * coco reader support parallel distribution
      
      * fix that device not set
      
      * test parallel for ofrecord reader
      
      * update test
      
      * update test
      
      * erase illegal check
      
      * test success
      
      * add GPTIndexedBinDataReader module
      
      * test distributed GPTIndexedBinDataReader
      
      * fix that LogicalTensorDesc4ArgNameAndIndex unimplemented in local kernel init context
      
      * graph handle TensorTuple output
      
      * fix COCOReader forward
      
      * update test_gpt_data_loader
      
      * test_coco_reader
      
      * auto format by CI
      
      * fix block call return not supporting tensor tuple
      
      * fix IsMirroredParallelContext
      
      * convert TensorTuple to tuple of Tensor
      
      * check
      Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      963cc01b
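With parallel_id taken from the rank and parallel_num from the world size, as the commit title describes, each data reader can derive its own shard directly. A sketch assuming balanced contiguous shards, one device per rank (not the exact OneFlow reader code):

```python
def reader_shard(num_samples, rank, world_size):
    # parallel_id == rank, parallel_num == world_size (the DDP assumption
    # in the commit title). Ranks with index < remainder get one extra sample.
    assert 0 <= rank < world_size
    base, rem = divmod(num_samples, world_size)
    start = rank * base + min(rank, rem)
    size = base + (1 if rank < rem else 0)
    return range(start, start + size)
```

The shards are disjoint and cover every sample exactly once, which is what the parallel-reader tests above (ofrecord, COCO, GPT index) effectively verify.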
  11. 03 Aug, 2021 · 1 commit
    • Support tensor.to()/to_local() (#5271) · a72c21d9
      qq_22305325 committed
      * support_tensor_to/to_local
      
      * export consistent_tensor.to_local()
      
      * refine code
      
      * export tensor.to()...
      
      * refine code
      
      * refine code
      
      * optimize code
      
      * refine code
      
      * refine
      
      * back up
      
      * add tensor.to func
      
      * make of_format
      
      * remove to in pyTensor
      
      * sync gpu data
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * backup
      
      * refine
      
      * rebase
      
      * check in gen py
      
      * merge master and fix bugs
      
      * address pr comments
      
      * address pr comments
      
      * auto format by CI
      
      * remove boxing
      
      * refine
      
      * Fix optional
      
      * remove to in tensor.cpp
      
      * update
      
      * Support symbol placement type in functional.
      
      * add sbp and sbp list arg
      
      * refine
      
      * use functional
      
      * refactor CastConsistentOpExpr
      
      * to_consistent(flow.B) backward
      
      * Cache op expr
      
      * add EagerNcclOpKernelState
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * minor fix
      
      * capture OpInterpContext
      
      * unimplemented apply
      
      * add GetNdSbp
      
      * add mutex
      
      * refine
      
      * merge EagerConsistentTensorImpl::NewWithPhyTensor and EagerConsistentTensorImpl::NewWithoutPhyTensor into EagerConsistentTensorImpl::New
      
      * rename functiona SyncData to SyncMetaAndData
      
      * of_format
      
      * add to_local to pybind
      
      * add placement_sbp_util
      
      * minor fix
      
      * sync shape and data when tensor_to_local
      
      * fix rpc_token bugs
      
      * refactor AsyncRpcCtx
      
      * set logical_shape correctly
      
      * simplify implementation of consistent_tensor.to_local
      
      * initialize rpc_token with zero
      
      * refactor grad functions of to_consistent/to_local
      
      * reformat and address pr comment
      
      * reformat
      
      * refactor eager_nccl_reduce kernel
      Co-authored-by: tsai <jackalcooper@gmail.com>
      Co-authored-by: Xinqi Li <lixinqi0703106@163.com>
      Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
      Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
      Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      a72c21d9
  12. 31 Jul, 2021 · 1 commit
  13. 30 Jul, 2021 · 1 commit
  14. 01 Jul, 2021 · 1 commit
    • Infer consistent tensor meta (#5118) · c3238bfd
      Li Xinqi committed
      * Device::compute_dep_object_
      
      * sequentialize instructions in the same stream.
      
      * refactor AttrMap
      
      * refactor Tensor
      
      * Export ConsistentTensor::is_cuda
      
      * remove ConsistentTensor::blob_object
      
      * refactor TensorImpl
      
      * minor fix
      
      * fix compiler complaints
      
      * Implements EagerConsistentTensorImpl::New
      
      * minor fix
      
      * fix compiler complains
      
      * remove unused code
      
      * skip test_creating_consistent_tensor
      
      * backup code
      
      * Symbol::shared_from_symbol
      
      * remove redundant header file includes
      
      * fix bug in Symbol::shared_from_symbol
      
      * symbolize ParallelDesc and ParallelDistribution
      
      * symbolize Scope::GetParallelDesc()
      
      * IsScalarType
      
      * fix compiler complains
      
      * InputConsistentTensorMeta
      
      * refactor Scope with PlacementScope
      
      * fix bug in exporting Scope to python
      
      * backup code
      
      * refactor DType
      
      * fix compiler complains
      
      * backup code
      
      * DType is only allowed to be used in python code
      
      * backup code
      
      * dtype api bugfix
      
      * fix error on exiting
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * lazily get rank
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * Export const DType* into python
      
      * minor fix
      
      * fix bug
      
      * refine
      
      * refactor signature of OpExpr::InferLogicalShapeAndDtype
      
      * fix bug
      
      * backup_code
      
      * fix bug
      
      * refactor SbpXXX to cfg::SbpXXX
      
      * merge refactor_sbp_to_cfg_sbp
      
      * fix bug
      
      * Infer ConsistentTensorMeta
      
      * Implement EagerConsistentInterpret::ApplyImpl
      
      * 1) move XXXTensorMeta into the new file tensor_meta.h; 2) add new Class ConsistentTensorInferCache
      
      * add class ConsistentTensorInferResult
      
      * remove unused OpArgMutConsistentTensorMeta::parallel_distribution_
      
      * fix stack-overflow bug in Tensor::mut_eager_mirrored_tensor_impl
      
      * ignore empty parallel distribution constraint
      
      * fix bug
      
      * add explicit of cfg
      
      * fix xla compile bug
      
      * auto format by CI
      
      * fix according comment
      
      * fix bug
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      Co-authored-by: clackhan <han_binbin@163.com>
      Co-authored-by: daquexian <daquexian566@gmail.com>
      Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
      Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
      c3238bfd
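One piece of consistent-tensor meta inference is recovering the logical shape from a physical shard's shape and its nd-sbp: a dimension split over k devices is k times larger logically, while broadcast and partial-sum leave the shape unchanged. A sketch under the assumption of evenly divisible splits; the string notation ("S0", "B", "P") is ours, not OneFlow's cfg types:

```python
def infer_logical_shape(physical_shape, nd_sbp, hierarchy):
    # For each hierarchy axis: a split sbp ("S<axis>") multiplies the split
    # tensor dimension by the number of devices on that hierarchy axis;
    # "B" (broadcast) and "P" (partial-sum) leave the shape as-is.
    shape = list(physical_shape)
    for sbp, num_devices in zip(nd_sbp, hierarchy):
        if sbp.startswith("S"):
            axis = int(sbp[1:])
            shape[axis] *= num_devices
    return tuple(shape)
```

The real `ConsistentTensorInferCache` infers dtype and output sbp as well, and caches the result per input meta.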
  15. 24 Apr, 2021 · 1 commit
  16. 31 Mar, 2021 · 1 commit
    • Feat: NCCL use compute stream support 2D SBP (#4533) · 8abe18ea
      cheng cheng committed
      * Insert NCCL logical op pass support hierarchy
      
      * Add NCCL logical 2D SBP op/kernel support (*P)->(*B)
      
      * Add NCCL logical 2D SBP op/kernel support (P*)->(B*)
      
      * Fix bug and support (*, S(0)) -> (*, B) [dim1:AllGather] and (*, S(in)) -> (*, S(out)) [dim1:All2All]
      
      * Fix BUG and runnable
      
      * fix hierarchy equal bug
      8abe18ea
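For the 2D-SBP collectives above, a dim-1 operation such as the (*, S(0)) -> (*, B) AllGather runs within each group of ranks that share the same dim-0 coordinate, and symmetrically for dim-0 ops like (P, *) -> (B, *). A sketch of the group computation, assuming a row-major rank layout over the placement hierarchy:

```python
def comm_groups(hierarchy, dim):
    # hierarchy = (rows, cols) for a 2D placement; ranks are laid out
    # row-major (an assumption). Returns the communicator groups for a
    # collective along the given hierarchy dim.
    rows, cols = hierarchy
    ranks = [[r * cols + c for c in range(cols)] for r in range(rows)]
    if dim == 1:
        return ranks                           # ranks sharing a dim-0 coord
    return [list(col) for col in zip(*ranks)]  # ranks sharing a dim-1 coord
```

With hierarchy (2, 4), a dim-1 AllGather uses two groups of four ranks; a dim-0 collective uses four groups of two.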
  17. 23 Mar, 2021 · 1 commit
  18. 10 Mar, 2021 · 1 commit
  19. 06 Jan, 2021 · 1 commit
    • Refactor python remote blob (#4081) · d399fd49
      qq_22305325 committed
      * remove BlobDef
      
      * fix code format
      
      * delete useless line
      
      * delete useless line
      
      * fix mistake
      
      * tmp storage
      
      * fix bug
      
      * refactor ArgBlobDef
      
      * fix bug
      
      * refactor_consist_blob
      
      * fix bug
      
      * fix code format
      
      * fix distribute test bug
      
      * fix op test bug
      
      * refactor lazy consist blob
      
      * fix distribute test bug
      
      * fix according to comment
      
      * remove const
      
      * fix code format
      
      * fix bug
      
      * refactor_mirrored_blob
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * rename HAS_NO_SPLIT_AXIS and HAS_NO_BATCH_AXIS
      
      * rename HAS_NO_SPLIT_AXIS and HAS_NO_BATCH_AXIS
      
      * replace oneflow_api with flow in test file
      
      * fix mirrored bug
      
      * fix EagerMirroredBlob init bug
      
      * fix get_dtype bug
      d399fd49
  20. 26 Dec, 2020 · 1 commit
  21. 10 Oct, 2020 · 1 commit
    • Eager transport (#3598) · 5181feed
      Li Xinqi committed
      * TransportStreamType
      
      * no constexpr specifier for Clamp<T>
      
      * refine signature of Send
      
      * GetTransportInstructionParallelConfs
      
      * Send/Receive multi blobs by one instruction
      
      * refine: static inline -> inline static
      
      * refine comment according to google style guide
      
      * implement Send/Receive by Grpc
      
      * reimplement Send/Receive
      
      * oneflow.eager_assign_121
      
      * BoxingInterNodeOneToOne
      
      * 2node_test_assign
      
      * rename OF_BARRIAER
      
      * add eager_2node_test.py
      
      * InitLazyGlobalSession if eager execution not enabled
      
      * remove Global<LbiDiffWatcherInfo>
      
      * add TODO() comments for OF_SESSION_BARRIER under directory core/comm_network
      
      * import atexit in python/framework/unittest.py
      
      * fix minor bug in test_assign.py
      
      * fix macro name error: PLATFORM_POSIX -> OF_PLATFORM_POSIX
      5181feed
  22. 07 Aug, 2020 · 1 commit
  23. 26 Jul, 2020 · 1 commit
    • refactor ParallelConf (#3268) · dd0786dd
      qq_22305325 committed
      * refactor ParallelConf
      
      * refactor parallel_conf
      
      * fix parallel_conf init
      
      * refactor opkernel_instruction_type init/compute
      
      * fix test_cpu_only_user_op
      
      * remove notes
      
      * fix non_distributed_optimizer_pass.cpp bug
      
      * fix code style
      
      * fix a small bug
      Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
      dd0786dd
  24. 23 Jul, 2020 · 1 commit
    • Dev apache2 license (#3266) · d0bdbd5d
      Shenghang Tsai committed
      * add license at root dir
      
      * check in empty files
      
      * rm space
      
      * check in script
      
      * update script
      
      * fix bug
      
      * add print
      
      * fix
      
      * add exit
      
      * add to of_format
      
      * add CI task
      
      * fix license
      
      * Revert "fix license"
      
      This reverts commit 818b6d7691d3a8b4a25dd41a47ff2c5922b8ec57.
      
      * only add once
      
      * quick fix
      
      * fix script
      
      * dont fmt empty file
      
      * fix
      
      * quick fix
      
      * fix py
      
      * add license
      
      * fix exit
      
      * add license for hpp
      
      * add license
      
      * license new vm files
      Co-authored-by: tsai <caishenghang@oneflow.org>
      d0bdbd5d
  25. 04 Jul, 2020 · 2 commits
  26. 30 Jun, 2020 · 1 commit
  27. 24 Jun, 2020 · 2 commits
  28. 23 Jun, 2020 · 1 commit
  29. 26 Nov, 2019 · 1 commit
    • Merge quick dirty from obj detect (#2444) · f5937569
      Li Xinqi committed
      * cmake find python note when version less 3.14 (#2286)
      
      * fix bug: reduce split kernel inplace (#2297)
      
      * Dev bias add (#2299)
      
      * use bias add
      
      * fix
      
      * bias_add
      
      * bias add half
      
      * fix
      
      * reinterpret_cast
      
      * fix half
      
      * HALF
      
      * fix
      
      * ADD_DEFAULT_KERNEL_CREATOR
      
      * fix
      
      * format
      
      * Fix dev python test (#2294)
      
      * add decode random
      
      * fix decode random actor
      
      * fix dev_python test scripts
      
      * fix batch_size test scripts
      
      * fix
      
      * Memory Version 2.0 Step 2:  MemSharedAndReused between jobs (#2267)
      
      * MemBlockProto and ChunkProto
      
      * create mem block and chunk after improver
      
      * interface merge mem block and chunk between sub plans
      
      * merge chunk between jobs for memory reuse
      
      * using memory zone unique id replace memory case hash
      
      * merge interface op mem block between jobs for mem shared
      
      * gen GlobalCriticalSection by mem block id and chunk id
      
      * check mem block and chunk valid before runtime
      
      * Refactor: RegstMgr ;  allocate memory by mem block and chunk instead of regst
      
      * fix bug; and pass test
      
      * fix bug: init chunk_id_count in id_manager
      
      * reuse copyHd out mem between jobs
      
      * PushPlan and PullPlan for memblock and chunk
      
      * refine merge mem block / chunk in oneflow.cpp
      
      * at(i);
      
      * GetOpName2JobId2TaskProtos functional
      
      * using output ptr; pass test AlexNet and Resnet
      
      * Dev cuda 9 arch 70 (#2318)
      
      * kCudaAlignSize = 256
      
      * always compute_70
      
      * __CUDA_API_VERSION >= 10000
      
      * __CUDA_API_VERSION >= 10000
      
      * disable_all_reduce_sequence
      
      * Fix cuda9 cudnn turing issue (#2329)
      
      * fix cuda 9 issus on turing device
      
      * CUDA_VERSION
      
      * no cuda check
      
      * bias add kernel gpu half (#2330)
      
      * mem_block=>header_mem_block (#2338)
      
      * speedup oneflow compilation
      
      * identity_sbp_conf
      
      * DropOut Version2 (#2355)
      
      * random mask like op conf; refine dropout op in python
      
      * remove useless dropout kernel conf
      
      * implement of random mask like op
      
      * refine dropout op
      
      * refine dropout grad op
      
      * refine generate dropout backward
      
      * random mask like kernel
      
      * refine dropout (grad) kernel
      
      * fix link problem for template separated compile
      
      * fix bug and pass test
      
      * dropout kernel for half
      
      * add check for dropout mask input data type
      
      * bugfixs
      
      * Remove IsOpFloat32() in auto_mixed_precision.cpp (#2358)
      
      * fuse op/kernl to 1 cpp
      
      * refine for review
      
      * fix bug
      
      * Refactor Kernel Registry for more flexible registration (#2363)
      
      * feat: update KernelRegistration and add KernelRegValProto
      
      * Refactor Kernel Registry for more flexible registration
      
      * Remove unused kernel_reg_value.proto
      
      * Memory Version 2.0 Step 3: MemReused in job (#2319)
      
      * use_memory_allocation_algorithm_v2 for switch improver mem block id
      
      * reuse plan task graph and ctrl edge for inferred mem block
      
      * refine interface; InJobMemSharingUtil
      
      * navie merge memory big chain; gen regst apply/release queue; handle for inplace hint regst
      
      * generate regst 2 mutual exclusion regsts
      
      * bugfix: apply should before release
      
      * interface for multi-thread run algorithm get mem block offset result
      
      * select best algorithm to set mem block id and mem block offset
      
      * set mem block for inplace consumer regst
      
      * 3 algorithm interface
      
      * half implement of algo 1
      
      * implement of algorithm0_OfColorImproved
      
      * runnable in 1 machine 1 device
      
      * Memory Chain
      
      * merge MemoryChain and pass Correctness test of alexnet and resnet50
      
      * bugfix: continuous inplace consume relationship in bert-base fp16
      
      * erase useless info in MemoryChain
      
      * implement of BfcAllocator and Tf_Bfc algorithm
      
      * use bfc algo and fix bug
      
      * only use default algo
      
      * rename in_job_* => intra_job_*
      
      * rename: InJob* => IntraJob*
      
      * rename: 1) apply_regsts_queue => alloc_regsts_queue; 2) release_regsts_queue => free_regsts_queue
      
      * rename function name in job/intra_job_mem_sharing_util.cpp
      
      * rename variable names in job/intra_job_mem_sharing_util.cpp: 1) *apply* => *alloc*; 2) *release* => *free*
      
      * refactor FindFreeOffset => FindFreeOffsetAndNewBufferSize
      
      * rename method: DeallocateRaw => FreeRaw
      
      * rename variable for review
      
      * use enum for mem reused algorithm and add python interface
      
      * fix sbp infer (#2373)
      
      * mv addr calculation out of decoder (#2374)
      
      * use tmp blob for temp storage (#2375)
      
      * INDEX_DATA_TYPE_SEQ (#2381)
      
      * refine include (#2382)
      
      * refine include
      
      * format
      
      * element_wise_mul (#2383)
      
      * gather refine (#2384)
      
      * Dev fix sbp (#2388)
      
      * fix sbp
      
      * fix sbp
      
      * remove VirtualGenKernelConf
      
      * rename Read to ReadFully (#2389)
      
      * Dev parallel cast (#2391)
      
      * parallel cast
      
      * op_conf
      
      * refine
      
      * Dev auto zero padding (#2393)
      
      * auto_zero_padding
      
      * auto_zero_padding
      
      * fix
      
      * fix input_mask and token_type_id (#2398)
      
      * fix job launch (#2401)
      
      * fix sbp bug (#2402)
      
      * fix sbp
      
      * fix
      
      * add missing header files (#2410)
      
      * refactor cnn model tests (#2411)
      
      * refactor cnn model tests
      
      * reformat README.md
      
      * reformat README.md
      
      * refactor ndarray_reduce (#2412)
      
      * fix inplace reachability bug (#2413)
      
      * refactor gpu relu (#2414)
      
      * refactor gpu relu
      
      * CHECK_KERNEL_SAFE_INT32
      
      * there may be a subtle cuda bug in ((float) x < 0)
      
      * refactor ndarray_reduce (#2405)
      
      * refactor ndarray_reduce
      
      * refactor relu/bias_add
      
      * refactor relu
      
      * refactor relu
      
      * refactor bias_add
      
      * refactor relu/bias_add
      
      * fix inplace_lbi bug
      
      * refactor addition
      
      * IsKernelSafeInt32
      
      * CUDA_1D_KERNEL_LOOP_T
      
      * CUDA_1D_KERNEL_LOOP_T
      
      * If add (#2415)
      
      * refactor ndarray_reduce
      
      * refactor relu/bias_add
      
      * refactor relu
      
      * refactor relu
      
      * refactor bias_add
      
      * refactor relu/bias_add
      
      * fix inplace_lbi bug
      
      * refactor addition
      
      * IsKernelSafeInt32
      
      * CUDA_1D_KERNEL_LOOP_T
      
      * CUDA_1D_KERNEL_LOOP_T
      
      * add unless oprand is nonzero
      
      * Clear session (#2416)
      
      * oneflow.clear_default_session
      
      * fix bugs in oneflow.config.machine
      
      * refactor function return type (#2417)
      
      * fix for py2 (#2418)
      
      * blob parallel conf
      
      * Pr watch scope (#2419)
      
      * pr oneflow.watch*
      
      * merge more code to pass watch_scope.py
      
      * TODO: input_blob_def.parallel_conf
      
      * fix reexport of identity op
      
      * merge dev_quick_dirty_object_detection
      
      * oneflow.cluster (#2423)
      
      * oneflow.cluster
      
      * no alias for oneflow.cluster.*
      
      * mv cpp_logging_conf from config_proto to cluster_proto
      
      * rename: cluster => env
      
      * rename: Environment => Session
      
      * Free port (#2427)
      
      * oneflow.cluster
      
      * no alias for oneflow.cluster.*
      
      * mv cpp_logging_conf from config_proto to cluster_proto
      
      * rename: cluster => env
      
      * rename: Environment => Session
      
      * auto find a free port for single node environment
      
      * localhost only
      
      * Dev single processor test (#2430)
      
      * oneflow.cluster
      
      * no alias for oneflow.cluster.*
      
      * mv cpp_logging_conf from config_proto to cluster_proto
      
      * rename: cluster => env
      
      * rename: Environment => Session
      
      * auto find a free port for single node environment
      
      * localhost only
      
      * single process test
      
      * Cluster::WorkerLoop
      
      * delete unnecessary OF_BARRIER_ALL
      
      * no longer fork children processes to run tests
      
      * format
      
      * fix align byte size bug (#2436)
      
      * fix align bugs (#2440)
      
      * fix: GetNumOfLoDLevels lacks a return
      
      * minor script fix and update
      
      * update script
      
      * remove redundant function
      f5937569
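The memory-reuse commits above build alloc/free queues per regst and then run offset-assignment algorithms over them (the color-improved algorithm, a BFC-style allocator, `FindFreeOffsetAndNewBufferSize`, and so on). A first-fit sketch of the core idea, far simpler than any of the listed algorithms; names here are illustrative:

```python
def assign_offsets(events, sizes):
    # Walk ("alloc", regst) / ("free", regst) events in execution order and
    # give each regst the lowest offset whose [offset, offset+size) interval
    # doesn't overlap any currently-live regst (plain first-fit).
    live, offsets = {}, {}
    for kind, regst in events:
        if kind == "free":
            live.pop(regst, None)
            continue
        size, off = sizes[regst], 0
        for lo, hi in sorted(live.values()):
            if off + size <= lo:
                break          # the gap before this live interval fits
            off = max(off, hi) # otherwise skip past it
        live[regst] = (off, off + size)
        offsets[regst] = off
    return offsets
```

The payoff is visible when a regst freed earlier lets a later one reuse its offset, shrinking the chunk compared with giving every regst its own block.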
  30. 17 Nov, 2019 · 1 commit
  31. 17 Oct, 2019 · 1 commit
  32. 12 Oct, 2019 · 1 commit
  33. 11 Oct, 2019 · 1 commit
  34. 10 Oct, 2019 · 1 commit
  35. 24 Sep, 2019 · 1 commit
    • merge with dev_python (#2249) · 3960d2cb
      Niu Chong committed
      * Dev actor msg queue (#2225)
      
      * async msg queue
      
      * EnqueueAsyncMsg
      
      * Merge wnd python (#2226)
      
      * not ready yet
      
      * segment fix
      
      * fix segment_sum bugs
      
      * 1st wide_n_deep push
      
      * Fix tick in multi node parallel (#2042)
      
      * check in fixes
      
      * fix by adding boxing method
      
      * register tick op
      
      * move code and add more check
      
      * fix typo
      
      * fix bug when filtering op nodes before adding tick
      
      * fix wheel build not adding .so (#2052)
      
      * color plan dot VERSION-2 (#2045)
      
      * run successfully on single GPU
      
      * fix 121 for tick (#2069)
      
      * delete unncessary multiply_grad class
      
      * speed up generate time for dot2svg (#2083)
      
      * Add axis conf to bias_add for any axis channel (#2087)
      
      * bias_add completion
      
      * follow comment
      
      * make conf axis required
      
      * Revert "Add axis conf to bias_add for any axis channel (#2087)" (#2091)
      
      This reverts commit 8679ce980ce8570bf927baeab8616ee7b93fac47.
      
      * updated
      
      * fix segment_sum_grad
      
      * fix sbp
      
      * fix segment_sum impl for data parallel
      
      * fix
      
      * remove useless code in segment_kernel_util.h
      
      * add python interface
      
      * fix sigmoid conf
      
      * fix naming error
      
      * fix typo
      
      * temp mod loss sbp
      
      * add LazyAdam
      
      * Merge branch 'dev_python' of https://github.com/Oneflow-Inc/oneflow into dev_python_widedeep
      
      * rm useless code
      
      * unsorted_segment_sum
      
      * refactor sigmoid_cross_entropy_loss_kernel to high performance
      
      * Improve sigmoid cross entropy loss grad (#2207)
      
      * remove for loop called cuda kernel
      
      * minor fix
      
      * ../oneflow/python/ops/data_ops.py (#2209)
      
      * fix lazy_adam
      
      * Merge wnd and python (#2214)
      
      * rm ActivationType from op/kernel (#2205)
      
      * refactor sigmoid_cross_entropy_loss
      
      * fix SigmoidGrad::InferBatchAxis
      
      * support part_name_prefix and part_name_suffix_length (#2208)
      
      * rename: OutRemoteBlobsResultBox => OutRemoteBlobsStatus
      
      * oneflow.watch for debug
      
      * Dev decode batch size (#2206)
      
      * rm batch_size and piece_size
      
      * merge dev_python
      
      * Update reshape_like_op.cpp (#2213)
      
      * oneflow.parallel (#2211)
      
      * oneflow.parallel
      
      * refactor split_axis => parallel
      
      * rename parallel => distribute
      
      * fix typo: *Parallel => *Distribute
      
      * add blob_desc.with_split_distribute(axis) and blob_desc.with_broadcast_distribute()
      
      * merge dev_python
      
      * fix boxing: P->S(0)
      
      * check in docker build scripts (#2216)
      
      * Dev python widedeep docker (#2218)
      
      * check in docker build scripts
      
      * check in .dockerignore
      
      * rm oneflow.segment_sum
      
      * remove segment_sum
      
      * rm unused file
      
      * rm debug code
      
      * rm debug code
      
      * rm double empty lines
      
      * remove useless comments
      
      * fix send msg (#2227)
      
      * fix reduction_coefficient (#2228)
      
      * refactor ndarray for eq/ne/...
      
      * Dev kernel launch synchronized (#2230)
      
      * IsKernelLaunchSynchronized
      
      * virtual
      
      * refine
      
      * refine
      
      * separate LOGICAL_BINARY_FUNC from ARITHMETIC_BINARY_FUNC
      
      * more static_assert
      
      * remove unused task related dot function (#2236)
      
      * remove unused task related dot function
      
      * do not output dot rank info
      
      * Dev non distributed optimizer js (#2234)
      
      * op&kernel&actor
      
      * job
      
      * job_completer
      
      * graph
      
      * format
      
      * fix pd
      
      * fix
      
      * ignore DelPlacementByOpName
      
      * fix auto tick
      
      * JobBuilder
      
      * fix
      
      * config util
      
      * fix
      
      * fix opgrade
      
      * broadcast tick
      
      * fix allreduce
      
      * balance by model size
      
      * GetSoleOutBlobSize
      
      * async_actor_msg_deque
      
      * group
      
      * AddOrMutOpsOnlyOnce
      
      * fix NcclTupleBroadcastGrad
      
      * order
      
      * set nccl order hint
      
      * op_conf
      
      * grad hint
      
      * NcclTupleBroadcastReduceSequencePass
      
      * add missed mutops
      
      * order fix
      
      * try kMdUpdtArea
      
      * fix nccl_order_hint
      
      * fix
      
      * add ti
      
      * tuple_identity_op
      
      * remove useless
      
      * group
      
      * fix dead lock
      
      * force ctrl in
      
      * sc broadcast
      
      * sort obn
      
      * group nccl
      
      * config group_size_mbyte
      
      * non_distributed_optimizer_group_size_mbyte
      
      * format
      
      * stop check
      
      * rm message sending optimization
      
      * refine lazy adam (#2244)
      
      * refine lazy adam
      
      * update
      
      * memory version 2 step 1: replace original concept about mem sharing (#2242)
      
      * mem_shared_id -> mem_block_id;  mem_shared_off_set -> mem_block_offset; enable_mem_sharing->enable_reuse_mem
      
      * memory version 2 step 1: replace original concept about mem sharing
      
      * record reader multi thread (#2246)
      
      * multi thread
      
      * ComputeThreadPoolSize
      
      * python api
      3960d2cb
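Among the ops this merge brings in is `unsorted_segment_sum`. Its semantics can be sketched in a few lines, using scalar elements for brevity; the real op works on tensors, runs on GPU, and has a gradient:

```python
def unsorted_segment_sum(data, segment_ids, num_segments):
    # Sum data[i] into out[segment_ids[i]]; segment ids need not be sorted.
    out = [0] * num_segments
    for x, seg in zip(data, segment_ids):
        out[seg] += x
    return out
```

This is the building block behind the wide-and-deep embedding updates mentioned in the log, where gradients for rows sharing an id must be accumulated.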
  36. 20 Sep, 2019 · 1 commit
    • Dev non distributed optimizer js (#2234) · 2b7c50b0
      Juncheng committed
      * op&kernel&actor
      
      * job
      
      * job_completer
      
      * graph
      
      * format
      
      * fix pd
      
      * fix
      
      * ignore DelPlacementByOpName
      
      * fix auto tick
      
      * JobBuilder
      
      * fix
      
      * config util
      
      * fix
      
      * fix opgrade
      
      * broadcast tick
      
      * fix allreduce
      
      * balance by model size
      
      * GetSoleOutBlobSize
      
      * async_actor_msg_deque
      
      * group
      
      * AddOrMutOpsOnlyOnce
      
      * fix NcclTupleBroadcastGrad
      
      * order
      
      * set nccl order hint
      
      * op_conf
      
      * grad hint
      
      * NcclTupleBroadcastReduceSequencePass
      
      * add missed mutops
      
      * order fix
      
      * try kMdUpdtArea
      
      * fix nccl_order_hint
      
      * fix
      
      * add ti
      
      * tuple_identity_op
      
      * remove useless
      
      * group
      
      * fix dead lock
      
      * force ctrl in
      
      * sc broadcast
      
      * sort obn
      
      * group nccl
      
      * config group_size_mbyte
      
      * non_distributed_optimizer_group_size_mbyte
      
      * format
      
      * stop check
      
      * rm message sending optimization
      2b7c50b0
  37. 17 Sep, 2019 · 1 commit