1. 01 Nov, 2021 1 commit
    • Change maybe to optional (#6611) · 380d2414
      Zhanghuihong Guan committed
      * initial commit, add code for async construct tensor from numpy array
      
      * initial commit to change Maybe to Optional
      
      * delete redundant code
      
      * replace Maybe with Optional
      
      * fix compile errors
      
      * format code
      
      * changes based on review
      
      * format code, fix based on review
      
      * format code
      
      * fix multiclient type
      
      * changes based on review
      
      * changes based on review
      
      * unify calling to IsMultiClient
      
      * refactor multi_client related code
      
      * restore InMultiClient interface
      
      * double check for unnecessary changes
      
      * remove unnecessary changes
      
      * format code
      
      * Update oneflow/api/python/symbol/job_conf_symbol.cpp
      
      * Update oneflow/api/python/symbol/op_conf_symbol.cpp
      
      * Update oneflow/api/python/symbol/op_node_signature_symbol.cpp
      
      * Update oneflow/core/common/optional.h
      
      * Update oneflow/api/python/symbol/string_symbol.cpp
      
      * Update oneflow/api/python/symbol/scope_symbol.cpp
      
      * Update oneflow/api/python/symbol/placement_symbol.cpp
      
      * Update oneflow/api/python/symbol/op_conf_symbol.cpp
      Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
      Co-authored-by: Twice <i@twice.moe>
  2. 23 Sep, 2021 1 commit
  3. 11 Sep, 2021 1 commit
  4. 08 Sep, 2021 1 commit
  5. 06 Sep, 2021 1 commit
  6. 17 Aug, 2021 1 commit
  7. 15 Aug, 2021 1 commit
  8. 12 Aug, 2021 1 commit
  9. 02 Aug, 2021 1 commit
    • 0-dim tensor support (#5552) · 62a8cd84
      Luyang committed
      * 0-dim tensor support
      
      * test case
      
      * add more test
      
      * refine
      
      * update
      
      * update default constructor
      
      * reconstruct
      
      * merge master
      
      * remove notes
      
      * remove useless codes
      
      * fix comments
      
      * fix comment
      
      * add test case
      
      * format
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * MirroredTensorMeta::MirroredTensorMeta()
      
      * support 0-dim slice
      
      * support 0-dim slice grad
      
      * refine
      
      * auto format by CI
      
      * refine
      
      * refine
      
      * auto format by CI
      
      * refine
      
      * fix slice bug
      
      * auto format by CI
      
      * fix resnet50 0-dim loss usage
      
      * fix 0-dim tensor usage in test cases
      
      * add skip test
      
      * auto format by CI
      
      * fix test_dataset
      
      * check blobdesc.shape init
      
      * auto format by CI
      
      * remove useless empty shape init
      
      * fix l1loss 0-dim error
      
      * auto format by CI
      
      * fix argmax op test
      
      * fix add_n op test
      
      * auto format by CI
      
      * fix bce loss op test
      
      * auto format by CI
      
      * fix squeeze op test
      
      * fix conv2d op test
      
      * fix xpu_shape for clip_grad_norm
      
      * auto format by CI
      
      * resolve conflict
      
      * fix multi-cpu slice_copier 0-dim bug
      
      * auto format by CI
      
      * add memory copy for 0-dim
      
      * auto format by CI
      
      * support copy0dim
      
      * refine
      
      * auto format by CI
      
      * remove unused code
      
      * fix check for kldivloss
      
      * gpu 0-dim copy
      
      * auto format by CI
      
      * fix clip_grad_norm doctest
      
      * fix reduce_ops doctest
      
      * fix argmax doctest
      
      * fix loss module doctests
      
      * fix math_ops doctests
      
      * fix norm modules doctest
      Co-authored-by: Xinqi Li <lixinqi0703106@163.com>
      Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
  10. 16 Jul, 2021 1 commit
  11. 21 Jun, 2021 1 commit
    • Refactor Memory Zone (#5072) · 50f32b61
      leaves-zwx committed
      * MemZoneId
      
      
      Former-commit-id: 7550a129f15554c5a6e480b728079e431c00be25
      
      * move mem zone id source code
      
      
      Former-commit-id: 3859fc2a0fcda2fb23e57e886a0e3f1c0833d111
      
      * revert
      
      
      Former-commit-id: 5cf3ad7caebe787918d1ca1c0467415656d9b491
      
      * refine GetProxyNode using MemZoneId
      
      
      Former-commit-id: fba035f20b44b1acce2900b86b5bd24654e0d982
      
      * refactor MemZoneId121
      
      
      Former-commit-id: 0868a6139f1cf20dc7474d0a88714e03721c8e8e
      
      * replace using IDMgr interface
      
      
      Former-commit-id: 98b5db9ed879cd1d8197efd174c6d680bec69560
      
      * fix linkage
      
      * rm useless comment
      
      * replace IsGpuMemZone
      
      * format
      
      * rm deprecated mem zone api in IDMgr
      
      * fix merge conflict error
      
      * refine mem zone id to include node index
      
      * revert added header
      
      * direct init device_id
      
      * address review
      
      * address review
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
  12. 15 Jun, 2021 1 commit
  13. 05 May, 2021 1 commit
  14. 29 Apr, 2021 1 commit
    • Pipeline Parallelism by stage buffer (#4666) · 080d8eab
      cheng cheng committed
      * Pipeline Parallelism: checkpointing insert identity buffer op
      
      * fix compiler err
      
      * identity buffer op custom out regst num
      
      * fix bug and runnable
      
      * Chain merge divide fw/bw; MemChain ignore merge; copyhd regst num hack
      
      * Pipeline buffer pass
      
      * Pipeline runnable
      
      * rollback NOT merge mem chain hack
      
      * pipeline_stage_id_hint and rollback checkpointing buffer
      
      * Pipeline buffer only. test pass.
      
      * rollback repeat hack
      
      * Remove CopyHd Hack; Add buffer cross label loader and loss
      
      * refine code for review & fix for new dtype infer
      
      * add note
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
  15. 19 Apr, 2021 2 commits
  16. 12 Apr, 2021 1 commit
  17. 07 Apr, 2021 1 commit
  18. 31 Mar, 2021 1 commit
    • Feat: NCCL use compute stream support 2D SBP (#4533) · 8abe18ea
      cheng cheng committed
      * Insert NCCL logical op pass support hierarchy
      
      * Add NCCL logical 2D SBP op/kernel support (*P)->(*B)
      
      * Add NCCL logical 2D SBP op/kernel support (P*)->(B*)
      
      * Fix bug and support (*, S(0)) -> (*, B) [dim1:AllGather] and (*, S(in)) -> (*, S(out)) [dim1:All2All]
      
      * Fix BUG and runnable
      
      * fix hierarchy equal bug
  19. 25 Mar, 2021 1 commit
  20. 23 Mar, 2021 1 commit
  21. 19 Mar, 2021 1 commit
  22. 16 Mar, 2021 1 commit
  23. 15 Mar, 2021 2 commits
    • Refine stream index getter (#4349) · 20029059
      Wang Tuo committed
      * XXId structs and IdUtil
      
      * rm useless header
      
      * update id_util by discuss
      
      * update generate common thrd id and independent thrd id by IdUtil api
      
      * minor update
      
      * use IdUtil to generate task id in UpdateTaskId
      
      * Global<IdUtil>
      
      * emplace CommNetThrdId and TickTockThrdId call
      
      * implement IDMgr MemZoneId related api with IdUtil MemZoneId api
      
      * add GenerateChainId api
      
      * replace IDMgr api with IdUtil
      
      * rm useless header
      
      * revert IDMgr mem_zone_id api
      
      * rm redefinition of GetGpuPhyIdFromMemZoneId
      
      * modify by review comment
      
      * safety modification
      
      * def TaskType hash function
      
      * XXId structs and IdUtil
      
      * rm useless header
      
      * update id_util by discuss
      
      * update generate common thrd id and independent thrd id by IdUtil api
      
      * minor update
      
      * use IdUtil to generate task id in UpdateTaskId
      
      * Global<IdUtil>
      
      * emplace CommNetThrdId and TickTockThrdId call
      
      * implement IDMgr MemZoneId related api with IdUtil MemZoneId api
      
      * add GenerateChainId api
      
      * replace IDMgr api with IdUtil
      
      * rm useless header
      
      * revert IDMgr mem_zone_id api
      
      * rm redefinition of GetGpuPhyIdFromMemZoneId
      
      * modify by review comment
      
      * safety modification
      
      * def TaskType hash function
      
      * rm old test
      
      * fix by self review
      
      * change name
      
      * fix typo and enhance error info
      
      * refactor thread manager
      
      * more check
      
      * rm AllocateCpuThrdIdEvenly
      
      * refactor StreamId and rm IdUtil
      
      * stream index generator
      
      * modify by review
      
      * update stream index
      
      * update id util
      
      * update comm net task node
      
      * add TaskIdGenerator
      
      * update task id generation
      
      * replace gen thrd_in in logical node
      
      * replace GetGpuComputeThrdId in boxing sub task graph builder
      
      * replace h2d and d2h thrd_id in CopyHdTaskNode
      
      * replace h2d and d2h thrd_id in SliceBoxingSubTskGphBuilder
      
      * update id_util header
      
      * CHECK NOTNULL stream index generator
      
      * add chain_id_generator
      
      * rm IdUtil Global New
      
      * rm stream type in thread manager
      
      * CHECK_NOTNULL stream_index_generator in logical node
      
      * update id manager
      
      * update id_util
      
      * fix compile errors
      
      * tidy code
      
      * tidy code
      
      * revert format
      
      * mv std::hash<TaskType> to task_node.h
      
      * use unique_ptr to manage thread
      
      * fix typo
      
      * format
      
      * modify by review
      
      * start up
      
      * rm chain id generator
      
      * move id serialization to independent implementation
      
      * rm useless friend
      
      * fix compile error under gcc 4.8
      
      * rm IsXxxStreamIndex
      
      * rm deprecated api in IDMgr
      
      * fix bug in CPUStreamIndexGenerator::GenerateComputeStreamIndex
      
      * refine id structs
      
      * refine id struct serialization
      
      * refine task id generator
      
      * refine StreamIndexGeneratorManager
      
      * refine copy task node
      
      * refine collective boxing sub task graph builder
      
      * refine slice boxing sub task graph builder
      
      * refine naive b2p sub task graph builder
      
      * refine logical node
      
      * refine id manager
      
      * refine thread manager
      
      * rm useless comment
      
      * remove magic number
      
      * revise header to be compatible with cpu-only compilation
      
      * more readable
      
      * fix bug
      
      * refine code
      
      * use HashCombine
      
      * replace type of bit shift const value with size_t
      
      * add testcase for fake dev
      
      * refactor mem_zone_id
      
      * reformat
      
      * add fake device allocator/deallocator
      
      * task_node InitProducedRegstMemCase add fakedev
      
      * Add stream_index_getter
      
      * format and fix tick tock task type
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * skip fake device test for now
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * refine Memcpy for fake dev
      
      * update for debug
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * some update for fake device
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * remove debug code
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * reg for fake device creating thread
      
      * minor fix
      
      * format
      
      * refine stream index getter
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * for debug
      
      * refine stream_index_getter
      
      * fix the code
      
      * delete fakedev unit test script
      
      * delete the code which is no relationship with stream_index_getter
      
      * delete test_tmp_dir
      
      * fix format
      
      * move xxx_compute_task_node.h from folder graph_impl to folder graph
      Co-authored-by: leaves-zwx <kunta0932@gmail.com>
      Co-authored-by: yaochi <later@usopp.net>
      Co-authored-by: Ldpe2G <liangdepeng@gmail.com>
      Co-authored-by: daquexian <daquexian566@gmail.com>
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
  24. 10 Mar, 2021 1 commit
  25. 02 Mar, 2021 2 commits
    • Remove AreaId (#4283) · 330cf3b8
      cheng cheng committed
      * Remove AreaId
      
      * refine check for scope symbol id
      
      * refine logical node macro
      
      * rollback error change in group_boxing_by_dst_parallel
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
    • DeviceId/StreamId/TaskId part 1 (#4226) · 7c07181c
      leaves-zwx committed
      * XXId structs and IdUtil
      
      * rm useless header
      
      * update id_util by discuss
      
      * update generate common thrd id and independent thrd id by IdUtil api
      
      * minor update
      
      * use IdUtil to generate task id in UpdateTaskId
      
      * Global<IdUtil>
      
      * emplace CommNetThrdId and TickTockThrdId call
      
      * implement IDMgr MemZoneId related api with IdUtil MemZoneId api
      
      * add GenerateChainId api
      
      * replace IDMgr api with IdUtil
      
      * rm useless header
      
      * revert IDMgr mem_zone_id api
      
      * rm redefinition of GetGpuPhyIdFromMemZoneId
      
      * modify by review comment
      
      * safety modification
      
      * def TaskType hash function
      
      * XXId structs and IdUtil
      
      * rm useless header
      
      * update id_util by discuss
      
      * update generate common thrd id and independent thrd id by IdUtil api
      
      * minor update
      
      * use IdUtil to generate task id in UpdateTaskId
      
      * Global<IdUtil>
      
      * emplace CommNetThrdId and TickTockThrdId call
      
      * implement IDMgr MemZoneId related api with IdUtil MemZoneId api
      
      * add GenerateChainId api
      
      * replace IDMgr api with IdUtil
      
      * rm useless header
      
      * revert IDMgr mem_zone_id api
      
      * rm redefinition of GetGpuPhyIdFromMemZoneId
      
      * modify by review comment
      
      * safety modification
      
      * def TaskType hash function
      
      * rm old test
      
      * fix by self review
      
      * change name
      
      * fix typo and enhance error info
      
      * refactor thread manager
      
      * more check
      
      * rm AllocateCpuThrdIdEvenly
      
      * refactor StreamId and rm IdUtil
      
      * stream index generator
      
      * modify by review
      
      * update stream index
      
      * update id util
      
      * update comm net task node
      
      * add TaskIdGenerator
      
      * update task id generation
      
      * replace gen thrd_in in logical node
      
      * replace GetGpuComputeThrdId in boxing sub task graph builder
      
      * replace h2d and d2h thrd_id in CopyHdTaskNode
      
      * replace h2d and d2h thrd_id in SliceBoxingSubTskGphBuilder
      
      * update id_util header
      
      * CHECK NOTNULL stream index generator
      
      * add chain_id_generator
      
      * rm IdUtil Global New
      
      * rm stream type in thread manager
      
      * CHECK_NOTNULL stream_index_generator in logical node
      
      * update id manager
      
      * update id_util
      
      * fix compile errors
      
      * tidy code
      
      * tidy code
      
      * revert format
      
      * mv std::hash<TaskType> to task_node.h
      
      * use unique_ptr to manage thread
      
      * fix typo
      
      * format
      
      * modify by review
      
      * rm chain id generator
      
      * move id serialization to independent implementation
      
      * rm useless friend
      
      * fix compile error under gcc 4.8
      
      * rm IsXxxStreamIndex
      
      * rm deprecated api in IDMgr
      
      * fix bug in CPUStreamIndexGenerator::GenerateComputeStreamIndex
      
      * refine id structs
      
      * refine id struct serialization
      
      * refine task id generator
      
      * refine StreamIndexGeneratorManager
      
      * refine copy task node
      
      * refine collective boxing sub task graph builder
      
      * refine slice boxing sub task graph builder
      
      * refine naive b2p sub task graph builder
      
      * refine logical node
      
      * refine id manager
      
      * refine thread manager
      
      * rm useless comment
      
      * remove magic number
      
      * revise header to be compatible with cpu-only compilation
      
      * more readable
      
      * fix bug
      
      * refine code
      
      * use HashCombine
      
      * replace type of bit shift const value with size_t
      
      * rm ProcessId and make rank as member of DeviceId
      
      * update id serialization with ProcessId update
      
      * make type definition local namespace
      
      * rm ProcessId in task graph
      
      * update DeviceId usage in logical node
      
      * update DeviceId usage in id manager
      
      * update rank usage in ThreadMgr
      
      * minor change
      
      * detail modification
      
      * tidy header
      
      * tidy header
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
  26. 26 Feb, 2021 1 commit
  27. 20 Feb, 2021 2 commits
  28. 19 Feb, 2021 1 commit
  29. 18 Feb, 2021 1 commit
    • NCCL use compute stream to memory cost & speed up (#4221) · 45697b0c
      cheng cheng committed
      * Enable insert nccl logical op pass
      
      * FindMaxConnectedSubgraphForGpuExecOrder~
      
      * through order and interface
      
      * implement of insert nccl logical op in pass
      
      * add nccl logical op using UserOp Implement and EagerNcclCommMgr
      
      * add NCCL ReduceScatter op/kernel; refine pass impl of topo order
      
      * add NCCL logical op/kernel AllGather
      
      * fix bug of reduce scatter/ all gather infer shape
      
      * refine log and note
      
      * fix compiler err build with CPU ONLY
      
      * support NCCL ALL2ALL and test pass of alexnet model parallel
      
      * rollback of diff in checkpointing_pass.cpp
      
      * rename to nccl_use_compute_stream; ResourceDesc::nccl_use_compute_stream; refine name for review; create nccl_comm_ in KernelCompute;
      
      * refine code for review
      
      * add unittest for nccl use compute stream
      
      * format test scripts
      
      * refine align
  30. 08 Feb, 2021 1 commit
  31. 05 Feb, 2021 1 commit
  32. 03 Feb, 2021 1 commit
  33. 26 Jan, 2021 1 commit
  34. 30 Nov, 2020 1 commit
  35. 27 Nov, 2020 1 commit
    • New Chain (#3874) · 65c75854
      cheng cheng committed
      * using new chain algorithm
      
      * fix bug of chain merge
      
      * fix bug of bfs search
      
      * fix order of rm empty and chain merge
      
      * Try NOT merge in MemChain
      
      * using DfsTopoForEachNodeSortByDistanceToSink for set order in graph
      
      * fix compile err
      
      * rollback for topo order
      
      * using area id split optimizer with fw/bw chain
      
      * NOT consider tick in merge chain
      
      * use area id to split optimizer chain and fw/bw chain
      
      * remove note
      
      * refine code for review
      
      * make docker container stay live 1 hour
      Co-authored-by: OuYang Yu <xuanjiuye@gmail.com>
      Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
  36. 21 Oct, 2020 1 commit