1. 01 Oct 2018, 2 commits
  2. 30 Sep 2018, 1 commit
    • Refactor Actor (#1259) · 9fda43bf
      Committed by Niu Chong
      * feat(register_slot): add the RegstSlot
      
      * feat(register_slot): update RegstSlot if
      
      * feat(actor): update member of Actor to use RegstSlot
      
      * fix(register_slot): fix the available_regst_desc_cnt init val
      
      * refine(register_slot): rename PushBack/PopFront, FindTheRegstDescId to TryPushBack/TryPopFront, HasRegstDescId
      
      * feat(regst_slot): rename ForEachCurRegstDeq/ForEachCurFrontRegst to ForEachRegstDeq/ForEachFrontRegst
      
      * feat(regst_slot): add ForChosenRegstDeq/ForChosenFrontRegst, add CHECK empty in ForEachFrontRegst
      
      * fix(register_slot): fix the CHECK empty
      
      * feat: remove actual_writeable_regst_desc_id_ from Actor, add Naive/CustomizedProducedRegst
      
      * fix(normal_model_update_actor): bug: not send customized regst to consumer when SendIntialModel
      
      * fix(normal_forward_compute_actor): bug: not add kLoss/kAccuracy produced regst to NaiveProducedRegst
      
      * fix(actor): UNIMPLEMENTED() for AsyncSendCustomizedProducedRegstMsgToConsumer
      
      * fix(normal_forward_compute_actor): set const_buf_regst to nullptr when recv from consumers
      
      * fix(actor): total_reading_data_regst_cnt, not total_reading_ctrl_regst_cnt
      
      * refactor: update GetNaiveConsumedRegstDescName to GetNaiveOrCustomizedConsumedRegstDescName(same for Produced)
      
      * feat: combine data_regst and ctrl_regst in Actor
      
      * fix: fix bugs
      
      * fix: fix bugs
      
      * fix: remove .swp files and unused LOG
      
      * feat: split Act and SendMsg (#1255)
      
      * feat: split Act and SendMsg
      
      * refine: rename HandleProduced/ConsumedDataRegst.. to HandleProduced/ConsumedNaiveDatRegst..
      
      * fix(input_wise_comp_actor): bug: not set piece id
      
      * fix(actor): potential bug: produced msg with no allowed actor still pop from queue
      
      * refactor: mv some protected member function to private
      
      * fix(actor): fix the condition about sending EORD msg
      
      * refactor(input_wise_actor): use RegstSlot in InputWiseActor
      
      * fix(copy_comm_net_actor): rename piece_id2regst_ctx to piece_id2regst_ctx_
      
      * refactor: rename Name2RegstDescId to Name2RegstDescIds
      
      * refactor(naive_actor): "override final" instead of only "final"
      
      * refine(actor): little refine
      
      * feat: update the return type of GetNaiveOrCustomizedNamesRegstDescName to enum class RegstNameType
      
      
      Former-commit-id: e042befc
      9fda43bf
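
The RegstSlot refactor above replaces CHECK-failing PushBack/PopFront with Try-variants that report success, adds HasRegstDescId, and tracks available_regst_desc_cnt. Below is a minimal sketch of such a container, with illustrative signatures assumed here; it is not the actual OneFlow source:

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <unordered_map>

class Regst;  // stand-in for oneflow::Regst

class RegstSlot {
 public:
  bool HasRegstDescId(int64_t regst_desc_id) const {
    return regst_desc_id2deq_.count(regst_desc_id) > 0;
  }
  // Try* variants report failure instead of CHECK-failing on an unknown id.
  bool TryPushBack(int64_t regst_desc_id, Regst* regst) {
    auto it = regst_desc_id2deq_.find(regst_desc_id);
    if (it == regst_desc_id2deq_.end()) { return false; }
    if (it->second.empty()) { available_regst_desc_cnt_ += 1; }
    it->second.push_back(regst);
    return true;
  }
  bool TryPopFront(int64_t regst_desc_id) {
    auto it = regst_desc_id2deq_.find(regst_desc_id);
    if (it == regst_desc_id2deq_.end() || it->second.empty()) { return false; }
    it->second.pop_front();
    if (it->second.empty()) { available_regst_desc_cnt_ -= 1; }
    return true;
  }
  // Visit the front regst of every deque; this sketch skips empty deques,
  // whereas the refactor above CHECKs non-emptiness instead.
  void ForEachFrontRegst(const std::function<void(Regst*)>& Handler) const {
    for (const auto& pair : regst_desc_id2deq_) {
      if (!pair.second.empty()) { Handler(pair.second.front()); }
    }
  }
  int64_t available_regst_desc_cnt() const { return available_regst_desc_cnt_; }

 private:
  std::unordered_map<int64_t, std::deque<Regst*>> regst_desc_id2deq_;
  int64_t available_regst_desc_cnt_ = 0;  // the init-val fix: start at 0
};
```

A Try* convention like this would let an Actor attempt the naive produced/consumed slot first and fall back to customized handling when the push fails, which matches the Naive/CustomizedProducedRegst split described above.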
  3. 26 Sep 2018, 2 commits
    • add impl of lars (#1163) · 388b945f
      Committed by Shiyuan Shang-Guan
      * add lars set
      
      * add lars
      
      * override ibn&obn to lbi
      
      * make model update consistent
      
      * check cuda stream sync
      
      * add LARSUpdateModelGpu
      
      * checkout naive & momentum model update
      
      * use cublas::dot compute SumOfSquare
      
      * update lars for master
      
      * refine lars for master
      
      
      Former-commit-id: 9518970b
      388b945f
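
LARS (layer-wise adaptive rate scaling) computes a per-layer local learning rate from the ratio of the weight norm to the gradient norm, so layers with large weights are not swamped by one global rate. Below is a CPU sketch of the usual LARS-with-momentum update; the name LarsUpdate and its signature are illustrative, not the kernel added above (which, per the commits, computes the sums of squares with cublas::dot on GPU):

```cpp
#include <cmath>
#include <cstddef>

// One step of LARS with momentum on a single blob ("layer").
void LarsUpdate(std::size_t n, float learning_rate, float momentum_beta,
                float weight_decay, float epsilon, float lars_coefficient,
                const float* model_diff, float* momentum, float* model) {
  double model_norm2 = 0.0, diff_norm2 = 0.0;  // sums of squares
  for (std::size_t i = 0; i < n; ++i) {
    model_norm2 += static_cast<double>(model[i]) * model[i];
    diff_norm2 += static_cast<double>(model_diff[i]) * model_diff[i];
  }
  const double model_norm = std::sqrt(model_norm2);
  const double diff_norm = std::sqrt(diff_norm2);
  // Layer-wise local learning rate: eta * ||w|| / (||g|| + wd * ||w|| + eps).
  const double local_lr = learning_rate * lars_coefficient * model_norm /
                          (diff_norm + weight_decay * model_norm + epsilon);
  for (std::size_t i = 0; i < n; ++i) {
    const float grad = model_diff[i] + weight_decay * model[i];
    momentum[i] = momentum_beta * momentum[i] +
                  static_cast<float>(local_lr) * grad;
    model[i] -= momentum[i];
  }
}
```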
    • Hinge loss test (#1263) · 3343e9b5
      Committed by qq_22305325
      * hinge_loss_kernel_test
      
      * fix opkernel_test
      
      * fix test file
      
      * optimize test file
      
      * optimize opkernel test
      
      * complete opkernel test interface
      
      
      Former-commit-id: 7faf75a6
      3343e9b5
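
For context, hinge loss for a binary label y ∈ {-1, +1} and prediction p is max(0, 1 - y·p). A tiny self-check in the spirit of an opkernel test; the helper and values are made up for illustration:

```cpp
#include <algorithm>
#include <cassert>

// Hinge loss for binary label y in {-1, +1}: L(y, p) = max(0, 1 - y * p).
float HingeLoss(float label, float pred) {
  return std::max(0.0f, 1.0f - label * pred);
}

int main() {
  assert(HingeLoss(+1.0f, 2.0f) == 0.0f);  // confident and correct: no loss
  assert(HingeLoss(+1.0f, 0.5f) == 0.5f);  // correct side, inside the margin
  assert(HingeLoss(-1.0f, 0.5f) == 1.5f);  // wrong side of the margin
  return 0;
}
```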
  4. 25 Sep 2018, 2 commits
  5. 24 Sep 2018, 1 commit
    • Dev use nccl (#1198) · 9201b815
      Committed by Jinhui Yuan
      * add nccl dependency
      
      * add nccl comm handle
      
      * nccl allreduce works
      
      * NcclAllreduce -> NcclAllReduce
      
      * fix header guard
      
      * add NcclReduceScatter, NcclAllGather
      
      * complete ReduceScatter and AllGather (with cuda error)
      
      * change variable name
      
      * reduce-scatter, all-gather works
      
      * add NcclScatter and NcclGather work type
      
      * Dev use nccl add nccl comm manager (#1206)
      
      * add parallel_set_id
      
      * add nccl_comm_manager
      
      * log nccl comm create
      
      * use NcclCommMgr
      
      * bugfix
      
      * OF_DISALLOW_COPY_AND_MOVE
      
      * remove nccl_scatter_handle and nccl_gather_handle from DeviceCtx
      
      * remove nccl handles from cuda_stream_handle
      
      * nccl_util and GetNcclDataType
      
      * fix rank_num
      
      * fix rank_id
      
      * CudaCheck->NcclCheck
      
      * only GPU
      
      * PoorCompTaskNode
      
      SoleIn, SoleOut, SoleOp, SoleIbn, SoleObn
      
      * PoorCompTaskNode
      
      * reformat
      
      * format change
      
      * Dev use nccl merge reduce share mem (#1216)
      
      * add parallel_set_id
      
      * add nccl_comm_manager
      
      * log nccl comm create
      
      * use NcclCommMgr
      
      * bugfix
      
      * OF_DISALLOW_COPY_AND_MOVE
      
      * remove nccl_scatter_handle and nccl_gather_handle from DeviceCtx
      
      * remove nccl handles from cuda_stream_handle
      
      * nccl_util and GetNcclDataType
      
      * fix rank_num
      
      * fix rank_id
      
      * CudaCheck->NcclCheck
      
      * only GPU
      
      * PoorCompTaskNode
      
      SoleIn, SoleOut, SoleOp, SoleIbn, SoleObn
      
      * PoorCompTaskNode
      
      * reformat
      
      * ReduceGather
      
      * GlobalAdd
      
      * ReduceScatter
      
      * EnableIfNeed
      
      * ConcatSplit
      
      * EnableMemSharing for pred if need
      
      * CtrlEdge for Gather
      
      * CtrlEdge for GlobalAdd
      
      * LocalAdd CtrlEdge
      
      * CollectReduceTaskNode
      
      * reverse nodes
      
      * local_add_mem_sharing
      
      * global add mem sharing
      
      * reduce_mem_sharing
      
      * bugfix
      
      * refine
      
      * format change (remove empty lines)
      
      * format change
      
      * fix local_add and gather issues
      
      * Dev refactor reduce add (#1218)
      
      * change ReduceGlobalAdd to ReduceAdd
      
      * rm ReduceLocalAdd
      
      * no mem sharing case works
      
      * let ReduceAddCompActor decide whether it is local or global
      
      * multi machine multi gpus Nccl and Oneflow allreduce works
      
      * refine
      
      * extract SortEdges
      
      * make EdgeInfo protected
      
      * Dev use nccl refine (#1220)
      
      * const qualifier
      
      * PoorCompTaskNode=>PipeCompTaskNode
      
      * int=>int32_t
      
      * refine ReduceMemSharingCtx
      
      * NcclDeviceCtx and NcclActor
      
      * empty line
      
      * CudaDeviceCtx<-NcclDeviceCtx
      
      * fix wrong rank_id in reduce_add_actor (#1229)
      
      * fix wrong rank_id in reduce_add_actor
      
      * rm device_num_of_each_machine from parallel_ctx
      
      * fix reduce gather control edge (#1235)
      
      * fix reduce gather control edge
      
      * extract FindNearestReduceAddCompTaskNode
      
      * extract method ReduceCompTaskNodeIf::FindPredRduceTaskNodeIf
      
      * CHECK nearest_add_copy_d2h
      
      * Dev use nccl cross machine nccl all reduce (#1246)
      
      * support ncclAllReduce cross machine
      
      * fix rank_id and rank_num for mix
      
      * reformat
      
      * reformat
      
      * simplify nccl_kernel (#1256)
      
      * simplify REGISTER_BLD_SUB_TSK_GPH_MTHD (#1260)
      
      * simplify REGISTER_BLD_SUB_TSK_GPH_MTHD
      
      * note
      
      * Dev use nccl reduce ranking ctx (#1252)
      
      * reformat
      
      * compute rank_id and rank_num with FixCompTaskNode
      
      * reformat
      
      * fix rank_id for reduceadd
      
      * ReduceRankingCtx
      
      * New Ranking and MemSharing for Reduce
      
      * DECLARE_REDUCE_LOGICAL_NODE
      
      * Ranking4NcclAllReduce
      
      * fix ranking
      
      * remove AsTaskNode
      
      * reformat
      
      * runtime rank ctx
      
      * rank_set
      
      * bugfix
      
      * bugfix
      
      * unittest
      
      * change use_nccl_all_reduce_cross_machine to use_nccl_inter_node_communication
      
      * refine
      
      * move BuildCtrlRegstBetweenReduceCopyNodes to ReduceAddCompTaskNode
      
      * CHECK mem_size_
      
      
      Former-commit-id: 55496813
      9201b815
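
The NCCL work above wraps ncclAllReduce / ncclReduceScatter / ncclAllGather in task nodes and manages communicators through an NcclCommMgr. For reference, a minimal single-process, multi-GPU ncclAllReduce against the stock NCCL API looks roughly like this; it is illustrative, not OneFlow's actor-based integration, and the CHECK macros merely stand in for the CudaCheck/NcclCheck mentioned in the commits:

```cpp
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <nccl.h>

#define CUDA_CHECK(cmd) do { if ((cmd) != cudaSuccess) std::abort(); } while (0)
#define NCCL_CHECK(cmd) do { if ((cmd) != ncclSuccess) std::abort(); } while (0)

int main() {
  int dev_cnt = 0;
  CUDA_CHECK(cudaGetDeviceCount(&dev_cnt));
  const size_t count = 1024;  // elements per device
  std::vector<int> devs(dev_cnt);
  for (int i = 0; i < dev_cnt; ++i) { devs[i] = i; }
  // One communicator per local GPU (the "nccl comm handle" above).
  std::vector<ncclComm_t> comms(dev_cnt);
  NCCL_CHECK(ncclCommInitAll(comms.data(), dev_cnt, devs.data()));
  std::vector<float*> bufs(dev_cnt);
  std::vector<cudaStream_t> streams(dev_cnt);
  for (int i = 0; i < dev_cnt; ++i) {
    CUDA_CHECK(cudaSetDevice(i));
    CUDA_CHECK(cudaMalloc(&bufs[i], count * sizeof(float)));
    CUDA_CHECK(cudaMemset(bufs[i], 0, count * sizeof(float)));
    CUDA_CHECK(cudaStreamCreate(&streams[i]));
  }
  // In-place sum across devices; group the per-rank calls together.
  NCCL_CHECK(ncclGroupStart());
  for (int i = 0; i < dev_cnt; ++i) {
    NCCL_CHECK(ncclAllReduce(bufs[i], bufs[i], count, ncclFloat, ncclSum,
                             comms[i], streams[i]));
  }
  NCCL_CHECK(ncclGroupEnd());
  for (int i = 0; i < dev_cnt; ++i) {
    CUDA_CHECK(cudaSetDevice(i));
    CUDA_CHECK(cudaStreamSynchronize(streams[i]));
    CUDA_CHECK(cudaFree(bufs[i]));
    NCCL_CHECK(ncclCommDestroy(comms[i]));
  }
  return 0;
}
```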
  6. 23 Sep 2018, 1 commit
  7. 19 Sep 2018, 2 commits
  8. 18 Sep 2018, 1 commit
    • Dev define test blob (#1247) · 8ebe859c
      Committed by Li Xinqi
      * define_test_blob
      
      * decode random compute task node
      
      * rename define_test_blob_conf.name => define_test_blob_conf.out
      
      * decode random task node color
      
      
      Former-commit-id: 0476d2c2
      8ebe859c
  9. 17 Sep 2018, 6 commits
    • moving model (#1234) · 3d5244c8
      Committed by Li Xinqi
      * moving model
      
      * moving_model => forward_model
      
      * add todo commit
      
      * two model save node
      
      * let md_updt actor handle forward_model
      
      * remove useless code
      
      * rename local variable
      
      
      Former-commit-id: baa146bd
      3d5244c8
    • refine model update conf (#1240) · 33868c01
      Committed by Shiyuan Shang-Guan
      * refine model update conf
      
      * make todo
      
      * add primary_lr and secondary_lr
      
      
      Former-commit-id: 5ccd29d7
      33868c01
    • b3286301
    • Dev refactor channel (#1181) · b012dc22
      Committed by Juncheng
      * add enum ChannelStatus
      
      * merge CloseSendEnd and CloseReceiveEnd
      
      * update channel_test
      
      
      Former-commit-id: fda25987
      b012dc22
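
A minimal sketch of what merging CloseSendEnd/CloseReceiveEnd into a single Close() with an enum ChannelStatus might look like; this is illustrative, not the OneFlow implementation:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

enum class ChannelStatus { kSuccess = 0, kClosed };

template<typename T>
class Channel {
 public:
  ChannelStatus Send(const T& item) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (is_closed_) { return ChannelStatus::kClosed; }
    queue_.push(item);
    cond_.notify_one();
    return ChannelStatus::kSuccess;
  }
  ChannelStatus Receive(T* item) {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this]() { return !queue_.empty() || is_closed_; });
    if (queue_.empty()) { return ChannelStatus::kClosed; }
    *item = queue_.front();
    queue_.pop();
    return ChannelStatus::kSuccess;
  }
  // One Close() for both ends, as in the refactor above.
  void Close() {
    std::unique_lock<std::mutex> lock(mutex_);
    is_closed_ = true;
    cond_.notify_all();
  }

 private:
  std::queue<T> queue_;
  std::mutex mutex_;
  std::condition_variable cond_;
  bool is_closed_ = false;
};
```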
    • Refine runtime (#1108) · 03c635ba
      Committed by Jinhui Yuan
      * only master machine saves plan and has event logger
      
      * separate Data, Persistence, Cache, Log FileSystem config
      
      * refine
      
      * only specify data and snapshot path conf
      
      * forbid multiple machines from using localfs as snapshot fs
      
      * networkfs as localfs
      
      * refine
      
      * Store log to snapshot (#1109)
      
      * use machine id, drop machine name
      
      * ensure setting machine id
      
      * allow save snapshot to localfs for distributed training (#1113)
      
      * Snapshot to master (#1116)
      
      * allow save snapshot to localfs for distributed training
      
      * fix mdSave to master for model parallel
      
      * fix review comment issues
      
      * add sanity check for machine id
      
      * rm useless comments
      
      * update example
      
      * Dev refine runtime add log stream mgr (#1142)
      
      * add LogStreamMgr
      
      * refine and refactor OutStream=>LogStream
      
      * bugfix
      
      * use LogStreamMgr to write graph, dot, plan, profile and proto
      
      * refine
      
      * simplify, remove LogStreamMgr (#1243)
      
      * simplify, remove LogStreamMgr
      
      * TeePersistentLogStream add static factory (#1244)
      
      
      Former-commit-id: d76513b3
      03c635ba
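
The final commit's static factory presumably routes construction of TeePersistentLogStream through a named Create function so call sites cannot instantiate it directly. A generic sketch of that pattern follows; the class name comes from the commit message, while the signature and body are assumptions:

```cpp
#include <memory>
#include <string>

class TeePersistentLogStream {
 public:
  // Static factory: the only way to obtain an instance.
  static std::unique_ptr<TeePersistentLogStream> Create(const std::string& path) {
    return std::unique_ptr<TeePersistentLogStream>(new TeePersistentLogStream(path));
  }
  void Write(const std::string& content) { /* tee to the persistent log(s) */ }

 private:
  explicit TeePersistentLogStream(const std::string& path) : path_(path) {}
  std::string path_;
};
```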
    • fix bug of forward model -> copyD2H conflict with out regst (#1242) · b3f6e061
      Committed by cheng cheng
      * fix bug of forward model -> copyD2H conflict with out regst
      
      * use 1 line
      
      
      Former-commit-id: 0da0646c
      b3f6e061
  10. 16 Sep 2018, 2 commits
  11. 15 Sep 2018, 2 commits
    • pb list data type (#1237) · d66ad601
      Committed by Li Xinqi
      
      
      Former-commit-id: 58f43ff5
      d66ad601
    • separate model for update (#1232) · 9f22ecaa
      Committed by Shiyuan Shang-Guan
      * make each blob of the packed blob be updated separately in the ModelUpdate
      
      * make blob descs in regst be consistent in bw->md_diff_acc->shared_md_diff_add->md_update->fw
      
      * copy lbi2blob_descs from model
      
      * add shared_model_diff_add kernel
      
      * refine model_update actor and kernel
      
      * rm useless TODO
      
      * add shared_model_diff_add kernel
      
      * refine code
      
      
      Former-commit-id: 11408363
      9f22ecaa
  12. 14 Sep 2018, 2 commits
  13. 13 Sep 2018, 1 commit
  14. 10 Sep 2018, 2 commits
  15. 09 Sep 2018, 1 commit
  16. 07 Sep 2018, 3 commits
    • feat: update the data members to use RegstSlot in Actor (#1208) · d0f50ede
      Committed by Niu Chong
      * feat(register_slot): add the RegstSlot
      
      * feat(register_slot): update RegstSlot if
      
      * feat(actor): update member of Actor to use RegstSlot
      
      * fix(register_slot): fix the available_regst_desc_cnt init val
      
      * refine(register_slot): rename PushBack/PopFront, FindTheRegstDescId to TryPushBack/TryPopFront, HasRegstDescId
      
      * feat(regst_slot): rename ForEachCurRegstDeq/ForEachCurFrontRegst to ForEachRegstDeq/ForEachFrontRegst
      
      * feat(regst_slot): add ForChosenRegstDeq/ForChosenFrontRegst, add CHECK empty in ForEachFrontRegst
      
      * fix(register_slot): fix the CHECK empty
      
      
      Former-commit-id: 38a50de4
      d0f50ede
    • Dev allreduce2 (#1211) · e1b30bd5
      Committed by Jinhui Yuan
      * add ReduceScatter2, ReduceAdd2, ReduceGather2 op and kernel
      
      * add ReduceScatter2, ReduceAdd2, ReduceGather2 task node and actor
      
      * complete Reduce2 op
      
      * TODO: complete ReduceAdd2 kernel
      
      * add ReduceScatter2 task to accept model_diff
      
      * sketch of connecting ReduceScatter2/Add2/Gather2
      
      * build allreduce2 logical graph
      
      * connect allreduce2 task graph
      
      * ReduceScatter2 task node
      
      * complete ReduceAdd2, ReduceGather2 task node
      
      * simplify ReduceAdd2 actor
      
      * refactor ReduceAdd2 task node
      
      * let global add -> gather share path
      
      * separate ReduceLocalAdd2 and ReduceGlobalAdd2
      
      * connect AllReduce2 task graph
      
      * complete ReduceGlobalAdd2 op
      
      * refine ReduceLocalAdd2 task node
      
      * complete ReduceGlobalAdd2 task node
      
      * global AllReduce2 works
      
      * add device_num_of_each_machine to parallel_context
      
      * simplify ReduceGlobalAdd2 runtime
      
      * multi machine multi gpus AllReduce2 works
      
      * add mem sharing and ctrl edge for AllReduce2
      
      * single machine multiple gpu mem sharing works
      
      * refine
      
      * remove the previous allreduce
      
      * change AllReduce2 to AllReduce variable convention
      
      * change filename
      
      * complete transfer to allreduce2
      
      * remove unnecessary format change
      
      * remove unnecessary format change
      
      * simplify
      
      * simplify mem sharing rule for reduce add and gather
      
      * check for local add
      
      * fix reduce_global_add actor bug
      
      * refine reduce task node
      
      * refine variable name
      
      * refine
      
      * refine
      
      
      Former-commit-id: 5909cc43
      e1b30bd5
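
The Scatter2/Add2/Gather2 pipeline above is the standard decomposition of all-reduce: a reduce-scatter leaves each rank owning the reduced copy of one chunk, and an all-gather redistributes the chunks to every rank. A CPU toy model of that data flow, purely illustrative of the arithmetic rather than the task-graph implementation:

```cpp
#include <cstdio>
#include <vector>

int main() {
  const int ranks = 4, chunk = 2;  // 4 ranks, 8 gradient elements in total
  std::vector<std::vector<float>> grad(ranks,
      std::vector<float>(ranks * chunk, 1.0f));
  std::vector<std::vector<float>> owned(ranks, std::vector<float>(chunk, 0.0f));
  // ReduceScatter + Add: rank r reduces chunk r across all source ranks.
  for (int r = 0; r < ranks; ++r) {
    for (int src = 0; src < ranks; ++src) {
      for (int i = 0; i < chunk; ++i) { owned[r][i] += grad[src][r * chunk + i]; }
    }
  }
  // Gather: every rank collects the reduced chunks back into its buffer.
  for (int r = 0; r < ranks; ++r) {
    for (int owner = 0; owner < ranks; ++owner) {
      for (int i = 0; i < chunk; ++i) {
        grad[r][owner * chunk + i] = owned[owner][i];
      }
    }
  }
  std::printf("grad[0][0] = %.1f\n", grad[0][0]);  // 4.0: summed over 4 ranks
  return 0;
}
```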
    • fix bug in add kernel of allreduce (#1214) · a76f47b3
      Committed by Jinhui Yuan
      
      
      Former-commit-id: 34ce4862
      a76f47b3
  17. 06 Sep 2018, 1 commit
  18. 04 Sep 2018, 5 commits
  19. 03 Sep 2018, 2 commits
  20. 02 Sep 2018, 1 commit