1. 13 Nov 2018, 1 commit
    • Dev jinyi offline build (#1476) · b4258477
      Jin Yi committed
      * chore: remove pre compiler funcs
      
      * chore: add submodules
      
      * fix: fix project build URL from git_url -> submodule_dir_url
      
      * fix: fix submodule commit id
      
      * fix: fix .gitmodules
      
      * chore: mv third_party dir
      
      * chore: remove test-driver(glog#188) link in glog submodule
      
      * fix: update glog from: da816ea70645e463aa04f9564544939fa327d5a7 ==> to: 4f3e18bf26cdb794fc66cec348f57b5838a0c929
      
      * chore: update README.md
      
      
      Former-commit-id: 8cc052f38cfd53c40186dc487df41b0c1f4a7189
  2. 06 Nov 2018, 1 commit
    • Dev crop with random size (#1468) · 5d034a39
      cheng cheng committed
      * random size crop proto
      
      * ImagePreprocessImpl::<kCropWithRandomSize>
      
      * clang format
      
      * MaxVal
      
      
      Former-commit-id: c027432320cc0f03248f9165994150fce058f00a
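The commit above adds a crop-with-random-size image preprocessor (configured via proto and applied by `ImagePreprocessImpl::<kCropWithRandomSize>`). A minimal standalone C++ sketch of the idea, assuming independently sampled side ratios and a uniformly placed crop box; the function name and the min/max ratio parameters are illustrative, not OneFlow's actual proto fields:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <random>

// Hypothetical crop box; the real preprocessor works on
// protobuf-configured image buffers instead.
struct CropBox { int32_t x, y, w, h; };

CropBox CropWithRandomSize(int32_t img_w, int32_t img_h, float min_ratio,
                           float max_ratio, std::mt19937* rng) {
  std::uniform_real_distribution<float> ratio(min_ratio, max_ratio);
  // Sample each side length independently, then a position that keeps
  // the whole box inside the image.
  int32_t w = std::max<int32_t>(1, static_cast<int32_t>(img_w * ratio(*rng)));
  int32_t h = std::max<int32_t>(1, static_cast<int32_t>(img_h * ratio(*rng)));
  std::uniform_int_distribution<int32_t> dx(0, img_w - w);
  std::uniform_int_distribution<int32_t> dy(0, img_h - h);
  return CropBox{dx(*rng), dy(*rng), w, h};
}

int main() {
  std::mt19937 rng(42);
  CropBox box = CropWithRandomSize(640, 480, 0.5f, 1.0f, &rng);
  std::cout << box.x << "," << box.y << " " << box.w << "x" << box.h << "\n";
}
```
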
  3. 05 Nov 2018, 1 commit
  4. 30 Oct 2018, 1 commit
    • Fix normalization epsilon check (#1441) · 9e6347a0
      QiaoJing committed
      * fix normalization epsilon check
      
      * remove check, fix epsilon value in op_conf
      
      
      Former-commit-id: 8ad160577179646a4d83f47a40d5de275ad19952
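The fix above drops a hard epsilon CHECK in favor of a corrected default value in op_conf. A hedged sketch of the validation pattern involved; `kMinEpsilon` and the clamping policy are assumptions (cuDNN batch normalization, for comparison, requires epsilon >= CUDNN_BN_MIN_EPSILON):

```cpp
#include <cmath>
#include <iostream>

// Hypothetical lower bound; the exact constraint in OneFlow's
// normalization op_conf may differ.
constexpr double kMinEpsilon = 1e-5;

double SanitizeEpsilon(double epsilon) {
  // Clamp instead of CHECK-failing, mirroring "remove check, fix epsilon
  // value in op_conf" above.
  return std::fmax(epsilon, kMinEpsilon);
}

int main() {
  std::cout << SanitizeEpsilon(1e-7) << "\n";  // raised to 1e-5
  std::cout << SanitizeEpsilon(1e-3) << "\n";  // kept as-is
}
```
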
  5. 29 Oct 2018, 1 commit
  6. 26 Oct 2018, 2 commits
  7. 25 Oct 2018, 1 commit
  8. 21 Oct 2018, 1 commit
  9. 20 Oct 2018, 1 commit
  10. 19 Oct 2018, 1 commit
    • feat: enhance cmake download & options (#1281) · b3fb9acf
      Jin Yi committed
      * feat: enhance cmake download & options
      
      * feat(tools/): add share libs build scripts
      
      * fix: add cmake options
      
      * feat: add 3rd party download
      
      * chore: update README
      
      * fix: fix protobuf & cmake repo
      
      * fix: fix options name
      
      * chore: merge 3rd_party.cmake & third_party.cmake
      
      * chore: revert pre cmake URL fix
      
      * chore: update ExternalProject check
      
      * fix: fix typo & missing download
      
      * fix: fix download url
      
      * chore: update readme
      
      * chore: fix typo
      
      * fix: fix bugs
      
      * fix: fix bugs
      
      * fix: fix pre
      
      * print all third party libs
      
      * refine readme
      
      * DOWNLOAD_THIRD_PARTY -> PRECOMPILED_THIRD_PARTY
      
      * refine readme
      
      * minor typo fix
      
      
      Former-commit-id: d7d1ec98a868c32e3a43658823ae136caa73feb5
  11. 17 Oct 2018, 1 commit
    • Fix snapshot (#1320) · 71d34a97
      Shiyuan Shang-Guan committed
      * fix bug of snapshot
      
      * refine distribute.sh
      
      * use more accurate function calls
      
      * rename function
      
      * update for model parallel
      
      * refine code
      
      
      Former-commit-id: e0c2ad2b2dad82e0cb3adce6de9fba98f0c4434c
  12. 14 Oct 2018, 1 commit
    • gpu (#1310) · e5764885
      Juncheng committed
      
      
      Former-commit-id: 82681d523fa9e521e2c04b5fd32e6f435f9ba722
  13. 12 Oct 2018, 3 commits
  14. 11 Oct 2018, 1 commit
  15. 09 Oct 2018, 3 commits
  16. 05 Oct 2018, 2 commits
  17. 03 Oct 2018, 2 commits
  18. 02 Oct 2018, 2 commits
  19. 01 Oct 2018, 5 commits
    • Dev pod desc (#1268) · 1c29eb42
      Li Xinqi committed
      * available instance num
      
      * import shape.proto
      
      * PodProto
      
      * rename message
      
      * union pod is useless
      
      * PodPtr
      
      * rename: PodPtr::get() => PodPtr::Get()
      
      * BlobDescProto.pod
      
      * mv register_desc.time_shape into another pr
      
      * pod_helper.h
      
      * FieldAlignedByteSize
      
      * pod_desc
      
      * PodDesc copy constructor
      
      * BlobDesc::body_shape_pod_desc_
      
      * add BlobDesc::opaque_header_pod_desc_
      
      * align_shift => alignment
      
      * default alignment
      
      * add field Blob::header_pod_ptr_
      
      * rename AlignedFieldPodProto => FieldPodProto
      
      * bugfix
      
      * check
      
      * FieldId
      
      * simplify RtBlobDesc
      
      * simplify Blob
      
      * ShapedPod => TensorPod
      
      * refine ComputePackedBlobDesc
      
      
      Former-commit-id: 8800da93
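The pod-desc series above models a blob's header as plain-old-data: named fields, each with a size and an alignment, from which packed offsets and the total byte size are computed (cf. `FieldAlignedByteSize` and `ComputePackedBlobDesc`). A rough standalone sketch of that bookkeeping, with invented struct and field names rather than the real oneflow proto definitions:

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Illustrative stand-in for FieldPodProto: a named field with a byte size
// and an alignment requirement.
struct FieldPod {
  std::string name;
  size_t byte_size;
  size_t alignment;
};

// Round a field's size up to its alignment, as FieldAlignedByteSize suggests.
size_t AlignedByteSize(const FieldPod& f) {
  return (f.byte_size + f.alignment - 1) / f.alignment * f.alignment;
}

// A struct-like pod's packed size: the sum of its aligned field sizes.
size_t PackedByteSize(const std::vector<FieldPod>& fields) {
  size_t total = 0;
  for (const FieldPod& f : fields) { total += AlignedByteSize(f); }
  return total;
}

int main() {
  std::vector<FieldPod> header = {
      {"data_id", 37, 8},  // 37 bytes, 8-byte aligned -> 40
      {"col_num", 4, 4},   // 4 bytes -> 4
  };
  std::cout << PackedByteSize(header) << "\n";  // 44
}
```
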
    • fix: add AsyncSednRegstMsgToConsumer() for send single produced regst, e.g. forward_model_regst (#1274) · 09761973
      Niu Chong committed
      
      * fix(normal_model_update_compute_actor): fix send forward_model_regst_ to consumer
      
      * fix: add AsyncSednRegstMsgToConsumer() for send single produced regst, e.g. forward_model_regst
      
      
      Former-commit-id: 139c2241
    • refine cudnn_limit_buf (#1271) · 8626f4c2
      Shiyuan Shang-Guan committed
      * refine cudnn_limit_buf
      
      * rename default_cudnn_buf_limit_mbyte -> cudnn_buf_limit_mbyte
      
      
      Former-commit-id: 7390c2f7
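The rename above turns the cuDNN workspace cap (`cudnn_buf_limit_mbyte`) into a plainly named config knob. A trivial hedged sketch of the megabyte-to-byte plumbing such a knob implies; the `JobConf` mirror here is invented, and the 4096 MB default anticipates the "enlarge the cudnn buf to 4GB" commit below:

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical config mirror; the real field lives in OneFlow's job conf.
struct JobConf { int64_t cudnn_buf_limit_mbyte = 4096; };

// Convert the configured megabyte cap into the byte budget handed to cuDNN
// workspace allocation.
int64_t CudnnBufLimitByte(const JobConf& conf) {
  return conf.cudnn_buf_limit_mbyte * 1024 * 1024;
}

int main() {
  JobConf conf;
  std::cout << CudnnBufLimitByte(conf) << " bytes\n";  // 4294967296
}
```
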
    • fix(normal_forward_compute_actor): fix SendMsgToForwardModelSaveActor() (#1270) · 99d64b78
      Niu Chong committed
      * fix(normal_forward_compute_actor): fix SendMsgToForwardModelSaveActor()
      
      * refine(normal_forward_compute_actor)
      
      
      Former-commit-id: d746016e
    • enlarge the cudnn buf to 4GB (#1269) · ce674856
      Jinhui Yuan committed
      
      
      Former-commit-id: 28f981eb
  20. 30 Sep 2018, 1 commit
    • Refactor Actor (#1259) · 9fda43bf
      Niu Chong committed
      * feat(register_slot): add the RegstSlot
      
      * feat(register_slot): update RegstSlot if
      
      * feat(actor): update member of Actor to use RegstSlot
      
      * fix(register_slot): fix the available_regst_desc_cnt init val
      
      * refine(register_slot): rename PushBack/PopFront, FindTheRegstDescId to TryPushBack/TryPopFront, HasRegstDescId
      
      * feat(regst_slot): rename ForEachCurRegstDeq/ForEachCurFrontRegst to ForEachRegstDeq/ForEachFrontRegst
      
      * feat(regst_slot): add ForChosenRegstDeq/ForChosenFrontRegst, add CHECK empty in ForEachFrontRegst
      
      * fix(register_slot): fix the CHECK empty
      
      * feat: remove actual_writeable_regst_desc_id_ from Actor, add Naive/CustomizedProducedRegst
      
      * fix(normal_model_update_actor): bug: not send customized regst to consumer when SendIntialModel
      
      * fix(normal_forward_compute_actor): bug: not add kLoss/kAccuracy produced regst to NaiveProducedRegst
      
      * fix(actor): UNIMPLEMENTED() for AsyncSendCustomizedProducedRegstMsgToConsumer
      
      * fix(normal_forward_compute_actor): set const_buf_regst to nullptr when recv from consumers
      
      * fix(actor): total_reading_data_regst_cnt, not total_reading_ctrl_regst_cnt
      
      * refactor: update GetNaiveConsumedRegstDescName to GetNaiveOrCustomizedConsumedRegstDescName(same for Produced)
      
      * feat: combine data_regst and ctrl_regst in Actor
      
      * fix: fix bugs
      
      * fix: fix bugs
      
      * fix: remove .swp files and unused LOG
      
      * feat: split Act and SendMsg (#1255)
      
      * feat: split Act and SendMsg
      
      * refine: rename HandleProduced/ConsumedDataRegst.. to HandleProduced/ConsumedNaiveDatRegst..
      
      * fix(input_wise_comp_actor): bug: not set piece id
      
      * fix(actor): potential bug: produced msg with no allowed actor still pop from queue
      
      * refactor: mv some protected member function to private
      
      * fix(actor): fix the condition about sending EORD msg
      
      * refactor(input_wise_actor): use RegstSlot in InputWiseActor
      
      * fix(copy_comm_net_actor): rename piece_id2regst_ctx to piece_id2regst_ctx_
      
      * refactor: rename Name2RegstDescId to Name2RegstDescIds
      
      * refactor(naive_actor): "override final" instead of only "final"
      
      * refine(actor): little refine
      
      * feat: update the return type of GetNaiveOrCustomizedNamesRegstDescName to enum class RegstNameType
      
      
      Former-commit-id: e042befc
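The refactor above centers on `RegstSlot`, whose Try-prefixed mutators report failure instead of CHECK-failing (`PushBack/PopFront` -> `TryPushBack/TryPopFront`, plus `HasRegstDescId` and an `available_regst_desc_cnt` counter). A condensed sketch of such a container, assuming a deque of register pointers per regst_desc_id; the real class tracks considerably more state:

```cpp
#include <cstdint>
#include <deque>
#include <iostream>
#include <unordered_map>

// Toy register stand-in; OneFlow's Regst carries much more state.
struct Regst { int64_t regst_desc_id; int64_t piece_id; };

class RegstSlot {
 public:
  // Returns false instead of crashing on an unknown desc id, matching the
  // PushBack -> TryPushBack rename above.
  bool TryPushBack(Regst* regst) {
    auto it = desc_id2regsts_.find(regst->regst_desc_id);
    if (it == desc_id2regsts_.end()) { return false; }
    if (it->second.empty()) { available_regst_desc_cnt_ += 1; }
    it->second.push_back(regst);
    return true;
  }
  bool TryPopFront(int64_t regst_desc_id) {
    auto it = desc_id2regsts_.find(regst_desc_id);
    if (it == desc_id2regsts_.end() || it->second.empty()) { return false; }
    it->second.pop_front();
    if (it->second.empty()) { available_regst_desc_cnt_ -= 1; }
    return true;
  }
  bool HasRegstDescId(int64_t id) const { return desc_id2regsts_.count(id) > 0; }
  void InsertRegstDescId(int64_t id) { desc_id2regsts_[id]; }
  size_t available_regst_desc_cnt() const { return available_regst_desc_cnt_; }

 private:
  std::unordered_map<int64_t, std::deque<Regst*>> desc_id2regsts_;
  size_t available_regst_desc_cnt_ = 0;
};

int main() {
  RegstSlot slot;
  slot.InsertRegstDescId(7);
  Regst r{7, 0};
  std::cout << slot.TryPushBack(&r) << " "
            << slot.available_regst_desc_cnt() << "\n";  // 1 1
}
```
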
  21. 26 Sep 2018, 2 commits
    • add impl of lars (#1163) · 388b945f
      Shiyuan Shang-Guan committed
      * add lars set
      
      * add lars
      
      * override ibn&obn to lbi
      
      * make model update consistent
      
      * check cuda stream sync
      
      * add LARSUpdateModelGpu
      
      * checkout naive & momentum model update
      
      * use cublas::dot compute SumOfSquare
      
      * update lars for master
      
      * refine lars for master
      
      
      Former-commit-id: 9518970b
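LARS (layer-wise adaptive rate scaling) scales each layer's step by the ratio of its weight norm to its gradient norm, which is why the commit computes `SumOfSquare` with `cublas::dot` on GPU. A scalar CPU sketch of the core update following You et al. 2017; the parameter names and the plain-SGD inner step (no momentum) are simplifications, not OneFlow's kernel:

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Sum of squares: the quantity the commit computes with cublas::dot on GPU.
double SumOfSquare(const std::vector<double>& v) {
  double s = 0;
  for (double x : v) { s += x * x; }
  return s;
}

// One LARS step for a single layer. trust_coef and weight_decay follow the
// LARS paper; the real model-update kernel adds momentum.
void LarsUpdate(std::vector<double>* w, const std::vector<double>& grad,
                double lr, double trust_coef, double weight_decay) {
  double w_norm = std::sqrt(SumOfSquare(*w));
  double g_norm = std::sqrt(SumOfSquare(grad));
  // Layer-local learning rate: trust * ||w|| / (||g|| + wd * ||w||).
  double local_lr =
      trust_coef * w_norm / (g_norm + weight_decay * w_norm + 1e-9);
  for (size_t i = 0; i < w->size(); ++i) {
    (*w)[i] -= lr * local_lr * (grad[i] + weight_decay * (*w)[i]);
  }
}

int main() {
  std::vector<double> w = {1.0, -2.0};
  std::vector<double> g = {0.1, 0.3};
  LarsUpdate(&w, g, 0.1, 0.001, 0.0005);
  std::cout << w[0] << " " << w[1] << "\n";
}
```
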
    • Hinge loss test (#1263) · 3343e9b5
      qq_22305325 committed
      * hinge_loss_kernel_test
      
      * fix opkernel_test
      
      * fix test file
      
      * optimize test file
      
      * optimize opkernel test
      
      * complete opkernel test interface
      
      
      Former-commit-id: 7faf75a6
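As background for the kernel test above: the one-vs-all hinge loss accumulates max(0, 1 - sign * prediction) over classes. A tiny sketch of the forward computation such an opkernel test would compare against; the layout and reduction choices here are assumptions, not the test's:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// One-vs-all hinge loss for a single sample: per-class prediction scores,
// label is the index of the true class. A common formulation; OneFlow's
// hinge_loss_kernel may differ in margin and reduction details.
double HingeLoss(const std::vector<double>& pred, int label) {
  double loss = 0;
  for (int j = 0; j < static_cast<int>(pred.size()); ++j) {
    double sign = (j == label) ? 1.0 : -1.0;
    loss += std::max(0.0, 1.0 - sign * pred[j]);
  }
  return loss;
}

int main() {
  std::cout << HingeLoss({2.0, -0.5, 0.3}, 0) << "\n";  // 0 + 0.5 + 1.3 = 1.8
}
```
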
  22. 25 Sep 2018, 2 commits
  23. 24 Sep 2018, 1 commit
    • Dev use nccl (#1198) · 9201b815
      Jinhui Yuan committed
      * add nccl dependency
      
      * add nccl comm handle
      
      * nccl allreduce works
      
      * NcclAllreduce -> NcclAllReduce
      
      * fix header guard
      
      * add NcclReduceScatter, NcclAllGather
      
      * complete ReduceScatter and AllGather, (with cuda error)
      
      * change variable name
      
      * reduce-scatter, all-gather works
      
      * add NcclScatter and NcclGather work type
      
      * Dev use nccl add nccl comm manager (#1206)
      
      * add parallel_set_id
      
      * add nccl_comm_manager
      
      * log nccl comm create
      
      * use NcclCommMgr
      
      * bugfix
      
      * OF_DISALLOW_COPY_AND_MOVE
      
      * remove nccl_scatter_handle and nccl_gather_handle from DeviceCtx
      
      * remove nccl handles from cuda_stream_handle
      
      * nccl_util and GetNcclDataType
      
      * fix rank_num
      
      * fix rank_id
      
      * CudaCheck->NcclCheck
      
      * only GPU
      
      * PoorCompTaskNode
      
      SoleIn, SoleOut, SoleOp, SoleIbn, SoleObn
      
      * PoorCompTaskNode
      
      * reformat
      
      * format change
      
      * Dev use nccl merge reduce share mem (#1216)
      
      * add parallel_set_id
      
      * add nccl_comm_manager
      
      * log nccl comm create
      
      * use NcclCommMgr
      
      * bugfix
      
      * OF_DISALLOW_COPY_AND_MOVE
      
      * remove nccl_scatter_handle and nccl_gather_handle from DeviceCtx
      
      * remove nccl handles from cuda_stream_handle
      
      * nccl_util and GetNcclDataType
      
      * fix rank_num
      
      * fix rank_id
      
      * CudaCheck->NcclCheck
      
      * only GPU
      
      * PoorCompTaskNode
      
      SoleIn, SoleOut, SoleOp, SoleIbn, SoleObn
      
      * PoorCompTaskNode
      
      * reformat
      
      * ReduceGather
      
      * GlobalAdd
      
      * ReduceScatter
      
      * EnableIfNeed
      
      * ConcatSplit
      
      * EnableMemSharing for pred if need
      
      * CtrlEdge for Gather
      
      * CtrlEdge for GlobalAdd
      
      * LocalAdd CtrlEdge
      
      * CollectReduceTaskNode
      
      * reverse nodes
      
      * local_add_mem_sharing
      
      * global add mem sharing
      
      * reduce_mem_sharing
      
      * bugfix
      
      * refine
      
      * format change (remove empty lines)
      
      * format change
      
      * fix local_add and gather issues
      
      * Dev refactor reduce add (#1218)
      
      * change ReduceGlobalAdd to ReduceAdd
      
      * rm ReduceLocalAdd
      
      * no mem sharing case works
      
      * let ReduceAddCompActor decide whether it is local or global
      
      * multi machine multi gpus Nccl and Oneflow allreduce works
      
      * refine
      
      * extract SortEdges
      
      * make EdgeInfo protected
      
      * Dev use nccl refine (#1220)
      
      * const qualifier
      
      * PoorCompTaskNode=>PipeCompTaskNode
      
      * int=>int32_t
      
      * refine ReduceMemSharingCtx
      
      * NcclDeviceCtx and NcclActor
      
      * empty line
      
      * CudaDeviceCtx<-NcclDeviceCtx
      
      * fix wrong rank_id in reduce_add_actor (#1229)
      
      * fix wrong rank_id in reduce_add_actor
      
      * rm device_num_of_each_machine from parallel_ctx
      
      * fix reduce gather control edge (#1235)
      
      * fix reduce gather control edge
      
      * extract FindNearestReduceAddCompTaskNode
      
      * extract method ReduceCompTaskNodeIf::FindPredRduceTaskNodeIf
      
      * CHECK nearest_add_copy_d2h
      
      * Dev use nccl cross machine nccl all reduce (#1246)
      
      * support ncclAllReduce cross machine
      
      * fix rank_id and rank_num for mix
      
      * reformat
      
      * reformat
      
      * simplify nccl_kernel (#1256)
      
      * simplify REGISTER_BLD_SUB_TSK_GPH_MTHD (#1260)
      
      * simplify REGISTER_BLD_SUB_TSK_GPH_MTHD
      
      * note
      
      * Dev use nccl reduce ranking ctx (#1252)
      
      * reformat
      
      * compute rank_id and rank_num with FixCompTaskNode
      
      * reformat
      
      * fix rank_id for reduceadd
      
      * ReduceRankingCtx
      
      * New Ranking and MemSharing for Reduce
      
      * DECLARE_REDUCE_LOGICAL_NODE
      
      * Ranking4NcclAllReduce
      
      * fix ranking
      
      * remove AsTaskNode
      
      * reformat
      
      * runtime rank ctx
      
      * rank_set
      
      * bugfix
      
      * bugfix
      
      * unittest
      
      * change use_nccl_all_reduce_cross_machine to use_nccl_inter_node_communication
      
      * refine
      
      * move BuildCtrlRegstBetweenReduceCopyNodes to ReduceAddCompTaskNode
      
      * CHECK mem_size_
      
      
      Former-commit-id: 55496813
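The PR above wires NCCL collectives (ncclAllReduce, ncclReduceScatter, ncclAllGather) into the task graph behind a cached `NcclCommMgr`. A minimal single-process, multi-GPU sketch of the grouped ncclAllReduce call, using only documented NCCL/CUDA APIs; OneFlow instead drives these from actors across machines, so everything beyond the library calls is illustrative:

```cpp
#include <cuda_runtime.h>
#include <nccl.h>
#include <cstdio>
#include <vector>

#define CUDA_CHECK(x) do { cudaError_t e = (x); if (e != cudaSuccess) { \
  printf("CUDA error: %s\n", cudaGetErrorString(e)); return 1; } } while (0)
#define NCCL_CHECK(x) do { ncclResult_t r = (x); if (r != ncclSuccess) { \
  printf("NCCL error: %s\n", ncclGetErrorString(r)); return 1; } } while (0)

int main() {
  int dev_cnt = 0;
  CUDA_CHECK(cudaGetDeviceCount(&dev_cnt));
  std::vector<ncclComm_t> comms(dev_cnt);
  std::vector<int> devs(dev_cnt);
  for (int i = 0; i < dev_cnt; ++i) { devs[i] = i; }
  // One communicator per local GPU; NcclCommMgr caches these per parallel set.
  NCCL_CHECK(ncclCommInitAll(comms.data(), dev_cnt, devs.data()));

  const size_t n = 1 << 20;  // arbitrary element count
  std::vector<float*> bufs(dev_cnt);
  std::vector<cudaStream_t> streams(dev_cnt);
  for (int i = 0; i < dev_cnt; ++i) {
    CUDA_CHECK(cudaSetDevice(i));
    CUDA_CHECK(cudaMalloc(&bufs[i], n * sizeof(float)));
    CUDA_CHECK(cudaStreamCreate(&streams[i]));
  }
  // Grouped in-place all-reduce (sum) across all local GPUs.
  NCCL_CHECK(ncclGroupStart());
  for (int i = 0; i < dev_cnt; ++i) {
    NCCL_CHECK(ncclAllReduce(bufs[i], bufs[i], n, ncclFloat, ncclSum,
                             comms[i], streams[i]));
  }
  NCCL_CHECK(ncclGroupEnd());
  for (int i = 0; i < dev_cnt; ++i) {
    CUDA_CHECK(cudaSetDevice(i));
    CUDA_CHECK(cudaStreamSynchronize(streams[i]));
    CUDA_CHECK(cudaFree(bufs[i]));
    NCCL_CHECK(ncclCommDestroy(comms[i]));
  }
  printf("allreduce done on %d GPUs\n", dev_cnt);
  return 0;
}
```
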
  24. 23 Sep 2018, 1 commit
  25. 19 Sep 2018, 2 commits