1. 19 Aug, 2018 3 commits
  2. 18 Aug, 2018 4 commits
  3. 17 Aug, 2018 2 commits
    • C
      remove boxing/121 out regst (#1126) · ce5aa7fb
      cheng cheng committed
      * BldSubGrpBy 121/boxing use same buf task
      
      * remove 121 boxing regst from loss/decode compute task node
      
      * remove B121 regst in fw/bw/compute task node, remove lbi_121/boxing in logical node
      
      * remove b121
      
      * fix bug in bw task node
    • S
      refine ChainGraph (#1123) · 8616e0d4
      strickland12 committed
      * rm TryMerge
      
      * rm extra loop in CollectAncestorsForEachNode
  4. 16 Aug, 2018 3 commits
  5. 15 Aug, 2018 4 commits
  6. 13 Aug, 2018 1 commit
    • N
      feat: update reduce kernel to support inplace compute (#1105) · e15fea94
      Niu Chong committed
      * feat(reduce_scatter_kernel): reduce scatter kernel is as before for cpu, and do nothing for gpu
      
      * feat(reduce_local_add_kernel): update reduce_local_add kernel for in-place on GPU
      
      * feat: update reduce_local_add actor to support inplace kernel
      
      * feat: add support for inplace in reduce_global_add/gather
      
      * fix: use ibn.substr(3) to specify inplace in_blob other than in_bn_id
  7. 12 Aug, 2018 1 commit
    • N
      feat: enable mem sharing in ReduceStruct (#1097) · 25ddb833
      Niu Chong committed
      * feat(register_desc): add mem_shared_offset
      
      * feat(regst_desc): set the default val of mem_shared_offset as -1
      
      * feat(register_manager): add offset in register_manager
      
      * feat: add EnableMemSharingInReduceStruct(), set the mem_shared_id/offset of reduce regst
      
      * feat(task_graph): AddCtrlEdge4MemSharingInOneReduce()
      
      * feat: set mem_shared of regst produced by reduce/copy task nodes as true
      
      * refactor(copy_task_node): regsts produced by CopyH2D and consumed by reduce_task are able to share mem, others remain unable
      
      * fix(copy_task_node): not use SoleOutEdge() for copy_task_node
      
      * fix(task_graph): fix the compile bug
      
      * feat: set min/max regst_num of regst produced by reduce/copy task_node as 1
      
      * feat: support mem_shared_offset in improver
      
      * fix(copy_task_node): set max_regst_num of copy node succeed reduce nodes as 1
      
      * refactor(task_graph): remove FindSuccReduceTaskNode() from member function, just as lambda function
      
      * refactor(task_graph): refine EnableMemSharingInOneReduce()
      
      * fix(reduce_scatter_task_node): fix the bug of wrong out_regst_name when parallel_num==machine_num
      
      * refine: refine due to comment
  8. 08 Aug, 2018 2 commits
  9. 06 Aug, 2018 1 commit
    • J
      Separate thread (#1045) · 18bda5ca
      Jinhui Yuan committed
      * add tuple switch;
      
      * separate the record_load and loss_print threads and assign thread ids dynamically;
      decouple the persistence and comm_net threads from their creation order.
      
      * separate persistence thread; dynamically set the thread id of the persistence task node and create persistence threads.
      
      * improve separate thread
      
      * improve algorithm; add unit test;
      
      * simplify algorithm
      
      * merge master
      
      * solve conflict
      
      * add braces for if statement
      
      * improve code
      
      * change persitence_work_num to max_mdsave_work_num; add CHECK; add a unique function(remove duplicated elements) into util.h
      
      * use std::unique; improve code; update unit test.
      
      * simplify code: use std::set to deduplicate; replace "+=" with "=";
  10. 04 Aug, 2018 5 commits
    • J
      fix: 1,4d packed_blob 2, collect_act_event (#1087) · 6cb2ef86
      Jinhui Yuan committed
    • J
      af808c6e
    • J
      refine collect active event pre-condition (#1084) · e3f87ee3
      Jinhui Yuan committed
    • S
      fixed MdSave Op DeviceType (#1086) · 0410f099
      strickland12 committed
    • N
      feat: modify the allreduce to build a base for supporting in-place reduce op (#1080) · f8c88f27
      Niu Chong committed
      * feat(task_graph.cpp): reconnect the edge between ReduceLocalAdd and ReduceGlobalAdd
      
      * feat: round up the packed_blob_desc of md_diff_regst to parallel_num
      
      * feat: round up the packed_blob_desc of md_diff_acc regst, and remove MutPackedBlobDesc() interface from RegstDesc
      
      * fix(task_graph.cpp): remove the CHECK of same machine when building edge between ReduceScatter and ReduceGlobalAdd
      
      * feat(task_graph.cpp): connect same pair of ReduceScatter and ReduceLocalAdd with duplicate edges
      
      * refactor(reduce_scatter_comp_task_node): update the bind of produced_regst and obn
      
      * chore: add some check
      
      * feat(reduce_scatter_op): update the InferBlobDescs()
      
      * fix(reduce_scatter_compute_task_node): fix the bug of set produced out regst name
      
      * refactor(reduce_scatter_comp_task_node): rename the produced out regst
      
      * fix(reduce_scatter_comp_task_node): fix the CHECK macro
      
      * refine(reduce_scatter_op): refine the CHECK macro
      
      * feat: update reduce_local_add tasknode/op/op_conf to support new connecting
      
      * feat: update the actor for reduce and reduce_local kernel
      
      * fix: cannot find lbi in reduce_local_add, so identify regsts by name_in_producer
      
      * fix(normal_backward_comp_task_node): consider normal backward with no model_diff
      
      * fix(input_wise_compute_actor): fix the bug of update member status after send to consumer
      
      * fix merge conflicts
      
      * refactor: deduce the relation between ibn and in_regst in Build() rather than ConsumeRegst() for ReduceLocalAdd
      
      * refine: refine the CHECK macro
  11. 03 Aug, 2018 1 commit
    • J
      Dev face cudnn conv test (#1066) · deed0655
      Jinhui Yuan committed
      * modify the cudnn algorithm selection API
      
      * remove the redundant setting of bwd_data
      
      * modify the conv algo selection API to the *Ex
      
      * modify the cudnn max workspace to 1g
      
      * pre-allocate work space
      
      * fix empty fw_cudnn_buf issue
  12. 02 Aug, 2018 7 commits
    • C
      rm IsBwClone (#1078) · 8c895ee9
      cheng cheng committed
    • C
      Dev blob header cpu (#1056) · 8d2daef3
      chengtbf committed
      * MemBlobDesc
      
      * refine blob runtime
      
      * refine copyHd clone reshape kernel
      
      * move blob header to cpu
      
      * refine RoundUp of Blob size
      
      * fix bug of runtime regst desc packed blob
      
      * refine (#1059)
      
      * add loss test data id note
      
      * Dev blob header cpu3 (#1062)
      
      * add blob body desc
      
      * make blob body desc work
      
      * make blob header desc work
      
      * rm MemBlobDesc
      
      * add CellDesc
      
      * add is_packed
      
      * add chunk desc
      
      * rm CellDesc
      
      * chunk desc -> field desc
      
      * add RtBlobDesc
      
      * refine
      
      * refine
      
      * body_desc -> body_field
      
      * Infer header field desc in BlobDesc.ToProto()
      
      * refine
      
      * RuntimeBlobDesc constructor
      
      * runtime uses RtBlobDesc workable, still has data_id bug
      
      * make some accessors of BlobDesc private
      
      * rm header_byte_size
      
      * fix stupid bug
      
      * refine is_packed -> header_is_packed
      
      * header_is_packed -> header_is_opaque
      
      * refactor BlobDesc.Proto
      
      * add constructor BlobDesc -> RtBlobDesc
      
      * refactor, remove useless code
      
      * refine
      
      * fix bug of regst manager allocate separated mem
      
      * refine blob_desc constructor
      
      * remove BlobDesc(const BLobDescProto&)
      
      * refine code
      
      * delete test code
      
      * remove useless code
      
      * add BlobDesc() from proto
      
      * make opaque header mutually exclusive (#1077)
      
      * make opaque header mutually exclusive
      
      * refine kernel.proto
      
      * has_data_id -> need_do_data_id
    • C
      refine activation and remove RemoveBackwardAdd (#1076) · ed46b7c2
      chengtbf committed
      * half impl
      
      * refine activation
      
      * kernel if with activation
      
      * remove virtual of infer if
    • J
      fix empty bw_cudnn_buf (#1075) · b23d18f0
      Jinhui Yuan committed
    • J
      fix fw_buf of reduce_sum op (#1074) · bdceef38
      Jinhui Yuan committed
    • C
      add fw/bw buf blob,remove thread/device buf (#1072) · c8853f81
      chengtbf committed
      * half impl
      
      * op inferbwbufblobdesc
      
      * fw_buf bw_buf regst
      
      * buf_blob instead of thread_buf for conv_op
      
      * cudnn fw bw buf runnable
      
      * fw bw buf for softmax
      
      * remove buf of softmax loss
      
      * remove buf_size of reduce_sum
      
      * remove thread buf & device buf
      
      * use fw_buf for reduce_sum_op fw_tmp blob
    • C
      51962720
  13. 01 Aug, 2018 1 commit
  14. 28 Jul, 2018 1 commit
  15. 27 Jul, 2018 2 commits
  16. 25 Jul, 2018 2 commits
    • C
      add jpeg encoder (#1052) · 098db839
      chengtbf committed
    • S
      Chain act graph (#1019) · 14698b5b
      strickland12 committed
      * DFS ChainActGraph
      
      * provide interface to improver
      
      * Add RegstAct
      
      * rename to chain_act_graph
      
      * add new file
      
      * Add ForEachRegstActDuration
      
      * use RegstActCtx
      
      * use constructor
      
      * use pair instead of node_str
      
      * Construct RegstAct together
      
      * change ParseActEvents to std::list<std::unique_ptr<ActEvent>>
      
      * fake_producer_outs
      
      * use Duration4ActEvent
      
      * add CHECK
      
      * delete act_graph.cpp
      
      * use node->ForEach
      
      * add CHECK
      
      * remove false CHECK