- 15 Aug, 2018 3 commits
-
-
Committed by Li Xinqi
-
Committed by Niu Chong
-
Committed by Jinhui Yuan
-
- 13 Aug, 2018 1 commit
-
-
Committed by Niu Chong
* feat(reduce_scatter_kernel): reduce scatter kernel stays as before for CPU and does nothing for GPU
* feat(reduce_local_add_kernel): update reduce_local_add kernel for in-place on GPU
* feat: update reduce_local_add actor to support inplace kernel
* feat: add support for inplace in reduce_global_add/gather
* fix: use ibn.substr(3) to specify inplace in_blob rather than in_bn_id
-
- 12 Aug, 2018 1 commit
-
-
Committed by Niu Chong
* feat(register_desc): add mem_shared_offset
* feat(regst_desc): set the default value of mem_shared_offset to -1
* feat(register_manager): add offset in register_manager
* feat: add EnableMemSharingInReduceStruct(), set the mem_shared_id/offset of reduce regst
* feat(task_graph): AddCtrlEdge4MemSharingInOneReduce()
* feat: set mem_shared of regst produced by reduce/copy task nodes to true
* refactor(copy_task_node): regsts produced by CopyH2D and consumed by reduce_task are able to share mem, others remain unable
* fix(copy_task_node): do not use SoleOutEdge() for copy_task_node
* fix(task_graph): fix the compile bug
* feat: set min/max regst_num of regst produced by reduce/copy task_node to 1
* feat: support mem_shared_offset in improver
* fix(copy_task_node): set max_regst_num of copy node succeed reduce nodes to 1
* refactor(task_graph): turn FindSuccReduceTaskNode() from a member function into a lambda function
* refactor(task_graph): refine EnableMemSharingInOneReduce()
* fix(reduce_scatter_task_node): fix the bug of wrong out_regst_name when parallel_num==machine_num
* refine: refine per review comments
-
- 08 Aug, 2018 2 commits
-
-
Committed by cheng cheng
* RmUselessConsumeRelationshipBetweenFwBw impl
* add need in/out blob when backward in ops
* remove NormalBackwardActor's reliance on out regst desc id
* remove log and fix warning
* fix code
-
Committed by Jinhui Yuan
-
- 06 Aug, 2018 1 commit
-
-
Committed by Jinhui Yuan
* add tuple switch;
* separate the record_load and loss_print threads and assign thread ids dynamically; decouple the persistence and comm_net threads from their creation order.
* separate persistence thread; dynamically set the thread id of the persistence tasknode and create persistence threads.
* improve separate thread
* improve algorithm; add unit test;
* simplify algorithm
* merge master
* solve conflict
* add braces for if statement
* improve code
* change persitence_work_num to max_mdsave_work_num; add CHECK; add a unique function (removes duplicated elements) to util.h
* use std::unique; improve code; update unit test.
* simplify code: use std::set to unique; replace "+=" with "=";
-
- 04 Aug, 2018 5 commits
-
-
Committed by Jinhui Yuan
-
Committed by Jinhui Yuan
-
Committed by Jinhui Yuan
-
Committed by strickland12
-
Committed by Niu Chong
* feat(task_graph.cpp): reconnect the edge between ReduceLocalAdd and ReduceGlobalAdd
* feat: round up the packed_blob_desc of md_diff_regst to parallel_num
* feat: round up the packed_blob_desc of md_diff_acc regst, and remove MutPackedBlobDesc() interface from RegstDesc
* fix(task_graph.cpp): remove the CHECK of same machine when building edge between ReduceScatter and ReduceGlobalAdd
* feat(task_graph.cpp): connect same pair of ReduceScatter and ReduceLocalAdd with duplicate edges
* refactor(reduce_scatter_comp_task_node): update the bind of produced_regst and obn
* chore: add some checks
* feat(reduce_scatter_op): update the InferBlobDescs()
* fix(reduce_scatter_compute_task_node): fix the bug in setting the produced out regst name
* refactor(reduce_scatter_comp_task_node): rename the produced out regst
* fix(reduce_scatter_comp_task_node): fix the CHECK macro
* refine(reduce_scatter_op): refine the CHECK macro
* feat: update reduce_local_add tasknode/op/op_conf to support new connecting
* feat: update the actor for reduce and reduce_local kernel
* fix: cannot find lbi in reduce_local_add, so identify regsts by name_in_producer
* fix(normal_backward_comp_task_node): consider normal backward with no model_diff
* fix(input_wise_compute_actor): fix the bug of updating member status after sending to consumer
* fix merge conflicts
* refactor: deduce the relation between ibn and in_regst in Build() rather than ConsumeRegst() for ReduceLocalAdd
* refine: refine the CHECK macro
-
- 03 Aug, 2018 1 commit
-
-
Committed by Jinhui Yuan
* modify the cudnn algorithm selection API
* remove the redundant setting of bwd_data
* modify the conv algo selection API to the *Ex
* modify the cudnn max workspace to 1g
* pre-allocate work space
* fix empty fw_cudnn_buf issue
-
- 02 Aug, 2018 7 commits
-
-
Committed by cheng cheng
-
Committed by chengtbf
* MemBlobDesc
* refine blob runtime
* refine copyHd clone reshape kernel
* move blob header to cpu
* refine RoundUp of Blob size
* fix bug of runtime regst desc packed blob
* refine (#1059)
* add loss test data id note
* Dev blob header cpu3 (#1062)
* add blob body desc
* make blob body desc work
* make blob header desc work
* rm MemBlobDesc
* add CellDesc
* add is_packed
* add chunk desc
* rm CellDesc
* chunk desc -> field desc
* add RtBlobDesc
* refine
* refine
* body_desc -> body_field
* Infer header field desc in BlobDesc.ToProto()
* refine
* RuntimeBlobDesc constructor
* runtime uses RtBlobDesc workable, still has data_id bug
* make some accessors of BlobDesc private
* rm header_byte_size
* fix stupid bug
* refine is_packed -> header_is_packed
* header_is_packed -> header_is_opaque
* refactor BlobDesc.Proto
* add constructor BlobDesc -> RtBlobDesc
* refactor, remove useless code
* refine
* fix bug of regst manager allocate separated mem
* refine blob_desc constructor
* remove BlobDesc(const BLobDescProto&)
* refine code
* delete test code
* remove useless code
* add BlobDesc() from proto
* make opaque header mutually exclusive (#1077)
* make opaque header mutually exclusive
* refine kernel.proto
* has_data_id -> need_do_data_id
-
Committed by chengtbf
* half impl
* refine activation
* kernel if with activation
* remove virtual of infer if
-
Committed by Jinhui Yuan
-
Committed by Jinhui Yuan
-
Committed by chengtbf
* half impl
* op inferbwbufblobdesc
* fw_buf bw_buf regst
* buf_blob instead of thread_buf for conv_op
* cudnn fw bw buf runnable
* fw bw buf for softmax
* remove buf of softmax loss
* remove buf_size of reduce_sum
* remove thread buf & device buf
* use fw_buf for reduce_sum_op fw_tmp blob
-
Committed by chengtbf
-
- 01 Aug, 2018 1 commit
-
-
Committed by Jinhui Yuan
-
- 28 Jul, 2018 1 commit
-
-
Committed by Jinhui Yuan
* use BfsTopo instead. DfsTopo causes issues in ReduceStruct
* add comment
-
- 27 Jul, 2018 2 commits
-
-
Committed by Jinhui Yuan
* remove staleness
* ensure balanced placement for data and model parallel
* add device_num_of_each_machine
-
Committed by chengtbf
-
- 25 Jul, 2018 2 commits
-
-
Committed by chengtbf
-
Committed by strickland12
* DFS ChainActGraph
* provide interface to improver
* Add RegstAct
* rename to chain_act_graph
* add new file
* Add ForEachRegstActDuration
* use RegstActCtx
* use constructor
* use pair instead of node_str
* Construct RegstAct together
* change ParseActEvents to std::list<std::unique_ptr<ActEvent>>
* fake_producer_outs
* use Duration4ActEvent
* add CHECK
* delete act_graph.cpp
* use node->ForEach
* add CHECK
* remove false CHECK
-
- 24 Jul, 2018 1 commit
-
-
Committed by Jinhui Yuan
* optimize chain merge with bitset
* fix task_uid_cnt_ initialization
* parallel chain merging
* refine
* extract MergeTaskNodes
* shrink the size of bitset with local task_uid
-
- 23 Jul, 2018 2 commits
- 22 Jul, 2018 4 commits
-
-
Committed by Jinhui Yuan
-
Committed by chengtbf
* half impl
* remove eigen
* remove unsupported eigen
* remove blob implement
* remove BlobIf
* fix ptr bug of blob copy from
* set_regst_desc only used by regst manager
-
Committed by Jinhui Yuan
-
Committed by chengtbf
-
- 21 Jul, 2018 2 commits
-
-
Committed by Li Xinqi
* check oom for experimental phase
* refine code
-
Committed by ShawnXuan
* initial attempt of removing backward add
* workable
* refine
* add backward_activation for KernelConf
* add 3 functions to support move backward activation.
* simplify 3 functions to 1
* add backward_activation in operator
* add GetBackwardActivationType, refine GetActivationType.
* add AfterBackwardActivation in KernelIf
* SetEnumValue support in protobuf.h set activation func in operator
* bug fix: set cur_node activation after pre_node
* check cur_node op_vec size
* set bw_node_ nullptr for add_fw_node
* pre_node is in backward area or has loss op
* AfterBackwardActivation -> PostBackwardActivation
* remove activation blob in kernel
* keep removing activation blob ...
* post backward activation for loss kernel
* Retrieve BuildAccuracyPrintStruct
* modify post backward activation
* resolve code review issues
* refine
* refine AddOneBackwardClone
* refine RemoveOneBackwardAdd
* add forward_activation for kernel proto
* add ibn blob for bw clone task node
* rm NeedOutWhenBackward
* remove
-
- 19 Jul, 2018 2 commits
-
-
Committed by qicosmos
* add tuple switch;
* simplify code of add/clone kernel
* add static_assert to check error at compile time
* remove extra parameter
* rename function object
-
Committed by Jinhui Yuan
support using synthetic data for training (working around IO bottleneck for testing purposes) (#1036)
-
- 18 Jul, 2018 1 commit
-
-
Committed by Niu Chong
fix(actor.cpp): fix the bug that returns the wrong consumed ctrl Regst in AsyncSendCtrlRegstMsg() (#1031)
-
- 17 Jul, 2018 1 commit
-
-
Committed by Li Xinqi
-