- 07 Sep, 2018: 3 commits
Committed by Niu Chong
* feat(register_slot): add the RegstSlot
* feat(register_slot): update RegstSlot if
* feat(actor): update member of Actor to use RegstSlot
* fix(register_slot): fix the available_regst_desc_cnt init val
* refine(register_slot): rename PushBack/PopFront, FindTheRegstDescId to TryPushBack/TryPopFront, HasRegstDescId
* feat(regst_slot): rename ForEachCurRegstDeq/ForEachCurFrontRegst to ForEachRegstDeq/ForEachFrontRegst
* feat(regst_slot): add ForChosenRegstDeq/ForChosenFrontRegst, add CHECK empty in ForEachFrontRegst
* fix(register_slot): fix the CHECK empty
Former-commit-id: 38a50de4
Committed by Jinhui Yuan
* add ReduceScatter2, ReduceAdd2, ReduceGather2 op and kernel
* add ReduceScatter2, ReduceAdd2, ReduceGather2 task node and actor
* complete Reduce2 op
* TODO: complete ReduceAdd2 kernel
* add ReduceScatter2 task to accept model_diff
* sketch of connecting ReduceScatter2/Add2/Gather2
* build allreduce2 logical graph
* connect allreduce2 task graph
* ReduceScatter2 task node
* complete ReduceAdd2, ReduceGather2 task node
* simplify ReduceAdd2 actor
* refactor ReduceAdd2 task node
* let global add -> gather share path
* separate ReduceLocalAdd2 and ReduceGlobalAdd2
* connect AllReduce2 task graph
* complete ReduceGlobalAdd2 op
* refine ReduceLocalAdd2 task node
* complete ReduceGlobalAdd2 task node
* global AllReduce2 works
* add device_num_of_each_machine to parallel_context
* simplify ReduceGlobalAdd2 runtime
* multi machine multi gpus AllReduce2 works
* add mem sharing and ctrl edge for AllReduce2
* single machine multiple gpu mem sharing works
* refine
* remove the previous allreduce
* change AllReduce2 to AllReduce variable convention
* change filename
* complete transfer to allreduce2
* remove unnecessary format change
* remove unnecessary format change
* simplify
* simplify mem sharing rule for reduce add and gather
* check for local add
* fix reduce_global_add actor bug
* refine reduce task node
* refine variable name
* refine
* refine
Former-commit-id: 5909cc43
Committed by Jinhui Yuan
Former-commit-id: 34ce4862
- 06 Sep, 2018: 1 commit
- 04 Sep, 2018: 5 commits
Committed by qq_22305325
* add hinge loss
* add hinge loss test
* hack hinge loss
* optimize hinge loss
* optimize hinge loss
* optimize hinge loss
* optimize hinge loss
Former-commit-id: 87db37ed
Committed by qq_22305325
* add matmul & dot & multiply
* optimize dot kernel
* fix multiply kernel code style
* optimize matmul kernel
Former-commit-id: 6ab4006f
Committed by qq_22305325
* add embedding look up infer blob desc
* optimize infer blob desc
Former-commit-id: 6c92495a
Committed by qq_22305325
* add hinge loss
* add hinge loss test
* hack hinge loss
* optimize hinge loss
* optimize hinge loss
* optimize hinge loss
* optimize hinge loss
Former-commit-id: e2da4ecf
- 03 Sep, 2018: 2 commits
- 02 Sep, 2018: 3 commits
Committed by Jinhui Yuan
Former-commit-id: 2ebe0205
- 01 Sep, 2018: 2 commits
Committed by Jinhui Yuan
Former-commit-id: ccc3b389
- 31 Aug, 2018: 1 commit
- 30 Aug, 2018: 1 commit
Committed by Jinhui Yuan
Former-commit-id: 40c299bc
- 29 Aug, 2018: 1 commit
Committed by Jinhui Yuan
* sketch of merge reduce project
* add reduce_concat, reduce_split in logical graph (#1160)
* add reduce_concat, reduce_split in logical graph
* init ReduceTaskNodes in CollectReduceTaskNodes
* add CompTaskNode for ReduceConcat & ReduceSplit
* set ReduceConcat/Split color index
* copy blob desc from ReduceConcat in to ReduceSplit out
* refine CollectReduceTaskNodes
* SetMemSharing for ReduceConcat, ReduceSplit regst
* complete ReduceConcat & ReduceSplit op
* fill ReduceConcat & ReduceSplit kernel
* simplify ReduceConcatCompActor
* make ReduceScatter & ReduceSplit as input-wise actor
* reduce_scatter & reduce_split use is_inplace
* use ByteSizeOfBlobBody for reduce related packed blob
* Fix dev merge reduce (#1168)
* check concat and split occur simultaneously
* fix ReduceScatter & ReduceSplit as Inputwise actor
* ReduceConcat & ReduceSplit works
* fix single gpu issue
* Refactor reduce (#1170)
* backup, not complete yet
* remove reduce_id
* rm useless comment
* add reduce_graph (#1169)
* add reduce_graph
* fix iter
* add IsLogicalNodeMergeable and fix bug
* remove needless constructor calls
* node VisualStr may conflict, using node_id_str instead
* reduce group works (#1171)
* refine
* sort nodes in topo (#1172)
* add reduce_group_size in job_conf, fix 121 config of ReduceSplit and MdUpdt
* resolve code review issues (variable names)
* refine variable names
* Dev merge reduce rename reduce group (#1174)
* ReduceGraph=>ChainLogicalGraph
* rename Group=>Chain
* reformat
* use pointer instead of reference for mutable argument
* format change
* worker node only pull sub_plan (#1176)
* log compile time
* use c++11 member initialization syntax
* FixPackedBlobDescOfProducedRegst for ReduceSplit
* Dev merge reduce refine chain logical graph (#1177)
* remove IsMerageable
* split TryMergeOneChain and rename to TryMergeTwoChains
* reformat
* resolve review issues
Former-commit-id: 3aa79c70
- 27 Aug, 2018: 1 commit
Committed by Jinhui Yuan
Former-commit-id: dc6fbefc
- 25 Aug, 2018: 2 commits
Committed by Jinhui Yuan
Former-commit-id: a8b7dedb
Committed by Jinhui Yuan
* refactor EraseEmptyRegst (no dependence on weak_ptr)
* weak_ptr -> shared_ptr
* refine
Former-commit-id: e585bba0
- 24 Aug, 2018: 3 commits
Committed by strickland12
Former-commit-id: 55b46427
Committed by strickland12
* use resize()
* use .size to calc bitset_num
Former-commit-id: 400e277e
- 22 Aug, 2018: 3 commits
Committed by strickland12
* if UseRelayPlacement
* judge if there is only one gpu parallel_conf
* refine
* fix naive error
Former-commit-id: 3ea8ae21
Committed by strickland12
* use Special judgment in InitNodeProducedRegstAct
* abandon kMdUpdtArea ActEvents
Former-commit-id: e853cef6
- 21 Aug, 2018: 1 commit
Committed by Jinhui Yuan
* clear act_event_logger act_event_bin_filename
* cluster_thrd_ids_key
* simplify ofrecord_decoder multi-thread
* let decoder use AllocateCpuThrdIdEvenly
* let ofrecord_decoder use local thread pool
Former-commit-id: a4860e5b
- 20 Aug, 2018: 6 commits
Committed by Jinhui Yuan
Former-commit-id: cae14ff3
Committed by Jinhui Yuan
* caching the cudnn conv algorithm to eliminate duplicate calculation
* refine cudnn conv algo ctx cache
Former-commit-id: ccb7f43b
Committed by strickland12
Former-commit-id: b611c93d
Committed by strickland12
* rm collect kMdUpdtArea Ancestor
* refine AddOrderingCtrlEdgeInSameChain
* mv ChainGraph to SetChainIdAndOrderInGraphForEachNode
* rm task_node ancestors
* rm emplace()
Former-commit-id: 691704b7
Committed by Jinhui Yuan
* fix typo
* rm useless IsThisMachineMaster
* refine the var name of naive_plan, mem_shared_plan, improved_plan
* refactor PushPlan and PullPlan
* let master node broadcast subplans instead of the whole plan
* remove useless code
* rm useless code
* use total_mbn_name_key
Former-commit-id: b21c190b
- 19 Aug, 2018: 5 commits
Committed by Li Xinqi
* backpropagate model_diff only if is trainable
* bugfix: consume bw task node only if trainable
* bugfix: connect md_updt and bw_node when bw_node is not null
* bugfix: md_updt enter HandlerNormal only if there is model to train
* set all op trainable = false when predicting
Former-commit-id: be213666
Committed by strickland12
* rm MdUpdt chain merge
* use area_id == kMdUpdtArea
* rm judgement
* refine IsSubset
Former-commit-id: f9fe1ee0
Committed by Jinhui Yuan
Former-commit-id: 3654d164
Committed by Jinhui Yuan
* refine act_id order condition
* strict act id check (excluding model regst)
* add TODO: figure out the ActNumForEachOutput of model regsts to MdSave area
Former-commit-id: 5be84c50
Committed by Jinhui Yuan
* remove blob_inited check
* fix inplace feature of reduce add actor and kernel
* rm useless code
* add EnableInplace, support CPU allreduce
Former-commit-id: 40a9b9a5