1. 23 Sep 2018, 1 commit
  2. 19 Sep 2018, 2 commits
  3. 18 Sep 2018, 1 commit
    • Dev define test blob (#1247) · 8ebe859c
      Committed by Li Xinqi
      * define_test_blob
      
      * decode random compute task node
      
      * rename define_test_blob_conf.name => define_test_blob_conf.out
      
      * decode random task node color
      
      
      Former-commit-id: 0476d2c2
  4. 17 Sep 2018, 6 commits
    • moving model (#1234) · 3d5244c8
      Committed by Li Xinqi
      * moving model
      
      * moving_model => forward_model
      
      * add todo commit
      
      * two model save node
      
      * let md_updt actor handle forward_model
      
      * remove useless code
      
      * rename local variable
      
      
      Former-commit-id: baa146bd
    • refine model update conf (#1240) · 33868c01
      Committed by Shiyuan Shang-Guan
      * refine model update conf
      
      * make todo
      
      * add primary_lr and secondary_lr
      
      
      Former-commit-id: 5ccd29d7
    • b3286301
    • Dev refactor channel (#1181) · b012dc22
      Committed by Juncheng
      * add enum ChannelStatus
      
      * merge CloseSendEnd and CloseReceiveEnd
      
      * update channel_test
      
      
      Former-commit-id: fda25987
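The channel refactor above (a single `ChannelStatus` enum, with `CloseSendEnd` and `CloseReceiveEnd` merged into one close operation) can be sketched roughly as follows. This is an illustrative toy, not OneFlow's actual `Channel`; the names `Send`, `Receive`, and `Close` are assumptions:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

// Status returned by channel operations after the merge of the two
// close-end states into a single kClosed.
enum class ChannelStatus { kOpen, kClosed };

template <typename T>
class Channel {
 public:
  ChannelStatus Send(const T& item) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (status_ == ChannelStatus::kClosed) { return ChannelStatus::kClosed; }
    queue_.push(item);
    cond_.notify_one();
    return ChannelStatus::kOpen;
  }

  // Blocks until an item is available or the channel is closed and drained.
  ChannelStatus Receive(T* item) {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] {
      return !queue_.empty() || status_ == ChannelStatus::kClosed;
    });
    if (queue_.empty()) { return ChannelStatus::kClosed; }
    *item = queue_.front();
    queue_.pop();
    return ChannelStatus::kOpen;
  }

  // One Close() for both ends, replacing CloseSendEnd/CloseReceiveEnd.
  void Close() {
    std::unique_lock<std::mutex> lock(mutex_);
    status_ = ChannelStatus::kClosed;
    cond_.notify_all();
  }

 private:
  std::queue<T> queue_;
  std::mutex mutex_;
  std::condition_variable cond_;
  ChannelStatus status_ = ChannelStatus::kOpen;
};
```

Note that a receiver can still drain buffered items after `Close()`; only an empty, closed channel reports `kClosed`.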
    • Refine runtime (#1108) · 03c635ba
      Committed by Jinhui Yuan
      * only master machine saves plan and has event logger
      
      * separate Data, Persistence, Cache, Log FileSystem config
      
      * refine
      
      * only specify data and snapshot path conf
      
      * forbid multiple machines from using localfs as snapshot fs
      
      * networkfs as localfs
      
      * refine
      
      * Store log to snapshot (#1109)
      
      * use machine id, drop machine name
      
      * ensure setting machine id
      
      * allow save snapshot to localfs for distributed training (#1113)
      
      * Snapshot to master (#1116)
      
      * allow save snapshot to localfs for distributed training
      
      * fix mdSave to master for model parallel
      
      * fix review comment issues
      
      * add sanity check for machine id
      
      * rm useless comments
      
      * update example
      
      * Dev refine runtime add log stream mgr (#1142)
      
      * add LogStreamMgr
      
      * refine and refactor OutStream=>LogStream
      
      * bugfix
      
      * use LogStreamMgr to write graph, dot, plan, profile and proto
      
      * refine
      
      * simplify, remove LogStreamMgr (#1243)
      
      * simplify, remove LogStreamMgr
      
      * TeePersistentLogStream add static factory (#1244)
      
      
      Former-commit-id: d76513b3
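The log-stream work above ends with a tee-style stream (`TeePersistentLogStream`) that gets a static factory. A minimal sketch of the tee idea, assuming plain `std::ostream` targets rather than OneFlow's persistent file systems (class and method names here are illustrative):

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Duplicates every write to several underlying streams, so e.g. a plan
// or dot graph can go to both a log file and a snapshot directory.
class TeeLogStream {
 public:
  // Static factory in the spirit of the commit above.
  static std::unique_ptr<TeeLogStream> Create(std::vector<std::ostream*> outs) {
    return std::unique_ptr<TeeLogStream>(new TeeLogStream(std::move(outs)));
  }

  TeeLogStream& operator<<(const std::string& s) {
    for (std::ostream* out : outs_) { (*out) << s; }
    return *this;
  }

 private:
  explicit TeeLogStream(std::vector<std::ostream*> outs) : outs_(std::move(outs)) {}
  std::vector<std::ostream*> outs_;
};
```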
    • fix bug of forward model -> copyD2H conflict with out regst (#1242) · b3f6e061
      Committed by cheng cheng
      * fix bug of forward model -> copyD2H conflict with out regst
      
      * use 1 line
      
      
      Former-commit-id: 0da0646c
  5. 16 Sep 2018, 2 commits
  6. 15 Sep 2018, 2 commits
    • pb list data type (#1237) · d66ad601
      Committed by Li Xinqi
      
      
      Former-commit-id: 58f43ff5
    • separate model for update (#1232) · 9f22ecaa
      Committed by Shiyuan Shang-Guan
      * make each blob of the packed blob be updated separately in the ModelUpdate
      
      * make blob descs in regst be consistent in bw->md_diff_acc->shared_md_diff_add->md_update->fw
      
      * copy lbi2blob_descs from model
      
      * add shared_model_diff_add kernel
      
      * refine model_update actor and kernel
      
      * rm useless TODO
      
      * add shared_model_diff_add kernel
      
      * refine code
      
      
      Former-commit-id: 11408363
  7. 14 Sep 2018, 2 commits
  8. 13 Sep 2018, 1 commit
  9. 10 Sep 2018, 2 commits
  10. 09 Sep 2018, 1 commit
  11. 07 Sep 2018, 3 commits
    • feat: update the data members to use RegstSlot in Actor (#1208) · d0f50ede
      Committed by Niu Chong
      * feat(register_slot): add the RegstSlot
      
      * feat(register_slot): update RegstSlot if
      
      * feat(actor): update member of Actor to use RegstSlot
      
      * fix(register_slot): fix the available_regst_desc_cnt init val
      
      * refine(register_slot): rename PushBack/PopFront, FindTheRegstDescId to TryPushBack/TryPopFront, HasRegstDescId
      
      * feat(regst_slot): rename ForEachCurRegstDeq/ForEachCurFrontRegst to ForEachRegstDeq/ForEachFrontRegst
      
      * feat(regst_slot): add ForChosenRegstDeq/ForChosenFrontRegst, add CHECK empty in ForEachFrontRegst
      
      * fix(register_slot): fix the CHECK empty
      
      
      Former-commit-id: 38a50de4
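The RegstSlot commits above describe its shape well enough to sketch: a map from `regst_desc_id` to a deque of registers, `Try*` operations that report success instead of failing a CHECK, and an `available_regst_desc_cnt` counting descs that currently hold at least one register. This is a toy reconstruction, not OneFlow's actual class; `InitedDescId` and the use of `int64_t` as a stand-in for `Regst*` are assumptions:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <unordered_map>

class RegstSlot {
 public:
  // Register a desc id so TryPushBack on it can succeed later.
  void InitedDescId(int64_t regst_desc_id) { id2regsts_[regst_desc_id]; }

  bool HasRegstDescId(int64_t regst_desc_id) const {
    return id2regsts_.count(regst_desc_id) > 0;
  }

  // Renamed from PushBack: returns false for an unknown desc id
  // instead of CHECK-failing.
  bool TryPushBack(int64_t regst_desc_id, int64_t regst) {
    auto it = id2regsts_.find(regst_desc_id);
    if (it == id2regsts_.end()) { return false; }
    if (it->second.empty()) { available_regst_desc_cnt_ += 1; }
    it->second.push_back(regst);
    return true;
  }

  bool TryPopFront(int64_t regst_desc_id) {
    auto it = id2regsts_.find(regst_desc_id);
    if (it == id2regsts_.end() || it->second.empty()) { return false; }
    it->second.pop_front();
    if (it->second.empty()) { available_regst_desc_cnt_ -= 1; }
    return true;
  }

  // Number of desc ids with at least one available regst (the counter
  // whose init value one of the fixes above corrects).
  int64_t available_regst_desc_cnt() const { return available_regst_desc_cnt_; }

 private:
  std::unordered_map<int64_t, std::deque<int64_t>> id2regsts_;
  int64_t available_regst_desc_cnt_ = 0;
};
```

An actor can then test `available_regst_desc_cnt()` against the number of consumed desc ids to decide whether it is ready to act.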
    • Dev allreduce2 (#1211) · e1b30bd5
      Committed by Jinhui Yuan
      * add ReduceScatter2, ReduceAdd2, ReduceGather2 op and kernel
      
      * add ReduceScatter2, ReduceAdd2, ReduceGather2 task node and actor
      
      * complete Reduce2 op
      
      * TODO: complete ReduceAdd2 kernel
      
      * add ReduceScatter2 task to accept model_diff
      
      * sketch of connecting ReduceScatter2/Add2/Gather2
      
      * build allreduce2 logical graph
      
      * connect allreduce2 task graph
      
      * ReduceScatter2 task node
      
      * complete ReduceAdd2, ReduceGather2 task node
      
      * simplify ReduceAdd2 actor
      
      * refactor ReduceAdd2 task node
      
      * let global add -> gather share path
      
      * separate ReduceLocalAdd2 and ReduceGlobalAdd2
      
      * connect AllReduce2 task graph
      
      * complete ReduceGlobalAdd2 op
      
      * refine ReduceLocalAdd2 task node
      
      * complete ReduceGlobalAdd2 task node
      
      * global AllReduce2 works
      
      * add device_num_of_each_machine to parallel_context
      
      * simplify ReduceGlobalAdd2 runtime
      
      * multi machine multi gpus AllReduce2 works
      
      * add mem sharing and ctrl edge for AllReduce2
      
      * single machine multiple gpu mem sharing works
      
      * refine
      
      * remove the previous allreduce
      
      * change AllReduce2 to AllReduce variable convention
      
      * change filename
      
      * complete transfer to allreduce2
      
      * remove unnecessary format change
      
      * remove unnecessary format change
      
      * simplify
      
      * simplify mem sharing rule for reduce add and gather
      
      * check for local add
      
      * fix reduce_global_add actor bug
      
      * refine reduce task node
      
      * refine variable name
      
      * refine
      
      * refine
      
      
      Former-commit-id: 5909cc43
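The ReduceScatter2 → ReduceAdd2 → ReduceGather2 pipeline above can be illustrated numerically on plain vectors. In the real system the slices are regsts moving between task nodes and actors, with memory sharing and ctrl edges; this toy keeps only the arithmetic, and the function name is illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy all-reduce over n "devices": scatter splits each model_diff into n
// slices; add sums slice i across devices (device i owns slice i); gather
// broadcasts the reduced slices back so every device holds the full sum.
std::vector<std::vector<double>> AllReduce(std::vector<std::vector<double>> diffs) {
  const size_t n = diffs.size();        // number of devices
  const size_t len = diffs[0].size();   // model_diff length, assumed divisible by n
  const size_t part = len / n;
  std::vector<double> reduced(len, 0.0);
  // ReduceAdd step: device i sums everyone's i-th slice.
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = i * part; j < (i + 1) * part; ++j) {
      for (size_t k = 0; k < n; ++k) { reduced[j] += diffs[k][j]; }
    }
  }
  // ReduceGather step: every device ends up with the full reduced diff.
  for (size_t k = 0; k < n; ++k) { diffs[k] = reduced; }
  return diffs;
}
```

The local/global split in the commits (`ReduceLocalAdd2` vs `ReduceGlobalAdd2`) does this addition in two tiers, first across GPUs on one machine, then across machines, which is why `device_num_of_each_machine` was added to `parallel_context`.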
    • fix bug in add kernel of allreduce (#1214) · a76f47b3
      Committed by Jinhui Yuan
      
      
      Former-commit-id: 34ce4862
  12. 06 Sep 2018, 1 commit
  13. 04 Sep 2018, 5 commits
  14. 03 Sep 2018, 2 commits
  15. 02 Sep 2018, 3 commits
  16. 01 Sep 2018, 2 commits
  17. 31 Aug 2018, 1 commit
  18. 30 Aug 2018, 1 commit
  19. 29 Aug 2018, 1 commit
    • sketch of merge reduce project (#1159) · 0252bca8
      Committed by Jinhui Yuan
      * sketch of merge reduce project
      
      * add reduce_concat, reduce_split in logical graph (#1160)
      
      * add reduce_concat, reduce_split in logical graph
      
      * init ReduceTaskNodes in CollectReduceTaskNodes
      
      * add CompTaskNode for ReduceConcat & ReduceSplit
      
      * set ReduceConcat/Split color index
      
      * copy blob desc from ReduceConcat in to ReduceSplit out
      
      * refine CollectReduceTaskNodes
      
      * SetMemSharing for ReduceConcat, ReduceSplit regst
      
      * complete ReduceConcat & ReduceSplit op
      
      * fill ReduceConcat & ReduceSplit kernel
      
      * simplify ReduceConcatCompActor
      
      * make ReduceScatter & ReduceSplit as input-wise actor
      
      * reduce_scatter & reduce_split use is_inplace
      
      * use ByteSizeOfBlobBody for reduce related packed blob
      
      * Fix dev merge reduce (#1168)
      
      * check concat and split occur simultaneously
      
      * fix ReduceScatter & ReduceSplit as Inputwise actor
      
      * ReduceConcat & ReduceSplit works
      
      * fix single gpu issue
      
      * Refactor reduce (#1170)
      
      * backup, not complete yet
      
      * remove reduce_id
      
      * rm useless comment
      
      * add reduce_graph (#1169)
      
      * add reduce_graph
      
      * fix iter
      
      * add IsLogicalNodeMergeable and fix bug
      
      * remove needless constructor calls
      
      * node VisualStr may conflict, using node_id_str instead
      
      * reduce group works (#1171)
      
      * refine
      
      * sort nodes in topo (#1172)
      
      * add reduce_group_size in job_conf, fix 121 config of ReduceSplit and MdUpdt
      
      * resolve code review issues (variable names)
      
      * refine variable names
      
      * Dev merge reduce rename reduce group (#1174)
      
      * ReduceGraph=>ChainLogicalGraph
      
      * rename Group=>Chain
      
      * reformat
      
      * use pointer instead of reference for mutable argument
      
      * format change
      
      * worker node only pull sub_plan (#1176)
      
      * log compile time
      
      * use c++11 member initialization syntax
      
      * FixPackedBlobDescOfProducedRegst for ReduceSplit
      
      * Dev merge reduce refine chain logical graph (#1177)
      
      * remove IsMerageable
      
      * split TryMergeOneChain and rename to TryMergeTwoChains
      
      * reformat
      
      * resolve review issues
      
      
      Former-commit-id: 3aa79c70
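The ReduceConcat/ReduceSplit pair above relies on memory sharing: many small model-diff blobs are viewed as one packed buffer laid out back to back, reduced once, then "split" by reading the same offsets rather than copying. A minimal sketch of that layout computation, with all names hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A view into the packed buffer: where a blob starts and how long it is.
struct BlobView {
  std::size_t offset;
  std::size_t size;
};

// Lay blobs out contiguously. ReduceConcat writes through these views and
// ReduceSplit reads through the same ones, so no copy is needed and the
// whole packed blob can be all-reduced in one shot.
std::vector<BlobView> PackOffsets(const std::vector<std::size_t>& blob_sizes) {
  std::vector<BlobView> views;
  std::size_t offset = 0;
  for (std::size_t s : blob_sizes) {
    views.push_back({offset, s});
    offset += s;
  }
  return views;
}
```

The `reduce_group_size` knob mentioned above then controls how many such packed groups the chain-merged logical graph produces.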
  20. 27 Aug 2018, 1 commit