1. 29 Dec, 2018 · 3 commits
  2. 28 Dec, 2018 · 8 commits
  3. 27 Dec, 2018 · 1 commit
  4. 26 Dec, 2018 · 2 commits
  5. 24 Dec, 2018 · 4 commits
    • Dev refine transpose (#1594) · 966f3871
      Li Xinqi committed
      * profiling
      
      * all_reduce_* option for performance optimization
      
      * faster adam kernel
      
      * refine dropout and transpose
      
      
      Former-commit-id: a1dd7c9b36f2114ef18e0c5f6303026d91e6fe6b
    • Dev profiling adam (#1592) · 7cd97069
      Li Xinqi committed
      * profiling
      
      * all_reduce_* option for performance optimization
      
      * faster adam kernel
      
      
      Former-commit-id: 5885d1ff7eb09cbd97ca13c22dabe3835af528a6
    • Fix mem sharing bug (#1593) · 0e43539e
      scxfjiang committed
      * fix a mem sharing bug
      
      * refine by review
      
      * remove previous if condition
      
      * refine
      
      
      Former-commit-id: 028244941572194047bfa033aa2fbe7a920c598d
    • fix a mem sharing bug (#1590) · 3ececc69
      scxfjiang committed
      
      
      Former-commit-id: 61ba5b711fc218b45f84f327d33d2ee11841b8bb
  6. 22 Dec, 2018 · 3 commits
    • Dev bert profiling (#1586) · 38d625b1
      Li Xinqi committed
      * profiling
      
      * all_reduce_* option for performance optimization
      
      
      Former-commit-id: 606e964e640275e354b1280517945b0c95d09747
    • set dev id (#1583) · 10eb10fe
      jackalcooper committed
      
      
      Former-commit-id: c50a06af6f962751cdccebf7851b0a978be6ac7d
    • Dev bert cuda event sync (#1581) · f9030def
      Li Xinqi committed
      * cudaSetDevice in actor poller threads
      
      * ReduceConcatCompActor ; NaiveActor
      
      
      Former-commit-id: cf98dd810e1b27f6a19270fc9619f52aa4cfa554
  7. 18 Dec, 2018 · 3 commits
    • Dev bert layer norm (#1574) · 016612f6
      Juncheng committed
      * layer norm
      
      * layer_norm
      
      * fix trainable
      
      * fix
      
      * fix trainable
      
      * refine
      
      
      Former-commit-id: 01500a9334f323c988489825ad7366f39701152a
    • gelu (#1578) · c2668a85
      Juncheng committed
      
      
      Former-commit-id: 99ab47eeec5c6d2499c45061636ab92df00f79ed
    • Dev group all reduce by model bytes (#1577) · c828b0ce
      Li Xinqi committed
      * group all reduce by model byte size
      
      * mv OpGraph into a separate file op_graph.h
      
      
      Former-commit-id: f318bfa7c20737eae67213bc0e8ca412a408036a
  8. 17 Dec, 2018 · 3 commits
    • add mdupdt ctrl edges within reduce group (#1575) · 844cfb85
      Li Xinqi committed
      
      
      Former-commit-id: 5b7e6be333f2cc8642fbff21e3e63021a6104217
    • Dev bert profile (#1573) · c9c61b1f
      Li Xinqi committed
      * 1) refactor reduce_group; 2) add new stream kReduceCtrl
      
      * 1) allreduce and model_update overlapping; 2) allreduce and fw overlapping
      
      
      Former-commit-id: c076ac09d65c0be738d3b1fe7fdc5889673c6e2b
    • Dev clip by global norm (#1521) · 967914ce
      Shiyuan Shang-Guan committed
      * clip_by_global_norm
      
      * update
      
      * refine model_update op
      
      * remove useless code
      
      * fix name
      
      * rename clip_norm
      
      * remove useless code
      
      * force init memory and add CHECK()
      
      * remove useless code and add comment
      
      * fixbug
      
      * refine code
      
      
      Former-commit-id: b59e94b6d897ad39519322f816c55187788b2009
  9. 13 Dec, 2018 · 2 commits
    • fix tick op in dlnet (#1572) · efccaa67
      Li Xinqi committed
      
      
      Former-commit-id: 2175b10f0918afc0265ba0d4ac5b9eb42c920057
    • feat: add TickOp and BldSubTskGphByTickToSource (#1565) · aadb312c
      Niu Chong committed
      * feat: add Tick LogicalNode/TaskNode/Op/Kernel
      
      * feat: remove Tick LogicalNode/TaskNode
      
      * feat: add BldSubTskGphByTickToSource for TickOp
      
      * refine: refine due to comment
      
      * feat: add BldSubTskGphByRecordLoadToTick
      
      * refine: refine due to comment
      
      * refine: due to comment
      
      * refine: remove BldSubTskGphByRecordLoadToTick
      
      
      Former-commit-id: b8e75265059d13f666accb8e63707a772b5edcb3
  10. 12 Dec, 2018 · 2 commits
    • Dev tick (#1571) · 7cc6ff31
      Li Xinqi committed
      * feat: add Tick LogicalNode/TaskNode/Op/Kernel
      
      * feat: remove Tick LogicalNode/TaskNode
      
      * feat: add BldSubTskGphByTickToSource for TickOp
      
      * refine: refine due to comment
      
      * feat: add BldSubTskGphByRecordLoadToTick
      
      * pr tick op/kernel alone
      
      
      Former-commit-id: f9c2c1ca6f8b6f5e1a3c447602593ce42b3c4004
    • Dev clone boxing (#1568) · 8a10d96d
      Li Xinqi committed
      * identity
      
      * reduce clone boxing
      
      * tuple identity
      
      
      Former-commit-id: 1d6bb2f6d50298aeb15daf48be378bb08dc60ab1
  11. 11 Dec, 2018 · 5 commits
  12. 10 Dec, 2018 · 4 commits