1. 28 Jan, 2019 1 commit
  2. 23 Jan, 2019 4 commits
    • Dev op graph piece size (#1637) · 156a89cc
      Li Xinqi committed
      * fix a bug in OpGraph::InferNoParallelBlobDesc
      
      * fix a bug in OpGraph::InferNoParallelBlobDesc
      
      
      Former-commit-id: c0b1071fc6fbe72f1207d02fdb794dc1076eb59a
    • Dev global op graph (#1636) · 2733025d
      Li Xinqi committed
      * Global<OpGraph> is only available during compilation
      
      * small record_piece_size for InferNoParallelBlobDesc
      
      
      Former-commit-id: 5eb1012703f8f9389ac8e2f16131bfd36411b0db
    • Dev logical blob dim0 (#1635) · d408be08
      Li Xinqi committed
      * mem_shared_hint_id
      
      * sharable memory block
      
      * rm useless code
      
      * remove useless code
      
      * bugfix: no redundant edges
      
      * rename: MemBlockGroup => MemBlock
      
      * put constructor of SharableMemBlockNode into header file
      
      * bugfix
      
      * rename field: MemBlock.block_id => MemBlock.mem_block_id
      
      * replace piece_size with logical_blob_dim0
      
      * BlobParallelConf
      
      * BlobParallelDesc
      
      * infer out blob model_split_axis
      
      * int64_t => int32_t
      
      * InferOutBlobParallelDesc
      
      * gather out blob model split (#1624)
      
      * InferBlobParallelDesc
      
      * let variable op support kModelParallel
      
      * rename lbi2blob_desc_ => lbi2no_parallel_blob_desc_
      
      * Global<OpGraph>
      
      * SplitLogicalInputBlobDesc
      
      * ConcatOutputBlobDescs
      
      * rename: BlobDataParallel => DataBlobParallel; BlobModelParallel => ModelBlobParallel; BlobGridParallel => GridBlobParallel
      
      * OpGraph::CheckBlobDescs(...)
      
      * exact division is unnecessary
      
      * fix bugs
      
      * rename InferOutBlob* => InferOutputBlob
      
      * exact division in variable_op is unnecessary
      
      * bug fix
      
      * fix bugs
      
      * fix bugs
      
      * IsInputBlobAllowedModelSplit
      
      * use Global<OpGraph> to InferModelSize
      
      * add OpGraph::GetDataBalancedSplitter and OpGraph::GetModelBalancedSplitter
      
      * fix IdentityOp::IsInputBlobAllowedModelSplit
      
      * no implementation for pure virtual function Operator::IsInputBlobAllowedModelSplit
      
      * refine BlobParallelDesc: replace CopyParallelConf with operator=
      
      * refine ParallelDesc: remove unused functions
      
      * more checks on ParallelDesc
      
      * remove unused function Operator::MaxModelSplitNum
      
      * bugfix: SoleOp() => op_vec().at(0)
      
      
      Former-commit-id: be1f820b2927f7f79f55b7891f6575cdeb4b2053
    • Dev logical blob dim0 (#1625) · d91685b1
      Li Xinqi committed
      * mem_shared_hint_id
      
      * sharable memory block
      
      * rm useless code
      
      * remove useless code
      
      * bugfix: no redundant edges
      
      * rename: MemBlockGroup => MemBlock
      
      * put constructor of SharableMemBlockNode into header file
      
      * bugfix
      
      * rename field: MemBlock.block_id => MemBlock.mem_block_id
      
      * replace piece_size with logical_blob_dim0
      
      * BlobParallelConf
      
      * BlobParallelDesc
      
      * infer out blob model_split_axis
      
      * int64_t => int32_t
      
      * InferOutBlobParallelDesc
      
      * gather out blob model split (#1624)
      
      * InferBlobParallelDesc
      
      * let variable op support kModelParallel
      
      * rename lbi2blob_desc_ => lbi2no_parallel_blob_desc_
      
      * Global<OpGraph>
      
      * SplitLogicalInputBlobDesc
      
      * ConcatOutputBlobDescs
      
      * rename: BlobDataParallel => DataBlobParallel; BlobModelParallel => ModelBlobParallel; BlobGridParallel => GridBlobParallel
      
      * OpGraph::CheckBlobDescs(...)
      
      * exact division is unnecessary
      
      * fix bugs
      
      * rename InferOutBlob* => InferOutputBlob
      
      * exact division in variable_op is unnecessary
      
      * bug fix
      
      * fix bugs
      
      * fix bugs
      
      * IsInputBlobAllowedModelSplit
      
      * use Global<OpGraph> to InferModelSize
      
      * add OpGraph::GetDataBalancedSplitter and OpGraph::GetModelBalancedSplitter
      
      * fix IdentityOp::IsInputBlobAllowedModelSplit
      
      * no implementation for pure virtual function Operator::IsInputBlobAllowedModelSplit
      
      * refine BlobParallelDesc: replace CopyParallelConf with operator=
      
      * refine ParallelDesc: remove unused functions
      
      * more checks on ParallelDesc
      
      
      Former-commit-id: 2b78c6e1f37e514e39f1dc807ccce455190b00a7
  3. 22 Jan, 2019 1 commit
  4. 20 Jan, 2019 1 commit
  5. 19 Jan, 2019 1 commit
  6. 17 Jan, 2019 1 commit
  7. 10 Jan, 2019 2 commits
    • refine CHECK in AllReduce (#1618) · 52a6c519
      scxfjiang committed
      * refine CHECK in AllReduce
      
      * move ReduceConcatOpCtx definition to .cpp file
      
      
      Former-commit-id: 5a50f692cb92c5a6a7074be2063cbc1ec325c1ca
    • Strategy-based register coloring (#1613) · d72a21e2
      Li Xinqi committed
      * mem_shared_hint_id
      
      * sharable memory block
      
      * rm useless code
      
      * remove useless code
      
      * bugfix: no redundant edges
      
      * rename: MemBlockGroup => MemBlock
      
      * put constructor of SharableMemBlockNode into header file
      
      * bugfix
      
      * rename field: MemBlock.block_id => MemBlock.mem_block_id
      
      
      Former-commit-id: 6a8fc14c2ba6bbe148a84458fa6119af16cbe672
  8. 09 Jan, 2019 3 commits
  9. 04 Jan, 2019 1 commit
  10. 02 Jan, 2019 2 commits
    • Dev random shuffle (#1607) · bb0dfaa3
      Juncheng committed
      * random shuffle
      
      * fix
      
      * refine
      
      * refine
      
      * single thread
      
      * refine
      
      
      Former-commit-id: 0dbb1f3d7265f9c55a11b07695efd092cd81a83c
    • Fix jxf reduce concat bug (#1606) · c5310e4c
      scxfjiang committed
      * refine logic to infer reduce_concat_op's elem_cnt of out blob, still has bugs...
      
      * add RoundUp in reduce_concat
      
      * CHECK_LE -> CHECK_EQ
      
      * add CHECK
      
      
      Former-commit-id: 962817e2a322ba6452c9966bae87fb5da9d4a86a
  11. 29 Dec, 2018 5 commits
  12. 28 Dec, 2018 8 commits
  13. 27 Dec, 2018 1 commit
  14. 26 Dec, 2018 2 commits
  15. 24 Dec, 2018 4 commits
    • Dev refine transpose (#1594) · 966f3871
      Li Xinqi committed
      * profiling
      
      * all_reduce_* option for performance optimization
      
      * faster adam kernel
      
      * refine dropout and transpose
      
      
      Former-commit-id: a1dd7c9b36f2114ef18e0c5f6303026d91e6fe6b
    • Dev profiling adam (#1592) · 7cd97069
      Li Xinqi committed
      * profiling
      
      * all_reduce_* option for performance optimization
      
      * faster adam kernel
      
      
      Former-commit-id: 5885d1ff7eb09cbd97ca13c22dabe3835af528a6
    • Fix mem sharing bug (#1593) · 0e43539e
      scxfjiang committed
      * fix a mem sharing bug
      
      * refine by review
      
      * remove previous if condition
      
      * refine
      
      
      Former-commit-id: 028244941572194047bfa033aa2fbe7a920c598d
    • fix a mem sharing bug (#1590) · 3ececc69
      scxfjiang committed
      
      
      Former-commit-id: 61ba5b711fc218b45f84f327d33d2ee11841b8bb
  16. 22 Dec, 2018 3 commits
    • Dev bert profiling (#1586) · 38d625b1
      Li Xinqi committed
      * profiling
      
      * all_reduce_* option for performance optimization
      
      
      Former-commit-id: 606e964e640275e354b1280517945b0c95d09747
    • set dev id (#1583) · 10eb10fe
      jackalcooper committed
      
      
      Former-commit-id: c50a06af6f962751cdccebf7851b0a978be6ac7d
    • Dev bert cuda event sync (#1581) · f9030def
      Li Xinqi committed
      * cudaSetDevice in actor poller threads
      
      * ReduceConcatCompActor ; NaiveActor
      
      
      Former-commit-id: cf98dd810e1b27f6a19270fc9619f52aa4cfa554