1. 02 Dec, 2020 1 commit
    • 0.3.1 · 97300fae
      Committed by Shenghang Tsai
      
      Former-commit-id: bdb27546064ce15707b176159885fc7a1777a6fc
  2. 30 Nov, 2020 4 commits
  3. 29 Nov, 2020 1 commit
  4. 28 Nov, 2020 4 commits
  5. 27 Nov, 2020 8 commits
  6. 26 Nov, 2020 4 commits
    • New checkpoint (#3540) · ea0b766b
      Committed by daquexian
      * flow.load/save/get_all_variables without large tensor and multi machine support
      
      * add lazy blob cache and disable blob_cache after writing
      
      * update checkpoint to call the potential slice_assign and read_slice_from_blob method
      
      * reformat
      
      * new checkpoint supports eager
      
      * split mut bn into mutable input bn and output bn
      
      * work in eager mode. deprecate checkpoint.init()
      
      * slice_assign implementation
      
      * new slice op
      
      * check step > 0, add more tests, refine the code
      
      * revert the initializer changes
      
      * remove print
      
      * set y to 0 for partialsum
      
      * check sbp, fix incorrect attr check
      
      * add more tests
      
      * rename slice2->logical_slice
      
      * update tests
      
      * extract common python code into a function
      
      * get_size_in_slice -> GetSizeInSlice, rm unused test file
      
      * minor update about step > 0
      
      * minor update on tests
      
      * add WITH_CUDA guard
      
      * integrate with logical slice/slice_assign
      
      * set scope according to variable op_conf
      
      * initial support of stream init
      
      * read_slice_from_blob/as_numpy return nd_idx and set the cpu:0 placement for created variable
      
      * extract a 'for_every_slice' function
      
      * initializer registration
      
      * one meta file per variable
      
      * remove mis-added file
      
      * code clean
      
      * create model io jobs only if legacy model io enabled, update legacy api
      
      * add legacy model io test
      
      * slice operation optimization
      
      * add and update tests
      
      * barrier for multi node eager
      
      * make sync as a cluster instruction
      
      * update test
      
      * fix life cycle problem
      
      * add python api vm_util.Sync()
      
      * make initializer receive a random_seed
      
      * Add vm_util.Sync(), remove debug code
      
      * resolve TODO, remove __repr__ for now
      
      * use compiled op_conf for getting random_seed
      
      * UserOpAttrVal -> AttrValue, remove debug code
      
      * test another dtype
      
      * remove mis-added )
      
      * fix dtype error when shape[axis+1:] is empty
      
      * add initializers to check_point
      
      * code clean, enable a temporary default checkpoint for test
      
      * move legacy implementation to deprecated/
      
      * update deprecated implementation
      
      * fix bug in eager, add eager tests and some other minor updates
      
      * remove name field in FileBackendBlob, update Load for single variable, and some other minor updates
      
      * remove mis-added file
      
      * move initializer implementation, some minor changes
      
      * disable some bn tests missing checkpoint.init()
      
      * fix dtype conversion bug
      
      * relax the tolerance of layer_norm test
      
      * reformat
      
      * minor code clean
      
      * use new pybind11 eager sync api
      
      * add assignment between memory test
      
      * disable optimizers test for now
      
      * code clean
      
      * reuse CreateEagerVariableBlob
      
      * remove mis-added file
      
      * unify two read slice function
      
      * minor code clean
      
      * add initializer_updated to check_point.py
      
      * fix typo
      
      * resolve merge conflict
      
      * restore bn tests
      
      * add type annotations, add some comments and minor code clean
      
      * add some comments, remove 'need_root_path' parameter
      
      * fixup
      
      * get parallel_conf from job_set instead of op_attribute
      
      * disable two tests involving legacy model io in eager mode
      
      * add InitialzierImpl
      
      * add InitializerImpl
      
      * support load from numpy array, add test
      
      * rename and format
      
      * Add necessary docs and TODO, improve warning message
      
      * ParallelConf4InterfaceOpName->ParallelConf4LazyInterfaceOpName
      
      * address some comments
      
      * rename api
      
      * fix problems
      
      * add test_initializer.py
      
      * remove unused initializers
      
      * remove quantinfo, move new checkpoint to check_point_v2.py
      
      * fix crash on checkpoint.init()
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * restore optimizer test
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * Add GetOpAttributes api
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * restore ParallelConf4LazyOp as parallel desc symbol id in op attr doesn't align with that in job set
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * Add TestResumeTraining, shrink the large model size
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * restore 2n4c ci test
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * code clean
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * add snapshot_done
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * add test_mixed_model, update test_load_numpy
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * add flow.sync_default_session in api implementation
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * change the default value of ignore_mismatch from False to True to align with existing behavior
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * fix wrong initializer in test_mseloss.py and test_bce_loss.py
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * ForEachOpNode -> ForEachNode
      Signed-off-by: daquexian <daquexian566@gmail.com>
      
      * fix test_partially_load_numpy
      Signed-off-by: daquexian <daquexian566@gmail.com>
      Co-authored-by: wanghongsheng <2496533749@qq.com>
      Co-authored-by: cheng cheng <472491134@qq.com>
      Former-commit-id: 6a1b2253
    • Reshape backward issue with distribute split (#3915) · 7f981518
      Committed by ShawnXuan
      * bak
      
      * rm useless lines
      
      * fix skip
      
      * better blob name
      
      * check in fix
      
      * refine code
      
      * fix fmt
      
      * fix fmt
      
      * rm include
      Co-authored-by: Tsai <caishenghang@oneflow.org>
      Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
      Former-commit-id: 3e6f8895
    • bump 0.3b1 · b0ede4ee
      Committed by Shenghang Tsai
      
      Former-commit-id: 6ef63c96
    • Remove NormalModelUpdateOpConf (#3917) · fa5060a2
      Committed by Juncheng
      * Remove NormalModelUpdateOpConf
      
      * Remove useless import
      
      * Remove TaskNode/Actor
      
      * fix
      
      Former-commit-id: bab1c62a
  7. 25 Nov, 2020 7 commits
  8. 24 Nov, 2020 4 commits
  9. 23 Nov, 2020 3 commits
  10. 22 Nov, 2020 2 commits
    • Add ssp variable proxy (#3859) · 403931e5
      Committed by Li Xinqi
      * rename UserOpAttrVal to AttrValue
      
      * Scope::GetAttrValue
      
      * add ssp variable proxy pass
      
      * AddSspVariableProxy
      
      * ssp_config_def.cpp
      
      * merge config_def from master
      
      * REGISTER_SCOPE_CONFIG_DEF
      
      * description for ssp_partition_strategy
      
      * fix return type of JobPass::HasState
      
      * FlexDef/FlexValue
      
      * support recursive flex def
      
      * remove field_number
      
      * more cfg files
      
      * instructions builder
      
      * forward declaration instead of include
      
      * more test for cfg
      
      * revert cfg files
      
      * InstructionsBuilder
      
      * using std::function as argument of IdCache::FindOrCreate
      
      * scope op_collection
      
      * include <functional> in framework/interpreter.h
      
      * puts more code into WithOptimizerOpCollectionScope
      
      * IsInOptimizerOpCollection
      
      * include <functional> in symbol_id_cache.h
      
      * calculation pass
      
      * IsInOptimizerOpCollection -> IsInOptimizerPass
      
      * minor refine about spp_config_def.cpp
      
      * test for add_ssp_variable_proxy
      
      * rm framework/flex
      
      * refine add ssp variable proxy pass
      
      * refine Error
      
      * refine Error
      
      * AddScopeToPyStorage
      
      * fix test_watch
      
      * get scope_symbol_id from current scope
      
      * fix assert bug
      
      * no longer use scope_proto.symbol_id
      Co-authored-by: binbinHan <han_binbin@163.com>
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      Former-commit-id: ef6786d5
    • Op collection (#3833) · d39b4145
      Committed by Li Xinqi
      * more cfg files
      
      * instructions builder
      
      * forward declaration instead of include
      
      * more test for cfg
      
      * revert cfg files
      
      * InstructionsBuilder
      
      * using std::function as argument of IdCache::FindOrCreate
      
      * scope op_collection
      
      * include <functional> in framework/interpreter.h
      
      * puts more code into WithOptimizerOpCollectionScope
      
      * include <functional> in symbol_id_cache.h
      
      * calculation pass
      
      * refine Error
      
      * AddScopeToPyStorage
      
      * fix test_watch
      
      * get scope_symbol_id from current scope
      
      * fix assert bug
      Co-authored-by: binbinHan <han_binbin@163.com>
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
      Former-commit-id: 1467aecf
  11. 21 Nov, 2020 2 commits