1. 31 Oct, 2022 1 commit
    • 2.4/fix engine build (#47462) · 4b3589fb
      zhaoyingli committed
      * update codestyle
      
      * [AutoParallel] fix fp16 for subblock (#47189)
      
      * [AutoParallel] fix fp16 for subblock
      
      * fix engine
      
      * fix comment
      
      * [AutoParallel] fix engine _build and cost method (#47263)
      
      * fix engine build method
      
      * fix import
      
      * update engine cost
      
      * update raise error
      
      * update cmakelist
      
      * revert optimizer
      
      * revert optimizer
      
      * fix unittest
      
      * fix unittest
      Co-authored-by: caozhou <caozhou@radi.ac.cn>
  2. 19 Oct, 2022 1 commit
    • [Cherry-Pick][AutoParallel] auto_parallel cherry-pick to release2.4 (#47145) · 90b31790
      zhaoyingli committed
      * [Auto Parallel] Make Engine class callable (#46416)
      
      * [Auto Parallel] Improve the user-defined fetches and logging
      
      * [Auto Parallel] Make Engine class callable
      
      * [Auto Parallel] Update the data loading of tuner
      
      * Print IPS in auto parallel Engine (#46554)
      
      * [AutoParallel] fix dist_split (#46505)
      
      * [AutoParallel] fix dist_split
      
      * add unittest
      
      * update cmakelist
      
      * [AutoParallel] fix sharding (#46572)
      
      * [AutoParallel] fix process_mesh (#46583)
      
      * [AutoParallel] fix reshard when train with eval (#46605)
      
      * [AutoParallel] fix reshard when train with eval
      
      * fix mppp
      
      * [AutoParallel] fix amp when predict (#46637)
      
      * [Auto Parallel]Update comp cost and completion for gpt auto search (#46387)
      
      * update comp cost and completion for gpt auto search
      
      * add unittest
      
      * [Auto Parallel] Fix bugs caused by the inconsistent outputs of Engine API (#46633)
      
      * [Auto Parallel] Unify the logger and outputs of Engine API
      
      * [Auto Parallel] Fix the bugs of to_static
      
      * [Auto Parallel] Adjust the test_to_static.py
      
      * [Auto Parallel] Improve the fine-grained APIs (#46552)
      
      * [Auto Parallel] Support different dataloaders
      
      * [Auto Parallel] Add num_shards config for dataset
      
      * [Auto Parallel] Unify the logger and outputs of Engine API
      
      * [Auto Parallel] Fix the bugs of to_static
      
      * [Auto Parallel] Adjust the test_to_static.py
      
      * [Auto Parallel] Add the prepare API and replace __call__ with run
      
      * [Auto Parallel] Improve the private implementations of Engine
      
      * [Auto Parallel] Set capacity of dataloader for opt tuning
      
      * [Auto Parallel] [WIP] Change the fine-grained API
      
      * [Auto Parallel] Improve APIs to support different user cases
      
      * [Auto Parallel] Add removed config
      
      * [Auto Parallel] Add imports
      
      * [Auto Parallel] Fix bugs for to_static
      
      * [Auto Parallel] Remove unnecessary imports
      
      * bugfix (#46921)
      
      * [Auto Parallel] Fix the bug for None labels (#46987)
      
      * [AutoParallel] adapt for gpt-gen (#46771)
      
      * for gpt-gen
      
      * fix reshard
      
      * adapt assign and shape op
      
      * add dist_assign & unittest
      
      * add conditional block unittest
      
      * rename unittest
      
      * [Auto Parallel] Fix the bug of completion (#47056)
      
      * [Auto Parallel] Fix the bug for None labels
      
      * [Auto Parallel] Fix the completion bug
      
      * [AutoParallel] add callbacks (#47014)
      
      * [AutoParallel] add callbacks
      
      * fix unittest
      
      * fix dist_context
      
      * fix engine
      
      * fix cmakelist
      
      * fix unittest's returns
      
      * fix cmakelist
      
      * [Auto Parallel] Add cost interface (#47043)
      
      * add cost interface
      
      * update interface and add unittest
      
      * update unittest
      
      * update interface
      
      * [Auto Parallel]Add parallel tuner (#46189)
      
      * add parallel tuner
      
      * add unittest
      
      * fix unittest
      
      * set timeout of unittest
      
      * set unittest timeout
      
      * fix auto_mode setting
      
      * update unittest
      
      * sync from develop and update unittest
      
      * remove unused import
      
      * update unittest
      
      * update cmakelist
      
      * add unittests
      Co-authored-by: Yulong Ao <aoyulong@baidu.com>
      Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
      Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
      Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
  3. 20 Sep, 2022 1 commit
  4. 19 Sep, 2022 1 commit
    • [Cherry-pick][Auto Parallel] Improve the APIs (#46164) · c5cc4278
      Yulong Ao committed
      * [AutoParallel] adapt gradient merge pass (#45915)
      
      * adapt gradient merge
      
      * fix op_role
      
      * fix strategy
      
      * [Auto Parallel] Gradient Fuse Allreduce (#45643)
      
      * bugfix (#45332)
      
      * dist embedding support lookup table v1
      
      * add unittest
      
      * customize wait_comm
      
      * group gradients
      
      * bugfix
      
      * update program
      
      * [Auto Parallel] Improve the APIs (#45776)
      
      * [Auto Parallel] Use c++ dist attr in the completion process
      
      * [Auto Parallel] Add minor changes
      
      * [Auto Parallel] Use c++ dist attr in the completion process
      
      * [Auto Parallel] Add minor changes
      
      * [Auto Parallel] Add the serialization process for dist attrs
      
      * [Auto Parallel] Remove unnecessary comments
      
      * [Auto Parallel] Fix some bugs
      
      * [Auto Parallel] Fix the code style
      
      * [Auto Parallel] Remove unnecessary impls
      
      * [Auto Parallel] Fix the importing error
      
      * [Auto Parallel] Fix the copy from bugs of op dist attr
      
      * [Auto Parallel] Replace the use of constexpr if
      
      * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh
      
      * [Auto Parallel] Change API of the completion unittest
      
      * [Auto Parallel] Fix the bug when set_attr an int
      
      * [Auto Parallel] Add the unittest for the serialization
      
      * [Auto Parallel] Add some unit tests
      
      * [Auto Parallel] Unify the strategy
      
      * [Auto Parallel] Improve the engine api
      
      * [Auto Parallel] Reset the changes made to the framework
      
      * [Auto Parallel] Change the engine unittest
      
      * [Auto Parallel] Update API of the completion and partitioner
      
      * [Auto Parallel] Update unit tests using engine api
      
      * update shard annotation
      
      * [Auto Parallel] Remove the modifications of other modules
      
      * [Auto Parallel] Add docs for APIs
      
      * add new strategy
      
      * [Auto Parallel] Replace the logger
      
      * [Auto Parallel] Restore the test_program.py
      
      * [Auto Parallel] Change the import rules
      
      * [Auto Parallel] Add the examples for Engine
      
      * [Auto Parallel] Do some minor changes
      
      * [Auto Parallel] Remove yaml dependency
      
      * [Auto Parallel] Fix the unittests
      
      * add valid after train
      
      * bug fix
      Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
      Co-authored-by: caozhou <caozhou@radi.ac.cn>
      Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
      
      * [Auto Parallel] Bugfix allreduce fuse for MP (#46086)
      
      * bugfix
      
      * bugfix
      
      * typos fixed
      
      * update strategy (#46138)
      Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
      Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
      Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
      Co-authored-by: caozhou <caozhou@radi.ac.cn>
      Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
  5. 09 Sep, 2022 1 commit
  6. 07 Sep, 2022 1 commit
  7. 23 Aug, 2022 2 commits
  8. 18 Aug, 2022 1 commit
  9. 15 Aug, 2022 1 commit
  10. 03 Aug, 2022 1 commit
  11. 29 Jul, 2022 1 commit
  12. 25 Jul, 2022 1 commit
  13. 18 Jul, 2022 1 commit
  14. 13 Jul, 2022 2 commits
  15. 11 Jul, 2022 1 commit
  16. 07 Jul, 2022 1 commit
  17. 29 Jun, 2022 1 commit
  18. 24 Jun, 2022 1 commit
    • [Auto Parallel] Use a fast completion for data parallelism (#43585) · e64823c1
      Yulong Ao committed
      * [Auto Parallel] Use a fast completion for data parallelism
      
      * remove unused cuSparse function
      
      * [Auto Parallel] Fix some bugs of the fast dp completion
      
      * [Auto Parallel] Add the cmake statements
      
      * [Auto Parallel] Make the unittest adapt to the new interface
      
      * [Auto Parallel] Modify the timeout of the unittest
      
      * [Auto Parallel] Remove unnecessary comments
      Co-authored-by: zhouwei25 <zhouwei25@baidu.com>
  19. 13 Jun, 2022 1 commit
  20. 08 Jun, 2022 1 commit
  21. 05 Jun, 2022 1 commit
    • [code format check upgrade] step2: yapf (#42944) · a072fca8
      Sing_chan committed
      * use yapf to format all Python files

      * exclude two unittest files from yapf because they rely on writing and reading files, and formatting would break them

      * disable diff_py_file because too many diff files cause the following command to fail
  22. 02 Jun, 2022 1 commit
  23. 01 Jun, 2022 1 commit
    • [Auto Parallel] Add miscellaneous improvements (#43108) · 010aba33
      Yulong Ao committed
      * [Auto Parallel] Add the parallel tuner
      
      * [Auto Parallel] Improve the parallel tuner and fix some bugs
      
      * update cost model
      
      * update import Resharder by dist op
      
      * update cost model
      
      * fix comp cost bug
      
      * update cost model
      
      * [Auto Parallel] Amend the dist attr for #processes=1
      
      * update cost model and tuner
      
      * update cost model and tuner
      
      * update cost model and tuner
      
      * update cluster
      
      * update reshard
      
      * [Auto Parallel] Add the estimation from the cost model
      
      * [Auto Parallel] Reimplement the backup and restore functions
      
      * [Auto Parallel] Fix the bugs of the parallel tuner
      
      * [Auto Parallel] Update the engine api and dist context
      
      * [Auto Parallel] Work around the high order grad problem
      
      * [Auto Parallel] Add some miscellaneous improvements
      
      * [Auto Parallel] Add a unittest for DistributedContext
      Co-authored-by: caozhou <caozhou@radi.ac.cn>
  24. 19 May, 2022 1 commit
  25. 10 May, 2022 1 commit
  26. 07 May, 2022 1 commit
  27. 06 May, 2022 1 commit
    • [AutoParallel] adapt for 2d laplace (#41601) · c043a21b
      zhaoyingli committed
      * add default_ctx in backward.py
      
      * record grad_var_to_var with grad_times
      
      * fix backward
      
      * update annotation
      
      * add complete_high_order_grad in complete_forward
      
      * add dist slice op
      
      * update grad_var_to_var type
      
      * update partition_block init mapping before loss op
      
      * update compatible for 'XShape' & update 'allreduce_vars'
      
      * add dist reshape op when input dim equal to output dim
      
      * update 'set_grad_var_shape' with grad_var_to_var
      
      * fix dist slice
      
      * fix set_grad_var_shape
      
      * add dist pnorm op
      
      * fix dist pnorm dist_attr
      
      * fix engine startprogram & adapt highorder grad
      
      * fix set_grad_var_shape when mp
      
      * update unittest
      
      * update cmakelist
      
      * default strategy in engine: dp
      
      * bug fix
      
      * tiny fix
      
      * flatten outputs
      
      * fix default strategy
      
      * init default ctx
      
      * tiny fix
      
      * test=allcase
  28. 18 Apr, 2022 1 commit
  29. 28 Mar, 2022 1 commit
  30. 23 Mar, 2022 1 commit
  31. 16 Mar, 2022 1 commit
    • [Auto Parallel] Add the support for the auto completion of while_op (#39939) · ec6b8fbd
      Yulong Ao committed
      * [Auto Parallel] Support the auto completion of while_op
      
      * [Auto Parallel] Improve the completion algorithms
      
      * [Auto Parallel] Fix bugs for ernie inference
      
      * [Auto Parallel] Remove attrs which cannot be pickled
      
      * [Auto Parallel] make the dims_mappings of LodTensorArray vars empty
      
      * [Auto Parallel] Fix bugs for the ernie inference in the pipeline parallel
      
      * [Auto Parallel] Remove unnecessary comments
      
      * [Auto Parallel] Fix a bug of the CMakeLists
      
      * [Auto Parallel] Use the newest APIs to write the unit test
      
      * [Auto Parallel] Remove unnecessary statements
  32. 07 Mar, 2022 1 commit
  33. 24 Feb, 2022 1 commit
  34. 22 Feb, 2022 1 commit