  1. 13 Oct, 2021 1 commit
    • Merge lars op (#35476) · 0c31579c
      limingshu authored
      * A leap of try for cudaLaunchCooperativeKernel
      
      * fix bugs
      
      * Totally replace the lar cuda kernel
      
      * Fix bugs
      
      * a test for lars merge
      
      * Adding las_op_momentum infer_shape
      
      * Fix codes
      
      * use avg_numel instead of max_numel to acquire grid num
      
      * modify unittest files about lars op
      
      * Finally converge when merged-lars works
      
      * fix ctest files
      
      * add merged_operation kernel when cuda version is older than 11
      
      * Fix code style
      
      * fix ctest failure
      
      * fix error
      
      * fix all ctest error and change lars compute code of cpu
      
      * fix bugs on v100.
      
      * revert python modification about lars
      
      * revert python modification codes
  2. 13 Sep, 2021 1 commit
    • [RC22] Fix linear with matmul_op replace (#35445) · 53e294ca
      zhulei authored
      * [RC22] Fix linear with matmul_op replace
      
      * [RC22] Fix linear with matmul_op replace
      
      * [RC22] Fix linear with matmul_op replace
      
      * [RC22] Fix linear with matmul_op replace
      
      * [RC22] Fix linear with matmul_op replace
  3. 10 Jun, 2021 1 commit
  4. 03 Jun, 2021 1 commit
  5. 31 May, 2021 1 commit
  6. 02 Dec, 2020 1 commit
    • Add pure fp16 training with master weights. (#27712) · be3777a5
      Zhen Wang authored
      * add the weight decay func for the momentum op
      
      * Add the multi_precision function in Momentum Optimizer.
      
      * Make sure that the initial value of master weights are same with the fp16 weights.
      
      * add static loss scaling.
      
      * add the rescale_grad function in the pure fp16 training.
      
      * use the original momentum updating method.
      
      * Polish some codes, such as variable names.
      
      * add docstring for apis.
      
      * update the var creation details of _create_master_weight.
      
      * not modify codes about imperative momentum updating.
      
      * Fix the error of test_dist_sparse_tensor_load_momentum UT.
      
      * add unit test for multi precision fp16 training.
      
      * add more unit tests for CI.
      
      * Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
      
      * For CI Coverage Checking.
  7. 01 Dec, 2020 2 commits
  8. 23 Nov, 2020 1 commit
  9. 29 Aug, 2020 1 commit
    • Adadelta Optimizer (#26590) · a1b99fae
      Jiawei Wang authored
      * add doc; notest
      
      * fix doc; notest
      
      * update doc; notest
      
      * refine optimizer && adam
      
      * refine optimizer; notest
      
      * add adam
      
      * fix doc
      
      * fix doc && add adamw; notest
      
      * add error message
      
      * bug fix
      
      * refine rmsprop && adamax
      
      * fix ci
      
      * bug fix
      
      * update comment
      
      * unify arguments place; notest
      
      * fix ut, test=develop
      
      * bug fix
      
      * fix conflicts, test=develop
      
      * add examples code
      
      * bug fix
      
      * fix comments
      
      * fix sample code
      
      * add sample code for Optimizer
      
      * add adamax ut, test=develop
      
      * fix rmsprop ut, test=develop
      
      * add ut for optimizer.py and adamw.py
      
      * first commit of adadelta optimizer
      
      * fix learning rate
      
      * fix adadelta doc and add sgd momentum
      
      * remove unused fluid
      
      * fix codestyle
      
      * Update test_adam_op.py
      
      * Update test_adam_op.py
      
      * fix SGD in 2 unittests
      
      * fix SGD in 2 unittests
      
      * fix ci
      
      * fix ut
      Co-authored-by: MRXLT <xlt2024@gmail.com>
      Co-authored-by: mapingshuo <mps2012@yeah.net>
  10. 26 Dec, 2018 1 commit
    • Fp16 training (#14992) · 856f0da0
      Wu Yi authored
      * wip
      
      * wip
      
      * wip
      
      * wip for test
      
      * add fp16 tests test=develop
      
      * fix cpu build test=develop
      
      * fix test=develop
      
      * fix py3 tests test=develop
      
      * fix lr_scheduler dtype test=develop
      
      * fix test=dvelop
      
      * test fix ci compile test=develop
      
      * fix build and merge test=develop
      
      * fallback momentumop change to general test=develop
      
      * make fp16 lr schedule simple test=develop
      
      * fix ut test=develop
      
      * fix tests test=develop
      
      * remove fp16 learning rate cast test=develop
  11. 20 Dec, 2018 2 commits
  12. 29 Oct, 2018 1 commit
    • [1.1] [project] train imagenet using large batch size (#13766) · 26200f2e
      Wu Yi authored
      * fix nccl2 lars dist support
      
      * put lars in momentum op
      
      * add tests lars
      
      * fix ci
      
      * fix cpu kernel
      
      * soft warning
      
      * remove lars in test_recognize_digits.py
      
      * move to another op
      
      * add file
      
      * update api.spec test=develop
      
      * update test=develop
      
      * fix api.spec test=develop
      
      * wip
      
      * wip, finish grad merge ops
      
      * wip, finish graph build
      
      * wip test running
      
      * work on 1 gpu
      
      * workable version
      
      * update
      
      * fix tests
      
      * fuse broadcast op
      
      * fix compile failed
      
      * refine
      
      * add batch merge test mnist
      
      * fix CI test=develop
      
      * fix build
      
      * use independent bn params for batch merge test=develop
      
      * update api.spec
      
      * follow comments and for test
      
      * wip
      
      * refine tests test=develop
      
      * follow comments test=develop
      
      * remove startup bn modify test=develop
      
      * follow comments test=develop
      
      * fix merge test=develop
  13. 17 Oct, 2018 1 commit
  14. 14 Oct, 2018 1 commit
  15. 15 Aug, 2018 1 commit
  16. 26 Jul, 2018 2 commits
  17. 20 Jul, 2018 1 commit
  18. 24 Feb, 2018 1 commit
  19. 13 Feb, 2018 1 commit
    • Run Python OP tests in a single Python process to improve test time. (#8362) · cde6241a
      Xin Pan authored
      Currently, our tests run with 2 GPUs, the init time is absurdly long:
      about 4s for each process.  Currently, we run each OP test on
      different processes. This PR:
      
      1. create cmake function py_test_modules which will generate the
      Makefile that runs a list of Python unittest modules in a single Python
      process.
      
      2. move all "python unittest compatible" (e.g., used the unittest
      package, not just a regular python file). from fluid/tests to
      fluid/tests/unittests.
      
      3. cmake now will run all OP tests in fluid/tests/unittests in a
      single process, except the time-consuming tests, they are separated
      into different processes to utilize parallelism. Please make sure to
      use the unittest package if you put the python test file in
      fluid/tests/unittests
      
      4. remove all exit(0) from fluid/tests/unittests/*.py, exit(0) is used
      to disable unittest, we can not do it when running all tests in a
      single process since it will terminate the process without running the
      other tests. Instead, the test is disabled in
      fluid/tests/unittests/CMakeLists.txt. FIXME is added for each disabled
      item. Please disable the unittest from
      fluid/tests/unittests/CMakeLists.txt, instead of adding exit(0) to the
      Python file, for all Python files in fluid/tests/unittests/.
      
      5. add an option WITH_FAST_BUNDLE_TEST. When OFF, will run the unit
      tests in separate process so that they can be tested individually.
  20. 12 Feb, 2018 1 commit
  21. 21 Jan, 2018 1 commit
    • "fix decode bug" (#7711) · e983cc90
      dzhwinter authored
      * "fix decode bug"
      
      * "follow comment"
      
      * "fix error"
      
      * "fix hook bug"
      
      * fix based comment
      
      * fix copyright
      
      * fix based on comment
  22. 15 Jan, 2018 1 commit
    • Feature/hooks (#7513) · b9b75377
      dzhwinter authored
      * add copyright hook
      
      * add copyright hook
      
      * refine copyright hook
      
      * "test copyright hook"
      
      * fix check style
      
      * fix ci
  23. 14 Nov, 2017 1 commit
  24. 10 Nov, 2017 1 commit
  25. 20 Oct, 2017 1 commit
  26. 06 Oct, 2017 1 commit
  27. 03 Oct, 2017 1 commit