1. 01 Mar, 2021 1 commit
  2. 26 Feb, 2021 1 commit
  3. 23 Feb, 2021 1 commit
  4. 22 Feb, 2021 1 commit
  5. 09 Feb, 2021 3 commits
    • L
      [feature] support npu allocator, part 2 (#30972) · 1201cd2e
      Leo Chen committed
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, but failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinalize
      
      * add system allocator test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
    • L
      [feature] support npu operator (#30951) · 7e049108
      Leo Chen committed
      [feature] support npu operator
    • L
      [feature] support npu allocator (#30840) · 81138239
      Leo Chen committed
      [feature] support npu allocator
  6. 21 Jan, 2021 2 commits
  7. 15 Jan, 2021 4 commits
  8. 14 Jan, 2021 2 commits
  9. 13 Jan, 2021 1 commit
  10. 12 Jan, 2021 8 commits
  11. 11 Jan, 2021 9 commits
  12. 10 Jan, 2021 2 commits
  13. 09 Jan, 2021 2 commits
  14. 08 Jan, 2021 3 commits
    • Z
      Support pure fp16 training for AMP API. (#29544) · 7f7dfccf
      Zhen Wang committed
      * add cast ops before and after unsupported fp16 ops.
      
      * Keep partial net in FP32 pattern.
      
      * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
      
      * Add fp16 support for adam op.
      
      * add multi precision attr for adam.
      
      * Fix the bug of test_multi_precision_fp16_train UT.
      
      * Code format for CI.
      
      * Fix the redefine error about MPTypeTrait on windows.
      
      * fix bugs of the _create_accumulators func in Momentum.
      
      * fix bug when inserting post cast op.
      
      * Add the update_loss_scaling op in allow_set of UnusedVarCheck.
      
      * Update for ci coverage.
      
      * Add some doc for OptimizerWithMixedPrecision.
      
      * Fix the code style.
      
      * Improve the doc of `amp_init`.
      
      * Change fp16 testing for the case where users define the infer program separately.
    • L
      use cuda generator in bernoulli cuda kernel (#30199) · 789743e1
      Leo Chen committed
    • L
      Fix dtype of ungenerated grad var (#28511) · 8696335f
      Leo Chen committed
      * fix dtype of ungenerated grad var
      
      * update ut
      
      * refine code
      
      * set default dtype
      
      * fix could_use_cudnn bug
      
      * remove debug code
      
      * re-implement
      
      * fix bug