1. 22 Apr, 2021 11 commits
  2. 21 Apr, 2021 13 commits
  3. 20 Apr, 2021 4 commits
  4. 19 Apr, 2021 4 commits
    • [NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8
      Committed by Leo Chen
      * [NPU] support GarbageCollector for npu (#31874)
      
      * support GarbageCollector for npu
      
      * fix typo
      
      * fix gather_grad
      
      * disable NPUDefaultStreamGarbageCollector on NPU
      
      * [NPU] support npu for memcpy op (#31808)
      
      * support npu for memcpy op
      
      * add ut
      
      * fix ut
      
      * fix typo
      
      * 【NPU】fix bug of using temp vector (#31963)
      
      * fix bug when beta1_pow on cpu (#31995)
      
      * [NPU] support npu profiler (#31684)
      
      * support npu profiler
      
      * add python api
      
      * fix bugs
      
      * add wrapper for incomplete type
      
      * update profile proto
      
      * record npu wait
      
      * add xpu placeholder
      
      * fix adam (#32016)
      
      * [NPU] enable async copy and add wait before sync operation (#31956)
      
      * enable async copy and add wait before sync operation
      
      * remove unnecessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * make TensorFromVector/TensorToVector sync
      
      * [NPU] Support dataloader on npu place. (#31867)
      
      * [NPU] Wait on NPUPlace (#32086)
      
      * [NPU] fix cast op (#32121)
      
      * fix npu kernel of cast op to handle casting to same dtype
      
      * add comments
      
      * [NPU] support cann 20.3 (#32044)
      
      * fix compile problem on cann 20.3
      
      * fix ut
      
      * fix test_mul
      
      * fix check_finite_and_scale
      
      * fix lookup_table_v2_grad
      
      * fix cmake
      
      * support print op
      
      * [NPU] Support npu save load (#31893)
      
      * support save load for NPU
      
      * add save load npu unittest
      
      * support np.array transform in NPU
      
      * fix errors
      
      * delete dygraph in unittest
      
      * add Wait
      
      * fix unittest
      
      * fix review comment
      
      * fix unittest problem
      
      * fix little problem
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance
      
      * refine code
      
      * fix NPUDeviceContext in all c++ unittest (#32198)
      
      * fix NPUDeviceContext in all c++ unittest
      
      * refine log
      Co-authored-by: pangyoki <pangyoki@126.com>
      
      * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
      
      * enable async copy and add wait before sync operation
      
      * remove unnecessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * change TensorFromVector to FillNpuTensorWithConstant
      
      * fix ignored api
      
      * delete extra unittest
      
      * fix little error
      
      * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
      
      * change TensorCopySync to TensorCopy
      
      * delete useless Wait and add StreamWait
      
      * fix npu_stream error
      
      * fix check_finite_and_unscale_op_npu TensorCopy
      
      * only save stream wait
      
      * fix NPUDeviceContext in all c++ unittest
      
      * delete wait
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      
      * delete useless unittest file (#32206)
      
      * Fix op test (#32231)
      
      * fix conditional block (#32243)
      
      * fix adam bug again (#32246)
      
      * fix compile
      
      * fix ut
      
      * fix ut
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
      Co-authored-by: pangyoki <pangyoki@126.com>
    • [Hybrid Parallel] Support dp & mp in dygraph (#32323) · ffd40860
      Committed by ShenLiang
      * support dp & mp
    • Fix sublayer (#31824) · 4d69eeaa
      Committed by Jiabin Yang
      * fix sublayer error with include_sublayers=False
      
      * add ut
      
      * refactor include_sublayers related api
      
      * fix ut
      
      * fix ut of transformer
      
      * fix ut of transformer
      
      * remove useless code
      
      * change sublayer api
      
      * polish code
      
      * add test for include_self=True
  5. 17 Apr, 2021 1 commit
  6. 16 Apr, 2021 1 commit
  7. 15 Apr, 2021 6 commits
    • Update hapi to support AMP (#31417) · fabdb43c
      Committed by Jiaqi Liu
      * make hapi support amp, and add unittest
      
      * make unittest only support GPU
      
      * update parameters for amp in hapi.Model
      
      * update hapi.Model.prepare interface, and update unittest
      
      * fix test_model.py unittest bug
      
      * add grad clear in dygraph
      
      * use_fp16_guard defaults to True, which could avoid nan
      
      * add input check, and add internal doc link to low level api
      
      * update doc, and decrease the sample num of dataset to avoid timeout
      
      * make hapi amp param support str 'O1' or 'O2'
      
      * resume calling , modify the code of the check part
      
      * upgrade the usage of Fleet API, and disable 'pure_fp16' param
    • tree-based-model (#31696) · a8c3a902
      Committed by 123malin
      * add index_dataset and index_sampler for tree-based model
    • [ROCM] bugfix for unit tests (#32258) · 90133d24
      Committed by furnace
      * [ROCM] bugfix for test_conv_transpose_nn_grad
      
      * [ROCM] bugfix for test_batch_norm_op_v2
      
      * [ROCM] bugfix for test_empty_like_op
      
      * [ROCM] bugfix for test_conv_transpose_nn_grad
    • heterps support pscore (#32093) · 9f8c8f96
      Committed by Thunderbrook
      * pscore support heterps
      
      * fleet cmake
      
      * fleet wrapper
      
      * macro
      
      * solve conflict
      
      * solve conflict
      
      * add unittest
      
      * paddle enforce
      
      * unittest
      
      * unittest
      
      * unittest
    • support int for nearest_interp, test=develop (#32270) · 668a0d3b
      Committed by xiaoting
    • 【Deepmd Support】add IsInitialized and tanh double grad (#32188) · cfdde0ec
      Committed by Jiabin Yang
      * add IsInitialized
      
      * rm additional log and add tanh double grad
      
      * rename is_initialized