- Sep 24, 2021 (1 commit)
Committed by Huihuang Zheng
Add a basic Cost Model that uses the executor to run a program and profiles it to get op time. This is an early, basic version; we will add more functions in the future.
- May 26, 2021 (1 commit)
Committed by Yuang Liu
- Apr 19, 2021 (1 commit)
Committed by Leo Chen
* [NPU] support GarbageCollector for npu (#31874)
  * support GarbageCollector for npu
  * fix typo
  * fix gather_grad
  * disable NPUDefaultStreamGarbageCollector on NPU
* [NPU] support npu for memcpy op (#31808)
  * support npu for memcpy op
  * add ut
  * fix ut
  * fix typo
* [NPU] fix bug of using temp vector (#31963)
* fix bug when beta1_pow on cpu (#31995)
* [NPU] support npu profiler (#31684)
  * support npu profiler
  * add python api
  * fix bugs
  * add wrapper for incomplete type
  * update profile proto
  * record npu wait
  * add xpu placeholder
* fix adam (#32016)
* [NPU] enable async copy and add wait before sync operation (#31956)
  * enable async copy and add wait before sync operation
  * remove unnecessary wait
  * add FillNpuTensorWithConstant
  * refine
  * fix fill_constant
  * make TensorFromVector/TensorToVector sync
* [NPU] Support dataloader on npu place. (#31867)
* [NPU] Wait on NPUPlace (#32086)
* [NPU] fix cast op (#32121)
  * fix npu kernel of cast op to handle casting to same dtype
  * add comments
* [NPU] support cann 20.3 (#32044)
  * fix compile problem on cann 20.3
  * fix ut
  * fix test_mul
  * fix check_finite_and_scale
  * fix lookup_table_v2_grad
  * fix cmake
  * support print op
* [NPU] Support npu save load (#31893)
  * support save load for NPU
  * add save load npu unittest
  * support np.array transform in NPU
  * fix errors
  * delete dygraph in unittest
  * add Wait
  * fix unittest
  * fix review comment
  * fix unittest problem
  * fix little problem
* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
  * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance
  * refine code
* fix NPUDeviceContext in all c++ unittest (#32198)
  * fix NPUDeviceContext in all c++ unittest
  * refine log
  Co-authored-by: pangyoki <pangyoki@126.com>
* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
  * enable async copy and add wait before sync operation
  * remove unnecessary wait
  * add FillNpuTensorWithConstant
  * refine
  * fix fill_constant
  * change TensorFromVector to FillNpuTensorWithConstant
  * fix ignored api
  * delete extra unittest
  * fix little error
  * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
  * change TensorCopySync to TensorCopy
  * delete useless Wait and add StreamWait
  * fix npu_stream error
  * fix check_finite_and_unscale_op_npu TensorCopy
  * only save stream wait
  * fix NPUDeviceContext in all c++ unittest
  * delete wait
  Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
* delete useless unittest file (#32206)
* Fix op test (#32231)
* fix conditional block (#32243)
* fix adam bug again (#32246)
* fix compile
* fix ut
* fix ut

Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>
- Feb 4, 2021 (1 commit)
Committed by wanghuancoder
* use iwyu to clean includes a second time, test=develop
- Nov 3, 2020 (1 commit)
Committed by Wilber
- Jul 7, 2020 (1 commit)
Committed by GaoWei8
* refine PADDLE_ENFORCE test=develop
- Jul 3, 2020 (1 commit)
Committed by GaoWei8
* fix PADDLE_ENFORCE and refine the description test=develop
- Jun 9, 2020 (1 commit)
Committed by wangchaochaohu
- May 26, 2020 (1 commit)
Committed by wangchaochaohu
- May 25, 2020 (1 commit)
Committed by wangchaochaohu
- May 11, 2020 (1 commit)
Committed by Chen Weihang
* add new macro BOOST_GET_SAFELY & unittests, test=develop
* add different macro type, test=develop
* fix get macro type in executor, test=develop
* four macro part change backup
* use one macro for all cases, test=develop
* revert attribute change, test=develop
* change to three funcs to solve gcc4.8 bug, test=develop
* polish some details, test=develop
- Feb 24, 2020 (1 commit)
Committed by wangchaochaohu
* add support for the driver API callback and fix the profiler name display bug
- Feb 23, 2020 (1 commit)
Committed by tianshuo78520a
- Jan 9, 2020 (1 commit)
Committed by wangchaochaohu
* add support for nested profiling events and printing at different levels
- Nov 28, 2019 (1 commit)
Committed by wangchaochaohu
* fix profile api high version test=develop
- Mar 13, 2019 (1 commit)
Committed by chengduo
test=develop
- Mar 11, 2019 (1 commit)
- Mar 4, 2019 (3 commits)
Committed by chengduo
Add Event for TensorCopy
- Mar 1, 2019 (1 commit)
Committed by chengduo
Add Event for TensorCopy
- Feb 24, 2019 (1 commit)
Committed by Dun
- Feb 22, 2019 (1 commit)
Committed by chengduo
test=develop
- Feb 21, 2019 (1 commit)
Committed by Dun
* refine profiler && add runtime tracer
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* fix bug && test=develop
* add thread id map && test=develop
* test=develop
* testing
* bug fix
* remove cuda event && refine code && test=develop
* test=develop
* test=develop
* test=develop
* fix windows temp file && test=develop
* test=develop
* fix windows bug && test=develop
* fix start up issue && test=develop
* code polish && test=develop
* remove unused code && test=develop
* add some cupti cbid && test=develop
* add FLAGS_multiple_of_cupti_buffer_size && test=develop
* fix compile error && test=develop
* add keyword && test=develop
* fix && test=develop
* code polish && test=develop
- Dec 4, 2018 (1 commit)
Committed by ZongwuYang
Fix the bug where the profiler could not trace the nccl allreduce operator
- Nov 26, 2018 (1 commit)
Committed by minqiyang
test=develop
- Nov 8, 2018 (1 commit)
Committed by minqiyang
Fix code to support cpplint syntax check test=develop
- Aug 13, 2018 (1 commit)
Committed by qiaolongfei
- Aug 10, 2018 (1 commit)
Committed by qiaolongfei
- Jul 31, 2018 (1 commit)
Committed by Xin Pan
Add a few more RecordEvents. Cleanup.
- Jul 30, 2018 (3 commits)
Committed by typhoonzero
Committed by typhoonzero
Committed by typhoonzero
- Jul 23, 2018 (1 commit)
Committed by qiaolongfei
- Jun 14, 2018 (1 commit)
Committed by Xin Pan
In the cupti samples, only cuptiFlush is used. I can't find any place that calls cuptiFinalize, and this API can error out as not_implemented in some cuda installations.
- Jun 8, 2018 (2 commits)
Committed by guochaorong
Committed by guochaorong
- May 22, 2018 (1 commit)
Committed by Xin Pan
Experiment on vgg flower, 2 trainers, 1 ps. More trainers could give more speedup.
After: Pass = 0, Iters = 327, Speed = (7.52) img/s
Before: Pass = 0, Iters = 385, Speed = (6.77) img/s
- Apr 10, 2018 (1 commit)
Committed by Yi Wang
- Mar 14, 2018 (1 commit)
Committed by Xin Pan