- 01 Apr, 2021 1 commit
-
-
Committed by Leo Chen
* enable async copy and add wait before sync operation * remove unnecessary wait * add FillNpuTensorWithConstant * refine * fix fill_constant * make TensorFromVector/TensorToVector sync
-
- 30 Mar, 2021 1 commit
-
-
Committed by Leo Chen
* support npu for memcpy op * add ut * fix ut * fix typo
-
- 29 Mar, 2021 1 commit
-
-
Co-authored-by: baiyangfan <baiyangfan@baidu.com>
-
- 26 Mar, 2021 1 commit
-
-
Committed by Leo Chen
* support GarbageCollector for npu * fix typo * fix gather_grad * disable NPUDefaultStreamGarbageCollector on NPU
-
- 23 Mar, 2021 1 commit
-
-
Committed by lilong12
Add 3d Parallelism Co-authored-by: WangXi <wangxi16@baidu.com> Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com> Co-authored-by: root <root@yq01-sys-hic-k8s-v100-box-a225-0562.yq01.baidu.com>
-
- 10 Mar, 2021 1 commit
-
-
Committed by Leo Chen
* support TensorFromVector, TensorToVector of bool type * add ut * fix compile problem
-
- 01 Mar, 2021 1 commit
-
-
Committed by Leo Chen
* support list of list attribute for NPU * fix compile problem * fix reference
-
- 23 Feb, 2021 2 commits
- 22 Feb, 2021 1 commit
-
-
Committed by Leo Chen
* add npu sub op * fix typo * rename test * fix bug * fix bug * add fp16 kernel * fix typo * support sub grad op * support elementwise_sub_grad op Co-authored-by: frankwhzhang <frankwhzhang@126.com>
-
- 09 Feb, 2021 3 commits
-
-
Committed by Leo Chen
* support npu allocator * add npu device context * fix some compile problem * fix some compile problem * add npu info * compile ok * fix include dir * support naive_best_fit_allocator * run ut ok, but failed to exit * call aclrtResetDevice before exit * fix aclFinalize * add system allocator test * add selected_gpus in gtest * add tensor_test for npu * support npu op, initial commit * add npu stream * add elementwise_add_op * compile ok * fix typo * fix elementwise_add_op_npu_test * support op run * test can run but failed * change aclopExecuteV2 to aclopCompileAndExecute
-
Committed by Leo Chen
[feature] support npu operator
-
Committed by Leo Chen
[feature] support npu allocator
-
- 08 Feb, 2021 1 commit
-
-
Committed by gongweibao
Destroy session first.
-
- 28 Jan, 2021 1 commit
-
-
Committed by Leo Chen
Dev/fix ascend string
-
- 27 Jan, 2021 1 commit
-
-
Committed by Leo Chen
fix compilation on ascend-20.1
-
- 15 Jan, 2021 2 commits
-
-
Committed by gongweibao
Fix compilation on CANN20.1 and older
-
Committed by hutuxian
-
- 14 Jan, 2021 1 commit
-
-
Committed by yaoxuefeng
-
- 13 Jan, 2021 3 commits
-
-
Committed by cc
* skip quantizing ops in cpu inference, test=develop
-
Committed by alncat
* added support for inference using quantization aware trained dygraph * added support for inference using quantization aware trained dygraph correct boost get usage * Delete incorrect warning message (#30196) * fix warning and no grad * clean redundant API alias in 2.0 - part 2 (#30013) * delete paddle.nn.functional.assign * fix dynamic to static error * just add the op error message for the matmul xpu (#30246) add the op error message for the matmul xpu * Add Static Variable Clone (#30208) Add clone method for static Variable so that this interface will be same as dygraph. It fixed some bugs in dy2stat * use wget to replace curl to download the lcov file (#30229) * use wget to replace curl to download the lcov file * add cache for lcov * fix test_pool3d_op timeout issue (#30248) * Fix unittests bugs. (#30250) * modify error message based on comments (#30189) * modify error message based on comments * edit code according to review. * Correct spelling according to review. * Fix bug for 'save multiple method' (#30218) * Fix bug for 'save multiple method' * To pass coverage. * edit code to pass coverage. * edit code to pass coverage. * add unittest for coverage. * change for coverage. * edit for coverage.
* added support for inference using quantization aware trained dygraph * Alias from paddle.fluid.layers.auc to paddle.static.auc (#30206) * add alias from fluid.layers.auc to static.auc * Update __init__.py * added support for inference using quantization aware trained dygraph correct boost get usage * corrected boost get usage * corrected naming issues and enforcing zero check * correct paddle enforce message * added more error checkings * corrected error report message and optimized code * corrected findvar usage * corrected paddle_enforce in scope * correct error messages * correct error reporting format Co-authored-by: LielinJiang <50691816+LielinJiang@users.noreply.github.com> Co-authored-by: XiaoguangHu <46782768+XiaoguangHu01@users.noreply.github.com> Co-authored-by: wawltor <fangzeyang0904@hotmail.com> Co-authored-by: Huihuang Zheng <zhhsplendid@gmail.com> Co-authored-by: YUNSHEN XIE <1084314248@qq.com> Co-authored-by: Bai Yifan <me@ethanbai.com> Co-authored-by: gongweibao <weibao.gong@gmail.com> Co-authored-by: WeiXin <weixin10@baidu.com> Co-authored-by: Jiaqi Liu <liujiaqi06@baidu.com>
-
Committed by Zhang Jun
* fix bug on compiling inference shared lib with crypto;test=develop * fix cmake bug when build inference lib using -DWITH_CRYPTO=OFF * update cmake * remove unnecessary enforce message
-
- 12 Jan, 2021 3 commits
-
-
Committed by JZ-LIANG
-
Committed by tangwei12
* add sparse embedding & load vars for 2.0 Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b * fix hdfs gloo Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6 * fix gloo hdfs Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e * move loadvar/sparse embedding from incubate to static Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0
-
Committed by tangwei12
* rename sendrecv.proto to namespace paddle.distributed * split ps with distributed
-
- 11 Jan, 2021 2 commits
- 10 Jan, 2021 1 commit
-
-
Committed by wangchaochaohu
Reduce the memory occupied by the fused pattern of the elementwise_add op and an activation op (the relu op, for example) (#29885)
-
- 08 Jan, 2021 4 commits
-
-
Committed by Zhen Wang
* add cast ops before and after unsupported fp16 ops. * Keep partial net in FP32 pattern. * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode. * Add fp16 support for adam op. * add multi precision attr for adam. * Fix the bug of test_multi_precision_fp16_train UT. * Code format for CI. * Fix the redefine error about MPTypeTrait on windows. * fix bugs of the _create_accumulators func in Momentum. * fix bug when inserting post cast op. * Add the update_loss_scaling op in allow_set of UnusedVarCheck. * Update for ci coverage. * Add some doc for OptimizerWithMixedPrecision. * Fix the code style. * Improve the doc of `amp_init`. * Change for fp16 testing if users have the infer program defined in a separate way.
-
Committed by Leo Chen
-
Committed by Leo Chen
* change to tensor copy sync * change to tensor copy sync * make copy_to safe when use TensorCopy * refine code * add ut * add cudapinned garbagecollector * add testcase: cpu place -> cuda pinned place
-
Committed by Chengmo
* add tensor table
-
- 07 Jan, 2021 3 commits
-
-
Committed by Huihuang Zheng
Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc
-
Committed by Chen Weihang
* simplify prepared op impl to improve performance * fix kunlun compile error * continue fix kunlun compile error * only transform diff place when dtype diff * fix failed unittests * remove useless file * polish impl by review comment
-
Committed by liuyuhui
-
- 06 Jan, 2021 1 commit
-
-
Committed by 石晓伟
-
- 05 Jan, 2021 2 commits
-
-
Committed by liuyuhui
-
Committed by Thunderbrook
* add topo aware * resource.h * topo aware * format
-
- 04 Jan, 2021 2 commits
-
-
Committed by WangXi
-
Committed by Shang Zhizhou
* fix op version checker of pass bug * fix code style * update pass version
-