- 24 Jun, 2021 1 commit
-
-
Committed by houj04
* in NPU environment, use CPUPlace for missing operators.
* in NPU environment, use CPUPlace for missing operators.
* fix TensorCopy bug and add unit test.
* fix code style.
* add more unit tests.
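The CPU fallback described in this commit can be pictured with a small sketch. It is an illustration only, not Paddle's implementation: the `KERNELS` registry, `choose_place`, and `run_op` are hypothetical names, and places are plain strings rather than `NPUPlace`/`CPUPlace` objects.

```python
# Hypothetical kernel registry keyed by (op_name, place). Not Paddle's API.
KERNELS = {
    ("relu", "npu"): lambda xs: [max(x, 0.0) for x in xs],
    ("relu", "cpu"): lambda xs: [max(x, 0.0) for x in xs],
    # "cumsum" has no NPU kernel in this toy registry.
    ("cumsum", "cpu"): lambda xs: [sum(xs[: i + 1]) for i in range(len(xs))],
}


def choose_place(op_name, preferred="npu", fallback="cpu"):
    """Return the preferred place if a kernel exists there, else the fallback."""
    if (op_name, preferred) in KERNELS:
        return preferred
    if (op_name, fallback) in KERNELS:
        return fallback
    raise KeyError(f"no kernel registered for operator {op_name!r}")


def run_op(op_name, inputs, preferred="npu"):
    """Run the operator on whichever place was chosen and report that place."""
    place = choose_place(op_name, preferred)
    return place, KERNELS[(op_name, place)](inputs)
```

In this sketch `run_op("cumsum", [1, 2, 3])` runs on `"cpu"` because no NPU kernel is registered, while `run_op("relu", ...)` stays on `"npu"`.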
-
- 21 Jun, 2021 1 commit
-
-
Committed by Leo Chen
* enable npu alignment
* support flatten_params/grads
* support clip by global norm
* remove memset in coalesce_tensor_op
* fix npu kernel of sum op when input is one tensor
* add ut for flatten_param_grads+regularizer
* fix ut
* fix typo
-
- 01 Jun, 2021 1 commit
-
-
Committed by chentianyu03
* replace and remove complex64/128 types in custom OP and other files
* fix custom_tensor_test fail bug
* fix custom_conj_test fail bug
* fix dispatch_test_op build fail bug
-
- 12 May, 2021 1 commit
-
-
Committed by liym27
-
- 19 Apr, 2021 1 commit
-
-
Committed by Leo Chen
* [NPU] support GarbageCollector for npu (#31874)
* support GarbageCollector for npu
* fix typo
* fix gather_grad
* disable NPUDefaultStreamGarbageCollector on NPU
* [NPU] support npu for memcpy op (#31808)
* support npu for memcpy op
* add ut
* fix ut
* fix typo
* [NPU] fix bug of using temp vector (#31963)
* fix bug when beta1_pow on cpu (#31995)
* [NPU] support npu profiler (#31684)
* support npu profiler
* add python api
* fix bugs
* add wrapper for incomplete type
* update profile proto
* record npu wait
* add xpu placeholder
* fix adam (#32016)
* [NPU] enable async copy and add wait before sync operation (#31956)
* enable async copy and add wait before sync operation
* remove unnecessary wait
* add FillNpuTensorWithConstant
* refine
* fix fill_constant
* make TensorFromVector/TensorToVector sync
* [NPU] Support dataloader on npu place. (#31867)
* [NPU] Wait on NPUPlace (#32086)
* [NPU] fix cast op (#32121)
* fix npu kernel of cast op to handle casting to same dtype
* add comments
* [NPU] support cann 20.3 (#32044)
* fix compile problem on cann 20.3
* fix ut
* fix test_mul
* fix check_finite_and_scale
* fix lookup_table_v2_grad
* fix cmake
* support print op
* [NPU] Support npu save load (#31893)
* support save load for NPU
* add save load npu unittest
* support np.array transform in NPU
* fix errors
* delete dygraph in unittest
* add Wait
* fix unittest
* fix review comment
* fix unittest problem
* fix little problem
* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance
* refine code
* fix NPUDeviceContext in all c++ unittest (#32198)
* fix NPUDeviceContext in all c++ unittest
* refine log
Co-authored-by: pangyoki <pangyoki@126.com>
* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
* enable async copy and add wait before sync operation
* remove unnecessary wait
* add FillNpuTensorWithConstant
* refine
* fix fill_constant
* change TensorFromVector to FillNpuTensorWithConstant
* fix ignored api
* delete extra unittest
* fix little error
* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
* change TensorCopySync to TensorCopy
* delete useless Wait and add StreamWait
* fix npu_stream error
* fix check_finite_and_unscale_op_npu TensorCopy
* only save stream wait
* fix NPUDeviceContext in all c++ unittest
* delete wait
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
* delete useless unittest file (#32206)
* Fix op test (#32231)
* fix conditional block (#32243)
* fix adam bug again (#32246)
* fix compile
* fix ut
* fix ut
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>
-
- 09 Apr, 2021 1 commit
-
-
Committed by Leo Chen
* [feature] support npu allocator (#30840)
* [feature] support npu operator (#30951)
* [feature] support npu allocator, part 2 (#30972)
* support npu allocator
* add npu device context
* fix some compile problem
* fix some compile problem
* add npu info
* compile ok
* fix include dir
* support naive_best_fit_allocator
* run ut ok, but failed to exit
* call aclrtResetDevice before exit
* fix aclFinalize
* add system allocator test
* add selected_gpus in gtest
* add tensor_test for npu
* support npu op, initial commit
* add npu stream
* add elementwise_add_op
* compile ok
* fix typo
* fix elementwise_add_op_npu_test
* support op run
* test can run but failed
* change aclopExecuteV2 to aclopCompileAndExecute
* support parsing ascend rank table file (#31000)
* Fix reshape on GE graph. (#31084)
* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
* add npu sub op
* fix typo
* rename test
* fix bug
* fix bug
* add fp16 kernel
* fix typo
* support sub grad op
* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
* Fix compilation problem (#31100)
* fix compile
* fix code style
* remove const_cast
* support adding correct npu op in pybind.h (#31143)
* support adding correct npu op in pybind.h
* refine code
* [NPU] Support executor with NPU (#31057)
* [NPU] Support executor with NPU
* Fix code according to reviews
* Fix code
* Add unittest for sub op npu
* refactor npu device manager (#31154)
* fix selected npus
* fix compile
* fix reading flags from env
* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
-
- 26 Feb, 2021 1 commit
-
-
Committed by Qi Li
-
- 22 Dec, 2020 1 commit
-
-
Committed by Jacek Czaja
* Tensor copy fix to oneDNN tensors
* Fixes after review
-
- 04 Dec, 2020 1 commit
-
-
Committed by Chen Weihang
* basic impl of type promote
* add comment & another testcase
* fix complex bugs & support python op promote type
* fix failed unittests & polish code
* add unittest for coverage
* change to only promote complex type
* polish code details
* polish several comments
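As a rough illustration of "only promote complex type", the sketch below promotes operand dtypes only when at least one of them is complex. The rule it encodes is an assumption made for the example (a real operand adopts the other operand's complex type, and two complex types resolve to the wider one); it is not taken from the Paddle source, and `promote_if_complex` is a hypothetical name.

```python
# Assumed width ordering of the two complex dtypes named in this history.
COMPLEX_RANK = {"complex64": 1, "complex128": 2}


def promote_if_complex(lhs, rhs):
    """Return the common dtype if either operand is complex, else None.

    None means "no promotion": non-complex pairs are left untouched.
    """
    lhs_rank = COMPLEX_RANK.get(lhs, 0)
    rhs_rank = COMPLEX_RANK.get(rhs, 0)
    if lhs_rank == 0 and rhs_rank == 0:
        return None
    # Pick whichever operand carries the wider complex type.
    return lhs if lhs_rank >= rhs_rank else rhs
```

Under these assumptions, `promote_if_complex("float32", "complex64")` gives `"complex64"`, while `promote_if_complex("float32", "int64")` gives `None`.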
-
- 01 Dec, 2020 1 commit
-
-
Committed by chentianyu03
* add complex64 and complex128 type; add +-*/@ and slice operator for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
-
- 15 Oct, 2020 1 commit
-
-
Committed by Zhou Wei
-
- 13 Oct, 2020 1 commit
-
-
Committed by Leo Chen
* add summary feature
* refine printing tensor
* add sci_mode
* add sample code
* fix indent error
* fix _format_item
* polish code
* support item indent
* add ut
* set place for ut
* fix py2 issue
* fix ut
-
- 09 Oct, 2020 1 commit
-
-
Committed by Jacek Czaja
test=develop - compilation fix test=develop
-
- 24 Sep, 2020 1 commit
-
-
Committed by wanghuancoder
* use iwyu clean include, test=develop, test=win
* compilation error, test=develop
* fix compilation error2, test=develop
* fix compilation error3, test=develop
* fix compilation error4, test=develop
* fix compilation error5, test=develop
* fix compilation error6, test=develop
* fix compilation error7, test=develop
* fix compilation error8, test=develop
* fix compilation error8, test=develop
* fix compilation error10, test=develop
* fix compilation error11, test=develop
-
- 16 Sep, 2020 1 commit
-
-
Committed by Chen Weihang
* polish framework error message part 7
* fix typo
* polish per reviewer's comments
-
- 27 Aug, 2020 1 commit
-
-
Committed by Zhou Wei
fix bug that can't print int8_t
-
- 24 Aug, 2020 1 commit
-
-
Committed by Jack Zhou
add the isnan, isfinite, and isinf APIs for Paddle 2.0
-
- 21 Aug, 2020 1 commit
-
-
Committed by QingshuChen
* support Baidu AI Accelerator
* test=kunlun
* minor
* test=kunlun
* support xpu op in separate file
* test=kunlun
* update XPU error message and remove duplicated code
* test=kunlun
* minor
* test=kunlun
* minor
* test=kunlun
-
- 15 Aug, 2020 1 commit
-
-
Committed by Zhou Wei
* expose and unify the Tensor concepts to the user
* expose tensor to user
* add copy place for Tensor
* add copy place for Tensor
* add note
* add macro PADDLE_WITH_CUDA
* remove RUN_TYPE=DIST
* fix some error
-
- 29 Jul, 2020 1 commit
-
-
Committed by Chen Weihang
* simplify buffered reader to improve DataLoader performance
* fix 22 failed unittests
* fix cuda pinned context condition
* fix test_reader_reset failed
* fix two failed unittests
* change unittest place
* polish error message
* polish cast op GetExpectedKernelType
* remove debug info in unittest
-
- 11 May, 2020 1 commit
-
-
Committed by Chen Weihang
* add new macro BOOST_GET_SAFELY & unittests, test=develop
* add different macro type, test=develop
* fix get macro type in executor, test=develop
* four macro part change backup
* using one macro for all case, test=develop
* revert attribute change, test=develop
* change to three func to solve gcc4.8 bug, test=develop
* polish some details, test=develop
-
- 27 Apr, 2020 2 commits
-
-
Committed by Chen Weihang
* add print transformer & unify print format, test=develop
* remove using of dygraph_to_static_func, test=develop
* remove python stdout capture, test=develop
* fix compatibility problems for PY2, test=develop
* fix detail error, test=develop
* fix type analysis bug, test=develop
* fix print tuple compatible error in PY2, test=develop
* replace get_func to declarative, test=develop
* fix detail bug, test=develop
* fix some detail problems, test=develop
* change visit_call in print transformer, test=develop
-
Committed by Yiqun Liu
-
- 17 Mar, 2020 1 commit
-
-
Committed by Adam
-
- 11 Mar, 2020 1 commit
-
-
Committed by Adam
-
- 12 Dec, 2019 1 commit
-
-
Committed by tangwei12
* add fake init for the trainer, fix large memory hold in the trainer
* do not merge recv vars from a remote endpoint, test=develop
* add recv and save op, merge slice var in one op, save memory
* remove hsigmoid with pull sparse, test=develop
-
- 02 Dec, 2019 1 commit
-
-
Committed by wangchaochaohu
-
- 28 Nov, 2019 2 commits
-
-
Committed by Zeng Jinle
* use system allocator in unittests, test=develop
* fix op bugs, test=develop
* fix tensor copy bug when src and dst are the same, test=develop
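The last item above concerns copying a tensor onto itself. A minimal sketch of that failure class follows, using Python lists as a hypothetical stand-in for tensor buffers; `tensor_copy` here is an illustrative function, not Paddle's `TensorCopy`, and the actual fix in the commit may differ.

```python
def tensor_copy(src, dst):
    """Copy the contents of src into dst, tolerating src and dst being the same object."""
    if src is dst:
        # Self-copy guard: without it, clearing dst would also wipe src.
        return dst
    dst.clear()
    dst.extend(src)
    return dst
```

Without the `src is dst` check, `tensor_copy(a, a)` would clear `a` before reading from it and leave it empty.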
-
Committed by Zeng Jinle
-
- 14 Oct, 2019 1 commit
-
-
Committed by 633WHU
* support dlpack to tensor and implement python interface test=develop
* add unittest for _to_dlpack and from_dlpack test=develop
-
- 03 Sep, 2019 1 commit
-
-
Committed by Tao Luo
test=develop
-
- 14 Aug, 2019 1 commit
-
-
Committed by chengduo
Use CUDAPinnedPlace in buffered_reader
-
- 24 May, 2019 1 commit
-
-
Committed by wopeizl
* add __str__ method for tensor and lodtensor to support print test=develop
-
- 28 Mar, 2019 1 commit
-
-
Committed by Jacek Czaja
* Revert "[MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233)" This reverts commit 13816dd4. Apart from enabling transformer for MKL-DNN
* Revert "- MKL-DNN pooling updated to set_prim_desc" This reverts commit c63f6b20. Conflicts: paddle/fluid/operators/mkldnn/concat_mkldnn_op.cc
* Revert "[MKL-DNN] MKL-DNN specific Tensor modification (#15429)" test=develop This reverts commit dec9cf53.
* concat compilation fix
* lint test=develop
* Lint fixes test=develop
* Lint fixes test=develop
* Fix Transpose MKLDNN op test=develop
-
- 19 Mar, 2019 2 commits
-
-
Committed by zhhsplendid
test=develop
-
Committed by Jacek Czaja
* Fix crash of Transformer when MKL-DNN is used. Desc: TensorCopy did not set the MKLDNN primitive descriptor when the layout was kMKLDNN test=develop
* Enable transformer for mkl-dnn test=develop
* Compilation fix test=develop
* Removed manual selection of MKL-DNN ops to be used in Transformer test test=develop
-
- 11 Mar, 2019 1 commit
-
- 04 Mar, 2019 3 commits
-
-
Committed by chengduo
Add Event for TensorCopy