- 04 Sep, 2021 1 commit
-
-
Committed by Zhanlue Yang
* Add function to disable paddle signal handler Paddle used google::InstallFaultSignalHandler to handle selected system signals, mainly for debugging and bug-report purposes. However, this can conflict with other Python packages that capture similar signals, such as tvm. To resolve this issue, we support a function to disable the signal handler * Remove signal test from WIN32 platform * Remove redundant return from disable_signal_handler() function * Add detailed messages to en_doc
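The conflict this commit describes can be illustrated at the Python level. The following is a minimal sketch using only the standard `signal` module. It is an analogy, not Paddle's implementation (the real handler is installed in C++ by google::InstallFaultSignalHandler), and `other_package_handler`, `framework_handler`, `install_framework_handler`, and `disable_framework_handler` are hypothetical names. It must run in the main thread, because Python only allows handlers to be set there.

```python
import signal

SIGNUM = signal.SIGTERM  # any catchable signal works for this illustration


def other_package_handler(signum, frame):
    """Hypothetical handler owned by another package (a stand-in for e.g. tvm)."""


def framework_handler(signum, frame):
    """Hypothetical handler a framework installs for debugging reports."""


def install_framework_handler():
    # signal.signal() replaces the current handler and returns the old one,
    # so whoever registered first silently loses control of the signal.
    return signal.signal(SIGNUM, framework_handler)


def disable_framework_handler(previous):
    # Hand the signal back to whoever owned it before the framework took over.
    signal.signal(SIGNUM, previous)


original = signal.signal(SIGNUM, other_package_handler)  # other package registers first
previous = install_framework_handler()                   # the framework overrides it
overridden = signal.getsignal(SIGNUM) is framework_handler
disable_framework_handler(previous)                      # opt out of the framework handler
restored = signal.getsignal(SIGNUM) is other_package_handler

# Leave the process as we found it (original is None if it was not set from Python).
if original is not None:
    signal.signal(SIGNUM, original)
```

The design point is that only one handler per signal can be active, so a framework that installs one unconditionally needs an explicit way to step aside.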
-
- 12 Jul, 2021 1 commit
-
-
Committed by taixiurong
* update xpu cmake for kunlun (#33328) * xpu support amp (#33809) * fix bug DLTP-31078 (#33877) * update xpu cmake (#33906) * [xpu] add dropout & amp ops in xpu place (#33891) Co-authored-by: TTerror <tangzhiyi11@users.noreply.github.com>
-
- 16 Jun, 2021 1 commit
-
-
Committed by Shang Zhizhou
* 1, remove layernorm dynamic fp16; 2, let reshape out in dynamic shape (#33535)
-
- 19 May, 2021 1 commit
-
-
Committed by Chen Weihang
cherry-pick of #32972
-
- 06 May, 2021 2 commits
- 01 May, 2021 1 commit
-
-
Committed by Baibaifan
-
- 30 Apr, 2021 1 commit
-
-
Committed by XiangGao
-
- 26 Apr, 2021 4 commits
- 25 Apr, 2021 4 commits
-
-
Committed by liym27
-
Committed by liym27
-
Committed by Wilber
-
Committed by denglin-github
* Add dlnne engine runtime * Fix log * Remove <const_cast> and remove unrelated modify with dlnne, +clang-format * Fix CMakeList format error * Add copyright message * Fix dlnne CMakeList.txt * Add some paddlepaddle_pass to support more networks * Fix some format bug
-
- 22 Apr, 2021 3 commits
-
-
Committed by Leo Chen
-
Committed by WeiXin
* support save/load binary format tensor * Fix error when create cudaplace * Fix error when create cudaplace * Fix error when create cudaplace * get device context from pool. * move define of 'SerializeToStream' and 'DeserializeFromStream' to 'lod_tensor.cc' and 'selected_rows.cc'. * improve coverage. * improve coverage. * polish API * deal with conflict * disable save/load large file in unittest * split unittest.
-
Committed by tianshuo78520a
-
- 21 Apr, 2021 2 commits
-
-
Committed by zhang wenhui
* add allreduce and broadcast without test (#31024) add allreduce and broadcast without test * Refactor HCCLCommContext to be compatible with Paddle (#31359) Refactor HCCLCommContext to be compatible with Paddle (#31359) * [NPU] add npu kernel for communication op (#31437) * add allreduce and broadcast without test * add c_broadcast_test case * build c_comm_init and c_create_group operators * make the whole thing compile * add broadcast and init op test case but run failed * make unit test compile * fix broadcast test bug and change into hcom for ccl * change c_comm_init and c_create_group ops accordingly * make tests compile * transfer code to 27 * compiled successfully in 28, but run failed * test broadcast in 28, but failed * make hcom primitives work * change hccl data type for base.h * fix broadcast bug * make attributes work * fix group name bug * add allreduce but test failed * allreduce bug for qiuliang * allreduce finished * add allgather and reducescatter * merge all op code * add allgather test * finish run all ccl op test exclude send/recv * all all op and test exclude send/recv * send_v2_npu.cc recv_v2_npiu.cc compiled * fix ccl core dump bug and test allgather, reducescatter, broadcast op * fix allreduce bug just for test * hcom send&recv test pass, without hcom_destroy * for qiuliang test * Ascend Send&Recv Test Pass * all op (ex send/recv) ok * fix bug * merge all ccl op * style merge to PaddlePaddle * merge style * new merge style * merge style 2 * insert an empty at the end * disable ctest for hcom to pass ci Co-authored-by: void-main <voidmain1313113@gmail.com> Co-authored-by: f2hkop <f2huestc@outlook.com> * Add auto-increasing tag id for Hcom OPs (#31702) * add c_reduce_sum op (#31793) add c_reduce_sum op * update Ascendrc hccl to 20.3 (#32126) update Ascendrc hccl to 20.3 (#32126) * fix merge code * change cmake.txt1 * [NPU] Support npu kernel for c sync stream op (#31386) * sync stream npu op * add with_ascend_acl * update c++ unittest *
compile all failed * try to pre commit * after pre commit * merge&compile&test hccl successfully! * fix code style * fix code style * fix bugs about hccl * fix some bugs * fix code style * fix style * fix style * fix * fixed * merge develop Co-authored-by: lw921014 <liuwei921014@yeah.net> Co-authored-by: Void Main <voidmain1313113@gmail.com> Co-authored-by: f2hkop <f2huestc@outlook.com> Co-authored-by: xiayanming <41795079@qq.com>
-
Committed by Leo Chen
* [NPU] register finalize on exit * fix
-
- 19 Apr, 2021 1 commit
-
-
Committed by Leo Chen
* [NPU] support GarbageCollector for npu (#31874) * support GarbageCollector for npu * fix typo * fix gather_grad * disable NPUDefaultStreamGarbageCollector on NPU * [NPU] support npu for memcpy op (#31808) * support npu for memcpy op * add ut * fix ut * fix typo * 【NPU】fix bug of using temp vector (#31963) * fix bug when beta1_pow on cpu (#31995) * [NPU] support npu profiler (#31684) * support npu profiler * add python api * fix bugs * add wrapper for incomplete type * update profile proto * record npu wait * add xpu placeholder * fix adam (#32016) * [NPU] enable async copy and add wait before sync operation (#31956) * enable async copy and add wait before sync operation * remove unnecessary wait * add FillNpuTensorWithConstant * refine * fix fill_constant * make TensorFromVector/TensorToVector sync * [NPU] Support dataloader on npu place. (#31867) * [NPU] Wait on NPUPlace (#32086) * [NPU] fix cast op (#32121) * fix npu kernel of cast op to handle casting to same dtype * add comments * [NPU] support cann 20.3 (#32044) * fix compile problem on cann 20.3 * fix ut * fix test_mul * fix check_finite_and_scale * fix lookup_table_v2_grad * fix cmake * support print op * [NPU] Support npu save load (#31893) * support save load for NPU * add save load npu unittest * support np.array transform in NPU * fix errors * delete dygraph in unittest * add Wait * fix unittest * fix review comment * fix unittest problem * fix little problem * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196) * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance * refine code * fix NPUDeviceContext in all c++ unittest (#32198) * fix NPUDeviceContext in all c++ unittest * refine log Co-authored-by: pangyoki <pangyoki@126.com> * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994) * enable async copy and add wait before sync operation * remove unnecessary wait * add FillNpuTensorWithConstant
* refine * fix fill_constant * change TensorFromVector to FillNpuTensorWithConstant * fix ignored api * delete extra unittest * fix little error * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu * change TensorCopySync to TensorCopy * delete useless Wait and add StreamWait * fix npu_stream error * fix check_finite_and_unscale_op_npu TensorCopy * only save stream wait * fix NPUDeviceContext in all c++ unittest * delete wait Co-authored-by: zhiqiu <chenqiuliang@baidu.com> * delete useless unittest file (#32206) * Fix op test (#32231) * fix conditional block (#32243) * fix adam bug again (#32246) * fix compile * fix ut * fix ut Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com> Co-authored-by: pangyoki <pangyoki@126.com>
-
- 15 Apr, 2021 3 commits
-
-
Committed by 123malin
* add index_dataset and index_sampler for tree-based model
-
Committed by Thunderbrook
* pscore support heterps * fleet cmake * fleet wrapper * macro * solve conflict * solve conflict * add unittest * paddle enforce * unittest * unittest * unittest
-
Committed by WeiXin
* custom python backward * polish up the code * polish up the code * polish up the code. * Fix code format and comments. * Delete redundant files. * add unittest. * edit unittest. * edit unittest. * Remove redundant header files. * Improve coverage and remove redundant code. * support saving for backward. * polish code according to comments. * Add support type for PyLayer. * Modify the DOC. * polish Doc. * polish Doc. * polish Doc. * polish Doc. * polish Doc. * polish Doc. * polish code and make the code robust. * Modify the code format.
-
- 14 Apr, 2021 1 commit
-
-
Committed by Chen Weihang
* add register backward hook method * add leaf grad accumulated test
-
- 13 Apr, 2021 1 commit
-
-
Committed by chentianyu03
* add layer.to api * add layer.to api * add layer.to api * add the doc for Layer.to * add input type checking * modify assert and import bug * format code style * format code style * make place support str type * add SetGradVarBase method to set the gradient after conversion * modify argument place to device * modify argument place to device * modify doc of layers.to API * add xpuplace to device argument
-
- 09 Apr, 2021 1 commit
-
-
Committed by Leo Chen
* [feature] support npu allocator (#30840) [feature] support npu allocator * [feature] support npu operator (#30951) [feature] support npu operator * [feature] support npu allocator, part 2 (#30972) * support npu allocator * add npu device context * fix some compile problem * fix some compile problem * add npu info * compile ok * fix include dir * support naive_best_fit_allocator * run ut ok, bug failed to exit * call aclrtResetDevice before exit * fix aclFinilize * add system allocator test * add selected_gpus in gtest * add tensor_test for npu * support npu op, initial commit * add npu stream * add elementwise_add_op * compile ok * fix typo * fix elementwise_add_op_npu_test * support op run * test can run but failed * change aclopExecuteV2 to aclopCompileAndExecute * support parsing ascend rank table file (#31000) support parsing ascend rank table file * Fix reshape on GE graph. (#31084) Fix reshape on GE graph * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973) * add npu sub op * fix typo * rename test * fix bug * fix bug * add fp16 kernel * fix typo * support sub grad op * support elementwise_sub_grad op Co-authored-by: frankwhzhang <frankwhzhang@126.com> * Fix compilation problem (#31100) Fix compilation problem (#31100) * fix compile * fix code style * remove const_cast * support adding correct npu op in pybind.h (#31143) * support adding correct npu op in pybind.h * refine code * [NPU] Support executor with NPU (#31057) * [NPU] Support executor with NPU * Fix code according to reviews * Fix code * Add unittest for sub op npu * refactor npu device manager (#31154) refactor npu device manager (#31154) * fix selected npus * fix compile * fix reading flags from env * format Co-authored-by: xiayanming <41795079@qq.com> Co-authored-by: gongweibao <weibao.gong@gmail.com> Co-authored-by: frankwhzhang <frankwhzhang@126.com> Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
-
- 08 Apr, 2021 1 commit
-
-
Committed by Zhen Wang
* Use the runtime to create the unsupported_fp16_list using in AMP. * Add more infos about supported ops. * Add some comments for the function of OpSupportedInfos. * Fix the unit test of test_multi_precision_fp16_train.
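The first item in this commit, building the unsupported_fp16_list from what the runtime reports instead of hard-coding it, can be sketched as follows. This is a minimal illustration that assumes a hypothetical `supported_kernels` mapping from op name to the `(device, dtype)` pairs it has kernels for. It is not the actual OpSupportedInfos implementation, and the registry contents are made up.

```python
def build_unsupported_fp16_list(supported_kernels, device="GPU"):
    """Return the ops that have no float16 kernel on `device`.

    `supported_kernels` is a hypothetical stand-in for what the runtime
    reports: {op_name: {(device, dtype), ...}}.
    """
    return {
        op_name
        for op_name, kernels in supported_kernels.items()
        if (device, "float16") not in kernels
    }


# Made-up registry contents, for illustration only.
example_registry = {
    "matmul": {("GPU", "float32"), ("GPU", "float16")},
    "softmax": {("GPU", "float32"), ("GPU", "float16"), ("CPU", "float32")},
    "lookup_table": {("GPU", "float32"), ("CPU", "float32")},
}

unsupported_on_gpu = build_unsupported_fp16_list(example_registry)
unsupported_on_cpu = build_unsupported_fp16_list(example_registry, device="CPU")
```

Deriving the list at runtime keeps it in step with the kernels that are actually compiled in, which a static list cannot guarantee.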
-
- 07 Apr, 2021 1 commit
-
-
Committed by zhang wenhui
* Ascend rc (#30483) * Fix compilation on CANN20.1 and older (#30494) Fix compilation on CANN20.1 and older * Add distribution supported (#30578) Add distribution supported * Build parser for Hcom* operators (#30627) Build parser for Hcom* operators * Pass device_ids info from launch to trainer. (#30632) Pass device_ids info from launch to trainer * Add Hccl program group (#30642) Add Hccl program group * Add startup bash files of test_ascend_group. (#30645) Add startup bash files of test_ascend_group * cleanup (#30646) cleanup test_ascend_group.py * [Feature] Build parser to support distributed training (#30658) [Feature] Build parser to support distributed training * fix compilation on ascend-20.1 (#30722) fix compilation on ascend-20.1 * Dev/fix ascend string (#30749) Dev/fix ascend string * code style (#30781) code style * Merge ascend_optimizer and ascend_parser. (#30776) Merge ascend_optimizer and ascend_parser. * Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug (#30797) Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug * Add paddle ascend distribution training supported (#30796) Add paddle ascend distribution training supported * pass cxx_flags to gloo cmake (#30857) * Destroy session first. (#30954) Destroy session first.
* merge * fix, test=develop * fix, test=develop * fix style, test=develop * fix, test=develop * fix * fix log fatal, test=develop * fix enforce style, test=develop * fix, test=develop * fix, test=develop * fix rccl, test=develop * fix test, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix node_num, test=develop * fix ids str, test=develop * fix ids str, test=develop * fix ids str, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix style code, test=develop * fix style code, test=develop * fix style code, test=develop * fix style code, test=develop Co-authored-by: hutuxian <hutuxian2011@sina.cn> Co-authored-by: gongweibao <weibao.gong@gmail.com> Co-authored-by: Void Main <voidmain1313113@gmail.com> Co-authored-by: Leo Chen <chenqiuliang@baidu.com> Co-authored-by: dingsiyu <18369187719@163.com> Co-authored-by: OleNet <olenet@126.com>
-
- 02 Apr, 2021 1 commit
-
-
Committed by seemingwang
* graph engine demo * upload unsaved changes * fix dependency error * fix shard_num problem * py client * remove lock and graph-type * add load direct graph * add load direct graph * add load direct graph * batch random_sample * batch_sample_k * fix num_nodes size * batch brpc * batch brpc * add test * add test * add load_nodes; change add_node function * change sample return type to pair * resolve conflict * resolved conflict * resolved conflict * separate server and client * merge pair type * fix * resolved conflict * fixed segment fault; high-level VLOG for load edges and load nodes * random_sample return 0 * rm useless loop * test:load edge * fix ret -1 * test: rm sample * rm sample * random_sample return future * random_sample return int * test fake node * fixed here * memory leak * remove test code * fix return problem * add common_graph_table * random sample node &test & change data-structure from linkedList to vector * add common_graph_table * sample with srand * add node_types * optimize nodes sample * recover test * random sample * destruct weighted sampler * GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * pybind sample nodes api * pull nodes with step * fixed pull_graph_list bug; add test for pull_graph_list by step * add graph table;name * add graph table;name * add pybind * add pybind * add FeatureNode * add FeatureNode * add FeatureNode Serialize * add FeatureNode Serialize * get_feat_node * avoid local rpc * fix get_node_feat * fix get_node_feat * remove log * get_node_feat return py:bytes * merge develop with graph_engine * fix threadpool.h head * fix * fix typo * resolve conflict * fix conflict * recover lost content * fix pybind of FeatureNode * recover cmake * recover tools * resolve conflict * resolve linking problem * code style * change test_server port * fix code problems * remove shard_num config * remove redundent threads * optimize start server * remove logs * fix code problems by 
reviewers' suggestions Co-authored-by: Huang Zhengjie <270018958@qq.com> Co-authored-by: Weiyue Su <weiyue.su@gmail.com> Co-authored-by: suweiyue <suweiyue@baidu.com> Co-authored-by: luobin06 <luobin06@baidu.com> Co-authored-by: liweibin02 <liweibin02@baidu.com>
-
- 01 Apr, 2021 3 commits
-
-
Committed by chentianyu03
* add custom init grad for backward function * add custom init grad for backward function * handle when the grad_tensor is none * handle when the grad_tensor is none * fix the args type error on windows platform * modify the args order and doc * format code * add grad_tensor to xpu * modify the grad_tensor type check * add paddle.backward api to support multi tensors gradient compute * add paddle.backward api to support multi tensors gradient compute * add paddle.autograd module and backward api * change tensor.backward func args * modify tensor backward api * remove create_graph inputs args * add doc and example code for backward api * when have the same tensor, throw error * modify test Init func args * modify the execute.Init func args in test files * add paddle.autograd package in setup.py.in * modify error msg, remove _run_backward method in class Tensor * add test cases for backward api
-
Committed by kuizhiqing
* new group * ci compatible fix * assert nccl
-
Committed by Chen Weihang
* refactor and simplify hook design * fix reducer add hook error * add Tensor.register_hook basic impl * refine prepare data impl * revert prepare data change * support register_hook for Tensor * add hook test in model * polish tests and doc example * fix double grad test failed * remove reduce hook func * fix set empty error * polish code by comments * change reduce_hook to mutable_hook * remove useless tmp_ins * fix shape code format error * fix shape code format error
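The hook mechanism this commit adds (Tensor.register_hook) can be pictured with a small pure-Python model. This is only a sketch under assumptions: `ToyTensor`, `HookHandle`, and `receive_grad` are hypothetical names, and the rule that a hook's non-None return value replaces the gradient is assumed here for illustration rather than taken from the commit.

```python
class HookHandle:
    """Hypothetical handle that lets the caller unregister a hook later."""

    def __init__(self, hooks, hook_id):
        self._hooks = hooks
        self._hook_id = hook_id

    def remove(self):
        # True if the hook was still registered, False if already removed.
        return self._hooks.pop(self._hook_id, None) is not None


class ToyTensor:
    """Toy stand-in for a tensor that accumulates an incoming gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = None
        self._hooks = {}   # hook_id -> callable, kept in registration order
        self._next_id = 0

    def register_hook(self, hook):
        hook_id = self._next_id
        self._next_id += 1
        self._hooks[hook_id] = hook
        return HookHandle(self._hooks, hook_id)

    def receive_grad(self, grad):
        # Run every hook in order; a non-None result replaces the gradient.
        for hook in list(self._hooks.values()):
            result = hook(grad)
            if result is not None:
                grad = result
        # Accumulate, as a leaf tensor would across several backward passes.
        self.grad = grad if self.grad is None else self.grad + grad
        return self.grad


t = ToyTensor(1.0)
handle = t.register_hook(lambda g: g * 2)  # double every incoming gradient
first = t.receive_grad(3.0)                # hook active: accumulates 6.0
removed = handle.remove()
second = t.receive_grad(3.0)               # hook removed: accumulates 3.0 more
```

Returning a handle rather than exposing the hook table keeps removal safe even if several hooks share the same function.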
-
- 31 Mar, 2021 2 commits
-
-
Committed by Zhou Wei
* [Parallel UT]improve Parallel UT level on Windows/Linux * [Parallel UT]improve Parallel UT level on Windows/Linux * [Parallel UT]Improve Parallel UT level on Windows/Linux * [Parallel UT]Improve Parallel UT level on Windows/Linux * fix CI
-
Committed by Kaipeng Deng
* polish tensor pipeline. test=develop
-
- 30 Mar, 2021 2 commits
- 29 Mar, 2021 1 commit
-
-
Committed by ronnywang
-
- 26 Mar, 2021 1 commit
-
-
Committed by cc
* Use layer to calculate output scale * add backward for moving_average_abs_max_scale and save output scales to op's attr
-