- 25 Apr, 2021 9 commits
-
-
Committed by Wilber
-
Committed by Qi Li
-
Committed by minghaoBD
-
Committed by lilong12
* update
-
Committed by wawltor
* fix bug: when x.dim < y.dim, the result of compare_op is the inverse of the expected result * support CUDA in the fix for the compare broadcast bug
-
Committed by Shang Zhizhou
* fix tc trt shape * fix fc dynamic shape * add fc shape assert * update
-
Committed by Chen Weihang
-
Committed by Leo Chen
* use ZerosLike instead of NPUMemsetAsync * fix compile
-
Committed by denglin-github
* Add dlnne engine runtime * Fix log * Remove <const_cast> and remove unrelated modify with dlnne, +clang-format * Fix CMakeList format error * Add copyright message * Fix dlnne CMakeList.txt * Add some paddlepaddle_pass to support more networks * Fix some format bug
-
- 24 Apr, 2021 1 commit
-
-
Committed by winter-wang
-
- 23 Apr, 2021 12 commits
-
-
Committed by lilong12
* add c_identity op, test=develop
-
Committed by Aurelius84
* Refine Constructor logic of ParallelExecutor * refine function name * refine code comment
-
Committed by Leo Chen
* refactor_check_finite_and_scale_npu_kernel * fix compile * add alloc_float_status op * add alloc_float_status op * add FloatStatus for check_finite_and_unscale * refine code * remove unnecessary logic * refine for fleet
-
Committed by ceci3
-
Committed by wenbin
* move semantic checks to op_teller * more ops * more ops * revert block related change * part1 * revert activation * remove if * remove const_cast * resolve conflict * remove const_cast * delete useless var * replace vlog(1) with vlog(3), replace assert with PADDLE_ENFORCE * down to 19 files
-
Committed by Baibaifan
solve hccl communicate conflict (#32447)
-
Committed by lilong12
* add c_concat op
-
Committed by shanliang1992
-
Committed by wuhuanzhou
-
Committed by ronnywang
-
Committed by Leo Chen
-
Committed by Kqnonrime
* fix two error message * fix two error message * fix error * fix error * fix error * fix error * fix some error message * fix some error * fix error * fix some error * fix some error * fix some error * fix one error * fix some error * fix seven error message * fix error * fix error * fix error * fix error
-
- 22 Apr, 2021 7 commits
-
-
Committed by wuyefeilin
support int32 and int64 kernel for clip operator
-
Committed by Leo Chen
-
Committed by Zhang Zheng
-
Committed by wuhuanzhou
-
Committed by seemingwang
* graph engine demo * upload unsaved changes * fix dependency error * fix shard_num problem * py client * remove lock and graph-type * add load direct graph * add load direct graph * add load direct graph * batch random_sample * batch_sample_k * fix num_nodes size * batch brpc * batch brpc * add test * add test * add load_nodes; change add_node function * change sample return type to pair * resolve conflict * resolved conflict * resolved conflict * separate server and client * merge pair type * fix * resolved conflict * fixed segment fault; high-level VLOG for load edges and load nodes * random_sample return 0 * rm useless loop * test:load edge * fix ret -1 * test: rm sample * rm sample * random_sample return future * random_sample return int * test fake node * fixed here * memory leak * remove test code * fix return problem * add common_graph_table * random sample node &test & change data-structure from linkedList to vector * add common_graph_table * sample with srand * add node_types * optimize nodes sample * recover test * random sample * destruct weighted sampler * GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * pybind sample nodes api * pull nodes with step * fixed pull_graph_list bug; add test for pull_graph_list by step * add graph table;name * add graph table;name * add pybind * add pybind * add FeatureNode * add FeatureNode * add FeatureNode Serialize * add FeatureNode Serialize * get_feat_node * avoid local rpc * fix get_node_feat * fix get_node_feat * remove log * get_node_feat return py:bytes * merge develop with graph_engine * fix threadpool.h head * fix * fix typo * resolve conflict * fix conflict * recover lost content * fix pybind of FeatureNode * recover cmake * recover tools * resolve conflict * resolve linking problem * code style * change test_server port * fix code problems * remove shard_num config * remove redundant threads * optimize start server * remove logs * fix code problems by reviewers' suggestions * move graph files into a folder * code style change * remove graph operations from base table * optimize get_feat function of graph engine * fix long long count problem Co-authored-by: Huang Zhengjie <270018958@qq.com> Co-authored-by: Weiyue Su <weiyue.su@gmail.com> Co-authored-by: suweiyue <suweiyue@baidu.com> Co-authored-by: luobin06 <luobin06@baidu.com> Co-authored-by: liweibin02 <liweibin02@baidu.com> Co-authored-by: tangwei12 <tangwei12@baidu.com>
-
Committed by WeiXin
* support save/load binary format tensor * Fix error when create cudaplace * Fix error when create cudaplace * Fix error when create cudaplace * get device context from pool. * move define of 'SerializeToStream' and 'DeserializeFromStream' to 'lod_tensor.cc' and 'selected_rows.cc'. * improve coverage. * improve coverage. * polish API * deal with conflict * disable save/load large file in unittest * split unittest.
-
Committed by tianshuo78520a
-
- 21 Apr, 2021 10 commits
-
-
Committed by AshburnLee
-
Committed by zhang wenhui
* add allreduce and broadcast without test (#31024) add allreduce and broadcast without test * Refactor HCCLCommContext to be compatible with Paddle (#31359) Refactor HCCLCommContext to be compatible with Paddle (#31359) * [NPU] add npu kernel for communication op (#31437) * add allreduce and broadcast without test * add c_broadcast_test case * build c_comm_init and c_create_group operators * make the whole thing compile * add broadcast and init op test case but run failed * make unit test compile * fix broadcast test bug and change into hcom for ccl * change c_comm_init and c_create_group ops accordingly * make tests compile * transfer code to 27 * compiled successfully in 28, but run failed * test broadcast in 28, but failed * make hcom primitives work * change hccl data type for base.h * fix broadcast bug * make attributes work * fix group name bug * add allreduce but test failed * allreduce bug for qiuliang * allreduce finished * add allgather and reducescatter * merge all op code * add allgather test * finish run all ccl op test exclude send/recv * all all op and test exclude send/recv * send_v2_npu.cc recv_v2_npu.cc compiled * fix ccl core dump bug and test allgather, reducescatter, broadcast op * fix allreduce bug just for test * hcom send&recv test pass, without hcom_destroy * for qiuliang test * Ascend Send&Recv Test Pass * all op (ex send/recv) ok * fix bug * merge all ccl op * style merge to PaddlePaddle * merge style * new merge style * merge style 2 * insert an empty at the end * disable ctest for hcom to pass ci Co-authored-by: void-main <voidmain1313113@gmail.com> Co-authored-by: f2hkop <f2huestc@outlook.com> * Add auto-increasing tag id for Hcom OPs (#31702) * add c_reduce_sum op (#31793) add c_reduce_sum op * update Ascendrc hccl to 20.3 (#32126) update Ascendrc hccl to 20.3 (#32126) * fix merge code * change cmake.txt1 * [NPU] Support npu kernel for c sync stream op (#31386) * sync stream npu op * add with_ascend_acl * update c++ unittest * compile all failed * try to pre commit * after pre commit * merge&compile&test hccl successfully! * fix code style * fix code style * fix bugs about hccl * fix some bugs * fix code style * fix style * fix style * fix * fixed * merge develop Co-authored-by: lw921014 <liuwei921014@yeah.net> Co-authored-by: Void Main <voidmain1313113@gmail.com> Co-authored-by: f2hkop <f2huestc@outlook.com> Co-authored-by: xiayanming <41795079@qq.com>
-
Committed by cc
-
Committed by seemingwang
* graph engine demo * upload unsaved changes * fix dependency error * fix shard_num problem * py client * remove lock and graph-type * add load direct graph * add load direct graph * add load direct graph * batch random_sample * batch_sample_k * fix num_nodes size * batch brpc * batch brpc * add test * add test * add load_nodes; change add_node function * change sample return type to pair * resolve conflict * resolved conflict * resolved conflict * separate server and client * merge pair type * fix * resolved conflict * fixed segment fault; high-level VLOG for load edges and load nodes * random_sample return 0 * rm useless loop * test:load edge * fix ret -1 * test: rm sample * rm sample * random_sample return future * random_sample return int * test fake node * fixed here * memory leak * remove test code * fix return problem * add common_graph_table * random sample node &test & change data-structure from linkedList to vector * add common_graph_table * sample with srand * add node_types * optimize nodes sample * recover test * random sample * destruct weighted sampler * GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * pybind sample nodes api * pull nodes with step * fixed pull_graph_list bug; add test for pull_graph_list by step * add graph table;name * add graph table;name * add pybind * add pybind * add FeatureNode * add FeatureNode * add FeatureNode Serialize * add FeatureNode Serialize * get_feat_node * avoid local rpc * fix get_node_feat * fix get_node_feat * remove log * get_node_feat return py:bytes * merge develop with graph_engine * fix threadpool.h head * fix * fix typo * resolve conflict * fix conflict * recover lost content * fix pybind of FeatureNode * recover cmake * recover tools * resolve conflict * resolve linking problem * code style * change test_server port * fix code problems * remove shard_num config * remove redundant threads * optimize start server * remove logs * fix code problems by reviewers' suggestions * move graph files into a folder * code style change * remove graph operations from base table * optimize get_feat function of graph engine Co-authored-by: Huang Zhengjie <270018958@qq.com> Co-authored-by: Weiyue Su <weiyue.su@gmail.com> Co-authored-by: suweiyue <suweiyue@baidu.com> Co-authored-by: luobin06 <luobin06@baidu.com> Co-authored-by: liweibin02 <liweibin02@baidu.com> Co-authored-by: tangwei12 <tangwei12@baidu.com>
-
Committed by Leo Chen
* [NPU] register finalize on exit * fix
-
Committed by wuhuanzhou
* remove thrust includes, test=develop * fix compilation error, test=develop * fix compilation of truncated_gaussian_random_op, test=develop
-
Committed by liuyuhui
-
Committed by 石晓伟
* flush denormal in the tracer op, test=develop * add cmake dependencies, test=develop * add a macro, test=develop * fix the windows case, test=develop
-
Committed by jakpiase
-
Committed by jakpiase
-
- 20 Apr, 2021 1 commit
-
-
Committed by tangwei12
Change-Id: Ie35a09772e46f7d90cb68ca82c1d18b9201d1abe * large scale kv store optimize Change-Id: I582cc661afdaa20749ec7493eae1b88c32b967f7 * replace std::unordered_map with roundrobin map Change-Id: I48ee0efef38853876c92d982cdfcac6603c52c88 * remove license * fix cpp lint Change-Id: Ia21fafa65adc09bb9094f7dbc987e31d5af2686e
-