1. 25 Apr 2021: 2 commits
    • [NPU] refine lookup_table_v2_grad npu_kernel (#32497) · fb7590d4
      Leo Chen authored
      * use ZerosLike instead of NPUMemsetAsync
      
      * fix compile
    • Nne integration (#32255) · feb2e476
      denglin-github authored
      * Add dlnne engine runtime
      
      * Fix log
      
      * Remove <const_cast> and remove unrelated modify with dlnne, +clang-format
      
      * Fix CMakeList format error
      
      * Add copyright message
      
      * Fix dlnne CMakeList.txt
      
      * Add some paddlepaddle_pass to support more networks
      
      * Fix some format bug
  2. 24 Apr 2021: 2 commits
  3. 23 Apr 2021: 14 commits
  4. 22 Apr 2021: 7 commits
    • support int32 and int64 kernel for clip operator (#32373) · c3328288
      wuyefeilin authored
      support int32 and int64 kernel for clip operator 
    • [NPU] remove ascend_parser for WITH_ASCEND_CL (#32451) · a1a527fb
      Leo Chen authored
    • Modify some contents for elementwise op impl (#32414) · 890d6bc0
      Zhang Zheng authored
    • strip after compilation (#32145) · e727820d
      wuhuanzhou authored
    • fix count problem (#32415) · 73d0b0e9
      seemingwang authored
      * graph engine demo
      
      * upload unsaved changes
      
      * fix dependency error
      
      * fix shard_num problem
      
      * py client
      
      * remove lock and graph-type
      
      * add load direct graph
      
      * add load direct graph
      
      * add load direct graph
      
      * batch random_sample
      
      * batch_sample_k
      
      * fix num_nodes size
      
      * batch brpc
      
      * batch brpc
      
      * add test
      
      * add test
      
      * add load_nodes; change add_node function
      
      * change sample return type to pair
      
      * resolve conflict
      
      * resolved conflict
      
      * resolved conflict
      
      * separate server and client
      
      * merge pair type
      
      * fix
      
      * resolved conflict
      
      * fixed segment fault; high-level VLOG for load edges and load nodes
      
      * random_sample return 0
      
      * rm useless loop
      
      * test:load edge
      
      * fix ret -1
      
      * test: rm sample
      
      * rm sample
      
      * random_sample return future
      
      * random_sample return int
      
      * test fake node
      
      * fixed here
      
      * memory leak
      
      * remove test code
      
      * fix return problem
      
      * add common_graph_table
      
      * random sample node &test & change data-structure from linkedList to vector
      
      * add common_graph_table
      
      * sample with srand
      
      * add node_types
      
      * optimize nodes sample
      
      * recover test
      
      * random sample
      
      * destruct weighted sampler
      
      * GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * pybind sample nodes api
      
      * pull nodes with step
      
      * fixed pull_graph_list bug; add test for pull_graph_list by step
      
      * add graph table;name
      
      * add graph table;name
      
      * add pybind
      
      * add pybind
      
      * add FeatureNode
      
      * add FeatureNode
      
      * add FeatureNode Serialize
      
      * add FeatureNode Serialize
      
      * get_feat_node
      
      * avoid local rpc
      
      * fix get_node_feat
      
      * fix get_node_feat
      
      * remove log
      
      * get_node_feat return  py:bytes
      
      * merge develop with graph_engine
      
      * fix threadpool.h head
      
      * fix
      
      * fix typo
      
      * resolve conflict
      
      * fix conflict
      
      * recover lost content
      
      * fix pybind of FeatureNode
      
      * recover cmake
      
      * recover tools
      
      * resolve conflict
      
      * resolve linking problem
      
      * code style
      
      * change test_server port
      
      * fix code problems
      
      * remove shard_num config
      
      * remove redundant threads
      
      * optimize start server
      
      * remove logs
      
      * fix code problems by reviewers' suggestions
      
      * move graph files into a folder
      
      * code style change
      
      * remove graph operations from base table
      
      * optimize get_feat function of graph engine
      
      * fix long long count problem
      Co-authored-by: Huang Zhengjie <270018958@qq.com>
      Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
      Co-authored-by: suweiyue <suweiyue@baidu.com>
      Co-authored-by: luobin06 <luobin06@baidu.com>
      Co-authored-by: liweibin02 <liweibin02@baidu.com>
      Co-authored-by: tangwei12 <tangwei12@baidu.com>
    • support save/load binary format tensor. (#32211) · f4d9adc7
      WeiXin authored
      * support save/load binary format tensor
      
      * Fix error when create cudaplace
      
      * Fix error when create cudaplace
      
      * Fix error when create cudaplace
      
      * get device context from pool.
      
      * move define of 'SerializeToStream' and 'DeserializeFromStream' to 'lod_tensor.cc' and 'selected_rows.cc'.
      
      * improve coverage.
      
      * improve coverage.
      
      * polish API
      
      * deal with conflict
      
      * disable save/load large file in unittest
      
      * split unittest.
    • e58c705b
  5. 21 Apr 2021: 12 commits
    • bf0ec9b8
    • 【NPU】Merge NPU ccl code (#32381) · c3158527
      zhang wenhui authored
      * add allreduce and broadcast without test (#31024)
      
      add allreduce and broadcast without test
      
      * Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      * [NPU] add npu kernel for communication op (#31437)
      
      * add allreduce and broadcast without test
      
      * add c_broadcast_test case
      
      * build c_comm_init and c_create_group operators
      
      * make the whole thing compile
      
      * add broadcast and init op test case but run failed
      
      * make unit test compile
      
      * fix broadcast test bug and change into hcom for ccl
      
      * change c_comm_init and c_create_group ops accordingly
      
      * make tests compile
      
      * transfer code to 27
      
      * compiled successfully in 28, but run failed
      
      * test broadcast in 28, but failed
      
      * make hcom primitives work
      
      * change hccl data type for base.h
      
      * fix broadcast bug
      
      * make attributes work
      
      * fix group name bug
      
      * add allreduce but test failed
      
      * allreduce bug for qiuliang
      
      * allreduce finished
      
      * add allgather and reducescatter
      
      * merge all op code
      
      * add allgather test
      
      * finish run all ccl op test exclude send/recv
      
      * all all op and test exclude send/recv
      
      * send_v2_npu.cc recv_v2_npu.cc compiled
      
      * fix ccl core dump bug and test allgather, reducescatter, broadcast op
      
      * fix allreduce bug just for test
      
      * hcom send&recv test pass, without hcom_destroy
      
      * for qiuliang test
      
      * Ascend Send&Recv Test Pass
      
      * all op (ex send/recv) ok
      
      * fix bug
      
      * merge all ccl op
      
      * style merge to PaddlePaddle
      
      * merge style
      
      * new merge style
      
      * merge style 2
      
      * insert an empty at the end
      
      * disable ctest for hcom to pass ci
      Co-authored-by: void-main <voidmain1313113@gmail.com>
      Co-authored-by: f2hkop <f2huestc@outlook.com>
      
      * Add auto-increasing tag id for Hcom OPs (#31702)
      
      * add c_reduce_sum op (#31793)
      
      add c_reduce_sum op
      
      * update Ascendrc hccl to 20.3 (#32126)
      
      update Ascendrc hccl to 20.3 (#32126)
      
      * fix merge code
      
      * change cmake.txt1
      
      * [NPU] Support npu kernel for c sync stream op (#31386)
      
      * sync stream npu op
      
      * add with_ascend_acl
      
      * update c++ unittest
      
      * compile all failed
      
      * try to pre commit
      
      * after pre commit
      
      * merge&compile&test hccl successfully!
      
      * fix code style
      
      * fix code style
      
      * fix bugs about hccl
      
      * fix some bugs
      
      * fix code style
      
      * fix style
      
      * fix style
      
      * fix
      
      * fixed
      
      * merge develop
      Co-authored-by: lw921014 <liuwei921014@yeah.net>
      Co-authored-by: Void Main <voidmain1313113@gmail.com>
      Co-authored-by: f2hkop <f2huestc@outlook.com>
      Co-authored-by: xiayanming <41795079@qq.com>
    • Update the error info for quantization (#32273) · 3da2c7f3
      cc authored
    • optimize get-feat function of graph engine (#32261) · 2b68d20b
      seemingwang authored
      * graph engine demo
      
      * upload unsaved changes
      
      * fix dependency error
      
      * fix shard_num problem
      
      * py client
      
      * remove lock and graph-type
      
      * add load direct graph
      
      * add load direct graph
      
      * add load direct graph
      
      * batch random_sample
      
      * batch_sample_k
      
      * fix num_nodes size
      
      * batch brpc
      
      * batch brpc
      
      * add test
      
      * add test
      
      * add load_nodes; change add_node function
      
      * change sample return type to pair
      
      * resolve conflict
      
      * resolved conflict
      
      * resolved conflict
      
      * separate server and client
      
      * merge pair type
      
      * fix
      
      * resolved conflict
      
      * fixed segment fault; high-level VLOG for load edges and load nodes
      
      * random_sample return 0
      
      * rm useless loop
      
      * test:load edge
      
      * fix ret -1
      
      * test: rm sample
      
      * rm sample
      
      * random_sample return future
      
      * random_sample return int
      
      * test fake node
      
      * fixed here
      
      * memory leak
      
      * remove test code
      
      * fix return problem
      
      * add common_graph_table
      
      * random sample node &test & change data-structure from linkedList to vector
      
      * add common_graph_table
      
      * sample with srand
      
      * add node_types
      
      * optimize nodes sample
      
      * recover test
      
      * random sample
      
      * destruct weighted sampler
      
      * GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * pybind sample nodes api
      
      * pull nodes with step
      
      * fixed pull_graph_list bug; add test for pull_graph_list by step
      
      * add graph table;name
      
      * add graph table;name
      
      * add pybind
      
      * add pybind
      
      * add FeatureNode
      
      * add FeatureNode
      
      * add FeatureNode Serialize
      
      * add FeatureNode Serialize
      
      * get_feat_node
      
      * avoid local rpc
      
      * fix get_node_feat
      
      * fix get_node_feat
      
      * remove log
      
      * get_node_feat return  py:bytes
      
      * merge develop with graph_engine
      
      * fix threadpool.h head
      
      * fix
      
      * fix typo
      
      * resolve conflict
      
      * fix conflict
      
      * recover lost content
      
      * fix pybind of FeatureNode
      
      * recover cmake
      
      * recover tools
      
      * resolve conflict
      
      * resolve linking problem
      
      * code style
      
      * change test_server port
      
      * fix code problems
      
      * remove shard_num config
      
      * remove redundant threads
      
      * optimize start server
      
      * remove logs
      
      * fix code problems by reviewers' suggestions
      
      * move graph files into a folder
      
      * code style change
      
      * remove graph operations from base table
      
      * optimize get_feat function of graph engine
      Co-authored-by: Huang Zhengjie <270018958@qq.com>
      Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
      Co-authored-by: suweiyue <suweiyue@baidu.com>
      Co-authored-by: luobin06 <luobin06@baidu.com>
      Co-authored-by: liweibin02 <liweibin02@baidu.com>
      Co-authored-by: tangwei12 <tangwei12@baidu.com>
    • [NPU] register npu finalize on exit (#32390) · 8e4c1936
      Leo Chen authored
      * [NPU] register finalize on exit
      
      * fix
    • remove thrust include files (#32395) · ab6f8745
      wuhuanzhou authored
      * remove thrust includes, test=develop
      
      * fix compilation error, test=develop
      
      * fix compilation of truncated_gaussian_random_op, test=develop
    • flush denormal in the tracer op, test=develop (#32350) · 9ff85561
      石晓伟 authored
      * flush denormal in the tracer op, test=develop
      
      * add cmake dependencies, test=develop
      
      * add a macro, test=develop
      
      * fix the windows case, test=develop
    • 5d19f8d8
    • Modify the exit code of mac CI approval error (#32389) · a2cbbe83
      iducn authored
    • add retry on gcda_clean.py (#32318) · 229f9308
      YUNSHEN XIE authored
      * add retry on gcda_clean.py
      
      * add exit code for paddle_coverage.sh
      
      * fix format error
      
      * fix format error
    • Added oneDNN reduce_op GRAD kernel (#32280) · ead83422
      jakpiase authored
  6. 20 Apr 2021: 3 commits