1. 17 Nov, 2021 7 commits
  2. 16 Nov, 2021 16 commits
  3. 15 Nov, 2021 17 commits
    • C
      [Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a
      Chen Weihang committed
      * move extension into pten [no-verify]
      
      * append tensor methods by ext_tensor [no-verify]
      
      * append other tensor methods [no-verify]
      
      * ext related files tidy [no-verify]
      
      * include relation tidy [no-verify]
      
      * add pten tensor test [no-verify]
      
      * replace tensor in custom op & compile success
      
      * refine tensor constructor for unittest
      
      * custom relu jit run success
      
      * fix all custom op unittests
      
      * add inference cmake adapt [no-verify]
      
      * fix failed unittests
      
      * fix windows failed unittests
      
      * try to fix kunlun and inference failed
      
      * fix test_elementwise_api error
      
      * try to fix win compile failed
      
      * fix kunlun fp16 type error
      
      * remove useless handle error macro
      
      * add custom linear op test
      
      * fix compile failed & add win symbols
      
      * fix non pten kernel cast failed
      
      * add dll decl for api
      
      * polish several details
      
      * polish details by review comment
      
      * add dll_decl for register
      1e598f1a
    • L
      [new-exec] fix stream analysis (#37161) · 584b4b24
      Leo Chen committed
      * fix record_event
      
      * refine class Instruction
      
      * refine Instruction and InterpreterCore
      
      * make instruction and operator_base consistent
      
      * support NoNeedBufferVar in stream_analyzer
      
      * fix place of event
      
      * add vlog before continue
      584b4b24
    • C
      remove needless declare (#37195) · 9c591703
      Chen Weihang committed
      9c591703
    • B
      remove input dim check in op_teller and update ut (#37097) · 6b21bb0b
      baoachun committed
      * remove input dim check of activation in op_teller
      
      * remove input dim check of concat in op_teller
      
      * remove input dim check of clip in op_teller
      
      * remove input dim check of scale in op_teller
      
      * remove input dim check in op_teller
      
      * update attr check of slice in op_teller
      6b21bb0b
    • Y
      fix ctest depent probs (#37203) · cf958f2f
      Yuang Liu committed
      cf958f2f
    • W
      fix 3 bug of new_executor (#37142) · 8358d614
      wanghuancoder committed
      * fix 3 bug, test=develop
      
      * refine, test=develop
      8358d614
    • F
      fix:delete macro INFERENCE (#37130) · b628c316
      feng_shuai committed
      b628c316
    • A
      Added BF16 to mean op (#37104) · df7cc457
      arlesniak committed
      * Added BF16 to mean op
      
      * fix for CI
      
      * fix for CI
      
      * fix for CI
      df7cc457
    • J
      fix cinn_compile_test not pass problem (#37190) · 83eef6d2
      jiangcheng committed
      83eef6d2
    • W
      [New features] Add elementwise_mul triple grad kernel (#37152) · 59fdf4da
      Weilong Wu committed
      * Add elementwise_mul triple grad kernel
      
      * Removed InplaceInferer and polished code
      59fdf4da
    • Z
      Accessor 20211112 2 (#37181) · 84b0ec97
      zhaocaibei123 committed
      84b0ec97
    • Z
      Add distributed pass framework: including PassBase/PassTest/PassUtils (#36643) · 12339fa0
      Zeng Jinle committed
      * add split_program
      
      * make ut faster
      
      * increase ut timeout
      
      * make result deterministic
      
      * add fuse_all_reduce pass
      
      * add ut framework, update
      
      * fix ut framework
      
      * remove useless code
      
      * add coverage support
      
      * update
      
      * fix CI
      
      * fix some bugs and fix ci coverage
      
      * fix conflict
      12339fa0
    • S
      graph-engine cache optimization (#37168) · b44db69f
      seemingwang committed
      * graph engine demo
      
      * upload unsaved changes
      
      * fix dependency error
      
      * fix shard_num problem
      
      * py client
      
      * remove lock and graph-type
      
      * add load direct graph
      
      * add load direct graph
      
      * add load direct graph
      
      * batch random_sample
      
      * batch_sample_k
      
      * fix num_nodes size
      
      * batch brpc
      
      * batch brpc
      
      * add test
      
      * add test
      
      * add load_nodes; change add_node function
      
      * change sample return type to pair
      
      * resolve conflict
      
      * resolved conflict
      
      * resolved conflict
      
      * separate server and client
      
      * merge pair type
      
      * fix
      
      * resolved conflict
      
      * fixed segmentation fault; high-level VLOG for load edges and load nodes
      
      * random_sample return 0
      
      * rm useless loop
      
      * test:load edge
      
      * fix ret -1
      
      * test: rm sample
      
      * rm sample
      
      * random_sample return future
      
      * random_sample return int
      
      * test fake node
      
      * fixed here
      
      * memory leak
      
      * remove test code
      
      * fix return problem
      
      * add common_graph_table
      
      * random sample node &test & change data-structure from linkedList to vector
      
      * add common_graph_table
      
      * sample with srand
      
      * add node_types
      
      * optimize nodes sample
      
      * recover test
      
      * random sample
      
      * destruct weighted sampler
      
      * GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * pybind sample nodes api
      
      * pull nodes with step
      
      * fixed pull_graph_list bug; add test for pull_graph_list by step
      
      * add graph table;name
      
      * add graph table;name
      
      * add pybind
      
      * add pybind
      
      * add FeatureNode
      
      * add FeatureNode
      
      * add FeatureNode Serialize
      
      * add FeatureNode Serialize
      
      * get_feat_node
      
      * avoid local rpc
      
      * fix get_node_feat
      
      * fix get_node_feat
      
      * remove log
      
      * get_node_feat return py:bytes
      
      * merge develop with graph_engine
      
      * fix threadpool.h head
      
      * fix
      
      * fix typo
      
      * resolve conflict
      
      * fix conflict
      
      * recover lost content
      
      * fix pybind of FeatureNode
      
      * recover cmake
      
      * recover tools
      
      * resolve conflict
      
      * resolve linking problem
      
      * code style
      
      * change test_server port
      
      * fix code problems
      
      * remove shard_num config
      
      * remove redundant threads
      
      * optimize start server
      
      * remove logs
      
      * fix code problems by reviewers' suggestions
      
      * move graph files into a folder
      
      * code style change
      
      * remove graph operations from base table
      
      * optimize get_feat function of graph engine
      
      * fix long long count problem
      
      * remove redundant graph files
      
      * remove unused shell
      
      * recover dropout_op_pass.h
      
      * fix potential stack overflow when request number is too large & node add & node clear & node remove
      
      * when sample k is larger than neighbor num, return directly
      
      * using random seed generator of paddle to speed up
      
      * fix bug of random sample k
      
      * fix code style
      
      * fix code style
      
      * add remove graph to fleet_py.cc
      
      * fix blocking_queue problem
      
      * fix style
      
      * fix
      
      * recover capacity check
      
      * add remove graph node; add set_feature
      
      * add remove graph node; add set_feature
      
      * add remove graph node; add set_feature
      
      * add remove graph node; add set_feature
      
      * fix distributed op combining problems
      
      * optimize
      
      * remove logs
      
      * fix MultiSlotDataGenerator error
      
      * cache for graph engine
      
      * fix type compare error
      
      * more test&fix thread terminating problem
      
      * remove header
      
      * change time interval of shrink
      
      * use cache when sample nodes
      
      * remove unused function
      
      * change unique_ptr to shared_ptr
      
      * simplify cache template
      
      * cache api on client
      
      * fix
      
      * reduce sample threads when cache is not used
      
      * reduce cache memory
      
      * cache optimization
      
      * remove test function
      
      * remove extra fetch function
      Co-authored-by: Huang Zhengjie <270018958@qq.com>
      Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
      Co-authored-by: suweiyue <suweiyue@baidu.com>
      Co-authored-by: luobin06 <luobin06@baidu.com>
      Co-authored-by: liweibin02 <liweibin02@baidu.com>
      Co-authored-by: tangwei12 <tangwei12@baidu.com>
      b44db69f
    • Z
      fix bug of indexing with ellipsis (#37182) · f2a56c6a
      zyfncg committed
      f2a56c6a
    • J
      10cc040d
    • L
      Optimize Matmul_v2 (#37037) · 444a7358
      Linjie Chen committed
      Optimize dot product of Matmul_v2 
      444a7358
    • L
      modify sparse_attention docs, test=document_fix (#36554) · 6b0cc2b1
      Liu-xiandong committed
      * modify sparse_attention docs, test=develop
      
      * add warning
      
      * add warning, test=document_fix
      6b0cc2b1