1. 27 Dec, 2021 17 commits
    • fix bugs in fp16 for dp (#38405) · 1ab5c511
      ShenLiang authored
    • [PTen] move reshape kernel according to new directory (#38432) · 49216134
      YuanRisheng authored
      * move reshape
      
      * fix compile bugs
      
      * delete manipulation file
      
      * fix compile bugs
    • fix accumulator bug when multiple inplace OPs are executed continuously (#38406) · 113c8b93
      pangyoki authored
      * fix accumulator bug
      
      * fix unittest
    • Refine clip_by_global_norm (#38209) · 65f7fa0d
      zhangbo9674 authored
      * refine clip
      
      * delete unused code
      
      * refine logic for clip
    • [BugFix] Fix bug in pfp16 in DataParallel (#38378) · e8e47581
      ShenLiang authored
      * fix bug in pfp16
      
      * fix hip
      
      * fix hip
    • 9cfdae91
    • add matmulv2_transpose_reshape_pass ut (#37416) · f664a533
      baoachun authored
      * update mkldnn matmul_v2_transpose_reshape_fuse_pass ut
      
      * update mkldnn matmul_v2_transpose_reshape_fuse_pass ut
      
      * update ut
      
      * update ut
    • fix renorm (#38459) · b0c7144a
      seemingwang authored
      * graph engine demo
      
      * upload unsaved changes
      
      * fix dependency error
      
      * fix shard_num problem
      
      * py client
      
      * remove lock and graph-type
      
      * add load direct graph
      
      * add load direct graph
      
      * add load direct graph
      
      * batch random_sample
      
      * batch_sample_k
      
      * fix num_nodes size
      
      * batch brpc
      
      * batch brpc
      
      * add test
      
      * add test
      
      * add load_nodes; change add_node function
      
      * change sample return type to pair
      
      * resolve conflict
      
      * resolved conflict
      
      * resolved conflict
      
      * separate server and client
      
      * merge pair type
      
      * fix
      
      * resolved conflict
      
      * fixed segment fault; high-level VLOG for load edges and load nodes
      
      * random_sample return 0
      
      * rm useless loop
      
      * test:load edge
      
      * fix ret -1
      
      * test: rm sample
      
      * rm sample
      
      * random_sample return future
      
      * random_sample return int
      
      * test fake node
      
      * fixed here
      
      * memory leak
      
      * remove test code
      
      * fix return problem
      
      * add common_graph_table
      
      * random sample node &test & change data-structure from linkedList to vector
      
      * add common_graph_table
      
      * sample with srand
      
      * add node_types
      
      * optimize nodes sample
      
      * recover test
      
      * random sample
      
      * destruct weighted sampler
      
      * GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * pybind sample nodes api
      
      * pull nodes with step
      
      * fixed pull_graph_list bug; add test for pull_graph_list by step
      
      * add graph table;name
      
      * add graph table;name
      
      * add pybind
      
      * add pybind
      
      * add FeatureNode
      
      * add FeatureNode
      
      * add FeatureNode Serialize
      
      * add FeatureNode Serialize
      
      * get_feat_node
      
      * avoid local rpc
      
      * fix get_node_feat
      
      * fix get_node_feat
      
      * remove log
      
      * get_node_feat return  py:bytes
      
      * merge develop with graph_engine
      
      * fix threadpool.h head
      
      * fix
      
      * fix typo
      
      * resolve conflict
      
      * fix conflict
      
      * recover lost content
      
      * fix pybind of FeatureNode
      
      * recover cmake
      
      * recover tools
      
      * resolve conflict
      
      * resolve linking problem
      
      * code style
      
      * change test_server port
      
      * fix code problems
      
      * remove shard_num config
      
      * remove redundant threads
      
      * optimize start server
      
      * remove logs
      
      * fix code problems by reviewers' suggestions
      
      * move graph files into a folder
      
      * code style change
      
      * remove graph operations from base table
      
      * optimize get_feat function of graph engine
      
      * fix long long count problem
      
      * remove redundant graph files
      
      * remove unused shell
      
      * recover dropout_op_pass.h
      
      * fix potential stack overflow when request number is too large & node add & node clear & node remove
      
      * when sample k is larger than neighbor num, return directly
      
      * using random seed generator of paddle to speed up
      
      * fix bug of random sample k
      
      * fix code style
      
      * fix code style
      
      * add remove graph to fleet_py.cc
      
      * fix blocking_queue problem
      
      * fix style
      
      * fix
      
      * recover capacity check
      
      * add remove graph node; add set_feature
      
      * add remove graph node; add set_feature
      
      * add remove graph node; add set_feature
      
      * add remove graph node; add set_feature
      
      * fix distributed op combining problems
      
      * optimize
      
      * remove logs
      
      * fix MultiSlotDataGenerator error
      
      * cache for graph engine
      
      * fix type compare error
      
      * more test&fix thread terminating problem
      
      * remove header
      
      * change time interval of shrink
      
      * use cache when sample nodes
      
      * remove unused function
      
      * change unique_ptr to shared_ptr
      
      * simplify cache template
      
      * cache api on client
      
      * fix
      
      * reduce sample threads when cache is not used
      
      * reduce cache memory
      
      * cache optimization
      
      * remove test function
      
      * remove extra fetch function
      
      * graph-engine data transfer optimization
      
      * support graph_split load&query
      
      * remove logs
      
      * change shards to pointer vector
      
      * use inference
      
      * remove test code
      
      * renorm op
      
      * simplify renorm op
      
      * recover local changes
      
      * recover renorm op kernel
      
      * fix init
      
      * add blanklines in renorm doc
      
      * fix import
      
      * fix import
      
      * add renorm to init.py
      Co-authored-by: Huang Zhengjie <270018958@qq.com>
      Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
      Co-authored-by: suweiyue <suweiyue@baidu.com>
      Co-authored-by: luobin06 <luobin06@baidu.com>
      Co-authored-by: liweibin02 <liweibin02@baidu.com>
      Co-authored-by: tangwei12 <tangwei12@baidu.com>
    • add device-agnostic stream class (#38391) · 6b5e33b4
      Leo Chen authored
      * add device-agnostic stream class
      
      * add stream.h
      
      * fix ut
      
      * fix cpu compile
    • refine float16 implementation (#38439) · 78375990
      sneaxiy authored
    • refine CUDA Graph (#38401) · 5f7e4a21
      sneaxiy authored
    • Support multi-outputs feature for broadcast ops (#38329) · 89d38f55
      limingshu authored
      * No harm to KP
      
      * Pass the compile stage
      
      * change the WriteData function
      
      * fix template bugs and pass ctest of current elementwise
      
      * for passing partial template specialization of template function in CI-ROCm
      
      * To make 'WriteData' function flexible.
      
      * a less harmful way to support multi-output
      
      * a less harmful way to support multi-output
    • remove npu related impl (#38428) · f1d56b77
      Chen Weihang authored
    • [PTen] Move cast kernel impl (#38382) · 1fb734d7
      Chen Weihang authored
      * rename to api to copy_to
      
      * revert needless change
      
      * polish format
    • 04527ee3
    • gelu using normcdf for cudnn (#38450) · 37022482
      Guoxia Wang authored
    • [AMP] Fix amp.decorate bug: parameters for non-leaf layers cannot be decorated (#38402) · 5d902954
      zhangbo9674 authored
      * fix bug
      
      * refine code
      
      * refine code
      
      * refine code
  2. 26 Dec, 2021 5 commits
  3. 24 Dec, 2021 18 commits
    • add is dense tensor method (#38424) · 6ff3596e
      Chen Weihang authored
    • add nansum api to math (#38137) · 6554cc10
      wangguanqun authored
      * add nansum api
      
      * delete layerhelper
      
      * add nansum to all and tensor_method_func
      
      * update doc
      
      * update doc
      
      * update doc
    • renorm op (#38130) · 6982871d
      seemingwang authored
      * graph engine demo
      
      * upload unsaved changes
      
      * fix dependency error
      
      * fix shard_num problem
      
      * py client
      
      * remove lock and graph-type
      
      * add load direct graph
      
      * add load direct graph
      
      * add load direct graph
      
      * batch random_sample
      
      * batch_sample_k
      
      * fix num_nodes size
      
      * batch brpc
      
      * batch brpc
      
      * add test
      
      * add test
      
      * add load_nodes; change add_node function
      
      * change sample return type to pair
      
      * resolve conflict
      
      * resolved conflict
      
      * resolved conflict
      
      * separate server and client
      
      * merge pair type
      
      * fix
      
      * resolved conflict
      
      * fixed segment fault; high-level VLOG for load edges and load nodes
      
      * random_sample return 0
      
      * rm useless loop
      
      * test:load edge
      
      * fix ret -1
      
      * test: rm sample
      
      * rm sample
      
      * random_sample return future
      
      * random_sample return int
      
      * test fake node
      
      * fixed here
      
      * memory leak
      
      * remove test code
      
      * fix return problem
      
      * add common_graph_table
      
      * random sample node &test & change data-structure from linkedList to vector
      
      * add common_graph_table
      
      * sample with srand
      
      * add node_types
      
      * optimize nodes sample
      
      * recover test
      
      * random sample
      
      * destruct weighted sampler
      
      * GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * pybind sample nodes api
      
      * pull nodes with step
      
      * fixed pull_graph_list bug; add test for pull_graph_list by step
      
      * add graph table;name
      
      * add graph table;name
      
      * add pybind
      
      * add pybind
      
      * add FeatureNode
      
      * add FeatureNode
      
      * add FeatureNode Serialize
      
      * add FeatureNode Serialize
      
      * get_feat_node
      
      * avoid local rpc
      
      * fix get_node_feat
      
      * fix get_node_feat
      
      * remove log
      
      * get_node_feat return  py:bytes
      
      * merge develop with graph_engine
      
      * fix threadpool.h head
      
      * fix
      
      * fix typo
      
      * resolve conflict
      
      * fix conflict
      
      * recover lost content
      
      * fix pybind of FeatureNode
      
      * recover cmake
      
      * recover tools
      
      * resolve conflict
      
      * resolve linking problem
      
      * code style
      
      * change test_server port
      
      * fix code problems
      
      * remove shard_num config
      
      * remove redundant threads
      
      * optimize start server
      
      * remove logs
      
      * fix code problems by reviewers' suggestions
      
      * move graph files into a folder
      
      * code style change
      
      * remove graph operations from base table
      
      * optimize get_feat function of graph engine
      
      * fix long long count problem
      
      * remove redundant graph files
      
      * remove unused shell
      
      * recover dropout_op_pass.h
      
      * fix potential stack overflow when request number is too large & node add & node clear & node remove
      
      * when sample k is larger than neighbor num, return directly
      
      * using random seed generator of paddle to speed up
      
      * fix bug of random sample k
      
      * fix code style
      
      * fix code style
      
      * add remove graph to fleet_py.cc
      
      * fix blocking_queue problem
      
      * fix style
      
      * fix
      
      * recover capacity check
      
      * add remove graph node; add set_feature
      
      * add remove graph node; add set_feature
      
      * add remove graph node; add set_feature
      
      * add remove graph node; add set_feature
      
      * fix distributed op combining problems
      
      * optimize
      
      * remove logs
      
      * fix MultiSlotDataGenerator error
      
      * cache for graph engine
      
      * fix type compare error
      
      * more test&fix thread terminating problem
      
      * remove header
      
      * change time interval of shrink
      
      * use cache when sample nodes
      
      * remove unused function
      
      * change unique_ptr to shared_ptr
      
      * simplify cache template
      
      * cache api on client
      
      * fix
      
      * reduce sample threads when cache is not used
      
      * reduce cache memory
      
      * cache optimization
      
      * remove test function
      
      * remove extra fetch function
      
      * graph-engine data transfer optimization
      
      * support graph_split load&query
      
      * remove logs
      
      * change shards to pointer vector
      
      * use inference
      
      * remove test code
      
      * renorm op
      
      * simplify renorm op
      
      * recover local changes
      
      * recover renorm op kernel
      
      * fix init
      
      * add blanklines in renorm doc
      
      * fix import
      
      * fix import
      Co-authored-by: Huang Zhengjie <270018958@qq.com>
      Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
      Co-authored-by: suweiyue <suweiyue@baidu.com>
      Co-authored-by: luobin06 <luobin06@baidu.com>
      Co-authored-by: liweibin02 <liweibin02@baidu.com>
      Co-authored-by: tangwei12 <tangwei12@baidu.com>
    • add gradient unittest and update code example for max/min (#38393) · ee69f437
      Tao Luo authored
      * add gradient unittest and update code example for max/min
      
      * update docs
      
      * remove _get_reduce_all_value
    • [AMP] Add multi_precision for sgd (#38231) · a4d07bb9
      zhangbo9674 authored
    • [pten] combine reduce_cuda codes (#38328) · 08941eda
      chentianyu03 authored
      * combine reduce_cuda codes
      
      * support float16 in pten reduce_mean
      
      * replace ReduceCudaKernel impl with pten reduce impl
      
      * mv reduce funcs into reduce_cuda_impl
      
      * rm unused codes and headers
      
      * mv GetReduceDim into reduce_cuda_impl
      
      * recover GetReduceDim in reduce_op.h
      
      * add new dispatch macro
      
      * fix pool op output not inited and cause transform to pten::denseTensor error
      
      * fix output tensor not initialized error
      
      * rename new dispatch macro and format code style
      
      * rm reduce_functor_op.h file
    • set env for test_standalone_executor (#38430) · 5ab6ebaf
      Leo Chen authored
    • [Auto Parallel] partitioner refactor (#37853) · c4fdb057
      JZ-LIANG authored
    • new API inner&outer (#37706) · b463dff4
      zhiboniu authored
    • [Unify Tensors PR #1] Replaced pten::Allocation with shared_ptr<memory::Allocation> for Storage (#38301) · 42cf2bee
      Zhanlue Yang authored
      
      * Added shared_ptr<Allocation> member & corresponding interfaces to Storage
      
      * Removed original pten::Allocation from Storage and adjusted the interfaces accordingly
      
      * Fixed issues with storage offset
      
      * Used place to malloc allocation for TensorStorage
    • [heterps] move pre-init id logic from common_sparse_table to sparse_geo_table (#38173) · 52329f6f
      zmxdream authored
      * remove pre-init id in common_sparse_table.cc
    • add new API/OP: paddle.Tensor.exponential_ (#38256) · 33185000
      zhouweiwei2014 authored
      * add new API/OP:paddle.Tensor.exponential_
      
      * fix CI
    • [MLU] add mlu op interface (#38241) · c396ee65
      努力努力在努力丶 authored
      * [MLU]add mlu op interface
      
      * [MLU]fix alpha of activation op
    • add pull gpups sparse op (#37124) · 572b3e90
      yaoxuefeng authored
      add pull gpups sparse op
    • fix share buffer to (#38407) · 9409ff6b
      Baibaifan authored
    • 4b3d5195
    • add register general kernel macro (#38409) · fc0a50aa
      Chen Weihang authored
    • Add new API cholesky_solve (#38167) · 39f7c41f
      zhiboniu authored