1. 03 Mar, 2022 · 19 commits
    • Support cuda graph in StreamSafeCudaAllocator (#39594) · 4c0511fa
      From00 committed
      * Support cuda graph in StreamSafeCudaAllocator
      
      * Fix CI error
      
      * Arrange AllocatorFacade
      
      * Fix CI error
      
      * Fix CI error
      
      * Fix ROCM Compile error
      
      * Fix ROCM Compile error
    • Z
    • [CustomRuntime] migrate CustomRuntime into phi (#39908) · b4665d23
      ronnywang committed
    • modify infershape of multiclass nms (#40059) · 756af9ff
      wangxinxin08 committed
      * modify infershape of multiclass nms
    • [Phi] Delete kernel registry of elementwise_sub op in Fluid (#40039) · cac00e0b
      YuanRisheng committed
      * delete elementwise_sub kernel registry
      
      * fix compile bugs in xpu ci
      
      * fix bugs when run inference ci
    • EmbEltwiseLayernorm fix (#40015) · c3f3643b
      wenbin committed
      * emb fix
      
      * fix trt6 compile
      
      * fix half
      
      * absolute error fix
    • Modified sigmoid by the elementwise interface. (#39898) · 5d9e11a4
      huangxu96 committed
      * Modified sigmoid by elementwise interface.
      
      * using TensorReduceImpl to replace Sum function
      
      * using reduceimpl to calculate the norm variable
      
      * Removed useless code
    • Add support of int16 for gather op. (#40052) · 3e56e816
      Li Min committed
      * add support of int16 for gather op.
      
      * Recover formats.
      
      * Recover formats.
      
      * fix.
      
      * Fix format.
      
      * Fix format.
    • [phi] transfer pad kernel into phi and pass the test_pad_op (#40012) · 9f74b84e
      xiongkun committed
      * add pad forward
      
      * fix error
      
      * transfer pad and pass the test_pad_op
    • add communication api for ProcessGroupNCCL (#40097) · b565b349
      lilong12 committed
    • 2ffa6436
    • Workqueue threadnames (#40035) · b8a16911
      liutiexing committed
      * add align for WorkQueue
      
      * add spinlock
      
      * merge develop
      
      * merge
      
      * Add EventsWaiter
      
      * Revert "Add EventsWaiter"
      
      This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.
      
      * Set thread name for WorkQueue
      
      * Add thread names
      
      * fix ut
      Co-authored-by: liutiexing <liutiexing@google.com>
    • move gather_tree infer shape (#40082) · 3779e807
      crystal committed
    • [Phi] move gaussian_random (#39932) · 00bbb8c5
      furnace committed
      [Phi] move gaussian_random kernel
    • bugfix in is_xpu_support_op (#40070) · 34d93bee
      zhangxiaoci committed
    • Support slim eager (#39874) · da47544c
      Jiabin Yang committed
      * eager, test=develop
      
      * fix bug, test=develop
      
      * eager, test=develop
      
      * merge legacy to fluid
      
      * eager, test=develop
      
      * eager, test=develop
      
      * Refactor TensorAdd func by template and remove gradient_accumulation in eager
      
      * Remove needless target name
      
      * eager, test=develop
      
      * eager, test=develop
      
      * Use overload instead of template
      
      * Remove legacy code
      
      * Remove legacy code
      
      * selectedrows, test=develop
      
      * Remove DataType test
      
      * eager, test=develop
      
      * eager, test=develop
      
      * support gan, test=develop
      
      * Using Tensor directly instead of using EagerTensor
      
      * support gradient_accumulation
      
      * make test_imperative_lod_tensor_to_selected_rows longer
      
      * make test_imperative_lod_tensor_to_selected_rows longer
      
      * refine code
      
      * ptb, test=develop
      
      * Rename all EagerTensor to Tensor
      
      * Rename some EagerTensor to Tensor
      
      * rename EagerTensor to EagerVariable
      
      * eager, test=develop
      
      * eager, test=develop
      
      * eager, test=develop
      
      * eager, test=develop
      
      * add more test
      
      * eager, test=develop
      
      * Support copiable selected rows and merge develop
      
      * save load, eager, test=develop
      
      * save load, eager, test=develop
      
      * refine, test=develop
      
      * remove useless _set_value method
      
      * refine, test=develop
      
      * refine, test=develop
      
      * revert static_runner, test=develop
      
      * EagerTensor to Tensor, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * clear grad, test=develop
      
      * merge, develop
      
      * merge, develop
      
      * merge, test=develop
      
      * merge, test=develop
      
      * Support quant and part of slice
      
      * support legacy static save
      
      * extend slim tests time
      
      * remove imperative on inference
      
      * remove imperative on inference
      
      * merge develop
      
      * fix typo
      
      * fix typo
      
      * split slice related code into 2 part for imperative and eager
      
      * split slice from inference
      
      * split slice from inference
      
      * fix test_tensor_register_hook
      Co-authored-by: Wang Huan <wanghuan29@baidu.com>
      Co-authored-by: Weilong Wu <veyron_wu@163.com>
      Co-authored-by: wanghuancoder <wanghuancoder@163.com>
    • Z
    • Move bn to pten (#39347) · ebd0f512
      hong committed
      * add bn cpu version; test=develop
      
      * move batch norm to pten
      
      * move batch norm to pten; test=develop
      
      * fix bug; test=develop
      
      * fix func::transpose depend bug; test=develop
      
      * fix compile bugs; test=develop
      
      * fix use_op batch_norm bug; test=develop
      
      * fix cudnn bn add relu test; test=develop
      
      * fix pten context build and double grad bug; test=develop
      
      * remove useless code; test=develop
      
      * add batch norm gpu fp16 support; test=develop
      
      * fix test bn op bug; test=develop
      
      * remove output dtype set; test=develop
      
      * fix bug; test=develop
      
      * fix bug; test=develop
      
      * fix apply pass to program bug; test=develop
      
      * revert to develop; test=develop
      
      * fix rocm bug; test=develop
      
      * revert operator to develop; test=develop
      
      * fix pre_commit; test=develop
      
      * fix static check error; test=develop
      
      * resolve conflict; test=develop
      
      * ana batch norm bug;
      
      * revert batch norm op
      
      * resolve conflict
      
      * fix nan inf and speed bug; test=develop
      
      * fix bug; test=develop
      
      * fix error; test=develop
      
      * test expand op; test=develop
      
      * fix bug; test=develop
      
      * resolve conflict
      
      * resolve conflict; test=develop
      
      * polish code; test=develop
      
      * polish code; test=develop
      
      * change mutable data to ctx alloc; test=develop
      
      * make format same with ci; test=develop
      
      * fix format error with ci; test=develop
    • Add the implementation of Gloo for ProcessGroup (#39892) · c16f85f9
      lilong12 committed
      * add pg_gloo
  2. 02 Mar, 2022 · 21 commits
    • Replacing dropout eval eigen usage by cuda kernel (#40053) · 272b32fd
      Li Min committed
      * Replacing dropout eval eigen usage by cuda kernel
    • [MLU] add mlu ci script (#39805) · a8e02ef1
      fwenguang committed
      * [MLU] add mlu ci script
      
      * Update CMakeLists.txt
    • Move sgd to phi (#40045) · f3d54e2e
      hong committed
      * move sgd to phi; test=develop
      
      * update
      
      * add sgd kernel; test=develop
    • modify infershape of yolo_box (#40056) · ebc6959c
      wangxinxin08 committed
      * modify infershape of yolo_box
    • add check for backward hook (#40041) · 1980e33a
      Leo Chen committed
      * add check for backward hook
      
      * refine ut
    • Move gather.h/gather.cu.h/scatter.h/scatter.cu.h to the phi library (#40043) · 09258040
      sneaxiy committed
      * move gather.h gather.cu.h scatter.h scatter.cu.h to phi library
      
      * fix CI
      
      * fix rocm ci
    • vec scale kernel (#40011) · 2e6548a9
      sneaxiy committed
    • [Phi] Move elementwise function to funcs directory (#39986) · 5898e9ab
      YuanRisheng committed
      * move elementwise function to funcs directory
      
      * fix compile bugs
      
      * modify according to comment
    • [XPU] Fix Phi Kernel cache problem in operator.cc (#40044) · 66196573
      Aurelius84 committed
      * [XPU] Fix Phi Kernel cache problem in operator.cc
      
      * fix typo
    • Move transpose to pten (#39327) · 7a857924
      hong committed
      * immigrate_transpose_to_pten cpu kernel only; test=develop
      
      * fix bug; test=develop
      
      * add transpose cuda api
      
      * bug fix;
      
      * fix bugs
      
      * fix bugs; test=develop
      
      * bug fix;
      
      * move transpose to pten; test=develop
      
      * fix bug; test=develop
      
      * fix bugs; test=develop
      
      * add transpose grad fp16 support; test=develop
      
      * fix bug; test=develop
      
      * fix npu bug; test=develop
      
      * fix numel = 0 bug; test=develop
      
      * add fp16 support; test=develop
      
      * fix data type register bug; test=develop
      
      * fix transpose bug; test=develop
      
      * update transpose
      
      * fix transpose bug; test=develop
      
      * remove useless code; test=develop
      
      * remove useless code; test=develop
      
      * fix transpose alias bug; test=develop
      
      * polish code; test=develop
      
      * resolve conflict; test=develop
      
      * resolve conflict; test=develop
      
      * recover prepared operator; test=develop
      
      * fix bug; test=develop
      
      * polish code; test=develop
      
      * fix bug; test=develop
      
      * fix bug; test=develop
    • Move BroadcastTensors OP to phi (#40047) · 2a5590a1
      From00 committed
      * Move BroadcastTensors OP to phi
      
      * Remove mutable_data in impl
      
      * Move BilinearTensorProductInferMeta to multiary.h/cc
    • new fleet_desc builder (#39948) · 1c4e3e5d
      ziyoujiyi committed
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * refactor theoneps
      
      * the_one_ps
      
      * add ps pass unittest
      
      * add ps pass unittest
      
      * ps unitest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * ps unittest ready
      
      * ps unittest ready
      
      * solve dist_pass init conflict
      
      * solve import CommContext error
      
      * unittest ok
      
      * implement AllocateFrom
      
      * solve setup.py.in conflict
      
      * solve conflict
      
      * solve conflict
      
      * solve conflict
      
      * .
      
      * .
      
      * cpu-async-ps minimize test ok & gpu minimize test ok
      
      * add heter 2stage unittest
      
      * add heter 2stage unittest
      
      * add heter 2stage unittest
      
      * sync/geo test ok & fix heter_worker program ok
      
      * .
      
      * new fleet desc generator
      
      * new fleet_desc builder
      
      * new fleet_desc builder
      
      * .
      
      * .
      
      * correct ps.proto compile
      
      * .
      Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>
    • [Infrt]add phi kernel dialect (#39726) · 07dad6d6
      huzhiqiang committed
    • [bf16] add bf16 kernel: softmax & log_softmax (#39999) · 4a4215ff
      zhangbo9674 committed
      * add softmax log_softmax
      
      * refine rocm
      
      * refine unittest
    • [phi] migrate gather_tree, reduce_prod to phi (#39844) · 6af2729e
      crystal committed
      * move to phi
      
      * migrate gather_tree_op into phi
      
      * move reduce_prod to phi
      
      * optimize code
    • Upgrade new profiler (#39984) · 0c3f7fbc
      chenjian committed
      * add new profiler components
      
      * fix bug
      
      * upgrade new profiler
      
      * fix operator.cc
      
      * fix operator.cc
      
      * fix cmakelists.txt
      
      * fix bug
      
      * fix according to pr
      
      * fix bug
      
      * fix cmake
      
      * fix bug
      
      * fix a bug
      
      * fix bug
      
      * fix bug
    • add logic kernel for mlu (#39940) · bc113e10
      joeqiao12 committed
    • [fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for distributed inference (#39992) · 244ae318
      Yuang Liu committed
    • 90ab7403
    • [Phi] Unify complex type trait and fix real imag bug (#40036) · 0764fda2
      Chen Weihang committed
      * unify complex type trait and fix real imag bug
      
      * add unittest for type traits
    • [MLU] adapt matmul op (#39727) · b4d931e8
      qipengh committed
      * [MLU] adapt matmul op
      
      * [MLU] fix phi namespace