1. 05 Mar, 2022 1 commit
    • W
      Ps optimizer multi programs (#39883) · bcaf88d2
      wangguanqun committed
      * fix benchmark and communicator config
      
      * fix bugs of the_one_ps
      
      * multi program and fix bug in optimizer
      
      * multi program in the_one_ps
      
      * public commcontext
      
      * ps optimizer multi programs
      
      * the one ps merge
      
      * fix bug in test
      bcaf88d2
  2. 04 Mar, 2022 8 commits
    • H
      Move yolo box to phi (#40112) · faece382
      hong committed
      * add yolo box kernel; test=develop
      
      * fix compile error; test=develop
      faece382
    • H
      Enable eager model test (#40154) · 880dec0f
      hong committed
      * enable eager model; test=develop
      
      * set bs = 5; test=develop
      880dec0f
    • H
      Add digamma abs trunc yaml (#40024) · 0bfba16b
      hong committed
      * add digamma, abs, trunc; test=develop
      
      * fix bug and add diagonal; test=develop
      
      * add name converter; test=develop
      
      * update tracer.py; test=develop
      
      * add test case; test=develop
      
      * fix bugs; test=develop
      0bfba16b
    • C
      [paddle-inference]support setting fully connected in multi-head attention... · 8dbfc2ae
      ceci3 committed
      [paddle-inference]support setting fully connected in multi-head attention static shape branch to int8  (#39660)
      
      * fix inference int
      
      * update
      
      * add unittest
      8dbfc2ae
    • L
      add communication api for ProcessGroupGloo (#40100) · 5435459a
      lilong12 committed
      * add pg_gloo apis
      5435459a
    • J
      73a4fe6c
    • H
      add eager test in rnn and fc; test=develop (#40149) · c47ae621
      hong committed
      c47ae621
    • H
      Move conv to pten (#39354) · d50fb43e
      hong committed
      * move conv to pten
      
      * move conv to pten; test=develop
      
      * fix bug;
      
      * add conv cudnn impl; test=develop
      
      * update
      
      * update operator; test=develop
      
      * fix bug; test=develop
      
      * move operator and prepared_operator to develop; test=develop
      
      * resolve conflict; test=develop
      
      * remove useless code;test=develop
      
      * add dependency; test=develop
      
      * fix bug;
      
      * add sig.cc ; test=develop
      
      * fix use_op error; test=develop
      
      * fix bug; test=develop
      
      * fix bug; test=develop
      
      * add conv3d register; test=develop
      
      * fix star gan and conv_nn_grad test failed; test=develop
      
      * add header; test=develop
      
      * manually recover to develop;
      
      * resolve conflict; test=develop
      
      * remove useless code
      
      * fix bug;
      
      * remove conv2d_cudnn; test=develop
      
      * fix bugs; test=develop
      
      * fix cpu rocm compile bugs; test=develop
      
      * fix blas error; test=develop
      
      * fix compile bug; test=develop
      
      * fix windows compile error; test=develop
      
      * fix windows error; test=develop
      
      * resolve conflict; test=develop
      d50fb43e
  3. 03 Mar, 2022 7 commits
    • S
    • W
      EmbEltwiseLayernorm fix (#40015) · c3f3643b
      wenbin committed
      * emb fix
      
      * fix trt6 compile
      
      * fix half
      
      * absolute error fix
      c3f3643b
    • L
      add communication api for ProcessGroupNCCL (#40097) · b565b349
      lilong12 committed
      b565b349
    • B
      change_ASP_sharding_option (#40028) · 815f7a67
      Baibaifan committed
      815f7a67
    • J
      Support slim eager (#39874) · da47544c
      Jiabin Yang committed
      * eager, test=develop
      
      * fix bug, test=develop
      
      * eager, test=develop
      
      * merge legacy to fluid
      
      * eager, test=develop
      
      * eager, test=develop
      
      * Refactor TensorAdd func by template and remove gradient_accumulation in eager
      
      * Remove needless target name
      
      * eager, test=develop
      
      * eager, test=develop
      
      * Use overload instead of template
      
      * Remove legacy code
      
      * Remove legacy code
      
      * selectedrows, test=develop
      
      * Remove DataType test
      
      * eager, test=develop
      
      * eager, test=develop
      
      * support gan, test=develop
      
      * Using Tensor directly instead of using EagerTensor
      
      * support gradient_accumulation
      
      * make test_imperative_lod_tensor_to_selected_rows longer
      
      * make test_imperative_lod_tensor_to_selected_rows longer
      
      * refine code
      
      * ptb, test=develop
      
      * Rename all EagerTensor to Tensor
      
      * Rename some EagerTensor to Tensor
      
      * rename EagerTensor to EagerVariable
      
      * eager, test=develop
      
      * eager, test=develop
      
      * eager, test=develop
      
      * eager, test=develop
      
      * add more test
      
      * eager, test=develop
      
      * Support copiable selected rows and merge develop
      
      * save load, eager, test=develop
      
      * save load, eager, test=develop
      
      * refine, test=develop
      
      * remove useless _set_value method
      
      * refine, test=develop
      
      * refine, test=develop
      
      * revert static_runner, test=develop
      
      * EagerTensor to Tensor, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * clear grad, test=develop
      
      * merge, develop
      
      * merge, develop
      
      * merge, test=develop
      
      * merge, test=develop
      
      * Support quant and part of slice
      
      * support legacy static save
      
      * extend slim tests time
      
      * remove imperative on inference
      
      * remove imperative on inference
      
      * merge develop
      
      * fix typo
      
      * fix typo
      
      * split slice related code into 2 part for imperative and eager
      
      * split slice from inference
      
      * split slice from inference
      
      * fix test_tensor_register_hook
      Co-authored-by: Wang Huan <wanghuan29@baidu.com>
      Co-authored-by: Weilong Wu <veyron_wu@163.com>
      Co-authored-by: wanghuancoder <wanghuancoder@163.com>
      da47544c
    • H
      Move bn to pten (#39347) · ebd0f512
      hong committed
      * add bn cpu version; test=develop
      
      * move batch norm to pten
      
      * move batch norm to pten; test=develop
      
      * fix bug; test=develop
      
      * fix func::tranpose depend bug; test=develop
      
      * fix compile bugs; test=develop
      
      * fix use_op batch_norm bug; test=develop
      
      * fix cudnn bn add relu test; test=develop
      
      * fix pten context build and double grad bug; test= develop
      
      * remove useless code; test=develop
      
      * add batch norm gpu fp16 support; test=develop
      
      * fix test bn op bug; test=develop
      
      * remove output dtype set; test=develop
      
      * fix bug; test=develop
      
      * fix bug; test=develop
      
      * fix apply pass to program bug; test=develop
      
      * revert to develop; test=develop
      
      * fix rocm bug; test=develop
      
      * revert operator to develop; test=develop
      
      * fix pre_commit; test=develop
      
      * fix static check error; test=develop
      
      * resolve conflict; test=develop
      
      * ana batch norm bug;
      
      * revert batch norm op
      
      * resolve conflict
      
      * fix nan inf and speed bug; test=develop
      
      * fix bug; test=develop
      
      * fix error; test=develop
      
      * test expand op; test=develop
      
      * fix bug; test=develop
      
      * resolve conflict
      
      * resolve conflict; test=develop
      
      * polish code; test=develop
      
      * polish code; test=develop
      
      * change mutable data to ctx alloc; test=develop
      
      * make format same with ci; test=develop
      
      * fix format error with ci; test=develop
      ebd0f512
    • L
      Add the implementation of Gloo for ProcessGroup (#39892) · c16f85f9
      lilong12 committed
      * add pg_gloo
      c16f85f9
  4. 02 Mar, 2022 16 commits
    • F
      [MLU] add mlu ci script (#39805) · a8e02ef1
      fwenguang committed
      * [MLU] add mlu ci script
      
      * Update CMakeLists.txt
      a8e02ef1
    • L
      add check for backward hook (#40041) · 1980e33a
      Leo Chen committed
      * add check for backward hook
      
      * refine ut
      1980e33a
    • H
      Move transpose to pten (#39327) · 7a857924
      hong committed
      * immigrate_transpose_to_pten cpu kernel only; test=develop
      
      * fix bug; test=develop
      
      * add transpose cuda api
      
      * bug fix;
      
      * fix bugs
      
      * fix bugs; test=develop
      
      * bug fix;
      
      * move transpose to pten; test=develop
      
      * fix bug; test=develop
      
      * fix bugs; test=develop
      
      * add transpose grad fp16 support; test=develop
      
      * fix bug; test=develop
      
      * fix npu bug; test=develop
      
      * fix numel = 0 bug; test=develop
      
      * add fp16 support; test=develop
      
      * fix data type register bug; test=develop
      
      * fix transpose bug; test=develop
      
      * update transpose
      
      * fix transpose bug; test=develop
      
      * remove useless code; test=develop
      
      * remove useless code; test=develop
      
      * fix transpose alias bug; test=develop
      
      * polish code; test=develop
      
      * resolve conflict; test=develop
      
      * resolve conflict; test=develop
      
      * recover prepared operator; test=develop
      
      * fix bug; test=develop
      
      * polish code; test=develop
      
      * fix bug; test=develop
      
      * fix bug; test=develop
      7a857924
    • Z
      new fleet_desc builder (#39948) · 1c4e3e5d
      ziyoujiyi committed
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * refactor theoneps
      
      * the_one_ps
      
      * add ps pass unittest
      
      * add ps pass unittest
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * ps unittest ready
      
      * ps unittest ready
      
      * solve dist_pass init conflict
      
      * solve import CommContext error
      
      * unittest ok
      
      * implement AllocateFrom
      
      * solve setup.py.in conflict
      
      * solve conflict
      
      * solve conflict
      
      * solve conflict
      
      * .
      
      * .
      
      * cpu-async-ps minimize test ok & gpu minimize test ok
      
      * add heter 2stage unittest
      
      * add heter 2stage unittest
      
      * add heter 2stage unittest
      
      * sync/geo test ok & fix heter_worker program ok
      
      * .
      
      * new fleet desc generator
      
      * new fleet_desc builder
      
      * new fleet_desc builder
      
      * .
      
      * .
      
      * correct ps.proto compile
      
      * .
      Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>
      1c4e3e5d
    • Z
      [bf16] add bf16 kernel: softmax & log_softmax (#39999) · 4a4215ff
      zhangbo9674 committed
      * add softmax log_softmax
      
      * refine rocm
      
      * refine unittest
      4a4215ff
    • J
      [Auto Parallel] Adapt Partitioner & DistOp for ERNIE3.0 Inference and cache (#39895) · c9cd47d9
      JZ-LIANG committed
      * adapt dist op
      
      * add dist_fill_constant_batch_size_like
      
      * remove print
      
      * update compatible
      
      * add unittest
      c9cd47d9
    • A
      [IPU] update ipu unittests p0 (#39707) · 1db188f3
      Allen Guo committed
      * update ipu UTs part0
      
      * rename UT
      
      * sync api changes
      
      * update uts for new api
      
      * use_ipumodel() as classmethod
      1db188f3
    • J
      add logic kernel for mlu (#39940) · bc113e10
      joeqiao12 committed
      bc113e10
    • Y
      [fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for... · 244ae318
      Yuang Liu committed
      [fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for distributed inference (#39992)
      
      244ae318
    • Q
      [MLU] adapt matmul op (#39727) · b4d931e8
      qipengh committed
      * [MLU] adapt matmul op
      
      * [MLU] fix phi namespace
      b4d931e8
    • F
      [MLU] add transpose2 mlu kernel (#39994) · 4cab812e
      fwenguang committed
      4cab812e
    • B
      add_new_comm_primitive (#40040) · 4e00d2bb
      Baibaifan committed
      4e00d2bb
    • L
      fix unittests for eigvalsh (#39841) · aa47297a
      lkylkylky committed
      aa47297a
    • Z
      optimize CUDA implementation of randint OP (#39952) · fb635089
      zhouweiwei2014 committed
      * change CUDA implementation of randint OP, move distribution common func to phi
      
      * fix CI
      
      * fix CI
      fb635089
    • W
      [Eager] open eager when WITH_PYTHON (#39979) · 9af72957
      wanghuancoder committed
      * open eager when WITH_PYTHON, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * add DWITH_PYTHON for gen_fluid_lib, test=develop
      9af72957
    • W
      [Eager] Support gnn ptb_rnn in eager mode (#39993) · dbcf8797
      Weilong Wu committed
      dbcf8797
  5. 01 Mar, 2022 8 commits