1. 15 Aug, 2022 4 commits
    • Y
      [Auto Parallel] Move the distributed info from python to c++ (#44510) · a52357fe
      Yulong Ao committed
      * [Auto Parallel] Move the distributed info from python to c++
      
      * [Auto Parallel] Add dist_attrs for VarDesc and OpDesc
      
      * [Auto Parallel] Add the lost file
      
      * [Auto Parallel] Make the dist attr be unique_ptr
      
      * [Auto Parallel] Add the proto conversion
      
      * [Auto Parallel] Improve the proto support
      
      * [Auto Parallel] Fix the bugs for adding a device or a link
      
      * [Auto Parallel] Add the C++ ProcessMesh and DistributedMapper
      
      * [Auto Parallel] Improve the impl of these dist attrs
      
      * [Auto Parallel] Pybind11 ProcessMesh and DeviceMesh
      
      * [Auto Parallel] Fix the unittest problem
      
      * [Auto Parallel] Explicitly add the src file for auto_parallel target
      
      * [Auto Parallel] Add the proto dependency explicitly
      
      * [Auto Parallel] Fix the cmake bug on windows and mac
      
      * [Auto Parallel] Remove the pybind11 header file in process_mesh.h
      
      * [Auto Parallel] Remove unused codes
      
      * [Auto Parallel] Check whether the dist attr is null
      
      * [Auto Parallel] Implement the assign operator for OpDesc explicitly
    • H
      [XPU] add some collective ops. (#45049) · 7e2a20d5
      houj04 committed
      * [XPU] add some collective ops. test=kunlun
      
      * use XPUOpTestWrapper. test=kunlun
      
      * skip kl1 for collective ops. fix typo: deivce -> device. test=kunlun
    • R
      Update FLAGS for standalone executor (#45127) · 566bbf0c
      Ruibiao Chen committed
      * Update FLAGS for standalone executor
      
      * Update FLAGS_FORCE_USE_PROGRAM_CACHE
    • W
      convert_fp16 support multi block (#45050) · 9aecf286
      Wilber committed
      * convert_fp16 support multi block
      
      * update
      
      * update
  2. 14 Aug, 2022 1 commit
  3. 13 Aug, 2022 2 commits
    • L
      Refine program cache (#45005) · e96dae8b
      Leo Chen committed
      * add cached_serialize_str_
      
      * support program hash
      
      * add sha
      
      * add ut
      
      * use hash_str only for new_exe
      
      * fix attr order
    • Z
      fl-ps: support split sparse params in local & remote (#44864) · 3f5c405f
      ziyoujiyi committed
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud comm ready
      
      * .
      
      * .
      
      * fl-ps v1.0
      
      * .
      
      * support N + N mode
      
      * .
      
      * .
      
      * .
      
      * .
      
      * delete print
      
      * .
      
      * .
      
      * .
      
      * .
      
      * fix bug
      
      * .
      
      * .
      
      * fl-ps with coordinator ready
      
      * merge dev
      
      * update message parse only
      
      * update fl client scheduler
      
      * fix bug
      
      * update multithreads sync
      
      * fix ci errors
      
      * update role_maker.py
      
      * update role_maker.py
      
      * fix ci error: windows py import error
      
      * fix ci error: windows py import error
      
      * fix windows ci pylib import error
      
      * add dump fields & params
      
      * try to fix windows import fleet error
      
      * fix ps FLAGS error
      
      * fix logging risk
      
      * fix logging possible risk
      
      * write trainer_desc file
      
      * support split sparse params in local & remote
      
      * fix import paddle.fluid.core.PSGPU
      
      * fix import paddle.fluid.core.PSGPU
      
      * add remote_sparse & local_sparse config
      
      * fix unittest
      
      * fix test_dist_fleet_geo table error
      
      * fix PADDLE_ENFORCE error
      
      * fix other's pr conflict
  4. 12 Aug, 2022 18 commits
    • L
      fix nccl comm in sync_bn (#45100) · 1e965756
      LiYuRio committed
    • S
      Offload calculations from matmul op to fuse pass (#44941) · acb78ea2
      Sławomir Siwek committed
      * remove v2_transpose_reshape
      
      * matmul_transpose_reshape
      
      * reshape_transpose_matmul
      
      * Add int8 support for matmulV2
      
      * restore ut
      
      * adjust old ut
      
      * restore parallel UT rules
      
      * remove mkldnn code from base ops
      
      * move enforces to pass
      
      * remove duplicated functions
      
      * delete duplicated enforces
      
      * feedback from review
      
      * add comments to variables
      
      * enable eltwise support
      
      * dynamic attribute
      
      * remove fusepass tests from op test
      
      * remove fuse pass cases from op test
      
      * revert introduction of dynamic attributes
      
      * style
      Co-authored-by: Nwozna <joanna.wozna@intel.com>
    • H
      [phi] Transfer linear_interp_v2 yaml to phi (#45072) · c737232f
      HongyuJia committed
      * support optional<vector<Tensor>> in yaml and eager
      
      * delete useless comments in eager_gen.py
      
      * fix api_base.py support optional<vector<TTensor>>
      
      * python_c_gen.py support optional<vector<tensor>>
      
      * transfer linear_interp_v2 yaml from fluid to phi
      
      * fix op_test typo error
      
      * change linear_interp_v2 testcase
      
      * fix args in final_state_linear_interp_v2
      
      * fix zeropad2d typo. test=document_fix
    • C
      [Auto Parallel] Update reshard for auto search (#45002) · 8624f3b1
      caozhou committed
      * update reshard for auto search
      
      * fix unittest bug
      
      * update dist tensor
      
      * update reshard output
      
      * fix unittests bug
      
      * merge develop
    • C
      Add Quant Row&Column ParallelLinear (#44869) · 236ad4fc
      Chang Xu committed
    • A
      fix compilation (#45087) · 4eec94dd
      Allen Guo committed
    • A
      Fix concat and tile attribute for 2ONNX (#44658) · bb8203cd
      Aurelius84 committed
      * Fix concat and tile attribute for ONNX
      
      * disable unittest
    • J
      [Auto Parallel] Data Parallel Optimization Pass 1 (#44882) · 7aeec4ed
      JZ-LIANG committed
      * bugfix
      
      * remove scaling
      
      * support rescale_grad opt
    • J
      [Eager] Support more final_state code (#44986) · cf17ae8a
      Jiabin Yang committed
      * support more final_state code
      
      * support more final_state code
      
      * fix api error
      
      * fix norm error
      
      * fix pool3d error
      
      * revert pool3d and max_pool_3d_adaptive
      
      * fix code check error
      
      * fix norm problem
    • K
      transfer memcpy_h2d from fluid to phi (#44932) · 7bc57d35
      kangguangli committed
      * transfer memcpy_h2d from fluid to phi
      
      * use UnchangedInferMeta instead
      
      * restore test_standalone_executor
      
      * add newline to fix codestyle check
      
      * rename pt -> phi
      
      * simplify logic and add check
      
      * make the comment more clear
      
      * remove useless comment
      
      * refine code
    • Y
      trt engine input data type should be consistent with trt input bindin… (#45103) · a3eb341e
      Yuanle Liu committed
      * trt engine input data type should be consistent with trt input bindings type
      
      * fix some bugs
      
      * fix some bugs
      
      * fix some bugs
    • H
      change default log level (#45093) · 34234282
      hong committed
    • Z
      Remove some custom_impl api (#45066) · adb61b7b
      zyfncg committed
      * remove some custom_impl api and make them generated by yaml completely
      
      * delete useless code
      
      * fix adamw bug
      
      * fix infermeta
      
      * revert adamw
      
      * polish code
      
      * fix bug
    • Z
      refix index resize in multiclassnms3 (#45095) · 49e2a4d8
      zhiboniu committed
    • Y
      [Auto Parallel] Pybind ProcessMesh and DeviceMesh (#45013) · 5bf3dec9
      Yulong Ao committed
      * [Auto Parallel] Pybind11 ProcessMesh and DeviceMesh
      
      * [Auto Parallel] Fix the unittest problem
      
      * [Auto Parallel] Explicitly add the src file for auto_parallel target
      
      * [Auto Parallel] Add the proto dependency explicitly
      
      * [Auto Parallel] Fix the cmake bug on windows and mac
      
      * [Auto Parallel] Remove the pybind11 header file in process_mesh.h
    • D
      enhance grid_sampler to support 3d input (#45015) · 1773fbba
      duanyanhui committed
      * enhance grid_sampler to support 3d input
    • Z
      fix extra output of kernels for inference (#45048) · 1cb883da
      zyfncg committed
    • S
      [geometric]Add paddle.geometric.send_ue_recv API (#43174) · 615b15a3
      Siming Dai committed
      * add init file
      
      * add op definition and infermeta
      
      * add kernel definition funcs
      
      * add broadcast infer shape
      
      * add gpu forward kernel
      
      * delete SUB and DIV
      
      * add x_grad
      
      * add template
      
      * add e_grad for min and max
      
      * fix small bug
      
      * temp commit
      
      * temp commit
      
      * add e_grad for sum and mean
      
      * fix some compile bug
      
      * fix compile bugs
      
      * fix compile problem
      
      * add sum forward unittest
      
      * fix broadcast error, add kernel sig, register e_grad, change unit test
      
      * fix grad
      
      * add temp grad fix
      
      * temp commit
      
      * add min max unittest
      
      * add max, min unittest, fix mul bug
      
      * add cpu forward sum and mean
      
      * add forward min max, fix mean unittest
      
      * add cpu backward min max
      
      * fix code-style
      
      * add backward sum mean
      
      * fix rocm ci
      
      * set unittest timeout
      
      * fix bug of x broadcast to e, gpu grad
      
      * fix bug of x broadcast to e, cpu grad
      
      * rename BOOST_GET_CONST macro
      
      * fix rocm ci
      
      * mv graph_send_e_recv to graph_send_ue_recv
      
      * move out_size to IntArray
      
      * add eager op test
      
      * fix max pool type bug, add unittest for api
      
      * revise api doc
      
      * add fp16 for atomic min and max, add unittest
      
      * add unittest
      
      * add fp16 support for graph_send_recv
      
      * fix unittest fp16 bug
      
      * change OutSizeTensor to Out_size
      
      * move E to Y
      
      * add copyright, fix comment
      
      * review code
      
      * fix thread block size
      
      * fix thread block size
      
      * change api attribute name: pool_type to reduce_op, compute_type to message_op
      
      * change api attribute name, move pool_type to reduce_op, move compute_type to message_op
  5. 11 Aug, 2022 12 commits
  6. 10 Aug, 2022 3 commits