1. 18 8月, 2022 1 次提交
  2. 17 8月, 2022 2 次提交
  3. 16 8月, 2022 4 次提交
    • C
      [Phi] Move amp ops into phi (#45079) · b4f67757
      Chen Weihang 提交于
      * move check finite and unscale kernel into phi
      
      * move infershape into phi
      
      * move update_loss_scaling kernel into phi
      
      * remove original kernels
      
      * move update loss scaling infershape into phi
      
      * add header for xpu and npu
      
      * solve coverage failed
      
      * fix npu test failed
      
      * remove mutable data in cu file
      
      * fix new executor failed
      
      * add valid check for meta tensor output
      b4f67757
    • F
      convert multihead to oss (#45019) · f706d95d
      feng_shuai 提交于
      * convert multihead to oss
      
      * fix:bug
      
      * fix:delete const cast
      
      * fix:don't support bias_qk
      
      * add vit pass
      
      * fix:convert bug and add preln_residual_bias
      
      * support length=-1
      
      * add UT for convert
      
      * add no_bias_qk support for gpu_multihead_op
      
      * delete infer_shape depends on bias_qk
      
      * oss just can be used in T4 and A*
      
      * fix:change api for ROCM CI
      f706d95d
    • W
      fix new quant (#45155) · 2fb65e44
      Wangzheee 提交于
      2fb65e44
    • F
  4. 15 8月, 2022 2 次提交
    • Y
    • Y
      [Auto Parallel] Move the distributed info from python to c++ (#44510) · a52357fe
      Yulong Ao 提交于
      * [Auto Parallel] Move the distributed info from python to c++
      
      * [Auto Parallel] Add dist_attrs for VarDesc and OpDesc
      
      * [Auto Parallel] Add the lost file
      
      * [Auto Parallel] Make the dist attr be unique_ptr
      
      * [Auto Parallel] Add the proto conversion
      
      * [Auto Parallel] Improve the proto support
      
      * [Auto Parallel] Fix the bugs for adding a device or a link
      
      * [Auto Parallel] Add the C++ ProcessMesh and DistributedMapper
      
      * [Auto Parallel] Improve the impl of these dist attrs
      
      * [Auto Parallel] Pybind11 ProcessMesh and DeviceMesh
      
      * [Auto Parallel] Fix the unittest problem
      
      * [Auto Parallel] Explicitly add the src file for auto_parallel target
      
      * [Auto Parallel] Add the proto depedency explicitly
      
      * [Auto Parallel] Fix the cmake bug on windows and mac
      
      * [Auto Parallel] Remove the pybind11 header file in process_mesh.h
      
      * [Auto Parallel] Remove unused codes
      
      * [Auto Parallel] Check whether the dist attr is null
      
      * [Auto Parallel] Implement the assign operator for OpDesc explicitly
      a52357fe
  5. 14 8月, 2022 1 次提交
  6. 13 8月, 2022 2 次提交
    • L
      Refine program cache (#45005) · e96dae8b
      Leo Chen 提交于
      * add cached_serialize_str_
      
      * support program hash
      
      * add sha
      
      * add ut
      
      * use hash_str only for new_exe
      
      * fix attr order
      e96dae8b
    • Z
      fl-ps: support split sparse params in local & remote (#44864) · 3f5c405f
      ziyoujiyi 提交于
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud commm ready
      
      * .
      
      * .
      
      * fl-ps v1.0
      
      * .
      
      * support N + N mode
      
      * .
      
      * .
      
      * .
      
      * .
      
      * delete print
      
      * .
      
      * .
      
      * .
      
      * .
      
      * fix bug
      
      * .
      
      * .
      
      * fl-ps with coordinator ready
      
      * merge dev
      
      * update message parse only
      
      * update fl client scheduler
      
      * fix bug
      
      * update multithreads sync
      
      * fix ci errors
      
      * update role_maker.py
      
      * update role_maker.py
      
      * fix ci error: windows py import error
      
      * fix ci error: windows py import error
      
      * fix windows ci pylib import error
      
      * add dump fields & params
      
      * try to fix windows import fleet error
      
      * fix ps FLAGS error
      
      * fix logging risk
      
      * fix logging possible risk
      
      * write trainer_desc file
      
      * support split sparse params in local & remote
      
      * fix import paddle.fluid.core.PSGPU
      
      * fix import paddle.fluid.core.PSGPU
      
      * add remote_sparse & local_sparse config
      
      * fix unittest
      
      * fix test_dist_fleet_geo table error
      
      * fix PADDLE_ENFORCE error
      
      * fix other's pr conflict
      3f5c405f
  7. 12 8月, 2022 2 次提交
    • S
      Offload calculations from matmul op to fuse pass (#44941) · acb78ea2
      Sławomir Siwek 提交于
      * remove v2_transpose_reshape
      
      * matmul_transpose_reshape
      
      * reshape_transpose_matmul
      
      * Add int8 support for matmulV2
      
      * restore ut
      
      * adjust old ut
      
      * restore parallel UT ruels
      
      * remove mkldnn code from base ops
      
      * move enforces to pass
      
      * remove duplicated functions
      
      * delete duplicated enforces
      
      * feedback from review
      
      * add comments to variables
      
      * enable eltwise support
      
      * dynamic attribute
      
      * remove fusepass tests from op test
      
      * remove fuse pass cases from op test
      
      * revert introduction of dynamic attributes
      
      * style
      Co-authored-by: Nwozna <joanna.wozna@intel.com>
      acb78ea2
    • K
      transfer memcpy_h2d from fluid to phi (#44932) · 7bc57d35
      kangguangli 提交于
      * transfer memcpy_h2d from fluid to phi
      
      * use UnchangedInferMeta instead
      
      * restore test_standalone_executor
      
      * add newline to fix codestyle check
      
      * rename pt -> phi
      
      * simplify logic and add check
      
      * make the comment more clear
      
      * remove useless comment
      
      * refine code
      7bc57d35
  8. 11 8月, 2022 1 次提交
  9. 10 8月, 2022 4 次提交
  10. 09 8月, 2022 2 次提交
  11. 08 8月, 2022 1 次提交
  12. 05 8月, 2022 4 次提交
  13. 04 8月, 2022 2 次提交
    • S
      Matmuls with activation and elementwise_add fuses (#44655) · 0420d514
      Sławomir Siwek 提交于
      * Add unit tests
      
      * matmul_v2 + activation
      
      * matmuls + elementwise_add
      
      * matmul_v2 postops
      
      * transform matmul to v2
      
      * opcompat
      
      * fix fusing matmul with multipe outs
      
      * add shape constraints
      
      * remove unused vars
      
      * change pass order
      
      * - Unit tests to be debugged
      
      - fix
      
      - refactor
      
      - diagnostic
      
      - more diagnostic
      
      - fix
      
      - Fix number two
      
      - fix
      
      - fix
      
      - fix
      
      - alpha added
      
      - more fixes
      
      - compilation fix
      
      - removed diagnostic code
      
      - cosmetic fixes
      
      * lint
      
      * add alpha constraint
      
      * merge matmul refactor
      
      * trigger CI
      
      * - fix
      
      * - another fix
      
      * code style
      
      * add support for matmul+elementwise_add+activation
      
      * code style
      
      * fix bfloat16 bugs
      
      * change append_binary to append_sum
      Co-authored-by: NJacek Czaja <jacek.czaja@intel.com>
      0420d514
    • 0e26361c
  14. 03 8月, 2022 2 次提交
  15. 02 8月, 2022 6 次提交
  16. 01 8月, 2022 3 次提交
  17. 29 7月, 2022 1 次提交