1. 21 Jul, 2023 2 commits
  2. 20 Jul, 2023 10 commits
    • L
      polish some code (#55583) · f172b02f
      Leo Chen committed on
      f172b02f
    • S
      [Static graph performance optimization] Reuse graph dependency information (#55389) · ee65599e
      Sonder committed on
      * add share api for DependencyBuilder
      
      * add judge codes for sharing build results
      
      * add ShareBuildResultsFrom
      
      * update ShareDependencyFrom
      
      * fix error
      
      * add share codes
      
      * fix memory error
      
      * update according to review
      
      * update notes
      
      * fix code style
      
      * remove const_cast
      
      * fix code style
      ee65599e
    • H
      [NewIR]Change feed list to variable list && support GPU (#55401) · 75517841
      hong committed on
      * add feed with place op
      
      * remove useless unittest
      
      * update mkldnn
      
      * update
      
      * new ir support builtin slice op
      
      * fix phi kernel adaptor bug
      
      * add enable_static
      
      * remove useless test case
      
      * change feed list to single variable
      
      * support gpu
      
      * fix bug
      
      * remove template
      
      * add more data type
      
      * fix compile bug
      75517841
    • L
      Fix UT failure (#55360) · 7eeff7b1
      Leo Chen committed on
      * Fix TRT multihead matmul UT failure
      7eeff7b1
    • X
      Update gloo in dygraph (#55537) · 1d1e5484
      Xing-lil committed on
      * update broadcast gloo in dygraph
      
      * update
      
      * update reduce gloo in dygraph
      
      * update reduce gloo in dygraph
      
      * update
      
      * update allreduce allgather
      
      * update all
      
      * update
      
      * update
      
      * update
      1d1e5484
    • Z
      4df00939
    • M
      fix bug of constant folding pass (#55556) · bc61c796
      ming1753 committed on
      bc61c796
    • X
      [Kunlun] Modify some legacy code on distributed training (#55515) · 806f8d2b
      XiaociZhang committed on
      * [Kunlun] Modify some legacy code on distributed training
      
      There were limitations on XPUs before: for example, concat/split was
      not supported, and c_broadcast only supported fp32. These limitations
      were lifted recently.
      
      Multi-device profiling on XPU will also be supported by this PR.
      Without this PR, a hanging broadcast will be issued by devices that
      enable profiling, eventually leading to a kernel timeout error.
      
      * fix typo
      806f8d2b
    • J
      [Semi Auto] Entropy SPMD Rule (#55394) · 5f376f00
      JZ-LIANG committed on
      * base rule
      
      * add sharding merge
      
      * add sharding axis merge
      
      * define unified data class for inferencing dist_attr
      
      * test wrap DistTensorSpec in dygraph mode
      
      * matmul main logic done
      
      * shape int64
      
      * common cc
      
      * define unified data class for inferencing dist_attr
      
      * test wrap DistTensorSpec in dygraph mode
      
      * define python api and wrap function in static mode for DistTensorSpec
      
      * revise syntax
      
      * map bugfix
      
      * broadcast func
      
      * compile 1
      
      * add unittest
      
      * add registry
      
      * update unittest
      
      * bugfix
      
      * bugfix
      
      * add pybind
      
      * bugfix
      
      * bugfix macro global namespace
      
      * bugfix macro global namespace
      
      * pybind
      
      * pybind test
      
      * pybind bugfixed1
      
      * pybind bugfixed2
      
      * pybind unittest
      
      * merge dev
      
      * merge dev
      
      * merge dev
      
      * fixed cmake conflict
      
      * fixed cmake conflict
      
      * rename get method
      
      * revise inferforward output type
      
      * revise comment
      
      * replicated rule
      
      * replicated rule 2
      
      * revert bug deps
      
      * add rule
      
      * add unittest
      
      * add rule
      
      * add unittest
      
      * move ut of auto_parallel
      
      * fix ut
      
      * bugfix
      
      * bugfix
      
      * bugfix
      
      * bugfix
      
      * bugfix
      
      * bugfix
      
      * bugfix
      
      * resolve input sharding conflict maybe
      
      * fixed comment
      
      * add rule
      
      * add unittest
      
      * fixed typos
      
      ---------
      Co-authored-by: Yichen Zhang <zhangyichen03@baidu.com>
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      5f376f00
    • Z
      [IR] Add variable name prefix for BuildScope (#55536) · 44f409cf
      zhangbo9674 committed on
      * add interface
      
      * add code
      
      * add code
      
      * add code
      
      * add code
      
      * fix bug
      
      * fix bug
      
      * add var prefix
      44f409cf
  3. 19 Jul, 2023 9 commits
  4. 18 Jul, 2023 3 commits
  5. 17 Jul, 2023 7 commits
  6. 14 Jul, 2023 5 commits
  7. 13 Jul, 2023 4 commits