1. 07 Aug, 2023: 1 commit
    • [WIP] Integration flash attention 2 (#55758) · 0473369f
      umiswing committed
      * Work for fa-2 padded fwd. Code to be cleaned.
      
      * Work for fa2 unpadded fwd.
      
      * Work for padded-bwd, dk get small diff on np.random.seed(0)
      
      * Anyway I pass paddle's utest, except return softmax without dropout.
      
      * Clean code.
      
      * Modify interface.
      
      * Clean code and add some check.
      
      * Easy compile for dev.
      
      * Fix ci.
      
      * Fix ci-build.
      
      * Add std c++17 option again.
      
      * Limit max job when compiling fa2.
      
      * Remove const_cast
      
      * Add fwd params, to be cleaned.
      
      * Clean code.
      
      * Add bwd params.
      
      * Clean code.
      
      * Add enforce.
      
      * Use v2.0.4
      
      * Pass RNG state to fa2 capi
      
      * Fix review.
      
      * Add assert
      
      * Skip compile for sm less than 80.
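FlashAttention-style kernels compute exact attention without materializing the full score matrix. They stream over key/value blocks and keep a running maximum and a running softmax normalizer, rescaling both as each new block arrives. The pure-Python sketch below illustrates only that online-softmax idea for a single query vector. It is not the FA2 CUDA kernel or the C API integrated in #55758, and the names `naive_attention`, `tiled_attention`, and the `block_size` parameter are hypothetical.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q . k / sqrt(d)) over all keys, then a weighted sum of values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def tiled_attention(q, keys, values, block_size=2):
    """Same result, visiting keys/values one block at a time (online softmax)."""
    d = len(q)
    m = -math.inf                  # running maximum score seen so far
    denom = 0.0                    # running softmax denominator, relative to m
    acc = [0.0] * len(values[0])   # running un-normalized output, relative to m
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in k_blk]
        m_new = max(m, max(scores))
        # Rescale partial results computed under the old max to the new max.
        scale = math.exp(m - m_new)
        denom *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            p = math.exp(s - m_new)
            denom += p
            acc = [a + p * vj for a, vj in zip(acc, v)]
        m = m_new
    # Normalize once at the end.
    return [a / denom for a in acc]
```

Any `block_size` should match the naive result up to floating-point rounding. The real kernels get their speed from keeping each block in on-chip memory, which this sketch does not model.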
  2. 05 Aug, 2023: 1 commit
  3. 04 Aug, 2023: 11 commits
    • [NewIR] Rename feed with place to data (#55778) · 274e5e54
      kangguangli committed
      * fix bug: feed_with_place should consider variable existence
      
      * fix
      
      * fix build scope
      
      * change method to set feed var name
      
      * remove feed_with_place to placeholder
      
      * fix
      
      * rename to data
      
      * fix
      
      * fix
    • [Semi AutoParall] Support Partial Semantic I (#55508) · e3b6e02f
      JZ-LIANG committed
    • [NewIR]New ir aot placement refactor (#55810) · dd1379e4
      hong committed
      * refactor aot
      
      * update
      
      * fix bugs
      
      * remove some test
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * update
    • [CINN] Dump more compilation result and optimize parallel compiler flags (#55935) · 39b59603
      Fisher committed
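      1. `Parallel Compiler`:
          - Merged `FLAGS_cinn_parallel_compile_size` and `FLAGS_cinn_parallel_compile_thread`. The number of threads used for compilation is now set by `FLAGS_cinn_parallel_compile_thread` alone, and all `fusion_groups` are distributed evenly across the available threads.
          - Enriched the information returned after compilation: in addition to `instruction`, `lowered_function`, `source_code`, and `source_ptx` are now returned for further use by upper layers.
      2. Debug information:
          - Added `FLAGS_cinn_dump_group_lowered_func`, `FLAGS_cinn_dump_group_source_code`, `FLAGS_cinn_dump_group_ptx`, and `FLAGS_cinn_dump_group_instruction`, which save the intermediate code of the corresponding compilation stage, grouped by `fusion_groups`.
          - Reorganized `graph_visualization` so that all visualization graphs and unit-test code are stored correctly by group.
      3. Bug fixes:
          - Fixed `MakeDirectory` failing to create directories correctly.
      4. Other:
          - Removed some unused code.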
    • [clang-tidy] enable modernize-use-emplace (#55799) · 469a0392
      Ruibin Cheung committed
      * [clang-tidy] enable modernize-use-emplace
      
      * Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into modernize_use_emplace
    • 1e4f627d
    • [NewIR] add decorator for dy2st test with new ir (#55840) · b67715a4
      kangguangli committed
      * add decorator for new_ir_test
      
      * fix bug and only test in ci-coverage
      
      * fix bug and only test in ci-coverage
      
      * fix
      
      * fix bugs
      
      * fix
      
      * fix
    • Support Combined indexing for __getitem__ and __setitem__ (#55211) · 697c712f
      JYChen committed
      * WIP: start writing combined indexing get
      
      * list/tuple/Variable
      
      * getitem 80%
      
      * add setitem
      
      * add some unittest for setitem
      
      * lazy import
      
      * fix some setitem error
      
      * fix advance indexing with decreasing axes; fix strided_slice input name
      
      * combine int-tensor getitem is ok (without boolean support & broadcast); add getitem unittest for static
      
      * add broadcast & parse bool tensor for __getitem
      
      * [change getitem] _getitem_impl_ to _getitem_static, not deleting the former one
      
      * refine new getitem; fix ut in variable/var_base
      
      * add __getitem__ ut in dygraph
      
      * re-dispatch getitem for Py/CPP; fix strided_slice decrease axes error in dygraph
      
      * fix ut; support tensor in slice
      
      * [change setitem] _setitem_impl_ to _setitem_static, not deleting the former one
      
      * remove some UT (for some, temporarily)
      
      * add IndexError to solve timeout problem in static-mode
      
      * 1. temporarily forbid all-False bool-indexput; 2. setitem_static will return a new variable
      
      * xpu uses old strategy
      
      * rename dy2st setitem ut to avoid same-name problem
      
      * dy2st for new combined index
      
      * ut case for combine-index with dy2st
      
      * open ut with all-false-bool setitem
      
      * remove useless doc and _getitem_impl_
      
      * change static res
      
      * fix static xpu
    • Fix a bug in VecAutomaticAddPerBlock (#55929) · 81511469
      niuliling123 committed
    • [IR] Reshape2 and Flatten_contiguous_range Support Inplace (#55809) · dd0681e3
      chen committed
      * inplace pass support reshape2 and flatten_contiguous_range
      
      * recover the modification to inplace_op_var_pass.cc
    • 97ab6aa6
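The CINN commit (#55935) states that `FLAGS_cinn_parallel_compile_thread` now sets the number of compile threads and that all `fusion_groups` are distributed evenly across them. Below is a minimal Python sketch of one way to do such an even split, assuming contiguous chunks whose sizes differ by at most one. CINN's actual scheduling strategy is not described in the commit; `split_evenly`, `compile_group`, and `parallel_compile` are hypothetical names, and `num_threads` stands in for the flag value.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(groups, num_threads):
    """Split `groups` into at most `num_threads` contiguous chunks whose sizes differ by at most one."""
    num_threads = max(1, min(num_threads, len(groups)))
    base, extra = divmod(len(groups), num_threads)
    chunks, start = [], 0
    for i in range(num_threads):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks take one more group
        chunks.append(groups[start:start + size])
        start += size
    return chunks


def compile_group(group):
    """Hypothetical stand-in for compiling one fusion group."""
    return {"group": group, "instruction": f"instr_{group}"}


def parallel_compile(groups, num_threads):
    chunks = split_evenly(groups, num_threads)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # One task per chunk; each worker compiles its chunk sequentially.
        per_chunk = list(pool.map(lambda chunk: [compile_group(g) for g in chunk], chunks))
    # Flatten, preserving the original group order.
    return [result for chunk in per_chunk for result in chunk]
```

Contiguous chunks make it easy to reassemble results in the original group order; a shared work queue would balance better if groups differ greatly in compile cost.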
  4. 03 Aug, 2023: 14 commits
  5. 02 Aug, 2023: 11 commits
    • [EvalFrame] support python3.11 in eval frame. (#55887) · f45dd5ee
      xiongkun committed
    • Eager tensor doc (#55879) · 880e94fc
      wanghuancoder committed
      * add docstring of three eager method
      
      * test=docs_preview
      
      * update element size bind
      
      * update docs of numpy, clone, clear_gradient, element_size; test=docs_preview
      
      * refine clear_gradient docs; test=docs_preview
      
      * refine element_size docs; test=docs_preview
      
      * add detach doc; test=docs_preview
      
      * empty commit; test=docs_preview
      
      * update signature; test=docs_preview
      
      * refactor; test=docs_preview
      
      * empty commit; test=docs_preview
      
      * add docstring of Tensor
      
      * empty commit; test=docs_preview
      
      * refine TensorDoc; test=docs_preview
      
      * refine TensorDoc; test=docs_preview
      
      * remove extra indent in TensorDoc; test=docs_preview
      
      * remove a space; test=docs_preview
      
      * move docs ahead of implementation; test=docs_preview
      
      * refine
      
      ---------
      Co-authored-by: wj-Mcat <1435130236@qq.com>
      Co-authored-by: SigureMo <sigure.qaq@gmail.com>
    • [clang-tidy] NO.6 enable `modernize-avoid-c-arrays` check (#55774) · c000091e
      gouzil committed
      * [clang-tidy] modernize-avoid-c-arrays
      
      * rollback
      
      * [clang-tidy] fix
      
      * close modernize-avoid-c-arrays
      
      * fix PHI_DEFINE_string; add PHI_DEFINE_bool NOLINT
      
      * fix PHI_DEFINE_string
      
      * fix next_h_state and parity err
      
      * fix win32
      
      * fix cuda_graph
      
      * fix accuracy_kernel
      
      * fix math_function
      
      * fix fused_softmax_mask_kernel.cu load_data and warp_reduce; rollback concat_and_split_functor ins_addr
      
      * fix fused_dropout_add_grad_kernel
      
      * fix
      
      * rollback cu
      
      * rollback concat_and_split_functor.cu
      
      * rollback
    • [XPU]Add conv1d fuse pass (#55719) · 22c7a6eb
      wz1qqx committed
    • [IR] NewIr Interpreter Beta run regular (#55828) · 63b7fc80
      zhangbo9674 committed
      * add interface
      
      * add code
      
      * add code
      
      * add code
      
      * add code
      
      * fix bug
      
      * fix bug
      
      * add var prefix
      
      * add code
      
      * add code
      
      * add code
      
      * fix compile bug
      
      * fix bug
      
      * refine code
      
      * refine code
      
      * refine code
      
      * refine code
      
      * fix bug
      
      * add code
      
      * add code
      
      * fix bug
      
      * add code
      
      * add code
      
      * refine code
      
      * refine code
      
      * fix bug
      
      * add code
      
      * fix bug in phi__kernel_utils
      
      * refine code
      
      * fix bug
      
      * open flag
      
      * refine code
      
      * fix bug
      
      * fix bug
      
      * refine code
      
      * fix bug
    • [Inference] Replace groupNorm when data types are bf16 and fp16, and data... · e61d892a
      yangjianfengo1 committed
      [Inference] Replace groupNorm when data types are bf16 and fp16, and data format is NHWC implementation. (#55399)
      
      * finish
      
      * cpergroup odd
      
      * fix bf16
      
      * single channel
      
      * code style
      
      * align precision
      
      * add head_file
      
      * add bf16 head file
      
      * bf16 2
      
      * bf16
      
      * bf16 head
      
      * bf16 compile
      
      * py test
      
      * bf16 compile
      
      * bf16 compile
      
      * unset py test
      
      * nhwc
      
      * test
      
      * mean var
      
      * bf16 success
      
      * su
      
      * ctest success
      
      * use is_same_as
      
      * is_same
      
      * use is_same
      
      * rtol
      
      * gpu_stream
      
      * del sigmod
      
      * fix bfloat16 type
      
      * use cuda_bf16_hpp
      
      * use_cuda_arch
      
      * bfloat162float2
      
      * del inplace_tol
      
      * del max_releative_tol
      
      * temp store
      
      * align precision
      
      * temp store
      
      * plugin
      
      * align precision
      
      * align
      
      * include cuda.h
      
      * del half
      
      * half single
      
      * ci
      
      * add const
      
      * ci
      
      * cudamemset
      
      * del printf
      
      * fp16 test
      
      * add half compute
      
      * del br16 ci
      
      * del ci
      
      * ci approve
      
      * del fluid include
    • fix security bug (#55866) · 92aa92fa
      wanghuancoder committed
      * fix security bug
    • Add FP16 & BF16 for erfinv (#55287) · 6d7efd09
      cyberslack_lee committed
    • fix security bug (#55782) · 19da5c0c
      wanghuancoder committed
      * fix security bug
    • [XPU] Add gather_squeeze_pass (#55605) · d13a49d6
      jiangfan06 committed
    • [new ir] add ir pybind api (#55745) · ef29468e
      xiaoguoguo626807 committed
      * add ir core
      
      * add test
      
      * modify name
      
      * merge
      
      * add test for __eq__
      
      * shield test for __eq__
      
      * --amend
      
      * Update new_ir_compiler.cc
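The GroupNorm replacement in #55399 targets fp16/bf16 inputs in NHWC layout. GroupNorm splits the C channels into G groups, normalizes each (sample, group) slice by its own mean and variance, then applies a per-channel scale and shift: y = (x - mean) / sqrt(var + eps) * gamma[c] + beta[c]. Below is a plain floating-point Python sketch of that computation on a flat NHWC buffer, the kind of baseline a precision check might compare against. It is not the CUDA kernel from the commit; `group_norm_nhwc` and the default `eps=1e-5` are assumptions.

```python
import math


def group_norm_nhwc(x, shape, groups, gamma, beta, eps=1e-5):
    """Reference GroupNorm for a flat list `x` laid out as NHWC."""
    n_, h_, w_, c_ = shape
    if c_ % groups:
        raise ValueError("channels must be divisible by groups")
    cpg = c_ // groups  # channels per group
    out = [0.0] * len(x)
    for n in range(n_):
        for g in range(groups):
            # In NHWC, the channels of one group are adjacent at every pixel.
            idx = [((n * h_ + h) * w_ + w) * c_ + c
                   for h in range(h_) for w in range(w_)
                   for c in range(g * cpg, (g + 1) * cpg)]
            mean = sum(x[i] for i in idx) / len(idx)
            var = sum((x[i] - mean) ** 2 for i in idx) / len(idx)  # biased variance
            inv_std = 1.0 / math.sqrt(var + eps)
            for i in idx:
                c = i % c_  # channel of this element, for the per-channel affine
                out[i] = (x[i] - mean) * inv_std * gamma[c] + beta[c]
    return out
```

Because each group is a contiguous run of channels per pixel in NHWC, a kernel can read it without strided gathers; the sketch recomputes indices for clarity rather than speed.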
  6. 01 Aug, 2023: 2 commits
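The combined-indexing commit from 04 Aug ("Support Combined indexing for __getitem__ and __setitem__", #55211) lists steps such as parsing boolean tensors and adding broadcasting for mixed indices. The Python sketch below shows, on plain lists, one plausible normalization pass of that kind. It expands `...`, converts a 1-D boolean mask into the integer positions it selects, tags each entry as basic (int or slice) or advanced (list of ints), and checks that the advanced entries can broadcast. It is an illustration under those assumptions, not Paddle's `_getitem_static`; `normalize_index` is a hypothetical name, and `None` (new-axis) entries and bare booleans are deliberately unsupported.

```python
def normalize_index(index, ndim):
    """Hypothetical normalization pass for combined indexing on an ndim-D array."""
    if not isinstance(index, tuple):
        index = (index,)
    if sum(item is Ellipsis for item in index) > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")
    explicit = sum(item is not Ellipsis for item in index)
    if explicit > ndim:
        raise IndexError("too many indices")

    parts = []
    for item in index:
        if item is Ellipsis:
            # `...` stands for as many full slices as needed to reach ndim.
            parts.extend(("basic", slice(None)) for _ in range(ndim - explicit))
        elif isinstance(item, bool):
            raise TypeError("bare boolean indices are not handled in this sketch")
        elif isinstance(item, (int, slice)):
            parts.append(("basic", item))
        elif isinstance(item, list) and item and all(isinstance(x, bool) for x in item):
            # A 1-D boolean mask selects the positions where it is True.
            parts.append(("advanced", [i for i, flag in enumerate(item) if flag]))
        elif isinstance(item, list) and all(
            isinstance(x, int) and not isinstance(x, bool) for x in item
        ):
            parts.append(("advanced", list(item)))
        else:
            raise TypeError(f"unsupported index entry: {item!r}")

    # Trailing axes that were not mentioned are kept whole.
    parts.extend(("basic", slice(None)) for _ in range(ndim - len(parts)))

    # 1-D advanced indices broadcast only if their lengths agree (or equal 1).
    lengths = {len(p) for kind, p in parts if kind == "advanced"} - {1}
    if len(lengths) > 1:
        raise IndexError("advanced indices cannot be broadcast together")
    return parts
```

Separating basic from advanced entries up front lets the slicing part map onto a strided slice while the advanced part is gathered afterwards, which is one common way to implement mixed indexing.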