1. 30 May 2023, 2 commits
    • update_c++17 (#53892) · 950b563b
      risemeup1 committed
      * update_c++17
      
      * update_c++17
      
      * fix windows bug
      
      * solve circular dependency
      
      * solve circular dependency
      
      * solve circular dependency
      
      * solve circular dependency
      
      * solve circular dependency
      
      * fix windows bug
      
      * fix compiler error
      
      * fix compiler error
      
      * update eigen3
      
      * update eigen3
      
      * update eigen3
      
      * fix mac-py3 compiler error
      
      * update C++17
      
      * fix mac compiler error
      
      * fix compile error
      
      * fix coverage_compiler error
      
      * fix coverage_ci_problem
      
      * fix coverage_error
      
      * fix kunlun200 compile error
      
      * fix kunlun200 compiler error
      
      * fix compile error
      
      * fix compiler error
      
      * fix py3 failed test
      
      * fix kunlun200 compiler error
      
      * test
      
      * fix test error
      
      * fix test error
      
      * fix test error
      
      * test
      
      * test
      
      * fix mac py3 error
      
      * fix mac py3 error
      
      * fix mac py3 error
      
      * fix test error
      
      * fix test error
      
      * fix compile error
      
      * fix compile error
      
      * fix compile error
      
      * test
      
      * test
      
      * fix compiler error
      
      * test
      
      * test
      
      * debug on ci
      
      * fix compiler error
      
      * fix compiler error
      
      * test
      
      * fix cinn compiler error
      
      * test
      
      * fix ROCm compile error
      
      * fix cinn and kunlun compile error
      
      * update c++14
      
      * Update flags.cmake
    • [AMP] Reimplement check_nan_inf as check_numerics_kernel. (#52245) · 44bd5927
      Yiqun Liu committed
      * Reimplement the check_nan_inf function as check_numerics kernel.
      
      * Move the CPU implementation to phi.
      
      * Add an #ifdef guard around the inclusion of omp.h.
      
      * Move the use of FLAGS_check_nan_inf_level out of header file.
      
      * Implement a common PrintAndThrowError function.
      
      * Fix the incorrect use of __NVCC__, which should be replaced with __CUDA_ARCH__.
      
      * Add dependency of phi.
      
      * Polish code and unit tests.
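A rough idea of what a numerics check computes can be sketched in Python. This is a minimal sketch under stated assumptions, not Paddle's kernel: `check_numerics` and `print_and_throw_error` are hypothetical names modeled on the commit messages, the tensor is a flat list of floats, and the `abort` switch stands in for a checking level whose exact semantics are not described in this log.

```python
import math


def print_and_throw_error(name, num_nan, num_inf):
    """Hypothetical shared helper: print one summary line, then raise."""
    msg = f"There are NaN or Inf values in {name}: num_nan={num_nan}, num_inf={num_inf}"
    print(msg)
    raise RuntimeError(msg)


def check_numerics(name, values, abort=True):
    """Count NaN and Inf entries in a flat sequence of floats.

    With abort=True any non-finite value raises via print_and_throw_error;
    with abort=False the counts are only returned (an assumed stand-in for
    a less strict checking level).
    """
    num_nan = sum(1 for v in values if math.isnan(v))
    num_inf = sum(1 for v in values if math.isinf(v))
    if abort and (num_nan or num_inf):
        print_and_throw_error(name, num_nan, num_inf)
    return num_nan, num_inf
```

For example, `check_numerics("x", [1.0, float("nan")], abort=False)` returns `(1, 0)` without raising.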
  2. 26 May 2023, 1 commit
    • [PHI Decoupling] Create PHI shared lib (#53735) · da50a009
      YuanRisheng committed
      * create phi so
      
      * fix ci bugs
      
      * fix py3 bugs
      
      * add file
      
      * fix py3 bugs
      
      * fix windows bugs
      
      * refine the .so
      
      * fix py3 bugs
      
      * delete all static target in phi
      
      * fix windows bugs
      
      * fix py3 bugs
      
      * fix ci bugs
      
      * fix windows bugs
      
      * fix bugs: gflags can't be linked by dynamic and static lib
      
      * fix bug where 3rd-party libraries could not be loaded
      
      * fix ci bugs
      
      * fix compile bugs
      
      * fix py3 bugs
      
      * fix conflict
      
      * fix xpu bugs
      
      * fix mac compile bugs
      
      * fix psgpu bugs
      
      * fix inference failed
      
      * deal with conflict
      
      * fix LIBRARY_PATH bug
      
      * fix windows bugs
      
      * fix onednn error
      
      * fix windows compile bugs
      
      * fix windows compile bugs
      
      * fix test_cuda_graph_static_mode_error aborted
      
      * fix windows bugs
      
      * fix mac-python3 error
      
      * fix hip compile bugs
      
      * change mode to static
      
      * change to static mode
      
      * fix ci bugs
      
      * fix py3 bugs
      
      * fix windows bugs
      
      * fix bugs
      
      * add static flag
      
      * add PADDLE_API
      
      * change position of PADDLE_API
      
      * fix windows bugs
      
      * change mode to dynamic lib
      
      * fix windows static bugs
      
      * deal with conflict
      
      * fix windows unit bug
      
      * fix coverage
      
      * deal with conflict
      
      * fix windows-inference
      
      * fix py3 bugs
      
      * fix bugs when compiling type_info
      
      * fix compile bugs
      
      * fix py3 bugs
      
      * fix windows bugs
      
      * fix windows openblas
      
      * fix xpu bugs
      
      * fix enforce_test in windows
      
      * update code according to review comments
      
      * fix windows cmake bug
      
      * fix windows bugs
      
      * fix windows bugs
      
      * delete cinn unittest
      
      * fix cinn bugs
      
      ---------
      Co-authored-by: lzydev <1528794076@qq.com>
  3. 27 April 2023, 1 commit
  4. 03 April 2023, 1 commit
  5. 29 March 2023, 1 commit
    • Add Fuse Adamw Pass (#50484) · 66098bff
      yuehuayingxueluo committed
      * add fuse adamw pass
      
      * fix some bugs
      
      * fix CI bug
      
      * change chunk_size
      
      * fix CI bug
      
      * rm test_fused_adam_op.py
      
      * fix CI bugs
      
      * fix fuse_adamw_op_pass.cc
      
      * change code style
      
      * fix CI bug
      
      * fix UT bug and fuse_adamw_op_pass.cc
      
      * fix test_fuse_adamw_pass.py
      
      * fix CI bug
      
      * remove fluid
      
      * fix ci bug
      
      * fix CI bug
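As an illustration of what fusing an optimizer means, the sketch below applies a textbook AdamW update (decoupled weight decay) to several parameter tensors in one pass over fixed-size chunks, instead of one update op per parameter. The hyperparameter defaults, the scalar-by-scalar update, and the names `adamw_step` and `fused_adamw` are assumptions for illustration; the log does not show the formula or chunking scheme the pass actually uses.

```python
import math


def adamw_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    """One textbook AdamW update for a single scalar parameter (step t >= 1)."""
    m = beta1 * m + (1 - beta1) * g          # first moment
    v = beta2 * v + (1 - beta2) * g * g      # second moment
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * wd * p                      # decoupled weight decay
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


def fused_adamw(params, grads, moments1, moments2, t, chunk_size=4, **hp):
    """Update all parameter tensors in place, one fixed-size chunk at a time.

    Every element of every tensor is visited through a single flattened
    index list, which models one fused launch instead of one op per tensor.
    """
    flat = [(i, j) for i, p in enumerate(params) for j in range(len(p))]
    for start in range(0, len(flat), chunk_size):
        for i, j in flat[start:start + chunk_size]:
            params[i][j], moments1[i][j], moments2[i][j] = adamw_step(
                params[i][j], grads[i][j], moments1[i][j], moments2[i][j], t, **hp)
```

The fused result matches calling `adamw_step` on each element separately; only the traversal changes, which is what lets a real fused pass replace many small optimizer ops with fewer launches.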
  6. 22 March 2023, 1 commit
    • Add fused_feed_forward pass (#50423) · 5dda0ef6
      Ghost Screaming committed
      * Add fused_feed_forward pass for semi-automatic static graph training.
      
      * Add fused_feedforward property in parallel_executor.cc
      
      * Polish code.
      
      * Polish fused feed_forward pass code. Support use_dropout1 and
      use_dropout2 options.
      
      * Support model parallel in fused_feedforward pass.
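For reference, the unfused computation that a fused feed-forward op replaces typically looks like a pre-LayerNorm transformer FFN: x + dropout2(linear2(dropout1(act(linear1(layer_norm(x)))))). The sketch below is a pure-Python approximation under that assumption. The helper names are hypothetical, dropout uses upscale-in-train scaling, and a rate of 0.0 plays the role of turning use_dropout1 or use_dropout2 off.

```python
import math
import random


def layer_norm(x, gamma, beta, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [g * (xi - mean) / math.sqrt(var + eps) + b
            for xi, g, b in zip(x, gamma, beta)]


def linear(x, w, b):
    # w is stored as [out_features][in_features]
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x):
    return [max(0.0, xi) for xi in x]


def dropout(x, rate, rng):
    # Upscale-in-train: kept values are scaled by 1 / (1 - rate); rate 0.0 is a no-op.
    if rate == 0.0:
        return list(x)
    return [0.0 if rng.random() < rate else xi / (1.0 - rate) for xi in x]


def feed_forward(x, w1, b1, w2, b2, gamma, beta, act=relu,
                 dropout1_rate=0.0, dropout2_rate=0.0, rng=None):
    """Unfused pre-LayerNorm FFN on one token vector, with a residual add."""
    rng = rng or random.Random(0)
    h = act(linear(layer_norm(x, gamma, beta), w1, b1))
    h = dropout(h, dropout1_rate, rng)
    out = dropout(linear(h, w2, b2), dropout2_rate, rng)
    return [xi + oi for xi, oi in zip(x, out)]
```

A fusion pass would match this chain of ops in the graph and replace it with a single op that produces the same output.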
  7. 10 January 2023, 1 commit
  8. 05 December 2022, 1 commit
  9. 27 October 2022, 1 commit
    • make all cpp tests dynamically linked to libpaddle.so [except windows] (#47088) · 2096448b
      Leo Chen committed
      * make all cpp tests dynamic linked to libpaddle.so
      
      * add comments
      
      * keep old cc_test for some tests
      
      * fix some ut
      
      * make some ut use cc_test_old
      
      * fix typos and fit for win32
      
      * fix lib path
      
      * fix some tests
      
      * skip lite test
      
      * fit for rocm
      
      * fit for cinn
      
      * fit for mac
      
      * fit for win32
      
      * skip inference ut
      
      * skip windows
      
      * fix coverage
  10. 31 August 2022, 1 commit
  11. 25 August 2022, 1 commit
  12. 04 June 2022, 1 commit
  13. 07 March 2022, 1 commit
    • cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca
      Ming-Xu Huang committed
      * Added cuBlasLtHandle_t to device context.
      
      * Added fused_gemm_epilogue op.
      
      1. Added fused_gemm_epilogue op to leverage cuBlasLt Epilogue.
      2. Support fusion of Act(X*Y + bias), where X's dims >= 2 and Y's dims should be 2.
      3. Act currently supports only ReLU (GeLU will be added in the future).
      
      * Added UT to fused_gemm_epilogue op.
      
      * Added LinearAct Pattern
      
      1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
      pattern.
      2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
      3. act currently supports only ReLU (GeLU will be supported in the future).
      
      * Added FuseGemmEpiloguePass
      
      1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
      fusion (GeLU will be supported in the future).
      2. Only support matmul_v2 from nn.Linear.
      
      * Added pybind to BuildStrategy.fuse_gemm_epilogue_.
      
      * Added UT for fuse_gemm_epilogue_pass.
      
      * GeLU support and EpilogueSingleton
      
      1. Added GeLU support to fused_gemm_epilogue op.
      2. Added EpilogueSingleton to cache auxiliary pointer.
      3. Added related UTs.
      
      * Rename cublaslt_epilogue_op.* to gemm_epilogue_op.*.
      
      * Added both train and infer pattern to LinearAct.
      
      1. Added support for the fwd graph with grad_ops linking to LinearAct.
      2. Added related changes to fuse_gemm_epilogue_pass for the above
      modification.
      
      * Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.
      
      * Added identity activation support to gemm_epilogue_op.
      
      * Added Linear Fusion (matmul_v2 + ele_add)
      
      1. Added matmul_v2 + ele_add pattern to LinearActPattern.
      2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.
      
      * Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*
      
      * Add fused_gemm_epilogue_grad op.
      
      1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.
      
      * Add UTs to fused_gemm_epilogue_grad_op.
      
      * Change attribute name in fused_gemm_epilogue_grad_op for clarity.
      
      * Allow DX and DBias to be dispensable in fused_gemm_epilogue_grad op.
      
      * Added ElementwiseAdd+Matmul+Act graph pattern detection.
      
      * Fuse backward of Linear(Act(x))
      
      1. Added backward fusion pass to Linear(Act(x)).
      2. Added backward fusion pass to Linear(x).
      
      * Added UTs to backward fusion of Linear(Act(x)).
      
      * Complete documentation of arguments to fused_gemm_epilogue_op.
      
      * Made arguments of some functions pass by reference.
      
      * Modify code per review comments.
      
      1. Made arguments of some functions pass by reference.
      2. Removed redundant code.
      3. Followed Google code style to change code.
      
      * Made 'const' code style consistent
      
      * Fixed random seed of python UTs.
      
      * Set compilation constraints for cuBlasLt
      
      1. Require CUDA 11.6+
      2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.
      
      * Code review from Paddle
      
      1. Changed argument name is_first_gemm to without_x_gradient for
      clarity.
      2. Applied PADDLE_THROW in fused_gemm_epilogue_op.
      
      * Remove EpilogueSingleton
      
      1. Applied ReserveSpace to replace Epilogue for passing auxiliary
      pointers between FWD and BWD.
      
      * Fix a logical error and enhance UTs.
      
      1. Added act op count checking in UTs.
      2. Fix issue fusing the backward of ReLU(Linear(X)).
      3. TODO: solve GELU fusion issues.
      
      * Fix Linear and GeLU fusion issues.
      
      1. Modified graph_pattern_detector to fit Linear with either GeLU or
      ReLU.
      2. Modified data range in UTs to allow negative values.
      
      * Removed fused_gemm_epilogue_op.h.
      
      * Rename namespace pten to phi.
      
      * Rename name of arguments in fused_gemm_epilogue_op
      
      1. bias -> Bias.
      2. out -> Out.
      3. reserve_space -> ReserveSpace.
      
      * Change EpiloguePassActivationCache to a local variable.
      
      1. Removed singleton in EpiloguePassActivationCache.
      2. Made EpiloguePassActivationCache an argument to each pass
      function.
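The fusion target act(X*W + bias) and the role of the auxiliary data passed from forward to backward (ReserveSpace in the messages above) can be sketched in pure Python for the ReLU case. This is an illustrative model, not the cuBlasLt call sequence: the function names are hypothetical, and the saved boolean mask stands in for whatever auxiliary buffer the real epilogue writes.

```python
def matmul(a, b):
    # a: [M][K], b: [K][N] -> [M][N]
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def fused_linear_relu(x, w, bias):
    """relu(x @ w + bias) in one pass, also returning the ReLU mask."""
    out, reserve = [], []
    for row in matmul(x, w):
        pre = [v + b for v, b in zip(row, bias)]
        out.append([max(0.0, v) for v in pre])
        reserve.append([v > 0.0 for v in pre])  # auxiliary data saved for backward
    return out, reserve


def fused_linear_relu_grad(dout, x, w, reserve):
    """Backward that reuses the saved mask instead of recomputing x @ w + bias."""
    dpre = [[g if keep else 0.0 for g, keep in zip(grow, mrow)]
            for grow, mrow in zip(dout, reserve)]
    dbias = [sum(col) for col in zip(*dpre)]
    dw = matmul(transpose(x), dpre)   # x^T @ dpre, same shape as w
    dx = matmul(dpre, transpose(w))   # dpre @ w^T, same shape as x
    return dx, dw, dbias
```

Saving the mask lets the backward pass zero the gradient without recomputing X*W + bias, which is why an auxiliary buffer is threaded between the forward and backward ops.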
  14. 21 January 2022, 1 commit
  15. 24 October 2021, 1 commit
  16. 15 October 2021, 1 commit
    • Add BuildCinnPass (#36345) · b3f02c57
      jiangcheng committed
      * Add CinnSubgraphSearchPass
      
      * solve CI problem where the subgraph order was not consistent
      
      * fix some bugs per review advice
      
      * ensure the independence of subgraphs, meaning a subgraph should not link to the outer graph
      
      * rename cinn_subgraph_search_pass to build_cinn_pass and delete paddle_to_cinn_pass
      
      * add flag to control whether to append build cinn pass
      
      * remove AppendPass at ParallelExecutorPassBuilder
      
      * rename paddle_to_cinn_pass to build_cinn_pass in build_strategy and close test_run_from_cinn
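A simplified model of subgraph search: group ops that the compiler supports into connected components, and sort each group by the original op order so the result is deterministic (the CI issue about inconsistent subgraph order hints at why that matters). The names are hypothetical, and this sketch does not enforce the independence condition mentioned above; a real pass must also reject groups that would create a cycle once contracted into a single op.

```python
from collections import defaultdict


def find_cinn_subgraphs(ops, edges, supported):
    """Group connected supported ops.

    ops: op names in program order; edges: (producer, consumer) pairs;
    supported: set of op names the compiler can handle.
    """
    adj = defaultdict(set)
    for src, dst in edges:
        if src in supported and dst in supported:
            adj[src].add(dst)
            adj[dst].add(src)
    order = {op: i for i, op in enumerate(ops)}
    seen, groups = set(), []
    for op in ops:  # fixed iteration order keeps the output deterministic
        if op not in supported or op in seen:
            continue
        stack, group = [op], []
        seen.add(op)
        while stack:
            cur = stack.pop()
            group.append(cur)
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group, key=order.__getitem__))
    return groups
```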
  17. 11 October 2021, 1 commit
  18. 08 September 2021, 1 commit
  19. 05 August 2021, 1 commit
  20. 29 July 2021, 1 commit
    • add fix op run order pass (#34427) · 79e758c6
      Zeng Jinle committed
      * add fix op run order pass
      
      * add ut for fix_op_run_order
      
      * fix ci error
      
      * improve coverage
      
      * improve coverage again and fix cpu test case
      
      * follow some comments
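One common way to make op run order reproducible is a topological sort that always picks the ready op with the smallest original index. The sketch below uses Kahn's algorithm with a min-heap; it is an assumption about the general technique, not a description of how fix_op_run_order is implemented, and `fixed_run_order` is a hypothetical name.

```python
import heapq


def fixed_run_order(num_ops, deps):
    """Deterministic topological order over op indices 0..num_ops-1.

    deps: (before, after) pairs meaning op `before` must run first.
    """
    indegree = [0] * num_ops
    succ = [[] for _ in range(num_ops)]
    for before, after in deps:
        succ[before].append(after)
        indegree[after] += 1
    ready = [i for i in range(num_ops) if indegree[i] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        op = heapq.heappop(ready)  # always take the smallest ready index
        order.append(op)
        for nxt in succ[op]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(order) != num_ops:
        raise ValueError("dependency graph has a cycle")
    return order
```

Because ties are always broken the same way, two runs over the same graph produce the same order, whatever order the dependencies were discovered in.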
  21. 28 July 2021, 1 commit
  22. 15 July 2021, 1 commit
  23. 22 February 2021, 1 commit
  24. 18 January 2021, 1 commit
  25. 12 January 2021, 1 commit
  26. 04 January 2021, 1 commit
  27. 24 December 2020, 1 commit
  28. 27 October 2020, 1 commit
  29. 21 September 2020, 1 commit
  30. 02 September 2020, 1 commit
    • Add FetchAsyncOpHandle, and use it in FastThreadedExecutor (#26643) · 2d2c31a6
      wanghuancoder committed
      * optimized transformation from tensor to numpy, test=develop
      
      * Modify fetch op handle, from memcpy Sync to memcpy Async, test=develop
      
      * modify CUDAPinnedPlace to CPUPlace, test=develop
      
      * modify CPUPlace to CUDAPinnedPlace, and set default inplace to false, test=develop
      
      * revert fetch_op_handle, add fetch_async_op_handle, test=develop
      
      * revert fetch_op_handle, add fetch_async_op_handle, test=develop
      
      * fix error msg report, test=develop
      
      * fix bug in cpuplace, test=develop
      
      * fix bug in unmerge and tensorarray modle, test=develop
      
      * fix bug, double copy gpu memory, test=develop
      
      * address chenweihang's review advice, test=develop
  31. 07 July 2020, 1 commit
    • catch bad alloc exception (#25140) · 70d7d07f
      hong committed
      * catch bad alloc exception; test=develop
      
      * add unittest; test=develop
      
      * move bad alloc catch to the first place; test=develop
      
      * polish error message; test=develop
      
      * polish error message; test=develop
      
      * add mutex header; test=develop
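The point of moving the bad-alloc catch "to the first place" is handler ordering: in C++, std::bad_alloc derives from std::exception, so a generic handler placed first would swallow it. The sketch below shows the same ordering in Python, where MemoryError derives from Exception; `run_op` and the message text are hypothetical.

```python
def run_op(fn):
    """Run fn, reporting allocation failure separately from other errors."""
    try:
        return fn()
    except MemoryError as e:
        # Must come first: MemoryError is a subclass of Exception, so the
        # generic handler below would otherwise catch it.
        raise RuntimeError(f"out of memory: {e}") from e
    except Exception as e:
        raise RuntimeError(f"operator failed: {e}") from e
```

Swapping the two `except` clauses would route every allocation failure to the generic "operator failed" message.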
  32. 14 April 2020, 1 commit
  33. 09 April 2020, 1 commit
    • Remove: NGraph engine from PDPD repository (#23545) · 3baaee9a
      mozga-intel committed
      * Remove the NGraph engine from PDPD repository
      1. Each operator was removed from the operator's directory
      2. Each test was removed from the unittest directory
      3. The parallel executor support was removed from the PDPD
      4. The CMake file was removed from the PDPD
      5. The NG flags were removed from the repository
      test=develop
      
      * Remove ngraph from:
      1. Cmake file
      2. Python file
      test=develop
  34. 01 April 2020, 1 commit
  35. 20 March 2020, 1 commit
    • Reader sequential and inference partial feed (#22699) · acfc9b8a
      Zeng Jinle committed
      * sequential reader stage 1, test=develop
      
      * fix ut, test=develop
      
      * fix iterable=False reset bug, add some logs and polish code, test=develop
      
      * inference feed partial data, test=develop
      
      * Turn on keep_order=True for test, test=develop
      
      * enhance ut to test more cases, test=develop
      
      * test commit for reverting
      
      * Revert "test commit for reverting", test=develop
      
      This reverts commit 80aef42e.
      
      * add ut of merged and unmerged results, test=develop
      
      * add more UTs for coverage and add English doc of the API, test=develop
      
      * follow comments, test=develop
      
      * change note style, test=develop
  36. 13 February 2020, 1 commit
  37. 07 February 2020, 1 commit
    • Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038
      Yiqun Liu committed
      * Add the first implementation of fusion_group op #19621 (#3)
      
      * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
      test=develop
      
      * Call CUDA driver api to launch the kernel compiled by nvrtc.
      test=develop
      
      * Disable for mac and windows.
      test=develop
      
      * Refine the codes to support manually specified num_threads and workload_per_thread.
      test=develop
      
      * Refine the CUDA kernel to support large dims.
      test=develop
      
      * Add DeviceCodePool to manage all device codes.
      
      * Add the first implementation of fusion_group op.
      
      * Add unit-test for fusion_group op.
      
      * Add the check of result.
      
      * Add the check of nvrtc in unit-test.
      test=develop
      
      * Add comment to explain the inputs, outputs and features of fusion_group op.
      test=develop
      
      * Disable fusion_group op for mac and windows.
      test=develop
      
      * Make the compiling of device code return status instead of hanging up.
      test=develop
      
      * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
      
      * Unify fusion_group_op's input and output names.
      test=develop
      
      * Add the check of CUDA driver library in unittest.
      test=develop
      
      * Enable generating code for a given subgraph. #21126 (#4)
      
      * Enable generating code for a given subgraph.
      
      * Support sorting the subgraph.
      
      * Remove the rearrangement of expressions because we use the sorted subgraph directly.
      
      * Enable generating code for a subgraph which is composed of grad ops.
      
      * Use expression information to check the accuracy in unittest.
      
      * Separate load and store from computation expressions.
      test=develop
      
      * Improve the loading statements in generated codes.
      test=develop
      
      * Remove unused arguments from formal list.
      test=develop
      
      * Enable the detection of subgraph of grad ops.
      
      * Generate code for detected subgraph in fusion_group_pass.
      
      * Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
      test=develop
      
      * Fix a bug when checking whether the shapes of all inputs are the same.
      
      * Add debug information.
      
      * Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5)
      
      test=develop
      
      * Call subgraph_detector in fusion_group pass.
      test=develop
      
      * Disable fusion_group when WITH_GPU is OFF.
      test=develop
      
      * Refine all PADDLE_ENFORCE message.
      test=develop
      
      * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
      test=develop
      
      * Follow review comments.
      test=develop
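To make the code-generation step concrete, the sketch below turns a topologically sorted list of elementwise expressions into CUDA C source text, with loads and stores separated from the computation as the messages describe. It only builds a string; compiling it with NVRTC and launching it through the driver API are out of scope. The operator templates, the grid-stride loop, and `generate_kernel` itself are assumptions for illustration, not the generator used by fusion_group.

```python
def generate_kernel(name, inputs, outputs, exprs):
    """Emit CUDA C source for a fused elementwise kernel.

    exprs: (out_var, op, in_vars) tuples, already topologically sorted.
    """
    templates = {
        "add": "{0} + {1}",
        "mul": "{0} * {1}",
        "relu": "{0} > 0.0f ? {0} : 0.0f",
        "sigmoid": "1.0f / (1.0f + expf(-{0}))",
    }
    params = ", ".join([f"const float* {v}" for v in inputs] +
                       [f"float* {v}" for v in outputs])
    lines = [
        f'extern "C" __global__ void {name}(int n, {params}) {{',
        "  for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < n;",
        "       idx += gridDim.x * blockDim.x) {",
    ]
    for v in inputs:  # loads are kept apart from the computation
        lines.append(f"    float {v}_v = {v}[idx];")
    for out, op, args in exprs:
        rhs = templates[op].format(*[f"{a}_v" for a in args])
        lines.append(f"    float {out}_v = {rhs};")
    for v in outputs:  # stores come last
        lines.append(f"    {v}[idx] = {v}_v;")
    lines += ["  }", "}"]
    return "\n".join(lines)
```

Giving every intermediate its own local variable avoids operator-precedence problems when templates are composed.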
  38. 10 January 2020, 1 commit
    • Add bn and relu fuse pass (#22048) · 46189b16
      Zhen Wang committed
      * add bn and relu fuse pass
      
      * add op attr assert and dtype assert
      
      * fix some input and output bugs for the fused op and pattern.
      
      * add the unittest for fuse_bn_act_pass. test=develop
      
      * use normative enforce statements. test=develop
      
      * add the cpu test. test=develop
      
      * add the support of batch_size=1 for the bn with relu op. test=develop
      
      * add the error type for paddle throws. test=develop
      
      * add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
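In training mode, batch norm followed by ReLU can be computed in a single loop: the per-channel statistics are gathered first, then normalization, affine transform, and activation are applied together without materializing the intermediate tensor. The pure-Python sketch below compares that against the two-step version. The names and the [N][C] layout are assumptions, and the batch-size-1 case shows why eps matters when the variance is zero.

```python
import math


def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm on x of shape [N][C]; stats are per channel."""
    n, channels = len(x), len(gamma)
    mean = [sum(row[c] for row in x) / n for c in range(channels)]
    var = [sum((row[c] - mean[c]) ** 2 for row in x) / n for c in range(channels)]
    return [[gamma[c] * (row[c] - mean[c]) / math.sqrt(var[c] + eps) + beta[c]
             for c in range(channels)] for row in x]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def fused_bn_relu(x, gamma, beta, eps=1e-5):
    """Same math as relu(batch_norm(x)), but the activation is applied inside
    the normalization loop so no intermediate tensor is materialized."""
    n, channels = len(x), len(gamma)
    mean = [sum(row[c] for row in x) / n for c in range(channels)]
    var = [sum((row[c] - mean[c]) ** 2 for row in x) / n for c in range(channels)]
    scale = [gamma[c] / math.sqrt(var[c] + eps) for c in range(channels)]
    return [[max(0.0, scale[c] * (row[c] - mean[c]) + beta[c]) for c in range(channels)]
            for row in x]
```

With a batch of one, every x minus mean is zero, so the output reduces to relu(beta); without eps the division would be by zero.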
  39. 12 December 2019, 1 commit