- 30 May, 2023 1 commit
Committed by risemeup1
* update_c++17 * update_c++17 * fix windows bug * solve circular dependency * solve circular dependency * solve circular dependency * solve circular dependency * solve circular dependency * fix windows bug * fix compiler error * fix compiler error * update eigen3 * update eigen3 * update eigen3 * fix mac-py3 compiler error * update C++17 * fix mac compiler error * fix compile error * fix coverage_compiler error * fix coverage_ci_problem * fix coverage_error * fix_kunlun200 compile error * fix kunlun200 compiler error * fix compile error * fix compiler error * fix py3 failed test * fix kunlun200 compiler error * test * fix test error * fix test error * fix test error * test * test * fix mac py3 error * fix mac py3 error * fix mac py3 error * fix test error * fix test error * fix compile error * fix compile error * fix compile error * test * test * fix compiler error * test * test * debug on ci * fix compiler error * fix compiler error * test * fix cinn compiler error * test * fix rocm compile error * fix cinn and kunlun compile error * update c++14 * Update flags.cmake
- 12 Apr, 2023 1 commit
Committed by zqw_1997
* slight modify * support cuda12+ arch, Hopper arch and discard 30 arch * add arch 90 for each paddle_known_gpu_archs12 * for comments
- 10 Apr, 2023 1 commit
Committed by risemeup1
- 29 Mar, 2023 1 commit
Committed by sneaxiy
* fix generate_kernels.py in CUDA 12.0 * fix attrs bug
- 24 Mar, 2023 1 commit
Committed by ZhangDY-6483
* first version, notest * return final rst, notest * use infinity() instead of max * ut structure * start up of ut * generate lse * update * add depense * reconstruct cmake * move file * add memory efficient attention and fix blasimpl * update * update cmake * add namespace * update cmake * use .cu * update for pad3d * bug fix * bug fix * update * bug fix * update enforce * add test case * merge the lse pad * fix kernel_fn of backward * fix PADDLE_ENFORCE_EQ and phi_api * fix PADDLE_ENFORCE * fix PADDLE_ENFORCE * rerun coverage * fix memory efficient attention test * rerun ci * add cuda version condition * add cuda version condition * delete WIP test * replace PADDLE_ENFORCE * edit the namespace of datatype in multiple.cc * rerun * rerun --------- Co-authored-by: Nliuyuang <liuyuang@baidu.com>
- 08 Mar, 2023 1 commit
Committed by pangyoki
- 10 Jan, 2023 1 commit
Committed by MarDino
- 27 Dec, 2022 1 commit
Committed by Yuanle Liu
- 08 Nov, 2022 1 commit
Committed by chalsliu
- 14 Jun, 2022 1 commit
Committed by Wilber
* cmake-lint * update
- 04 Jun, 2022 1 commit
Committed by Sing_chan
- 21 Apr, 2022 1 commit
Committed by JingZhuangzhuang
* update ampere sm * update ampere sm * update ampere sm
- 12 Apr, 2022 1 commit
Committed by Zhanlue Yang
- 22 Mar, 2022 1 commit
Committed by Zhanlue Yang
- 02 Mar, 2022 1 commit
Committed by Zhanlue Yang
* Adjust GPU Arches for Whl releases * Adjusted CUDA arches * fixed minor issue * adjusted gpu arches
- 11 Feb, 2022 1 commit
Committed by zhangchunle
- 31 Aug, 2021 1 commit
Committed by Zhanlue Yang
[Background] Growth in code size can be irreversible in the long run, leading to huge release packages that not only hamper user experience but also exceed PyPI's hard size limit. The NV_FATBIN section takes up 86% of the compiled dylib size, owing to the large number of GPU arches supported. This PR aims to prune NV_FATBIN.

[Solution] The new release strategy involves two types of whl packages:
Cubin PIP package: maintains a smaller window of GPU arch support, containing sm_60, sm_70, sm_75, sm_80 cubins, covering the Pascal through Ampere arches.
JIT release package: a backup for the Cubin PIP package, containing compute_35, compute_50, compute_60, compute_70, compute_75, compute_80, with best performance and GPU arch coverage. However, it takes around 10 min to install due to JIT compilation.

[How to use] The new release strategy is disabled by default. To compile for the Cubin PIP package, add this to cmake: -DCUBIN_RELEASE_PIP. To compile for the JIT release package, add this to cmake: -DJIT_RELEASE_WHL
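To illustrate the difference between the two package types described in this commit message, here is a minimal Python sketch that turns each arch list into nvcc `-gencode` flags. It is not Paddle's actual cmake logic: the function name `gencode_flags` and the mode strings `"cubin"` and `"jit"` are hypothetical, chosen only for this example. It assumes the usual nvcc convention that `code=sm_XX` embeds a precompiled cubin, while `code=compute_XX` embeds PTX that the driver JIT-compiles at load time.

```python
# Hypothetical illustration only; not taken from Paddle's cuda.cmake.
# Arch lists are copied from the commit message above.
CUBIN_ARCHS = [60, 70, 75, 80]          # Pascal through Ampere, shipped as cubins
JIT_ARCHS = [35, 50, 60, 70, 75, 80]    # shipped as PTX, compiled at install/load time


def gencode_flags(mode):
    """Return the nvcc -gencode flags for a release mode ("cubin" or "jit")."""
    if mode == "cubin":
        # code=sm_XX embeds machine code (cubin) for exactly that architecture.
        return [f"-gencode arch=compute_{a},code=sm_{a}" for a in CUBIN_ARCHS]
    if mode == "jit":
        # code=compute_XX embeds PTX only, which the driver JIT-compiles later.
        return [f"-gencode arch=compute_{a},code=compute_{a}" for a in JIT_ARCHS]
    raise ValueError(f"unknown release mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("cubin", "jit"):
        print(mode, " ".join(gencode_flags(mode)))
```

The cubin list is shorter and needs no compilation on the user's machine, which matches the smaller package size; the PTX-only list covers more arches at the cost of the JIT step the message mentions.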
- 14 Jul, 2021 1 commit
Committed by zhouweiwei2014
* Support sccache to speed up compilation on Windows * Support sccache to speed up compilation on Windows
- 06 Jul, 2021 1 commit
Committed by Zeng Jinle
* add gpu implementation of shuffle batch test=develop * add thrust cuda patches test=develop * fix macro guard * fix shuffle batch compile on windows/hip * fix hip compilation error * refine CMakeLists.txt * fix windows compile error * try to fix windows CI compilation error * fix windows compilation again * fix shuffle_batch op test on Windows
- 02 Jun, 2021 1 commit
Committed by Pei Yang
- 26 May, 2021 1 commit
Committed by Zhou Wei
* fix ninja compilation bug on windows * polish windows ci * polish windows ci
- 31 Mar, 2021 2 commits
Committed by tianshuo78520a
Committed by wuhuanzhou
* update compilation with C++14, test=develop * fix compilation error in eigen, test=develop
- 30 Mar, 2021 1 commit
Committed by Yiqun Liu
- 17 Mar, 2021 1 commit
Committed by Zhou Wei
- 19 Feb, 2021 1 commit
Committed by Wojciech Uss
* Modify relu native implementation * fix GPU performance
- 14 Jan, 2021 1 commit
Committed by Zhou Wei
- 27 Nov, 2020 1 commit
Committed by Shang Zhizhou
* remove -DSUPPORTS_CUDA_FP16 in cuda.cmake * compile with cuda9 * add some unittest * notest;test=coverage * add unittest for trt plugin swish && split * update ernie unittest * fix some error message * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter * fix compile error when CUDA_ARCH_NAME < Pascal * fix compile error * update unittest timeout * compile with cuda9 * update error msg * fix code style * add some comments * add define IF_CUDA_ARCH_SUPPORT_FP16 * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED
- 21 Oct, 2020 2 commits
- 18 Sep, 2020 1 commit
Committed by Pei Yang
- 09 Sep, 2020 1 commit
Committed by wangchaochaohu
- 07 Sep, 2020 1 commit
Committed by wangchaochaohu
- 20 Aug, 2020 1 commit
Committed by Zhou Wei
specify CUDA arch when detection fails
- 10 Aug, 2020 1 commit
Committed by Zhou Wei
* Fixed compile warning about incorrect compile options, fix paddle_build.bat * make paddle_build.bat safer
- 09 Jul, 2020 1 commit
Committed by Chen Weihang
- 16 Jun, 2020 1 commit
Committed by T8T9
- 10 Jun, 2020 1 commit
Committed by Zhou Wei
fix bug in CUDA_NVCC_FLAGS and CMAKE_CUDA_FLAGS
- 08 Jun, 2020 1 commit
Committed by T8T9
* add -DPADDLE_CUDA_BINVER. test=develop, test=win_gpu * nvcc will use add_compile_options, avoid using it if you don't want to pass arguments to nvcc. test=develop * test=develop, test=win_gpu
- 05 Jun, 2020 1 commit
Committed by T8T9
* support CUDA using cmake built-in way (#24395) * support CUDA using cmake built-in way. test=develop * test=develop * cmake_minimum_required 3.10 * test=develop