Created by: T8T9
Motivation: The build stage of CI is too slow, for the following reasons:
- Paddle currently uses `find_package(CUDA)` to introduce CUDA, which doesn't support ccache. Because old cmake versions don't support CUDA as a language, the ccache prefix can only be added to the `gcc` and `g++` commands. `find_package(CUDA)` relies on the `FindCUDA` module underneath. It generates a `.cu.o.Release.cmake` script for every `.cu` file, then calls `cmake -E <cmake script>` to compile the `.cu` files. cmake doesn't add the `ccache` prefix to a cmake command. To confirm this, you can check `paddle/fluid/operators/math/CMakeFiles/fc.dir/build.make` and `paddle/fluid/operators/math/CMakeFiles/fc.dir/fc_generated_fc.cu.o.Release.cmake` in your build directory. cmake 3.10 supports CUDA natively: it compiles `.cu` files by calling the `nvcc` command directly, just like `gcc` and `g++`. So, to cache CUDA object files with ccache, we need to use the cmake built-in way.
- cmake explicitly passes `-x cu` when compiling `.cu` files with the `nvcc` compiler, but ccache versions older than 3.7.9 cannot recognize this option. This bug was fixed in ccache 3.7.9. Paddle CI currently uses ccache 3.6, so we need to upgrade ccache to 3.7.9 to cache CUDA object files.
- Paddle passes compiler options (`-Wno-unused-function`, `-Werror`, etc.) to gcc/g++ by adding `-Xcompiler -Wno-unused-function -Xcompiler -Werror` flags to nvcc. Note that there is a space between `-Xcompiler` and `-Werror`, so ccache may treat them as two separate flags. The problem is that both `g++` and `nvcc` have a built-in `-Werror` option, and `-Werror` is a compiler option, so ccache removes it when preprocessing source files. In this case, ccache changes `-Xcompiler -Werror` to a bare `-Xcompiler`, which the preprocessor cannot recognize. Preprocessing then fails, and ccache fails with it. To fix this, we should use `-Xcompiler=-Werror` to tell ccache that these two flags are bound together and must be processed together.
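
As a rough illustration of the flag fix, the sketch below joins every space-separated `-Xcompiler <flag>` pair into the `-Xcompiler=<flag>` form. The `join_xcompiler` helper is hypothetical and is not part of Paddle's build scripts. It only shows the intended transformation on a flag string.

```shell
# Hypothetical helper (not from Paddle): rewrite "-Xcompiler <flag>" pairs
# as "-Xcompiler=<flag>" so that each pair becomes a single argument.
join_xcompiler() {
    # Replace "-Xcompiler" plus the whitespace after it with "-Xcompiler=".
    # Pairs that are already joined contain no whitespace and are left alone.
    printf '%s\n' "$1" | sed -E 's/-Xcompiler[[:space:]]+/-Xcompiler=/g'
}

join_xcompiler "-Xcompiler -Wno-unused-function -Xcompiler -Werror"
# prints: -Xcompiler=-Wno-unused-function -Xcompiler=-Werror
```

With the joined form there is no standalone `-Werror` argument left for ccache to strip, so `-Xcompiler` never ends up without its value.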
Changes:
- use `enable_language(CUDA)`, which is the cmake built-in way and supports ccache.
- upgrade ccache to 3.7.9.
- set ccache `max_size` to 20GB. After CUDA support is added, ccache consumes about 7GB of disk space to cache the object files of one clean build, but the default `max_size` is 5GB. With the default, automatic cleanup would run frequently and cached entries would be evicted in a cycle, which significantly increases cache misses. Moreover, Paddle CI builds many times for different PRs, so `max_size` should be well above 7GB to hold more object files and guarantee a high cache hit rate. I think 20GB is enough.
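
The changes listed above could be tried out in isolation roughly as follows. This is a minimal sketch and not Paddle's actual build setup: the helper names, the project and target names, and `kernels.cu` are hypothetical. It also assumes a cmake version whose `CMAKE_<LANG>_COMPILER_LAUNCHER` support covers CUDA, and a ccache version that reads `ccache.conf` from its cache directory.

```shell
# Hypothetical helper: write a tiny CMakeLists.txt that uses the built-in
# CUDA language support and asks cmake to prefix compilers with ccache.
write_demo_cmakelists() {
    mkdir -p "$1"
    # Quoted delimiter so the shell does not expand ${CCACHE_PATH}.
    cat > "$1/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.10)
project(ccache_cuda_demo LANGUAGES CXX)
enable_language(CUDA)  # cmake now calls nvcc directly for .cu files

find_program(CCACHE_PATH ccache)
if(CCACHE_PATH)
  # Assumption: this cmake version honors the launcher for CUDA too.
  set(CMAKE_CXX_COMPILER_LAUNCHER ${CCACHE_PATH})
  set(CMAKE_CUDA_COMPILER_LAUNCHER ${CCACHE_PATH})
endif()

# kernels.cu is a placeholder source file for this sketch.
add_library(demo_kernels STATIC kernels.cu)
EOF
}

# Hypothetical helper: persist max_size in <cache dir>/ccache.conf.
set_ccache_max_size() {
    cache_dir="$1"
    size="$2"
    conf="$cache_dir/ccache.conf"
    mkdir -p "$cache_dir"
    # Drop any earlier max_size line so the file holds exactly one.
    if [ -f "$conf" ]; then
        grep -v '^max_size' "$conf" > "$conf.tmp" || true
        mv "$conf.tmp" "$conf"
    fi
    printf 'max_size = %s\n' "$size" >> "$conf"
}
```

For example, `set_ccache_max_size "${CCACHE_DIR:-$HOME/.ccache}" 20G` would raise the limit for the default cache directory. Running `ccache -M 20G` sets the same limit from the command line, and `ccache -s` shows the current cache size and limit.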
Expected: The build stage should take about 11 minutes, down from about 55 minutes.