1. 15 Jun 2021: 1 commit
  2. 07 Jun 2021: 1 commit
  3. 31 May 2021: 1 commit
  4. 25 May 2021: 1 commit
    • Makefile: LTO: have linker check -Wframe-larger-than · 24845dcb
      Nick Desaulniers authored
      -Wframe-larger-than= requires stack frame information, which the
      frontend cannot provide. This diagnostic is emitted late during
      compilation once stack frame size is available.
      
      When building with LTO, the frontend simply lowers C to LLVM IR and does
      not have stack frame information, so it cannot emit this diagnostic.
      When the linker drives LTO, it restarts optimizations and lowers LLVM IR
      to object code. At that point, it has stack frame information but
      doesn't know to check for a specific max stack frame size.
      
      I consider this a bug in LLVM that we need to fix. There are some
      details we're still working out related to LTO, such as which value to
      use when multiple different values are specified per TU, or how to
      propagate these to compiler-synthesized routines properly, if at all.
      
      Until it's fixed, ensure we don't miss these. At that point, we can
      wrap this in a compiler version guard or revert this based on the
      minimum supported version of Clang.
      
      The error message is now generated during link:
        LTO     vmlinux.o
      ld.lld: warning: stack size limit exceeded (8224) in foobarbaz
      
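      As a rough illustration of the approach (a hedged sketch, not
      necessarily the exact hunk from this commit), the limit can be
      forwarded to the LTO backend through the linker in the top Makefile:
      
        ifdef CONFIG_LTO_CLANG
        # Sketch: ask the LTO code generator to warn when a function's stack
        # frame exceeds CONFIG_FRAME_WARN, mirroring -Wframe-larger-than=
        # for non-LTO builds.
        KBUILD_LDFLAGS += -plugin-opt=-warn-stack-size=$(CONFIG_FRAME_WARN)
        endif
      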
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Reported-by: Candle Sun <candlesea@gmail.com>
      Suggested-by: Fangrui Song <maskray@google.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210312010942.1546679-1-ndesaulniers@google.com
      24845dcb
  5. 24 May 2021: 1 commit
  6. 17 May 2021: 1 commit
  7. 10 May 2021: 1 commit
  8. 06 May 2021: 3 commits
  9. 03 May 2021: 1 commit
  10. 01 May 2021: 2 commits
  11. 26 Apr 2021: 1 commit
  12. 25 Apr 2021: 18 commits
    • kbuild: Add $(KBUILD_HOSTLDFLAGS) to 'has_libelf' test · f634ca65
      Nathan Chancellor authored
      Normally, invocations of $(HOSTCC) include $(KBUILD_HOSTLDFLAGS), which
      in turn includes $(HOSTLDFLAGS), which allows users to pass in their own
      flags when linking. However, the 'has_libelf' test does not, meaning
      that if a user requests a specific linker via HOSTLDFLAGS=-fuse-ld=...,
      it is not respected and the build might error.
      
      For example, if a user building with clang wants to use all of the LLVM
      tools without any GNU tools, they might remove all of the GNU tools from
      their system or PATH then build with
      
      $ make HOSTLDFLAGS=-fuse-ld=lld LLVM=1 LLVM_IAS=1
      
      which says to use all of the LLVM tools, the integrated assembler, and
      ld.lld for linking host executables. Without this change, the build will
      error because $(HOSTCC) uses its default linker, rather than the one
      requested via -fuse-ld=..., which in a default configuration of clang is
      GNU ld.
      
      error: Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please
      install libelf-dev, libelf-devel or elfutils-libelf-devel
      make[1]: *** [Makefile:1260: prepare-objtool] Error 1
      
      Add $(KBUILD_HOSTLDFLAGS) to the 'has_libelf' test so that the linker
      choice is respected.
      
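      As a rough sketch of the resulting test (paraphrased rather than the
      verbatim Makefile hunk), the host linker flags become part of the
      probe:
      
        # Sketch: probe for libelf with the same linker flags that real
        # $(HOSTCC) link steps use, so a user's -fuse-ld=... is honored.
        has_libelf := $(call try-run,\
                echo "int main() {}" | \
                $(HOSTCC) $(KBUILD_HOSTLDFLAGS) -xc -o /dev/null -lelf -,1,0)
      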
      Link: https://github.com/ClangBuiltLinux/linux/issues/479
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      f634ca65
    • kbuild: merge scripts/Makefile.modsign to scripts/Makefile.modinst · 961ab4a3
      Masahiro Yamada authored
      scripts/Makefile.modsign is a subset of scripts/Makefile.modinst,
      and duplicates the code. Let's merge them.
      
      By the way, you do not need to run 'make modules_sign' explicitly
      because modules are signed as part of 'make modules_install' when
      CONFIG_MODULE_SIG_ALL=y. If CONFIG_MODULE_SIG_ALL=n, mod_sign_cmd is
      set to 'true', so 'make modules_sign' is not functional.
      
      In my understanding, the reason for still keeping it is to handle
      corner cases like commit 64178cb6 ("builddeb: fix stripped module
      signatures if CONFIG_DEBUG_INFO and CONFIG_MODULE_SIG_ALL are set").
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      961ab4a3
    • kbuild: move module strip/compression code into scripts/Makefile.modinst · 65ce9c38
      Masahiro Yamada authored
      Both mod_strip_cmd and mod_compress_cmd are only used in
      scripts/Makefile.modinst, hence there is no good reason to define them
      in the top Makefile. Move the relevant code to scripts/Makefile.modinst.
      
      Also, show separate log messages for each of install, strip, sign, and
      compress.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      65ce9c38
    • kbuild: refactor scripts/Makefile.modinst · ccae4cfa
      Masahiro Yamada authored
      scripts/Makefile.modinst is ugly and weird in multiple ways; it
      specifies real files, $(modules), as phony targets, and makes directory
      manipulation needlessly complicated.
      
      Clean up the Makefile code, and show the full path of installed modules
      in the log.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      ccae4cfa
    • kbuild: rename extmod-prefix to extmod_prefix · 7f69180b
      Masahiro Yamada authored
      This seems to be useful in sub-make as well. In preparation for
      exporting it, rename extmod-prefix to extmod_prefix because exported
      variables cannot contain hyphens.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      7f69180b
    • kbuild: check module name conflict for external modules as well · 1a998be6
      Masahiro Yamada authored
      If there are multiple modules with the same name in the same external
      module tree, there is ambiguity about which one will be loaded, and
      very likely something odd is happening.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      1a998be6
    • kbuild: show the target directory for depmod log · 3ac42b21
      Masahiro Yamada authored
      It is clearer to show the directory which depmod will work on.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      3ac42b21
    • kbuild: unify modules(_install) for in-tree and external modules · 3e3005df
      Masahiro Yamada authored
      If you attempt to build or install modules ('make modules(_install)')
      with CONFIG_MODULES disabled, you will get a clear error message, but
      nothing for external module builds.
      
      Factor out the modules and modules_install rules into the common part,
      so you will get the same error message when you try to build external
      modules with CONFIG_MODULES=n.
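      
      A minimal sketch of the shared rule (in the usual kbuild style, not the
      verbatim hunk):
      
        # Sketch: with CONFIG_MODULES=n, both targets now fail with the same
        # message for in-tree and external module builds.
        ifndef CONFIG_MODULES
        modules modules_install:
        	@echo >&2 '*** The present kernel configuration has modules disabled.'
        	@echo >&2 '*** Enable CONFIG_MODULES to use the module feature.'
        	@exit 1
        endif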
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      3e3005df
    • kbuild: remove unneeded mkdir for external modules_install · 4b97ec0e
      Masahiro Yamada authored
      scripts/Makefile.modinst creates directories as needed.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      4b97ec0e
    • kbuild: generate Module.symvers only when vmlinux exists · 69bc8d38
      Masahiro Yamada authored
      The external module build shows the following warning if Module.symvers
      is missing in the kernel tree.
      
        WARNING: Symbol version dump "Module.symvers" is missing.
                 Modules may not have dependencies or modversions.
      
      I think this is an important heads-up because the resulting modules may
      not work as expected. This happens when you have not built the entire
      kernel tree, for example, if you prepared only the minimal setup for
      external modules with 'make defconfig && make modules_prepare'.
      
      A problem is that 'make modules' creates Module.symvers even without
      vmlinux. In this case, that warning is suppressed since Module.symvers
      already exists in spite of its incomplete content.
      
      The incomplete (i.e. invalid) Module.symvers should not be created.
      
      This commit changes the second pass of modpost to dump symbols into
      modules-only.symvers. The final Module.symvers is created by
      concatenating vmlinux.symvers and modules-only.symvers if both exist.
      
      Module.symvers is supposed to collect symbols from both vmlinux and
      modules. It might be a bit confusing, and I am not quite sure if it
      is an official interface, but presumably it is difficult to rename it
      because some tools (e.g. kmod) parse it.
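      
      A minimal sketch of the final step (using the common kbuild cat idiom,
      not quoting the exact rule; modules-only.symvers is the new
      intermediate file named above):
      
        # Sketch: build Module.symvers from whichever pieces exist;
        # vmlinux.symvers is absent when vmlinux was not built.
        quiet_cmd_cat_symvers = GEN     $@
              cmd_cat_symvers = cat $(real-prereqs) > $@
      
        Module.symvers: $(wildcard vmlinux.symvers) modules-only.symvers FORCE
        	$(call if_changed,cat_symvers)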
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      69bc8d38
    • kbuild: check the minimum assembler version in Kconfig · ba64beb1
      Masahiro Yamada authored
      Documentation/process/changes.rst defines the minimum assembler version
      (binutils version), but we have never checked it in the build time.
      
      Kbuild never invokes 'as' directly because all assembly files in the
      kernel tree are *.S, and hence must be preprocessed. I do not expect
      raw assembly source files (*.s) to be added to the kernel tree.
      
      Therefore, we always use $(CC) as the assembler driver, and commit
      aa824e0c ("kbuild: remove AS variable") removed 'AS'. However, we are
      still interested in the version of the assembler working behind it.
      
      As usual, the --version option prints the version string.
      
        $ as --version | head -n 1
        GNU assembler (GNU Binutils for Ubuntu) 2.35.1
      
      But we do not have $(AS), so we can add the -Wa prefix so that $(CC)
      passes --version down to the backing assembler.
      
        $ gcc -Wa,--version | head -n 1
        gcc: fatal error: no input files
        compilation terminated.
      
      OK, we need to input something to satisfy gcc.
      
        $ gcc -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1
        GNU assembler (GNU Binutils for Ubuntu) 2.35.1
      
      The combination of Clang and GNU assembler works in the same way:
      
        $ clang -no-integrated-as -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1
        GNU assembler (GNU Binutils for Ubuntu) 2.35.1
      
      Clang with the integrated assembler fails like this:
      
        $ clang -integrated-as -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1
        clang: error: unsupported argument '--version' to option 'Wa,'
      
      For the last case, checking the error message is fragile. If the
      proposal for -Wa,--version support [1] is accepted, this may not even
      be an error in the future.
      
      One easy way is to check if -integrated-as is present in the passed
      arguments. We did not pass -integrated-as to CLANG_FLAGS before, but
      we can make it explicit.
      
      Nathan pointed out that -integrated-as is the default for all of the
      architectures/targets that the kernel cares about, but being explicit
      goes along with the "explicit is better than implicit" policy. [2]
      
      With all this in mind, I implemented scripts/as-version.sh to check
      the assembler version at Kconfig time.
      
        $ scripts/as-version.sh gcc
        GNU 23501
        $ scripts/as-version.sh clang -no-integrated-as
        GNU 23501
        $ scripts/as-version.sh clang -integrated-as
        LLVM 0
      
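      The core of such a script can be sketched roughly like this (a
      simplified illustration, not the actual scripts/as-version.sh):
      
        #!/bin/sh
        # Sketch: report the assembler behind "$@", e.g. "gcc" or
        # "clang -integrated-as". The real script also canonicalizes the
        # version string (2.35.1 -> 23501).
        ver=$("$@" -Wa,--version -c -x assembler /dev/null -o /dev/null 2>/dev/null |
                head -n 1)
        case "$ver" in
        "GNU assembler"*)
                echo GNU "${ver##* }" ;;
        *)
                echo LLVM 0 ;;
        esac
      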
      [1]: https://github.com/ClangBuiltLinux/linux/issues/1320
      [2]: https://lore.kernel.org/linux-kbuild/20210307044253.v3h47ucq6ng25iay@archlinux-ax161/
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      ba64beb1
    • kbuild: replace sed with $(subst ) or $(patsubst ) · 6e0839fd
      Masahiro Yamada authored
      For simple text replacement, it is better to use a built-in function
      instead of sed if possible. You save forking one process.
      
      I do not mean to replace all sed invocations, because GNU Make itself
      does not support regular expressions (unless you use Guile).
      
      I just replaced the simple ones.
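      
      For instance (an illustrative example, not a hunk from this commit), a
      substitution such as
      
        old := $(shell echo $(KBUILD_IMAGE) | sed 's/\.gz$$//')
      
      can be expressed with a built-in, avoiding the extra processes:
      
        new := $(patsubst %.gz,%,$(KBUILD_IMAGE))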
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      6e0839fd
    • Makefile: Only specify '--prefix=' when building with clang + GNU as · eec08090
      Nathan Chancellor authored
      When building with LLVM_IAS=1, there is no point in specifying
      '--prefix=' because that flag is only used to find GNU cross tools,
      which are not needed when using the integrated assembler. All of the
      tools are invoked directly from PATH or via a full path specified on
      the command line, which does not depend on the value of '--prefix='.
      
      Sharing commands to reproduce issues becomes a little bit easier
      without a '--prefix=' value because that value is specific to a
      user's machine, since it is an absolute path.
      
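      A rough sketch of the resulting logic (paraphrased, not the verbatim
      Makefile hunk; the elfedit-based prefix search is described further
      below):
      
        ifneq ($(LLVM_IAS),1)
        # Sketch: GNU as (and objcopy) are only spawned when the integrated
        # assembler is disabled, so only then is --prefix= needed to point
        # clang at the GNU cross tools.
        CLANG_FLAGS += -no-integrated-as
        GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
        CLANG_FLAGS += --prefix=$(GCC_TOOLCHAIN_DIR)$(notdir $(CROSS_COMPILE))
        else
        CLANG_FLAGS += -integrated-as
        endif
      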
      Some further notes from Fangrui Song:
      
        clang can spawn GNU as (if -f?no-integrated-as is specified) and GNU
        objcopy (-f?no-integrated-as and -gsplit-dwarf and -g[123]).
        objcopy is only used for GNU as assembled object files.
        With integrated assembler, the object file streamer creates .o and
        .dwo simultaneously.
        With GNU as, two objcopy commands are needed: one to extract the
        .debug*.dwo sections into .dwo files and another to remove the
        .debug*.dwo sections.
      
      A small consequence of this change (to keep things simple) is that
      '--prefix=' will always be specified now, even with a native build, when
      it was not before. This should not be an issue due to the way that the
      Makefile searches for the prefix (based on elfedit's location). This
      ends up improving the experience for host builds because PATH is better
      respected and matches GCC's behavior more closely. See the below thread
      for more details:
      
      https://lore.kernel.org/r/20210205213651.GA16907@Ryzen-5-4500U.localdomain/
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      eec08090
    • Makefile: Remove '--gcc-toolchain' flag · c91d4e47
      Nathan Chancellor authored
      This flag was originally added to allow clang to find the GNU cross
      tools in commit 785f11aa ("kbuild: Add better clang cross build
      support"). This flag was not enough to find the tools at times so
      '--prefix' was added to the list in commit ef8c4ed9 ("kbuild: allow
      to use GCC toolchain not in Clang search path") and improved upon in
      commit ca9b31f6 ("Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang
      cross compilation"). Now that '--prefix' specifies a full path and
      prefix, '--gcc-toolchain' serves no purpose because the kernel builds
      with '-nostdinc' and '-nostdlib'.
      
      This has been verified with self compiled LLVM 10.0.1 and LLVM 13.0.0 as
      well as a distribution version of LLVM 11.1.0 without binutils in the
      LLVM toolchain locations.
      
      Link: https://reviews.llvm.org/D97902
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Fangrui Song <maskray@google.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Tested-by: Nick Desaulniers <ndesaulniers@google.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      c91d4e47
    • kbuild: apply fixdep logic to link-vmlinux.sh · 0b956e20
      Rasmus Villemoes authored
      The patch adding CONFIG_VMLINUX_MAP revealed a small defect in the
      build system: link-vmlinux.sh takes decisions based on CONFIG_*
      options, but changing one of those does not always lead to vmlinux
      being linked again.
      
      For most of the CONFIG_* knobs referenced previously, this has
      probably been hidden by those knobs also affecting some object file,
      hence indirectly also vmlinux.
      
      But CONFIG_VMLINUX_MAP is only handled inside link-vmlinux.sh, and
      changing CONFIG_VMLINUX_MAP=n to CONFIG_VMLINUX_MAP=y does not cause
      the build system to re-link (and hence have vmlinux.map
      emitted). Since that map file is mostly a debugging aid, this is
      merely a nuisance which is easily worked around by just deleting
      vmlinux and building again.
      
      But one could imagine other (possibly future) CONFIG options that
      actually do affect the vmlinux binary but which are not captured
      through some object file dependency.
      
      To fix this, make link-vmlinux.sh emit a .vmlinux.d file in the same
      format as the dependency files generated by gcc, and apply the fixdep
      logic to that. I've tested that this correctly works with both in-tree
      and out-of-tree builds.
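      
      A minimal sketch of the idea (assuming fixdep's usual behavior of
      scanning listed prerequisites for CONFIG_ references; not the verbatim
      change):
      
        # In link-vmlinux.sh (sketch): list the script itself as a
        # prerequisite of vmlinux in a gcc-style dependency file, so that
        # fixdep turns the CONFIG_* symbols referenced by the script into
        # include/config/* dependencies.
        echo "vmlinux: $0" > .vmlinux.d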
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      0b956e20
    • kbuild: show warning if 'make headers_check' is used · 609bbb4d
      Masahiro Yamada authored
      Since commit 7ecaf069 ("kbuild: move headers_check rule to
      usr/include/Makefile"), 'make headers_check' is a no-op.
      
      This stub target remains here in case some scripts still invoke it.
      In order to prompt people to remove stale code, show a noisy warning
      message if it is used. The stub will really be removed after the
      Linux 5.15 release.
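      
      A minimal sketch of such a stub (not the verbatim hunk):
      
        PHONY += headers_check
        headers_check: headers
        	@echo >&2 "WARNING: 'make headers_check' is a no-op and will be"
        	@echo >&2 "removed after Linux 5.15. Please adjust your scripts."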
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      609bbb4d
    • kbuild: include Makefile.compiler only when compiler is needed · 805b2e1d
      Masahiro Yamada authored
      Since commit f2f02ebd ("kbuild: improve cc-option to clean up all
      temporary files"), running 'make kernelversion' in a read-only source
      tree emits a bunch of warnings:
      
        mkdir: cannot create directory '.tmp_12345': Permission denied
      
      No-build targets such as kernelversion, clean, help, etc. do not need
      to evaluate $(call cc-option,) or friends. Skip Makefile.compiler so
      that $(call cc-option,) becomes a no-op.
      
      This not only fixes the warnings, but also makes non-build targets run
      much faster.
      
      Basically, all installation targets should also be non-build targets.
      Unfortunately, vdso_install requires the compiler because it builds
      vdso before installation. This is a problem that must be fixed by a
      separate patch.
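      
      A rough sketch of the mechanism (the guard variable name here is a
      plausible stand-in, not necessarily the one used):
      
        # Sketch: only pull in the cc-option machinery when a build target
        # was requested; otherwise cc-option, ld-option and friends expand
        # to nothing.
        ifdef need-compiler
        include $(srctree)/scripts/Makefile.compiler
        endif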
      Reported-by: Israel Tsadok <itsadok@gmail.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      805b2e1d
    • kbuild: split cc-option and friends to scripts/Makefile.compiler · 57fd251c
      Masahiro Yamada authored
      scripts/Kbuild.include is included everywhere, but macros such as
      cc-option are needed by build targets only.
      
      For example, when 'make clean' traverses the tree, it does not need
      to evaluate $(call cc-option,).
      
      Split cc-option, ld-option, etc. to scripts/Makefile.compiler, which
      is only included from the top Makefile and scripts/Makefile.build.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      57fd251c
  13. 19 Apr 2021: 1 commit
  14. 14 Apr 2021: 2 commits
  15. 12 Apr 2021: 1 commit
  16. 09 Apr 2021: 2 commits
    • keys: cleanup build time module signing keys · b31f2a49
      Nayna Jain authored
      The "mrproper" target is still looking for build-time generated keys
      in the kernel root directory instead of the certs directory. Fix the
      path and remove the names of the files which are no longer generated.
      
      Fixes: cfc411e7 ("Move certificate handling to its own directory")
      Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
      Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
      b31f2a49
    • add support for Clang CFI · cf68fffb
      Sami Tolvanen authored
      This change adds support for Clang’s forward-edge Control Flow
      Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler
      injects a runtime check before each indirect function call to ensure
      the target is a valid function with the correct static type. This
      restricts possible call targets and makes it more difficult for
      an attacker to exploit bugs that allow the modification of stored
      function pointers. For more details, see:
      
        https://clang.llvm.org/docs/ControlFlowIntegrity.html
      
      Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain
      visibility to possible call targets. Kernel modules are supported
      with Clang’s cross-DSO CFI mode, which allows checking between
      independently compiled components.
      
      With CFI enabled, the compiler injects a __cfi_check() function into
      the kernel and each module for validating local call targets. For
      cross-module calls that cannot be validated locally, the compiler
      calls the global __cfi_slowpath_diag() function, which determines
      the target module and calls the correct __cfi_check() function. This
      patch includes a slowpath implementation that uses __module_address()
      to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a
      shadow map that speeds up module look-ups by ~3x.
      
      Clang implements indirect call checking using jump tables and
      offers two methods of generating them. With canonical jump tables,
      the compiler renames each address-taken function to <function>.cfi
      and points the original symbol to a jump table entry, which passes
      __cfi_check() validation. This isn’t compatible with stand-alone
      assembly code, which the compiler doesn’t instrument, and would cause
      indirect calls to assembly code to fail. Therefore, we
      default to using non-canonical jump tables instead, where the compiler
      generates a local jump table entry <function>.cfi_jt for each
      address-taken function, and replaces all references to the function
      with the address of the jump table entry.
      
      Note that because non-canonical jump table addresses are local
      to each component, they break cross-module function address
      equality. Specifically, the address of a global function will be
      different in each module, as it's replaced with the address of a local
      jump table entry. If this address is passed to a different module,
      it won’t match the address of the same function taken there. This
      may break code that relies on comparing addresses passed from other
      components.
      
      CFI checking can be disabled in a function with the __nocfi attribute.
      Additionally, CFI can be disabled for an entire compilation unit by
      filtering out CC_FLAGS_CFI.
      
      By default, CFI failures result in a kernel panic to stop a potential
      exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the
      kernel prints out a rate-limited warning instead, and allows execution
      to continue. This option is helpful for locating type mismatches, but
      should only be enabled during development.
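      
      As a small illustration of the opt-out mechanism (a sketch; __nocfi and
      CC_FLAGS_CFI are described above, the function and its use are made
      up):
      
        /*
         * Indirect calls made from a __nocfi function are not instrumented,
         * which is needed when jumping to code the compiler cannot see,
         * such as firmware entry points.
         */
        static void __nocfi call_untyped_entry(void (*entry)(void))
        {
        	entry();
        }
      
      An entire object can likewise be excluded by filtering the flags out
      of its compilation, e.g. CFLAGS_REMOVE_foo.o += $(CC_FLAGS_CFI) in the
      relevant Makefile (foo.o being a placeholder).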
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com
      cf68fffb
  17. 08 Apr 2021: 1 commit
    • stack: Optionally randomize kernel stack offset each syscall · 39218ff4
      Kees Cook authored
      This provides the ability for architectures to enable kernel stack base
      address offset randomization. This feature is controlled by the boot
      param "randomize_kstack_offset=on/off", with its default value set by
      CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
      
      This feature is based on the original idea from the last public release
      of PaX's RANDKSTACK feature: https://pax.grsecurity.net/docs/randkstack.txt
      All the credit for the original idea goes to the PaX team. Note that
      the design and implementation of this upstream randomize_kstack_offset
      feature differs greatly from the RANDKSTACK feature (see below).
      
      Reasoning for the feature:
      
      This feature aims to make the various stack-based attacks that rely on
      deterministic stack structure harder. We have had many such attacks in
      the past (just to name a few):
      
      https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf
      https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf
      https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
      
      As Linux kernel stack protections have been constantly improving
      (vmap-based stack allocation with guard pages, removal of thread_info,
      STACKLEAK), attackers have had to find new ways for their exploits
      to work. They have done so, continuing to rely on the kernel's stack
      determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT
      were not relevant. For example, the following recent attacks would have
      been hampered if the stack offset was non-deterministic between syscalls:
      
      https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
      (page 70: targeting the pt_regs copy with linear stack overflow)
      
      https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
      (leaked stack address from one syscall as a target during next syscall)
      
      The main idea is that since the stack offset is randomized on each system
      call, it is harder for an attack to reliably land in any particular place
      on the thread stack, even with address exposures, as the stack base will
      change on the next syscall. Also, since randomization is performed after
      placing pt_regs, the ptrace-based approach[1] to discover the randomized
      offset during a long-running syscall should not be possible.
      
      Design description:
      
      During most of the kernel's execution, it runs on the "thread stack",
      which is pretty deterministic in its structure: it is fixed in size,
      and on every entry from userspace to kernel on a syscall the thread
      stack starts construction from an address fetched from the per-cpu
      cpu_current_top_of_stack variable. The first element to be pushed to the
      thread stack is the pt_regs struct that stores all required CPU registers
      and syscall parameters. Finally the specific syscall function is called,
      with the stack being used as the kernel executes the resulting request.
      
      The goal of randomize_kstack_offset feature is to add a random offset
      after the pt_regs has been pushed to the stack and before the rest of the
      thread stack is used during the syscall processing, and to change it every
      time a process issues a syscall. The source of randomness is currently
      architecture-defined (but x86 is using the low byte of rdtsc()). Future
      improvements for different entropy sources are possible, but out of
      scope for this patch. Furthermore, to add more unpredictability, new offsets
      are chosen at the end of syscalls (the timing of which should be less
      easy to measure from userspace than at syscall entry time), and stored
      in a per-CPU variable, so that the life of the value does not stay
      explicitly tied to a single task.
      
      As suggested by Andy Lutomirski, the offset is added using alloca()
      and an empty asm() statement with an output constraint, since it avoids
      changes to assembly syscall entry code, to the unwinder, and provides
      correct stack alignment as defined by the compiler.
      
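      A condensed sketch of that trick (the helper and variable names here
      are plausible stand-ins following the description above, not quoted
      from the actual header):
      
        /* Consume a small pseudo-random amount of stack on syscall entry.
         * The empty asm() referencing the pointer keeps the compiler from
         * optimizing the alloca() away.
         */
        #define add_random_kstack_offset() do {				\
        	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
        				&randomize_kstack_offset)) {	\
        		u32 offset = raw_cpu_read(kstack_offset);	\
        		u8 *ptr = __builtin_alloca(offset & 0x3ff);	\
        		asm volatile("" : : "r"(ptr) : "memory");	\
        	}							\
        } while (0)
      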
      In order to make this available by default with zero performance impact
      for those that don't want it, it is boot-time selectable with static
      branches. This way, if the overhead is not wanted, it can just be
      left turned off with no performance impact.
      
      The generated assembly for x86_64 with GCC looks like this:
      
      ...
      ffffffff81003977: 65 8b 05 02 ea 00 7f  mov %gs:0x7f00ea02(%rip),%eax
      					    # 12380 <kstack_offset>
      ffffffff8100397e: 25 ff 03 00 00        and $0x3ff,%eax
      ffffffff81003983: 48 83 c0 0f           add $0xf,%rax
      ffffffff81003987: 25 f8 07 00 00        and $0x7f8,%eax
      ffffffff8100398c: 48 29 c4              sub %rax,%rsp
      ffffffff8100398f: 48 8d 44 24 0f        lea 0xf(%rsp),%rax
      ffffffff81003994: 48 83 e0 f0           and $0xfffffffffffffff0,%rax
      ...
      
      As a result of the above stack alignment, this patch introduces about
      5 bits of randomness after pt_regs is spilled to the thread stack on
      x86_64, and 6 bits on x86_32 (since it has 1 fewer bit required for
      stack alignment). The amount of entropy could be adjusted based on how
      much of the stack space we wish to trade for security.
      
      My measure of syscall performance overhead (on x86_64):
      
      lmbench: /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_syscall -N 10000 null
          randomize_kstack_offset=y	Simple syscall: 0.7082 microseconds
          randomize_kstack_offset=n	Simple syscall: 0.7016 microseconds
      
      So, roughly 0.9% overhead growth for a no-op syscall, which is very
      manageable. And for people that don't want this, it's off by default.
      
      There are two gotchas with using the alloca() trick. First,
      compilers that have Stack Clash protection (-fstack-clash-protection)
      enabled by default (e.g. Ubuntu[3]) add pagesize stack probes to
      any dynamic stack allocations. While the randomization offset is
      always less than a page, the resulting assembly would still contain
      (unreachable!) probing routines, bloating the resulting assembly. To
      avoid this, -fno-stack-clash-protection is unconditionally added to
      the kernel Makefile since this is the only dynamic stack allocation in
      the kernel (now that VLAs have been removed) and it is provably safe
      from Stack Clash style attacks.
      
      The second gotcha with alloca() is a negative interaction with
      -fstack-protector*, in that it sees the alloca() as an array allocation,
      which triggers the unconditional addition of the stack canary function
      pre/post-amble which slows down syscalls regardless of the static
      branch. In order to avoid adding this unneeded check and its associated
      performance impact, architectures need to carefully remove uses of
      -fstack-protector-strong (or -fstack-protector) in the compilation units
      that use the add_random_kstack() macro and to audit the resulting stack
      mitigation coverage (to make sure no desired coverage disappears). No
      change is visible for this on x86 because the stack protector is already
      unconditionally disabled for the compilation unit, but the change is
      required on arm64. There is, unfortunately, no attribute that can be
      used to disable stack protector for specific functions.
      
      Comparison to PaX RANDKSTACK feature:
      
      The RANDKSTACK feature randomizes the location of the stack start
      (cpu_current_top_of_stack), i.e. including the location of pt_regs
      structure itself on the stack. Initially this patch followed the same
      approach, but during the recent discussions[2], it has been determined
      to be of little value since, if ptrace functionality is available for
      an attacker, they can use PTRACE_PEEKUSR/PTRACE_POKEUSR to read/write
      different offsets in the pt_regs struct, observe the cache behavior of
      the pt_regs accesses, and figure out the random stack offset. Another
      difference is that the random offset is stored in a per-cpu variable,
      rather than having it be per-thread. As a result, these implementations
      differ a fair bit in their implementation details and results, though
      obviously the intent is similar.
      
      [1] https://lore.kernel.org/kernel-hardening/2236FBA76BA1254E88B949DDB74E612BA4BC57C1@IRSMSX102.ger.corp.intel.com/
      [2] https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/
      [3] https://lists.ubuntu.com/archives/ubuntu-devel/2019-June/040741.html
      Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20210401232347.2791257-4-keescook@chromium.org
      39218ff4
  18. 05 Apr 2021: 1 commit