  1. 10 March 2023, 1 commit
    • module: replace module_layout with module_memory · ac3b4328
      Committed by Song Liu
      module_layout manages different types of memory (text, data, rodata, etc.)
      in one allocation, which is problematic for several reasons:
      
      1. It is hard to enable CONFIG_STRICT_MODULE_RWX.
      2. It is hard to use huge pages in modules (and not break strict rwx).
      3. Many archs use module_layout for arch-specific data, but it is not
         obvious how these data are used (are they RO, RX, or RW?)
      
      Improve the scenario by replacing 2 (or 3) module_layout per module with
      up to 7 module_memory per module:
      
              MOD_TEXT,
              MOD_DATA,
              MOD_RODATA,
              MOD_RO_AFTER_INIT,
              MOD_INIT_TEXT,
              MOD_INIT_DATA,
              MOD_INIT_RODATA,
      
      and allocating them separately. This adds slightly more entries to
      mod_tree (from up to 3 entries per module, to up to 7 entries per
      module). However, this at most adds a small constant overhead to
      __module_address(), which is expected to be fast.
      
      Various archs use module_layout for different data. These data are put
      into different module_memory based on their location in module_layout.
      IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
      data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.
      
      module_memory simplifies quite a bit of the module code. For example,
      ARCH_WANTS_MODULES_DATA_IN_VMALLOC is a lot cleaner, as it just uses a
      different allocator for the data. kernel/module/strict_rwx.c is also
      much cleaner with module_memory.
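
      As a rough sketch of the layout described above (simplified, not the
      exact upstream definitions; field names are illustrative), the
      per-type bookkeeping could look something like this:

        enum mod_mem_type {
                MOD_TEXT = 0,
                MOD_DATA,
                MOD_RODATA,
                MOD_RO_AFTER_INIT,
                MOD_INIT_TEXT,
                MOD_INIT_DATA,
                MOD_INIT_RODATA,
                MOD_MEM_NUM_TYPES,
        };

        /* One separately allocated region per memory type. */
        struct module_memory {
                void *base;             /* start of this type's allocation */
                unsigned int size;
        };

        /* Inside struct module, module_layout is replaced by roughly: */
        /*      struct module_memory mem[MOD_MEM_NUM_TYPES]; */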
      Signed-off-by: Song Liu <song@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
  2. 07 February 2023, 1 commit
  3. 20 January 2023, 1 commit
  4. 12 November 2022, 1 commit
  5. 26 October 2022, 1 commit
  6. 27 September 2022, 2 commits
  7. 12 July 2022, 2 commits
  8. 13 May 2022, 1 commit
  9. 05 April 2022, 3 commits
  10. 12 January 2022, 1 commit
  11. 14 December 2021, 1 commit
  12. 19 July 2021, 1 commit
    • printk: Userspace format indexing support · 33701557
      Committed by Chris Down
      We have a number of systems industry-wide that have a subset of their
      functionality that works as follows:
      
      1. Receive a message from local kmsg, serial console, or netconsole;
      2. Apply a set of rules to classify the message;
      3. Do something based on this classification (like scheduling a
         remediation for the machine), rinse, and repeat.
      
      As a couple of examples of places where we have this implemented inside
      Facebook (although this isn't a Facebook-specific problem), we use it in
      our netconsole processing (for alarm classification) and as part
      of our machine health checking. We use these messages to determine
      fairly important metrics around production health, and it's important
      that we get them right.
      
      While for some kinds of issues we have counters, tracepoints, or metrics
      with a stable interface which can reliably indicate the issue, in order
      to react to production issues quickly we need to work with the interface
      which most kernel developers naturally use when developing: printk.
      
      Most production issues come from unexpected phenomena, and as such
      usually the code in question doesn't have easily usable tracepoints or
      other counters available for the specific problem being mitigated. We
      have a number of lines of monitoring defence against problems in
      production (host metrics, process metrics, service metrics, etc), and
      where it's not feasible to reliably monitor at another level, this kind
      of pragmatic netconsole monitoring is essential.
      
      As one would expect, monitoring using printk is rather brittle for a
      number of reasons -- most notably that the message might disappear
      entirely in a new version of the kernel, or that the message may change
      in some way that the regex or other classification methods start to
      silently fail.
      
      One factor that makes this even harder is that, under normal operation,
      many of these messages are never expected to be hit. For example, there
      may be a rare hardware bug which one wants to detect if it was to ever
      happen again, but its recurrence is not likely or anticipated. This
      precludes using something like checking whether the printk in question
      was printed somewhere fleetwide recently to determine whether the
      message in question is still present or not, since we don't anticipate
      that it should be printed anywhere, but still need to monitor for its
      future presence in the long-term.
      
      This class of issue has happened on a number of occasions, causing
      unhealthy machines with hardware issues to remain in production for
      longer than ideal. As a recent example, some monitoring around
      blk_update_request fell out of date and caused semi-broken machines to
      remain in production for longer than would be desirable.
      
      Searching through the codebase to find the message is also extremely
      fragile, because many of the messages are further constructed beyond
      their callsite (eg. btrfs_printk and other module-specific wrappers,
      each with their own functionality). Even if they aren't, guessing the
      format and formulation of the underlying message based on the aesthetics
      of the message emitted is not a recipe for success at scale, and our
      previous issues with fleetwide machine health checking demonstrate as
      much.
      
      This provides a solution to the issue of silently changed or deleted
      printks: we record pointers to all printk format strings known at
      compile time into a new .printk_index section, both in vmlinux and
      modules. At runtime, this can then be iterated by looking at
      <debugfs>/printk/index/<module>, which emits the following format, both
      readable by humans and able to be parsed by machines:
      
          $ head -1 vmlinux; shuf -n 5 vmlinux
          # <level[,flags]> filename:line function "format"
          <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
          <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
          <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
          <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
          <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
      
      This mitigates the majority of cases where we have a highly-specific
      printk which we want to match on, as we can now enumerate and check
      whether the format changed or the printk callsite disappeared entirely
      in userspace. This allows us to catch changes to printks we monitor
      earlier and decide what to do about it before it becomes problematic.
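
      As a minimal userspace sketch of the monitoring idea above (the index
      path follows the debugfs layout described here, and the monitored
      format string is made up for illustration):

        /* Return 0 if the monitored printk format is still in the index. */
        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
                const char *needle = "Device %s is misaligned"; /* format we alarm on */
                char line[1024];
                FILE *f = fopen("/sys/kernel/debug/printk/index/vmlinux", "r");

                if (!f)
                        return 1;
                while (fgets(line, sizeof(line), f)) {
                        if (strstr(line, needle)) {
                                fclose(f);
                                return 0;       /* format still present */
                        }
                }
                fclose(f);
                return 2;       /* format gone or changed: update the classifier */
        }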
      
      There is no additional runtime cost for printk callers or printk itself,
      and the assembly generated is exactly the same.
      Signed-off-by: Chris Down <chris@chrisdown.name>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Tested-by: Petr Mladek <pmladek@suse.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
  13. 09 July 2021, 1 commit
    • module: add printk formats to add module build ID to stacktraces · 9294523e
      Committed by Stephen Boyd
      Let's make kernel stacktraces easier to identify by including the build
      ID[1] of a module if the stacktrace is printing a symbol from a module.
      This makes it simpler for developers to locate a kernel module's full
      debuginfo for a particular stacktrace.  Combined with
      scripts/decode_stacktrace.sh, a developer can download the matching
      debuginfo from a debuginfod[2] server and find the exact file and line
      number for the functions plus offsets in a stacktrace that match the
      module.  This is especially useful for pstore crash debugging where the
      kernel crashes are recorded in something like console-ramoops and the
      recovery kernel/modules are different or the debuginfo doesn't exist on
      the device due to space concerns (the debuginfo can be too large for space
      limited devices).
      
      Originally, I put this on the %pS format, but that was quickly rejected
      given that %pS is used in other places such as ftrace where build IDs
      aren't meaningful.  There were some discussions on the list about putting every
      module build ID into the "Modules linked in:" section of the stacktrace
      message but that quickly becomes very hard to read once you have more than
      three or four modules linked in.  It also provides too much information
      when we don't expect each module to be traversed in a stacktrace.  Having
      the build ID for modules that aren't important just makes things messy.
      Splitting it to multiple lines for each module quickly explodes the number
      of lines printed in an oops too, possibly wrapping the warning off the
      console.  And finally, trying to stash away each module used in a
      callstack to provide the ID of each symbol printed is cumbersome and would
      require changes to each architecture to stash away modules and return
      their build IDs once unwinding has completed.
      
      Instead, we opt for the simpler approach of introducing new printk formats
      '%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb'
      for "pointer backtrace with module build ID" and then updating the few
      places in the architecture layer where the stacktrace is printed to use
      this new format.
      
      Before:
      
       Call trace:
        lkdtm_WARNING+0x28/0x30 [lkdtm]
        direct_entry+0x16c/0x1b4 [lkdtm]
        full_proxy_write+0x74/0xa4
        vfs_write+0xec/0x2e8
      
      After:
      
       Call trace:
        lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
        direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
        full_proxy_write+0x74/0xa4
        vfs_write+0xec/0x2e8
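
      A minimal sketch of how a caller could use the new specifier (the
      helper below is illustrative, not part of this patch):

        /* Print one frame with the module build ID appended, via %pSb. */
        static void print_frame_with_buildid(unsigned long pc)
        {
                printk(KERN_DEFAULT " %pSb\n", (void *)pc);
        }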
      
      [akpm@linux-foundation.org: fix build with CONFIG_MODULES=n, tweak code layout]
      [rdunlap@infradead.org: fix build when CONFIG_MODULES is not set]
        Link: https://lkml.kernel.org/r/20210513171510.20328-1-rdunlap@infradead.org
      [akpm@linux-foundation.org: make kallsyms_lookup_buildid() static]
      [cuibixuan@huawei.com: fix build error when CONFIG_SYSFS is disabled]
        Link: https://lkml.kernel.org/r/20210525105049.34804-1-cuibixuan@huawei.com
      
      Link: https://lkml.kernel.org/r/20210511003845.2429846-6-swboyd@chromium.org
      Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
      Link: https://sourceware.org/elfutils/Debuginfod.html [2]
      Signed-off-by: Stephen Boyd <swboyd@chromium.org>
      Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Evan Green <evgreen@chromium.org>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 09 April 2021, 1 commit
    • add support for Clang CFI · cf68fffb
      Committed by Sami Tolvanen
      This change adds support for Clang’s forward-edge Control Flow
      Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler
      injects a runtime check before each indirect function call to ensure
      the target is a valid function with the correct static type. This
      restricts possible call targets and makes it more difficult for
      an attacker to exploit bugs that allow the modification of stored
      function pointers. For more details, see:
      
        https://clang.llvm.org/docs/ControlFlowIntegrity.html
      
      Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain
      visibility to possible call targets. Kernel modules are supported
      with Clang’s cross-DSO CFI mode, which allows checking between
      independently compiled components.
      
      With CFI enabled, the compiler injects a __cfi_check() function into
      the kernel and each module for validating local call targets. For
      cross-module calls that cannot be validated locally, the compiler
      calls the global __cfi_slowpath_diag() function, which determines
      the target module and calls the correct __cfi_check() function. This
      patch includes a slowpath implementation that uses __module_address()
      to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a
      shadow map that speeds up module look-ups by ~3x.
      
      Clang implements indirect call checking using jump tables and
      offers two methods of generating them. With canonical jump tables,
      the compiler renames each address-taken function to <function>.cfi
      and points the original symbol to a jump table entry, which passes
      __cfi_check() validation. This isn’t compatible with stand-alone
      assembly code, which the compiler doesn’t instrument, and would
      cause indirect calls to assembly code to fail. Therefore, we
      default to using non-canonical jump tables instead, where the compiler
      generates a local jump table entry <function>.cfi_jt for each
      address-taken function, and replaces all references to the function
      with the address of the jump table entry.
      
      Note that because non-canonical jump table addresses are local
      to each component, they break cross-module function address
      equality. Specifically, the address of a global function will be
      different in each module, as it's replaced with the address of a local
      jump table entry. If this address is passed to a different module,
      it won’t match the address of the same function taken there. This
      may break code that relies on comparing addresses passed from other
      components.
      
      CFI checking can be disabled in a function with the __nocfi attribute.
      Additionally, CFI can be disabled for an entire compilation unit by
      filtering out CC_FLAGS_CFI.
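
      For illustration, a brief sketch of the __nocfi opt-out described
      above (the function and its indirect target are made up):

        /* Exclude a function that calls into non-instrumented code from
         * CFI checking by marking it __nocfi. */
        static int __nocfi call_firmware_entry(int (*entry)(void))
        {
                return entry(); /* target lives outside CFI-instrumented code */
        }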
      
      By default, CFI failures result in a kernel panic to stop a potential
      exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the
      kernel prints out a rate-limited warning instead, and allows execution
      to continue. This option is helpful for locating type mismatches, but
      should only be enabled during development.
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com
  15. 18 March 2021, 1 commit
  16. 08 February 2021, 6 commits
  17. 07 December 2020, 1 commit
  18. 25 November 2020, 2 commits
    • module: simplify version-attribute handling · b112082c
      Committed by Johan Hovold
      Instead of using the array-of-pointers trick to avoid having gcc mess up
      the built-in module-version array stride, specify type alignment when
      declaring entries to prevent gcc from increasing alignment.
      
      This is essentially an alternative (one-line) fix to the problem
      addressed by commit b4bc8428 ("module: deal with alignment issues in
      built-in module versions").
      
      gcc can increase the alignment of larger objects with static extent as
      an optimisation, but this can be suppressed by using the aligned
      attribute when declaring variables.
      
      Note that we have been relying on this behaviour for kernel parameters
      for 16 years and it indeed hasn't changed since the introduction of the
      aligned attribute in gcc-3.1.
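
      A rough sketch of the one-line idea (simplified; the real macro also
      sets up the sysfs attribute fields):

        /* Pin the entry's alignment to the type's natural alignment so gcc
         * cannot raise it and change the __modver section stride. */
        #define MODULE_VERSION(_version)                                     \
                static struct module_version_attribute __modver_attr        \
                __used __section("__modver")                                 \
                __aligned(__alignof__(struct module_version_attribute)) = { \
                        .module_name    = KBUILD_MODNAME,                    \
                        .version        = _version,                          \
                }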
      
      Link: https://lore.kernel.org/lkml/20201103175711.10731-1-johan@kernel.org
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Jessica Yu <jeyu@kernel.org>
    • module: drop version-attribute alignment · 0801a007
      Committed by Johan Hovold
      Commit 98562ad8 ("module: explicitly align module_version_attribute
      structure") added an alignment attribute to the struct
      module_version_attribute type in order to fix an alignment issue on m68k
      where the structure is 2-byte aligned while MODULE_VERSION() forced the
      __modver section entries to be 4-byte aligned (sizeof(void *)).
      
      This was essentially an alternative fix to the problem addressed by
      b4bc8428 ("module: deal with alignment issues in built-in module
      versions") which used the array-of-pointer trick to prevent gcc from
      increasing alignment of the version attribute entries. And with the
      pointer indirection in place there's no need to increase the alignment
      of the type.
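
      For contrast, a rough sketch of the pointer-indirection approach
      referred to above (simplified): the __modver section stores a pointer
      per entry, so gcc's alignment choice for the struct itself no longer
      affects the section stride.

        #define MODULE_VERSION(_version)                                      \
                static struct module_version_attribute ___modver_attr = {    \
                        .module_name    = KBUILD_MODNAME,                     \
                        .version        = _version,                           \
                };                                                            \
                static const struct module_version_attribute *__modver_attr  \
                __used __section("__modver") = &___modver_attr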
      
      Link: https://lore.kernel.org/lkml/20201103175711.10731-1-johan@kernel.org
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Jessica Yu <jeyu@kernel.org>
  19. 11 November 2020, 1 commit
    • bpf: Load and verify kernel module BTFs · 36e68442
      Committed by Andrii Nakryiko
      Add a kernel module listener that will load/validate and unload module BTF.
      Module BTFs get IDs generated for them, which makes it possible to iterate
      them with the existing BTF iteration API. They are given their respective module's
      names, which will get reported through the GET_OBJ_INFO API. They are also marked
      as in-kernel BTFs for tooling to distinguish them from user-provided BTFs.
      
      Also, similarly to vmlinux BTF, kernel module BTFs are exposed through
      sysfs as /sys/kernel/btf/<module-name>. This is convenient for user-space
      tools to inspect module BTF contents and dump their types with existing tools:
      
      [vmuser@archvm bpf]$ ls -la /sys/kernel/btf
      total 0
      drwxr-xr-x  2 root root       0 Nov  4 19:46 .
      drwxr-xr-x 13 root root       0 Nov  4 19:46 ..
      
      ...
      
      -r--r--r--  1 root root     888 Nov  4 19:46 irqbypass
      -r--r--r--  1 root root  100225 Nov  4 19:46 kvm
      -r--r--r--  1 root root   35401 Nov  4 19:46 kvm_intel
      -r--r--r--  1 root root     120 Nov  4 19:46 pcspkr
      -r--r--r--  1 root root     399 Nov  4 19:46 serio_raw
      -r--r--r--  1 root root 4094095 Nov  4 19:46 vmlinux
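
      A small userspace sketch of iterating a module's BTF, assuming a
      libbpf recent enough to provide btf__load_vmlinux_btf() and
      btf__load_module_btf() (the module name is just an example):

        #include <stdio.h>
        #include <bpf/btf.h>

        int main(void)
        {
                struct btf *vmlinux_btf = btf__load_vmlinux_btf();
                struct btf *kvm_btf;

                if (!vmlinux_btf)
                        return 1;
                /* Split BTF for the kvm module, read from /sys/kernel/btf/kvm. */
                kvm_btf = btf__load_module_btf("kvm", vmlinux_btf);
                if (kvm_btf)
                        printf("kvm BTF has %u types\n", btf__type_cnt(kvm_btf));
                btf__free(kvm_btf);
                btf__free(vmlinux_btf);
                return 0;
        }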
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/bpf/20201110011932.3201430-5-andrii@kernel.org
  20. 28 October 2020, 1 commit
    • module: use hidden visibility for weak symbol references · 13150bc5
      Committed by Ard Biesheuvel
      Geert reports that commit be288182 ("arm64/build: Assert for
      unwanted sections") results in build errors on arm64 for configurations
      that have CONFIG_MODULES disabled.
      
      The commit in question added ASSERT()s to the arm64 linker script to
      ensure that linker generated sections such as .got.plt etc are empty,
      but as it turns out, there are corner cases where the linker does emit
      content into those sections. More specifically, weak references to
      function symbols (which can remain unsatisfied, and can therefore not
      be emitted as relative references) will be emitted as GOT and PLT
      entries when linking the kernel in PIE mode (which is the case when
      CONFIG_RELOCATABLE is enabled, which is on by default).
      
      What happens is that code such as
      
      	struct device *(*fn)(struct device *dev);
      	struct device *iommu_device;
      
      	fn = symbol_get(mdev_get_iommu_device);
      	if (fn) {
      		iommu_device = fn(dev);
      
      essentially gets converted into the following when CONFIG_MODULES is off:
      
      	struct device *iommu_device;
      
      	if (&mdev_get_iommu_device) {
      		iommu_device = mdev_get_iommu_device(dev);
      
      where mdev_get_iommu_device is emitted as a weak symbol reference into
      the object file. The first reference is decorated with an ordinary
      ABS64 data relocation (which yields 0x0 if the reference remains
      unsatisfied). However, the indirect call is turned into a direct call
      covered by a R_AARCH64_CALL26 relocation, which is converted into a
      call via a PLT entry taking the target address from the associated
      GOT entry.
      
      Given that such GOT and PLT entries are unnecessary for fully linked
      binaries such as the kernel, let's give these weak symbol references
      hidden visibility, so that the linker knows that the weak reference
      via R_AARCH64_CALL26 can simply remain unsatisfied.
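
      A rough sketch of what the CONFIG_MODULES=n symbol_get() fallback can
      declare to get this effect (simplified from the actual header change):

        /* A weak, hidden reference needs no GOT/PLT entry in a PIE-linked
         * kernel; if it stays unsatisfied, taking its address yields NULL. */
        #define symbol_get(x) \
                ({ extern typeof(x) x __attribute__((weak, visibility("hidden"))); &(x); })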
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Fangrui Song <maskray@google.com>
      Acked-by: Jessica Yu <jeyu@kernel.org>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Link: https://lore.kernel.org/r/20201027151132.14066-1-ardb@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
  21. 26 October 2020, 1 commit
  22. 01 September 2020, 1 commit
  23. 05 August 2020, 1 commit
  24. 01 August 2020, 5 commits
  25. 19 May 2020, 1 commit
  26. 12 May 2020, 1 commit