1. 14 Sep, 2021 1 commit
  2. 12 Aug, 2021 1 commit
  3. 19 Jul, 2021 1 commit
    • printk: Userspace format indexing support · 33701557
      Committed by Chris Down
      We have a number of systems industry-wide that have a subset of their
      functionality that works as follows:
      
      1. Receive a message from local kmsg, serial console, or netconsole;
      2. Apply a set of rules to classify the message;
      3. Do something based on this classification (like scheduling a
         remediation for the machine), rinse, and repeat.
      
      As a couple of examples of places we have this implemented just inside
      Facebook, although this isn't a Facebook-specific problem, we have this
      inside our netconsole processing (for alarm classification), and as part
      of our machine health checking. We use these messages to determine
      fairly important metrics around production health, and it's important
      that we get them right.
      
      While for some kinds of issues we have counters, tracepoints, or metrics
      with a stable interface which can reliably indicate the issue, in order
      to react to production issues quickly we need to work with the interface
      which most kernel developers naturally use when developing: printk.
      
      Most production issues come from unexpected phenomena, and as such
      usually the code in question doesn't have easily usable tracepoints or
      other counters available for the specific problem being mitigated. We
      have a number of lines of monitoring defence against problems in
      production (host metrics, process metrics, service metrics, etc), and
      where it's not feasible to reliably monitor at another level, this kind
      of pragmatic netconsole monitoring is essential.
      
      As one would expect, monitoring using printk is rather brittle for a
      number of reasons -- most notably that the message might disappear
      entirely in a new version of the kernel, or that the message may change
      in some way that the regex or other classification methods start to
      silently fail.
      
      One factor that makes this even harder is that, under normal operation,
      many of these messages are never expected to be hit. For example, there
      may be a rare hardware bug which one wants to detect if it was to ever
      happen again, but its recurrence is not likely or anticipated. This
      precludes using something like checking whether the printk in question
      was printed somewhere fleetwide recently to determine whether the
      message in question is still present or not, since we don't anticipate
      that it should be printed anywhere, but still need to monitor for its
      future presence in the long-term.
      
      This class of issue has happened on a number of occasions, causing
      unhealthy machines with hardware issues to remain in production for
      longer than ideal. As a recent example, some monitoring around
      blk_update_request fell out of date and caused semi-broken machines to
      remain in production for longer than would be desirable.
      
      Searching through the codebase to find the message is also extremely
      fragile, because many of the messages are further constructed beyond
      their callsite (eg. btrfs_printk and other module-specific wrappers,
      each with their own functionality). Even if they aren't, guessing the
      format and formulation of the underlying message based on the aesthetics
      of the message emitted is not a recipe for success at scale, and our
      previous issues with fleetwide machine health checking demonstrate as
      much.
      
      This provides a solution to the issue of silently changed or deleted
      printks: we record pointers to all printk format strings known at
      compile time into a new .printk_index section, both in vmlinux and
      modules. At runtime, this can then be iterated by looking at
      <debugfs>/printk/index/<module>, which emits the following format, both
      readable by humans and able to be parsed by machines:
      
          $ head -1 vmlinux; shuf -n 5 vmlinux
          # <level[,flags]> filename:line function "format"
          <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
          <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
          <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
          <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
          <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
      
      This mitigates the majority of cases where we have a highly-specific
      printk which we want to match on, as we can now enumerate and check
      whether the format changed or the printk callsite disappeared entirely
      in userspace. This allows us to catch changes to printks we monitor
      earlier and decide what to do about it before it becomes problematic.
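
A minimal userspace sketch of that check, assuming the index lines look exactly like the sample above. The names `IndexEntry`, `parse_index_line`, `missing_formats`, and `SAMPLE` are hypothetical helpers invented for this illustration, not part of the kernel or any existing tool. It parses each line and reports which monitored format strings no longer appear in the index:

```python
import re
from typing import NamedTuple, Optional


class IndexEntry(NamedTuple):
    level: str
    flags: Optional[str]
    filename: str
    line: int
    function: str
    fmt: str


# Layout taken from the header line: <level[,flags]> filename:line function "format"
_ENTRY_RE = re.compile(
    r'^<(?P<level>[^,>]+)(?:,(?P<flags>[^>]+))?> '
    r'(?P<filename>\S+):(?P<line>\d+) '
    r'(?P<function>\S+) '
    r'"(?P<fmt>.*)"$'
)


def parse_index_line(text):
    """Return an IndexEntry, or None for comments, blanks, and unparsable lines."""
    text = text.rstrip("\n")
    if not text or text.startswith("#"):
        return None
    m = _ENTRY_RE.match(text)
    if m is None:
        return None
    return IndexEntry(m["level"], m["flags"], m["filename"],
                      int(m["line"]), m["function"], m["fmt"])


def missing_formats(index_lines, monitored):
    """Return the monitored format strings that are absent from the index."""
    present = {e.fmt for e in map(parse_index_line, index_lines) if e is not None}
    return sorted(set(monitored) - present)


# Lines copied from the sample output above; the index shows "\n" in escaped
# form, so raw strings keep it as a backslash followed by "n".
SAMPLE = [
    '# <level[,flags]> filename:line function "format"',
    r'<5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"',
    r'<6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"',
]
```

In practice the lines would be read from the debugfs file for vmlinux or a module, and a non-empty result from `missing_formats` would flag a monitoring rule that needs review before the new kernel is rolled out.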
      
      There is no additional runtime cost for printk callers or printk itself,
      and the assembly generated is exactly the same.
      Signed-off-by: Chris Down <chris@chrisdown.name>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Tested-by: Petr Mladek <pmladek@suse.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
  4. 03 Jun, 2021 1 commit
    • vmlinux.lds.h: Avoid orphan section with !SMP · d4c63999
      Committed by Nathan Chancellor
      With x86_64_defconfig and the following configs, there is an orphan
      section warning:
      
      CONFIG_SMP=n
      CONFIG_AMD_MEM_ENCRYPT=y
      CONFIG_HYPERVISOR_GUEST=y
      CONFIG_KVM=y
      CONFIG_PARAVIRT=y
      
      ld: warning: orphan section `.data..decrypted' from `arch/x86/kernel/cpu/vmware.o' being placed in section `.data..decrypted'
      ld: warning: orphan section `.data..decrypted' from `arch/x86/kernel/kvm.o' being placed in section `.data..decrypted'
      
      These sections are created with DEFINE_PER_CPU_DECRYPTED, which
      ultimately turns into __PCPU_ATTRS, which in turn has a section
      attribute with a value of PER_CPU_BASE_SECTION + the section name. When
      CONFIG_SMP is not set, the base section is .data and that is not
      currently handled in any linker script.
      
      Add .data..decrypted to PERCPU_DECRYPTED_SECTION, which is included in
      PERCPU_INPUT -> PERCPU_SECTION, which is included in the x86 linker
      script when either CONFIG_X86_64 or CONFIG_SMP is unset, taking care of
      the warning.
      
      Fixes: ac26963a ("percpu: Introduce DEFINE_PER_CPU_DECRYPTED")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1360
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210506001410.1026691-1-nathan@kernel.org
  5. 09 Apr, 2021 1 commit
    • add support for Clang CFI · cf68fffb
      Committed by Sami Tolvanen
      This change adds support for Clang’s forward-edge Control Flow
      Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler
      injects a runtime check before each indirect function call to ensure
      the target is a valid function with the correct static type. This
      restricts possible call targets and makes it more difficult for
      an attacker to exploit bugs that allow the modification of stored
      function pointers. For more details, see:
      
        https://clang.llvm.org/docs/ControlFlowIntegrity.html
      
      Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain
      visibility to possible call targets. Kernel modules are supported
      with Clang’s cross-DSO CFI mode, which allows checking between
      independently compiled components.
      
      With CFI enabled, the compiler injects a __cfi_check() function into
      the kernel and each module for validating local call targets. For
      cross-module calls that cannot be validated locally, the compiler
      calls the global __cfi_slowpath_diag() function, which determines
      the target module and calls the correct __cfi_check() function. This
      patch includes a slowpath implementation that uses __module_address()
      to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a
      shadow map that speeds up module look-ups by ~3x.
      
      Clang implements indirect call checking using jump tables and
      offers two methods of generating them. With canonical jump tables,
      the compiler renames each address-taken function to <function>.cfi
      and points the original symbol to a jump table entry, which passes
      __cfi_check() validation. This isn’t compatible with stand-alone
      assembly code, which the compiler doesn’t instrument, and would
      cause indirect calls to assembly code to fail. Therefore, we
      default to using non-canonical jump tables instead, where the compiler
      generates a local jump table entry <function>.cfi_jt for each
      address-taken function, and replaces all references to the function
      with the address of the jump table entry.
      
      Note that because non-canonical jump table addresses are local
      to each component, they break cross-module function address
      equality. Specifically, the address of a global function will be
      different in each module, as it's replaced with the address of a local
      jump table entry. If this address is passed to a different module,
      it won’t match the address of the same function taken there. This
      may break code that relies on comparing addresses passed from other
      components.
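
The address-equality caveat can be pictured with a toy model. This is only a conceptual sketch: the `Component` class and the fake address allocator are invented for illustration and do not reflect how Clang or the kernel actually lay out jump tables. Each component hands out the address of its own local jump table entry for a function, so the same function compares unequal across components:

```python
import itertools

# Hypothetical allocator handing out fake, unique jump table entry addresses.
_fake_addresses = itertools.count(0x1000, 0x10)


class Component:
    """Toy stand-in for vmlinux or a module built with non-canonical jump tables."""

    def __init__(self, name):
        self.name = name
        self._jump_table = {}  # function name -> address of the local .cfi_jt entry

    def address_of(self, function):
        # Taking a function's address yields this component's own jump table
        # entry, created on first use, rather than the function itself.
        if function not in self._jump_table:
            self._jump_table[function] = next(_fake_addresses)
        return self._jump_table[function]


mod_a = Component("mod_a")
mod_b = Component("mod_b")

# Within one component the address is stable...
same_component_equal = mod_a.address_of("handler") == mod_a.address_of("handler")
# ...but the "same" function has a different address in another component.
cross_component_equal = mod_a.address_of("handler") == mod_b.address_of("handler")
```

Under this model, code in `mod_b` that receives a callback pointer from `mod_a` and compares it against its own `&handler` would see a mismatch, which is the failure mode the paragraph above warns about.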
      
      CFI checking can be disabled in a function with the __nocfi attribute.
      Additionally, CFI can be disabled for an entire compilation unit by
      filtering out CC_FLAGS_CFI.
      
      By default, CFI failures result in a kernel panic to stop a potential
      exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the
      kernel prints out a rate-limited warning instead, and allows execution
      to continue. This option is helpful for locating type mismatches, but
      should only be enabled during development.
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com
  6. 26 Feb, 2021 1 commit
  7. 23 Feb, 2021 1 commit
    • vmlinux.lds.h: catch even more instrumentation symbols into .data · 49387f62
      Committed by Alexander Lobakin
      LKP caught another bunch of orphaned instrumentation symbols [0]:
      
      mipsel-linux-ld: warning: orphan section `.data.$LPBX1' from
      `init/main.o' being placed in section `.data.$LPBX1'
      mipsel-linux-ld: warning: orphan section `.data.$LPBX0' from
      `init/main.o' being placed in section `.data.$LPBX0'
      mipsel-linux-ld: warning: orphan section `.data.$LPBX1' from
      `init/do_mounts.o' being placed in section `.data.$LPBX1'
      mipsel-linux-ld: warning: orphan section `.data.$LPBX0' from
      `init/do_mounts.o' being placed in section `.data.$LPBX0'
      mipsel-linux-ld: warning: orphan section `.data.$LPBX1' from
      `init/do_mounts_initrd.o' being placed in section `.data.$LPBX1'
      mipsel-linux-ld: warning: orphan section `.data.$LPBX0' from
      `init/do_mounts_initrd.o' being placed in section `.data.$LPBX0'
      mipsel-linux-ld: warning: orphan section `.data.$LPBX1' from
      `init/initramfs.o' being placed in section `.data.$LPBX1'
      mipsel-linux-ld: warning: orphan section `.data.$LPBX0' from
      `init/initramfs.o' being placed in section `.data.$LPBX0'
      mipsel-linux-ld: warning: orphan section `.data.$LPBX1' from
      `init/calibrate.o' being placed in section `.data.$LPBX1'
      mipsel-linux-ld: warning: orphan section `.data.$LPBX0' from
      `init/calibrate.o' being placed in section `.data.$LPBX0'
      
      [...]
      
      Soften the wildcard to .data.$L* to grab these ones into .data too.
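
As a rough illustration, the new wildcard can be checked against the section names from the warnings above. This sketch uses Python's `fnmatch` as a stand-in for the linker's glob matching, which is an assumption: ld's wildcard rules are similar for a simple trailing `*` but are not guaranteed to be identical in general. The `captured` helper is hypothetical:

```python
from fnmatch import fnmatchcase

# Wildcard named in the commit message, and the orphan sections LKP reported.
PATTERN = ".data.$L*"
ORPHANS = [".data.$LPBX0", ".data.$LPBX1"]


def captured(section, pattern=PATTERN):
    """True if the section name matches the glob, case-sensitively."""
    return fnmatchcase(section, pattern)


results = {section: captured(section) for section in ORPHANS}
```

Both reported sections match, while an unrelated name such as `.data..decrypted` does not, so the softened pattern stays reasonably narrow.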
      
      [0] https://lore.kernel.org/lkml/202102231519.lWPLPveV-lkp@intel.com
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
  8. 17 Feb, 2021 2 commits
  9. 16 Feb, 2021 1 commit
  10. 10 Feb, 2021 1 commit
  11. 08 Feb, 2021 2 commits
  12. 15 Jan, 2021 3 commits
  13. 23 Dec, 2020 1 commit
    • powercap/drivers/dtpm: Add API for dynamic thermal power management · a20d0ef9
      Committed by Daniel Lezcano
      In the embedded world, the complexity of the SoC leads to an
      increasing number of hotspots which need to be monitored and mitigated
      as a whole in order to prevent the temperature from going above the
      normative and legally stated 'skin temperature'.
      
      Another aspect is to sustain the performance for a given power budget,
      for example virtual reality where the user can feel dizziness if the
      GPU performance is capped while a big CPU is processing something
      else. Or reduce the battery charging because the dissipated power is
      too high compared with the power consumed by other devices.
      
      The userspace is the most adequate place to dynamically act on the
      different devices by limiting their power given an application
      profile: it has the knowledge of the platform.
      
      These userspace daemons are in charge of the Dynamic Thermal Power
      Management (DTPM).
      
      Nowadays, the dtpm daemons are abusing the thermal framework as they
      act on the cooling device state to force a specific and arbitrary
      state without taking care of the governor decisions. Given the closed
      loop of some governors, this can confuse their logic or directly cause
      a decision conflict.
      
      As cooling device support is limited today to the CPU
      and the GPU, the dtpm daemons have little control over the power
      dissipation of the system. The out of tree solutions are hacking
      around here and there in the drivers, in the frameworks to have
      control on the devices. The common solution is to declare them as
      cooling devices.
      
      There is no unification of the power limitation unit, opaque states
      are used.
      
      This patch provides a way to create a hierarchy of constraints using
      the powercap framework. The devices which are registered as power
      limit-able devices are represented in this hierarchy as a tree. They
      are linked together with intermediate nodes which are just there to
      propagate the constraint to the children.
      
      The leaves of the tree are the real devices, the intermediate nodes
      are virtual, aggregating the children constraints and power
      characteristics.
      
      Each node has a weight on a 2^10 basis, in order to reflect the
      percentage of power distribution of the child nodes. This
      percentage is used to dispatch the power limit to the children.
      
      The weight is computed against the max power of the siblings.
      
      This simple approach allows a fair distribution of the power
      limit.
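
The weighting can be sketched numerically. This is an illustration under stated assumptions, not the patch's code: it assumes a node's weight is its maximum power scaled by 2^10 and divided by the summed maximum power of all siblings, and that a parent's limit is split in proportion to those weights with round-to-nearest division. All names and the microwatt figures are hypothetical:

```python
WEIGHT_BASIS = 1024  # the 2^10 basis mentioned in the commit message


def div_round_closest(numerator, denominator):
    """Integer division rounded to the nearest value (non-negative inputs)."""
    return (numerator + denominator // 2) // denominator


def sibling_weights(max_power_uw):
    """Map each sibling to its share of the summed max power, on a 2^10 basis."""
    total = sum(max_power_uw.values())
    return {name: div_round_closest(power * WEIGHT_BASIS, total)
            for name, power in max_power_uw.items()}


def dispatch_limit(limit_uw, weights):
    """Split a parent's power limit among the children according to their weights."""
    return {name: div_round_closest(limit_uw * weight, WEIGHT_BASIS)
            for name, weight in weights.items()}


# Hypothetical leaves: a CPU that can draw 3 W and a GPU that can draw 1 W.
weights = sibling_weights({"cpu": 3_000_000, "gpu": 1_000_000})
limits = dispatch_limit(2_000_000, weights)
```

With these inputs the CPU gets a weight of 768 and the GPU 256, so a 2 W limit on the parent is dispatched as 1.5 W and 0.5 W respectively.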
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Tested-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  14. 28 Oct, 2020 1 commit
  15. 10 Oct, 2020 1 commit
  16. 22 Sep, 2020 1 commit
  17. 01 Sep, 2020 8 commits
  18. 15 Aug, 2020 1 commit
  19. 24 Jul, 2020 1 commit
  20. 22 Jul, 2020 1 commit
    • x86, vmlinux.lds: Page-align end of ..page_aligned sections · de2b41be
      Committed by Joerg Roedel
      On x86-32 the idt_table with 256 entries needs only 2048 bytes. It is
      page-aligned, but the end of the .bss..page_aligned section is not
      guaranteed to be page-aligned.
      
      As a result, objects from other .bss sections may end up on the same 4k
      page as the idt_table, and will accidentally get mapped read-only during
      boot, causing unexpected page-faults when the kernel writes to them.
      
      This could be worked around by making the objects in the page aligned
      sections page sized, but that's wrong.
      
      Explicit sections which store only page aligned objects have an implicit
      guarantee that the object is alone in the page in which it is placed. That
      works for all objects except the last one. That's inconsistent.
      
      Enforcing page sized objects for these sections would wreck memory
      sanitizers, because the object becomes artificially larger than it should
      be and out of bound access becomes legit.
      
      Align the end of the .bss..page_aligned and .data..page_aligned section on
      page-size so all objects placed in these sections are guaranteed to have
      their own page.
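
The arithmetic behind the fix can be sketched as follows. The offsets are hypothetical and start at zero for simplicity, and the 8-byte entry size is simply inferred from the 256 entries and 2048 bytes quoted above. The sketch shows that an object placed right after the idt_table shares its 4k page unless the section end is aligned up to the page size:

```python
PAGE_SIZE = 4096
IDT_ENTRIES = 256
IDT_ENTRY_SIZE = 8  # bytes; 256 * 8 = 2048, matching the commit message


def align_up(offset, alignment=PAGE_SIZE):
    """Round offset up to the next multiple of a power-of-two alignment."""
    return (offset + alignment - 1) & ~(alignment - 1)


def page_of(offset):
    """Index of the 4k page that contains the given offset."""
    return offset // PAGE_SIZE


idt_start = 0                                      # page-aligned start
idt_end = idt_start + IDT_ENTRIES * IDT_ENTRY_SIZE  # 2048

# Without aligning the section end, the next object starts mid-page.
next_object_unaligned = idt_end
# With the section end aligned, the next object starts on a fresh page.
next_object_aligned = align_up(idt_end)

shares_page_before_fix = page_of(next_object_unaligned) == page_of(idt_start)
shares_page_after_fix = page_of(next_object_aligned) == page_of(idt_start)
```

Before the fix the following object lands on the page that later gets mapped read-only; after aligning the section end it starts at offset 4096, on its own page.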
      
      [ tglx: Amended changelog ]
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200721093448.10417-1-joro@8bytes.org
  21. 14 Jul, 2020 1 commit
  22. 08 Jul, 2020 1 commit
  23. 25 Jun, 2020 2 commits
  24. 19 May, 2020 1 commit
  25. 27 Mar, 2020 1 commit
  26. 19 Mar, 2020 1 commit
  27. 19 Nov, 2019 1 commit
    • ftrace: Rename ftrace_graph_stub to ftrace_stub_graph · 46f94692
      Committed by Steven Rostedt (VMware)
      The ftrace_graph_stub was created and points to ftrace_stub as a way to
      assign the function graph tracer function pointer to a stub function with a
      different prototype than what ftrace_stub has and not trigger the C
      verifier. The ftrace_graph_stub was created via the linker script
      vmlinux.lds.h. Unfortunately, powerpc already uses the name
      ftrace_graph_stub for its internal implementation of the function graph
      tracer, and even though powerpc would still build, the change via the linker
      script broke the function tracer on powerpc.
      
      By using the name ftrace_stub_graph, which does not exist anywhere else in
      the kernel, this should not be a problem.
      
      Link: https://lore.kernel.org/r/1573849732.5937.136.camel@lca.pw
      
      Fixes: b83b43ff ("fgraph: Fix function type mismatches of ftrace_graph_return using ftrace_stub")
      Reported-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  28. 15 Nov, 2019 1 commit