1. 19 May, 2022 2 commits
    • J
      random: remove mostly unused async readiness notifier · 6701de6c
      Jason A. Donenfeld committed
      The register_random_ready_notifier() notifier is somewhat complicated,
      and was already recently rewritten to use notifier blocks. It is only
      used now by one consumer in the kernel, vsprintf.c, for which the async
      mechanism is really overly complex for what it actually needs. This
      commit removes register_random_ready_notifier() and unregister_random_
      ready_notifier(), because it just adds complication with little utility,
      and changes vsprintf.c to just check on `!rng_is_initialized() &&
      !rng_has_arch_random()`, which will eventually be true. Performance-
      wise, that code was already using a static branch, so there's basically
      no overhead at all to this change.
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Acked-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • J
      random: remove get_random_bytes_arch() and add rng_has_arch_random() · 248561ad
      Jason A. Donenfeld committed
      The RNG incorporates RDRAND into its state at boot and every time it
      reseeds, so there's no reason for callers to use it directly. The
      hashing that the RNG does on it is preferable to using the bytes raw.
      
      The only current use case of get_random_bytes_arch() is vsprintf's
      siphash key for pointer hashing, which uses it to initialize the pointer
      secret earlier than usual if RDRAND is available. In order to replace
      this narrow use case, just expose whether RDRAND is mixed into the RNG,
      with a new function called rng_has_arch_random(). With that taken care
      of, there are no users of get_random_bytes_arch() left, so it can be
      removed.
      
      Later, if trust_cpu gets turned on by default (as most distros are
      doing), this one use of rng_has_arch_random() can probably go away as
      well.
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Acked-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  2. 25 March, 2022 1 commit
    • W
      lib/vsprintf: avoid redundant work with 0 size · ef62c8ff
      Waiman Long committed
      Patch series "mm/page_owner: Extend page_owner to show memcg information", v4.
      
      While debugging the constant increase in percpu memory consumption on a
      system that spawned a large number of containers, it was found that a lot
      of offline mem_cgroup structures remained in place without being freed.
      Further investigation indicated that those mem_cgroup structures were
      pinned by some pages.
      
      In order to find out what those pages are, the existing page_owner
      debugging tool is extended to show memory cgroup information and whether
      those memcgs are offline or not.  With the enhanced page_owner tool, the
      following is a typical page that pinned the mem_cgroup structure in my
      test case:
      
        Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 162970 (podman), ts 1097761405537 ns, free_ts 1097760838089 ns
        PFN 1925700 type Movable Block 3761 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
          prep_new_page+0xac/0xe0
          get_page_from_freelist+0x1327/0x14d0
          __alloc_pages+0x191/0x340
          alloc_pages_vma+0x84/0x250
          shmem_alloc_page+0x3f/0x90
          shmem_alloc_and_acct_page+0x76/0x1c0
          shmem_getpage_gfp+0x281/0x940
          shmem_write_begin+0x36/0xe0
          generic_perform_write+0xed/0x1d0
          __generic_file_write_iter+0xdc/0x1b0
          generic_file_write_iter+0x5d/0xb0
          new_sync_write+0x11f/0x1b0
          vfs_write+0x1ba/0x2a0
          ksys_write+0x59/0xd0
          do_syscall_64+0x37/0x80
          entry_SYSCALL_64_after_hwframe+0x44/0xae
        Charged to offline memcg libpod-conmon-15e4f9c758422306b73b2dd99f9d50a5ea53cbb16b4a13a2c2308a4253cc0ec8.
      
      So the page was not freed because it was part of a shmem segment.  That
      is useful information that can help users to diagnose similar problems.
      
      With cgroup v1, /proc/cgroups can be read to find out the total number
      of memory cgroups (online + offline).  With cgroup v2, the cgroup.stat
      of the root cgroup can be read to find the number of dying cgroups (most
      likely pinned by dying memcgs).
      
      The page_owner feature is not supposed to be enabled for production
      system due to its memory overhead.  However, if it is suspected that
      dying memcgs are increasing over time, a test environment with
      page_owner enabled can then be set up with appropriate workload for
      further analysis on what may be causing the increasing number of dying
      memcgs.
      
      This patch (of 4):
      
      For *scnprintf(), vsnprintf() is always called even if the input size is
      0.  That is a waste of time, so just return 0 in this case.
      
      Note that vsnprintf() will never return -1 to indicate an error.  So
      skipping the call to vsnprintf() when size is 0 will have no functional
      impact at all.
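
The early return described above can be sketched in userspace C. This is only an illustration of the idea, not the kernel source: the names `sketch_vscnprintf()` and `sketch_scnprintf()` are hypothetical, and the C library's `vsnprintf()` stands in for the kernel's (unlike the kernel version, it may report an error, which the sketch maps to 0).

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace sketch of scnprintf()-style semantics: return the
 * number of characters actually stored in buf, not counting the trailing NUL.
 */
static int sketch_vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
{
	int n;

	/* Nothing can be stored in a zero-sized buffer, so skip formatting. */
	if (size == 0)
		return 0;

	n = vsnprintf(buf, size, fmt, args);
	if (n < 0) {
		/* The C library call can fail; leave an empty string behind. */
		buf[0] = '\0';
		return 0;
	}
	if ((size_t)n < size)
		return n;

	/* Output was truncated: size - 1 characters plus the NUL were stored. */
	return (int)(size - 1);
}

static int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(fmt != NULL);
	va_start(args, fmt);
	n = sketch_vscnprintf(buf, size, fmt, args);
	va_end(args);
	return n;
}
```

Because the zero-size case returns before touching `buf`, a call such as `sketch_scnprintf(NULL, 0, "%d", 1)` is harmless in this sketch and simply yields 0.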
      
      Link: https://lkml.kernel.org/r/20220202203036.744010-1-longman@redhat.com
      Link: https://lkml.kernel.org/r/20220202203036.744010-2-longman@redhat.com
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 13 March, 2022 1 commit
    • J
      random: replace custom notifier chain with standard one · 5acd3548
      Jason A. Donenfeld committed
      We previously rolled our own randomness readiness notifier, which only
      has two users in the whole kernel. Replace this with a more standard
      atomic notifier block that serves the same purpose with less code. Also
      unexport the symbols, because no modules use it, only unconditional
      builtins. The only drawback is that it's possible for a notification
      handler returning the "stop" code to prevent further processing, but
      given that there are only two users, and that we're unexporting this
      anyway, that doesn't seem like a significant drawback for the
      simplification we receive here.
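
The drawback mentioned above, where one handler returning the "stop" code prevents later handlers from running, can be illustrated with a minimal userspace notifier chain. Everything here is a hypothetical sketch: the `sketch_*` names and `SKETCH_NOTIFY_*` values are invented for illustration and only loosely mirror the kernel's notifier API, and registration simply pushes onto the head of the list rather than ordering by priority.

```c
#include <assert.h>
#include <stddef.h>

/* Locally defined return codes, loosely modeled on the kernel's NOTIFY_* values. */
enum { SKETCH_NOTIFY_DONE = 0, SKETCH_NOTIFY_STOP = 1 };

struct sketch_notifier {
	int (*call)(struct sketch_notifier *nb, unsigned long event, void *data);
	struct sketch_notifier *next;
};

/* Push nb onto the head of the chain (assumption: no priority ordering). */
static void sketch_chain_register(struct sketch_notifier **head,
				  struct sketch_notifier *nb)
{
	assert(head != NULL && nb != NULL && nb->call != NULL);
	nb->next = *head;
	*head = nb;
}

/*
 * Invoke handlers in chain order until one returns SKETCH_NOTIFY_STOP.
 * Returns the number of handlers that were actually called.
 */
static int sketch_chain_call(struct sketch_notifier *head,
			     unsigned long event, void *data)
{
	int called = 0;

	for (struct sketch_notifier *nb = head; nb != NULL; nb = nb->next) {
		called++;
		if (nb->call(nb, event, data) == SKETCH_NOTIFY_STOP)
			break;
	}
	return called;
}

/* Example handler: counts the event and lets the chain continue. */
static int sketch_handler_done(struct sketch_notifier *nb,
			       unsigned long event, void *data)
{
	(void)nb;
	(void)event;
	(*(int *)data)++;
	return SKETCH_NOTIFY_DONE;
}

/* Example handler: counts the event, then stops further processing. */
static int sketch_handler_stop(struct sketch_notifier *nb,
			       unsigned long event, void *data)
{
	(void)nb;
	(void)event;
	(*(int *)data)++;
	return SKETCH_NOTIFY_STOP;
}
```

With only two registered users, as the commit message notes, a stopping handler can starve at most one other handler, which is why the trade-off was judged acceptable.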
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  4. 24 February, 2022 1 commit
  5. 10 February, 2022 2 commits
  6. 16 January, 2022 1 commit
  7. 06 December, 2021 1 commit
  8. 10 November, 2021 1 commit
  9. 27 October, 2021 2 commits
  10. 09 September, 2021 1 commit
  11. 19 August, 2021 1 commit
  12. 09 July, 2021 2 commits
    • S
      module: add printk formats to add module build ID to stacktraces · 9294523e
      Stephen Boyd committed
      Let's make kernel stacktraces easier to identify by including the build
      ID[1] of a module if the stacktrace is printing a symbol from a module.
      This makes it simpler for developers to locate a kernel module's full
      debuginfo for a particular stacktrace.  Combined with
      scripts/decode_stacktrace.sh, a developer can download the matching
      debuginfo from a debuginfod[2] server and find the exact file and line
      number for the functions plus offsets in a stacktrace that match the
      module.  This is especially useful for pstore crash debugging where the
      kernel crashes are recorded in something like console-ramoops and the
      recovery kernel/modules are different or the debuginfo doesn't exist on
      the device due to space concerns (the debuginfo can be too large for space
      limited devices).
      
      Originally, I put this on the %pS format, but that was quickly rejected
      given that %pS is used in other places such as ftrace where build IDs
      aren't meaningful.  There were some discussions on the list to put every
      module build ID into the "Modules linked in:" section of the stacktrace
      message but that quickly becomes very hard to read once you have more than
      three or four modules linked in.  It also provides too much information
      when we don't expect each module to be traversed in a stacktrace.  Having
      the build ID for modules that aren't important just makes things messy.
      Splitting it to multiple lines for each module quickly explodes the number
      of lines printed in an oops too, possibly wrapping the warning off the
      console.  And finally, trying to stash away each module used in a
      callstack to provide the ID of each symbol printed is cumbersome and would
      require changes to each architecture to stash away modules and return
      their build IDs once unwinding has completed.
      
      Instead, we opt for the simpler approach of introducing new printk formats
      '%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb'
      for "pointer backtrace with module build ID" and then updating the few
      places in the architecture layer where the stacktrace is printed to use
      this new format.
      
      Before:
      
       Call trace:
        lkdtm_WARNING+0x28/0x30 [lkdtm]
        direct_entry+0x16c/0x1b4 [lkdtm]
        full_proxy_write+0x74/0xa4
        vfs_write+0xec/0x2e8
      
      After:
      
       Call trace:
        lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
        direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
        full_proxy_write+0x74/0xa4
        vfs_write+0xec/0x2e8
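
The "After" layout above can be reproduced with a small userspace formatter. This is a sketch of the output shape only, not the kernel's vsprintf code: `sketch_format_frame()` and `SKETCH_BUILD_ID_LEN` are hypothetical names, and the 20-byte build ID length is an assumption taken from the 40 hex digits shown in the example trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed build ID length: 20 bytes, matching the 40 hex digits in the trace. */
#define SKETCH_BUILD_ID_LEN 20

/*
 * Format one stack frame as "sym+0xoff/0xlen", optionally followed by
 * " [module]" or " [module buildid]". Returns what snprintf() returns:
 * the length the full string needs, excluding the trailing NUL.
 */
static int sketch_format_frame(char *buf, size_t size, const char *sym,
			       unsigned long off, unsigned long len,
			       const char *mod, const unsigned char *build_id)
{
	char hex[2 * SKETCH_BUILD_ID_LEN + 1];

	assert(buf != NULL && sym != NULL);

	/* Built-in symbol: no module suffix at all. */
	if (mod == NULL)
		return snprintf(buf, size, "%s+0x%lx/0x%lx", sym, off, len);

	/* Module symbol without a known build ID: old-style suffix. */
	if (build_id == NULL)
		return snprintf(buf, size, "%s+0x%lx/0x%lx [%s]",
				sym, off, len, mod);

	/* Render each build ID byte as two lowercase hex digits. */
	for (size_t i = 0; i < SKETCH_BUILD_ID_LEN; i++)
		snprintf(&hex[2 * i], 3, "%02x", build_id[i]);

	return snprintf(buf, size, "%s+0x%lx/0x%lx [%s %s]",
			sym, off, len, mod, hex);
}
```

Frames from built-in code (such as `vfs_write` in the trace) pass `NULL` for the module and come out unchanged, which matches the before/after comparison in the commit message.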
      
      [akpm@linux-foundation.org: fix build with CONFIG_MODULES=n, tweak code layout]
      [rdunlap@infradead.org: fix build when CONFIG_MODULES is not set]
        Link: https://lkml.kernel.org/r/20210513171510.20328-1-rdunlap@infradead.org
      [akpm@linux-foundation.org: make kallsyms_lookup_buildid() static]
      [cuibixuan@huawei.com: fix build error when CONFIG_SYSFS is disabled]
        Link: https://lkml.kernel.org/r/20210525105049.34804-1-cuibixuan@huawei.com
      
      Link: https://lkml.kernel.org/r/20210511003845.2429846-6-swboyd@chromium.org
      Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
      Link: https://sourceware.org/elfutils/Debuginfod.html [2]
      Signed-off-by: Stephen Boyd <swboyd@chromium.org>
      Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Evan Green <evgreen@chromium.org>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Z
      lib: fix spelling mistakes · 9dbbc3b9
      Zhen Lei committed
      Fix some spelling mistakes in comments:
      permanentely ==> permanently
      wont ==> won't
      remaning ==> remaining
      succed ==> succeed
      shouldnt ==> shouldn't
      alpha-numeric ==> alphanumeric
      storeing ==> storing
      funtion ==> function
      documenation ==> documentation
      Determin ==> Determine
      intepreted ==> interpreted
      ammount ==> amount
      obious ==> obvious
      interupts ==> interrupts
      occured ==> occurred
      asssociated ==> associated
      taking into acount ==> taking into account
      squence ==> sequence
      stil ==> still
      contiguos ==> contiguous
      matchs ==> matches
      
      Link: https://lkml.kernel.org/r/20210607072555.12416-1-thunder.leizhen@huawei.com
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 02 July, 2021 1 commit
  14. 30 June, 2021 1 commit
    • S
      slub: force on no_hash_pointers when slub_debug is enabled · 79270291
      Stephen Boyd committed
      Obscuring the pointers that slub shows when debugging makes for some
      confusing slub debug messages:
      
       Padding overwritten. 0x0000000079f0674a-0x000000000d4dce17
      
      Those addresses are hashed for kernel security reasons.  If we're trying
      to be secure with slub_debug on the commandline we have some big problems
      given that we dump whole chunks of kernel memory to the kernel logs.
      Let's force on the no_hash_pointers commandline flag when slub_debug is on
      the commandline.  This makes slub debug messages more meaningful and if by
      chance a kernel address is in some slub debug object dump we will have a
      better chance of figuring out what went wrong.
      
      Note that we don't use %px in the slub code because we want to reduce the
      number of places that %px is used in the kernel.  This also nicely prints
      a big fat warning at kernel boot if slub_debug is on the commandline so
      that we know that this kernel shouldn't be used on production systems.
      
      [akpm@linux-foundation.org: fix build with CONFIG_SLUB_DEBUG=n]
      
      Link: https://lkml.kernel.org/r/20210601182202.3011020-5-swboyd@chromium.org
      Signed-off-by: Stephen Boyd <swboyd@chromium.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Petr Mladek <pmladek@suse.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 19 May, 2021 2 commits
  16. 17 May, 2021 1 commit
  17. 23 April, 2021 1 commit
  18. 07 April, 2021 1 commit
  19. 19 March, 2021 2 commits
  20. 17 February, 2021 1 commit
  21. 15 February, 2021 1 commit
  22. 19 November, 2020 2 commits
  23. 25 September, 2020 1 commit
  24. 25 August, 2020 1 commit
    • G
      lib: Revert use of fallthrough pseudo-keyword in lib/ · 6a9dc5fd
      Gustavo A. R. Silva committed
      The following build error for powerpc64 was reported by Nathan Chancellor:
      
        "$ scripts/config --file arch/powerpc/configs/powernv_defconfig -e KERNEL_XZ
      
         $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux- distclean powernv_defconfig zImage
         ...
         In file included from arch/powerpc/boot/../../../lib/decompress_unxz.c:234,
                          from arch/powerpc/boot/decompress.c:38:
         arch/powerpc/boot/../../../lib/xz/xz_dec_stream.c: In function 'dec_main':
         arch/powerpc/boot/../../../lib/xz/xz_dec_stream.c:586:4: error: 'fallthrough' undeclared (first use in this function)
           586 |    fallthrough;
               |    ^~~~~~~~~~~
      
         This will end up affecting distribution configurations such as Debian
         and OpenSUSE according to my testing. I am not sure what the solution
         is, the PowerPC wrapper does not set -D__KERNEL__ so I am not sure
         that compiler_attributes.h can be safely included."
      
      In order to avoid these sort of problems, it seems that the best
      solution is to use /* fall through */ comments instead of the
      fallthrough pseudo-keyword macro in lib/, for now.
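
The comment-based annotation chosen above looks like this in plain C. The function `sketch_steps_from()` is a hypothetical example, not code from lib/. The point is that a `/* fall through */` comment needs no macro definition, so it still builds in environments (such as the PowerPC boot wrapper described in the report) where the `fallthrough` macro is not declared.

```c
#include <assert.h>

/*
 * Hypothetical example: count how many remaining steps run when starting
 * from a given state. Each case deliberately continues into the next one.
 */
static int sketch_steps_from(int state)
{
	int steps = 0;

	assert(state >= 0);

	switch (state) {
	case 0:
		steps++;
		/* fall through */
	case 1:
		steps++;
		/* fall through */
	case 2:
		steps++;
		break;
	default:
		/* Unknown states run no steps. */
		break;
	}
	return steps;
}
```

Compilers that warn about implicit fallthrough generally recognize a comment of this form placed directly before the next `case` label, although the exact matching rules depend on the compiler and warning level.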
      
      Reported-by: Nathan Chancellor <natechancellor@gmail.com>
      Fixes: df561f66 ("treewide: Use fallthrough pseudo-keyword")
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Reviewed-and-tested-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  25. 24 August, 2020 1 commit
  26. 01 August, 2020 3 commits
  27. 03 July, 2020 1 commit
  28. 20 May, 2020 2 commits
  29. 15 May, 2020 1 commit
    • D
      bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier · b2a5212f
      Daniel Borkmann committed
      Usage of plain %s conversion specifier in bpf_trace_printk() suffers from the
      very same issue as bpf_probe_read{,str}() helpers, that is, it is broken on
      archs with overlapping address ranges.
      
      While the helpers have been addressed through work in 6ae08ae3 ("bpf: Add
      probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers"), we need
      an option for bpf_trace_printk() as well to fix it.
      
      Similarly as with the helpers, force users to make an explicit choice by adding
      %pks and %pus specifier to bpf_trace_printk() which will then pick the corresponding
      strncpy_from_unsafe*() variant to perform the access under KERNEL_DS or USER_DS.
      The %pk* (kernel specifier) and %pu* (user specifier) can later also be extended
      for other objects aside strings that are probed and printed under tracing, and
      reused out of other facilities like bpf_seq_printf() or BTF based type printing.
      
      Existing behavior of %s for current users is still kept working for archs where it
      is not broken and therefore gated through CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
      For archs not having this property we fall-back to pick probing under KERNEL_DS as
      a sensible default.
      
      Fixes: 8d3b7dce ("bpf: add support for %s specifier to bpf_trace_printk()")
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Link: https://lore.kernel.org/bpf/20200515101118.6508-4-daniel@iogearbox.net
  30. 28 February, 2020 1 commit