1. 26 Jul 2021 (2 commits)
  2. 01 Jul 2021 (4 commits)
  3. 30 Jun 2021 (1 commit)
  4. 22 Jun 2021 (3 commits)
    • clocksource: Provide kernel module to test clocksource watchdog · 1253b9b8
      Committed by Paul E. McKenney
      When the clocksource watchdog marks a clock as unstable, this might
      be due to that clock being unstable or it might be due to delays that
      happen to occur between the reads of the two clocks.  It would be good
      to have a way of testing the clocksource watchdog's ability to
      distinguish between these two causes of clock skew and instability.
      
      Therefore, provide a new clocksource-wdtest module selected by a new
      TEST_CLOCKSOURCE_WATCHDOG Kconfig option.  This module has a single module
      parameter named "holdoff" that provides the number of seconds of delay
      before testing should start, which defaults to zero when built as a module
      and to 10 seconds when built directly into the kernel.  Very large systems
      that boot slowly may need to increase the value of this module parameter.
      
      This module uses hand-crafted clocksource structures to do its testing,
      thus avoiding messing up timing for the rest of the kernel and for user
      applications.  This module first verifies that the ->uncertainty_margin
      field of the clocksource structures are set sanely.  It then tests the
      delay-detection capability of the clocksource watchdog, increasing the
      number of consecutive delays injected, first provoking console messages
      complaining about the delays and finally forcing a clock-skew event.
      Unexpected test results cause at least one WARN_ON_ONCE() console splat.
      If there are no splats, the test has passed.  Finally, it fuzzes the
      value returned from a clocksource to test the clocksource watchdog's
      ability to detect time skew.
      
      This module checks the state of its clocksource after each test, and
      uses WARN_ON_ONCE() to emit a console splat if there are any failures.
      This should enable all types of test frameworks to detect any such
      failures.
      
      This facility is intended for diagnostic use only, and should be avoided
      on production systems.
      Reported-by: Chris Mason <clm@fb.com>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Feng Tang <feng.tang@intel.com>
      Link: https://lore.kernel.org/r/20210527190124.440372-5-paulmck@kernel.org
      1253b9b8
    • clocksource: Limit number of CPUs checked for clock synchronization · fa218f1c
      Committed by Paul E. McKenney
      Currently, if skew is detected on a clock marked CLOCK_SOURCE_VERIFY_PERCPU,
      that clock is checked on all CPUs.  This is thorough, but might not be
      what you want on a system with a few tens of CPUs, let alone a few hundred
      of them.
      
      Therefore, by default check only up to eight randomly chosen CPUs.  Also
      provide a new clocksource.verify_n_cpus kernel boot parameter.  A value of
      -1 says to check all of the CPUs, and a non-negative value says to randomly
      select that number of CPUs, without concern about selecting the same CPU
      multiple times.  However, make use of a cpumask so that a given CPU will be
      checked at most once.
      
      Suggested-by: Thomas Gleixner <tglx@linutronix.de> # For verify_n_cpus=1.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Feng Tang <feng.tang@intel.com>
      Link: https://lore.kernel.org/r/20210527190124.440372-3-paulmck@kernel.org
      fa218f1c
    • clocksource: Retry clock read if long delays detected · db3a34e1
      Committed by Paul E. McKenney
      When the clocksource watchdog marks a clock as unstable, this might be due
      to that clock being unstable or it might be due to delays that happen to
      occur between the reads of the two clocks.  Yes, interrupts are disabled
      across those two reads, but there is no shortage of things that can delay
      interrupts-disabled regions of code ranging from SMI handlers to vCPU
      preemption.  It would be good to have some indication as to why the clock
      was marked unstable.
      
      Therefore, re-read the watchdog clock on either side of the read from the
      clock under test.  If the watchdog clock shows an excessive time delta
      between its pair of reads, the reads are retried.
      
      The maximum number of retries is specified by a new kernel boot parameter
      clocksource.max_cswd_read_retries, which defaults to three, that is, up to
      four reads, one initial and up to three retries.  If more than one retry
      was required, a message is printed on the console (the occasional single
      retry is expected behavior, especially in guest OSes).  If the maximum
      number of retries is exceeded, the clock under test will be marked
      unstable.  However, the probability of this happening due to various sorts
      of delays is quite small.  In addition, the reason (clock-read delays) for
      the unstable marking will be apparent.
      Reported-by: Chris Mason <clm@fb.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Feng Tang <feng.tang@intel.com>
      Link: https://lore.kernel.org/r/20210527190124.440372-1-paulmck@kernel.org
      db3a34e1
  5. 17 Jun 2021 (3 commits)
  6. 09 Jun 2021 (1 commit)
  7. 07 Jun 2021 (1 commit)
  8. 04 Jun 2021 (1 commit)
  9. 28 May 2021 (1 commit)
  10. 20 May 2021 (1 commit)
  11. 18 May 2021 (1 commit)
  12. 12 May 2021 (1 commit)
  13. 11 May 2021 (1 commit)
  14. 07 May 2021 (1 commit)
    • init/initramfs.c: do unpacking asynchronously · e7cb072e
      Committed by Rasmus Villemoes
      Patch series "background initramfs unpacking, and CONFIG_MODPROBE_PATH", v3.
      
      These two patches are independent, but better together.
      
      The second is a rather trivial patch that simply allows the developer to
      change "/sbin/modprobe" to something else - e.g.  the empty string, so
      that all request_module() during early boot return -ENOENT early, without
      even spawning a usermode helper, needlessly synchronizing with the
      initramfs unpacking.
      
      The first patch delegates decompressing the initramfs to a worker thread,
      allowing do_initcalls() in main.c to proceed to the device_ and late_
      initcalls without waiting for that decompression (and populating of
      rootfs) to finish.  Obviously, some of those later calls may rely on the
      initramfs being available, so I've added synchronization points in the
      firmware loader and usermodehelper paths - there might be other places
      that would need this, but so far no one has been able to think of any
      places I have missed.
      
      There's not much to gain if most of the functionality needed during boot
      is only available as modules.  But systems with a custom-made .config and
      initramfs can boot faster, partly due to utilizing more than one cpu
      earlier, partly by avoiding known-futile modprobe calls (which would still
      trigger synchronization with the initramfs unpacking, thus eliminating
      most of the first benefit).
      
      This patch (of 2):
      
      Most of the boot process doesn't actually need anything from the
      initramfs, until of course PID1 is to be executed.  So instead of doing
      the decompressing and populating of the initramfs synchronously in
      populate_rootfs() itself, push that off to a worker thread.
      
      This is primarily motivated by an embedded ppc target, where unpacking
      even the rather modest sized initramfs takes 0.6 seconds, which is long
      enough that the external watchdog becomes unhappy that it doesn't get
      attention soon enough.  By doing the initramfs decompression in a worker
      thread, we get to do the device_initcalls and hence start petting the
      watchdog much sooner.
      
      Normal desktops might benefit as well.  On my mostly stock Ubuntu kernel,
      my initramfs is a 26M xz-compressed blob, decompressing to around 126M.
      That takes almost two seconds:
      
      [    0.201454] Trying to unpack rootfs image as initramfs...
      [    1.976633] Freeing initrd memory: 29416K
      
      Before this patch, these lines occur consecutively in dmesg.  With this
      patch, the timestamps on these two lines are roughly the same as above, but
      with 172 lines in between - so more than one cpu has been kept busy doing
      work that would otherwise only happen after the populate_rootfs()
      finished.
      
      Should one of the initcalls done after rootfs_initcall time (i.e., device_
      and late_ initcalls) need something from the initramfs (say, a kernel
      module or a firmware blob), it will simply wait for the initramfs
      unpacking to be done before proceeding, which should in theory make this
      completely safe.
      
      But if some driver pokes around in the filesystem directly and not via one
      of the official kernel interfaces (i.e.  request_firmware*(),
      call_usermodehelper*) that theory may not hold - also, I certainly might
      have missed a spot when sprinkling wait_for_initramfs().  So there is an
      escape hatch in the form of an initramfs_async= command line parameter.
      
      Link: https://lkml.kernel.org/r/20210313212528.2956377-1-linux@rasmusvillemoes.dk
      Link: https://lkml.kernel.org/r/20210313212528.2956377-2-linux@rasmusvillemoes.dk
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e7cb072e
  15. 06 May 2021 (1 commit)
  16. 05 May 2021 (2 commits)
  17. 04 May 2021 (1 commit)
  18. 01 May 2021 (1 commit)
  19. 17 Apr 2021 (1 commit)
  20. 14 Apr 2021 (1 commit)
    • KEYS: trusted: Add generic trusted keys framework · 5d0682be
      Committed by Sumit Garg
      The current trusted keys framework is tightly coupled to the TPM as its
      underlying implementation, which makes it difficult for alternatives
      such as a Trusted Execution Environment (TEE) to provide trusted keys
      support when the platform doesn't possess a TPM device.
      
      Add a generic trusted keys framework where underlying implementations
      can be easily plugged in. Create struct trusted_key_ops to achieve this,
      which contains necessary functions of a backend.
      
      Also, define a module parameter to select a particular trust source in
      case a platform supports multiple trust sources.  If it is not
      specified, the implementation iterates through the trust-source list,
      starting with TPM, and assigns the first trust source that initializes
      successfully as the backend.
      
      Note that current implementation only supports a single trust source at
      runtime which is either selectable at compile time or during boot via
      aforementioned module parameter.
      Suggested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      5d0682be
  21. 09 Apr 2021 (1 commit)
  22. 08 Apr 2021 (1 commit)
    • stack: Optionally randomize kernel stack offset each syscall · 39218ff4
      Committed by Kees Cook
      This provides the ability for architectures to enable kernel stack base
      address offset randomization. This feature is controlled by the boot
      param "randomize_kstack_offset=on/off", with its default value set by
      CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
      
      This feature is based on the original idea from the last public release
      of PaX's RANDKSTACK feature: https://pax.grsecurity.net/docs/randkstack.txt
      All the credit for the original idea goes to the PaX team. Note that
      the design and implementation of this upstream randomize_kstack_offset
      feature differs greatly from the RANDKSTACK feature (see below).
      
      Reasoning for the feature:
      
      This feature aims to make the various stack-based attacks that rely
      on deterministic stack structure harder to carry out.  We have had
      many such attacks in the past (to name just a few):
      
      https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf
      https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf
      https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
      
      As Linux kernel stack protections have been constantly improving
      (vmap-based stack allocation with guard pages, removal of thread_info,
      STACKLEAK), attackers have had to find new ways for their exploits
      to work. They have done so, continuing to rely on the kernel's stack
      determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT
      were not relevant. For example, the following recent attacks would have
      been hampered if the stack offset was non-deterministic between syscalls:
      
      https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
      (page 70: targeting the pt_regs copy with linear stack overflow)
      
      https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
      (leaked stack address from one syscall as a target during next syscall)
      
      The main idea is that since the stack offset is randomized on each system
      call, it is harder for an attack to reliably land in any particular place
      on the thread stack, even with address exposures, as the stack base will
      change on the next syscall. Also, since randomization is performed after
      placing pt_regs, the ptrace-based approach[1] to discover the randomized
      offset during a long-running syscall should not be possible.
      
      Design description:
      
      During most of the kernel's execution, it runs on the "thread stack",
      which is pretty deterministic in its structure: it is fixed in size,
      and on every entry from userspace to kernel on a syscall the thread
      stack starts construction from an address fetched from the per-cpu
      cpu_current_top_of_stack variable. The first element to be pushed to the
      thread stack is the pt_regs struct that stores all required CPU registers
      and syscall parameters. Finally the specific syscall function is called,
      with the stack being used as the kernel executes the resulting request.
      
      The goal of randomize_kstack_offset feature is to add a random offset
      after the pt_regs has been pushed to the stack and before the rest of the
      thread stack is used during the syscall processing, and to change it every
      time a process issues a syscall. The source of randomness is currently
      architecture-defined (but x86 is using the low byte of rdtsc()). Future
      improvements for different entropy sources are possible, but out of scope
      for this patch. Furthermore, to add more unpredictability, new offsets
      are chosen at the end of syscalls (the timing of which should be less
      easy to measure from userspace than at syscall entry time), and stored
      in a per-CPU variable, so that the life of the value does not stay
      explicitly tied to a single task.
      
      As suggested by Andy Lutomirski, the offset is added using alloca()
      and an empty asm() statement with an output constraint, since it avoids
      changes to assembly syscall entry code, to the unwinder, and provides
      correct stack alignment as defined by the compiler.
      
      In order to make this available by default with zero performance impact
      for those that don't want it, it is boot-time selectable with static
      branches. This way, if the overhead is not wanted, it can just be
      left turned off with no performance impact.
      
      The generated assembly for x86_64 with GCC looks like this:
      
      ...
      ffffffff81003977: 65 8b 05 02 ea 00 7f  mov %gs:0x7f00ea02(%rip),%eax
      					    # 12380 <kstack_offset>
      ffffffff8100397e: 25 ff 03 00 00        and $0x3ff,%eax
      ffffffff81003983: 48 83 c0 0f           add $0xf,%rax
      ffffffff81003987: 25 f8 07 00 00        and $0x7f8,%eax
      ffffffff8100398c: 48 29 c4              sub %rax,%rsp
      ffffffff8100398f: 48 8d 44 24 0f        lea 0xf(%rsp),%rax
      ffffffff81003994: 48 83 e0 f0           and $0xfffffffffffffff0,%rax
      ...
      
      As a result of the above stack alignment, this patch introduces about
      5 bits of randomness after pt_regs is spilled to the thread stack on
      x86_64, and 6 bits on x86_32 (since it has 1 fewer bit required for
      stack alignment). The amount of entropy could be adjusted based on how
      much of the stack space we wish to trade for security.
      
      My measure of syscall performance overhead (on x86_64):
      
      lmbench: /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_syscall -N 10000 null
          randomize_kstack_offset=y	Simple syscall: 0.7082 microseconds
          randomize_kstack_offset=n	Simple syscall: 0.7016 microseconds
      
      So, roughly 0.9% overhead growth for a no-op syscall, which is very
      manageable. And for people that don't want this, it's off by default.
      
      There are two gotchas with using the alloca() trick. First,
      compilers that have Stack Clash protection (-fstack-clash-protection)
      enabled by default (e.g. Ubuntu[3]) add pagesize stack probes to
      any dynamic stack allocations. While the randomization offset is
      always less than a page, the resulting assembly would still contain
      (unreachable!) probing routines, bloating the resulting assembly. To
      avoid this, -fno-stack-clash-protection is unconditionally added to
      the kernel Makefile since this is the only dynamic stack allocation in
      the kernel (now that VLAs have been removed) and it is provably safe
      from Stack Clash style attacks.
      
      The second gotcha with alloca() is a negative interaction with
      -fstack-protector*, in that it sees the alloca() as an array allocation,
      which triggers the unconditional addition of the stack canary function
      pre/post-amble which slows down syscalls regardless of the static
      branch. In order to avoid adding this unneeded check and its associated
      performance impact, architectures need to carefully remove uses of
      -fstack-protector-strong (or -fstack-protector) in the compilation units
      that use the add_random_kstack() macro and to audit the resulting stack
      mitigation coverage (to make sure no desired coverage disappears). No
      change is visible for this on x86 because the stack protector is already
      unconditionally disabled for the compilation unit, but the change is
      required on arm64. There is, unfortunately, no attribute that can be
      used to disable stack protector for specific functions.
      
      Comparison to PaX RANDKSTACK feature:
      
      The RANDKSTACK feature randomizes the location of the stack start
      (cpu_current_top_of_stack), i.e. including the location of pt_regs
      structure itself on the stack. Initially this patch followed the same
      approach, but during the recent discussions[2], it has been determined
      to be of little value since, if ptrace functionality is available for
      an attacker, they can use PTRACE_PEEKUSR/PTRACE_POKEUSR to read/write
      different offsets in the pt_regs struct, observe the cache behavior of
      the pt_regs accesses, and figure out the random stack offset. Another
      difference is that the random offset is stored in a per-cpu variable,
      rather than having it be per-thread. As a result, these implementations
      differ a fair bit in their implementation details and results, though
      obviously the intent is similar.
      
      [1] https://lore.kernel.org/kernel-hardening/2236FBA76BA1254E88B949DDB74E612BA4BC57C1@IRSMSX102.ger.corp.intel.com/
      [2] https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/
      [3] https://lists.ubuntu.com/archives/ubuntu-devel/2019-June/040741.html
      Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20210401232347.2791257-4-keescook@chromium.org
      39218ff4
  23. 06 Apr 2021 (3 commits)
    • pata_legacy: Add `probe_mask' parameter like with ide-generic · 7d33004d
      Committed by Maciej W. Rozycki
      Carry the `probe_mask' parameter over from ide-generic to pata_legacy so
      that there is a way to prevent random poking at ISA port I/O locations
      in an attempt to discover adapter option cards with libata, as with the
      old IDE driver.  By default all enabled locations are tried; however,
      this may interfere with a different kind of hardware responding there.
      
      For example with a plain (E)ISA system the driver tries all the six
      possible locations:
      
      scsi host0: pata_legacy
      ata1: PATA max PIO4 cmd 0x1f0 ctl 0x3f6 irq 14
      ata1.00: ATA-4: ST310211A, 3.54, max UDMA/100
      ata1.00: 19541088 sectors, multi 16: LBA
      ata1.00: configured for PIO
      scsi 0:0:0:0: Direct-Access     ATA      ST310211A        3.54 PQ: 0 ANSI: 5
      scsi 0:0:0:0: Attached scsi generic sg0 type 0
      sd 0:0:0:0: [sda] 19541088 512-byte logical blocks: (10.0 GB/9.32 GiB)
      sd 0:0:0:0: [sda] Write Protect is off
      sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
      sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
       sda: sda1 sda2 sda3
      sd 0:0:0:0: [sda] Attached SCSI disk
      scsi host1: pata_legacy
      ata2: PATA max PIO4 cmd 0x170 ctl 0x376 irq 15
      scsi host1: pata_legacy
      ata3: PATA max PIO4 cmd 0x1e8 ctl 0x3ee irq 11
      scsi host1: pata_legacy
      ata4: PATA max PIO4 cmd 0x168 ctl 0x36e irq 10
      scsi host1: pata_legacy
      ata5: PATA max PIO4 cmd 0x1e0 ctl 0x3e6 irq 8
      scsi host1: pata_legacy
      ata6: PATA max PIO4 cmd 0x160 ctl 0x366 irq 12
      
      however giving the kernel "pata_legacy.probe_mask=21" makes it try every
      other location only:
      
      scsi host0: pata_legacy
      ata1: PATA max PIO4 cmd 0x1f0 ctl 0x3f6 irq 14
      ata1.00: ATA-4: ST310211A, 3.54, max UDMA/100
      ata1.00: 19541088 sectors, multi 16: LBA
      ata1.00: configured for PIO
      scsi 0:0:0:0: Direct-Access     ATA      ST310211A        3.54 PQ: 0 ANSI: 5
      scsi 0:0:0:0: Attached scsi generic sg0 type 0
      sd 0:0:0:0: [sda] 19541088 512-byte logical blocks: (10.0 GB/9.32 GiB)
      sd 0:0:0:0: [sda] Write Protect is off
      sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
      sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
       sda: sda1 sda2 sda3
      sd 0:0:0:0: [sda] Attached SCSI disk
      scsi host1: pata_legacy
      ata2: PATA max PIO4 cmd 0x1e8 ctl 0x3ee irq 11
      scsi host1: pata_legacy
      ata3: PATA max PIO4 cmd 0x1e0 ctl 0x3e6 irq 8
      Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/alpine.DEB.2.21.2103211800110.21463@angie.orcam.me.uk
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7d33004d
    • pata_platform: Document `pio_mask' module parameter · 6ddcec95
      Committed by Maciej W. Rozycki
      Add MODULE_PARM_DESC documentation and a kernel-parameters.txt entry.
      Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/alpine.DEB.2.21.2103212023190.21463@angie.orcam.me.uk
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6ddcec95
    • pata_legacy: Properly document module parameters · 426e2c6a
      Committed by Maciej W. Rozycki
      Most pata_legacy module parameters lack MODULE_PARM_DESC documentation
      and none is described in kernel-parameters.txt.  Also several comments
      are inaccurate or wrong.
      
      Add the missing documentation pieces then and reorder parameters into a
      consistent block.  Remove inaccuracies as follows:
      
      - `all' affects primary and secondary port ranges only rather than all,
      
      - `probe_all' affects tertiary and further port ranges rather than all,
      
      - `ht6560b' is for HT 6560B rather than HT 6560A.
      Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/alpine.DEB.2.21.2103211909560.21463@angie.orcam.me.uk
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      426e2c6a
  24. 29 Mar 2021 (1 commit)
  25. 18 Mar 2021 (1 commit)
  26. 10 Mar 2021 (1 commit)
  27. 09 Mar 2021 (3 commits)
    • Documentation/admin-guide: kernel-parameters: correct the architectures for numa_balancing · 00b072c0
      Committed by Barry Song
      X86 isn't the only architecture supporting NUMA_BALANCING. ARM64, PPC,
      S390 and RISCV also support it:
      
      arch$ git grep NUMA_BALANCING
      arm64/Kconfig:  select ARCH_SUPPORTS_NUMA_BALANCING
      arm64/configs/defconfig:CONFIG_NUMA_BALANCING=y
      arm64/include/asm/pgtable.h:#ifdef CONFIG_NUMA_BALANCING
      powerpc/configs/powernv_defconfig:CONFIG_NUMA_BALANCING=y
      powerpc/configs/ppc64_defconfig:CONFIG_NUMA_BALANCING=y
      powerpc/configs/pseries_defconfig:CONFIG_NUMA_BALANCING=y
      powerpc/include/asm/book3s/64/pgtable.h:#ifdef CONFIG_NUMA_BALANCING
      powerpc/include/asm/book3s/64/pgtable.h:#ifdef CONFIG_NUMA_BALANCING
      powerpc/include/asm/book3s/64/pgtable.h:#endif /* CONFIG_NUMA_BALANCING */
      powerpc/include/asm/book3s/64/pgtable.h:#ifdef CONFIG_NUMA_BALANCING
      powerpc/include/asm/book3s/64/pgtable.h:#endif /* CONFIG_NUMA_BALANCING */
      powerpc/include/asm/nohash/pgtable.h:#ifdef CONFIG_NUMA_BALANCING
      powerpc/include/asm/nohash/pgtable.h:#endif /* CONFIG_NUMA_BALANCING */
      powerpc/platforms/Kconfig.cputype:      select ARCH_SUPPORTS_NUMA_BALANCING
      riscv/Kconfig:  select ARCH_SUPPORTS_NUMA_BALANCING
      riscv/include/asm/pgtable.h:#ifdef CONFIG_NUMA_BALANCING
      s390/Kconfig:   select ARCH_SUPPORTS_NUMA_BALANCING
      s390/configs/debug_defconfig:CONFIG_NUMA_BALANCING=y
      s390/configs/defconfig:CONFIG_NUMA_BALANCING=y
      s390/include/asm/pgtable.h:#ifdef CONFIG_NUMA_BALANCING
      x86/Kconfig:    select ARCH_SUPPORTS_NUMA_BALANCING     if X86_64
      x86/include/asm/pgtable.h:#ifdef CONFIG_NUMA_BALANCING
      x86/include/asm/pgtable.h:#endif /* CONFIG_NUMA_BALANCING */
      
      On the other hand, setup_numabalancing() is implemented in mm/mempolicy.c,
      which is not architecture-specific.
      Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
      Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210302084159.33688-1-song.bao.hua@hisilicon.com
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      00b072c0
    • rcuscale: Add kfree_rcu() single-argument scale test · 686fe1bf
      Committed by Uladzislau Rezki (Sony)
      The single-argument variant of kfree_rcu() is currently not
      tested by any member of the rcutorture test suite.  This
      commit therefore adds rcuscale code to test it.  This
      testing is controlled by two new boolean module parameters,
      kfree_rcu_test_single and kfree_rcu_test_double.  If one
      is set and the other not, only the corresponding variant
      is tested, otherwise both are tested, with the variant to
      be tested determined randomly on each invocation.
      
      Both of these module parameters are initialized to false,
      so setting either to true will test only that variant.
      Suggested-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      686fe1bf
    • rcu: deprecate "all" option to rcu_nocbs= · 3e70df91
      Committed by Paul Gortmaker
      With the core bitmap support now accepting "N" as a placeholder for
      the end of the bitmap, "all" can be represented as "0-N" and has the
      advantage of not being specific to RCU (or any other subsystem).
      
      So deprecate the use of "all" by removing documentation references
      to it.  The support itself needs to remain for now, since we don't
      know how many people out there are using it currently, but since it
      is in an __init area anyway, it isn't worth losing sleep over.
      
      Cc: Yury Norov <yury.norov@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      3e70df91