1. 07 Jan, 2021 1 commit
  2. 19 Nov, 2020 2 commits
    • powerpc/64s: flush L1D after user accesses · 9a32a7e7
      Nicholas Piggin committed
      IBM Power9 processors can speculatively operate on data in the L1 cache
      before it has been completely validated, via a way-prediction mechanism. It
      is not possible for an attacker to determine the contents of impermissible
      memory using this method, since these systems implement a combination of
      hardware and software security measures to prevent scenarios where
      protected data could be leaked.
      
      However these measures don't address the scenario where an attacker induces
      the operating system to speculatively execute instructions using data that
      the attacker controls. This can be used for example to speculatively bypass
      "kernel user access prevention" techniques, as discovered by Anthony
      Steinhauser of Google's Safeside Project. This is not an attack by itself,
      but there is a possibility it could be used in conjunction with
      side-channels or other weaknesses in the privileged code to construct an
      attack.
      
      This issue can be mitigated by flushing the L1 cache between privilege
      boundaries of concern. This patch flushes the L1 cache after user accesses.
      
      This is part of the fix for CVE-2020-4788.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: flush L1D on kernel entry · f7964378
      Nicholas Piggin committed
      IBM Power9 processors can speculatively operate on data in the L1 cache
      before it has been completely validated, via a way-prediction mechanism. It
      is not possible for an attacker to determine the contents of impermissible
      memory using this method, since these systems implement a combination of
      hardware and software security measures to prevent scenarios where
      protected data could be leaked.
      
      However these measures don't address the scenario where an attacker induces
      the operating system to speculatively execute instructions using data that
      the attacker controls. This can be used for example to speculatively bypass
      "kernel user access prevention" techniques, as discovered by Anthony
      Steinhauser of Google's Safeside Project. This is not an attack by itself,
      but there is a possibility it could be used in conjunction with
      side-channels or other weaknesses in the privileged code to construct an
      attack.
      
      This issue can be mitigated by flushing the L1 cache between privilege
      boundaries of concern. This patch flushes the L1 cache on kernel entry.
      
      This is part of the fix for CVE-2020-4788.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 23 Oct, 2020 1 commit
  4. 20 Oct, 2020 1 commit
    • xen/events: defer eoi in case of excessive number of events · e99502f7
      Juergen Gross committed
      In case rogue guests are sending events at high frequency it might
      happen that xen_evtchn_do_upcall() won't stop processing events in
      dom0. As this is done in irq handling a crash might be the result.
      
      In order to avoid that, delay further inter-domain events after some
      time in xen_evtchn_do_upcall() by forcing eoi processing into a
      worker on the same cpu, thus inhibiting new events coming in.
      
      The time after which eoi processing is to be delayed is configurable
      via a new module parameter "event_loop_timeout" which specifies the
      maximum event loop time in jiffies (default: 2, the value was chosen
      after some tests showing that a value of 2 was the lowest with an
      only slight drop of dom0 network throughput while multiple guests
      performed an event storm).
      
      How long eoi processing will be delayed can be specified via another
      parameter "event_eoi_delay" (again in jiffies, default 10, again the
      value was chosen after testing with different delay values).
      
      This is part of XSA-332.
      
      Cc: stable@vger.kernel.org
      Reported-by: Julien Grall <julien@xen.org>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
      Reviewed-by: Wei Liu <wl@xen.org>
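Both parameters in the commit above are expressed in jiffies, so their wall-clock meaning depends on the kernel tick rate. The following is a minimal sketch of that conversion. The helper name `jiffies_to_ms` and the example HZ values are assumptions for illustration; the commit message does not state which HZ was used in testing.

```python
# Illustrative only: convert the jiffies-based defaults named in the commit
# message into milliseconds. HZ (ticks per second) is an assumption here.

def jiffies_to_ms(jiffies: int, hz: int) -> float:
    """Return the duration of `jiffies` ticks in milliseconds at tick rate `hz`."""
    if hz <= 0:
        raise ValueError("hz must be positive")
    return jiffies * 1000.0 / hz


EVENT_LOOP_TIMEOUT = 2   # default from the commit message, in jiffies
EVENT_EOI_DELAY = 10     # default from the commit message, in jiffies

if __name__ == "__main__":
    for hz in (100, 250, 1000):  # example tick rates, not taken from the commit
        print(hz,
              jiffies_to_ms(EVENT_LOOP_TIMEOUT, hz),
              jiffies_to_ms(EVENT_EOI_DELAY, hz))
```

The same default therefore covers a longer or shorter interval depending on how the kernel was built, which may matter when tuning either value.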
  5. 17 Oct, 2020 1 commit
  6. 06 Oct, 2020 1 commit
  7. 25 Sep, 2020 9 commits
  8. 17 Sep, 2020 1 commit
  9. 11 Sep, 2020 1 commit
  10. 01 Sep, 2020 1 commit
    • dma-contiguous: provide the ability to reserve per-numa CMA · b7176c26
      Barry Song committed
      Right now, drivers like ARM SMMU are using dma_alloc_coherent() to get
      coherent DMA buffers to save their command queues and page tables. As
      there is only one default CMA in the whole system, SMMUs on nodes other
      than node0 will get remote memory. This leads to significant latency.
      
      This patch provides per-numa CMA so that drivers like SMMU can get local
      memory. Tests show that localizing CMA can significantly decrease dma_unmap
      latency. For instance, before this patch, SMMU on node2 has to wait for more
      than 560ns for the completion of CMD_SYNC in an empty command queue; with
      this patch, it needs only 240ns.
      
      A positive side effect of this patch would be improving performance even
      further for those users who are worried about performance more than DMA
      security and use iommu.passthrough=1 to skip IOMMU. With local CMA, all
      drivers can get local coherent DMA buffers.
      
      Also, this patch changes the default CONFIG_CMA_AREAS to 19 on NUMA
      systems, as 1+CONFIG_CMA_AREAS should be quite enough for most servers on
      the market even if they enable both hugetlb_cma and pernuma_cma:
      2 numa nodes: 2(hugetlb) + 2(pernuma) + 1(default global cma) = 5
      4 numa nodes: 4(hugetlb) + 4(pernuma) + 1(default global cma) = 9
      8 numa nodes: 8(hugetlb) + 8(pernuma) + 1(default global cma) = 17
      Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
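The area counts listed in the commit above follow one pattern: one hugetlb area and one per-numa area for each node, plus the single default global area. A small sketch of that arithmetic follows. The function name is hypothetical, and reading "1+CONFIG_CMA_AREAS" as the total number of available areas is an interpretation of the message, not something it states outright.

```python
# Sketch of the CMA area arithmetic from the commit message.
# Names are illustrative; they are not kernel identifiers.

CONFIG_CMA_AREAS = 19                   # new NUMA default, per the commit message
AVAILABLE_AREAS = 1 + CONFIG_CMA_AREAS  # assumed reading of "1+CONFIG_CMA_AREAS"


def cma_areas_needed(numa_nodes: int) -> int:
    """Areas needed when both hugetlb_cma and pernuma_cma are enabled."""
    hugetlb = numa_nodes    # one hugetlb CMA area per node
    pernuma = numa_nodes    # one per-numa CMA area per node
    default_global = 1      # the single default global CMA area
    return hugetlb + pernuma + default_global


if __name__ == "__main__":
    for nodes in (2, 4, 8):
        needed = cma_areas_needed(nodes)
        print(nodes, needed, needed <= AVAILABLE_AREAS)
```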
  11. 28 Aug, 2020 1 commit
    • net: add option to not create fall-back tunnels in root-ns as well · 316cdaa1
      Mahesh Bandewar committed
      The sysctl that was added earlier by commit 79134e6c ("net: do
      not create fallback tunnels for non-default namespaces") creates
      fallback tunnels only in root-ns. This patch enhances that behavior to
      provide an option not to create fallback tunnels in root-ns as well. Since
      modules that create fallback tunnels could be built-in, setting the sysctl
      value after booting is pointless, so a kernel cmdline option is added to
      change this default. The default setting is preserved for backward
      compatibility. The kernel command line option fb_tunnels=initns will
      set the sysctl value to 1 and will create fallback tunnels only in initns,
      while fb_tunnels=none will set the sysctl value to 2 and fallback tunnels
      are skipped in every netns.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Maciej Zenczykowski <maze@google.com>
      Cc: Jian Yang <jianyang@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
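The mapping from the fb_tunnels= command line option to the sysctl value described in the commit above can be sketched as follows. This is an illustration of the documented mapping only, not the kernel's parser. The function name is hypothetical, and the value 0 used when the option is absent is an assumption, since the message only says the default is preserved.

```python
# Sketch: map the fb_tunnels= boot option to the sysctl value it selects.
# Not the kernel's parser; DEFAULT_SYSCTL_VALUE = 0 is an assumption.

FB_TUNNELS_VALUES = {
    "initns": 1,  # fallback tunnels only in the initial netns
    "none": 2,    # fallback tunnels skipped in every netns
}
DEFAULT_SYSCTL_VALUE = 0  # assumed existing default


def fb_tunnels_sysctl(cmdline: str) -> int:
    """Return the sysctl value implied by a kernel command line string."""
    value = DEFAULT_SYSCTL_VALUE
    for token in cmdline.split():
        key, _, arg = token.partition("=")
        if key == "fb_tunnels" and arg in FB_TUNNELS_VALUES:
            value = FB_TUNNELS_VALUES[arg]
    return value


if __name__ == "__main__":
    print(fb_tunnels_sysctl("root=/dev/sda1 fb_tunnels=initns quiet"))
    print(fb_tunnels_sysctl("root=/dev/sda1 fb_tunnels=none"))
```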
  12. 25 Aug, 2020 5 commits
    • rcutorture: Allow pointer leaks to test diagnostic code · d6855142
      Paul E. McKenney committed
      This commit adds an rcutorture.leakpointer module parameter that
      intentionally leaks an RCU-protected pointer out of the RCU read-side
      critical section and checks to see if the corresponding grace period
      has elapsed, emitting a WARN_ON_ONCE() if so.  This module parameter can
      be used to test facilities like CONFIG_RCU_STRICT_GRACE_PERIOD that end
      grace periods quickly.
      
      While in the area, also document rcutorture.irqreader, which was
      previously left out.
      
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Provide optional RCU-reader exit delay for strict GPs · 3d29aaf1
      Paul E. McKenney committed
      The goal of this series is to increase the probability of tools like
      KASAN detecting that an RCU-protected pointer was used outside of its
      RCU read-side critical section.  Thus far, the approach has been to make
      grace periods and callback processing happen faster.  Another approach
      is to delay the pointer leaker.  This commit therefore allows a delay
      to be applied to exit from RCU read-side critical sections.
      
      This slowdown is specified by a new rcutree.rcu_unlock_delay kernel boot
      parameter that specifies this delay in microseconds, defaulting to zero.
      
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcuperf: Change rcuperf to rcuscale · 4e88ec4a
      Paul E. McKenney committed
      This commit further avoids conflation of rcuperf with the kernel's perf
      feature by renaming kernel/rcu/rcuperf.c to kernel/rcu/rcuscale.c, and
      also by similarly renaming the functions and variables inside this file.
      This has the side effect of changing the names of the kernel boot
      parameters, so kernel-parameters.txt and ver_functions.sh are also
      updated.  The rcutorture --torture type was also updated from rcuperf
      to rcuscale.
      
      [ paulmck: Fix bugs located by Stephen Rothwell. ]
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • scftorture: Add smp_call_function() torture test · e9d338a0
      Paul E. McKenney committed
      This commit adds an smp_call_function() torture test that repeatedly
      invokes this function and complains if things go badly awry.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • lib: Add backtrace_idle parameter to force backtrace of idle CPUs · 160c7ba3
      Paul E. McKenney committed
      Currently, the nmi_cpu_backtrace() declines to produce backtraces for
      idle CPUs.  This is a good choice in the common case in which problems are
      caused only by non-idle CPUs.  However, there are occasionally situations
      in which idle CPUs are helping to cause problems.  This commit therefore
      adds an nmi_backtrace.backtrace_idle kernel boot parameter that causes
      nmi_cpu_backtrace() to dump stacks even of idle CPUs.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <linux-doc@vger.kernel.org>
  13. 20 Aug, 2020 1 commit
    • Documentation: efi: remove description of efi=old_map · fb1201ae
      Ard Biesheuvel committed
      The old EFI runtime region mapping logic that was kept around for some
      time has finally been removed entirely, along with the SGI UV1 support
      code that was its last remaining user. So remove any mention of the
      efi=old_map command line parameter from the docs.
      
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-doc@vger.kernel.org
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
  14. 12 Aug, 2020 1 commit
  15. 08 Aug, 2020 1 commit
  16. 29 Jul, 2020 1 commit
  17. 23 Jul, 2020 1 commit
    • debugfs: Add access restriction option · a24c6f7b
      Peter Enderborg committed
      Since debugfs includes sensitive information, it needs to be treated
      carefully. But it also has many very useful debug functions for userspace.
      With this option we can have the same configuration for systems that
      need debugfs and a way to turn it off. This gives extra protection
      against exposure on systems where user-space services with system
      access are attacked.
      
      It is controlled by a configurable default value that can be overridden
      with a kernel command line parameter (debugfs=).
      
      It can be on or off, but also internally on but not seen from user-space.
      This no-mount mode does not register debugfs as a filesystem, but clients
      can register their parts in the internal structures. This data can be read
      with a debugger or saved with a crashkernel. When it is off, clients
      get an EPERM error when accessing the functions for registering their
      components.
      Signed-off-by: Peter Enderborg <peter.enderborg@sony.com>
      Link: https://lore.kernel.org/r/20200716071511.26864-3-peter.enderborg@sony.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
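The three modes described in the commit above differ in two respects: whether debugfs is visible to user-space as a filesystem, and whether clients may register entries at all. The sketch below models that behaviour. The enum, the function names, and the exact mode strings are assumptions for illustration and are not taken from the kernel source.

```python
# Sketch of the three debugfs access modes described in the commit message.
# Names and mode strings are illustrative assumptions, not kernel API.
import errno
from enum import Enum


class DebugfsMode(Enum):
    ON = "on"              # registered as a filesystem, clients may register
    NO_MOUNT = "no-mount"  # not visible to user-space, clients may register
    OFF = "off"            # clients get EPERM when registering


def visible_to_userspace(mode: DebugfsMode) -> bool:
    """Only the fully enabled mode registers debugfs as a filesystem."""
    return mode is DebugfsMode.ON


def register_client(mode: DebugfsMode) -> int:
    """Return 0 on success or a negative errno, in kernel convention."""
    if mode is DebugfsMode.OFF:
        return -errno.EPERM
    return 0


if __name__ == "__main__":
    for mode in DebugfsMode:
        print(mode.value, visible_to_userspace(mode), register_client(mode))
```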
  18. 09 Jul, 2020 2 commits
    • xen: Mark "xen_nopvspin" parameter obsolete · 9a3c05e6
      Zhenzhong Duan committed
      Map "xen_nopvspin" to "nopvspin", fix stale description of "xen_nopvspin"
      as we use qspinlock now.
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvm: Add "nopvspin" parameter to disable PV spinlocks · 05eee619
      Zhenzhong Duan committed
      There are cases where a guest tries to switch spinlocks to bare metal
      behavior (e.g. by setting "xen_nopvspin" on XEN platform and
      "hv_nopvspin" on HYPER_V).
      
      That feature is missing on KVM, so add a new parameter "nopvspin" to
      disable PV spinlocks for KVM guests.
      
      The new 'nopvspin' parameter will also replace Xen and Hyper-V specific
      parameters in future patches.
      
      Define variable nopvspin as global because it will be used in future
      patches as above.
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  19. 06 Jul, 2020 1 commit
  20. 02 Jul, 2020 1 commit
    • cpufreq: Specify default governor on command line · 8412b456
      Quentin Perret committed
      Currently, the only way to specify the default CPUfreq governor is
      via Kconfig options, which suits users who can build the kernel
      themselves perfectly.
      
      However, for those who use a distro-like kernel (such as Android,
      with the Generic Kernel Image project), the only way to use a
      non-default governor is to boot to userspace, and to then switch
      using the sysfs interface. Being able to specify the default governor
      on the command line, as is the case for cpuidle, would allow those
      users to specify their governor of choice earlier on, and to simplify
      the userspace boot procedure slightly.
      
      To support this use-case, add a kernel command line parameter
      allowing the default governor for CPUfreq to be specified, which
      takes precedence over the built-in default.
      
      This implementation has one notable limitation: the default governor
      must be registered before the driver. This is solved for builtin
      governors and drivers using appropriate *_initcall() functions. And
      in the modular case, this must be reflected as a constraint on the
      module loading order.
      Signed-off-by: Quentin Perret <qperret@google.com>
      [ Viresh: Converted 'default_governor' to a string and parsing it only
      	  at initcall level, and several updates to
      	  cpufreq_init_policy(). ]
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      [ rjw: Changelog ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
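The precedence rule and the registration-order limitation described in the commit above can be illustrated with a short sketch. It is not the kernel's cpufreq_init_policy() logic. The function name is hypothetical, and falling back to the built-in default when the requested governor is not yet registered is an assumption made for the illustration.

```python
# Sketch of the governor selection rule described in the commit message.
# Hypothetical helper; the fallback behaviour is an assumption.
from typing import Optional, Set


def pick_default_governor(cmdline_choice: Optional[str],
                          registered: Set[str],
                          builtin_default: str) -> str:
    """Prefer the command-line governor, but only if it is already registered."""
    if cmdline_choice is not None and cmdline_choice in registered:
        return cmdline_choice
    # Either nothing was requested, or the requested governor was not
    # registered before the driver (the limitation noted in the message).
    return builtin_default


if __name__ == "__main__":
    registered = {"performance", "schedutil"}
    print(pick_default_governor("schedutil", registered, "performance"))
    print(pick_default_governor("ondemand", registered, "performance"))
    print(pick_default_governor(None, registered, "performance"))
```

The second call shows why module loading order matters: a governor that has not been registered by the time the driver initializes cannot be chosen, whatever the command line says.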
  21. 30 Jun, 2020 5 commits
    • torture: Dump ftrace at shutdown only if requested · 2102ad29
      Paul E. McKenney committed
      If there is a large number of torture tests running concurrently,
      all of which are dumping large ftrace buffers at shutdown time, the
      resulting dumping can take a very long time, particularly on systems
      with rotating-rust storage.  This commit therefore adds a default-off
      torture.ftrace_dump_at_shutdown module parameter that enables
      shutdown-time ftrace-buffer dumping.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcutorture: Add races with task-exit processing · 4a5f133c
      Paul E. McKenney committed
      Several variants of Linux-kernel RCU interact with task-exit processing,
      including preemptible RCU, Tasks RCU, and Tasks Trace RCU.  This commit
      therefore adds testing of this interaction to rcutorture by adding
      rcutorture.read_exit_burst and rcutorture.read_exit_delay kernel-boot
      parameters.  These kernel parameters control the frequency and spacing
      of special read-then-exit kthreads that are spawned.
      
      [ paulmck: Apply feedback from Dan Carpenter's static checker. ]
      [ paulmck: Reduce latency to avoid false-positive shutdown hangs. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • refperf: Rename refperf.c to refscale.c and change internal names · 1fbeb3a8
      Paul E. McKenney committed
      This commit further avoids conflation of refperf with the kernel's perf
      feature by renaming kernel/rcu/refperf.c to kernel/rcu/refscale.c,
      and also by similarly renaming the functions and variables inside
      this file.  This has the side effect of changing the names of the
      kernel boot parameters, so kernel-parameters.txt and ver_functions.sh
      are also updated.
      
      The rcutorture --torture type remains refperf, and this will be
      addressed in a separate commit.
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • doc: Document rcuperf's module parameters · 847dd70a
      Paul E. McKenney committed
      This commit adds documentation for the rcuperf module parameters.
      
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/tree: cache specified number of objects · 53c72b59
      Uladzislau Rezki (Sony) committed
      In order to reduce the dynamic need for pages in kfree_rcu(),
      pre-allocate a configurable number of pages per CPU and link
      them in a list. When kfree_rcu() reclaims objects, the object's
      container page is cached into a list instead of being released
      to the low-level page allocator.
      
      Such an approach provides O(1) access to free pages while also
      reducing the number of requests to the page allocator. It also
      ensures that the kfree_rcu() code has free pages available during
      a low memory condition.
      
      A read-only sysfs parameter (rcu_min_cached_objs) reflects the
      minimum number of allowed cached pages per CPU.
      Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
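The caching scheme described in the commit above, a bounded per-CPU list of pre-allocated pages that is refilled on reclaim, can be sketched as a simple stack. This is a conceptual model only. The class and method names are hypothetical, plain Python objects stand in for pages, and locking and per-CPU handling are omitted.

```python
# Conceptual sketch of a bounded free-page cache, as described in the commit
# message. Hypothetical names; no locking or per-CPU handling is modelled.
from typing import List, Optional


class PageCache:
    def __init__(self, min_cached_objs: int) -> None:
        self._limit = min_cached_objs
        # Pre-allocate the configured number of "pages" up front.
        self._free: List[object] = [object() for _ in range(min_cached_objs)]

    def get_page(self) -> Optional[object]:
        """O(1) pop of a cached page, or None when the cache is empty."""
        return self._free.pop() if self._free else None

    def put_page(self, page: object) -> bool:
        """Cache a reclaimed page if there is room.

        Returns False when the cache is full, in which case the caller
        would release the page to the page allocator instead.
        """
        if len(self._free) < self._limit:
            self._free.append(page)
            return True
        return False


if __name__ == "__main__":
    cache = PageCache(min_cached_objs=2)
    page = cache.get_page()
    print(page is not None, cache.put_page(page), cache.put_page(object()))
```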
  22. 20 Jun, 2020 1 commit