1. 07 Aug 2019, 2 commits
    • Documentation: Add swapgs description to the Spectre v1 documentation · 7634b9cd
      Josh Poimboeuf committed
      commit 4c92057661a3412f547ede95715641d7ee16ddac upstream
      
      Add documentation to the Spectre document about the new swapgs variant of
      Spectre v1.
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/speculation: Enable Spectre v1 swapgs mitigations · 23e7a7b3
      Josh Poimboeuf committed
      commit a2059825986a1c8143fd6698774fa9d83733bb11 upstream
      
      The previous commit added macro calls in the entry code which mitigate the
      Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are
      enabled.  Enable those features where applicable.  (A sketch of how those
      fences close the speculation window follows the list below.)
      
      The mitigations may be disabled with "nospectre_v1" or "mitigations=off".
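      
      For example, either of the following kernel command-line parameters (both
      are documented in Documentation/admin-guide/kernel-parameters.txt) turns
      the new mitigations off at boot:
      
      	nospectre_v1
      	mitigations=off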
      
      There are different features which can affect the risk of attack:
      
      - When FSGSBASE is enabled, unprivileged users are able to place any
        value in GS, using the wrgsbase instruction.  This means they can
        write a GS value which points to any value in kernel space, which can
        be useful with the following gadget in an interrupt/exception/NMI
        handler:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	// dependent load or store based on the value of %reg1
      	// for example: mov (%reg1), %reg2
      
        If an interrupt is coming from user space, and the entry code
        speculatively skips the swapgs (due to user branch mistraining), it
        may speculatively execute the GS-based load and a subsequent dependent
        load or store, exposing the kernel data to an L1 side channel leak.
      
        Note that, on Intel, a similar attack exists in the above gadget when
        coming from kernel space, if the swapgs gets speculatively executed to
        switch back to the user GS.  On AMD, this variant isn't possible
        because swapgs is serializing with respect to future GS-based
        accesses.
      
        NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case
      	doesn't exist quite yet.
      
      - When FSGSBASE is disabled, the issue is mitigated somewhat because
        unprivileged users must use prctl(ARCH_SET_GS) to set GS, which
        restricts GS values to user space addresses only.  That means the
        gadget would need an additional step, since the target kernel address
        needs to be read from user space first.  Something like:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	mov (%reg1), %reg2
      	// dependent load or store based on the value of %reg2
      	// for example: mov (%reg2), %reg3
      
        It's difficult to audit for this gadget in all the handlers, so while
        there are no known instances of it, it's entirely possible that it
        exists somewhere (or could be introduced in the future).  Without
        tooling to analyze all such code paths, consider it vulnerable.
      
        Effects of SMAP on the !FSGSBASE case:
      
        - If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not
          susceptible to Meltdown), the kernel is prevented from speculatively
          reading user space memory, even L1 cached values.  This effectively
          disables the !FSGSBASE attack vector.
      
        - If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP
          still prevents the kernel from speculatively reading user space
          memory.  But it does *not* prevent the kernel from reading the
          user value from L1, if it has already been cached.  This is probably
          only a small hurdle for an attacker to overcome.
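      
      Tying this back to the macro calls mentioned at the top: the mitigation
      places an LFENCE on the entry paths where the speculated GS may still be
      wrong.  A minimal sketch, in the same pseudo-asm style as the gadgets
      above (macro placement is simplified; the real entry code has more cases):
      
      	if (coming from user space)
      		swapgs
      		FENCE_SWAPGS_USER_ENTRY    // LFENCE if X86_FEATURE_FENCE_SWAPGS_USER
      	else
      		FENCE_SWAPGS_KERNEL_ENTRY  // LFENCE if X86_FEATURE_FENCE_SWAPGS_KERNEL
      	mov %gs:<percpu_offset>, %reg1     // cannot speculatively pass the fence
      
      With a fence in place, the GS-based load cannot issue until the swapgs
      decision has resolved, closing both the user-entry and (on Intel) the
      kernel-entry variants of the gadget.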
      
      Thanks to Dave Hansen for contributing the speculative_smap() function.
      
      Thanks to Andrew Cooper for providing the inside scoop on whether swapgs
      is serializing on AMD.
      
      [ tglx: Fixed the USER fence decision and polished the comment as suggested
        	by Dave Hansen ]
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 26 Jul 2019, 3 commits
    • dt-bindings: allow up to four clocks for orion-mdio · 404f59e2
      Josua Mayer committed
      commit 80785f5a22e9073e2ded5958feb7f220e066d17b upstream.
      
      Armada 8040 needs four clocks to be enabled for MDIO accesses to work.
      Update the binding to allow the extra clock to be specified.
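      
      A hedged example of a consumer node using four clocks (the compatible
      string is the one this binding documents; the node name, phandles and
      clock-specifier cells are illustrative, not copied from a real
      Armada 8040 dts):
      
      	mdio: mdio@12a200 {
      		compatible = "marvell,orion-mdio";
      		reg = <0x12a200 0x10>;
      		clocks = <&clk 1 9>, <&clk 1 5>,
      			 <&clk 1 6>, <&clk 1 18>;
      	};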
      
      Cc: stable@vger.kernel.org
      Fixes: 6d6a331f ("dt-bindings: allow up to three clocks for orion-mdio")
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Josua Mayer <josua@solid-run.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/atomic: Fix smp_mb__{before,after}_atomic() · 9e0bcb59
      Peter Zijlstra committed
      [ Upstream commit 69d927bba39517d0980462efc051875b7f4db185 ]
      
      Recent probing at the Linux Kernel Memory Model uncovered a
      'surprise'. Strongly ordered architectures where the atomic RmW
      primitive implies full memory ordering and
      smp_mb__{before,after}_atomic() are a simple barrier() (such as x86)
      fail for:
      
      	*x = 1;
      	atomic_inc(u);
      	smp_mb__after_atomic();
      	r0 = *y;
      
      Because, while the atomic_inc() implies memory order, it
      (surprisingly) does not provide a compiler barrier. This then allows
      the compiler to re-order like so:
      
      	atomic_inc(u);
      	*x = 1;
      	smp_mb__after_atomic();
      	r0 = *y;
      
      Which the CPU is then allowed to re-order (under TSO rules) like:
      
      	atomic_inc(u);
      	r0 = *y;
      	*x = 1;
      
      And this very much was not intended. Therefore strengthen the atomic
      RmW ops to include a compiler barrier.
      
      NOTE: atomic_{or,and,xor} and the bitops already had the compiler
      barrier.
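      
      The shape of the x86 fix is easy to sketch (paraphrased from
      arch/x86/include/asm/atomic.h; an illustration of the idea, not the
      literal patch):
      
      	static __always_inline void arch_atomic_inc(atomic_t *v)
      	{
      		asm volatile(LOCK_PREFIX "incl %0"
      			     : "+m" (v->counter)
      			     : : "memory");  /* added clobber: the asm is
      					        now a compiler barrier too */
      	}
      
      The "memory" clobber forbids the compiler from caching memory values
      across the asm or reordering other memory accesses around it, which
      restores the ordering that smp_mb__{before,after}_atomic() relies on.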
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/fair: Fix "runnable_avg_yN_inv" not used warnings · 7fc96cd2
      Qian Cai committed
      [ Upstream commit 509466b7d480bc5d22e90b9fbe6122ae0e2fbe39 ]
      
      runnable_avg_yN_inv[] is only used in kernel/sched/pelt.c, but it was
      included in several other places because they need other macros that
      all come from kernel/sched/sched-pelt.h, which is generated by
      Documentation/scheduler/sched-pelt.  As a result, compilation produces
      a lot of warnings:
      
        kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
        kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
        kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
        ...
      
      Silence them by appending the __maybe_unused attribute to the array, so
      all generated variables and macros can still be kept in the same file.
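      
      The resulting declaration in the generated header looks roughly like
      this (the table contents are elided here; the attribute is the whole
      fix):
      
      	static const u32 runnable_avg_yN_inv[] __maybe_unused = {
      		/* ... precomputed PELT decay constants ... */
      	};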
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1559596304-31581-1-git-send-email-cai@lca.pw
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  3. 14 Jul 2019, 4 commits
  4. 03 Jul 2019, 1 commit
  5. 18 Jun 2019, 1 commit
    • tcp: add tcp_min_snd_mss sysctl · 7f9f8a37
      Eric Dumazet committed
      commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream.
      
      Some TCP peers announce a very small MSS option in their SYN and/or
      SYN/ACK messages.
      
      This forces the stack to send packets with a very high network/cpu
      overhead.
      
      Linux has enforced a minimal value of 48.  Since this value includes
      the size of TCP options, and options can consume up to 40 bytes, each
      segment may carry as few as 8 bytes of payload.
      
      In some cases, it can be useful to increase this minimum to a saner
      value.
      
      We still leave the default at 48 (TCP_MIN_SND_MSS) for compatibility
      reasons.
      
      Note that the TCP_MAXSEG socket option enforces a minimal value of
      TCP_MIN_MSS.  David Miller increased this minimal value from 64 to 88
      in commit c39508d6 ("tcp: Make TCP_MAXSEG minimum more correct.").
      
      We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
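      
      A usage sketch (the sysctl name is the one introduced by this patch;
      the value 536 is just an illustrative choice, matching the classic
      default MSS):
      
      	# raise the floor for the sender MSS from 48 to 536 bytes
      	sysctl -w net.ipv4.tcp_min_snd_mss=536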
      
      CVE-2019-11479 -- tcp mss hardcoded to 48
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Suggested-by: Jonathan Looney <jtl@netflix.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Bruce Curtis <brucec@netflix.com>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 09 Jun 2019, 3 commits
  7. 31 May 2019, 2 commits
    • arm64: errata: Add workaround for Cortex-A76 erratum #1463225 · 2eefb4a3
      Will Deacon committed
      commit 969f5ea627570e91c9d54403287ee3ed657f58fe upstream.
      
      Revisions of the Cortex-A76 CPU prior to r4p0 are affected by an erratum
      that can prevent interrupts from being taken when single-stepping.
      
      This patch implements a software workaround to prevent userspace from
      effectively being able to disable interrupts.
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
    • bpf: add bpf_jit_limit knob to restrict unpriv allocations · 43caa29c
      Daniel Borkmann committed
      commit ede95a63b5e84ddeea6b0c473b36ab8bfd8c6ce3 upstream.
      
      Rick reported that the BPF JIT could potentially fill the entire module
      space with BPF programs from unprivileged users, which would prevent
      later attempts to load normal kernel modules or privileged BPF programs,
      for example.  If the JIT was enabled but failed to generate the image,
      then before commit 290af866 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
      we would always fall back to the BPF interpreter.  Nowadays, when
      CONFIG_BPF_JIT_ALWAYS_ON is set, the load instead aborts with a failure,
      since the BPF interpreter was compiled out.
      
      Add a global limit and enforce it for unprivileged users, such that if
      the BPF interpreter has been compiled out we fail once the limit is
      reached; otherwise we fall back to the BPF interpreter earlier, without
      consuming module memory.  As a next step, fair sharing among
      unprivileged users can be resolved, in particular for the case where we
      would otherwise fail hard once the limit is reached.
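      
      A usage sketch for the new knob (the sysctl name is from this patch;
      the default depends on page size, so treat the numbers as illustrative):
      
      	# inspect, then lower, the global JIT allocation limit (in bytes)
      	sysctl net.core.bpf_jit_limit
      	sysctl -w net.core.bpf_jit_limit=100000000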
      
      Fixes: 290af866 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
      Fixes: 0a14842f ("net: filter: Just In Time compiler for x86-64")
      Co-Developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: LKML <linux-kernel@vger.kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  8. 26 May 2019, 1 commit
    • dcache: sort the freeing-without-RCU-delay mess for good. · c939121b
      Al Viro committed
      commit 5467a68cbf6884c9a9d91e2a89140afb1839c835 upstream.
      
      For lockless accesses to dentries we don't have pinned we rely
      (among other things) upon having an RCU delay between dropping
      the last reference and actually freeing the memory.
      
      On the other hand, for things like pipes and sockets we neither
      do that kind of lockless access, nor want to deal with the
      overhead of an RCU delay every time a socket gets closed.
      
      So the delay was made optional - setting DCACHE_RCUACCESS in ->d_flags
      made sure it would happen.  We tried to avoid setting it unless
      we knew we needed it.  Unfortunately, that has led to a recurring
      class of bugs in which we missed the need to set it.
      
      We only really need it for dentries that are created by
      d_alloc_pseudo(), so let's not bother with trying to be smart -
      just make having an RCU delay the default.  The ones that do
      *not* get it set the replacement flag (DCACHE_NORCU) and we'd
      better use that sparingly.  d_alloc_pseudo() is the only
      such user right now.
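      
      The freeing decision then reduces to a single flag test; a sketch of
      the idea (paraphrased from dentry_free() in fs/dcache.c after this
      change):
      
      	if (dentry->d_flags & DCACHE_NORCU)
      		__d_free(&dentry->d_u.d_rcu);           /* no lockless users */
      	else
      		call_rcu(&dentry->d_u.d_rcu, __d_free); /* default: RCU delay */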
      
      FWIW, the race that finally prompted that switch had been
      between __lock_parent() of immediate subdirectory of what's
      currently the root of a disconnected tree (e.g. from
      open-by-handle in progress) racing with d_splice_alias()
      elsewhere picking another alias for the same inode, either
      on outright corrupted fs image, or (in case of open-by-handle
      on NFS) that subdirectory having been just moved on server.
      It's not easy to hit, so the sky is not falling, but that's
      not the first race on similar missed cases, and the logic
      for setting DCACHE_RCUACCESS has gotten ridiculously
      convoluted.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  9. 22 May 2019, 2 commits
  10. 15 May 2019, 18 commits
  11. 08 May 2019, 1 commit
    • USB: core: Fix bug caused by duplicate interface PM usage counter · 83c6688d
      Alan Stern committed
      commit c2b71462d294cf517a0bc6e4fd6424d7cee5596f upstream.
      
      The syzkaller fuzzer reported a bug in the USB hub driver which turned
      out to be caused by a negative runtime-PM usage counter.  This allowed
      a hub to be runtime suspended at a time when the driver did not expect
      it.  The symptom is a WARNING issued because the hub's status URB is
      submitted while it is already active:
      
      	URB 0000000031fb463e submitted while active
      	WARNING: CPU: 0 PID: 2917 at drivers/usb/core/urb.c:363
      
      The negative runtime-PM usage count was caused by an unfortunate
      design decision made when runtime PM was first implemented for USB.
      At that time, USB class drivers were allowed to unbind from their
      interfaces without balancing the usage counter (i.e., leaving it with
      a positive count).  The core code would take care of setting the
      counter back to 0 before allowing another driver to bind to the
      interface.
      
      Later on when runtime PM was implemented for the entire kernel, the
      opposite decision was made: Drivers were required to balance their
      runtime-PM get and put calls.  In order to maintain backward
      compatibility, however, the USB subsystem adapted to the new
      implementation by keeping an independent usage counter for each
      interface and using it to automatically adjust the normal usage
      counter back to 0 whenever a driver was unbound.
      
      This approach involves duplicating information, but what is worse, it
      doesn't work properly in cases where a USB class driver delays
      decrementing the usage counter until after the driver's disconnect()
      routine has returned and the counter has been adjusted back to 0.
      Doing so would cause the usage counter to become negative.  There's
      even a warning about this in the USB power management documentation!
      
      As it happens, this is exactly what the hub driver does.  The
      kick_hub_wq() routine increments the runtime-PM usage counter, and the
      corresponding decrement is carried out by hub_event() in the context
      of the hub_wq work-queue thread.  This work routine may sometimes run
      after the driver has been unbound from its interface, and when it does
      it causes the usage counter to go negative.
      
      It is not possible for hub_disconnect() to wait for a pending
      hub_event() call to finish, because hub_disconnect() is called with
      the device lock held and hub_event() acquires that lock.  The only
      feasible fix is to reverse the original design decision: remove the
      duplicate interface-specific usage counter and require USB drivers to
      balance their runtime PM gets and puts.  As far as I know, all
      existing drivers currently do this.
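      
      In other words, drivers are now held to the ordinary runtime-PM
      contract.  A minimal sketch using the standard USB helpers (the
      surrounding driver context is hypothetical):
      
      	if (usb_autopm_get_interface(intf) == 0) {  /* usage counter++ */
      		/* ... work that needs the device resumed ... */
      		usb_autopm_put_interface(intf);     /* counter--, balances the get */
      	}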
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Reported-and-tested-by: syzbot+7634edaea4d0b341c625@syzkaller.appspotmail.com
      CC: <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  12. 04 May 2019, 1 commit
  13. 02 May 2019, 1 commit