提交 · a8a217c22116eff6c120d753c9934089fb229af0 · openanolis / cloud-kernel

10 10月, 2017 26 次提交

locking/core: Remove {read,spin,write}_can_lock() · a8a217c2

由 Will Deacon 提交于 10月 03, 2017

Outside of the locking code itself, {read,spin,write}_can_lock() have no
users in tree. Apparmor (the last remaining user of write_can_lock()) got
moved over to lockdep by the previous patch.

This patch removes the use of {read,spin,write}_can_lock() from the
BUILD_LOCK_OPS macro, deferring to the trylock operation for testing the
lock status, and subsequently removes the unused macros altogether. They
aren't guaranteed to work in a concurrent environment and can give
incorrect results in the case of qrwlock.
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-2-git-send-email-will.deacon@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a8a217c2

locking/rwsem, security/apparmor: Replace homebrew use of write_can_lock() with lockdep · 26c4eb19

由 Will Deacon 提交于 10月 03, 2017

The lockdep subsystem provides a robust way to assert that a lock is
held, so use that instead of write_can_lock, which can give incorrect
results for qrwlocks.
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJohn Johansen <john.johansen@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-1-git-send-email-will.deacon@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

26c4eb19

locking/rwsem, fs: Use killable down_read() in iterate_dir() · 0dc208b5

由 Kirill Tkhai 提交于 9月 29, 2017

There was mutex_lock_interruptible() initially, and it was changed
to rwsem, but there were not killable rwsem primitives that time.
>From commit 9902af79:

    "The main issue is the lack of down_write_killable(), so the places
     like readdir.c switched to plain inode_lock(); once killable
     variants of rwsem primitives appear, that'll be dealt with"

Use down_read_killable() same as down_write_killable() in !shared
case, as concurrent inode_lock() may take much time, that may be
wanted to be interrupted by user.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: avagin@virtuozzo.com
Cc: davem@davemloft.net
Cc: fenghua.yu@intel.com
Cc: gorcunov@virtuozzo.com
Cc: heiko.carstens@de.ibm.com
Cc: hpa@zytor.com
Cc: ink@jurassic.park.msu.ru
Cc: mattst88@gmail.com
Cc: rientjes@google.com
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: viro@zeniv.linux.org.uk
Link: http://lkml.kernel.org/r/150670120820.23930.5455667921545937220.stgit@localhost.localdomainSigned-off-by: NIngo Molnar <mingo@kernel.org>

0dc208b5

locking/rwsem: Add down_read_killable() · 76f8507f

由 Kirill Tkhai 提交于 9月 29, 2017

Similar to down_read() and down_write_killable(),
add killable version of down_read(), based on
__down_read_killable() function, added in previous
patches.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: avagin@virtuozzo.com
Cc: davem@davemloft.net
Cc: fenghua.yu@intel.com
Cc: gorcunov@virtuozzo.com
Cc: heiko.carstens@de.ibm.com
Cc: hpa@zytor.com
Cc: ink@jurassic.park.msu.ru
Cc: mattst88@gmail.com
Cc: rientjes@google.com
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: viro@zeniv.linux.org.uk
Link: http://lkml.kernel.org/r/150670119884.23930.2585570605960763239.stgit@localhost.localdomainSigned-off-by: NIngo Molnar <mingo@kernel.org>

76f8507f

locking/arch, x86: Add __down_read_killable() · 19c60923

由 Kirill Tkhai 提交于 9月 29, 2017

Similar to __down_write_killable(), add read killable primitive:
extract current __down_read() code to macros and teach it to get
different functions as slow_path argument:
store ax register to ret, and add sp register and preserve its value.

Add call_rwsem_down_read_failed_killable() assembly entry similar
to call_rwsem_down_read_failed():
push dx register to stack in additional to common registers,
as it's not declarated as modifiable in ____down_read().
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: avagin@virtuozzo.com
Cc: davem@davemloft.net
Cc: fenghua.yu@intel.com
Cc: gorcunov@virtuozzo.com
Cc: heiko.carstens@de.ibm.com
Cc: hpa@zytor.com
Cc: ink@jurassic.park.msu.ru
Cc: mattst88@gmail.com
Cc: rientjes@google.com
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: viro@zeniv.linux.org.uk
Link: http://lkml.kernel.org/r/150670118802.23930.1316107715255410256.stgit@localhost.localdomainSigned-off-by: NIngo Molnar <mingo@kernel.org>

19c60923

locking/arch, s390: Add __down_read_killable() · a61ba2c8

由 Kirill Tkhai 提交于 9月 29, 2017

Similar to __down_write_killable(), and read killable primitive.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: avagin@virtuozzo.com
Cc: davem@davemloft.net
Cc: fenghua.yu@intel.com
Cc: gorcunov@virtuozzo.com
Cc: heiko.carstens@de.ibm.com
Cc: hpa@zytor.com
Cc: ink@jurassic.park.msu.ru
Cc: mattst88@gmail.com
Cc: rientjes@google.com
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: viro@zeniv.linux.org.uk
Link: http://lkml.kernel.org/r/150670117817.23930.13068785028558453848.stgit@localhost.localdomainSigned-off-by: NIngo Molnar <mingo@kernel.org>

a61ba2c8

locking/arch, ia64: Add __down_read_killable() · c0905115

由 Kirill Tkhai 提交于 9月 29, 2017

Similar to __down_write_killable(), and read killable primitive.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: avagin@virtuozzo.com
Cc: davem@davemloft.net
Cc: fenghua.yu@intel.com
Cc: gorcunov@virtuozzo.com
Cc: heiko.carstens@de.ibm.com
Cc: hpa@zytor.com
Cc: ink@jurassic.park.msu.ru
Cc: mattst88@gmail.com
Cc: rientjes@google.com
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: viro@zeniv.linux.org.uk
Link: http://lkml.kernel.org/r/150670116749.23930.14976888440968191759.stgit@localhost.localdomainSigned-off-by: NIngo Molnar <mingo@kernel.org>

c0905115

locking/arch, alpha: Add __down_read_killable() · 8c74392a

由 Kirill Tkhai 提交于 9月 29, 2017

Similar to __down_write_killable(), and read killable primitive.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: avagin@virtuozzo.com
Cc: davem@davemloft.net
Cc: fenghua.yu@intel.com
Cc: gorcunov@virtuozzo.com
Cc: heiko.carstens@de.ibm.com
Cc: hpa@zytor.com
Cc: ink@jurassic.park.msu.ru
Cc: mattst88@gmail.com
Cc: rientjes@google.com
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: viro@zeniv.linux.org.uk
Link: http://lkml.kernel.org/r/150670115677.23930.5711263025537758463.stgit@localhost.localdomainSigned-off-by: NIngo Molnar <mingo@kernel.org>

8c74392a

locking/spinlocks, paravirt, xen: Correct the xen_nopvspin case · e6fd28eb

由 Juergen Gross 提交于 9月 06, 2017

With the boot parameter "xen_nopvspin" specified a Xen guest should not
make use of paravirt spinlocks, but behave as if running on bare
metal. This is not true, however, as the qspinlock code will fall back
to a test-and-set scheme when it is detecting a hypervisor.

In order to avoid this disable the virt_spin_lock_key.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NWaiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: hpa@zytor.com
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170906173625.18158-3-jgross@suse.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

e6fd28eb

locking/paravirt: Use new static key for controlling call of virt_spin_lock() · 9043442b

由 Juergen Gross 提交于 9月 06, 2017

There are cases where a guest tries to switch spinlocks to bare metal
behavior (e.g. by setting "xen_nopvspin" boot parameter). Today this
has the downside of falling back to unfair test and set scheme for
qspinlocks due to virt_spin_lock() detecting the virtualized
environment.

Add a static key controlling whether virt_spin_lock() should be
called or not. When running on bare metal set the new key to false.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NWaiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: hpa@zytor.com
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170906173625.18158-2-jgross@suse.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

9043442b

I
Merge branch 'locking/urgent' into locking/core, to pick up fixes · af1a34f2
由 Ingo Molnar 提交于 10月 10, 2017
```
Signed-off-by: NIngo Molnar <mingo@kernel.org>
```
af1a34f2

locking/selftest: Avoid false BUG report · c7e2f69d

由 Peter Zijlstra 提交于 10月 04, 2017

The work-around for the expected failure is providing another failure :/

Only when CONFIG_PROVE_LOCKING=y do we increment unexpected_testcase_failures,
so only then do we need to decrement, otherwise we'll end up with a negative
number and that will again trigger a BUG (printout, not crash).
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Tested-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d82fed75 ("locking/lockdep/selftests: Fix mixed read-write ABBA tests")
Signed-off-by: NIngo Molnar <mingo@kernel.org>

c7e2f69d

locking/lockdep: Fix stacktrace mess · 8b405d5c

由 Peter Zijlstra 提交于 10月 04, 2017

There is some complication between check_prevs_add() and
check_prev_add() wrt. saving stack traces. The problem is that we want
to be frugal with saving stack traces, since it consumes static
resources.

We'll only know in check_prev_add() if we need the trace, but we can
call into it multiple times. So we want to do on-demand and re-use.

A further complication is that check_prev_add() can drop graph_lock
and mess with our static resources.

In any case, the current state; after commit:

  ce07a941 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")

is that we'll assume the trace contains valid data once
check_prev_add() returns '2'. However, as noted by Josh, this is
false, check_prev_add() can return '2' before having saved a trace,
this then result in the possibility of using uninitialized data.
Testing, as reported by Wu, shows a NULL deref.

So simplify.

Since the graph_lock() thing is a debug path that hasn't
really been used in a long while, take it out back and avoid the
head-ache.

Further initialize the stack_trace to a known 'empty' state; as long
as nr_entries == 0, nothing should deref entries. We can then use the
'entries == NULL' test for a valid trace / on-demand saving.
Analyzed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ce07a941 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
Signed-off-by: NIngo Molnar <mingo@kernel.org>

8b405d5c

Merge branch 'ppc-bundle' (bundle from Michael Ellerman) · 529a86e0

由 Linus Torvalds 提交于 10月 09, 2017

Merge powerpc transactional memory fixes from Michael Ellerman:
 "I figured I'd still send you the commits using a bundle to make sure
  it works in case I need to do it again in future"

This fixes transactional memory state restore for powerpc.

* bundle'd patches from Michael Ellerman:
  powerpc/tm: Fix illegal TM state in signal handler
  powerpc/64s: Use emergency stack for kernel TM Bad Thing program checks

529a86e0

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ff33952e

由 Linus Torvalds 提交于 10月 09, 2017

Pull networking fixes from David Miller:

 1) Fix object leak on IPSEC offload failure, from Steffen Klassert.

 2) Fix range checks in ipset address range addition operations, from
    Jozsef Kadlecsik.

 3) Fix pernet ops unregistration order in ipset, from Florian Westphal.

 4) Add missing netlink attribute policy for nl80211 packet pattern
    attrs, from Peng Xu.

 5) Fix PPP device destruction race, from Guillaume Nault.

 6) Write marks get lost when BPF verifier processes R1=R2 register
    assignments, causing incorrect liveness information and less state
    pruning. Fix from Alexei Starovoitov.

 7) Fix blockhole routes so that they are marked dead and therefore not
    cached in sockets, otherwise IPSEC stops working. From Steffen
    Klassert.

 8) Fix broadcast handling of UDP socket early demux, from Paolo Abeni.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits)
  cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan
  net: thunderx: mark expected switch fall-throughs in nicvf_main()
  udp: fix bcast packet reception
  netlink: do not set cb_running if dump's start() errs
  ipv4: Fix traffic triggered IPsec connections.
  ipv6: Fix traffic triggered IPsec connections.
  ixgbe: incorrect XDP ring accounting in ethtool tx_frame param
  net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  Revert commit 1a8b6d76 ("net:add one common config...")
  ixgbe: fix masking of bits read from IXGBE_VXLANCTRL register
  ixgbe: Return error when getting PHY address if PHY access is not supported
  netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook
  tipc: Unclone message at secondary destination lookup
  tipc: correct initialization of skb list
  gso: fix payload length when gso_size is zero
  mlxsw: spectrum_router: Avoid expensive lookup during route removal
  bpf: fix liveness marking
  doc: Fix typo "8023.ad" in bonding documentation
  ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real
  ...

ff33952e

cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan · fdfbad32

由 Aleksander Morgado 提交于 10月 09, 2017

The u-blox TOBY-L2 is a LTE Cat 4 module with HSPA+ and 2G fallback.
This module allows switching to different USB profiles with the
'AT+UUSBCONF' command, and provides a ECM network interface when the
'AT+UUSBCONF=2' profile is selected.

The u-blox SARA-U2 is a HSPA module with 2G fallback. The default USB
configuration includes a ECM network interface.

Both these modules are controlled via AT commands through one of the
TTYs exposed. Connecting these modules may be done just by activating
the desired PDP context with 'AT+CGACT=1,<cid>' and then running DHCP
on the ECM interface.
Signed-off-by: NAleksander Morgado <aleksander@aleksander.es>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fdfbad32

Merge tag 'nfs-for-4.14-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 68ebe3cb

由 Linus Torvalds 提交于 10月 09, 2017

Pull NFS client bugfixes from Trond Myklebust:
 "Hightlights include:

  stable fixes:
   - nfs/filelayout: fix oops when freeing filelayout segment
   - NFS: Fix uninitialized rpc_wait_queue

  bugfixes:
   - NFSv4/pnfs: Fix an infinite layoutget loop
   - nfs: RPC_MAX_AUTH_SIZE is in bytes"

* tag 'nfs-for-4.14-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4/pnfs: Fix an infinite layoutget loop
  nfs/filelayout: fix oops when freeing filelayout segment
  sunrpc: remove redundant initialization of sock
  NFS: Fix uninitialized rpc_wait_queue
  NFS: Cleanup error handling in nfs_idmap_request_key()
  nfs: RPC_MAX_AUTH_SIZE is in bytes

68ebe3cb

net: thunderx: mark expected switch fall-throughs in nicvf_main() · 1a2ace56

由 Gustavo A. R. Silva 提交于 10月 09, 2017

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Cc: Sunil Goutham <sgoutham@cavium.com>
Cc: Robert Richter <rric@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a2ace56

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · fb60bccc

由 David S. Miller 提交于 10月 09, 2017

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) Fix packet drops due to incorrect ECN handling in IPVS, from Vadim
   Fedorenko.

2) Fix splat with mark restoration in xt_socket with non-full-sock,
   patch from Subash Abhinov Kasiviswanathan.

3) ipset bogusly bails out when adding IPv4 range containing more than
   2^31 addresses, from Jozsef Kadlecsik.

4) Incorrect pernet unregistration order in ipset, from Florian Westphal.

5) Races between dump and swap in ipset results in BUG_ON splats, from
   Ross Lagerwall.

6) Fix chain renames in nf_tables, from JingPiao Chen.

7) Fix race in pernet codepath with ebtables table registration, from
   Artem Savkov.

8) Memory leak in error path in set name allocation in nf_tables, patch
   from Arvind Yadav.

9) Don't dump chain counters if they are not available, this fixes a
   crash when listing the ruleset.

10) Fix out of bound memory read in strlcpy() in x_tables compat code,
    from Eric Dumazet.

11) Make sure we only process TCP packets in SYNPROXY hooks, patch from
    Lin Zhang.

12) Cannot load rules incrementally anymore after xt_bpf with pinned
    objects, added in revision 1. From Shmulik Ladkani.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb60bccc

Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 5766cd68

由 David S. Miller 提交于 10月 09, 2017

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2017-10-09

This series contains updates to ixgbe and arch/Kconfig.

Mark fixes a case where PHY register access is not supported and we were
returning a PHY address, when we should have been returning -EOPNOTSUPP.

Sabrina Dubroca fixes the use of a logical "and" when it should have been
the bitwise "and" operator.

Ding Tianhong reverts the commit that added the Kconfig bool option
ARCH_WANT_RELAX_ORDER, since there is now a new flag
PCI_DEV_FLAGS_NO_RELAXED_ORDERING that has been added to indicate that
Relaxed Ordering Attributes should not be used for Transaction Layer
Packets.  Then follows up with making the needed changes to ixgbe to
use the new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag.

John Fastabend fixes an issue in the ring accounting when the transmit
ring parameters are changed via ethtool when an XDP program is attached.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5766cd68

udp: fix bcast packet reception · 996b44fc

由 Paolo Abeni 提交于 10月 09, 2017

The commit bc044e8d ("udp: perform source validation for
mcast early demux") does not take into account that broadcast packets
lands in the same code path and they need different checks for the
source address - notably, zero source address are valid for bcast
and invalid for mcast.

As a result, 2nd and later broadcast packets with 0 source address
landing to the same socket are dropped. This breaks dhcp servers.

Since we don't have stringent performance requirements for ingress
broadcast traffic, fix it by disabling UDP early demux such traffic.
Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: bc044e8d ("udp: perform source validation for mcast early demux")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

996b44fc

netlink: do not set cb_running if dump's start() errs · 41c87425

由 Jason A. Donenfeld 提交于 10月 09, 2017

It turns out that multiple places can call netlink_dump(), which means
it's still possible to dereference partially initialized values in
dump() that were the result of a faulty returned start().

This fixes the issue by calling start() _before_ setting cb_running to
true, so that there's no chance at all of hitting the dump() function
through any indirect paths.

It also moves the call to start() to be when the mutex is held. This has
the nice side effect of serializing invocations to start(), which is
likely desirable anyway. It also prevents any possible other races that
might come out of this logic.

In testing this with several different pieces of tricky code to trigger
these issues, this commit fixes all avenues that I'm aware of.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

41c87425

Merge tag 'mac80211-for-davem-2017-10-09' of... · 6df4d17c

由 David S. Miller 提交于 10月 09, 2017

Merge tag 'mac80211-for-davem-2017-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
pull-request: mac80211 2017-10-09

The QCA folks found another netlink problem - we were missing validation
of some attributes. It's not super problematic since one can only read a
few bytes beyond the message (and that memory must exist), but here's the
fix for it.

I thought perhaps we can make nla_parse_nested() require a policy, but
given the two-stage validation/parsing in regular netlink that won't work.

Please pull and let me know if there's any problem.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6df4d17c

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 93b03193

由 David S. Miller 提交于 10月 09, 2017

Steffen Klassert says:

====================
pull request (net): ipsec 2017-10-09

1) Fix some error paths of the IPsec offloading API.

2) Fix a NULL pointer dereference when IPsec is used
   with vti. From Alexey Kodanev.

3) Don't call xfrm_policy_cache_flush under xfrm_state_lock,
   it triggers several locking warnings. From Artem Savkov.

Please pull or let me know if there are problems.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93b03193

ipv4: Fix traffic triggered IPsec connections. · 6c0e7284

由 Steffen Klassert 提交于 10月 09, 2017

A recent patch removed the dst_free() on the allocated
dst_entry in ipv4_blackhole_route(). The dst_free() marked the
dst_entry as dead and added it to the gc list. I.e. it was setup
for a one time usage. As a result we may now have a blackhole
route cached at a socket on some IPsec scenarios. This makes the
connection unusable.

Fix this by marking the dst_entry directly at allocation time
as 'dead', so it is used only once.

Fixes: b838d5e1 ("ipv4: mark DST_NOGC and remove the operation of dst_free()")
Reported-by: NTobias Brunner <tobias@strongswan.org>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c0e7284

ipv6: Fix traffic triggered IPsec connections. · 62cf27e5

由 Steffen Klassert 提交于 10月 09, 2017

A recent patch removed the dst_free() on the allocated
dst_entry in ipv6_blackhole_route(). The dst_free() marked
the dst_entry as dead and added it to the gc list. I.e. it
was setup for a one time usage. As a result we may now have
a blackhole route cached at a socket on some IPsec scenarios.
This makes the connection unusable.

Fix this by marking the dst_entry directly at allocation time
as 'dead', so it is used only once.

Fixes: 587fea74 ("ipv6: mark DST_NOGC and remove the operation of dst_free()")
Reported-by: NTobias Brunner <tobias@strongswan.org>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

62cf27e5

09 10月, 2017 12 次提交

ixgbe: incorrect XDP ring accounting in ethtool tx_frame param · 8e679021

由 John Fastabend 提交于 9月 07, 2017

Changing the TX ring parameters with an XDP program attached may
cause the XDP queues to be cleared and the TX rings to be incorrectly
configured.

Fix by doing correct ring accounting in setup call.

Fixes: 33fdc82f ("ixgbe: add support for XDP_TX action")
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

8e679021

net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag · 5e0fac63

由 Ding Tianhong 提交于 8月 18, 2017

The ixgbe driver use the compile check to determine if it can
send TLPs to Root Port with the Relaxed Ordering Attribute set,
this is too inconvenient, now the new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING
has been added to the kernel and we could check the bit4 in the PCIe
Device Control register to determine whether we should use the Relaxed
Ordering Attributes or not, so use this new way in the ixgbe driver.
Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
Acked-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

5e0fac63

Revert commit ("net:add one common config...") · f4986d25

由 Ding Tianhong 提交于 8月 18, 2017

The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLP) targeted toward
these affected Root Port, it will clear the bit4 in the PCIe
Device Control register, so the PCIe device drivers could
query PCIe configuration space to determine if it can send
TLPs to Root Port with the Relaxed Ordering Attributes set.

With this new flag  we don't need the config ARCH_WANT_RELAX_ORDER
to control the Relaxed Ordering Attributes for the ixgbe drivers
just like the commit 1a8b6d76 ("net:add one common config...") did,
so revert this commit.
Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

f4986d25

ixgbe: fix masking of bits read from IXGBE_VXLANCTRL register · a39221ce

由 Sabrina Dubroca 提交于 7月 03, 2017

In ixgbe_clear_udp_tunnel_port(), we read the IXGBE_VXLANCTRL register
and then try to mask some bits out of the value, using the logical
instead of bitwise and operator.

Fixes: a21d0822 ("ixgbe: add support for geneve Rx offload")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

a39221ce

ixgbe: Return error when getting PHY address if PHY access is not supported · e0f06bba

由 Mark D Rustad 提交于 8月 31, 2016

In cases where PHY register access is not supported, don't mislead
a caller into thinking that it is supported by returning a PHY
address. Instead, return -EOPNOTSUPP when PHY access is not
supported.
Signed-off-by: NMark Rustad <mark.d.rustad@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

e0f06bba

netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' · 98589a09

由 Shmulik Ladkani 提交于 10月 09, 2017

Commit 2c16d603 ("netfilter: xt_bpf: support ebpf") introduced
support for attaching an eBPF object by an fd, with the
'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
IPT_SO_SET_REPLACE call.

However this breaks subsequent iptables calls:

 # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
 # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
 iptables: Invalid argument. Run `dmesg' for more information.

That's because iptables works by loading existing rules using
IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
the replacement set.

However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
(from the initial "iptables -m bpf" invocation) - so when 2nd invocation
occurs, userspace passes a bogus fd number, which leads to
'bpf_mt_check_v1' to fail.

One suggested solution [1] was to hack iptables userspace, to perform a
"entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
process-local fd per every 'xt_bpf_info_v1' entry seen.

However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.

This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
'.fd' and instead perform an in-kernel lookup for the bpf object given
the provided '.path'.

It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
expected to provide the path of the pinned object.

Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.

References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
            [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2Reported-by: NRafael Buchbinder <rafi@rbk.ms>
Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

98589a09

netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook · 49f817d7

由 Lin Zhang 提交于 10月 06, 2017

In function {ipv4,ipv6}_synproxy_hook we expect a normal tcp packet, but
the real server maybe reply an icmp error packet related to the exist
tcp conntrack, so we will access wrong tcp data.

Fix it by checking for the protocol field and only process tcp traffic.
Signed-off-by: NLin Zhang <xiaolou4617@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

49f817d7

tipc: Unclone message at secondary destination lookup · a9e2971b

由 Jon Maloy 提交于 10月 07, 2017

When a bundling message is received, the function tipc_link_input()
calls function tipc_msg_extract() to unbundle all inner messages of
the bundling message before adding them to input queue.

The function tipc_msg_extract() just clones all inner skb for all
inner messagges from the bundling skb. This means that the skb
headroom of an inner message overlaps with the data part of the
preceding message in the bundle.

If the message in question is a name addressed message, it may be
subject to a secondary destination lookup, and eventually be sent out
on one of the interfaces again. But, since what is perceived as headroom
by the device driver in reality is the last bytes of the preceding
message in the bundle, the latter will be overwritten by the MAC
addresses of the L2 header. If the preceding message has not yet been
consumed by the user, it will evenually be delivered with corrupted
contents.

This commit fixes this by uncloning all messages passing through the
function tipc_msg_lookup_dest(), hence ensuring that the headroom
is always valid when the message is passed on.
Signed-off-by: NTung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a9e2971b

tipc: correct initialization of skb list · 3382605f

由 Jon Maloy 提交于 10月 07, 2017

We change the initialization of the skb transmit buffer queues
in the functions tipc_bcast_xmit() and tipc_rcast_xmit() to also
initialize their spinlocks. This is needed because we may, during
error conditions, need to call skb_queue_purge() on those queues
further down the stack.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3382605f

L

Linux 4.14-rc4 · 8a5776a5
由 Linus Torvalds 提交于 10月 08, 2017

8a5776a5

gso: fix payload length when gso_size is zero · 3d0241d5

由 Alexey Kodanev 提交于 10月 06, 2017

When gso_size reset to zero for the tail segment in skb_segment(), later
in ipv6_gso_segment(), __skb_udp_tunnel_segment() and gre_gso_segment()
we will get incorrect results (payload length, pcsum) for that segment.
inet_gso_segment() already has a check for gso_size before calculating
payload.

The issue was found with LTP vxlan & gre tests over ixgbe NIC.

Fixes: 07b26c94 ("gso: Support partial splitting at the frag_list pointer")
Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d0241d5

mlxsw: spectrum_router: Avoid expensive lookup during route removal · a69518cf

由 Ido Schimmel 提交于 10月 08, 2017

In commit fc922bb0 ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I increased the scale of supported VRFs by having
all of them share the same LPM tree.

In order to avoid look-ups for prefix lengths that don't exist, each
route removal would trigger an aggregation across all the active virtual
routers to see which prefix lengths are in use and which aren't and
structure the tree accordingly.

With the way the data structures are currently laid out, this is a very
expensive operation. When preformed repeatedly - due to the invocation
of the abort mechanism - and with enough VRFs, this can result in a hung
task.

For now, avoid this optimization until it can be properly re-added in
net-next.

Fixes: fc922bb0 ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers")
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reported-by: NDavid Ahern <dsa@cumulusnetworks.com>
Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a69518cf

08 10月, 2017 2 次提交

bpf: fix liveness marking · 8fe2d6cc

由 Alexei Starovoitov 提交于 10月 05, 2017

while processing Rx = Ry instruction the verifier does
regs[insn->dst_reg] = regs[insn->src_reg]
which often clears write mark (when Ry doesn't have it)
that was just set by check_reg_arg(Rx) prior to the assignment.
That causes mark_reg_read() to keep marking Rx in this block as
REG_LIVE_READ (since the logic incorrectly misses that it's
screened by the write) and in many of its parents (until lucky
write into the same Rx or beginning of the program).
That causes is_state_visited() logic to miss many pruning opportunities.

Furthermore mark_reg_read() logic propagates the read mark
for BPF_REG_FP as well (though it's readonly) which causes
harmless but unnecssary work during is_state_visited().
Note that do_propagate_liveness() skips FP correctly,
so do the same in mark_reg_read() as well.
It saves 0.2 seconds for the test below

program               before  after
bpf_lb-DLB_L3.o       2604    2304
bpf_lb-DLB_L4.o       11159   3723
bpf_lb-DUNKNOWN.o     1116    1110
bpf_lxc-DDROP_ALL.o   34566   28004
bpf_lxc-DUNKNOWN.o    53267   39026
bpf_netdev.o          17843   16943
bpf_overlay.o         8672    7929
time                  ~11 sec  ~4 sec

Fixes: dc503a8a ("bpf/verifier: track liveness for pruning")
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NEdward Cree <ecree@solarflare.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fe2d6cc

doc: Fix typo "8023.ad" in bonding documentation · 00a534e5

由 Axel Beckert 提交于 10月 05, 2017

Should be "802.3ad" like everywhere else in the document.
Signed-off-by: NAxel Beckert <abe@deuxchevaux.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

00a534e5

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功