提交 · d6dd6431591b2af8580cd2b57f501352222edb4f · openanolis / cloud-kernel

13 8月, 2018 1 次提交

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d6dd6431

由 Linus Torvalds 提交于 8月 12, 2018

Pull vfs fixes from Al Viro:
 "A bunch of race fixes, mostly around lazy pathwalk.

  All of it is -stable fodder, a large part going back to 2013"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make sure that __dentry_kill() always invalidates d_seq, unhashed or not
  fix __legitimize_mnt()/mntput() race
  fix mntput/mntput race
  root dentries need RCU-delayed freeing

d6dd6431

12 8月, 2018 5 次提交

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ec0c9671

由 Linus Torvalds 提交于 8月 11, 2018

Pull networking fixes from David Miller:
 "Last bit of straggler fixes...

  1) Fix btf library licensing to LGPL, from Martin KaFai lau.

  2) Fix error handling in bpf sockmap code, from Daniel Borkmann.

  3) XDP cpumap teardown handling wrt. execution contexts, from Jesper
     Dangaard Brouer.

  4) Fix loss of runtime PM on failed vlan add/del, from Ivan
     Khoronzhuk.

  5) xen-netfront caches skb_shinfo(skb) across a __pskb_pull_tail()
     call, which potentially changes the skb's data buffer, and thus
     skb_shinfo(). Fix from Juergen Gross"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  xen/netfront: don't cache skb_shinfo()
  net: ethernet: ti: cpsw: fix runtime_pm while add/kill vlan
  net: ethernet: ti: cpsw: clear all entries when delete vid
  xdp: fix bug in devmap teardown code path
  samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier
  xdp: fix bug in cpumap teardown code path
  bpf, sockmap: fix cork timeout for select due to epipe
  bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path
  bpf, sockmap: fix bpf_tcp_sendmsg sock error handling
  bpf: btf: Change tools/lib/bpf/btf to LGPL

ec0c9671

xen/netfront: don't cache skb_shinfo() · d472b3a6

由 Juergen Gross 提交于 8月 09, 2018

skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache
its return value.

Cc: stable@vger.kernel.org
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NWei Liu <wei.liu2@citrix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d472b3a6

Merge branch 'cpsw-runtime-pm-fix' · 556fdd85

由 David S. Miller 提交于 8月 11, 2018

Grygorii Strashko says:

====================
net: ethernet: ti: cpsw: fix runtime pm while add/del reserved vid

Here 2 not critical fixes for:
- vlan ale table leak while error if deleting vlan (simplifies next fix)
- runtime pm while try to set reserved vlan
====================
Reviewed-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

556fdd85

net: ethernet: ti: cpsw: fix runtime_pm while add/kill vlan · 803c4f64

由 Ivan Khoronzhuk 提交于 8月 10, 2018

It's exclusive with normal behaviour but if try to set vlan to one of
the reserved values is made, the cpsw runtime pm is broken.

Fixes: a6c5d14f ("drivers: net: cpsw: ndev: fix accessing to suspended device")
Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

803c4f64

net: ethernet: ti: cpsw: clear all entries when delete vid · be35b982

由 Ivan Khoronzhuk 提交于 8月 10, 2018

In cases if some of the entries were not found in forwarding table
while killing vlan, the rest not needed entries still left in the
table. No need to stop, as entry was deleted anyway. So fix this by
returning error only after all was cleaned. To implement this, return
-ENOENT in cpsw_ale_del_mcast() as it's supposed to be.
Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

be35b982

11 8月, 2018 5 次提交

zram: remove BD_CAP_SYNCHRONOUS_IO with writeback feature · 4f7a7bea

由 Minchan Kim 提交于 8月 10, 2018

If zram supports writeback feature, it's no longer a
BD_CAP_SYNCHRONOUS_IO device beause zram does asynchronous IO operations
for incompressible pages.

Do not pretend to be synchronous IO device.  It makes the system very
sluggish due to waiting for IO completion from upper layers.

Furthermore, it causes a user-after-free problem because swap thinks the
opearion is done when the IO functions returns so it can free the page
(e.g., lock_page_or_retry and goto out_release in do_swap_page) but in
fact, IO is asynchronous so the driver could access a just freed page
afterward.

This patch fixes the problem.

  BUG: Bad page state in process qemu-system-x86  pfn:3dfab21
  page:ffffdfb137eac840 count:0 mapcount:0 mapping:0000000000000000 index:0x1
  flags: 0x17fffc000000008(uptodate)
  raw: 017fffc000000008 dead000000000100 dead000000000200 0000000000000000
  raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
  bad because of flags: 0x8(uptodate)
  CPU: 4 PID: 1039 Comm: qemu-system-x86 Tainted: G    B 4.18.0-rc5+ #1
  Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
  Call Trace:
    dump_stack+0x5c/0x7b
    bad_page+0xba/0x120
    get_page_from_freelist+0x1016/0x1250
    __alloc_pages_nodemask+0xfa/0x250
    alloc_pages_vma+0x7c/0x1c0
    do_swap_page+0x347/0x920
    __handle_mm_fault+0x7b4/0x1110
    handle_mm_fault+0xfc/0x1f0
    __get_user_pages+0x12f/0x690
    get_user_pages_unlocked+0x148/0x1f0
    __gfn_to_pfn_memslot+0xff/0x3c0 [kvm]
    try_async_pf+0x87/0x230 [kvm]
    tdp_page_fault+0x132/0x290 [kvm]
    kvm_mmu_page_fault+0x74/0x570 [kvm]
    kvm_arch_vcpu_ioctl_run+0x9b3/0x1990 [kvm]
    kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
    do_vfs_ioctl+0xa2/0x630
    ksys_ioctl+0x70/0x80
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x55/0x100
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: https://lore.kernel.org/lkml/0516ae2d-b0fd-92c5-aa92-112ba7bd32fc@contabo.de/
Link: http://lkml.kernel.org/r/20180802051112.86174-1-minchan@kernel.org
[minchan@kernel.org: fix changelog, add comment]
 Link: https://lore.kernel.org/lkml/0516ae2d-b0fd-92c5-aa92-112ba7bd32fc@contabo.de/
 Link: http://lkml.kernel.org/r/20180802051112.86174-1-minchan@kernel.org
 Link: http://lkml.kernel.org/r/20180805233722.217347-1-minchan@kernel.org
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Reported-by: NTino Lehnig <tino.lehnig@contabo.de>
Tested-by: NTino Lehnig <tino.lehnig@contabo.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>	[4.15+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4f7a7bea

mm/memory.c: check return value of ioremap_prot · 24eee1e4

由 jie@chenjie6@huwei.com 提交于 8月 10, 2018

ioremap_prot() can return NULL which could lead to an oops.

Link: http://lkml.kernel.org/r/1533195441-58594-1-git-send-email-chenjie6@huawei.comSigned-off-by: Nchen jie <chenjie6@huawei.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: chenjie <chenjie6@huawei.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

24eee1e4

lib/ubsan: remove null-pointer checks · 3ca17b1f

由 Andrey Ryabinin 提交于 8月 10, 2018

With gcc-8 fsanitize=null become very noisy.  GCC started to complain
about things like &a->b, where 'a' is NULL pointer.  There is no NULL
dereference, we just calculate address to struct member.  It's
technically undefined behavior so UBSAN is correct to report it.  But as
long as there is no real NULL-dereference, I think, we should be fine.

-fno-delete-null-pointer-checks compiler flag should protect us from any
consequences.  So let's just no use -fsanitize=null as it's not useful
for us.  If there is a real NULL-deref we will see crash.  Even if
userspace mapped something at NULL (root can do this), with things like
SMAP should catch the issue.

Link: http://lkml.kernel.org/r/20180802153209.813-1-aryabinin@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3ca17b1f

MAINTAINERS: GDB: update e-mail address · 5832fcf9

由 Kieran Bingham 提交于 8月 10, 2018

This entry was created with my personal e-mail address. Update this entry
to my open-source kernel.org account.

Link: http://lkml.kernel.org/r/20180806143904.4716-4-kieran.bingham@ideasonboard.comSigned-off-by: NKieran Bingham <kbingham@kernel.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5832fcf9

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · f313b43b

由 Linus Torvalds 提交于 8月 10, 2018

Pull i2c fix from Wolfram Sang:
 "A single driver bugfix for I2C.

  The bug was found by systematically stress testing the driver, so I am
  confident to merge it that late in the cycle although it is probably
  unusually large"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: xlp9xx: Fix case where SSIF read transaction completes early

f313b43b

10 8月, 2018 10 次提交

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · e91e2189

由 David S. Miller 提交于 8月 09, 2018

Daniel Borkmann says:

====================
pull-request: bpf 2018-08-10

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix cpumap and devmap on teardown as they're under RCU context
   and won't have same assumption as running under NAPI protection,
   from Jesper.

2) Fix various sockmap bugs in bpf_tcp_sendmsg() code, e.g. we had
   a bug where socket error was not propagated correctly, from Daniel.

3) Fix incompatible libbpf header license for BTF code and match it
   before it gets officially released with the rest of libbpf which
   is LGPL-2.1, from Martin.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e91e2189

make sure that __dentry_kill() always invalidates d_seq, unhashed or not · 4c0d7cd5

由 Al Viro 提交于 8月 09, 2018

RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq.  That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all.  Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.

We could try and be careful in the (few) places where that could
happen.  Or we could just make the final dput() invalidate the damn
thing, unhashed or not.  The latter is much simpler and easier to
backport, so let's do it that way.
Reported-by: N"Dae R. Jeong" <threeearcat@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4c0d7cd5

fix __legitimize_mnt()/mntput() race · 119e1ef8

由 Al Viro 提交于 8月 09, 2018

__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntpu() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.

Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped.  Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.

It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there.  The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...
Reported-by: NOleg Nesterov <oleg@redhat.com>
Fixes: 48a066e7 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

119e1ef8

fix mntput/mntput race · 9ea0a46c

由 Al Viro 提交于 8月 09, 2018

mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test.  Consider the following situation:
	* mnt is a lazy-umounted mount, kept alive by two opened files.  One
of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
	* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69.  There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
	* On CPU 42 our first mntput() finishes counting.  It observes the
decrement of component #69, but not the increment of component #0.  As the
result, the total it gets is not 1 as it should've been - it's 0.  At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down.  However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.

It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.

Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.
Reported-by: NJann Horn <jannh@google.com>
Tested-by: NJann Horn <jannh@google.com>
Fixes: 48a066e7 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9ea0a46c

Merge branch 'bpf-fix-cpu-and-devmap-teardown' · 9c954201

由 Daniel Borkmann 提交于 8月 09, 2018

Jesper Dangaard Brouer says:

====================
Removing entries from cpumap and devmap, goes through a number of
syncronization steps to make sure no new xdp_frames can be enqueued.
But there is a small chance, that xdp_frames remains which have not
been flushed/processed yet.  Flushing these during teardown, happens
from RCU context and not as usual under RX NAPI context.

The optimization introduced in commt 389ab7f0 ("xdp: introduce
xdp_return_frame_rx_napi"), missed that the flush operation can also
be called from RCU context.  Thus, we cannot always use the
xdp_return_frame_rx_napi call, which take advantage of the protection
provided by XDP RX running under NAPI protection.

The samples/bpf xdp_redirect_cpu have a --stress-mode, that is
adjusted to easier reproduce (verified by Red Hat QA).
====================
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

9c954201

xdp: fix bug in devmap teardown code path · 1bf9116d

由 Jesper Dangaard Brouer 提交于 8月 08, 2018

Like cpumap teardown, the devmap teardown code also flush remaining
xdp_frames, via bq_xmit_all() in case map entry is removed.  The code
can call xdp_return_frame_rx_napi, from the the wrong context, in-case
ndo_xdp_xmit() fails.

Fixes: 389ab7f0 ("xdp: introduce xdp_return_frame_rx_napi")
Fixes: 735fc405 ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

1bf9116d

samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier · 37d7ff25

由 Jesper Dangaard Brouer 提交于 8月 08, 2018

The teardown race in cpumap is really hard to reproduce.  These changes
makes it easier to reproduce, for QA.

The --stress-mode now have a case of a very small queue size of 8, that helps
to trigger teardown flush to encounter a full queue, which results in calling
xdp_return_frame API, in a non-NAPI protect context.

Also increase MAX_CPUS, as my QA department have larger machines than me.
Tested-by: NJean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

37d7ff25

xdp: fix bug in cpumap teardown code path · ad0ab027

由 Jesper Dangaard Brouer 提交于 8月 08, 2018

When removing a cpumap entry, a number of syncronization steps happen.
Eventually the teardown code __cpu_map_entry_free is invoked from/via
call_rcu.

The teardown code __cpu_map_entry_free() flushes remaining xdp_frames,
by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi().
The issues is that the teardown code is not running in the RX NAPI
code path.  Thus, it is not allowed to invoke the NAPI variant of
xdp_return_frame.

This bug was found and triggered by using the --stress-mode option to
the samples/bpf program xdp_redirect_cpu.  It is hard to trigger,
because the ptr_ring have to be full and cpumap bulk queue max
contains 8 packets, and a remote CPU is racing to empty the ptr_ring
queue.

Fixes: 389ab7f0 ("xdp: introduce xdp_return_frame_rx_napi")
Tested-by: NJean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

ad0ab027

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 112cbae2

由 Linus Torvalds 提交于 8月 09, 2018

Pull crypto fix from Herbert Xu:
 "This fixes a performance regression in arm64 NEON crypto as well as a
  crash in x86 aegis/morus on unsupported CPUs"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/aegis,morus - Fix and simplify CPUID checks
  crypto: arm64 - revert NEON yield for fast AEAD implementations

112cbae2

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6395ad85

由 Linus Torvalds 提交于 8月 09, 2018

Pull networking fixes from David Miller:

 1) The real fix for the ipv6 route metric leak Sabrina was seeing, from
    Cong Wang.

 2) Fix syzbot triggers AF_PACKET v3 ring buffer insufficient room
    conditions, from Willem de Bruijn.

 3) vsock can reinitialize active work struct, fix from Cong Wang.

 4) RXRPC keepalive generator can wedge a cpu, fix from David Howells.

 5) Fix locking in AF_SMC ioctl, from Ursula Braun.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  dsa: slave: eee: Allow ports to use phylink
  net/smc: move sock lock in smc_ioctl()
  net/smc: allow sysctl rmem and wmem defaults for servers
  net/smc: no shutdown in state SMC_LISTEN
  net: aquantia: Fix IFF_ALLMULTI flag functionality
  rxrpc: Fix the keepalive generator [ver #2]
  net/mlx5e: Cleanup of dcbnl related fields
  net/mlx5e: Properly check if hairpin is possible between two functions
  vhost: reset metadata cache when initializing new IOTLB
  llc: use refcount_inc_not_zero() for llc_sap_find()
  dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart()
  tipc: fix an interrupt unsafe locking scenario
  vsock: split dwork to avoid reinitializations
  net: thunderx: check for failed allocation lmac->dmacs
  cxgb4: mk_act_open_req() buggers ->{local, peer}_ip on big-endian hosts
  packet: refine ring v3 block size test to hold one frame
  ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit
  ipv6: fix double refcount of fib6_metrics

6395ad85

09 8月, 2018 18 次提交

i2c: xlp9xx: Fix case where SSIF read transaction completes early · 5eb173f5

由 George Cherian 提交于 8月 08, 2018

During ipmi stress tests we see occasional failure of transactions
at the boot time. This happens in the case of a I2C_M_RECV_LEN
transactions, when the read transfer completes (with the initial
read length of 34) before the driver gets a chance to handle interrupts.

The current driver code expects at least 2 interrupts for I2C_M_RECV_LEN
transactions. The length is updated during the first interrupt, and  the
buffer contents are only copied during subsequent interrupts. In case of
just one interrupt, we will complete the transaction without copying
out the bytes from RX fifo.

Update the code to drain the RX fifo after the length update,
so that the transaction completes correctly in all cases.
Signed-off-by: NGeorge Cherian <george.cherian@cavium.com>
Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org

5eb173f5

dsa: slave: eee: Allow ports to use phylink · 1be52e97

由 Andrew Lunn 提交于 8月 08, 2018

For a port to be able to use EEE, both the MAC and the PHY must
support EEE. A phy can be provided by both a phydev or phylink. Verify
at least one of these exist, not just phydev.

Fixes: aab9c406 ("net: dsa: Plug in PHYLINK support")
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1be52e97

Merge branch 'smc-fixes' · ef91b6f9

由 David S. Miller 提交于 8月 08, 2018

Ursula Braun says:

====================
net/smc: fixes 2018-08-08

here are small fixes for SMC: The first patch makes sure, shutdown code
is not executed for sockets in state SMC_LISTEN. The second patch resets
send and receive buffer values for accepted sockets, since TCP buffer size
optimizations for the internal CLC socket should not be forwarded to the
outer SMC socket. The third patch solves a race between connect and ioctl
reported by syzbot.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef91b6f9

net/smc: move sock lock in smc_ioctl() · 7311d665

由 Ursula Braun 提交于 8月 08, 2018

When an SMC socket is connecting it is decided whether fallback to
TCP is needed. To avoid races between connect and ioctl move the
sock lock before the use_fallback check.

Reported-by: syzbot+5b2cece1a8ecb2ca77d8@syzkaller.appspotmail.com
Reported-by: syzbot+19557374321ca3710990@syzkaller.appspotmail.com
Fixes: 1992d998 ("net/smc: take sock lock in smc_ioctl()")
Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7311d665

net/smc: allow sysctl rmem and wmem defaults for servers · bd58c7e0

由 Ursula Braun 提交于 8月 08, 2018

Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl
defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base
for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size
optimizations for servers should be ignored.
Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd58c7e0

net/smc: no shutdown in state SMC_LISTEN · caa21e19

由 Ursula Braun 提交于 8月 08, 2018

Invoking shutdown for a socket in state SMC_LISTEN does not make
sense. Nevertheless programs like syzbot fuzzing the kernel may
try to do this. For SMC this means a socket refcounting problem.
This patch makes sure a shutdown call for an SMC socket in state
SMC_LISTEN simply returns with -ENOTCONN.
Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

caa21e19

net: aquantia: Fix IFF_ALLMULTI flag functionality · 11ba961c

由 Dmitry Bogdanov 提交于 8月 08, 2018

It was noticed that NIC always pass all multicast traffic to the host
regardless of IFF_ALLMULTI flag on the interface.
The rule in MC Filter Table in NIC, that is configured to accept any
multicast packets, is turning on if IFF_MULTICAST flag is set on the
interface. It leads to passing all multicast traffic to the host.
This fix changes the condition to turn on that rule by checking
IFF_ALLMULTI flag as it should.

Fixes: b21f502f ("net:ethernet:aquantia: Fix for multicast filter handling.")
Signed-off-by: NDmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

11ba961c

rxrpc: Fix the keepalive generator [ver ] · 330bdcfa

由 David Howells 提交于 8月 08, 2018

AF_RXRPC has a keepalive message generator that generates a message for a
peer ~20s after the last transmission to that peer to keep firewall ports
open.  The implementation is incorrect in the following ways:

 (1) It mixes up ktime_t and time64_t types.

 (2) It uses ktime_get_real(), the output of which may jump forward or
     backward due to adjustments to the time of day.

 (3) If the current time jumps forward too much or jumps backwards, the
     generator function will crank the base of the time ring round one slot
     at a time (ie. a 1s period) until it catches up, spewing out VERSION
     packets as it goes.

Fix the problem by:

 (1) Only using time64_t.  There's no need for sub-second resolution.

 (2) Use ktime_get_seconds() rather than ktime_get_real() so that time
     isn't perceived to go backwards.

 (3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two
     parts:

     (a) The "worker" function that manages the buckets and the timer.

     (b) The "dispatch" function that takes the pending peers and
     	 potentially transmits a keepalive packet before putting them back
     	 in the ring into the slot appropriate to the revised last-Tx time.

 (4) Taking everything that's pending out of the ring and splicing it into
     a temporary collector list for processing.

     In the case that there's been a significant jump forward, the ring
     gets entirely emptied and then the time base can be warped forward
     before the peers are processed.

     The warping can't happen if the ring isn't empty because the slot a
     peer is in is keepalive-time dependent, relative to the base time.

 (5) Limit the number of iterations of the bucket array when scanning it.

 (6) Set the timer to skip any empty slots as there's no point waking up if
     there's nothing to do yet.

This can be triggered by an incoming call from a server after a reboot with
AF_RXRPC and AFS built into the kernel causing a peer record to be set up
before userspace is started.  The system clock is then adjusted by
userspace, thereby potentially causing the keepalive generator to have a
meltdown - which leads to a message like:

	watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23]
	...
	Workqueue: krxrpcd rxrpc_peer_keepalive_worker
	EIP: lock_acquire+0x69/0x80
	...
	Call Trace:
	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
	 ? _raw_spin_lock_bh+0x29/0x60
	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
	 ? __lock_acquire+0x3d3/0x870
	 ? process_one_work+0x110/0x340
	 ? process_one_work+0x166/0x340
	 ? process_one_work+0x110/0x340
	 ? worker_thread+0x39/0x3c0
	 ? kthread+0xdb/0x110
	 ? cancel_delayed_work+0x90/0x90
	 ? kthread_stop+0x70/0x70
	 ? ret_from_fork+0x19/0x24

Fixes: ace45bec ("rxrpc: Fix firewall route keepalive")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

330bdcfa

Merge branch 'mlx5-fixes' · f39cc1c7

由 David S. Miller 提交于 8月 08, 2018

Saeed Mahameed says:

====================
Mellanox, mlx5e fixes 2018-08-07

I know it is late into 4.18 release, and this is why I am submitting
only two mlx5e ethernet fixes.

The first one from Or, is needed for -stable and it fixes hairpin
for "same device" check.

The second fix is a non risk fix from Huy which cleans up and improves
error return value reporting for dcbnl_ieee_setapp.

For -stable v4.16
- net/mlx5e: Properly check if hairpin is possible between two functions
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f39cc1c7

net/mlx5e: Cleanup of dcbnl related fields · f280c6a1

由 Huy Nguyen 提交于 8月 08, 2018

Remove unused netdev_registered_init/remove in en.h
Return ENOSUPPORT if the check MLX5_DSCP_SUPPORTED fails.
Remove extra white space

Fixes: 2a5e7a13 ("net/mlx5e: Add dcbnl dscp to priority support")
Signed-off-by: NHuy Nguyen <huyn@mellanox.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f280c6a1

net/mlx5e: Properly check if hairpin is possible between two functions · 816f6706

由 Or Gerlitz 提交于 8月 08, 2018

The current check relies on function BDF addresses and can get
us wrong e.g when two VFs are assigned into a VM and the PCI
v-address is set by the hypervisor.

Fixes: 5c65c564 ('net/mlx5e: Support offloading TC NIC hairpin flows')
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Reported-by: NAlaa Hleihel <alaa@mellanox.com>
Tested-by: NAlaa Hleihel <alaa@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

816f6706

parisc: Define mb() and add memory barriers to assembler unlock sequences · fedb8da9

由 John David Anglin 提交于 8月 05, 2018

For years I thought all parisc machines executed loads and stores in
order. However, Jeff Law recently indicated on gcc-patches that this is
not correct. There are various degrees of out-of-order execution all the
way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
series has full out-of-order execution for both integer operations, and
loads and stores.

This is described in the following article:
http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml

For this reason, we need to define mb() and to insert a memory barrier
before the store unlocking spinlocks. This ensures that all memory
accesses are complete prior to unlocking. The ldcw instruction performs
the same function on entry.
Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: NHelge Deller <deller@gmx.de>

fedb8da9

parisc: Enable CONFIG_MLONGCALLS by default · 66509a27

由 Helge Deller 提交于 7月 28, 2018

Enable the -mlong-calls compiler option by default, because otherwise in most
cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
relocations. This fixes building the 64-bit defconfig.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: NHelge Deller <deller@gmx.de>

66509a27

Merge branch 'sockmap-fixes' · bf9bae0e

由 Alexei Starovoitov 提交于 8月 08, 2018

Daniel Borkmann says:

====================
Two sockmap fixes in bpf_tcp_sendmsg(), and one fix for the
sockmap kernel selftest. Thanks!
====================
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

bf9bae0e

bpf, sockmap: fix cork timeout for select due to epipe · 3c6ed988

由 Daniel Borkmann 提交于 8月 08, 2018

I ran into the same issue as a009f1f3 ("selftests/bpf:
test_sockmap, timing improvements") where I had a broken
pipe error on the socket due to remote end timing out on
select and then shutting down it's sockets while the other
side was still sending. We may need to do a bigger rework
in general on the test_sockmap.c, but for now increase it
to a more suitable timeout.

Fixes: a18fda1a ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

3c6ed988

bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path · 7c81c717

由 Daniel Borkmann 提交于 8月 08, 2018

In bpf_tcp_sendmsg() the sk_alloc_sg() may fail. In the case of
ENOMEM, it may also mean that we've partially filled the scatterlist
entries with pages. Later jumping to sk_stream_wait_memory()
we could further fail with an error for several reasons, however
we miss to call free_start_sg() if the local sk_msg_buff was used.

Fixes: 4f738adb ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

7c81c717

bpf, sockmap: fix bpf_tcp_sendmsg sock error handling · 5121700b

由 Daniel Borkmann 提交于 8月 08, 2018

While working on bpf_tcp_sendmsg() code, I noticed that when a
sk->sk_err is set we error out with err = sk->sk_err. However
this is problematic since sk->sk_err is a positive error value
and therefore we will neither go into sk_stream_error() nor will
we report an error back to user space. I had this case with EPIPE
and user space was thinking sendmsg() succeeded since EPIPE is
a positive value, thinking we submitted 32 bytes. Fix it by
negating the sk->sk_err value.

Fixes: 4f738adb ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

5121700b

vhost: reset metadata cache when initializing new IOTLB · b13f9c63

由 Jason Wang 提交于 8月 08, 2018

We need to reset metadata cache during new IOTLB initialization,
otherwise the stale pointers to previous IOTLB may be still accessed
which will lead a use after free.

Reported-by: syzbot+c51e6736a1bf614b3272@syzkaller.appspotmail.com
Fixes: f8894913 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b13f9c63

08 8月, 2018 1 次提交

llc: use refcount_inc_not_zero() for llc_sap_find() · 0dcb8225

由 Cong Wang 提交于 8月 07, 2018

llc_sap_put() decreases the refcnt before deleting sap
from the global list. Therefore, there is a chance
llc_sap_find() could find a sap with zero refcnt
in this global list.

Close this race condition by checking if refcnt is zero
or not in llc_sap_find(), if it is zero then it is being
removed so we can just treat it as gone.

Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0dcb8225

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功