提交 · f7225172f25aaf0dfd9ad65f05be8da5d6108b12 · openeuler / Kernel

05 6月, 2018 40 次提交

net/ipv6: prevent use after free in ip6_route_mpath_notify · f7225172

由 David Ahern 提交于 6月 04, 2018

syzbot reported a use-after-free:

BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555

CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
 ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
 ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 ...

Allocated by task 4555:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
 dst_alloc+0xbb/0x1d0 net/core/dst.c:104
 __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
 ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
 ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
 ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 ...

Freed by task 4555:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
 dst_destroy+0x267/0x3c0 net/core/dst.c:140
 dst_release_immediate+0x71/0x9e net/core/dst.c:205
 fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
 __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
 ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 ...

The problem is that rt_last can point to a deleted route if the insert
fails.

One reproducer is to insert a route and then add a multipath route that
has a duplicate nexthop.e.g,:
    $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
    $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2

Fix by not setting rt_last until the it is verified the insert succeeded.

Fixes: 3b1137fe ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7225172

net: phy: broadcom: Enable 125 MHz clock on LED4 pin for BCM54612E by default. · 69e2eccc

由 Kun Yi 提交于 6月 04, 2018

BCM54612E have 4 multi-functional LED pins that can be configured
through register setting; the LED4 pin can be configured to a 125MHz
reference clock output by setting the spare register. Since the dedicated
CLK125 reference clock pin is not brought out on the 48-Pin MLP, the LED4
pin is the only pin to provide such function in this package, and therefore
it is beneficial to just enable the reference clock by default.
Signed-off-by: NKun Yi <kunyi@google.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

69e2eccc

l2tp: fix refcount leakage on PPPoL2TP sockets · 3d609342

由 Guillaume Nault 提交于 6月 04, 2018

Commit d02ba2a6 ("l2tp: fix race in pppol2tp_release with session
object destroy") tried to fix a race condition where a PPPoL2TP socket
would disappear while the L2TP session was still using it. However, it
missed the root issue which is that an L2TP session may accept to be
reconnected if its associated socket has entered the release process.

The tentative fix makes the session hold the socket it is connected to.
That saves the kernel from crashing, but introduces refcount leakage,
preventing the socket from completing the release process. Once stalled,
everything the socket depends on can't be released anymore, including
the L2TP session and the l2tp_ppp module.

The root issue is that, when releasing a connected PPPoL2TP socket, the
session's ->sk pointer (RCU-protected) is reset to NULL and we have to
wait for a grace period before destroying the socket. The socket drops
the session in its ->sk_destruct callback function, so the session
will exist until the last reference on the socket is dropped.
Therefore, there is a time frame where pppol2tp_connect() may accept
reconnecting a session, as it only checks ->sk to figure out if the
session is connected. This time frame is shortened by the fact that
pppol2tp_release() calls l2tp_session_delete(), making the session
unreachable before resetting ->sk. However, pppol2tp_connect() may
grab the session before it gets unhashed by l2tp_session_delete(), but
it may test ->sk after the later got reset. The race is not so hard to
trigger and syzbot found a pretty reliable reproducer:
https://syzkaller.appspot.com/bug?id=418578d2a4389074524e04d641eacb091961b2cf

Before d02ba2a6, another race could let pppol2tp_release()
overwrite the ->__sk pointer of an L2TP session, thus tricking
pppol2tp_put_sk() into calling sock_put() on a socket that is different
than the one for which pppol2tp_release() was originally called. To get
there, we had to trigger the race described above, therefore having one
PPPoL2TP socket being released, while the session it is connected to is
reconnecting to a different PPPoL2TP socket. When releasing this new
socket fast enough, pppol2tp_release() overwrites the session's
->__sk pointer with the address of the new socket, before the first
pppol2tp_put_sk() call gets scheduled. Then the pppol2tp_put_sk() call
invoked by the original socket will sock_put() the new socket,
potentially dropping its last reference. When the second
pppol2tp_put_sk() finally runs, its socket has already been freed.

With d02ba2a6, the session takes a reference on both sockets.
Furthermore, the session's ->sk pointer is reset in the
pppol2tp_session_close() callback function rather than in
pppol2tp_release(). Therefore, ->__sk can't be overwritten and
pppol2tp_put_sk() is called only once (l2tp_session_delete() will only
run pppol2tp_session_close() once, to protect the session against
concurrent deletion requests). Now pppol2tp_put_sk() will properly
sock_put() the original socket, but the new socket will remain, as
l2tp_session_delete() prevented the release process from completing.
Here, we don't depend on the ->__sk race to trigger the bug. Getting
into the pppol2tp_connect() race is enough to leak the reference, no
matter when new socket is released.

So it all boils down to pppol2tp_connect() failing to realise that the
session has already been connected. This patch drops the unneeded extra
reference counting (mostly reverting d02ba2a6) and checks that
neither ->sk nor ->__sk is set before allowing a session to be
connected.

Fixes: d02ba2a6 ("l2tp: fix race in pppol2tp_release with session object destroy")
Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d609342

Merge branch 'net-phy-improve-PM-handling-of-PHY-MDIO' · 7a723099

由 David S. Miller 提交于 6月 05, 2018

Heiner Kallweit says:

====================
net: phy: improve PM handling of PHY/MDIO

Current implementation of MDIO bus PM ops doesn't actually implement
bus-specific PM ops but just calls PM ops defined on a device level
what doesn't seem to be fully in line with the core PM model.

When looking e.g. at __device_suspend() the PM core looks for PM ops
of a device in a specific order:
1. device PM domain
2. device type
3. device class
4. device bus

I think it has good reason that there's no PM ops on device level.
The situation can be improved by modeling PHY's as device type of
a MDIO device. If for some other type of MDIO device PM ops are
needed, it could be modeled as struct device_type as well.
====================
Tested-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a723099

net: phy: remove PM ops from MDIO bus · 9107c05e

由 Heiner Kallweit 提交于 6月 02, 2018

Current implementation of MDIO bus PM ops doesn't actually implement
bus-specific PM ops but just calls PM ops defined on a device level
what doesn't seem to be fully in line with the core PM model.

When looking e.g. at __device_suspend() the PM core looks for PM ops
of a device in a specific order:
1. device PM domain
2. device type
3. device class
4. device bus

I think it has good reason that there's no PM ops on device level.

Now that a device type representation of PHY's as special type of MDIO
devices was added (only user of MDIO bus PM ops), the MDIO bus
PM ops can be removed including member pm of struct mdio_device.

If for some other type of MDIO device PM ops are needed, it should be
modeled as struct device_type as well.
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9107c05e

net: phy: add struct device_type representation of a PHY · 7f4828ff

由 Heiner Kallweit 提交于 6月 02, 2018

A PHY is a type of MDIO device, so let's model it as struct device_type
and place PM ops, attribute groups and release callback on device type
level. For this the attribute definitions have to be moved.
This change allows us to get rid of the PM ops on a bus level in a second
step.
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f4828ff

D

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7d840a60
由 David S. Miller 提交于 6月 04, 2018

7d840a60

Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · d67b66b4

由 David S. Miller 提交于 6月 04, 2018

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-06-04

This series contains a smorgasbord of updates to documentation, e1000e,
igb, ixgbe, ixgbevf and i40e.

Benjamin Poirier fixes a potential kernel crash due to NULL pointer
dereference in e1000e.

Jeff updates the kernel documentation for e100 and e1000 to correct
default values and URLs which were incorrect in the documentation.  Also
took the time to update these to the new reStructured text format for
kernel documentation.

Joanna Yurdal fixes a missing PTP transmit timestamp by ensuring that
TSICR gets cleared when ICR is cleared.

Sergey updates igb to reset all the transmit queues at one time so that
we only have to wait once for all the queues to be reset.

Alex fixes ixgbevf so that malicious driver detection (MDD) can co-exist
with XDP.

Emil and Tony extend the RTNL lock to ensure we get the most up-to-date
values for the bits and avoid a possible race condition when going down.

YueHaibing from Huawei introduces a helper function in ixgbe for
operation reads to simplify the code a bit more.

Daniel Borkmann adds support for XDP meta data when using build SKB
for i40e.

Shannon Nelson provides twp fixes for the IPSec code in ixgbe, first is
to make sure we do not try to offload the decryption of any incoming
packet that is destined for the management engine.  The other fix is to
resolve a cast problem introduced by a sparse cleanup patch.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d67b66b4

net: hns: Fix the process of adding broadcast addresses to tcam · f0b964e5

由 Xi Wang 提交于 6月 04, 2018

If the multicast mask value in device tree is configured not all
0xff, the broadcast mac will be lost from tcam table after the
execution of command 'ifconfig up'. The address is appended by
hns_ae_start, but will be clear later by hns_nic_set_rx_mode
called in dev_open process.

This patch fixed it by not use the multicast mask when add a
broadcast address.

Fixes: b5996f11 ("net: add Hisilicon Network Subsystem basic ethernet support")
Signed-off-by: NXi Wang <wangxi11@huawei.com>
Signed-off-by: NPeng Li <lipeng321@huawei.com>
Signed-off-by: NSalil Mehta <salil.mehta@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f0b964e5

net: sched: return error code when tcf proto is not found · 0e399035

由 Vlad Buslov 提交于 6月 04, 2018

If requested tcf proto is not found, get and del filter netlink protocol
handlers output error message to extack, but do not return actual error
code. Add check to return ENOENT when result of tp find function is NULL
pointer.

Fixes: c431f89b ("net: sched: split tc_ctl_tfilter into three handlers")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e399035

team: use netdev_features_t instead of u32 · 25ea6654

由 Dan Carpenter 提交于 6月 04, 2018

This code was introduced in 2011 around the same time that we made
netdev_features_t a u64 type.  These days a u32 is not big enough to
hold all the potential features.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25ea6654

net_failover: Use netdev_features_t instead of u32 · a746407a

由 Dan Carpenter 提交于 6月 04, 2018

The features mask needs to be a netdev_features_t (u64) because a u32
is not big enough.

Fixes: cfc80d9a ("net: Introduce net_failover driver")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a746407a

qed: use dma_zalloc_coherent instead of allocator/memset · ff2e351e

由 YueHaibing 提交于 6月 04, 2018

Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Acked-by: NTomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff2e351e

wan/fsl_ucc_hdlc: use dma_zalloc_coherent instead of allocator/memset · 1f55c286

由 YueHaibing 提交于 6月 04, 2018

Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f55c286

Merge branch 'for-upstream' of... · 828da432

由 David S. Miller 提交于 6月 04, 2018

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2018-06-04

Here's one last bluetooth-next pull request for the 4.18 kernel:

 - New USB device IDs for Realtek 8822BE and 8723DE
 - reset/resume fix for Dell Inspiron 5565
 - Fix HCI_UART_INIT_PENDING flag behavior
 - Fix patching behavior for some ATH3012 models
 - A few other minor cleanups & fixes

Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

828da432

docs: networking: fix minor typos in various documentation files · bb38ccce

由 Olivier Gayot 提交于 6月 04, 2018

This patch fixes some typos/misspelling errors in the
Documentation/networking files.
Signed-off-by: NOlivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb38ccce

net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets · f396922d

由 Maciej Żenczykowski 提交于 6月 03, 2018

It is not safe to do so because such sockets are already in the
hash tables and changing these options can result in invalidating
the tb->fastreuse(port) caching.

This can have later far reaching consequences wrt. bind conflict checks
which rely on these caches (for optimization purposes).

Not to mention that you can currently end up with two identical
non-reuseport listening sockets bound to the same local ip:port
by clearing reuseport on them after they've already both been bound.

There is unfortunately no EISBOUND error or anything similar,
and EISCONN seems to be misleading for a bound-but-not-connected
socket, so use EUCLEAN 'Structure needs cleaning' which AFAICT
is the closest you can get to meaning 'socket in bad state'.
(although perhaps EINVAL wouldn't be a bad choice either?)

This does unfortunately run the risk of breaking buggy
userspace programs...
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Change-Id: I77c2b3429b2fdf42671eee0fa7a8ba721c94963b
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f396922d

net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization · 79e9fed4

由 Maciej Żenczykowski 提交于 6月 03, 2018

This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
to an integer.

It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
while 2 enables timewait socket reuse only for sockets that we can
prove are loopback connections:
  ie. bound to 'lo' interface or where one of source or destination
  IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.

This enables quicker reuse of ephemeral ports for loopback connections
- where tcp_tw_reuse is 100% safe from a protocol perspective
(this assumes no artificially induced packet loss on 'lo').

This also makes estblishing many loopback connections *much* faster
(allocating ports out of the first half of the ephemeral port range
is significantly faster, then allocating from the second half)

Without this change in a 32K ephemeral port space my sample program
(it just establishes and closes [::1]:ephemeral -> [::1]:server_port
connections in a tight loop) fails after 32765 connections in 24 seconds.
With it enabled 50000 connections only take 4.7 seconds.

This is particularly problematic for IPv6 where we only have one local
address and cannot play tricks with varying source IP from 127.0.0.0/8
pool.
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Wei Wang <weiwan@google.com>
Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79e9fed4

qed: Add srq core support for RoCE and iWARP · 39dbc646

由 Yuval Bason 提交于 6月 03, 2018

This patch adds support for configuring SRQ and provides the necessary
APIs for rdma upper layer driver (qedr) to enable the SRQ feature.
Signed-off-by: NMichal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: NAriel Elior <ariel.elior@cavium.com>
Signed-off-by: NYuval Bason <yuval.bason@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

39dbc646

Merge branch 'bnx2-warnings' · 7a9ee41b

由 David S. Miller 提交于 6月 04, 2018

Varsha Rao says:

====================
net: bnx2: Fix checkpatch and clang warnings

This patchset fixes NULL comparison and extra parentheses, checkpatch
and clang warnings.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a9ee41b

net: ethernet: bnx2: Replace NULL comparison · b8aac410

由 Varsha Rao 提交于 6月 03, 2018

This patch fixes the checkpatch issue of NULL comparison. Replace x == NULL
with !x, by using the following coccinelle script:

@disable is_null@
expression e;
@@
-e==NULL
+!e
Signed-off-by: NVarsha Rao <rvarsha016@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8aac410

net: ethernet: bnx2: Remove extra parentheses · 6dc5aa21

由 Varsha Rao 提交于 6月 03, 2018

The following coccinelle script removes extra parentheses to fix the
clang warning of extraneous parentheses.

@disable paren@
identifier i;
expression e;
statement s;
@@
if (
-(i == e)
+i == e
 )
s
Suggested-by: NLukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: NVarsha Rao <rvarsha016@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6dc5aa21

net: gemini: fix spelling mistake: "it" -> "is" · 13ce3bc9

由 YueHaibing 提交于 6月 03, 2018

Trivial fix to spelling mistake in gemini dev_warn message
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13ce3bc9

cls_flower: Fix comparing of old filter mask with new filter · f6521c58

由 Paul Blakey 提交于 6月 03, 2018

We incorrectly compare the mask and the result is that we can't modify
an already existing rule.

Fix that by comparing correctly.

Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
Reported-by: NVlad Buslov <vladbu@mellanox.com>
Reviewed-by: NRoi Dayan <roid@mellanox.com>
Reviewed-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NPaul Blakey <paulb@mellanox.com>
Reviewed-by: NSimon Horman <simon.horman@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6521c58

cls_flower: Fix missing free of rhashtable · de9dc650

由 Paul Blakey 提交于 6月 03, 2018

When destroying the instance, destroy the head rhashtable.

Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
Reported-by: NVlad Buslov <vladbu@mellanox.com>
Reviewed-by: NRoi Dayan <roid@mellanox.com>
Reviewed-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NPaul Blakey <paulb@mellanox.com>
Reviewed-by: NSimon Horman <simon.horman@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de9dc650

net: skbuff.h: drop unneeded <linux/slab.h> · 1f4c7413

由 Randy Dunlap 提交于 6月 02, 2018

<linux/skbuff.h> does not use nor need <linux/slab.h>, so drop this
header file from skbuff.h.

<linux/skbuff.h> is currently #included in around 1200 C source and
header files, making it the 31st most-used header file.

Build tested [allmodconfig] on 20 arch-es.
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f4c7413

net: chelsio: Use zeroing memory allocator instead of allocator/memset · 40434a67

由 YueHaibing 提交于 6月 03, 2018

Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

40434a67

rxrpc: Fix handling of call quietly cancelled out on server · 1a025028

由 David Howells 提交于 6月 03, 2018

Sometimes an in-progress call will stop responding on the fileserver when
the fileserver quietly cancels the call with an internally marked abort
(RX_CALL_DEAD), without sending an ABORT to the client.

This causes the client's call to eventually expire from lack of incoming
packets directed its way, which currently leads to it being cancelled
locally with ETIME.  Note that it's not currently clear as to why this
happens as it's really hard to reproduce.

The rotation policy implement by kAFS, however, doesn't differentiate
between ETIME meaning we didn't get any response from the server and ETIME
meaning the call got cancelled mid-flow.  The latter leads to an oops when
fetching data as the rotation partially resets the afs_read descriptor,
which can result in a cleared page pointer being dereferenced because that
page has already been filled.

Handle this by the following means:

 (1) Set a flag on a call when we receive a packet for it.

 (2) Store the highest packet serial number so far received for a call
     (bearing in mind this may wrap).

 (3) If, when the "not received anything recently" timeout expires on a
     call, we've received at least one packet for a call and the connection
     as a whole has received packets more recently than that call, then
     cancel the call locally with ECONNRESET rather than ETIME.

     This indicates that the call was definitely in progress on the server.

 (4) In kAFS, if the rotation algorithm sees ECONNRESET rather than ETIME,
     don't try the next server, but rather abort the call.

     This avoids the oops as we don't try to reuse the afs_read struct.
     Rather, as-yet ungotten pages will be reread at a later data.

Also:

 (5) Add an rxrpc tracepoint to log detection of the call being reset.

Without this, I occasionally see an oops like the following:

    general protection fault: 0000 [#1] SMP PTI
    ...
    RIP: 0010:_copy_to_iter+0x204/0x310
    RSP: 0018:ffff8800cae0f828 EFLAGS: 00010206
    RAX: 0000000000000560 RBX: 0000000000000560 RCX: 0000000000000560
    RDX: ffff8800cae0f968 RSI: ffff8800d58b3312 RDI: 0005080000000000
    RBP: ffff8800cae0f968 R08: 0000000000000560 R09: ffff8800ca00f400
    R10: ffff8800c36f28d4 R11: 00000000000008c4 R12: ffff8800cae0f958
    R13: 0000000000000560 R14: ffff8800d58b3312 R15: 0000000000000560
    FS:  00007fdaef108080(0000) GS:ffff8800ca680000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fb28a8fa000 CR3: 00000000d2a76002 CR4: 00000000001606e0
    Call Trace:
     skb_copy_datagram_iter+0x14e/0x289
     rxrpc_recvmsg_data.isra.0+0x6f3/0xf68
     ? trace_buffer_unlock_commit_regs+0x4f/0x89
     rxrpc_kernel_recv_data+0x149/0x421
     afs_extract_data+0x1e0/0x798
     ? afs_wait_for_call_to_complete+0xc9/0x52e
     afs_deliver_fs_fetch_data+0x33a/0x5ab
     afs_deliver_to_call+0x1ee/0x5e0
     ? afs_wait_for_call_to_complete+0xc9/0x52e
     afs_wait_for_call_to_complete+0x12b/0x52e
     ? wake_up_q+0x54/0x54
     afs_make_call+0x287/0x462
     ? afs_fs_fetch_data+0x3e6/0x3ed
     ? rcu_read_lock_sched_held+0x5d/0x63
     afs_fs_fetch_data+0x3e6/0x3ed
     afs_fetch_data+0xbb/0x14a
     afs_readpages+0x317/0x40d
     __do_page_cache_readahead+0x203/0x2ba
     ? ondemand_readahead+0x3a7/0x3c1
     ondemand_readahead+0x3a7/0x3c1
     generic_file_buffered_read+0x18b/0x62f
     __vfs_read+0xdb/0xfe
     vfs_read+0xb2/0x137
     ksys_read+0x50/0x8c
     do_syscall_64+0x7d/0x1a0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

Note the weird value in RDI which is a result of trying to kmap() a NULL
page pointer.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a025028

Allow ethtool to change tun link settings · 4e24f2dd

由 Chas Williams 提交于 6月 02, 2018

Let user space set whatever it would like to advertise for the
tun interface.  Preserve the existing defaults.
Signed-off-by: NChas Williams <3chas3@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e24f2dd

Merge branch 'sh_eth-fix-and-clean-up-sh_eth_soft_swap' · 4cd328f8

由 David S. Miller 提交于 6月 04, 2018

Sergei Shtylyov says:

====================
sh_eth: fix & clean up sh_eth_soft_swap()

Here's a set of 3 patches against DaveM's 'net-next.git' repo. First one fixes an
old buffer endiannes issue (luckily, the ARM SoCs are smart enough to not actually
care) plus couple clean ups around sh_eth_soft_swap()...
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cd328f8

sh_eth: use DIV_ROUND_UP() in sh_eth_soft_swap() · 1100149a

由 Sergei Shtylyov 提交于 6月 02, 2018

When initializing 'maxp' in sh_eth_soft_swap(), the buffer length needs
to be rounded up -- that's just asking for DIV_ROUND_UP()!
Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: NGeert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1100149a

sh_eth: uninline sh_eth_soft_swap() · bb2fa4e8

由 Sergei Shtylyov 提交于 6月 02, 2018

sh_eth_tsu_soft_swap() is called twice by the driver, remove *inline* and
move  that function  from the header to the driver itself to let gcc decide
whether to expand it inline or not...
Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: NGeert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb2fa4e8

sh_eth: make sh_eth_soft_swap() work on ARM · 232b6743

由 Sergei Shtylyov 提交于 6月 02, 2018

Browsing thru the driver disassembly, I noticed that ARM gcc generated
no code whatsoever for sh_eth_soft_swap() while building a little-endian
kernel -- apparently __LITTLE_ENDIAN__ was not being #define'd, however
it got implicitly #define'd when building with the SH gcc (I could only
find the explicit #define __LITTLE_ENDIAN that was #include'd when building
a little-endian kernel). Luckily, the Ether controller only doing big-
endian DMA is encountered on the early SH771x SoCs only and all ARM SoCs
implement EDMR.DE and thus set 'sh_eth_cpu_data::hw_swap'. But anyway, we
need to fix the #ifdef inside sh_eth_soft_swap() to something that would
work on all architectures...
Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: NGeert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

232b6743

ixgbe: fix broken ipsec Rx with proper cast on spi · 9a75fa5c

由 Shannon Nelson 提交于 5月 31, 2018

Fix up a cast problem introduced by a sparse cleanup patch.  This fixes
a problem where the encrypted packets were not recognized on Rx and
subsequently dropped.

Fixes: 9cfbfa70 ("ixgbe: cleanup sparse warnings")
Signed-off-by: NShannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

9a75fa5c

ixgbe: check ipsec ip addr against mgmt filters · 2a8a1552

由 Shannon Nelson 提交于 5月 30, 2018

Make sure we don't try to offload the decryption of an incoming
packet that should get delivered to the management engine.  This
is a corner case that will likely be very seldom seen, but could
really confuse someone if they were to hit it.
Suggested-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: NShannon Nelson <shannon.nelson@oracle.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

2a8a1552

Merge branch 'mlxsw-Fixes-in-offloading-of-mirror-to-gretap' · 20677108

由 David S. Miller 提交于 6月 04, 2018

Ido Schimmel says:

====================
mlxsw: Fixes in offloading of mirror-to-gretap

Petr says:

These two patches fix issues in offloading of mirror-to-gretap when
bridge is present in the underlay.

In patch #1, reconsideration of SPAN configuration is not done right at
the point that SWITCHDEV_OBJ_ID_PORT_VLAN deletion notification is
distributed, but is postponed, because the notifications are actually
distributed before the relevant change is implemented in the bridge.

In patch #2, a problem in configuring VLAN tagging in situations when a
VLAN device is on top of an 802.1Q bridge whose egress port is marked as
"egress untagged". In that case, mlxsw would neglect to suppress the
tagging implicitly assumed after the VLAN device was seen.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

20677108

mlxsw: spectrum_span: Suppress VLAN on BRIDGE_VLAN_INFO_UNTAGGED · 1fc68bb7

由 Petr Machata 提交于 6月 02, 2018

When offloading mirroring to gretap or ip6gretap netdevices, an 802.1q
bridge is one of the soft devices permissible in the underlay when
resolving the packet path. After the packet path is resolved to a
particular bridge egress device, flags on packet VLAN determine whether
the egressed packet should be tagged.

The current logic however only ever sets the VLAN tag, never suppresses
it. Thus if there's a VLAN netdevice above the bridge that determines
the packet VLAN, that VLAN is never unset, and mirroring is configured
with VLAN tagging.

Fix by setting the packet VLAN on both branches: set to zero (for unset)
when BRIDGE_VLAN_INFO_UNTAGGED, copy the resolved VLAN (e.g. from bridge
PVID) otherwise.

Fixes: 946a11e7 ("mlxsw: spectrum_span: Allow bridge for gretap mirror")
Signed-off-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1fc68bb7

mlxsw: spectrum_switchdev: Postpone respin on object deletion · f07ff014

由 Petr Machata 提交于 6月 02, 2018

VLAN deletion notifications are emitted before the relevant change is
projected to bridge configuration. Thus, like with VLAN addition,
schedule SPAN respin for later.

Fixes: c520bc69 ("mlxsw: Respin SPAN on switchdev events")
Signed-off-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f07ff014

ixgbe: fix possible race in reset subtask · 88adce4e

由 Tony Nguyen 提交于 5月 30, 2018

Similar to ixgbevf, the same possibility for race exists. Extend the RTNL
lock in ixgbe_reset_subtask() to protect the state bits; this is to make
sure that we get the most up-to-date values for the bits and avoid a
possible race when going down.
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

88adce4e

bpf, i40e: add meta data support · cc5b114d

由 Daniel Borkmann 提交于 5月 28, 2018

Add support for XDP meta data when using build skb variant of
the i40e driver. Implementation is analogous to the existing
ixgbe and ixgbevf support for meta data from 366a88fe ("bpf,
ixgbe: add meta data support") and be833332 ("ixgbevf: Add
support for meta data"). With the build skb variant we get
192 bytes of extra headroom which can be used for encaps or
meta data.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Tested-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

cc5b114d

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功