提交 · da96786e26c3ae47316db2b92046b11268c4379c · openanolis / cloud-kernel

04 11月, 2016 5 次提交

net: tcp: check skb is non-NULL for exact match on lookups · da96786e

由 David Ahern 提交于 11月 02, 2016

Andrey reported the following error report while running the syzkaller
fuzzer:

general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 648 Comm: syz-executor Not tainted 4.9.0-rc3+ #333
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800398c4480 task.stack: ffff88003b468000
RIP: 0010:[<ffffffff83091106>]  [<     inline     >]
inet_exact_dif_match include/net/tcp.h:808
RIP: 0010:[<ffffffff83091106>]  [<ffffffff83091106>]
__inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219
RSP: 0018:ffff88003b46f270  EFLAGS: 00010202
RAX: 0000000000000004 RBX: 0000000000004242 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffc90000e3c000 RDI: 0000000000000054
RBP: ffff88003b46f2d8 R08: 0000000000004000 R09: ffffffff830910e7
R10: 0000000000000000 R11: 000000000000000a R12: ffffffff867fa0c0
R13: 0000000000004242 R14: 0000000000000003 R15: dffffc0000000000
FS:  00007fb135881700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020cc3000 CR3: 000000006d56a000 CR4: 00000000000006f0
Stack:
 0000000000000000 000000000601a8c0 0000000000000000 ffffffff00004242
 424200003b9083c2 ffff88003def4041 ffffffff84e7e040 0000000000000246
 ffff88003a0911c0 0000000000000000 ffff88003a091298 ffff88003b9083ae
Call Trace:
 [<ffffffff831100f4>] tcp_v4_send_reset+0x584/0x1700 net/ipv4/tcp_ipv4.c:643
 [<ffffffff83115b1b>] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1718
 [<ffffffff83069d22>] ip_local_deliver_finish+0x332/0xad0
net/ipv4/ip_input.c:216
...

MD5 has a code path that calls __inet_lookup_listener with a null skb,
so inet{6}_exact_dif_match needs to check skb against null before pulling
the flag.

Fixes: a04a480d ("net: Require exact match for TCP socket lookups if
       dif is l3mdev")
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Tested-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

da96786e

tcp: fix potential memory corruption · ac9e70b1

由 Eric Dumazet 提交于 11月 02, 2016

Imagine initial value of max_skb_frags is 17, and last
skb in write queue has 15 frags.

Then max_skb_frags is lowered to 14 or smaller value.

tcp_sendmsg() will then be allowed to add additional page frags
and eventually go past MAX_SKB_FRAGS, overflowing struct
skb_shared_info.

Fixes: 5f74f82e ("net:Add sysctl_max_skb_frags")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Cc: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac9e70b1

qede: Correctly map aggregation replacement pages · 9512925a

由 Mintz, Yuval 提交于 11月 02, 2016

Driver allocates replacement buffers before-hand to make
sure whenever an aggregation begins there would be a replacement
for the Rx buffers, as we can't release the buffer until
aggregation is terminated and driver logic assumes the Rx rings
are always full.

For every other Rx page that's being allocated [I.e., regular]
the page is being completely mapped while for the replacement
buffers only the first portion of the page is being mapped.
This means that:
  a. Once replacement buffer replenishes the regular Rx ring,
assuming there's more than a single packet on page we'd post unmapped
memory toward HW [assuming mapping is actually done in granularity
smaller than page].
  b. Unmaps are being done for the entire page, which is incorrect.

Fixes: 55482edc ("qede: Add slowpath/fastpath support and enable hardware GRO")
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9512925a

cxgb4: correct device ID of T6 adapter · 0b53df1e

由 Hariprasad Shenai 提交于 11月 02, 2016

Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b53df1e

inet: fix sleeping inside inet_wait_for_connect() · 14135f30

由 WANG Cong 提交于 11月 01, 2016

Andrey reported this kernel warning:

  WARNING: CPU: 0 PID: 4608 at kernel/sched/core.c:7724
  __might_sleep+0x14c/0x1a0 kernel/sched/core.c:7719
  do not call blocking ops when !TASK_RUNNING; state=1 set at
  [<ffffffff811f5a5c>] prepare_to_wait+0xbc/0x210
  kernel/sched/wait.c:178
  Modules linked in:
  CPU: 0 PID: 4608 Comm: syz-executor Not tainted 4.9.0-rc2+ #320
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
   ffff88006625f7a0 ffffffff81b46914 ffff88006625f818 0000000000000000
   ffffffff84052960 0000000000000000 ffff88006625f7e8 ffffffff81111237
   ffff88006aceac00 ffffffff00001e2c ffffed000cc4beff ffffffff84052960
  Call Trace:
   [<     inline     >] __dump_stack lib/dump_stack.c:15
   [<ffffffff81b46914>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
   [<ffffffff81111237>] __warn+0x1a7/0x1f0 kernel/panic.c:550
   [<ffffffff8111132c>] warn_slowpath_fmt+0xac/0xd0 kernel/panic.c:565
   [<ffffffff811922fc>] __might_sleep+0x14c/0x1a0 kernel/sched/core.c:7719
   [<     inline     >] slab_pre_alloc_hook mm/slab.h:393
   [<     inline     >] slab_alloc_node mm/slub.c:2634
   [<     inline     >] slab_alloc mm/slub.c:2716
   [<ffffffff81508da0>] __kmalloc_track_caller+0x150/0x2a0 mm/slub.c:4240
   [<ffffffff8146be14>] kmemdup+0x24/0x50 mm/util.c:113
   [<ffffffff8388b2cf>] dccp_feat_clone_sp_val.part.5+0x4f/0xe0 net/dccp/feat.c:374
   [<     inline     >] dccp_feat_clone_sp_val net/dccp/feat.c:1141
   [<     inline     >] dccp_feat_change_recv net/dccp/feat.c:1141
   [<ffffffff8388d491>] dccp_feat_parse_options+0xaa1/0x13d0 net/dccp/feat.c:1411
   [<ffffffff83894f01>] dccp_parse_options+0x721/0x1010 net/dccp/options.c:128
   [<ffffffff83891280>] dccp_rcv_state_process+0x200/0x15b0 net/dccp/input.c:644
   [<ffffffff838b8a94>] dccp_v4_do_rcv+0xf4/0x1a0 net/dccp/ipv4.c:681
   [<     inline     >] sk_backlog_rcv ./include/net/sock.h:872
   [<ffffffff82b7ceb6>] __release_sock+0x126/0x3a0 net/core/sock.c:2044
   [<ffffffff82b7d189>] release_sock+0x59/0x1c0 net/core/sock.c:2502
   [<     inline     >] inet_wait_for_connect net/ipv4/af_inet.c:547
   [<ffffffff8316b2a2>] __inet_stream_connect+0x5d2/0xbb0 net/ipv4/af_inet.c:617
   [<ffffffff8316b8d5>] inet_stream_connect+0x55/0xa0 net/ipv4/af_inet.c:656
   [<ffffffff82b705e4>] SYSC_connect+0x244/0x2f0 net/socket.c:1533
   [<ffffffff82b72dd4>] SyS_connect+0x24/0x30 net/socket.c:1514
   [<ffffffff83fbf701>] entry_SYSCALL_64_fastpath+0x1f/0xc2
  arch/x86/entry/entry_64.S:209

Unlike commit 26cabd31
("sched, net: Clean up sk_wait_event() vs. might_sleep()"), the
sleeping function is called before schedule_timeout(), this is indeed
a bug. Fix this by moving the wait logic to the new API, it is similar
to commit ff960a73
("netdev, sched/wait: Fix sleeping inside wait event").
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

14135f30

03 11月, 2016 4 次提交

xen-netfront: cast grant table reference first to type int · 269ebce4

由 Dongli Zhang 提交于 11月 02, 2016

IS_ERR_VALUE() in commit 87557efc
("xen-netfront: do not cast grant table reference to signed short") would
not return true for error code unless we cast ref first to type int.
Signed-off-by: NDongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

269ebce4

ip6_udp_tunnel: remove unused IPCB related codes · 4fd19c15

由 Eli Cooper 提交于 11月 01, 2016

Some IPCB fields are currently set in udp_tunnel6_xmit_skb(), which are
never used before it reaches ip6tunnel_xmit(), and past that point the
control buffer is no longer interpreted as IPCB.

This clears these unused IPCB related codes. Currently there is no skb
scrubbing in ip6_udp_tunnel, otherwise IPCB(skb)->opt might need to be
cleared for IPv4 packets, as shown in 5146d1f1
("tunnel: Clear IPCB(skb)->opt before dst_link_failure called").
Signed-off-by: NEli Cooper <elicooper@gmx.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4fd19c15

ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() · 23f4ffed

由 Eli Cooper 提交于 11月 01, 2016

skb->cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.

This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.

Cc: stable@vger.kernel.org
Signed-off-by: NEli Cooper <elicooper@gmx.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23f4ffed

MAINTAINERS: Update MELLANOX MLX5 core VPI driver maintainers · 45788f1f

由 Saeed Mahameed 提交于 11月 01, 2016

Add myself as a maintainer for mlx5 core driver as well.
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45788f1f

02 11月, 2016 7 次提交

net: mv643xx_eth: ensure coalesce settings survive read-modify-write · b9b84fc0

由 Russell King 提交于 11月 01, 2016

The coalesce settings behave badly when changing just one value:

... # ethtool -c eth0
rx-usecs: 249
... # ethtool -C eth0 tx-usecs 250
... # ethtool -c eth0
rx-usecs: 248

This occurs due to rounding errors when calculating the microseconds
value - the divisons round down.  This causes (eg) the rx-usecs to
decrease by one every time the tx-usecs value is set as per the above.

Fix this by making the divison round-to-nearest.
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9b84fc0

net/mlx5: Simplify a test · 42fb18fd

由 Christophe Jaillet 提交于 11月 01, 2016

'create_root_ns()' does not return an error pointer, so the test can be
simplified to be more consistent.
Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42fb18fd

unix: escape all null bytes in abstract unix domain socket · e7947ea7

由 Isaac Boukris 提交于 11月 01, 2016

Abstract unix domain socket may embed null characters,
these should be translated to '@' when printed out to
proc the same way the null prefix is currently being
translated.

This helps for tools such as netstat, lsof and the proc
based implementation in ss to show all the significant
bytes of the name (instead of getting cut at the first
null occurrence).
Signed-off-by: NIsaac Boukris <iboukris@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e7947ea7

net: qcom/emac: use correct value for SGMII_LN_UCDR_SO_GAIN_MODE0 · 7bb9f731

由 Timur Tabi 提交于 10月 31, 2016

The documentation says that SGMII_LN_UCDR_SO_GAIN_MODE0 should be
set to 0, not 6, on the Qualcomm Technologies QDF2432.
Signed-off-by: NTimur Tabi <timur@codeaurora.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7bb9f731

Merge branch 'xgene-coalescing-bugs' · d483ff23

由 David S. Miller 提交于 11月 01, 2016

Iyappan Subramanian says:

====================
drivers: net: xgene: Fix coalescing bugs

This patch set fixes the following,

     1. Since ethernet v1 hardware has a bug related to coalescing,
	disabling this feature
     2. Fixing ethernet v2 hardware, interrupt trigger region
	id to 2, to kickoff coalescing
====================
Signed-off-by: NIyappan Subramanian <isubramanian@apm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d483ff23

drivers: net: xgene: fix: Coalescing values for v2 hardware · f126df85

由 Iyappan Subramanian 提交于 10月 31, 2016

Changing the interrupt trigger region id to 2 and the
corresponding threshold set0/set1 values to 8/16.
Signed-off-by: NIyappan Subramanian <isubramanian@apm.com>
Signed-off-by: NToan Le <toanle@apm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f126df85

drivers: net: xgene: fix: Disable coalescing on v1 hardware · b5a4a3eb

由 Iyappan Subramanian 提交于 10月 31, 2016

Since ethernet v1 hardware has a bug related to coalescing, disabling
this feature.
Signed-off-by: NIyappan Subramanian <isubramanian@apm.com>
Signed-off-by: NToan Le <toanle@apm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5a4a3eb

01 11月, 2016 17 次提交

Merge tag 'linux-can-fixes-for-4.9-20161031' of... · 2c8657d2

由 David S. Miller 提交于 10月 31, 2016

Merge tag 'linux-can-fixes-for-4.9-20161031' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-10-31

this is a pull request of two patches for the upcoming v4.9 release.

The first patch is by Lukas Resch for the sja1000 plx_pci driver that adds
support for Moxa CAN devices. The second patch is by Oliver Hartkopp and fixes
a potential kernel panic in the CAN broadcast manager.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2c8657d2

bgmac: stop clearing DMA receive control register right after it is set · fcdefcca

由 Andy Gospodarek 提交于 10月 31, 2016

Current bgmac code initializes some DMA settings in the receive control
register for some hardware and then immediately clears those settings.
Not clearing those settings results in ~420Mbps *improvement* in
throughput; this system can now receive frames at line-rate on Broadcom
5871x hardware compared to ~520Mbps today. I also tested a few other
values but found there to be no discernible difference in CPU
utilization even if burst size and prefetching values are different.

On the hardware tested there was no need to keep the code that cleared
all but bits 16-17, but since there is a wide variety of hardware that
used this driver (I did not look at all hardware docs for hardware using
this IP block), I find it wise to move this call up and clear bits just
after reading the default value from the hardware rather than completely
removing it.

This is a good candidate for -stable >=3.14 since that is when the code
that was supposed to improve performance (but did not) was introduced.
Signed-off-by: NAndy Gospodarek <gospo@broadcom.com>
Fixes: 56ceecde ("bgmac: initialize the DMA controller of core...")
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: NHauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fcdefcca

Merge branch 'sctp-hold-transport-fixes' · 1137086f

由 David S. Miller 提交于 10月 31, 2016

Xin Long says:

====================
sctp: a bunch of fixes by holding transport

There are several places where it holds assoc after getting transport by
searching from transport rhashtable, it may cause use-after-free issue.

This patchset is to fix them by holding transport instead.

v1->v2:
  Fix the changelog of patch 2/3
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1137086f

sctp: hold transport instead of assoc when lookup assoc in rx path · dae399d7

由 Xin Long 提交于 10月 31, 2016

Prior to this patch, in rx path, before calling lock_sock, it needed to
hold assoc when got it by __sctp_lookup_association, in case other place
would free/put assoc.

But in __sctp_lookup_association, it lookup and hold transport, then got
assoc by transport->assoc, then hold assoc and put transport. It means
it didn't hold transport, yet it was returned and later on directly
assigned to chunk->transport.

Without the protection of sock lock, the transport may be freed/put by
other places, which would cause a use-after-free issue.

This patch is to fix this issue by holding transport instead of assoc.
As holding transport can make sure to access assoc is also safe, and
actually it looks up assoc by searching transport rhashtable, to hold
transport here makes more sense.

Note that the function will be renamed later on on another patch.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dae399d7

sctp: return back transport in __sctp_rcv_init_lookup · 7c17fcc7

由 Xin Long 提交于 10月 31, 2016

Prior to this patch, it used a local variable to save the transport that is
looked up by __sctp_lookup_association(), and didn't return it back. But in
sctp_rcv, it is used to initialize chunk->transport. So when hitting this,
even if it found the transport, it was still initializing chunk->transport
with null instead.

This patch is to return the transport back through transport pointer
that is from __sctp_rcv_lookup_harder().
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c17fcc7

sctp: hold transport instead of assoc in sctp_diag · cd26da4f

由 Xin Long 提交于 10月 31, 2016

In sctp_transport_lookup_process(), Commit 1cceda78 ("sctp: fix
the issue sctp_diag uses lock_sock in rcu_read_lock") moved cb() out
of rcu lock, but it put transport and hold assoc instead, and ignore
that cb() still uses transport. It may cause a use-after-free issue.

This patch is to hold transport instead of assoc there.

Fixes: 1cceda78 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd26da4f

xen-netfront: do not cast grant table reference to signed short · 87557efc

由 Dongli Zhang 提交于 10月 31, 2016

While grant reference is of type uint32_t, xen-netfront erroneously casts
it to signed short in BUG_ON().

This would lead to the xen domU panic during boot-up or migration when it
is attached with lots of paravirtual devices.
Signed-off-by: NDongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87557efc

can: bcm: fix warning in bcm_connect/proc_register · deb507f9

由 Oliver Hartkopp 提交于 10月 24, 2016

Andrey Konovalov reported an issue with proc_register in bcm.c.
As suggested by Cong Wang this patch adds a lock_sock() protection and
a check for unsuccessful proc_create_data() in bcm_connect().

Reference: http://marc.info/?l=linux-netdev&m=147732648731237Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Suggested-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Tested-by: NAndrey Konovalov <andreyknvl@google.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

deb507f9

can: sja1000: plx_pci: Add support for Moxa CAN devices · 460d2830

由 Lukas Resch 提交于 10月 10, 2016

This patch adds support for Moxa CAN devices.
Signed-off-by: NLukas Resch <l.resch@incubedit.com>
Signed-off-by: NChristoph Zehentner <c.zehentner@incubedit.com>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

460d2830

Merge tag 'wireless-drivers-for-davem-2016-10-30' of... · df67e97a

由 David S. Miller 提交于 10月 31, 2016

Merge tag 'wireless-drivers-for-davem-2016-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.9

iwlwifi

* some fixes for suspend/resume with unified FW images
* a fix for a false-positive lockdep report
* a fix for multi-queue that caused an unnecessary 1 second latency
* a fix for an ACPI parsing bug that caused a misleading error message

brcmfmac

* fix a variable uninitialised warning in brcmf_cfg80211_start_ap()
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df67e97a

mlxsw: spectrum: Fix incorrect reuse of MID entries · 46d0847c

由 Ido Schimmel 提交于 10月 30, 2016

In the device, a MID entry represents a group of local ports, which can
later be bound to a MDB entry.

The lookup of an existing MID entry is currently done using the provided
MC MAC address and VID, from the Linux bridge. However, this can result
in an incorrect reuse of the same MID index in different VLAN-unaware
bridges (same IP MC group and VID 0).

Fix this by performing the lookup based on FID instead of VID, which is
unique across different bridges.

Fixes: 3a49b4fd ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Acked-by: NElad Raz <eladr@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46d0847c

qede: Fix statistics' strings for Tx/Rx queues · cbbf049a

由 Mintz, Yuval 提交于 10月 30, 2016

When an interface is configured to use Tx/Rx-only queues,
the length of the statistics would be shortened to accomodate only the
statistics required per-each queue, and the values would be provided
accordingly.
However, the strings provided would still contain both Tx and Rx strings
for each one of the queues [regardless of its configuration], which might
lead to out-of-bound access when filling the buffers as well as incorrect
statistics presented.

Fixes: 9a4d7e86 ("qede: Add support for Tx/Rx-only queues.")
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cbbf049a

net: mangle zero checksum in skb_checksum_help() · 4f2e4ad5

由 Eric Dumazet 提交于 10月 29, 2016

Sending zero checksum is ok for TCP, but not for UDP.

UDPv6 receiver should by default drop a frame with a 0 checksum,
and UDPv4 would not verify the checksum and might accept a corrupted
packet.

Simply replace such checksum by 0xffff, regardless of transport.

This error was caught on SIT tunnels, but seems generic.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: NMaciej Żenczykowski <maze@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4f2e4ad5

net: clear sk_err_soft in sk_clone_lock() · e551c32d

由 Eric Dumazet 提交于 10月 28, 2016

At accept() time, it is possible the parent has a non zero
sk_err_soft, leftover from a prior error.

Make sure we do not leave this value in the child, as it
makes future getsockopt(SO_ERROR) calls quite unreliable.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e551c32d

dctcp: avoid bogus doubling of cwnd after loss · ce6dd233

由 Florian Westphal 提交于 10月 28, 2016

If a congestion control module doesn't provide .undo_cwnd function,
tcp_undo_cwnd_reduction() will set cwnd to

   tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1);

... which makes sense for reno (it sets ssthresh to half the current cwnd),
but it makes no sense for dctcp, which sets ssthresh based on the current
congestion estimate.

This can cause severe growth of cwnd (eventually overflowing u32).

Fix this by saving last cwnd on loss and restore cwnd based on that,
similar to cubic and other algorithms.

Fixes: e3118e83 ("net: tcp: add DCTCP congestion control algorithm")
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce6dd233

ipv6: add mtu lock check in __ip6_rt_update_pmtu · 19bda36c

由 Xin Long 提交于 10月 28, 2016

Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu.
It leaded to that mtu lock doesn't really work when receiving the pkt
of ICMPV6_PKT_TOOBIG.

This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4
did in __ip_rt_update_pmtu.
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

19bda36c

ipv6: Don't use ufo handling on later transformed packets · f89c56ce

由 Jakub Sitnicki 提交于 10月 26, 2016

Similar to commit c146066a ("ipv4: Don't use ufo handling on later
transformed packets"), don't perform UFO on packets that will be IPsec
transformed. To detect it we rely on the fact that headerlen in
dst_entry is non-zero only for transformation bundles (xfrm_dst
objects).

Unwanted segmentation can be observed with a NETIF_F_UFO capable device,
such as a dummy device:

  DEV=dum0 LEN=1493

  ip li add $DEV type dummy
  ip addr add fc00::1/64 dev $DEV nodad
  ip link set $DEV up
  ip xfrm policy add dir out src fc00::1 dst fc00::2 \
     tmpl src fc00::1 dst fc00::2 proto esp spi 1
  ip xfrm state add src fc00::1 dst fc00::2 \
     proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b

  tcpdump -n -nn -i $DEV -t &
  socat /dev/zero,readbytes=$LEN udp6:[fc00::2]:$LEN

tcpdump output before:

  IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
  IP6 fc00::1 > fc00::2: frag (1448|48)
  IP6 fc00::1 > fc00::2: ESP(spi=0x00000001,seq=0x2), length 88

... and after:

  IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
  IP6 fc00::1 > fc00::2: frag (1448|80)

Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Signed-off-by: NJakub Sitnicki <jkbs@redhat.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f89c56ce

31 10月, 2016 1 次提交

r8152: Fix broken RX checksums. · b9a321b4

由 Mark Lord 提交于 10月 30, 2016

The r8152 driver has been broken since (approx) 3.16.xx
when support was added for hardware RX checksums
on newer chip versions.  Symptoms include random
segfaults and silent data corruption over NFS.

The hardware checksum logig does not work on the VER_02
dongles I have here when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

So, disable hardware RX checksum support for VER_02, and fix
an obvious coding error for IPV6 checksums in the same function.

Because this bug results in silent data corruption,
it is a good candidate for back-porting to -stable >= 3.16.xx.
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9a321b4

30 10月, 2016 6 次提交

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 2a26d99b

由 Linus Torvalds 提交于 10月 29, 2016

Pull networking fixes from David Miller:
 "Lots of fixes, mostly drivers as is usually the case.

   1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
      Khoroshilov.

   2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
      Pedersen.

   3) Don't put aead_req crypto struct on the stack in mac80211, from
      Ard Biesheuvel.

   4) Several uninitialized variable warning fixes from Arnd Bergmann.

   5) Fix memory leak in cxgb4, from Colin Ian King.

   6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.

   7) Several VRF semantic fixes from David Ahern.

   8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.

   9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.

  10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.

  11) Fix stale link state during failover in NCSCI driver, from Gavin
      Shan.

  12) Fix netdev lower adjacency list traversal, from Ido Schimmel.

  13) Propvide proper handle when emitting notifications of filter
      deletes, from Jamal Hadi Salim.

  14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.

  15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.

  16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.

  17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.

  18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
      Leitner.

  19) Revert a netns locking change that causes regressions, from Paul
      Moore.

  20) Add recursion limit to GRO handling, from Sabrina Dubroca.

  21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.

  22) Avoid accessing stale vxlan/geneve socket in data path, from
      Pravin Shelar"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
  geneve: avoid using stale geneve socket.
  vxlan: avoid using stale vxlan socket.
  qede: Fix out-of-bound fastpath memory access
  net: phy: dp83848: add dp83822 PHY support
  enic: fix rq disable
  tipc: fix broadcast link synchronization problem
  ibmvnic: Fix missing brackets in init_sub_crq_irqs
  ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
  Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
  arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
  net/mlx4_en: Save slave ethtool stats command
  net/mlx4_en: Fix potential deadlock in port statistics flow
  net/mlx4: Fix firmware command timeout during interrupt test
  net/mlx4_core: Do not access comm channel if it has not yet been initialized
  net/mlx4_en: Fix panic during reboot
  net/mlx4_en: Process all completions in RX rings after port goes up
  net/mlx4_en: Resolve dividing by zero in 32-bit system
  net/mlx4_core: Change the default value of enable_qos
  net/mlx4_core: Avoid setting ports to auto when only one port type is supported
  net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
  ...

2a26d99b

geneve: avoid using stale geneve socket. · fceb9c3e

由 pravin shelar 提交于 10月 28, 2016

This patch is similar to earlier vxlan patch.
Geneve device close operation frees geneve socket. This
operation can race with geneve-xmit function which
dereferences geneve socket. Following patch uses RCU
mechanism to avoid this situation.
Signed-off-by: NPravin B Shelar <pshelar@ovn.org>
Acked-by: NJohn W. Linville <linville@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fceb9c3e

vxlan: avoid using stale vxlan socket. · c6fcc4fc

由 pravin shelar 提交于 10月 28, 2016

When vxlan device is closed vxlan socket is freed. This
operation can race with vxlan-xmit function which
dereferences vxlan socket. Following patch uses RCU
mechanism to avoid this situation.
Signed-off-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6fcc4fc

qede: Fix out-of-bound fastpath memory access · 087892d2

由 Mintz, Yuval 提交于 10月 29, 2016

Driver allocates a shadow array for transmitted SKBs with X entries;
That means valid indices are {0,...,X - 1}. [X == 8191]
Problem is the driver also uses X as a mask for a
producer/consumer in order to choose the right entry in the
array which allows access to entry X which is out of bounds.

To fix this, simply allocate X + 1 entries in the shadow array.
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

087892d2

net: phy: dp83848: add dp83822 PHY support · 30347834

由 Roger Quadros 提交于 10月 28, 2016

This PHY has a compatible register set with DP83848x so
add support for it.
Acked-by: NAndrew F. Davis <afd@ti.com>
Signed-off-by: NRoger Quadros <rogerq@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30347834

enic: fix rq disable · 9fe1c98a

由 Govindarajulu Varadarajan 提交于 10月 27, 2016

When MTU is changed from 9000 to 1500 while there is burst of inbound 9000
bytes packets, adaptor sometimes delivers 9000 bytes packets to 1500 bytes
buffers. This causes memory corruption and sometimes crash.

This is because of a race condition in adaptor between "RQ disable"
clearing descriptor mini-cache and mini-cache valid bit being set by
completion of descriptor fetch. This can result in stale RQ desc being
cached and used when packets arrive. In this case, the stale descriptor
have old MTU value.

Solution is to write RQ->disable twice. The first write will stop any
further desc fetches, allowing the second disable to clear the mini-cache
valid bit without danger of a race.

Also, the check for rq->running becoming 0 after writing rq->enable to 0
is not done properly. When incoming packets are flooding the interface,
rq->running will pulse high for each dropped packet. Since the driver was
waiting for 10us between each poll, it is possible to see rq->running = 1
1000 times in a row, even though it is not actually stuck running.
This results in false failure of vnic_rq_disable(). Fix is to try more
than 1000 time without delay between polls to ensure we do not miss when
running goes low.

In old adaptors rq->enable needs to be re-written to 0 when posted_index
is reset in vnic_rq_clean() in order to keep rq->prefetch_index in sync.
Signed-off-by: NGovindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9fe1c98a

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功