提交 · e2a4800e75780ccf4e6c2487f82b688ba736eb18 · xiphi1978 / linux

30 1月, 2015 5 次提交

ppp: deflate: never return len larger than output buffer · e2a4800e

由 Florian Westphal 提交于 1月 28, 2015

When we've run out of space in the output buffer to store more data, we
will call zlib_deflate with a NULL output buffer until we've consumed
remaining input.

When this happens, olen contains the size the output buffer would have
consumed iff we'd have had enough room.

This can later cause skb_over_panic when ppp_generic skb_put()s
the returned length.
Reported-by: NIain Douglas <centos@1n6.org.uk>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2a4800e

Merge branch 'netns' · d445d63b

由 David S. Miller 提交于 1月 29, 2015

Nicolas Dichtel says:

====================
netns: audit netdevice creation with IFLA_NET_NS_[PID|FD]

When one of these attributes is set, the netdevice is created into the netns
pointed by IFLA_NET_NS_[PID|FD] (see the call to rtnl_create_link() in
rtnl_newlink()). Let's call this netns the dest_net. After this creation, if the
newlink handler exists, it is called with a netns argument that points to the
netns where the netlink message has been received (called src_net in the code)
which is the link netns.
Hence, with one of these attributes, it's possible to create a x-netns
netdevice.

Here is the result of my code review:
- all ip tunnels (sit, ipip, ip6_tunnels, gre[tap][v6], ip_vti[6]) does not
  really allows to use this feature: the netdevice is created in the dest_net
  and the src_net is completely ignored in the newlink handler.
- VLAN properly handles this x-netns creation.
- bridge ignores src_net, which seems fine (NETIF_F_NETNS_LOCAL is set).
- CAIF subsystem is not clear for me (I don't know how it works), but it seems
  to wrongly use src_net. Patch #1 tries to fix this, but it was done only by
  code review (and only compile-tested), so please carefully review it. I may
  miss something.
- HSR subsystem uses src_net to parse IFLA_HSR_SLAVE[1|2], but the netdevice has
  the flag NETIF_F_NETNS_LOCAL, so the question is: does this netdevice really
  supports x-netns? If not, the newlink handler should use the dest_net instead
  of src_net, I can provide the patch.
- ieee802154 uses also src_net and does not have NETIF_F_NETNS_LOCAL. Same
  question: does this netdevice really supports x-netns?
- bonding ignores src_net and flag NETIF_F_NETNS_LOCAL is set, ie x-netns is not
  supported. Fine.
- CAN does not support rtnl/newlink, ok.
- ipvlan uses src_net and does not have NETIF_F_NETNS_LOCAL. After looking at
  the code, it seems that this drivers support x-netns. Am I right?
- macvlan/macvtap uses src_net and seems to have x-netns support.
- team ignores src_net and has the flag NETIF_F_NETNS_LOCAL, ie x-netns is not
  supported. Ok.
- veth uses src_net and have x-netns support ;-) Ok.
- VXLAN didn't properly handle this. The link netns (vxlan->net) is the src_net
  and not dest_net (see patch #2). Note that it was already possible to create a
  x-netns vxlan before the commit f01ec1c0 ("vxlan: add x-netns support")
  but the nedevice remains broken.

To summarize:
 - CAIF patch must be carefully reviewed
 - for HSR, ieee802154, ipvlan: is x-netns supported?
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d445d63b

vxlan: setup the right link netns in newlink hdlr · 33564bbb

由 Nicolas Dichtel 提交于 1月 26, 2015

Rename the netns to src_net to avoid confusion with the netns where the
interface stands. The user may specify IFLA_NET_NS_[PID|FD] to create
a x-netns netndevice: IFLA_NET_NS_[PID|FD] points to the netns where the
netdevice stands and src_net to the link netns.

Note that before commit f01ec1c0 ("vxlan: add x-netns support"), it was
possible to create a x-netns vxlan netdevice, but the netdevice was not
operational.

Fixes: f01ec1c0 ("vxlan: add x-netns support")
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33564bbb

caif: remove wrong dev_net_set() call · 8997c27e

由 Nicolas Dichtel 提交于 1月 26, 2015

src_net points to the netns where the netlink message has been received. This
netns may be different from the netns where the interface is created (because
the user may add IFLA_NET_NS_[PID|FD]). In this case, src_net is the link netns.

It seems wrong to override the netns in the newlink() handler because if it
was not already src_net, it means that the user explicitly asks to create the
netdevice in another netns.

CC: Sjur Brændeland <sjur.brandeland@stericsson.com>
CC: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Fixes: 8391c4aa ("caif: Bugfixes in CAIF netdevice for close and flow control")
Fixes: c4125400 ("caif-hsi: Add rtnl support")
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8997c27e

lib/checksum.c: fix build for generic csum_tcpudp_nofold · 9ce35779

由 karl beldan 提交于 1月 29, 2015

Fixed commit added from64to32 under _#ifndef do_csum_ but used it
under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
robot reported TILEGX's). Move from64to32 under the latter.

Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NKarl Beldan <karl.beldan@rivierawaves.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ce35779

29 1月, 2015 9 次提交

tcp: ipv4: initialize unicast_sock sk_pacing_rate · 811230cd

由 Eric Dumazet 提交于 1月 28, 2015

When I added sk_pacing_rate field, I forgot to initialize its value
in the per cpu unicast_sock used in ip_send_unicast_reply()

This means that for sch_fq users, RST packets, or ACK packets sent
on behalf of TIME_WAIT sockets might be sent to slowly or even dropped
once we reach the per flow limit.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Fixes: 95bd09eb ("tcp: TSO packets automatic sizing")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

811230cd

lib/checksum.c: fix carry in csum_tcpudp_nofold · 150ae0e9

由 karl beldan 提交于 1月 28, 2015

The carry from the 64->32bits folding was dropped, e.g with:
saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1,
csum_tcpudp_nofold returned 0 instead of 1.
Signed-off-by: NKarl Beldan <karl.beldan@rivierawaves.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

150ae0e9

bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify · 59ccaaaa

由 Roopa Prabhu 提交于 1月 28, 2015

Reported in: https://bugzilla.kernel.org/show_bug.cgi?id=92081

This patch avoids calling rtnl_notify if the device ndo_bridge_getlink
handler does not return any bytes in the skb.

Alternately, the skb->len check can be moved inside rtnl_notify.

For the bridge vlan case described in 92081, there is also a fix needed
in bridge driver to generate a proper notification. Will fix that in
subsequent patch.

v2: rebase patch on net tree
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

59ccaaaa

Merge branch 'tcp_stretch_acks' · 95224ac1

由 David S. Miller 提交于 1月 28, 2015

Neal Cardwell says:

====================
fix stretch ACK bugs in TCP CUBIC and Reno

This patch series fixes the TCP CUBIC and Reno congestion control
modules to properly handle stretch ACKs in their respective additive
increase modes, and in the transitions from slow start to additive
increase.

This finishes the project started by commit 9f9843a7 ("tcp:
properly handle stretch acks in slow start"), which fixed behavior for
TCP congestion control when handling stretch ACKs in slow start mode.

Motivation: In the Jan 2015 netdev thread 'BW regression after "tcp:
refine TSO autosizing"', Eyal Perry documented a regression that Eric
Dumazet determined was caused by improper handling of TCP stretch
ACKs.

Background: LRO, GRO, delayed ACKs, and middleboxes can cause "stretch
ACKs" that cover more than the RFC-specified maximum of 2
packets. These stretch ACKs can cause serious performance shortfalls
in common congestion control algorithms, like Reno and CUBIC, which
were designed and tuned years ago with receiver hosts that were not
using LRO or GRO, and were instead ACKing every other packet.

Testing: at Google we have been using this approach for handling
stretch ACKs for CUBIC datacenter and Internet traffic for several
years, with good results.

v2:
 * fixed return type of tcp_slow_start() to be u32 instead of int
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95224ac1

tcp: fix timing issue in CUBIC slope calculation · d6b1a8a9

由 Neal Cardwell 提交于 1月 28, 2015

This patch fixes a bug in CUBIC that causes cwnd to increase slightly
too slowly when multiple ACKs arrive in the same jiffy.

If cwnd is supposed to increase at a rate of more than once per jiffy,
then CUBIC was sometimes too slow. Because the bic_target is
calculated for a future point in time, calculated with time in
jiffies, the cwnd can increase over the course of the jiffy while the
bic_target calculated as the proper CUBIC cwnd at time
t=tcp_time_stamp+rtt does not increase, because tcp_time_stamp only
increases on jiffy tick boundaries.

So since the cnt is set to:
	ca->cnt = cwnd / (bic_target - cwnd);
as cwnd increases but bic_target does not increase due to jiffy
granularity, the cnt becomes too large, causing cwnd to increase
too slowly.

For example:
- suppose at the beginning of a jiffy, cwnd=40, bic_target=44
- so CUBIC sets:
   ca->cnt =  cwnd / (bic_target - cwnd) = 40 / (44 - 40) = 40/4 = 10
- suppose we get 10 acks, each for 1 segment, so tcp_cong_avoid_ai()
   increases cwnd to 41
- so CUBIC sets:
   ca->cnt =  cwnd / (bic_target - cwnd) = 41 / (44 - 41) = 41 / 3 = 13

So now CUBIC will wait for 13 packets to be ACKed before increasing
cwnd to 42, insted of 10 as it should.

The fix is to avoid adjusting the slope (determined by ca->cnt)
multiple times within a jiffy, and instead skip to compute the Reno
cwnd, the "TCP friendliness" code path.
Reported-by: NEyal Perry <eyalpe@mellanox.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d6b1a8a9

tcp: fix stretch ACK bugs in CUBIC · 9cd981dc

由 Neal Cardwell 提交于 1月 28, 2015

Change CUBIC to properly handle stretch ACKs in additive increase mode
by passing in the count of ACKed packets to tcp_cong_avoid_ai().

In addition, because we are now precisely accounting for stretch ACKs,
including delayed ACKs, we can now remove the delayed ACK tracking and
estimation code that tracked recent delayed ACK behavior in
ca->delayed_ack.
Reported-by: NEyal Perry <eyalpe@mellanox.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9cd981dc

tcp: fix stretch ACK bugs in Reno · c22bdca9

由 Neal Cardwell 提交于 1月 28, 2015

Change Reno to properly handle stretch ACKs in additive increase mode
by passing in the count of ACKed packets to tcp_cong_avoid_ai().

In addition, if snd_cwnd crosses snd_ssthresh during slow start
processing, and we then exit slow start mode, we need to carry over
any remaining "credit" for packets ACKed and apply that to additive
increase by passing this remaining "acked" count to
tcp_cong_avoid_ai().
Reported-by: NEyal Perry <eyalpe@mellanox.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c22bdca9

tcp: fix the timid additive increase on stretch ACKs · 814d488c

由 Neal Cardwell 提交于 1月 28, 2015

tcp_cong_avoid_ai() was too timid (snd_cwnd increased too slowly) on
"stretch ACKs" -- cases where the receiver ACKed more than 1 packet in
a single ACK. For example, suppose w is 10 and we get a stretch ACK
for 20 packets, so acked is 20. We ought to increase snd_cwnd by 2
(since acked/w = 20/10 = 2), but instead we were only increasing cwnd
by 1. This patch fixes that behavior.
Reported-by: NEyal Perry <eyalpe@mellanox.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

814d488c

tcp: stretch ACK fixes prep · e73ebb08

由 Neal Cardwell 提交于 1月 28, 2015

LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that
cover more than the RFC-specified maximum of 2 packets. These stretch
ACKs can cause serious performance shortfalls in common congestion
control algorithms that were designed and tuned years ago with
receiver hosts that were not using LRO or GRO, and were instead
politely ACKing every other packet.

This patch series fixes Reno and CUBIC to handle stretch ACKs.

This patch prepares for the upcoming stretch ACK bug fix patches. It
adds an "acked" parameter to tcp_cong_avoid_ai() to allow for future
fixes to tcp_cong_avoid_ai() to correctly handle stretch ACKs, and
changes all congestion control algorithms to pass in 1 for the ACKed
count. It also changes tcp_slow_start() to return the number of packet
ACK "credits" that were not processed in slow start mode, and can be
processed by the congestion control module in additive increase mode.

In future patches we will fix tcp_cong_avoid_ai() to handle stretch
ACKs, and fix Reno and CUBIC handling of stretch ACKs in slow start
and additive increase mode.
Reported-by: NEyal Perry <eyalpe@mellanox.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e73ebb08

28 1月, 2015 5 次提交

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 59343cd7

由 Linus Torvalds 提交于 1月 27, 2015

Pull networking fixes from David Miller:

 1) Don't OOPS on socket AIO, from Christoph Hellwig.

 2) Scheduled scans should be aborted upon RFKILL, from Emmanuel
    Grumbach.

 3) Fix sleep in atomic context in kvaser_usb, from Ahmed S Darwish.

 4) Fix RCU locking across copy_to_user() in bpf code, from Alexei
    Starovoitov.

 5) Lots of crash, memory leak, short TX packet et al bug fixes in
    sh_eth from Ben Hutchings.

 6) Fix memory corruption in SCTP wrt.  INIT collitions, from Daniel
    Borkmann.

 7) Fix return value logic for poll handlers in netxen, enic, and bnx2x.
    From Eric Dumazet and Govindarajulu Varadarajan.

 8) Header length calculation fix in mac80211 from Fred Chou.

 9) mv643xx_eth doesn't handle highmem correctly in non-TSO code paths.
    From Ezequiel Garcia.

10) udp_diag has bogus logic in it's hash chain skipping, copy same fix
    tcp diag used.  From Herbert Xu.

11) amd-xgbe programs wrong rx flow control register, from Thomas
    Lendacky.

12) Fix race leading to use after free in ping receive path, from Subash
    Abhinov Kasiviswanathan.

13) Cache redirect routes otherwise we can get a heavy backlog of rcu
    jobs liberating DST_NOCACHE entries.  From Hannes Frederic Sowa.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
  net: don't OOPS on socket aio
  stmmac: prevent probe drivers to crash kernel
  bnx2x: fix napi poll return value for repoll
  ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too
  sh_eth: Fix DMA-API usage for RX buffers
  sh_eth: Check for DMA mapping errors on transmit
  sh_eth: Ensure DMA engines are stopped before freeing buffers
  sh_eth: Remove RX overflow log messages
  ping: Fix race in free in receive path
  udp_diag: Fix socket skipping within chain
  can: kvaser_usb: Fix state handling upon BUS_ERROR events
  can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT
  can: kvaser_usb: Send correct context to URB completion
  can: kvaser_usb: Do not sleep in atomic context
  ipv4: try to cache dst_entries which would cause a redirect
  samples: bpf: relax test_maps check
  bpf: rcu lock must not be held when calling copy_to_user()
  net: sctp: fix slab corruption from use after free on INIT collisions
  net: mv643xx_eth: Fix highmem support in non-TSO egress path
  sh_eth: Fix serialisation of interrupt disable with interrupt & NAPI handlers
  ...

59343cd7

net: don't OOPS on socket aio · 06539d30

由 Christoph Hellwig 提交于 1月 27, 2015

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

06539d30

stmmac: prevent probe drivers to crash kernel · 9afec6ef

由 Andy Shevchenko 提交于 1月 27, 2015

In the case when alloc_netdev fails we return NULL to a caller. But there is no
check for NULL in the probe drivers. This patch changes NULL to an error
pointer. The function description is amended to reflect what we may get
returned.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9afec6ef

Merge tag 'powerpc-3.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux · 7da323bb

由 Linus Torvalds 提交于 1月 27, 2015

Pull powerpc fixes from Michael Ellerman:
 "Two powerpc fixes"

* tag 'powerpc-3.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc/powernv: Restore LPCR with LPCR_PECE1 cleared
  powerpc/xmon: Fix another endiannes issue in RTAS call from xmon

7da323bb

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux · 41592e2f

由 Linus Torvalds 提交于 1月 27, 2015

Pull one more module fix from Rusty Russell:
 "SCSI was using module_refcount() to figure out when the module was
  unloading: this broke with new atomic refcounting.  The code is still
  suspicious, but this solves the WARN_ON()"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  scsi: always increment reference count

41592e2f

27 1月, 2015 21 次提交

bnx2x: fix napi poll return value for repoll · 24e579c8

由 Govindarajulu Varadarajan 提交于 1月 25, 2015

With the commit d75b1ade ("net: less interrupt masking in NAPI") napi
repoll is done only when work_done == budget. When in busy_poll is we return 0
in napi_poll. We should return budget.
Signed-off-by: NGovindarajulu Varadarajan <_govind@gmx.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24e579c8

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · bf693f7b

由 David S. Miller 提交于 1月 27, 2015

Steffen Klassert says:

====================
ipsec 2015-01-26

Just two small fixes for _decode_session6() where we
might decode to wrong header information in some rare
situations.

Please pull or let me know if there are problems.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf693f7b

ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too · 6e9e16e6

由 Hannes Frederic Sowa 提交于 1月 26, 2015

Lubomir Rintel reported that during replacing a route the interface
reference counter isn't correctly decremented.

To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>:
| [root@rhel7-5 lkundrak]# sh -x lal
| + ip link add dev0 type dummy
| + ip link set dev0 up
| + ip link add dev1 type dummy
| + ip link set dev1 up
| + ip addr add 2001:db8:8086::2/64 dev dev0
| + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
| + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
| + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
| + ip link del dev0 type dummy
| Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
|  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
|
| Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
|  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2

During replacement of a rt6_info we must walk all parent nodes and check
if the to be replaced rt6_info got propagated. If so, replace it with
an alive one.

Fixes: 4a287eba ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
Reported-by: NLubomir Rintel <lkundrak@v3.sk>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: NLubomir Rintel <lkundrak@v3.sk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e9e16e6

Merge branch 'sh_eth' · 22577609

由 David S. Miller 提交于 1月 27, 2015

Ben Hutchings says:

====================
Fixes for sh_eth #3

I'm continuing review and testing of Ethernet support on the R-Car H2
chip.  This series fixes the last of the more serious issues I've found.

These are not tested on any of the other supported chips.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22577609

sh_eth: Fix DMA-API usage for RX buffers · 52b9fa36

由 Ben Hutchings 提交于 1月 27, 2015

- Use the return value of dma_map_single(), rather than calling
  virt_to_page() separately
- Check for mapping failue
- Call dma_unmap_single() rather than dma_sync_single_for_cpu()
Signed-off-by: NBen Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52b9fa36

sh_eth: Check for DMA mapping errors on transmit · aa3933b8

由 Ben Hutchings 提交于 1月 27, 2015

dma_map_single() may fail if an IOMMU or swiotlb is in use, so
we need to check for this.
Signed-off-by: NBen Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa3933b8

sh_eth: Ensure DMA engines are stopped before freeing buffers · 740c7f31

由 Ben Hutchings 提交于 1月 27, 2015

Currently we try to clear EDRRR and EDTRR and immediately continue to
free buffers.  This is unsafe because:

- In general, register writes are not serialised with DMA, so we still
  have to wait for DMA to complete somehow
- The R8A7790 (R-Car H2) manual states that the TX running flag cannot
  be cleared by writing to EDTRR
- The same manual states that clearing the RX running flag only stops
  RX DMA at the next packet boundary

I applied this patch to the driver to detect DMA writes to freed
buffers:

> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> @@ -1098,7 +1098,14 @@ static void sh_eth_ring_free(struct net_device *ndev)
>  	/* Free Rx skb ringbuffer */
>  	if (mdp->rx_skbuff) {
>  		for (i = 0; i < mdp->num_rx_ring; i++)
> +			memcpy(mdp->rx_skbuff[i]->data,
> +			       "Hello, world", 12);
> +		msleep(100);
> +		for (i = 0; i < mdp->num_rx_ring; i++) {
> +			WARN_ON(memcmp(mdp->rx_skbuff[i]->data,
> +				       "Hello, world", 12));
>  			dev_kfree_skb(mdp->rx_skbuff[i]);
> +		}
>  	}
>  	kfree(mdp->rx_skbuff);
>  	mdp->rx_skbuff = NULL;

then ran the loop:

    while ethtool -G eth0 rx 128 ; ethtool -G eth0 rx 64; do echo -n .; done

and 'ping -f' toward the sh_eth port from another machine.  The
warning fired several times a minute.

To fix these issues:

- Deactivate all TX descriptors rather than writing to EDTRR
- As there seems to be no way of telling when RX DMA is stopped,
  perform a soft reset to ensure that both DMA enginess are stopped
- To reduce the possibility of the reset truncating a transmitted
  frame, disable egress and wait a reasonable time to reach a
  packet boundary before resetting
- Update statistics before resetting

(The 'reasonable time' does not allow for CS/CD in half-duplex
mode, but half-duplex no longer seems reasonable!)
Signed-off-by: NBen Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

740c7f31

sh_eth: Remove RX overflow log messages · dc1d0e6d

由 Ben Hutchings 提交于 1月 27, 2015

If RX traffic is overflowing the FIFO or DMA ring, logging every time
this happens just makes things worse.  These errors are visible in the
statistics anyway.
Signed-off-by: NBen Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dc1d0e6d

Merge tag 'linux-can-fixes-for-3.19-20150127' of... · 8d8d67f1

由 David S. Miller 提交于 1月 27, 2015

Merge tag 'linux-can-fixes-for-3.19-20150127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2015-01-27

this is another pull request for net/master which consists of 4 patches.

All 4 patches are contributed by Ahmed S. Darwish, he fixes more problems in
the kvaser_usb driver.

David, please merge net/master to net-next/master, as we have more kvaser_usb
patches in the queue, that target net-next.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d8d67f1

ping: Fix race in free in receive path · fc752f1f

由 subashab@codeaurora.org 提交于 1月 23, 2015

An exception is seen in ICMP ping receive path where the skb
destructor sock_rfree() tries to access a freed socket. This happens
because ping_rcv() releases socket reference with sock_put() and this
internally frees up the socket. Later icmp_rcv() will try to free the
skb and as part of this, skb destructor is called and which leads
to a kernel panic as the socket is freed already in ping_rcv().

-->|exception
-007|sk_mem_uncharge
-007|sock_rfree
-008|skb_release_head_state
-009|skb_release_all
-009|__kfree_skb
-010|kfree_skb
-011|icmp_rcv
-012|ip_local_deliver_finish

Fix this incorrect free by cloning this skb and processing this cloned
skb instead.

This patch was suggested by Eric Dumazet
Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc752f1f

udp_diag: Fix socket skipping within chain · 86f3cddb

由 Herbert Xu 提交于 1月 24, 2015

While working on rhashtable walking I noticed that the UDP diag
dumping code is buggy.  In particular, the socket skipping within
a chain never happens, even though we record the number of sockets
that should be skipped.

As this code was supposedly copied from TCP, this patch does what
TCP does and resets num before we walk a chain.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

86f3cddb

can: kvaser_usb: Fix state handling upon BUS_ERROR events · e638642b

由 Ahmed S. Darwish 提交于 1月 26, 2015

While being in an ERROR_WARNING state, and receiving further
bus error events with error counters still in the ERROR_WARNING
range of 97-127 inclusive, the state handling code erroneously
reverts back to ERROR_ACTIVE.

Per the CAN standard, only revert to ERROR_ACTIVE when the
error counters are less than 96.

Moreover, in certain Kvaser models, the BUS_ERROR flag is
always set along with undefined bits in the M16C status
register. Thus use bitwise operators instead of full equality
for checking that register against bus errors.
Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

e638642b

can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT · 14c10c2a

由 Ahmed S. Darwish 提交于 1月 26, 2015

On some x86 laptops, plugging a Kvaser device again after an
unplug makes the firmware always ignore the very first command.
For such a case, provide some room for retries instead of
completely exiting the driver init code.
Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

14c10c2a

can: kvaser_usb: Send correct context to URB completion · 3803fa69

由 Ahmed S. Darwish 提交于 1月 26, 2015

Send expected argument to the URB completion hander: a CAN
netdevice instead of the network interface private context
`kvaser_usb_net_priv'.

This was discovered by having some garbage in the kernel
log in place of the netdevice names: can0 and can1.
Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

3803fa69

can: kvaser_usb: Do not sleep in atomic context · ded50066

由 Ahmed S. Darwish 提交于 1月 26, 2015

Upon receiving a hardware event with the BUS_RESET flag set,
the driver kills all of its anchored URBs and resets all of
its transmit URB contexts.

Unfortunately it does so under the context of URB completion
handler `kvaser_usb_read_bulk_callback()', which is often
called in an atomic context.

While the device is flooded with many received error packets,
usb_kill_urb() typically sleeps/reschedules till the transfer
request of each killed URB in question completes, leading to
the sleep in atomic bug. [3]

In v2 submission of the original driver patch [1], it was
stated that the URBs kill and tx contexts reset was needed
since we don't receive any tx acknowledgments later and thus
such resources will be locked down forever. Fortunately this
is no longer needed since an earlier bugfix in this patch
series is now applied: all tx URB contexts are reset upon CAN
channel close. [2]

Moreover, a BUS_RESET is now treated _exactly_ like a BUS_OFF
event, which is the recommended handling method advised by
the device manufacturer.

[1] http://article.gmane.org/gmane.linux.network/239442
    http://www.webcitation.org/6Vr2yagAQ

[2] can: kvaser_usb: Reset all URB tx contexts upon channel close
    889b77f7

[3] Stacktrace:

 <IRQ>  [<ffffffff8158de87>] dump_stack+0x45/0x57
 [<ffffffff8158b60c>] __schedule_bug+0x41/0x4f
 [<ffffffff815904b1>] __schedule+0x5f1/0x700
 [<ffffffff8159360a>] ? _raw_spin_unlock_irqrestore+0xa/0x10
 [<ffffffff81590684>] schedule+0x24/0x70
 [<ffffffff8147d0a5>] usb_kill_urb+0x65/0xa0
 [<ffffffff81077970>] ? prepare_to_wait_event+0x110/0x110
 [<ffffffff8147d7d8>] usb_kill_anchored_urbs+0x48/0x80
 [<ffffffffa01f4028>] kvaser_usb_unlink_tx_urbs+0x18/0x50 [kvaser_usb]
 [<ffffffffa01f45d0>] kvaser_usb_rx_error+0xc0/0x400 [kvaser_usb]
 [<ffffffff8108b14a>] ? vprintk_default+0x1a/0x20
 [<ffffffffa01f5241>] kvaser_usb_read_bulk_callback+0x4c1/0x5f0 [kvaser_usb]
 [<ffffffff8147a73e>] __usb_hcd_giveback_urb+0x5e/0xc0
 [<ffffffff8147a8a1>] usb_hcd_giveback_urb+0x41/0x110
 [<ffffffffa0008748>] finish_urb+0x98/0x180 [ohci_hcd]
 [<ffffffff810cd1a7>] ? acct_account_cputime+0x17/0x20
 [<ffffffff81069f65>] ? local_clock+0x15/0x30
 [<ffffffffa000a36b>] ohci_work+0x1fb/0x5a0 [ohci_hcd]
 [<ffffffff814fbb31>] ? process_backlog+0xb1/0x130
 [<ffffffffa000cd5b>] ohci_irq+0xeb/0x270 [ohci_hcd]
 [<ffffffff81479fc1>] usb_hcd_irq+0x21/0x30
 [<ffffffff8108bfd3>] handle_irq_event_percpu+0x43/0x120
 [<ffffffff8108c0ed>] handle_irq_event+0x3d/0x60
 [<ffffffff8108ec84>] handle_fasteoi_irq+0x74/0x110
 [<ffffffff81004dfd>] handle_irq+0x1d/0x30
 [<ffffffff81004727>] do_IRQ+0x57/0x100
 [<ffffffff8159482a>] common_interrupt+0x6a/0x6a
Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

ded50066

Merge tag 'mac80211-for-davem-2015-01-23' of... · 7d63585b

由 David S. Miller 提交于 1月 26, 2015

Merge tag 'mac80211-for-davem-2015-01-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Another set of last-minute fixes:
 * fix station double-removal when suspending while associating
 * fix the HT (802.11n) header length calculation
 * fix the CCK radiotap flag used for monitoring, a pretty
   old regression but a simple one-liner
 * fix per-station group-key handling
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d63585b

ipv4: try to cache dst_entries which would cause a redirect · df4d9254

由 Hannes Frederic Sowa 提交于 1月 23, 2015

Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f8864972 ("ipv4: fix dst race in sk_dst_get()").

Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU.  Unfortunately waiting for
RCU grace period just takes too long, we can end up with >1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.

Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.

This issue was discovered by Marcelo Leitner.

Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: NMarcelo Leitner <mleitner@redhat.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df4d9254

Merge branch 'bpf' · 412d2907

由 David S. Miller 提交于 1月 26, 2015

Alexei Starovoitov says:

====================
bpf: fix two bugs

Michael Holzheu caught two issues (in bpf syscall and in the test).
Fix them. Details in corresponding patches.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

412d2907

samples: bpf: relax test_maps check · ba1a68bf

由 Alexei Starovoitov 提交于 1月 22, 2015

hash map is unordered, so get_next_key() iterator shouldn't
rely on particular order of elements. So relax this test.

Fixes: ffb65f27 ("bpf: add a testsuite for eBPF maps")
Reported-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba1a68bf

bpf: rcu lock must not be held when calling copy_to_user() · 8ebe667c

由 Alexei Starovoitov 提交于 1月 22, 2015

BUG: sleeping function called from invalid context at mm/memory.c:3732
in_atomic(): 0, irqs_disabled(): 0, pid: 671, name: test_maps
1 lock held by test_maps/671:
 #0:  (rcu_read_lock){......}, at: [<0000000000264190>] map_lookup_elem+0xe8/0x260
Call Trace:
([<0000000000115b7e>] show_trace+0x12e/0x150)
 [<0000000000115c40>] show_stack+0xa0/0x100
 [<00000000009b163c>] dump_stack+0x74/0xc8
 [<000000000017424a>] ___might_sleep+0x23a/0x248
 [<00000000002b58e8>] might_fault+0x70/0xe8
 [<0000000000264230>] map_lookup_elem+0x188/0x260
 [<0000000000264716>] SyS_bpf+0x20e/0x840

Fix it by allocating temporary buffer to store map element value.

Fixes: db20fd2b ("bpf: add lookup/update/delete/iterate methods to BPF maps")
Reported-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ebe667c

net: sctp: fix slab corruption from use after free on INIT collisions · 600ddd68

由 Daniel Borkmann 提交于 1月 22, 2015

When hitting an INIT collision case during the 4WHS with AUTH enabled, as
already described in detail in commit 1be9a950 ("net: sctp: inherit
auth_capable on INIT collisions"), it can happen that we occasionally
still remotely trigger the following panic on server side which seems to
have been uncovered after the fix from commit 1be9a950 ...

[  533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
[  533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
[  533.940559] PGD 5030f2067 PUD 0
[  533.957104] Oops: 0000 [#1] SMP
[  533.974283] Modules linked in: sctp mlx4_en [...]
[  534.939704] Call Trace:
[  534.951833]  [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
[  534.984213]  [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
[  535.015025]  [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
[  535.045661]  [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
[  535.074593]  [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
[  535.105239]  [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
[  535.138606]  [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
[  535.166848]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b

... or depending on the the application, for example this one:

[ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
[ 1370.054568] PGD 633c94067 PUD 0
[ 1370.070446] Oops: 0000 [#1] SMP
[ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
[ 1370.963431] Call Trace:
[ 1370.974632]  [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
[ 1371.000863]  [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
[ 1371.027154]  [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
[ 1371.054679]  [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
[ 1371.080183]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b

With slab debugging enabled, we can see that the poison has been overwritten:

[  669.826368] BUG kmalloc-128 (Tainted: G        W     ): Poison overwritten
[  669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
[  669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
[  669.826424]  __slab_alloc+0x4bf/0x566
[  669.826433]  __kmalloc+0x280/0x310
[  669.826453]  sctp_auth_create_key+0x23/0x50 [sctp]
[  669.826471]  sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
[  669.826488]  sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
[  669.826505]  sctp_do_sm+0x29d/0x17c0 [sctp] [...]
[  669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
[  669.826635]  __slab_free+0x39/0x2a8
[  669.826643]  kfree+0x1d6/0x230
[  669.826650]  kzfree+0x31/0x40
[  669.826666]  sctp_auth_key_put+0x19/0x20 [sctp]
[  669.826681]  sctp_assoc_update+0x1ee/0x2d0 [sctp]
[  669.826695]  sctp_do_sm+0x674/0x17c0 [sctp]

Since this only triggers in some collision-cases with AUTH, the problem at
heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
when having refcnt 1, once directly in sctp_assoc_update() and yet again
from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
the already kzfree'd memory, which is also consistent with the observation
of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
at a later point in time when poison is checked on new allocation).

Reference counting of auth keys revisited:

Shared keys for AUTH chunks are being stored in endpoints and associations
in endpoint_shared_keys list. On endpoint creation, a null key is being
added; on association creation, all endpoint shared keys are being cached
and thus cloned over to the association. struct sctp_shared_key only holds
a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
keeps track of users internally through refcounting. Naturally, on assoc
or enpoint destruction, sctp_shared_key are being destroyed directly and
the reference on sctp_auth_bytes dropped.

User space can add keys to either list via setsockopt(2) through struct
sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
with refcount 1 and in case of replacement drops the reference on the old
sctp_auth_bytes. A key can be set active from user space through setsockopt()
on the id via sctp_auth_set_active_key(), which iterates through either
endpoint_shared_keys and in case of an assoc, invokes (one of various places)
sctp_auth_asoc_init_active_key().

sctp_auth_asoc_init_active_key() computes the actual secret from local's
and peer's random, hmac and shared key parameters and returns a new key
directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
the reference if there was a previous one. The secret, which where we
eventually double drop the ref comes from sctp_auth_asoc_set_secret() with
intitial refcount of 1, which also stays unchanged eventually in
sctp_assoc_update(). This key is later being used for crypto layer to
set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().

To close the loop: asoc->asoc_shared_key is freshly allocated secret
material and independant of the sctp_shared_key management keeping track
of only shared keys in endpoints and assocs. Hence, also commit 4184b2a7
("net: sctp: fix memory leak in auth key management") is independant of
this bug here since it concerns a different layer (though same structures
being used eventually). asoc->asoc_shared_key is reference dropped correctly
on assoc destruction in sctp_association_free() and when active keys are
being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
to remove that sctp_auth_key_put() from there which fixes these panics.

Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing")
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NVlad Yasevich <vyasevich@gmail.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

600ddd68