提交 · 6bb0fef489f667cf701853054f44579754f00a06 · openeuler / raspberrypi-kernel

10 9月, 2015 26 次提交

netlink, mmap: fix edge-case leakages in nf queue zero-copy · 6bb0fef4

由 Daniel Borkmann 提交于 9月 10, 2015

When netlink mmap on receive side is the consumer of nf queue data,
it can happen that in some edge cases, we write skb shared info into
the user space mmap buffer:

Assume a possible rx ring frame size of only 4096, and the network skb,
which is being zero-copied into the netlink skb, contains page frags
with an overall skb->len larger than the linear part of the netlink
skb.

skb_zerocopy(), which is generic and thus not aware of the fact that
shared info cannot be accessed for such skbs then tries to write and
fill frags, thus leaking kernel data/pointers and in some corner cases
possibly writing out of bounds of the mmap area (when filling the
last slot in the ring buffer this way).

I.e. the ring buffer slot is then of status NL_MMAP_STATUS_VALID, has
an advertised length larger than 4096, where the linear part is visible
at the slot beginning, and the leaked sizeof(struct skb_shared_info)
has been written to the beginning of the next slot (also corrupting
the struct nl_mmap_hdr slot header incl. status etc), since skb->end
points to skb->data + ring->frame_size - NL_MMAP_HDRLEN.

The fix adds and lets __netlink_alloc_skb() take the actual needed
linear room for the network skb + meta data into account. It's completely
irrelevant for non-mmaped netlink sockets, but in case mmap sockets
are used, it can be decided whether the available skb_tailroom() is
really large enough for the buffer, or whether it needs to internally
fallback to a normal alloc_skb().

>From nf queue side, the information whether the destination port is
an mmap RX ring is not really available without extra port-to-socket
lookup, thus it can only be determined in lower layers i.e. when
__netlink_alloc_skb() is called that checks internally for this. I
chose to add the extra ldiff parameter as mmap will then still work:
We have data_len and hlen in nfqnl_build_packet_message(), data_len
is the full length (capped at queue->copy_range) for skb_zerocopy()
and hlen some possible part of data_len that needs to be copied; the
rem_len variable indicates the needed remaining linear mmap space.

The only other workaround in nf queue internally would be after
allocation time by f.e. cap'ing the data_len to the skb_tailroom()
iff we deal with an mmap skb, but that would 1) expose the fact that
we use a mmap skb to upper layers, and 2) trim the skb where we
otherwise could just have moved the full skb into the normal receive
queue.

After the patch, in my test case the ring slot doesn't fit and therefore
shows NL_MMAP_STATUS_COPY, where a full skb carries all the data and
thus needs to be picked up via recv().

Fixes: 3ab1f683 ("nfnetlink: add support for memory mapped netlink")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6bb0fef4

netlink, mmap: don't walk rx ring on poll if receive queue non-empty · a66e3656

由 Daniel Borkmann 提交于 9月 10, 2015

In case of netlink mmap, there can be situations where received frames
have to be placed into the normal receive queue. The ring buffer indicates
this through NL_MMAP_STATUS_COPY, so the user is asked to pick them up
via recvmsg(2) syscall, and to put the slot back to NL_MMAP_STATUS_UNUSED.

Commit 0ef70770 ("netlink: rx mmap: fix POLLIN condition") changed
polling, so that we walk in the worst case the whole ring through the
new netlink_has_valid_frame(), for example, when the ring would have no
NL_MMAP_STATUS_VALID, but at least one NL_MMAP_STATUS_COPY frame.

Since we do a datagram_poll() already earlier to pick up a mask that could
possibly contain POLLIN | POLLRDNORM already (due to NL_MMAP_STATUS_COPY),
we can skip checking the rx ring entirely.

In case the kernel is compiled with !CONFIG_NETLINK_MMAP, then all this is
irrelevant anyway as netlink_poll() is just defined as datagram_poll().
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a66e3656

cxgb4: changes for new firmware 1.14.4.0 · f2be053c

由 Hariprasad Shenai 提交于 9月 10, 2015

Incorporate fw_ldst_cmd structure change for new firmware and also
update version string for the same
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2be053c

net: fec: add netif status check before set mac address · 9638d19e

由 Nimrod Andy 提交于 9月 10, 2015

There exist one issue by below case that case system hang:
ifconfig eth0 down
ifconfig eth0 hw ether 00:10:19:19:81:19

After eth0 down, all fec clocks are gated off. In the .fec_set_mac_address()
function, it will set new MAC address to registers, which causes system hang.

So it needs to add netif status check to avoid registers access when clocks are
gated off. Until eth0 up the new MAC address are wrote into related registers.

V2:
As Lucas Stach's suggestion, add a comment in the code to explain why it needed.

CC: Lucas Stach <l.stach@pengutronix.de>
CC: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NFugang Duan <B38611@freescale.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9638d19e

Merge branch 'r8152-autoresume' · 1d68c286

由 David S. Miller 提交于 9月 09, 2015

Hayes Wang says:

====================
r8152: fix the autoresume may fail

Fix the autosuspend issues which occur about linking change.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d68c286

r8152: fix the runtime suspend issues · 2dd49e0f

由 hayeswang 提交于 9月 07, 2015

Fix the runtime suspend issues result from the linking change.

Case 1:
a) link down occurs.
b) driver disable tx/rx.
c) autosuspend occurs.
d) hw linking up.
e) device suspends without enabling tx/rx.
f) couldn't wake up when receiving packets.

Case 2:
a) Nway results in linking down.
b) autosuspend occurs.
c) device suspends.
d) device may not wake up when linking up.
Signed-off-by: NHayes Wang <hayeswang@realtek.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2dd49e0f

r8152: split DRIVER_VERSION · d0942473

由 hayeswang 提交于 9月 07, 2015

Split DRIVER_VERSION into NETNEXT_VERSION and NET_VERSION. Then,
according to the value of DRIVER_VERSION, we could know which
patches are used generally without comparing the source code.
Signed-off-by: NHayes Wang <hayeswang@realtek.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d0942473

ipv6: fix ifnullfree.cocci warnings · 52fe51f8

由 Wu Fengguang 提交于 9月 10, 2015

net/ipv6/route.c:2946:3-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.

NULL check before some freeing functions is not needed.

Based on checkpatch warning
"kfree(NULL) is safe this check is probably not required"
and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52fe51f8

add microchip LAN88xx phy driver · 792aec47

由 Woojung.Huh@microchip.com 提交于 9月 09, 2015

Add Microchip LAN88XX phy driver for phylib.
Signed-off-by: NWoojung Huh <woojung.huh@microchip.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

792aec47

stmmac: fix check for phydev being open · dfc50fca

由 Alexey Brodkin 提交于 9月 09, 2015

Current check of phydev with IS_ERR(phydev) may make not much sense
because of_phy_connect() returns NULL on failure instead of error value.

Still for checking result of phy_connect() IS_ERR() makes perfect sense.

So let's use combined check IS_ERR_OR_NULL() that covers both cases.

Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: David Miller <davem@davemloft.net>
Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dfc50fca

net: qlcnic: delete redundant memsets · 1f0ca208

由 Rasmus Villemoes 提交于 9月 09, 2015

In all cases, mbx->req.arg and mbx->rsp.arg have just been allocated
using kcalloc(), so these six memsets are redundant.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f0ca208

net: mv643xx_eth: use kzalloc · b66a6085

由 Rasmus Villemoes 提交于 9月 09, 2015

The double memset is a little ugly; using kzalloc avoids it altogether.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b66a6085

net: jme: use kzalloc() instead of kmalloc+memset · e9b5ac27

由 Rasmus Villemoes 提交于 9月 09, 2015

Using kzalloc saves a tiny bit on .text.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e9b5ac27

net: cavium: liquidio: use kzalloc in setup_glist() · ce8e5c70

由 Rasmus Villemoes 提交于 9月 09, 2015

We save a little .text and get rid of the sizeof(...) style
inconsistency.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce8e5c70

net: ipv6: use common fib_default_rule_pref · f53de1e9

由 Phil Sutter 提交于 9月 09, 2015

This switches IPv6 policy routing to use the shared
fib_default_rule_pref() function of IPv4 and DECnet. It is also used in
multicast routing for IPv4 as well as IPv6.

The motivation for this patch is a complaint about iproute2 behaving
inconsistent between IPv4 and IPv6 when adding policy rules: Formerly,
IPv6 rules were assigned a fixed priority of 0x3FFF whereas for IPv4 the
assigned priority value was decreased with each rule added.

Since then all users of the default_pref field have been converted to
assign the generic function fib_default_rule_pref(), fib_nl_newrule()
may just use it directly instead. Therefore get rid of the function
pointer altogether and make fib_default_rule_pref() static, as it's not
used outside fib_rules.c anymore.
Signed-off-by: NPhil Sutter <phil@nwl.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f53de1e9

net: ethoc: Remove unnecessary #ifdef CONFIG_OF · 444c5f92

由 Tobias Klauser 提交于 9月 09, 2015

For !CONFIG_OF of_get_property() is defined to always return NULL. Thus
there's no need to protect the call to of_get_property() with #ifdef
CONFIG_OF.
Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

444c5f92

net: dsa: bcm_sf2: Fix 64-bits register writes · 03679a14

由 Florian Fainelli 提交于 9月 08, 2015

The macro to write 64-bits quantities to the 32-bits register swapped
the value and offsets arguments, we want to preserve the ordering of the
arguments with respect to how writel() is implemented for instance:
value first, offset/base second.

Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03679a14

bpf: fix out of bounds access in verifier log · 687f0715

由 Alexei Starovoitov 提交于 9月 08, 2015

when the verifier log is enabled the print_bpf_insn() is doing
bpf_alu_string[BPF_OP(insn->code) >> 4]
and
bpf_jmp_string[BPF_OP(insn->code) >> 4]
where BPF_OP is a 4-bit instruction opcode.
Malformed insns can cause out of bounds access.
Fix it by sizing arrays appropriately.

The bug was found by clang address sanitizer with libfuzzer.
Reported-by: NYonghong Song <yhs@plumgrid.com>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

687f0715

ipv6: fix multipath route replace error recovery · 6b9ea5a6

由 Roopa Prabhu 提交于 9月 08, 2015

Problem:
The ecmp route replace support for ipv6 in the kernel, deletes the
existing ecmp route too early, ie when it installs the first nexthop.
If there is an error in installing the subsequent nexthops, its too late
to recover the already deleted existing route leaving the fib
in an inconsistent state.

This patch reduces the possibility of this by doing the following:
a) Changes the existing multipath route add code to a two stage process:
  build rt6_infos + insert them
	ip6_route_add rt6_info creation code is moved into
	ip6_route_info_create.
b) This ensures that most errors are caught during building rt6_infos
  and we fail early
c) Separates multipath add and del code. Because add needs the special
  two stage mode in a) and delete essentially does not care.
d) In any event if the code fails during inserting a route again, a
  warning is printed (This should be unlikely)

Before the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
/* previously added ecmp route 3000:1000:1000:1000::2 dissappears from
 * kernel */

After the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

Fixes: 27596472 ("ipv6: fix ECMP route replacement")
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b9ea5a6

ebpf: fix fd refcount leaks related to maps in bpf syscall · 592867bf

由 Daniel Borkmann 提交于 9月 08, 2015

We may already have gotten a proper fd struct through fdget(), so
whenever we return at the end of an map operation, we need to call
fdput(). However, each map operation from syscall side first probes
CHECK_ATTR() to verify that unused fields in the bpf_attr union are
zero.

In case of malformed input, we return with error, but the lookup to
the map_fd was already performed at that time, so that we return
without an corresponding fdput(). Fix it by performing an fdget()
only right before bpf_map_get(). The fdget() invocation on maps in
the verifier is not affected.

Fixes: db20fd2b ("bpf: add lookup/update/delete/iterate methods to BPF maps")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

592867bf

RDS: verify the underlying transport exists before creating a connection · 74e98eb0

由 Sasha Levin 提交于 9月 08, 2015

There was no verification that an underlying transport exists when creating
a connection, this would cause dereferencing a NULL ptr.

It might happen on sockets that weren't properly bound before attempting to
send a message, which will cause a NULL ptr deref:

[135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[135546.051270] Modules linked in:
[135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
[135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
[135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
[135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
[135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
[135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
[135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
[135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
[135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
[135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
[135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
[135546.064723] Stack:
[135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
[135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
[135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
[135546.068629] Call Trace:
[135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
[135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
[135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
[135546.071981] rds_sendmsg (net/rds/send.c:1058)
[135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
[135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
[135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
[135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
[135546.076349] ? __might_fault (mm/memory.c:3795)
[135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
[135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
[135546.078856] SYSC_sendto (net/socket.c:1657)
[135546.079596] ? SYSC_connect (net/socket.c:1628)
[135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
[135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
[135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
[135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
[135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1
Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74e98eb0

xen-netback: require fewer guest Rx slots when not using GSO · 1d5d4852

由 David Vrabel 提交于 9月 08, 2015

Commit f48da8b1 (xen-netback: fix
unlimited guest Rx internal queue and carrier flapping) introduced a
regression.

The PV frontend in IPXE only places 4 requests on the guest Rx ring.
Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
not receive any packets.

a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
   for the largest possible packet.  Calculate the required slots
   based on the maximum GSO size or the MTU.

   This calculation of the number of required slots relies on
   1650d545 (xen-netback: always fully coalesce guest Rx packets)
   which present in 4.0-rc1 and later.

b) Reduce the Rx stall detection to checking for at least one
   available Rx request.  This is fine since we're predominately
   concerned with detecting interfaces which are down and thus have
   zero available Rx requests.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NWei Liu <wei.liu2@citrix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d5d4852

Merge branch 'cxgb4-fixes' · 9b57ab8b

由 David S. Miller 提交于 9月 09, 2015

Hariprasad Shenai says:

====================
cxgb4: Fix tx flit calculation and wc stat configuration

This patch series fixes the following:
Patch 1/2 fixes tx flit calculation, which if wrong can lead to
stall, hang, data corrpution, write combining failure. Patch 2/2 fixes
PCI-E write combining stats configuration.

This patch series has been created against net tree and includes
patches on cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9b57ab8b

cxgb4: Fix for write-combining stats configuration · 2a485cf7

由 Hariprasad Shenai 提交于 9月 08, 2015

The write-combining configuration register SGE_STAT_CFG_A needs to
be configured after FW initializes the adapter, else FW will reset
the configuration
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2a485cf7

cxgb4: Fix tx flit calculation · fd1754fb

由 Hariprasad Shenai 提交于 9月 08, 2015

In commit 0aac3f56 ("cxgb4: Add comment for calculate tx flits
and sge length code") introduced a regression where tx flit calculation
is going wrong, which can lead to data corruption, hang, stall and
write-combining failure. Fixing it.
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd1754fb

net: eth: altera: Fix the initial device operstate · d43cefcd

由 Atsushi Nemoto 提交于 9月 08, 2015

Call netif_carrier_off() prior to register_netdev(), otherwise
userspace can see incorrect link state.
Signed-off-by: NAtsushi Nemoto <nemoto@toshiba-tops.co.jp>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d43cefcd

09 9月, 2015 7 次提交

net: tipc: fix stall during bclink wakeup procedure · 7845989c

由 Kolmakov Dmitriy 提交于 9月 07, 2015

If an attempt to wake up users of broadcast link is made when there is
no enough place in send queue than it may hang up inside the
tipc_sk_rcv() function since the loop breaks only after the wake up
queue becomes empty. This can lead to complete CPU stall with the
following message generated by RCU:

INFO: rcu_sched self-detected stall on CPU { 0}  (t=2101 jiffies
					g=54225 c=54224 q=11465)
Task dump for CPU 0:
tpch            R  running task        0 39949  39948 0x0000000a
 ffffffff818536c0 ffff88181fa037a0 ffffffff8106a4be 0000000000000000
 ffffffff818536c0 ffff88181fa037c0 ffffffff8106d8a8 ffff88181fa03800
 0000000000000001 ffff88181fa037f0 ffffffff81094a50 ffff88181fa15680
Call Trace:
 <IRQ>  [<ffffffff8106a4be>] sched_show_task+0xae/0x120
 [<ffffffff8106d8a8>] dump_cpu_task+0x38/0x40
 [<ffffffff81094a50>] rcu_dump_cpu_stacks+0x90/0xd0
 [<ffffffff81097c3b>] rcu_check_callbacks+0x3eb/0x6e0
 [<ffffffff8106e53f>] ? account_system_time+0x7f/0x170
 [<ffffffff81099e64>] update_process_times+0x34/0x60
 [<ffffffff810a84d1>] tick_sched_handle.isra.18+0x31/0x40
 [<ffffffff810a851c>] tick_sched_timer+0x3c/0x70
 [<ffffffff8109a43d>] __run_hrtimer.isra.34+0x3d/0xc0
 [<ffffffff8109aa95>] hrtimer_interrupt+0xc5/0x1e0
 [<ffffffff81030d52>] ? native_smp_send_reschedule+0x42/0x60
 [<ffffffff81032f04>] local_apic_timer_interrupt+0x34/0x60
 [<ffffffff810335bc>] smp_apic_timer_interrupt+0x3c/0x60
 [<ffffffff8165a3fb>] apic_timer_interrupt+0x6b/0x70
 [<ffffffff81659129>] ? _raw_spin_unlock_irqrestore+0x9/0x10
 [<ffffffff8107eb9f>] __wake_up_sync_key+0x4f/0x60
 [<ffffffffa313ddd1>] tipc_write_space+0x31/0x40 [tipc]
 [<ffffffffa313dadf>] filter_rcv+0x31f/0x520 [tipc]
 [<ffffffffa313d699>] ? tipc_sk_lookup+0xc9/0x110 [tipc]
 [<ffffffff81659259>] ? _raw_spin_lock_bh+0x19/0x30
 [<ffffffffa314122c>] tipc_sk_rcv+0x2dc/0x3e0 [tipc]
 [<ffffffffa312e7ff>] tipc_bclink_wakeup_users+0x2f/0x40 [tipc]
 [<ffffffffa313ce26>] tipc_node_unlock+0x186/0x190 [tipc]
 [<ffffffff81597c1c>] ? kfree_skb+0x2c/0x40
 [<ffffffffa313475c>] tipc_rcv+0x2ac/0x8c0 [tipc]
 [<ffffffffa312ff58>] tipc_l2_rcv_msg+0x38/0x50 [tipc]
 [<ffffffff815a76d3>] __netif_receive_skb_core+0x5a3/0x950
 [<ffffffff815a98d3>] __netif_receive_skb+0x13/0x60
 [<ffffffff815a993e>] netif_receive_skb_internal+0x1e/0x90
 [<ffffffff815aa138>] napi_gro_receive+0x78/0xa0
 [<ffffffffa07f93f4>] tg3_poll_work+0xc54/0xf40 [tg3]
 [<ffffffff81597c8c>] ? consume_skb+0x2c/0x40
 [<ffffffffa07f9721>] tg3_poll_msix+0x41/0x160 [tg3]
 [<ffffffff815ab0f2>] net_rx_action+0xe2/0x290
 [<ffffffff8104b92a>] __do_softirq+0xda/0x1f0
 [<ffffffff8104bc26>] irq_exit+0x76/0xa0
 [<ffffffff81004355>] do_IRQ+0x55/0xf0
 [<ffffffff8165a12b>] common_interrupt+0x6b/0x6b
 <EOI>

The issue occurs only when tipc_sk_rcv() is used to wake up postponed
senders:

	tipc_bclink_wakeup_users()
		// wakeupq - is a queue which consists of special
		// 		 messages with SOCK_WAKEUP type.
		tipc_sk_rcv(wakeupq)
			...
			while (skb_queue_len(inputq)) {
				filter_rcv(skb)
					// Here the type of message is checked
					// and if it is SOCK_WAKEUP then
					// it tries to wake up a sender.
					tipc_write_space(sk)
						wake_up_interruptible_sync_poll()
			}

After the sender thread is woke up it can gather control and perform
an attempt to send a message. But if there is no enough place in send
queue it will call link_schedule_user() function which puts a message
of type SOCK_WAKEUP to the wakeup queue and put the sender to sleep.
Thus the size of the queue actually is not changed and the while()
loop never exits.

The approach I proposed is to wake up only senders for which there is
enough place in send queue so the described issue can't occur.
Moreover the same approach is already used to wake up senders on
unicast links.

I have got into the issue on our product code but to reproduce the
issue I changed a benchmark test application (from
tipcutils/demos/benchmark) to perform the following scenario:
	1. Run 64 instances of test application (nodes). It can be done
	   on the one physical machine.
	2. Each application connects to all other using TIPC sockets in
	   RDM mode.
	3. When setup is done all nodes start simultaneously send
	   broadcast messages.
	4. Everything hangs up.

The issue is reproducible only when a congestion on broadcast link
occurs. For example, when there are only 8 nodes it works fine since
congestion doesn't occur. Send queue limit is 40 in my case (I use a
critical importance level) and when 64 nodes send a message at the
same moment a congestion occurs every time.
Signed-off-by: NDmitry S Kolmakov <kolmakov.dmitriy@huawei.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Acked-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7845989c

dm9000: fix a typo · 7b901873

由 Barry Song 提交于 9月 07, 2015

Signed-off-by: NBarry Song <Baohua.Song@csr.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7b901873

net: bridge: remove unnecessary switchdev include · 7a577f01

由 Vivien Didelot 提交于 9月 05, 2015

Remove the unnecessary switchdev.h include from br_netlink.c.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a577f01

net: bridge: check __vlan_vid_del for error · bf361ad3

由 Vivien Didelot 提交于 9月 05, 2015

Since __vlan_del can return an error code, change its inner function
__vlan_vid_del to return an eventual error from switchdev_port_obj_del.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf361ad3

net: dsa: bcm_sf2: Fix ageing conditions and operation · 39797a27

由 Florian Fainelli 提交于 9月 05, 2015

The comparison check between cur_hw_state and hw_state is currently
invalid because cur_hw_state is right shifted by G_MISTP_SHIFT, while
hw_state is not, so we end-up comparing bits 2:0 with bits 7:5, which is
going to cause an additional aging to occur. Fix this by not shifting
cur_hw_state while reading it, but instead, mask the value with the
appropriately shitfted bitmask.

The other problem with the fast-ageing process is that we did not set
the EN_AGE_DYNAMIC bit to request the ageing to occur for dynamically
learned MAC addresses. Finally, write back 0 to the FAST_AGE_CTRL
register to avoid leaving spurious bits sets from one operation to the
other.

Fixes: 12f460f2 ("net: dsa: bcm_sf2: add HW bridging support")
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

39797a27

device property: Don't overwrite addr when failing in device_get_mac_address · 5b902d6f

由 Julien Grall 提交于 9月 03, 2015

The function device_get_mac_address is trying different property names
in order to get the mac address. To check the return value, the variable
addr (which contain the buffer pass by the caller) will be re-used. This
means that if the previous property is not found, the next property will
be read using a NULL buffer.

Therefore it's only possible to retrieve the mac if node contains a
property "mac-address". Fix it by using a temporary buffer for the
return value.

This has been introduced by commit 4c96b7dc
"Add a matching set of device_ functions for determining mac/phy"
Signed-off-by: NJulien Grall <julien.grall@citrix.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: NJeremy Linton <jeremy.linton@arm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5b902d6f

usbnet: Fix a race between usbnet_stop() and the BH · fcb0bb6a

由 Eugene Shatokhin 提交于 9月 01, 2015

The race may happen when a device (e.g. YOTA 4G LTE Modem) is
unplugged while the system is downloading a large file from the Net.

Hardware breakpoints and Kprobes with delays were used to confirm that
the race does actually happen.

The race is on skb_queue ('next' pointer) between usbnet_stop()
and rx_complete(), which, in turn, calls usbnet_bh().

Here is a part of the call stack with the code where the changes to the
queue happen. The line numbers are for the kernel 4.1.0:

*0 __skb_unlink (skbuff.h:1517)
    prev->next = next;
*1 defer_bh (usbnet.c:430)
    spin_lock_irqsave(&list->lock, flags);
    old_state = entry->state;
    entry->state = state;
    __skb_unlink(skb, list);
    spin_unlock(&list->lock);
    spin_lock(&dev->done.lock);
    __skb_queue_tail(&dev->done, skb);
    if (dev->done.qlen == 1)
        tasklet_schedule(&dev->bh);
    spin_unlock_irqrestore(&dev->done.lock, flags);
*2 rx_complete (usbnet.c:640)
    state = defer_bh(dev, skb, &dev->rxq, state);

At the same time, the following code repeatedly checks if the queue is
empty and reads these values concurrently with the above changes:

*0  usbnet_terminate_urbs (usbnet.c:765)
    /* maybe wait for deletions to finish. */
    while (!skb_queue_empty(&dev->rxq)
        && !skb_queue_empty(&dev->txq)
        && !skb_queue_empty(&dev->done)) {
            schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
            set_current_state(TASK_UNINTERRUPTIBLE);
            netif_dbg(dev, ifdown, dev->net,
                  "waited for %d urb completions\n", temp);
    }
*1  usbnet_stop (usbnet.c:806)
    if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
        usbnet_terminate_urbs(dev);

As a result, it is possible, for example, that the skb is removed from
dev->rxq by __skb_unlink() before the check
"!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
also possible in this case that the skb is added to dev->done queue
after "!skb_queue_empty(&dev->done)" is checked. So
usbnet_terminate_urbs() may stop waiting and return while dev->done
queue still has an item.

Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid
this race.
Signed-off-by: NEugene Shatokhin <eugene.shatokhin@rosalab.ru>
Reviewed-by: NBjørn Mork <bjorn@mork.no>
Acked-by: NOliver Neukum <oneukum@suse.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fcb0bb6a

07 9月, 2015 7 次提交

cxgb4: fix usage of uninitialized variable · 46cdc9be

由 françois romieu 提交于 9月 04, 2015

drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function ‘init_one’:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4579:8: warning: ‘chip’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   chip |= CHELSIO_CHIP_CODE(CHELSIO_T4, pl_rev);
        ^
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4571:11: note: ‘chip’ was declared here
  int ver, chip;
           ^

Fixes: d86bd29e ("cxgb4/cxgb4vf: read the correct bits of PL Who Am I register")
Signed-off-by: NFrancois Romieu <romieu@fr.zoreil.com>
Cc: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46cdc9be

fixed_phy: pass 'irq' to fixed_phy_add() · bd1a05ee

由 Sergei Shtylyov 提交于 9月 03, 2015

I've noticed that fixed_phy_register() ignores its 'irq' parameter instead of
passing it to fixed_phy_add(). Luckily, fixed_phy_register() seems to always
be called with PHY_POLL for 'irq'... :-)

Fixes: a7595121 ("net: phy: extend fixed driver with fixed_phy_register()")
Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd1a05ee

openvswitch: Remove conntrack Kconfig option. · f88f69dd

由 Joe Stringer 提交于 9月 04, 2015

There's no particular desire to have conntrack action support in Open
vSwitch as an independently configurable bit, rather just to ensure
there is not a hard dependency. This exposed option doesn't accurately
reflect the conntrack dependency when enabled, so simplify this by
removing the option. Compile the support if NF_CONNTRACK is enabled.

Fixes: 7f8a436e ("openvswitch: Add conntrack action")
Signed-off-by: NJoe Stringer <joestringer@nicira.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f88f69dd

net: dsa: mv88e6171: add hardware 802.1Q support · 585e7e1a

由 Vivien Didelot 提交于 9月 04, 2015

The Marvell 88E6171 switch is in the 88E6351 family, which supports
802.1Q, thus add support from the generic mv88e6xxx functions.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

585e7e1a

Merge tag 'mac80211-for-davem-2015-09-04' of... · 080fff50

由 David S. Miller 提交于 9月 06, 2015

Merge tag 'mac80211-for-davem-2015-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
For the first round of fixes, we have this:
 * fix for the sizeof() pointer type issue
 * a fix for regulatory getting into a restore loop
 * a fix for rfkill global 'all' state, it needs to be stored
   everywhere to apply correctly to new rfkill instances
 * properly refuse CQM RSSI when it cannot actually be used
 * protect HT TDLS traffic properly in non-HT networks
 * don't incorrectly advertise 80 MHz support when not allowed
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

080fff50

ethernet: synopsys: SYNOPSYS_DWC_ETH_QOS should depend on HAS_DMA · e5a5837d

由 Geert Uytterhoeven 提交于 9月 04, 2015

If NO_DMA=y:

ERROR: "dma_alloc_coherent" [drivers/net/ethernet/synopsys/dwc_eth_qos.ko] undefined!
ERROR: "dma_free_coherent" [drivers/net/ethernet/synopsys/dwc_eth_qos.ko] undefined!
ERROR: "dma_unmap_single" [drivers/net/ethernet/synopsys/dwc_eth_qos.ko] undefined!
ERROR: "dma_map_page" [drivers/net/ethernet/synopsys/dwc_eth_qos.ko] undefined!
ERROR: "dma_mapping_error" [drivers/net/ethernet/synopsys/dwc_eth_qos.ko] undefined!
ERROR: "dma_map_single" [drivers/net/ethernet/synopsys/dwc_eth_qos.ko] undefined!
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Acked-by: NLars Persson <larper@axis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e5a5837d

vxlan: Refactor vxlan_udp_encap_recv() to kill compiler warning · 0f1b7354

由 Geert Uytterhoeven 提交于 9月 04, 2015

drivers/net/vxlan.c: In function ‘vxlan_udp_encap_recv’:
drivers/net/vxlan.c:1226: warning: ‘info’ may be used uninitialized in this function

While this warning is a false positive, it can be killed easily by
getting rid of the pointer intermediary and referring directly to the
ip_tunnel_info structure.
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f1b7354