提交 · dbde497966804e63a38fdedc1e3815e77097efc2 · OpenHarmony / kernel_linux

20 11月, 2013 3 次提交

tcp: don't update snd_nxt, when a socket is switched from repair mode · dbde4979

由 Andrey Vagin 提交于 11月 19, 2013

snd_nxt must be updated synchronously with sk_send_head.  Otherwise
tp->packets_out may be updated incorrectly, what may bring a kernel panic.

Here is a kernel panic from my host.
[  103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[  103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150
...
[  146.301158] Call Trace:
[  146.301158]  [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0

Before this panic a tcp socket was restored. This socket had sent and
unsent data in the write queue. Sent data was restored in repair mode,
then the socket was switched from reapair mode and unsent data was
restored. After that the socket was switched back into repair mode.

In that moment we had a socket where write queue looks like this:
snd_una    snd_nxt   write_seq
   |_________|________|
             |
	  sk_send_head

After a second switching from repair mode the state of socket was
changed:

snd_una          snd_nxt, write_seq
   |_________ ________|
             |
	  sk_send_head

This state is inconsistent, because snd_nxt and sk_send_head are not
synchronized.

Bellow you can find a call trace, how packets_out can be incremented
twice for one skb, if snd_nxt and sk_send_head are not synchronized.
In this case packets_out will be always positive, even when
sk_write_queue is empty.

tcp_write_wakeup
	skb = tcp_send_head(sk);
	tcp_fragment
		if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq))
			tcp_adjust_pcount(sk, skb, diff);
	tcp_event_new_data_sent
		tp->packets_out += tcp_skb_pcount(skb);

I think update of snd_nxt isn't required, when a socket is switched from
repair mode.  Because it's initialized in tcp_connect_init. Then when a
write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent,
so it's always is in consistent state.

I have checked, that the bug is not reproduced with this patch and
all tests about restoring tcp connections work fine.

Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: NAndrey Vagin <avagin@openvz.org>
Acked-by: NPavel Emelyanov <xemul@parallels.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dbde4979

xfrm: Release dst if this dst is improper for vti tunnel · 236c9f84

由 fan.du 提交于 11月 19, 2013

After searching rt by the vti tunnel dst/src parameter,
if this rt has neither attached to any transformation
nor the transformation is not tunnel oriented, this rt
should be released back to ip layer.

otherwise causing dst memory leakage.
Signed-off-by: NFan Du <fan.du@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

236c9f84

netlink: fix documentation typo in netlink_set_err() · 840e93f2

由 Johannes Berg 提交于 11月 19, 2013

The parameter is just 'group', not 'groups', fix the documentation typo.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

840e93f2

19 11月, 2013 5 次提交

ping: prevent NULL pointer dereference on write to msg_name · cf970c00

由 Hannes Frederic Sowa 提交于 11月 18, 2013

A plain read() on a socket does set msg->msg_name to NULL. So check for
NULL pointer first.
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf970c00

ipv6: Fix inet6_init() cleanup order · eca42aaf

由 Vlad Yasevich 提交于 11月 16, 2013

Commit 6d0bfe22
	net: ipv6: Add IPv6 support to the ping socket

introduced a change in the cleanup logic of inet6_init and
has a bug in that ipv6_packet_cleanup() may not be called.
Fix the cleanup ordering.

CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Lorenzo Colitti <lorenzo@google.com>
CC: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eca42aaf

genetlink: rename shadowed variable · 029b234f

由 Johannes Berg 提交于 11月 18, 2013

Sparse pointed out that the new flags variable I had added
shadowed an existing one, rename the new one to avoid that,
making the code clearer.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

029b234f

inet: prevent leakage of uninitialized memory to user in recv syscalls · bceaa902

由 Hannes Frederic Sowa 提交于 11月 18, 2013

Only update *addr_len when we actually fill in sockaddr, otherwise we
can return uninitialized memory from the stack to the caller in the
recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
checks because we only get called with a valid addr_len pointer either
from sock_common_recvmsg or inet_recvmsg.

If a blocking read waits on a socket which is concurrently shut down we
now return zero and set msg_msgnamelen to 0.
Reported-by: Nmpb <mpb.mail@gmail.com>
Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bceaa902

net: ipv6: ndisc: Fix warning when CONFIG_SYSCTL=n · bcd081a3

由 Fabio Estevam 提交于 11月 16, 2013

When CONFIG_SYSCTL=n the following build warning happens:

net/ipv6/ndisc.c:1730:1: warning: label 'out' defined but not used [-Wunused-label]

The 'out' label is only used when CONFIG_SYSCTL=y, so move it inside the
'ifdef CONFIG_SYSCTL' block.
Signed-off-by: NFabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bcd081a3

16 11月, 2013 4 次提交

pkt_sched: fq: fix pacing for small frames · f52ed899

由 Eric Dumazet 提交于 11月 15, 2013

For performance reasons, sch_fq tried hard to not setup timers for every
sent packet, using a quantum based heuristic : A delay is setup only if
the flow exhausted its credit.

Problem is that application limited flows can refill their credit
for every queued packet, and they can evade pacing.

This problem can also be triggered when TCP flows use small MSS values,
as TSO auto sizing builds packets that are smaller than the default fq
quantum (3028 bytes)

This patch adds a 40 ms delay to guard flow credit refill.

Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f52ed899

pkt_sched: fq: warn users using defrate · 65c5189a

由 Eric Dumazet 提交于 11月 15, 2013

Commit 7eec4174 ("pkt_sched: fq: fix non TCP flows pacing")
obsoleted TCA_FQ_FLOW_DEFAULT_RATE without notice for the users.

Suggested by David Miller
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65c5189a

genetlink: unify registration functions · 568508aa

由 Johannes Berg 提交于 11月 15, 2013

Now that the ops assignment is just two variables rather than a
long list iteration etc., there's no reason to separately export
__genl_register_family() and __genl_register_family_with_ops().

Unify the two functions into __genl_register_family() and make
genl_register_family_with_ops() call it after assigning the ops.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

568508aa

macvlan: disable LRO on lower device instead of macvlan · 529d0489

由 Michal Kubeček 提交于 11月 15, 2013

A macvlan device has always LRO disabled so that calling
dev_disable_lro() on it does nothing. If we need to disable LRO
e.g. because

  - the macvlan device is inserted into a bridge
  - IPv6 forwarding is enabled for it
  - it is in a different namespace than lowerdev and IPv4
    forwarding is enabled in it

we need to disable LRO on its underlying device instead (as we
do for 802.1q VLAN devices).

v2: use newly introduced netif_is_macvlan()
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

529d0489

15 11月, 2013 23 次提交

6lowpan: Uncompression of traffic class field was incorrect · 1188f054

由 Jukka Rissanen 提交于 11月 13, 2013

If priority/traffic class field in IPv6 header is set (seen when
using ssh), the uncompression sets the TC and Flow fields incorrectly.

Example:

This is IPv6 header of a sent packet. Note the priority/TC (=1) in
the first byte.

00000000: 61 00 00 00 00 2c 06 40 fe 80 00 00 00 00 00 00
00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
00000020: 02 1e ab ff fe 4c 52 57

This gets compressed like this in the sending side

00000000: 72 31 04 06 02 1e ab ff fe 4c 52 57 ec c2 00 16
00000010: aa 2d fe 92 86 4e be c6 ....

In the receiving end, the packet gets uncompressed to this
IPv6 header

00000000: 60 06 06 02 00 2a 1e 40 fe 80 00 00 00 00 00 00
00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
00000020: ab ff fe 4c 52 57 ec c2

First four bytes are set incorrectly and we have also lost
two bytes from destination address.

The fix is to switch the case values in switch statement
when checking the TC field.
Signed-off-by: NJukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1188f054

tipc: fix dereference before check warning · 3db0a197

由 Erik Hugne 提交于 11月 13, 2013

This fixes the following Smatch warning:
net/tipc/link.c:2364 tipc_link_recv_fragment()
    warn: variable dereferenced before check '*head' (see line 2361)

A null pointer might be passed to skb_try_coalesce if
a malicious sender injects orphan fragments on a link.
Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3db0a197

ipv4: fix possible seqlock deadlock · c9e90429

由 Eric Dumazet 提交于 11月 14, 2013

ip4_datagram_connect() being called from process context,
it should use IP_INC_STATS() instead of IP_INC_STATS_BH()
otherwise we can deadlock on 32bit arches, or get corruptions of
SNMP counters.

Fixes: 584bdf8c ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9e90429

net/hsr: Fix possible leak in 'hsr_get_node_status()' · 84a035f6

由 Geyslan G. Bem 提交于 11月 14, 2013

If 'hsr_get_node_data()' returns error, going directly to 'fail' label
doesn't free the memory pointed by 'skb_out'.
Signed-off-by: NGeyslan G. Bem <geyslan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84a035f6

pkt_sched: fq: change classification of control packets · 2abc2f07

由 Maciej Żenczykowski 提交于 11月 14, 2013

Initial sch_fq implementation copied code from pfifo_fast to classify
a packet as a high prio packet.

This clashes with setups using PRIO with say 7 bands, as one of the
band could be incorrectly (mis)classified by FQ.

Packets would be queued in the 'internal' queue, and no pacing ever
happen for this special queue.

Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2abc2f07

genetlink: make all genl_ops users const · 4534de83