Commits · 989e04c5bc3ff77d65e1f0d87bf7904dfa30d41c · openeuler / raspberrypi-kernel

23 Aug, 2014 11 commits

Committed by Yuchung Cheng on Aug 22, 2014

Upon timeout, undo (via both timestamps/Eifel and DSACKs) was
disabled if any retransmits were still in flight.  The concern was
perhaps that spurious retransmission sent in a previous recovery
episode may trigger DSACKs to falsely undo the current recovery.

However, this inadvertently misses undo opportunities (using either
TCP timestamps or DSACKs) when timeout occurs during a loss episode,
i.e.  recurring timeouts or timeout during fast recovery. In these
cases some retransmissions will be in flight but we should allow
undo. Furthermore, we should only reset undo_marker and undo_retrans
upon timeout if we are starting a new recovery episode. Finally,
when we do reset our undo state, we now do so in a manner similar
to tcp_enter_recovery(), so that we require a DSACK for each of
the outstanding retransmissions. This will achieve the original
goal by requiring that we receive the same number of DSACKs as
retransmissions.
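
The DSACK accounting described above can be illustrated with a small userspace model. This is a sketch only, not the kernel code: `struct undo_state`, `init_undo()`, `on_dsack()` and `may_undo()` are hypothetical names, and the `-1` sentinel for "nothing retransmitted yet" is an assumption of this model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the undo fields of a TCP socket. */
struct undo_state {
	uint32_t undo_marker;  /* snd_una when the episode started; 0 = no undo */
	int undo_retrans;      /* DSACKs still needed before undo is allowed */
};

/* Start tracking a new recovery episode. Assumption of this sketch: -1
 * means "no retransmissions outstanding", so the count never reads as
 * zero before any retransmission has been DSACKed. */
static void init_undo(struct undo_state *u, uint32_t snd_una, int retrans_out)
{
	u->undo_marker = snd_una;
	u->undo_retrans = retrans_out ? retrans_out : -1;
}

/* One DSACK arrived for a segment retransmitted in this episode. */
static void on_dsack(struct undo_state *u)
{
	if (u->undo_marker && u->undo_retrans > 0)
		u->undo_retrans--;
}

/* Undo is allowed only once every outstanding retransmission was DSACKed. */
static bool may_undo(const struct undo_state *u)
{
	return u->undo_marker && u->undo_retrans == 0;
}
```

With two retransmissions in flight, the model allows undo only after the second DSACK, which is the "same number of DSACKs as retransmissions" rule.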

This patch increases the undo events by 50% on Google servers.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

989e04c5

net: remove dead code after sk_data_ready change · 884cf705

Committed by Eric Dumazet on Aug 22, 2014

As a followup to commit 676d2369 ("net: Fix use after free by
removing length arg from sk_data_ready callbacks"), we can remove
some useless code in sock_queue_rcv_skb() and rxrpc_queue_rcv_skb().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

884cf705

net: use ktime_get_ns() and ktime_get_real_ns() helpers · d2de875c

Committed by Eric Dumazet on Aug 22, 2014

ktime_get_ns() replaces ktime_to_ns(ktime_get())

ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real())
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2de875c

af_decnet: Use time_after_eq · c0b80236

Committed by Himangi Saraogi on Aug 20, 2014

The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.
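
Why these helpers are more robust can be shown with a userspace sketch. It is an illustration under stated assumptions, not the kernel macros themselves: `tick_after_eq()`, `tick_before()` and `naive_expired()` are hypothetical names, and a 32-bit counter stands in for `jiffies`. The helpers compare through a signed difference, so they stay correct across a counter wrap as long as the two values are less than half the counter range apart.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for time_after_eq()/time_before(), on a 32-bit
 * tick counter instead of the kernel's unsigned long jiffies. */
static bool tick_after_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

static bool tick_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Naive check that ignores counter wraparound. */
static bool naive_expired(uint32_t now, uint32_t deadline)
{
	return now >= deadline;
}
```

If a timer starts at tick 0xFFFFFFF0 with a timeout of 0x20, the deadline wraps to 0x10. At tick 0xFFFFFFF8 the naive comparison already reports the timer as expired, while `tick_after_eq()` does not.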

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2,E3;
@@
- jiffies - E1 >= (E2*E3)
+ time_after_eq(jiffies, E1+E2*E3)
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0b80236

decnet: Use time_after_eq · 8b1b1eb5

Committed by Himangi Saraogi on Aug 20, 2014

The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2;
@@
- (jiffies - E1) >= E2
+ time_after_eq(jiffies, E1+E2)
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b1b1eb5

ipconfig: Use time_before · c72c95a0

Committed by Himangi Saraogi on Aug 20, 2014

The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2;
@@
- jiffies - E1 < E2
+ time_before(jiffies, E1+E2)
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

c72c95a0

dn_dev: Use time_before · b5c5c36d

Committed by Himangi Saraogi on Aug 20, 2014

The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2;
@@

(
- (jiffies - E1) < E2
+ time_before(jiffies, E1+E2)
)
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5c5c36d

br_multicast: Replace rcu_assign_pointer() with RCU_INIT_POINTER() · 0932997e

Committed by Andreea-Cristina Bernat on Aug 22, 2014

The use of "rcu_assign_pointer()" is NULLing out the pointer.
According to RCU_INIT_POINTER()'s block comment:
"1.   This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a
smaller overhead.
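
The difference in overhead can be pictured with a userspace analogy built on C11 atomics. This is a sketch under assumptions, not the kernel implementation: `slot_publish()` models `rcu_assign_pointer()` as a release store, `slot_clear()` models `RCU_INIT_POINTER(..., NULL)` as a relaxed store, and `struct item` and `slot` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

struct item { int value; };

static _Atomic(struct item *) slot;

/* Analogue of rcu_assign_pointer(): release ordering makes the
 * initialisation of *p visible before the pointer itself. */
static void slot_publish(struct item *p)
{
	atomic_store_explicit(&slot, p, memory_order_release);
}

/* Analogue of RCU_INIT_POINTER(slot, NULL): readers cannot dereference
 * NULL, so no ordering against earlier stores is required. */
static void slot_clear(void)
{
	atomic_store_explicit(&slot, (struct item *)NULL, memory_order_relaxed);
}
```

Because a NULL store publishes no data that a reader could follow, the cheaper store is sufficient in that one case.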

The following Coccinelle semantic patch was used:
@@
@@

- rcu_assign_pointer
+ RCU_INIT_POINTER
  (..., NULL)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0932997e

net/openvswitch/flow.c: Replace rcu_dereference() with rcu_access_pointer() · 8c6b00c8

Committed by Andreea-Cristina Bernat on Aug 17, 2014

The "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacement.
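
The distinction between testing a pointer and dereferencing it can be sketched in userspace with C11 atomics. The mapping is an assumption made for illustration: `table_present()` stands in for an `rcu_access_pointer()` use, `table_entries()` for an `rcu_dereference()` use, and `struct table` and `active_table` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct table { int n_entries; };

static _Atomic(struct table *) active_table;

/* Analogue of rcu_access_pointer(): only the pointer value is tested. */
static bool table_present(void)
{
	return atomic_load_explicit(&active_table, memory_order_relaxed) != NULL;
}

/* Analogue of rcu_dereference(): the result is dereferenced, so the load
 * must be ordered before the accesses made through it. */
static int table_entries(void)
{
	struct table *t = atomic_load_explicit(&active_table, memory_order_acquire);

	return t ? t->n_entries : 0;
}
```

A condition such as `if (table_present())` never touches the pointed-to memory, which is the situation the semantic patch looks for.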

The following Coccinelle semantic patch was used:
@@
@@

(
 if(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
|
 while(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8c6b00c8

net/ipv4/igmp.c: Replace rcu_dereference() with rcu_access_pointer() · e6b68883

Committed by Andreea-Cristina Bernat on Aug 17, 2014

The "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacement.

The following Coccinelle semantic patch was used:
@@
@@

(
 if(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
|
 while(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6b68883

ipv4: Restore accept_local behaviour in fib_validate_source() · 1dced6a8

Committed by Sébastien Barré on Aug 17, 2014

Commit 7a9bc9b8 ("ipv4: Elide fib_validate_source() completely when possible.")
introduced a short-circuit to avoid calling fib_validate_source when not
needed. That change took rp_filter into account, but not accept_local.
This resulted in a change of behaviour: with rp_filter and accept_local
off, incoming packets with a local address in the source field should be
dropped.

Here is how to reproduce the change pre/post 7a9bc9b8 commit:
- Configure the same IPv4 address on hosts A and B.
- Try to send an ARP request from B to A.
- The ARP request will be dropped before that commit, but accepted and answered
  after that commit.

This adds a check for ACCEPT_LOCAL, to maintain full
fib validation in case it is 0. We also leave __fib_validate_source() earlier
when possible, based on the same check as fib_validate_source(), once the
accept_local stuff is verified.

Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

1dced6a8

15 Aug, 2014 6 commits

netlink: Annotate RCU locking for seq_file walker · 9ce12eb1

Committed by Thomas Graf on Aug 13, 2014

Silences the following sparse warnings:
net/netlink/af_netlink.c:2926:21: warning: context imbalance in 'netlink_seq_start' - wrong count at exit
net/netlink/af_netlink.c:2972:13: warning: context imbalance in 'netlink_seq_stop' - unexpected unlock
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ce12eb1

tcp: fix ssthresh and undo for consecutive short FRTO episodes · 0c9ab092

Committed by Neal Cardwell on Aug 14, 2014

Fix TCP FRTO logic so that it always notices when snd_una advances,
indicating that any RTO after that point will be a new and distinct
loss episode.

Previously there was a very specific sequence that could cause FRTO to
fail to notice a new loss episode had started:

(1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue
(2) receiver ACKs packet 1
(3) FRTO sends 2 more packets
(4) RTO timer fires again (should start a new loss episode)

The problem was in step (3) above, where tcp_process_loss() returned
early (in the spot marked "Step 2.b"), so that it never got to the
logic to clear icsk_retransmits. Thus icsk_retransmits stayed
non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero
icsk_retransmits, decide that this RTO is not a new episode, and
decide not to cut ssthresh and remember the current cwnd and ssthresh
for undo.
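
The sequence above can be replayed with a toy model. Everything in it is a simplification assumed for illustration: `struct conn`, `on_rto()` and `on_ack_advance()` are hypothetical, `retransmits` stands in for `icsk_retransmits`, and halving `cwnd` is a placeholder for whatever the congestion control module would choose as the new ssthresh.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the state involved. */
struct conn {
	unsigned cwnd, ssthresh;
	unsigned prior_cwnd, prior_ssthresh; /* saved for a later undo */
	unsigned retransmits;                /* stands in for icsk_retransmits */
};

/* RTO fires: only a new loss episode saves undo state and cuts ssthresh. */
static void on_rto(struct conn *c)
{
	if (c->retransmits == 0) {
		c->prior_cwnd = c->cwnd;
		c->prior_ssthresh = c->ssthresh;
		c->ssthresh = c->cwnd / 2 > 2 ? c->cwnd / 2 : 2;
	}
	c->cwnd = 1;
	c->retransmits++;
}

/* An ACK advanced snd_una. With fixed == false the counter is left
 * untouched, which models the early return described above. */
static void on_ack_advance(struct conn *c, unsigned new_cwnd, bool fixed)
{
	c->cwnd = new_cwnd;
	if (fixed)
		c->retransmits = 0;
}
```

Running RTO, ACK, RTO through the model with `fixed == false` leaves ssthresh at the value from the first RTO and keeps the undo state from before the first RTO; with `fixed == true` the second RTO is treated as a new episode.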

There were two main consequences to the bug that we have
observed. First, ssthresh was not decreased in step (4). Second, when
there was a series of such FRTO (1-4) sequences that happened to be
followed by an FRTO undo, we would restore the cwnd and ssthresh from
before the entire series started (instead of the cwnd and ssthresh
from before the most recent RTO). This could result in cwnd and
ssthresh being restored to values much bigger than the proper values.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: e33099f9 ("tcp: implement RFC5682 F-RTO")
Signed-off-by: David S. Miller <davem@davemloft.net>

0c9ab092

tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic · a26552af

Committed by Hannes Frederic Sowa on Aug 14, 2014

tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
ordering of incoming connections and teardowns without the need to
hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
the last measured RTO. To do so, we keep the last seen timestamp in a
per-host indexed data structure and verify if the incoming timestamp
in a connection request is strictly greater than the saved one during
last connection teardown. Thus we can verify later on that no old data
packets will be accepted by the new connection.

While moving a socket to time-wait state we already verify whether timestamps
were seen on a connection. Only if that was the case do we let the
time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN
will be used. But we don't verify this on incoming SYN packets. If a
connection teardown was less than TCP_PAWS_MSL seconds in the past we
cannot guarantee to not accept data packets from an old connection if
no timestamps are present. We should drop this SYN packet. This patch
closes this loophole.
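
The check described above can be sketched as a small decision function. It follows the wording of this message rather than the kernel source: `struct peer_record`, `accept_syn()` and the value of `PAWS_MSL_SECS` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-host record kept at the last connection teardown. */
struct peer_record {
	uint32_t last_ts;        /* last TCP timestamp seen from the host */
	uint32_t teardown_time;  /* seconds, local clock */
};

#define PAWS_MSL_SECS 60u /* assumed value for this sketch */

/* Decide whether an incoming SYN from a known host may be accepted. */
static bool accept_syn(const struct peer_record *p, uint32_t now,
		       bool syn_has_ts, uint32_t syn_ts)
{
	if (now - p->teardown_time >= PAWS_MSL_SECS)
		return true;            /* record too old to matter */
	if (!syn_has_ts)
		return false;           /* the case this patch closes */
	return (int32_t)(syn_ts - p->last_ts) > 0;
}
```

The second branch is the new one: a SYN without timestamps that arrives shortly after a teardown is dropped instead of being let through.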

Please note, this patch does not make tcp_tw_recycle in any way more
usable but only adds another safety check:
Sporadic drops of SYN packets because of reordering in the network or
in the socket backlog queues can happen. Users behind NAT trying to
connect to a tcp_tw_recycle enabled server can get caught in blackholes
and their connection requests may regularly get dropped because hosts
behind an address translator don't have synchronized tcp timestamp clocks.
tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.

In general, use of tcp_tw_recycle is discouraged.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a26552af

tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() · 4fab9071

Committed by Neal Cardwell on Aug 14, 2014

Make sure we use the correct address-family-specific function for
handling MTU reductions from within tcp_release_cb().

Previously AF_INET6 sockets were incorrectly always using the IPv6
code path when sometimes they were handling IPv4 traffic and thus had
an IPv4 dst.
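
A toy model of the two dispatch strategies makes the bug visible. It is a sketch with hypothetical types (`struct af_ops`, a reduced `struct sock`) and hypothetical handlers that merely return 4 or 6; it does not reproduce the kernel structures.

```c
#include <stdbool.h>

struct sock;

/* Hypothetical per-address-family operations table. */
struct af_ops {
	int (*mtu_reduced)(struct sock *sk);
};

struct sock {
	bool is_af_inet6;           /* socket family */
	const struct af_ops *ops;   /* follows the traffic actually carried */
};

static int v4_mtu_reduced(struct sock *sk) { (void)sk; return 4; }
static int v6_mtu_reduced(struct sock *sk) { (void)sk; return 6; }

static const struct af_ops ipv4_ops = { v4_mtu_reduced };
static const struct af_ops ipv6_ops = { v6_mtu_reduced };

/* Old behaviour in this model: choose by socket family alone. */
static int release_by_family(struct sock *sk)
{
	return sk->is_af_inet6 ? v6_mtu_reduced(sk) : v4_mtu_reduced(sk);
}

/* New behaviour in this model: dispatch through the operations table. */
static int release_by_ops(struct sock *sk)
{
	return sk->ops->mtu_reduced(sk);
}
```

For an AF_INET6 socket that carries IPv4 traffic, `release_by_family()` picks the IPv6 handler while `release_by_ops()` picks the IPv4 one.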
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Willem de Bruijn <willemb@google.com>
Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications")
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4fab9071

sit: Fix ipip6_tunnel_lookup device matching criteria · bc8fc7b8

Committed by Shmulik Ladkani on Aug 14, 2014

As of 4fddbf5d ("sit: strictly restrict incoming traffic to tunnel link device"),
when looking up a tunnel, tunnel's underlying interface (t->parms.link)
is verified to match incoming traffic's ingress device.

However the comparison was incorrectly based on skb->dev->iflink.

Instead, dev->ifindex should be used, which correctly represents the
interface from which the IP stack hands the ipip6 packets.
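
The effect of comparing against the wrong field can be shown with a reduced device structure. This is an illustrative sketch: `struct dev`, `match_iflink()` and `match_ifindex()` are hypothetical, and it assumes that a vlan device reports its lower device's index in `iflink`.

```c
#include <stdbool.h>

/* Minimal stand-in for the two net_device fields involved. */
struct dev {
	int ifindex; /* this device's own index */
	int iflink;  /* index of the underlying (lower) device */
};

/* Match before the fix: compares the tunnel link with iflink. */
static bool match_iflink(int tunnel_link, const struct dev *in)
{
	return tunnel_link == in->iflink;
}

/* Match after the fix: compares the tunnel link with ifindex. */
static bool match_ifindex(int tunnel_link, const struct dev *in)
{
	return tunnel_link == in->ifindex;
}
```

For a physical device both fields agree, so either test passes. For a vlan device with `ifindex` 5 on top of a device with index 2, a tunnel bound to link 5 only matches with the `ifindex` comparison.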

This allows setting up sit tunnels bound to vlan interfaces (otherwise
incoming ipip6 traffic on the vlan interface was dropped due to
ipip6_tunnel_lookup match failure).
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc8fc7b8

tcp: don't use timestamp from repaired skb-s to calculate RTT (v2) · 9d186cac

Committed by Andrey Vagin on Aug 13, 2014

We don't know the right timestamp for repaired skb-s. Wrong RTT estimations
aren't good, because some congestion modules heavily depend on them.

This patch adds the TCPCB_REPAIRED flag, which is included in
TCPCB_RETRANS.

Thanks to Eric for the advice on how to fix this issue.

This patch fixes the warning:
[  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
[  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
[  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
[  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
[  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
[  879.569982] Call Trace:
[  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
[  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
[  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
[  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
[  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
[  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
[  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
[  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
[  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
[  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
[  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
[  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
[  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
[  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---

v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit,
    where it's set for other skb-s.

Fixes: 431a9124 ("tcp: timestamp SYN+DATA messages")
Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d186cac

14 Aug, 2014 6 commits

net-timestamp: fix missing tcp fragmentation cases · 490cc7d0

Committed by Willem de Bruijn on Aug 12, 2014

Bytestream timestamps are correlated with a single byte in the skbuff,
recorded in skb_shinfo(skb)->tskey. When fragmenting skbuffs, ensure
that the tskey is set for the fragment in which the tskey falls
(seqno <= tskey < end_seqno).

The original implementation did not address fragmentation in
tcp_fragment or tso_fragment. Add code to inspect the sequence numbers
and move both tskey and the relevant tx_flags if necessary.
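
The rule for which fragment keeps the timestamp request can be sketched as follows. The structure and function names (`struct buf`, `split_buf()`, `seq_before()`) are hypothetical, and a single `want_tstamp` flag stands in for the relevant tx_flags bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical, reduced view of a buffer carrying a timestamp request. */
struct buf {
	uint32_t seq, end_seq;  /* covers [seq, end_seq) */
	bool want_tstamp;
	uint32_t tskey;         /* byte the timestamp is correlated with */
};

/* Split 'first' at 'split_seq'; the request follows the byte it refers to. */
static void split_buf(struct buf *first, struct buf *second, uint32_t split_seq)
{
	second->seq = split_seq;
	second->end_seq = first->end_seq;
	first->end_seq = split_seq;
	second->want_tstamp = false;
	second->tskey = 0;

	if (first->want_tstamp && !seq_before(first->tskey, split_seq)) {
		second->want_tstamp = true;
		second->tskey = first->tskey;
		first->want_tstamp = false;
		first->tskey = 0;
	}
}
```

Splitting a buffer covering [100, 200) with tskey 150 at sequence 120 moves the request to the second fragment; with tskey 110 it stays on the first.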
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

490cc7d0

net-timestamp: fix missing ACK timestamp · 712a7221

Committed by Willem de Bruijn on Aug 12, 2014

ACK timestamps are generated in tcp_clean_rtx_queue. The TSO datapath
can break out early, causing the timestamp code to be skipped. Move
the code up before the break.
Reported-by: David S. Miller <davem@davemloft.net>

Also fix a boundary condition: tp->snd_una is the next unacknowledged
byte and between tests inclusive (a <= b <= c), so generate an ACK
timestamp if (prior_snd_una <= tskey <= tp->snd_una - 1).
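
The boundary condition can be checked with a wrap-safe inclusive range test. This is a userspace sketch: `seq_between()` and `ack_covers_tskey()` are hypothetical helpers, and the sketch assumes `snd_una` has advanced past `prior_snd_una`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive, wrap-safe test for lo <= x <= hi on 32-bit sequence numbers. */
static bool seq_between(uint32_t x, uint32_t lo, uint32_t hi)
{
	return (uint32_t)(hi - lo) >= (uint32_t)(x - lo);
}

/* snd_una is the next unacknowledged byte, so the last byte covered by
 * the ACK is snd_una - 1. */
static bool ack_covers_tskey(uint32_t prior_snd_una, uint32_t snd_una,
			     uint32_t tskey)
{
	return seq_between(tskey, prior_snd_una, snd_una - 1);
}
```

With `prior_snd_una` 100 and `snd_una` 200, a tskey of 199 is covered and a tskey of 200 is not.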
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

712a7221

irda: Fix rd_frame control field initialization in irlap_send_rd_frame() · efd50290

Committed by Maks Naumov on Aug 12, 2014

Signed-off-by: Maks Naumov <maksqwe1@ukr.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

efd50290

lec: Fix bug introduced by · 8356f9d5

Committed by chas williams - CONTRACTOR on Aug 12, 2014

b67bfe0d (hlist: drop the node
parameter from iterators) dropped the node parameter from
iterators which lec_tbl_walk() was using to iterate the list.
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

8356f9d5

atm/svc: Fix blocking in wait loop · de713b57

Committed by chas williams - CONTRACTOR on Aug 12, 2014

One should not call blocking primitives inside a wait loop, since both
require task_struct::state to sleep, so the inner will destroy the
outer state.

sigd_enq() will possibly sleep for alloc_skb().  Move sigd_enq() before
prepare_to_wait() to avoid sleeping while waiting interruptibly.  You do
not actually need to call sigd_enq() after the initial prepare_to_wait()
because we test the termination condition before calling schedule().

Based on suggestions from Peter Zijlstra.
Signed-off-by: Chas Williams <chas@cmf.n4rl.navy.mil>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

de713b57

openvswitch: Fix memory leak in ovs_vport_alloc() error path · 3791b3f6

Committed by Christoph Jaeger on Aug 12, 2014

ovs_vport_alloc() bails out without freeing the memory 'vport' points to.

Picked up by Coverity - CID 1230503.

Fixes: 5cd667b0 ("openvswitch: Allow each vport to have an array of 'port_id's.")
Signed-off-by: Christoph Jaeger <cj@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3791b3f6

12 Aug, 2014 1 commit

net: Always untag vlan-tagged traffic on input. · 0d5501c1

Committed by Vlad Yasevich on Aug 08, 2014

Currently the functionality to untag traffic on input resides
as part of the vlan module and is built only when VLAN support
is enabled in the kernel.  When VLAN is disabled, the function
vlan_untag() turns into a stub and doesn't really untag the
packets.  This seems to create an interesting interaction
between VMs supporting checksum offloading and some network drivers.

There are some drivers that do not allow the user to change
tx-vlan-offload feature of the driver.  These drivers also seem
to assume that any VLAN-tagged traffic they transmit will
have the vlan information in the vlan_tci and not in the vlan
header already in the skb.  When transmitting skbs that already
have tagged data with partial checksum set, the checksum doesn't
appear to be updated correctly by the card thus resulting in a
failure to establish TCP connections.

The following is a packet trace taken on the receiver where a
sender is a VM with a VLAN configured.  The host the VM is running on
does not have VLAN support and the outgoing interface on the
host is tg3:
10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
(0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
offset 0, flags [DF], proto TCP (6), length 60)
    10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
-> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
4294837885 ecr 0,nop,wscale 7], length 0
10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
(0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
offset 0, flags [DF], proto TCP (6), length 60)
    10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
-> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
4294838888 ecr 0,nop,wscale 7], length 0

This connection finally times out.

I've only access to the TG3 hardware in this configuration thus have
only tested this with TG3 driver.  There are a lot of other drivers
that do not permit user changes to vlan acceleration features, and
I don't know if they all suffer from a similar issue.

The patch attempts to fix this another way.  It moves the vlan header
stripping code out of the vlan module and always builds it into the
kernel network core.  This way, even if vlan is not supported on
a virtualization host, the virtual machines running on top of such
a host will still work with VLANs enabled.

CC: Patrick McHardy <kaber@trash.net>
CC: Nithin Nayak Sujir <nsujir@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d5501c1

09 Aug, 2014 3 commits

libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly · 5f740d7e

Committed by Ilya Dryomov on Aug 08, 2014

Determining ->last_piece based on the value of ->page_offset + length
is incorrect because length here is the length of the entire message.
->last_piece is set to false even if page array data item length is <=
PAGE_SIZE, which results in invalid length passed to
ceph_tcp_{send,recv}page() and causes various asserts to fire.
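
The miscalculation can be reproduced with two small predicates. They are a sketch, not the libceph code: the function names, the explicit `item_length` parameter and the 4096-byte page size are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Before: 'msg_length' is the length of the entire message. */
static bool last_piece_old(size_t page_offset, size_t msg_length)
{
	return page_offset + msg_length <= SKETCH_PAGE_SIZE;
}

/* After: only the bytes remaining in this page-array data item count. */
static bool last_piece_new(size_t page_offset, size_t msg_length,
			   size_t item_length)
{
	size_t resid = msg_length < item_length ? msg_length : item_length;

	return page_offset + resid <= SKETCH_PAGE_SIZE;
}
```

For a 512-byte data item inside a message of 4 MiB plus 512 bytes, the old predicate reports that more pieces follow, while the new one recognises the item as the last piece.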

    # cat pages-cursor-init.sh
    #!/bin/bash
    rbd create --size 10 --image-format 2 foo
    FOO_DEV=$(rbd map foo)
    dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null
    rbd snap create foo@snap
    rbd snap protect foo@snap
    rbd clone foo@snap bar
    # rbd_resize calls librbd rbd_resize(), size is in bytes
    ./rbd_resize bar $(((4 << 20) + 512))
    rbd resize --size 10 bar
    BAR_DEV=$(rbd map bar)
    # trigger a 512-byte copyup -- 512-byte page array data item
    dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5

The problem exists only in ceph_msg_data_pages_cursor_init(),
ceph_msg_data_pages_advance() does the right thing.  The size_t cast is
unnecessary.

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

5f740d7e

rtnetlink: fix VF info size · 945a3676

Committed by Jiri Benc on Aug 08, 2014

Commit 1d8faf48 ("net/core: Add VF link state control") added a new
attribute to the IFLA_VF_INFO group in rtnl_fill_ifinfo but did not adjust the
size of the allocated memory in if_nlmsg_size/rtnl_vfinfo_size. As a result, we
may trigger warnings in rtnl_getlink and similar functions when many VF
links are enabled, as the information does not fit into the allocated skb.

Fixes: 1d8faf48 ("net/core: Add VF link state control")
Reported-by: Yulong Pei <ypei@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

945a3676

ipv4: removed redundant conditional · b7a71b51

Committed by Niv Yehezkel on Aug 08, 2014

Since fib_lookup can no longer return ESRCH,
checking for this error code is no longer necessary.
Signed-off-by: Niv Yehezkel <executerx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b7a71b51

08 Aug, 2014 8 commits

netfilter: nf_tables: fix error return code · 609ccf08

Committed by Julia Lawall on Aug 07, 2014

Convert a zero return value on error to a negative one, as returned
elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret; expression e1,e2;
@@
(
if (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

609ccf08

netfilter: don't use mutex_lock_interruptible() · 7926dbfa

Committed by Pablo Neira Ayuso on Jul 31, 2014

Eric Dumazet reports that getsockopt() or setsockopt() sometimes
returns -EINTR instead of -ENOPROTOOPT, causing headaches to
application developers.

This patch replaces all the mutex_lock_interruptible() by mutex_lock()
in the netfilter tree, as there is no reason we should sleep for a
long time there.
Reported-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Julian Anastasov <ja@ssi.bg>

7926dbfa

netfilter: nf_tables: don't update chain with unset counters · b88825de

Committed by Pablo Neira Ayuso on Aug 05, 2014

Fix possible replacement of the per-cpu chain counters by null
pointer when updating an existing chain in the commit path.
Reported-by: Matteo Croce <technoboy85@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b88825de

netfilter: nf_tables: uninitialize element key/data from the commit path · a3716e70

Committed by Pablo Neira Ayuso on Aug 01, 2014

This should happen once the element has been effectively released in
the commit path, not before. This fixes a possible chain refcount leak
if the transaction is aborted.
Reported-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

a3716e70

netlink: reset network header before passing to taps · 4e48ed88

Committed by Daniel Borkmann on Aug 07, 2014

netlink doesn't set any network header offset thus when the skb is
being passed to tap devices via dev_queue_xmit_nit(), it emits klog
false positives due to it being unset like:

  ...
  [  124.990397] protocol 0000 is buggy, dev nlmon0
  [  124.990411] protocol 0000 is buggy, dev nlmon0
  ...

So just reset the network header before passing to the device; for
packet sockets that just means nothing will change - mac and net
offset hold the same value just as before.
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e48ed88

batman: fix duplicate #include of multicast.h · 0a4dd0d7

Committed by Jean Sacren on Aug 07, 2014

The header multicast.h was included twice, so delete one of them.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Cc: b.a.t.m.a.n@lists.open-mesh.org
Signed-off-by: David S. Miller <davem@davemloft.net>

0a4dd0d7

openvswitch: fix duplicate #include headers · 2072ec84

Committed by Jean Sacren on Aug 07, 2014

The #include headers net/genetlink.h and linux/genetlink.h were both
included twice, so delete the duplicate of each.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: dev@openvswitch.org
Signed-off-by: David S. Miller <davem@davemloft.net>

2072ec84

6lowpan: Allow 6LoWPAN to be modular · 2d177f31

Committed by Geert Uytterhoeven on Aug 07, 2014

Change config symbol 6LOWPAN from type bool to type tristate, so
6LoWPAN can be built modular, just like IPV6.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d177f31

07 Aug, 2014 5 commits

netlink: hold nl_sock_hash_lock during diag dump · 6c8f7e70

Committed by Thomas Graf on Aug 07, 2014

Although RCU protection would be possible during diag dump, doing
so allows for concurrent table mutations which can render the
in-table offset between individual Netlink messages invalid and
thus cause legitimate sockets to be skipped in the dump.

Since the diag dump is relatively low volume and consistency is
more important than performance, the table mutex is held during
dump.
Reported-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Fixes: e341694e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Signed-off-by: David S. Miller <davem@davemloft.net>

6c8f7e70

list: fix order of arguments for hlist_add_after(_rcu) · 1d023284

Committed by Ken Helias on Aug 06, 2014

All other add functions for lists have the new item as first argument
and the position where it is added as second argument.  This was changed
for no good reason in this function and makes using it unnecessarily
confusing.

The name was changed to hlist_add_behind() to cause unconverted code to
generate a compile error instead of using the wrong parameter order.
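
The new argument order can be seen in a userspace copy of the helper. This is a sketch for illustration: the node structure is reduced to the two link fields, and the function body is written here from the usual hlist insertion steps rather than quoted from the patch.

```c
#include <stddef.h>

struct hlist_node {
	struct hlist_node *next, **pprev;
};

/* New item first, existing position second, like the other add helpers. */
static void hlist_add_behind(struct hlist_node *n, struct hlist_node *prev)
{
	n->next = prev->next;
	prev->next = n;
	n->pprev = &prev->next;
	if (n->next)
		n->next->pprev = &n->next;
}
```

A call such as `hlist_add_behind(&b, &a)` therefore reads as "add `b` behind `a`". Because both parameters have the same type, swapping them would compile silently, which is why the rename is used to flag unconverted callers.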

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Ken Helias <kenhelias@firemail.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[intel driver bits]
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1d023284

tcp: md5: check md5 signature without socket lock · 9ea88a15

Committed by Dmitry Popov on Aug 07, 2014

Since a8afca03 (tcp: md5: protects md5sig_info with RCU) tcp_md5_do_lookup
doesn't require socket lock, rcu_read_lock is enough. Therefore socket lock is
no longer required for tcp_v{4,6}_inbound_md5_hash too, so we can move these
calls (wrapped with rcu_read_{,un}lock) before bh_lock_sock:
from tcp_v{4,6}_do_rcv to tcp_v{4,6}_rcv.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ea88a15

net-timestamp: cumulative tcp timestamping fixes · f066e2b0

Committed by Willem de Bruijn on Aug 06, 2014

A set of small fixes pointed out just after the merge:
- make tcp_tx_timestamp static
- make tcp_gso_tstamp static
- use before() to compare TCP seqno, instead of cast to u64
- add tstamp to tx_flags in GSO, instead of overwrite tx_flags
- record skb_shinfo(skb)->tskey for all timestamps, also HW.
- optimization in tcp_tx_timestamp:
  call sock_tx_timestamp only if a tstamp option is set.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Fixes: 4ed2d765 ("net-timestamp: TCP timestamping")
Signed-off-by: David S. Miller <davem@davemloft.net>

f066e2b0

net-timestamp: sock_tx_timestamp() fix · 140c55d4

Committed by Eric Dumazet on Aug 06, 2014

sock_tx_timestamp() should not ignore initial *tx_flags value, as TCP
stack can store SKBTX_SHARED_FRAG in it.
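
The difference between overwriting and preserving the caller's flags can be sketched as follows. The two functions and the flag values are hypothetical; the flag names only echo the kernel ones, and the numeric values are assumptions of this sketch.

```c
#include <stdint.h>

/* Illustrative flag values; the names mirror kernel flags but the
 * numbers here are assumptions of this sketch. */
#define TX_SW_TSTAMP   (1u << 1)
#define TX_SHARED_FRAG (1u << 5)

/* Before: the initial value of *tx_flags is discarded. */
static void tx_timestamp_overwrite(int want_sw, uint8_t *tx_flags)
{
	*tx_flags = 0;
	if (want_sw)
		*tx_flags |= TX_SW_TSTAMP;
}

/* After: bits already set by the caller survive. */
static void tx_timestamp_preserve(int want_sw, uint8_t *tx_flags)
{
	if (want_sw)
		*tx_flags |= TX_SW_TSTAMP;
}
```

If the caller has already stored `TX_SHARED_FRAG`, the first variant silently drops it while the second keeps both bits set.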

Also first argument (struct sock *) can be const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 4ed2d765 ("net-timestamp: TCP timestamping")
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
140c55d4