Commits · cffaba15cd95d4a16eb5a6aa5c22a79f67d555ab · openeuler / raspberrypi-kernel

22 Mar, 2012 15 commits

ceph: ensure Boolean options support both senses · cffaba15

Authored by Alex Elder on Feb 15, 2012

Many ceph-related Boolean options offer the ability to both enable
and disable a feature.  For all those that don't offer this, add
a new option so that they do.

Note that ceph_show_options()--which reports mount options currently
in effect--only reports the option if it is different from the
default value.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
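
As a rough illustration of "both senses" (not the actual ceph option table), the sketch below accepts a bare option name to enable a feature and the same name with a "no" prefix to disable it. It also reports an option only when it differs from its default, as the note about ceph_show_options() describes. The struct, the function names and the prefix handling are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical Boolean option: a name, its default, and its current value. */
struct bool_opt {
	const char *name;
	bool def;
	bool value;
};

/*
 * Accept "name" (enable) or "noname" (disable).
 * Returns 0 on a match and -1 if the token names no known option.
 */
static int parse_bool_token(struct bool_opt *opts, size_t count,
			    const char *token)
{
	for (size_t i = 0; i < count; i++) {
		if (strcmp(token, opts[i].name) == 0) {
			opts[i].value = true;
			return 0;
		}
		if (strncmp(token, "no", 2) == 0 &&
		    strcmp(token + 2, opts[i].name) == 0) {
			opts[i].value = false;
			return 0;
		}
	}
	return -1;
}

/* Report an option only when it differs from its default. */
static bool should_show(const struct bool_opt *opt)
{
	return opt->value != opt->def;
}
```

With every option offering both senses, a user can always override a default in either direction, whichever way the default happens to point.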

libceph: a few small changes · d3002b97

Authored by Alex Elder on Feb 14, 2012

This gathers a number of very minor changes:
    - use %hu when formatting a socket address's address family
    - null out the ceph_msgr_wq pointer after the queue has been
      destroyed
    - drop a needless cast in ceph_write_space()
    - add a WARN() call in ceph_state_change() in the event an
      unrecognized socket state is encountered
    - rearrange the logic in ceph_con_get() and ceph_con_put() so
      that:
        - the reference counts are only atomically read once
        - the values displayed via dout() calls are known to
          be meaningful at the time they are formatted
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: make ceph_tcp_connect() return int · 41617d0c

Authored by Alex Elder on Feb 14, 2012

There is no real need for ceph_tcp_connect() to return the socket
pointer it creates, since it already assigns it to con->sock, which
is visible to the caller.  Instead, have it return an error code,
which tidies things up a bit.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: encapsulate some messenger cleanup code · 6173d1f0

Authored by Alex Elder on Feb 14, 2012

Define a helper function to perform various cleanup operations.  Use
it both in the exit routine and in the init routine in the event of
an error.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: make ceph_msgr_wq private · e0f43c94

Authored by Alex Elder on Feb 14, 2012

The messenger workqueue has no need to be public.  So give it static
scope.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: encapsulate connection kvec operations · 859eb799

Authored by Alex Elder on Feb 14, 2012

Encapsulate the operation of adding a new chunk of data to the next
open slot in a ceph_connection's out_kvec array.  Also add a "reset"
operation to make subsequent add operations start at the beginning
of the array again.

Use these routines throughout, avoiding duplicate code and ensuring
all calls are handled consistently.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
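
The "add" and "reset" operations described in this commit might look roughly like the user-space sketch below. It is not the kernel code: struct chunk stands in for struct kvec, and struct conn, its field names, the function names and the capacity of 8 are all assumptions for illustration.

```c
#include <stddef.h>

#define OUT_KVEC_MAX 8	/* assumed capacity for this sketch */

/* Stand-in for the kernel's struct kvec. */
struct chunk {
	void *base;
	size_t len;
};

/* Hypothetical, trimmed-down connection holding an outgoing chunk array. */
struct conn {
	struct chunk out_kvec[OUT_KVEC_MAX];
	int out_kvec_used;	/* slots filled so far */
	size_t out_kvec_bytes;	/* total bytes queued */
};

/* Make subsequent adds start at the beginning of the array again. */
static void out_kvec_reset(struct conn *con)
{
	con->out_kvec_used = 0;
	con->out_kvec_bytes = 0;
}

/* Put one chunk in the next open slot; returns -1 if the array is full. */
static int out_kvec_add(struct conn *con, size_t size, void *data)
{
	int index = con->out_kvec_used;

	if (index >= OUT_KVEC_MAX)
		return -1;
	con->out_kvec[index].base = data;
	con->out_kvec[index].len = size;
	con->out_kvec_used++;
	con->out_kvec_bytes += size;
	return 0;
}
```

Keeping the slot index and the byte total behind two helpers means no caller can update one and forget the other, which is the consistency the commit message is after.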

libceph: move prepare_write_banner() · 963be4d7

Authored by Alex Elder on Feb 14, 2012

One of the arguments to prepare_write_connect() indicates whether it
is being called immediately after a call to prepare_write_banner().
Move the prepare_write_banner() call inside prepare_write_connect(),
and reinterpret (and rename) the "after_banner" argument so it
indicates that prepare_write_connect() should *make* the call
rather than should know it has already been made.

This was split out from the next patch to highlight this change in
logic.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

rbd: make ceph_parse_options() return a pointer · ee57741c

Authored by Alex Elder on Jan 24, 2012

ceph_parse_options() takes the address of a pointer as an argument
and uses it to return the address of an allocated structure if
successful.  With this interface it is not evident at call sites that
the pointer is always initialized.  Change the interface to return
the address instead (or a pointer-coded error code) to make the
validity of the returned pointer obvious.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
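
A "pointer-coded error code" can be imitated in user space as sketched below. The helpers mirror the idea behind the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), but they are re-created here for illustration; struct options and parse_options() are hypothetical stand-ins, not the real ceph_parse_options().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095	/* error codes occupy the top 4095 addresses */

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error-coded pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer carries an error code rather than an address. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical options structure. */
struct options {
	int flags;
};

/* Returns an allocated structure, or an error-coded pointer on failure. */
static struct options *parse_options(const char *text)
{
	struct options *opt;

	if (!text)
		return err_ptr(-EINVAL);
	opt = calloc(1, sizeof(*opt));
	if (!opt)
		return err_ptr(-ENOMEM);
	return opt;
}
```

A caller then checks the single return value with is_err() before use, so there is no separate output parameter that might be left uninitialized.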

ceph: eliminate some abusive casts · 99f0f3b2

Authored by Alex Elder on Jan 23, 2012

This fixes some spots where a type cast to (void *) was used as
a universal type hiding mechanism.  Instead, properly cast the
type to the intended target type.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: eliminate some needless casts · bd406145

Authored by Alex Elder on Jan 23, 2012

This eliminates type casts in some places where they are not
required.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: kill addr_str_lock spinlock; use atomic instead · f64a9317

Authored by Alex Elder on Jan 23, 2012

A spinlock is used to protect a value used for selecting an array
index for a string used for formatting a socket address for human
consumption.  The index is reset to 0 if it ever reaches the maximum
index value.

Instead, use an ever-increasing atomic variable as a sequence
number, and compute the array index by masking off all but the
sequence number's lowest bits.  Make the number of entries in the
array a power of two to allow the use of such a mask (to avoid jumps
in the index value when the sequence number wraps).

The length of these strings is somewhat arbitrarily set at 60 bytes.
The worst-case length of a string produced is 54 bytes, for an IPv6
address that can't be shortened, e.g.:
    [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767
Change it so we arbitrarily use 64 bytes instead; if nothing else
it will make the array of these line up better in hex dumps.

Rename a few things to reinforce the distinction between the number
of strings in the array and the length of individual strings.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
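
The lock-free index selection can be sketched in user space with C11 atomics, as below. The macro and variable names and the choice of 32 entries are assumptions for this example; only the 64-byte string length comes from the message. Because the entry count is a power of two that divides 2^32, the masked index continues smoothly when the 32-bit sequence number wraps.

```c
#include <stdatomic.h>

#define SLOT_COUNT_LOG	5			/* assumed: 2^5 = 32 strings */
#define SLOT_COUNT	(1u << SLOT_COUNT_LOG)
#define SLOT_MASK	(SLOT_COUNT - 1)
#define SLOT_LEN	64			/* bytes per string, as in the message */

static char slots[SLOT_COUNT][SLOT_LEN];
static atomic_uint slot_seq;	/* ever-increasing sequence number */

/* Keep only the low bits, so a wrap of the sequence number is seamless. */
static unsigned int slot_index(unsigned int seq)
{
	return seq & SLOT_MASK;
}

/* Claim the next string buffer without taking a lock. */
static char *next_slot(void)
{
	unsigned int seq = atomic_fetch_add(&slot_seq, 1);

	return slots[slot_index(seq)];
}
```

With a count that is not a power of two, a modulo would be needed instead of a mask, and the index would jump when the sequence number wraps.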

ceph: make use of "else" where appropriate · a5bc3129

Authored by Alex Elder on Jan 23, 2012

Rearrange ceph_tcp_connect() a bit, making use of "else" rather than
re-testing a value with consecutive "if" statements.  Don't record a
connection's socket pointer unless the connect operation is
successful.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: use a shared zero page rather than one per messenger · 57666519

Authored by Alex Elder on Jan 23, 2012

Each messenger allocates a page to be used when writing zeroes
out in the event of error or other abnormal condition.  Instead,
use the kernel ZERO_PAGE() for that purpose.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: fix overflow check in crush_decode() · 64486697

Authored by Xi Wang on Feb 16, 2012

The existing overflow check (n > ULONG_MAX / b) didn't work, because
n = ULONG_MAX / b would both bypass the check and still overflow the
allocation size a + n * b.

The correct check should be (n > (ULONG_MAX - a) / b).
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
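
The difference between the two checks can be tried out with a small user-space sketch. The function names are hypothetical; a, n and b have the same meaning as in the message (allocation size a + n * b).

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Returns true when a + n * b fits in an unsigned long.
 * Hypothetical helper for illustration; not the kernel's code.
 */
static bool alloc_size_fits(unsigned long a, unsigned long n, unsigned long b)
{
	if (b == 0)
		return true;	/* a + 0 always fits */
	/* Reserve room for the fixed part 'a' before dividing. */
	return n <= (ULONG_MAX - a) / b;
}

/* The older check, which ignores 'a'. The caller must pass b != 0. */
static bool old_check_passes(unsigned long n, unsigned long b)
{
	return !(n > ULONG_MAX / b);
}
```

For example, with a = 16, b = 8 and n = ULONG_MAX / 8, the older check passes even though a + n * b exceeds ULONG_MAX, while the corrected comparison rejects it.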

net/ceph: Only clear SOCK_NOSPACE when there is sufficient space in the socket buffer · 182fac26

Authored by Jim Schutt on Feb 29, 2012

The Ceph messenger would sometimes queue multiple work items to write
data to a socket when the socket buffer was full.

Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the
same way that net/core/stream.c:sk_stream_write_space() does, i.e.,
clearing it only when sufficient space is available in the socket buffer.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@dreamhost.com>
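
The idea of clearing the flag only when sufficient space is available can be modeled in user space as below. This is a simplified sketch, not the kernel code: struct sock_model and the function names are hypothetical, and the threshold (free space at least half of what is already queued) is an assumption about how sk_stream_write_space() judges "sufficient".

```c
#include <stdbool.h>

/* Hypothetical model of the few socket fields this sketch needs. */
struct sock_model {
	int sndbuf;		/* send buffer size */
	int wmem_queued;	/* bytes currently queued for sending */
	bool nospace;		/* stands in for the SOCK_NOSPACE flag */
};

/* Free space left in the send buffer. */
static int stream_wspace(const struct sock_model *sk)
{
	return sk->sndbuf - sk->wmem_queued;
}

/* Assumed threshold: half of what is already queued. */
static int stream_min_wspace(const struct sock_model *sk)
{
	return sk->wmem_queued / 2;
}

/*
 * Write-space callback: clear the flag and ask for a write work item
 * only when enough space is free. Returns true if work should be queued.
 */
static bool write_space(struct sock_model *sk)
{
	if (stream_wspace(sk) < stream_min_wspace(sk))
		return false;	/* still too full: leave nospace set */
	sk->nospace = false;
	return true;
}
```

Leaving the flag set while the buffer is still nearly full avoids queueing a work item for every small amount of space that frees up.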

17 Mar, 2012 2 commits

netfilter: ctnetlink: fix race between delete and timeout expiration · a16a1647

Authored by Pablo Neira Ayuso on Mar 16, 2012

Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.

After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.
Tested-by: Kerin Millar <kerframil@gmail.com>
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu. · c5779237

Authored by RongQing.Li on Mar 15, 2012

ip6_mc_find_dev_rcu() is called with rcu_read_lock() held, so there is
no need to dev_hold().
Calling dev_hold() without a corresponding dev_put() leaks a device
reference.

[ bug introduced in 96b52e61 (ipv6: mcast: RCU conversions) ]
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Mar, 2012 2 commits

sch_sfq: revert dont put new flow at the end of flows · cc34eb67

Authored by Eric Dumazet on Mar 13, 2012

This reverts commit d47a0ac7 (sch_sfq: dont put new flow at the end of
flows)

As Jesper found out, the patch sounded great but has bad side effects.

In stress situations, pushing new flows in front of the queue can prevent
old flows from making any progress. Packets can stay in the SFQ queue for
an unlimited amount of time.

It's possible to add heuristics to limit this problem, but this would
add complexity outside of SFQ scope.

A more sensible answer to Dave Taht's concerns (he reported the issue I
tried to solve in the original commit) is probably to use a qdisc hierarchy
so that high prio packets don't enter a potentially crowded SFQ qdisc.
Reported-by: Jesper Dangaard Brouer <jdb@comx.dk>
Cc: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix icmp6_dst_alloc() · 122bdf67

Authored by Eric Dumazet on Mar 14, 2012

commit 87a11578 (ipv6: Move xfrm_lookup() call down into
icmp6_dst_alloc().) forgot to convert one error path, leading
to crashes in mld_sendpack()

Many thanks to Dave Jones for providing a very complete bug report.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Mar, 2012 1 commit

tcp: fix syncookie regression · dfd25fff

Authored by Eric Dumazet on Mar 10, 2012

commit ea4fc0d6 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
added a serious regression on synflood handling.

Simon Kirby discovered a successful connection was delayed by 20 seconds
before being responsive.

In my tests, I discovered that xmit frames were lost, and needed ~4
retransmits and a socket dst rebuild before being really sent.

In case of syncookie initiated connection, we use a different path to
initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.

As ip_queue_xmit() now depends on inet flow being setup, fix this by
copying the temp flowi4 we use in cookie_v4_check().
Reported-by: Simon Kirby <sim@netnation.com>
Bisected-by: Simon Kirby <sim@netnation.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Mar, 2012 5 commits

route: Remove redirect_genid · ac3f48de

Authored by Steffen Klassert on Mar 06, 2012

As we invalidate the inetpeer tree along with the routing cache now,
we don't need a genid to reset the redirect handling when the routing
cache is flushed.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inetpeer: Invalidate the inetpeer tree along with the routing cache · 5faa5df1

Authored by Steffen Klassert on Mar 06, 2012

We initialize the routing metrics with the values cached on the
inetpeer in rt_init_metrics(). So if we have the metrics cached on the
inetpeer, we ignore the user configured fib_metrics.

To fix this issue, we replace the old tree with a fresh initialized
inet_peer_base. The old tree is removed later with a delayed work queue.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: fix state reporting when port is disabled · 5200959b

Authored by Paulius Zaleckas on Mar 06, 2012

Now we have:
eth0: link *down*
br0: port 1(eth0) entered *forwarding* state

br_log_state(p) should be called *after* p->state is set
to BR_STATE_DISABLED.
Reported-by: Zilvinas Valinskas <zilvinas@wilibox.com>
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: br_log_state() s/entering/entered/ · d9e179ec

Authored by Paulius Zaleckas on Mar 06, 2012

When br_log_state() is reporting state it should say "entered"
instead of "entering" since state at this point is already
changed.
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Fix checksum update for actions on UDP packets. · 81e5d41d

Authored by Jesse Gross on Mar 06, 2012

When modifying IP addresses or ports on a UDP packet we don't
correctly follow the rules for unchecksummed packets.  This meant
that packets without a checksum can be given an incorrect new checksum
and packets with a checksum can become marked as being unchecksummed.
This fixes it to handle those requirements.
Signed-off-by: Jesse Gross <jesse@nicira.com>

07 Mar, 2012 8 commits

openvswitch: Honor dp_ifindex, when specified, for vport lookup by name. · 651a68ea

Authored by Ben Pfaff on Mar 06, 2012

When OVS_VPORT_ATTR_NAME is specified and dp_ifindex is nonzero, the
logical behavior would be for the vport name lookup scope to be limited
to the specified datapath, but in fact the dp_ifindex value was ignored.
This commit causes the search scope to be honored.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

IPv6: Fix not join all-router mcast group when forwarding set. · d6ddef9e

Authored by Li Wei on Mar 05, 2012

When forwarding is set and a new net device is registered,
we need to add this device to the all-router mcast group.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_conntrack: fix early_drop with reliable event delivery · 74138511

Authored by Pablo Neira Ayuso on Mar 06, 2012

If reliable event delivery is enabled and ctnetlink fails to deliver
the destroy event in early_drop, the conntrack subsystem cannot
drop any candidate flow that was planned to be evicted.
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: netfilter: don't call iptables on vlan packets if sysctl is off · 739e4505

Authored by Florian Westphal on Mar 06, 2012

When net.bridge.bridge-nf-filter-vlan-tagged is 0 (default), vlan packets
arriving should not be sent to ip(6)tables by bridge netfilter.

However, it turns out that we currently always send VLAN packets to
netfilter if:
a) CONFIG_VLAN_8021Q is enabled; or
b) CONFIG_VLAN_8021Q is not set but rx vlan offload is enabled
   on the bridge port.

This is because bridge netfilter treats skb with
skb->protocol == ETH_P_IP{V6} as "non-vlan packet".

With rx vlan offload on or CONFIG_VLAN_8021Q=y, the vlan header has
already been removed here, and we cannot rely on skb->protocol alone.

Fix this by only using skb->protocol if the skb has no vlan tag,
or if a vlan tag is present and filter-vlan-tagged bridge netfilter
sysctl is enabled.

We cannot remove the skb->protocol == htons(ETH_P_8021Q) test
because the vlan tag is still around in the CONFIG_VLAN_8021Q=n &&
"ethtool -K $itf rxvlan off" case.

reproducer:
iptables -t raw -I PREROUTING -i br0
iptables -t raw -I PREROUTING -i br0.1

Then send packets to an ip address configured on br0.1 interface.
Even with net.bridge.bridge-nf-filter-vlan-tagged=0, the 1st rule
will match instead of the 2nd one.

With this patch applied, the 2nd rule will match instead.
In the non-local address case, netfilter won't be consulted after
this patch unless the sysctl is switched on.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: bridge: fix wrong pointer dereference · a157b9d5

Authored by Pablo Neira Ayuso on Mar 06, 2012

In adf7ff8, an invalid dereference was added in ebt_make_names.

CC [M] net/bridge/netfilter/ebtables.o
net/bridge/netfilter/ebtables.c: In function `ebt_make_names':
net/bridge/netfilter/ebtables.c:1371:20: warning: `t' may be used uninitialized in this function [-Wuninitialized]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: ctnetlink: remove incorrect spin_[un]lock_bh on NAT module autoload · 8be619d1

Authored by Pablo Neira Ayuso on Mar 06, 2012

Since 7d367e06, ctnetlink_new_conntrack is called without holding
the nf_conntrack_lock spinlock. Thus, ctnetlink_parse_nat_setup
no longer needs to release that spinlock in the NAT module
autoload case.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: ebtables: fix wrong name length while copying to user-space · 848edc69

Authored by Santosh Nayak on Mar 06, 2012

user-space ebtables expects 32 bytes-long names, but xt_match names
use 29 bytes. We have to copy only 29 bytes and then make sure we
fill the remaining bytes with zeroes.
Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
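
The copy-then-pad step can be sketched in user space as follows. The macro and function names are made up for this example and the real patch may be structured differently; only the two sizes (32 and 29) come from the message.

```c
#include <string.h>

#define USER_NAME_LEN	32	/* name size user-space ebtables expects */
#define XT_NAME_LEN	29	/* name size used by xt_match */

/*
 * Fill a user-facing name buffer: zero all 32 bytes first, then copy at
 * most 29 bytes of the source name, so the tail is always zero.
 */
static void export_name(char dst[USER_NAME_LEN], const char *src)
{
	memset(dst, 0, USER_NAME_LEN);
	strncpy(dst, src, XT_NAME_LEN);
}
```

Zeroing the whole destination first means the bytes past the 29-byte name never carry stale data when the buffer is handed to user space.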

tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una · 4648dc97

Authored by Neal Cardwell on Mar 05, 2012

This commit fixes tcp_shift_skb_data() so that it does not shift
SACKed data below snd_una.

This fixes an issue whose symptoms exactly match reports showing
tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
net/ipv4/tcp_input.c:3418" thread on netdev).

Since 2008 (832d11c5), tcp_shift_skb_data() had been shifting SACKed
ranges that were below
snd_una. It checked that the *end* of the skb it was about to shift
from was above snd_una, but did not check that the end of the actual
shifted range was above snd_una; this commit adds that check.

Shifting SACKed ranges below snd_una is problematic because for such
ranges tcp_sacktag_one() short-circuits: it does not declare anything
as SACKed and does not increase sacked_out.

Before the fixes in commits cc9a672e and daef52ba, shifting SACKed ranges
below snd_una happened to work because tcp_shifted_skb() was always
(incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
tcp_sacktag_one() never short-circuited and always increased
tp->sacked_out in this case.

After those two fixes, my testing has verified that shifting SACKed
ranges below snd_una could cause tp->sacked_out to go negative with
the following sequence of events:

(1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
    then shifts a prefix of that skb that is below snd_una

(2) tcp_shifted_skb() increments the packet count of the
    already-SACKed prev sk_buff

(3) tcp_sacktag_one() sees the end of the new SACKed range is below
    snd_una, so it short-circuits and doesn't increase tp->sacked_out

(4) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
    decrements tp->sacked_out by this "inflated" pcount that was
    missing a matching increase in tp->sacked_out, and hence
    tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which cast
    to s32 is negative.

(5) this leads to the warnings seen in the recent "WARNING: at
    net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
    tcp_input.c:3418  WARN_ON((int)tp->sacked_out < 0);

More generally, I think this bug can be tickled in some cases where
two or more ACKs from the receiver are lost and then a DSACK arrives
that is immediately above an existing SACKed skb in the write queue.

This fix changes tcp_shift_skb_data() to abort this sequence at step
(1) in the scenario above by noticing that the bytes are below snd_una
and not shifting them.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
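
The added check, that the end of the range being shifted lies above snd_una, can be illustrated with wraparound-safe sequence comparisons. This is a user-space sketch: seq_before() and seq_after() imitate the style of the kernel's before() and after() helpers, and may_shift_range() is a hypothetical name, not a function from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "seq1 comes before seq2" for 32-bit sequence numbers. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Wraparound-safe "seq1 comes after seq2". */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return seq_before(seq2, seq1);
}

/*
 * Allow shifting the bytes [start_seq, start_seq + len) only when the
 * end of that range is above snd_una.
 */
static bool may_shift_range(uint32_t start_seq, uint32_t len,
			    uint32_t snd_una)
{
	return seq_after(start_seq + len, snd_una);
}
```

A range ending at or below snd_una is refused, which corresponds to aborting at step (1) of the sequence described in the message.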

06 Mar, 2012 1 commit

bridge: check return value of ipv6_dev_get_saddr() · d1d81d4c

Authored by Ulrich Weber on Mar 05, 2012

otherwise source IPv6 address of ICMPV6_MGM_QUERY packet
might be random junk if IPv6 is disabled on interface or
link-local address is not yet ready (DAD).
Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Mar, 2012 3 commits

rtnetlink: fix rtnl_calcit() and rtnl_dump_ifinfo() · a4b64fbe

Authored by Eric Dumazet on Mar 04, 2012

nlmsg_parse() might return an error, so test its return value before
potential random memory accesses.

Errors introduced in commit 115c9b81 (rtnetlink: Fix problem with
buffer allocation)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: message age needs to increase, not decrease. · 709e1b5c

Authored by Joakim Tjernlund on Mar 01, 2012

commit bridge: send proper message_age in config BPDU
added this gem:
  bpdu.message_age = (jiffies - root->designated_age)
  p->designated_age = jiffies + bpdu->message_age;
Notice how bpdu->message_age is negated when reassigned to
bpdu.message_age. This causes message age to decrease breaking the
STP protocol.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
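
The sign problem can be worked through with plain arithmetic. The sketch below is a user-space model with hypothetical function names. It assumes the fix is to subtract the received age when recording designated_age, which is one reading of the message rather than a quote of the patch.

```c
/* Record the timestamp at which the information would have had age 0. */
static unsigned long designated_age_fixed(unsigned long now,
					  unsigned long msg_age)
{
	return now - msg_age;
}

/* The form quoted in the message, which adds the age instead. */
static unsigned long designated_age_buggy(unsigned long now,
					  unsigned long msg_age)
{
	return now + msg_age;
}

/* Age reported later: elapsed time since the recorded timestamp. */
static long message_age(unsigned long now, unsigned long designated_age)
{
	return (long)(now - designated_age);
}
```

With a BPDU of age 5 received at tick 1000 and relayed at tick 1010, the subtracting form reports 15 (received age plus elapsed time), while the adding form reports 5, and even -5 if relayed immediately.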

bridge: Adjust min age inc for HZ > 256 · aaca735f

Authored by Joakim Tjernlund on Mar 01, 2012

min age increment needs to round up its min age tick for all
HZ values to guarantee message age is increasing.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Mar, 2012 1 commit

tcp: don't fragment SACKed skbs in tcp_mark_head_lost() · c0638c24

Authored by Neal Cardwell on Mar 02, 2012

In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
to mark the first portion as lost. This is for two primary reasons:

(1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
doing this, it preserves the sum of their packet counts in order to
reflect the real-world dynamics on the wire. But given that skbs can
have remainders that do not align to MSS boundaries, this packet count
preservation means that for SACKed skbs there is not necessarily a
direct linear relationship between tcp_skb_pcount(skb) and
skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
off and mark as lost a prefix of length (packets - oldcnt)*mss from
SACKed skbs were leading to occasional failures of the WARN_ON(len >
skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
recent "crash in tcp_fragment" thread on netdev).

(2) there is no real point in fragmenting off part of a SACKed skb and
calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
for SACKed skbs.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Feb, 2012 1 commit

tcp: fix false reordering signal in tcp_shifted_skb · 4c90d3b3

Authored by Neal Cardwell on Feb 26, 2012

When tcp_shifted_skb() shifts bytes from the skb that is currently
pointed to by 'highest_sack' then the increment of
TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This
implicit advancement, combined with the recent fix to pass the correct
SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think
that the newly SACKed range was before the tcp_highest_sack_seq(),
leading to a call to tcp_update_reordering() with a degree of
reordering matching the size of the newly SACKed range (typically just
1 packet, which is a NOP, but potentially larger).

This commit fixes this by simply calling tcp_sacktag_one() before the
TCP_SKB_CB(skb)->seq advancement that can advance our notion of the
highest SACKed sequence.

Correspondingly, we can simplify the code a little now that
tcp_shifted_skb() should update the lost_cnt_hint in all cases where
skb == tp->lost_skb_hint.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Feb, 2012 1 commit

netfilter: bridge: fix module autoload in compat case · e899b111

Authored by Florian Westphal on Feb 22, 2012

We expected 0 if module doesn't exist, which is no longer the case
(42046e2e, netfilter: x_tables: return -ENOENT for non-existant
matches/targets).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>