提交 · bd566e7525b5986864e8d6eb5b67640abcd284a9 · openeuler / raspberrypi-kernel

21 1月, 2008 1 次提交

[IPV4] fib_hash: fix duplicated route issue · bd566e75

由 Joonwoo Park 提交于 1月 18, 2008

http://bugzilla.kernel.org/show_bug.cgi?id=9493

The fib allows making identical routes with 'ip route replace'.
This patch makes the fib return -EEXIST if replacement would cause duplication.
Signed-off-by: NJoonwoo Park <joonwpark81@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd566e75

10 1月, 2008 1 次提交

[IPV4] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache · 0bcceadc

由 Eric Dumazet 提交于 1月 10, 2008

In rt_cache_get_next(), no need to guard seq->private by a
rcu_dereference() since seq is private to the thread running this
function. Reading seq.private once (as guaranted bu rcu_dereference())
or several time if compiler really is dumb enough wont change the
result.

But we miss real spots where rcu_dereference() are needed, both in
rt_cache_get_first() and rt_cache_get_next()
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0bcceadc

09 1月, 2008 4 次提交

[LRO] Fix lro_mgr->features checks · 877364e6

由 Brice Goglin 提交于 1月 07, 2008

lro_mgr->features contains a bitmask of LRO_F_* values which are
defined as power of two, not as bit indexes.
They must be checked with x&LRO_F_FOO, not with test_bit(LRO_F_FOO,&x).
Signed-off-by: NBrice Goglin <Brice.Goglin@inria.fr>
Acked-by: NAndrew Gallatin <gallatin@myri.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

877364e6

[IPV4] ROUTE: ip_rt_dump() is unecessary slow · d8c92830

由 Eric Dumazet 提交于 1月 07, 2008

I noticed "ip route list cache x.y.z.t" can be *very* slow.

While strace-ing -T it I also noticed that first part of route cache
is fetched quite fast :

recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.000047>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.000042>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740 <0.000055>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.000043>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732 <0.000053>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708 <0.000052>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680 <0.000041>

while the part at the end of the table is more expensive:

recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656 <0.003857>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.003891>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.003765>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700 <0.003879>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676 <0.003797>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724 <0.003856>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.003848>

The following patch corrects this performance/latency problem,
removing quadratic behavior.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d8c92830

[IPV4] ipconfig: Fix regression in ip command line processing · 92ffb85d

由 Amos Waterland 提交于 1月 05, 2008

The recent changes for ip command line processing fixed some problems
but unfortunately broke some common usage scenarios.  In current
2.6.24-rc6 the following command line results in no IP address
assignment, which is surely a regression:

 ip=10.0.2.15::10.0.2.2:255.255.255.0::eth0:off

Please find below a patch that works for all cases I can find.
Signed-off-by: NAmos Waterland <apw@us.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92ffb85d

[IPV4] raw: Strengthen check on validity of iph->ihl · f844c74f

由 Herbert Xu 提交于 1月 05, 2008

We currently check that iph->ihl is bounded by the real length and that
the real length is greater than the minimum IP header length.  However,
we did not check the caes where iph->ihl is less than the minimum IP
header length.

This breaks because some ip_fast_csum implementations assume that which
is quite reasonable.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f844c74f

04 1月, 2008 1 次提交

[INET]: Fix netdev renaming and inet address labels · 44344b2a

由 Mark McLoughlin 提交于 1月 04, 2008

When re-naming an interface, the previous secondary address
labels get lost e.g.

  $> brctl addbr foo
  $> ip addr add 192.168.0.1 dev foo
  $> ip addr add 192.168.0.2 dev foo label foo:00
  $> ip addr show dev foo | grep inet
    inet 192.168.0.1/32 scope global foo
    inet 192.168.0.2/32 scope global foo:00
  $> ip link set foo name bar
  $> ip addr show dev bar | grep inet
    inet 192.168.0.1/32 scope global bar
    inet 192.168.0.2/32 scope global bar:2

Turns out to be a simple thinko in inetdev_changename() - clearly we
want to look at the address label, rather than the device name, for
a suffix to retain.
Signed-off-by: NMark McLoughlin <markmc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44344b2a

30 12月, 2007 1 次提交

[TCP]: use non-delayed ACK for congestion control RTT · 2072c228

由 Gavin McCullagh 提交于 12月 29, 2007

When a delayed ACK representing two packets arrives, there are two RTT
samples available, one for each packet.  The first (in order of seq
number) will be artificially long due to the delay waiting for the
second packet, the second will trigger the ACK and so will not itself
be delayed.

According to rfc1323, the SRTT used for RTO calculation should use the
first rtt, so receivers echo the timestamp from the first packet in
the delayed ack.  For congestion control however, it seems measuring
delayed ack delay is not desirable as it varies independently of
congestion.

The patch below causes seq_rtt and last_ackt to be updated with any
available later packet rtts which should have less (and hopefully
zero) delack delay.  The rtt value then gets passed to
ca_ops->pkts_acked().

Where TCP_CONG_RTT_STAMP was set, effort was made to supress RTTs from
within a TSO chunk (!fully_acked), using only the final ACK (which
includes any TSO delay) to generate RTTs.  This patch removes these
checks so RTTs are passed for each ACK to ca_ops->pkts_acked().

For non-delay based congestion control (cubic, h-tcp), rtt is
sometimes used for rtt-scaling.  In shortening the RTT, this may make
them a little less aggressive.  Delay-based schemes (eg vegas, veno,
illinois) should get a cleaner, more accurate congestion signal,
particularly for small cwnds.  The congestion control module can
potentially also filter out bad RTTs due to the delayed ack alarm by
looking at the associated cnt which (where delayed acking is in use)
should probably be 1 if the alarm went off or greater if the ACK was
triggered by a packet.
Signed-off-by: NGavin McCullagh <gavin.mccullagh@nuim.ie>
Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2072c228

29 12月, 2007 1 次提交

[IPV4] Fix ip=dhcp regression · 9cecd07c

由 Simon Horman 提交于 12月 27, 2007

David Brownell pointed out a regression in my recent "Fix ip command
line processing" patch. It turns out to be a fairly blatant oversight on
my part whereby ic_enable is never set, and thus autoconfiguration is
never enabled. Clearly my testing was broken :-(

The solution that I have is to set ic_enable to 1 if we hit
ip_auto_config_setup(), which basically means that autoconfiguration is
activated unless told otherwise. I then flip ic_enable to 0 if ip=off,
ip=none, ip=::::::off or ip=::::::none using ic_proto_name();

The incremental patch is below, let me know if a non-incremental version
is prepared, as I did as for the original patch to be reverted pending a
fix.
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9cecd07c

27 12月, 2007 2 次提交

[IPV4]: Fix ip command line processing. · a6c05c3d

由 Simon Horman 提交于 12月 25, 2007

Recently the documentation in Documentation/nfsroot.txt was
update to note that in fact ip=off and ip=::::::off as the
latter is ignored and the default (on) is used.

This was certainly a step in the direction of reducing confusion.
But it seems to me that the code ought to be fixed up so that
ip=::::::off actually turns off ip autoconfiguration.

This patch also notes more specifically that ip=on (aka ip=::::::on)
is the default.
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6c05c3d

[NETFILTER]: nf_conntrack_ipv4: fix module parameter compatibility · fae718dd

由 Patrick McHardy 提交于 12月 24, 2007

Some users do "modprobe ip_conntrack hashsize=...". Since we have the
module aliases this loads nf_conntrack_ipv4 and nf_conntrack, the
hashsize parameter is unknown for nf_conntrack_ipv4 however and makes
it fail.

Allow to specify hashsize= for both nf_conntrack and nf_conntrack_ipv4.

Note: the nf_conntrack message in the ringbuffer will display an
incorrect hashsize since nf_conntrack is first pulled in as a
dependency and calculates the size itself, then it gets changed
through a call to nf_conntrack_set_hashsize().
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fae718dd

21 12月, 2007 2 次提交

[IPV4]: OOPS with NETLINK_FIB_LOOKUP netlink socket · d883a036

由 Denis V. Lunev 提交于 12月 21, 2007

[ Regression added by changeset:
	cd40b7d3
	[NET]: make netlink user -> kernel interface synchronious
  -DaveM ]

nl_fib_input re-reuses incoming skb to send the reply. This means that this
packet will be freed twice, namely in:
- netlink_unicast_kernel
- on receive path
Use clone to send as a cure, the caller is responsible for kfree_skb on error.

Thanks to Alexey Dobryan, who originally found the problem.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d883a036

[NETFILTER] ipv4: Spelling fixes · e00ccd4a

由 Joe Perches 提交于 12月 20, 2007

Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e00ccd4a

20 12月, 2007 2 次提交

[IPV4] ip_gre: set mac_header correctly in receive path · 1d069167

由 Timo Teras 提交于 12月 20, 2007

mac_header update in ipgre_recv() was incorrectly changed to
skb_reset_mac_header() when it was introduced.
Signed-off-by: NTimo Teras <timo.teras@iki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d069167

[IPV4] ARP: Remove not used code · e0260fed

由 Mark Ryden 提交于 12月 19, 2007

In arp_process() (net/ipv4/arp.c), there is unused code: definition
and assignment of tha (target hw address ).
Signed-off-by: NMark Ryden <markryde@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0260fed

17 12月, 2007 1 次提交

[IPV4]: Make tcp_input_metrics() get minimum RTO via tcp_rto_min() · 488faa2a

由 Satoru SATOH 提交于 12月 16, 2007

tcp_input_metrics() refers to the built-time constant TCP_RTO_MIN
regardless of configured minimum RTO with iproute2.
Signed-off-by: NSatoru SATOH <satoru.satoh@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

488faa2a

15 12月, 2007 2 次提交

[IPV4]: Updates to nfsroot documentation · f33e1d9f

由 Amos Waterland 提交于 12月 14, 2007

The difference between ip=off and ip=::::::off has been a cause of much
confusion.  Document how each behaves, and do not contradict ourselves by
saying that "off" is the default when in fact "any" is the default and is
descibed as being so lower in the file.
Signed-off-by: NAmos Waterland <apw@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f33e1d9f

[NETFILTER]: ip_tables: fix compat copy race · a18aa31b

由 Patrick McHardy 提交于 12月 12, 2007

When copying entries to user, the kernel makes two passes through the
data, first copying all the entries, then fixing up names and counters.
On the second pass it copies the kernel and match data from userspace
to the kernel again to find the corresponding structures, expecting
that kernel pointers contained in the data are still valid.

This is obviously broken, fix by avoiding the second pass completely
and fixing names and counters while dumping the ruleset, using the
kernel-internal data structures.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a18aa31b

11 12月, 2007 2 次提交

[IPv4] ESP: Discard dummy packets introduced in rfc4303 · 2017a72c

由 Thomas Graf 提交于 12月 10, 2007

RFC4303 introduces dummy packets with a nexthdr value of 59
to implement traffic confidentiality. Such packets need to
be dropped silently and the payload may not be attempted to
be parsed as it consists of random chunk.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2017a72c

[IPV4]: Swap the ifa allocation with the"ipv4_devconf_setall" call · a4e65d36

由 Pavel Emelyanov 提交于 12月 07, 2007

According to Herbert, the ipv4_devconf_setall should be called
only when the ifa is added to the device. However, failed
ifa allocation may bring things into inconsistent state.

Move the call to ipv4_devconf_setall after the ifa allocation.

Fits both net-2.6 (with offsets) and net-2.6.25 (cleanly).
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a4e65d36

07 12月, 2007 2 次提交

[IPV4]: Remove prototype of ip_rt_advice · 56c99d04

由 Denis V. Lunev 提交于 12月 06, 2007

ip_rt_advice has been gone, so no need to keep prototype and debug message.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56c99d04

[IPv4]: Reply net unreachable ICMP message · 7f53878d

由 Mitsuru Chinen 提交于 12月 07, 2007

IPv4 stack doesn't reply any ICMP destination unreachable message
with net unreachable code when IP detagrams are being discarded
because of no route could be found in the forwarding path.
Incidentally, IPv6 stack replies such ICMPv6 message in the similar
situation.
Signed-off-by: NMitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f53878d

05 12月, 2007 6 次提交

[LRO]: fix lro_gen_skb() alignment · 621544eb

由 Andrew Gallatin 提交于 12月 05, 2007

Add a field to the lro_mgr struct so that drivers can specify how much
padding is required to align layer 3 headers when a packet is copied
into a freshly allocated skb by inet_lro.c:lro_gen_skb().  Without
padding, skbs generated by LRO will cause alignment warnings on
architectures which require strict alignment (seen on sparc64).

Myri10GE is updated to use this field.
Signed-off-by: NAndrew Gallatin <gallatin@myri.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

621544eb

[TCP]: NAGLE_PUSH seems to be a wrong way around · 4e67d876

由 Ilpo Jrvinen 提交于 12月 05, 2007

The comment in tcp_nagle_test suggests that. This bug is very
very old, even 2.4.0 seems to have it.
Signed-off-by: NIlpo Jrvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e67d876

[TCP]: Move prior_in_flight collect to more robust place · 52d34081

由 Ilpo Jrvinen 提交于 12月 05, 2007

The previous location is after sacktag processing, which affects
counters tcp_packets_in_flight depends on. This may manifest as
wrong behavior if new SACK blocks are present and all is clear
for call to tcp_cong_avoid, which in the case of
tcp_reno_cong_avoid bails out early because it thinks that
TCP is not limited by cwnd.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52d34081

[TCP] FRTO: Use of existing funcs make code more obvious & robust · 3e6f049e

由 Ilpo Jrvinen 提交于 12月 05, 2007

Though there's little need for everything that tcp_may_send_now
does (actually, even the state had to be adjusted to pass some
checks FRTO does not want to occur), it's more robust to let it
make the decision if sending is allowed. State adjustments
needed:
- Make sure snd_cwnd limit is not hit in there
- Disable nagle (if necessary) through the frto_counter == 2

The result of check for frto_counter in argument to call for
tcp_enter_frto_loss can just be open coded, therefore there
isn't need to store the previous frto_counter past
tcp_may_send_now.

In addition, returns can then be combined.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3e6f049e

[IPVS]: Fix sched registration race when checking for name collision. · 4ac63ad6

由 Pavel Emelyanov 提交于 12月 04, 2007

The register_ip_vs_scheduler() checks for the scheduler with the
same name under the read-locked __ip_vs_sched_lock, then drops,
takes it for writing and puts the scheduler in list.

This is racy, since we can have a race window between the lock
being re-locked for writing.

The fix is to search the scheduler with the given name right under
the write-locked __ip_vs_sched_lock.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Acked-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ac63ad6

[IPVS]: Don't leak sysctl tables if the scheduler registration fails. · a014bc8f

由 Pavel Emelyanov 提交于 12月 04, 2007

In case we load lblc or lblcr module we can leak some sysctl
tables if the call to register_ip_vs_scheduler() fails.

I've looked at the register_ip_vs_scheduler() code and saw, that
the only reason to fail is the name collision, so I think that
with some 3rd party schedulers this becomes a relevant issue. No?
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Acked-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a014bc8f

03 12月, 2007 1 次提交

[INET]: Fix inet_diag dead-lock regression · d523a328

由 Herbert Xu 提交于 12月 03, 2007

The inet_diag register fix broke inet_diag module loading because the
loaded module had to take the same mutex that's already held by the
loader in order to register the new handler.

This patch fixes it by introducing a separate mutex to protect the
handling of handlers.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

d523a328

29 11月, 2007 2 次提交

[TCP] illinois: Incorrect beta usage · a357dde9

由 Stephen Hemminger 提交于 11月 30, 2007

Lachlan Andrew observed that my TCP-Illinois implementation uses the
beta value incorrectly:
  The parameter  beta  in the paper specifies the amount to decrease
  *by*:  that is, on loss,
     W <-  W -  beta*W
  but in   tcp_illinois_ssthresh() uses  beta  as the amount
  to decrease  *to*: W <- beta*W

This bug makes the Linux TCP-Illinois get less-aggressive on uncongested network,
hurting performance. Note: since the base beta value is .5, it has no
impact on a congested network.
Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

a357dde9

[INET]: Fix inet_diag register vs rcv race · 07693198

由 Pavel Emelyanov 提交于 11月 30, 2007

The following race is possible when one cpu unregisters the handler
while other one is trying to receive a message and call this one:

CPU1:                                                 CPU2:
inet_diag_rcv()                                       inet_diag_unregister()
  mutex_lock(&inet_diag_mutex);
  netlink_rcv_skb(skb, &inet_diag_rcv_msg);
    if (inet_diag_table[nlh->nlmsg_type] == 
                               NULL) /* false handler is still registered */
    ...
    netlink_dump_start(idiagnl, skb, nlh,
                           inet_diag_dump, NULL);
           cb = kzalloc(sizeof(*cb), GFP_KERNEL);
                   /* sleep here freeing memory 
                    * or preempt
                    * or sleep later on nlk->cb_mutex
                    */
                                                         spin_lock(&inet_diag_register_lock);
                                                         inet_diag_table[type] = NULL;
    ...                                                  spin_unlock(&inet_diag_register_lock);
                                                         synchronize_rcu();
                                                         /* CPU1 is sleeping - RCU quiescent
                                                          * state is passed
                                                          */
                                                         return;
    /* inet_diag_dump is finally called: */
    inet_diag_dump()
      handler = inet_diag_table[cb->nlh->nlmsg_type];
      BUG_ON(handler == NULL); 
      /* OOPS! While we slept the unregister has set
       * handler to NULL :(
       */

Grep showed, that the register/unregister functions are called
from init/fini module callbacks for tcp_/dccp_diag, so it's OK
to use the inet_diag_mutex to synchronize manipulations with the
inet_diag_table and the access to it.

Besides, as Herbert pointed out, asynchronous dumps should hold 
this mutex as well, and thus, we provide the mutex as cb_mutex one.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

07693198

26 11月, 2007 1 次提交

[IPV4]: Remove bogus ifdef mess in arp_process · 3660019e

由 Adrian Bunk 提交于 11月 26, 2007

The #ifdef's in arp_process() were not only a mess, they were also wrong 
in the CONFIG_NET_ETHERNET=n and (CONFIG_NETDEV_1000=y or 
CONFIG_NETDEV_10000=y) cases.

Since they are not required this patch removes them.

Also removed are some #ifdef's around #include's that caused compile 
errors after this change.
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

3660019e

23 11月, 2007 2 次提交

[TCP] MTUprobe: Cleanup send queue check (no need to loop) · 7f9c33e5

由 Ilpo Järvinen 提交于 11月 23, 2007

The original code has striking complexity to perform a query
which can be reduced to a very simple compare.

FIN seqno may be included to write_seq but it should not make
any significant difference here compared to skb->len which was
used previously. One won't end up there with SYN still queued.

Use of write_seq check guarantees that there's a valid skb in
send_head so I removed the extra check.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: NJohn Heffner <jheffner@psc.edu>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

7f9c33e5

[TCP]: MTUprobe: receiver window & data available checks fixed · 91cc17c0

由 Ilpo Järvinen 提交于 11月 23, 2007

It seems that the checked range for receiver window check should
begin from the first rather than from the last skb that is going
to be included to the probe. And that can be achieved without
reference to skbs at all, snd_nxt and write_seq provides the
correct seqno already. Plus, it SHOULD account packets that are
necessary to trigger fast retransmit [RFC4821].

Location of snd_wnd < probe_size/size_needed check is bogus
because it will cause the other if() match as well (due to
snd_nxt >= snd_una invariant).

Removed dead obvious comment.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

91cc17c0

21 11月, 2007 4 次提交

[IPVS]: Fix compiler warning about unused register_ip_vs_protocol · d535a916

由 Pavel Emelyanov 提交于 11月 20, 2007

This is silly, but I have turned the CONFIG_IP_VS to m,
to check the compilation of one (recently sent) fix
and set all the CONFIG_IP_VS_PROTO_XXX options to n to
speed up the compilation.

In this configuration the compiler warns me about

  CC [M]  net/ipv4/ipvs/ip_vs_proto.o
net/ipv4/ipvs/ip_vs_proto.c:49: warning: 'register_ip_vs_protocol' defined but not used

Indeed. With no protocols selected there are no
calls to this function - all are compiled out with
ifdefs.

Maybe the best fix would be to surround this call with
ifdef-s or tune the Kconfig dependences, but I think that
marking this register function as __used is enough. No?
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Acked-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d535a916

[ARP]: Fix arp reply when sender ip 0 · b4a9811c

由 Jonas Danielsson 提交于 11月 20, 2007

Fix arp reply when received arp probe with sender ip 0.

Send arp reply with target ip address 0.0.0.0 and target hardware
address set to hardware address of requester. Previously sent reply
with target ip address and target hardware address set to same as
source fields.
Signed-off-by: NJonas Danielsson <the.sator@gmail.com>
Acked-by: NAlexey Kuznetov <kuznet@ms2.inr.ac.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4a9811c

Y
[IPV4] TCPMD5: Use memmove() instead of memcpy() because we have overlaps. · 354faf09
由 YOSHIFUJI Hideaki 提交于 11月 20, 2007
```
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
354faf09
Y
[IPV4] TCPMD5: Omit redundant NULL check for kfree() argument. · a80cc20d
由 YOSHIFUJI Hideaki 提交于 11月 20, 2007
```
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
a80cc20d

20 11月, 2007 2 次提交

[NETFILTER]: Fix kernel panic with REDIRECT target. · 1f305323

由 Evgeniy Polyakov 提交于 11月 20, 2007

When connection tracking entry (nf_conn) is about to copy itself it can
have some of its extension users (like nat) as being already freed and
thus not required to be copied.

Actually looking at this function I suspect it was copied from
nf_nat_setup_info() and thus bug was introduced.

Report and testing from David <david@unsolicited.net>.

[ Patrick McHardy states:

	I now understand whats happening:

	- new connection is allocated without helper
	- connection is REDIRECTed to localhost
	- nf_nat_setup_info adds NAT extension, but doesn't initialize it yet
	- nf_conntrack_alter_reply performs a helper lookup based on the
	   new tuple, finds the SIP helper and allocates a helper extension,
	   causing reallocation because of too little space
	- nf_nat_move_storage is called with the uninitialized nat extension

	So your fix is entirely correct, thanks a lot :)  ]
Signed-off-by: NEvgeniy Polyakov <johnpol@2ka.mipt.ru>
Acked-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f305323

[IPV4]: Add missing "space" · 464c4f18

由 Joe Perches 提交于 11月 19, 2007

Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

464c4f18