提交 · ec0a196626bd12e0ba108d7daa6d95a4fb25c2c5 · OpenHarmony / kernel_linux

13 6月, 2008 1 次提交

tcp: Revert 'process defer accept as established' changes. · ec0a1966

由 David S. Miller 提交于 6月 12, 2008

This reverts two changesets, ec3c0982
("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
the follow-on bug fix 9ae27e0a
("tcp: Fix slab corruption with ipv6 and tcp6fuzz").

This change causes several problems, first reported by Ingo Molnar
as a distcc-over-loopback regression where connections were getting
stuck.

Ilpo Järvinen first spotted the locking problems.  The new function
added by this code, tcp_defer_accept_check(), only has the
child socket locked, yet it is modifying state of the parent
listening socket.

Fixing that is non-trivial at best, because we can't simply just grab
the parent listening socket lock at this point, because it would
create an ABBA deadlock.  The normal ordering is parent listening
socket --> child socket, but this code path would require the
reverse lock ordering.

Next is a problem noticed by Vitaliy Gusev, he noted:

----------------------------------------
>--- a/net/ipv4/tcp_timer.c
>+++ b/net/ipv4/tcp_timer.c
>@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
> 		goto death;
> 	}
>
>+	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
>+		tcp_send_active_reset(sk, GFP_ATOMIC);
>+		goto death;

Here socket sk is not attached to listening socket's request queue. tcp_done()
will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
release this sk) as socket is not DEAD. Therefore socket sk will be lost for
freeing.
----------------------------------------

Finally, Alexey Kuznetsov argues that there might not even be any
real value or advantage to these new semantics even if we fix all
of the bugs:

----------------------------------------
Hiding from accept() sockets with only out-of-order data only
is the only thing which is impossible with old approach. Is this really
so valuable? My opinion: no, this is nothing but a new loophole
to consume memory without control.
----------------------------------------

So revert this thing for now.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec0a1966

11 6月, 2008 2 次提交

net: Fix routing tables with id > 255 for legacy software · 709772e6

由 Krzysztof Piotr Oledzki 提交于 6月 10, 2008

Most legacy software do not like tables > 255 as rtm_table is u8
so tb_id is sent &0xff and it is possible to mismatch for example
table 510 with table 254 (main).

This patch introduces RT_TABLE_COMPAT=252 so the code uses it if
tb_id > 255. It makes such old applications happy, new
ones are still able to use RTA_TABLE to get a proper table id.
Signed-off-by: NKrzysztof Piotr Oledzki <ole@ans.pl>
Acked-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

709772e6

inet{6}_request_sock: Init ->opt and ->pktopts in the constructor · ce4a7d0d

由 Arnaldo Carvalho de Melo 提交于 6月 10, 2008

Wei Yongjun noticed that we may call reqsk_free on request sock objects where
the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc
where we initialize ->opt to NULL and set ->pktopts to NULL in
inet6_reqsk_alloc.
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce4a7d0d

05 6月, 2008 7 次提交

tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits. · 293ad604

由 Octavian Purdila 提交于 6月 04, 2008

skb_splice_bits temporary drops the socket lock while iterating over
the socket queue in order to break a reverse locking condition which
happens with sendfile. This, however, opens a window of opportunity
for tcp_collapse() to aggregate skbs and thus potentially free the
current skb used in skb_splice_bits and tcp_read_sock.

This patch fixes the problem by (re-)getting the same "logical skb"
after the lock has been temporary dropped.

Based on idea and initial patch from Evgeniy Polyakov.
Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
Acked-by: NEvgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

293ad604

tcp: Increment OUTRSTS in tcp_send_active_reset() · 26af65cb

由 Sridhar Samudrala 提交于 6月 04, 2008

TCP "resets sent" counter is not incremented when a TCP Reset is 
sent via tcp_send_active_reset().
Signed-off-by: NSridhar Samudrala <sri@us.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26af65cb

raw: Raw socket leak. · 22dd4850

由 Denis V. Lunev 提交于 6月 04, 2008

The program below just leaks the raw kernel socket

int main() {
        int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP);
        struct sockaddr_in addr;

        memset(&addr, 0, sizeof(addr));
        inet_aton("127.0.0.1", &addr.sin_addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(2048);
        sendto(fd,  "a", 1, MSG_MORE, &addr, sizeof(addr));
        return 0;
}

Corked packet is allocated via sock_wmalloc which holds the owner socket,
so one should uncork it and flush all pending data on close. Do this in the
same way as in UDP.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Acked-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22dd4850

tcp: fix skb vs fack_count out-of-sync condition · a6604471

由 Ilpo Järvinen 提交于 6月 04, 2008

This bug is able to corrupt fackets_out in very rare cases.
In order for this to cause corruption:
  1) DSACK in the middle of previous SACK block must be generated.
  2) In order to take that particular branch, part or all of the
     DSACKed segment must already be SACKed so that we have that
     in cache in the first place.
  3) The new info must be top enough so that fackets_out will be
     updated on this iteration.
...then fack_count is updated while skb wasn't, then we walk again
that particular segment thus updating fack_count twice for
a single skb and finally that value is assigned to fackets_out
by tcp_sacktag_one.

It is safe to call tcp_sacktag_one just once for a segment (at
DSACK), no need to call again for plain SACK.

Potential problem of the miscount are limited to premature entry
to recovery and to inflated reordering metric (which could even
cancel each other out in the most the luckiest scenarios :-)).
Both are quite insignificant in worst case too and there exists
also code to reset them (fackets_out once sacked_out becomes zero
and reordering metric on RTO).

This has been reported by a number of people, because it occurred
quite rarely, it has been very evasive. Andy Furniss was able to
get it to occur couple of times so that a bit more info was
collected about the problem using a debug patch, though it still
required lot of checking around. Thanks also to others who have
tried to help here.

This is listed as Bugzilla #10346. The bug was introduced by
me in commit 68f8353b ([TCP]: Rewrite SACK block processing & 
sack_recv_cache use), I probably thought back then that there's
need to scan that entry twice or didn't dare to make it go
through it just once there. Going through twice would have
required restoring fack_count after the walk but as noted above,
I chose to drop the additional walk step altogether here.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6604471

[IPV6]: inet_sk(sk)->cork.opt leak · 36d926b9

由 Denis V. Lunev 提交于 6月 04, 2008

IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data
actually. In this case ip_flush_pending_frames should be called instead
of ip6_flush_pending_frames.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

36d926b9

Y
[IPV4] TUNNEL4: Fix incoming packet length check for inter-protocol tunnel. · baa2bfb8
由 YOSHIFUJI Hideaki 提交于 5月 30, 2008
```
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
```
baa2bfb8

tcp: Fix inconsistency source (CA_Open only when !tcp_left_out(tp)) · 8aca6cb1

由 Ilpo Järvinen 提交于 6月 04, 2008

It is possible that this skip path causes TCP to end up into an
invalid state where ca_state was left to CA_Open while some
segments already came into sacked_out. If next valid ACK doesn't
contain new SACK information TCP fails to enter into
tcp_fastretrans_alert(). Thus at least high_seq is set
incorrectly to a too high seqno because some new data segments
could be sent in between (and also, limited transmit is not
being correctly invoked there). Reordering in both directions
can easily cause this situation to occur.

I guess we would want to use tcp_moderate_cwnd(tp) there as well
as it may be possible to use this to trigger oversized burst to
network by sending an old ACK with huge amount of SACK info, but
I'm a bit unsure about its effects (mainly to FlightSize), so to
be on the safe side I just currently fixed it minimally to keep
TCP's state consistent (obviously, such nasty ACKs have been
possible this far). Though it seems that FlightSize is already
underestimated by some amount, so probably on the long term we
might want to trigger recovery there too, if appropriate, to make
FlightSize calculation to resemble reality at the time when the
losses where discovered (but such change scares me too much now
and requires some more thinking anyway how to do that as it
likely involves some code shuffling).

This bug was found by Brian Vowell while running my TCP debug
patch to find cause of another TCP issue (fackets_out
miscount).
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8aca6cb1

04 6月, 2008 3 次提交

route: Remove unused ifa_anycast field · ab32cd79

由 Thomas Graf 提交于 6月 03, 2008

The field was supposed to allow the creation of an anycast route by
assigning an anycast address to an address prefix. It was never
implemented so this field is unused and serves no purpose. Remove it.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab32cd79

route: Mark unused routing attributes as such · 1f9d11c7

由 Thomas Graf 提交于 6月 03, 2008

Also removes an unused policy entry for an attribute which is
only used in kernel->user direction.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f9d11c7

route: Mark unused route cache flags as such. · 51b77cae

由 Thomas Graf 提交于 6月 03, 2008

Also removes an obsolete check for the unused flag RTCF_MASQ.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51b77cae

22 5月, 2008 3 次提交

net: The world is not perfect patch. · 071f92d0

由 Rami Rosen 提交于 5月 21, 2008

  Unless there will be any objection here, I suggest consider the
following patch which simply removes the code for the
-DI_WISH_WORLD_WERE_PERFECT in the three methods which use it.

The compilation errors we get when using -DI_WISH_WORLD_WERE_PERFECT
show that this code was not built and not used for really a long time.
Signed-off-by: NRami Rosen <ramirose@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

071f92d0

net/ipv4/arp.c: Use common hex_asc helpers · 51f82a2b

由 Denis Cheng 提交于 5月 21, 2008

Here the local hexbuf is a duplicate of global const char hex_asc from
lib/hexdump.c, except the hex letters' cases:

	const char hexbuf[] = "0123456789ABCDEF";

	const char hex_asc[] = "0123456789abcdef";

and here to print HW addresses, the hex cases are not significant.

Thanks to Harvey Harrison to introduce the hex_asc_hi/hex_asc_lo helpers.
Signed-off-by: NDenis Cheng <crquan@gmail.com>
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51f82a2b

tcp: TCP connection times out if ICMP frag needed is delayed · 7d227cd2

由 Sridhar Samudrala 提交于 5月 21, 2008

We are seeing an issue with TCP in handling an ICMP frag needed
message that is received after net.ipv4.tcp_retries1 retransmits.
The default value of retries1 is 3. So if the path mtu changes
and ICMP frag needed is lost for the first 3 retransmits or if
it gets delayed until 3 retransmits are done, TCP doesn't update
MSS correctly and continues to retransmit the orginal message
until it timesout after tcp_retries2 retransmits.

I am seeing this issue even with the latest 2.6.25.4 kernel.

In tcp_retransmit_timer(), when retransmits counter exceeds 
tcp_retries1 value, the dst cache entry of the socket is reset.
At this time, if we receive an ICMP frag needed message, the 
dst entry gets updated with the new MTU, but the TCP sockets
dst_cache entry remains NULL.

So the next time when we try to retransmit after the ICMP frag
needed is received, tcp_retransmit_skb() gets called. Here the
cur_mss value is calculated at the start of the routine with
a NULL sk_dst_cache. Instead we should call tcp_current_mss after
the rebuild_header that caches the dst entry with the updated mtu.
Also the rebuild_header should be called before tcp_fragment
so that skb is fragmented if the mss goes down.
Signed-off-by: NSridhar Samudrala <sri@us.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d227cd2

21 5月, 2008 1 次提交

ipsec: Use the correct ip_local_out function · 1ac06e03

由 Herbert Xu 提交于 5月 20, 2008

Because the IPsec output function xfrm_output_resume does its
own dst_output call it should always call __ip_local_output
instead of ip_local_output as the latter may invoke dst_output
directly.  Otherwise the return values from nf_hook and dst_output
may clash as they both use the value 1 but for different purposes.

When that clash occurs this can cause a packet to be used after
it has been freed which usually leads to a crash.  Because the
offending value is only returned from dst_output with qdiscs
such as HTB, this bug is normally not visible.

Thanks to Marco Berizzi for his perseverance in tracking this
down.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ac06e03

14 5月, 2008 1 次提交

cipso: Relax too much careful cipso hash function. · 5e0f8923

由 Pavel Emelyanov 提交于 5月 13, 2008

The cipso_v4_cache is allocated to contain CIPSO_V4_CACHE_BUCKETS
buckets. The CIPSO_V4_CACHE_BUCKETS = 1 << CIPSO_V4_CACHE_BUCKETBITS,
where CIPSO_V4_CACHE_BUCKETBITS = 7.

The bucket-selection function for this hash is calculated like this:

  bkt = hash & (CIPSO_V4_CACHE_BUCKETBITS - 1);
                                     ^^^

i.e. picking only 4 buckets of possible 128 :)
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Acked-by: NPaul Moore <paul.moore@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5e0f8923

13 5月, 2008 3 次提交

tcp FRTO: work-around inorder receivers · 79d44516

由 Ilpo Järvinen 提交于 5月 13, 2008

If receiver consumes segments successfully only in-order, FRTO
fallback to conventional recovery produces RTO loop because
FRTO's forward transmissions will always get dropped and need to
be resent, yet by default they're not marked as lost (which are
the only segments we will retransmit in CA_Loss).

Price to pay about this is occassionally unnecessarily
retransmitting the forward transmission(s). SACK blocks help
a bit to avoid this, so it's mainly a concern for NewReno case
though SACK is not fully immune either.

This change has a side-effect of fixing SACKFRTO problem where
it didn't have snd_nxt of the RTO time available anymore when
fallback become necessary (this problem would have only occured
when RTO would occur for two or more segments and ECE arrives
in step 3; no need to figure out how to fix that unless the
TODO item of selective behavior is considered in future).
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: NDamon L. Chesser <damon@damtek.com>
Tested-by: NDamon L. Chesser <damon@damtek.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79d44516

tcp FRTO: Fix fallback to conventional recovery · a1c1f281

由 Ilpo Järvinen 提交于 5月 13, 2008

It seems that commit 009a2e3e ("[TCP] FRTO: Improve
interoperability with other undo_marker users") run into
another land-mine which caused fallback to conventional
recovery to break:

1. Cumulative ACK arrives after FRTO retransmission
2. tcp_try_to_open sees zero retrans_out, clears retrans_stamp
   which should be kept like in CA_Loss state it would be
3. undo_marker change allowed tcp_packet_delayed to return
   true because of the cleared retrans_stamp once FRTO is
   terminated causing LossUndo to occur, which means all loss
   markings FRTO made are reverted.

This means that the conventional recovery basically recovered
one loss per RTT, which is not that efficient. It was quite
unobvious that the undo_marker change broken something like
this, I had a quite long session to track it down because of
the non-intuitiviness of the bug (luckily I had a trivial
reproducer at hand and I was also able to learn to use kprobes
in the process as well :-)).

This together with the NewReno+FRTO fix and FRTO in-order
workaround this fixes Damon's problems, this and the first
mentioned are enough to fix Bugzilla #10063.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: NDamon L. Chesser <damon@damtek.com>
Tested-by: NDamon L. Chesser <damon@damtek.com>
Tested-by: NSebastian Hyrwall <zibbe@cisko.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a1c1f281

net: Allow netdevices to specify needed head/tailroom · f5184d26

由 Johannes Berg 提交于 5月 12, 2008

This patch adds needed_headroom/needed_tailroom members to struct
net_device and updates many places that allocate sbks to use them. Not
all of them can be converted though, and I'm sure I missed some (I
mostly grepped for LL_RESERVED_SPACE)
Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5184d26

08 5月, 2008 2 次提交

net/ipv4: correct RFC 1122 section reference in comment · c67fa027

由 J.H.M. Dassen (Ray) 提交于 5月 08, 2008

RFC 1122 does not have a section 3.1.2.2. The requirement to silently
discard datagrams with a bad checksum is in section 3.2.1.2 instead.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=10611Signed-off-by: NJ.H.M. Dassen (Ray) <jdassen@debian.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c67fa027

tcp FRTO: SACK variant is errorneously used with NewReno · 62ab2227

由 Ilpo Järvinen 提交于 5月 08, 2008

Note: there's actually another bug in FRTO's SACK variant, which
is the causing failure in NewReno case because of the error
that's fixed here. I'll fix the SACK case separately (it's
a separate bug really, though related, but in order to fix that
I need to audit tp->snd_nxt usage a bit).

There were two places where SACK variant of FRTO is getting
incorrectly used even if SACK wasn't negotiated by the TCP flow.
This leads to incorrect setting of frto_highmark with NewReno
if a previous recovery was interrupted by another RTO.

An eventual fallback to conventional recovery then incorrectly
considers one or couple of segments as forward transmissions
though they weren't, which then are not LOST marked during
fallback making them "non-retransmittable" until the next RTO.
In a bad case, those segments are really lost and are the only
one left in the window. Thus TCP needs another RTO to continue.
The next FRTO, however, could again repeat the same events
making the progress of the TCP flow extremely slow.

In order for these events to occur at all, FRTO must occur
again in FRTOs step 3 while the key segments must be lost as
well, which is not too likely in practice. It seems to most
frequently with some small devices such as network printers
that *seem* to accept TCP segments only in-order. In cases
were key segments weren't lost, things get automatically
resolved because those wrongly marked segments don't need to be
retransmitted in order to continue.

I found a reproducer after digging up relevant reports (few
reports in total, none at netdev or lkml I know of), some
cases seemed to indicate middlebox issues which seems now
to be a false assumption some people had made. Bugzilla
#10063 _might_ be related. Damon L. Chesser <damon@damtek.com>
had a reproducable case and was kind enough to tcpdump it
for me. With the tcpdump log it was quite trivial to figure
out.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

62ab2227

05 5月, 2008 2 次提交

ip: Use inline function dst_metric() instead of direct access to dst->metric[] · 5ffc02a1

由 Satoru SATOH 提交于 5月 04, 2008

There are functions to refer to the value of dst->metric[THE_METRIC-1]
directly without use of a inline function "dst_metric" defined in
net/dst.h.

The following patch changes them to use the inline function
consistently.
Signed-off-by: NSatoru SATOH <satoru.satoh@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5ffc02a1

ip: Make use of the inline function dst_metric_locked() · 0bbeafd0

由 Satoru SATOH 提交于 5月 04, 2008

Signed-off-by: NSatoru SATOH <satoru.satoh@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0bbeafd0

03 5月, 2008 1 次提交

net: use get/put_unaligned_* helpers · d3e2ce3b

由 Harvey Harrison 提交于 5月 02, 2008

Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3e2ce3b

02 5月, 2008 2 次提交

ipv4: assign PDE->data before gluing PDE into /proc tree · 84841c3c

由 Denis V. Lunev 提交于 5月 02, 2008

The check for PDE->data != NULL becomes useless after the replacement
of proc_net_fops_create with proc_create_data.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84841c3c

netfilter: assign PDE->data before gluing PDE into /proc tree · 6e79d85d

由 Denis V. Lunev 提交于 5月 02, 2008

Simply replace proc_create and further data assigned with proc_create_data.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Acked-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e79d85d

01 5月, 2008 2 次提交

rename div64_64 to div64_u64 · 6f6d6a1a

由 Roman Zippel 提交于 5月 01, 2008

Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide.  Move its definition
to math64.h as currently no architecture overrides the generic implementation.
 They can still override it of course, but the duplicated declarations are
avoided.
Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6f6d6a1a

net: fix returning void-valued expression warnings · ab59859d

由 Harvey Harrison 提交于 5月 01, 2008

drivers/net/8390.c:37:2: warning: returning void-valued expression
drivers/net/bnx2.c:1635:3: warning: returning void-valued expression
drivers/net/xen-netfront.c:1806:2: warning: returning void-valued expression
net/ipv4/tcp_hybla.c:105:3: warning: returning void-valued expression
net/ipv4/tcp_vegas.c:171:3: warning: returning void-valued expression
net/ipv4/tcp_veno.c:123:3: warning: returning void-valued expression
net/sysctl_net.c:85:2: warning: returning void-valued expression
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Acked-by: NAlan Cox <alan@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab59859d

30 4月, 2008 3 次提交

tcp: Overflow bug in Vegas · 15913114

由 Lachlan Andrew 提交于 4月 30, 2008

From: Lachlan Andrew <lachlan.andrew@gmail.com>

There is an overflow bug in net/ipv4/tcp_vegas.c for large BDPs
(e.g. 400Mbit/s, 400ms).  The multiplication (old_wnd *
vegas->baseRTT) << V_PARAM_SHIFT overflows a u32.

[ Fix tcp_veno.c too, it has similar calculations. -DaveM ]
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

15913114

[IPv4] UFO: prevent generation of chained skb destined to UFO device · be9164e7

由 Kostya B 提交于 4月 29, 2008

Problem: ip_append_data() could wrongly generate a chained skb for
devices which support UFO.  When sk_write_queue is not empty
(e.g. MSG_MORE), __instead__ of appending data into the next nr_frag
of the queued skb, a new chained skb is created.

I would normally assume UFO device should get data in nr_frags and not
in frag_list.  Later the udp4_hwcsum_outgoing() resets csum to NONE
and skb_gso_segment() has oops.

Proposal:
1. Even length is less than mtu, employ ip_ufo_append_data()
and append data to the __existed__ skb in the sk_write_queue.

2. ip_ufo_append_data() is fixed due to a wrong manipulation of
peek-ing and later enqueue-ing of the same skb.  Now, enqueuing is
always performed, because on error the further
ip_flush_pending_frames() would release the queued skb.
Signed-off-by: NKostya B <bkostya@hotmail.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

be9164e7

ipv4: annotate a few functions __init in ipconfig.c · 45e741b8

由 Sam Ravnborg 提交于 4月 29, 2008

A few functions are only used from __init context.
So annotate these with __init for consistency and silence
the following warnings:

WARNING: net/ipv4/built-in.o(.text+0x2a876): Section mismatch
         in reference from the function ic_bootp_init() to
         the variable .init.data:bootp_packet_type
WARNING: net/ipv4/built-in.o(.text+0x2a907): Section mismatch
         in reference from the function ic_bootp_cleanup() to
         the variable .init.data:bootp_packet_type

Note: The warnings only appear with CONFIG_DEBUG_SECTION_MISMATCH=y
Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45e741b8

29 4月, 2008 7 次提交

Remove duplicated unlikely() in IS_ERR() · 801678c5

由 Hirofumi Nakagawa 提交于 4月 29, 2008

Some drivers have duplicated unlikely() macros.  IS_ERR() already has
unlikely() in itself.

This patch cleans up such pointless code.
Signed-off-by: NHirofumi Nakagawa <hnakagawa@miraclelinux.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NJeff Garzik <jeff@garzik.org>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Acked-by: NMike Frysinger <vapier@gentoo.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

801678c5

netfilter: nf_conntrack: padding breaks conntrack hash on ARM · 443a70d5

由 Philip Craig 提交于 4月 29, 2008

commit 0794935e "[NETFILTER]: nf_conntrack: optimize hash_conntrack()"
results in ARM platforms hashing uninitialised padding.  This padding
doesn't exist on other architectures.

Fix this by replacing NF_CT_TUPLE_U_BLANK() with memset() to ensure
everything is initialised.  There were only 4 bytes that
NF_CT_TUPLE_U_BLANK() wasn't clearing anyway (or 12 bytes on ARM).
Signed-off-by: NPhilip Craig <philipc@snapgear.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

443a70d5

ipv4: Update MTU to all related cache entries in ip_rt_frag_needed() · 0010e465

由 Timo Teras 提交于 4月 29, 2008

Add struct net_device parameter to ip_rt_frag_needed() and update MTU to
cache entries where ifindex is specified. This is similar to what is
already done in ip_rt_redirect().
Signed-off-by: NTimo Teras <timo.teras@iki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0010e465

net: Add compat support for getsockopt (MCAST_MSFILTER) · 42908c69

由 David L Stevens 提交于 4月 29, 2008

This patch adds support for getsockopt for MCAST_MSFILTER for
both IPv4 and IPv6. It depends on the previous setsockopt patch,
and uses the same method.
Signed-off-by: NDavid L Stevens <dlstevens@us.ibm.com>
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42908c69

ipvs: fix oops in backup for fwmark conn templates · 2ad17def

由 Julian Anastasov 提交于 4月 29, 2008

	Fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=10556
where conn templates with protocol=IPPROTO_IP can oops backup box.

        Result from ip_vs_proto_get() should be checked because
protocol value can be invalid or unsupported in backup. But
for valid message we should not fail for templates which use
IPPROTO_IP. Also, add checks to validate message limits and
connection state. Show state NONE for templates using IPPROTO_IP.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ad17def

netfilter: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets · 9a732ed6

由 Arnaud Ebalard 提交于 4月 29, 2008

While reinjecting *bigger* modified versions of IPv6 packets using
libnetfilter_queue, things work fine on a 2.6.24 kernel (2.6.22 too)
but I get the following on recents kernels (2.6.25, trace below is
against today's net-2.6 git tree):

skb_over_panic: text:c04fddb0 len:696 put:632 head:f7592c00 data:f7592c00 tail:0xf7592eb8 end:0xf7592e80 dev:eth0
------------[ cut here ]------------
invalid opcode: 0000 [#1] PREEMPT 
Process sendd (pid: 3657, ti=f6014000 task=f77c31d0 task.ti=f6014000)
Stack: c071e638 c04fddb0 000002b8 00000278 f7592c00 f7592c00 f7592eb8 f7592e80 
       f763c000 f6bc5200 f7592c40 f6015c34 c04cdbfc f6bc5200 00000278 f6015c60 
       c04fddb0 00000020 f72a10c0 f751b420 00000001 0000000a 000002b8 c065582c 
Call Trace:
 [<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0
 [<c04cdbfc>] ? skb_put+0x3c/0x40
 [<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0
 [<c04fd115>] ? nfnetlink_rcv_msg+0xf5/0x160
 [<c04fd03e>] ? nfnetlink_rcv_msg+0x1e/0x160
 [<c04fd020>] ? nfnetlink_rcv_msg+0x0/0x160
 [<c04f8ed7>] ? netlink_rcv_skb+0x77/0xa0
 [<c04fcefc>] ? nfnetlink_rcv+0x1c/0x30
 [<c04f8c73>] ? netlink_unicast+0x243/0x2b0
 [<c04cfaba>] ? memcpy_fromiovec+0x4a/0x70
 [<c04f9406>] ? netlink_sendmsg+0x1c6/0x270
 [<c04c8244>] ? sock_sendmsg+0xc4/0xf0
 [<c011970d>] ? set_next_entity+0x1d/0x50
 [<c0133a80>] ? autoremove_wake_function+0x0/0x40
 [<c0118f9e>] ? __wake_up_common+0x3e/0x70
 [<c0342fbf>] ? n_tty_receive_buf+0x34f/0x1280
 [<c011d308>] ? __wake_up+0x68/0x70
 [<c02cea47>] ? copy_from_user+0x37/0x70
 [<c04cfd7c>] ? verify_iovec+0x2c/0x90
 [<c04c837a>] ? sys_sendmsg+0x10a/0x230
 [<c011967a>] ? __dequeue_entity+0x2a/0xa0
 [<c011970d>] ? set_next_entity+0x1d/0x50
 [<c0345397>] ? pty_write+0x47/0x60
 [<c033d59b>] ? tty_default_put_char+0x1b/0x20
 [<c011d2e9>] ? __wake_up+0x49/0x70
 [<c033df99>] ? tty_ldisc_deref+0x39/0x90
 [<c033ff20>] ? tty_write+0x1a0/0x1b0
 [<c04c93af>] ? sys_socketcall+0x7f/0x260
 [<c0102ff9>] ? sysenter_past_esp+0x6a/0x91
 [<c05f0000>] ? snd_intel8x0m_probe+0x270/0x6e0
 =======================
Code: 00 00 89 5c 24 14 8b 98 9c 00 00 00 89 54 24 0c 89 5c 24 10 8b 40 50 89 4c 24 04 c7 04 24 38 e6 71 c0 89 44 24 08 e8 c4 46 c5 ff <0f> 0b eb fe 55 89 e5 56 89 d6 53 89 c3 83 ec 0c 8b 40 50 39 d0 
EIP: [<c04ccdfc>] skb_over_panic+0x5c/0x60 SS:ESP 0068:f6015bf8


Looking at the code, I ended up in nfq_mangle() function (called by
nfqnl_recv_verdict()) which performs a call to skb_copy_expand() due to
the increased size of data passed to the function. AFAICT, it should ask
for 'diff' instead of 'diff - skb_tailroom(e->skb)'. Because the
resulting sk_buff has not enough space to support the skb_put(skb, diff)
call a few lines later, this results in the call to skb_over_panic().

The patch below asks for allocation of a copy with enough space for
mangled packet and the same amount of headroom as old sk_buff. While
looking at how the regression appeared (e2b58a67), I noticed the same
pattern in ipq_mangle_ipv6() and ipq_mangle_ipv4(). The patch corrects
those locations too.

Tested with bigger reinjected IPv6 packets (nfqnl_mangle() path), things
are ok (2.6.25 and today's net-2.6 git tree).
Signed-off-by: NArnaud Ebalard <arno@natisbad.org>
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a732ed6

tcp: Limit cwnd growth when deferring for GSO · 246eb2af

由 John Heffner 提交于 4月 29, 2008

This fixes inappropriately large cwnd growth on sender-limited flows
when GSO is enabled, limiting cwnd growth to 64k.
Signed-off-by: NJohn Heffner <johnwheffner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

246eb2af

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年