Commits · 1f8ae0a21d83f43006d7f6d2862e921dbf2eeddd · openanolis / cloud-kernel

14 Mar, 2009 1 commit

Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying... · ead2ceb0

Authored by Neil Horman on Mar 11, 2009

Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/skbuff.h |    4 +++-
 net/core/datagram.c    |    2 +-
 net/core/skbuff.c      |   22 ++++++++++++++++++++++
 net/ipv4/arp.c         |    2 +-
 net/ipv4/udp.c         |    2 +-
 net/packet/af_packet.c |    2 +-
 6 files changed, 29 insertions(+), 5 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Mar, 2009 1 commit

tcp: allow timestamps even if SYN packet has tsval=0 · fc1ad92d

Authored by Eric Dumazet on Mar 11, 2009

Some systems send SYN packets with apparently wrong RFC1323 timestamp
option values [timestamp tsval=0 tsecr=0].
It might be for security reasons (http://www.secuobs.com/plugs/25220.shtml )

Linux TCP stack ignores this option and sends back a SYN+ACK packet
without timestamp option, thus many TCP flows cannot use timestamps
and lose some benefit of RFC1323.

Other operating systems seem not to care about the initial tsval value, and let
TCP flows negotiate the timestamp option.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Mar, 2009 1 commit

net: convert usage of packet_type to read_mostly · 7546dd97

Authored by Stephen Hemminger on Mar 09, 2009

Protocols that use packet_type can place it in the __read_mostly section
for better locality. Eliminate any unnecessary initializations to NULL.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Mar, 2009 3 commits

tcp: Like icmp use register_pernet_subsys · 2f20d2e6

Authored by Eric W. Biederman on Feb 22, 2009

To remove the possibility of packets flying around when network
devices are being cleaned up, use register_pernet_subsys instead of
register_pernet_device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: Fix icmp shutdown. · 6eb07772

Authored by Eric W. Biederman on Feb 22, 2009

Recently I had a kernel panic in icmp_send during a network namespace
cleanup.  There were packets in the arp queue that failed to be sent
and we attempted to generate an ICMP host unreachable message, but
failed because icmp_sk_exit had already been called.

The network devices are removed from a network namespace and their
arp queues are flushed before we attempt to shut down subsystems,
so this error should have been impossible.

It turns out icmp_init is using register_pernet_device instead
of register_pernet_subsys, which resulted in icmp being shut down
while we still had the possibility of packets in flight, making
a nasty NULL pointer dereference in interrupt context possible.

Changing this to register_pernet_subsys fixes the problem in
my testing.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: tcp_init_wl / tcp_update_wl argument cleanup · ee7537b6

Authored by Hantzis Fotis on Mar 02, 2009

The above functions from include/net/tcp.h have been defined with an
argument that they never use. The argument is 'u32 ack', which is never
used inside the function body, and thus it can be removed. The rest of
the patch involves the necessary changes to the callers of the
above two functions.
Signed-off-by: Hantzis Fotis <xantzis@ceid.upatras.gr>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Mar, 2009 14 commits

tcp: get rid of two unnecessary u16s in TCP skb flags copying · 9ce01461

Authored by Ilpo Järvinen on Feb 28, 2009

I guess these fields were one day 16-bit in the struct but
nowadays they're just using 8 bits anyway.

This is just a precaution; it didn't result in any change in my
case, but who knows what all those varying gcc versions &
options do. I've been told that 16-bit is not so nice with
some cpus.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: in sendmsg/pages open code the real goto target · 0d6a775e

Authored by Ilpo Järvinen on Feb 28, 2009

copied was assigned zero right before the goto, so if (copied)
cannot ever be true.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: kill eff_sacks "cache", the sole user can calculate itself · cabeccbd

Authored by Ilpo Järvinen on Feb 28, 2009

Also fixes an insignificant bug that would cause sending of a stale
SACK block (would occur in some corner cases).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add helper for AI algorithm · 758ce5c8

Authored by Ilpo Järvinen on Feb 28, 2009

It seems that the implementation in yeah was inconsistent with what
the others did, as it would increase cwnd one ack earlier than the
others do.

Size benefits:

  bictcp_cong_avoid       |  -36
  tcp_cong_avoid_ai       |  +52
  bictcp_cong_avoid       |  -34
  tcp_scalable_cong_avoid |  -36
  tcp_veno_cong_avoid     |  -12
  tcp_yeah_cong_avoid     |  -38

= -104 bytes total
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
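As a rough illustration of the shared additive-increase (AI) helper that commit 758ce5c8 describes, the following user-space sketch counts ACKs and grows the window by one segment once a threshold is reached. The struct `cwnd_state` and the function name `cong_avoid_ai` are hypothetical stand-ins chosen for this example; the field names mirror the usual TCP socket fields, but this is not the kernel's code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few congestion fields such a helper touches. */
struct cwnd_state {
	uint32_t snd_cwnd;       /* congestion window, in segments */
	uint32_t snd_cwnd_cnt;   /* ACKs counted since the last increase */
	uint32_t snd_cwnd_clamp; /* upper bound for snd_cwnd */
};

/* Grow cwnd by one segment once w ACKs have been counted (sketch only). */
static void cong_avoid_ai(struct cwnd_state *s, uint32_t w)
{
	if (s->snd_cwnd_cnt >= w) {
		if (s->snd_cwnd < s->snd_cwnd_clamp)
			s->snd_cwnd++;
		s->snd_cwnd_cnt = 0;
	} else {
		s->snd_cwnd_cnt++;
	}
}
```

Keeping the counter logic in one place is what lets every congestion-control module increase cwnd on the same ACK, which is the inconsistency the commit message mentions for yeah.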

htcp: merge icsk_ca_state compare · 571a5dd8

Authored by Ilpo Järvinen on Feb 28, 2009

Similar to what is done elsewhere in TCP code when double
state checks are being done.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: drop unnecessary local var in collapse · e6c7d085

Authored by Ilpo Järvinen on Feb 28, 2009

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: cleanup ca_state mess in tcp_timer · bc079e9e

Authored by Ilpo Järvinen on Feb 28, 2009

Redundant checks made indentation impossible to follow.
However, it might be useful to make this a ca_state+is_sack
indexed array.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: separate timeout marking loop to it's own function · 7363a5b2

Authored by Ilpo Järvinen on Feb 28, 2009

Some comment about its current state added. So far I have
seen very few cases where the thing is actually useful,
usually just marginally (though admittedly I don't usually
see top of window losses where it seems possible that there
could be some gain). Instead, more often the cases suffer
from an L-marking spike, which is certainly not desirable
(I'll bury improving it in my todo list, but at a low
prio position).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove redundant code from tcp_mark_lost_retrans · d0af4160

Authored by Ilpo Järvinen on Feb 28, 2009

Arnd Hannemann <hannemann@nets.rwth-aachen.de> noticed and was
puzzled by the fact that !tcp_is_fack(tp) leads to an early return
near the beginning and then later on tcp_is_fack(tp) was still
used in an if condition. The later check was a left-over from
RFC3517 SACK stuff (== !tcp_is_fack(tp) behavior nowadays) as
there wasn't a clear way to handle this particular check
cheaply in the spirit of RFC3517 (using only SACK blocks, not
holes + SACK blocks as with FACK). I sort of left it there as
a reminder, but since it's confusing other people, just remove
it and comment the missing-feature stuff instead.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix corner case issue in segmentation during rexmitting · 02276f3c

Authored by Ilpo Järvinen on Feb 28, 2009

If cur_mss grew very recently so that the previously G/TSOed skb
now fits well into a single segment, it would get sent in
parts unless we calculate the # of segments again. This corner case
could happen e.g. after an mtu probe completes or when fewer SACK
blocks than before are required for the opposite direction.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Don't clear hints when tcp_fragmenting · d3d2ae45

Authored by Ilpo Järvinen on Feb 28, 2009

1) We didn't remove any skbs, so no need to handle stale refs.

2) scoreboard_skb_hint is trivial, no timestamps were changed
   so no need to clear that one

3) lost_skb_hint needs tweaking similar to that of
   tcp_sacktag_one().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: deferring in middle of queue makes very little sense · 62ad2761

Authored by Ilpo Järvinen on Feb 28, 2009

If skb can be sent right away, we certainly should do that
if it's in the middle of the queue because it won't get
more data into it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix lost_cnt_hint miscounts · 59a08cba

Authored by Ilpo Järvinen on Feb 28, 2009

It is possible that lost_cnt_hint underflows in
tcp_clean_rtx_queue because the cumulative ACK can cover
the segment that lost_skb_hint points to only partially,
which means that the hint is not cleared, contrary to what
my (earlier) comment claimed.

Also, I don't think what I ended up writing about the non-trivial
case there is what I intended to say. It was not supposed
to happen that the hint won't get cleared and we underflow
in any scenario.

In general, this is quite hard to trigger in practice.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: don't backtrack to sacked skbs · ac11ba75

Authored by Ilpo Järvinen on Feb 28, 2009

Backtracking to sacked skbs is a horrible performance killer
since the hint cannot be advanced successfully past them...
...And it's totally unnecessary too.

In theory this is a 2.6.27..28 regression, but I doubt anybody
can make .28 perform worse because of other TCP
improvements.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Mar, 2009 1 commit

tcp: fix retrans_out leaks · 9ec06ff5

Authored by Ilpo Järvinen on Mar 01, 2009

There are conflicting assumptions in shifting: the caller assumes
that dupsack results in S'ed skbs (or a part of it) for sure but
never gave a hint to tcp_sacktag_one when dsack is actually in
use. Thus DSACK retrans_out -= pcount was not taken and the
counter became out of sync. Remove the obstacle from that information
flow to get DSACKs accounted in tcp_sacktag_one as expected.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Feb, 2009 1 commit

inet fragments: fix sparse warning: context imbalance · 56bca31f

Authored by Hannes Eder on Feb 25, 2009

Impact: Attribute function with __releases(...)

Fix this sparse warning:
  net/ipv4/inet_fragment.c:276:35: warning: context imbalance in 'inet_frag_find' - unexpected unlock
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Feb, 2009 4 commits

ipip: used time_before for comparing jiffies · 26d94b46

Authored by Wei Yongjun on Feb 24, 2009

The function time_before is more robust for comparing
jiffies against other values.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
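The robustness that commit 26d94b46 refers to comes from comparing tick counters by their signed difference, so that the result stays correct when the counter wraps around. The sketch below shows the idea in user space with a 32-bit counter; `tick_before` is a hypothetical name for this example, not the kernel macro itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if tick count a is earlier than b, even across counter wraparound.
 * The unsigned subtraction wraps modulo 2^32; reading the result as signed
 * gives the direction, as long as the two values are less than 2^31 apart.
 */
static bool tick_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}
```

With a plain `a < b` comparison, a value taken just before the wrap (for instance 0xFFFFFFF0) would wrongly appear later than one taken just after it (for instance 0x10); the signed-difference form orders them correctly.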

gre: used time_before for comparing jiffies · da6185d8

Authored by Wei Yongjun on Feb 24, 2009

The function time_before is more robust for comparing
jiffies against other values.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: change nlmsg_notify() return value logic · 1ce85fe4

Authored by Pablo Neira Ayuso on Feb 24, 2009

This patch changes the return value of nlmsg_notify() as follows:

If NETLINK_BROADCAST_ERROR is set by any of the listeners and
an error in the delivery happened, return the broadcast error;
else if there are no listeners apart from the socket that
requested a change with the echo flag, return the result of the
unicast notification. Thus, with this patch, the unicast
notification is handled in the same way as a broadcast listener
that has set the NETLINK_BROADCAST_ERROR socket flag.

This patch is useful in case that the caller of nlmsg_notify()
wants to know the result of the delivery of a netlink notification
(including the broadcast delivery) and take any action in case
that the delivery failed. For example, ctnetlink can drop packets
if the event delivery failed to provide reliable logging and
state-synchronization at the cost of dropping packets.

This patch also modifies the rtnetlink code to ignore the return
value of rtnl_notify() in all callers. The function rtnl_notify()
(before this patch) returned the error of the unicast notification
which made rtnl_set_sk_err() report errors to all listeners. This
is not of any help since the origin of the change (the socket that
requested the echoing) notices the ENOBUFS error if the notification
fails and should resync itself.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
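The return-value rule that commit 1ce85fe4 describes for nlmsg_notify() can be modelled as a small pure function. This is only a sketch under assumptions: `notify_result` and its parameters are hypothetical, the two error arguments stand for the already-computed outcomes of the broadcast and unicast deliveries (0 or a negative errno), and treating -ESRCH as "no other listeners" is an assumption of this example rather than something the commit message states.

```c
#include <errno.h>

/*
 * Hypothetical model of the described rule.
 *   has_group:     a broadcast to a multicast group was attempted
 *   broadcast_err: its result (non-zero only if an error is to be reported)
 *   report:        the requesting socket asked for an echo (unicast)
 *   unicast_err:   result of that unicast notification
 */
static int notify_result(int has_group, int broadcast_err,
			 int report, int unicast_err)
{
	int err = 0;

	if (has_group)
		err = broadcast_err;
	/* No broadcast failure, or nobody else was listening: the echo decides. */
	if (report && (err == 0 || err == -ESRCH))
		err = unicast_err;
	return err;
}
```

Under this model, a broadcast failure such as -ENOBUFS takes precedence, while a failed echo is reported only when the broadcast side had nothing to say.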

tcp_scalable: Update malformed & dead url · a52b8bd3

Authored by Joe Perches on Feb 24, 2009

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Feb, 2009 1 commit

Doc: Refer to ip-sysctl.txt for strict vs. loose rp_filter mode · d18921a0

Authored by Jesper Dangaard Brouer on Feb 23, 2009

The IP_ADVANCED_ROUTER Kconfig describes the rp_filter
proc option.  Recent changes added a loose mode.
Instead of documenting this change in two places, refer to
the document describing it:
 Documentation/networking/ip-sysctl.txt

I'm considering moving the rp_filter description away
from the Kconfig file into ip-sysctl.txt.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Feb, 2009 7 commits

tcp: Like icmp use register_pernet_subsys · 6a1b3054

Authored by Eric W. Biederman on Feb 22, 2009

To remove the possibility of packets flying around when network
devices are being cleaned up, use register_pernet_subsys instead of
register_pernet_device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: Fix icmp shutdown. · 959d2726

Authored by Eric W. Biederman on Feb 22, 2009

Recently I had a kernel panic in icmp_send during a network namespace
cleanup.  There were packets in the arp queue that failed to be sent
and we attempted to generate an ICMP host unreachable message, but
failed because icmp_sk_exit had already been called.

The network devices are removed from a network namespace and their
arp queues are flushed before we attempt to shut down subsystems,
so this error should have been impossible.

It turns out icmp_init is using register_pernet_device instead
of register_pernet_subsys, which resulted in icmp being shut down
while we still had the possibility of packets in flight, making
a nasty NULL pointer dereference in interrupt context possible.

Changing this to register_pernet_subsys fixes the problem in
my testing.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Clean whitespaces in net/ipv4/Kconfig. · a6e8f27f

Authored by Jesper Dangaard Brouer on Feb 22, 2009

While going through net/ipv4/Kconfig, clean up whitespace.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Fix rp_filter description in net/ipv4/Kconfig. · b2cc46a8

Authored by Jesper Dangaard Brouer on Feb 22, 2009

The reverse path filter (rp_filter) will NOT get enabled
when enabling forwarding.  I read the code and tested it
in practice.

Most distributions do enable it in startup scripts.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: ipip compile warning · 5747a1aa

Authored by Stephen Hemminger on Feb 22, 2009

Get rid of compile warning about non-const format
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: add loose reverse path filtering · c1cf8422

Authored by Stephen Hemminger on Feb 20, 2009

Extend existing reverse path filter option to allow strict or loose
filtering. (See http://en.wikipedia.org/wiki/Reverse_path_filtering).

For compatibility with existing usage, the value 1 is chosen for strict mode
and 2 for loose mode.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
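To make the difference between the strict (1) and loose (2) modes of commit c1cf8422 concrete, here is a deliberately simplified user-space model. Everything in it is hypothetical: the `struct route` table, the `RP_*` names, and `rp_filter_accept` are invented for illustration, and the real kernel consults its routing tables rather than a flat array. Strict mode here accepts a packet only if a route back to its source uses the interface it arrived on; loose mode accepts it if any route back to the source exists.

```c
#include <stdint.h>

enum { RP_OFF = 0, RP_STRICT = 1, RP_LOOSE = 2 };

/* Hypothetical flat route table entry: prefix/mask reachable via ifindex. */
struct route {
	uint32_t prefix;
	uint32_t mask;
	int ifindex;
};

/* Return 1 to accept a packet from src that arrived on in_if, 0 to drop. */
static int rp_filter_accept(const struct route *rt, int n, int mode,
			    uint32_t src, int in_if)
{
	int i;

	if (mode == RP_OFF)
		return 1;
	for (i = 0; i < n; i++) {
		if ((src & rt[i].mask) != rt[i].prefix)
			continue;
		/* Loose: any route back will do. Strict: it must use in_if. */
		if (mode == RP_LOOSE || rt[i].ifindex == in_if)
			return 1;
	}
	return 0;
}
```

Loose mode is the one that tolerates asymmetric routing: a packet from 10.0.0.1 arriving on an interface other than the one that routes back to 10.0.0.0/8 is dropped in strict mode but accepted in loose mode.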

cipso: Fix documentation comment · 586c2500

Authored by Paul Moore on Feb 20, 2009

The CIPSO protocol engine incorrectly stated that the FIPS-188 specification
could be found in the kernel's Documentation directory. This patch corrects
that by removing the comment and directing users to the FIPS-188 document
hosted online. For the sake of completeness I've also included a link to the
CIPSO draft specification on the NetLabel website.

Thanks to Randy Dunlap for spotting the error and letting me know.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>

22 Feb, 2009 1 commit

tcp: Always set urgent pointer if it's beyond snd_nxt · 7691367d

Authored by Herbert Xu on Feb 21, 2009

Our TCP stack does not set the urgent flag if the urgent pointer
does not fit in 16 bits, i.e., if it is more than 64K from the
sequence number of a packet.

This behaviour is different from the BSDs, and clearly contradicts
the purpose of urgent mode, which is to send the notification
(though not necessarily the associated data) as soon as possible.
Our current behaviour may in fact delay the urgent notification
indefinitely if the receiver window does not open up.

Simply matching BSD however may break legacy applications which
incorrectly rely on the out-of-band delivery of urgent data, and
conversely the in-band delivery of non-urgent data.

Alexey Kuznetsov suggested a safe solution of following BSD only
if the urgent pointer itself has not yet been transmitted.  This
way we guarantee that when the remote end sees the packet with
non-urgent data marked as urgent due to wrap-around we would have
advanced the urgent pointer beyond, either to the actual urgent
data or to an as-yet untransmitted packet.

The only potential downside is that applications on the remote
end may see multiple SIGURG notifications.  However, this would
occur anyway with other TCP stacks.  More importantly, the outcome
of such a duplicate notification is likely to be harmless since
the signal itself does not carry any information other than the
fact that we're in urgent mode.

Thanks to Ilpo Järvinen for fixing a critical bug in this and
Jeff Chua for reporting that bug.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Feb, 2009 1 commit

tcp: remove obsoleted comment about different passes · 5209921c

Authored by Ilpo Järvinen on Feb 18, 2009

This is obsolete since the passes got combined.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Feb, 2009 2 commits

net: replace commatas with semicolons · 1c10c49d

Authored by Thomas Gleixner on Feb 16, 2009

Impact: syntax fix

Interestingly enough this compiles w/o any complaints:

	orphans = percpu_counter_sum_positive(&tcp_orphan_count),
	sockets = percpu_counter_sum_positive(&tcp_sockets_allocated),
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
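The reason the lines quoted in commit 1c10c49d compile without complaint is C's comma operator: assignments separated by commas form a single expression statement, evaluated left to right, as long as the sequence eventually ends in a semicolon. The small example below illustrates this; `sum_with_comma` is a hypothetical function written for this note.

```c
/* Two assignments joined by a comma form one expression statement. */
static int sum_with_comma(int x, int y)
{
	int a, b;

	a = x,
	b = y;	/* parsed as: (a = x), (b = y); */
	return a + b;
}
```

The behaviour is the same as with semicolons here, which is why the typo went unnoticed, but the form is fragile: it stops compiling, or changes meaning, as soon as the following line is a declaration or a statement that cannot be part of an expression.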

ip: support for TX timestamps on UDP and RAW sockets · 51f31cab

Authored by Patrick Ohly on Feb 12, 2009

Instructions for time stamping outgoing packets are taken from the
socket layer and later copied into the new skb.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Feb, 2009 2 commits

gro: Optimise TCP packet reception · aa6320d3

Authored by Herbert Xu on Feb 08, 2009

gro: Optimise TCP packet reception

As this function can be called more than half a million times for
10GbE, it's important to optimise it as much as we can.

This patch uses bit ops to replace logical ops, as well as open coding
memcmp to exploit alignment properties.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

gro: Optimise IPv4 packet reception · a5ad24be

Authored by Herbert Xu on Feb 08, 2009

As this function can be called more than half a million times for
10GbE, it's important to optimise it as much as we can.

This patch does some obvious changes to use 2-byte and 4-byte
operations instead of byte-oriented ones where possible.  Bit
ops are also used to replace logical ops to reduce branching.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
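The "bit ops instead of logical ops" technique mentioned in the two gro commits above can be illustrated with a generic header-field comparison. This is a sketch of the general idea only; the function names and the choice of fields are hypothetical and do not reproduce the kernel's GRO code. XOR yields zero exactly when two values are equal, and OR accumulates the differences, so several comparisons collapse into one test without short-circuit branches.

```c
#include <stdint.h>

/* Branchy form: each || may compile to a separate conditional jump. */
static int differs_logical(uint32_t a1, uint32_t b1, uint16_t a2, uint16_t b2)
{
	return a1 != b1 || a2 != b2;
}

/* Branch-free form: XOR is zero only for equal operands; OR accumulates. */
static int differs_bitwise(uint32_t a1, uint32_t b1, uint16_t a2, uint16_t b2)
{
	return ((a1 ^ b1) | (uint32_t)(a2 ^ b2)) != 0;
}
```

Both functions return the same result for every input; the second trades a possible early exit for a straight-line sequence, which tends to pay off when the fields usually match, as they do for consecutive packets of one flow.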

openanolis / cloud-kernel: synced successfully more than 1 year ago