Commits · f40c8174d3c21bf178283f3ef3aa8c7bf238fdec · openeuler / Kernel

21 Mar, 2008 6 commits

[NETNS][IPV4] tcp - make proc handle the network namespaces · f40c8174

Committed by Daniel Lezcano on Mar 21, 2008

This patch, like the udp proc one, makes the proc functions take into
account which namespace the socket belongs to.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS][IPV6] tcp - assign the netns for timewait sockets · 8d9f1744

Committed by Daniel Lezcano on Mar 21, 2008

Copy the network namespace from the socket to the timewait socket.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS][IPV6] udp - make proc handle the network namespace · a91275ef

Committed by Daniel Lezcano on Mar 21, 2008

This patch makes the common udp proc functions decide which sockets to
show by taking into account the namespace each socket belongs to.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
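
As an illustration of the idea only (not the kernel code), the namespace-aware proc walk can be sketched as a filter over the socket hash table. The `Sock` class, the string namespace labels, and `visible_sockets` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sock:
    port: int
    netns: str  # assumption: a string stands in for the kernel's namespace pointer


def visible_sockets(hash_table, netns):
    """Yield only the sockets that belong to the namespace reading the proc file."""
    for sk in hash_table:
        if sk.netns == netns:
            yield sk


# Hypothetical table mixing sockets from two namespaces.
table = [Sock(53, "init_net"), Sock(5353, "container_a"), Sock(123, "init_net")]
ports_in_init = [sk.port for sk in visible_sockets(table, "init_net")]
print(ports_in_init)
```

A reader in one namespace never sees the sockets of another, which is the behaviour the patch describes for the proc files.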

[NET]: Add per-connection option to set max TSO frame size · 82cc1a7a

Committed by Peter P Waskiewicz Jr on Mar 21, 2008

Update: My mailer ate one of Jarek's feedback mails...  Fixed the
parameter in netif_set_gso_max_size() to be u32, not u16.  Fixed the
whitespace issue due to a patch import botch.  Changed the types from
u32 to unsigned int to be more consistent with other variables in the
area.  Also brought the patch up to the latest net-2.6.26 tree.

Update: Made gso_max_size container 32 bits, not 16.  Moved the
location of gso_max_size within netdev to be less hotpath.  Made more
consistent names between the sock and netdev layers, and added a
define for the max GSO size.

Update: Respun for net-2.6.26 tree.

Update: changed max_gso_frame_size and sk_gso_max_size from signed to
unsigned - thanks Stephen!

This patch adds the ability for device drivers to control the size of
the TSO frames being sent to them, per TCP connection.  By setting the
netdevice's gso_max_size value, the socket layer will set the GSO
frame size based on that value.  This will propagate into the TCP
layer, and send TSOs of that size to the hardware.

This can be desirable to help tune the bursty nature of TSO on a
per-adapter basis, where one may have 1 GbE and 10 GbE devices
coexisting in a system, one running multiqueue and the other not, etc.

This can also be desirable for devices that cannot support full 64 KB
TSO's, but still want to benefit from some level of segmentation
offloading.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
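
A minimal sketch of the flow described above, assuming a 64 KB default cap: the driver lowers the device limit, the socket copies it, and the TCP layer sizes each burst to a whole number of MSS-sized segments below that limit. Only `netif_set_gso_max_size` and the `gso_max_size` field name come from the commit message; `sock_setup_caps`, `tso_frame_len`, and the classes are hypothetical.

```python
from dataclasses import dataclass

GSO_MAX_SIZE = 65536  # assumption: the "define for the max GSO size" is 64 KB


@dataclass
class NetDevice:
    name: str
    gso_max_size: int = GSO_MAX_SIZE


@dataclass
class Sock:
    gso_max_size: int = GSO_MAX_SIZE


def netif_set_gso_max_size(dev, size):
    """Driver-side knob: cap the TSO frame size this device accepts."""
    dev.gso_max_size = size


def sock_setup_caps(sk, dev):
    """Hypothetical helper: the socket inherits the limit of its output device."""
    sk.gso_max_size = dev.gso_max_size


def tso_frame_len(sk, queued_bytes, mss):
    """Largest burst that fits the socket limit, rounded down to whole segments."""
    limit = min(queued_bytes, sk.gso_max_size)
    if limit < mss:
        return limit
    return (limit // mss) * mss


dev = NetDevice("eth1")
netif_set_gso_max_size(dev, 16384)  # device cannot take full 64 KB frames
sk = Sock()
sock_setup_caps(sk, dev)
print(tso_frame_len(sk, 100000, 1448))
```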

[TCP]: Fix shrinking windows with window scaling · 607bfbf2

Committed by Patrick McHardy on Mar 20, 2008

When selecting a new window, tcp_select_window() tries not to shrink
the offered window by using the maximum of the remaining offered window
size and the newly calculated window size. The newly calculated window
size is always a multiple of the window scaling factor, the remaining
window size however might not be since it depends on rcv_wup/rcv_nxt.
This means we're effectively shrinking the window when scaling it down.


The dump below shows the problem (scaling factor 2^7):

- Window size of 557 (71296) is advertised, up to 3111907257:

IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>

- New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
  below the last end:

IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>

The number 40 results from downscaling the remaining window:

3111907257 - 3111841425 = 65832
65832 / 2^7 = 514
65832 % 2^7 = 40

If the sender uses up the entire window before it is shrunk, this can have
chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
will notice that the window has been shrunk since tcp_wnd_end() is before
tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
This will fail the receiver's checks in tcp_sequence() however since it
is before its tp->rcv_wup, making it respond with a dupack.

If both sides are in this condition, this leads to a constant flood of
ACKs until the connection times out.

Make sure the window is never shrunk by aligning the remaining window to
the window scaling factor.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
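
The arithmetic from the dump can be replayed in a few lines. This is a sketch rather than the kernel code: `align_up` is a hypothetical helper, and rounding the remaining window up to the next multiple of the scaling unit is an assumption consistent with "the window is never shrunk".

```python
WSCALE = 7  # scaling factor 2^7, as in the dump


def scale_down(window_bytes, wscale):
    """Value that fits the 16-bit window field: the byte count shifted right."""
    return window_bytes >> wscale


def align_up(value, wscale):
    """Round value up to the next multiple of 2^wscale (hypothetical helper)."""
    unit = 1 << wscale
    return (value + unit - 1) & ~(unit - 1)


last_window_end = 3111907257  # right edge advertised by the first ACK
rcv_nxt = 3111841425          # ack number of the second segment

remaining = last_window_end - rcv_nxt           # bytes still offered
field = scale_down(remaining, WSCALE)           # what goes on the wire
advertised = field << WSCALE                    # what the peer reconstructs
shrunk_by = remaining - advertised              # bytes silently taken back

aligned = align_up(remaining, WSCALE)           # remaining window after alignment
print(remaining, field, shrunk_by, aligned)
```

With the remaining window aligned first, scaling it down and back up loses nothing, so the advertised right edge cannot move backwards.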

[NETFILTER]: ipt_recent: sanity check hit count · d0ebf133

Committed by Daniel Hokka Zakrisson on Mar 20, 2008

If a rule using ipt_recent is created with a hit count greater than
ip_pkt_list_tot, the rule will never match as it cannot keep track
of enough timestamps. This patch makes ipt_recent refuse to create such
rules.

With ip_pkt_list_tot's default value of 20, the following can be used
to reproduce the problem.

nc -u -l 0.0.0.0 1234 &
for i in `seq 1 100`; do echo $i | nc -w 1 -u 127.0.0.1 1234; done

This limits it to 20 packets:
iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
         --rsource
iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
         60 --hitcount 20 --name test --rsource -j DROP

While this is unlimited:
iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
         --rsource
iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
         60 --hitcount 21 --name test --rsource -j DROP

With the patch the second rule-set will throw an EINVAL.
Reported-by: Sean Kennedy <skennedy@vcn.com>
Signed-off-by: Daniel Hokka Zakrisson <daniel@hozac.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
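
The sanity check itself is small. The sketch below assumes the rule is rejected exactly when the hit count exceeds `ip_pkt_list_tot`, as the message states; `check_recent_rule` is a hypothetical name, and the real check lives in the kernel's ipt_recent match code.

```python
import errno

IP_PKT_LIST_TOT = 20  # default value quoted in the commit message


def check_recent_rule(hit_count, ip_pkt_list_tot=IP_PKT_LIST_TOT):
    """Return 0 if the rule can ever match, -EINVAL otherwise."""
    if hit_count > ip_pkt_list_tot:
        # Not enough timestamps are kept to ever reach this hit count.
        return -errno.EINVAL
    return 0


print(check_recent_rule(20), check_recent_rule(21))
```

This mirrors the two rule sets above: `--hitcount 20` is accepted, while `--hitcount 21` is refused at rule creation time.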

18 Mar, 2008 3 commits

[NET]: Add debugging names to __RW_LOCK_UNLOCKED macros. · 938b93ad

Committed by Robert P. J. Day on Mar 18, 2008

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: esp_output() misannotations · 5e226e4d

Committed by Al Viro on Mar 17, 2008

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET] endianness noise: INADDR_ANY · e6f1cebf

Committed by Al Viro on Mar 17, 2008

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Mar, 2008 1 commit

[TCP]: Prevent sending past receiver window with TSO (at last skb) · 5ea3a748

Committed by Ilpo Järvinen on Mar 11, 2008

With TSO it was possible to send past the receiver window when the skb
to be sent was the last in the write queue while the receiver window
was the limiting factor. The loophole was in tcp_mss_split_point(),
which lacked a receiver window check for tcp_write_queue_tail() when
cwnd was also smaller than the full skb.

Noticed by Thomas Gleixner <tglx@linutronix.de> in form of "Treason
uncloaked! Peer ... shrinks window .... Repaired." messages (the peer
didn't actually shrink its window as the message suggests, we had just
sent something past it without a permission to do so).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
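
A simplified sketch of a split-point calculation that honours both limits follows. It is not the kernel's tcp_mss_split_point(): the function `split_point` and its parameters are hypothetical, and it only shows the idea that the bytes taken from an skb are bounded by the congestion window and by the receiver window.

```python
def split_point(skb_len, mss, cwnd_segments, rwnd_bytes):
    """Bytes of the skb that may be sent now, as a whole number of segments."""
    cwnd_len = mss * cwnd_segments
    # Never plan to send more than the receiver window allows,
    # even when this skb is the last one in the write queue.
    needed = min(skb_len, rwnd_bytes)
    if cwnd_len <= needed:
        return cwnd_len
    # Receiver window (or skb length) is the limit: round down to full segments.
    return needed - needed % mss


MSS = 1448
print(split_point(10 * MSS, MSS, 4, 3000))    # receiver window is the limit
print(split_point(10 * MSS, MSS, 4, 100000))  # congestion window is the limit
```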

07 Mar, 2008 1 commit

[UDP]: Revert udplite and code split. · db8dac20

Committed by David S. Miller on Mar 06, 2008

This reverts commit db1ed684 ("[IPV6] UDP: Rename IPv6 UDP files."),
commit 8be8af8f ("[IPV4] UDP: Move IPv4-specific bits to other file.")
and commit e898d4db ("[UDP]: Allow users to configure UDP-Lite.").

First, udplite is of such small cost, and it is a core protocol just
like TCP and normal UDP are.

We spent enormous amounts of effort to make udplite share as much code
with core UDP as possible.  All of that work is less valuable if we're
just going to slap a config option on udplite support.

It is also causing build failures, as reported on linux-next, showing
that the changeset was not tested very well.  In fact, this is the
second build failure resulting from the udplite change.

Finally, the config option provided was a bool, instead of a modular
option.  This means the udplite code does not even get build tested
by allmodconfig builds, and furthermore the user is not presented
with a reasonable modular build option, which is particularly needed
by distribution vendors.
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Mar, 2008 2 commits

net: replace remaining __FUNCTION__ occurrences · 0dc47877

Committed by Harvey Harrison on Mar 05, 2008

__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts · ee6b9673

Committed by Eric Dumazet on Mar 05, 2008

(Anonymous) unions can help us to avoid ugly casts.

A common cast is the (struct rtable *)skb->dst one.

Defining a union like:
union {
     struct dst_entry *dst;
     struct rtable *rtable;
};
permits using skb->rtable in its place.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
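
The aliasing can be demonstrated outside the kernel with Python's ctypes, which supports anonymous unions. The structures below are heavily trimmed stand-ins with hypothetical fields (`refcnt`, `rt_gateway`); the sketch assumes, as the cast being replaced implies, that the route structure begins with its dst entry.

```python
import ctypes


class DstEntry(ctypes.Structure):
    _fields_ = [("refcnt", ctypes.c_int)]  # hypothetical single field


class Rtable(ctypes.Structure):
    # Assumption: the dst entry is the first member, so both pointers agree.
    _fields_ = [("dst", DstEntry), ("rt_gateway", ctypes.c_uint32)]


class _Route(ctypes.Union):
    _fields_ = [
        ("dst", ctypes.POINTER(DstEntry)),
        ("rtable", ctypes.POINTER(Rtable)),
    ]


class SkBuff(ctypes.Structure):
    _anonymous_ = ("u",)       # expose the union members directly on the struct
    _fields_ = [("u", _Route)]


rt = Rtable(DstEntry(1), 0x0A000001)
skb = SkBuff()
skb.rtable = ctypes.pointer(rt)  # store through one member ...

# ... and read through the other without any cast.
print(skb.dst.contents.refcnt, hex(skb.rtable.contents.rt_gateway))
```

Both members occupy the same storage, so `skb.dst` and `skb.rtable` are two typed views of one pointer.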

05 Mar, 2008 3 commits

[IPCONFIG]: The kernel gets no IP from some DHCP servers · dea75bdf

Committed by Stephen Hemminger on Mar 04, 2008

From: Stephen Hemminger <shemminger@linux-foundation.org>

Based upon a patch by Marcel Wappler:

   This patch fixes a DHCP issue of the kernel: some DHCP servers
   (e.g. in the Linksys WRT54Gv5) are very strict about the contents
   of the DHCPDISCOVER packet they receive from clients.

   Table 5 in RFC2131 page 36 requires that the fields 'ciaddr' and
   'siaddr' MUST be set to '0'.  These DHCP servers ignore the Linux
   kernel's DHCP discovery packets with these two fields set to
   '255.255.255.255' (in contrast to popular DHCP clients, such as
   'dhclient' or 'udhcpc').  This leads to a system that fails to boot.
Signed-off-by: David S. Miller <davem@davemloft.net>
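
To make the field layout concrete, here is a sketch that packs the fixed 236-byte BOOTP/DHCP header from RFC 2131 with 'ciaddr' and 'siaddr' zeroed. It is not the kernel's ipconfig code: `build_discover_header` is a hypothetical name, the DHCP options (including the message type) are omitted, and the transaction id and MAC address are made-up values.

```python
import socket
import struct

# op, htype, hlen, hops, xid, secs, flags,
# ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file
BOOTP_FMT = "!BBBBIHH4s4s4s4s16s64s128s"


def build_discover_header(xid, mac):
    """Pack the fixed part of a client request with ciaddr and siaddr set to 0."""
    zero = socket.inet_aton("0.0.0.0")
    return struct.pack(
        BOOTP_FMT,
        1,      # op: BOOTREQUEST
        1,      # htype: Ethernet
        6,      # hlen: MAC address length
        0,      # hops
        xid,    # transaction id chosen by the client
        0,      # secs
        0,      # flags
        zero,   # ciaddr: MUST be 0 in DHCPDISCOVER
        zero,   # yiaddr
        zero,   # siaddr: MUST be 0 in DHCPDISCOVER
        zero,   # giaddr
        mac,    # chaddr, padded to 16 bytes by the format
        b"",    # sname, padded to 64 bytes
        b"",    # file, padded to 128 bytes
    )


hdr = build_discover_header(0x12345678, bytes.fromhex("020000000001"))
print(len(hdr), hdr[12:16], hdr[20:24])
```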

[ESP]: Add select on AUTHENC · ed58dd41

Committed by Herbert Xu on Mar 04, 2008

Now that ESP uses the AEAD interface even for algorithms which are
not combined mode, we need to select CONFIG_CRYPTO_AUTHENC, as
otherwise only combined mode algorithms will work.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: TCP cubic v2.2 · 6b3d6263

Committed by Sangtae Ha on Mar 04, 2008

We have updated CUBIC to fix some issues with slow increase in large
BDP networks. We also improved its convergence speed. The fix is in
fact very simple -- the window increase limit of smax during the
window probing phase (i.e., convex growth phase) is removed. We found
that this does not affect TCP friendliness, but only improves its
scalability. We have run some tests in our lab and also over the
Internet path from NCSU to Japan. These results can be seen on the
following pages:

http://netsrv.csc.ncsu.edu/wiki/index.php/Intra_protocol_fairness_testing_with_linux-2.6.23.9
http://netsrv.csc.ncsu.edu/wiki/index.php/RTT_fairness_testing_with_linux-2.6.23.9
http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_friendliness_testing_with_linux-2.6.23.9
Signed-off-by: Sangtae Ha <sha2@ncsu.edu>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
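
The effect of dropping the increase limit can be illustrated with the textbook CUBIC growth curve, W(t) = C(t - K)^3 + W_max. This is a floating-point sketch and not the kernel's fixed-point implementation: the constants C = 0.4 and beta = 0.2, the cap value of 32 segments, and all function names are assumptions chosen for illustration.

```python
def cubic_k(w_max, c=0.4, beta=0.2):
    """Time at which the curve returns to w_max after a loss."""
    return (w_max * beta / c) ** (1.0 / 3.0)


def cubic_window(t, w_max, c=0.4, beta=0.2):
    """Textbook CUBIC window: concave before K, convex (probing) after K."""
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max


def increment(t, dt, w_max, s_max=None):
    """Window growth over [t, t + dt], optionally clamped to s_max."""
    step = cubic_window(t + dt, w_max) - cubic_window(t, w_max)
    if s_max is not None:
        step = min(step, s_max)
    return step


w_max = 1000.0
t_probe = cubic_k(w_max) + 10.0  # well inside the convex phase

capped = increment(t_probe, 1.0, w_max, s_max=32)  # older behaviour, assumed cap
uncapped = increment(t_probe, 1.0, w_max)          # behaviour without the limit
print(capped, uncapped)
```

Far past K the uncapped step follows the cubic curve and is several times larger than the capped one, which is why removing the limit helps on large-BDP paths.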

04 Mar, 2008 8 commits

[IPV4] UDP: Move IPv4-specific bits to other file. · 8be8af8f

Committed by YOSHIFUJI Hideaki on Mar 04, 2008

Move IPv4-specific UDP bits from net/ipv4/udp.c into (new) net/ipv4/udp_ipv4.c.
Rename net/ipv4/udplite.c to net/ipv4/udplite_ipv4.c.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[UDP]: Allow users to configure UDP-Lite. · e898d4db

Committed by YOSHIFUJI Hideaki on Mar 01, 2008

Let's give users an option for disabling UDP-Lite (~4K).

old:
|    text	   data	    bss	    dec	    hex	filename
|  286498	  12432	   6072	 305002	  4a76a	net/ipv4/built-in.o
|  193830	   8192	   3204	 205226	  321aa	net/ipv6/ipv6.o

new (without UDP-Lite):
|    text	   data	    bss	    dec	    hex	filename
|  284086	  12136	   5432	 301654	  49a56	net/ipv4/built-in.o
|  191835	   7832	   3076	 202743	  317f7	net/ipv6/ipv6.o
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[TCP]: Add IPv6 support to TCP SYN cookies · c6aefafb

Committed by Glenn Griffin on Feb 07, 2008

Updated to incorporate Eric's suggestion of using a per cpu buffer
rather than allocating on the stack.  Just a two line change, but will
resend in its entirety.
Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[TCP]: lower stack usage in cookie_hash() function · 11baab7a

Committed by Eric Dumazet on Feb 07, 2008

400 bytes allocated on the stack might be a little bit too much. Using a
per_cpu var is more friendly.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[ARP]: Introduce the arp_hdr_len helper. · 988b7050

Committed by Pavel Emelyanov on Mar 03, 2008

There are some places that calculate the ARP header length. These
calculations are correct, but they
 a) sometimes operate with "magic" constants,
 b) enlarge the code (sometimes at the cost of coding style),
 c) are not informative at first glance.

The proposal is to introduce a helper that keeps all the good
sides of these calculations.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
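
A sketch of such a helper, assuming the usual ARP layout for IPv4: an 8-byte fixed header followed by the sender and target pairs of hardware address and 4-byte protocol address. The constant names are hypothetical, and the Python function only mirrors the arithmetic, taking the hardware address length where the kernel helper would take a device.

```python
ARP_FIXED_HDR_LEN = 8  # hw type (2) + proto type (2) + hlen (1) + plen (1) + opcode (2)
IPV4_ADDR_LEN = 4      # assumption: ARP is resolving IPv4 addresses


def arp_hdr_len(hw_addr_len):
    """Total ARP header length for a device with the given hardware address length."""
    # Sender and target each carry one hardware address and one IPv4 address.
    return ARP_FIXED_HDR_LEN + 2 * (hw_addr_len + IPV4_ADDR_LEN)


print(arp_hdr_len(6))  # Ethernet, 6-byte MAC address
```

Naming the calculation once replaces the scattered constants with a single expression whose terms explain themselves.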

[TCP]: Must count fack_count also when skipping · d152a7d8

Committed by Ilpo Järvinen on Mar 03, 2008

Not counting it makes fackets_out grow too slowly compared with the
real write queue.

This shouldn't cause those BUG_TRAP(packets <= tp->packets_out)
to trigger, but who knows how such an inconsistent fackets_out
affects things here and there around TCP when everything nowadays
assumes an accurate fackets_out. So let's see if this silences
them all.

Reported by Guillaume Chazarain <guichaz@gmail.com>.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Merge exit paths in tcp_v4_conn_request. · 7cd04fa7

Committed by Denis V. Lunev on Mar 03, 2008

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: skb->dst can't be NULL in ip_options_echo. · da7ef338

Committed by Denis V. Lunev on Mar 03, 2008

ip_options_echo is called on the packet input path after the initial
routing. The dst entry on the packet is cleared only in a few very
specific places and is immediately assigned back (possibly to a new one).
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Mar, 2008 11 commits

[ICMP]: Section conflict between icmp_sk_init/icmp_sk_exit. · 1d1c8d13

Committed by Denis V. Lunev on Feb 29, 2008

Functions from the __exit section should not be called from ones in the
__init section. Fix this conflict.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack. · fd80eb94

Committed by Denis V. Lunev on Feb 29, 2008

It looks like the dst parameter is used in this API for historical
reasons.  In fact, it is used only in the direct call to
tcp_v4_send_synack.  So, create a wrapper for tcp_v4_send_synack
and remove dst from rtx_syn_ack.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER/RXRPC]: Don't use seq_release_private where inappropriate. · 665bba10

Committed by Pavel Emelyanov on Feb 29, 2008

Some netfilter and rxrpc code uses seq_open() to open
a proc file, but seq_release_private() to release it.

This is harmless, but ambiguous.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS]: Make icmp_sk per namespace. · 4a6ad7a1

Committed by Denis V. Lunev on Feb 29, 2008

All preparations are done. Now just add a hook to perform
initialization on namespace startup and replace the icmp_sk macro with
a proper inline call.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS]: icmp(v6)_sk should not pin a namespace. · 5c8cafd6

Committed by Denis V. Lunev on Feb 29, 2008

So, change icmp(v6)_sk creation/disposal to the scheme used in
netlink for rtnl, i.e. create a socket in the context of init_net
and assign the namespace later without taking a reference.

Also use sk_release_kernel instead of sock_release to properly destroy
such sockets.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICMP]: Allocate data for __icmp(v6)_sk dynamically. · 79c91159

Committed by Denis V. Lunev on Feb 29, 2008

Each namespace should have its own __icmp(v6)_sk, so it should be
allocated dynamically. However, alloc_percpu does not fit the case, as it
implies an additional dereference for no benefit.

Allocate data for pointers just like __percpu_alloc_mask does and place
pointers to struct sock into this array.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICMP]: Pass proper ICMP socket into icmp(v6)_xmit_(un)lock. · 405666db

Committed by Denis V. Lunev on Feb 29, 2008

We have to take the socket lock inside icmp(v6)_xmit_lock/unlock. The socket
is currently obtained from a global variable. Once this code becomes
namespace-aware, one would have to pass a namespace and get the socket from it.

However, that is unnecessary. The socket is available in the caller, so just
pass it in. This saves a bit of code now and saves more later.

add/remove: 0/0 grow/shrink: 1/3 up/down: 1/-169 (-168)
function                                     old     new   delta
icmp_rcv                                     718     719      +1
icmpv6_rcv                                  2343    2303     -40
icmp_send                                   1566    1518     -48
icmp_reply                                   549     468     -81
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICMP]: Store sock rather than socket for ICMP flow control. · b7e729c4

Committed by Denis V. Lunev on Feb 29, 2008

Basically, it makes no difference whether a socket or a sock is stored.
However, sock looks better, as there will be one less dereference on the
fast path.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICMP]: Optimize icmp_socket usage. · 1e3cf683

Committed by Denis V. Lunev on Feb 29, 2008

Use this macro only once in a function to save a bit of space.

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-98 (-98)
function                                     old     new   delta
icmp_reply                                   562     561      -1
icmp_push_reply                              305     258     -47
icmp_init                                    273     223     -50
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICMP]: Add return code to icmp_init. · a5710d65

Committed by Denis V. Lunev on Feb 29, 2008

icmp_init can fail, and this is normal for namespaces other than the
initial one. So, the panic should be triggered only on the init_net
initialization path.

Additionally, create a rollback path for icmp_init as a separate function.
It will also be used later during namespace destruction.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Remove struct net_proto_family* from _init calls. · 9b0f976f

Committed by Denis V. Lunev on Feb 29, 2008

struct net_proto_family* is not used in icmp[v6]_init, ndisc_init,
igmp_init and tcp_v4_init. Remove it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Feb, 2008 5 commits

[TCP]: BIC web page link is corrected. · 0bc8c7bf

Committed by Sangtae Ha on Feb 28, 2008

Signed-off-by: Sangtae Ha <sha2@ncsu.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS]: Process inet_select_addr inside a namespace. · c4544c72

Committed by Denis V. Lunev on Feb 28, 2008

The context is available from a network device passed in.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS]: Enable IPv4 address manipulations inside namespace. · 3776c889

Committed by Denis V. Lunev on Feb 28, 2008

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS]: Enable all routing manipulation via netlink inside namespace. · 1937504d

Committed by Denis V. Lunev on Feb 28, 2008

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS]: Process devinet ioctl in the correct namespace. · e5b13cb1

Committed by Denis V. Lunev on Feb 28, 2008

Add a namespace parameter to devinet_ioctl and locate the device inside
it for state changes.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel synced successfully about 1 year ago