Commits · f348d70a324e15afc701a494f32ec468abb7d1eb · openeuler / Kernel

26 Mar, 2006 1 commit

[PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications · f348d70a

Submitted by Davide Libenzi on Mar 25, 2006

Implement half-closed device notification by adding a new POLLRDHUP
(and its alias EPOLLRDHUP) bit to the existing poll/select sets.  Since
there was reluctance to change the existing POLLHUP handling, which does
not correctly report half-closed devices, this implementation leaves the
current POLLHUP reporting unchanged and simply adds a new bit that is set
in the few places where it makes sense.  The same thing was discussed and
conceptually agreed on quite some time ago:

http://lkml.org/lkml/2003/7/12/116

Since this new event bit is added to the existing Linux poll infrastructure,
even the existing poll/select system calls will be able to use it.  As for
the existing POLLHUP handling, the patch leaves it as is.  The
pollrdhup-2.6.16.rc5-0.10.diff defines POLLRDHUP for all the existing
archs and sets the bit in the six relevant files.  The other attached diff
is the simple change required to sys/epoll.h to add the EPOLLRDHUP
definition.

There is "a stupid program" to test POLLRDHUP delivery here:

 http://www.xmailserver.org/pollrdhup-test.c

It tests poll(2), but since the delivery path is the same, epoll(2) will work equally.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
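
As an illustration of the behaviour described in this message, here is a minimal userspace sketch. It is not the test program linked above. The helper name half_close_revents and the choice of an AF_UNIX socket pair are assumptions made for the example, and the fallback value for POLLRDHUP is the one used on most architectures, not all of them.

```c
#define _GNU_SOURCE          /* glibc exposes POLLRDHUP only with this */
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef POLLRDHUP
#define POLLRDHUP 0x2000     /* assumption: value used on most architectures */
#endif

/*
 * Hypothetical helper: create a connected stream socket pair, half-close
 * one end with shutdown(SHUT_WR), and return the revents that poll(2)
 * reports on the other end. Returns -1 on any failure or timeout.
 */
static int half_close_revents(void)
{
    int sv[2];
    struct pollfd pfd;
    int n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* The peer stops writing but keeps its read side open. */
    if (shutdown(sv[1], SHUT_WR) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    pfd.fd = sv[0];
    pfd.events = POLLIN | POLLRDHUP;   /* POLLRDHUP has to be requested */
    pfd.revents = 0;
    n = poll(&pfd, 1, 1000);           /* wait at most one second */

    close(sv[0]);
    close(sv[1]);
    return n == 1 ? pfd.revents : -1;
}
```

On a kernel that includes this patch, the returned mask is expected to contain POLLRDHUP without POLLHUP, since only one direction of the connection has been closed.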

21 Mar, 2006 3 commits

[NET]: Indentation & other cleanups related to compat_[gs]etsockopt cset · 543d9cfe

Submitted by Arnaldo Carvalho de Melo on Mar 20, 2006

No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs
to just after the function exported, etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICSK] compat: Introduce inet_csk_compat_[gs]etsockopt · dec73ff0

Submitted by Arnaldo Carvalho de Melo on Mar 20, 2006

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: {get|set}sockopt compatibility layer · 3fdadf7d

Submitted by Dmitry Mishin on Mar 20, 2006

This patch extends the {get|set}sockopt compatibility layer in order to
move protocol-specific parts to their place and avoid a huge universal
net/compat.c file in the future.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Jan, 2006 2 commits

[IP_SOCKGLUE]: Remove most of the tcp specific calls · d83d8461

Submitted by Arnaldo Carvalho de Melo on Dec 13, 2005

As DCCP needs to be called in the same spots.

Now we have a member in inet_sock (is_icsk), set at sock creation time from
struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, as for TCP and
DCCP), to tell whether a struct sock instance is an inet_connection_sock.
It is meant for places like the ones in ip_sockglue.c (v4 and v6) where we
previously checked whether sk_type was SOCK_STREAM, which is insufficient
because we now use the same code for DCCP, which has sk_type SOCK_DCCP.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops · 8292a17a

Submitted by Arnaldo Carvalho de Melo on Dec 13, 2005

And move it to struct inet_connection_sock. DCCP will use it in the
upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Nov, 2005 2 commits

[NET]: Add const markers to various variables. · 9b5b5cff

Submitted by Arjan van de Ven on Nov 29, 2005

The patch below marks various variables const in net/; the goal is to
move them to the .rodata section so that they can't false-share
cachelines with things that get written to, as well as potentially
helping gcc a bit with optimisations.  (These were found using a gcc
patch to warn about such variables.)
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4] tcp/route: Another look at hash table sizes · 18955cfc

Submitted by Mike Stroyan on Nov 29, 2005

  The tcp_ehash hash table gets too big on systems with really big memory.
It is worse on systems with pages larger than 4KB.  It wastes memory that
could be better used.  It also makes the netstat command slow because reading
/proc/net/tcp and /proc/net/tcp6 needs to go through the full hash table.

  The default value should not be larger for larger page sizes.  It seems
that the effect of page size is an unintended error dating back a long
time.  I also wonder if the default value really should be a larger
fraction of memory for systems with more memory.  While systems with
really big ram can afford more space for hash tables, it is not clear to
me that they benefit from increasing the allocation ratio for this table.

  The amount of memory allocated is determined by net/ipv4/tcp.c:tcp_init and
mm/page_alloc.c:alloc_large_system_hash.

tcp_init calls alloc_large_system_hash passing parameters-
    bucketsize=sizeof(struct tcp_ehash_bucket)
    numentries=thash_entries
    scale=(num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT) : (27-PAGE_SHIFT)
    limit=0

On i386, PAGE_SHIFT is 12 for a page size of 4K
On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K

The num_physpages test above makes the allocation take a larger fraction
of the total memory on systems with larger memory.  The threshold size
for a i386 system is 512MB.  For an ia64 system with 16KB pages the
threshold is 2GB.

For smaller memory systems-
On i386, scale = (27 - 12) = 15
On ia64, scale = (27 - 14) = 13
For larger memory systems-
On i386, scale = (25 - 12) = 13
On ia64, scale = (25 - 14) = 11

  For the rest of this discussion, I'll just track the larger memory case.

  The default behavior has numentries=thash_entries=0, so the allocated
size is determined by either scale or by the default limit of 1/16 of
total memory.

In alloc_large_system_hash-
|	numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages;
|	numentries += (1UL << (20 - PAGE_SHIFT)) - 1;
|	numentries >>= 20 - PAGE_SHIFT;
|	numentries <<= 20 - PAGE_SHIFT;

  At this point, numentries is pages for all of memory, rounded up to the
nearest megabyte boundary.

|	/* limit to 1 bucket per 2^scale bytes of low memory */
|	if (scale > PAGE_SHIFT)
|		numentries >>= (scale - PAGE_SHIFT);
|	else
|		numentries <<= (PAGE_SHIFT - scale);

On i386, numentries >>= (13 - 12), so numentries is 1/8192 of
bytes of total memory.
On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of
bytes of total memory.

|        log2qty = long_log2(numentries);
|
|        do {
|                size = bucketsize << log2qty;

bucketsize is 16, so size is 16 times numentries, rounded
down to a power of two.

On i386, size is 1/512 of bytes of total memory.
On ia64, size is 1/128 of bytes of total memory.

For smaller systems the results are
On i386, size is 1/2048 of bytes of total memory.
On ia64, size is 1/512 of bytes of total memory.

  The large page effect can be removed by just replacing
the use of PAGE_SHIFT with a constant of 12 in the calls to
alloc_large_system_hash.  That makes them more like the other uses of
that function from fs/inode.c and fs/dcache.c
Signed-off-by: David S. Miller <davem@davemloft.net>
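
The arithmetic in this message can be checked with a small userspace model. The sketch below is a simplification written for illustration: it follows only the quoted steps of alloc_large_system_hash, ignores the 1/16-of-memory limit and the allocation retry loop, and uses hypothetical names (ilog2_floor, ehash_bytes). The scale is passed in explicitly, so the caller chooses between the 25 - PAGE_SHIFT and 27 - PAGE_SHIFT cases.

```c
#include <assert.h>

/* Integer log2, rounding down (stand-in for the kernel's long_log2()). */
static unsigned int ilog2_floor(unsigned long long v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/*
 * Simplified model of the sizing steps quoted in the message.
 *   mem_bytes   total memory in bytes
 *   page_shift  log2 of the page size (12 for 4K pages, 14 for 16K pages)
 *   scale       one bucket per 2^scale bytes
 *   bucketsize  bytes per bucket (16 in the message)
 * Returns the resulting table size in bytes.
 */
static unsigned long long ehash_bytes(unsigned long long mem_bytes,
                                      unsigned int page_shift,
                                      unsigned int scale,
                                      unsigned long long bucketsize)
{
    unsigned long long numentries = mem_bytes >> page_shift;   /* pages */

    assert(page_shift <= 20);

    /* Round the page count up to a megabyte boundary. */
    numentries += (1ULL << (20 - page_shift)) - 1;
    numentries >>= 20 - page_shift;
    numentries <<= 20 - page_shift;

    /* Limit to one bucket per 2^scale bytes. */
    if (scale > page_shift)
        numentries >>= (scale - page_shift);
    else
        numentries <<= (page_shift - scale);

    /* Bucket count is rounded down to a power of two. */
    return bucketsize << ilog2_floor(numentries);
}
```

For 4 GiB of memory, ehash_bytes(4ULL << 30, 12, 25 - 12, 16) gives 1/512 of memory and ehash_bytes(4ULL << 30, 14, 25 - 14, 16) gives 1/128, matching the figures in the message. With the proposed constant of 12 in place of PAGE_SHIFT, the 16K-page case becomes ehash_bytes(4ULL << 30, 14, 25 - 12, 16), which is again 1/512.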

11 Nov, 2005 2 commits

[TCP]: spelling fixes · caa20d9a

Submitted by Stephen Hemminger on Nov 10, 2005

Minor spelling fixes for TCP code.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Appropriate Byte Count support · 9772efb9

Submitted by Stephen Hemminger on Nov 10, 2005

This is an updated version of the RFC3465 ABC patch originally
for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting
bytes ack'd rather than packets when updating congestion control.

The original ABC described in the RFC applied to a Reno style
algorithm. For advanced congestion control there is little
change after leaving slow start.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
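
To make the byte-counting idea concrete, here is a rough sketch of the window update as RFC 3465 describes it for a Reno style sender, with the window kept in bytes. It is not the kernel code from this patch. The structure abc_state, the function abc_on_ack and the constant ABC_L are names invented for the example.

```c
#include <assert.h>

/* Hypothetical byte-based sender state, following RFC 3465's description. */
struct abc_state {
    unsigned int cwnd;         /* congestion window, bytes */
    unsigned int ssthresh;     /* slow start threshold, bytes */
    unsigned int bytes_acked;  /* bytes acked since the last cwnd increase */
    unsigned int smss;         /* sender maximum segment size, bytes */
};

#define ABC_L 2  /* RFC 3465 limit L: at most L*SMSS growth per ACK in slow start */

/* Process an ACK that newly acknowledges 'acked' bytes. */
static void abc_on_ack(struct abc_state *s, unsigned int acked)
{
    assert(s->smss > 0);

    if (s->cwnd < s->ssthresh) {
        /* Slow start: grow by the bytes acked, capped at L*SMSS. */
        unsigned int cap = ABC_L * s->smss;

        s->cwnd += acked < cap ? acked : cap;
    } else {
        /* Congestion avoidance: one SMSS per cwnd worth of acked bytes. */
        s->bytes_acked += acked;
        if (s->bytes_acked >= s->cwnd) {
            s->bytes_acked -= s->cwnd;
            s->cwnd += s->smss;
        }
    }
}
```

The point of counting bytes is that a receiver which acknowledges every other segment no longer halves the sender's window growth, because each ACK is credited with the amount of data it covers.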

06 Nov, 2005 1 commit

[TCP/DCCP]: Randomize port selection · 6df71634

Submitted by Stephen Hemminger on Nov 03, 2005

This patch randomizes the port selected on bind() for connections
to help defend against possible security attacks. It should also be
faster in most cases because there is no need for a global lock.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
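
The message does not spell out the search strategy, so the following is only one plausible shape for it: start at a random offset inside the port range and probe linearly with wrap-around. The function pick_port, its parameters and the byte-array representation of bound ports are all assumptions made for the sketch.

```c
#include <assert.h>

/*
 * Sketch of randomized selection over the range [low, high].
 *   rnd   any random number supplied by the caller
 *   used  used[i] is nonzero when port (low + i) is already bound
 * Returns the chosen port, or -1 if the whole range is in use.
 */
static int pick_port(int low, int high, unsigned int rnd,
                     const unsigned char *used)
{
    unsigned int range, start, i;

    assert(high >= low);
    range = (unsigned int)(high - low) + 1;
    start = rnd % range;                 /* random starting offset */

    for (i = 0; i < range; i++) {
        unsigned int off = (start + i) % range;

        if (!used[off])
            return low + (int)off;
    }
    return -1;
}
```

Because each caller starts from its own random offset instead of advancing a shared rover, there is no single piece of state that every bind() has to serialize on, which is consistent with the lock remark in the message.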

06 Sep, 2005 1 commit

[TCP]: Fix TCP_OFF() bug check introduced by previous change. · fb5f5e6e

Submitted by Herbert Xu on Sep 05, 2005

The TCP_OFF assignment at the bottom of that if block can indeed set
TCP_OFF without setting TCP_PAGE.  Since there is not much to be
gained from avoiding this situation, we might as well just zap the
offset.  The following patch should fix it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Sep, 2005 2 commits

[TCP]: Fix sk_forward_alloc underflow in tcp_sendmsg · ef015786

Submitted by Herbert Xu on Sep 01, 2005

I've finally found a potential cause of the sk_forward_alloc underflows
that people have been reporting sporadically.

When tcp_sendmsg tacks on extra bits to an existing TCP_PAGE we don't
check sk_forward_alloc even though a large amount of time may have
elapsed since we allocated the page.  In the meantime someone could've
come along and liberated packets and reclaimed sk_forward_alloc memory.

This patch makes tcp_sendmsg check sk_forward_alloc every time as we
do in do_tcp_sendpages.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Add sk_stream_wmem_schedule · d80d99d6

Submitted by Herbert Xu on Sep 01, 2005

This patch introduces sk_stream_wmem_schedule as a short-hand for
the sk_forward_alloc checking on egress.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
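
A toy model may help show what a short-hand for this check looks like. Everything below is hypothetical: the struct toy_sock, the quantum size and both function names are stand-ins chosen for the sketch, and the kernel's real accounting is considerably more involved.

```c
#include <assert.h>

#define TOY_QUANTUM 4096  /* assumption: memory is reserved in page-sized quanta */

/* Hypothetical, much-simplified stand-in for the socket accounting fields. */
struct toy_sock {
    int forward_alloc;   /* bytes already reserved for this socket */
    int pool_left;       /* bytes still available in the shared pool */
};

/* Slow path: reserve whole quanta from the shared pool; returns 1 on success. */
static int toy_mem_schedule(struct toy_sock *sk, int size)
{
    int need = (size + TOY_QUANTUM - 1) / TOY_QUANTUM * TOY_QUANTUM;

    if (need > sk->pool_left)
        return 0;
    sk->pool_left -= need;
    sk->forward_alloc += need;
    return 1;
}

/* Short-hand check: fast path if already covered, otherwise try to reserve. */
static int toy_wmem_schedule(struct toy_sock *sk, int size)
{
    assert(size >= 0);
    return size <= sk->forward_alloc || toy_mem_schedule(sk, size);
}
```

In this model the sender would call toy_wmem_schedule() before every append, including appends to an already cached page, which is the "check every time" behaviour that the preceding tcp_sendmsg fix describes.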

30 Aug, 2005 17 commits

[NET]: use __read_mostly on kmem_cache_t, DEFINE_SNMP_STAT pointers · ba89966c

Submitted by Eric Dumazet on Aug 26, 2005

This patch puts mostly read-only data in the right section
(read_mostly), to help sharing of these data between CPUs without
memory ping pongs.

On one of my production machines, tcp_statistics was sitting in a
heavily modified cache line, so *every* SNMP update had to force a
reload.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICSK]: Generalise tcp_listen_poll · dc40c7bc

Submitted by Arnaldo Carvalho de Melo on Aug 23, 2005

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICSK]: Move TCP congestion avoidance members to icsk · 6687e988

Submitted by Arnaldo Carvalho de Melo on Aug 10, 2005

This changeset basically moves tcp_sk()->{ca_ops,ca_state,etc} to inet_csk(),
minimal renaming/moving done in this changeset to ease review.

Most of it is just changes of struct tcp_sock * to struct sock * parameters.

With this we move to a state closer to two interesting goals:

1. Generalisation of net/ipv4/tcp_diag.c, becoming inet_diag.c, being used
for any INET transport protocol that has struct inet_hashinfo and is
derived from struct inet_connection_sock. Keeps the userspace API, which
will just not display DCCP sockets, while newer versions of tools can
support DCCP.

2. INET generic transport pluggable Congestion Avoidance infrastructure, using
the current TCP CA infrastructure with DCCP.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TIMEWAIT]: Introduce inet_timewait_death_row · 295ff7ed

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

That groups all of the tables and variables associated with the TCP timewait
scheduling/recycling/killing code, which can now be isolated from the TCP
specific code and used by other transport protocols, such as DCCP.

Next changeset will move this code to net/ipv4/inet_timewait_sock.c
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICSK]: Move generalised functions from tcp to inet_connection_sock · a019d6fe

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

This also improves reqsk_queue_prune and renames it to
inet_csk_reqsk_queue_prune, as it deals with both inet_connection_sock
and inet_request_sock objects, not just with request_sock ones thus
belonging to inet_request_sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICSK]: Introduce reqsk_queue_prune from code in tcp_synack_timer · 295f7324

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

With this we're very close to getting all of the current TCP
refactorings in my dccp-2.6 tree merged, next changeset will export
some functions needed by the current DCCP code and then dccp-2.6.git
will be born!
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ICSK]: Generalise tcp_listen_{start,stop} · 0a5578cf

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

This also moved inet_iif from tcp to inet_hashtables.h, as it is
needed by the inet_lookup callers, perhaps this needs a bit of
polishing, but for now seems fine.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Just move the inet_connection_sock function from tcp sources · 3f421baa

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

Completing the previous changeset, this also generalises tcp_v4_synq_add,
renaming it to inet_csk_reqsk_queue_hash_add, already being used in the
DCCP tree, which I plan to merge RSN.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Introduce inet_connection_sock · 463c84b9

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

This creates struct inet_connection_sock, moving members out of struct
tcp_sock that are shareable with other INET connection oriented
protocols, such as DCCP, that in my private tree already uses most of
these members.

The functions that operate on these members were renamed, using an
inet_csk_ prefix while not being moved yet to a new file, so as to
ease the review of these changes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Generalise tcp_tw_bucket, aka TIME_WAIT sockets · 8feaf0c0

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

This paves the way to generalise the rest of the sock ID lookup
routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro
kernels (where IPv6 is always built as a module):

[root@qemu ~]# grep tw_sock /proc/slabinfo
tw_sock_TCPv6  0  0  128  31  1
tw_sock_TCP    0  0   96  41  1
[root@qemu ~]#

Now if a protocol wants to use the TIME_WAIT generic infrastructure it
only has to set the sk_prot->twsk_obj_size field with the size of its
inet_timewait_sock derived sock and proto_register will create
sk_prot->twsk_slab. For now it's only for INET sockets, but we can
introduce timewait_sock later if some non-INET transport protocol
wants to use this stuff.

Next changesets will take advantage of this new infrastructure to
generalise even more TCP code.

[acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
/tmp/before.size: 188646   11764    5068  205478   322a6 net/ipv4/built-in.o
/tmp/after.size:  188144   11764    5068  204976   320b0 net/ipv4/built-in.o
[acme@toy net-2.6.14]$

Tested with both IPv4 & IPv6 (::1 (localhost) & ::ffff:172.20.0.1
(qemu host)).
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Move tcp_port_rover to inet_hashinfo · 6e04e021

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

Also expose all of the tcp_hashinfo members, i.e. killing those
tcp_ehash, etc macros, this will more clearly expose already generic
functions and some that need just a bit of work to become generic, as
we'll see in the upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Generalise tcp_bind_hash & tcp_inherit_port · 2d8c4ce5

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

This required moving tcp_bucket_cachep to inet_hashinfo.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Move bind_hash from tcp_sk to inet_sk · a55ebcc4

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

This should really be in an inet_connection_sock, but I'm leaving it
for a later optimization, when some more fields common to INET
transport protocols now in tcp_sk or inet_sk will be chunked out into
inet_connection_sock. For now it's better to concentrate on getting the
changes in the core merged to leave the DCCP tree with only DCCP
specific code.

Next changesets will take advantage of this move to generalise things
like tcp_bind_hash, tcp_put_port, tcp_inherit_port, making the latter
receive an inet_hashinfo parameter, and even __tcp_tw_hashdance, etc in
the future, when tcp_tw_bucket gets transformed into the struct
timewait_sock hierarchy.

tcp_destroy_sock also is eligible as soon as tcp_orphan_count gets
moved to sk_prot.

A cascade of incremental changes will ultimately make the tcp_lookup
functions be fully generic.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Just rename the TCP hashtable functions/structs to inet_ · 0f7ff927

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

This is to break down the complexity of the series of patches,
making it very clear that this one just does:

1. renames tcp_ prefixed hashtable functions and data structures that
   were already mostly generic to inet_ to share it with DCCP and
   other INET transport protocols.

2. Removes not used functions (__tb_head & tb_head)

3. Removes some leftover prototypes in the headers (tcp_bucket_unlock &
   tcp_v4_build_header)

Next changesets will move tcp_sk(sk)->bind_hash to inet_sock so that we can
make functions such as tcp_inherit_port, __tcp_inherit_port, tcp_v4_get_port,
__tcp_put_port,  generic and get others like tcp_destroy_sock closer to generic
(tcp_orphan_count will go to sk->sk_prot to allow this).

Eventually most of these functions will be used passing the transport protocol
inet_hashinfo structure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Cleanup INET_REFCNT_DEBUG code · e6848976

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[REQSK]: Move the syn_table destroy from tcp_listen_stop to reqsk_queue_destroy · 83e3609e

Submitted by Arnaldo Carvalho de Melo on Aug 09, 2005

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Kill skb->list · 8728b834

Submitted by David S. Miller on Aug 09, 2005

Remove the "list" member of struct sk_buff, as it is entirely
redundant.  All SKB list removal callers know which list the
SKB is on, so storing this in sk_buff does nothing other than
taking up some space.

Two tricky bits were SCTP, which I took care of, and two ATM
drivers which Francois Romieu <romieu@fr.zoreil.com> fixed
up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

24 Aug, 2005 1 commit

[TCP]: Unconditionally clear TCP_NAGLE_PUSH in skb_entail(). · 89ebd197

Submitted by David S. Miller on Aug 23, 2005

The intention of this bit is to force pushing of the existing
send queue when the TCP_CORK or TCP_NODELAY state changes via
setsockopt().

But it's easy to create a situation where the bit never
clears.  For example, if the send queue starts empty:

1) set TCP_NODELAY
2) clear TCP_NODELAY
3) set TCP_CORK
4) do small write()

The current code will leave TCP_NAGLE_PUSH set after that
sequence.  Unconditionally clearing the bit when new data
is added via skb_entail() solves the problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
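
From userspace, the first three steps of the sequence in this message look roughly like the sketch below. The helper names set_tcp_flag and nagle_push_sequence are made up for the example, and the fourth step (a small write()) is left to the caller because it needs a connected socket.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: set one boolean TCP-level option; 0 on success. */
static int set_tcp_flag(int fd, int opt, int on)
{
    return setsockopt(fd, IPPROTO_TCP, opt, &on, sizeof(on));
}

/*
 * Steps 1-3 from the message, applied to TCP socket 'fd'.
 * Returns 0 if all three calls succeed, -1 otherwise.
 */
static int nagle_push_sequence(int fd)
{
    assert(fd >= 0);

    if (set_tcp_flag(fd, TCP_NODELAY, 1) < 0)   /* 1) set TCP_NODELAY */
        return -1;
    if (set_tcp_flag(fd, TCP_NODELAY, 0) < 0)   /* 2) clear TCP_NODELAY */
        return -1;
    if (set_tcp_flag(fd, TCP_CORK, 1) < 0)      /* 3) set TCP_CORK */
        return -1;
    return 0;
}
```

Nothing here is observable from the return values alone. The stuck bit lives inside the kernel, so reproducing the bug would additionally require watching what the small write after step 3 does on the wire.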

09 Jul, 2005 1 commit

[NET]: Transform skb_queue_len() binary tests into skb_queue_empty() · b03efcfb

Submitted by David S. Miller on Jul 08, 2005

This is part of the grand scheme to eliminate the qlen
member of skb_queue_head, and subsequently remove the
'list' member of sk_buff.

Most users of skb_queue_len() want to know if the queue is
empty or not, and that's trivially done with skb_queue_empty()
which doesn't use the skb_queue_head->qlen member and instead
uses the queue list emptiness as the test.
Signed-off-by: David S. Miller <davem@davemloft.net>
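
The idea of testing emptiness through the list links instead of a counter can be shown with a small model. This is a sketch, not the kernel's definition: the toy_ types and functions are invented here, and the head embeds a sentinel node for simplicity.

```c
#include <assert.h>

/* Simplified stand-in: only the link fields matter for this sketch. */
struct toy_skb {
    struct toy_skb *next;
    struct toy_skb *prev;
};

/* Queue head holding a sentinel node; note there is no length counter. */
struct toy_skb_head {
    struct toy_skb node;
};

static void toy_queue_init(struct toy_skb_head *list)
{
    list->node.next = list->node.prev = &list->node;
}

/* Empty when the sentinel points back at itself. */
static int toy_queue_empty(const struct toy_skb_head *list)
{
    return list->node.next == &list->node;
}

/* Append 'skb' at the tail of the circular list. */
static void toy_queue_tail(struct toy_skb_head *list, struct toy_skb *skb)
{
    struct toy_skb *last = list->node.prev;

    assert(skb != &list->node);
    skb->next = &list->node;
    skb->prev = last;
    last->next = skb;
    list->node.prev = skb;
}

/* Remove 'skb' from whichever list it is on, using only its own links. */
static void toy_unlink(struct toy_skb *skb)
{
    skb->prev->next = skb->next;
    skb->next->prev = skb->prev;
    skb->next = skb->prev = skb;
}
```

Since callers that only ask "is it empty?" never touch a counter in this model, the counter can later be dropped without changing them, which is the direction this message lays out.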

06 Jul, 2005 3 commits

[TCP]: Move to new TSO segmenting scheme. · c1b4a7e6

Submitted by David S. Miller on Jul 05, 2005

Make TSO segment transmit size decisions at send time, not earlier.

The basic scheme is that we try to build as large a TSO frame as
possible when pulling in the user data, but the size of the TSO frame
output to the card is determined at transmit time.

This is guided by tp->xmit_size_goal. It is always set to a multiple
of MSS and tells sendmsg/sendpage how large an SKB to try and build.

Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
necessary and conditions warrant. These routines can also decide to
"defer" in order to wait for more ACKs to arrive and thus allow larger
TSO frames to be emitted.

A general observation is that TSO elongates the pipe, thus requiring a
larger congestion window and larger buffering especially at the sender
side. Therefore, it is important that applications 1) get a large
enough socket send buffer (this is accomplished by our dynamic send
buffer expansion code) 2) do large enough writes.
Signed-off-by: David S. Miller <davem@davemloft.net>
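
The statement that the size goal is always a multiple of MSS comes down to a rounding step. The sketch below shows only that step, under the assumption that some byte budget has already been chosen. The function names and the budget value in the note are illustrative and do not come from the patch.

```c
#include <assert.h>

/*
 * Round a byte budget down to a whole number of MSS-sized segments,
 * so that a frame built up to this goal splits evenly at transmit time.
 */
static unsigned int size_goal(unsigned int budget, unsigned int mss)
{
    assert(mss > 0);
    return budget - (budget % mss);
}

/* Number of full segments contained in 'len' bytes (hypothetical helper). */
static unsigned int full_segments(unsigned int len, unsigned int mss)
{
    assert(mss > 0);
    return len / mss;
}
```

For example, with an MSS of 1448 bytes and a budget of 65483 bytes, size_goal() returns 65160, which is exactly 45 segments.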

[TCP]: Fix send-side cpu utilization regression. · b4e26f5e

Submitted by David S. Miller on Jul 05, 2005

Only put user data purely to pages when doing TSO.

The extra page allocations cause two problems:

1) Add the overhead of the page allocations themselves.
2) Make us do small user copies when we get to the end
   of the TCP socket cache page.

It is still beneficial to purely use pages for TSO,
so we will do it for that case.
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Simplify SKB data portion allocation with NETIF_F_SG. · c65f7f00

Submitted by David S. Miller on Jul 05, 2005

The ideal and most optimal layout for an SKB when doing
scatter-gather is to put all the headers at skb->data, and
all the user data in the page array.

This makes SKB splitting and combining extremely simple,
especially before a packet goes onto the wire the first
time.

So, when sk_stream_alloc_pskb() is given a zero size, make
sure there is no skb_tailroom().  This is achieved by applying
SKB_DATA_ALIGN() to the header length used here.

Next, make select_size() in TCP output segmentation use a
length of zero when NETIF_F_SG is true on the outgoing
interface.
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Jun, 2005 2 commits

[TCP]: Allow choosing TCP congestion control via sockopt. · 5f8ef48d

Submitted by Stephen Hemminger on Jun 23, 2005

Allow using setsockopt to set TCP congestion control to use on a per
socket basis.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
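
A minimal userspace sketch of per-socket selection follows, assuming a Linux system where the option is exposed as TCP_CONGESTION in netinet/tcp.h. The helper name set_and_get_cc is hypothetical, and the fallback define covers only C libraries that predate the option.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

#ifndef TCP_CONGESTION
#define TCP_CONGESTION 13   /* Linux value, for libcs that predate the option */
#endif

/*
 * Select congestion control algorithm 'name' on TCP socket 'fd', then read
 * back the name in effect into 'out' (capacity 'outlen' bytes).
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
static int set_and_get_cc(int fd, const char *name, char *out, socklen_t outlen)
{
    socklen_t len = outlen;

    assert(name != NULL && out != NULL && outlen > 0);

    if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                   name, (socklen_t)strlen(name)) < 0)
        return -1;

    memset(out, 0, outlen);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, out, &len) < 0)
        return -1;

    out[outlen - 1] = '\0';   /* make sure the result is terminated */
    return 0;
}
```

A caller might pass "reno" with a 16-byte buffer. Asking for an algorithm that is not available on the running kernel makes the setsockopt() call fail, so the return value should always be checked.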

[TCP]: Add pluggable congestion control algorithm infrastructure. · 317a76f9

Submitted by Stephen Hemminger on Jun 23, 2005

Allow TCP to have multiple pluggable congestion control algorithms.
Algorithms are defined by a set of operations and can be built in
or built as modules.  The legacy "new RENO" algorithm is used as a
starting point and fallback.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
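
The "set of operations" pattern can be sketched as a table of function pointers plus a registry with a built-in fallback. This is a toy model for illustration only: the toy_ names, the single cong_avoid hook, the placeholder growth rules and the made-up "fast" algorithm are all assumptions, not the kernel's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ops table; the field names are illustrative only. */
struct toy_cong_ops {
    const char *name;
    /* Return the new window after an ACK, given the current window. */
    unsigned int (*cong_avoid)(unsigned int cwnd);
    struct toy_cong_ops *next;   /* registry link */
};

/* Placeholder growth rules, not real congestion control. */
static unsigned int toy_reno_avoid(unsigned int cwnd) { return cwnd + 1; }
static unsigned int toy_fast_avoid(unsigned int cwnd) { return cwnd + 2; }

/* Built-in fallback, always present in the registry. */
static struct toy_cong_ops toy_reno = { "reno", toy_reno_avoid, NULL };
/* A made-up second algorithm, as a module might provide. */
static struct toy_cong_ops toy_fast = { "fast", toy_fast_avoid, NULL };

static struct toy_cong_ops *toy_registry = &toy_reno;

/* Register an algorithm by pushing it onto the head of the list. */
static void toy_register(struct toy_cong_ops *ops)
{
    assert(ops != NULL && ops->name != NULL && ops->cong_avoid != NULL);
    ops->next = toy_registry;
    toy_registry = ops;
}

/* Look up by name; fall back to Reno when the name is unknown. */
static const struct toy_cong_ops *toy_find(const char *name)
{
    const struct toy_cong_ops *p;

    for (p = toy_registry; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return &toy_reno;
}
```

After toy_register(&toy_fast), a lookup of "fast" returns the new table, while an unknown name still resolves to the Reno entry, mirroring the fallback role the message gives to Reno.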

19 Jun, 2005 2 commits

[TCP]: Fix sysctl_tcp_low_latency · 7df55125

Submitted by David S. Miller on Jun 18, 2005

When enabled, this should disable UCOPY prequeue'ing altogether,
but it does not due to a missing test.
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET] rename struct tcp_listen_opt to struct listen_sock · 2ad69c55

Submitted by Arnaldo Carvalho de Melo on Jun 18, 2005

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel: synced successfully 11 months ago