Commits · 63d886c96b2a580b1bf764de238ba3c63515b5ee · openanolis / cloud-kernel

06 Jul, 2005 · 40 commits

[PKT_SCHED]: Blackhole queueing discipline · 63d886c9

Committed by Thomas Graf on Jul 05, 2005

Useful in combination with classful qdiscs to drop or
temporarily disable certain flows, e.g. one could block
specific DS flows with dsmark.

Unlike the noop qdisc, it can be controlled by the user and
it performs statistics accounting.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
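The behaviour described above, dropping everything while still counting it, can be modelled in a few lines of userspace C. This is a minimal sketch, not the kernel's sch_blackhole code: `struct pkt`, `struct blackhole_stats`, `blackhole_enqueue()` and `blackhole_dequeue()` are hypothetical stand-ins for sk_buff, the qdisc statistics and the qdisc operations.

```c
#include <stddef.h>

/* Hypothetical stand-in for a packet; only its length matters here. */
struct pkt {
	unsigned int len;
};

/* Statistics kept by the model, the part a noop qdisc would not track. */
struct blackhole_stats {
	unsigned long packets_dropped;
	unsigned long bytes_dropped;
};

/* Enqueue: account the packet, then discard it. */
static void blackhole_enqueue(struct blackhole_stats *st, const struct pkt *p)
{
	st->packets_dropped++;
	st->bytes_dropped += p->len;
}

/* Dequeue: nothing is ever queued, so there is never a packet to send. */
static const struct pkt *blackhole_dequeue(struct blackhole_stats *st)
{
	(void)st;
	return NULL;
}
```

Attached to a class of a classful qdisc, such a discipline silently swallows only that class's traffic while the counters still show how much was discarded.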

[TCP]: Move to new TSO segmenting scheme. · c1b4a7e6

Committed by David S. Miller on Jul 05, 2005

Make TSO segment transmit size decisions at send time not earlier.

The basic scheme is that we try to build as large a TSO frame as
possible when pulling in the user data, but the size of the TSO frame
output to the card is determined at transmit time.

This is guided by tp->xmit_size_goal. It is always set to a multiple
of MSS and tells sendmsg/sendpage how large an SKB to try and build.

Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
necessary and conditions warrant. These routines can also decide to
"defer" in order to wait for more ACKs to arrive and thus allow larger
TSO frames to be emitted.

A general observation is that TSO elongates the pipe, thus requiring a
larger congestion window and larger buffering especially at the sender
side. Therefore, it is important that applications 1) get a large
enough socket send buffer (this is accomplished by our dynamic send
buffer expansion code) 2) do large enough writes.
Signed-off-by: David S. Miller <davem@davemloft.net>
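A rough sketch of the "build large, decide at transmit time" split, under simplified assumptions: `size_goal_for()` and `tso_split_len()` are hypothetical helpers, and the congestion-window quota (in segments) and the free send-window space (in bytes) are passed in as plain numbers rather than read from tcp_sock. Deferral is only modelled as a return value of 0.

```c
/* Build-time target: the largest multiple of the MSS that fits in
 * max_frame, but never less than one MSS. */
static unsigned int size_goal_for(unsigned int max_frame, unsigned int mss)
{
	unsigned int goal = max_frame - (max_frame % mss);

	return goal ? goal : mss;
}

/* Transmit-time decision: how many bytes of a queued frame of skb_len
 * bytes may go out now. The limit is the congestion-window quota and
 * the send-window room, kept to whole MSS-sized segments.
 * A result of 0 means "defer". */
static unsigned int tso_split_len(unsigned int skb_len, unsigned int mss,
				  unsigned int cwnd_quota,
				  unsigned int snd_wnd_room)
{
	unsigned int limit = cwnd_quota * mss;

	if (snd_wnd_room < limit)
		limit = snd_wnd_room - (snd_wnd_room % mss);
	if (skb_len <= limit)
		return skb_len;  /* the whole frame fits */
	return limit;            /* chop; the remainder waits for more ACKs */
}
```

For example, with an MSS of 1448 a 64 KiB build target becomes 65160 bytes (45 segments), but with a quota of 10 segments only 14480 bytes of it are emitted now.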

[TCP]: Break out send buffer expansion test. · 0d9901df

Committed by David S. Miller on Jul 05, 2005

This makes it easier to understand, and allows easier
tweaking of the heuristic later on.
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Do not call tcp_tso_acked() if no work to do. · cb83199a

Committed by David S. Miller on Jul 05, 2005

In tcp_clean_rtx_queue(), if the TSO packet is not even partially
acked, do not waste time calling tcp_tso_acked().
Signed-off-by: David S. Miller <davem@davemloft.net>
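The "not even partially acked" test relies on wraparound-safe sequence comparison. A minimal sketch, assuming a segment covers [seq, end_seq) and snd_una is the oldest unacknowledged byte; `seq_after()` mirrors the idea of the kernel's after() macro, and `tso_partially_acked()` is a hypothetical helper, not the code in tcp_clean_rtx_queue().

```c
#include <stdint.h>

/* seq1 is "after" seq2 if the signed 32-bit distance is positive;
 * this stays correct when sequence numbers wrap past 2^32. */
static int seq_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/* A segment still on the retransmit queue is partially acked only if
 * snd_una has moved past its first byte but not past its end. If not,
 * trimming it (the work tcp_tso_acked() does) can be skipped. */
static int tso_partially_acked(uint32_t seq, uint32_t end_seq,
			       uint32_t snd_una)
{
	return seq_after(snd_una, seq) && seq_after(end_seq, snd_una);
}
```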

[TCP]: Kill bogus comment above tcp_tso_acked(). · a5647696

Committed by David S. Miller on Jul 05, 2005

Everything stated there is out of date.  tcp_trim_skb()
does adjust the available socket send buffer space and
skb->truesize now.
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Fix send-side cpu utilization regression. · b4e26f5e

Committed by David S. Miller on Jul 05, 2005

Only put user data purely into pages when doing TSO.

The extra page allocations cause two problems:

1) They add the overhead of the page allocations themselves.
2) They make us do small user copies when we get to the end
   of the TCP socket cache page.

It is still beneficial to use pages exclusively for TSO,
so we will do it for that case.
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Eliminate redundant computations in tcp_write_xmit(). · aa93466b

Committed by David S. Miller on Jul 05, 2005

tcp_snd_test() is run for every packet output by a single
call to tcp_write_xmit(), but this is not necessary.

For one, the congestion window space only needs to be
calculated once, then used throughout the duration
of the loop.

This cleanup also makes experimenting with different TSO
packetization schemes much easier.
Signed-off-by: David S. Miller <davem@davemloft.net>
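A simplified sketch of the loop shape this enables, assuming each queued packet is described only by its TSO segment count: the quota is computed once and then consumed, instead of re-testing the window for every packet. `cwnd_quota()` and `write_xmit()` are hypothetical names; the real tcp_write_xmit() also applies the Nagle and send-window tests.

```c
/* Free congestion-window space, in segments. */
static unsigned int cwnd_quota(unsigned int in_flight, unsigned int snd_cwnd)
{
	return in_flight < snd_cwnd ? snd_cwnd - in_flight : 0;
}

/* pcount[i] is the segment count of the i-th queued packet.
 * Returns how many packets from the head of the queue were "sent". */
static unsigned int write_xmit(const unsigned int *pcount, unsigned int n,
			       unsigned int in_flight, unsigned int snd_cwnd)
{
	unsigned int quota = cwnd_quota(in_flight, snd_cwnd); /* once */
	unsigned int sent = 0;

	while (sent < n && pcount[sent] <= quota) {
		quota -= pcount[sent];  /* consume instead of recomputing */
		sent++;
	}
	return sent;
}
```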

[TCP]: Break out tcp_snd_test() into its constituent parts. · 7f4dd0a9

Committed by David S. Miller on Jul 05, 2005

tcp_snd_test() does several different things; use inline
functions to express this more clearly.

1) It initializes the TSO count of the SKB, if necessary.
2) It performs the Nagle test.
3) It makes sure the congestion window is adhered to.
4) It makes sure the SKB fits into the send window.

This cleanup also sets things up so that values like the
available packets in the congestion window do not need
to be calculated multiple times by packet sending loops
such as tcp_write_xmit().
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling. · 55c97f3e

Committed by David S. Miller on Jul 05, 2005

'nonagle' should be passed to the tcp_snd_test() function
as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the
tail of the write_queue.  This is because Nagle does not
apply to such frames since we cannot possibly tack more
data onto them.

However, while doing this __tcp_push_pending_frames() makes
all of the packets in the write_queue use this modified
'nonagle' value.

Fix the bug and simplify this function by just calling
tcp_write_xmit() directly if sk_send_head is non-NULL.

As a result, we can now make tcp_data_snd_check() just call
tcp_push_pending_frames() instead of the specialized
__tcp_data_snd_check().
Signed-off-by: David S. Miller <davem@davemloft.net>
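The per-packet choice of the Nagle flag can be sketched as follows. This is a simplified model: the flag names and values are illustrative, `nagle_allows()` and `may_send()` are hypothetical, and the Nagle rule shown (hold a sub-MSS packet while data is in flight) ignores corking and the Minshall variant.

```c
#include <stdbool.h>

/* Illustrative flag values, not taken from the kernel headers. */
enum { SK_NAGLE_OFF = 1, SK_NAGLE_PUSH = 4 };

/* Simplified Nagle rule: a full-sized packet may always go; a small
 * one may go only when nothing is in flight, unless Nagle is off. */
static bool nagle_allows(unsigned int len, unsigned int mss,
			 unsigned int packets_out, int nonagle)
{
	if (nonagle & (SK_NAGLE_OFF | SK_NAGLE_PUSH))
		return true;
	return len >= mss || packets_out == 0;
}

/* Only the tail of the write queue can still grow, so only it is
 * subject to Nagle. The effective flag is chosen per packet, leaving
 * the caller's 'nonagle' unchanged for the rest of the queue. */
static bool may_send(unsigned int len, unsigned int mss,
		     unsigned int packets_out, bool is_tail, int nonagle)
{
	int effective = is_tail ? nonagle : SK_NAGLE_PUSH;

	return nagle_allows(len, mss, packets_out, effective);
}
```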

[TCP]: Fix redundant calculations of tcp_current_mss() · a2e2a59c

Committed by David S. Miller on Jul 05, 2005

tcp_write_xmit() uses tcp_current_mss(), but some of its callers,
namely __tcp_push_pending_frames(), already have this value
available.

While we're here, fix the "cur_mss" argument to be "unsigned int"
instead of plain "unsigned".
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: tcp_write_xmit() tabbing cleanup · 92df7b51

Committed by David S. Miller on Jul 05, 2005

Put the main basic block of work at the top level of
tabbing, and mark the TCP_CLOSE test with unlikely().
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Kill extra cwnd validate in __tcp_push_pending_frames(). · a762a980

Committed by David S. Miller on Jul 05, 2005

The tcp_cwnd_validate() function should only be invoked
if we actually send some frames, yet __tcp_push_pending_frames()
will always invoke it.  tcp_write_xmit() does the call for us,
so the call here can simply be removed.

Also, tcp_write_xmit() can be marked static.
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Add missing skb_header_release() call to tcp_fragment(). · f44b5271

Committed by David S. Miller on Jul 05, 2005

When we add any new packet to the TCP socket write queue,
we must call skb_header_release() on it in order for the
TSO sharing checks in the drivers to work.
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Move __tcp_data_snd_check into tcp_output.c · 84d3e7b9

Committed by David S. Miller on Jul 05, 2005

It reimplements portions of tcp_snd_check(), so if
we move it to tcp_output.c we can consolidate its
logic much more easily in a later change.
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Move send test logic out of net/tcp.h · f6302d1d

Committed by David S. Miller on Jul 05, 2005

This just moves the code into tcp_output.c; no code logic changes are
made by this patch.

Using this as a baseline, we can begin to untangle the mess of
comparisons for the Nagle test et al.  We will also be able to reduce
all of the redundant computation that occurs when outputting data
packets.
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Fix quick-ack decrementing with TSO. · fc6415bc

Committed by David S. Miller on Jul 05, 2005

On each packet output, we call tcp_dec_quickack_mode()
if the ACK flag is set.  It drops tp->ack.quick until
it hits zero, at which time we deflate the ATO value.

When doing TSO, we are emitting multiple packets with
ACK set, so we should decrement tp->ack.quick by that many
segments.

Note that, unlike this case, tcp_enter_cwr() should not
take tcp_skb_pcount(skb) into consideration.  That
function readjusts tp->snd_cwnd once and moves
into the TCP_CA_CWR state.
Signed-off-by: David S. Miller <davem@davemloft.net>
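A userspace model of the corrected accounting, assuming `pcount` is the number of segments in the TSO packet (what tcp_skb_pcount() reports) and `ato_min` stands in for the minimum ACK timeout; `struct ack_state` and `dec_quickack()` are hypothetical, not the kernel definitions.

```c
/* Quick-ACK state of the model. */
struct ack_state {
	unsigned int quick;  /* remaining quick ACKs */
	unsigned int ato;    /* ACK timeout, arbitrary units */
};

/* One output packet carrying ACK counts as pcount segments. When the
 * quick-ACK credit runs out, the ACK timeout is deflated. */
static void dec_quickack(struct ack_state *a, unsigned int pcount,
			 unsigned int ato_min)
{
	if (a->quick == 0)
		return;
	if (pcount >= a->quick) {
		a->quick = 0;
		a->ato = ato_min;  /* leaving quick-ACK mode */
	} else {
		a->quick -= pcount;
	}
}
```

Decrementing by one per TSO packet instead would keep the socket in quick-ACK mode for pcount times longer than intended.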

[TCP]: Simplify SKB data portion allocation with NETIF_F_SG. · c65f7f00

Committed by David S. Miller on Jul 05, 2005

The ideal and most optimal layout for an SKB when doing
scatter-gather is to put all the headers at skb->data, and
all the user data in the page array.

This makes SKB splitting and combining extremely simple,
especially before a packet goes onto the wire the first
time.

So, when sk_stream_alloc_pskb() is given a zero size, make
sure there is no skb_tailroom().  This is achieved by applying
SKB_DATA_ALIGN() to the header length used here.

Next, make select_size() in TCP output segmentation use a
length of zero when NETIF_F_SG is true on the outgoing
interface.
Signed-off-by: David S. Miller <davem@davemloft.net>
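The arithmetic behind "no skb_tailroom() for a zero size" can be checked with a small model. It assumes a 64-byte cache line, that the allocator rounds the buffer size up with the same alignment, and that the headers are reserved at the front; `DATA_ALIGN`, `CACHE_BYTES`, `tailroom_after_reserve()` and `select_size()` are illustrative names, not the kernel's definitions.

```c
#define CACHE_BYTES 64u  /* assumed cache-line size */
#define DATA_ALIGN(x) (((x) + (CACHE_BYTES - 1)) & ~(CACHE_BYTES - 1))

/* Bytes left after headers and data when the buffer is rounded up.
 * align_header selects whether the header length is aligned first. */
static unsigned int tailroom_after_reserve(unsigned int header_len,
					   unsigned int size,
					   int align_header)
{
	unsigned int hdr = align_header ? DATA_ALIGN(header_len) : header_len;
	unsigned int buf = DATA_ALIGN(size + hdr);  /* allocator rounds up */

	return buf - hdr - size;
}

/* With scatter-gather, ask for no linear user-data space at all. */
static unsigned int select_size(unsigned int mss, int dev_has_sg)
{
	return dev_has_sg ? 0 : mss;
}
```

With a 200-byte header and a zero size, the unaligned header leaves 56 bytes of tailroom (256 - 200); aligning the header first leaves none.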

[NET]: Remove __ARGS from include/net/slhc_vj.h · b8259d9a

Committed by Alexey Dobriyan on Jul 05, 2005

I suspect "#define __ARGS(x) ()" was deprecated before I was born.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: improve readability of dev_set_promiscuity() in net/core/dev.c · 52609c0b

Committed by David Chau on Jul 05, 2005

A trivial patch to improve the readability of dev_set_promiscuity()
in net/core/dev.c. The new code does exactly the same thing as the
original code.
Signed-off-by: David Chau <ddcc@mit.edu>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SHAPER]: Switch to spinlocks. · bc971dee

Committed by Christoph Hellwig on Jul 05, 2005

Dave, you were right and the sleeping locks in shaper were
broken. Markus Kanet noticed this and also tested the patch below that
switches locking to spinlocks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: More broken memory allocation fixes for fib_trie · 2f36895a

Committed by Robert Olsson on Jul 05, 2005

Below is a patch to preallocate memory when resizing the trie (inflate/halve).
If preallocation fails, it just skips resizing this tnode this time.

The oops we got when killing bgpd (with full routing) is now gone.
Patrick's memory patch is also used.
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
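The "preallocate, or skip the resize" strategy can be sketched as an all-or-nothing allocation. This is a generic illustration, not the fib_trie code: `prealloc_children()` is hypothetical and uses malloc() in place of the kernel allocators.

```c
#include <stdlib.h>

/* Allocate n objects of obj_size bytes into slots[]. On any failure,
 * free what was already obtained and return -1, so the caller can skip
 * the resize and leave the existing structure untouched. */
static int prealloc_children(void **slots, size_t n, size_t obj_size)
{
	size_t i;

	for (i = 0; i < n; i++) {
		slots[i] = malloc(obj_size);
		if (slots[i] == NULL) {
			while (i > 0)  /* roll back earlier allocations */
				free(slots[--i]);
			return -1;
		}
	}
	return 0;
}
```

The point of the design is that the resize itself can no longer fail halfway: every allocation it needs has already succeeded before any pointer in the live trie is changed.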

[DECNET]: Fix memset overflow on 64bit archs while dumping decnet routing rules · db1322b8

Committed by Thomas Graf on Jul 05, 2005

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: Bug fix in rt_check_expire() · bb1d23b0

Committed by Eric Dumazet on Jul 05, 2005

- rt_check_expire() fixes (an overflow occurred if the size of the hash
  was >= 65536)

Reminder of the bugfix:

rt_check_expire() has a serious problem on machines with large
route caches and a standard HZ value of 1000.

With default values, i.e. ip_rt_gc_interval = 60*HZ = 60000,

the loop count:

     for (t = ip_rt_gc_interval << rt_hash_log; t >= 0;

overflows (t is a 31-bit value) as soon as rt_hash_log is >= 16 (65536
slots in the route cache hash table).

In this case, rt_check_expire() does nothing at all.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
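The overflow is easy to confirm: 60000 × 2^15 = 1,966,080,000 still fits below INT_MAX (2,147,483,647), but 60000 × 2^16 = 3,932,160,000 does not, so t becomes negative and the `t >= 0` loop never runs. The check below does the shift in 64-bit arithmetic, because overflowing a signed int shift is undefined behaviour in C; `shift_fits_int()` is a hypothetical helper and HZ = 1000 is assumed.

```c
#include <limits.h>

/* Does interval << hash_log still fit in a (31-bit magnitude) int?
 * The shift is done on a long long, which is exact for these inputs. */
static int shift_fits_int(long long interval, unsigned int hash_log)
{
	long long t = interval << hash_log;

	return t <= INT_MAX;
}
```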

[IPV4]: Use the fancy alloc_large_system_hash() function for route hash table · 424c4b70

Committed by Eric Dumazet on Jul 05, 2005

- rt hash table allocated using the alloc_large_system_hash() function.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Hashed spinlocks in net/ipv4/route.c · 22c047cc

Committed by Eric Dumazet on Jul 05, 2005

- Locking abstraction
- Spinlocks moved out of the rt hash table: the rt hash table uses less
  memory (50%). It's a win even on UP.
- Sizing of the spinlock table depends on NR_CPUS
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
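A sketch of the hashed-lock idea: a small power-of-two table of locks shared by all buckets, indexed by masking the bucket number, so the buckets themselves no longer carry a lock each. The thresholds in `lock_table_size()` are illustrative assumptions about how sizing could scale with NR_CPUS, both functions are hypothetical, and the lock objects are left out to keep the model self-contained.

```c
/* More CPUs -> more locks, to keep contention low (assumed thresholds). */
static unsigned int lock_table_size(unsigned int nr_cpus)
{
	if (nr_cpus >= 32)
		return 4096;
	if (nr_cpus >= 16)
		return 2048;
	if (nr_cpus >= 8)
		return 1024;
	if (nr_cpus >= 4)
		return 512;
	return 256;
}

/* Bucket -> lock slot. lock_sz must be a power of two for the mask. */
static unsigned int lock_index(unsigned int bucket, unsigned int lock_sz)
{
	return bucket & (lock_sz - 1);
}
```

Many buckets share one lock, which is safe because each bucket always maps to the same slot; the trade-off is occasional false contention between unrelated buckets.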

[IPV4]: Handle large allocations in fib_trie · f0e36f8c

Committed by Patrick McHardy on Jul 05, 2005

Inflating a node a couple of times makes it exceed the 128k kmalloc limit.
Use __get_free_pages for allocations > PAGE_SIZE, as in fib_hash.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Robert Olsson <Robert.Olsson@data.slu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
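The size split can be sketched as follows, assuming 4 KiB pages. `order_for()` computes the smallest page order that covers a request, the quantity the page allocator is asked for; `choose_allocator()`, `enum alloc_kind` and `SKETCH_PAGE_SIZE` are hypothetical names.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Smallest order such that (page size << order) >= size. */
static unsigned int order_for(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

enum alloc_kind { ALLOC_KMALLOC, ALLOC_PAGES };

/* Up to one page: kmalloc. Larger: whole pages from the page allocator. */
static enum alloc_kind choose_allocator(size_t size)
{
	return size <= SKETCH_PAGE_SIZE ? ALLOC_KMALLOC : ALLOC_PAGES;
}
```

For example, a 128 KiB node needs order 5 (4096 << 5 = 131072), and anything just above that jumps to order 6.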

[TG3]: Update driver version and reldate. · 93e266f6

Committed by David S. Miller on Jul 05, 2005

Signed-off-by: David S. Miller <davem@davemloft.net>

[TG3]: support for ethtool -C · d244c892

Committed by Michael Chan on Jul 05, 2005

Add support for ethtool -C with verification of user parameters.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV6]: Makes IPv6 rcv registration happen last during initialisation. · e2ed4052

Committed by Herbert Xu on Jul 05, 2005

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: Fix crash in ip_rcv while booting related to netconsole · 30e224d7

Committed by Herbert Xu on Jul 05, 2005

Makes IPv4 ip_rcv registration happen last in af_inet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SKGE]: Fix build on big-endian · a31488ca

Committed by David S. Miller on Jul 05, 2005

Missing PCI_REV_DESC define.
Signed-off-by: David S. Miller <davem@davemloft.net>

[PKT_SCHED]: Report rate estimator configuration errors during qdisc allocation · 023e09a7

Committed by Thomas Graf on Jul 05, 2005

Current behaviour is to not report an error if a rate
estimator is created together with a qdisc and the
configuration of the rate estimator is bogus. This leads
to unexpected behaviour because the user is not notified.

New behaviour is to report the error and let the whole
qdisc creation operation fail so the user is able to fix
his mistake.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

[PKT_SCHED]: Cleanup qdisc creation and alignment macros · 3d54b82f

Committed by Thomas Graf on Jul 05, 2005

Adds qdisc_alloc() to share code between qdisc_create()
and qdisc_create_dflt(). Hides the qdisc alignment behind
macros and makes use of them.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
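The alignment trick that such macros hide can be sketched in userspace: over-allocate, round the object start up to the alignment, and remember the padding so the original pointer can be freed. `ALIGNTO` (32 here), `ALIGN_UP`, `struct sketch_qdisc` and the `sketch_qdisc_*` functions are assumptions for illustration, not the macros or functions added by this commit.

```c
#include <stdint.h>
#include <stdlib.h>

#define ALIGNTO 32u  /* assumed alignment */
#define ALIGN_UP(x) (((x) + ALIGNTO - 1) & ~(uintptr_t)(ALIGNTO - 1))

struct sketch_qdisc {
	int padded;  /* bytes skipped at the start to reach alignment */
};

/* Object plus priv_size bytes of private data, object start aligned.
 * ALIGNTO - 1 extra bytes cover the worst-case padding. */
static struct sketch_qdisc *sketch_qdisc_alloc(size_t priv_size)
{
	size_t size = ALIGN_UP(sizeof(struct sketch_qdisc)) + priv_size +
		      (ALIGNTO - 1);
	char *p = calloc(1, size);
	struct sketch_qdisc *q;

	if (p == NULL)
		return NULL;
	q = (struct sketch_qdisc *)ALIGN_UP((uintptr_t)p);
	q->padded = (int)((char *)q - p);
	return q;
}

/* Private area begins at the first aligned offset after the header. */
static void *sketch_qdisc_priv(struct sketch_qdisc *q)
{
	return (char *)q + ALIGN_UP(sizeof(struct sketch_qdisc));
}

/* Undo the padding to recover the pointer calloc() returned. */
static void sketch_qdisc_free(struct sketch_qdisc *q)
{
	free((char *)q - q->padded);
}
```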

[PKT_SCHED]: Move sch_generic.c prototypes to correct header file · e41a33e6

Committed by Thomas Graf on Jul 05, 2005

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Reduce size of sk_buff by 4 bytes · 1cbb3380

Committed by Thomas Graf on Jul 05, 2005

Reduce local_df to a bit field and ip_summed to a 2-bit
field, thus saving 13 bits. Move the bit fields, packet type,
and protocol into the spare area between the priority
and the destructor. Saves 4 bytes on both 32-bit and
64-bit architectures.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Remove unused security member in sk_buff · e176fe89

Committed by Thomas Graf on Jul 05, 2005

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: net/core/filter.c: make len cover the entire packet · 3154e540

Committed by Patrick McHardy on Jul 05, 2005

As suggested by Herbert Xu:

Since we don't require anything to be in the linear packet range
anymore, make len cover the entire packet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Consolidate common code in net/core/filter.c · 0b05b2a4

Committed by Patrick McHardy on Jul 05, 2005

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Remove redundant code in net/core/filter.c · 6935d46c

Committed by Patrick McHardy on Jul 05, 2005

skb_header_pointer() handles linear and non-linear data, so there is
no need to handle linear data again.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Fix signedness issues in net/core/filter.c · 55820ee2

Committed by Patrick McHardy on Jul 05, 2005

This is the code to load packet data into a register:

                        k = fentry->k;
                        if (k < 0) {
...
                        } else {
                                u32 _tmp, *p;
                                p = skb_header_pointer(skb, k, 4, &_tmp);
                                if (p != NULL) {
                                        A = ntohl(*p);
                                        continue;
                                }
                        }

skb_header_pointer checks if the requested data is within the
linear area:

        int hlen = skb_headlen(skb);

        if (offset + len <= hlen)
                return skb->data + offset;

When offset is within [INT_MAX-len+1..INT_MAX], the addition overflows
and results in a negative number, which is <= hlen.

I couldn't trigger a crash on my AMD64 with 2GB of memory, but a
coworker tried on his x86 machine and it crashed immediately.

This patch fixes the check in skb_header_pointer to handle large
positive offsets similar to skb_copy_bits. Invalid data can still
be accessed using negative offsets (also similar to skb_copy_bits),
anyone using negative offsets needs to verify them himself.

Thanks to Thomas Vögtle <thomas.voegtle@coreworks.de> for verifying the
problem by crashing his machine and providing me with an Oops.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
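The difference between the old and the fixed linear-area test can be demonstrated with plain ints. This sketch assumes a 32-bit int, models the wrapping addition with unsigned arithmetic (signed overflow is undefined in C, and converting the result back to int relies on the two's-complement behaviour of GCC and Clang), and shows the rearranged comparison `hlen - offset >= len` as one overflow-free way to express the fix for non-negative offsets; both function names are hypothetical.

```c
#include <limits.h>

/* Old test: offset + len <= hlen. A huge offset wraps to a negative
 * sum, which wrongly passes. */
static int old_check_in_linear(int offset, int len, int hlen)
{
	int end = (int)((unsigned int)offset + (unsigned int)len);

	return end <= hlen;
}

/* Rearranged test: no overflow for 0 <= offset and 0 <= hlen.
 * Negative offsets are still not rejected, as the commit notes. */
static int new_check_in_linear(int offset, int len, int hlen)
{
	return hlen - offset >= len;
}
```

With hlen = 100 and len = 4, an offset of INT_MAX - 1 passes the old test (the sum wraps to -2147483646) but fails the new one.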

openanolis / cloud-kernel: mirror synced successfully about 1 year ago
