- 26 Apr 2007, 40 commits
-
Committed by Patrick McHardy

esp_init_state doesn't account for the beet pseudo header in the header_len calculation, which may result in undersized skbs hitting xfrm4_beet_output, causing unnecessary reallocations in ip_finish_output2. The skbs should still always have enough room to avoid causing skb_under_panic in skb_push, since we have at least 16 bytes available from LL_RESERVED_SPACE in xfrm_state_check_space.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Patrick McHardy

Replace the probing-based MTU estimation, which usually takes 2-3 iterations to find a fitting value and may underestimate the MTU, with an exact calculation. Also fix underestimation of the XFRM trailer_len, which causes unnecessary reallocations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
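
As a rough illustration of why an exact calculation can replace probing (a self-contained user-space sketch; esp_payload_max and the constants are made up for the example, not the kernel's actual code): the cipher forces payload plus padding to a block-size multiple, so a single align-down gives the answer that probing had to iterate towards:

    /* Illustrative only: exact max ESP payload for a given MTU, instead
     * of probing candidate sizes. Assumes a power-of-two block size. */
    #include <stdio.h>

    static unsigned int align_down(unsigned int x, unsigned int a)
    {
            return x & ~(a - 1);
    }

    /* MTU minus ESP header/IV and ICV, aligned down to the cipher block
     * size, minus the 2 trailer bytes (pad length + next header). */
    static unsigned int esp_payload_max(unsigned int mtu, unsigned int esp_hdr,
                                        unsigned int blksize, unsigned int icv)
    {
            return align_down(mtu - esp_hdr - icv, blksize) - 2;
    }

    int main(void)
    {
            /* e.g. 1500-byte MTU, 8-byte ESP header + 16-byte IV,
             * 16-byte AES blocks, 12-byte ICV */
            printf("max payload: %u\n", esp_payload_max(1500, 24, 16, 12));
            return 0;
    }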
-
Committed by Patrick McHardy

Fix the incorrect substitution of the "trailer" skb by "skb" during the skb_tail_pointer conversion:

    - *(u8*)(trailer->tail - 1) = top_iph->protocol;
    + *(skb_tail_pointer(skb) - 1) = top_iph->protocol;

    - *(u8 *)(trailer->tail - 1) = *skb_network_header(skb);
    + *(skb_tail_pointer(skb) - 1) = *skb_network_header(skb);

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Patrick McHardy

Remove an unnecessary initialization/variable.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Arnaldo Carvalho de Melo

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
-
Committed by Arnaldo Carvalho de Melo

To clearly state the intent of copying to linear sk_buffs; the _offset variant is overly long, but interesting for the sake of saving some bytes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
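
The introduced helpers are essentially thin wrappers around memcpy into skb->data (shown here as in the include/linux/skbuff.h of that era, quoted from memory):

    static inline void skb_copy_to_linear_data(struct sk_buff *skb,
                                               const void *from,
                                               const unsigned int len)
    {
            memcpy(skb->data, from, len);
    }

    static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb,
                                                      const int offset,
                                                      const void *from,
                                                      const unsigned int len)
    {
            memcpy(skb->data + offset, from, len);
    }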
-
Committed by Thomas Graf

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Arnaldo Carvalho de Melo

To clearly state the intent of copying from linear sk_buffs; the _offset variant is overly long, but interesting for the sake of saving some bytes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
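
These mirror the _to_ helpers from the earlier commit, copying out of the linear area instead (again essentially memcpy wrappers, quoted from memory):

    static inline void skb_copy_from_linear_data(const struct sk_buff *skb,
                                                 void *to,
                                                 const unsigned int len)
    {
            memcpy(to, skb->data, len);
    }

    static inline void skb_copy_from_linear_data_offset(const struct sk_buff *skb,
                                                        const int offset, void *to,
                                                        const unsigned int len)
    {
            memcpy(to, skb->data + offset, len);
    }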
-
Committed by Eric Dumazet

As IPPROTO_TCP is 6, it makes sense to make sure the inet_protos[] array is properly cache-line aligned to avoid false sharing on SMP.

    c0680540 b peer_total
    c0680544 b inet_peer_unused_head
    c0680560 B inet_protos

In this i386 example, we can see that inet_protos[IPPROTO_TCP] shares a potentially hot (and modified) cache line.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
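
____cacheline_aligned_in_smp aligns a symbol to a cache-line boundary on SMP builds, so the fix pattern is a one-line annotation of this shape (take the declaration as illustrative rather than a quote of net/ipv4/protocol.c):

    /* keep the hot protocol array on its own cache line(s), away from
     * frequently written neighbours such as the inet_peer variables */
    struct net_protocol *inet_protos[MAX_INET_PROTOS] ____cacheline_aligned_in_smp;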
-
Committed by Eric Dumazet

tcp_memory_pressure and tcp_socket currently share a cache line with tcp_memory_allocated and tcp_sockets_allocated (a very hot cache line). It makes sense to declare these variables as __read_mostly, to avoid false sharing on SMP.

    ffffffff8081d9c0 B tcp_orphan_count
    ffffffff8081d9c4 B tcp_memory_allocated
    ffffffff8081d9c8 B tcp_sockets_allocated
    ffffffff8081d9cc B tcp_memory_pressure
    ffffffff8081d9d0 b tcp_md5sig_users
    ffffffff8081d9d8 b tcp_md5sig_pool
    ffffffff8081d9e0 b warntime.31570
    ffffffff8081d9e8 b tcp_socket

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
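
__read_mostly moves a variable into a section grouped with other rarely written data, so hot read-mostly values stop sharing cache lines with write-heavy ones; the change amounts to annotations like the following (illustrative):

    int tcp_memory_pressure __read_mostly;
    struct socket *tcp_socket __read_mostly;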
-
Committed by Thomas Graf

The results of FIB rules lookups are cached in the routing cache, except for IPv6, as no such cache exists there. So far, it was the responsibility of the user to flush the cache after modifying any rules, which led to many false bug reports due to misunderstanding of this concept. This patch automatically flushes the route cache after inserting or deleting a rule.

Thanks to Muli Ben-Yehuda <muli@il.ibm.com> for catching a bug in the previous patch.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet

There is a very tiny probability that build_ehash_secret() is called at the same time by different CPUs. Also, using __read_mostly is a must for inet_ehash_secret.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
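
A hedged sketch of the serialization (the lock name is illustrative; the real patch may reuse an existing lock): only the first caller publishes a secret, and 0 is reserved to mean "not initialized":

    __u32 inet_ehash_secret __read_mostly;
    static DEFINE_SPINLOCK(ehash_secret_lock);      /* illustrative */

    void build_ehash_secret(void)
    {
            u32 rnd;

            do {
                    get_random_bytes(&rnd, sizeof(rnd));
            } while (rnd == 0);

            spin_lock_bh(&ehash_secret_lock);
            if (!inet_ehash_secret)         /* racing caller won? keep theirs */
                    inet_ehash_secret = rnd;
            spin_unlock_bh(&ehash_secret_lock);
    }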
-
Committed by Herbert Xu

Right now Xen has a horrible hack that lets it forward packets with partial checksums. One of the reasons that CHECKSUM_PARTIAL and CHECKSUM_COMPLETE were added is so that we can get rid of this hack (which creates two extra bits in the skbuff to essentially mirror ip_summed without being destroyed by the forwarding code). I had forgotten that I had already gone through all the device drivers last time around to make sure that they look at ip_summed == CHECKSUM_PARTIAL rather than ip_summed != 0 on transmit. In any case, I've now done that again, so it should definitely be safe. Unfortunately nobody has yet added any code to update CHECKSUM_COMPLETE values on forward, so I'm setting that to CHECKSUM_NONE. This should be safe to remove for bridging, but I'd like to check that code path first. So here is the patch that lets us get rid of the hack by preserving ip_summed (mostly) on forwarded packets.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
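
The "mostly" is captured by a helper of essentially this shape (close to the skb_forward_csum() that ended up in include/linux/skbuff.h, quoted from memory): CHECKSUM_PARTIAL survives the forward path untouched, while CHECKSUM_COMPLETE is downgraded because nothing updates it when headers change:

    static inline void skb_forward_csum(struct sk_buff *skb)
    {
            /* no code yet recomputes a COMPLETE checksum on forward,
             * so fall back to NONE; PARTIAL is preserved */
            if (skb->ip_summed == CHECKSUM_COMPLETE)
                    skb->ip_summed = CHECKSUM_NONE;
    }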
-
Committed by Janusz Krzysztofik

This is a small patch by Janusz Krzysztofik to ip_route_output_slow() that allows a VIP-less LVS Linux director to generate packets originating from the VIP if sysctl_ip_nonlocal_bind is set. In a nutshell, the intention is for an LVS Linux director to be able to send ICMP unreachable responses to end-users when real-servers are removed.

http://archive.linuxvirtualserver.org/html/lvs-users/2007-01/msg00106.html

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
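
A heavily hedged sketch of the shape of the change in ip_route_output_slow() (the condition is simplified; the real checks in net/ipv4/route.c are more involved):

    /* a non-local source address is only an error when nonlocal
     * binds are disallowed */
    if (inet_addr_type(fl.fl4_src) != RTN_LOCAL &&
        !sysctl_ip_nonlocal_bind)
            goto out;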
-
Committed by Stephen Hemminger

Change tcp_probe to use ktime (this needed adding one export). Also add an option to only record events when cwnd changes - from Doug Leith.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
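
A sketch of the new filtering option (the module parameter "full" matches tcp_probe's; the hook body and the lastcwnd/record_event names are illustrative):

    static int full __read_mostly;
    module_param(full, int, 0);
    MODULE_PARM_DESC(full, "Log every received packet, not only cwnd changes");

    /* inside the probe hook on tcp_rcv_established(): */
    if (full || tp->snd_cwnd != lastcwnd) {
            record_event(ktime_get(), tp->snd_cwnd);  /* ktime, not gettimeofday */
            lastcwnd = tp->snd_cwnd;
    }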
-
Committed by Stephen Hemminger

The following update, received from Injong, brings TCP CUBIC to the latest version. I am running more complete tests and will have results after 4/1. According to Injong, the new version improves scalability, fairness and stability, so in all properties we confirmed it shows better performance.

NCSU results (for 2.6.18 and 2.6.20) available:
http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_Testing

This version is described in a new Internet draft for CUBIC:
http://www.ietf.org/internet-drafts/draft-rhee-tcp-cubic-00.txt

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
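
For context (from the CUBIC draft referenced above, not from this commit message): CUBIC grows the congestion window as a cubic function of the time t elapsed since the last loss event, W(t) = C*(t - K)^3 + W_max, where W_max is the window size at the loss, beta is the multiplicative decrease factor, C is a scaling constant, and K = cbrt(W_max*(1-beta)/C) is the time at which the window returns to W_max. Computing K is what the cube root optimization in a later commit of this series is for.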
-
Committed by John Heffner

Do a fragmentation check in ip_forward, similar to IPv6 forwarding.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
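
The added check has roughly this shape (reconstructed from memory of net/ipv4/ip_forward.c of this era; details such as the GSO and local_df tests may differ): an oversized packet with DF set gets an ICMP FRAG_NEEDED reply and is dropped, as in IPv6:

    if (unlikely(skb->len > dst_mtu(&rt->u.dst) && !skb_is_gso(skb) &&
                 (ip_hdr(skb)->frag_off & htons(IP_DF)))) {
            IP_INC_STATS(IPSTATS_MIB_FRAGFAILS);
            icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                      htonl(dst_mtu(&rt->u.dst)));
            goto drop;
    }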
-
Committed by David S. Miller

The days are gone when this was not an issue; there are folks out there with huge bot networks that can be used to attack the established hash tables on remote systems. So, just like the routing cache and connection tracking hash, use a Jenkins hash with random secret input.

Signed-off-by: David S. Miller <davem@davemloft.net>
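
The resulting hash is essentially the following (close to the inet_ehashfn() of that era, quoted from memory): addresses and ports mixed through Jenkins' jhash, keyed by the boot-time random inet_ehash_secret, so remote attackers can no longer precompute colliding 4-tuples:

    static inline unsigned int inet_ehashfn(const __be32 laddr, const __u16 lport,
                                            const __be32 faddr, const __be16 fport)
    {
            return jhash_2words((__force __u32) laddr ^ (__force __u32) faddr,
                                ((__u32) lport) << 16 | (__force __u32) fport,
                                inet_ehash_secret);
    }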
-
Committed by Patrick McHardy

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Patrick McHardy

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Thomas Graf

Now that all users of netlink_dump_start() use netlink_run_queue() to process the receive queue, it is possible to return -EINTR from netlink_dump_start() directly, thereby simplifying the callers.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
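
The caller-side simplification is roughly this (a sketch, not a hunk from the patch; nlsk, dump_fn and done_fn are placeholder names):

    /* before: callers mapped a successful dump start to -EINTR by hand
     * so netlink_run_queue() would keep the message on the queue */
    err = netlink_dump_start(nlsk, skb, nlh, dump_fn, done_fn);
    if (err == 0)
            err = -EINTR;
    return err;

    /* after: netlink_dump_start() returns -EINTR directly */
    return netlink_dump_start(nlsk, skb, nlh, dump_fn, done_fn);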
-
Committed by Thomas Graf

Makes use of netlink_run_queue() to process the receive queue and converts inet_diag_rcv_msg() to use the type-safe netlink interface.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Thomas Graf

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Thomas Graf

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Stephen Hemminger

Use Willy's work on optimizing the cube root by having a table for small values.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
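
A self-contained user-space sketch of the idea (the kernel's cubic_root() in net/ipv4/tcp_cubic.c is fixed-point and differs in detail): answers for small arguments come straight from a precomputed table, while larger ones fall back to Newton iteration:

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t small_cbrt[64];

    static void init_table(void)
    {
            /* precompute floor(cbrt(x)) for x = 0..63, one byte each */
            for (uint32_t x = 0; x < 64; x++) {
                    uint8_t r = 0;
                    while ((uint32_t)(r + 1) * (r + 1) * (r + 1) <= x)
                            r++;
                    small_cbrt[x] = r;
            }
    }

    static uint32_t cbrt_u64(uint64_t a)
    {
            if (a < 64)
                    return small_cbrt[a];      /* fast path: table lookup */

            /* start above the root using the bit length, then apply
             * Newton's rule x' = (2x + a/x^2) / 3 until it stops falling */
            int bits = 64 - __builtin_clzll(a);
            uint64_t x = 1ULL << ((bits + 2) / 3);
            uint64_t y = x + 1;
            while (x < y) {
                    y = x;
                    x = (2 * x + a / (x * x)) / 3;
            }
            return (uint32_t)y;
    }

    int main(void)
    {
            init_table();
            printf("%u %u %u\n", cbrt_u64(7), cbrt_u64(27), cbrt_u64(1000000));
            return 0;
    }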
-
Committed by Thomas Graf

Implements a unified, protocol-independent rules dumping function which is capable of dumping either a specific protocol family or all of them. This speeds up dumping, as fewer lookups are required.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
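
A hedged sketch of the unified dump shape (function and list names are illustrative, not quotes from net/core/fib_rules.c):

    /* Dump a single family when the request names one, else walk all
     * registered rules_ops; one pass, no per-message family lookups. */
    if (family != AF_UNSPEC) {
            ops = lookup_rules_ops(family);
            if (ops == NULL)
                    return -EAFNOSUPPORT;
            return dump_rules(skb, cb, ops);
    }

    list_for_each_entry(ops, &rules_ops, list)
            dump_rules(skb, cb, ops);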
-
Committed by Thomas Graf

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Arnaldo Carvalho de Melo

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Arnaldo Carvalho de Melo

For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the number of direct accesses to skb->data, and for consistency with all the other cast-skb-member helpers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
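
The helper is essentially (as in include/linux/netlink.h):

    static inline struct nlmsghdr *nlmsg_hdr(const struct sk_buff *skb)
    {
            return (struct nlmsghdr *)skb->data;
    }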
-
Committed by Robert Olsson

The threshold for the root node can be set more aggressively to get better tree compression. The new setting makes the root grow from 16 to 19 bits, with a substantial improvement in average depth; this with the current table of 214393 prefixes. But the dynamic resize really needs more investigation, both in terms of convergence and performance, and maybe it should be possible to change it... Maybe just for the brave to start with, or we may have to back this out.
-
Committed by Robert Olsson

The patch below adds a break condition for the resize operations. If we don't achieve the desired fill factor, a warning is printed. The trie should still be operational, but new thresholds should be considered.

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Arnaldo Carvalho de Melo

So that it is also an offset from skb->head, which reduces its size from 8 to 4 bytes on 64-bit architectures, allowing us to combine this with the 4-byte hole left by the layer-header conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64-byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-)

Many calculations that previously required that skb->{transport,network,mac}_header be first converted to a pointer can now be done directly, being meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
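
The mechanism is the sk_buff_data_t typedef plus accessor helpers, essentially as in include/linux/skbuff.h: on 64-bit builds the member becomes a 4-byte offset from skb->head, on 32-bit it remains a plain pointer, and the helpers hide the difference:

    #ifdef NET_SKBUFF_DATA_USES_OFFSET
    typedef unsigned int sk_buff_data_t;

    static inline unsigned char *skb_end_pointer(const struct sk_buff *skb)
    {
            return skb->head + skb->end;          /* offset from head */
    }
    #else
    typedef unsigned char *sk_buff_data_t;

    static inline unsigned char *skb_end_pointer(const struct sk_buff *skb)
    {
            return skb->end;                      /* still a pointer */
    }
    #endif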
-
Committed by Arnaldo Carvalho de Melo

With this we save 8 bytes per network packet, leaving a 4-byte hole to be used in further shrinking work, likely with the offsetization of other pointers, such as ->{data,tail,end}, at the cost of adds. Those were minimized by the usual practice of setting skb->{mac,nh,h}.raw to a local variable that is then accessed multiple times in each function; it also is not more expensive than before with regards to most of the handling of such headers, like setting one of these headers to another (transport to network, etc), or subtracting, adding to/from it, comparing them, etc.

Now we have this layout for sk_buff on a x86_64 machine:

    [acme@mica net-2.6.22]$ pahole vmlinux sk_buff
    struct sk_buff {
            struct sk_buff *       next;              /*   0     8 */
            struct sk_buff *       prev;              /*   8     8 */
            struct rb_node         rb;                /*  16    24 */
            struct sock *          sk;                /*  40     8 */
            ktime_t                tstamp;            /*  48     8 */
            struct net_device *    dev;               /*  56     8 */
            /* --- cacheline 1 boundary (64 bytes) --- */
            struct net_device *    input_dev;         /*  64     8 */
            sk_buff_data_t         transport_header;  /*  72     4 */
            sk_buff_data_t         network_header;    /*  76     4 */
            sk_buff_data_t         mac_header;        /*  80     4 */
            /* XXX 4 bytes hole, try to pack */
            struct dst_entry *     dst;               /*  88     8 */
            struct sec_path *      sp;                /*  96     8 */
            char                   cb[48];            /* 104    48 */
            /* cacheline 2 boundary (128 bytes) was 24 bytes ago */
            unsigned int           len;               /* 152     4 */
            unsigned int           data_len;          /* 156     4 */
            unsigned int           mac_len;           /* 160     4 */
            union {
                    __wsum         csum;              /*         4 */
                    __u32          csum_offset;       /*         4 */
            };                                        /* 164     4 */
            __u32                  priority;          /* 168     4 */
            __u8                   local_df:1;        /* 172     1 */
            __u8                   cloned:1;          /* 172     1 */
            __u8                   ip_summed:2;       /* 172     1 */
            __u8                   nohdr:1;           /* 172     1 */
            __u8                   nfctinfo:3;        /* 172     1 */
            __u8                   pkt_type:3;        /* 173     1 */
            __u8                   fclone:2;          /* 173     1 */
            __u8                   ipvs_property:1;   /* 173     1 */
            /* XXX 2 bits hole, try to pack */
            __be16                 protocol;          /* 174     2 */
            void                   (*destructor)(struct sk_buff *); /* 176  8 */
            struct nf_conntrack *  nfct;              /* 184     8 */
            /* --- cacheline 3 boundary (192 bytes) --- */
            struct sk_buff *       nfct_reasm;        /* 192     8 */
            struct nf_bridge_info *nf_bridge;         /* 200     8 */
            __u16                  tc_index;          /* 208     2 */
            __u16                  tc_verd;           /* 210     2 */
            dma_cookie_t           dma_cookie;        /* 212     4 */
            __u32                  secmark;           /* 216     4 */
            __u32                  mark;              /* 220     4 */
            unsigned int           truesize;          /* 224     4 */
            atomic_t               users;             /* 228     4 */
            unsigned char *        head;              /* 232     8 */
            unsigned char *        data;              /* 240     8 */
            unsigned char *        tail;              /* 248     8 */
            /* --- cacheline 4 boundary (256 bytes) --- */
            unsigned char *        end;               /* 256     8 */
    };      /* size: 264, cachelines: 5 */
            /* sum members: 260, holes: 1, sum holes: 4 */
            /* bit holes: 1, sum bit holes: 2 bits */
            /* last cacheline: 8 bytes */

On 32 bits nothing changes, and pointers continue to be used with the compiler turning all this abstraction layer into dust. But there are some sk_buff validation tricks that are now possible, humm... :-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Arnaldo Carvalho de Melo

Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and skb->mac to skb->mac_header, to match the names of the associated helpers (skb[_[re]set]_{transport,network,mac}_header).

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Arnaldo Carvalho de Melo

For the common sequence "skb->h.raw - skb->nh.raw". This is similar to skb->mac_len, though that one is precalculated; I don't think we need to bloat skb with one more member, so just use this new helper, reducing the number of non-skbuff.h references to the layer headers even more.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
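
In post-rename terms the helper is essentially:

    static inline u32 skb_network_header_len(const struct sk_buff *skb)
    {
            return skb->transport_header - skb->network_header;
    }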
-
Committed by Arnaldo Carvalho de Melo

Some more cases...

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Arnaldo Carvalho de Melo

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Arnaldo Carvalho de Melo

This time we have to set it to skb->tail, which is no longer equal to skb->data, so we either add a new helper or just add the skb->tail - skb->data offset; for now, do the latter.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
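
I.e. the "latter" option is a conversion of this kind at each affected site (illustrative before/after):

    /* before: point the transport header at the current tail */
    skb->h.raw = skb->tail;

    /* after: express the same position as an offset from skb->data */
    skb_set_transport_header(skb, skb->tail - skb->data);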
-
Committed by Yasuyuki Kozakai

This unifies the code that copies netfilter-related data. Before copying, nf_copy() puts (releases) the original members of the destination skb.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
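
The helper has roughly this shape (reconstructed from memory of include/linux/skbuff.h; the put-before-copy ordering is the point of the remark above):

    static inline void nf_copy(struct sk_buff *dst, const struct sk_buff *src)
    {
    #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
            /* release whatever dst held, then take references on src's */
            nf_conntrack_put(dst->nfct);
            dst->nfct = src->nfct;
            nf_conntrack_get(src->nfct);
            dst->nfctinfo = src->nfctinfo;
            nf_conntrack_put_reasm(dst->nfct_reasm);
            dst->nfct_reasm = src->nfct_reasm;
            nf_conntrack_get_reasm(src->nfct_reasm);
    #endif
    #ifdef CONFIG_BRIDGE_NETFILTER
            nf_bridge_put(dst->nf_bridge);
            dst->nf_bridge = src->nf_bridge;
            nf_bridge_get(src->nf_bridge);
    #endif
    }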
-
Committed by Patrick McHardy

Remove the obsolete IPv4-only connection tracking/NAT, as scheduled in feature-removal-schedule.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-