提交 · 762cc40801ad757a34527d5e548816cf3b6fc606 · openeuler / Kernel

16 10月, 2007 17 次提交

[INET]: Consolidate the xxx_put · 762cc408

由 Pavel Emelyanov 提交于 10月 15, 2007

These ones use the generic data types too, so move
them in one place.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

762cc408

[INET]: Small cleanup for xxx_put after evictor consolidation · 4b6cb5d8

由 Pavel Emelyanov 提交于 10月 15, 2007

After the evictor code is consolidated there is no need in
passing the extra pointer to the xxx_put() functions.

The only place when it made sense was the evictor code itself.

Maybe this change must got with the previous (or with the
next) patch, but I try to make them shorter as much as
possible to simplify the review (but they are still large
anyway), so this change goes in a separate patch.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b6cb5d8

[INET]: Consolidate the xxx_evictor · 8e7999c4

由 Pavel Emelyanov 提交于 10月 15, 2007

The evictors collect some statistics for ipv4 and ipv6,
so make it return the number of evicted queues and account
them all at once in the caller.

The XXX_ADD_STATS_BH() macros are just for this case,
but maybe there are places in code, that can make use of
them as well.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8e7999c4

[INET]: Consolidate the xxx_frag_destroy · 1e4b8287

由 Pavel Emelyanov 提交于 10月 15, 2007

To make in possible we need to know the exact frag queue
size for inet_frags->mem management and two callbacks:

 * to destoy the skb (optional, used in conntracks only)
 * to free the queue itself (mandatory, but later I plan to
   move the allocation and the destruction of frag_queues
   into the common place, so this callback will most likely
   be optional too).
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1e4b8287

[INET]: Consolidate xxx_the secret_rebuild · 321a3a99

由 Pavel Emelyanov 提交于 10月 15, 2007

This code works with the generic data types as well, so
move this into inet_fragment.c

This move makes it possible to hide the secret_timer
management and the secret_rebuild routine completely in
the inet_fragment.c

Introduce the ->hashfn() callback in inet_frags() to get
the hashfun for a given inet_frag_queue() object.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

321a3a99

[INET]: Consolidate the xxx_frag_kill · 277e650d

由 Pavel Emelyanov 提交于 10月 15, 2007

Since now all the xxx_frag_kill functions now work
with the generic inet_frag_queue data type, this can
be moved into a common place.

The xxx_unlink() code is moved as well.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

277e650d

[INET]: Collect common frag sysctl variables together · 04128f23

由 Pavel Emelyanov 提交于 10月 15, 2007

Some sysctl variables are used to tune the frag queues
management and it will be useful to work with them in
a common way in the future, so move them into one
structure, moreover they are the same for all the frag
management codes.

I don't place them in the existing inet_frags object,
introduced in the previous patch for two reasons:

 1. to keep them in the __read_mostly section;
 2. not to export the whole inet_frags objects outside.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04128f23

[INET]: Collect frag queues management objects together · 7eb95156

由 Pavel Emelyanov 提交于 10月 15, 2007

There are some objects that are common in all the places
which are used to keep track of frag queues, they are:

 * hash table
 * LRU list
 * rw lock
 * rnd number for hash function
 * the number of queues
 * the amount of memory occupied by queues
 * secret timer

Move all this stuff into one structure (struct inet_frags)
to make it possible use them uniformly in the future. Like
with the previous patch this mostly consists of hunks like

-    write_lock(&ipfrag_lock);
+    write_lock(&ip4_frags.lock);

To address the issue with exporting the number of queues and
the amount of memory occupied by queues outside the .c file
they are declared in, I introduce a couple of helpers.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7eb95156

[INET]: Move common fields from frag_queues in one place. · 5ab11c98

由 Pavel Emelyanov 提交于 10月 15, 2007

Introduce the struct inet_frag_queue in include/net/inet_frag.h
file and place there all the common fields from three structs:

 * struct ipq in ipv4/ip_fragment.c
 * struct nf_ct_frag6_queue in nf_conntrack_reasm.c
 * struct frag_queue in ipv6/reassembly.c

After this, replace these fields on appropriate structures with
this structure instance and fix the users to use correct names
i.e. hunks like

-    atomic_dec(&fq->refcnt);
+    atomic_dec(&fq->q.refcnt);

(these occupy most of the patch)
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5ab11c98

I
[TCP]: high_seq parameter removed (all callers use tp->high_seq) · f885c5b0
由 Ilpo Järvinen 提交于 10月 15, 2007
```
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
f885c5b0

[IPV4]: Uninline netfilter okfns · 861d0486

由 Patrick McHardy 提交于 10月 15, 2007

Now that we don't pass double skb pointers to nf_hook_slow anymore, gcc
can generate tail calls for some of the netfilter hook okfn invocations,
so there is no need to inline the functions anymore. This caused huge
code bloat since we ended up with one inlined version and one out-of-line
version since we pass the address to nf_hook_slow.

Before:
   text    data     bss     dec     hex filename
8997385 1016524  524652 10538561         a0ce41 vmlinux

After:
   text    data     bss     dec     hex filename
8994009 1016524  524652 10535185         a0c111 vmlinux
-------------------------------------------------------
  -3376

All cases have been verified to generate tail-calls with and without
netfilter. The okfns in ipmr and xfrm4_input still remain inline because
gcc can't generate tail-calls for them.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

861d0486

[NETFILTER]: Replace sk_buff ** with sk_buff * · 3db05fea

由 Herbert Xu 提交于 10月 15, 2007

With all the users of the double pointers removed, this patch mops up by
finally replacing all occurances of sk_buff ** in the netfilter API by
sk_buff *.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3db05fea

[NETFILTER]: Avoid skb_copy/pskb_copy/skb_realloc_headroom · 2ca7b0ac

由 Herbert Xu 提交于 10月 14, 2007

This patch replaces unnecessary uses of skb_copy, pskb_copy and
skb_realloc_headroom by functions such as skb_make_writable and
pskb_expand_head.

This allows us to remove the double pointers later.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ca7b0ac

[IPVS]: Replace local version of skb_make_writable · af1e1cf0

由 Herbert Xu 提交于 10月 14, 2007

This patch removes the IPVS-specific version of skb_make_writable and
replaces it with the netfilter one.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af1e1cf0

[NETFILTER]: Do not copy skb in skb_make_writable · 37d41879

由 Herbert Xu 提交于 10月 14, 2007

Now that all callers of netfilter can guarantee that the skb is not shared,
we no longer have to copy the skb in skb_make_writable.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37d41879

[IPV4]: Change ip_defrag to return an integer · 776c729e

由 Herbert Xu 提交于 10月 14, 2007

Now that ip_frag always returns the packet given to it on input, we can
change it to return an integer indicating error instead.  This patch does
that and updates all its callers accordingly.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

776c729e

[IPV4]: Make ip_defrag return the same packet · 1706d587

由 Herbert Xu 提交于 10月 14, 2007

This patch is a bit of a hack.  However it is worth it if you consider that
this is the only reason why we have to carry around the struct sk_buff **
pointers in netfilter.

It makes ip_defrag always return the packet that was given to it on input.
It does this by cloning the packet and replacing its original contents with
the head fragment if necessary.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1706d587

15 10月, 2007 2 次提交

fix endianness bug in inet_lro · f53f4137

由 Al Viro 提交于 10月 14, 2007

all uses of and almost all assignments to lro_desc->tcp_ack assume that it's
net-endian; one converts net-endian to host-endian and sticks it in
lro_desc->tcp_ack.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f53f4137

inet_lro: trivial endianness annotations · 9df7c98a

由 Al Viro 提交于 10月 14, 2007

Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9df7c98a

12 10月, 2007 7 次提交

[TCP]: Limit processing lost_retrans loop to work-to-do cases · b08d6cb2

由 Ilpo Järvinen 提交于 10月 11, 2007

This addition of lost_retrans_low to tcp_sock might be
unnecessary, it's not clear how often lost_retrans worker is
executed when there wasn't work to do.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b08d6cb2

[TCP]: Fix lost_retrans loop vs fastpath problems · f785a8e2

由 Ilpo Järvinen 提交于 10月 11, 2007

Detection implemented with lost_retrans must work also when
fastpath is taken, yet most of the queue is skipped including
(very likely) those retransmitted skb's we're interested in.
This problem appeared when the hints got added, which removed
a need to always walk over the whole write queue head.
Therefore decicion for the lost_retrans worker loop entry must
be separated from the sacktag processing more than it was
necessary before.

It turns out to be problematic to optimize the worker loop
very heavily because ack_seqs of skb may have a number of
discontinuity points. Maybe similar approach as currently is
implemented could be attempted but that's becoming more and
more complex because the trend is towards less skb walking
in sacktag marker. Trying a simple work until all rexmitted
skbs heve been processed approach.

Maybe after(highest_sack_end_seq, tp->high_seq) checking is not
sufficiently accurate and causes entry too often in no-work-to-do
cases. Since that's not known, I've separated solution to that
from this patch.

Noticed because of report against a related problem from TAKANO
Ryousei <takano@axe-inc.co.jp>. He also provided a patch to
that part of the problem. This patch includes solution to it
(though this patch has to use somewhat different placement).
TAKANO's description and patch is available here:

  http://marc.info/?l=linux-netdev&m=119149311913288&w=2

...In short, TAKANO's problem is that end_seq the loop is using
not necessarily the largest SACK block's end_seq because the
current ACK may still have higher SACK blocks which are later
by the loop.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f785a8e2

[TCP]: No need to re-count fackets_out/sacked_out at RTO · 4cd82999

由 Ilpo Järvinen 提交于 10月 11, 2007

Both sacked_out and fackets_out are directly known from how
parameter. Since fackets_out is accurate, there's no need for
recounting (sacked_out was previously unnecessarily counted
in the loop anyway).
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cd82999

[TCP]: Extract tcp_match_queue_to_sack from sacktag code · d1935942

由 Ilpo Järvinen 提交于 10月 11, 2007

This is necessary for upcoming DSACK bugfix. Reduces sacktag
length which is not very sad thing at all... :-)

Notice that there's a need to handle out-of-mem at caller's
place.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1935942

[TCP]: Kill almost unused variable pcount from sacktag · f6fb128d

由 Ilpo Järvinen 提交于 10月 11, 2007

It's on the way for future cutting of that function.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6fb128d

[TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L · 3eec0047

由 Ilpo Järvinen 提交于 10月 11, 2007

This condition (plain R) can arise at least in recovery that
is triggered after tcp_undo_loss. There isn't any reason why
they should not be marked as lost, not marking makes in_flight
estimator to return too large values.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3eec0047

[TCP]: Add bytes_acked (ABC) clearing to FRTO too · 16e90681

由 Ilpo Järvinen 提交于 10月 11, 2007

I was reading tcp_enter_loss while looking for Cedric's bug and
noticed bytes_acked adjustment is missing from FRTO side.

Since bytes_acked will only be used in tcp_cong_avoid, I think
it's safe to assume RTO would be spurious. During FRTO cwnd
will be not controlled by tcp_cong_avoid and if FRTO calls for
conventional recovery, cwnd is adjusted and the result of wrong
assumption is cleared from bytes_acked. If RTO was in fact
spurious, we did normal ABC already and can continue without
any additional adjustments.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16e90681

11 10月, 2007 14 次提交

[NETLINK]: fib_frontend build fixes · 28f7b036

由 David S. Miller 提交于 10月 10, 2007

1) fibnl needs to be declared outside of config ifdefs,
   and also should not be explicitly initialized to NULL
2) nl_fib_input() args are wrong for netlink_kernel_create()
   input method
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

28f7b036

[NET]: make netlink user -> kernel interface synchronious · cd40b7d3

由 Denis V. Lunev 提交于 10月 10, 2007

This patch make processing netlink user -> kernel messages synchronious.
This change was inspired by the talk with Alexey Kuznetsov about current
netlink messages processing. He says that he was badly wrong when introduced 
asynchronious user -> kernel communication.

The call netlink_unicast is the only path to send message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.

Before this change the user message has been attached to the socket queue
and sk->sk_data_ready was called. The process has been blocked until all
pending messages were processed. The bad thing is that this processing
may occur in the arbitrary process context.

This patch changes nlk->data_ready callback to get 1 skb and force packet
processing right in the netlink_unicast.

Kernel -> user path in netlink_unicast remains untouched.

EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock
drop, but the process remains in the cycle until the message will be fully
processed. So, there is no need to use this kludges now.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Acked-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd40b7d3

[INET]: local port range robustness · 227b60f5

由 Stephen Hemminger 提交于 10月 10, 2007

Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.
Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

227b60f5

[IPSEC]: Move IP protocol setting from transforms into xfrm4_input.c · 631a6698

由 Herbert Xu 提交于 10月 10, 2007

This patch makes the IPv4 x->type->input functions return the next protocol
instead of setting it directly. This is identical to how we do things in
IPv6 and will help us merge common code on the input path.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

631a6698

[IPSEC]: Move IP length/checksum setting out of transforms · ceb1eec8

由 Herbert Xu 提交于 10月 10, 2007

This patch moves the setting of the IP length and checksum fields out of
the transforms and into the xfrmX_output functions.  This would help future
efforts in merging the transforms themselves.

It also adds an optimisation to ipcomp due to the fact that the transport
offset is guaranteed to be zero.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ceb1eec8

[IPSEC]: Get rid of ipv6_{auth,esp,comp}_hdr · 87bdc48d

由 Herbert Xu 提交于 10月 10, 2007

This patch removes the duplicate ipv6_{auth,esp,comp}_hdr structures since
they're identical to the IPv4 versions.  Duplicating them would only create
problems for ourselves later when we need to add things like extended
sequence numbers.

I've also added transport header type conversion headers for these types
which are now used by the transforms.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87bdc48d

[IPSEC]: Use IPv6 calling convention as the convention for x->mode->output · 37fedd3a

由 Herbert Xu 提交于 10月 10, 2007

The IPv6 calling convention for x->mode->output is more general and could
help an eventual protocol-generic x->type->output implementation.  This
patch adopts it for IPv4 as well and modifies the IPv4 type output functions
accordingly.

It also rewrites the IPv6 mac/transport header calculation to be based off
the network header where practical.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37fedd3a

[IPSEC]: Set skb->data to payload in x->mode->output · 7b277b1a

由 Herbert Xu 提交于 10月 10, 2007

This patch changes the calling convention so that on entry from
x->mode->output and before entry into x->type->output skb->data
will point to the payload instead of the IP header.

This is essentially a redistribution of skb_push/skb_pull calls
with the aim of minimising them on the common path of tunnel +
ESP.

It'll also let us use the same calling convention between IPv4
and IPv6 with the next patch.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7b277b1a

[IPSEC] esp: Remove NAT-T checksum invalidation for BEET · 8bd17075

由 Herbert Xu 提交于 10月 10, 2007

I pointed this out back when this patch was first proposed but it looks like
it got lost along the way.

The checksum only needs to be ignored for NAT-T in transport mode where
we lose the original inner addresses due to NAT.  With BEET the inner
addresses will be intact so the checksum remains valid.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8bd17075

[TCP]: Separate lost_retrans loop into own function · 1c1e87ed

由 Ilpo Järvinen 提交于 10月 10, 2007

Follows own function for each task principle, this is really
somewhat separate task being done in sacktag. Also reduces
indentation.

In addition, added ack_seq local var to break some long
lines & fixed coding style things.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1c1e87ed

[NETFILTER]: Make netfilter code use the seq_open_private · e2da5913

由 Pavel Emelyanov 提交于 10月 10, 2007

Just switch to the consolidated calls.

ipt_recent() has to initialize the private, so use
the __seq_open_private() helper.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2da5913

[NET]: Make core networking code use seq_open_private · cf7732e4

由 Pavel Emelyanov 提交于 10月 10, 2007

This concerns the ipv4 and ipv6 code mostly, but also the netlink
and unix sockets.

The netlink code is an example of how to use the __seq_open_private()
call - it saves the net namespace on this private.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf7732e4

[IPSEC]: Move state lock into x->type->output · b7c6538c

由 Herbert Xu 提交于 10月 09, 2007

This patch releases the lock on the state before calling x->type->output.
It also adds the lock to the spots where they're currently needed.

Most of those places (all except mip6) are expected to disappear with
async crypto.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b7c6538c

[IPSEC]: Store IPv6 nh pointer in mac_header on output · 007f0211

由 Herbert Xu 提交于 10月 09, 2007

Current the x->mode->output functions store the IPv6 nh pointer in the
skb network header. This is inconvenient because the network header then
has to be fixed up before the packet can leave the IPsec stack. The mac
header field is unused on output so we can use that to store this instead.

This patch does that and removes the network header fix-up in xfrm_output.

It also uses ipv6_hdr where appropriate in the x->type->output functions.

There is also a minor clean-up in esp4 to make it use the same code as
esp6 to help any subsequent effort to merge the two.

Lastly it kills two redundant skb_set_* statements in BEET that were
simply copied over from transport mode.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

007f0211

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功