提交 · cd40b7d3983c708aabe3d3008ec64ffce56d33b0 · openeuler / Kernel

11 10月, 2007 40 次提交

[NET]: make netlink user -> kernel interface synchronious · cd40b7d3

由 Denis V. Lunev 提交于 10月 10, 2007

This patch make processing netlink user -> kernel messages synchronious.
This change was inspired by the talk with Alexey Kuznetsov about current
netlink messages processing. He says that he was badly wrong when introduced 
asynchronious user -> kernel communication.

The call netlink_unicast is the only path to send message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.

Before this change the user message has been attached to the socket queue
and sk->sk_data_ready was called. The process has been blocked until all
pending messages were processed. The bad thing is that this processing
may occur in the arbitrary process context.

This patch changes nlk->data_ready callback to get 1 skb and force packet
processing right in the netlink_unicast.

Kernel -> user path in netlink_unicast remains untouched.

EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock
drop, but the process remains in the cycle until the message will be fully
processed. So, there is no need to use this kludges now.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Acked-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd40b7d3

[NET]: unify netlink kernel socket recognition · aed81560

由 Denis V. Lunev 提交于 10月 10, 2007

There are currently two ways to determine whether the netlink socket is a
kernel one or a user one. This patch creates a single inline call for
this purpose and unifies all the calls in the af_netlink.c

No similar calls are found outside af_netlink.c.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Acked-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aed81560

[NET]: cleanup 3rd argument in netlink_sendskb · 7ee015e0

由 Denis V. Lunev 提交于 10月 10, 2007

netlink_sendskb does not use third argument. Clean it and save a couple of
bytes.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Acked-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7ee015e0

[NET]: Make netlink processing routines semi-synchronious (inspired by rtnl) v2 · 3b71535f

由 Denis V. Lunev 提交于 10月 10, 2007

The code in netfilter/nfnetlink.c and in ./net/netlink/genetlink.c looks
like outdated copy/paste from rtnetlink.c. Push them into sync with the
original.

Changes from v1:
- deleted comment in nfnetlink_rcv_msg by request of Patrick McHardy
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Acked-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b71535f

[NET]: rtnl_unlock cleanups · 1536cc0d

由 Denis V. Lunev 提交于 10月 10, 2007

There is no need to process outstanding netlink user->kernel packets
during rtnl_unlock now. There is no rtnl_trylock in the rtnetlink_rcv
anymore.

Normal code path is the following:
netlink_sendmsg
   netlink_unicast
       netlink_sendskb
           skb_queue_tail
           netlink_data_ready
               rtnetlink_rcv
                   mutex_lock(&rtnl_mutex);
                   netlink_run_queue(sk, qlen, &rtnetlink_rcv_msg);
                   mutex_unlock(&rtnl_mutex);

So, it is possible, that packets can be present in the rtnl->sk_receive_queue
during rtnl_unlock, but there is no need to process them at that moment as
rtnetlink_rcv for that packet is pending.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Acked-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1536cc0d

[NET]: sanitize kernel_accept() error path · fa8705b0

由 Tony Battersby 提交于 10月 10, 2007

If kernel_accept() returns an error, it may pass back a pointer to
freed memory (which the caller should ignore).  Make it pass back NULL
instead for better safety.
Signed-off-by: NTony Battersby <tonyb@cybernetics.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa8705b0

[INET]: local port range robustness · 227b60f5

由 Stephen Hemminger 提交于 10月 10, 2007

Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.
Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

227b60f5

[SCTP]: port randomization · 06393009

由 Stephen Hemminger 提交于 10月 10, 2007

Add port randomization rather than a simple fixed rover
for use with SCTP.  This makes it act similar to TCP, UDP, DCCP
when allocating ports.

No longer need port_alloc_lock as well (suggestion by Brian Haley).
Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

06393009

[NET_SCHED]: Show timer resolution instead of clock resolution in /proc/net/psched · 3c0cfc13

由 Patrick McHardy 提交于 10月 10, 2007

The fourth parameter of /proc/net/psched is supposed to show the timer
resultion and is used by HTB userspace to calculate the necessary
burst rate. Currently we show the clock resolution, which results in a
too low burst rate when the two differ.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c0cfc13

[IPSEC]: Move IP protocol setting from transforms into xfrm4_input.c · 631a6698

由 Herbert Xu 提交于 10月 10, 2007

This patch makes the IPv4 x->type->input functions return the next protocol
instead of setting it directly. This is identical to how we do things in
IPv6 and will help us merge common code on the input path.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

631a6698

[IPSEC]: Move IP length/checksum setting out of transforms · ceb1eec8

由 Herbert Xu 提交于 10月 10, 2007

This patch moves the setting of the IP length and checksum fields out of
the transforms and into the xfrmX_output functions.  This would help future
efforts in merging the transforms themselves.

It also adds an optimisation to ipcomp due to the fact that the transport
offset is guaranteed to be zero.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ceb1eec8

[IPSEC]: Get rid of ipv6_{auth,esp,comp}_hdr · 87bdc48d

由 Herbert Xu 提交于 10月 10, 2007

This patch removes the duplicate ipv6_{auth,esp,comp}_hdr structures since
they're identical to the IPv4 versions.  Duplicating them would only create
problems for ourselves later when we need to add things like extended
sequence numbers.

I've also added transport header type conversion headers for these types
which are now used by the transforms.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87bdc48d

[IPSEC]: Use IPv6 calling convention as the convention for x->mode->output · 37fedd3a

由 Herbert Xu 提交于 10月 10, 2007

The IPv6 calling convention for x->mode->output is more general and could
help an eventual protocol-generic x->type->output implementation.  This
patch adopts it for IPv4 as well and modifies the IPv4 type output functions
accordingly.

It also rewrites the IPv6 mac/transport header calculation to be based off
the network header where practical.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37fedd3a

[IPSEC]: Set skb->data to payload in x->mode->output · 7b277b1a

由 Herbert Xu 提交于 10月 10, 2007

This patch changes the calling convention so that on entry from
x->mode->output and before entry into x->type->output skb->data
will point to the payload instead of the IP header.

This is essentially a redistribution of skb_push/skb_pull calls
with the aim of minimising them on the common path of tunnel +
ESP.

It'll also let us use the same calling convention between IPv4
and IPv6 with the next patch.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7b277b1a

[IPSEC] beet: Fix extension header support on output · bee0b40c

由 Herbert Xu 提交于 10月 10, 2007

The beet output function completely kills any extension headers by replacing
them with the IPv6 header.  This is because it essentially ignores the
result of ip6_find_1stfragopt by simply acting as if there aren't any
extension headers.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bee0b40c

[IPSEC] esp: Remove NAT-T checksum invalidation for BEET · 8bd17075

由 Herbert Xu 提交于 10月 10, 2007

I pointed this out back when this patch was first proposed but it looks like
it got lost along the way.

The checksum only needs to be ignored for NAT-T in transport mode where
we lose the original inner addresses due to NAT.  With BEET the inner
addresses will be intact so the checksum remains valid.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8bd17075

[IPV6]: Defer IPv6 device initialization until a valid qdisc is specified · f24e3d65

由 Mitsuru Chinen 提交于 10月 10, 2007

To judge the timing for DAD, netif_carrier_ok() is used. However,
there is a possibility that dev->qdisc stays noop_qdisc even if
netif_carrier_ok() returns true. In that case, DAD NS is not sent out.
We need to defer the IPv6 device initialization until a valid qdisc
is specified.
Signed-off-by: NMitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f24e3d65

[NET]: Remove double dev->flags checking when calling dev_close() · 9b772652

由 Pavel Emelyanov 提交于 10月 10, 2007

The unregister_netdevice() and dev_change_net_namespace()
both check for dev->flags to be IFF_UP before calling the
dev_close(), but the dev_close() checks for IFF_UP itself,
so remove those unneeded checks.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9b772652

[TCP]: Separate lost_retrans loop into own function · 1c1e87ed

由 Ilpo Järvinen 提交于 10月 10, 2007

Follows own function for each task principle, this is really
somewhat separate task being done in sacktag. Also reduces
indentation.

In addition, added ack_seq local var to break some long
lines & fixed coding style things.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1c1e87ed

[SUNRPC]: Make the sunrpc use the seq_open_private() · ec931035

由 Pavel Emelyanov 提交于 10月 10, 2007

Just switch to the consolidated code.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec931035

[IRDA]: Make the IRDA use the seq_open_private() · a662d4cb

由 Pavel Emelyanov 提交于 10月 10, 2007

Just switch to the consolidated code
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a662d4cb

[DECNET]: Make decnet code use the seq_open_private() · 31164088

由 Pavel Emelyanov 提交于 10月 10, 2007

Just switch to the consolidated code.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31164088

[NETFILTER]: Make netfilter code use the seq_open_private · e2da5913

由 Pavel Emelyanov 提交于 10月 10, 2007

Just switch to the consolidated calls.

ipt_recent() has to initialize the private, so use
the __seq_open_private() helper.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2da5913

[NET]: Make core networking code use seq_open_private · cf7732e4

由 Pavel Emelyanov 提交于 10月 10, 2007

This concerns the ipv4 and ipv6 code mostly, but also the netlink
and unix sockets.

The netlink code is an example of how to use the __seq_open_private()
call - it saves the net namespace on this private.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf7732e4

[PATCH] mac80211: Defer setting of RX_FLAG_DECRYPTED. · e2f036da

由 Mattias Nissler 提交于 10月 07, 2007

The decryption handlers will skip the frame if the RX_FLAG_DECRYPTED
flag is set, so the early flag setting introduced by Johannes breaks
decryption. To work around this, call the handlers first and then set
the flag.
Signed-off-by: NMattias Nissler <mattias.nissler@gmx.de>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

e2f036da

[PATCH] ieee80211_if_set_type: make check for master dev more explicit · 0654ff05

由 John W. Linville 提交于 10月 04, 2007

Problem description by Daniel Drake <dsd@gentoo.org>:

"This sequence of events causes loss of connectivity:

<plug in>
<associate as normal in managed mode>
ifconfig eth7 down
iwconfig eth7 mode monitor
ifconfig eth7 up
ifconfig eth7 down
iwconfig eth7 mode managed
<associate as normal>

At this point you are associated but TX does not work. This is because
the eth7 hard_start_xmit is still ieee80211_monitor_start_xmit."

The problem is caused by ieee80211_if_set_type checking for a non-zero
hard_start_xmit pointer value in order to avoid changing that value for
master devices.  The fix is to make that check more explicitly linked to
master devices rather than simply checking if the value has been
previously set.

CC: Daniel Drake <dsd@gentoo.org>
Acked-by: NMichael Wu <flamingice@sourmilk.net>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

0654ff05

[IPSEC]: Move state lock into x->type->output · b7c6538c

由 Herbert Xu 提交于 10月 09, 2007

This patch releases the lock on the state before calling x->type->output.
It also adds the lock to the spots where they're currently needed.

Most of those places (all except mip6) are expected to disappear with
async crypto.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b7c6538c

[IPSEC]: Lock state when copying non-atomic fields to user-space · 050f009e

由 Herbert Xu 提交于 10月 09, 2007

This patch adds locking so that when we're copying non-atomic fields such as
life-time or coaddr to user-space we don't get a partial result.

For af_key I've changed every instance of pfkey_xfrm_state2msg apart from
expiration notification to include the keys and life-times.  This is in-line
with XFRM behaviour.

The actual cases affected are:

* pfkey_getspi: No change as we don't have any keys to copy.
* key_notify_sa:
	+ ADD/UPD: This wouldn't work otherwise.
	+ DEL: It can't hurt.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

050f009e

[XFRM] user: Move attribute copying code into copy_to_user_state_extra · 68325d3b

由 Herbert Xu 提交于 10月 09, 2007

Here's a good example of code duplication leading to code rot.  The
notification patch did its own netlink message creation for xfrm states.
It duplicated code that was already in dump_one_state.  Guess what, the
next time (and the time after) when someone updated dump_one_state the
notification path got zilch.

This patch moves that code from dump_one_state to copy_to_user_state_extra
and uses it in xfrm_notify_sa too.  Unfortunately whoever updates this
still needs to update xfrm_sa_len since the notification path wants to
know the exact size for allocation.

At least I've added a comment saying so and if someone still forgest, we'll
have a WARN_ON telling us so.

I also changed the security size calculation to use xfrm_user_sec_ctx since
that's what we actually put into the skb.  However it makes no practical
difference since it has the same size as xfrm_sec_ctx.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68325d3b

[IPSEC]: Move common code into xfrm_alloc_spi · 658b219e

由 Herbert Xu 提交于 10月 09, 2007

This patch moves some common code that conceptually belongs to the xfrm core
from af_key/xfrm_user into xfrm_alloc_spi.

In particular, the spin lock on the state is now taken inside xfrm_alloc_spi.
Previously it also protected the construction of the response PF_KEY/XFRM
messages to user-space. This is inconsistent as other identical constructions
are not protected by the state lock. This is bad because they in fact should
be protected but only in certain spots (so as not to hold the lock for too
long which may cause packet drops).

The SPI byte order conversion has also been moved.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

658b219e

[IPSEC]: Remove gratuitous km wake-up events on ACQUIRE · 75ba28c6

由 Herbert Xu 提交于 10月 09, 2007

There is no point in waking people up when creating/updating larval states
because they'll just go back to sleep again as larval states by definition
cannot be found by xfrm_state_find.

We should only wake them up when the larvals mature or die.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

75ba28c6

[IPSEC]: Store IPv6 nh pointer in mac_header on output · 007f0211

由 Herbert Xu 提交于 10月 09, 2007

Current the x->mode->output functions store the IPv6 nh pointer in the
skb network header. This is inconvenient because the network header then
has to be fixed up before the packet can leave the IPsec stack. The mac
header field is unused on output so we can use that to store this instead.

This patch does that and removes the network header fix-up in xfrm_output.

It also uses ipv6_hdr where appropriate in the x->type->output functions.

There is also a minor clean-up in esp4 to make it use the same code as
esp6 to help any subsequent effort to merge the two.

Lastly it kills two redundant skb_set_* statements in BEET that were
simply copied over from transport mode.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

007f0211

[IPSEC]: Remove bogus ref count in xfrm_secpath_reject · 1ecafede

由 Herbert Xu 提交于 10月 09, 2007

Constructs of the form

	xfrm_state_hold(x);
	foo(x);
	xfrm_state_put(x);

tend to be broken because foo is either synchronous where this is totally
unnecessary or if foo is asynchronous then the reference count is in the
wrong spot.

In the case of xfrm_secpath_reject, the function is synchronous and therefore
we should just kill the reference count.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ecafede

[NETNS]: Don't memset() netns to zero manually · 32f0c4cb

由 Pavel Emelyanov 提交于 10月 09, 2007

The newly created net namespace is set to 0 with memset()
in setup_net(). The setup_net() is also called for the
init_net_ns(), which is zeroed naturally as a global var.

So remove this memset and allocate new nets with the
kmem_cache_zalloc().
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

32f0c4cb

[IPv6]: use container_of() macro in fib6_clean_node() · 0a8891a0

由 Benjamin Thery 提交于 10月 08, 2007

In ip6_fib.c, fib6_clean_node() casts a fib6_walker_t pointer to
a fib6_cleaner_t pointer assuming a struct fib6_walker_t (field 'w')
is the first field in struct fib6_walker_t.

To prevent any future problems that may occur if one day a field
is inadvertently inserted before the 'w' field in struct fib6_cleaner_t,
(and to improve readability), this patch uses the container_of() macro.
Signed-off-by: NBenjamin Thery <benjamin.thery@bull.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a8891a0

[NETNS]: Move some code into __init section when CONFIG_NET_NS=n · 4665079c

由 Pavel Emelyanov 提交于 10月 08, 2007

With the net namespaces many code leaved the __init section,
thus making the kernel occupy more memory than it did before.
Since we have a config option that prohibits the namespace
creation, the functions that initialize/finalize some netns
stuff are simply not needed and can be freed after the boot.

Currently, this is almost not noticeable, since few calls
are no longer in __init, but when the namespaces will be
merged it will be possible to free more code. I propose to
use the __net_init, __net_exit and __net_initdata "attributes"
for functions/variables that are not used if the CONFIG_NET_NS
is not set to save more space in memory.

The exiting functions cannot just reside in the __exit section,
as noticed by David, since the init section will have
references on it and the compilation will fail due to modpost
checks. These references can exist, since the init namespace
never dies and the exit callbacks are never called. So I
introduce the __exit_refok attribute just like it is already
done with the __init_refok.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4665079c

[8021Q]: transfer dev_id from real device · 3607c446

由 Ursula Braun 提交于 10月 08, 2007

A net_device struct provides field dev_id. It is used for
unique ipv6 generation in case of shared network cards
(as for the OSA network cards of IBM System z).
If VLAN devices are built on top of such shared network cards,
this dev_id information needs to be transferred to the VLAN device.
Signed-off-by: NUrsula Braun <braunu@de.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3607c446

[IPSEC]: Move RO-specific output code into xfrm6_mode_ro.c · 45b17f48

由 Herbert Xu 提交于 10月 08, 2007

The lastused update check in xfrm_output can be done just as well in
the mode output function which is specific to RO.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45b17f48

[IPSEC]: Unexport xfrm_replay_notify · cdf7e668

由 Herbert Xu 提交于 10月 08, 2007

Now that the only callers of xfrm_replay_notify are in xfrm, we can remove
the export.

This patch also removes xfrm_aevent_doreplay since it's now called in just
one spot.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cdf7e668

[IPSEC]: Move output replay code into xfrm_output · 436a0a40

由 Herbert Xu 提交于 10月 08, 2007

The replay counter is one of only two remaining things in the output code
that requires a lock on the xfrm state (the other being the crypto).  This
patch moves it into the generic xfrm_output so we can remove the lock from
the transforms themselves.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

436a0a40

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功