Commits · 7f6b9dbd5afbd966a82dcbafc5ed62305eb9d479 · openanolis / cloud-kernel

Feb 23, 2010: 2 commits

Committed by stephen hemminger on Feb 22, 2010

Get rid of custom locking that was using a wait queue, a lock, and an atomic
to build what was basically a queued mutex. Use RCU for the read side.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7f6b9dbd

packet: convert socket list to RCU (v3) · 808f5114

Committed by stephen hemminger on Feb 22, 2010

Convert AF_PACKET to use RCU, eliminating one more reader/writer lock.

There is no need for a real sk_del_node_init_rcu(), because sk_del_node_init
already does the equivalent of hlist_del_init_rcu; but I added
some comments to try to make that obvious.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

808f5114

Feb 20, 2010: 3 commits

xfrm: Flushing empty SPD generates false events · 2f1eb65f

Committed by Jamal Hadi Salim on Feb 19, 2010

To see the effect, make sure you have an empty SPD.
In window1 run "ip xfrm mon", and in window2 issue "ip xfrm policy flush".
You get the prompt back in window2 and you see the flush event in window1.
With this fix, you still get the prompt back in window2 but no event in window1.

Thanks to Alexey Dobriyan for finding a bug in an earlier version
when using pfkey to do the flushing.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f1eb65f

xfrm: Flushing empty SAD generates false events · 9e64cc95

Committed by Jamal Hadi Salim on Feb 19, 2010

To see the effect, make sure you have an empty SAD.
In window1 run "ip xfrm mon", and in window2 issue "ip xfrm state flush".
You get the prompt back in window2 and you see the flush event in window1.
With this fix, you still get the prompt back in window2 but no event in window1.

Thanks to Alexey Dobriyan for finding a bug in an earlier version
when using pfkey to do the flushing.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e64cc95

pfkey: fix SA and SP flush sequence · 8be987d7

Committed by Jamal Hadi Salim on Feb 19, 2010

RFC 2367 says the flushing behavior should be:
1) user space -> kernel: flush
2) kernel: flush
3) kernel -> user space: flush event to ALL listeners

This is not realistic today in the presence of SELinux policies,
which may reject the flush, among other things. So we change the sequence to:
1) user space -> kernel: flush
2) kernel: flush
3) kernel -> user space: flush response to the originator from #1
4) if there were no errors, then:
   kernel -> user space: flush event to ALL listeners
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

8be987d7
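The revised sequence above can be modeled in a few lines of C. This is a minimal userspace sketch, not pfkey code: struct flush_ctx, do_flush() and handle_flush_request() are hypothetical, and a policy rejection is reduced to a boolean flag. It shows the two properties the commit describes: the originator always receives a response, and listeners hear about the flush only if it succeeded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the revised flush sequence (not pfkey code). */
struct flush_ctx {
	int entries;            /* entries currently in the SA/SP table */
	bool policy_denies;     /* e.g. a security policy rejects the flush */
	int replies_to_caller;  /* step 3: responses sent to the originator */
	int events_broadcast;   /* step 4: events sent to ALL listeners */
};

/* Step 2: perform the flush; returns 0 on success, -1 if rejected. */
static int do_flush(struct flush_ctx *c)
{
	if (c->policy_denies)
		return -1;
	c->entries = 0;
	return 0;
}

/* Steps 1-4: a flush request arrives from user space. */
static int handle_flush_request(struct flush_ctx *c)
{
	int err;

	assert(c);
	err = do_flush(c);
	c->replies_to_caller++;        /* step 3: always answer the originator */
	if (err == 0)
		c->events_broadcast++; /* step 4: notify listeners only on success */
	return err;
}
```

In this model, flushing an already empty table still counts as a success and is still broadcast.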

Feb 19, 2010: 11 commits

netfilter: nf_queue: fix NF_STOLEN skb leak · 64507fdb

Committed by Eric Dumazet on Feb 19, 2010

Commit 3bc38712 (handle NF_STOP and unknown verdicts in
nf_reinject) was only a partial fix for packet leaks.

If userspace issues an NF_STOLEN verdict, we must free the skb as well.
Reported-by: Afi Gjermund <afigjermund@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

64507fdb

netfilter: ctnetlink: fix creation of conntrack with helpers · a88e22ad

Committed by Pablo Neira Ayuso on Feb 19, 2010

This patch fixes a bug that triggers an assertion if you create
a conntrack entry with a helper and netfilter debugging is enabled.
We hit the assertion because the confirmation flag is
set before the conntrack extensions are added. To fix this, we
add the extensions before that flag is set.

This patch also removes the possibility of setting a helper for
existing conntracks. This operation would also trigger the
assertion, since we are not allowed to add new extensions to
existing conntracks. We know of no one who could sanely benefit
from this operation.

Thanks to Eric Dumazet for initially posting a preliminary patch
to address this issue.
Reported-by: David Ramblewski <David.Ramblewski@atosorigin.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

a88e22ad

xfrm: Introduce LINUX_MIB_XFRMFWDHDRERROR · 72032fdb

Committed by jamal on Feb 18, 2010

The XFRMINHDRERROR counter is ambiguous when validating the forwarding
path, which makes debugging tricky when you have both input and forward
validation.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

72032fdb

net: TCP thin dupack · 7e380175

Committed by Andreas Petlund on Feb 18, 2010

This patch enables fast retransmissions after one dupACK for
TCP if the stream is identified as thin. This reduces latency
for thin streams that cannot trigger fast retransmissions
because of their high packet interarrival times. The mechanism is
only active if it is enabled through a socket option or a sysctl,
and the stream is identified as thin.
Signed-off-by: Andreas Petlund <apetlund@simula.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e380175
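The retransmission decision this patch changes can be sketched as follows. This is a simplified userspace model, not the kernel's retransmit logic: struct flow and its fields are hypothetical, and it assumes a stream counts as thin when fewer than four segments are in flight and it is past initial slow start (our understanding of the kernel's thin-stream heuristic; treat the exact threshold as an assumption). The in-kernel check may apply additional conditions that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

#define THIN_PACKETS_OUT_LIMIT 4  /* assumption: thin = fewer than 4 in flight */
#define CLASSIC_DUPACK_THRESH  3  /* standard fast-retransmit trigger */

/* Hypothetical per-connection state; not struct tcp_sock. */
struct flow {
	unsigned int packets_out;   /* segments sent but not yet acknowledged */
	bool in_initial_slowstart;  /* still in the first slow-start phase */
	bool thin_dupack_enabled;   /* socket option or sysctl turned it on */
};

static bool stream_is_thin(const struct flow *f)
{
	return f->packets_out < THIN_PACKETS_OUT_LIMIT && !f->in_initial_slowstart;
}

/* Fast-retransmit after `dupacks` duplicate ACKs? */
static bool should_fast_retransmit(const struct flow *f, unsigned int dupacks)
{
	assert(f);
	if (f->thin_dupack_enabled && stream_is_thin(f))
		return dupacks >= 1;    /* thin stream: a single dupACK suffices */
	return dupacks >= CLASSIC_DUPACK_THRESH;
}
```

The reasoning behind the change: with at most three segments in flight and no new data being sent, a single loss can produce at most two duplicate ACKs, so the classic three-dupACK trigger never fires and the sender has to wait for a retransmission timeout instead.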

net: TCP thin linear timeouts · 36e31b0a

Committed by Andreas Petlund on Feb 18, 2010

This patch makes TCP use only linear timeouts if the
stream is thin. This helps to avoid the very high latencies
that thin streams suffer because of exponential backoff. The
mechanism is only active if it is enabled through a socket option
or a sysctl, and the stream is identified as thin. A maximum of 6
linear timeouts is tried before exponential backoff resumes.
Signed-off-by: Andreas Petlund <apetlund@simula.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

36e31b0a
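The resulting timer schedule can be illustrated with a short C model. It assumes the stream has already been classified as thin and the feature is enabled, and it uses hypothetical names (rto_schedule, THIN_LINEAR_RETRIES) and example values (a 200 ms base RTO, a 120 s cap); only the limit of six linear timeouts comes from the commit message above. In this model, doubling resumes from the base value once the linear phase ends.

```c
#include <assert.h>
#include <stdbool.h>

#define THIN_LINEAR_RETRIES 6  /* linear timeouts tried before backing off */

/* Fill out[0..n-1] with the RTO (ms) armed after each of n consecutive
 * timeouts on the same segment. */
static void rto_schedule(unsigned int base_ms, unsigned int max_ms,
			 bool thin_linear, unsigned int n, unsigned int out[])
{
	unsigned int rto = base_ms;

	assert(base_ms > 0 && base_ms <= max_ms);
	for (unsigned int i = 1; i <= n; i++) {  /* i = retransmissions so far */
		if (thin_linear && i <= THIN_LINEAR_RETRIES)
			rto = base_ms;           /* linear: keep the base RTO */
		else
			rto = (rto > max_ms / 2) ? max_ms : rto * 2; /* exponential, capped */
		out[i - 1] = rto;
	}
}
```

With a 200 ms base, the first six timeouts in the thin case all re-arm at 200 ms and the seventh and eighth back off to 400 ms and 800 ms; without the feature, the first three timeouts already give 400, 800 and 1600 ms.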

const: struct nla_policy · b54452b0

Committed by Alexey Dobriyan on Feb 18, 2010

Make the remaining netlink policies const.
Fix up coding style where needed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b54452b0

ipv6: drop unused "dev" arg of icmpv6_send() · 3ffe533c

Committed by Alexey Dobriyan on Feb 18, 2010

Dunno what the idea was; it hasn't been used for a long time.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ffe533c

ipv6: use standard lists for FIB walks · bbef49da

Committed by Alexey Dobriyan on Feb 18, 2010

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bbef49da

ipv6: remove stale MIB definitions · bc417d99

Committed by Alexey Dobriyan on Feb 18, 2010

ICMP6 MIB statistics have been per-netns for quite some time.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc417d99

AF_UNIX: update locking comment · 663717f6

Committed by Stephen Hemminger on Feb 18, 2010

The lock used in unix_state_lock() is a spinlock, not a reader-writer lock.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

663717f6

netfilter: nf_defrag_ipv4: fix compilation error with NF_CONNTRACK=n · 37ee3d5b

Committed by Patrick McHardy on Feb 18, 2010

As reported by Randy Dunlap <randy.dunlap@oracle.com>, compilation
of nf_defrag_ipv4 fails with:

include/net/netfilter/nf_conntrack.h:94: error: field 'ct_general' has incomplete type
include/net/netfilter/nf_conntrack.h:178: error: 'const struct sk_buff' has no member named 'nfct'
include/net/netfilter/nf_conntrack.h:185: error: implicit declaration of function 'nf_conntrack_put'
include/net/netfilter/nf_conntrack.h:294: error: 'const struct sk_buff' has no member named 'nfct'
net/ipv4/netfilter/nf_defrag_ipv4.c:45: error: 'struct sk_buff' has no member named 'nfct'
net/ipv4/netfilter/nf_defrag_ipv4.c:46: error: 'struct sk_buff' has no member named 'nfct'

net/nf_conntrack.h must not be included with NF_CONNTRACK=n, so add a
few #ifdefs. Long term, the header file should be fixed to be usable
even with NF_CONNTRACK=n.
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

37ee3d5b

Feb 18, 2010: 9 commits

ipvs: SCTP Transport Loadbalancing Support · 2906f66a

Committed by Venkata Mohan Reddy on Feb 18, 2010

Enhance IPVS to load balance SCTP transport protocol packets. This is done
based on SCTP RFC 4960. All possible control chunks have been taken
care of. The state machine used in this code looks somewhat lengthy; I tried
to make it easy to understand.
Signed-off-by: Venkata Mohan Reddy Koppula <mohanreddykv@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

2906f66a

IPv6: convert mc_lock to spinlock · 6457d26b

Committed by Stephen Hemminger on Feb 17, 2010

Only used for writing, so convert it to a spinlock.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6457d26b

net: export attach/detach filter routines · 5ff3f073

Committed by Michael S. Tsirkin on Feb 14, 2010

Export the sk_attach_filter/sk_detach_filter routines
so that the tun module can use them.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ff3f073

net: bug fix for vlan + gro issue · e76b69cc

Committed by Ajit Khaparde on Feb 16, 2010

Traffic (TCP) does not start on a VLAN interface when GRO is enabled.
Even the TCP handshake does not take place.
This is because the eth_type_trans call before netif_receive_skb
in napi_gro_finish() resets skb->dev to napi->dev, overwriting the previously
set VLAN netdev interface. This causes ip_route_input to drop the
incoming packet, treating it as coming from a martian source.

I could reproduce this on 2.6.32.7 (stable) and 2.6.33-rc7.
With this fix, traffic starts and the test runs fine on both VLAN
and non-VLAN interfaces.

CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

e76b69cc

xfrm: Revert false event eliding commits. · 069c474e

Committed by David S. Miller on Feb 17, 2010

As reported by Alexey Dobriyan:

--------------------
setkey now takes several seconds to run this simple script
and it spits "recv: Resource temporarily unavailable" messages.

#!/usr/sbin/setkey -f
flush;
spdflush;

add A B ipcomp 44 -m tunnel -C deflate;
add B A ipcomp 45 -m tunnel -C deflate;

spdadd A B any -P in ipsec
        ipcomp/tunnel/192.168.1.2-192.168.1.3/use;
spdadd B A any -P out ipsec
        ipcomp/tunnel/192.168.1.3-192.168.1.2/use;
--------------------

Obviously applications want the events even when the table
is empty.  So we cannot make this behavioral change.
Signed-off-by: David S. Miller <davem@davemloft.net>

069c474e

ethtool: Don't flush n-tuple list from ethtool_reset() · 7af3351f

Committed by Ben Hutchings on Feb 17, 2010

The n-tuple list should be flushed if and only if the ETH_RESET_FILTER
flag is set and the driver is able to reset the filtering/flow direction
hardware without also resetting a component whose flag is not set.
This test is best left to the driver.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7af3351f

net: use kasprintf() for socket cache names · faf23422

Committed by Alexey Dobriyan on Feb 17, 2010

kasprintf() makes the code smaller.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

faf23422
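kasprintf() allocates a buffer of exactly the right size and formats into it in a single call. The userspace sketch below shows the same measure-then-format idea in standard C; alloc_printf() is a hypothetical name, it uses malloc() where the kernel helper takes GFP flags, and the "request_sock_%s" format is only an illustrative cache name.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated, exactly sized buffer.
 * Returns NULL on allocation or formatting failure; the caller frees. */
static char *alloc_printf(const char *fmt, ...)
{
	va_list ap, ap2;
	char *buf;
	int len;

	assert(fmt);
	va_start(ap, fmt);
	va_copy(ap2, ap);                  /* the second pass needs its own va_list */
	len = vsnprintf(NULL, 0, fmt, ap); /* first pass: measure only */
	va_end(ap);
	if (len < 0) {
		va_end(ap2);
		return NULL;
	}
	buf = malloc((size_t)len + 1);     /* +1 for the terminating NUL */
	if (buf != NULL)
		vsnprintf(buf, (size_t)len + 1, fmt, ap2); /* second pass: format */
	va_end(ap2);
	return buf;
}
```

For example, alloc_printf("request_sock_%s", "TCP") returns a heap string "request_sock_TCP" that the caller must free().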

xt_hashlimit: fix locking · 8a5ce545

Committed by Eric Dumazet on Feb 17, 2010

Commit 2eff25c1
(netfilter: xt_hashlimit: fix race condition and simplify locking)
added a mutex deadlock:
htable_create() is called with hashlimit_mutex already locked.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a5ce545

ipmr: remove useless checks from ipmr_device_event · 9f0beba9

Committed by Pavel Emelyanov on Feb 17, 2010

The net being checked there is dev_net(dev), and thus this if
condition is always false.

Fits both the net and net-next trees.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f0beba9

Feb 17, 2010: 15 commits

net: remove INIT_RCU_HEAD() usage · dc4c2c31

Committed by Alexey Dobriyan on Feb 12, 2010

call_rcu() will unconditionally reinitialize the RCU head anyway.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc4c2c31

percpu: add __percpu sparse annotations to net · 7d720c3e

Committed by Tejun Heo on Feb 16, 2010

Add __percpu sparse annotations to net.

These annotations make sparse consider percpu variables to be
in a different address space and warn if they are accessed without going
through percpu accessors. This patch doesn't affect normal builds.

The macro and type tricks around snmp stats make things a bit
interesting. The DEFINE/DECLARE_SNMP_STAT() macros mark the target field
as __percpu, and the SNMP_UPD_PO_STATS() macro is updated accordingly. All
snmp_mib_*() users which used to cast the argument to (void **) are
updated to cast it to (void __percpu **).
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

7d720c3e

xfrm: avoid spinlock in get_acqseq() used by xfrm user · 6836b9bd

Committed by jamal on Feb 16, 2010

Eric's version fixed it for pfkey. This one is for xfrm user.
I thought about amortizing those two get_acqseq()s, but it seems
reasonable to have two sequence spaces for the two different
interfaces.

cheers,
jamal
commit d5168d5addbc999c94aacda8f28a4a173756a72b
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date:   Tue Feb 16 06:51:22 2010 -0500

    xfrm: avoid spinlock in get_acqseq() used by xfrm user

    This is in the same spirit as commit 28aecb9d
    by Eric Dumazet.
    Use atomic_inc_return() in get_acqseq() to avoid taking a spinlock.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6836b9bd
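The pattern named in this commit, replacing a spinlock-protected counter with a single atomic increment, can be sketched in C11. This is an analogue, not kernel code: it uses <stdatomic.h> in place of atomic_inc_return(), the names are hypothetical, and skipping 0 on wrap-around assumes that 0 is reserved to mean "no sequence number".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Shared sequence counter; static storage starts it at zero. */
static _Atomic uint32_t acqseq;

/* Lock-free allocation of the next nonzero sequence number. */
static uint32_t get_acqseq(void)
{
	uint32_t res;

	do {
		/* atomic_fetch_add() returns the old value, so +1 yields the
		 * new value, like the kernel's atomic_inc_return(). */
		res = atomic_fetch_add(&acqseq, 1) + 1;
	} while (res == 0);  /* wrapped around: skip the reserved value 0 */

	assert(res != 0);
	return res;
}
```

Because the read-modify-write is one atomic operation, concurrent callers each get a distinct value without holding any lock. Keeping separate sequence spaces for pfkey and xfrm user, as the commit prefers, would simply mean two such counters.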

net neigh: Decouple per interface neighbour table controls from binary sysctls · 54716e3b

Committed by Eric W. Biederman on Feb 14, 2010

Stop computing the number of neighbour table settings we have by
counting the number of binary sysctls. This behaviour was silly
and meant that we could not add another neighbour table setting
without also adding another binary sysctl.

Don't pass the binary sysctl path for neighbour table entries
into neigh_sysctl_register. These parameters are no longer
used and so are just dead code.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

54716e3b

net ipv4: Decouple ipv4 interface parameters from binary sysctl numbers · 02291680

Committed by Eric W. Biederman on Feb 14, 2010

Stop using the binary sysctl enumeration in sysctl.h as an index into
a per-interface array. This leads to unnecessary binary sysctl number
allocation, and to fragility in the data structures and implementation
because of unnecessary coupling.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02291680
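The decoupling in the two commits above amounts to indexing per-interface settings by a private enum whose final sentinel gives the count, instead of borrowing binary sysctl numbers. A minimal sketch with hypothetical names (devconf_id, struct devconf); these are not the kernel's identifiers.

```c
#include <assert.h>

/* Hypothetical per-interface settings. Adding a setting means adding one
 * enum entry; no binary sysctl number has to be allocated for it. */
enum devconf_id {
	DEVCONF_FORWARDING,
	DEVCONF_RP_FILTER,
	DEVCONF_ACCEPT_REDIRECTS,
	DEVCONF_COUNT            /* sentinel: number of settings above */
};

struct devconf {
	int data[DEVCONF_COUNT]; /* sized by the enum, not by a sysctl table */
};

static void devconf_set(struct devconf *c, enum devconf_id id, int val)
{
	assert(id < DEVCONF_COUNT);
	c->data[id] = val;
}

static int devconf_get(const struct devconf *c, enum devconf_id id)
{
	assert(id < DEVCONF_COUNT);
	return c->data[id];
}
```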

tunnels: fix netns vs proto registration ordering · d5aa407f

Committed by Alexey Dobriyan on Feb 16, 2010

Same problem as in the ip_gre patch: the receive hook can be called before netns
setup is done, oopsing in net_generic().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5aa407f

gre: fix netns vs proto registration ordering · c2892f02

Committed by Alexey Dobriyan on Feb 16, 2010

The GRE protocol receive hook can be called right after protocol addition is done.
If the netns state is not yet initialized, we're going to oops in
net_generic().

This oops is remotely triggerable if ip_gre is compiled as a module and a packet
arrives at an unfortunate moment during module loading.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2892f02
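The ordering fix in these two commits follows a common init pattern: set up the state a handler depends on before making the handler reachable, and unwind in reverse order on failure. The model below is a hypothetical userspace sketch; register_pernet(), add_protocol() and tunnel_rcv() merely stand in for the kernel's per-netns registration, protocol registration and GRE receive path.

```c
#include <assert.h>
#include <stddef.h>

struct pernet_state { int tunnels; };

static struct pernet_state pernet_storage;
static struct pernet_state *pernet;   /* NULL until per-netns setup runs */
static void (*rx_hook)(void);         /* protocol receive handler slot */

static void tunnel_rcv(void)
{
	/* The receive path dereferences per-netns state; had the hook been
	 * registered first, a packet arriving now would hit a NULL pointer. */
	assert(pernet != NULL);
	pernet->tunnels++;
}

static int register_pernet(void)    { pernet = &pernet_storage; return 0; }
static void unregister_pernet(void) { pernet = NULL; }
static int add_protocol(void (*handler)(void)) { rx_hook = handler; return 0; }

/* Fixed ordering: per-netns state first, receive hook second. */
static int tunnel_init(void)
{
	int err = register_pernet();

	if (err)
		return err;
	err = add_protocol(tunnel_rcv);  /* packets may arrive from here on */
	if (err)
		unregister_pernet();     /* unwind in reverse order */
	return err;
}
```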

xfrm: Fix xfrm_state_clone leak · 553f9118

Committed by Herbert Xu on Feb 15, 2010

xfrm_state_clone calls kfree instead of xfrm_state_put to free
a failed state. Depending on what the failed state holds, this
can leak things such as module references.

All states should be freed by xfrm_state_put past the point of
xfrm_init_state.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

553f9118
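The rule "free with the put, not with kfree" can be shown with a small refcounting model. This is a hypothetical sketch: struct state, state_init(), state_put() and the module_refs counter are illustrative stand-ins for an xfrm_state, xfrm_init_state(), xfrm_state_put() and the module references mentioned above. Once initialization has acquired resources, only the put path runs the destructor that releases them.

```c
#include <assert.h>
#include <stdlib.h>

static int module_refs;  /* stands in for module references held by states */

struct state {
	int refcnt;
	int holds_module_ref;
};

/* Destructor: releases whatever initialization acquired. */
static void state_destroy(struct state *x)
{
	if (x->holds_module_ref)
		module_refs--;
	free(x);
}

/* Drop a reference; the last one runs the destructor. */
static void state_put(struct state *x)
{
	assert(x->refcnt > 0);
	if (--x->refcnt == 0)
		state_destroy(x);
}

/* Acquires a resource, then may still fail (simulated by `fail`). */
static int state_init(struct state *x, int fail)
{
	module_refs++;
	x->holds_module_ref = 1;
	return fail ? -1 : 0;
}

/* Clone-like constructor that cleans up correctly on failure. */
static struct state *state_clone(int fail)
{
	struct state *x = calloc(1, sizeof(*x));

	if (x == NULL)
		return NULL;
	x->refcnt = 1;
	if (state_init(x, fail) < 0) {
		state_put(x);   /* a bare free(x) here would leak module_refs */
		return NULL;
	}
	return x;
}
```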

ipcomp: Avoid duplicate calls to ipcomp_destroy · 10e7454e

Committed by Herbert Xu on Feb 15, 2010

When ipcomp_tunnel_attach fails, we call ipcomp_destroy twice.
This may lead to double frees of certain structures.

As there is no reason to call ipcomp_destroy explicitly, this patch
removes it from ipcomp*.c and lets the standard xfrm_state destruction
take place.

This is based on the discovery and patch by Alexey Dobriyan.
Tested-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

10e7454e

ethtool: allow non-admin user to read GRO settings. · 1cab819b

Committed by stephen hemminger on Feb 11, 2010

Looks like an oversight in the GRO design.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1cab819b

netfilter: ebtables: mark: add CONFIG_COMPAT support · 6e705f56

Committed by Florian Westphal on Jan 27, 2010

Add the required handlers to convert the 32-bit
ebtables mark match and mark target structs to the 64-bit layout.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>

6e705f56

netfilter: ebt_limit: add CONFIG_COMPAT support · 314ddca3

Committed by Florian Westphal on Jan 27, 2010

The ebt_limit structure is larger on 64-bit systems due
to the "long" type used in the (kernel-only) data section.

Setting .compatsize is enough in this case; these values
have no meaning in userspace.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>

314ddca3
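The size difference can be reproduced in userspace C. The two layouts below are modeled on ebt_limit_info as we recall it (two user-visible 32-bit fields followed by kernel-only state containing one unsigned long); the exact field list is an assumption. On an LP64 system the native layout is larger, while a 32-bit userland lays out six 32-bit words.

```c
#include <assert.h>
#include <stdint.h>

/* Native layout: `unsigned long` is 8 bytes on LP64 kernels. */
struct limit_info_native {
	uint32_t avg;             /* set by userspace */
	uint32_t burst;           /* set by userspace */
	unsigned long prev;       /* kernel-only state from here on */
	uint32_t credit;
	uint32_t credit_cap, cost;
};

/* Layout produced by a 32-bit userland, where `long` is 4 bytes. */
struct limit_info_compat32 {
	uint32_t avg;
	uint32_t burst;
	uint32_t prev;
	uint32_t credit;
	uint32_t credit_cap, cost;
};

static_assert(sizeof(struct limit_info_compat32) == 24,
	      "compat layout is six 32-bit words");
```

Since the kernel-only fields carry no meaning for userspace, nothing inside them needs translating; the kernel only has to know how large the 32-bit layout is, which is what setting .compatsize tells it.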

netfilter: ebtables: try native set/getsockopt handlers, too · 90b89af7

Committed by Florian Westphal on Feb 7, 2010

ebtables can be compiled to perform userspace-side padding of
structures. In that case, all the structures are already in the
'native' format expected by the kernel.

This patch tries to determine which format the userspace program is
using.

For most set/getsockopts, this can be done by checking
the len argument for sizeof(compat_ebt_replace) and
retrying the native handler on error.

In the case of EBT_SO_GET_ENTRIES, the native handler is tried first;
it will error out early when checking the *len argument
(the compat version has to defer this check until after
iterating over the kernel data set once, to adjust for all
the structure size differences).

As this would cause error printks, remove those as well, as
recommended by Bart de Schuymer.
Signed-off-by: Florian Westphal <fw@strlen.de>

90b89af7

netfilter: ebtables: add CONFIG_COMPAT support · 81e675c2

Committed by Florian Westphal on Jan 5, 2010

Main code for supporting a 32-bit userland ebtables binary with 64-bit
kernels.

Tested on an x86_64 kernel only, using a 64-bit ebtables binary
for output comparison.

At least ebt_mark, m_mark and ebt_limit need CONFIG_COMPAT hooks, too.

Remaining problem:

The ebtables userland makefile has:
ifeq ($(shell uname -m),sparc64)
	CFLAGS+=-DEBT_MIN_ALIGN=8 -DKERNEL_64_USERSPACE_32
endif

struct ebt_replace, ebt_entry_match etc. then contain userland-side
padding, i.e. even if we are called from a 32-bit userland, the
structures may already be in the right format.

This problem is addressed in a follow-up patch.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>

81e675c2

netfilter: ebtables: split update_counters into two functions · 49facff9

Committed by Florian Westphal on Feb 7, 2010

This allows do_update_counters() to be called from the upcoming CONFIG_COMPAT
code instead of copy&pasting the same code.
Signed-off-by: Florian Westphal <fw@strlen.de>

49facff9
