Commits · 745898eaf0eb7a04a56dec1188d9148259510863 · openeuler / raspberrypi-kernel

27 May, 2009 1 commit

tcp: Optimise GRO port comparisons · 745898ea

Committed by Herbert Xu on May 26, 2009

Instead of doing two 16-bit operations for the source/destination
ports, we can do one 32-bit operation to take care of both.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
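
The idea can be illustrated outside the kernel. The sketch below is not the kernel code: `struct ports` is a hypothetical stand-in for the first four bytes of a TCP header, and both helper names are invented for illustration. It only shows that one 32-bit comparison gives the same answer as two 16-bit ones when the two ports are adjacent in memory.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the first four bytes of a TCP header. */
struct ports {
    uint16_t source;
    uint16_t dest;
};

_Static_assert(sizeof(struct ports) == 4, "ports must be adjacent");

/* Two 16-bit operations: one XOR per port, nonzero if either differs. */
static int ports_differ_16(const struct ports *a, const struct ports *b)
{
    return (a->source ^ b->source) | (a->dest ^ b->dest);
}

/* One 32-bit operation: load both ports as a single word and compare. */
static int ports_differ_32(const struct ports *a, const struct ports *b)
{
    uint32_t x, y;

    memcpy(&x, a, sizeof(x));  /* memcpy avoids aliasing problems */
    memcpy(&y, b, sizeof(y));
    return x != y;
}
```

Both helpers return zero for identical port pairs and nonzero otherwise, so either can feed a "does this packet belong to the same flow" check.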

22 May, 2009 1 commit

ipv4: Fix oops with FIB_TRIE · 3ed18d76

Committed by Robert Olsson on May 21, 2009

It seems we can fix this by disabling preemption while we re-balance the
trie. This is with CONFIG_CLASSIC_RCU. It's been stress-tested at high
loads, continuously taking a full BGP table up/down via iproute -batch.

Note: fib_trie is not updated for CONFIG_PREEMPT_RCU.

Reported-by: Andrei Popa
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 May, 2009 3 commits

net: Remove unused parameter from fill method in fib_rules_ops. · 04af8cf6

Committed by Rami Rosen on May 20, 2009

The netlink message header (struct nlmsghdr) is an unused parameter in
the fill method of the fib_rules_ops struct.  This patch removes this
parameter from this method and fixes the places where this method is
called.

(include/net/fib_rules.h)
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix rtable leak in net/ipv4/route.c · 1ddbcb00

Committed by Eric Dumazet on May 19, 2009

Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
Quoted here because it's a perfect one:

begin_of_quotation
 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
 patch has at least one critical flaw, and another problem.

 rt_intern_hash calculates rthi pointer, which is later used for new entry
 insertion. The same loop calculates cand pointer which is used to clean the
 list. If the pointers are the same, rtable leak occurs, as first the cand is
 removed then the new entry is appended to it.

 This leak leads to unregister_netdevice problem (usage count > 0).

 Another problem of the patch is that it tries to insert the entries in certain
 order, to facilitate counting of entries distinct by all but QoS parameters.
 Unfortunately, referencing an existing rtable entry moves it to list beginning,
 to speed up further lookups, so the carefully built order is destroyed.

 For the first problem the simplest patch is to set rthi=0 when rthi==cand, but
 it will also destroy the ordering.
end_of_quotation

Problematic commit is 1080d709
(net: implement emergency route cache rebulds when gc_elasticity is exceeded)

Trying to keep dst_entries ordered is too complex and breaks the fact that
order should depend on the frequency of use for garbage collection.

A possible fix is to make rt_intern_hash() simpler and only make
rt_check_expire() a little bit smarter, so that it can cope with an arbitrary
entry order. The added loop runs on cache-hot data while the CPU
is prefetching the next object, so it should go unnoticed.
Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix length computation in rt_check_expire() · cf8da764

Committed by Eric Dumazet on May 19, 2009

rt_check_expire() computes the average and standard deviation of chain lengths,
but does not correctly reset length to 0 at the beginning of each chain.
This probably gives overflows for sum2 (and sum) on loaded machines instead
of meaningful results.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
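
The effect of the missing reset can be seen in a small user-space sketch. This is not the kernel function: `struct node`, `struct chain_stats` and `chain_length_stats()` are hypothetical names, and the sketch stops at the variance (the standard deviation is its square root). The point is only that `length` has to start from 0 for every chain, otherwise `sum` and `sum2` accumulate running totals instead of per-chain lengths.

```c
#include <stddef.h>

/* Hypothetical singly linked hash chain entry. */
struct node {
    struct node *next;
};

struct chain_stats {
    double avg;  /* mean chain length */
    double var;  /* variance; the standard deviation is its square root */
};

static struct chain_stats chain_length_stats(struct node *const *buckets,
                                             size_t nbuckets)
{
    struct chain_stats s = { 0.0, 0.0 };
    unsigned long sum = 0, sum2 = 0;

    if (nbuckets == 0)
        return s;

    for (size_t i = 0; i < nbuckets; i++) {
        unsigned long length = 0;  /* reset at the start of every chain */

        for (const struct node *n = buckets[i]; n != NULL; n = n->next)
            length++;

        sum += length;
        sum2 += length * length;
    }

    s.avg = (double)sum / (double)nbuckets;
    s.var = (double)sum2 / (double)nbuckets - s.avg * s.avg;
    return s;
}
```

If `length` were declared outside the outer loop and never cleared, a table with chains of 3 and 1 entries would be counted as lengths 3 and 4, which is the kind of inflated result the commit message describes.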

20 May, 2009 1 commit

ipv4: teach ipconfig about the MTU option in DHCP · 9643f455

Committed by Chris Friesen on May 19, 2009

The DHCP spec allows the server to specify the MTU. This can be useful
for netbooting with UDP-based NFS-root on a network using jumbo frames.
This patch allows the kernel IP autoconfiguration to handle this option
correctly.

It would be possible to use initramfs and add a script to set the MTU,
but that seems like a complicated solution if no initramfs is otherwise
necessary, and would bloat the kernel image more than this code would.

This patch was originally submitted to LKML in 2003 by Hans-Peter Jansen.
Signed-off-by: Chris Friesen <cfriesen@nortel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 May, 2009 5 commits

net: Fix devinet_sysctl_forward · 9b8adb5e

Committed by Eric W. Biederman on May 13, 2009

sysctls are unregistered with the rtnl_lock held, making
it unsafe to unconditionally grab the rtnl_lock.  Instead
we need to call rtnl_trylock and restart the system call
if we cannot grab it.  Otherwise we could deadlock at unregistration
time.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
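
The trylock-and-restart pattern can be sketched in user space. Everything below is an assumption made for illustration: `cfg_trylock()`, `cfg_unlock()`, `set_forwarding()` and the `HANDLER_*` codes are hypothetical names, and a C11 `atomic_flag` stands in for the rtnl lock. The handler never blocks on the lock; if the lock is busy it asks the caller to retry.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical return codes for the handler. */
enum { HANDLER_OK = 0, HANDLER_RESTART = 1 };

/* A simple flag standing in for the global configuration lock. */
static atomic_flag cfg_lock = ATOMIC_FLAG_INIT;

static bool cfg_trylock(void)
{
    /* test_and_set returns the previous state: false means we got it. */
    return !atomic_flag_test_and_set(&cfg_lock);
}

static void cfg_unlock(void)
{
    atomic_flag_clear(&cfg_lock);
}

static int forwarding;

/* Never waits for the lock; a busy lock turns into a retry request. */
static int set_forwarding(int value)
{
    if (!cfg_trylock())
        return HANDLER_RESTART;

    forwarding = value;
    cfg_unlock();
    return HANDLER_OK;
}
```

Because the handler returns instead of sleeping, a thread that already holds the lock while tearing the handler down cannot end up waiting on a handler that is itself waiting for the lock.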

ipv4: make default for INET_LRO consistent with help text · bc8a5397

Committed by Frans Pop on May 18, 2009

Commit e81963b1 ("ipv4: Make INET_LRO a bool instead of tristate.")
changed this config from tristate to bool.  Add default so that it is
consistent with the help text.
Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: cleanup: remove unnecessary include. · d23a9b5b

Committed by Rami Rosen on May 18, 2009

There is no need for the net/icmp.h header in net/ipv4/fib_frontend.c.
This patch removes the #include <net/icmp.h> from it.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: cleanup - remove two unused parameters from fib_semantic_match(). · e204a345

Committed by Rami Rosen on May 18, 2009

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix MSG_PEEK race check · 77527313

Committed by Ilpo Järvinen on May 10, 2009

Commit 518a09ef (tcp: Fix recvmsg MSG_PEEK influence of
blocking behavior) lets the loop run longer than the race check
previously expected, so we need to be more careful with this
check and consider the work we have been doing.

I tried my best to deal with urg hole madness too which happens
here:
	if (!sock_flag(sk, SOCK_URGINLINE)) {
		++*seq;
		...
by using additional offset by one but I certainly have very
little interest in testing that part.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Frans Pop <elendil@planet.nl>
Tested-by: Ian Zimmermann <itz@buug.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 May, 2009 2 commits

ipconfig: handle case of delayed DHCP server · 2513dfb8

Committed by Chris Friesen on May 17, 2009

If a DHCP server is delayed, it's possible for the client to receive the 
DHCPOFFER after it has already sent out a new DHCPDISCOVER message from 
a second interface.  The client then sends out a DHCPREQUEST from the 
second interface, but the server doesn't recognize the device and 
rejects the request.

This patch simply tracks the current device being configured and throws 
away the OFFER if it is not intended for the current device.  A more 
sophisticated approach would be to put the OFFER information into the 
struct ic_device rather than storing it globally.
Signed-off-by: Chris Friesen <cfriesen@nortel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: remove an unused parameter from configure method of fib_rules_ops. · 8b3521ee

Committed by Rami Rosen on May 11, 2009

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 May, 2009 1 commit

ipv4: Make INET_LRO a bool instead of tristate. · e81963b1

Committed by David S. Miller on May 08, 2009

This code is used as a library by several device drivers,
which select INET_LRO.

If some are modules and some are statically built into the
kernel, we get build failures if INET_LRO is modular.
Signed-off-by: David S. Miller <davem@davemloft.net>

07 May, 2009 1 commit

net: Make inet_twsk_put similar to sock_put · 4dbc8ef7

Committed by Arnaldo Carvalho de Melo on May 06, 2009

This separates the freeing code from the refcount decrement, which
probably reduces icache pressure when there are still reference counts
to go.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
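
A minimal sketch of that split, assuming a hypothetical `struct obj` with an atomic reference count (none of these names come from the kernel): the put path is a small inline decrement, and the rarely taken teardown lives in its own function.

```c
#include <stdatomic.h>

/* Hypothetical refcounted object; "freed" only records that teardown ran. */
struct obj {
    atomic_int refcnt;
    int freed;
};

/* Slow path, kept out of line: runs only when the last reference goes. */
static void obj_free(struct obj *o)
{
    o->freed = 1;
}

/* Fast path: just the decrement and one branch. */
static inline void obj_put(struct obj *o)
{
    /* atomic_fetch_sub returns the value before the subtraction. */
    if (atomic_fetch_sub(&o->refcnt, 1) == 1)
        obj_free(o);
}
```

Callers that drop a reference while others remain only execute the decrement and the branch; the teardown code is touched once, on the final put.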

06 May, 2009 1 commit

tcp: fix the code indent · ae8d7f88

Committed by Shan Wei on May 05, 2009

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 May, 2009 2 commits

tcp: Fix tcp_prequeue() to get correct rto_min value · 0c266898

Committed by Satoru SATOH on May 04, 2009

tcp_prequeue() refers to the constant value (TCP_RTO_MIN) even though
the actual value might have been tuned. The following patches fix this and
make tcp_prequeue() get the actual value returned by tcp_rto_min().
Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: extend ECN sysctl to allow server-side only ECN · 255cac91

Committed by Ilpo Järvinen on May 04, 2009

This should be very safe compared with full enabled, so I see
no reason why it shouldn't be done right away. As ECN can only
be negotiated if the SYN sending party is also supporting it,
somebody in the loop probably knows what he/she is doing. If
SYN does not ask for ECN, the server side SYN-ACK is identical
to what it is without ECN. Thus it's quite safe.

The chosen value is safe w.r.t. existing configs which
currently set either 0 or 1 manually, but it
silently upgrades those who have not explicitly requested
ECN off.

Whether to just enable both sides comes up from time to time but
unless that gets done now we can at least make the servers
aware of ECN already. As there are some known problems to occur
if ECN is enabled, it's currently questionable whether there's
any real gain from enabling clients as servers mostly won't
support it anyway (so we'd hit just the negative sides). After
enabling the servers and getting that deployed, the client end
enable really has some potential gain too.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 April, 2009 1 commit

netfilter: revised locking for x_tables · 942e4a2b

Committed by Stephen Hemminger on April 28, 2009

The x_tables are organized with a table structure and per-cpu copies
of the counters and rules. On older kernels there was a reader/writer
lock per table which was a performance bottleneck. In 2.6.30-rc, this
was converted to use RCU for the counters/rules, which solved the performance
problems for do_table but made replacing rules much slower because of
the necessary RCU grace period.

This version uses a per-cpu set of spinlocks and counters to allow
table processing to proceed without the cache thrashing of a global
reader lock and keeps the same performance for table updates.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 April, 2009 1 commit

inet_diag: Remove dup assignments · ac5978e7

Committed by Arnaldo Carvalho de Melo on April 28, 2009

These are later assigned to other values without being used meanwhile.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 April, 2009 3 commits

gro: Fix COMPLETE checksum handling · 36e7b1b8

Committed by Herbert Xu on April 27, 2009

On a brand new GRO skb, we cannot call ip_hdr since the header
may lie in the non-linear area.  This patch adds the helper
skb_gro_network_header to handle this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Limit size of route cache hash table · c9503e0f

Committed by Anton Blanchard on April 27, 2009

Right now we have no upper limit on the size of the route cache hash table.
On a 128GB POWER6 box it ends up as 32MB:

IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)

It would be nice to cap this for memory consumption reasons, but a massive
hashtable also causes a significant spike when measuring OS jitter.

With a 32MB hashtable and 4 million entries, rt_worker_func is taking
5 ms to complete. On another system with more memory it's taking 14 ms.
Even though rt_worker_func does call cond_resched() to limit its impact,
in an HPC environment we want to keep all sources of OS jitter to a minimum.

With the patch applied we limit the number of entries to 512k, which
can still be overridden by using the rt_entries boot option:

IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)

With this patch rt_worker_func now takes 0.460 ms on the same system.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
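
The sizes in the two boot messages can be checked with a little arithmetic. The 8 bytes per bucket used below is inferred from the quoted log (33554432 / 4194304) and is not stated in the commit; `capped_entries()` and `table_bytes()` are hypothetical helpers, not kernel functions.

```c
#include <stddef.h>

/* Clamp the requested number of hash buckets to an upper limit. */
static size_t capped_entries(size_t wanted, size_t cap)
{
    return wanted > cap ? cap : wanted;
}

/* Memory used by the table for a given bucket size in bytes. */
static size_t table_bytes(size_t entries, size_t bucket_size)
{
    return entries * bucket_size;
}
```

With 8-byte buckets, 4194304 entries come to 33554432 bytes (32MB), and capping at 524288 entries (512k) brings that down to 4194304 bytes (4MB), matching both log lines.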

snmp: add missing counters for RFC 4293 · edf391ff

Committed by Neil Horman on April 27, 2009

The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and
OutMcastOctets:
http://tools.ietf.org/html/rfc4293
But it seems we don't track those in any way that is easy to separate from other
protocols.  This patch adds those missing counters to the stats file. Tested
successfully by me.

With help from Eric Dumazet.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 April, 2009 2 commits

syncookies: remove last_synq_overflow from struct tcp_sock · a0f82f64

Committed by Florian Westphal on April 19, 2009

last_synq_overflow eats 4 or 8 bytes in struct tcp_sock, even
though it is only used when a listening socket's syn queue
is full.

We can (ab)use rx_opt.ts_recent_stamp to store the same information;
it is not used otherwise as long as a socket is in listen state.

Move linger2 around to avoid splitting struct mtu_probe
across cacheline boundary on 32 bit arches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix mid-wq adjustment helper · 52cf3cc8

Committed by Ilpo Järvinen on April 18, 2009

Just noticed while doing some new work that the recent
mid-wq adjustment logic will misbehave when FACK is not
in use (happens either due to being sysctl'ed off or auto-detected
reordering) because I forgot the relevant TCPCB tagbit.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 April, 2009 3 commits

[PATCH] net: remove superfluous call to synchronize_net() · 573636cb

Committed by Eric Dumazet on April 17, 2009

inet_register_protosw() function is responsible for adding a new
inet protocol into a global table (inetsw[]) that is used with RCU rules.

As soon as the store of the pointer is done, other cpus might see
this new protocol in inetsw[], so we have to make sure new protocol
is ready for use. All pending memory updates should thus be committed
to memory before setting the pointer.
This is correctly done using rcu_assign_pointer().

synchronize_net() is typically used at unregister time, after
unsetting the pointer, to make sure no other cpu is still using
the object we want to dismantle. Using it at register time
only adds an artificial delay that could hide a real bug,
and this bug could pop up if/when synchronize_rcu() can proceed
faster than now.

This saves about 13 ms on boot time on a HZ=1000 8 cpus machine ;)
(4 calls to inet_register_protosw(), and about 3200 us per call)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
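
The publish step described above has a rough user-space analogue in C11 atomics. This is only a sketch under assumptions: `struct proto_entry`, `slot`, `publish()` and `lookup()` are hypothetical, and a release store paired with an acquire load stands in for the kernel's rcu_assign_pointer() and its reader-side counterpart. No waiting is needed when adding an entry; readers either see NULL or a fully initialised object.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical table entry. */
struct proto_entry {
    int type;
    int protocol;
};

/* Single published slot standing in for one element of a global table. */
static _Atomic(struct proto_entry *) slot = NULL;

static void publish(struct proto_entry *e, int type, int protocol)
{
    /* Initialise the object completely first ... */
    e->type = type;
    e->protocol = protocol;
    /* ... then make the pointer visible; release orders the writes above. */
    atomic_store_explicit(&slot, e, memory_order_release);
}

static struct proto_entry *lookup(void)
{
    /* Acquire pairs with the release store in publish(). */
    return atomic_load_explicit(&slot, memory_order_acquire);
}
```

Waiting for readers only matters in the opposite direction, when an entry is removed and its memory is about to be reused.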

gro: Fix use after free in tcp_gro_receive · a0a69a01

Committed by Herbert Xu on April 17, 2009

After calling skb_gro_receive skb->len can no longer be relied
on since if the skb was merged using frags, then its pages will
have been removed and the length reduced.

This caused tcp_gro_receive to prematurely end merging which
resulted in suboptimal performance with ixgbe.

The fix is to store skb->len on the stack.
Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
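
The shape of the fix can be sketched with toy types. `struct pkt`, `struct flow`, `merge()` and `receive()` are hypothetical, and the flush condition is a simplified assumption rather than the kernel's exact test. What matters is that the length is copied to a local variable before the merge, because the merge may shrink the packet's own length field.

```c
/* Hypothetical packet and flow, reduced to the fields the sketch needs. */
struct pkt {
    unsigned int len;
};

struct flow {
    unsigned int total_len;
};

/* Stand-in for the merge step: the packet's data moves into the flow,
 * so the packet's own length can no longer be trusted afterwards. */
static void merge(struct flow *f, struct pkt *p)
{
    f->total_len += p->len;
    p->len = 0;
}

/* Returns nonzero when merging should stop (a short packet ends the run). */
static int receive(struct flow *f, struct pkt *p, unsigned int mss)
{
    unsigned int len = p->len;  /* saved on the stack before merging */

    merge(f, p);
    return len < mss;
}
```

Reading `p->len` after `merge()` would always report a short packet here and end the run after every segment, which mirrors the premature end of merging described in the commit message.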

netfilter: nf_nat: add support for persistent mappings · 98d500d6

Committed by Patrick McHardy on April 16, 2009

The removal of the SAME target accidentally removed one feature that is
not available from the normal NAT targets so far, having multi-range
mappings that use the same mapping for each connection from a single
client. The current behaviour is to choose the address from the range
based on source and destination IP, which breaks when communicating
with sites having multiple addresses that require all connections to
originate from the same IP address.

Introduce an IP_NAT_RANGE_PERSISTENT option that controls whether the
destination address is taken into account for selecting addresses.

http://bugzilla.kernel.org/show_bug.cgi?id=12954
Signed-off-by: Patrick McHardy <kaber@trash.net>
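
A toy version of the selection shows what the option changes. The hash `mix()` and the helper `pick_addr()` are hypothetical and are not the kernel's implementation; addresses are plain integers, and the range is assumed to be smaller than the full 32-bit space. In persistent mode the destination is left out of the hash, so one client always maps to the same address.

```c
#include <stdint.h>

/* Toy mixing function, for illustration only. */
static uint32_t mix(uint32_t a, uint32_t b)
{
    uint32_t h = a * 2654435761u;

    h ^= b + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h;
}

/* Pick an address from [min_ip, max_ip]; assumes max_ip >= min_ip and
 * that the range does not span the whole 32-bit space. */
static uint32_t pick_addr(uint32_t min_ip, uint32_t max_ip,
                          uint32_t src, uint32_t dst, int persistent)
{
    /* Persistent mode ignores the destination address. */
    uint32_t h = mix(src, persistent ? 0 : dst);

    return min_ip + h % (max_ip - min_ip + 1);
}
```

With `persistent` set, two connections from the same source to different destinations get the same result; without it, the destination takes part in the hash and the chosen address may differ.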

14 April, 2009 1 commit

tcp: fix >2 iw selection · 86bcebaf

Committed by Ilpo Järvinen on April 14, 2009

A long-standing feature of tcp_init_metrics() is that
any of its goto reset paths prevents the call to tcp_init_cwnd().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 April, 2009 1 commit

ipv6: Fix NULL pointer dereference with time-wait sockets · 499923c7

Committed by Vlad Yasevich on April 09, 2009

Commit b2f5e7cd
(ipv6: Fix conflict resolutions during ipv6 binding)
introduced a regression where time-wait sockets were
not treated correctly.  This resulted in the following:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70
...
Call Trace:
[<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
[<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
[<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400
[<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6]
[<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0
[<ffffffff8056ed49>] sys_bind+0x89/0x100
[<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b
Tested-by: Brian Haley <brian.haley@hp.com>
Tested-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 April, 2009 2 commits

tcp: miscounts due to tcp_fragment pcount reset · 9eb9362e

Committed by Ilpo Järvinen on April 01, 2009

It seems that trivial reset of pcount to one was not sufficient
in tcp_retransmit_skb. Multiple counters experience a positive
miscount when skb's pcount gets lowered without the necessary
adjustments (depending on skb's sacked bits which exactly), at
worst a packets_out miscount can crash at RTO if the write queue
is empty!

Triggering this requires an mss change, so bidir tcp or mtu probe or
the like.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Uwe Bugla <uwe.bugla@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add helper for counter tweaking due mid-wq change · 797108d1

Committed by Ilpo Järvinen on April 01, 2009

We need full-scale adjustment to fix a TCP miscount in the next
patch, so just move it into a helper and call that from the
other places.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 April, 2009 1 commit

netfilter: use rcu_read_bh() in ipt_do_table() · fa9a86dd

Committed by Eric Dumazet on April 02, 2009

Commit 78454473
(netfilter: iptables: lock free counters) forgot to disable BH
in arpt_do_table(), ipt_do_table() and ip6t_do_table().

Using rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.
Reported-and-bisected-by: Roman Mindalev <r000n@r000n.net>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 April, 2009 1 commit

ipv4: remove unused parameter from tcp_recv_urg(). · 377f0a08

Committed by Rami Rosen on March 31, 2009

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 March, 2009 2 commits

netlabel: Label incoming TCP connections correctly in SELinux · 389fb800

Committed by Paul Moore on March 27, 2009

The current NetLabel/SELinux behavior for incoming TCP connections works but
only through a series of happy coincidences that rely on the limited nature of
standard CIPSO (only able to convey MLS attributes) and the write equality
imposed by the SELinux MLS constraints. The problem is that network sockets
created as the result of an incoming TCP connection were not on-the-wire
labeled based on the security attributes of the parent socket but rather based
on the wire label of the remote peer. The issue had to do with how IP options
were managed as part of the network stack and where the LSM hooks were in
relation to the code which set the IP options on these newly created child
sockets. While NetLabel/SELinux did correctly set the socket's on-the-wire
label it was promptly cleared by the network stack and reset based on the IP
options of the remote peer.

This patch, in conjunction with a prior patch that adjusted the LSM hook
locations, works to set the correct on-the-wire label format for new incoming
connections through the security_inet_conn_request() hook. Besides the
correct behavior there are many advantages to this change, the most significant
is that all of the NetLabel socket labeling code in SELinux now lives in hooks
which can return error codes to the core stack which allows us to finally get
rid of the selinux_netlbl_inode_permission() logic which greatly simplifies
the NetLabel/SELinux glue code. In the process of developing this patch I
also ran into a small handful of AF_INET6 cleanliness issues that have been
fixed which should make the code safer and easier to extend in the future.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>

lsm: Relocate the IPv4 security_inet_conn_request() hooks · 284904aa

Committed by Paul Moore on March 27, 2009

The current placement of the security_inet_conn_request() hooks does not allow
individual LSMs to override the IP options of the connection's request_sock.
This is a problem as both SELinux and Smack have the ability to use labeled
networking protocols which make use of IP options to carry security attributes
and the inability to set the IP options at the start of the TCP handshake is
problematic.

This patch moves the IPv4 security_inet_conn_request() hooks past the code
where the request_sock's IP options are set/reset so that the LSM can safely
manipulate the IP options as needed. This patch intentionally does not change
the related IPv6 hooks as IPv6 based labeling protocols which use IPv6 options
are not currently implemented, once they are we will have a better idea of
the correct placement for the IPv6 hooks.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Morris <jmorris@namei.org>

26 March, 2009 4 commits

netfilter: nf_conntrack: calculate per-protocol nlattr size · a400c30e

Committed by Holger Eitzenberger on March 25, 2009

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu() · ea781f19

Committed by Eric Dumazet on March 25, 2009

Use "hlist_nulls" infrastructure we added in 2.6.29 for RCUification of UDP & TCP.

This permits an easy conversion from call_rcu() based hash lists to a
SLAB_DESTROY_BY_RCU one.

Avoiding call_rcu() delay at nf_conn freeing time has numerous gains.

- It doesn't fill RCU queues (up to 10000 elements per cpu).
This reduces OOM possibility, if queued elements are not taken into account.
This reduces latency problems when RCU queue size hits hilimit and triggers
emergency mode.

- It allows fast reuse of just freed elements, permitting better use of
CPU cache.

- We delete rcu_head from "struct nf_conn", shrinking size of this structure
by 8 or 16 bytes.

This patch only takes care of "struct nf_conn".
call_rcu() is still used for less critical conntrack parts, that may
be converted later if necessary.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: {ip,ip6,arp}_tables: fix incorrect loop detection · 1f9352ae

Committed by Patrick McHardy on March 25, 2009

Commit e1b4b9f3 ([NETFILTER]: {ip,ip6,arp}_tables: fix exponential worst-case
search for loops) introduced a regression in the loop detection algorithm,
causing sporadic incorrectly detected loops.

When a chain has already been visited during the check, it is treated as
having a standard target containing a RETURN verdict directly at the
beginning in order to not check it again. The real target of the first
rule is then incorrectly treated as STANDARD target and checked not to
contain invalid verdicts.

Fix by making sure the rule does actually contain a standard target.

Based on patch by Francis Dupont <Francis_Dupont@isc.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: factorize ifname_compare() · b8dfe498

Committed by Eric Dumazet on March 25, 2009

We use the same non-trivial helper function in four places, so we can factorize it.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>