1. 26 Mar 2009, 8 commits
  2. 24 Mar 2009, 1 commit
  3. 23 Mar 2009, 2 commits
  4. 19 Mar 2009, 3 commits
  5. 17 Mar 2009, 1 commit
    • netfilter: xtables: add cluster match · 0269ea49
      Pablo Neira Ayuso authored
      This patch adds the iptables cluster match. This match can be used
      to deploy gateway and back-end load-sharing clusters. A cluster
      can be composed of 32 nodes at most (although I have only tested
      this with two nodes, so I cannot tell what the real scalability
      limit of this solution is in terms of cluster nodes).
      
      Assuming that all the nodes see all packets (see below for an
      example on how to do that if your switch does not allow this), the
      cluster match decides if this node has to handle a packet given:
      
      	(jhash(source IP) % total_nodes) & node_mask
      
      For related connections, the master conntrack is used. The following
      is an example of its use to deploy a gateway cluster composed of two
      nodes (where this is the node 1):
      
      iptables -I PREROUTING -t mangle -i eth1 -m cluster \
      	--cluster-total-nodes 2 --cluster-local-node 1 \
      	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
      iptables -A PREROUTING -t mangle -i eth1 \
      	-m mark ! --mark 0xffff -j DROP
      iptables -A PREROUTING -t mangle -i eth2 -m cluster \
      	--cluster-total-nodes 2 --cluster-local-node 1 \
      	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
      iptables -A PREROUTING -t mangle -i eth2 \
      	-m mark ! --mark 0xffff -j DROP
      
      And the following commands to make all nodes see the same packets:
      
      ip maddr add 01:00:5e:00:01:01 dev eth1
      ip maddr add 01:00:5e:00:01:02 dev eth2
      arptables -I OUTPUT -o eth1 --h-length 6 \
      	-j mangle --mangle-mac-s 01:00:5e:00:01:01
      arptables -I INPUT -i eth1 --h-length 6 \
      	--destination-mac 01:00:5e:00:01:01 \
      	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
      arptables -I OUTPUT -o eth2 --h-length 6 \
      	-j mangle --mangle-mac-s 01:00:5e:00:01:02
      arptables -I INPUT -i eth2 --h-length 6 \
      	--destination-mac 01:00:5e:00:01:02 \
      	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
      
      In the case of TCP connections, the conntrack pickup facility has
      to be disabled to avoid marking TCP ACK packets coming in the
      reply direction as valid:
      
      echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
      
      BTW, some final notes:
      
       * This match mangles the skbuff pkt_type in case it detects
      PACKET_MULTICAST for a non-multicast address. This could instead be
      done by a dedicated PKTTYPE target for this sole purpose.
       * This match supersedes the CLUSTERIP target.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      0269ea49
  6. 16 Mar 2009, 8 commits
  7. 24 Feb 2009, 2 commits
  8. 20 Feb 2009, 4 commits
  9. 19 Feb 2009, 4 commits
  10. 18 Feb 2009, 2 commits
  11. 10 Feb 2009, 3 commits
  12. 05 Feb 2009, 1 commit
    • net: Partially allow skb destructors to be used on receive path · 9a279bcb
      Herbert Xu authored
      As it currently stands, skb destructors are forbidden on the
      receive path because the protocol end-points will overwrite
      any existing destructor with their own.
      
      This is the reason why we have to call skb_orphan in the loopback
      driver before we reinject the packet back into the stack, thus
      creating a period during which loopback traffic isn't charged
      to any socket.
      
      With virtualisation, we have a similar problem in that traffic
      is reinjected into the stack without being associated with any
      socket entity, thus providing no natural congestion push-back
      for those poor folks still stuck with UDP.
      
      Now had we been consistent in telling them that UDP simply has
      no congestion feedback, I could just fob them off.  Unfortunately,
      we appear to have gone to some length in catering for this on
      the standard UDP path, with skb/socket accounting, and that has
      created a very unhealthy dependency.
      
      Alas habits are difficult to break out of, so we may just have
      to allow skb destructors on the receive path.
      
      It turns out that making skb destructors useable on the receive path
      isn't as easy as it seems.  For instance, simply adding skb_orphan
      to skb_set_owner_r isn't enough.  This is because we assume all
      over the IP stack that skb->sk is an IP socket if present.
      
      The new transparent proxy code goes one step further and assumes
      that skb->sk is the receiving socket if present.
      
      Now all of this can be dealt with by adding simple checks such
      as only treating skb->sk as an IP socket if skb->sk->sk_family
      matches.  However, it turns out that for bridging at least we
      don't need to do all of this work.
      
      This is of interest because most virtualisation setups use bridging
      so we don't actually go through the IP stack on the host (with
      the exception of our old nemesis the bridge netfilter, but that's
      easily taken care of).
      
      So this patch simply adds skb_orphan to the point just before we
      enter the IP stack, but after we've gone through the bridge on the
      receive path.  It also adds an skb_orphan to the one place in
      netfilter that touches skb->sk/skb->destructor, that is, tproxy.
      
      One word of caution, because of the internal code structure, anyone
      wishing to deploy this must use skb_set_owner_w as opposed to
      skb_set_owner_r since many functions that create a new skb from
      an existing one will invoke skb_set_owner_w on the new skb.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9a279bcb
  13. 01 Feb 2009, 1 commit