Commits · 96f51428c43de20723630f0d756a7a9a42cbd974 · openeuler / Kernel

14 Jun, 2015 12 commits

netfilter: ipset: Introduce RCU locking in bitmap:* types · 96f51428

Committed by Jozsef Kadlecsik on Jun 13, 2015

There's nothing much required because the bitmap types use atomic
bit operations. However, the logic of adding elements changed slightly:
first the MAC address is updated (which is not atomic), then the element
is activated (added). The extensions may call kfree_rcu(), therefore we
call rcu_barrier() at module removal.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Prepare the ipset core to use RCU at set level · b57b2d1f

Committed by Jozsef Kadlecsik on Jun 13, 2015

Replace rwlock_t with spinlock_t in "struct ip_set" and change the locking
accordingly. Convert the comment extension into an RCU-aware object. Also,
simplify the timeout routines.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter:ipset Remove rbtree from hash:net,iface · bd55389c

Committed by Jozsef Kadlecsik on Jun 13, 2015

Remove rbtree in order to introduce RCU instead of rwlock in ipset.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Make sure listing doesn't grab a set which is just being destroyed. · 9c1ba5c8

Committed by Jozsef Kadlecsik on Jun 13, 2015

There was a small window when all sets are destroyed and a concurrent
listing of all sets could grab a set which is just being destroyed.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Fix parallel resizing and listing of the same set · c4c99783

Committed by Jozsef Kadlecsik on Jun 13, 2015

When elements are added to a hash:* type of set and resizing is triggered,
a parallel listing could start to list the original set (before resizing)
and "continue" with listing the new set. Fix it by taking references and
using the original hash table for listing. As a result, the original
hash table may be destroyed from either the resizing or the listing function.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
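
The fix described above can be pictured with a small single-threaded model. This is only a sketch under stated assumptions: `Table`, `HashSet`, `list_start()`, `list_done()` and `resize()` are hypothetical names, the plain integer stands in for whatever reference counting the kernel code uses, and no locking or RCU is modelled.

```python
class Table:
    """A hash table snapshot (hypothetical model, not the kernel structure)."""

    def __init__(self, elements):
        self.elements = list(elements)
        self.refs = 0           # references held by listings in progress
        self.destroyed = False  # set once the table has been freed


class HashSet:
    def __init__(self, elements):
        self.table = Table(elements)

    def list_start(self):
        # The listing pins the table it started with.
        t = self.table
        t.refs += 1
        return t

    def list_done(self, t):
        # Drop the reference; if the table was replaced by a resize in the
        # meantime and this was the last user, the listing side frees it.
        t.refs -= 1
        if t.refs == 0 and t is not self.table:
            t.destroyed = True

    def resize(self, extra):
        # Build the new table and publish it.
        old = self.table
        self.table = Table(old.elements + list(extra))
        # If nobody is listing the old table, the resize side frees it.
        if old.refs == 0:
            old.destroyed = True


s = HashSet([1, 2])
pinned = s.list_start()   # listing begins on the original table
s.resize([3])             # resize happens in parallel
print(pinned.elements)    # the listing still sees only the original table
s.list_done(pinned)       # last reference dropped: listing destroys the old table
```

In this model the old table is destroyed by whichever side finishes last, which mirrors the commit's remark that destruction may happen from either function.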

netfilter: ipset: Fix cidr handling for hash:*net* types · f690cbae

Committed by Jozsef Kadlecsik on Jun 12, 2015

Commit "Simplify cidr handling for hash:*net* types" broke the cidr
handling for the hash:*net* types when the sets were used by the SET
target: entries with invalid cidr values were added to the sets.
Reported by Jonathan Johnson.

Testsuite entry is added to verify the fix.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Check CIDR value only when attribute is given · aff22758

Committed by Sergey Popovich on Jun 12, 2015

There is no reason to check the CIDR value regardless of whether the
attribute specifying the CIDR is given.

Initialize the cidr array in the element structure at its declaration
to give the compiler more freedom to optimize the initialization
right before the element structure is used.

Remove local variables cidr and cidr2 for netnet and netportnet
hashes as we do not use packed cidr value for such set types and
can store value directly in e.cidr[].
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Make sure we always return line number on batch · a212e08e

Committed by Sergey Popovich on Jun 12, 2015

Even if we return with the generic IPSET_ERR_PROTOCOL, it is a good idea
to return the line number if we are called in batch mode.

Moreover we are not always exiting with IPSET_ERR_PROTOCOL. For
example hash:ip,port,net may return IPSET_ERR_HASH_RANGE_UNSUPPORTED
or IPSET_ERR_INVALID_CIDR.
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Permit CIDR equal to the host address CIDR in IPv6 · 2c227f27

Committed by Sergey Popovich on Jun 12, 2015

Permit userspace to supply CIDR length equal to the host address CIDR
length in netlink message. Prohibit any other CIDR length for IPv6
variant of the set.

Also return -IPSET_ERR_HASH_RANGE_UNSUPPORTED instead of generic
-IPSET_ERR_PROTOCOL in IPv6 variant of hash:ip,port,net when
IPSET_ATTR_IP_TO attribute is given.
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
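
The two rules in this commit message can be sketched as a validation function. This is a rough userspace model, not the kernel code: `check_ipv6_attrs()` and the exception classes are hypothetical, combining both checks in one function and their order are assumptions, and mapping a rejected CIDR to IPSET_ERR_INVALID_CIDR is assumed rather than stated in the message.

```python
from typing import Optional

IPV6_HOST_CIDR = 128  # prefix length of a single IPv6 host address


class InvalidCidr(ValueError):
    """Stands in for -IPSET_ERR_INVALID_CIDR (mapping assumed)."""


class RangeUnsupported(ValueError):
    """Stands in for -IPSET_ERR_HASH_RANGE_UNSUPPORTED."""


def check_ipv6_attrs(cidr: Optional[int] = None,
                     ip_to: Optional[str] = None) -> None:
    """Validate optional attributes for an IPv6 set variant without ranges."""
    if ip_to is not None:
        # An IPSET_ATTR_IP_TO attribute was given: report the specific
        # "range unsupported" error instead of a generic protocol error.
        raise RangeUnsupported("IPv6 variant does not support ranges")
    if cidr is not None and cidr != IPV6_HOST_CIDR:
        # A CIDR attribute is accepted only if it equals the host length.
        raise InvalidCidr(f"expected /{IPV6_HOST_CIDR}, got /{cidr}")


check_ipv6_attrs()           # no attributes: accepted
check_ipv6_attrs(cidr=128)   # host-length CIDR: accepted
```

Passing `cidr=64` or any `ip_to` value raises the corresponding exception in this model.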

netfilter: ipset: Check extensions attributes before getting extensions. · 7dd37bc8

Committed by Sergey Popovich on Jun 12, 2015

Perform all extension attribute checks within ip_set_get_extensions()
and reduce the amount of duplicated code.
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Use SET_WITH_*() helpers to test set extensions · edda0791

Committed by Sergey Popovich on Jun 12, 2015

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Use MSEC_PER_SEC consistently · aaeb6e24

Committed by Jozsef Kadlecsik on Jun 12, 2015

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

12 Jun, 2015 28 commits

netfilter: xtables: avoid percpu ruleset duplication · 482cfc31

Committed by Florian Westphal on Jun 11, 2015

We store the rule blob per (possible) cpu.  Unfortunately this means we can
waste a lot of memory on big smp machines. The ipt_entry structure ('rule head')
is 112 bytes, so e.g. with maxcpu=64 one single rule eats
close to 8k RAM.

Since previous patch made counters percpu it appears there is nothing
left in the rule blob that needs to be percpu.

On my test system (144 possible cpus, 400k dummy rules) this
change saves close to 9 Gigabyte of RAM.
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
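
The arithmetic behind these figures can be checked with a short sketch. Only the 112-byte `ipt_entry` size, the CPU counts and the rule count come from the commit message; `duplicated_bytes()` is a hypothetical helper, and because real rules also carry matches and a target, the results are lower bounds that count rule heads only.

```python
IPT_ENTRY_SIZE = 112  # bytes, the 'rule head' size quoted in the commit message


def duplicated_bytes(rule_size: int, possible_cpus: int, rules: int = 1) -> int:
    """Memory used when every rule is stored once per possible CPU."""
    return rule_size * possible_cpus * rules


# One rule head duplicated for 64 possible CPUs.
per_rule = duplicated_bytes(IPT_ENTRY_SIZE, 64)
print(per_rule)  # 7168 bytes, i.e. 7 KiB before matches and targets

# Rule heads only, for the test system (144 possible CPUs, 400k rules).
total = duplicated_bytes(IPT_ENTRY_SIZE, 144, 400_000)
print(f"{total / 1e9:.2f} GB")  # 6.45 GB
```

The gap between this lower bound and the reported saving is consistent with each dummy rule being larger than its bare head, although the commit message does not give the full per-rule size.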

netfilter: xtables: use percpu rule counters · 71ae0dff

Committed by Florian Westphal on Jun 11, 2015

The binary arp/ip/ip6tables ruleset is stored per cpu.

The only reason left as to why we need percpu duplication are the rule
counters embedded into ipt_entry et al -- since each cpu has its own copy
of the rules, all counters can be lockless.

The downside is that the more cpus are supported, the more memory is
required.  Rules are not just duplicated per online cpu but for each
possible cpu, i.e. if maxcpu is 144, then rule is duplicated 144 times,
not for the e.g. 64 cores present.

To save some memory and also improve utilization of shared caches it
would be preferable to only store the rule blob once.

So we first need to separate counters and the rule blob.

Instead of using entry->counters, allocate this percpu and store the
percpu address in entry->counters.pcnt on CONFIG_SMP.

This change makes no sense as-is; it is merely an intermediate step to
remove the percpu duplication of the rule set in a followup patch.
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
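
The separation of counters from the rule blob can be illustrated with a toy model. It is a sketch only: the `Rule` class and its methods are hypothetical, a Python list stands in for a per-CPU allocation, and the attribute name `pcnt` is borrowed from the commit message purely as a label.

```python
class Rule:
    """One shared rule with per-CPU packet/byte counters (toy model)."""

    def __init__(self, spec: str, possible_cpus: int):
        self.spec = spec  # the rule itself, stored once and shared by all CPUs
        # One [packets, bytes] pair per possible CPU, so each CPU updates
        # only its own slot and needs no lock.
        self.pcnt = [[0, 0] for _ in range(possible_cpus)]

    def count(self, cpu: int, nbytes: int) -> None:
        """Account one packet of nbytes on the given CPU."""
        slot = self.pcnt[cpu]
        slot[0] += 1
        slot[1] += nbytes

    def totals(self):
        """Sum all per-CPU slots, as a counter dump would have to do."""
        packets = sum(slot[0] for slot in self.pcnt)
        nbytes = sum(slot[1] for slot in self.pcnt)
        return packets, nbytes


rule = Rule("-p tcp --dport 22 -j ACCEPT", possible_cpus=4)
rule.count(cpu=0, nbytes=60)
rule.count(cpu=3, nbytes=1500)
print(rule.totals())  # (2, 1560)
```

Only the small counter pairs scale with the number of possible CPUs here, while the rule specification exists once.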

netfilter: bridge: restore vlan tag when refragmenting · d7b59742

Committed by Florian Westphal on Jun 05, 2015

If bridge netfilter is used with both
bridge-nf-call-iptables and bridge-nf-filter-vlan-tagged enabled
then ip fragments in VLAN frames are sent without the vlan header.

This has never worked reliably.  Turns out this relied on pre-3.5
behaviour where skb frag_list was used to store ip fragments;
ip_fragment() then re-used these skbs.

But since commit 3cc49492
("ipv4: use skb coalescing in defragmentation") this is no longer
the case.  ip_do_fragment now needs to allocate new skbs, but these
don't contain the vlan tag information anymore.

Fix it by storing the vlan information of the reassembled skb in the
br netfilter percpu frag area, and restoring it for each of the
fragments.

Fixes: 3cc49492 ("ipv4: use skb coalescing in defragmentation")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net: ip_fragment: remove BRIDGE_NETFILTER mtu special handling · 33b1f313

Committed by Florian Westphal on Jun 05, 2015

Since commit d6b915e2
("ip_fragment: don't forward defragmented DF packet") the largest
fragment size is available in the IPCB.

Therefore we no longer need to care about 'encapsulation'
overhead of stripped PPPOE/VLAN headers since ip_do_fragment
doesn't use device mtu in such cases.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: bridge: forward IPv6 fragmented packets · efb6de9b

Committed by Bernhard Thaler on May 30, 2015

IPv6 fragmented packets are not forwarded on an ethernet bridge
with netfilter ip6_tables loaded. Steps to reproduce:

1) create a simple bridge like this

        modprobe br_netfilter
        brctl addbr br0
        brctl addif br0 eth0
        brctl addif br0 eth2
        ifconfig eth0 up
        ifconfig eth2 up
        ifconfig br0 up

2) place a host with an IPv6 address on each side of the bridge

        set IPv6 address on host A:
        ip -6 addr add fd01:2345:6789:1::1/64 dev eth0

        set IPv6 address on host B:
        ip -6 addr add fd01:2345:6789:1::2/64 dev eth0

3) run a simple ping command on host A with packets > MTU

        ping6 -s 4000 fd01:2345:6789:1::2

4) wait some time and run e.g. "ip6tables -t nat -nvL" on the bridge

IPv6 fragmented packets traverse the bridge cleanly until somebody runs
"ip6tables -t nat -nvL". As soon as it is run (and netfilter modules are
loaded) IPv6 fragmented packets do not traverse the bridge any more (you
see no more responses in ping's output).

After applying this patch IPv6 fragmented packets traverse the bridge
cleanly in above scenario.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
[pablo@netfilter.org: small changes to br_nf_dev_queue_xmit]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: bridge: re-order check_hbh_len() · a4611d3b

Committed by Bernhard Thaler on May 30, 2015

Prepare check_hbh_len() to be called from newly introduced
br_validate_ipv6() in next commit.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: bridge: rename br_parse_ip_options · 77d574e7

Committed by Bernhard Thaler on May 30, 2015

br_parse_ip_options() does not parse any IP options, it validates IP
packets as a whole and the function name is misleading.

Rename br_parse_ip_options() to br_validate_ipv4() and remove unneeded
comments.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: bridge: refactor frag_max_size · 411ffb4f

Committed by Bernhard Thaler on May 30, 2015

Currently frag_max_size is a member of br_input_skb_cb and is copied back and
forth using IPCB(skb) and BR_INPUT_SKB_CB(skb) each time it is changed or
used.

Attach frag_max_size to nf_bridge_info and set value in pre_routing and
forward functions. Use its value in forward and xmit functions.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: bridge: detect NAT66 correctly and change MAC address · 72b31f72

Committed by Bernhard Thaler on May 30, 2015

IPv4 iptables can REDIRECT/DNAT/SNAT any traffic over a bridge.

e.g. REDIRECT
$ sysctl -w net.bridge.bridge-nf-call-iptables=1
$ iptables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \
  -j REDIRECT --to-ports 81

This does not work with ip6tables on a bridge in NAT66 scenario
because the REDIRECT/DNAT/SNAT is not correctly detected.

The bridge pre-routing (finish) netfilter hook has to check for a possible
redirect and then fix the destination mac address. This allows the
ip6tables rules to be used for local REDIRECT/DNAT/SNAT, similar to the
IPv4 iptables version.

e.g. REDIRECT
$ sysctl -w net.bridge.bridge-nf-call-ip6tables=1
$ ip6tables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \
  -j REDIRECT --to-ports 81

This patch makes it possible to use IPv6 NAT66 on a bridge. It was tested
on a bridge with two interfaces using SNAT/DNAT NAT66 rules.
Reported-by: Artie Hamilton <artiemhamilton@yahoo.com>
Signed-off-by: Sven Eckelmann <sven@open-mesh.com>
[bernhard.thaler@wvnet.at: rebased, add indirect call to ip6_route_input()]
[bernhard.thaler@wvnet.at: rebased, split into separate patches]
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: bridge: re-order br_nf_pre_routing_finish_ipv6() · 8cae308d

Committed by Bernhard Thaler on May 30, 2015

Put br_nf_pre_routing_finish_ipv6() after daddr_was_changed() and
br_nf_pre_routing_finish_bridge() to prepare calling these functions
from there.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: bridge: refactor clearing BRNF_NF_BRIDGE_PREROUTING · d39a33ed

Committed by Bernhard Thaler on May 30, 2015

Use binary AND on the complement of BRNF_NF_BRIDGE_PREROUTING to unset
the bit in nf_bridge->mask.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
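
The bit operation named in this commit message can be shown in isolation. The flag values below are illustrative assumptions, not taken from the commit, and `clear_flag()` is a hypothetical helper; the point is only that AND with the complement clears a bit regardless of its previous state.

```python
# Illustrative flag values (assumed for this sketch).
BRNF_PKT_TYPE = 0x01
BRNF_NF_BRIDGE_PREROUTING = 0x08


def clear_flag(mask: int, flag: int) -> int:
    """Return mask with flag unset, using AND on the complement of flag."""
    return mask & ~flag


mask = BRNF_PKT_TYPE | BRNF_NF_BRIDGE_PREROUTING   # 0x09
mask = clear_flag(mask, BRNF_NF_BRIDGE_PREROUTING)
print(hex(mask))  # 0x1

# Clearing again is harmless: the operation is idempotent.
mask = clear_flag(mask, BRNF_NF_BRIDGE_PREROUTING)
print(hex(mask))  # 0x1
```

A toggle such as XOR would instead set the bit again on the second call, which is why AND with the complement expresses "unset" unambiguously.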

netfilter: conntrack: warn the user if there is a better helper to use · 77966845

Committed by Marcelo Ricardo Leitner on May 21, 2015

After db29a950 ("netfilter: conntrack: disable generic tracking for
known protocols"), if the specific helper is built but not loaded
(a standard for most distributions) systems with a restrictive firewall
but weak configuration regarding netfilter modules to load, will
silently stop working.

This patch then puts a warning message so the sysadmin knows where to
start looking into. It's a pr_warn_once regardless of protocol itself
but it should be enough to give a hint on where to look.

Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge branch 'tcp-gso-settings-defer' · c63264de

Committed by David S. Miller on Jun 11, 2015

Eric Dumazet says:

====================
tcp: defer shinfo->gso_size|type settings

We put shinfo->gso_segs in TCP_SKB_CB(skb) a while back for performance
reasons.

This was in commit cd7d8498 ("tcp: change tcp_skb_pcount() location")

This patch series completes the job for gso_size and gso_type, so that
we do not bring 2 extra cache lines into the tcp write xmit fast path,
and makes tcp_init_tso_segs() simpler and faster.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove obsolete check in tcp_set_skb_tso_segs() · b5e2c457