- 23 4月, 2014 1 次提交
-
-
由 Richard Guy Briggs 提交于
Remove duplicity and simplify code flow by moving the rcu_read_unlock() above the condition and let the flow control exit naturally at the end of the function. Signed-off-by: NRichard Guy Briggs <rgb@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 4月, 2014 2 次提交
-
-
由 Patrick McHardy 提交于
nft_cmp_fast is used for equality comparisions of size <= 4. For comparisions of size < 4 byte a mask is calculated that is applied to both the data from userspace (during initialization) and the register value (during runtime). Both values are stored using (in effect) memcpy to a memory area that is then interpreted as u32 by nft_cmp_fast. This works fine on little endian since smaller types have the same base address, however on big endian this is not true and the smaller types are interpreted as a big number with trailing zero bytes. The mask therefore must not include the lower bytes, but the higher bytes on big endian. Add a helper function that does a cpu_to_le32 to switch the bytes on big endian. Since we're dealing with a mask of just consequitive bits, this works out fine. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Andrey Vagin 提交于
[ 251.920788] INFO: trying to register non-static key. [ 251.921386] the code is fine but needs lockdep annotation. [ 251.921386] turning off the locking correctness validator. [ 251.921386] CPU: 2 PID: 15715 Comm: socket_listen Not tainted 3.14.0+ #294 [ 251.921386] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 251.921386] 0000000000000000 000000009d18c210 ffff880075f039b8 ffffffff816b7ecd [ 251.921386] ffffffff822c3b10 ffff880075f039c8 ffffffff816b36f4 ffff880075f03aa0 [ 251.921386] ffffffff810c65ff ffffffff810c4a85 00000000fffffe01 ffffffffa0075172 [ 251.921386] Call Trace: [ 251.921386] [<ffffffff816b7ecd>] dump_stack+0x45/0x56 [ 251.921386] [<ffffffff816b36f4>] register_lock_class.part.24+0x38/0x3c [ 251.921386] [<ffffffff810c65ff>] __lock_acquire+0x168f/0x1b40 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff816c1215>] ? _raw_spin_unlock_bh+0x35/0x40 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff810c7272>] lock_acquire+0xa2/0x120 [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa0055989>] __nf_conntrack_confirm+0x129/0x410 [nf_conntrack] [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa008ab90>] ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8c5a>] nf_iterate+0xaa/0xc0 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8d14>] nf_hook_slow+0xa4/0x190 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815e98f2>] ip_output+0x92/0x100 [ 251.921386] [<ffffffff815e8df9>] ip_local_out+0x29/0x90 [ 251.921386] [<ffffffff815e9240>] ip_queue_xmit+0x170/0x4c0 [ 251.921386] [<ffffffff815e90d5>] ? ip_queue_xmit+0x5/0x4c0 [ 251.921386] [<ffffffff81601208>] tcp_transmit_skb+0x498/0x960 [ 251.921386] [<ffffffff81602d82>] tcp_connect+0x812/0x960 [ 251.921386] [<ffffffff810e3dc5>] ? ktime_get_real+0x25/0x70 [ 251.921386] [<ffffffff8159ea2a>] ? secure_tcp_sequence_number+0x6a/0xc0 [ 251.921386] [<ffffffff81606f57>] tcp_v4_connect+0x317/0x470 [ 251.921386] [<ffffffff8161f645>] __inet_stream_connect+0xb5/0x330 [ 251.921386] [<ffffffff8158dfc3>] ? lock_sock_nested+0x33/0xa0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff81078885>] ? __local_bh_enable_ip+0x75/0xe0 [ 251.921386] [<ffffffff8161f8f8>] inet_stream_connect+0x38/0x50 [ 251.921386] [<ffffffff8158b157>] SYSC_connect+0xe7/0x120 [ 251.921386] [<ffffffff810e3789>] ? current_kernel_time+0x69/0xd0 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff8158c36e>] SyS_connect+0xe/0x10 [ 251.921386] [<ffffffff816caf69>] system_call_fastpath+0x16/0x1b [ 312.014104] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=60003 jiffies, g=42359, c=42358, q=333) [ 312.015097] INFO: Stall ended before state dump start Fixes: 93bb0ceb ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: NAndrey Vagin <avagin@openvz.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 08 4月, 2014 1 次提交
-
-
由 Andrey Vagin 提交于
nf_ct_gre_keymap_flush() removes a nf_ct_gre_keymap object from net_gre->keymap_list and frees the object. But it doesn't clean a reference on this object from ct_pptp_info->keymap[dir]. Then nf_ct_gre_keymap_destroy() may release the same object again. So nf_ct_gre_keymap_flush() can be called only when we are sure that when nf_ct_gre_keymap_destroy will not be called. nf_ct_gre_keymap is created by nf_ct_gre_keymap_add() and the right way to destroy it is to call nf_ct_gre_keymap_destroy(). This patch marks nf_ct_gre_keymap_flush() as static, so this patch can break compilation of third party modules, which use nf_ct_gre_keymap_flush. I'm not sure this is the right way to deprecate this function. [ 226.540793] general protection fault: 0000 [#1] SMP [ 226.541750] Modules linked in: nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre ip_gre ip_tunnel gre ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack veth tun bridge stp llc ppdev microcode joydev pcspkr serio_raw virtio_console virtio_balloon floppy parport_pc parport pvpanic i2c_piix4 virtio_net drm_kms_helper ttm ata_generic virtio_pci virtio_ring virtio drm i2c_core pata_acpi [last unloaded: ip_tunnel] [ 226.541776] CPU: 0 PID: 49 Comm: kworker/u4:2 Not tainted 3.14.0-rc8+ #101 [ 226.541776] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 226.541776] Workqueue: netns cleanup_net [ 226.541776] task: ffff8800371e0000 ti: ffff88003730c000 task.ti: ffff88003730c000 [ 226.541776] RIP: 0010:[<ffffffff81389ba9>] [<ffffffff81389ba9>] __list_del_entry+0x29/0xd0 [ 226.541776] RSP: 0018:ffff88003730dbd0 EFLAGS: 00010a83 [ 226.541776] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800374e6c40 RCX: dead000000200200 [ 226.541776] RDX: 6b6b6b6b6b6b6b6b RSI: ffff8800371e07d0 RDI: ffff8800374e6c40 [ 226.541776] RBP: ffff88003730dbd0 R08: 0000000000000000 R09: 0000000000000000 [ 226.541776] R10: 0000000000000001 R11: ffff88003730d92e R12: 0000000000000002 [ 226.541776] R13: ffff88007a4c42d0 R14: ffff88007aef0000 R15: ffff880036cf0018 [ 226.541776] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 226.541776] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 226.541776] CR2: 00007f07f643f7d0 CR3: 0000000036fd2000 CR4: 00000000000006f0 [ 226.541776] Stack: [ 226.541776] ffff88003730dbe8 ffffffff81389c5d ffff8800374ffbe4 ffff88003730dc28 [ 226.541776] ffffffffa0162a43 ffffffffa01627c5 ffff88007a4c42d0 ffff88007aef0000 [ 226.541776] ffffffffa01651c0 ffff88007a4c45e0 ffff88007aef0000 ffff88003730dc40 [ 226.541776] Call Trace: [ 226.541776] [<ffffffff81389c5d>] list_del+0xd/0x30 [ 226.541776] [<ffffffffa0162a43>] nf_ct_gre_keymap_destroy+0x283/0x2d0 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffffa01627c5>] ? nf_ct_gre_keymap_destroy+0x5/0x2d0 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffffa0162ab7>] gre_destroy+0x27/0x70 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffffa0117de3>] destroy_conntrack+0x83/0x200 [nf_conntrack] [ 226.541776] [<ffffffffa0117d87>] ? destroy_conntrack+0x27/0x200 [nf_conntrack] [ 226.541776] [<ffffffffa0117d60>] ? nf_conntrack_hash_check_insert+0x2e0/0x2e0 [nf_conntrack] [ 226.541776] [<ffffffff81630142>] nf_conntrack_destroy+0x72/0x180 [ 226.541776] [<ffffffff816300d5>] ? nf_conntrack_destroy+0x5/0x180 [ 226.541776] [<ffffffffa011ef80>] ? kill_l3proto+0x20/0x20 [nf_conntrack] [ 226.541776] [<ffffffffa011847e>] nf_ct_iterate_cleanup+0x14e/0x170 [nf_conntrack] [ 226.541776] [<ffffffffa011f74b>] nf_ct_l4proto_pernet_unregister+0x5b/0x90 [nf_conntrack] [ 226.541776] [<ffffffffa0162409>] proto_gre_net_exit+0x19/0x30 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffff815edf89>] ops_exit_list.isra.1+0x39/0x60 [ 226.541776] [<ffffffff815eecc0>] cleanup_net+0x100/0x1d0 [ 226.541776] [<ffffffff810a608a>] process_one_work+0x1ea/0x4f0 [ 226.541776] [<ffffffff810a6028>] ? process_one_work+0x188/0x4f0 [ 226.541776] [<ffffffff810a64ab>] worker_thread+0x11b/0x3a0 [ 226.541776] [<ffffffff810a6390>] ? process_one_work+0x4f0/0x4f0 [ 226.541776] [<ffffffff810af42d>] kthread+0xed/0x110 [ 226.541776] [<ffffffff8173d4dc>] ? _raw_spin_unlock_irq+0x2c/0x40 [ 226.541776] [<ffffffff810af340>] ? kthread_create_on_node+0x200/0x200 [ 226.541776] [<ffffffff8174747c>] ret_from_fork+0x7c/0xb0 [ 226.541776] [<ffffffff810af340>] ? kthread_create_on_node+0x200/0x200 [ 226.541776] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08 [ 226.541776] RIP [<ffffffff81389ba9>] __list_del_entry+0x29/0xd0 [ 226.541776] RSP <ffff88003730dbd0> [ 226.612193] ---[ end trace 985ae23ddfcc357c ]--- Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: NAndrey Vagin <avagin@openvz.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 04 4月, 2014 6 次提交
-
-
由 Pablo Neira Ayuso 提交于
The intended format in request_module is %.*s instead of %*.s. Reported-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
Currently, nf_tables trims off the set name if it exceeeds 15 bytes, so explicitly reject set names that are too large. Reported-by: NGiuseppe Longo <giuseppelng@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Kirill Tkhai 提交于
There are no these aliases, so kernel can not request appropriate match table: $ iptables -I INPUT -p tcp -m osf --genre Windows --ttl 2 -j DROP iptables: No chain/target/match by that name. setsockopt() requests ipt_osf module, which is not present. Add the aliases. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Alexey Perevalov 提交于
This simple modification allows iptables to work with INPUT chain in combination with cgroup module. It could be useful for counting ingress traffic per cgroup with nfacct netfilter module. There were no problems to count the egress traffic that way formerly. It's possible to get classified sk_buff after PREROUTING, due to socket lookup being done in early_demux (tcp_v4_early_demux). Also it works for udp as well. Trivial usage example, assuming we're in the same shell every step and we have enough permissions: 1) Classic net_cls cgroup initialization: mkdir /sys/fs/cgroup/net_cls mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls 2) Set up cgroup for interesting application: mkdir /sys/fs/cgroup/net_cls/wget echo 1 > /sys/fs/cgroup/net_cls/wget/net_cls.classid echo $BASHPID > /sys/fs/cgroup/net_cls/wget/cgroup.procs 3) Create kernel counters: nfacct add wget-cgroup-in iptables -A INPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-in nfacct add wget-cgroup-out iptables -A OUTPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-out 4) Network usage: wget https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.14-rc6.tar.xz 5) Check results: nfacct list Cgroup approach is being used for the DataUsage (counting & blocking traffic) feature for Samsung's modification of the Tizen OS. Signed-off-by: NAlexey Perevalov <a.perevalov@samsung.com> Acked-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Eric points out that the locks can be global. Moreover, both Jesper and Eric note that using only 32 locks increases false sharing as only two cache lines are used. This increases locks to 256 (16 cache lines assuming 64byte cacheline and 4 bytes per spinlock). Suggested-by: NJesper Dangaard Brouer <brouer@redhat.com> Suggested-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
cannot use ARRAY_SIZE() if spinlock_t is empty struct. Fixes: 1442e750 ("netfilter: connlimit: use keyed locks") Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 28 3月, 2014 1 次提交
-
-
由 Zoltan Kiss 提交于
skb_zerocopy can copy elements of the frags array between skbs, but it doesn't orphan them. Also, it doesn't handle errors, so this patch takes care of that as well, and modify the callers accordingly. skb_tx_error() is also added to the callers so they will signal the failed delivery towards the creator of the skb. Signed-off-by: NZoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 3月, 2014 1 次提交
-
-
由 David S. Miller 提交于
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 3月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
ARRAY_SIZE(nf_conntrack_locks) is undefined if spinlock_t is an empty structure. Replace it by CONNTRACK_LOCKS Fixes: 93bb0ceb ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 3月, 2014 3 次提交
-
-
由 Florian Westphal 提交于
With current match design every invocation of the connlimit_match function means we have to perform (number_of_conntracks % 256) lookups in the conntrack table [ to perform GC/delete stale entries ]. This is also the reason why ____nf_conntrack_find() in perf top has > 20% cpu time per core. This patch changes the storage to rbtree which cuts down the number of ct objects that need testing. When looking up a new tuple, we only test the connections of the host objects we visit while searching for the wanted host/network (or the leaf we need to insert at). The slot count is reduced to 32. Increasing slot count doesn't speed up things much because of rbtree nature. before patch (50kpps rx, 10kpps tx): + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw after (90kpps, 51kpps tx): + 17.24% swapper [nf_conntrack] [k] ____nf_conntrack_find + 6.60% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 2.73% swapper [nf_conntrack] [k] hash_conntrack_raw + 2.36% swapper [xt_connlimit] [k] count_tree Obvious disadvantages to previous version are the increase in code complexity and the increased memory cost. Partially based on Eric Dumazets fq scheduler. Reviewed-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
currently returns 1 if they're the same. Make it work like mem/strcmp so it can be used as rbtree search function. Reviewed-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
connlimit currently suffers from spinlock contention, example for 4-core system with rps enabled: + 20.84% ksoftirqd/2 [kernel.kallsyms] [k] _raw_spin_lock_bh + 20.76% ksoftirqd/1 [kernel.kallsyms] [k] _raw_spin_lock_bh + 20.42% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh + 6.07% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 6.07% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 5.97% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 2.47% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 2.45% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw + 2.44% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw May allow parallel lookup/insert/delete if the entry is hashed to another slot. With patch: + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw + 2.00% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock Improved rx processing rate from ~35kpps to ~50 kpps. Reviewed-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 15 3月, 2014 1 次提交
-
-
由 Eric W. Biederman 提交于
Replace the bh safe variant with the hard irq safe variant. We need a hard irq safe variant to deal with netpoll transmitting packets from hard irq context, and we need it in most if not all of the places using the bh safe variant. Except on 32bit uni-processor the code is exactly the same so don't bother with a bh variant, just have a hard irq safe variant that everyone can use. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 3月, 2014 1 次提交
-
-
由 Joe Perches 提交于
The use of __constant_<foo> has been unnecessary for quite awhile now. Make these uses consistent with the rest of the kernel. Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 12 3月, 2014 4 次提交
-
-
由 Florian Westphal 提交于
We might allocate thousands of these (one object per connection). Use distinct kmem cache to permit simplte tracking on how many objects are currently used by the connlimit match via the sysfs. Reviewed-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Allows easier code-reuse in followup patches. Reviewed-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Instead of freeing the entry from our list and then adding it back again in the 'packet to closing connection' case just keep the matching entry around. Also drop the found_ct != NULL test as nf_ct_tuplehash_to_ctrack is just container_of(). Reviewed-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Simplifies followup patch that introduces separate locks for each of the hash slots. Reviewed-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 08 3月, 2014 5 次提交
-
-
由 Patrick McHardy 提交于
The family in the NAT expression is basically completely useless since we have it available during runtime anyway. Nevertheless it is used to decide the NAT family, so at least validate it properly. As we don't support cross-family NAT, it needs to match the family of the table the expression exists in. Unfortunately we can't remove it completely since we need to dump it for userspace (*sigh*), so at least reduce the memory waste. Additionally clean up the module init function by removing useless temporary variables. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Patrick McHardy 提交于
Since we have the context available during destruction again, we can remove the family from the private structure. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Patrick McHardy 提交于
Since we have the context available again, we can restore notifications for destruction of anonymous sets. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Patrick McHardy 提交于
In order to fix set destruction notifications and get rid of unnecessary members in private data structures, pass the context to expressions' destructor functions again. In order to do so, replace various members in the nft_rule_trans structure by the full context. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Patrick McHardy 提交于
The context argument logically comes first, and this is what every other function dealing with contexts does. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 07 3月, 2014 8 次提交
-
-
由 Patrick McHardy 提交于
The hash set type is very broken and was never meant to be merged in this state. Missing RCU synchronization on element removal, leaking chain refcounts when used as a verdict map, races during lookups, a fixed table size are probably just some of the problems. Luckily it is currently never chosen by the kernel when the rbtree type is also available. Rewrite it to be usable. The new implementation supports automatic hash table resizing using RCU, based on Paul McKenney's and Josh Triplett's algorithm "Optimized Resizing For RCU-Protected Hash Tables" described in [1]. Resizing doesn't require a second list head in the elements, it works by chosing a hash function that remaps elements to a predictable set of buckets, only resizing by integral factors and - during expansion: linking new buckets to the old bucket that contains elements for any of the new buckets, thereby creating imprecise chains, then incrementally seperating the elements until the new buckets only contain elements that hash directly to them. - during shrinking: linking the hash chains of all old buckets that hash to the same new bucket to form a single chain. Expansion requires at most the number of elements in the longest hash chain grace periods, shrinking requires a single grace period. Due to the requirement of having hash chains/elements linked to multiple buckets during resizing, homemade single linked lists are used instead of the existing list helpers, that don't support this in a clean fashion. As a side effect, the amount of memory required per element is reduced by one pointer. Expansion is triggered when the load factors exceeds 75%, shrinking when the load factor goes below 30%. Both operations are allowed to fail and will be retried on the next insertion or removal if their respective conditions still hold. [1] http://dl.acm.org/citation.cfm?id=2002181.2002192Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jesper Dangaard Brouer 提交于
nf_conntrack_lock is a monolithic lock and suffers from huge contention on current generation servers (8 or more core/threads). Perf locking congestion is clear on base kernel: - 72.56% ksoftirqd/6 [kernel.kallsyms] [k] _raw_spin_lock_bh - _raw_spin_lock_bh + 25.33% init_conntrack + 24.86% nf_ct_delete_from_lists + 24.62% __nf_conntrack_confirm + 24.38% destroy_conntrack + 0.70% tcp_packet + 2.21% ksoftirqd/6 [kernel.kallsyms] [k] fib_table_lookup + 1.15% ksoftirqd/6 [kernel.kallsyms] [k] __slab_free + 0.77% ksoftirqd/6 [kernel.kallsyms] [k] inet_getpeer + 0.70% ksoftirqd/6 [nf_conntrack] [k] nf_ct_delete + 0.55% ksoftirqd/6 [ip_tables] [k] ipt_do_table This patch change conntrack locking and provides a huge performance improvement. SYN-flood attack tested on a 24-core E5-2695v2(ES) with 10Gbit/s ixgbe (with tool trafgen): Base kernel: 810.405 new conntrack/sec After patch: 2.233.876 new conntrack/sec Notice other floods attack (SYN+ACK or ACK) can easily be deflected using: # iptables -A INPUT -m state --state INVALID -j DROP # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0 Use an array of hashed spinlocks to protect insertions/deletions of conntracks into the hash table. 1024 spinlocks seem to give good results, at minimal cost (4KB memory). Due to lockdep max depth, 1024 becomes 8 if CONFIG_LOCKDEP=y The hash resize is a bit tricky, because we need to take all locks in the array. A seqcount_t is used to synchronize the hash table users with the resizing process. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Reviewed-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jesper Dangaard Brouer 提交于
Netfilter expectations are protected with the same lock as conntrack entries (nf_conntrack_lock). This patch split out expectations locking to use it's own lock (nf_conntrack_expect_lock). Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Reviewed-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jesper Dangaard Brouer 提交于
Preparation for disconnecting the nf_conntrack_lock from the expectations code. Once the nf_conntrack_lock is lifted, a race condition is exposed. The expectations master conntrack exp->master, can race with delete operations, as the refcnt increment happens too late in init_conntrack(). Race is against other CPUs invoking ->destroy() (destroy_conntrack()), or nf_ct_delete() (via timeout or early_drop()). Avoid this race in nf_ct_find_expectation() by using atomic_inc_not_zero(), and checking if nf_ct_is_dying() (path via nf_ct_delete()). Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jesper Dangaard Brouer 提交于
One spinlock per cpu to protect dying/unconfirmed/template special lists. (These lists are now per cpu, a bit like the untracked ct) Add a @cpu field to nf_conn, to make sure we hold the appropriate spinlock at removal time. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Reviewed-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jesper Dangaard Brouer 提交于
Changes while reading through the netfilter code. Added hint about how conntrack nf_conn refcnt is accessed. And renamed repl_hash to reply_hash for readability Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Reviewed-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Tingwei Liu 提交于
Add whitespace after operator and put open brace { on the previous line Cc: Tingwei Liu <liutingwei@hisense.com> Cc: lvs-devel@vger.kernel.org Signed-off-by: NTingwei Liu <tingw.liu@gmail.com> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
由 Andi Kleen 提交于
const __read_mostly does not make any sense, because const data is already read-only. Remove the __read_mostly for the ipvs genl_ops. This avoids a LTO section conflict compile problem. Cc: Wensong Zhang <wensong@linux-vs.org> Cc: Simon Horman <horms@verge.net.au> Cc: Patrick McHardy <kaber@trash.net> Cc: lvs-devel@vger.kernel.org Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
- 06 3月, 2014 5 次提交
-
-
由 Josh Hunt 提交于
Adds a new property for hash set types, where if a set is created with the 'forceadd' option and the set becomes full the next addition to the set may succeed and evict a random entry from the set. To keep overhead low eviction is done very simply. It checks to see which bucket the new entry would be added. If the bucket's pos value is non-zero (meaning there's at least one entry in the bucket) it replaces the first entry in the bucket. If pos is zero, then it continues down the normal add process. This property is useful if you have a set for 'ban' lists where it may not matter if you release some entries from the set early. Signed-off-by: NJosh Hunt <johunt@akamai.com> Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
由 Ilia Mirkin 提交于
Commit 1785e8f4 ("netfiler: ipset: Add net namespace for ipset") moved the initialization print into net_init, which can get called a lot due to namespaces. Move it back into init, reduce to pr_info. Signed-off-by: NIlia Mirkin <imirkin@alum.mit.edu> Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
由 Vytas Dauksa 提交于
Introduce packet mark mask for hash:ip,mark data type. This allows to set mark bit filter for the ip set. Change-Id: Id8dd9ca7e64477c4f7b022a1d9c1a5b187f1c96e Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
由 Vytas Dauksa 提交于
Introduce packet mark support with new ip,mark hash set. This includes userspace and kernelspace code, hash:ip,mark set tests and man page updates. The intended use of ip,mark set is similar to the ip:port type, but for protocols which don't use a predictable port number. Instead of port number it matches a firewall mark determined by a layer 7 filtering program like opendpi. As well as allowing or blocking traffic it will also be used for accounting packets and bytes sent for each protocol. Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
由 Fengguang Wu 提交于
net/netfilter/ipset/ip_set_hash_netnet.c:115:8-9: WARNING: return of 0/1 in function 'hash_netnet4_data_list' with return type bool /c/kernel-tests/src/cocci/net/netfilter/ipset/ip_set_hash_netnet.c:338:8-9: WARNING: return of 0/1 in function 'hash_netnet6_data_list' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: coccinelle/misc/boolreturn.cocci Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-