- 28 Oct, 2015 1 commit
-
-
Committed by Florian Westphal
nfnetlink_bind request_module()s all the time, because nfnetlink_get_subsys() shifts its argument by 8 to obtain the subsys id. So using type instead of type << 8 always returns NULL.

Fixes: 03292745 ("netlink: add nlk->netlink_bind hook for module auto-loading")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
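The shift described above can be illustrated with a small user-space model. This is only a sketch: `subsys_table`, `get_subsys`, `MODEL_SUBSYS_SLOTS` and the slot number 10 are hypothetical stand-ins, not the kernel's definitions. It shows why passing a bare subsystem id to a lookup that expects a full message type always lands on the empty slot 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; names and sizes are illustrative only. */
#define MODEL_SUBSYS_SLOTS 16

struct subsys {
	const char *name;
};

static const struct subsys nftables_subsys = { "nf_tables" };

/* Slot 0 models the "none" subsystem and stays empty (NULL). */
static const struct subsys *subsys_table[MODEL_SUBSYS_SLOTS] = {
	[10] = &nftables_subsys,
};

/* The subsystem id lives in the upper byte of the 16-bit message type. */
static uint8_t subsys_id(uint16_t type)
{
	return (uint8_t)((type & 0xff00) >> 8);
}

/* Lookup that expects a full message type, not a bare subsystem id. */
static const struct subsys *get_subsys(uint16_t type)
{
	uint8_t id = subsys_id(type);

	return id < MODEL_SUBSYS_SLOTS ? subsys_table[id] : NULL;
}
```

In this model, `get_subsys(10)` yields NULL because the id is shifted away, while `get_subsys(10 << 8)` finds the registered entry.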
-
- 27 Oct, 2015 1 commit
-
-
Committed by Munehisa Kamata
Commit 8b13eddf ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables") has introduced a trivial logic change which can result in the following crash.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
    IP: [<ffffffffa033002d>] nf_nat_redirect_ipv4+0x2d/0xa0 [nf_nat_redirect]
    PGD 3ba662067 PUD 3ba661067 PMD 0
    Oops: 0000 [#1] SMP
    Modules linked in: ipv6(E) xt_REDIRECT(E) nf_nat_redirect(E) xt_tcpudp(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) ip_tables(E) x_tables(E) binfmt_misc(E) xfs(E) libcrc32c(E) evbug(E) evdev(E) psmouse(E) i2c_piix4(E) i2c_core(E) acpi_cpufreq(E) button(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
    CPU: 0 PID: 2536 Comm: ip Tainted: G E 4.1.7-15.23.amzn1.x86_64 #1
    Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/06/2015
    task: ffff8800eb438000 ti: ffff8803ba664000 task.ti: ffff8803ba664000
    [...]
    Call Trace:
     <IRQ>
     [<ffffffffa0334065>] redirect_tg4+0x15/0x20 [xt_REDIRECT]
     [<ffffffffa02e2e99>] ipt_do_table+0x2b9/0x5e1 [ip_tables]
     [<ffffffffa0328045>] iptable_nat_do_chain+0x25/0x30 [iptable_nat]
     [<ffffffffa031777d>] nf_nat_ipv4_fn+0x13d/0x1f0 [nf_nat_ipv4]
     [<ffffffffa0328020>] ? iptable_nat_ipv4_fn+0x20/0x20 [iptable_nat]
     [<ffffffffa031785e>] nf_nat_ipv4_in+0x2e/0x90 [nf_nat_ipv4]
     [<ffffffffa03280a5>] iptable_nat_ipv4_in+0x15/0x20 [iptable_nat]
     [<ffffffff81449137>] nf_iterate+0x57/0x80
     [<ffffffff814491f7>] nf_hook_slow+0x97/0x100
     [<ffffffff814504d4>] ip_rcv+0x314/0x400

    unsigned int
    nf_nat_redirect_ipv4(struct sk_buff *skb, ...
    {
        ...
        rcu_read_lock();
        indev = __in_dev_get_rcu(skb->dev);
        if (indev != NULL) {
            ifa = indev->ifa_list;
            newdst = ifa->ifa_local; <---
        }
        rcu_read_unlock();
        ...
    }

Before the commit, 'ifa' had always been checked before access. After the commit, however, it could be accessed even if it's NULL. Interestingly, this was once fixed in 2003.

http://marc.info/?l=netfilter-devel&m=106668497403047&w=2

In addition to the original one, we have seen the crash when packets that need to be redirected somehow arrive on an interface which hasn't been fully configured yet. This change just reverts the logic to the old behavior to avoid the crash.

Fixes: 8b13eddf ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables")
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
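The guarded access that the message describes can be sketched in user space as follows. This is a simplified model, not the actual patch: `ifaddr_model`, `indev_model` and `first_local_addr` are hypothetical names, and returning 0 for "no address" is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures; only the fields used here. */
struct ifaddr_model {
	uint32_t ifa_local;
};

struct indev_model {
	struct ifaddr_model *ifa_list;
};

/* Return the first local address, or 0 when the device has no in-device
 * data or no address configured yet. Both pointers are checked before
 * the address is read. */
static uint32_t first_local_addr(const struct indev_model *indev)
{
	uint32_t newdst = 0;

	if (indev != NULL && indev->ifa_list != NULL)
		newdst = indev->ifa_list->ifa_local;

	return newdst;
}
```

A caller in this model can treat a zero result as "interface not ready" and decline to redirect the packet instead of dereferencing a NULL pointer.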
-
- 22 Oct, 2015 1 commit
-
-
Committed by Eric Dumazet
iptables -I INPUT ... -j TEE --gateway 10.1.2.3

<crash> because --oif was not specified

tee_tg_check() sets ->priv pointer to NULL in this case.

Fixes: bbde9fc1 ("netfilter: factor out packet duplication for IPv4/IPv6")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 17 Oct, 2015 1 commit
-
-
Committed by Nikolay Borisov
Commit 00590fdd introduced RCU locking in the list type and in doing so introduced a memory allocation in list_set_add, which is done in an atomic context because ipset rcu list modifications are serialised with a spin lock. The reason why we can't use a mutex is that, in addition to being modified by ipset commands, the list is also modified when a particular ipset rule timeout expires, aka garbage collection. This gc is triggered from set_cleanup_entries, which in turn is invoked from a timer, thus requiring the lock to be bh-safe.

Concretely, the following call chain can lead to a "sleeping function called in atomic context" splat:

call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL)

GFP_KERNEL allows initiating direct reclaim, so the allocation path can potentially sleep.

To fix the issue, change the allocation type to GFP_ATOMIC to correctly reflect that it is occurring in an atomic context.

Fixes: 00590fdd ("netfilter: ipset: Introduce RCU locking in list type")
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 13 Oct, 2015 1 commit
-
-
Committed by Florian Westphal
We need to sync packet rx again after flushing the queue entries. Otherwise, the following race could happen:

cpu1: nf_unregister_hook(H) called, H unlinked from lists, calls synchronize_net() to wait for packet rx completion.

The problem is that while no new nf_queue_entry structs that use H can be allocated, another CPU might receive a verdict from userspace just before cpu1 calls nf_queue_nf_hook_drop to remove this entry:

cpu2: receive verdict from userspace, lock queue
cpu2: unlink nf_queue_entry struct E, which references H, from queue list
cpu1: calls nf_queue_nf_hook_drop, blocks on queue spinlock
cpu2: unlock queue
cpu1: nf_queue_nf_hook_drop drops affected queue entries
cpu2: call nf_reinject for E
cpu1: kfree(H)
cpu2: potential use-after-free for H

Cc: Eric W. Biederman <ebiederm@xmission.com>
Fixes: 085db2c0 ("netfilter: Per network namespace netfilter hooks.")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 17 Sep, 2015 1 commit
-
-
Committed by Pablo Neira Ayuso
The nf_log_unregister() function needs to call synchronize_rcu() to make sure that the objects are not dereferenced anymore on module removal.

Fixes: 5962815a ("netfilter: nf_log: use an array of loggers instead of list")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 15 Sep, 2015 1 commit
-
-
Committed by Pablo Neira Ayuso
Fix lookup of existing match/target structures in the corresponding list by skipping the family check if NFPROTO_UNSPEC is used.

Without this fix, one match/target structure is allocated and inserted for each use of them. This not only bloats memory consumption but also severely affects the time to reload the ruleset from the iptables-compat utility.

After this patch, iptables-compat-restore and iptables-compat take almost the same time to reload large rulesets.

Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 14 Sep, 2015 1 commit
-
-
Committed by Florian Westphal
Like nf_log_unset, nf_log_unregister must not reset the list of loggers. Otherwise, a call to nf_log_unregister() will render loggers of other nf protocols unusable:

    iptables -A INPUT -j LOG
    modprobe nf_log_arp ; rmmod nf_log_arp
    iptables -A INPUT -j LOG
    iptables: No chain/target/match by that name

Fixes: 30e0c6a6 ("netfilter: nf_log: prepare net namespace support for loggers")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 10 Sep, 2015 1 commit
-
-
Committed by Daniel Borkmann
When netlink mmap on the receive side is the consumer of nf queue data, it can happen in some edge cases that we write skb shared info into the user space mmap buffer:

Assume a possible rx ring frame size of only 4096, and the network skb, which is being zero-copied into the netlink skb, contains page frags with an overall skb->len larger than the linear part of the netlink skb.

skb_zerocopy(), which is generic and thus not aware of the fact that shared info cannot be accessed for such skbs, then tries to write and fill frags, thus leaking kernel data/pointers and in some corner cases possibly writing out of bounds of the mmap area (when filling the last slot in the ring buffer this way).

I.e. the ring buffer slot is then of status NL_MMAP_STATUS_VALID, has an advertised length larger than 4096, where the linear part is visible at the slot beginning, and the leaked sizeof(struct skb_shared_info) has been written to the beginning of the next slot (also corrupting the struct nl_mmap_hdr slot header incl. status etc), since skb->end points to skb->data + ring->frame_size - NL_MMAP_HDRLEN.

The fix adds and lets __netlink_alloc_skb() take the actual needed linear room for the network skb + meta data into account. It's completely irrelevant for non-mmaped netlink sockets, but in case mmap sockets are used, it can be decided whether the available skb_tailroom() is really large enough for the buffer, or whether it needs to internally fall back to a normal alloc_skb().

From the nf queue side, the information whether the destination port is an mmap RX ring is not really available without an extra port-to-socket lookup, thus it can only be determined in lower layers, i.e. when __netlink_alloc_skb() is called, which checks internally for this.

I chose to add the extra ldiff parameter as mmap will then still work: We have data_len and hlen in nfqnl_build_packet_message(), data_len is the full length (capped at queue->copy_range) for skb_zerocopy() and hlen some possible part of data_len that needs to be copied; the rem_len variable indicates the needed remaining linear mmap space.

The only other workaround in nf queue internally would be after allocation time by e.g. capping the data_len to the skb_tailroom() iff we deal with an mmap skb, but that would 1) expose the fact that we use a mmap skb to upper layers, and 2) trim the skb where we otherwise could just have moved the full skb into the normal receive queue.

After the patch, in my test case the ring slot doesn't fit and therefore shows NL_MMAP_STATUS_COPY, where a full skb carries all the data and thus needs to be picked up via recv().

Fixes: 3ab1f683 ("nfnetlink: add support for memory mapped netlink")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 03 Sep, 2015 1 commit
-
-
Committed by Daniel Borkmann
Fengguang reported that some randconfig generated the following linker issue with the nf_ct_zone_dflt object involved:

    [...]
    CC      init/version.o
    LD      init/built-in.o
    net/built-in.o: In function `ipv4_conntrack_defrag':
    nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt'
    net/built-in.o: In function `ipv6_defrag':
    nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt'
    make: *** [vmlinux] Error 1

Given that configurations exist where we have a built-in part, which is accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user() and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in area when netfilter is configured in general.

Therefore, split the more generic parts into a common header under include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in section that already holds parts related to CONFIG_NF_CONNTRACK in the netfilter core. This fixes the issue on my side.

Fixes: 308ac914 ("netfilter: nf_conntrack: push zone object into functions")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 Sep, 2015 1 commit
-
-
Committed by Daniel Borkmann
Commit 0838aa7f ("netfilter: fix netns dependencies with conntrack templates") migrated templates to the new allocator api, but forgot to update error paths for them in CT and synproxy to use nf_ct_tmpl_free() instead of nf_conntrack_free().

Due to that, memory is being freed into the wrong kmemcache, and we also drop the per net reference count of ct objects, causing an imbalance.

In Brad's case, this leads to a wrap-around of net->ct.count and thus lets __nf_conntrack_alloc() refuse to create a new ct object:

    [ 10.340913] xt_addrtype: ipv6 does not support BROADCAST matching
    [ 10.810168] nf_conntrack: table full, dropping packet
    [ 11.917416] r8169 0000:07:00.0 eth0: link up
    [ 11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    [ 12.815902] nf_conntrack: table full, dropping packet
    [ 15.688561] nf_conntrack: table full, dropping packet
    [ 15.689365] nf_conntrack: table full, dropping packet
    [ 15.690169] nf_conntrack: table full, dropping packet
    [ 15.690967] nf_conntrack: table full, dropping packet
    [...]

With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs. nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus, to fix the problem, export and use nf_ct_tmpl_free() instead.

Fixes: 0838aa7f ("netfilter: fix netns dependencies with conntrack templates")
Reported-by: Brad Jackson <bjackson0971@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 29 Aug, 2015 4 commits
-
-
Committed by Pablo Neira Ayuso
The convention in nfnetlink is to use network byte order in every header field as well as in the attribute payload. The initial version of the batching infrastructure assumes that res_id comes in host byte order though.

The only client of the batching infrastructure is nf_tables, so let's add a workaround to address this inconsistency. We currently have 11 nfnetlink subsystems according to NFNL_SUBSYS_COUNT, so we can assume that the subsystem 2560, i.e. htons(10), will not be allocated anytime soon, so it can be an alias of nf_tables from the nfnetlink batching path when interpreting the res_id field.

Based on original patch from Florian Westphal.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
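The arithmetic behind "2560, i.e. htons(10)" can be checked with a short user-space sketch. The helper names here (`swap16`, `res_id_is_nftables`, `MODEL_SUBSYS_NFTABLES`) are hypothetical, and the explicit byte swap stands in for what htons() does on a little-endian host; this is an illustration of the aliasing idea, not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed nf_tables subsystem id, taken from the htons(10) remark above. */
#define MODEL_SUBSYS_NFTABLES 10

/* Swap the two bytes of a 16-bit value; on a little-endian host this is
 * the transformation htons() performs. */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Accept the nf_tables id in either byte order, so that senders using
 * host byte order and senders using network byte order both match. */
static bool res_id_is_nftables(uint16_t res_id)
{
	return res_id == MODEL_SUBSYS_NFTABLES ||
	       res_id == swap16(MODEL_SUBSYS_NFTABLES);
}
```

The alias is only safe as long as no real subsystem is ever assigned the id 2560, which is the assumption the commit message spells out.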
-
Committed by Elad Raz
Following up on Vinson Lee's post [1], this patch fixes compilation issues found with gcc 4.4.7.

The initialization of the .cidr field of unnamed unions causes a compilation error in gcc 4.4.x.

[1] https://lkml.org/lkml/2015/7/5/74

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Florian Westphal
bridge/netfilter/ebtables.c:290:26: warning: incorrect type in assignment (different modifiers)
-> remove __pure annotation.

ipv6/netfilter/ip6t_SYNPROXY.c:240:27: warning: cast from restricted __be16
-> switch ntohs to htons and vice versa.

netfilter/core.c:391:30: warning: symbol 'nfq_ct_nat_hook' was not declared. Should it be static?
-> delete it, got removed

net/netfilter/nf_synproxy_core.c:221:48: warning: cast to restricted __be32
-> Use __be32 instead of u32.

Tested with objdiff that these changes do not affect generated code.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Jozsef Kadlecsik
Dave Jones reported that KASan detected out of bounds access in hash:net* types:

    [ 23.139532] ==================================================================
    [ 23.146130] BUG: KASan: out of bounds access in hash_net4_add_cidr+0x1db/0x220 at addr ffff8800d4844b58
    [ 23.152937] Write of size 4 by task ipset/457
    [ 23.159742] =============================================================================
    [ 23.166672] BUG kmalloc-512 (Not tainted): kasan: bad access detected
    [ 23.173641] -----------------------------------------------------------------------------
    [ 23.194668] INFO: Allocated in hash_net_create+0x16a/0x470 age=7 cpu=1 pid=456
    [ 23.201836] __slab_alloc.constprop.66+0x554/0x620
    [ 23.208994] __kmalloc+0x2f2/0x360
    [ 23.216105] hash_net_create+0x16a/0x470
    [ 23.223238] ip_set_create+0x3e6/0x740
    [ 23.230343] nfnetlink_rcv_msg+0x599/0x640
    [ 23.237454] netlink_rcv_skb+0x14f/0x190
    [ 23.244533] nfnetlink_rcv+0x3f6/0x790
    [ 23.251579] netlink_unicast+0x272/0x390
    [ 23.258573] netlink_sendmsg+0x5a1/0xa50
    [ 23.265485] SYSC_sendto+0x1da/0x2c0
    [ 23.272364] SyS_sendto+0xe/0x10
    [ 23.279168] entry_SYSCALL_64_fastpath+0x12/0x6f

The bug is fixed in the patch and the testsuite is extended in ipset to check cidr handling more thoroughly.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
- 28 Aug, 2015 2 commits
-
-
Committed by Joe Stringer
Add functions to change connlabel length into nf_conntrack_labels.c so they may be reused by other modules like OVS and nftables without needing to jump through xt_match_check() hoops.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Joe Stringer
The following patches will reuse this code from OVS.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 Aug, 2015 5 commits
-
-
Committed by Pablo Neira Ayuso
Instead of IS_ENABLED(CONFIG_IPV6), otherwise we hit:

    net/built-in.o: In function `tee_tg6':
    >> xt_TEE.c:(.text+0x6cd8c): undefined reference to `nf_dup_ipv6'

when:

    CONFIG_IPV6=y
    CONFIG_NF_DUP_IPV4=y
    # CONFIG_NF_DUP_IPV6 is not set
    CONFIG_NETFILTER_XT_TARGET_TEE=y

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Julian Anastasov
- mcast_group: configure the multicast address, now IPv6 is supported too
- mcast_port: configure the multicast port
- mcast_ttl: configure the multicast TTL/HOP_LIMIT

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
-
Committed by Julian Anastasov
Allow setups with large MTU to send large sync packets by adding a sync_maxlen parameter. The default value is now based on MTU but no more than 1500 for compatibility reasons.

To avoid problems if MTU changes, allow fragmentation by sending packets with DF=0. Problem reported by Dan Carpenter.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
-
Committed by Julian Anastasov
When the sync daemon is started we need to hold the rtnl lock while calling ip_mc_join_group. Currently, we have a wrong locking order because the correct one is rtnl_lock->__ip_vs_mutex. It is implied from the usage of __ip_vs_mutex in ip_vs_dst_event(), which is called under the rtnl lock during NETDEV_* notifications.

Fix the problem by calling rtnl_lock early only for the start_sync_thread call. As a bonus this fixes the usage of __dev_get_by_name, which was not called under the rtnl lock.

This patch actually extends and depends on commit 54ff9ef3 ("ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}").

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
-
Committed by Raducu Deaconu
The weighted overflow scheduling algorithm directs network connections to the server with the highest weight that is currently available and overflows to the next when active connections exceed the node's weight.

Signed-off-by: Raducu Deaconu <rhadoo.io88@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
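The selection rule described above can be sketched as a small user-space function. This is an illustrative model under stated assumptions, not the scheduler's actual code: `server_model` and `pick_overflow` are hypothetical names, a weight of 0 is assumed to mean "do not use", and a server is treated as full once its active connections reach its weight.

```c
#include <stddef.h>

struct server_model {
	int weight;       /* configured weight; 0 is assumed to mean "do not use" */
	int active_conns; /* currently active connections */
};

/* Return the index of the highest-weight server that still has room
 * (active_conns < weight), or -1 if every server is full or disabled. */
static int pick_overflow(const struct server_model *srv, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		/* Skip disabled servers and servers that have reached their weight. */
		if (srv[i].weight <= 0 || srv[i].active_conns >= srv[i].weight)
			continue;
		if (best < 0 || srv[i].weight > srv[best].weight)
			best = (int)i;
	}
	return best;
}
```

With servers of weight 100, 50 and 20, new connections go to the first one until it holds 100 active connections, then spill over to the second, and so on.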
-
- 19 Aug, 2015 1 commit
-
-
Committed by Florian Westphal
Make the payload expression aware of the fact that VLAN offload may have removed a vlan header.

When we encounter a tagged skb, transparently insert the tag into the register so that vlan header matching can work without userspace being aware of offload features.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 18 Aug, 2015 3 commits
-
-
Committed by Tom Herbert
inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates the checksum field carries a pseudo header. This argument should be a boolean instead of an int.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Daniel Borkmann
This work adds the possibility of deriving the zone id from the skb->mark field in a scalable manner. This allows for having only a single template serving hundreds/thousands of different zones, for example, instead of the need to have one match for each zone as an extra CT jump target.

Note that we'd need to have this information attached to the template, because at the time when we're trying to look up a possible ct object, we already need to know zone information for a possible match when going into __nf_conntrack_find_get(). This work provides a minimal implementation for a possible mapping.

In order to not add/expose an extra ct->status bit, the zone structure has been extended to carry a flag for deriving the mark.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Daniel Borkmann
This work adds a direction parameter to netfilter zones, so identity separation can be performed only in original/reply or both directions (default). This basically opens up the possibility of doing NAT with conflicting IP address/port tuples from multiple, isolated tenants on a host (e.g. from a netns) without requiring each tenant to NAT twice or to use its own dedicated IP address to SNAT to, meaning overlapping tuples can be made unique with the zone identifier in original direction, where the NAT engine will then allocate a unique tuple in the commonly shared default zone for the reply direction. In some restricted, local DNAT cases, port redirection could also be used for making the reply traffic unique w/o requiring SNAT.

The consensus we've reached and discussed at NFWS and since the initial implementation [1] was to directly integrate the direction meta data into the existing zones infrastructure, as opposed to the ct->mark approach we proposed initially.

As we pass the nf_conntrack_zone object directly around, we don't have to touch all call-sites, but only those that contain equality checks of zones. Thus, based on the current direction (original or reply), we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID. CT expectations are direction-agnostic entities when expectations are being compared among themselves, so we can only use the identifier in this case.

Note that zone identifiers can not be included into the hash mix anymore as they don't contain a "stable" value that would be equal for both directions at all times, e.g. if only zone->id would unconditionally be xor'ed into the table slot hash, then replies won't find the corresponding conntracking entry anymore.

If no particular direction is specified when configuring zones, the behaviour is exactly as we expect currently (both directions).

Support has been added for the CT netlink interface as well as the x_tables raw CT target, which both already offer existing interfaces to user space for the configuration of zones.

Below a minimal, simplified collision example (script in [2]) with netperf sessions:

    +--- tenant-1 ---+   mark := 1
    |    netperf     |--+
    +----------------+  |                CT zone := mark [ORIGINAL]
     [ip,sport] := X   +--------------+  +--- gateway ---+
                       | mark routing |--|     SNAT      |-- ... +
                       +--------------+  +---------------+       |
    +--- tenant-2 ---+  |                                     ~~~|~~~
    |    netperf     |--+                +-----------+           |
    +----------------+   mark := 2       | netserver |------ ... +
     [ip,sport] := X                     +-----------+
                                          [ip,port] := Y

On the gateway netns, example:

    iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
    iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully

    iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
    iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark

conntrack dump from gateway netns:

    netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns

    tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024 [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

    tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1

    tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1 src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438 [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

    tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2 src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2

Taking this further, the test script in [2] creates 200 tenants and runs original-tuple colliding netperf sessions each. A conntrack -L dump in the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED state as expected.

I also ran various other tests with some permutations of the script, to mention some: SNAT in random/random-fully/persistent mode, no zones (no overlaps), static zones (original, reply, both directions), etc.

[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
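The rule "based on the current direction, return the actual id or the default zone id" can be sketched in user space like this. The names (`zone_model`, `zone_id_for`, the `MODEL_*` constants) are hypothetical, and the default zone id of 0 and the one-bit-per-direction encoding are assumptions made for the illustration.

```c
#include <stdint.h>

/* Assumed value of the default zone id for this model. */
#define MODEL_DEFAULT_ZONE_ID 0

enum model_dir {
	MODEL_DIR_ORIGINAL = 0,
	MODEL_DIR_REPLY = 1
};

/* One bit per direction in which the zone id applies. */
#define MODEL_ZONE_DIR_ORIG (1u << MODEL_DIR_ORIGINAL)
#define MODEL_ZONE_DIR_REPL (1u << MODEL_DIR_REPLY)
#define MODEL_ZONE_DIR_BOTH (MODEL_ZONE_DIR_ORIG | MODEL_ZONE_DIR_REPL)

struct zone_model {
	uint16_t id;  /* zone identifier */
	uint8_t dir;  /* bitmask of directions the id is valid for */
};

/* Return the zone id if it applies to the given direction,
 * otherwise fall back to the default zone id. */
static uint16_t zone_id_for(const struct zone_model *zone, enum model_dir dir)
{
	return (zone->dir & (1u << dir)) ? zone->id : MODEL_DEFAULT_ZONE_ID;
}
```

With a zone restricted to the original direction, two tenants with identical original tuples stay distinct by zone id, while their reply tuples are all compared in the shared default zone, which is why the NAT engine must make the reply tuples unique.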
-
- 11 Aug, 2015 1 commit
-
-
Committed by Daniel Borkmann
This patch replaces the zone id which is pushed down into functions with the actual zone object. It's a bigger one-time change, but needed for later on extending zones with a direction parameter, and thus decoupling this additional information from all call-sites.

No functional changes in this patch.

The default zone becomes a global const object, namely nf_ct_zone_dflt, and will be returned directly in various cases, one being when there's e.g. no zoning support.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 07 Aug, 2015 10 commits
-
-
Committed by Andreas Schultz
- Move the nfnl_acct_list into the network namespace, initialize and destroy it per namespace
- Keep track of refcnt on nfacct objects, the old logic no longer works with a per namespace list
- Adjust xt_nfacct to pass the namespace when registering objects

Signed-off-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
This patch adds a new NFTA_LIMIT_TYPE netlink attribute to indicate the type of limiting.

Contrary to per-packet limiting, the cost is calculated from the packet path since this depends on the packet length.

The burst attribute indicates the number of bytes in which the rate can be exceeded.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
The cost per packet can be calculated from the control plane path since this doesn't ever change.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
This patch adds the burst parameter. This burst indicates the number of packets that can exceed the limit.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
This patch prepares the introduction of per-byte limiting.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
Rework the limit expression to use a token-based limiting approach that refills the bucket gradually. The tokens are calculated at nanosecond granularity instead of jiffies to improve precision.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
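A token bucket that refills gradually at nanosecond granularity can be sketched as below. This is a generic user-space illustration, not the nf_tables limit code: `bucket_model`, `bucket_init` and `bucket_allow` are hypothetical names, tokens are counted directly in nanoseconds of credit, timestamps are assumed to be monotonic, and `rate` is assumed to be non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

struct bucket_model {
	uint64_t tokens;     /* available credit, in nanoseconds */
	uint64_t max_tokens; /* bucket capacity */
	uint64_t cost;       /* credit consumed by one packet */
	uint64_t last_ns;    /* timestamp of the previous update */
};

/* Allow `rate` packets per `unit_ns` nanoseconds, starting with a full bucket. */
static void bucket_init(struct bucket_model *b, uint64_t rate,
			uint64_t unit_ns, uint64_t now_ns)
{
	b->max_tokens = unit_ns;
	b->tokens = unit_ns;
	b->cost = unit_ns / rate;
	b->last_ns = now_ns;
}

/* Refill by the elapsed time (capped at capacity), then try to pay for
 * one packet. Returns true if the packet is within the limit. */
static bool bucket_allow(struct bucket_model *b, uint64_t now_ns)
{
	uint64_t tokens = b->tokens + (now_ns - b->last_ns);

	if (tokens > b->max_tokens)
		tokens = b->max_tokens;
	b->last_ns = now_ns;

	if (tokens < b->cost) {
		b->tokens = tokens;
		return false;
	}
	b->tokens = tokens - b->cost;
	return true;
}
```

Because credit accrues with every elapsed nanosecond rather than once per timer tick, a packet arriving half an interval after the bucket ran dry is judged against exactly half an interval of refill.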
-
Committed by Pablo Neira Ayuso
To prepare introduction of bytes ratelimit support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
Extracted from the xtables TEE target. This creates two new modules for IPv4 and IPv6 that are shared between the TEE target and the new nf_tables dup expressions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
Use IS_ENABLED(CONFIG_NF_CONNTRACK) instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
This patch converts the existing seqlock to per-cpu counters.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 05 Aug, 2015 1 commit
-
-
Committed by Joe Stringer
The flags were ignored for this function when it was introduced. Also fix the style problem in kzalloc.

Fixes: 0838aa7f (netfilter: fix netns dependencies with conntrack templates)
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 30 Jul, 2015 2 commits
-
-
Committed by Dan Carpenter
We recently changed this from nf_conntrack_alloc() to nf_ct_tmpl_alloc(), so the error handling needs to be changed to check for NULL instead of IS_ERR().

Fixes: 0838aa7f ('netfilter: fix netns dependencies with conntrack templates')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
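Why the two checks are not interchangeable can be shown with a user-space imitation of the error-pointer convention. This is a sketch only: `model_is_err` and `MODEL_MAX_ERRNO` are hypothetical stand-ins that mimic the idea of encoding negative errno values in the highest addresses; they are not the kernel's macros.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the address range reserved for encoded errors. */
#define MODEL_MAX_ERRNO 4095

/* A pointer counts as an "error pointer" only if it falls in the top
 * MODEL_MAX_ERRNO addresses. NULL is address 0 and is therefore never
 * reported as an error by this check. */
static bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* An allocator that signals failure with NULL must be checked like this. */
static bool alloc_failed(const void *ptr)
{
	return ptr == NULL;
}
```

In this model a NULL return slips straight past `model_is_err()`, so code that keeps the old style of check after switching allocators would go on to use a NULL pointer.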
-
Committed by Michal Kubeček
Currently the nf_conntrack_proto_sctp module handles only packets between primary addresses used to establish the connection. Any packets between secondary addresses are classified as invalid so that usual firewall configurations drop them. Allowing HEARTBEAT and HEARTBEAT-ACK chunks to establish a new conntrack would allow traffic between secondary addresses to pass through. A more sophisticated solution based on the addresses advertised in the initial handshake (and possibly also later dynamic address addition and removal) would be much harder to implement. Moreover, in general we cannot assume to always see the initial handshake as it can be routed through a different path.

The patch adds two new conntrack states:

  SCTP_CONNTRACK_HEARTBEAT_SENT  - a HEARTBEAT chunk seen but not acked
  SCTP_CONNTRACK_HEARTBEAT_ACKED - a HEARTBEAT acked by HEARTBEAT-ACK

State transition rules:

- HEARTBEAT_SENT responds to usual chunks the same way as NONE (so that the behaviour changes as little as possible)
- HEARTBEAT_ACKED responds to usual chunks the same way as ESTABLISHED does, except the resulting state is HEARTBEAT_ACKED rather than ESTABLISHED
- previously existing states except NONE are preserved when HEARTBEAT or HEARTBEAT-ACK is seen
- NONE (in the initial direction) changes to HEARTBEAT_SENT on HEARTBEAT and to CLOSED on HEARTBEAT-ACK
- HEARTBEAT_SENT changes to HEARTBEAT_ACKED on HEARTBEAT-ACK in the reply direction
- HEARTBEAT_SENT and HEARTBEAT_ACKED are preserved on HEARTBEAT and HEARTBEAT-ACK otherwise

Normally, vtag is set from the INIT chunk for the reply direction and from the INIT-ACK chunk for the originating direction (i.e. each of these defines the vtag value for the opposite direction). For secondary conntracks, we can't rely on seeing INIT/INIT-ACK and even if we have seen them, we would need to connect two different conntracks. Therefore simplified logic is applied: the vtag of the first packet in each direction (HEARTBEAT in the originating and HEARTBEAT-ACK in the reply direction) is saved and all following packets in that direction are compared with this saved value. While INIT and INIT-ACK define the vtag for the opposite direction, vtags extracted from HEARTBEAT and HEARTBEAT-ACK are always for their own direction.

Default timeout values for new states are

  HEARTBEAT_SENT:  30 seconds (default hb_interval)
  HEARTBEAT_ACKED: 210 seconds (hb_interval * path_max_retry + max_rto)

(We cannot expect to see the shutdown sequence so that, unlike ESTABLISHED, the HEARTBEAT_ACKED timeout shouldn't be too long.)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
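The HEARTBEAT-related transition rules listed above can be written down as a small state function. This is a simplified user-space sketch: the enum and function names are hypothetical, only HEARTBEAT and HEARTBEAT-ACK chunks are modelled, a single ESTABLISHED value stands in for all previously existing states, and leaving NONE unchanged in the reply direction is an assumption since the message does not cover that case.

```c
enum hb_state {
	ST_NONE,
	ST_CLOSED,
	ST_ESTABLISHED, /* stands in for any previously existing state */
	ST_HEARTBEAT_SENT,
	ST_HEARTBEAT_ACKED
};

enum hb_chunk { CH_HEARTBEAT, CH_HEARTBEAT_ACK };
enum hb_dir { DIR_ORIGINAL, DIR_REPLY };

/* Next state after seeing a HEARTBEAT or HEARTBEAT-ACK chunk. */
static enum hb_state hb_next(enum hb_state cur, enum hb_chunk chunk,
			     enum hb_dir dir)
{
	/* NONE in the initial direction: HEARTBEAT opens, HEARTBEAT-ACK closes. */
	if (cur == ST_NONE && dir == DIR_ORIGINAL)
		return chunk == CH_HEARTBEAT ? ST_HEARTBEAT_SENT : ST_CLOSED;

	/* A HEARTBEAT-ACK coming back in the reply direction acknowledges it. */
	if (cur == ST_HEARTBEAT_SENT && chunk == CH_HEARTBEAT_ACK &&
	    dir == DIR_REPLY)
		return ST_HEARTBEAT_ACKED;

	/* Every other combination preserves the current state. */
	return cur;
}
```

For example, a HEARTBEAT in the original direction followed by a HEARTBEAT-ACK in the reply direction moves a fresh entry from NONE through HEARTBEAT_SENT to HEARTBEAT_ACKED, while the same chunks leave an ESTABLISHED entry untouched.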
-