Commits · d4ef38354120d873f5db14ca6e13d051ef4ab068 · openanolis / cloud-kernel

08 Apr, 2017 1 commit

netfilter: Remove exceptional & on function name · d4ef3835

Authored by Arushi Singhal on Apr 02, 2017

Remove & from function pointers to conform to the style found elsewhere
in the file. Done using the following semantic patch

// <smpl>
@r@
identifier f;
@@

f(...) { ... }
@@
identifier r.f;
@@

- &f
+ f
// </smpl>
Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d4ef3835
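As a small user-space illustration of why the `&` is redundant (the handler and type names below are hypothetical, not taken from the netfilter sources): in C, a function name used as a value already converts to a pointer to that function, so both spellings store the same address.

```c
/* Hypothetical callback type and handler, for illustration only. */
typedef int (*handler_fn)(int);

static int double_it(int x)
{
	return 2 * x;
}

/* A function name used as a value converts to a pointer to the
 * function, so the explicit & adds nothing. */
static handler_fn with_amp = &double_it;
static handler_fn without_amp = double_it;

/* Returns 1 when both spellings refer to the same function. */
static int same_handler(void)
{
	return with_amp == without_amp;
}
```

Calling `without_amp(21)` invokes `double_it` exactly as `with_amp(21)` would; the patch is therefore purely stylistic.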

07 Apr, 2017 10 commits

net: netfilter: Use list_{next/prev}_entry instead of list_entry · cbbb40e2

Authored by simran singhal on Mar 29, 2017

This patch replaces list_entry with list_prev_entry, as it makes the
code clearer to read.
Signed-off-by: simran singhal <singhalsimran0@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cbbb40e2
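The readability gain can be seen in a minimal user-space sketch. The macros below are simplified copies of the kernel list helpers (using `__typeof__`, a GNU extension), and `struct item` is a hypothetical element type rather than anything from the patched files.

```c
#include <stddef.h>

/* Simplified user-space versions of the kernel's list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_prev_entry(pos, member) \
	list_entry((pos)->member.prev, __typeof__(*(pos)), member)

/* Hypothetical list element. */
struct item {
	int id;
	struct list_head node;
};

/* Old style: the caller spells out the link pointer and the type. */
static struct item *prev_old_style(struct item *it)
{
	return list_entry(it->node.prev, struct item, node);
}

/* New style: the type is derived from the position pointer. */
static struct item *prev_new_style(struct item *it)
{
	return list_prev_entry(it, node);
}
```

Both functions return the same element; the second form simply avoids repeating the container type at every call site.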

netfilter: Use seq_puts()/seq_putc() where possible · cdec2685

Authored by simran singhal on Mar 29, 2017

For strings without format specifiers, use seq_puts(). For
seq_printf("\n"), use seq_putc('\n').
Signed-off-by: simran singhal <singhalsimran0@gmail.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cdec2685

netfilter: Remove unnecessary cast on void pointer · 68ad546a

Authored by simran singhal on Mar 29, 2017

The following Coccinelle script was used to detect this:
@r@
expression x;
void* e;
type T;
identifier f;
@@
(
  *((T *)e)
|
  ((T *)x)[...]
|
  ((T*)x)->f
|

- (T*)
  e
)

Unnecessary parentheses are also removed.
Signed-off-by: simran singhal <singhalsimran0@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

68ad546a
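A brief sketch of the pattern being cleaned up, using a hypothetical `struct entry`: C converts `void *` to any object pointer type implicitly, so the cast in a plain assignment is noise. The first three alternatives in the Coccinelle script exist to skip the cases where the cast is still required, namely when the pointer is dereferenced, indexed, or used with `->` directly.

```c
/* Hypothetical private data type, for illustration only. */
struct entry {
	int id;
};

/* Before: the cast on the void pointer is redundant in C. */
static int id_with_cast(void *priv)
{
	struct entry *e = (struct entry *)priv;

	return e->id;
}

/* After: the implicit conversion from void * does the same job. */
static int id_without_cast(void *priv)
{
	struct entry *e = priv;

	return e->id;
}
```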

netfilter: Add nfnl_msg_type() helper function · dedb67c4

Authored by Pablo Neira Ayuso on Mar 28, 2017

Add and use the nfnl_msg_type() function to replace the open-coded nfnetlink
message type. I suggested this change; Arushi Singhal made an initial
patch to address this, but it missed several spots.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

dedb67c4

netfilter: ctnetlink: Expectations must have a conntrack helper area · 2c62e0bc

Authored by Gao Feng on Mar 28, 2017

The expect check function __nf_ct_expect_check() requires master_help,
so there is no point in going ahead in ctnetlink_alloc_expect
when there is no help.

Actually, commit bc01befd ("netfilter: ctnetlink: add support for
user-space expectation helpers") permitted ctnetlink to create an expect
even when there is no master help, but the later commit 3d058d7b
("netfilter: rework user-space expectation helper support") disabled it
again.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2c62e0bc

netfilter: nat: avoid use of nf_conn_nat extension · 6e699867

Authored by Florian Westphal on Mar 28, 2017

A successful insert into the bysource hash sets the IPS_SRC_NAT_DONE status
bit, so we can check that instead of the presence of the nat extension,
which requires an extra dereference.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6e699867

netfilter: nat: nf_nat_mangle_{udp,tcp}_packet returns boolean · cba81cc4

Authored by Gao Feng on Mar 27, 2017

nf_nat_mangle_{udp,tcp}_packet() returns int. However, it is used as
a bool in many spots. Fix this by consistently handling the return
value as a boolean.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cba81cc4

netfilter: nf_ct_expect: Add nf_ct_remove_expect() · ec0e3f01

Authored by Gao Feng on Mar 27, 2017

Removing one expect takes three statements, and that sequence is
duplicated in several places in the current code. Add one common function,
nf_ct_remove_expect, to consolidate this.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ec0e3f01

netfilter: expect: Make sure the max_expected limit is effective · 92f73221

Authored by Gao Feng on Mar 24, 2017

Because expecting, a member of nf_conn_help, is a u8, it
overflows after reaching U8_MAX (255). So the limit does not work when
max_expected is configured above 255 in the expect policy.

Now add a check for max_expected and return -EINVAL when it exceeds
the limit.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

92f73221
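The wrap-around and the added bound check can be sketched in user space as follows. `EXPECT_MAX_CNT` and both function names are hypothetical stand-ins; only the 255 limit and the -EINVAL return come from the commit message.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed name for the largest count an 8-bit counter can hold. */
#define EXPECT_MAX_CNT 255

/* An 8-bit counter wraps to 0 after 255, so a limit above 255 can
 * never be reached by comparing against it. */
static uint8_t bump_expecting(uint8_t expecting)
{
	return (uint8_t)(expecting + 1);
}

/* Reject policies whose limit cannot be represented by the counter. */
static int check_max_expected(unsigned int max_expected)
{
	if (max_expected > EXPECT_MAX_CNT)
		return -EINVAL;
	return 0;
}
```

With this check in place, a policy asking for 256 expectations is refused up front instead of silently never taking effect.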

netfilter: nf_tables: add nft_is_base_chain() helper · f323d954

Authored by Pablo Neira Ayuso on Mar 20, 2017

This new helper function allows us to check whether a chain is a base chain.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

f323d954

29 Mar, 2017 1 commit

netfilter: nfnetlink_queue: fix secctx memory leak · 77c1c03c

Authored by Liping Zhang on Mar 28, 2017

We must call security_release_secctx to free the memory returned by
security_secid_to_secctx, otherwise memory may be leaked forever.

Fixes: ef493bd9 ("netfilter: nfnetlink_queue: add security context information")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

77c1c03c

27 Mar, 2017 3 commits

netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister · 9c3f3794

Authored by Liping Zhang on Mar 25, 2017

If one cpu is doing nf_ct_extend_unregister while another cpu is doing
__nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover,
there's no synchronize_rcu invocation after setting nf_ct_ext_types[id] to
NULL, so it's possible that we may access an invalid pointer.

But actually, most of the ct extends are built-in, so the problem listed
above will not happen. However, there are two exceptions: NF_CT_EXT_NAT
and NF_CT_EXT_SYNPROXY.

For _EXT_NAT, the panic will not happen, since adding the nat extend and
unregistering the nat extend are located in the same file (nf_nat_core.c);
this means that after the nat module is removed, we cannot add the nat
extend either.

For _EXT_SYNPROXY, synproxy extend may be added by init_conntrack, while
synproxy extend unregister will be done by synproxy_core_exit. So after
nf_synproxy_core.ko is removed, we may still try to add the synproxy
extend, then kernel panic may happen.

I know it's very hard to reproduce this issue, but I can play a tricky
game to make it happen very easily :)

Step 1. Enable SYNPROXY for tcp dport 1234 at FORWARD hook:
  # iptables -I FORWARD -p tcp --dport 1234 -j SYNPROXY
Step 2. Queue the syn packet to userspace at the raw table OUTPUT hook.
        Also note that in userspace we only add a 20s delay, then
        reinject the syn packet to the kernel:
  # iptables -t raw -I OUTPUT -p tcp --syn -j NFQUEUE --queue-num 1
Step 3. Use "nc 2.2.2.2 1234" to connect to the server.
Step 4. Now remove nf_synproxy_core.ko quickly:
  # iptables -F FORWARD
  # rmmod ipt_SYNPROXY
  # rmmod nf_synproxy_core
Step 5. After the 20s delay, the syn packet is reinjected to the kernel.

Now you will see the panic like this:
  kernel BUG at net/netfilter/nf_conntrack_extend.c:91!
  Call Trace:
   ? __nf_ct_ext_add_length+0x53/0x3c0 [nf_conntrack]
   init_conntrack+0x12b/0x600 [nf_conntrack]
   nf_conntrack_in+0x4cc/0x580 [nf_conntrack]
   ipv4_conntrack_local+0x48/0x50 [nf_conntrack_ipv4]
   nf_reinject+0x104/0x270
   nfqnl_recv_verdict+0x3e1/0x5f9 [nfnetlink_queue]
   ? nfqnl_recv_verdict+0x5/0x5f9 [nfnetlink_queue]
   ? nla_parse+0xa0/0x100
   nfnetlink_rcv_msg+0x175/0x6a9 [nfnetlink]
   [...]

One possible solution is to make NF_CT_EXT_SYNPROXY extend built-in, i.e.
introduce nf_conntrack_synproxy.c and only do ct extend register and
unregister in it, similar to nf_conntrack_timeout.c.

But having such an obscure restriction on nf_ct_extend_unregister is not a
good idea, so we should invoke synchronize_rcu after setting nf_ct_ext_types
to NULL, and check for the NULL pointer in __nf_ct_ext_add_length. Then
it will be easier if we add a new ct extend in the future.

Last, we use kfree_rcu to free nf_ct_ext, so rcu_barrier() is no longer
necessary; remove it too.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

9c3f3794

netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table · 83d90219

Authored by Liping Zhang on Mar 25, 2017

The nf_ct_helper_hash table is protected by nf_ct_helper_mutex, while
nfct_helper operation is protected by nfnl_lock(NFNL_SUBSYS_CTHELPER).
So it's possible that one CPU is walking the nf_ct_helper_hash for
cthelper add/get/del while another CPU is doing nf_conntrack_helpers_unregister
at the same time. This is dangerous, and may cause a use-after-free error.

Note, the delete operation will flush all cthelpers added via nfnetlink, so
protecting the walk with RCU is not easy.

Now introduce a dummy list to record all the cthelpers added via
nfnetlink, then we can walk the dummy list instead of walking the
nf_ct_helper_hash. Also, keep nfnl_cthelper_dump_table unchanged, since it
may be invoked without nfnl_lock(NFNL_SUBSYS_CTHELPER) held.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

83d90219

netfilter: invoke synchronize_rcu after set the _hook_ to NULL · 3b7dabf0

Authored by Liping Zhang on Mar 25, 2017

Otherwise, another CPU may access the invalid pointer. For example:
    CPU0                CPU1
     -              rcu_read_lock();
     -              pfunc = _hook_;
  _hook_ = NULL;          -
  mod unload              -
     -                 pfunc(); // invalid, panic
     -             rcu_read_unlock();

So we must call synchronize_rcu() to wait for the RCU readers to finish.

Also note, in nf_nat_snmp_basic_fini, synchronize_rcu() will be invoked
by the later nf_conntrack_helper_unregister, but I'm inclined to add an
explicit synchronize_rcu after setting nf_nat_snmp_hook to NULL. Depending
on such obscure assumptions is not a good idea.

Last, in nfnetlink_cttimeout, we use kfree_rcu to free the time object,
so in cttimeout_exit, invoking rcu_barrier() is not necessary at all;
remove it too.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3b7dabf0

22 Mar, 2017 2 commits

netfilter: nfnl_cthelper: Fix memory leak · f83bf8da

Authored by Jeffy Chen on Mar 21, 2017

We have memory leaks of nf_conntrack_helper & expect_policy.
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

f83bf8da

netfilter: nfnl_cthelper: fix runtime expectation policy updates · 2c422257

Authored by Pablo Neira Ayuso on Mar 21, 2017

We only allow runtime updates of expectation policies for timeout and
maximum number of expectations, otherwise reject the update.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Liping Zhang <zlpnobody@gmail.com>

2c422257

21 Mar, 2017 1 commit

netfilter: nfnl_cthelper: fix incorrect helper->expect_class_max · ae5c6821

Authored by Liping Zhang on Mar 19, 2017

The helper->expect_class_max must be set to the total number of
expect_policy minus 1, since we will use the statement "if (class >
helper->expect_class_max)" to validate the CTA_EXPECT_CLASS attr in
ctnetlink_alloc_expect.

So for compatibility, set the helper->expect_class_max to the
NFCTH_POLICY_SET_NUM attr's value minus 1.

Also: it's invalid when the NFCTH_POLICY_SET_NUM attr's value is zero.
1. this will result in "expect_policy = kzalloc(0, GFP_KERNEL);";
2. we cannot set the helper->expect_class_max to a proper value.

So if nla_get_be32(tb[NFCTH_POLICY_SET_NUM]) is zero, report -EINVAL to
userspace.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ae5c6821
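The off-by-one reasoning above can be checked with a small sketch. The function names are hypothetical; the "count minus 1" rule, the `class > expect_class_max` comparison and the -EINVAL for a zero count come from the commit message.

```c
#include <errno.h>

/* Value to store in expect_class_max for a given number of
 * policies, or -EINVAL when the count is zero. */
static int class_max_from_count(unsigned int policy_count)
{
	if (policy_count == 0)
		return -EINVAL;
	return (int)(policy_count - 1);
}

/* Mirrors the validation "if (class > helper->expect_class_max)":
 * returns 1 when the class index is acceptable. */
static int class_is_valid(unsigned int class_id, unsigned int class_max)
{
	return class_id <= class_max;
}
```

With three policies the stored maximum is 2, so class indexes 0, 1 and 2 pass and index 3 is rejected. Storing the raw count instead would wrongly accept index 3.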

20 Mar, 2017 1 commit

netfilter: fix the warning on unused refcount variable · 4485a841

Authored by Reshetova, Elena on Mar 20, 2017

net/netfilter/nfnetlink_acct.c: In function 'nfnl_acct_try_del':
net/netfilter/nfnetlink_acct.c:329:15: warning: unused variable 'refcount' [-Wunused-variable]
unsigned int refcount;
             ^

Fixes: b54ab92b ("netfilter: refcounter conversions")
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

4485a841

17 Mar, 2017 1 commit

netfilter: refcounter conversions · b54ab92b

Authored by Reshetova, Elena on Mar 16, 2017

The refcount_t type and corresponding API (see include/linux/refcount.h)
should be used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b54ab92b
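To illustrate the idea behind the conversion, here is a deliberately simplified, non-atomic user-space sketch of a saturating reference counter. It is not the kernel's refcount_t implementation; the `sketch_` names are hypothetical and the real API is atomic. The point is only that a counter which saturates instead of wrapping can never fall back to zero through overflow, so the object is leaked rather than freed while still in use.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, non-atomic stand-in for a reference counter. */
struct sketch_refcount {
	unsigned int refs;
};

/* Take a reference unless the object is already dead (count 0).
 * At UINT_MAX the counter saturates instead of wrapping to 0. */
static bool sketch_refcount_inc_not_zero(struct sketch_refcount *r)
{
	if (r->refs == 0)
		return false;
	if (r->refs != UINT_MAX)
		r->refs++;
	return true;
}

/* Drop a reference; returns true when the last one is gone.
 * A saturated or already-zero counter is left untouched. */
static bool sketch_refcount_dec_and_test(struct sketch_refcount *r)
{
	if (r->refs == 0 || r->refs == UINT_MAX)
		return false;
	return --r->refs == 0;
}
```

A plain wrapping counter incremented past its maximum would reach zero and trigger a premature free; the saturating variant trades that use-after-free for a bounded leak.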

16 Mar, 2017 2 commits

ipvs: remove an annoying printk in netns init · 864e91ca

Authored by Cong Wang on Dec 09, 2016

At most it is used for debugging purposes, but I don't think
it is even useful for debugging, so just remove it.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

864e91ca

netfilter: nft_ct: do cleanup work when NFTA_CT_DIRECTION is invalid · 4494dbc6

Authored by Liping Zhang on Mar 15, 2017

We should jump to invoke __nft_ct_set_destroy() instead of just
returning an error.

Fixes: edee4f1e ("netfilter: nft_ct: add zone id set support")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

4494dbc6

14 Mar, 2017 4 commits

netfilter: nft_set_rbtree: use per-set rwlock to improve the scalability · 03e5fd0e

Authored by Liping Zhang on Mar 12, 2017

Karel Rericha reported that in his test case, ICMP packets going through
boxes had normally about 5ms latency. But when running nft, actually
listing the sets with interval flags, latency would go up to 30-100ms.
This was observed when router throughput is from 600Mbps to 2Gbps.

This is because we use a single global spinlock to protect the whole
rbtree sets, so "dumping sets" will race with the "key lookup" inevitably.
But actually they are all _readers_, so it's ok to convert the spinlock
to rwlock to avoid competition between them. Also use per-set rwlock since
each set is independent.
Reported-by: Karel Rericha <karel@unitednetworks.cz>
Tested-by: Karel Rericha <karel@unitednetworks.cz>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

03e5fd0e

netfilter: limit: use per-rule spinlock to improve the scalability · 2cb4bbd7

Authored by Liping Zhang on Mar 11, 2017

The limit tokens of each rule are independent, so there's no
need to use a global spinlock.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2cb4bbd7

netfilter: nf_conntrack: reduce resolve_normal_ct args · fc09e4a7

Authored by Florian Westphal on Mar 09, 2017

Also mark init_conntrack noinline; in most cases resolve_normal_ct will
find an existing conntrack entry.

   text    data     bss     dec     hex filename
  16735    5707     176   22618    585a net/netfilter/nf_conntrack_core.o
  16687    5707     176   22570    582a net/netfilter/nf_conntrack_core.o
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

fc09e4a7

Revert "netfilter: nf_tables: add flush field to struct nft_set_iter" · 04166f48

Authored by Pablo Neira Ayuso on Mar 13, 2017

This reverts commit 1f48ff6c.

This patch is no longer required now that we keep a dummy list of
set elements in the bitmap set implementation, so revert this before
we forget this code has no clients.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

04166f48

13 Mar, 2017 7 commits

netfilter: nft_fib: Support existence check · 055c4b34

Authored by Phil Sutter on Mar 10, 2017

Instead of the actual interface index or name, set destination register
to just 1 or 0 depending on whether the lookup succeeded or not if
NFTA_FIB_F_PRESENT was set in userspace.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

055c4b34

netfilter: nft_ct: add helper set support · 1a64edf5

Authored by Florian Westphal on Mar 08, 2017

This allows assigning connection tracking helpers to
connections via the nft objref infrastructure.

The idea is to first specify a helper object:

 table ip filter {
    ct helper some-name {
      type "ftp"
      protocol tcp
      l3proto ip
    }
 }

and then assign it via

nft add ... ct helper set "some-name"

Helper assignment works for new conntracks only, as we cannot expand the
conntrack extension area once it has been committed to the main conntrack
table.

ipv4 and ipv6 protocols are stored separately so
we can also handle families that observe both ipv4 and ipv6 traffic.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

1a64edf5

netfilter: provide nft_ctx in object init function · 84fba055

Authored by Florian Westphal on Mar 08, 2017

This is needed by the upcoming ct helper object type:
we'd like to be able to use the table family (ip, ip6, inet) to figure
out which helper has to be requested.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

84fba055

netfilter: nft_set_bitmap: keep a list of dummy elements · e920dde5

Authored by Pablo Neira Ayuso on Mar 10, 2017

Element comments may come without any prior set flag, so we have to keep
a list of dummy struct nft_set_ext to keep this information around. This
is only useful for set dumps to userspace. From the packet path, this
set type relies on the bitmap representation. This patch simplifies the
logic since we don't need to allocate the dummy nft_set_ext structure
anymore on the fly at the cost of increasing memory consumption because
of the list of dummy struct nft_set_ext.

Fixes: 665153ff ("netfilter: nf_tables: add bitmap set type")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

e920dde5

netfilter: Force fake conntrack entry to be at least 8 bytes aligned · 170a1fb9

Authored by Steven Rostedt (VMware) on Mar 11, 2017

Since the nfct and nfctinfo have been combined, the nf_conn structure
must be at least 8 bytes aligned, as the 3 LSB bits are used for the
nfctinfo. But there's a fake nf_conn structure to denote untracked
connections, which is created by a PER_CPU construct. This does not
guarantee that it will be 8 bytes aligned and can break the logic in
determining the correct nfctinfo.

I triggered this on a 32bit machine with the following error:

BUG: unable to handle kernel NULL pointer dereference at 00000af4
IP: nf_ct_deliver_cached_events+0x1b/0xfb
*pdpt = 0000000031962001 *pde = 0000000000000000

Oops: 0000 [#1] SMP
[Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 crc_ccitt ppdev r8169 parport_pc parport
  OK  ]
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-test+ #75
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
task: c126ec00 task.stack: c1258000
EIP: nf_ct_deliver_cached_events+0x1b/0xfb
EFLAGS: 00010202 CPU: 0
EAX: 0021cd01 EBX: 00000000 ECX: 27b0c767 EDX: 32bcb17a
ESI: f34135c0 EDI: f34135c0 EBP: f2debd60 ESP: f2debd3c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 80050033 CR2: 00000af4 CR3: 309a0440 CR4: 001406f0
Call Trace:
 <SOFTIRQ>
 ? ipv6_skip_exthdr+0xac/0xcb
 ipv6_confirm+0x10c/0x119 [nf_conntrack_ipv6]
 nf_hook_slow+0x22/0xc7
 nf_hook+0x9a/0xad [ipv6]
 ? ip6t_do_table+0x356/0x379 [ip6_tables]
 ? ip6_fragment+0x9e9/0x9e9 [ipv6]
 ip6_output+0xee/0x107 [ipv6]
 ? ip6_fragment+0x9e9/0x9e9 [ipv6]
 dst_output+0x36/0x4d [ipv6]
 NF_HOOK.constprop.37+0xb2/0xba [ipv6]
 ? icmp6_dst_alloc+0x2c/0xfd [ipv6]
 ? local_bh_enable+0x14/0x14 [ipv6]
 mld_sendpack+0x1c5/0x281 [ipv6]
 ? mark_held_locks+0x40/0x5c
 mld_ifc_timer_expire+0x1f6/0x21e [ipv6]
 call_timer_fn+0x135/0x283
 ? detach_if_pending+0x55/0x55
 ? mld_dad_timer_expire+0x3e/0x3e [ipv6]
 __run_timers+0x111/0x14b
 ? mld_dad_timer_expire+0x3e/0x3e [ipv6]
 run_timer_softirq+0x1c/0x36
 __do_softirq+0x185/0x37c
 ? test_ti_thread_flag.constprop.19+0xd/0xd
 do_softirq_own_stack+0x22/0x28
 </SOFTIRQ>
 irq_exit+0x5a/0xa4
 smp_apic_timer_interrupt+0x2a/0x34
 apic_timer_interrupt+0x37/0x3c

By using DEFINE/DECLARE_PER_CPU_ALIGNED we can enforce at least 8 byte
alignment, as all cache line sizes are at least 8 bytes.

Fixes: a9e419dc ("netfilter: merge ctinfo into nfct pointer storage area")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

170a1fb9
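Why the alignment matters can be shown with a user-space sketch of pointer tagging. All names here are hypothetical, and the packing helpers are an illustration of the general technique rather than the kernel's accessors: when the low 3 bits of a pointer carry the ctinfo value, the scheme only round-trips if those bits of the address are zero, that is, if the object is 8-byte aligned.

```c
#include <stdint.h>

#define SKETCH_CTINFO_MASK ((uintptr_t)7) /* low 3 bits hold the ctinfo */

/* Hypothetical stand-in for the connection structure. */
struct sketch_conn {
	char pad[16];
};

/* Forced to 8-byte alignment, which is the property the fix enforces. */
static _Alignas(8) struct sketch_conn untracked_conn;

/* Combine pointer and ctinfo into one word. */
static uintptr_t pack_conn(const void *conn, unsigned int ctinfo)
{
	return (uintptr_t)conn | (ctinfo & SKETCH_CTINFO_MASK);
}

/* Recover the pointer by clearing the low 3 bits. */
static void *unpack_conn(uintptr_t v)
{
	return (void *)(v & ~SKETCH_CTINFO_MASK);
}

/* Recover the ctinfo from the low 3 bits. */
static unsigned int unpack_ctinfo(uintptr_t v)
{
	return (unsigned int)(v & SKETCH_CTINFO_MASK);
}
```

For an address that is only 4-byte aligned, bit 2 of the pointer leaks into the ctinfo field and the unpacked pointer differs from the original. That matches the failure mode described for the per-CPU fake conntrack on 32-bit machines.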

netfilter: nf_tables: fix mismatch in big-endian system · 10596608

Authored by Liping Zhang on Mar 08, 2017

Currently, there are two different methods to store a u16 integer to
the u32 data register. For example:
  u32 *dest = &regs->data[priv->dreg];
  1. *dest = 0; *(u16 *) dest = val_u16;
  2. *dest = val_u16;

For method 1, the u16 value will be stored like this, either in
big-endian or little-endian system:
  0          15           31
  +-+-+-+-+-+-+-+-+-+-+-+-+
  |   Value   |     0     |
  +-+-+-+-+-+-+-+-+-+-+-+-+

For method 2, in little-endian system, the u16 value will be the same
as listed above. But in big-endian system, the u16 value will be stored
like this:
  0          15           31
  +-+-+-+-+-+-+-+-+-+-+-+-+
  |     0     |   Value   |
  +-+-+-+-+-+-+-+-+-+-+-+-+

So when we later use "memcmp(&regs->data[priv->sreg], data, 2);" to do the
compare in nft_cmp, nft_lookup expr ..., method 2 will get the wrong
result on big-endian systems, as bits 0~15 will always be zero.

For a similar reason, when loading a u16 value from the u32 data
register, we should use "*(u16 *) sreg;" instead of "(u16)*sreg;";
the 2nd method will get the wrong value on big-endian systems.

So introduce some wrapper functions to store/load a u8 or u16
integer to/from the u32 data register, and use them in the right
places.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

10596608
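The two store methods can be compared in a small user-space sketch. The function names are hypothetical and `memcpy` stands in for the `*(u16 *)dest` store to stay within strict aliasing rules. Method 1 always places the value in the first two bytes of the register, so a 2-byte `memcmp` matches on any byte order; method 2 only does so on little-endian machines.

```c
#include <stdint.h>
#include <string.h>

/* Method 1: zero the register, then write the u16 into its first
 * two bytes in memory. */
static void store_u16_method1(uint32_t *dest, uint16_t val)
{
	*dest = 0;
	memcpy(dest, &val, sizeof(val));
}

/* Method 2: plain integer assignment; on big-endian machines the
 * value ends up in the last two bytes instead. */
static void store_u16_method2(uint32_t *dest, uint16_t val)
{
	*dest = val;
}

/* The 2-byte comparison style quoted in the commit message. */
static int first_two_bytes_match(const uint32_t *reg, uint16_t val)
{
	return memcmp(reg, &val, 2) == 0;
}
```

On a little-endian host both methods pass `first_two_bytes_match()`. On a big-endian host only method 1 does, which is the mismatch the wrapper functions are meant to remove.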

netfilter: nft_set_bitmap: fetch the element key based on the set->klen · fd89b23a

Authored by Liping Zhang on Mar 06, 2017

Currently we just assume the element key as a u32 integer, regardless of
the set key length.

This is incorrect, for example, the tcp port number is only 16 bits.
So when we use the nft_payload expr to get the tcp dport and store
it to dreg, the dport will be stored at 0~15 bits, and 16~31 bits
will be padded with zero.

So the reg->data[dreg] will look like this:
  0          15           31
  +-+-+-+-+-+-+-+-+-+-+-+-+
  | tcp dport |      0    |
  +-+-+-+-+-+-+-+-+-+-+-+-+
But on big-endian systems, if we treat this register as a u32
integer, the element key will be larger than 65535, so the following
lookup in the bitmap set will cause an out-of-bounds access.

Another issue is that if we add an element with comments to a bitmap
set (although the comments will be ignored eventually), the element will
vanish strangely. Because we treat the element key as a u32 integer,
the comments become part of the element key, so the element
key will also be larger than 65535 and an out-of-bounds access will happen:
  # nft add element t s { 1 comment test }

Since set->klen is 1 or 2, it's fine to treat the element key as a u8 or
u16 integer.

Fixes: 665153ff ("netfilter: nf_tables: add bitmap set type")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

fd89b23a
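A minimal sketch of fetching the key according to its length, assuming a hypothetical helper name and a return value of -1 for unsupported lengths. Only the first `klen` bytes of the register are read, so whatever follows them (padding or comment data) can no longer inflate the key beyond 65535.

```c
#include <stdint.h>
#include <string.h>

/* Read a 1- or 2-byte key from the start of a 32-bit register.
 * Returns the key, or -1 for a key length this sketch does not
 * handle. */
static int key_from_reg(const uint32_t *reg, unsigned int klen)
{
	uint8_t k8;
	uint16_t k16;

	switch (klen) {
	case 1:
		memcpy(&k8, reg, sizeof(k8));
		return k8;
	case 2:
		memcpy(&k16, reg, sizeof(k16));
		return k16;
	default:
		return -1;
	}
}
```

Because the result is bounded by the key width, it can be used as a bitmap index without the out-of-bounds access described above.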

09 Mar, 2017 1 commit

netfilter: nf_nat_sctp: fix ICMP packet to be dropped accidently · 8e05ba7f

Authored by Ying Xue on Mar 04, 2017

According to RFC 792, the first 64 bits of the original SCTP datagram's
data can be contained in an ICMP packet, such as:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             unused                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Internet Header + 64 bits of Original Data Datagram      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

However, according to RFC 4960, SCTP datagram header is as below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Source Port Number        |     Destination Port Number   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Verification Tag                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Checksum                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This means that only the first three fields of the SCTP header (the two port
numbers and the Verification Tag) can be carried in an ICMP packet; the
Checksum field cannot.

At present in sctp_manip_pkt(), no matter whether the packet is ICMP or
not, it always calculates the SCTP packet checksum. However, not only is the
calculation of the checksum unnecessary for ICMP, but it also causes
another fatal issue: the ICMP packet is dropped. The header size of
SCTP is used to identify whether the writeable length of skb is bigger
than skb->len through skb_make_writable() in sctp_manip_pkt(). But
when it deals with an ICMP packet, skb_make_writable() directly returns
false as the writeable length of skb is bigger than skb->len.
Subsequently the ICMP packet is dropped.

Now we correct this misbehavior. When sctp_manip_pkt() handles an ICMP
packet, 8 bytes rather than the whole SCTP header size is used to check
whether the writeable length of skb overflows. Meanwhile, as it's meaningless
to calculate the checksum when the packet is ICMP, the computation of the
checksum is skipped as well.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8e05ba7f
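The length logic can be sketched in user space as below. The struct and function names are hypothetical and this is not the actual sctp_manip_pkt() code; it only models the decision the message describes: require the full 12-byte header when the packet is long enough, otherwise fall back to the 8 bytes that an ICMP error is guaranteed to carry, and skip the checksum in that case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of the SCTP common header (12 bytes), per RFC 4960. */
struct sketch_sctphdr {
	uint16_t source;
	uint16_t dest;
	uint32_t vtag;
	uint32_t checksum;
};

/* How many header bytes must be writable: the full header when the
 * packet contains it, otherwise just ports plus verification tag. */
static size_t sctp_writable_len(size_t skb_len, size_t hdroff)
{
	size_t hdrsize = 8;

	if (skb_len >= hdroff + sizeof(struct sketch_sctphdr))
		hdrsize = sizeof(struct sketch_sctphdr);
	return hdrsize;
}

/* The checksum can only be updated when the whole header is present. */
static bool should_update_checksum(size_t hdrsize)
{
	return hdrsize >= sizeof(struct sketch_sctphdr);
}
```

For example, an ICMP payload holding a 20-byte inner IP header plus 8 bytes of SCTP data (28 bytes in total) yields a required length of 8, so the writability check no longer fails and the packet is not dropped.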

07 Mar, 2017 5 commits

netfilter: nf_tables: add nft_set_lookup() · c7a72e3f

Authored by Pablo Neira Ayuso on Mar 06, 2017

This patch consolidates set lookup via either name or ID by
introducing a new nft_set_lookup() function. Replace existing spots
where we can use this too.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c7a72e3f

netfilter: nf_tables: validate the expr explicitly after init successfully · c56e3956

Authored by Liping Zhang on Mar 05, 2017

When we want to validate the expr's dependency or hooks, we must do two
things to accomplish it. First, write an X_validate callback function
and point ->validate to it. Second, call X_validate in the init routine.
This is very common, such as in the fib, nat, reject exprs and so on ...

It is a little ugly to call X_validate in the expr's init
routine; it's better to do it in nf_tables_newexpr, so we can avoid
doing this again and again. After doing this, the second step listed above
is no longer needed, so remove those calls now.

Patch was tested by nftables/tests/py/nft-test.py and
nftables/tests/shell/run-tests.sh.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c56e3956

netfilter: nft_hash: support of symmetric hash · 3206cade

Authored by Laura Garcia Liebana on Mar 02, 2017

This patch provides symmetric hash support according to source
ip address and port, and destination ip address and port.

For this purpose, the __skb_get_hash_symmetric() is used to
identify the flow as it uses FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL
flag by default.

The new attribute NFTA_HASH_TYPE has been included to support
different types of hashing functions. Currently supported are
NFT_HASH_JENKINS through jhash and NFT_HASH_SYM through symhash.

The main differences between the two types are:
 - jhash requires an expression with sreg, symhash doesn't.
 - symhash supports modulus and offset, but not seed.

Examples:

 nft add rule ip nat prerouting ct mark set jhash ip saddr mod 2
 nft add rule ip nat prerouting ct mark set symhash mod 2

By default, the Jenkins hash will be used if no hash type is
provided, for compatibility reasons.
Signed-off-by: Laura Garcia Liebana <laura.garcia@zevenet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3206cade
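What "symmetric" means here can be illustrated with a toy hash. It is not the kernel's __skb_get_hash_symmetric(); the struct, the mixing step and all names are assumptions made for the sketch. The two endpoints are put into a canonical order before mixing, so both directions of a flow produce the same value, and a modulus and offset are then applied as in the symhash examples.

```c
#include <stdint.h>

/* Hypothetical flow key: addresses and ports of both endpoints. */
struct sketch_flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* One FNV-1a style mixing step over a 32-bit word (illustrative). */
static uint32_t sketch_mix(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 16777619u;
	return h;
}

/* Order the two endpoints canonically, then hash, so swapping
 * source and destination does not change the result. */
static uint32_t sketch_symhash(const struct sketch_flow *f)
{
	uint32_t a1 = f->saddr, a2 = f->daddr;
	uint16_t p1 = f->sport, p2 = f->dport;
	uint32_t h = 2166136261u;

	if (a1 > a2 || (a1 == a2 && p1 > p2)) {
		uint32_t ta = a1;
		uint16_t tp = p1;

		a1 = a2; a2 = ta;
		p1 = p2; p2 = tp;
	}
	h = sketch_mix(h, a1);
	h = sketch_mix(h, p1);
	h = sketch_mix(h, a2);
	h = sketch_mix(h, p2);
	return h;
}

/* Apply modulus and offset; the caller must pass modulus > 0. */
static uint32_t sketch_symhash_mod(const struct sketch_flow *f,
				   uint32_t modulus, uint32_t offset)
{
	return sketch_symhash(f) % modulus + offset;
}
```

With `mod 2`, request and reply packets of the same connection land in the same bucket, which is what makes such a hash useful for setting a consistent mark per flow.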

netfilter: nft_hash: rename nft_hash to nft_jhash · 511040ee

Authored by Laura Garcia Liebana on Feb 23, 2017

This patch renames the local nft_hash structure and functions
to nft_jhash in order to prepare the nft_hash module code to
add new hash functions.
Signed-off-by: Laura Garcia Liebana <laura.garcia@zevenet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

511040ee

netfilter: nft_exthdr: Allow checking TCP option presence, too · 3c1fece8

Authored by Phil Sutter on Feb 20, 2017

Honor NFT_EXTHDR_F_PRESENT flag so we check if the TCP option is
present.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3c1fece8

03 Mar, 2017 1 commit

netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails · 25e94a99

Authored by Pablo Neira Ayuso on Mar 01, 2017

The underlying nlmsg_multicast() already sets sk->sk_err for us to
notify socket overruns, so we should not do anything with this return
value. So we just call nfnetlink_set_err() if:

1) We fail to allocate the netlink message.

or

2) We don't have enough space in the netlink message to place attributes,
   which means that we likely need to allocate a larger message.

Before this patch, the internal ESRCH netlink error code was propagated
to userspace, which is quite misleading. Netlink semantics mandate that
listeners just hit ENOBUFS if the socket buffer overruns.
Reported-by: Alexander Alemayhu <alexander@alemayhu.com>
Tested-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
25e94a99
