1. 07 Aug, 2015 3 commits
  2. 30 Jul, 2015 1 commit
    • netfilter: nf_ct_sctp: minimal multihoming support · d7ee3519
      Committed by Michal Kubeček
      Currently nf_conntrack_proto_sctp module handles only packets between
      primary addresses used to establish the connection. Any packets between
      secondary addresses are classified as invalid so that usual firewall
      configurations drop them. Allowing HEARTBEAT and HEARTBEAT-ACK chunks to
      establish a new conntrack would allow traffic between secondary
      addresses to pass through. A more sophisticated solution based on the
      addresses advertised in the initial handshake (and possibly also later
      dynamic address addition and removal) would be much harder to implement.
      Moreover, in general we cannot assume that we always see the initial
      handshake, as it can be routed through a different path.
      
      The patch adds two new conntrack states:
      
        SCTP_CONNTRACK_HEARTBEAT_SENT  - a HEARTBEAT chunk seen but not acked
        SCTP_CONNTRACK_HEARTBEAT_ACKED - a HEARTBEAT acked by HEARTBEAT-ACK
      
      State transition rules:
      
      - HEARTBEAT_SENT responds to usual chunks the same way as NONE (so that
        the behaviour changes as little as possible)
      - HEARTBEAT_ACKED responds to usual chunks the same way as ESTABLISHED
        does, except the resulting state is HEARTBEAT_ACKED rather than
        ESTABLISHED
      - previously existing states except NONE are preserved when HEARTBEAT or
        HEARTBEAT-ACK is seen
      - NONE (in the initial direction) changes to HEARTBEAT_SENT on HEARTBEAT
        and to CLOSED on HEARTBEAT-ACK
      - HEARTBEAT_SENT changes to HEARTBEAT_ACKED on HEARTBEAT-ACK in the
        reply direction
      - HEARTBEAT_SENT and HEARTBEAT_ACKED are preserved on HEARTBEAT and
        HEARTBEAT-ACK otherwise
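
The heartbeat-related rules above can be sketched as a small transition function. This is only an illustrative model, not the kernel's transition table: the `State` enum lists a subset of the conntrack states, `next_state` is a hypothetical helper, and only HEARTBEAT and HEARTBEAT-ACK chunks are handled. NONE in the reply direction is not described in the rules, so the sketch rejects it.

```python
from enum import Enum


class State(Enum):
    # Subset of conntrack states, enough to exercise the rules above.
    NONE = "NONE"
    CLOSED = "CLOSED"
    ESTABLISHED = "ESTABLISHED"
    HEARTBEAT_SENT = "HEARTBEAT_SENT"
    HEARTBEAT_ACKED = "HEARTBEAT_ACKED"


ORIGINAL, REPLY = "original", "reply"
HEARTBEAT, HEARTBEAT_ACK = "HEARTBEAT", "HEARTBEAT-ACK"


def next_state(state, direction, chunk):
    """Return the state after a HEARTBEAT or HEARTBEAT-ACK chunk."""
    if chunk not in (HEARTBEAT, HEARTBEAT_ACK):
        raise ValueError("only heartbeat chunks are modelled here")
    if state is State.NONE:
        if direction != ORIGINAL:
            raise ValueError("NONE in the reply direction is not described")
        # NONE: HEARTBEAT starts tracking, HEARTBEAT-ACK goes to CLOSED.
        return State.HEARTBEAT_SENT if chunk == HEARTBEAT else State.CLOSED
    if (state is State.HEARTBEAT_SENT and direction == REPLY
            and chunk == HEARTBEAT_ACK):
        return State.HEARTBEAT_ACKED
    # Every other combination preserves the current state.
    return state
```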
      
      Normally, vtag is set from the INIT chunk for the reply direction and
      from the INIT-ACK chunk for the originating direction (i.e. each of
      these defines vtag value for the opposite direction). For secondary
      conntracks, we can't rely on seeing INIT/INIT-ACK and even if we have
      seen them, we would need to connect two different conntracks. Therefore
      simplified logic is applied: vtag of first packet in each direction
      (HEARTBEAT in the originating and HEARTBEAT-ACK in reply direction) is
      saved and all following packets in that direction are compared with this
      saved value. While INIT and INIT-ACK define vtag for the opposite
      direction, vtags extracted from HEARTBEAT and HEARTBEAT-ACK are always
      for their direction.
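
The simplified vtag logic for secondary conntracks might look roughly like the sketch below. `SecondaryConntrack` and `check_vtag` are hypothetical names used for illustration; the point is only that each direction stores the vtag of its own first packet and compares later packets against it.

```python
class SecondaryConntrack:
    """Toy model of a conntrack created from a HEARTBEAT chunk."""

    def __init__(self):
        # One saved vtag per direction; None means "nothing seen yet".
        self.vtag = {"original": None, "reply": None}

    def check_vtag(self, direction, vtag):
        """Save the first vtag seen in a direction, then require it to match."""
        saved = self.vtag[direction]
        if saved is None:
            # First packet in this direction defines the vtag for this
            # same direction (unlike INIT / INIT-ACK).
            self.vtag[direction] = vtag
            return True
        return saved == vtag
```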
      
      Default timeout values for new states are
      
        HEARTBEAT_SENT: 30 seconds (default hb_interval)
        HEARTBEAT_ACKED: 210 seconds (hb_interval * path_max_retry + max_rto)
      
      (We cannot expect to see the shutdown sequence so that, unlike
      ESTABLISHED, the HEARTBEAT_ACKED timeout shouldn't be too long.)
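
As a worked check of the HEARTBEAT_ACKED formula: the text gives hb_interval as 30 seconds but does not state the other two inputs. The values 5 for path_max_retry and 60 seconds for max_rto are assumed here because they reproduce the 210 second figure.

```python
HB_INTERVAL = 30     # seconds, default hb_interval as stated above
PATH_MAX_RETRY = 5   # assumed value, not stated in the text
MAX_RTO = 60         # seconds, assumed value, not stated in the text

heartbeat_sent_timeout = HB_INTERVAL
heartbeat_acked_timeout = HB_INTERVAL * PATH_MAX_RETRY + MAX_RTO
```
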
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 23 Jul, 2015 3 commits
    • netfilter: rename local nf_hook_list to hook_list · 3bbd14e0
      Committed by Pablo Neira Ayuso
      085db2c0 ("netfilter: Per network namespace netfilter hooks.") introduced a
      new nf_hook_list that is global, so let's avoid this overlap.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • netfilter: fix possible removal of wrong hook · 7181ebaf
      Committed by Pablo Neira Ayuso
      nf_unregister_net_hook() uses the nf_hook_ops fields as a tuple to look up
      the corresponding hook in the list. However, we may have two hooks with exactly
      the same configuration.
      
      This shouldn't be a problem for nftables since every new chain has a unique
      priv field set, but this may still cause us problems in the future, so better
      address this problem now by keeping a reference to the original nf_hook_ops
      structure to make sure we delete the right hook from nf_unregister_net_hook().
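
The difference between looking a hook up by its field values and by a reference to the original structure can be illustrated with a toy model. `HookOps`, `HookEntry` and both `unregister_*` helpers are hypothetical stand-ins, not the kernel's data structures.

```python
from dataclasses import dataclass, astuple


@dataclass
class HookOps:
    """Hypothetical stand-in for the fields of nf_hook_ops."""
    hook: str
    pf: int
    hooknum: int
    priority: int


@dataclass
class HookEntry:
    ops: HookOps       # private copy made at registration time
    orig_ops: HookOps  # reference to the caller's original structure


def register(entries, reg):
    entries.append(HookEntry(ops=HookOps(*astuple(reg)), orig_ops=reg))


def unregister_by_fields(entries, reg):
    """Remove the first entry whose copied fields equal reg's fields."""
    for i, entry in enumerate(entries):
        if entry.ops == reg:
            return entries.pop(i)
    return None


def unregister_by_reference(entries, reg):
    """Remove the entry that was created from this exact structure."""
    for i, entry in enumerate(entries):
        if entry.orig_ops is reg:
            return entries.pop(i)
    return None
```

With two hooks that share the same configuration, the field-based lookup returns whichever entry comes first in the list, while the reference-based lookup returns the entry that belongs to the caller.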
      
      Fixes: 085db2c0 ("netfilter: Per network namespace netfilter hooks.")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • netfilter: nf_queue: fix nf_queue_nf_hook_drop() · 2385eb0c
      Committed by Pablo Neira Ayuso
      This function reacquires the rtnl_lock() which is already held by
      nf_unregister_hook().
      
      This can be triggered via: modprobe nf_conntrack_ipv4 && rmmod nf_conntrack_ipv4
      
      [  720.628746] INFO: task rmmod:3578 blocked for more than 120 seconds.
      [  720.628749]       Not tainted 4.2.0-rc2+ #113
      [  720.628752] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  720.628754] rmmod           D ffff8800ca46fd58     0  3578   3571 0x00000080
      [...]
      [  720.628783] Call Trace:
      [  720.628790]  [<ffffffff8152ea0b>] schedule+0x6b/0x90
      [  720.628795]  [<ffffffff8152ecb3>] schedule_preempt_disabled+0x13/0x20
      [  720.628799]  [<ffffffff8152ff55>] mutex_lock_nested+0x1f5/0x380
      [  720.628803]  [<ffffffff81462622>] ? rtnl_lock+0x12/0x20
      [  720.628807]  [<ffffffff81462622>] ? rtnl_lock+0x12/0x20
      [  720.628812]  [<ffffffff81462622>] rtnl_lock+0x12/0x20
      [  720.628817]  [<ffffffff8148ab25>] nf_queue_nf_hook_drop+0x15/0x160
      [  720.628825]  [<ffffffff81488d48>] nf_unregister_net_hook+0x168/0x190
      [  720.628831]  [<ffffffff81488e24>] nf_unregister_hook+0x64/0x80
      [  720.628837]  [<ffffffff81488e60>] nf_unregister_hooks+0x20/0x30
      [...]
      
      Moreover, nf_unregister_net_hook() should only destroy the queue for this
      netns, not for every netns.
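
The hang in the trace above comes from taking a non-recursive lock that the caller already holds. A minimal userspace analogy, assuming `threading.Lock` as a stand-in for the rtnl mutex and hypothetical function names, uses a non-blocking acquire so the demonstration cannot hang the way the kernel path does.

```python
import threading

rtnl = threading.Lock()  # stand-in for a non-recursive mutex


def queue_drop_relocking():
    """Try to take the lock again, as the buggy drop path did."""
    got = rtnl.acquire(blocking=False)  # a blocking acquire would hang here
    if got:
        rtnl.release()
    return got


def unregister_hook(drop):
    """Call the drop function while already holding the lock."""
    with rtnl:
        return drop()
```

Calling `unregister_hook(queue_drop_relocking)` returns `False` because the inner acquire cannot succeed while the outer one is held; with a blocking acquire the task would wait forever.
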
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Fixes: 085db2c0 ("netfilter: Per network namespace netfilter hooks.")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
  4. 22 Jul, 2015 2 commits
  5. 20 Jul, 2015 2 commits
    • netfilter: fix netns dependencies with conntrack templates · 0838aa7f
      Committed by Pablo Neira Ayuso
      Quoting Daniel Borkmann:
      
      "When adding connection tracking template rules to a netns, f.e. to
      configure netfilter zones, the kernel will endlessly busy-loop as soon
      as we try to delete the given netns in case there's at least one
      template present, which is problematic i.e. if there is such bravery that
      the privileged user inside the netns is assumed untrusted.
      
      Minimal example:
      
        ip netns add foo
        ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
        ip netns del foo
      
      What happens is that when nf_ct_iterate_cleanup() is being called from
      nf_conntrack_cleanup_net_list() for a provided netns, we always end up
      with a net->ct.count > 0 and thus jump back to i_see_dead_people. We
      don't get a soft-lockup as we still have a schedule() point, but the
      serving CPU spins on 100% from that point onwards.
      
      Since templates are normally allocated with nf_conntrack_alloc(), we
      also bump net->ct.count. The issue why they are not yet nf_ct_put() is
      because the per netns .exit() handler from x_tables (which would eventually
      invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is
      called in the dependency chain at a *later* point in time than the per
      netns .exit() handler for the connection tracker.
      
      This is clearly a chicken'n'egg problem: after the connection tracker
      .exit() handler, we've torn down all the connection tracking
      infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be
      invoked at a later point in time during the netns cleanup, as that would
      lead to a use-after-free. At the same time, we cannot make x_tables depend
      on the connection tracker module, so that the xt_ct_tg_destroy() would
      be invoked earlier in the cleanup chain."
      
      Daniel confirms this has to do with the order in which modules are loaded, or
      with nf_conntrack being compiled as a module while x_tables is built in. So we
      have no guarantees regarding the order in which netns callbacks are executed.
      
      Fix this by allocating the templates through kmalloc() from the respective
      SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache.
      Then, release them via nf_ct_tmpl_free() from destroy_conntrack(). This branch
      is marked as unlikely since conntrack templates are rarely allocated and only
      from the configuration plane path.
      
      Note that templates are not kept in any list to avoid further dependencies with
      nf_conntrack anymore, thus, the tmpl larval list is removed.
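
The busy loop and its fix can be modelled with a simple counter. This is a deliberately simplified sketch: `cleanup_passes` is a hypothetical helper, each pass is assumed to release every ordinary entry, and the retry loop is bounded so the example terminates.

```python
def cleanup_passes(ordinary, templates, templates_counted, max_passes=10):
    """Return how many cleanup passes drain the count, or None if it never drains."""
    # Templates only contribute to the count when they are allocated
    # through the same path as ordinary conntrack entries.
    count = ordinary + (templates if templates_counted else 0)
    for n in range(1, max_passes + 1):
        count -= ordinary  # one pass releases every ordinary entry
        ordinary = 0
        if count == 0:
            return n
        # Otherwise the real code jumps back and tries again.
    return None
```

With one counted template, `cleanup_passes(3, 1, True)` never drains; once templates no longer bump the count, `cleanup_passes(3, 1, False)` finishes on the first pass.
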
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Tested-by: Daniel Borkmann <daniel@iogearbox.net>
    • netfilter: Fix memory leak in nf_register_net_hook · e317fa50
      Committed by Eric W. Biederman
      In the rare case that an attempt is made to use a per network device
      netfilter hook and the network device does not exist, the newly allocated
      structure can leak.
      
      Be a good citizen and free the newly allocated structure in the error
      handling code.
      
      Fixes: 085db2c0 ("netfilter: Per network namespace netfilter hooks.")
      Reported-by: kbuild@01.org
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  6. 16 Jul, 2015 6 commits
    • netfilter: add and use jump label for xt_tee · dcebd315
      Committed by Florian Westphal
      Don't bother testing if we need to switch to alternate stack
      unless TEE target is used.
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xtables: don't save/restore jumpstack offset · 7814b6ec
      Committed by Florian Westphal
      In most cases there is no reentrancy into ip/ip6tables.
      
      For skbs sent by REJECT or SYNPROXY targets, there is one level
      of reentrancy, but it's not relevant as those targets issue an absolute
      verdict, i.e. the jumpstack can be clobbered since it's not used
      after the target issues an absolute verdict (ACCEPT, DROP, STOLEN, etc).
      
      So the only special case where it is relevant is the TEE target, which
      returns XT_CONTINUE.
      
      This patch changes ip(6)_do_table to always use the jump stack starting
      from 0.
      
      When we detect we're operating on an skb sent via TEE (percpu
      nf_skb_duplicated is 1) we switch to an alternate stack to leave
      the original one alone.
      
      Since there is no TEE support for arptables, it doesn't need to
      test if tee is active.
      
      The jump stack overflow tests are no longer needed as well --
      since ->stacksize is the largest call depth we cannot exceed it.
      
      A much better alternative to the external jumpstack would be to just
      declare a jumps[32] stack on the local stack frame, but that would mean
      we'd have to reject iptables rulesets that used to work before.
      
      Another alternative would be to start rejecting rulesets with a larger
      call depth, e.g. 1000 -- in this case it would be feasible to allocate the
      entire stack in the percpu area which would avoid one dereference.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: move tee_active to core · e7c8899f
      Committed by Florian Westphal
      This prepares for a TEE like expression in nftables.
      We want to ensure only one duplicate is sent, so both will
      use the same percpu variable to detect duplication.
      
      The other use case is detection of recursive calls to xtables, but since
      we don't want a dependency from nft on the xtables core, it is put into
      core.c instead of the x_tables core.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xtables: compute exact size needed for jumpstack · 98d1bd80
      Committed by Florian Westphal
      The {arp,ip,ip6tables} jump stack is currently sized based
      on the number of user chains.
      
      However, it's rather unlikely that every user-defined chain jumps to the
      next, so let's use the existing loop detection logic to also track the
      chain depths.
      
      The stacksize is then set to the largest chain depth seen.
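
To illustrate the sizing idea, the sketch below computes the largest call depth over a jump graph and compares it with the number of user chains. The `jumps` mapping and `max_call_depth` are hypothetical; the kernel tracks depth while walking the ruleset rather than building a dictionary like this.

```python
def max_call_depth(jumps, entry_points):
    """Longest chain of nested jumps reachable from the entry points.

    jumps maps a chain name to the chains it jumps to.
    """
    memo = {}
    visiting = set()

    def depth(chain):
        if chain in memo:
            return memo[chain]
        if chain in visiting:
            raise ValueError("loop detected at chain %r" % chain)
        visiting.add(chain)
        # Each jump needs one stack slot on top of the target's own needs.
        d = max((1 + depth(t) for t in jumps.get(chain, [])), default=0)
        visiting.discard(chain)
        memo[chain] = d
        return d

    return max((depth(c) for c in entry_points), default=0)


# Four user chains (a, b, c, d), but the deepest path is INPUT -> a -> c.
jumps = {"INPUT": ["a", "b"], "a": ["c"], "b": [], "c": [], "d": []}
stacksize = max_call_depth(jumps, ["INPUT"])
```

Sizing by the number of user chains would reserve four slots here, while the deepest path needs only two.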
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nftables: Only run the nftables chains in the proper netns · fd2ecda0
      Committed by Eric W. Biederman
      - Register the nftables chains in the network namespace that they need
        to run in.
      
      - Remove the hacks that stopped chains running in the wrong network
        namespace.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: Per network namespace netfilter hooks. · 085db2c0
      Committed by Eric W. Biederman
      - Add a new set of functions for registering and unregistering per
        network namespace hooks.
      
      - Modify the old global namespace hook functions to use the per
        network namespace hooks in their implementation, so there remains a
        single list that needs to be walked for any hook (this is important
        for keeping the hook priority working and for keeping the code
        walking the hooks simple).
      
      - Only allow registering the per netdevice hooks in the network
        namespace where the network device lives.
      
      - Dynamically allocate the structures in the per network namespace
        hook list in nf_register_net_hook, and unregister them in
        nf_unregister_net_hook.
      
        Dynamic allocation is required somewhere as the number of network
        namespaces is not fixed, so we might as well allocate them in the
        registration function.
      
        The chain of registered hooks on any list is expected to be small so
        the cost of walking that list to find the entry we are unregistering
        should also be small.
      
        Performing the management of the dynamically allocated list entries
        in the registration and unregistration functions keeps the complexity
        from spreading.
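
A rough model of the scheme described above: each namespace owns one priority-ordered list per hook, entries are created at registration time, and the old global registration is expressed through the per-namespace one. All names here (`Netns`, `register_net_hook`, `register_hook`, `unregister_net_hook`) are illustrative Python stand-ins, not the kernel functions.

```python
class Netns:
    def __init__(self, name):
        self.name = name
        self.hooks = {}  # hooknum -> list of entries, ordered by priority


def register_net_hook(net, hooknum, priority, fn):
    """Allocate an entry and insert it before the first higher priority."""
    entries = net.hooks.setdefault(hooknum, [])
    entry = {"priority": priority, "fn": fn}
    pos = next((i for i, e in enumerate(entries)
                if priority < e["priority"]), len(entries))
    entries.insert(pos, entry)
    return entry


def unregister_net_hook(net, hooknum, entry):
    """Walk the (short) list and drop the matching entry."""
    entries = net.hooks[hooknum]
    for i, e in enumerate(entries):
        if e is entry:
            del entries[i]
            return True
    return False


def register_hook(all_netns, hooknum, priority, fn):
    """Global registration: register the hook in every namespace."""
    return [(net, register_net_hook(net, hooknum, priority, fn))
            for net in all_netns]
```

Because global and per-namespace hooks end up in the same list, walking one list per hook still visits everything in priority order.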
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  7. 15 Jul, 2015 3 commits
  8. 14 Jul, 2015 6 commits
  9. 13 Jul, 2015 1 commit
    • netfilter: IDLETIMER: fix lockdep warning · 484836ec
      Committed by Dmitry Torokhov
      Dynamically allocated sysfs attributes should be initialized with
      sysfs_attr_init(), otherwise lockdep will be angry with us:
      
      [   45.468653] BUG: key ffffffc030fad4e0 not in .data!
      [   45.468655] ------------[ cut here ]------------
      [   45.468666] WARNING: CPU: 0 PID: 1176 at /mnt/host/source/src/third_party/kernel/v3.18/kernel/locking/lockdep.c:2991 lockdep_init_map+0x12c/0x490()
      [   45.468672] DEBUG_LOCKS_WARN_ON(1)
      [   45.468672] CPU: 0 PID: 1176 Comm: iptables Tainted: G     U  W 3.18.0 #43
      [   45.468674] Hardware name: XXX
      [   45.468675] Call trace:
      [   45.468680] [<ffffffc0002072b4>] dump_backtrace+0x0/0x10c
      [   45.468683] [<ffffffc0002073d0>] show_stack+0x10/0x1c
      [   45.468688] [<ffffffc000a86cd4>] dump_stack+0x74/0x94
      [   45.468692] [<ffffffc000217ae0>] warn_slowpath_common+0x84/0xb0
      [   45.468694] [<ffffffc000217b84>] warn_slowpath_fmt+0x4c/0x58
      [   45.468697] [<ffffffc0002530a4>] lockdep_init_map+0x128/0x490
      [   45.468701] [<ffffffc000367ef0>] __kernfs_create_file+0x80/0xe4
      [   45.468704] [<ffffffc00036862c>] sysfs_add_file_mode_ns+0x104/0x170
      [   45.468706] [<ffffffc00036870c>] sysfs_create_file_ns+0x58/0x64
      [   45.468711] [<ffffffc000930430>] idletimer_tg_checkentry+0x14c/0x324
      [   45.468714] [<ffffffc00092a728>] xt_check_target+0x170/0x198
      [   45.468717] [<ffffffc000993efc>] check_target+0x58/0x6c
      [   45.468720] [<ffffffc000994c64>] translate_table+0x30c/0x424
      [   45.468723] [<ffffffc00099529c>] do_ipt_set_ctl+0x144/0x1d0
      [   45.468728] [<ffffffc0009079f0>] nf_setsockopt+0x50/0x60
      [   45.468732] [<ffffffc000946870>] ip_setsockopt+0x8c/0xb4
      [   45.468735] [<ffffffc0009661c0>] raw_setsockopt+0x10/0x50
      [   45.468739] [<ffffffc0008c1550>] sock_common_setsockopt+0x14/0x20
      [   45.468742] [<ffffffc0008bd190>] SyS_setsockopt+0x88/0xb8
      [   45.468744] ---[ end trace 41d156354d18c039 ]---
      Signed-off-by: Dmitry Torokhov <dtor@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  10. 10 Jul, 2015 2 commits
  11. 02 Jul, 2015 2 commits
  12. 23 Jun, 2015 2 commits
    • netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook · 8405a8ff
      Committed by Eric W. Biederman
      Add code to nf_unregister_hook to flush the nf_queue when a hook is
      unregistered.  This guarantees that the pointer that the nf_queue code
      retains into the nf_hook list will remain valid while a packet is
      queued.
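
The idea of flushing on unregister can be sketched as follows. `Hook` and `Core` are hypothetical classes used only to show the invariant: after a hook is unregistered, no queued packet still refers to it.

```python
class Hook:
    def __init__(self, name):
        self.name = name


class Core:
    def __init__(self):
        self.hooks = []
        self.queue = []  # (packet, hook) pairs; each packet remembers its hook

    def enqueue(self, packet, hook):
        self.queue.append((packet, hook))

    def unregister_hook(self, hook):
        """Remove the hook and drop every queued packet that points at it."""
        self.hooks.remove(hook)
        dropped = [p for p, h in self.queue if h is hook]
        self.queue = [(p, h) for p, h in self.queue if h is not hook]
        return dropped
```

Without the flush, a packet reinjected later would follow a reference to a hook that no longer exists.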
      
      I tested what would happen if we do not flush queued packets and was
      trivially able to obtain the oops below.  All that was required was
      to stop the nf_queue listening process, to delete all of the nf_tables,
      and to awaken the nf_queue listening process.
      
      > BUG: unable to handle kernel paging request at 0000000100000001
      > IP: [<0000000100000001>] 0x100000001
      > PGD b9c35067 PUD 0
      > Oops: 0010 [#1] SMP
      > Modules linked in:
      > CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted
      > task: ffff8800b9c8c050 ti: ffff8800ba9d8000 task.ti: ffff8800ba9d8000
      > RIP: 0010:[<0000000100000001>]  [<0000000100000001>] 0x100000001
      > RSP: 0018:ffff8800ba9dba40  EFLAGS: 00010a16
      > RAX: ffff8800bab48a00 RBX: ffff8800ba9dba90 RCX: ffff8800ba9dba90
      > RDX: ffff8800b9c10128 RSI: ffff8800ba940900 RDI: ffff8800bab48a00
      > RBP: ffff8800b9c10128 R08: ffffffff82976660 R09: ffff8800ba9dbb28
      > R10: dead000000100100 R11: dead000000200200 R12: ffff8800ba940900
      > R13: ffffffff8313fd50 R14: ffff8800b9c95200 R15: 0000000000000000
      > FS:  00007fb91fc34700(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      > CR2: 0000000100000001 CR3: 00000000babfb000 CR4: 00000000000007f0
      > Stack:
      >  ffffffff8206ab0f ffffffff82982240 ffff8800bab48a00 ffff8800b9c100a8
      >  ffff8800b9c10100 0000000000000001 ffff8800ba940900 ffff8800b9c10128
      >  ffffffff8206bd65 ffff8800bfb0d5e0 ffff8800bab48a00 0000000000014dc0
      > Call Trace:
      >  [<ffffffff8206ab0f>] ? nf_iterate+0x4f/0xa0
      >  [<ffffffff8206bd65>] ? nf_reinject+0x125/0x190
      >  [<ffffffff8206dee5>] ? nfqnl_recv_verdict+0x255/0x360
      >  [<ffffffff81386290>] ? nla_parse+0x80/0xf0
      >  [<ffffffff8206c42c>] ? nfnetlink_rcv_msg+0x13c/0x240
      >  [<ffffffff811b2fec>] ? __memcg_kmem_get_cache+0x4c/0x150
      >  [<ffffffff8206c2f0>] ? nfnl_lock+0x20/0x20
      >  [<ffffffff82068159>] ? netlink_rcv_skb+0xa9/0xc0
      >  [<ffffffff820677bf>] ? netlink_unicast+0x12f/0x1c0
      >  [<ffffffff82067ade>] ? netlink_sendmsg+0x28e/0x650
      >  [<ffffffff81fdd814>] ? sock_sendmsg+0x44/0x50
      >  [<ffffffff81fde07b>] ? ___sys_sendmsg+0x2ab/0x2c0
      >  [<ffffffff810e8f73>] ? __wake_up+0x43/0x70
      >  [<ffffffff8141a134>] ? tty_write+0x1c4/0x2a0
      >  [<ffffffff81fde9f4>] ? __sys_sendmsg+0x44/0x80
      >  [<ffffffff823ff8d7>] ? system_call_fastpath+0x12/0x6a
      > Code:  Bad RIP value.
      > RIP  [<0000000100000001>] 0x100000001
      >  RSP <ffff8800ba9dba40>
      > CR2: 0000000100000001
      > ---[ end trace 08eb65d42362793f ]---
      
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nftables: Do not run chains in the wrong network namespace · fdab6a4c
      Committed by Eric W. Biederman
      Currently nf_tables chains added in one network namespace are being
      run in all network namespaces.  The issues are myriad, the simplest
      being that an unprivileged user can cause any network packets to be dropped.
      
      Address this by simply not running nf_tables chains in the wrong
      network namespace.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 19 Jun, 2015 2 commits
  14. 18 Jun, 2015 2 commits
    • netfilter: xt_socket: add XT_SOCKET_RESTORESKMARK flag · 01555e74
      Committed by Harout Hedeshian
      xt_socket is useful for matching sockets with IP_TRANSPARENT and
      taking some action on the matching packets. However, it lacks the
      ability to match only a small subset of transparent sockets.
      
      Suppose there are 2 applications, each with its own set of transparent
      sockets. The first application wants all matching packets dropped,
      while the second application wants them forwarded somewhere else.
      
      Add the ability to restore the skb->mark from the sk_mark. The mark
      is only restored if a matching socket is found and the transparent /
      nowildcard conditions are satisfied.
      
      Now the 2 hypothetical applications can differentiate their sockets
      based on a mark value set with SO_MARK.
      
      iptables -t mangle -I PREROUTING -m socket --transparent \
                                                 --restore-skmark -j action
      iptables -t mangle -A action -m mark --mark 10 -j action2
      iptables -t mangle -A action -m mark --mark 11 -j action3
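
The matching behaviour described above can be modelled in a few lines. `socket_match` is a hypothetical function, the socket is represented as a plain dictionary, and the nowildcard condition is left out for brevity.

```python
def socket_match(skb_mark, sk, want_transparent=False, restore_skmark=False):
    """Return (matched, resulting skb mark).

    sk is None when no socket was found, otherwise a dict with the keys
    'transparent' (bool) and 'mark' (int, as set with SO_MARK).
    """
    if sk is None:
        return False, skb_mark
    if want_transparent and not sk["transparent"]:
        return False, skb_mark
    if restore_skmark:
        # Only restore the mark once every condition has been satisfied.
        return True, sk["mark"]
    return True, skb_mark
```

A packet for the first application's socket (mark 10) and one for the second (mark 11) both match, but leave with different marks that later rules can tell apart.
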
      Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nfnetlink_queue: add security context information · ef493bd9
      Committed by Roman Kubiak
      This patch adds an additional attribute when sending
      packet information via netlink in netfilter_queue module.
      It will send additional security context data, so that
      userspace applications can verify this context against
      their own security databases.
      Signed-off-by: Roman Kubiak <r.kubiak@samsung.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  15. 16 Jun, 2015 3 commits