1. 08 4月, 2017 1 次提交
  2. 07 4月, 2017 10 次提交
  3. 29 3月, 2017 1 次提交
  4. 27 3月, 2017 3 次提交
    • L
      netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister · 9c3f3794
      Liping Zhang 提交于
      If one cpu is doing nf_ct_extend_unregister while another cpu is doing
      __nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover,
      there's no synchronize_rcu invocation after set nf_ct_ext_types[id] to
      NULL, so it's possible that we may access invalid pointer.
      
      But actually, most of the ct extends are built-in, so the problem listed
      above will not happen. However, there are two exceptions: NF_CT_EXT_NAT
      and NF_CT_EXT_SYNPROXY.
      
      For _EXT_NAT, the panic will not happen, since adding the nat extend and
      unregistering the nat extend are located in the same file(nf_nat_core.c),
      this means that after the nat module is removed, we cannot add the nat
      extend too.
      
      For _EXT_SYNPROXY, synproxy extend may be added by init_conntrack, while
      synproxy extend unregister will be done by synproxy_core_exit. So after
      nf_synproxy_core.ko is removed, we may still try to add the synproxy
      extend, then kernel panic may happen.
      
      I know it's very hard to reproduce this issue, but I can play a tricky
      game to make it happen very easily :)
      
      Step 1. Enable SYNPROXY for tcp dport 1234 at FORWARD hook:
        # iptables -I FORWARD -p tcp --dport 1234 -j SYNPROXY
      Step 2. Queue the syn packet to the userspace at raw table OUTPUT hook.
              Also note, in the userspace we only add a 20s' delay, then
              reinject the syn packet to the kernel:
        # iptables -t raw -I OUTPUT -p tcp --syn -j NFQUEUE --queue-num 1
      Step 3. Using "nc 2.2.2.2 1234" to connect the server.
      Step 4. Now remove the nf_synproxy_core.ko quickly:
        # iptables -F FORWARD
        # rmmod ipt_SYNPROXY
        # rmmod nf_synproxy_core
      Step 5. After 20s' delay, the syn packet is reinjected to the kernel.
      
      Now you will see the panic like this:
        kernel BUG at net/netfilter/nf_conntrack_extend.c:91!
        Call Trace:
         ? __nf_ct_ext_add_length+0x53/0x3c0 [nf_conntrack]
         init_conntrack+0x12b/0x600 [nf_conntrack]
         nf_conntrack_in+0x4cc/0x580 [nf_conntrack]
         ipv4_conntrack_local+0x48/0x50 [nf_conntrack_ipv4]
         nf_reinject+0x104/0x270
         nfqnl_recv_verdict+0x3e1/0x5f9 [nfnetlink_queue]
         ? nfqnl_recv_verdict+0x5/0x5f9 [nfnetlink_queue]
         ? nla_parse+0xa0/0x100
         nfnetlink_rcv_msg+0x175/0x6a9 [nfnetlink]
         [...]
      
      One possible solution is to make NF_CT_EXT_SYNPROXY extend built-in, i.e.
      introduce nf_conntrack_synproxy.c and only do ct extend register and
      unregister in it, similar to nf_conntrack_timeout.c.
      
      But having such a obscure restriction of nf_ct_extend_unregister is not a
      good idea, so we should invoke synchronize_rcu after set nf_ct_ext_types
      to NULL, and check the NULL pointer when do __nf_ct_ext_add_length. Then
      it will be easier if we add new ct extend in the future.
      
      Last, we use kfree_rcu to free nf_ct_ext, so rcu_barrier() is unnecessary
      anymore, remove it too.
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      9c3f3794
    • L
      netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table · 83d90219
      Liping Zhang 提交于
      The nf_ct_helper_hash table is protected by nf_ct_helper_mutex, while
      nfct_helper operation is protected by nfnl_lock(NFNL_SUBSYS_CTHELPER).
      So it's possible that one CPU is walking the nf_ct_helper_hash for
      cthelper add/get/del, another cpu is doing nf_conntrack_helpers_unregister
      at the same time. This is dangrous, and may cause use after free error.
      
      Note, delete operation will flush all cthelpers added via nfnetlink, so
      using rcu to do protect is not easy.
      
      Now introduce a dummy list to record all the cthelpers added via
      nfnetlink, then we can walk the dummy list instead of walking the
      nf_ct_helper_hash. Also, keep nfnl_cthelper_dump_table unchanged, it
      may be invoked without nfnl_lock(NFNL_SUBSYS_CTHELPER) held.
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      83d90219
    • L
      netfilter: invoke synchronize_rcu after set the _hook_ to NULL · 3b7dabf0
      Liping Zhang 提交于
      Otherwise, another CPU may access the invalid pointer. For example:
          CPU0                CPU1
           -              rcu_read_lock();
           -              pfunc = _hook_;
        _hook_ = NULL;          -
        mod unload              -
           -                 pfunc(); // invalid, panic
           -             rcu_read_unlock();
      
      So we must call synchronize_rcu() to wait the rcu reader to finish.
      
      Also note, in nf_nat_snmp_basic_fini, synchronize_rcu() will be invoked
      by later nf_conntrack_helper_unregister, but I'm inclined to add a
      explicit synchronize_rcu after set the nf_nat_snmp_hook to NULL. Depend
      on such obscure assumptions is not a good idea.
      
      Last, in nfnetlink_cttimeout, we use kfree_rcu to free the time object,
      so in cttimeout_exit, invoking rcu_barrier() is not necessary at all,
      remove it too.
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      3b7dabf0
  5. 22 3月, 2017 2 次提交
  6. 21 3月, 2017 1 次提交
    • L
      netfilter: nfnl_cthelper: fix incorrect helper->expect_class_max · ae5c6821
      Liping Zhang 提交于
      The helper->expect_class_max must be set to the total number of
      expect_policy minus 1, since we will use the statement "if (class >
      helper->expect_class_max)" to validate the CTA_EXPECT_CLASS attr in
      ctnetlink_alloc_expect.
      
      So for compatibility, set the helper->expect_class_max to the
      NFCTH_POLICY_SET_NUM attr's value minus 1.
      
      Also: it's invalid when the NFCTH_POLICY_SET_NUM attr's value is zero.
      1. this will result "expect_policy = kzalloc(0, GFP_KERNEL);";
      2. we cannot set the helper->expect_class_max to a proper value.
      
      So if nla_get_be32(tb[NFCTH_POLICY_SET_NUM]) is zero, report -EINVAL to
      the userspace.
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ae5c6821
  7. 20 3月, 2017 1 次提交
  8. 17 3月, 2017 1 次提交
  9. 16 3月, 2017 2 次提交
  10. 14 3月, 2017 4 次提交
  11. 13 3月, 2017 7 次提交
    • P
      netfilter: nft_fib: Support existence check · 055c4b34
      Phil Sutter 提交于
      Instead of the actual interface index or name, set destination register
      to just 1 or 0 depending on whether the lookup succeeded or not if
      NFTA_FIB_F_PRESENT was set in userspace.
      Signed-off-by: NPhil Sutter <phil@nwl.cc>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      055c4b34
    • F
      netfilter: nft_ct: add helper set support · 1a64edf5
      Florian Westphal 提交于
      this allows to assign connection tracking helpers to
      connections via nft objref infrastructure.
      
      The idea is to first specifiy a helper object:
      
       table ip filter {
          ct helper some-name {
            type "ftp"
            protocol tcp
            l3proto ip
          }
       }
      
      and then assign it via
      
      nft add ... ct helper set "some-name"
      
      helper assignment works for new conntracks only as we cannot expand the
      conntrack extension area once it has been committed to the main conntrack
      table.
      
      ipv4 and ipv6 protocols are tracked stored separately so
      we can also handle families that observe both ipv4 and ipv6 traffic.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      1a64edf5
    • F
      netfilter: provide nft_ctx in object init function · 84fba055
      Florian Westphal 提交于
      this is needed by the upcoming ct helper object type --
      we'd like to be able use the table family (ip, ip6, inet) to figure
      out which helper has to be requested.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      84fba055
    • P
      netfilter: nft_set_bitmap: keep a list of dummy elements · e920dde5
      Pablo Neira Ayuso 提交于
      Element comments may come without any prior set flag, so we have to keep
      a list of dummy struct nft_set_ext to keep this information around. This
      is only useful for set dumps to userspace. From the packet path, this
      set type relies on the bitmap representation. This patch simplifies the
      logic since we don't need to allocate the dummy nft_set_ext structure
      anymore on the fly at the cost of increasing memory consumption because
      of the list of dummy struct nft_set_ext.
      
      Fixes: 665153ff ("netfilter: nf_tables: add bitmap set type")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      e920dde5
    • S
      netfilter: Force fake conntrack entry to be at least 8 bytes aligned · 170a1fb9
      Steven Rostedt (VMware) 提交于
      Since the nfct and nfctinfo have been combined, the nf_conn structure
      must be at least 8 bytes aligned, as the 3 LSB bits are used for the
      nfctinfo. But there's a fake nf_conn structure to denote untracked
      connections, which is created by a PER_CPU construct. This does not
      guarantee that it will be 8 bytes aligned and can break the logic in
      determining the correct nfctinfo.
      
      I triggered this on a 32bit machine with the following error:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000af4
      IP: nf_ct_deliver_cached_events+0x1b/0xfb
      *pdpt = 0000000031962001 *pde = 0000000000000000
      
      Oops: 0000 [#1] SMP
      [Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 crc_ccitt ppdev r8169 parport_pc parport
        OK  ]
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-test+ #75
      Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
      task: c126ec00 task.stack: c1258000
      EIP: nf_ct_deliver_cached_events+0x1b/0xfb
      EFLAGS: 00010202 CPU: 0
      EAX: 0021cd01 EBX: 00000000 ECX: 27b0c767 EDX: 32bcb17a
      ESI: f34135c0 EDI: f34135c0 EBP: f2debd60 ESP: f2debd3c
       DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      CR0: 80050033 CR2: 00000af4 CR3: 309a0440 CR4: 001406f0
      Call Trace:
       <SOFTIRQ>
       ? ipv6_skip_exthdr+0xac/0xcb
       ipv6_confirm+0x10c/0x119 [nf_conntrack_ipv6]
       nf_hook_slow+0x22/0xc7
       nf_hook+0x9a/0xad [ipv6]
       ? ip6t_do_table+0x356/0x379 [ip6_tables]
       ? ip6_fragment+0x9e9/0x9e9 [ipv6]
       ip6_output+0xee/0x107 [ipv6]
       ? ip6_fragment+0x9e9/0x9e9 [ipv6]
       dst_output+0x36/0x4d [ipv6]
       NF_HOOK.constprop.37+0xb2/0xba [ipv6]
       ? icmp6_dst_alloc+0x2c/0xfd [ipv6]
       ? local_bh_enable+0x14/0x14 [ipv6]
       mld_sendpack+0x1c5/0x281 [ipv6]
       ? mark_held_locks+0x40/0x5c
       mld_ifc_timer_expire+0x1f6/0x21e [ipv6]
       call_timer_fn+0x135/0x283
       ? detach_if_pending+0x55/0x55
       ? mld_dad_timer_expire+0x3e/0x3e [ipv6]
       __run_timers+0x111/0x14b
       ? mld_dad_timer_expire+0x3e/0x3e [ipv6]
       run_timer_softirq+0x1c/0x36
       __do_softirq+0x185/0x37c
       ? test_ti_thread_flag.constprop.19+0xd/0xd
       do_softirq_own_stack+0x22/0x28
       </SOFTIRQ>
       irq_exit+0x5a/0xa4
       smp_apic_timer_interrupt+0x2a/0x34
       apic_timer_interrupt+0x37/0x3c
      
      By using DEFINE/DECLARE_PER_CPU_ALIGNED we can enforce at least 8 byte
      alignment as all cache line sizes are at least 8 bytes or more.
      
      Fixes: a9e419dc ("netfilter: merge ctinfo into nfct pointer storage area")
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      170a1fb9
    • L
      netfilter: nf_tables: fix mismatch in big-endian system · 10596608
      Liping Zhang 提交于
      Currently, there are two different methods to store an u16 integer to
      the u32 data register. For example:
        u32 *dest = &regs->data[priv->dreg];
        1. *dest = 0; *(u16 *) dest = val_u16;
        2. *dest = val_u16;
      
      For method 1, the u16 value will be stored like this, either in
      big-endian or little-endian system:
        0          15           31
        +-+-+-+-+-+-+-+-+-+-+-+-+
        |   Value   |     0     |
        +-+-+-+-+-+-+-+-+-+-+-+-+
      
      For method 2, in little-endian system, the u16 value will be the same
      as listed above. But in big-endian system, the u16 value will be stored
      like this:
        0          15           31
        +-+-+-+-+-+-+-+-+-+-+-+-+
        |     0     |   Value   |
        +-+-+-+-+-+-+-+-+-+-+-+-+
      
      So later we use "memcmp(&regs->data[priv->sreg], data, 2);" to do
      compare in nft_cmp, nft_lookup expr ..., method 2 will get the wrong
      result in big-endian system, as 0~15 bits will always be zero.
      
      For the similar reason, when loading an u16 value from the u32 data
      register, we should use "*(u16 *) sreg;" instead of "(u16)*sreg;",
      the 2nd method will get the wrong value in the big-endian system.
      
      So introduce some wrapper functions to store/load an u8 or u16
      integer to/from the u32 data register, and use them in the right
      place.
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      10596608
    • L
      netfilter: nft_set_bitmap: fetch the element key based on the set->klen · fd89b23a
      Liping Zhang 提交于
      Currently we just assume the element key as a u32 integer, regardless of
      the set key length.
      
      This is incorrect, for example, the tcp port number is only 16 bits.
      So when we use the nft_payload expr to get the tcp dport and store
      it to dreg, the dport will be stored at 0~15 bits, and 16~31 bits
      will be padded with zero.
      
      So the reg->data[dreg] will be looked like as below:
        0          15           31
        +-+-+-+-+-+-+-+-+-+-+-+-+
        | tcp dport |      0    |
        +-+-+-+-+-+-+-+-+-+-+-+-+
      But for these big-endian systems, if we treate this register as a u32
      integer, the element key will be larger than 65535, so the following
      lookup in bitmap set will cause out of bound access.
      
      Another issue is that if we add element with comments in bitmap
      set(although the comments will be ignored eventually), the element will
      vanish strangely. Because we treate the element key as a u32 integer, so
      the comments will become the part of the element key, then the element
      key will also be larger than 65535 and out of bound access will happen:
        # nft add element t s { 1 comment test }
      
      Since set->klen is 1 or 2, it's fine to treate the element key as a u8 or
      u16 integer.
      
      Fixes: 665153ff ("netfilter: nf_tables: add bitmap set type")
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      fd89b23a
  12. 09 3月, 2017 1 次提交
    • Y
      netfilter: nf_nat_sctp: fix ICMP packet to be dropped accidently · 8e05ba7f
      Ying Xue 提交于
      Regarding RFC 792, the first 64 bits of the original SCTP datagram's
      data could be contained in ICMP packet, such as:
      
          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |     Type      |     Code      |          Checksum             |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                             unused                            |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |      Internet Header + 64 bits of Original Data Datagram      |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      
      However, according to RFC 4960, SCTP datagram header is as below:
      
          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |     Source Port Number        |     Destination Port Number   |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                      Verification Tag                         |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                           Checksum                            |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      
      It means only the first three fields of SCTP header can be carried in
      ICMP packet except for Checksum field.
      
      At present in sctp_manip_pkt(), no matter whether the packet is ICMP or
      not, it always calculates SCTP packet checksum. However, not only the
      calculation of checksum is unnecessary for ICMP, but also it causes
      another fatal issue that ICMP packet is dropped. The header size of
      SCTP is used to identify whether the writeable length of skb is bigger
      than skb->len through skb_make_writable() in sctp_manip_pkt(). But
      when it deals with ICMP packet, skb_make_writable() directly returns
      false as the writeable length of skb is bigger than skb->len.
      Subsequently ICMP is dropped.
      
      Now we correct this misbahavior. When sctp_manip_pkt() handles ICMP
      packet, 8 bytes rather than the whole SCTP header size is used to check
      if writeable length of skb is overflowed. Meanwhile, as it's meaningless
      to calculate checksum when packet is ICMP, the computation of checksum
      is ignored as well.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      8e05ba7f
  13. 07 3月, 2017 5 次提交
  14. 03 3月, 2017 1 次提交