1. 09 Feb, 2010: 3 commits
    • netfilter: nf_conntrack: fix hash resizing with namespaces · d696c7bd
      Patrick McHardy committed
      As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
      size is global and not per namespace, but modifiable at runtime through
      /sys/module/nf_conntrack/hashsize. Changing the hash size will only
      resize the hash in the current namespace however, so other namespaces
      will use an invalid hash size. This can cause crashes when enlarging
      the hashsize, or false negative lookups when shrinking it.
      
      Move the hash size into the per-namespace data and only use the global
      hash size to initialize the per-namespace value when instantiating a
      new namespace. Additionally restrict hash resizing to init_net for
      now as other namespaces are not handled currently.
      
      Cc: stable@kernel.org
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
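      A minimal userspace sketch of the per-namespace hash size described above.
      The structure, the 16384 default and the use of -EOPNOTSUPP for
      non-init namespaces are assumptions for illustration, not the kernel's
      code; it only shows the global value acting as a default for new
      namespaces and resizing being limited to init_net.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hashsize module parameter: after the fix
 * it only provides the default for namespaces created later. */
static unsigned int global_hashsize = 16384;

/* Hypothetical per-namespace conntrack state. */
struct netns_ct {
    bool is_init_net;
    unsigned int htable_size;   /* size this namespace's lookups use */
};

/* Namespace instantiation: snapshot the current global default. */
static void ct_net_init(struct netns_ct *ct, bool is_init_net)
{
    ct->is_init_net = is_init_net;
    ct->htable_size = global_hashsize;
}

/* Models a write to /sys/module/nf_conntrack/hashsize from `caller`. */
static int ct_set_hashsize(struct netns_ct *caller, unsigned int size)
{
    if (!caller->is_init_net)
        return -EOPNOTSUPP;     /* other namespaces are not handled */
    if (size == 0)
        return -EINVAL;
    /* A real implementation would rehash init_net's table here. */
    caller->htable_size = size;
    global_hashsize = size;
    return 0;
}

/* Bucket selection always uses the namespace's own size. */
static unsigned int ct_bucket(const struct netns_ct *ct, unsigned int hash)
{
    return hash % ct->htable_size;
}
```

      Because every lookup goes through the namespace's own `htable_size`, a
      resize in init_net can no longer make another namespace index past the
      end of its table.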
    • netfilter: nf_conntrack: per netns nf_conntrack_cachep · 5b3501fa
      Eric Dumazet committed
      nf_conntrack_cachep is currently shared by all netns instances, but
      because of the special semantics of SLAB_DESTROY_BY_RCU, this is wrong.
      
      If we use a shared slab cache, one object can instantly fly from one
      hash table (netns ONE) to another (netns TWO), and a concurrent
      reader (doing a lookup in netns ONE and 'finding' an object of netns TWO)
      can be fooled without noticing, because no RCU grace period has to be
      observed between the object being freed and its reuse.
      
      We don't have this problem with UDP/TCP slab caches because TCP/UDP
      hashtables are global to the machine (and each object has a pointer to
      its netns).
      
      If we use per netns conntrack hash tables, we also *must* use per netns
      conntrack slab caches, to guarantee an object cannot escape from one
      namespace to another one.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      [Patrick: added unique slab name allocation]
      Cc: stable@kernel.org
      Signed-off-by: Patrick McHardy <kaber@trash.net>
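      The note "[Patrick: added unique slab name allocation]" follows from each
      namespace now creating its own cache, presumably so that every cache
      gets a distinct name. A minimal userspace sketch, assuming the name is
      derived from the namespace's address; the `nf_conntrack_%p` format and
      the struct are assumptions for illustration. In the kernel the
      resulting string would then be handed to kmem_cache_create() together
      with SLAB_DESTROY_BY_RCU.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-namespace state holding the cache name. */
struct net_ct {
    char *slabname;
};

/* Build a cache name unique to `net` by embedding its address, so two live
 * namespaces never end up with the same cache name. Caller frees. */
static char *ct_slabname_alloc(void *net)
{
    int len = snprintf(NULL, 0, "nf_conntrack_%p", net);
    char *name;

    if (len < 0)
        return NULL;
    name = malloc((size_t)len + 1);
    if (name != NULL)
        snprintf(name, (size_t)len + 1, "nf_conntrack_%p", net);
    return name;
}
```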
    • netfilter: nf_conntrack: fix memory corruption with multiple namespaces · 9edd7ca0
      Patrick McHardy committed
      As discovered by Jon Masters <jonathan@jonmasters.org>, the "untracked"
      conntrack, which is located in the data section, might be accidentally
      freed when a new namespace is instantiated while the untracked conntrack
      is attached to a skb, because the reference count is re-initialized.
      
      The best fix would be to use a separate untracked conntrack per
      namespace since it includes a namespace pointer. Unfortunately this is
      not possible without larger changes since the namespace is not easily
      available everywhere we need it. For now move the untracked conntrack
      initialization to the init_net setup function to make sure the reference
      count is not re-initialized and handle cleanup in the init_net cleanup
      function to make sure namespaces can exit properly while the untracked
      conntrack is in use in other namespaces.
      
      Cc: stable@kernel.org
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
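      A single-threaded model of the refcount problem and of the fix, which
      moves initialization of the shared untracked entry into the init_net
      setup. The struct and function names are hypothetical; `conn_put()`
      returning true stands for the point where the kernel would free an
      object that actually lives in the data section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a conntrack reduced to its reference count. */
struct conn {
    int use;
};

/* One statically allocated "untracked" entry shared by all namespaces. */
static struct conn untracked;

/* Old behaviour (modeled): every namespace setup resets the shared count. */
static void net_init_old(bool is_init_net)
{
    (void)is_init_net;
    untracked.use = 1;
}

/* Fixed behaviour (modeled): only the init_net setup initializes it. */
static void net_init_fixed(bool is_init_net)
{
    if (is_init_net)
        untracked.use = 1;
}

static void conn_get(struct conn *ct) { ct->use++; }

/* Returns true when the last reference is dropped, i.e. when the entry
 * would be freed. */
static bool conn_put(struct conn *ct) { return --ct->use == 0; }
```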
  2. 06 Nov, 2009: 1 commit
    • netfilter: nf_nat: fix NAT issue in 2.6.30.4+ · f9dd09c7
      Jozsef Kadlecsik committed
      Vitezslav Samel discovered that since 2.6.30.4+ active FTP cannot work
      over NAT. The "cause" of the problem was a fix of unacknowledged data
      detection with NAT (commit a3a9f79e).
      However, that fix actually uncovered a long-standing bug in TCP conntrack:
      when NAT was enabled, we simply updated the maximum right edge of
      the segments we had seen (td_end) by the offset NAT produced when
      changing the IP/port in the data. However, we did not update the other
      parameter (td_maxend), which is also affected by the NAT offset. That
      value could therefore drift away from the correct one, which resulted
      in breaking active FTP.
      
      The patch below fixes the issue by *not* updating the conntrack parameters
      from NAT, but instead taking into account the NAT offsets in conntrack in a
      consistent way. (Updating from NAT would be harder and more expensive because
      it would need to recalculate parameters we have already calculated in conntrack.)
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 05 Nov, 2009: 1 commit
  4. 12 Oct, 2009: 1 commit
  5. 22 Sep, 2009: 1 commit
  6. 31 Aug, 2009: 1 commit
  7. 25 Aug, 2009: 1 commit
  8. 16 Jul, 2009: 1 commit
    • netfilter: nf_conntrack: nf_conntrack_alloc() fixes · 941297f4
      Eric Dumazet committed
      When a slab cache uses SLAB_DESTROY_BY_RCU, we must be careful when allocating
      objects, since the slab allocator could hand out a freed object that is still
      used by lockless readers.
      
      In particular, nf_conntrack RCU lookups rely on ct->tuplehash[xxx].hnnode.next
      always being valid (i.e. containing either a valid 'nulls' value or a valid
      pointer to the next object in the hash chain).
      
      kmem_cache_zalloc() sets up the object with NULL values, but a NULL value is not
      valid for ct->tuplehash[xxx].hnnode.next.
      
      The fix is to call kmem_cache_alloc() and do the zeroing ourselves.
      
      As spotted by Patrick, we also need to make sure lookup keys are committed to
      memory before setting the refcount to 1, or a lockless reader could get a reference
      on the old version of the object. Its key re-check could then pass the barrier.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
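      The allocation rules above can be sketched with C11 atomics. The layout
      is hypothetical (a refcount, two hash nodes, then freely clearable
      fields), and the release store stands in for the "commit the keys,
      then set the refcount to 1" ordering the commit message asks for. A
      lockless reader would pair it by taking a reference only when the
      count is nonzero and then re-checking the keys.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hash node: `next` holds a pointer or an odd "nulls" marker. */
struct hnode {
    uintptr_t next;
    uint32_t key;                /* stands in for the lookup tuple */
};

struct conn {
    atomic_int use;              /* 0 tells lockless readers "not live" */
    struct hnode tuplehash[2];   /* original and reply directions */
    /* Fields from here on may be cleared freely on reuse. */
    unsigned int status;
    unsigned long timeout;
};

/* Re-initialize an object recycled from a SLAB_DESTROY_BY_RCU-style cache. */
static void conn_init_recycled(struct conn *ct, uint32_t orig_key,
                               uint32_t reply_key)
{
    /* Zero everything after the hash nodes, but never tuplehash[].next:
     * a reader may still be walking through this object. */
    memset((char *)ct + offsetof(struct conn, status), 0,
           sizeof(*ct) - offsetof(struct conn, status));

    ct->tuplehash[0].key = orig_key;
    ct->tuplehash[1].key = reply_key;

    /* Release store: the key writes above become visible before use = 1,
     * playing the role of a write barrier followed by the refcount store. */
    atomic_store_explicit(&ct->use, 1, memory_order_release);
}
```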
  9. 22 Jun, 2009: 3 commits
    • netfilter: nf_conntrack: fix conntrack lookup race · 8d8890b7
      Patrick McHardy committed
      The RCU-protected conntrack hash lookup only checks whether the entry
      has a refcount of zero to decide whether it is stale. This is not
      sufficient: entries are explicitly removed while at least one
      reference, possibly more, is still held. Explicitly check whether the
      entry has been marked as dying to fix this.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
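      A sketch of the lookup-side check using C11 atomics: take a reference
      only if the count is nonzero, then reject (and release) entries that
      are already marked as dying. The flag value and the struct are
      hypothetical stand-ins for the conntrack status bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CT_DYING 0x1u   /* hypothetical "dying" status bit */

struct conn {
    atomic_int use;
    atomic_uint status;
};

/* Take a reference unless the count has already dropped to zero. */
static bool conn_get_not_zero(struct conn *ct)
{
    int old = atomic_load(&ct->use);

    while (old != 0) {
        /* On failure, `old` is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&ct->use, &old, old + 1))
            return true;
    }
    return false;
}

static void conn_put(struct conn *ct)
{
    atomic_fetch_sub(&ct->use, 1);
}

/* A nonzero count alone is not enough: entries are unhashed while
 * references are still held, so the dying bit must be checked too. */
static bool conn_lookup_take(struct conn *ct)
{
    if (!conn_get_not_zero(ct))
        return false;
    if (atomic_load(&ct->status) & CT_DYING) {
        conn_put(ct);
        return false;
    }
    return true;
}
```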
    • netfilter: nf_conntrack: fix confirmation race condition · 5c8ec910
      Patrick McHardy committed
      New connection tracking entries are inserted into the hash before they
      are fully set up, namely the CONFIRMED bit is not set and the timer not
      started yet. This can theoretically lead to a race with the timer, which
      would set the timeout value to a relative value, most likely already in
      the past.
      
      Perform hash insertion as the final step to fix this.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
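      The race can be illustrated with a single-threaded model in which the
      interleaving is written out by hand. It assumes, for illustration, that
      a refresh stores a relative timeout while the entry is unconfirmed and
      an absolute expiry once it is confirmed, which matches the "relative
      value" mentioned above; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a new conntrack entry. */
struct conn {
    bool confirmed;
    bool hashed;            /* reachable by lookups on other CPUs */
    unsigned long timeout;  /* relative before confirmation, absolute after */
};

/* Refresh path, run for a packet that found `ct` through the hash. */
static void conn_refresh(struct conn *ct, unsigned long now, unsigned long extra)
{
    if (!ct->hashed)
        return;                      /* a lookup cannot find it yet */
    if (ct->confirmed)
        ct->timeout = now + extra;   /* timer running: absolute expiry */
    else
        ct->timeout = extra;         /* timer not started: relative value */
}

/* Fixed order: finish setup, then insert into the hash as the last step. */
static void conn_confirm(struct conn *ct, unsigned long now)
{
    ct->timeout += now;   /* start the timer */
    ct->confirmed = true;
    ct->hashed = true;
}
```

      In the old order the entry is hashed first, so a refresh that sneaks in
      between starting the timer and setting the confirmed bit leaves a
      relative value behind, i.e. an expiry in the past.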
    • netfilter: nf_conntrack: death_by_timeout() fix · 8cc20198
      Eric Dumazet committed
      death_by_timeout() might delete a conntrack from hash list
      and insert it in dying list.
      
       nf_ct_delete_from_lists(ct);
       nf_ct_insert_dying_list(ct);
      
      I believe a (lockless) reader could *catch* ct while doing a lookup
      and miss the end of its chain.
      (The nulls lookup algorithm must check the nulls value at the end of the
      lookup and restart if it is not the expected one; cf.
      Documentation/RCU/rculist_nulls.txt for details.)
      
      We need to change nf_conntrack_init_net() and use a different "null" value,
      guaranteed not to be used in regular lists. Choose very large values, since
      the hash table uses [0..size-1] as its nulls values.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
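      A userspace sketch of a nulls-terminated chain walk. The marker
      encoding (low bit set, list-specific value in the remaining bits)
      follows the usual list_nulls convention; the macro names and the
      `1 << 30` based values are assumptions, chosen only to lie far above
      any bucket index as the commit requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* End-of-chain marker: low bit set, list-specific value in the other bits. */
#define NULLS_MARKER(v) ((uintptr_t)1 | ((uintptr_t)(v) << 1))
#define IS_NULLS(p)     (((uintptr_t)(p) & 1) != 0)
#define NULLS_VALUE(p)  ((uintptr_t)(p) >> 1)

/* Assumed values for the non-hash lists, far above any bucket index. */
#define UNCONFIRMED_NULLS_VAL ((1u << 30) + 0)
#define DYING_NULLS_VAL       ((1u << 30) + 1)

/* Hypothetical hash node; `next` is a node address or a nulls marker. */
struct node {
    uintptr_t next;
    int key;
};

/* Walk the chain of bucket `bucket`. If the walk ends on a marker that is
 * not this bucket's, the node was moved to another list mid-walk and the
 * lookup must restart. */
static struct node *chain_lookup(uintptr_t head, int key, unsigned int bucket,
                                 bool *restart)
{
    uintptr_t p = head;

    *restart = false;
    while (!IS_NULLS(p)) {
        struct node *n = (struct node *)p;

        if (n->key == key)
            return n;
        p = n->next;
    }
    if (NULLS_VALUE(p) != bucket)
        *restart = true;
    return NULL;
}
```

      If the dying list reused a small value such as 3, a reader that
      drifted onto it from bucket 3 would accept the miss as final; the
      large values make that impossible.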
  10. 13 Jun, 2009: 4 commits
    • netfilter: conntrack: optional reliable conntrack event delivery · dd7669a9
      Pablo Neira Ayuso committed
      This patch improves ctnetlink event reliability if one broadcast
      listener has set the NETLINK_BROADCAST_ERROR socket option.
      
      The logic is the following: if an event delivery fails, we keep
      the undelivered events in the missed event cache. Once the next
      packet arrives, we add the new events (if any) to the missed
      events in the cache and we try a new delivery, and so on. Thus,
      if ctnetlink fails to deliver an event, we try to deliver it again
      when we see a new packet. Therefore, we may lose some state
      transitions, but the userspace process gets back in sync at some point.
      
      In the worst case, if no events were delivered to userspace, we make
      sure that destroy events are successfully delivered. Basically,
      if ctnetlink fails to deliver the destroy event, we remove the
      conntrack entry from the hashes and insert it into the dying
      list, which contains inactive entries. Then, the conntrack timer
      is added with an extra grace timeout of random32() % 15 seconds
      to trigger the event again (this grace timeout is tunable via
      /proc). The use of a limited random timeout value distributes
      the "destroy" resends, thus avoiding the accumulation of
      lots of "destroy" events at the same time. Events may be delivered
      out of order, but we can identify them by means of the tuple plus
      the conntrack ID.
      
      The maximum number of conntrack entries (active or inactive) is
      still handled by nf_conntrack_max. Thus, we may start dropping
      packets at some point if we accumulate a lot of inactive conntrack
      entries that did not successfully report the destroy event to
      userspace.
      
      During my stress tests, which consisted of setting a very small buffer
      of 2048 bytes for conntrackd with the NETLINK_BROADCAST_ERROR socket
      flag and generating lots of very small connections, I noticed
      very few destroy entries on the fly waiting to be resent.
      
      A simple way to test this patch consists of creating a lot of
      entries, setting a very small Netlink buffer in conntrackd (plus a
      patch, not in the git tree, that sets the BROADCAST_ERROR flag)
      and invoking `conntrack -F'.
      
      For expectations, no changes are introduced in this patch.
      Currently, event delivery is only done for new expectations (no
      events from expectation expiration, removal and confirmation).
      In that case, they need a per-expectation event cache to implement
      the same idea that is exposed in this patch.
      
      This patch can be useful to provide reliable flow accounting. We
      still have to add a new conntrack extension to store the creation
      and destroy time.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
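      A minimal model of the missed-event cache: new events are OR-ed into
      the undelivered ones and the whole set is retried on the next packet.
      The event bits, the listener stub and the use of rand() in place of the
      kernel's random32() are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical event bits. */
#define EV_NEW    (1u << 0)
#define EV_STATUS (1u << 1)

/* Per-conntrack event cache. */
struct ecache {
    unsigned int missed;   /* events not yet delivered */
};

/* Hypothetical listener: delivery fails while its buffer is full, which is
 * the failure NETLINK_BROADCAST_ERROR lets the sender notice. */
static bool listener_full;
static unsigned int last_delivered;

static bool deliver(unsigned int events)
{
    if (listener_full)
        return false;
    last_delivered = events;
    return true;
}

/* Called per packet: merge new events with missed ones and retry them all. */
static void ct_deliver_cached_events(struct ecache *e, unsigned int new_events)
{
    unsigned int events = e->missed | new_events;

    if (events == 0)
        return;
    e->missed = deliver(events) ? 0 : events;
}

/* Extra grace period before retrying a failed destroy event, spreading
 * resends over 0..14 seconds. */
static unsigned int destroy_retry_delay(void)
{
    return (unsigned int)(rand() % 15);
}
```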
    • netfilter: conntrack: move helper destruction to nf_ct_helper_destroy() · 9858a3ae
      Pablo Neira Ayuso committed
      This patch moves the helper destruction to a function that lives
      in nf_conntrack_helper.c. This new function is used in the patch
      to add ctnetlink reliable event delivery.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • netfilter: conntrack: move event caching to conntrack extension infrastructure · a0891aa6
      Pablo Neira Ayuso committed
      This patch reworks the per-cpu event caching to use the conntrack
      extension infrastructure.
      
      The main drawback is that we consume more memory per conntrack
      if event delivery is enabled. This patch is required by the
      reliable event delivery patch that follows it.
      
      BTW, this patch allows you to enable/disable event delivery via
      /proc/sys/net/netfilter/nf_conntrack_events at runtime, although
      you can still disable event caching as a compile-time option.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • netfilter: nf_conntrack: use mod_timer_pending() for conntrack refresh · 65cb9fda
      Patrick McHardy committed
      Use mod_timer_pending() instead of an atomic sequence of del_timer()/
      add_timer(). mod_timer_pending() does not rearm an inactive timer,
      so we no longer need the conntrack lock to make sure we don't
      accidentally rearm the timer of a conntrack that is in the process
      of being destroyed.
      
      With this change, we don't need to take the global lock at all;
      counter updates can be performed under the per-conntrack lock.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
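      The difference between the two refresh styles can be modeled with a toy
      timer. `refresh_mod_pending()` mimics the behaviour the commit
      attributes to mod_timer_pending() (an inactive timer is left alone);
      the types and names are hypothetical, and the real timer API also
      handles concurrency that this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timer: `pending` is true while the timer is armed. */
struct timer {
    bool pending;
    unsigned long expires;
};

/* Old refresh (modeled): delete and re-add, which re-arms the timer even
 * if the destroy path had already stopped it. */
static void refresh_del_add(struct timer *t, unsigned long expires)
{
    t->pending = false;
    t->expires = expires;
    t->pending = true;
}

/* New refresh (modeled): only an armed timer is modified; an inactive one
 * stays inactive. Returns whether the timer was changed. */
static bool refresh_mod_pending(struct timer *t, unsigned long expires)
{
    if (!t->pending)
        return false;
    t->expires = expires;
    return true;
}
```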
  11. 10 Jun, 2009: 1 commit
  12. 03 Jun, 2009: 2 commits
    • netfilter: conntrack: simplify event caching system · 17e6e4ea
      Pablo Neira Ayuso committed
      This patch simplifies the conntrack event caching system by removing
      several events:
      
       * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO have been deleted
         since they have no clients.
       * IPCT_COUNTER_FILLING, which is a leftover from the 32-bit counter
         days.
       * IPCT_REFRESH, which is of no use since we always include the
         timeout in the messages.
      
      After this patch, the existing events are:
      
       * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, which are used to identify
         the addition and deletion of entries.
       * IPCT_STATUS, which notes that the status bits have changed,
         e.g. IPS_SEEN_REPLY and IPS_ASSURED.
       * IPCT_PROTOINFO, which reports that internal protocol information has
         changed, e.g. the TCP, DCCP and SCTP protocol state.
       * IPCT_HELPER, which reports that a helper has been assigned to or
         unassigned from this entry.
       * IPCT_MARK and IPCT_SECMARK, which report that the mark has changed; this
         covers the case when a mark is set to zero.
       * IPCT_NATSEQADJ, which reports that there are updates in the NAT sequence
         adjustment.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: don't report events on module removal · 274d383b
      Pablo Neira Ayuso committed
      During module removal there are no possible event listeners,
      since ctnetlink must be removed first to allow nf_conntrack to be
      removed. This patch removes the event reporting for the
      module removal case, which is of no use in the existing code.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  13. 26 Mar, 2009: 3 commits
  14. 24 Mar, 2009: 1 commit
  15. 16 Mar, 2009: 1 commit
  16. 24 Feb, 2009: 1 commit
  17. 20 Feb, 2009: 2 commits
  18. 13 Jan, 2009: 1 commit
    • netfilter 07/09: simplify nf_conntrack_alloc() error handling · cd7fcbf1
      Julia Lawall committed
      nf_conntrack_alloc cannot return NULL, so there is no need to check for
      NULL before using the value.  I have also removed the initialization of ct
      to NULL in nf_conntrack_alloc, since the value is never used, and since
      it might lead one to think that the final return ct could return
      NULL.
      
      The semantic patch that finds this problem is as follows:
      (http://www.emn.fr/x-info/coccinelle/)
      
      // <smpl>
      @match exists@
      expression x, E;
      position p1,p2;
      statement S1, S2;
      @@
      
      x@p1 = nf_conntrack_alloc(...)
      ... when != x = E
      (
        if (x@p2 == NULL || ...) S1 else S2
      |
        if (x@p2 == NULL && ...) S1 else S2
      )
      
      @other_match exists@
      expression match.x, E1, E2;
      position p1!=match.p1,match.p2;
      @@
      
      x@p1 = E1
      ... when != x = E2
      x@p2
      
      @ script:python depends on !other_match@
      p1 << match.p1;
      p2 << match.p2;
      @@
      
      print "%s: call to nf_conntrack_alloc %s bad test %s" % (p1[0].file,p1[0].line,p2[0].line)
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
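      In the kernel, nf_conntrack_alloc() reports failure through an
      error-encoded pointer (ERR_PTR) rather than NULL, which is why the NULL
      checks found by the semantic patch are dead code. The sketch below
      re-creates that ERR_PTR/IS_ERR idiom in userspace; the helper names,
      the 4095 limit and the allocator stub are assumptions modeled on the
      kernel convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095   /* assumed upper bound for encoded errors */

static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

/* Error pointers occupy the top MAX_ERRNO addresses. */
static bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct conn {
    int use;
};

/* Hypothetical allocator that never returns NULL: callers test is_err(). */
static struct conn *conn_alloc(bool simulate_oom)
{
    struct conn *ct;

    if (simulate_oom)
        return err_ptr(-ENOMEM);
    ct = calloc(1, sizeof(*ct));
    if (ct == NULL)
        return err_ptr(-ENOMEM);
    ct->use = 1;
    return ct;
}
```

      With this convention a caller's `if (ct == NULL)` can never fire, so
      only the `is_err()` test carries meaning.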
  19. 25 Nov, 2008: 1 commit
  20. 18 Nov, 2008: 3 commits
    • netfilter: nf_conntrack: fix warning and prototype mismatch · e17b666a
      Patrick McHardy committed
      net/netfilter/nf_conntrack_core.c:46:1: warning: symbol 'nfnetlink_parse_nat_setup_hook' was not declared. Should it be static?
      
      Including the proper header also revealed an incorrect prototype.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • netfilter: ctnetlink: deliver events for conntracks changed from userspace · 19abb7b0
      Pablo Neira Ayuso committed
      Currently, the creation and update of conntracks via ctnetlink do not
      propagate an event to userspace. This can result in inconsistent situations
      if several userspace processes modify the connection tracking table by means
      of ctnetlink at the same time. Specifically, using the conntrack command
      line tool and conntrackd at the same time can trigger inconsistencies.
      
      This patch also modifies the event cache infrastructure to pass the
      process PID and the ECHO flag to nfnetlink_send(), so that the change is
      reported back to the process that triggered it if that process asked for it.
      Based on a suggestion from Patrick McHardy.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • netfilter: ctnetlink: helper modules load-on-demand support · 226c0c0e
      Pablo Neira Ayuso committed
      This patch adds module loading for helpers via ctnetlink.
      
      * Creation path: We support explicit and implicit helper assignment. For
        the explicit case, we try to load the module. If the module is correctly
        loaded and the helper is present, we return EAGAIN to restart the
        creation. Otherwise, we return EOPNOTSUPP.
      * Update path: release the spin lock, load the module and check. If the
        helper is present, return EAGAIN to restart the update.
      
      This patch provides a refactored function to look up and set the
      connection tracking helper. It also removes the exported symbol
      __nf_ct_helper_find, as it has no clients anymore.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
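      The EAGAIN restart pattern from the creation path can be sketched as
      follows. The helper registry, the module-loading stub and the function
      names are hypothetical; in the kernel the loading step would go through
      request_module(), and the restart would presumably be driven by
      nfnetlink rather than by a local loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical registry standing in for loaded helper modules. */
static bool ftp_helper_loaded;
static int module_load_attempts;

static bool helper_find(const char *name)
{
    return strcmp(name, "ftp") == 0 && ftp_helper_loaded;
}

/* Stand-in for loading a helper module on demand: only "ftp" exists. */
static void try_load_module(const char *name)
{
    module_load_attempts++;
    if (strcmp(name, "ftp") == 0)
        ftp_helper_loaded = true;
}

/* Creation path for an explicitly requested helper. */
static int ct_create_with_helper(const char *name)
{
    if (helper_find(name))
        return 0;
    try_load_module(name);
    return helper_find(name) ? -EAGAIN : -EOPNOTSUPP;
}

/* Caller side: restart the whole operation on -EAGAIN. */
static int ct_create_request(const char *name)
{
    int err;

    do {
        err = ct_create_with_helper(name);
    } while (err == -EAGAIN);
    return err;
}
```

      Restarting the whole operation instead of continuing in place matters
      in the update path, where the lock was dropped to load the module and
      any earlier state may be stale.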
  21. 15 Oct, 2008: 1 commit
    • netfilter: ctnetlink: remove bogus module dependency between ctnetlink and nf_nat · e6a7d3c0
      Pablo Neira Ayuso committed
      This patch removes the module dependency between ctnetlink and
      nf_nat by means of an indirect call that is initialized when
      nf_nat is loaded. Now, nf_conntrack_netlink only requires
      nf_conntrack and nfnetlink.
      
      This patch puts nfnetlink_parse_nat_setup_hook into the
      nf_conntrack_core to avoid dependencies between ctnetlink,
      nf_conntrack_ipv4 and nf_conntrack_ipv6.
      
      This patch also introduces the function ctnetlink_change_nat
      that is only invoked from the creation path. Actually, the
      nat handling cannot be invoked from the update path since
      this is not allowed. By introducing this function, we remove
      the useless nat handling in the update path and we avoid
      deadlock-prone code.
      
      This patch also adds the required EAGAIN logic for nfnetlink.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
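      A sketch of the indirect call that replaces the hard module dependency:
      the NAT side installs a function pointer when it loads, and the netlink
      side only calls through it. The names are hypothetical, the real hook
      would be read under RCU, and the EAGAIN/module-loading logic the commit
      mentions is left out to keep the example small.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct conn {
    int nat_configured;
};

/* Hypothetical hook type: set by the NAT module on load, NULL otherwise. */
typedef int (*nat_setup_hook_t)(struct conn *ct, int manip);
static nat_setup_hook_t parse_nat_setup_hook;

/* Implementation that would live in the NAT module. */
static int nat_parse_setup(struct conn *ct, int manip)
{
    ct->nat_configured = manip;
    return 0;
}

static void nat_module_load(void)   { parse_nat_setup_hook = nat_parse_setup; }
static void nat_module_unload(void) { parse_nat_setup_hook = NULL; }

/* Netlink side: no link-time dependency on the NAT module. */
static int ctnetlink_change_nat(struct conn *ct, int manip)
{
    nat_setup_hook_t hook = parse_nat_setup_hook;

    if (hook == NULL)
        return -EOPNOTSUPP;
    return hook(ct, manip);
}
```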
  22. 08 Oct, 2008: 6 commits