1. 07 Jun, 2012 3 commits
  2. 09 May, 2012 1 commit
    • netfilter: nf_ct_helper: allow to disable automatic helper assignment · a9006892
      Committed by Eric Leblond
      This patch allows you to disable automatic conntrack helper
      lookup based on TCP/UDP ports, e.g.:
      
      echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper
      
      [ Note: flows that already got a helper will keep using it even
        if automatic helper assignment has been disabled ]
      
      Once this behaviour has been disabled, you have to explicitly
      use the iptables CT target to attach a helper to flows.
      
      There are good reasons to stop supporting automatic helper
      assignment. For further information, please read:
      
      http://www.netfilter.org/news.html#2012-04-03
      
      This patch also adds a message warning that automatic helper
      assignment is deprecated and will be removed soon (it is
      printed only once, for the first flow that gets a helper attached,
      to make it as unobtrusive as possible).
      Signed-off-by: Eric Leblond <eric@regit.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 22 Nov, 2011 1 commit
    • netfilter: nf_conntrack: make event callback registration per-netns · 70e9942f
      Committed by Pablo Neira Ayuso
      This patch fixes an oops that can be triggered following this recipe:
      
      0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
      1) container is started.
      2) connect to it via lxc-console.
      3) generate some traffic with the container to create some conntrack
         entries in its table.
      4) stop the container: you hit an oops because the conntrack table
         cleanup tries to report the destroy event to user-space but the
         per-netns nfnetlink socket is already gone (as the nfnetlink
         socket is per-netns but event callback registration is global).
      
      To fix this situation, we make the ctnl_notifier per-netns so the
      callback is registered/unregistered when the container is
      created/destroyed.
      
      Alex Bligh and Alexey Dobriyan originally proposed a small patch to
      check whether the nfnetlink socket is gone in nfnetlink_has_listeners,
      but this is a heavily used path for events, so the check may reduce
      performance, and it looks a bit hackish to check for the nfnetlink
      socket only to work around this situation. As a result, I decided
      to go with the larger change, which looks cleaner to me.
      
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Reported-by: Alex Bligh <alex@alex.org.uk>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
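The per-netns registration described in this commit can be pictured with a small userspace model. This is a hedged sketch, not the kernel code: each namespace carries its own callback slot, so tearing down one namespace unregisters only its own notifier, and delivery is skipped when nothing is registered. All names here (`struct netns`, `struct event_notifier`, `register_notifier`, `unregister_notifier`, `deliver_event`, `count_event`) are illustrative assumptions, not identifiers taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct event_notifier {
    int (*fcn)(unsigned int events, void *item);
};

struct netns {
    struct event_notifier *event_cb;    /* one callback slot per namespace */
};

/* Sample callback used for illustration: counts delivered events. */
int delivered;

int count_event(unsigned int events, void *item)
{
    (void)events;
    (void)item;
    delivered++;
    return 1;
}

/* Register a notifier for one namespace; fails if the slot is taken. */
int register_notifier(struct netns *net, struct event_notifier *nb)
{
    if (net->event_cb != NULL)
        return -EBUSY;
    net->event_cb = nb;
    return 0;
}

/* Unregister on namespace teardown; only the owning namespace is touched. */
void unregister_notifier(struct netns *net, struct event_notifier *nb)
{
    assert(net->event_cb == nb);
    net->event_cb = NULL;
}

/* Deliver an event only if this namespace still has a registered callback. */
int deliver_event(struct netns *net, unsigned int events, void *item)
{
    if (net->event_cb == NULL)
        return 0;                       /* nothing registered: skip */
    return net->event_cb->fcn(events, item);
}
```

In this model, a namespace that has already unregistered its notifier never reaches the callback, which is the property the global registration lacked.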
  4. 27 Jul, 2011 1 commit
  5. 19 Jan, 2011 1 commit
    • netfilter: nf_conntrack_tstamp: add flow-based timestamp extension · a992ca2a
      Committed by Pablo Neira Ayuso
      This patch adds flow-based timestamping for conntracks. This
      conntrack extension is disabled by default. Basically, we use
      two 64-bit variables: one stores the creation timestamp once the
      conntrack has been confirmed, and the other stores the deletion
      time. To enable the extension, you have to:
      
      echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp
      
      This patch saves memory for user-space flow-based
      loggers such as ulogd2. In short, ulogd2 does not need to
      keep a hashtable of the conntracks in user-space to know
      when they were created and destroyed; instead, it uses the
      kernel timestamp. If we want to have a sane IPFIX implementation
      in user-space, these nanosecond-resolution timestamps are also
      useful. Other custom user-space applications can benefit from
      this via libnetfilter_conntrack.
      
      This patch modifies the /proc output to display the delta time
      in seconds since the flow start. You can also obtain the
      flow-start date by means of the conntrack-tools.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
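As a rough illustration of the two 64-bit timestamps and the "delta time in seconds" shown in /proc, here is a minimal userspace sketch. The struct layout, the function names, and the convention that `start == 0` means "not yet confirmed" are assumptions made for this example; the commit message only states that two 64-bit values hold the creation and deletion times with nanosecond resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical model of the extension: both values are in nanoseconds. */
struct flow_tstamp {
    uint64_t start;     /* set once the flow is confirmed (0 = unset, assumed) */
    uint64_t stop;      /* set when the flow is destroyed (0 = still alive) */
};

/* Whole seconds elapsed since the flow started, as a /proc-style delta. */
uint64_t flow_age_seconds(const struct flow_tstamp *ts, uint64_t now_ns)
{
    assert(ts != NULL);
    if (ts->start == 0 || now_ns < ts->start)
        return 0;
    return (now_ns - ts->start) / NSEC_PER_SEC;
}

/* Total lifetime in nanoseconds, or 0 if the flow has not ended yet. */
uint64_t flow_lifetime_ns(const struct flow_tstamp *ts)
{
    assert(ts != NULL);
    if (ts->start == 0 || ts->stop < ts->start)
        return 0;
    return ts->stop - ts->start;
}
```

A logger that receives both values with the destroy event can compute the flow duration directly, without tracking creation times itself.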
  6. 14 Jan, 2011 1 commit
  7. 17 Feb, 2010 1 commit
    • percpu: add __percpu sparse annotations to net · 7d720c3e
      Committed by Tejun Heo
      Add __percpu sparse annotations to net.
      
      These annotations are to make sparse consider percpu variables to be
      in a different address space and warn if accessed without going
      through percpu accessors.  This patch doesn't affect normal builds.
      
      The macro and type tricks around snmp stats make things a bit
      interesting.  DEFINE/DECLARE_SNMP_STAT() macros mark the target field
      as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly.  All
      snmp_mib_*() users which used to cast the argument to (void **) are
      updated to cast it to (void __percpu **).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 09 Feb, 2010 3 commits
    • netfilter: nf_conntrack: fix hash resizing with namespaces · d696c7bd
      Committed by Patrick McHardy
      As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
      size is global and not per namespace, but modifiable at runtime through
      /sys/module/nf_conntrack/hashsize. However, changing the hash size only
      resizes the hash in the current namespace, so other namespaces
      will use an invalid hash size. This can cause crashes when enlarging
      the hashsize, or false negative lookups when shrinking it.
      
      Move the hash size into the per-namespace data and only use the global
      hash size to initialize the per-namespace value when instantiating a
      new namespace. Additionally, restrict hash resizing to init_net for
      now, as other namespaces are not handled currently.
      
      Cc: stable@kernel.org
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
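The idea of the fix can be sketched in userspace as follows: the global value only seeds each new namespace, every lookup uses the namespace's own size, and resizing is refused outside the initial namespace. This is a simplified model under stated assumptions; `struct netns`, `netns_setup`, `resize_hash`, `bucket_of` and the default of 16384 buckets are hypothetical names and values chosen for illustration, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Global value: in this model it only seeds new namespaces. */
unsigned int global_hashsize = 16384;   /* hypothetical default */

struct netns {
    bool is_init_net;
    unsigned int htable_size;           /* per-namespace hash size */
};

/* A new namespace copies the global size once, at creation time. */
void netns_setup(struct netns *net, bool is_init_net)
{
    assert(net != NULL);
    net->is_init_net = is_init_net;
    net->htable_size = global_hashsize;
}

/* Resizing is only permitted in the initial namespace. */
int resize_hash(struct netns *net, unsigned int new_size)
{
    if (!net->is_init_net)
        return -EOPNOTSUPP;
    if (new_size == 0)
        return -EINVAL;
    net->htable_size = new_size;
    global_hashsize = new_size;         /* later namespaces start from this */
    return 0;
}

/* Bucket selection always uses the size that belongs to the namespace. */
unsigned int bucket_of(const struct netns *net, unsigned int hash)
{
    return hash % net->htable_size;
}
```

Because `bucket_of` never reads the global value, a resize in the initial namespace cannot make another namespace index past the end of its own table.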
    • netfilter: nf_conntrack: per netns nf_conntrack_cachep · 5b3501fa
      Committed by Eric Dumazet
      nf_conntrack_cachep is currently shared by all netns instances, but
      because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.
      
      If we use a shared slab cache, one object can instantly move from
      one hash table (netns ONE) to another one (netns TWO), and a concurrent
      reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
      can be fooled without notice, because no RCU grace period has to be
      observed between object freeing and its reuse.
      
      We don't have this problem with UDP/TCP slab caches because TCP/UDP
      hashtables are global to the machine (and each object has a pointer to
      its netns).
      
      If we use per netns conntrack hash tables, we also *must* use per netns
      conntrack slab caches, to guarantee an object cannot escape from one
      namespace to another one.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      [Patrick: added unique slab name allocation]
      Cc: stable@kernel.org
      Signed-off-by: Patrick McHardy <kaber@trash.net>
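To make the "object cannot escape from one namespace to another" guarantee concrete, here is a toy userspace model of per-namespace object caches with immediate reuse. It is only a sketch: a fixed pool with a LIFO free list stands in for a slab cache, there is no RCU or concurrency, and every name (`struct netns_cache`, `cache_init`, `cache_alloc`, `cache_free`, `POOL_SIZE`) is a hypothetical choice for this example.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4

struct conn {
    struct conn *next_free;
    int netns_id;                   /* namespace that owns this slot */
};

/* One cache per namespace: freed objects return to their own free list. */
struct netns_cache {
    int netns_id;
    struct conn slots[POOL_SIZE];
    struct conn *free_list;
};

void cache_init(struct netns_cache *c, int netns_id)
{
    c->netns_id = netns_id;
    c->free_list = NULL;
    for (int i = 0; i < POOL_SIZE; i++) {
        c->slots[i].netns_id = netns_id;
        c->slots[i].next_free = c->free_list;
        c->free_list = &c->slots[i];
    }
}

/* Pop the most recently freed slot, or NULL if the pool is exhausted. */
struct conn *cache_alloc(struct netns_cache *c)
{
    struct conn *obj = c->free_list;

    if (obj != NULL)
        c->free_list = obj->next_free;
    return obj;
}

/* Freed objects are reusable at once, but only within the same cache. */
void cache_free(struct netns_cache *c, struct conn *obj)
{
    assert(obj->netns_id == c->netns_id);
    obj->next_free = c->free_list;
    c->free_list = obj;
}
```

With one cache per namespace, an object freed in netns ONE may be handed out again immediately, but only to netns ONE, so a reader walking that namespace's table cannot land on an object owned by netns TWO.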
    • netfilter: nf_conntrack: fix hash resizing with namespaces · 9ab48ddc
      Committed by Patrick McHardy
      As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
      size is global and not per namespace, but modifiable at runtime through
      /sys/module/nf_conntrack/hashsize. However, changing the hash size only
      resizes the hash in the current namespace, so other namespaces
      will use an invalid hash size. This can cause crashes when enlarging
      the hashsize, or false negative lookups when shrinking it.
      
      Move the hash size into the per-namespace data and only use the global
      hash size to initialize the per-namespace value when instantiating a
      new namespace. Additionally, restrict hash resizing to init_net for
      now, as other namespaces are not handled currently.
      
      Cc: stable@kernel.org
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  9. 04 Feb, 2010 1 commit
    • netfilter: nf_conntrack: per netns nf_conntrack_cachep · ab59b19b
      Committed by Eric Dumazet
      nf_conntrack_cachep is currently shared by all netns instances, but
      because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.
      
      If we use a shared slab cache, one object can instantly move from
      one hash table (netns ONE) to another one (netns TWO), and a concurrent
      reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
      can be fooled without notice, because no RCU grace period has to be
      observed between object freeing and its reuse.
      
      We don't have this problem with UDP/TCP slab caches because TCP/UDP
      hashtables are global to the machine (and each object has a pointer to
      its netns).
      
      If we use per netns conntrack hash tables, we also *must* use per netns
      conntrack slab caches, to guarantee an object cannot escape from one
      namespace to another one.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      [Patrick: added unique slab name allocation]
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  10. 13 Jun, 2009 2 commits
    • netfilter: conntrack: optional reliable conntrack event delivery · dd7669a9
      Committed by Pablo Neira Ayuso
      This patch improves ctnetlink event reliability if one broadcast
      listener has set the NETLINK_BROADCAST_ERROR socket option.
      
      The logic is the following: if an event delivery fails, we keep
      the undelivered events in the missed event cache. Once the next
      packet arrives, we add the new events (if any) to the missed
      events in the cache and we try a new delivery, and so on. Thus,
      if ctnetlink fails to deliver an event, we try to deliver it again
      once we see a new packet. Therefore, we may lose state
      transitions but the userspace process gets in sync at some point.
      
      In the worst case, if no events were delivered to userspace, we make
      sure that destroy events are successfully delivered. Basically,
      if ctnetlink fails to deliver the destroy event, we remove the
      conntrack entry from the hashes and insert it in the dying
      list, which contains inactive entries. Then, the conntrack timer
      is added with an extra grace timeout of random32() % 15 seconds
      to trigger the event again (this grace timeout is tunable via
      /proc). The use of a limited random timeout value allows
      distributing the "destroy" resends, thus avoiding accumulating
      lots of "destroy" events at the same time. Events may be delivered
      out of order, but we can identify them by means of the tuple plus
      the conntrack ID.
      
      The maximum number of conntrack entries (active or inactive) is
      still handled by nf_conntrack_max. Thus, we may start dropping
      packets at some point if we accumulate a lot of inactive conntrack
      entries that did not successfully report the destroy event to
      userspace.
      
      During my stress tests, which consisted of setting a very small buffer
      of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
      flag, and generating lots of very small connections, I noticed
      very few destroy entries on the fly waiting to be resent.
      
      A simple way to test this patch consists of creating a lot of
      entries, setting a very small Netlink buffer in conntrackd (+ a patch
      which is not in the git tree to set the BROADCAST_ERROR flag)
      and invoking `conntrack -F'.
      
      For expectations, no changes are introduced in this patch.
      Currently, event delivery is only done for new expectations (no
      events from expectation expiration, removal and confirmation).
      In that case, they need a per-expectation event cache to implement
      the same idea that is exposed in this patch.
      
      This patch can be useful to provide reliable flow accounting. We
      still have to add a new conntrack extension to store the creation
      and destroy time.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
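The retry logic of the missed event cache described in this commit can be sketched as a bitmask that accumulates undelivered events until a delivery succeeds. This is a hedged userspace model, not the ctnetlink implementation: `struct event_cache`, `report_events`, `deliver_fn` and the two sample listeners are hypothetical names, and the event bits are arbitrary values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-flow cache of events that could not be delivered yet (bitmask). */
struct event_cache {
    unsigned int missed;
};

typedef bool (*deliver_fn)(unsigned int events);

/* Sample listeners for illustration: one accepts, one is "buffer full". */
unsigned int last_delivered;

bool listener_ok(unsigned int events)
{
    last_delivered = events;
    return true;
}

bool listener_full(unsigned int events)
{
    (void)events;
    return false;
}

/*
 * Called when a packet produces new events: merge them with the events
 * that were missed earlier and attempt a single delivery. On failure,
 * everything stays in the cache for the next attempt.
 */
bool report_events(struct event_cache *cache, unsigned int new_events,
                   deliver_fn deliver)
{
    unsigned int pending;

    assert(cache != NULL && deliver != NULL);
    pending = cache->missed | new_events;
    if (pending == 0)
        return true;                /* nothing to report */
    if (deliver(pending)) {
        cache->missed = 0;
        return true;
    }
    cache->missed = pending;        /* retry when the next packet arrives */
    return false;
}
```

Since the cache stores only a union of event bits, intermediate state transitions can be lost, but the listener eventually sees every event type that occurred, which matches the "gets in sync at some point" behaviour described in the commit message.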
    • netfilter: conntrack: move event caching to conntrack extension infrastructure · a0891aa6
      Committed by Pablo Neira Ayuso
      This patch reworks the per-cpu event caching to use the conntrack
      extension infrastructure.
      
      The main drawback is that we consume more memory per conntrack
      if event delivery is enabled. This patch is required by the
      reliable event delivery patch that follows this one.
      
      BTW, this patch allows you to enable/disable event delivery via
      /proc/sys/net/netfilter/nf_conntrack_events at runtime, although
      you can still disable event caching as a compile-time option.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  11. 26 Mar, 2009 1 commit
    • netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu() · ea781f19
      Committed by Eric Dumazet
      Use "hlist_nulls" infrastructure we added in 2.6.29 for RCUification of UDP & TCP.
      
      This permits an easy conversion from call_rcu() based hash lists to a
      SLAB_DESTROY_BY_RCU one.
      
      Avoiding call_rcu() delay at nf_conn freeing time has numerous gains.
      
      - It doesn't fill RCU queues (up to 10000 elements per cpu).
      This reduces the possibility of OOM, if queued elements are not taken
      into account. It also reduces latency problems when the RCU queue size
      hits hilimit and triggers emergency mode.
      
      - It allows fast reuse of just freed elements, permitting better use of
      CPU cache.
      
      - We delete rcu_head from "struct nf_conn", shrinking size of this structure
      by 8 or 16 bytes.
      
      This patch only takes care of "struct nf_conn".
      call_rcu() is still used for less critical conntrack parts, that may
      be converted later if necessary.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  12. 08 Oct, 2008 11 commits