1. 06 12月, 2013 2 次提交
    • S
      xfrm: Remove ancient sleeping when the SA is in acquire state · 5b8ef341
      Steffen Klassert 提交于
      We now queue packets to the policy if the states are not yet resolved,
      this replaces the ancient sleeping code. Also the sleeping can cause
      indefinite task hangs if the needed state does not get resolved.
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      5b8ef341
    • F
      xfrm: Namespacify xfrm state/policy locks · 283bc9f3
      Fan Du 提交于
      By semantics, xfrm layer is fully name space aware,
      so will the locks, e.g. xfrm_state/pocliy_lock.
      Ensure exclusive access into state/policy link list
      for different name space with one global lock is not
      right in terms of semantics aspect at first place,
      as they are indeed mutually independent with each
      other, but also more seriously causes scalability
      problem.
      
      One practical scenario is on a Open Network Stack,
      more than hundreds of lxc tenants acts as routers
      within one host, a global xfrm_state/policy_lock
      becomes the bottleneck. But onces those locks are
      decoupled in a per-namespace fashion, locks contend
      is just with in specific name space scope, without
      causing additional SPD/SAD access delay for other
      name space.
      
      Also this patch improve scalability while as without
      changing original xfrm behavior.
      Signed-off-by: NFan Du <fan.du@windriver.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      283bc9f3
  2. 22 10月, 2013 1 次提交
  3. 15 10月, 2013 3 次提交
    • P
      netfilter: nf_tables: add ARP filtering support · ed683f13
      Pablo Neira Ayuso 提交于
      This patch registers the ARP family and he filter chain type
      for this family.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ed683f13
    • P
      netfilter: nfnetlink: add batch support and use it from nf_tables · 0628b123
      Pablo Neira Ayuso 提交于
      This patch adds a batch support to nfnetlink. Basically, it adds
      two new control messages:
      
      * NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch,
        the nfgenmsg->res_id indicates the nfnetlink subsystem ID.
      
      * NFNL_MSG_BATCH_END, that results in the invocation of the
        ss->commit callback function. If not specified or an error
        ocurred in the batch, the ss->abort function is invoked
        instead.
      
      The end message represents the commit operation in nftables, the
      lack of end message results in an abort. This patch also adds the
      .call_batch function that is only called from the batch receival
      path.
      
      This patch adds atomic rule updates and dumps based on
      bitmask generations. This allows to atomically commit a set of
      rule-set updates incrementally without altering the internal
      state of existing nf_tables expressions/matches/targets.
      
      The idea consists of using a generation cursor of 1 bit and
      a bitmask of 2 bits per rule. Assuming the gencursor is 0,
      then the genmask (expressed as a bitmask) can be interpreted
      as:
      
      00 active in the present, will be active in the next generation.
      01 inactive in the present, will be active in the next generation.
      10 active in the present, will be deleted in the next generation.
       ^
       gencursor
      
      Once you invoke the transition to the next generation, the global
      gencursor is updated:
      
      00 active in the present, will be active in the next generation.
      01 active in the present, needs to zero its future, it becomes 00.
      10 inactive in the present, delete now.
      ^
      gencursor
      
      If a dump is in progress and nf_tables enters a new generation,
      the dump will stop and return -EBUSY to let userspace know that
      it has to retry again. In order to invalidate dumps, a global
      genctr counter is increased everytime nf_tables enters a new
      generation.
      
      This new operation can be used from the user-space utility
      that controls the firewall, eg.
      
      nft -f restore
      
      The rule updates contained in `file' will be applied atomically.
      
      cat file
      -----
      add filter INPUT ip saddr 1.1.1.1 counter accept #1
      del filter INPUT ip daddr 2.2.2.2 counter drop   #2
      -EOF-
      
      Note that the rule 1 will be inactive until the transition to the
      next generation, the rule 2 will be evicted in the next generation.
      
      There is a penalty during the rule update due to the branch
      misprediction in the packet matching framework. But that should be
      quickly resolved once the iteration over the commit list that
      contain rules that require updates is finished.
      
      Event notification happens once the rule-set update has been
      committed. So we skip notifications is case the rule-set update
      is aborted, which can happen in case that the rule-set is tested
      to apply correctly.
      
      This patch squashed the following patches from Pablo:
      
      * nf_tables: atomic rule updates and dumps
      * nf_tables: get rid of per rule list_head for commits
      * nf_tables: use per netns commit list
      * nfnetlink: add batch support and use it from nf_tables
      * nf_tables: all rule updates are transactional
      * nf_tables: attach replacement rule after stale one
      * nf_tables: do not allow deletion/replacement of stale rules
      * nf_tables: remove unused NFTA_RULE_FLAGS
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      0628b123
    • P
      netfilter: nf_tables: complete net namespace support · 99633ab2
      Pablo Neira Ayuso 提交于
      Register family per netnamespace to ensure that sets are
      only visible in its approapriate namespace.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      99633ab2
  4. 01 10月, 2013 1 次提交
  5. 10 8月, 2013 2 次提交
  6. 01 8月, 2013 1 次提交
  7. 23 5月, 2013 1 次提交
  8. 06 4月, 2013 2 次提交
    • G
      netfilter: nf_log: prepare net namespace support for loggers · 30e0c6a6
      Gao feng 提交于
      This patch adds netns support to nf_log and it prepares netns
      support for existing loggers. It is composed of four major
      changes.
      
      1) nf_log_register has been split to two functions: nf_log_register
         and nf_log_set. The new nf_log_register is used to globally
         register the nf_logger and nf_log_set is used for enabling
         pernet support from nf_loggers.
      
         Per netns is not yet complete after this patch, it comes in
         separate follow up patches.
      
      2) Add net as a parameter of nf_log_bind_pf. Per netns is not
         yet complete after this patch, it only allows to bind the
         nf_logger to the protocol family from init_net and it skips
         other cases.
      
      3) Adapt all nf_log_packet callers to pass netns as parameter.
         After this patch, this function only works for init_net.
      
      4) Make the sysctl net/netfilter/nf_log pernet.
      Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      30e0c6a6
    • G
      netfilter: make /proc/net/netfilter pernet · f3c1a44a
      Gao feng 提交于
      This patch makes this proc dentry pernet. So far only init_net
      had a /proc/net/netfilter directory.
      Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f3c1a44a
  9. 25 3月, 2013 1 次提交
  10. 06 2月, 2013 1 次提交
  11. 18 1月, 2013 1 次提交
    • F
      netfilter: add connlabel conntrack extension · c539f017
      Florian Westphal 提交于
      similar to connmarks, except labels are bit-based; i.e.
      all labels may be attached to a flow at the same time.
      
      Up to 128 labels are supported.  Supporting more labels
      is possible, but requires increasing the ct offset delta
      from u8 to u16 type due to increased extension sizes.
      
      Mapping of bit-identifier to label name is done in userspace.
      
      The extension is enabled at run-time once "-m connlabel" netfilter
      rules are added.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      c539f017
  12. 07 1月, 2013 1 次提交
  13. 24 12月, 2012 1 次提交
    • P
      netfilter: xt_CT: recover NOTRACK target support · 10db9069
      Pablo Neira Ayuso 提交于
      Florian Westphal reported that the removal of the NOTRACK target
      (96550501 netfilter: remove xt_NOTRACK) is breaking some existing
      setups.
      
      That removal was scheduled for removal since long time ago as
      described in Documentation/feature-removal-schedule.txt
      
      What:  xt_NOTRACK
      Files: net/netfilter/xt_NOTRACK.c
      When:  April 2011
      Why:   Superseded by xt_CT
      
      Still, people may have not notice / may have decided to stick to an
      old iptables version. I agree with him in that some more conservative
      approach by spotting some printk to warn users for some time is less
      agressive.
      
      Current iptables 1.4.16.3 already contains the aliasing support
      that makes it point to the CT target, so upgrading would fix it.
      Still, the policy so far has been to avoid pushing our users to
      upgrade.
      
      As a solution, this patch recovers the NOTRACK target inside the CT
      target and it now spots a warning.
      Reported-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      10db9069
  14. 17 12月, 2012 1 次提交
    • P
      netfilter: xt_CT: fix crash while destroy ct templates · 252b3e8c
      Pablo Neira Ayuso 提交于
      In (d871befe netfilter: ctnetlink: dump entries from the dying and
      unconfirmed lists), we assume that all conntrack objects are
      inserted in any of the existing lists. However, template conntrack
      objects were not. This results in hitting BUG_ON in the
      destroy_conntrack path while removing a rule that uses the CT target.
      
      This patch fixes the situation by adding the template lists, which
      is where template conntrack objects reside now.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      252b3e8c
  15. 26 10月, 2012 1 次提交
    • N
      sctp: Make hmac algorithm selection for cookie generation dynamic · 3c68198e
      Neil Horman 提交于
      Currently sctp allows for the optional use of md5 of sha1 hmac algorithms to
      generate cookie values when establishing new connections via two build time
      config options.  Theres no real reason to make this a static selection.  We can
      add a sysctl that allows for the dynamic selection of these algorithms at run
      time, with the default value determined by the corresponding crypto library
      availability.
      This comes in handy when, for example running a system in FIPS mode, where use
      of md5 is disallowed, but SHA1 is permitted.
      
      Note: This new sysctl has no corresponding socket option to select the cookie
      hmac algorithm.  I chose not to implement that intentionally, as RFC 6458
      contains no option for this value, and I opted not to pollute the socket option
      namespace.
      
      Change notes:
      v2)
      	* Updated subject to have the proper sctp prefix as per Dave M.
      	* Replaced deafult selection options with new options that allow
      	  developers to explicitly select available hmac algs at build time
      	  as per suggestion by Vlad Y.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c68198e
  16. 20 9月, 2012 1 次提交
  17. 19 9月, 2012 1 次提交
  18. 30 8月, 2012 2 次提交
  19. 24 8月, 2012 1 次提交
    • R
      packet: fix broken build. · f63c45e0
      Rami Rosen 提交于
      This patch fixes a broken build due to a missing header:
      ...
        CC      net/ipv4/proc.o
      In file included from include/net/net_namespace.h:15,
                       from net/ipv4/proc.c:35:
      include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
      ...
      
      The lock of netns_packet has been replaced by a recent patch to be a mutex instead of a spinlock,
      but we need to replace the header file to be linux/mutex.h instead of linux/spinlock.h as well.
      
      See commit 0fa7fa98:
      packet: Protect packet sk list with mutex (v2) patch,
      Signed-off-by: NRami Rosen <rosenr@marvell.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f63c45e0
  20. 23 8月, 2012 1 次提交
    • P
      packet: Protect packet sk list with mutex (v2) · 0fa7fa98
      Pavel Emelyanov 提交于
      Change since v1:
      
      * Fixed inuse counters access spotted by Eric
      
      In patch eea68e2f (packet: Report socket mclist info via diag module) I've
      introduced a "scheduling in atomic" problem in packet diag module -- the
      socket list is traversed under rcu_read_lock() while performed under it sk
      mclist access requires rtnl lock (i.e. -- mutex) to be taken.
      
      [152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
      [152363.820573] 4 locks held by crtools/12517:
      [152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
      [152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
      [152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
      [152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af
      
      Similar thing was then re-introduced by further packet diag patches (fanount
      mutex and pgvec mutex for rings) :(
      
      Apart from being terribly sorry for the above, I propose to change the packet
      sk list protection from spinlock to mutex. This lock currently protects two
      modifications:
      
      * sklist
      * prot inuse counters
      
      The sklist modifications can be just reprotected with mutex since they already
      occur in a sleeping context. The inuse counters modifications are trickier -- the
      __this_cpu_-s are used inside, thus requiring the caller to handle the potential
      issues with contexts himself. Since packet sockets' counters are modified in two
      places only (packet_create and packet_release) we only need to protect the context
      from being preempted. BH disabling is not required in this case.
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0fa7fa98
  21. 15 8月, 2012 7 次提交
  22. 31 7月, 2012 1 次提交
  23. 21 7月, 2012 1 次提交
  24. 20 7月, 2012 1 次提交
    • E
      ipv4: tcp: remove per net tcp_sock · be9f4a44
      Eric Dumazet 提交于
      tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
      per network namespace.
      
      This leads to bad behavior on multiqueue NICS, because many cpus
      contend for the socket lock and once socket lock is acquired, extra
      false sharing on various socket fields slow down the operations.
      
      To better resist to attacks, we use a percpu socket. Each cpu can
      run without contention, using appropriate memory (local node)
      
      Additional features :
      
      1) We also mirror the queue_mapping of the incoming skb, so that
      answers use the same queue if possible.
      
      2) Setting SOCK_USE_WRITE_QUEUE socket flag speedup sock_wfree()
      
      3) We now limit the number of in-flight RST/ACK [1] packets
      per cpu, instead of per namespace, and we honor the sysctl_wmem_default
      limit dynamically. (Prior to this patch, sysctl_wmem_default value was
      copied at boot time, so any further change would not affect tcp_sock
      limit)
      
      [1] These packets are only generated when no socket was matched for
      the incoming packet.
      Reported-by: NBill Sommerfeld <wsommerfeld@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be9f4a44
  25. 11 7月, 2012 1 次提交
    • D
      tcp: Maintain dynamic metrics in local cache. · 51c5d0c4
      David S. Miller 提交于
      Maintain a local hash table of TCP dynamic metrics blobs.
      
      Computed TCP metrics are no longer maintained in the route metrics.
      
      The table uses RCU and an extremely simple hash so that it has low
      latency and low overhead.  A simple hash is legitimate because we only
      make metrics blobs for fully established connections.
      
      Some tweaking of the default hash table sizes, metric timeouts, and
      the hash chain length limit certainly could use some tweaking.  But
      the basic design seems sound.
      
      With help from Eric Dumazet and Joe Perches.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51c5d0c4
  26. 06 7月, 2012 1 次提交
  27. 09 6月, 2012 1 次提交
  28. 07 6月, 2012 1 次提交