1. 11 Mar, 2014 1 commit
  2. 07 Mar, 2014 2 commits
  3. 01 Mar, 2014 2 commits
  4. 19 Feb, 2014 1 commit
  5. 12 Feb, 2014 1 commit
    • flowcache: Make flow cache name space aware · ca925cf1
      Fan Du committed
      Inserting an entry into the flow cache, or flushing the flow cache, should
      be scoped per net namespace. In the original implementation, a flush issued
      from a fat netns crammed with flow entries also wipes the entries of a slim
      netns that holds only a few flow cache entries.
      
      Since the flow cache is tightly coupled with IPsec, it is easier to put
      the flow cache global parameters into the xfrm namespace part. The last
      things to do are bumping the flow cache genid and flushing the flow cache,
      which should also be done per net.
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      ca925cf1
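
The per-namespace scoping described in this commit can be illustrated with a small model. This is only a sketch in Python, not the kernel code: `FlowCache`, `NetNamespace`, and the entry keys are hypothetical names chosen for illustration. It shows that a flush in one namespace bumps only that namespace's genid and leaves the other namespace's entries alone.

```python
from dataclasses import dataclass, field


@dataclass
class FlowCache:
    """Toy stand-in for the flow cache state kept per network namespace."""
    entries: dict = field(default_factory=dict)
    genid: int = 0

    def insert(self, key, value):
        self.entries[key] = value

    def flush(self):
        # Bump the generation id and drop only this namespace's entries.
        self.genid += 1
        self.entries.clear()


@dataclass
class NetNamespace:
    """Hypothetical namespace object that owns its own flow cache."""
    name: str
    flow_cache: FlowCache = field(default_factory=FlowCache)


fat = NetNamespace("fat")
slim = NetNamespace("slim")
for i in range(1000):
    fat.flow_cache.insert(("10.0.0.1", i), "policy")
slim.flow_cache.insert(("192.168.0.1", 0), "policy")

# Flushing the fat namespace does not touch the slim one.
fat.flow_cache.flush()
```

With a single global cache, the same flush would have discarded the slim namespace's entry as well, which is the behavior the commit sets out to remove.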
  6. 20 Jan, 2014 1 commit
  7. 15 Jan, 2014 1 commit
  8. 14 Jan, 2014 1 commit
    • ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing · f87c10a8
      Hannes Frederic Sowa committed
      While forwarding we should not use the protocol path mtu to calculate
      the mtu for a forwarded packet but instead use the interface mtu.
      
      We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was
      introduced for multicast forwarding. But as it does not conflict with
      our usage in unicast code path it is perfect for reuse.
      
      I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu
      along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular
      dependencies because of IPSKB_FORWARDED.
      
      Because someone might have written software that probes destinations
      manually and expects the kernel to honour those path mtus, I introduced
      a new per-namespace "ip_forward_use_pmtu" knob so that this new behaviour
      can be disabled. We also still use mtus which are locked on a route for
      forwarding.
      
      The reason for this change is that path mtu information can be injected
      into the kernel via e.g. the icmp_err protocol handler without verification
      against local sockets. As such, this could cause the IPv4 forwarding path
      to wrongfully emit fragmentation needed notifications or start to fragment
      packets along a path.
      
      Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED
      won't be set and further fragmentation logic will use the path mtu to
      determine the fragmentation size. They also recheck packet size with
      help of path mtu discovery and report appropriate errors.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: John Heffner <johnwheffner@gmail.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f87c10a8
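
The MTU selection rule described in this commit can be sketched as a small decision function. This is a Python model written for illustration, not the kernel helper itself: the parameter names are assumptions, and the plain integers stand in for the route and device state the kernel would consult.

```python
def ip_dst_mtu_maybe_forward(path_mtu, dev_mtu, forwarding,
                             ip_forward_use_pmtu=False, mtu_locked=False):
    """Pick the MTU to use for a packet (illustrative model only).

    path_mtu            -- MTU learned for the route (possibly spoofed)
    dev_mtu             -- MTU of the outgoing interface
    forwarding          -- True if the skb was marked IPSKB_FORWARDED
    ip_forward_use_pmtu -- per-namespace knob restoring the old behaviour
    mtu_locked          -- True if the MTU is locked on the route
    """
    if ip_forward_use_pmtu or mtu_locked or not forwarding:
        # Locally generated traffic, locked routes, and namespaces that
        # opted in keep honouring the path MTU.
        return path_mtu
    # Forwarded packets ignore the unverified path MTU.
    return dev_mtu
```

For example, a forwarded packet on a 1500-byte interface keeps using 1500 even if an injected ICMP error lowered the path MTU to 576, unless the knob is enabled or the route MTU is locked.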
  9. 08 Jan, 2014 2 commits
  10. 19 Dec, 2013 1 commit
  11. 13 Dec, 2013 1 commit
  12. 06 Dec, 2013 2 commits
    • xfrm: Remove ancient sleeping when the SA is in acquire state · 5b8ef341
      Steffen Klassert committed
      We now queue packets to the policy if the states are not yet resolved;
      this replaces the ancient sleeping code. Also, the sleeping can cause
      indefinite task hangs if the needed state does not get resolved.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      5b8ef341
    • xfrm: Namespacify xfrm state/policy locks · 283bc9f3
      Fan Du committed
      By its semantics, the xfrm layer is fully name space aware,
      and so should its locks be, e.g. xfrm_state/policy_lock.
      Guarding the state/policy linked lists of different name
      spaces with one global lock is wrong semantically in the
      first place, as the name spaces are mutually independent
      of each other, and more seriously it causes a scalability
      problem.
      
      One practical scenario is an Open Network Stack where
      hundreds of lxc tenants act as routers within one host:
      a global xfrm_state/policy_lock becomes the bottleneck.
      Once those locks are decoupled in a per-namespace fashion,
      lock contention stays within the scope of a specific name
      space, without causing additional SPD/SAD access delay
      for other name spaces.
      
      This patch improves scalability without changing the
      original xfrm behavior.
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      283bc9f3
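
The effect of moving the locks into the namespace can be sketched with ordinary user-space locks. This is a Python analogy, not kernel code: `XfrmNamespace` and its fields are hypothetical names, and `threading.Lock` merely stands in for the kernel locking primitives.

```python
import threading


class XfrmNamespace:
    """Hypothetical per-namespace xfrm state with its own locks."""

    def __init__(self, name):
        self.name = name
        # One lock pair per namespace instead of one global pair.
        self.state_lock = threading.Lock()
        self.policy_lock = threading.Lock()
        self.states = []

    def add_state(self, sa):
        # Only users of this namespace contend on this lock.
        with self.state_lock:
            self.states.append(sa)


tenant_a = XfrmNamespace("tenant-a")
tenant_b = XfrmNamespace("tenant-b")
```

While one tenant holds its `state_lock`, another tenant can still take its own lock immediately, which is the decoupling the commit message argues for.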
  13. 22 Oct, 2013 1 commit
  14. 15 Oct, 2013 3 commits
    • netfilter: nf_tables: add ARP filtering support · ed683f13
      Pablo Neira Ayuso committed
      This patch registers the ARP family and the filter chain type
      for this family.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      ed683f13
    • netfilter: nfnetlink: add batch support and use it from nf_tables · 0628b123
      Pablo Neira Ayuso committed
      This patch adds batch support to nfnetlink. Basically, it adds
      two new control messages:
      
      * NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch,
        the nfgenmsg->res_id indicates the nfnetlink subsystem ID.
      
      * NFNL_MSG_BATCH_END, that results in the invocation of the
        ss->commit callback function. If not specified or an error
        occurred in the batch, the ss->abort function is invoked
        instead.
      
      The end message represents the commit operation in nftables, the
      lack of end message results in an abort. This patch also adds the
      .call_batch function that is only called from the batch receive
      path.
      
      This patch adds atomic rule updates and dumps based on
      bitmask generations. This allows to atomically commit a set of
      rule-set updates incrementally without altering the internal
      state of existing nf_tables expressions/matches/targets.
      
      The idea consists of using a generation cursor of 1 bit and
      a bitmask of 2 bits per rule. Assuming the gencursor is 0,
      then the genmask (expressed as a bitmask) can be interpreted
      as:
      
      00 active in the present, will be active in the next generation.
      01 inactive in the present, will be active in the next generation.
      10 active in the present, will be deleted in the next generation.
       ^
       gencursor
      
      Once you invoke the transition to the next generation, the global
      gencursor is updated:
      
      00 active in the present, will be active in the next generation.
      01 active in the present, needs to zero its future, it becomes 00.
      10 inactive in the present, delete now.
      ^
      gencursor
      
      If a dump is in progress and nf_tables enters a new generation,
      the dump will stop and return -EBUSY to let userspace know that
      it has to retry again. In order to invalidate dumps, a global
      genctr counter is increased every time nf_tables enters a new
      generation.
      
      This new operation can be used from the user-space utility
      that controls the firewall, eg.
      
      nft -f restore
      
      The rule updates contained in `file' will be applied atomically.
      
      cat file
      -----
      add filter INPUT ip saddr 1.1.1.1 counter accept #1
      del filter INPUT ip daddr 2.2.2.2 counter drop   #2
      -EOF-
      
      Note that the rule 1 will be inactive until the transition to the
      next generation, the rule 2 will be evicted in the next generation.
      
      There is a penalty during the rule update due to the branch
      misprediction in the packet matching framework. But that should be
      quickly resolved once the iteration over the commit list that
      contain rules that require updates is finished.
      
      Event notification happens once the rule-set update has been
      committed. So we skip notifications in case the rule-set update
      is aborted, which can happen in case that the rule-set is tested
      to apply correctly.
      
      This patch squashed the following patches from Pablo:
      
      * nf_tables: atomic rule updates and dumps
      * nf_tables: get rid of per rule list_head for commits
      * nf_tables: use per netns commit list
      * nfnetlink: add batch support and use it from nf_tables
      * nf_tables: all rule updates are transactional
      * nf_tables: attach replacement rule after stale one
      * nf_tables: do not allow deletion/replacement of stale rules
      * nf_tables: remove unused NFTA_RULE_FLAGS
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0628b123
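
The generation scheme from this commit message (a 1-bit cursor plus a 2-bit mask per rule) can be modelled in a few lines. This is a simplified Python sketch, not the nf_tables implementation: `RuleSet` and its method names are hypothetical, and it assumes that a set bit means "inactive in that generation", which matches the 00/01/10 table in the message.

```python
class RuleSet:
    """Toy model of rules carrying a 2-bit generation mask."""

    def __init__(self):
        self.gencursor = 0   # which bit describes the present
        self.rules = {}      # rule name -> genmask (0..3)

    def _next(self):
        return self.gencursor ^ 1

    def is_active(self, name):
        # A set bit marks the rule inactive in that generation.
        return (self.rules[name] & (1 << self.gencursor)) == 0

    def is_active_next(self, name):
        return (self.rules[name] & (1 << self._next())) == 0

    def add(self, name):
        # Inactive in the present, active in the next generation.
        self.rules[name] = 1 << self.gencursor

    def delete(self, name):
        # Active in the present, gone in the next generation.
        self.rules[name] = 1 << self._next()

    def commit(self):
        # Flip the cursor, then drop dead rules and zero the rest.
        self.gencursor = self._next()
        for name in list(self.rules):
            if self.is_active(name):
                self.rules[name] = 0
            else:
                del self.rules[name]

    def abort(self):
        # Undo pending changes without changing the generation.
        for name in list(self.rules):
            if self.is_active(name):
                self.rules[name] = 0
            else:
                del self.rules[name]


ruleset = RuleSet()
ruleset.rules["drop-2.2.2.2"] = 0   # rule that already exists
ruleset.add("accept-1.1.1.1")       # like rule #1 in the example file
ruleset.delete("drop-2.2.2.2")      # like rule #2 in the example file
```

Until `commit()` runs, packet matching that checks `is_active()` still sees the old rule and not the new one; a single cursor flip then switches both at once.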
    • netfilter: nf_tables: complete net namespace support · 99633ab2
      Pablo Neira Ayuso committed
      Register family per net namespace to ensure that sets are
      only visible in their appropriate namespace.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      99633ab2
  15. 01 Oct, 2013 1 commit
  16. 10 Aug, 2013 2 commits
  17. 01 Aug, 2013 1 commit
  18. 23 May, 2013 1 commit
  19. 06 Apr, 2013 2 commits
    • netfilter: nf_log: prepare net namespace support for loggers · 30e0c6a6
      Gao feng committed
      This patch adds netns support to nf_log and it prepares netns
      support for existing loggers. It is composed of four major
      changes.
      
      1) nf_log_register has been split to two functions: nf_log_register
         and nf_log_set. The new nf_log_register is used to globally
         register the nf_logger and nf_log_set is used for enabling
         pernet support from nf_loggers.
      
         Per netns is not yet complete after this patch, it comes in
         separate follow up patches.
      
      2) Add net as a parameter of nf_log_bind_pf. Per netns is not
         yet complete after this patch, it only allows to bind the
         nf_logger to the protocol family from init_net and it skips
         other cases.
      
      3) Adapt all nf_log_packet callers to pass netns as parameter.
         After this patch, this function only works for init_net.
      
      4) Make the sysctl net/netfilter/nf_log pernet.
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      30e0c6a6
    • netfilter: make /proc/net/netfilter pernet · f3c1a44a
      Gao feng committed
      This patch makes this proc dentry pernet. So far only init_net
      had a /proc/net/netfilter directory.
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      f3c1a44a
  20. 25 Mar, 2013 1 commit
  21. 06 Feb, 2013 1 commit
  22. 18 Jan, 2013 1 commit
    • netfilter: add connlabel conntrack extension · c539f017
      Florian Westphal committed
      similar to connmarks, except labels are bit-based; i.e.
      all labels may be attached to a flow at the same time.
      
      Up to 128 labels are supported.  Supporting more labels
      is possible, but requires increasing the ct offset delta
      from u8 to u16 type due to increased extension sizes.
      
      Mapping of bit-identifier to label name is done in userspace.
      
      The extension is enabled at run-time once "-m connlabel" netfilter
      rules are added.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      c539f017
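
The difference from connmarks, namely that labels are independent bits so several can be attached to one flow, can be sketched with a bitmap. This is an illustrative Python model, not the conntrack extension: `ConnLabels` and the label names are hypothetical, and the name-to-bit table plays the role of the userspace mapping mentioned in the message.

```python
MAX_LABELS = 128  # limit stated in the commit message


class ConnLabels:
    """Toy bitmap of labels attached to one flow."""

    def __init__(self):
        self.bits = 0

    @staticmethod
    def _check(bit):
        if not 0 <= bit < MAX_LABELS:
            raise ValueError(f"label bit {bit} out of range")

    def set(self, bit):
        self._check(bit)
        self.bits |= 1 << bit

    def clear(self, bit):
        self._check(bit)
        self.bits &= ~(1 << bit)

    def test(self, bit):
        self._check(bit)
        return bool(self.bits & (1 << bit))


# Hypothetical userspace mapping of label names to bit identifiers.
LABEL_BITS = {"vpn": 0, "guest": 5, "voip": 127}

flow = ConnLabels()
flow.set(LABEL_BITS["vpn"])
flow.set(LABEL_BITS["voip"])
```

Both labels are set on the same flow at once, whereas a single connmark value would have to encode such combinations by convention.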
  23. 07 Jan, 2013 1 commit
  24. 24 Dec, 2012 1 commit
    • netfilter: xt_CT: recover NOTRACK target support · 10db9069
      Pablo Neira Ayuso committed
      Florian Westphal reported that the removal of the NOTRACK target
      (96550501 netfilter: remove xt_NOTRACK) is breaking some existing
      setups.
      
      That removal had been scheduled a long time ago, as
      described in Documentation/feature-removal-schedule.txt
      
      What:  xt_NOTRACK
      Files: net/netfilter/xt_NOTRACK.c
      When:  April 2011
      Why:   Superseded by xt_CT
      
      Still, people may not have noticed, or may have decided to stick to an
      old iptables version. I agree with him that a more conservative
      approach, printing a warning to users via printk for some time, is
      less aggressive.
      
      Current iptables 1.4.16.3 already contains the aliasing support
      that makes it point to the CT target, so upgrading would fix it.
      Still, the policy so far has been to avoid pushing our users to
      upgrade.
      
      As a solution, this patch recovers the NOTRACK target inside the CT
      target, and it now prints a warning.
      Reported-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      10db9069
  25. 17 Dec, 2012 1 commit
    • netfilter: xt_CT: fix crash while destroy ct templates · 252b3e8c
      Pablo Neira Ayuso committed
      In (d871befe netfilter: ctnetlink: dump entries from the dying and
      unconfirmed lists), we assume that all conntrack objects are
      inserted in any of the existing lists. However, template conntrack
      objects were not. This results in hitting BUG_ON in the
      destroy_conntrack path while removing a rule that uses the CT target.
      
      This patch fixes the situation by adding the template lists, which
      is where template conntrack objects reside now.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      252b3e8c
  26. 26 Oct, 2012 1 commit
    • sctp: Make hmac algorithm selection for cookie generation dynamic · 3c68198e
      Neil Horman committed
      Currently sctp allows for the optional use of md5 or sha1 hmac algorithms to
      generate cookie values when establishing new connections via two build time
      config options.  There's no real reason to make this a static selection.  We can
      add a sysctl that allows for the dynamic selection of these algorithms at run
      time, with the default value determined by the corresponding crypto library
      availability.
      This comes in handy when, for example, running a system in FIPS mode, where use
      of md5 is disallowed, but SHA1 is permitted.
      
      Note: This new sysctl has no corresponding socket option to select the cookie
      hmac algorithm.  I chose not to implement that intentionally, as RFC 6458
      contains no option for this value, and I opted not to pollute the socket option
      namespace.
      
      Change notes:
      v2)
      	* Updated subject to have the proper sctp prefix as per Dave M.
      	* Replaced default selection options with new options that allow
      	  developers to explicitly select available hmac algs at build time
      	  as per suggestion by Vlad Y.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c68198e
  27. 20 Sep, 2012 1 commit
  28. 19 Sep, 2012 1 commit
  29. 30 Aug, 2012 2 commits
  30. 24 Aug, 2012 1 commit
    • packet: fix broken build. · f63c45e0
      Rami Rosen committed
      This patch fixes a broken build due to a missing header:
      ...
        CC      net/ipv4/proc.o
      In file included from include/net/net_namespace.h:15,
                       from net/ipv4/proc.c:35:
      include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
      ...
      
      The lock of netns_packet has been replaced by a recent patch to be a mutex instead of a spinlock,
      but we need to replace the header file to be linux/mutex.h instead of linux/spinlock.h as well.
      
      See commit 0fa7fa98:
      packet: Protect packet sk list with mutex (v2) patch,
      Signed-off-by: Rami Rosen <rosenr@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f63c45e0
  31. 23 Aug, 2012 1 commit
    • packet: Protect packet sk list with mutex (v2) · 0fa7fa98
      Pavel Emelyanov committed
      Change since v1:
      
      * Fixed inuse counters access spotted by Eric
      
      In patch eea68e2f (packet: Report socket mclist info via diag module) I've
      introduced a "scheduling in atomic" problem in packet diag module -- the
      socket list is traversed under rcu_read_lock(), while the sk mclist access
      performed under it requires the rtnl lock (i.e. a mutex) to be taken.
      
      [152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
      [152363.820573] 4 locks held by crtools/12517:
      [152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
      [152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
      [152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
      [152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af
      
      A similar thing was then re-introduced by further packet diag patches (fanout
      mutex and pgvec mutex for rings) :(
      
      Apart from being terribly sorry for the above, I propose to change the packet
      sk list protection from spinlock to mutex. This lock currently protects two
      modifications:
      
      * sklist
      * prot inuse counters
      
      The sklist modifications can be just reprotected with mutex since they already
      occur in a sleeping context. The inuse counters modifications are trickier -- the
      __this_cpu_-s are used inside, thus requiring the caller to handle the potential
      issues with contexts himself. Since packet sockets' counters are modified in two
      places only (packet_create and packet_release) we only need to protect the context
      from being preempted. BH disabling is not required in this case.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0fa7fa98