1. 11 Jul 2016, 1 commit
  2. 24 Jun 2016, 1 commit
    • netfilter: nft_hash: support deletion of inactive elements · 8eee54be
      Pablo Neira Ayuso authored
      New elements are inactive in the preparation phase, and their
      NFT_SET_ELEM_BUSY_MASK flag is set.
      
      This busy flag doesn't allow us to delete them from the same transaction,
      in a sequence like:
      
      	begin transaction
      	add element X
      	delete element X
      	end transaction
      
      This sequence is valid and may be triggered by robots. To resolve this
      problem, allow deactivating elements that are active in the current
      generation (i.e. those that have just been added in this batch).
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 15 Jun 2016, 1 commit
    • netfilter: nf_tables: reject loops from set element jump to chain · 8588ac09
      Pablo Neira Ayuso authored
      Liping Zhang says:
      
      "Users may add such a wrong nft rules successfully, which will cause an
      endless jump loop:
      
        # nft add rule filter test tcp dport vmap {1: jump test}
      
      This is because before we commit, the element in the current anonymous
      set is inactive, so ops->walk will skip this element and miss the
      validate check."
      
      To resolve this problem, this patch passes the generation mask to the
      walk function through the iter container structure depending on the code
      path:
      
      1) If we're dumping the elements, then we have to check if the element
         is active in the current generation. Thus, we check for the current
         bit in the genmask.
      
      2) If we're checking for loops, then we have to check if the element is
         active in the next generation, as we're in the middle of a
         transaction. Thus, we check for the next bit in the genmask.
      
      Based on original patch from Liping Zhang.
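
      A minimal sketch of this choice, assuming a hypothetical helper built on the
      existing nft_genmask_cur()/nft_genmask_next() helpers (illustrative only, not
      the upstream wiring):

      	#include <net/netfilter/nf_tables.h>

      	/* Pick which generation bit the walker tests, depending on the
      	 * code path described above. */
      	static u8 iter_genmask(const struct net *net, bool checking_loops)
      	{
      		/* loop check runs mid-transaction: test the next bit */
      		if (checking_loops)
      			return nft_genmask_next(net);
      		/* dump path: only currently active elements matter */
      		return nft_genmask_cur(net);
      	}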
      Reported-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Tested-by: Liping Zhang <liping.zhang@spreadtrum.com>
  4. 05 Apr 2016, 1 commit
  5. 13 Apr 2015, 4 commits
  6. 08 Apr 2015, 2 commits
  7. 01 Apr 2015, 1 commit
  8. 26 Mar 2015, 7 commits
  9. 25 Mar 2015, 2 commits
  10. 21 Mar 2015, 1 commit
    • netfilter: Convert nft_hash to inlined rhashtable · fa377321
      Herbert Xu authored
      This patch converts nft_hash to the inlined rhashtable interface.
      
      This patch also replaces the call to rhashtable_lookup_compare with
      a straight rhashtable_lookup_fast because it's simply doing a memcmp
      (in fact nft_hash_lookup already uses memcmp instead of nft_data_cmp).
      
      Furthermore, the compare function is only meant to compare; it is not
      supposed to have side effects. The current side-effect code can simply
      be moved into nft_hash_get.
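
      A hedged usage sketch of the inlined interface with rhashtable_lookup_fast();
      the element structure and parameters below are illustrative, not the actual
      nft_hash definitions:

      	#include <linux/rhashtable.h>

      	struct demo_elem {			/* hypothetical element type */
      		struct rhash_head	node;
      		u32			key;
      	};

      	static const struct rhashtable_params demo_params = {
      		.head_offset	= offsetof(struct demo_elem, node),
      		.key_offset	= offsetof(struct demo_elem, key),
      		.key_len	= sizeof(u32),	/* fixed-size key, plain memcmp() */
      	};

      	static struct demo_elem *demo_lookup(struct rhashtable *ht, const u32 *key)
      	{
      		/* no compare callback and no side effects in the lookup path */
      		return rhashtable_lookup_fast(ht, key, demo_params);
      	}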
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 13 Mar 2015, 1 commit
  12. 28 Feb 2015, 1 commit
  13. 05 Feb 2015, 1 commit
  14. 04 Jan 2015, 4 commits
    • rhashtable: Per bucket locks & deferred expansion/shrinking · 97defe1e
      Thomas Graf authored
      Introduces an array of spinlocks to protect bucket mutations. The number
      of spinlocks per CPU is configurable and selected based on the hash of
      the bucket. This allows for parallel insertions and removals of entries
      which do not share a lock.
      
      The patch also defers expansion and shrinking to a worker queue which
      allows insertion and removal from atomic context. Insertions and
      deletions may occur in parallel to it and are only held up briefly
      while the particular bucket is linked or unzipped.
      
      Mutations of the bucket table pointer are protected by a new mutex; read
      access is RCU protected.
      
      In the event of an expansion or shrinking, the newly allocated bucket
      table is exposed as a so-called future table as soon as the resize
      process starts. Lookups, deletions, and insertions will briefly use both
      tables. The future table becomes the main table after an RCU grace
      period and the initial linking of the old to the new table has been
      performed. Optimization of the chains to make use of the new number of
      buckets follows only once the new table is in use.
      
      The side effect of this is that during that RCU grace period, a bucket
      traversal using any rht_for_each() variant on the main table will not see
      any insertions performed during the RCU grace period which would at that
      point land in the future table. The lookup will see them as it searches
      both tables if needed.
      
      Having multiple insertions and removals occur in parallel requires nelems
      to become an atomic counter.
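
      A hedged sketch of the bucket-lock selection described above; the field and
      function names are illustrative and the real bucket table layout differs:

      	#include <linux/spinlock.h>

      	struct demo_bucket_table {
      		unsigned int	size;		/* number of buckets */
      		unsigned int	locks_mask;	/* nr_locks - 1, nr_locks a power of two */
      		spinlock_t	*locks;		/* one lock covers several buckets */
      	};

      	/* Two mutations hitting different buckets usually map to different
      	 * locks and can therefore proceed in parallel. */
      	static inline spinlock_t *demo_bucket_lock(const struct demo_bucket_table *tbl,
      						   u32 hash)
      	{
      		return &tbl->locks[hash & tbl->locks_mask];
      	}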
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nft_hash: Remove rhashtable_remove_pprev() · 897362e4
      Thomas Graf authored
      The removal function of nft_hash currently stores a reference to the
      previous element during lookup which is used to optimize removal later
      on. This was possible because a lock is held throughout calling
      rhashtable_lookup() and rhashtable_remove().
      
      With the introduction of deferred table resizing in parallel to lookups
      and insertions, the nftables lock will no longer synchronize all table
      mutations and the stored pprev may become invalid.
      
      Removing this optimization makes removal slightly more expensive on
      average but allows taking the resize cost out of the insert and
      remove path.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: Convert bucket iterators to take table and index · 88d6ed15
      Thomas Graf authored
      This patch is in preparation to introduce per bucket spinlocks. It
      extends all iterator macros to take the bucket table and bucket
      index. It also introduces a new rht_dereference_bucket() to
      handle protected accesses to buckets.
      
      It introduces a barrier() to the RCU iterators to prevent the compiler
      from caching the first element.
      
      The lockdep verifier is introduced as a stub which always succeeds; it is
      properly implemented in the next patch, when the locks are introduced.
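
      A hedged usage sketch of the extended iterators, assuming the post-patch form
      rht_for_each_entry(tpos, pos, tbl, hash, member); the element struct is
      illustrative and the caller is assumed to hold the bucket lock:

      	#include <linux/rhashtable.h>

      	struct demo_elem {			/* hypothetical element type */
      		struct rhash_head	node;
      		u32			key;
      	};

      	static struct demo_elem *demo_bucket_find(struct bucket_table *tbl,
      						  unsigned int hash, u32 key)
      	{
      		struct demo_elem *elem;
      		struct rhash_head *pos;

      		/* the bucket table and bucket index are now passed explicitly */
      		rht_for_each_entry(elem, pos, tbl, hash, node) {
      			if (elem->key == key)
      				return elem;
      		}
      		return NULL;
      	}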
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: Do hashing inside of rhashtable_lookup_compare() · 8d24c0b4
      Thomas Graf authored
      Hash the key inside of rhashtable_lookup_compare() like
      rhashtable_lookup() does. This makes it possible to simplify the hashing
      functions and keep them private.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 14 Nov 2014, 3 commits
  16. 03 Sep 2014, 1 commit
    • netfilter: nft_hash: no need for rcu in the hash set destroy path · 39f39016
      Pablo Neira Ayuso authored
      The sets are released from the rcu callback, after the rule is removed
      from the chain list, which implies that nfnetlink cannot update the
      hashes (thus, no resizing may occur) and no packets are walking on the
      set anymore.
      
      This resolves a lockdep splat in the nft_hash_destroy() path since the
      nfnl mutex is not held there.
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      3.16.0-rc2+ #168 Not tainted
      -------------------------------
      net/netfilter/nft_hash.c:362 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 1
      1 lock held by ksoftirqd/0/3:
       #0:  (rcu_callback){......}, at: [<ffffffff81096393>] rcu_process_callbacks+0x27e/0x4c7
      
      stack backtrace:
      CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.16.0-rc2+ #168
      Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
       0000000000000001 ffff88011769bb98 ffffffff8142c922 0000000000000006
       ffff880117694090 ffff88011769bbc8 ffffffff8107c3ff ffff8800cba52400
       ffff8800c476bea8 ffff8800c476bea8 ffff8800cba52400 ffff88011769bc08
      Call Trace:
       [<ffffffff8142c922>] dump_stack+0x4e/0x68
       [<ffffffff8107c3ff>] lockdep_rcu_suspicious+0xfa/0x103
       [<ffffffffa079931e>] nft_hash_destroy+0x50/0x137 [nft_hash]
       [<ffffffffa078cd57>] nft_set_destroy+0x11/0x2a [nf_tables]
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Thomas Graf <tgraf@suug.ch>
  17. 03 Aug 2014, 1 commit
  18. 05 Jun 2014, 1 commit
  19. 03 Apr 2014, 2 commits
    • netfilter: nft_hash: use set global element counter instead of private one · 2c96c25d
      Patrick McHardy authored
      Now that nf_tables performs global accounting of set elements, a private
      counter is not needed in the hash type anymore.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: implement proper set selection · c50b960c
      Patrick McHardy authored
      The current set selection simply chooses the first set type that provides
      the requested features, which always results in the rbtree being chosen
      by virtue of being the first set in the list.
      
      What we actually want to do is choose the implementation that can provide
      the requested features and is optimal from either a performance or memory
      perspective depending on the characteristics of the elements and the
      preferences specified by the user.
      
      The elements are not known when creating a set. Even if we provided them
      for anonymous (literal) sets, we'd still have standalone sets where the
      elements are not known in advance. We therefore need an abstract
      description of the data characteristics.
      
      The kernel already knows the size of the key, this patch starts by
      introducing a nested set description which so far contains only the maximum
      amount of elements. Based on this the set implementations are changed to
      provide an estimate of the required amount of memory and the lookup
      complexity class.
      
      The set ops have a new callback ->estimate() that is invoked during set
      selection. It receives a structure containing the attributes known to the
      kernel and is supposed to populate a struct nft_set_estimate with the
      complexity class and, in case the size is known, the complete amount of
      memory required, or the amount of memory required per element otherwise.
      
      Based on the policy specified by the user (performance/memory, defaulting
      to performance) the kernel will then select the best suited implementation.
      
      Even if the set implementation would allow adding more than the specified
      maximum number of elements, the limit is enforced, since implementations
      might not be able to hold more than the maximum based on which they were
      selected.
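
      A hedged sketch of an ->estimate() callback along these lines; the element
      layout is hypothetical and the exact upstream signature may differ slightly:

      	#include <net/netfilter/nf_tables.h>

      	struct demo_hash_elem {			/* hypothetical element layout */
      		struct demo_hash_elem	*next;
      		u8			key[];
      	};

      	static bool demo_hash_estimate(const struct nft_set_desc *desc, u32 features,
      				       struct nft_set_estimate *est)
      	{
      		if (desc->size)		/* maximum number of elements is known */
      			est->size = desc->size *
      				    (sizeof(struct demo_hash_elem) + desc->klen);
      		else			/* unknown: report the per-element cost */
      			est->size = sizeof(struct demo_hash_elem) + desc->klen;

      		est->class = NFT_SET_CLASS_O_1;	/* average-case O(1) lookups */
      		return true;
      	}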
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  20. 19 Mar 2014, 1 commit
  21. 07 Mar 2014, 1 commit
    • netfilter: nft_hash: bug fixes and resizing · ce6eb0d7
      Patrick McHardy authored
      The hash set type is very broken and was never meant to be merged in this
      state. Missing RCU synchronization on element removal, leaking chain
      refcounts when used as a verdict map, races during lookups, a fixed table
      size are probably just some of the problems. Luckily it is currently
      never chosen by the kernel when the rbtree type is also available.
      
      Rewrite it to be usable.
      
      The new implementation supports automatic hash table resizing using RCU,
      based on Paul McKenney's and Josh Triplett's algorithm "Optimized Resizing
      For RCU-Protected Hash Tables" described in [1].
      
      Resizing doesn't require a second list head in the elements; it works by
      choosing a hash function that remaps elements to a predictable set of
      buckets, only resizing by integral factors, and
      
      - during expansion: linking new buckets to the old bucket that contains
        elements for any of the new buckets, thereby creating imprecise chains,
        then incrementally separating the elements until the new buckets only
        contain elements that hash directly to them.
      
      - during shrinking: linking the hash chains of all old buckets that hash
        to the same new bucket to form a single chain.
      
      Expansion requires at most as many grace periods as there are elements in
      the longest hash chain; shrinking requires a single grace period.
      
      Due to the requirement of having hash chains/elements linked to multiple
      buckets during resizing, hand-rolled singly linked lists are used instead
      of the existing list helpers, which don't support this in a clean fashion.
      As a side effect, the amount of memory required per element is reduced by
      one pointer.
      
      Expansion is triggered when the load factor exceeds 75%, shrinking when
      the load factor goes below 30%. Both operations are allowed to fail and
      will be retried on the next insertion or removal if their respective
      conditions still hold.
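
      These trigger conditions boil down to simple integer arithmetic; a hedged
      sketch with illustrative names (the element and bucket counts would come
      from the hash table state):

      	#include <linux/types.h>

      	/* load factor above 75%: schedule an expansion */
      	static bool demo_needs_expansion(unsigned int nelems, unsigned int nbuckets)
      	{
      		return nelems > nbuckets * 3 / 4;
      	}

      	/* load factor below 30%: schedule a shrink */
      	static bool demo_needs_shrinking(unsigned int nelems, unsigned int nbuckets)
      	{
      		return nelems * 10 < nbuckets * 3;
      	}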
      
      [1] http://dl.acm.org/citation.cfm?id=2002181.2002192

      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  22. 06 Jan 2014, 1 commit
  23. 20 Dec 2013, 1 commit