1. 05 12月, 2015 1 次提交
    • H
      rhashtable: Prevent spurious EBUSY errors on insertion · 3cf92222
      Herbert Xu 提交于
      Thomas and Phil observed that under stress rhashtable insertion
      sometimes failed with EBUSY, even though this error should only
      ever been seen when we're under attack and our hash chain length
      has grown to an unacceptable level, even after a rehash.
      
      It turns out that the logic for detecting whether there is an
      existing rehash is faulty.  In particular, when two threads both
      try to grow the same table at the same time, one of them may see
      the newly grown table and thus erroneously conclude that it had
      been rehashed.  This is what leads to the EBUSY error.
      
      This patch fixes this by remembering the current last table we
      used during insertion so that rhashtable_insert_rehash can detect
      when another thread has also done a resize/rehash.  When this is
      detected we will give up our resize/rehash and simply retry the
      insertion with the new table.
      Reported-by: NThomas Graf <tgraf@suug.ch>
      Reported-by: NPhil Sutter <phil@nwl.cc>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: NPhil Sutter <phil@nwl.cc>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3cf92222
  2. 17 5月, 2015 1 次提交
    • H
      rhashtable: Add cap on number of elements in hash table · 07ee0722
      Herbert Xu 提交于
      We currently have no limit on the number of elements in a hash table.
      This is a problem because some users (tipc) set a ceiling on the
      maximum table size and when that is reached the hash table may
      degenerate.  Others may encounter OOM when growing and if we allow
      insertions when that happens the hash table perofrmance may also
      suffer.
      
      This patch adds a new paramater insecure_max_entries which becomes
      the cap on the table.  If unset it defaults to max_size * 2.  If
      it is also zero it means that there is no cap on the number of
      elements in the table.  However, the table will grow whenever the
      utilisation hits 100% and if that growth fails, you will get ENOMEM
      on insertion.
      
      As allowing oversubscription is potentially dangerous, the name
      contains the word insecure.
      
      Note that the cap is not a hard limit.  This is done for performance
      reasons as enforcing a hard limit will result in use of atomic ops
      that are heavier than the ones we currently use.
      
      The reasoning is that we're only guarding against a gross over-
      subscription of the table, rather than a small breach of the limit.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07ee0722
  3. 24 4月, 2015 1 次提交
    • J
      rhashtable: don't attempt to grow when at max_size · 1d8dc3d3
      Johannes Berg 提交于
      The conversion of mac80211's station table to rhashtable had a bug
      that I found by accident in code review, that hadn't been found as
      rhashtable apparently managed to have a maximum hash chain length
      of one (!) in all our testing.
      
      In order to test the bug and verify the fix I set my rhashtable's
      max_size very low (4) in order to force getting hash collisions.
      
      At that point, rhashtable WARNed in rhashtable_insert_rehash() but
      didn't actually reject the hash table insertion. This caused it to
      lose insertions - my master list of stations would have 9 entries,
      but the rhashtable only had 5. This may warrant a deeper look, but
      that WARN_ON() just shouldn't happen.
      
      Fix this by not returning true from rht_grow_above_100() when the
      rhashtable's max_size has been reached - in this case the user is
      explicitly configuring it to be at most that big, so even if it's
      now above 100% it shouldn't attempt to resize.
      
      This fixes the "lost insertion" issue and consequently allows my
      code to display its error (and verify my fix for it.)
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d8dc3d3
  4. 26 3月, 2015 1 次提交
  5. 25 3月, 2015 5 次提交
  6. 24 3月, 2015 5 次提交
  7. 21 3月, 2015 4 次提交
  8. 19 3月, 2015 3 次提交
  9. 15 3月, 2015 4 次提交
  10. 13 3月, 2015 1 次提交
    • D
      rhashtable: kill ht->shift atomic operations · a5b6846f
      Daniel Borkmann 提交于
      Commit c0c09bfd ("rhashtable: avoid unnecessary wakeup for worker
      queue") changed ht->shift to be atomic, which is actually unnecessary.
      
      Instead of leaving the current shift in the core rhashtable structure,
      it can be cached inside the individual bucket tables.
      
      There, it will only be initialized once during a new table allocation
      in the shrink/expansion slow path, and from then onward it stays immutable
      for the rest of the bucket table liftime.
      
      That allows shift to be non-atomic. The patch also moves hash_rnd
      management into the table setup. The rhashtable structure now consumes
      3 instead of 4 cachelines.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Ying Xue <ying.xue@windriver.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5b6846f
  11. 12 3月, 2015 1 次提交
  12. 28 2月, 2015 1 次提交
  13. 22 2月, 2015 1 次提交
  14. 05 2月, 2015 1 次提交
    • H
      rhashtable: Introduce rhashtable_walk_* · f2dba9c6
      Herbert Xu 提交于
      Some existing rhashtable users get too intimate with it by walking
      the buckets directly.  This prevents us from easily changing the
      internals of rhashtable.
      
      This patch adds the helpers rhashtable_walk_init/exit/start/next/stop
      which will replace these custom walkers.
      
      They are meant to be usable for both procfs seq_file walks as well
      as walking by a netlink dump.  The iterator structure should fit
      inside a netlink dump cb structure, with at least one element to
      spare.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f2dba9c6
  15. 26 1月, 2015 1 次提交
  16. 16 1月, 2015 1 次提交
    • Y
      rhashtable: Fix race in rhashtable_destroy() and use regular work_struct · 57699a40
      Ying Xue 提交于
      When we put our declared work task in the global workqueue with
      schedule_delayed_work(), its delay parameter is always zero.
      Therefore, we should define a regular work in rhashtable structure
      instead of a delayed work.
      
      By the way, we add a condition to check whether resizing functions
      are NULL before cancelling the work, avoiding to cancel an
      uninitialized work.
      
      Lastly, while we wait for all work items we submitted before to run
      to completion with cancel_delayed_work(), ht->mutex has been taken in
      rhashtable_destroy(). Moreover, cancel_delayed_work() doesn't return
      until all work items are accomplished, and when work items are
      scheduled, the work's function - rht_deferred_worker() will be called.
      However, as rht_deferred_worker() also needs to acquire the lock,
      deadlock might happen at the moment as the lock is already held before.
      So if the cancel work function is moved out of the lock covered scope,
      this will avoid the deadlock.
      
      Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57699a40
  17. 14 1月, 2015 2 次提交
  18. 09 1月, 2015 2 次提交
  19. 05 1月, 2015 1 次提交
  20. 04 1月, 2015 3 次提交
    • T
      rhashtable: Supports for nulls marker · f89bd6f8
      Thomas Graf 提交于
      In order to allow for wider usage of rhashtable, use a special nulls
      marker to terminate each chain. The reason for not using the existing
      nulls_list is that the prev pointer usage would not be valid as entries
      can be linked in two different buckets at the same time.
      
      The 4 nulls base bits can be set through the rhashtable_params structure
      like this:
      
      struct rhashtable_params params = {
              [...]
              .nulls_base = (1U << RHT_BASE_SHIFT),
      };
      
      This reduces the hash length from 32 bits to 27 bits.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f89bd6f8
    • T
      rhashtable: Per bucket locks & deferred expansion/shrinking · 97defe1e
      Thomas Graf 提交于
      Introduces an array of spinlocks to protect bucket mutations. The number
      of spinlocks per CPU is configurable and selected based on the hash of
      the bucket. This allows for parallel insertions and removals of entries
      which do not share a lock.
      
      The patch also defers expansion and shrinking to a worker queue which
      allows insertion and removal from atomic context. Insertions and
      deletions may occur in parallel to it and are only held up briefly
      while the particular bucket is linked or unzipped.
      
      Mutations of the bucket table pointer is protected by a new mutex, read
      access is RCU protected.
      
      In the event of an expansion or shrinking, the new bucket table allocated
      is exposed as a so called future table as soon as the resize process
      starts.  Lookups, deletions, and insertions will briefly use both tables.
      The future table becomes the main table after an RCU grace period and
      initial linking of the old to the new table was performed. Optimization
      of the chains to make use of the new number of buckets follows only the
      new table is in use.
      
      The side effect of this is that during that RCU grace period, a bucket
      traversal using any rht_for_each() variant on the main table will not see
      any insertions performed during the RCU grace period which would at that
      point land in the future table. The lookup will see them as it searches
      both tables if needed.
      
      Having multiple insertions and removals occur in parallel requires nelems
      to become an atomic counter.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97defe1e
    • T
      nft_hash: Remove rhashtable_remove_pprev() · 897362e4
      Thomas Graf 提交于
      The removal function of nft_hash currently stores a reference to the
      previous element during lookup which is used to optimize removal later
      on. This was possible because a lock is held throughout calling
      rhashtable_lookup() and rhashtable_remove().
      
      With the introdution of deferred table resizing in parallel to lookups
      and insertions, the nftables lock will no longer synchronize all
      table mutations and the stored pprev may become invalid.
      
      Removing this optimization makes removal slightly more expensive on
      average but allows taking the resize cost out of the insert and
      remove path.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      897362e4