1. 15 Mar 2015, 1 commit
    • rhashtable: Fix walker behaviour during rehash · eddee5ba
      Committed by Herbert Xu
      Previously, whenever the walker encountered a resize it simply
      snapped back to the beginning and started again.  However, this
      only works if the rehash started and completed while the walker
      was idle.
      
      If the walker attempts to restart while the rehash is still ongoing,
      we may miss objects that we shouldn't have.
      
      This patch fixes this by making the walker walk the old table
      followed by the new table, just like all other readers.  If a
      rehash is detected we still signal the caller so it can prepare
      for duplicates, but we simply continue the walk onto the new table
      after the old one is finished, whether by us or by the rehasher
      (a sketch of the walk order follows this entry).
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eddee5ba
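      A minimal user-space sketch of the resulting walk order (the types
      and helper names here are illustrative, not the exact rhashtable
      internals): the walker drains the old table and then follows the
      future-table pointer instead of snapping back to the start.

      struct entry { struct entry *next; int key; };

      struct table {
              struct table *future_tbl;    /* non-NULL while a rehash runs */
              struct entry **buckets;
              unsigned int size;
      };

      /* Visit the old table first, then continue onto the future table;
       * the caller must tolerate duplicates, which is exactly what the
       * rehash signal lets it prepare for. */
      static void walk_all(struct table *tbl, void (*cb)(struct entry *))
      {
              for (; tbl; tbl = tbl->future_tbl)
                      for (unsigned int i = 0; i < tbl->size; i++)
                              for (struct entry *e = tbl->buckets[i]; e; e = e->next)
                                      cb(e);
      }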
  2. 13 Mar 2015, 7 commits
  3. 12 Mar 2015, 3 commits
  4. 28 Feb 2015, 3 commits
    • rhashtable: use cond_resched() · 5beb5c90
      Committed by Eric Dumazet
      If a hash table has 128 slots and 16384 elements, expanding to 256
      slots takes more than one second.  For larger sets, a soft lockup
      is detected.
      
      Holding the CPU for that long, even in a work queue, is a show
      stopper for non-preemptible kernels.
      
      Call cond_resched() at strategic points to allow the process
      scheduler to reschedule us (see the sketch after this entry).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5beb5c90
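      A kernel-style sketch of the change's effect; struct bucket_table
      is abridged and rehash_one_bucket() is a hypothetical stand-in for
      the per-bucket relinking done in lib/rhashtable.c.

      #include <linux/sched.h>        /* cond_resched() */

      struct bucket_table;
      void rehash_one_bucket(struct bucket_table *tbl, unsigned int n);

      static void rehash_all(struct bucket_table *old_tbl, unsigned int size)
      {
              unsigned int i;

              for (i = 0; i < size; i++) {
                      rehash_one_bucket(old_tbl, i);
                      /* Yield between buckets so a long resize cannot
                       * trip the soft-lockup watchdog on kernels that
                       * cannot preempt the worker. */
                      cond_resched();
              }
      }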
    • rhashtable: remove indirection for grow/shrink decision functions · 4c4b52d9
      Committed by Daniel Borkmann
      Currently, all real users of rhashtable default their grow and shrink
      decision functions to rht_grow_above_75() and rht_shrink_below_30(),
      so there is no need to make this explicitly selectable.
      
      It can and should stay generic and private inside rhashtable until a
      real use case pops up.  Making it private saves us this additional
      indirection layer and improves insertion/deletion time as well (the
      now-internal helpers are sketched after this entry).
      
      Reference: http://patchwork.ozlabs.org/patch/443040/
      Suggested-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4c4b52d9
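      After this change the decision helpers are private to
      lib/rhashtable.c; they look roughly like this, matching the 75%/30%
      thresholds named above:

      /* Expand the table when it exceeds 75% load */
      static bool rht_grow_above_75(const struct rhashtable *ht,
                                    size_t new_size)
      {
              return atomic_read(&ht->nelems) > (new_size / 4 * 3);
      }

      /* Shrink the table when it drops below 30% load */
      static bool rht_shrink_below_30(const struct rhashtable *ht,
                                      size_t new_size)
      {
              return atomic_read(&ht->nelems) < (new_size * 3 / 10);
      }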
    • rhashtable: unconditionally grow when max_shift is not specified · 8331de75
      Committed by Daniel Borkmann
      While commit c0c09bfd ("rhashtable: avoid unnecessary wakeup for
      worker queue") rightfully moved part of the decision making of
      whether we should expand or shrink from the expand/shrink functions
      themselves into the insert/delete functions in order to avoid
      unnecessary worker wake-ups, it introduced a regression by doing so.
      
      Before that change, if no max_shift was specified (= 0) at rhashtable
      initialization, rhashtable_expand() would just grow unconditionally
      and let the available memory be the limiting factor.  After that
      change, if no max_shift was specified, there was _no_ expansion
      step at all.
      
      Given that netlink and tipc have a max_shift specified, the bug was
      not visible there, but Josh Hunt reported that nft, which starts out
      with a default element hint of 3 if not otherwise provided, slowed
      inserts down tremendously, as the table could not grow larger to
      relax occupancy (the restored policy is sketched after this entry).
      
      Given that the test case verifies shrinks/expands manually, we must
      also remove the pointers to the helper functions to explicitly avoid
      parallel resizing on insertions/deletions.  test_bucket_stats() and
      test_rht_lookup() could also be wrapped in the rhashtable mutex to
      explicitly synchronize a walk against resizing, but I think that
      would defeat the actual test case, which intends explicit test steps,
      i.e. 1) inserts, 2) expands, 3) shrinks, 4) deletions, with object
      verification after each stage.
      Reported-by: Josh Hunt <johunt@akamai.com>
      Fixes: c0c09bfd ("rhashtable: avoid unnecessary wakeup for worker queue")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Ying Xue <ying.xue@windriver.com>
      Cc: Josh Hunt <johunt@akamai.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8331de75
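      A condensed sketch of the restored policy; may_expand() is a
      hypothetical name for a check that in reality sits inline in the
      insert path:

      /* max_shift == 0 means "no upper bound", so the shift comparison
       * must not veto expansion in that case; memory pressure becomes
       * the only limiting factor again. */
      static bool may_expand(const struct rhashtable *ht)
      {
              return !ht->p.max_shift ||
                     atomic_read(&ht->shift) < ht->p.max_shift;
      }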
  5. 24 Feb 2015, 1 commit
  6. 21 Feb 2015, 2 commits
  7. 09 Feb 2015, 1 commit
  8. 07 Feb 2015, 7 commits
  9. 05 Feb 2015, 2 commits
  10. 31 Jan 2015, 1 commit
  11. 27 Jan 2015, 1 commit
  12. 16 Jan 2015, 1 commit
    • rhashtable: Fix race in rhashtable_destroy() and use regular work_struct · 57699a40
      Committed by Ying Xue
      When we put our declared work task on the global workqueue with
      schedule_delayed_work(), its delay parameter is always zero.
      Therefore, we should define a regular work item in the rhashtable
      structure instead of a delayed work.
      
      Additionally, we add a check for whether the resizing functions are
      NULL before cancelling the work, to avoid cancelling an
      uninitialized work item.
      
      Lastly, rhashtable_destroy() takes ht->mutex while it waits with
      cancel_delayed_work() for all previously submitted work items to
      run to completion.  cancel_delayed_work() does not return until all
      work items are finished, and a scheduled work item invokes the
      work's function, rht_deferred_worker().  However, as
      rht_deferred_worker() also needs to acquire the lock, a deadlock
      can occur because the lock is already held.  Moving the cancel call
      out of the locked scope avoids the deadlock (see the sketch after
      this entry).
      
      Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      57699a40
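      Roughly the resulting shape of rhashtable_destroy(), simplified:
      the worker is cancelled before ht->mutex is taken, since
      cancel_work_sync() waits for rht_deferred_worker(), which takes
      the same mutex.

      void rhashtable_destroy(struct rhashtable *ht)
      {
              ht->being_destroyed = true;

              /* Only a table with resize functions can have armed the
               * worker; cancelling must happen outside ht->mutex. */
              if (ht->p.grow_decision || ht->p.shrink_decision)
                      cancel_work_sync(&ht->run_work);

              mutex_lock(&ht->mutex);
              bucket_table_free(rht_dereference(ht->tbl, ht));
              mutex_unlock(&ht->mutex);
      }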
  13. 14 Jan 2015, 2 commits
  14. 09 Jan 2015, 6 commits
  15. 04 Jan 2015, 2 commits
    • rhashtable: Supports for nulls marker · f89bd6f8
      Committed by Thomas Graf
      In order to allow for wider usage of rhashtable, use a special nulls
      marker to terminate each chain. The reason for not using the existing
      nulls_list is that the prev pointer usage would not be valid as entries
      can be linked in two different buckets at the same time.
      
      The 4 nulls base bits can be set through the rhashtable_params structure
      like this:
      
      struct rhashtable_params params = {
              [...]
              .nulls_base = (1U << RHT_BASE_SHIFT),
      };
      
      This reduces the hash length from 32 bits to 27 bits (an
      illustrative lookup sketch follows this entry).
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f89bd6f8
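      An illustrative user-space rendering of the nulls-marker pattern
      (the encoding and names are simplified; rhashtable itself uses
      rht_is_a_nulls() and the configured nulls_base): each chain ends in
      an odd "nulls" value encoding its bucket, so a reader that finishes
      on the wrong marker knows a concurrent rehash rerouted it mid-walk
      and retries.

      struct obj { struct obj *next; int key; };

      #define NULLS_MARKER(hash)  ((struct obj *)(((unsigned long)(hash) << 1) | 1))
      #define IS_NULLS(p)         ((unsigned long)(p) & 1)

      static struct obj *lookup(struct obj **buckets, unsigned int hash, int key)
      {
      restart:
              for (struct obj *p = buckets[hash]; ; p = p->next) {
                      if (IS_NULLS(p)) {
                              if (p != NULLS_MARKER(hash))
                                      goto restart;    /* moved mid-walk */
                              return NULL;             /* genuine miss */
                      }
                      if (p->key == key)
                              return p;
              }
      }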
    • rhashtable: Per bucket locks & deferred expansion/shrinking · 97defe1e
      Committed by Thomas Graf
      Introduces an array of spinlocks to protect bucket mutations.  The
      number of spinlocks per CPU is configurable, and the lock for a
      bucket is selected based on the bucket's hash.  This allows for
      parallel insertions and removals of entries which do not share a
      lock (the mapping is sketched after this entry).
      
      The patch also defers expansion and shrinking to a worker queue,
      which allows insertion and removal from atomic context.  Insertions
      and deletions may occur in parallel with a resize and are only held
      up briefly while the particular bucket is linked or unzipped.
      
      Mutation of the bucket table pointer is protected by a new mutex;
      read access is RCU protected.
      
      In the event of an expansion or shrinking, the newly allocated
      bucket table is exposed as a so-called future table as soon as the
      resize process starts.  Lookups, deletions, and insertions will
      briefly use both tables.  The future table becomes the main table
      after an RCU grace period and after the initial linking of the old
      table to the new one has been performed.  Optimization of the
      chains to make use of the new number of buckets follows only once
      the new table is in use.
      
      A side effect of this is that, during that RCU grace period, a
      bucket traversal using any rht_for_each() variant on the main table
      will not see insertions performed during the grace period, since
      those would at that point land in the future table.  Lookups will
      see them, as they search both tables if needed.
      
      Having multiple insertions and removals occur in parallel requires
      nelems to become an atomic counter.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      97defe1e
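      The bucket-to-lock mapping is essentially a masked index into the
      spinlock array, roughly mirroring the helper in lib/rhashtable.c:

      static spinlock_t *bucket_lock(const struct bucket_table *tbl, u32 hash)
      {
              /* locks is a power-of-two sized array, so the mask picks
               * one lock per bucket hash; buckets sharing a lock
               * serialize, all others can mutate in parallel. */
              return &tbl->locks[hash & tbl->locks_mask];
      }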