提交 · 9497df88ab5567daa001829051c5f87161a81ff0 · openeuler / raspberrypi-kernel

13 3月, 2015 5 次提交

rhashtable: Fix reader/rehash race · 9497df88

由 Herbert Xu 提交于 3月 12, 2015

There is a potential race condition between readers and the rehasher.
In particular, the rehasher could have started a rehash while the
reader finishes a scan of the old table but fails to see the new
table pointer.

This patch closes this window by adding smp_wmb/smp_rmb.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9497df88

rhashtable: Remove obj_raw_hashfn · ec9f71c5

由 Herbert Xu 提交于 3月 12, 2015

Now that the only caller of obj_raw_hashfn is head_hashfn, we can
simply kill it and fold it into the latter.

This patch also moves the common shift from head_hashfn/key_hashfn
into rht_bucket_index.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec9f71c5

rhashtable: Remove key length argument to key_hashfn · cffaa9cb

由 Herbert Xu 提交于 3月 12, 2015

key_hashfn has only one caller and it doesn't really need to supply
the key length as an extra parameter.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cffaa9cb

rhashtable: Use head_hashfn instead of obj_raw_hashfn · eca84933

由 Herbert Xu 提交于 3月 12, 2015

Now that we don't have cross-table hashes, we no longer need to
keep the entire hash value so all users of obj_raw_hashfn can
use head_hashfn instead.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eca84933

rhashtable: Move masking back into key_hashfn · 8d2b1879

由 Herbert Xu 提交于 3月 12, 2015

This patch reverts commit c88455ce
("rhashtable: key_hashfn() must return full hash value") because
the only user of it always masks the hash value.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d2b1879

12 3月, 2015 3 次提交

rhashtable: Add annotation to nested lock · 84ed82b7

由 Herbert Xu 提交于 3月 12, 2015

Commit aa34a6cb ("rhashtable:
Add arbitrary rehash function") killed the annotation on the
nested lock which leads to bitching from lockdep.
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84ed82b7

rhashtable: Add arbitrary rehash function · aa34a6cb

由 Herbert Xu 提交于 3月 11, 2015

This patch adds a rehash function that supports the use of any
hash function for the new table.  This is needed to support changing
the random seed value during the lifetime of the hash table.

However for now the random seed value is still constant and the
rehash function is simply used to replace the existing expand/shrink
functions.

[ ASSERT_BUCKET_LOCK() and thus debug_dump_table() +
  debug_dump_buckets() are not longer used, so delete them
  entirely. -DaveM ]
Signed-off-by: NHerbert Xu <herbert.xu@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa34a6cb

rhashtable: Move hash_rnd into bucket_table · 988dfbd7

由 Herbert Xu 提交于 3月 10, 2015

Currently hash_rnd is a parameter that users can set.  However,
no existing users set this parameter.  It is also something that
people are unlikely to want to set directly since it's just a
random number.

In preparation for allowing the reseeding/rehashing of rhashtable,
this patch moves hash_rnd into bucket_table so that it's now an
internal state rather than a parameter.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

988dfbd7

28 2月, 2015 3 次提交

rhashtable: use cond_resched() · 5beb5c90

由 Eric Dumazet 提交于 2月 26, 2015

If a hash table has 128 slots and 16384 elems, expand to 256 slots
takes more than one second. For larger sets, a soft lockup is detected.

Holding cpu for that long, even in a work queue is a show stopper
for non preemptable kernels.

cond_resched() at strategic points to allow process scheduler
to reschedule us.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5beb5c90

rhashtable: remove indirection for grow/shrink decision functions · 4c4b52d9

由 Daniel Borkmann 提交于 2月 25, 2015

Currently, all real users of rhashtable default their grow and shrink
decision functions to rht_grow_above_75() and rht_shrink_below_30(),
so that there's currently no need to have this explicitly selectable.

It can/should be generic and private inside rhashtable until a real
use case pops up. Since we can make this private, we'll save us this
additional indirection layer and can improve insertion/deletion time
as well.

Reference: http://patchwork.ozlabs.org/patch/443040/Suggested-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c4b52d9

rhashtable: unconditionally grow when max_shift is not specified · 8331de75

由 Daniel Borkmann 提交于 2月 25, 2015

While commit c0c09bfd ("rhashtable: avoid unnecessary wakeup for
worker queue") rightfully moved part of the decision making of
whether we should expand or shrink from the expand/shrink functions
themselves into insert/delete functions in order to avoid unnecessary
worker wake-ups, it however introduced a regression by doing so.

Before that change, if no max_shift was specified (= 0) on rhashtable
initialization, rhashtable_expand() would just grow unconditionally
and lets the available memory be the limiting factor. After that
change, if no max_shift was specified, there would be _no_ expansion
step at all.

Given that netlink and tipc have a max_shift specified, it was not
visible there, but Josh Hunt reported that if nft that starts out
with a default element hint of 3 if not otherwise provided, would
slow i.e. inserts down trememdously as it cannot grow larger to
relax table occupancy.

Given that the test case verifies shrinks/expands manually, we also
must remove pointer to the helper functions to explicitly avoid
parallel resizing on insertions/deletions. test_bucket_stats() and
test_rht_lookup() could also be wrapped around rhashtable mutex to
explicitly synchronize a walk from resizing, but I think that defeats
the actual test case which intended to have explicit test steps,
i.e. 1) inserts, 2) expands, 3) shrinks, 4) deletions, with object
verification after each stage.
Reported-by: NJosh Hunt <johunt@akamai.com>
Fixes: c0c09bfd ("rhashtable: avoid unnecessary wakeup for worker queue")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Ying Xue <ying.xue@windriver.com>
Cc: Josh Hunt <johunt@akamai.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8331de75

24 2月, 2015 1 次提交

rhashtable: initialize all rhashtable walker members · 71bb0012

由 Sasha Levin 提交于 2月 23, 2015

Commit f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*") forgot to
initialize the members of struct rhashtable_walker after allocating it, which
caused an undefined value for 'resize' which is used later on.

Fixes: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*")
Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

71bb0012

21 2月, 2015 2 次提交

rhashtable: better high order allocation attempts · eb6d1abf

由 Daniel Borkmann 提交于 2月 20, 2015

When trying to allocate future tables via bucket_table_alloc(), it seems
overkill on large table shifts that we probe for kzalloc() unconditionally
first, as it's likely to fail.

Only probe with kzalloc() for more reasonable table sizes and use vzalloc()
either as a fallback on failure or directly in case of large table sizes.

Fixes: 7e1e7763 ("lib: Resizable, Scalable, Concurrent Hash Table")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb6d1abf

rhashtable: don't test for shrink on insert, expansion on delete · 342100d9

由 Daniel Borkmann 提交于 2月 20, 2015

Restore pre 54c5b7d3 behaviour and only probe for expansions on inserts
and shrinks on deletes. Currently, it will happen that on initial inserts
into a sparse hash table, we may i.e. shrink it first simply because it's
not fully populated yet, only to later realize that we need to grow again.

This however is counter intuitive, e.g. an initial default size of 64
elements is already small enough, and in case an elements size hint is given
to the hash table by a user, we should avoid unnecessary expansion steps,
so a shrink is clearly unintended here.

Fixes: 54c5b7d3 ("rhashtable: introduce rhashtable_wakeup_worker helper function")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Ying Xue <ying.xue@windriver.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

342100d9

09 2月, 2015 1 次提交

rhashtable: using ERR_PTR requires linux/err.h · 61d7b097

由 Stephen Rothwell 提交于 2月 09, 2015

Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61d7b097

07 2月, 2015 7 次提交

rhashtable: Fix remove logic to avoid cross references between buckets · 020219a6

由 Thomas Graf 提交于 2月 06, 2015

The remove logic properly searched the remaining chain for a matching
entry with an identical hash but it did this while searching from both
the old and new table. Instead in order to not leave stale references
behind we need to:

 1. When growing and searching from the new table:
    Search remaining chain for entry with same hash to avoid having
    the new table directly point to a entry with a different hash.

 2. When shrinking and searching from the old table:
    Check if the element after the removed would create a cross
    reference and avoid it if so.

These bugs were present from the beginning in nft_hash.

Also, both insert functions calculated the hash based on the mask of
the new table. This worked while growing. Wwhile shrinking, the mask
of the inew table is smaller than the mask of the old table. This lead
to a bit not being taken into account when selecting the bucket lock
and thus caused the wrong bucket to be locked eventually.

Fixes: 7e1e7763 ("lib: Resizable, Scalable, Concurrent Hash Table")
Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

020219a6

rhashtable: Avoid bucket cross reference after removal · cf52d52f

由 Thomas Graf 提交于 2月 05, 2015

During a resize, when two buckets in the larger table map to
a single bucket in the smaller table and the new table has already
been (partially) linked to the old table. Removal of an element
may result the bucket in the larger table to point to entries
which all hash to a different value than the bucket index. Thus
causing two buckets to point to the same sub chain after unzipping.
This is not illegal *during* the resize phase but after it has
completed.

Keep the old table around until all of the unzipping is done to
allow the removal code to only search for matching hashed entries
during this special period.
Reported-by: NYing Xue <ying.xue@windriver.com>
Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf52d52f

rhashtable: Add more lock verification · 7cd10db8

由 Thomas Graf 提交于 2月 05, 2015

Catch hash miscalculations which result in hard to track down race
conditions.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7cd10db8

rhashtable: Dump bucket tables on locking violation under PROVE_LOCKING · a03eaec0

由 Thomas Graf 提交于 2月 05, 2015

This simplifies debugging of locking violations if compiled with
CONFIG_PROVE_LOCKING.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a03eaec0

rhashtable: Wait for RCU readers after final unzip work · 2af4b529

由 Thomas Graf 提交于 2月 05, 2015

We need to wait for all RCU readers to complete after the last bit of
unzipping has been completed. Otherwise the old table is freed up
prematurely.

Fixes: 7e1e7763 ("lib: Resizable, Scalable, Concurrent Hash Table")
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2af4b529

rhashtable: Use a single bucket lock for sibling buckets · a5ec68e3

由 Thomas Graf 提交于 2月 05, 2015

rhashtable currently allows to use a bucket lock per bucket. This
requires multiple levels of complicated nested locking because when
resizing, a single bucket of the smaller table will map to two
buckets in the larger table. So far rhashtable has explicitly locked
both buckets in the larger table.

By excluding the highest bit of the hash from the bucket lock map and
thus only allowing locks to buckets in a ratio of 1:2, the locking
can be simplified a lot without losing the benefits of multiple locks.
Larger tables which benefit from multiple locks will not have a single
lock per bucket anyway.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5ec68e3

rhashtable: key_hashfn() must return full hash value · c88455ce

由 Thomas Graf 提交于 2月 05, 2015

The value computed by key_hashfn() is used by rhashtable_lookup_compare()
to traverse both tables during a resize. key_hashfn() must therefore
return the hash value without the buckets mask applied so it can be
masked to the size of each individual table.

Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c88455ce

05 2月, 2015 2 次提交

rhashtable: Introduce rhashtable_walk_* · f2dba9c6

由 Herbert Xu 提交于 2月 04, 2015

Some existing rhashtable users get too intimate with it by walking
the buckets directly.  This prevents us from easily changing the
internals of rhashtable.

This patch adds the helpers rhashtable_walk_init/exit/start/next/stop
which will replace these custom walkers.

They are meant to be usable for both procfs seq_file walks as well
as walking by a netlink dump.  The iterator structure should fit
inside a netlink dump cb structure, with at least one element to
spare.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2dba9c6

rhashtable: Fix potential crash on destroy in rhashtable_shrink · 28134a53

由 Herbert Xu 提交于 2月 04, 2015

The current being_destroyed check in rhashtable_expand is not
enough since if we start a shrinking process after freeing all
elements in the table that's also going to crash.

This patch adds a being_destroyed check to the deferred worker
thread so that we bail out as soon as we take the lock.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

28134a53

31 1月, 2015 1 次提交

rhashtable: Make selftest modular · 9d6dbe1b

由 Geert Uytterhoeven 提交于 1月 29, 2015

Allow the selftest on the resizable hash table to be built modular, just
like all other tests that do not depend on DEBUG_KERNEL.
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d6dbe1b

27 1月, 2015 1 次提交

rhashtable: rhashtable_remove() must unlink in both tbl and future_tbl · fe6a043c

由 Thomas Graf 提交于 1月 21, 2015

As removals can occur during resizes, entries may be referred to from
both tbl and future_tbl when the removal is requested. Therefore
rhashtable_remove() must unlink the entry in both tables if this is
the case. The existing code did search both tables but stopped when it
hit the first match.

Failing to unlink in both tables resulted in use after free.

Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe6a043c

16 1月, 2015 1 次提交

rhashtable: Fix race in rhashtable_destroy() and use regular work_struct · 57699a40

由 Ying Xue 提交于 1月 16, 2015

When we put our declared work task in the global workqueue with
schedule_delayed_work(), its delay parameter is always zero.
Therefore, we should define a regular work in rhashtable structure
instead of a delayed work.

By the way, we add a condition to check whether resizing functions
are NULL before cancelling the work, avoiding to cancel an
uninitialized work.

Lastly, while we wait for all work items we submitted before to run
to completion with cancel_delayed_work(), ht->mutex has been taken in
rhashtable_destroy(). Moreover, cancel_delayed_work() doesn't return
until all work items are accomplished, and when work items are
scheduled, the work's function - rht_deferred_worker() will be called.
However, as rht_deferred_worker() also needs to acquire the lock,
deadlock might happen at the moment as the lock is already held before.
So if the cancel work function is moved out of the lock covered scope,
this will avoid the deadlock.

Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

57699a40

14 1月, 2015 2 次提交

rhashtable: Lower/upper bucket may map to same lock while shrinking · 80ca8c3a

由 Thomas Graf 提交于 1月 12, 2015

Each per bucket lock covers a configurable number of buckets. While
shrinking, two buckets in the old table contain entries for a single
bucket in the new table. We need to lock down both while linking.
Check if they are protected by different locks to avoid a recursive
lock.

Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

80ca8c3a

rhashtable: involve rhashtable_lookup_compare_insert routine · 7a868d1e

由 Ying Xue 提交于 1月 12, 2015

Introduce a new function called rhashtable_lookup_compare_insert()
which is very similar to rhashtable_lookup_insert(). But the former
makes use of users' given compare function to look for an object,
and then inserts it into hash table if found. As the entire process
of search and insertion is under protection of per bucket lock, this
can help users to avoid the involvement of extra lock.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a868d1e

09 1月, 2015 6 次提交

rhashtable: initialize atomic nelems variable · 545a148e

由 Ying Xue 提交于 1月 07, 2015

Signed-off-by: NYing Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

545a148e

rhashtable: avoid unnecessary wakeup for worker queue · c0c09bfd

由 Ying Xue 提交于 1月 07, 2015

Move condition statements of verifying whether hash table size exceeds
its maximum threshold or reaches its minimum threshold from resizing
functions to resizing decision functions, avoiding unnecessary wakeup
for worker queue thread.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0c09bfd

rhashtable: future table needs to be traversed when remove an object · bd6d4db5

由 Ying Xue 提交于 1月 07, 2015

When remove an object from hash table, we currently only traverse old
bucket table to check whether the object exists. If the object is not
found in it, we will try again. But in the second search loop, we still
search the object from the old table instead of future table. As a
result, the object may be not removed from hash table especially when
resizing is currently in progress and the object is just saved in the
future table.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd6d4db5

rhashtable: involve rhashtable_lookup_insert routine · db304854

由 Ying Xue 提交于 1月 07, 2015

Involve a new function called rhashtable_lookup_insert() which makes
lookup and insertion atomic under bucket lock protection, helping us
avoid to introduce an extra lock when we search and insert an object
into hash table.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db304854

rhashtable: introduce rhashtable_wakeup_worker helper function · 54c5b7d3

由 Ying Xue 提交于 1月 07, 2015

Introduce rhashtable_wakeup_worker() helper function to reduce
duplicated code where to wake up worker.

By the way, as long as the both "future_tbl" and "tbl" bucket table
pointers point to the same bucket array, we should try to wake up
the resizing worker thread, otherwise, it indicates the work of
resizing hash table is not finished yet. However, currently we will
wake up the worker thread only when the two pointers point to
different bucket array. Obviously this is wrong. So, the issue is
also fixed as well in the patch.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54c5b7d3

rhashtable: optimize rhashtable_lookup routine · efb975a6

由 Ying Xue 提交于 1月 07, 2015

Define an internal compare function and relevant compare argument,
and then make use of rhashtable_lookup_compare() to lookup key in
hash table, reducing duplicated code between rhashtable_lookup()
and rhashtable_lookup_compare().
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

efb975a6

04 1月, 2015 5 次提交

rhashtable: Supports for nulls marker · f89bd6f8

由 Thomas Graf 提交于 1月 02, 2015

In order to allow for wider usage of rhashtable, use a special nulls
marker to terminate each chain. The reason for not using the existing
nulls_list is that the prev pointer usage would not be valid as entries
can be linked in two different buckets at the same time.

The 4 nulls base bits can be set through the rhashtable_params structure
like this:

struct rhashtable_params params = {
        [...]
        .nulls_base = (1U << RHT_BASE_SHIFT),
};

This reduces the hash length from 32 bits to 27 bits.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f89bd6f8

rhashtable: Per bucket locks & deferred expansion/shrinking · 97defe1e

由 Thomas Graf 提交于 1月 02, 2015

Introduces an array of spinlocks to protect bucket mutations. The number
of spinlocks per CPU is configurable and selected based on the hash of
the bucket. This allows for parallel insertions and removals of entries
which do not share a lock.

The patch also defers expansion and shrinking to a worker queue which
allows insertion and removal from atomic context. Insertions and
deletions may occur in parallel to it and are only held up briefly
while the particular bucket is linked or unzipped.

Mutations of the bucket table pointer is protected by a new mutex, read
access is RCU protected.

In the event of an expansion or shrinking, the new bucket table allocated
is exposed as a so called future table as soon as the resize process
starts.  Lookups, deletions, and insertions will briefly use both tables.
The future table becomes the main table after an RCU grace period and
initial linking of the old to the new table was performed. Optimization
of the chains to make use of the new number of buckets follows only the
new table is in use.

The side effect of this is that during that RCU grace period, a bucket
traversal using any rht_for_each() variant on the main table will not see
any insertions performed during the RCU grace period which would at that
point land in the future table. The lookup will see them as it searches
both tables if needed.

Having multiple insertions and removals occur in parallel requires nelems
to become an atomic counter.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

97defe1e

nft_hash: Remove rhashtable_remove_pprev() · 897362e4

由 Thomas Graf 提交于 1月 02, 2015

The removal function of nft_hash currently stores a reference to the
previous element during lookup which is used to optimize removal later
on. This was possible because a lock is held throughout calling
rhashtable_lookup() and rhashtable_remove().

With the introdution of deferred table resizing in parallel to lookups
and insertions, the nftables lock will no longer synchronize all
table mutations and the stored pprev may become invalid.

Removing this optimization makes removal slightly more expensive on
average but allows taking the resize cost out of the insert and
remove path.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

897362e4

rhashtable: Factor out bucket_tail() function · b8e1943e

由 Thomas Graf 提交于 1月 02, 2015

Subsequent patches will require access to the bucket tail. Access
to the tail is relatively cheap as the automatic resizing of the
table should keep the number of entries per bucket to no more
than 0.75 on average.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8e1943e

rhashtable: Convert bucket iterators to take table and index · 88d6ed15

由 Thomas Graf 提交于 1月 02, 2015

This patch is in preparation to introduce per bucket spinlocks. It
extends all iterator macros to take the bucket table and bucket
index. It also introduces a new rht_dereference_bucket() to
handle protected accesses to buckets.

It introduces a barrier() to the RCU iterators to the prevent
the compiler from caching the first element.

The lockdep verifier is introduced as stub which always succeeds
and properly implement in the next patch when the locks are
introduced.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88d6ed15