Commits · 8f0db018006a421956965e1149234c4e8db718ee · openeuler / Kernel

08 Apr, 2019: 3 commits

rhashtable: use bit_spin_locks to protect hash bucket. · 8f0db018

Committed by NeilBrown on Apr 02, 2019

This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
bucket pointer to lock the hash chain for that bucket.

The benefits of a bit spin_lock are:
 - no need to allocate a separate array of locks.
 - no need to have a configuration option to guide the
   choice of the size of this array
 - locking cost is often a single test-and-set in a cache line
   that will have to be loaded anyway.  When inserting at, or removing
   from, the head of the chain, the unlock is free - writing the new
   address in the bucket head implicitly clears the lock bit.
   For __rhashtable_insert_fast() we ensure this always happens
   when adding a new key.
 - even when locking costs 2 updates (lock and unlock), they are
   in a cacheline that needs to be read anyway.

The cost of using a bit spin_lock is a little bit of code complexity,
which I think is quite manageable.

Bit spin_locks are sometimes inappropriate because they are not fair -
if multiple CPUs repeatedly contend on the same lock, one CPU can
easily be starved.  This is not a credible situation with rhashtable.
Multiple CPUs may want to repeatedly add or remove objects, but they
will typically do so at different buckets, so they will attempt to
acquire different locks.

As we have more bit-locks than we previously had spinlocks (by at
least a factor of two) we can expect slightly less contention to
go with the slightly better cache behavior and reduced memory
consumption.

To enhance type checking, a new struct is introduced to represent the
  pointer plus lock-bit
that is stored in the bucket-table.  This is "struct rhash_lock_head"
and is empty.  A pointer to this needs to be cast to either an
unsigned long, or a "struct rhash_head *" to be useful.
Variables of this type are most often called "bkt".

Previously "pprev" would sometimes point to a bucket, and sometimes a
->next pointer in an rhash_head.  As these are now different types,
pprev is NULL when it would have pointed to the bucket. In that case,
'bkt' is used, together with the correct locking protocol.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
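
The lock/unlock cost argument above can be illustrated with a minimal userspace sketch. It uses C11 atomics in place of the kernel's bit_spin_lock(), and every name in it (`bucket_t`, `LOCK_BIT`, `bucket_lock()`, `bucket_unlock()`, `bucket_unlock_set_head()`) is hypothetical, not rhashtable code. What it shows: when a new head is published, the stored pointer has its low bits clear, so the same store that links the node in also releases the lock.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical node; pointer alignment (>= 4) keeps bits 0 and 1 of its address clear. */
struct node {
	struct node *next;
	int key;
};

/* Mirrors the commit's choice of BIT(1) of the bucket pointer as the lock bit. */
#define LOCK_BIT ((uintptr_t)1 << 1)

typedef _Atomic uintptr_t bucket_t;

/* Test-and-set: spin until this caller is the one that set the lock bit. */
static void bucket_lock(bucket_t *b)
{
	while (atomic_fetch_or_explicit(b, LOCK_BIT, memory_order_acquire) & LOCK_BIT)
		; /* busy-wait (sketch only, no backoff) */
}

/* Ordinary unlock: clear only the lock bit and keep the head pointer. */
static void bucket_unlock(bucket_t *b)
{
	atomic_fetch_and_explicit(b, ~LOCK_BIT, memory_order_release);
}

/* "Free" unlock: storing a new, clean head pointer also clears the lock bit. */
static void bucket_unlock_set_head(bucket_t *b, struct node *head)
{
	atomic_store_explicit(b, (uintptr_t)head, memory_order_release);
}

/* Read the head pointer with the lock bit masked off. */
static struct node *bucket_head(bucket_t *b)
{
	return (struct node *)(atomic_load_explicit(b, memory_order_relaxed) & ~LOCK_BIT);
}

/* Head insertion: one atomic RMW to lock, one plain store to publish and unlock. */
static void bucket_insert_head(bucket_t *b, struct node *n)
{
	bucket_lock(b);
	n->next = bucket_head(b);
	bucket_unlock_set_head(b, n);
}
```

The sketch busy-waits without backoff. It also does not model the NULLS value that the real table keeps in an empty bucket.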

rhashtable: allow rht_bucket_var to return NULL. · ff302db9

Committed by NeilBrown on Apr 02, 2019

Rather than returning a pointer to a static nulls, rht_bucket_var()
now returns NULL if the bucket doesn't exist.
This will make the next patch, which stores a bitlock in the
bucket pointer, somewhat cleaner.

This change involves introducing __rht_bucket_nested() which is
like rht_bucket_nested(), but doesn't provide the static nulls,
and changing rht_bucket_nested() to call this and possibly
provide a static nulls - as is still needed for the non-var case.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: use cmpxchg() in nested_table_alloc() · 7a41c294

Committed by NeilBrown on Apr 02, 2019

nested_table_alloc() relies on the fact that there is
at most one spinlock allocated for every slot in the top
level nested table, so it is not possible for two threads
to try to allocate the same table at the same time.

This assumption is a little fragile (it is not explicit) and is
unnecessary as cmpxchg() can be used instead.

A future patch will replace the spinlocks by per-bucket bitlocks,
and then we won't be able to protect the slot pointer with a spinlock.

So replace rcu_assign_pointer() with cmpxchg() - which has equivalent
barrier properties.
If the cmpxchg() fails, free the table that was just allocated.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
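
Here is a userspace sketch of the allocate-then-cmpxchg pattern. C11 `atomic_compare_exchange_strong()` stands in for the kernel's cmpxchg(), and `struct table` and `get_or_alloc_table()` are hypothetical names, not the nested-table code. Whichever caller's compare-and-swap succeeds publishes its table. A caller that loses the race frees its own allocation and returns the winner's.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a nested bucket table. */
struct table {
	int id;
};

/* Return the table in *slot, allocating and installing one if the slot is empty. */
static struct table *get_or_alloc_table(struct table *_Atomic *slot, int id)
{
	struct table *cur = atomic_load_explicit(slot, memory_order_acquire);
	if (cur)
		return cur;

	struct table *ntbl = calloc(1, sizeof(*ntbl));
	if (!ntbl)
		return NULL;
	ntbl->id = id;

	struct table *expected = NULL;
	/* Publish only if the slot is still empty. The seq_cst CAS includes the
	 * release ordering that rcu_assign_pointer() would have provided. */
	if (atomic_compare_exchange_strong(slot, &expected, ntbl))
		return ntbl;

	/* Lost the race: 'expected' now holds the winner's table. */
	free(ntbl);
	return expected;
}
```

0ad66449 (further down this log) uses a similar compare-against-NULL step for ->future_tbl. In that case the caller handles the losing path.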

22 Mar, 2019: 3 commits

rhashtable: rename rht_for_each*continue as *from. · f7ad68bf

Committed by NeilBrown on Mar 21, 2019

The pattern set by list.h is that for_each..continue()
iterators start at the next entry after the given one,
while for_each..from() iterators start at the given
entry.

The rht_for_each*continue() iterators are documented as though they
start at the 'next' entry, but actually start at the given entry,
and they are used expecting that behaviour.
So fix the documentation and change the names to *from for consistency
with list.h.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
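
To make the naming convention concrete, here is a hypothetical pair of iterator macros over a plain singly linked list. These are not the kernel's list.h or rhashtable macros. The *_from() form visits the given node first, while the *_continue() form starts at its successor.

```c
#include <stddef.h>

/* Hypothetical list node. */
struct item {
	struct item *next;
	int val;
};

/* Start AT pos: the behaviour the renamed rht_for_each*from() iterators have. */
#define item_for_each_from(pos) \
	for (; (pos) != NULL; (pos) = (pos)->next)

/* Start AFTER pos: the list.h *_continue() convention (pos must be non-NULL). */
#define item_for_each_continue(pos) \
	for ((pos) = (pos)->next; (pos) != NULL; (pos) = (pos)->next)

/* Sum of the values visited when starting at pos. */
static int sum_from(struct item *pos)
{
	int s = 0;
	item_for_each_from(pos)
		s += pos->val;
	return s;
}

/* Sum of the values visited when starting after pos. */
static int sum_continue(struct item *pos)
{
	int s = 0;
	item_for_each_continue(pos)
		s += pos->val;
	return s;
}
```

Starting from the same node, the two forms differ by exactly that node's contribution.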

rhashtable: don't hold lock on first table throughout insertion. · 4feb7c7a

Committed by NeilBrown on Mar 21, 2019

rhashtable_try_insert() currently holds a lock on the bucket in
the first table, while also locking buckets in subsequent tables.
This is unnecessary and looks like a hold-over from some earlier
version of the implementation.

As insert and remove always lock a bucket in each table in turn, and
as insert only inserts in the final table, there cannot be any races
that are not covered by simply locking a bucket in each table in turn.

When an insert call reaches that last table it can be sure that there
is no matching entry in any other table as it has searched them all, and
insertion never happens anywhere but in the last table.  The fact that
code tests for the existence of future_tbl while holding a lock on
the relevant bucket ensures that two threads inserting the same key
will make compatible decisions about which is the "last" table.

This simplifies the code and allows the ->rehash field to be
discarded.

We still need a way to ensure that a dead bucket_table is never
re-linked by rhashtable_walk_stop().  This can be achieved by calling
call_rcu() inside the locked region, and checking with
rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
bucket table is empty and dead.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Still do rehash when we get EEXIST · 408f13ef

Committed by Herbert Xu on Mar 21, 2019

As it stands if a shrink is delayed because of an outstanding
rehash, we will go into a rescheduling loop without ever doing
the rehash.

This patch fixes this by still carrying out the rehash and then
rescheduling so that we can shrink after the completion of the
rehash should it still be necessary.

The return value of EEXIST captures this case and other cases
(e.g., another thread expanded/rehashed the table at the same
time) where we should still proceed with the rehash.

Fixes: da20420f ("rhashtable: Add nested tables")
Reported-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Feb, 2019: 1 commit

rhashtable: Remove obsolete rhashtable_walk_init function · 6c4128f6

Committed by Herbert Xu on Feb 14, 2019

The rhashtable_walk_init function has been obsolete for more than
two years.  This patch finally converts its last users over to
rhashtable_walk_enter and removes it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

04 Dec, 2018: 1 commit

rhashtable: detect when object movement between tables might have invalidated a lookup · 82208d0d

Committed by NeilBrown on Nov 30, 2018

Some users of rhashtables might need to move an object from one table
to another; this appears to be the reason for the incomplete usage
of NULLS markers.

To support these, we store a unique NULLS_MARKER at the end of
each chain, and when a search fails to find a match, we check
if the NULLS marker found was the expected one.  If not, the search
may not have examined all objects in the target bucket, so it is
repeated.

The unique NULLS_MARKER is derived from the address of the
head of the chain.  As this cannot be derived at load-time the
static rhnull in rht_bucket_nested() needs to be initialised
at run time.

Any caller of a lookup function must still be prepared for the
possibility that the object returned is in a different table - it
might have been there for some time.

Note that this does NOT provide support for other uses of
NULLS_MARKERs such as allocating with SLAB_TYPESAFE_BY_RCU or changing
the key of an object and re-inserting it in the same table.
These could only be done safely if new objects were inserted
at the *start* of a hash chain, and that is not currently the case.
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
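
Below is a simplified single-threaded model of the retry rule. It rests on two assumptions: each chain ends in a tagged value with bit 0 set that encodes the bucket's own address, and a lookup that misses checks whether the terminator it reached belongs to the bucket it started from. `NULLS_OF`, `IS_NULLS`, `chain_search()` and `lookup()` are hypothetical. They are not the kernel's NULLS_MARKER() or rht_is_a_nulls() definitions.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical object; 'next' is either a struct obj * or a nulls value (bit 0 set). */
struct obj {
	uintptr_t next;
	int key;
};

/* Hypothetical unique end-of-chain value derived from the bucket's address. */
#define NULLS_OF(bucket) ((uintptr_t)(bucket) | 1u)
#define IS_NULLS(v)      ((v) & 1u)

/* One pass over a chain: return the match, or NULL plus the terminator reached. */
static struct obj *chain_search(const uintptr_t *bucket, int key, uintptr_t *end)
{
	uintptr_t v = *bucket;

	while (!IS_NULLS(v)) {
		struct obj *o = (struct obj *)v;
		if (o->key == key)
			return o;
		v = o->next;
	}
	*end = v;
	return NULL;
}

/* Repeat a failed search if it ended on some other bucket's marker: an object
 * we walked through was moved to another chain, so we may have missed entries. */
static struct obj *lookup(const uintptr_t *bucket, int key)
{
	uintptr_t end;
	struct obj *o;

	while (!(o = chain_search(bucket, key, &end)) && end != NULLS_OF(bucket))
		; /* search again */
	return o;
}
```

In a real table the repeated search only makes progress because the concurrent mover finishes its work. In this single-threaded model, a chain that ends in a foreign marker would loop forever.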

23 Aug, 2018: 2 commits

lib/rhashtable: guarantee initial hashtable allocation · 2d22ecf6

Committed by Davidlohr Bueso on Aug 21, 2018

rhashtable_init() may fail due to -ENOMEM, thus making the entire API
unusable.  This patch removes this scenario, however unlikely.  In order
to guarantee memory allocation, this patch always ends up doing
GFP_KERNEL|__GFP_NOFAIL for both the tbl as well as
alloc_bucket_spinlocks().

Upon the first table allocation failure, we shrink the size to the
smallest value that makes sense and retry with __GFP_NOFAIL semantics.
With the defaults, this means that from 64 buckets, we retry with only 4.
Any later issues regarding performance due to collisions or larger table
resizing (when more memory becomes available) are the least of our
problems.

Link: http://lkml.kernel.org/r/20180712185241.4017-9-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
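
The fallback can be sketched as a hypothetical userspace model. `try_alloc()` stands in for an allocation that may fail (its `fail_large` flag simulates memory pressure), and the retry loop stands in for __GFP_NOFAIL. The 64 and 4 bucket counts are the defaults quoted in the message.

```c
#include <stdlib.h>
#include <stdbool.h>

#define DEFAULT_BUCKETS 64 /* default size quoted in the message */
#define MIN_BUCKETS     4  /* "the smallest value that makes sense" */

/* Hypothetical bucket table. */
struct tbl {
	size_t size;
	void **buckets;
};

/* Hypothetical allocator; with fail_large set, any request above the minimum fails. */
static struct tbl *try_alloc(size_t size, bool fail_large)
{
	if (fail_large && size > MIN_BUCKETS)
		return NULL;
	struct tbl *t = malloc(sizeof(*t));
	if (!t)
		return NULL;
	t->buckets = calloc(size, sizeof(*t->buckets));
	if (!t->buckets) {
		free(t);
		return NULL;
	}
	t->size = size;
	return t;
}

/* Try the requested size once; on failure shrink to the minimum and keep
 * retrying (modelling __GFP_NOFAIL), so initialisation never returns NULL. */
static struct tbl *init_table(size_t size, bool fail_large)
{
	struct tbl *t = try_alloc(size, fail_large);
	if (t)
		return t;
	while (!(t = try_alloc(MIN_BUCKETS, fail_large)))
		;
	return t;
}
```

Shrinking first keeps the guaranteed-to-succeed request as small as possible. The table can still grow later through normal resizing.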

lib/rhashtable: simplify bucket_table_alloc() · 93f976b5

Committed by Davidlohr Bueso on Aug 21, 2018

As of ce91f6ee ("mm: kvmalloc does not fallback to vmalloc for
incompatible gfp flags") we can simplify the caller and trust kvzalloc()
to just do the right thing.  For the case of the GFP_ATOMIC context, we
can drop the __GFP_NORETRY flag for obvious reasons, and for the
__GFP_NOWARN case, however, it is changed such that the caller passes the
flag instead of making bucket_table_alloc() handle it.

This slightly changes the gfp flags passed on to nested_table_alloc() as
it will now also use GFP_ATOMIC | __GFP_NOWARN.  However, I consider this
a positive consequence as for the same reasons we want nowarn semantics in
bucket_table_alloc().

[manfred@colorfullife.com: commit id extended to 12 digits, line wraps updated]
Link: http://lkml.kernel.org/r/20180712185241.4017-8-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

21 Aug, 2018: 1 commit

rhashtable: remove duplicated include from rhashtable.c · ab08dcd7

Committed by Yue Haibing on Aug 21, 2018

Remove duplicated include.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Jul, 2018: 1 commit

lib/rhashtable: consider param->min_size when setting initial table size · 107d01f5

Committed by Davidlohr Bueso on Jul 16, 2018

rhashtable_init() currently does not take into account the user-passed
min_size parameter unless param->nelem_hint is set as well. As such,
the default size (number of buckets) will always be HASH_DEFAULT_SIZE
even if the smallest allowed size is larger than that. Remediate this
by unconditionally calling into rounded_hashtable_size() and handling
things accordingly.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
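
Here is a standalone sketch of the sizing rule after this change, modeled on the behaviour the message describes rather than copied from lib/rhashtable.c. Several details are assumptions of the sketch: `struct params`, the helpers `roundup_pow2()` and `max_sz()`, and the 4/3 headroom applied to `nelem_hint`. `HASH_DEFAULT_SIZE` is taken as 64 buckets, the default quoted in 2d22ecf6.

```c
#include <stddef.h>

#define HASH_DEFAULT_SIZE 64u /* assumed default bucket count */

/* Hypothetical subset of the user-supplied parameters. */
struct params {
	unsigned int nelem_hint;
	unsigned int min_size;
};

/* Smallest power of two >= x. */
static size_t roundup_pow2(size_t x)
{
	size_t p = 1;
	while (p < x)
		p <<= 1;
	return p;
}

static size_t max_sz(size_t a, size_t b)
{
	return a > b ? a : b;
}

/* min_size is now honoured on both paths, not only when nelem_hint is set. */
static size_t initial_size(const struct params *p)
{
	if (p->nelem_hint)
		return max_sz(roundup_pow2((size_t)p->nelem_hint * 4 / 3), p->min_size);
	return max_sz(HASH_DEFAULT_SIZE, p->min_size);
}
```

For example, with no hint and `min_size = 256` the sketch returns 256 rather than the 64-bucket default. That is the case the commit fixes.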

10 Jul, 2018: 1 commit

rhashtable: add restart routine in rhashtable_free_and_destroy() · 0026129c

Committed by Taehee Yoo on Jul 08, 2018

rhashtable_free_and_destroy() cancels the deferred re-hash work,
then walks and destroys elements. At this moment, some elements can
still be in future_tbl, and those elements are not destroyed.

test case:
nft_rhash_destroy() calls rhashtable_free_and_destroy() to destroy
all elements of sets before destroying sets and chains.
But rhashtable_free_and_destroy() doesn't destroy elements of future_tbl,
so a splat occurred.

test script:
   %cat test.nft
   table ip aa {
	   map map1 {
		   type ipv4_addr : verdict;
		   elements = {
			   0 : jump a0,
			   1 : jump a0,
			   2 : jump a0,
			   3 : jump a0,
			   4 : jump a0,
			   5 : jump a0,
			   6 : jump a0,
			   7 : jump a0,
			   8 : jump a0,
			   9 : jump a0,
		}
	   }
	   chain a0 {
	   }
   }
   flush ruleset
   table ip aa {
	   map map1 {
		   type ipv4_addr : verdict;
		   elements = {
			   0 : jump a0,
			   1 : jump a0,
			   2 : jump a0,
			   3 : jump a0,
			   4 : jump a0,
			   5 : jump a0,
			   6 : jump a0,
			   7 : jump a0,
			   8 : jump a0,
			   9 : jump a0,
		   }
	   }
	   chain a0 {
	   }
   }
   flush ruleset

   %while :; do nft -f test.nft; done

Splat looks like:
[  200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363!
[  200.806944] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24
[  200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[  200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables]
[  200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff
[  200.860366] RSP: 0000:ffff880118dbf4d0 EFLAGS: 00010282
[  200.866354] RAX: 0000000000000061 RBX: ffff88010cdeaf08 RCX: 0000000000000000
[  200.874355] RDX: 0000000000000061 RSI: 0000000000000008 RDI: ffffed00231b7e90
[  200.882361] RBP: ffff880118dbf4e8 R08: ffffed002373bcfb R09: ffffed002373bcfa
[  200.890354] R10: 0000000000000000 R11: ffffed002373bcfb R12: dead000000000200
[  200.898356] R13: dead000000000100 R14: ffffffffbb62af38 R15: dffffc0000000000
[  200.906354] FS:  00007fefc31fd700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
[  200.915533] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  200.922355] CR2: 0000557f1c8e9128 CR3: 0000000106880000 CR4: 00000000001006e0
[  200.930353] Call Trace:
[  200.932351]  ? nf_tables_commit+0x26f6/0x2c60 [nf_tables]
[  200.939525]  ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables]
[  200.947525]  ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables]
[  200.952383]  ? nft_add_set_elem+0x1700/0x1700 [nf_tables]
[  200.959532]  ? nla_parse+0xab/0x230
[  200.963529]  ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink]
[  200.968384]  ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
[  200.975525]  ? debug_show_all_locks+0x290/0x290
[  200.980363]  ? debug_show_all_locks+0x290/0x290
[  200.986356]  ? sched_clock_cpu+0x132/0x170
[  200.990352]  ? find_held_lock+0x39/0x1b0
[  200.994355]  ? sched_clock_local+0x10d/0x130
[  200.999531]  ? memset+0x1f/0x40

V2:
 - free all tables, as requested by Herbert Xu
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
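
A hypothetical userspace model of the fix follows. The destroy walk does not stop after the first table. It continues through the future_tbl chain, so elements that a pending rehash had already moved are freed too. The type and function names are illustrative only.

```c
#include <stdlib.h>

/* Hypothetical element and table types. */
struct elem {
	struct elem *next;
};

struct btable {
	size_t size;
	struct elem **buckets;
	struct btable *future_tbl; /* table being rehashed into, or NULL */
};

/* Free every element in every table along the future_tbl chain, then the
 * tables themselves. Returns the number of elements freed. */
static size_t free_and_destroy(struct btable *tbl)
{
	size_t freed = 0;

	while (tbl) {
		struct btable *next = tbl->future_tbl;

		for (size_t i = 0; i < tbl->size; i++) {
			struct elem *e = tbl->buckets[i];
			while (e) {
				struct elem *n = e->next;
				free(e);
				freed++;
				e = n;
			}
		}
		free(tbl->buckets);
		free(tbl);
		tbl = next; /* "restart" with the future table instead of stopping */
	}
	return freed;
}
```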

03 Jul, 2018: 1 commit

lib: rhashtable: Correct self-assignment in rhashtable.c · c643ecf3

Committed by Rishabh Bhatnagar on Jul 02, 2018

In file lib/rhashtable.c line 777, the skip variable is assigned to
itself. The following error was observed:

lib/rhashtable.c:777:41: warning: explicitly assigning value of
variable of type 'int' to itself [-Wself-assign] error, forbidden
warning: rhashtable.c:777
This error was found when compiling with Clang 6.0. Change it to iter->skip.
Signed-off-by: Rishabh Bhatnagar <rishabhb@codeaurora.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Jun, 2018: 6 commits

rhashtable: clean up dereference of ->future_tbl. · c0690016

Committed by NeilBrown on Jun 18, 2018

Using rht_dereference_bucket() to dereference
->future_tbl looks like a type error, and could be confusing.
Using rht_dereference_rcu() to test a pointer for NULL
adds an unnecessary barrier - rcu_access_pointer() is preferred
for NULL tests when no lock is held.

This uses 3 different ways to access ->future_tbl.
- if we know the mutex is held, use rht_dereference()
- if we don't hold the mutex, and are only testing for NULL,
  use rcu_access_pointer()
- otherwise (using RCU protection for true dereference),
  use rht_dereference_rcu().

Note that this includes a simplification of the call to
rhashtable_last_table() - we don't do an extra dereference
before the call any more.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: use cmpxchg() to protect ->future_tbl. · 0ad66449

Committed by NeilBrown on Jun 18, 2018

Rather than borrowing one of the bucket locks to
protect ->future_tbl updates, use cmpxchg().
This gives more freedom to change how bucket locking
is implemented.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: simplify nested_table_alloc() and rht_bucket_nested_insert() · 5af68ef7

Committed by NeilBrown on Jun 18, 2018

Now that we don't use the hash value or shift in nested_table_alloc()
there is room for simplification.
We only need to pass an "is this a leaf" flag to nested_table_alloc(),
and don't need to track as much information in
rht_bucket_nested_insert().

Note there is another minor cleanup in nested_table_alloc() here.
The number of elements in a page of "union nested_tables" is most naturally

  PAGE_SIZE / sizeof(ntbl[0])

The previous code had

  PAGE_SIZE / sizeof(ntbl[0].bucket)

which happens to be the correct value only because the bucket uses all
the space in the union.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
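
The sizeof argument can be checked with a small standalone model of the union. The member types are simplified from the kernel's definition, so treat this as an illustration, and PAGE_SIZE is assumed to be 4096 if not already defined. Both members are single pointers, so sizeof(ntbl[0].bucket) happens to equal sizeof(ntbl[0]). Only the latter stays correct if a larger member is ever added.

```c
#include <stddef.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096u /* assumption for this sketch */
#endif

struct rhash_head; /* opaque here */

/* Simplified model of the nested-table union. */
union nested_table {
	union nested_table *table;
	struct rhash_head *bucket;
};

/* Entries per page, derived from the element type itself. */
static size_t entries_per_page(void)
{
	union nested_table *ntbl = NULL; /* only used inside sizeof */
	return PAGE_SIZE / sizeof(ntbl[0]);
}

/* The old spelling: correct only because 'bucket' fills the whole union. */
static size_t entries_per_page_old(void)
{
	union nested_table *ntbl = NULL;
	return PAGE_SIZE / sizeof(ntbl[0].bucket);
}
```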

rhashtable: simplify INIT_RHT_NULLS_HEAD() · 9b4f64a2

Committed by NeilBrown on Jun 18, 2018

The 'ht' and 'hash' arguments to INIT_RHT_NULLS_HEAD() are
no longer used - so drop them.  This allows us to also
remove the nhash argument from nested_table_alloc().
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: remove nulls_base and related code. · 9f9a7077

Committed by NeilBrown on Jun 18, 2018

This "feature" is unused, undocumented, and untested and so doesn't
really belong.  A patch is under development to properly implement
support for detecting when a search gets diverted down a different
chain, which is the common purpose of nulls markers.

This patch actually fixes a bug too.  The table resizing allows a
table to grow to 2^31 buckets, but the hash is truncated to 27 bits -
any growth beyond 2^27 is wasteful and ineffective.

This patch results in NULLS_MARKER(0) being used for all chains,
and leaves the use of rht_is_a_null() to test for it.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
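
The 2^27 remark comes down to simple arithmetic. A bucket is chosen by masking the hash with (size - 1), so if only the low 27 bits of the hash can be non-zero, no bucket index at or above 2^27 is ever selected. Here is a small standalone check, with the truncation modeled by a hypothetical `HASH_MASK`.

```c
#include <stdint.h>

#define HASH_BITS 27u
#define HASH_MASK ((UINT32_C(1) << HASH_BITS) - 1) /* models the truncated hash */

/* Bucket index in a power-of-two table of 2^size_log buckets (size_log <= 31). */
static uint32_t bucket_index(uint32_t raw_hash, unsigned size_log)
{
	uint32_t hash = raw_hash & HASH_MASK;      /* only 27 bits survive */
	uint32_t size = UINT32_C(1) << size_log;
	return hash & (size - 1);
}

/* How many buckets of a 2^size_log table can ever be selected. */
static uint64_t reachable_buckets(unsigned size_log)
{
	unsigned bits = size_log < HASH_BITS ? size_log : HASH_BITS;
	return UINT64_C(1) << bits;
}
```

For a 2^31-bucket table, even an all-ones hash lands at index 2^27 - 1. Only 1/16 of the buckets are usable.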

rhashtable: split rhashtable.h · 0eb71a9d

Committed by NeilBrown on Jun 18, 2018

Due to the use of rhashtables in net namespaces,
rhashtable.h is included in lots of the kernel,
so a small change can require a large recompilation.
This makes development painful.

This patch splits out rhashtable-types.h which just includes
the major type declarations, and does not include (non-trivial)
inline code.  rhashtable.h is no longer included by anything
in the include/ directory.
Common include files only include rhashtable-types.h so a large
recompilation is only triggered when that changes.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Apr, 2018: 3 commits

rhashtable: improve rhashtable_walk stability when stop/start used. · 5d240a89

Committed by NeilBrown on Apr 24, 2018

When a walk of an rhashtable is interrupted with rhashtable_walk_stop()
and then rhashtable_walk_start(), the location to restart from is based
on a 'skip' count in the current hash chain, and this can be incorrect
if insertions or deletions have happened.  This does not happen when
the walk is not stopped and started as iter->p is a placeholder which
is safe to use while holding the RCU read lock.

In rhashtable_walk_start() we can revalidate that 'p' is still in the
same hash chain.  If it isn't then the current method is still used.

With this patch, if a rhashtable walker ensures that the current
object remains in the table over a stop/start period (possibly by
elevating the reference count if that is sufficient), it can be sure
that a walk will not miss objects that were in the hashtable for the
whole time of the walk.

rhashtable_walk_start() may not find the object even though it is
still in the hashtable if a rehash has moved it to a new table.  In
this case it will (eventually) get -EAGAIN and will need to proceed
through the whole table again to be sure to see everything at least
once.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
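
Here is a hypothetical sketch of the revalidation step. On restart, it searches the current chain for the saved object and resumes just after it. Only if the object is gone does it fall back to the old skip count. `struct walker` and `walk_resume()` are illustrative names, not rhashtable's iterator, and `skip` here counts the entries already returned from the chain.

```c
#include <stddef.h>

/* Hypothetical chain node. */
struct wnode {
	struct wnode *next;
	int key;
};

/* Hypothetical iterator state kept across stop/start. */
struct walker {
	struct wnode *p;    /* last object returned */
	unsigned int skip;  /* entries already returned from this chain */
};

/* Pick the next node to return after a restart. Prefer the saved object if it
 * is still in this chain; otherwise skip by count, which can be wrong if the
 * chain changed while the walk was stopped. */
static struct wnode *walk_resume(const struct walker *w, struct wnode *head)
{
	struct wnode *n;
	unsigned int i;

	for (n = head; n; n = n->next)
		if (n == w->p)
			return n->next;

	n = head;
	for (i = 0; n && i < w->skip; i++)
		n = n->next;
	return n;
}
```

If an earlier entry was deleted while the walk was stopped, the skip count alone would step past an object that was never returned. Finding `p` avoids that.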

rhashtable: reset iter when rhashtable_walk_start sees new table · b41cc04b

Committed by NeilBrown on Apr 24, 2018

The documentation claims that when rhashtable_walk_start_check()
detects a resize event, it will rewind back to the beginning
of the table.  This is not true.  We need to set ->slot and
->skip to be zero for it to be true.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Revise incorrect comment on r{hl, hash}table_walk_enter() · 82266e98

Committed by NeilBrown on Apr 24, 2018

Neither rhashtable_walk_enter() nor rhltable_walk_enter() sleeps, though
they do take a spinlock without irq protection.
So revise the comments to accurately state the contexts in which
these functions can be called.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Apr, 2018: 1 commit

rhashtable: add schedule points · ae6da1f5

Committed by Eric Dumazet on Mar 31, 2018

Rehashing and destroying a large hash table takes a lot of time,
and happens in process context. It is safe to add cond_resched()
in rhashtable_rehash_table() and rhashtable_free_and_destroy().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Mar, 2018: 1 commit

rhashtable: Fix rhlist duplicates insertion · d3dcf8eb

Committed by Paul Blakey on Mar 04, 2018

When inserting duplicate objects (those with the same key),
current rhlist implementation messes up the chain pointers by
updating the bucket pointer instead of the previous node's next pointer
to point to the newly inserted node. This causes missing elements on
removal and traversal.

Fix that by properly updating pprev pointer to point to
the correct rhash_head next pointer.

Issue: 1241076
Change-Id: I86b2c140bcb4aeb10b70a72a267ff590bb2b17e7
Fixes: ca26893f ('rhashtable: Add rhlist interface')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
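
Below is a simplified model of the corrected insertion. It assumes a hypothetical node with a `next` link for the hash chain and a `dup` link for objects that share a key; this loosely mirrors rhlist but is not its actual layout. When a duplicate is found, the new object takes the old one's place in the chain by writing through `pprev`. Writing to the bucket head instead, as the bug did, would unlink every entry that precedes it.

```c
#include <stddef.h>

/* Hypothetical node: 'next' links distinct keys, 'dup' links equal keys. */
struct lnode {
	struct lnode *next;
	struct lnode *dup;
	int key;
};

/* Insert n into the chain at *bucket. A duplicate becomes the head of its
 * key's dup list and replaces the old head in the chain via *pprev. */
static void chain_insert(struct lnode **bucket, struct lnode *n)
{
	struct lnode **pprev = bucket;
	struct lnode *cur;

	for (cur = *bucket; cur; pprev = &cur->next, cur = cur->next) {
		if (cur->key != n->key)
			continue;
		n->dup = cur;
		n->next = cur->next;
		*pprev = n; /* correct: update the predecessor's next pointer */
		return;
	}
	/* New key: insert at the head of the chain. */
	n->dup = NULL;
	n->next = *bucket;
	*bucket = n;
}
```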

11 Dec, 2017: 3 commits

rhashtable: Call library function alloc_bucket_locks · 64e0cd0d

Committed by Tom Herbert on Dec 04, 2017

To allocate the array of bucket locks for the hash table we now
call library function alloc_bucket_spinlocks. This function is
based on the old alloc_bucket_locks in rhashtable and should
produce the same effect.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Add rhastable_walk_peek · 2db54b47

Committed by Tom Herbert on Dec 04, 2017

This function is like rhashtable_walk_next except that it only returns
the current element in the iter and does not advance the iter.

This patch also creates __rhashtable_walk_find_next. It finds the next
element in the table when the entry cached in iter is NULL or at the end
of a slot. __rhashtable_walk_find_next is called from
rhashtable_walk_next and rhashtable_walk_peek.

end_of_table is an added field to the iter structure. This indicates
that the end of table was reached (walker.tbl being NULL is not a
sufficient condition for end of table).
Signed-off-by: Tom Herbert <tom@quantonium.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
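
The peek/next split can be shown with a toy model that iterates over a plain array instead of hash buckets. `struct iter`, `find_next()`, `walk_next()` and `walk_peek()` are hypothetical stand-ins for the rhashtable functions named above. Both entry points share one helper that finds the next element, and peek only calls it when nothing is cached. An explicit `end_of_table` flag marks the end of the walk.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical iterator over an array of slots. */
struct iter {
	const int *slots;
	size_t nslots;
	size_t pos;        /* next slot to examine */
	const int *cur;    /* cached current element, or NULL */
	bool end_of_table; /* set once the walk has run off the end */
};

/* Shared helper, in the spirit of __rhashtable_walk_find_next(). */
static const int *find_next(struct iter *it)
{
	if (it->pos >= it->nslots) {
		it->end_of_table = true;
		it->cur = NULL;
		return NULL;
	}
	it->cur = &it->slots[it->pos++];
	return it->cur;
}

/* Advance and return the next element. */
static const int *walk_next(struct iter *it)
{
	if (it->end_of_table)
		return NULL;
	return find_next(it);
}

/* Return the current element without advancing; fetch one if none is cached. */
static const int *walk_peek(struct iter *it)
{
	if (it->cur)
		return it->cur;
	if (it->end_of_table)
		return NULL;
	return find_next(it);
}
```

Repeated peeks return the same element. Only `walk_next()` moves the iterator forward.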

rhashtable: Change rhashtable_walk_start to return void · 97a6ec4a

Committed by Tom Herbert on Dec 04, 2017

Most callers of rhashtable_walk_start don't care about a resize event
which is indicated by a return value of -EAGAIN. So calls to
rhashtable_walk_start are wrapped with code to ignore -EAGAIN. Something
like this is common:

       ret = rhashtable_walk_start(rhiter);
       if (ret && ret != -EAGAIN)
               goto out;

Since zero and -EAGAIN are the only possible return values from the
function this check is pointless. The condition never evaluates to true.

This patch changes rhashtable_walk_start to return void. This simplifies
code for the callers that ignore -EAGAIN. For the few cases where the
caller cares about the resize event, particularly where the table can be
walked in multiple parts for netlink or seq file dump, the function
rhashtable_walk_start_check has been added that returns -EAGAIN on a
resize event.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Sep, 2017: 1 commit

rhashtable: Documentation tweak · 0647169c

Committed by Andreas Gruenbacher on Sep 19, 2017

Clarify that rhashtable_walk_{stop,start} will not reset the iterator to
the beginning of the hash table. Confusion between rhashtable_walk_enter
and rhashtable_walk_start has already led to a bug.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Jul, 2017: 1 commit

lib/rhashtable.c: use kvzalloc() in bucket_table_alloc() when possible · 12e8fd6f

Committed by Michal Hocko on Jul 10, 2017

bucket_table_alloc() can be currently called with GFP_KERNEL or
GFP_ATOMIC. For the former we basically have an open coded kvzalloc()
while the latter only uses kzalloc(). Let's simplify the code a bit by
dropping the open coded path and replacing it with kvzalloc().

Link: http://lkml.kernel.org/r/20170531155145.17111-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

20 Jun, 2017: 1 commit

rhashtable: use get_random_u32 for hash_rnd · d48ad080

Committed by Jason A. Donenfeld on Jun 07, 2017

This is much faster and just as secure. It also has the added benefit of
probably returning better randomness at early-boot on systems with
architectural RNGs.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

09 May, 2017: 1 commit

lib/rhashtable.c: simplify a strange allocation pattern · 43ca5bc4

Committed by Michal Hocko on May 08, 2017

alloc_bucket_locks allocation pattern is quite unusual.  We are
preferring vmalloc when CONFIG_NUMA is enabled.  The rationale is that
vmalloc will respect the memory policy of the current process and so the
backing memory will get distributed over multiple nodes if the requester
is configured properly.  At least that is the intention, in reality
rhashtable is shrunk and expanded from a kernel worker so no mempolicy
can be assumed.

Let's just simplify the code and use the kvmalloc helper, which is a
transparent way to use kmalloc with a vmalloc fallback, if the caller is
allowed to block, and fall back to kmalloc with the given gfp flag otherwise.

Link: http://lkml.kernel.org/r/20170306103032.2540-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

02 May, 2017: 1 commit

rhashtable: compact struct rhashtable_params · 48e75b43

Committed by Florian Westphal on May 01, 2017

By using smaller datatypes this (rather large) struct shrinks considerably
(80 -> 48 bytes on x86_64).

As this is embedded in other structs, this also reduces the size of several
others, e.g. cls_fl_head or nft_hash.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Apr, 2017: 1 commit

rhashtable: Do not lower max_elems when max_size is zero · 2d2ab658

Committed by Herbert Xu on Apr 28, 2017

The commit 6d684e54 ("rhashtable: Cap total number of entries
to 2^31") breaks rhashtable users that do not set max_size.  This
is because when max_size is zero max_elems is also incorrectly set
to zero instead of 2^31.

This patch fixes it by only lowering max_elems when max_size is not
zero.

Fixes: 6d684e54 ("rhashtable: Cap total number of entries to 2^31")
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
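
Combining this fix with 6d684e54 (the next commit in this log), the cap logic can be sketched as a standalone model, not the kernel source. `compute_max_elems()` is a hypothetical helper. The comparison against max_elems / 2 is an assumption about how "at most twice max_size" is enforced without overflowing.

```c
#include <stdint.h>

/* Hypothetical helper: the element cap for a given user max_size. */
static uint32_t compute_max_elems(uint32_t max_size)
{
	/* nelems is not a precise count, so stay well below UINT_MAX. */
	uint32_t max_elems = UINT32_C(1) << 31;

	/* Lower the cap only when max_size was actually set; without the
	 * max_size != 0 test, an unset max_size would yield a cap of zero. */
	if (max_size && max_size < max_elems / 2)
		max_elems = max_size * 2;
	return max_elems;
}
```

With `max_size == 0` the cap stays at 2^31. That is the behaviour this fix restores for users who never set max_size.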

27 Apr, 2017: 2 commits

rhashtable: Cap total number of entries to 2^31 · 6d684e54

Committed by Herbert Xu on Apr 27, 2017

When max_size is not set or if it is set to a sufficiently large
value, the nelems counter can overflow.  This would cause havoc
with the automatic shrinking as it would then attempt to fit a
huge number of entries into a tiny hash table.

This patch fixes this by adding max_elems to struct rhashtable
to cap the number of elements.  This is set to 2^31 as nelems is
not a precise count.  This is sufficiently smaller than UINT_MAX
that it should be safe.

When max_size is set max_elems will be lowered to at most twice
max_size as is the status quo.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: remove insecure_max_entries param · 038a3e85

Committed by Florian Westphal on Apr 25, 2017

There are no users in the tree; insecure_max_entries is always set to
ht->p.max_size * 2 in rhashtable_init().

Replace the only spot that uses it with a ht->p.max_size check.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Apr, 2017: 1 commit

rhashtable: remove insecure_elasticity · 5f8ddeab

Committed by Florian Westphal on Apr 16, 2017

commit 83e7e4ce ("mac80211: Use rhltable instead of rhashtable")
removed the last user that made use of 'insecure_elasticity' parameter,
i.e. the default of 16 is used everywhere.

Replace it with a constant.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Mar, 2017: 1 commit

sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h> · b2d09103

Committed by Ingo Molnar on Feb 04, 2017

We don't actually need the full rculist.h header in sched.h anymore,
we will be able to include the smaller rcupdate.h header instead.

But first update code that relied on the implicit header inclusion.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

27 Feb, 2017: 2 commits

rhashtable: Fix RCU dereference annotation in rht_bucket_nested · c4d2603d

Committed by Herbert Xu on Feb 25, 2017

The current annotation is wrong as it says that we're only called
under spinlock.  In fact it should be marked as under either
spinlock or RCU read lock.

Fixes: da20420f ("rhashtable: Add nested tables")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Fix use before NULL check in bucket_table_free · ca435407

Committed by Herbert Xu on Feb 25, 2017

Dan Carpenter reported a use before NULL check bug in the function
bucket_table_free.  In fact we don't need the NULL check at all as
no caller can provide a NULL argument.  So this patch fixes this by
simply removing it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel: synced successfully almost 2 years ago
