提交 · 0647169cf9aa441700eb8f23ea49be060626534b · openeuler / Kernel

20 9月, 2017 1 次提交

rhashtable: Documentation tweak · 0647169c

Clarify that rhashtable_walk_{stop,start} will not reset the iterator to
the beginning of the hash table. Confusion between rhashtable_walk_enter
and rhashtable_walk_start has already lead to a bug.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0647169c

11 7月, 2017 1 次提交

lib/rhashtable.c: use kvzalloc() in bucket_table_alloc() when possible · 12e8fd6f

由 Michal Hocko 提交于 7年前

bucket_table_alloc() can be currently called with GFP_KERNEL or
GFP_ATOMIC. For the former we basically have an open coded kvzalloc()
while the later only uses kzalloc(). Let's simplify the code a bit by
the dropping the open coded path and replace it with kvzalloc().

Link: http://lkml.kernel.org/r/20170531155145.17111-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

12e8fd6f

20 6月, 2017 1 次提交

rhashtable: use get_random_u32 for hash_rnd · d48ad080

由 Jason A. Donenfeld 提交于 7年前

This is much faster and just as secure. It also has the added benefit of
probably returning better randomness at early-boot on systems with
architectural RNGs.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

d48ad080

09 5月, 2017 1 次提交

lib/rhashtable.c: simplify a strange allocation pattern · 43ca5bc4

由 Michal Hocko 提交于 7年前

alloc_bucket_locks allocation pattern is quite unusual.  We are
preferring vmalloc when CONFIG_NUMA is enabled.  The rationale is that
vmalloc will respect the memory policy of the current process and so the
backing memory will get distributed over multiple nodes if the requester
is configured properly.  At least that is the intention, in reality
rhastable is shrunk and expanded from a kernel worker so no mempolicy
can be assumed.

Let's just simplify the code and use kvmalloc helper, which is a
transparent way to use kmalloc with vmalloc fallback, if the caller is
allowed to block and use the flag otherwise.

Link: http://lkml.kernel.org/r/20170306103032.2540-4-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

43ca5bc4

02 5月, 2017 1 次提交

rhashtable: compact struct rhashtable_params · 48e75b43

由 Florian Westphal 提交于 7年前

By using smaller datatypes this (rather large) struct shrinks considerably
(80 -> 48 bytes on x86_64).

As this is embedded in other structs, this also rerduces size of several
others, e.g. cls_fl_head or nft_hash.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

48e75b43

28 4月, 2017 1 次提交

rhashtable: Do not lower max_elems when max_size is zero · 2d2ab658

由 Herbert Xu 提交于 7年前

The commit 6d684e54 ("rhashtable: Cap total number of entries
to 2^31") breaks rhashtable users that do not set max_size.  This
is because when max_size is zero max_elems is also incorrectly set
to zero instead of 2^31.

This patch fixes it by only lowering max_elems when max_size is not
zero.

Fixes: 6d684e54 ("rhashtable: Cap total number of entries to 2^31")
Reported-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reported-by: Nkernel test robot <fengguang.wu@intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d2ab658

27 4月, 2017 2 次提交

rhashtable: Cap total number of entries to 2^31 · 6d684e54

由 Herbert Xu 提交于 7年前

When max_size is not set or if it set to a sufficiently large
value, the nelems counter can overflow.  This would cause havoc
with the automatic shrinking as it would then attempt to fit a
huge number of entries into a tiny hash table.

This patch fixes this by adding max_elems to struct rhashtable
to cap the number of elements.  This is set to 2^31 as nelems is
not a precise count.  This is sufficiently smaller than UINT_MAX
that it should be safe.

When max_size is set max_elems will be lowered to at most twice
max_size as is the status quo.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d684e54

rhashtable: remove insecure_max_entries param · 038a3e85

由 Florian Westphal 提交于 7年前

no users in the tree, insecure_max_entries is always set to
ht->p.max_size * 2 in rhtashtable_init().

Replace only spot that uses it with a ht->p.max_size check.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

038a3e85

19 4月, 2017 1 次提交

rhashtable: remove insecure_elasticity · 5f8ddeab

由 Florian Westphal 提交于 7年前

commit 83e7e4ce ("mac80211: Use rhltable instead of rhashtable")
removed the last user that made use of 'insecure_elasticity' parameter,
i.e. the default of 16 is used everywhere.

Replace it with a constant.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f8ddeab

02 3月, 2017 1 次提交

sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h> · b2d09103

由 Ingo Molnar 提交于 7年前

We don't actually need the full rculist.h header in sched.h anymore,
we will be able to include the smaller rcupdate.h header instead.

But first update code that relied on the implicit header inclusion.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

b2d09103

27 2月, 2017 2 次提交

rhashtable: Fix RCU dereference annotation in rht_bucket_nested · c4d2603d

由 Herbert Xu 提交于 7年前

The current annotation is wrong as it says that we're only called
under spinlock.  In fact it should be marked as under either
spinlock or RCU read lock.

Fixes: da20420f ("rhashtable: Add nested tables")
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4d2603d

rhashtable: Fix use before NULL check in bucket_table_free · ca435407

由 Herbert Xu 提交于 7年前

Dan Carpenter reported a use before NULL check bug in the function
bucket_table_free.  In fact we don't need the NULL check at all as
no caller can provide a NULL argument.  So this patch fixes this by
simply removing it.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca435407

18 2月, 2017 1 次提交

rhashtable: Add nested tables · da20420f

由 Herbert Xu 提交于 7年前

This patch adds code that handles GFP_ATOMIC kmalloc failure on
insertion.  As we cannot use vmalloc, we solve it by making our
hash table nested.  That is, we allocate single pages at each level
and reach our desired table size by nesting them.

When a nested table is created, only a single page is allocated
at the top-level.  Lower levels are allocated on demand during
insertion.  Therefore for each insertion to succeed, only two
(non-consecutive) pages are needed.

After a nested table is created, a rehash will be scheduled in
order to switch to a vmalloced table as soon as possible.  Also,
the rehash code will never rehash into a nested table.  If we
detect a nested table during a rehash, the rehash will be aborted
and a new rehash will be scheduled.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

da20420f

16 2月, 2017 1 次提交

rhashtable: Revert nested table changes. · bf3f14d6

由 David S. Miller 提交于 7年前

This reverts commits:

6a254780
9dbbfb0a
40137906

It's too risky to put in this late in the release
cycle.  We'll put these changes into the next merge
window instead.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf3f14d6

14 2月, 2017 1 次提交

rhashtable: Add nested tables · 40137906

由 Herbert Xu 提交于 7年前

This patch adds code that handles GFP_ATOMIC kmalloc failure on
insertion.  As we cannot use vmalloc, we solve it by making our
hash table nested.  That is, we allocate single pages at each level
and reach our desired table size by nesting them.

When a nested table is created, only a single page is allocated
at the top-level.  Lower levels are allocated on demand during
insertion.  Therefore for each insertion to succeed, only two
(non-consecutive) pages are needed.

After a nested table is created, a rehash will be scheduled in
order to switch to a vmalloced table as soon as possible.  Also,
the rehash code will never rehash into a nested table.  If we
detect a nested table during a rehash, the rehash will be aborted
and a new rehash will be scheduled.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

40137906

20 9月, 2016 1 次提交

rhashtable: Add rhlist interface · ca26893f

由 Herbert Xu 提交于 8年前

The insecure_elasticity setting is an ugly wart brought out by
users who need to insert duplicate objects (that is, distinct
objects with identical keys) into the same table.

In fact, those users have a much bigger problem.  Once those
duplicate objects are inserted, they don't have an interface to
find them (unless you count the walker interface which walks
over the entire table).

Some users have resorted to doing a manual walk over the hash
table which is of course broken because they don't handle the
potential existence of multiple hash tables.  The result is that
they will break sporadically when they encounter a hash table
resize/rehash.

This patch provides a way out for those users, at the expense
of an extra pointer per object.  Essentially each object is now
a list of objects carrying the same key.  The hash table will
only see the lists so nothing changes as far as rhashtable is
concerned.

To use this new interface, you need to insert a struct rhlist_head
into your objects instead of struct rhash_head.  While the hash
table is unchanged, for type-safety you'll need to use struct
rhltable instead of struct rhashtable.  All the existing interfaces
have been duplicated for rhlist, including the hash table walker.

One missing feature is nulls marking because AFAIK the only potential
user of it does not need duplicate objects.  Should anyone need
this it shouldn't be too hard to add.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca26893f

27 8月, 2016 1 次提交

rhashtable: fix a memory leak in alloc_bucket_locks() · 9dbeea7f

由 Eric Dumazet 提交于 8年前

If vmalloc() was successful, do not attempt a kmalloc_array()

Fixes: 4cf0b354 ("rhashtable: avoid large lock-array allocations")
Reported-by: NCAI Qian <caiqian@redhat.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Tested-by: NCAI Qian <caiqian@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9dbeea7f

26 8月, 2016 1 次提交

rhashtable: add rhashtable_lookup_get_insert_key() · 5ca8cc5b

由 Pablo Neira Ayuso 提交于 8年前

This patch modifies __rhashtable_insert_fast() so it returns the
existing object that clashes with the one that you want to insert.
In case the object is successfully inserted, NULL is returned.
Otherwise, you get an error via ERR_PTR().

This patch adapts the existing callers of __rhashtable_insert_fast()
so they handle this new logic, and it adds a new
rhashtable_lookup_get_insert_key() interface to fetch this existing
object.

nf_tables needs this change to improve handling of EEXIST cases via
honoring the NLM_F_EXCL flag and by checking if the data part of the
mapping matches what we have.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>

5ca8cc5b

20 8月, 2016 1 次提交

rhashtable: Remove GFP flag from rhashtable_walk_init · 246779dd

由 Herbert Xu 提交于 8年前

The commit 8f6fd83c ("rhashtable:
accept GFP flags in rhashtable_walk_init") added a GFP flag argument
to rhashtable_walk_init because some users wish to use the walker
in an unsleepable context.

In fact we don't need to allocate memory in rhashtable_walk_init
at all.  The walker is always paired with an iterator so we could
just stash ourselves there.

This patch does that by introducing a new enter function to replace
the existing init function.  This way we don't have to churn all
the existing users again.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

246779dd

16 8月, 2016 1 次提交

rhashtable: fix shift by 64 when shrinking · 12311959

由 Vegard Nossum 提交于 8年前

I got this:

    ================================================================================
    UBSAN: Undefined behaviour in ./include/linux/log2.h:63:13
    shift exponent 64 is too large for 64-bit type 'long unsigned int'
    CPU: 1 PID: 721 Comm: kworker/1:1 Not tainted 4.8.0-rc1+ #87
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
    Workqueue: events rht_deferred_worker
     0000000000000000 ffff88011661f8d8 ffffffff82344f50 0000000041b58ab3
     ffffffff84f98000 ffffffff82344ea4 ffff88011661f900 ffff88011661f8b0
     0000000000000001 ffff88011661f6b8 dffffc0000000000 ffffffff867f7640
    Call Trace:
     [<ffffffff82344f50>] dump_stack+0xac/0xfc
     [<ffffffff82344ea4>] ? _atomic_dec_and_lock+0xc4/0xc4
     [<ffffffff8242f5b8>] ubsan_epilogue+0xd/0x8a
     [<ffffffff82430c41>] __ubsan_handle_shift_out_of_bounds+0x255/0x29a
     [<ffffffff824309ec>] ? __ubsan_handle_out_of_bounds+0x180/0x180
     [<ffffffff84003436>] ? nl80211_req_set_reg+0x256/0x2f0
     [<ffffffff812112ba>] ? print_context_stack+0x8a/0x160
     [<ffffffff81200031>] ? amd_pmu_reset+0x341/0x380
     [<ffffffff823af808>] rht_deferred_worker+0x1618/0x1790
     [<ffffffff823af808>] ? rht_deferred_worker+0x1618/0x1790
     [<ffffffff823ae1f0>] ? rhashtable_jhash2+0x370/0x370
     [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970
     [<ffffffff8134c1cf>] process_one_work+0x79f/0x1970
     [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970
     [<ffffffff8134ba30>] ? try_to_grab_pending+0x4c0/0x4c0
     [<ffffffff8134d564>] ? worker_thread+0x1c4/0x1340
     [<ffffffff8134d8ff>] worker_thread+0x55f/0x1340
     [<ffffffff845e904f>] ? __schedule+0x4df/0x1d40
     [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970
     [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970
     [<ffffffff813642f7>] kthread+0x237/0x390
     [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280
     [<ffffffff845f8c93>] ? _raw_spin_unlock_irq+0x33/0x50
     [<ffffffff845f95df>] ret_from_fork+0x1f/0x40
     [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280
    ================================================================================

roundup_pow_of_two() is undefined when called with an argument of 0, so
let's avoid the call and just fall back to ht->p.min_size (which should
never be smaller than HASH_MIN_SIZE).

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12311959

15 8月, 2016 1 次提交

rhashtable: avoid large lock-array allocations · 4cf0b354

由 Florian Westphal 提交于 8年前

Sander reports following splat after netfilter nat bysrc table got
converted to rhashtable:

swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1 [..]
 [<ffffffff811633ed>] warn_alloc_failed+0xdd/0x140
 [<ffffffff811638b1>] __alloc_pages_nodemask+0x3e1/0xcf0
 [<ffffffff811a72ed>] alloc_pages_current+0x8d/0x110
 [<ffffffff8117cb7f>] kmalloc_order+0x1f/0x70
 [<ffffffff811aec19>] __kmalloc+0x129/0x140
 [<ffffffff8146d561>] bucket_table_alloc+0xc1/0x1d0
 [<ffffffff8146da1d>] rhashtable_insert_rehash+0x5d/0xe0
 [<ffffffff819fcfff>] nf_nat_setup_info+0x2ef/0x400

The failure happens when allocating the spinlock array.
Even with GFP_KERNEL its unlikely for such a large allocation
to succeed.

Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition
to adding NOWARN for atomic allocations this also makes the bucket-array
sizing more conservative.

In commit 095dc8e0 ("tcp: fix/cleanup inet_ehash_locks_alloc()"),
Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'".
IOW, consider size needed by a single spinlock when determining
number of locks per cpu.  So with 64 byte per cacheline and 4 byte per
spinlock this gives 32 locks per cpu.

Resulting size of the lock-array (sizeof(spinlock) == 4):

cpus:    1   2   4   8   16   32   64
old:    1k  1k  4k  8k  16k  16k  16k
new:   128 256 512  1k   2k   4k   8k

8k allocation should have decent chance of success even
with GFP_ATOMIC, and should not fail with GFP_KERNEL.

With 72-byte spinlock (LOCKDEP):
cpus :   1   2
old:    9k 18k
new:   ~2k ~4k
Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
Suggested-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cf0b354

05 4月, 2016 1 次提交

rhashtable: accept GFP flags in rhashtable_walk_init · 8f6fd83c

由 Bob Copeland 提交于 8年前

In certain cases, the 802.11 mesh pathtable code wants to
iterate over all of the entries in the forwarding table from
the receive path, which is inside an RCU read-side critical
section.  Enable walks inside atomic sections by allowing
GFP_ATOMIC allocations for the walker state.

Change all existing callsites to pass in GFP_KERNEL.
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NBob Copeland <me@bobcopeland.com>
[also adjust gfs2/glock.c and rhashtable tests]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

8f6fd83c

19 12月, 2015 1 次提交

rhashtable: Kill harmless RCU warning in rhashtable_walk_init · 179ccc0a

由 Herbert Xu 提交于 9年前

The commit c6ff5268 ("rhashtable:
Fix walker list corruption") causes a suspicious RCU usage warning
because we no longer hold ht->mutex when we dereference ht->tbl.

However, this is a false positive because we now hold ht->lock
which also guarantees that ht->tbl won't disppear from under us.

This patch kills the warning by using rcu_dereference_protected.
Reported-by: Nkernel test robot <ying.huang@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

179ccc0a

17 12月, 2015 1 次提交

rhashtable: Fix walker list corruption · c6ff5268

由 Herbert Xu 提交于 9年前

The commit ba7c95ea ("rhashtable:
Fix sleeping inside RCU critical section in walk_stop") introduced
a new spinlock for the walker list.  However, it did not convert
all existing users of the list over to the new spin lock.  Some
continued to use the old mutext for this purpose.  This obviously
led to corruption of the list.

The fix is to use the spin lock everywhere where we touch the list.

This also allows us to do rcu_rad_lock before we take the lock in
rhashtable_walk_start.  With the old mutex this would've deadlocked
but it's safe with the new spin lock.

Fixes: ba7c95ea ("rhashtable: Fix sleeping inside RCU...")
Reported-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6ff5268

16 12月, 2015 1 次提交

rhashtable: Enforce minimum size on initial hash table · 3a324606

由 Herbert Xu 提交于 9年前

William Hua <william.hua@canonical.com> wrote:
>
> I wasn't aware there was an enforced minimum size. I simply set the
> nelem_hint in the rhastable_params struct to 1, expecting it to grow as
> needed. This caused a segfault afterwards when trying to insert an
> element.

OK we're doing the size computation before we enforce the limit
on min_size.

---8<---
We need to do the initial hash table size computation after we
have obtained the correct min_size/max_size parameters.  Otherwise
we may end up with a hash table whose size is outside the allowed
envelope.

Fixes: a998f712 ("rhashtable: Round up/down min/max_size to...")
Reported-by: NWilliam Hua <william.hua@canonical.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a324606

09 12月, 2015 1 次提交

rhashtable: Remove unnecessary wmb for future_tbl · 46c749ea

由 Herbert Xu 提交于 9年前

The patch 9497df88 ("rhashtable:
Fix reader/rehash race") added a pair of barriers.  In fact the
wmb is superfluous because every subsequent write to the old or
new hash table uses rcu_assign_pointer, which itself carriers a
full barrier prior to the assignment.

Therefore we may remove the explicit wmb.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46c749ea

06 12月, 2015 1 次提交

Revert "rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation" · a90099d9

由 David S. Miller 提交于 9年前

This reverts commit d3716f18.

vmalloc cannot be used in BH disabled contexts, even
with GFP_ATOMIC.  And we certainly want to support
rhashtable users inserting entries with software
interrupts disabled.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a90099d9

05 12月, 2015 2 次提交

rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation · d3716f18

由 Herbert Xu 提交于 9年前

When an rhashtable user pounds rhashtable hard with back-to-back
insertions we may end up growing the table in GFP_ATOMIC context.
Unfortunately when the table reaches a certain size this often
fails because we don't have enough physically contiguous pages
to hold the new table.

Eric Dumazet suggested (and in fact wrote this patch) using
__vmalloc instead which can be used in GFP_ATOMIC context.
Reported-by: NPhil Sutter <phil@nwl.cc>
Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3716f18

rhashtable: Prevent spurious EBUSY errors on insertion · 3cf92222

由 Herbert Xu 提交于 9年前

Thomas and Phil observed that under stress rhashtable insertion
sometimes failed with EBUSY, even though this error should only
ever been seen when we're under attack and our hash chain length
has grown to an unacceptable level, even after a rehash.

It turns out that the logic for detecting whether there is an
existing rehash is faulty.  In particular, when two threads both
try to grow the same table at the same time, one of them may see
the newly grown table and thus erroneously conclude that it had
been rehashed.  This is what leads to the EBUSY error.

This patch fixes this by remembering the current last table we
used during insertion so that rhashtable_insert_rehash can detect
when another thread has also done a resize/rehash.  When this is
detected we will give up our resize/rehash and simply retry the
insertion with the new table.
Reported-by: NThomas Graf <tgraf@suug.ch>
Reported-by: NPhil Sutter <phil@nwl.cc>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Tested-by: NPhil Sutter <phil@nwl.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3cf92222

23 9月, 2015 1 次提交

lib: fix data race in rhashtable_rehash_one · 7def0f95

由 Dmitriy Vyukov 提交于 9年前

rhashtable_rehash_one() uses complex logic to update entry->next field,
after INIT_RHT_NULLS_HEAD and NULLS_MARKER expansion:

entry->next = 1 | ((base + off) << 1)

This can be compiled along the lines of:

entry->next = base + off
entry->next <<= 1
entry->next |= 1

Which will break concurrent readers.

NULLS value recomputation is not needed here, so just remove
the complex logic.

The data race was found with KernelThreadSanitizer (KTSAN).
Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7def0f95

09 7月, 2015 1 次提交

rhashtable: fix for resize events during table walk · 142b942a

由 Phil Sutter 提交于 9年前

If rhashtable_walk_next detects a resize operation in progress, it jumps
to the new table and continues walking that one. But it misses to drop
the reference to it's current item, leading it to continue traversing
the new table's bucket in which the current item is sorted into, and
after reaching that bucket's end continues traversing the new table's
second bucket instead of the first one, thereby potentially missing
items.

This fixes the rhashtable runtime test for me. Bug probably introduced
by Herbert Xu's patch eddee5ba ("rhashtable: Fix walker behaviour during
rehash") although not explicitly tested.

Fixes: eddee5ba ("rhashtable: Fix walker behaviour during rehash")
Signed-off-by: NPhil Sutter <phil@nwl.cc>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

142b942a

07 6月, 2015 1 次提交

rhashtable: add missing import <linux/export.h> · 6d795413

由 Hauke Mehrtens 提交于 9年前

rhashtable uses EXPORT_SYMBOL_GPL() without importing linux/export.h
directly it is only imported indirectly through some other includes.
Signed-off-by: NHauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d795413

17 5月, 2015 1 次提交

rhashtable: Add cap on number of elements in hash table · 07ee0722

由 Herbert Xu 提交于 9年前

We currently have no limit on the number of elements in a hash table.
This is a problem because some users (tipc) set a ceiling on the
maximum table size and when that is reached the hash table may
degenerate.  Others may encounter OOM when growing and if we allow
insertions when that happens the hash table perofrmance may also
suffer.

This patch adds a new paramater insecure_max_entries which becomes
the cap on the table.  If unset it defaults to max_size * 2.  If
it is also zero it means that there is no cap on the number of
elements in the table.  However, the table will grow whenever the
utilisation hits 100% and if that growth fails, you will get ENOMEM
on insertion.

As allowing oversubscription is potentially dangerous, the name
contains the word insecure.

Note that the cap is not a hard limit.  This is done for performance
reasons as enforcing a hard limit will result in use of atomic ops
that are heavier than the ones we currently use.

The reasoning is that we're only guarding against a gross over-
subscription of the table, rather than a small breach of the limit.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

07ee0722

06 5月, 2015 1 次提交

rhashtable: Simplify iterator code · c936a79f

由 Thomas Graf 提交于 9年前

Remove useless obj variable and goto logic.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c936a79f

23 4月, 2015 2 次提交

rhashtable: Do not schedule more than one rehash if we can't grow further · a87b9ebf

由 Thomas Graf 提交于 9年前

The current code currently only stops inserting rehashes into the
chain when no resizes are currently scheduled. As long as resizes
are scheduled and while inserting above the utilization watermark,
more and more rehashes will be scheduled.

This lead to a perfect DoS storm with thousands of rehashes
scheduled which lead to thousands of spinlocks to be taken
sequentially.

Instead, only allow either a series of resizes or a single rehash.
Drop any further rehashes and return -EBUSY.

Fixes: ccd57b1b ("rhashtable: Add immediate rehash during insertion")
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a87b9ebf

rhashtable: Schedule async resize when sync realloc fails · e2307ed6

由 Thomas Graf 提交于 9年前

When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
we can't allocate the necessary memory in the current context but the
limits as set by the user would still allow to grow.

Thus attempt an async resize in the background where we can allocate
using GFP_KERNEL which is more likely to succeed. The insertion itself
will still fail to indicate pressure.

This fixes a bug where the table would never continue growing once the
utilization is above 100%.

Fixes: ccd57b1b ("rhashtable: Add immediate rehash during insertion")
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2307ed6

26 3月, 2015 1 次提交

rhashtable: provide len to obj_hashfn · 49f7b33e

由 Patrick McHardy 提交于 9年前

nftables sets will be converted to use so called setextensions, moving
the key to a non-fixed position. To hash it, the obj_hashfn must be used,
however it so far doesn't receive the length parameter.

Pass the key length to obj_hashfn() and convert existing users.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

49f7b33e

25 3月, 2015 3 次提交

rhashtable: Add rhashtable_free_and_destroy() · 6b6f302c

由 Thomas Graf 提交于 9年前

rhashtable_destroy() variant which stops rehashes, iterates over
the table and calls a callback to release resources.

Avoids need for nft_hash to embed rhashtable internals and allows to
get rid of the being_destroyed flag. It also saves a 2nd mutex
lock upon destruction.

Also fixes an RCU lockdep splash on nft set destruction due to
calling rht_for_each_entry_safe() without holding bucket locks.
Open code this loop as we need know that no mutations may occur in
parallel.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b6f302c

rhashtable: Disable automatic shrinking by default · b5e2c150

由 Thomas Graf 提交于 9年前

Introduce a new bool automatic_shrinking to require the
user to explicitly opt-in to automatic shrinking of tables.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5e2c150

rhashtable: Use 'unsigned int' consistently · 299e5c32

由 Thomas Graf 提交于 9年前

Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

299e5c32

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功