提交 · c95c2d328cd051484bea161e66dfa715c02a7d7e · openeuler / Kernel

17 4月, 2021 1 次提交

lib: remove "expecting prototype" kernel-doc warnings · c95c2d32

由 Randy Dunlap 提交于 4月 16, 2021

Fix various kernel-doc warnings in lib/ due to missing or erroneous
function names.

Add kernel-doc for some function parameters that was missing. Use
kernel-doc "Return:" notation in earlycpio.c.

Quietens the following warnings:

lib/earlycpio.c:61: warning: expecting prototype for cpio_data find_cpio_data(). Prototype was for find_cpio_data() instead

lib/lru_cache.c:640: warning: expecting prototype for lc_dump(). Prototype was for lc_seq_dump_details() instead
lru_cache.c:90: warning: Function parameter or member 'cache' not described in 'lc_create'

lib/parman.c:368: warning: expecting prototype for parman_item_del(). Prototype was for parman_item_remove() instead
parman.c:309: warning: Excess function parameter 'prority' description in 'parman_prio_init'

lib/radix-tree.c:703: warning: expecting prototype for __radix_tree_insert(). Prototype was for radix_tree_insert() instead
radix-tree.c:180: warning: Excess function parameter 'addr' description in 'radix_tree_find_next_bit'
radix-tree.c:180: warning: Excess function parameter 'size' description in 'radix_tree_find_next_bit'
radix-tree.c:931: warning: Function parameter or member 'iter' not described in 'radix_tree_iter_replace'

Link: https://lkml.kernel.org/r/20210411221756.15461-1-rdunlap@infradead.orgSigned-off-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c95c2d32

17 10月, 2020 1 次提交

lib: radix-tree: delete duplicated words · e0656501

由 Randy Dunlap 提交于 10月 15, 2020

Drop the repeated word "be".
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200823040508.26086-1-rdunlap@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e0656501

07 10月, 2020 1 次提交

radix tree test suite: Fix compilation · dd841a74

由 Matthew Wilcox (Oracle) 提交于 6月 14, 2020

Introducing local_lock broke compilation; fix it all up.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>

dd841a74

17 7月, 2020 1 次提交

treewide: Remove uninitialized_var() usage · 3f649ab7

由 Kees Cook 提交于 6月 03, 2020

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: NKees Cook <keescook@chromium.org>

3f649ab7

28 5月, 2020 1 次提交

radix-tree: Use local_lock for protection · cfa6705d

由 Sebastian Andrzej Siewior 提交于 5月 27, 2020

The radix-tree and idr preload mechanisms use preempt_disable() to protect
the complete operation between xxx_preload() and xxx_preload_end().

As the code inside the preempt disabled section acquires regular spinlocks,
which are converted to 'sleeping' spinlocks on a PREEMPT_RT kernel and
eventually calls into a memory allocator, this conflicts with the RT
semantics.

Convert it to a local_lock which allows RT kernels to substitute them with
a real per CPU lock. On non RT kernels this maps to preempt_disable() as
before, but provides also lockdep coverage of the critical region.
No functional change.
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200527201119.1692513-3-bigeasy@linutronix.de

cfa6705d

01 2月, 2020 1 次提交

ida: remove abandoned macros · 3a00e7c4

由 Alex Shi 提交于 1月 21, 2020

3 IDA_ started macros aren't used from commit f32f004c ("ida: Convert
to XArray"). so better to remove them.
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>

3a00e7c4

03 11月, 2019 1 次提交

idr: Fix idr_alloc_u32 on 32-bit systems · b7e9728f

由 Matthew Wilcox (Oracle) 提交于 11月 02, 2019

Attempting to allocate an entry at 0xffffffff when one is already
present would succeed in allocating one at 2^32, which would confuse
everything. Return -ENOSPC in this case, as expected.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>

b7e9728f

31 5月, 2019 1 次提交

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 153 · de6cc651

由 Thomas Gleixner 提交于 5月 27, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 or at your option any
  later version this program is distributed in the hope that it will
  be useful but without any warranty without even the implied warranty
  of merchantability or fitness for a particular purpose see the gnu
  general public license for more details you should have received a
  copy of the gnu general public license along with this program if
  not write to the free software foundation inc 675 mass ave cambridge
  ma 02139 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 77 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Reviewed-by: NArmijn Hemel <armijn@tjaldur.nl>
Reviewed-by: NRichard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.837555891@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

de6cc651

06 12月, 2018 1 次提交

radix tree: Don't return retry entries from lookup · eff3860b

由 Matthew Wilcox 提交于 12月 06, 2018

Commit 66ee620f ("idr: Permit any valid kernel pointer to be stored")
changed the radix tree lookup so that it stops when reaching the bottom
of the tree.  However, the condition was added in the wrong place,
making it possible to return retry entries to the caller.  Reorder the
tests to check for the retry entry before checking whether we're at the
bottom of the tree.  The retry entry should never be found in the tree
root, so it's safe to defer the check until the end of the loop.

Add a regression test to the test-suite to be sure this doesn't come
back.

Fixes: 66ee620f ("idr: Permit any valid kernel pointer to be stored")
Reported-by: NGreg Kurz <groug@kaod.org>
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

eff3860b

21 10月, 2018 13 次提交

radix tree: Remove multiorder support · 3a08cd52

由 Matthew Wilcox 提交于 9月 22, 2018

All users have now been converted to the XArray. Removing the support
reduces code size and ensures new users will use the XArray instead.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

3a08cd52

radix tree test suite: Convert tag_tagged_items to XArray · 372266ba

由 Matthew Wilcox 提交于 8月 18, 2018

The tag_tagged_items() function is supposed to test the page-writeback
tagging code. Since that has been converted to the XArray, there's
not much point in testing the radix tree's tagging code. This requires
using the pthread mutex embedded in the xarray instead of an external
lock, so remove the pthread mutexes which protect xarrays/radix trees.
Also remove radix_tree_iter_tag_set() as this was the last user.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

372266ba

radix tree: Remove radix_tree_clear_tags · adb9d9c4

由 Matthew Wilcox 提交于 4月 09, 2018

The page cache was the only user of this interface and it has now
been converted to the XArray.  Transform the test into a test of
xas_init_marks().
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

adb9d9c4

radix tree: Remove radix_tree_maybe_preload_order · 8cf2f984

由 Matthew Wilcox 提交于 5月 19, 2018

This function was only used by the page cache which is now converted
to the XArray.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

8cf2f984

radix tree: Remove split/join code · 2956c664

由 Matthew Wilcox 提交于 5月 19, 2018

radix_tree_split and radix_tree_join were never used upstream.  Remove
them; if they're needed in future they will be replaced by XArray
equivalents.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

2956c664

radix tree: Remove radix_tree_update_node_t · 1cf56f9d

由 Matthew Wilcox 提交于 4月 09, 2018

The only user of this functionality was the workingset code, and it's
now been converted to the XArray. Remove __radix_tree_delete_node()
entirely as it was also only used by the workingset code.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

1cf56f9d

shmem: Convert shmem_alloc_hugepage to XArray · 7b8d046f

由 Matthew Wilcox 提交于 12月 01, 2017

xa_find() is a slightly easier API to use than
radix_tree_gang_lookup_slot() because it contains its own RCU locking.
This commit removes the last user of radix_tree_gang_lookup_slot()
so remove the function too.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

7b8d046f

page cache: Add and replace pages using the XArray · 74d60958

由 Matthew Wilcox 提交于 11月 17, 2017

Use the XArray APIs to add and replace pages in the page cache.  This
removes two uses of the radix tree preload API and is significantly
shorter code.  It also removes the last user of __radix_tree_create()
outside radix-tree.c itself, so make it static.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

74d60958

ida: Convert to XArray · f32f004c

由 Matthew Wilcox 提交于 7月 04, 2018

Use the XA_TRACK_FREE ability to track which entries have a free bit,
similarly to how it uses the radix tree's IDR_FREE tag. This eliminates
the per-cpu ida_bitmap preload, and fixes the memory consumption
regression I introduced when making the IDR able to store any pointer.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

f32f004c

xarray: Add XArray unconditional store operations · 58d6ea30

由 Matthew Wilcox 提交于 11月 10, 2017

xa_store() differs from radix_tree_insert() in that it will overwrite an
existing element in the array rather than returning an error. This is
the behaviour which most users want, and those that want more complex
behaviour generally want to use the xas family of routines anyway.

For memory allocation, xa_store() will first attempt to request memory
from the slab allocator; if memory is not immediately available, it will
drop the xa_lock and allocate memory, keeping a pointer in the xa_state.
It does not use the per-CPU cache, although those will continue to exist
until all radix tree users are converted to the xarray.

This patch also includes xa_erase() and __xa_erase() for a streamlined
way to store NULL. Since there is no need to allocate memory in order
to store a NULL in the XArray, we do not need to trouble the user with
deciding what memory allocation flags to use.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

58d6ea30

xarray: Add XArray load operation · ad3d6c72

由 Matthew Wilcox 提交于 11月 07, 2017

The xa_load function brings with it a lot of infrastructure; xa_empty(),
xa_is_err(), and large chunks of the XArray advanced API that are used
to implement xa_load.

As the test-suite demonstrates, it is possible to use the XArray functions
on a radix tree.  The radix tree functions depend on the GFP flags being
stored in the root of the tree, so it's not possible to use the radix
tree functions on an XArray.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

ad3d6c72

xarray: Define struct xa_node · 01959dfe

由 Matthew Wilcox 提交于 11月 09, 2017

This is a direct replacement for struct radix_tree_node.  A couple of
struct members have changed name, so convert those.  Use a #define so
that radix tree users continue to work without change.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>
Reviewed-by: NJosef Bacik <jbacik@fb.com>

01959dfe

xarray: Add definition of struct xarray · f8d5d0cc

由 Matthew Wilcox 提交于 11月 07, 2017

This is a direct replacement for struct radix_tree_root.  Some of the
struct members have changed name; convert those, and use a #define so
that radix_tree users continue to work without change.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>
Reviewed-by: NJosef Bacik <jbacik@fb.com>

f8d5d0cc

30 9月, 2018 3 次提交

xarray: Change definition of sibling entries · 02c02bf1

由 Matthew Wilcox 提交于 11月 03, 2017

Instead of storing a pointer to the slot containing the canonical entry,
store the offset of the slot.  Produces slightly more efficient code
(~300 bytes) and simplifies the implementation.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>
Reviewed-by: NJosef Bacik <jbacik@fb.com>

02c02bf1

xarray: Replace exceptional entries · 3159f943

由 Matthew Wilcox 提交于 11月 03, 2017

Introduce xarray value entries and tagged pointers to replace radix
tree exceptional entries.  This is a slight change in encoding to allow
the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
value entry).  It is also a change in emphasis; exceptional entries are
intimidating and different.  As the comment explains, you can choose
to store values or pointers in the xarray and they are both first-class
citizens.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>
Reviewed-by: NJosef Bacik <jbacik@fb.com>

3159f943

idr: Permit any valid kernel pointer to be stored · 66ee620f

由 Matthew Wilcox 提交于 6月 25, 2018

An upcoming change to the encoding of internal entries will set the bottom
two bits to 0b10. Unfortunately, m68k only aligns some data structures
to 2 bytes, so the IDR will interpret them as internal entries and things
will go badly wrong.

Change the radix tree so that it stops either when the node indicates
that it's the bottom of the tree (shift == 0) or when the entry is not an
internal entry. This means we cannot insert an arbitrary kernel pointer
as a multiorder entry, but the IDR does not permit multiorder entries.

Annoyingly, this means the IDR can no longer take advantage of the radix
tree's ability to store a single entry at offset 0 without allocating
memory. A pointer which is 2-byte aligned cannot be stored directly in
the root as it would be indistinguishable from a node, so we must allocate
a node in order to store a 2-byte pointer at index 0. The idr_replace()
function does not take a GFP flags argument, so cannot allocate memory.
If a user inserts a 4-byte aligned pointer at index 0 and then replaces
it with a 2-byte aligned pointer, we must be able to store it.

Arbitrary pointer values are still not permitted; pointers of the
form 2 + (i * 4) for values of i between 0 and 1023 are reserved for
the implementation. These are not valid kernel pointers as they would
point into the zero page.

This change does cause a runtime memory consumption regression for
the IDA. I will recover that later.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>
Tested-by: NGuenter Roeck <linux@roeck-us.net>

66ee620f

22 8月, 2018 2 次提交

ida: Remove old API · b03f8e43

由 Matthew Wilcox 提交于 6月 18, 2018

Delete ida_pre_get(), ida_get_new(), ida_get_new_above() and ida_remove()
from the public API. Some of these functions still exist as internal
helpers, but they should not be called by consumers.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

b03f8e43

radix-tree: Fix UBSAN warning · 76f070b4

由 Matthew Wilcox 提交于 8月 18, 2018

get_slot_offset() can be called with a NULL 'parent' argument.
In this case, the calculated value will not be used, but calculating
it is undefined.  Rather than fixing the caller (__radix_tree_delete)
to not call get_slot_offset(), make get_slot_offset() robust against
being called with a NULL parent.
Signed-off-by: NMatthew Wilcox <willy@infradead.org>

76f070b4

26 5月, 2018 1 次提交

idr: fix invalid ptr dereference on item delete · 7a4deea1

由 Matthew Wilcox 提交于 5月 25, 2018

If the radix tree underlying the IDR happens to be full and we attempt
to remove an id which is larger than any id in the IDR, we will call
__radix_tree_delete() with an uninitialised 'slot' pointer, at which
point anything could happen.  This was easiest to hit with a single
entry at id 0 and attempting to remove a non-0 id, but it could have
happened with 64 entries and attempting to remove an id >= 64.

Roman said:

  The syzcaller test boils down to opening /dev/kvm, creating an
  eventfd, and calling a couple of KVM ioctls. None of this requires
  superuser. And the result is dereferencing an uninitialized pointer
  which is likely a crash. The specific path caught by syzbot is via
  KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are
  other user-triggerable paths, so cc:stable is probably justified.

Matthew added:

  We have around 250 calls to idr_remove() in the kernel today. Many of
  them pass an ID which is embedded in the object they're removing, so
  they're safe. Picking a few likely candidates:

  drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
  drivers/atm/nicstar.c could be taken down by a handcrafted packet

Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree")
Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
Debugged-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7a4deea1

19 5月, 2018 1 次提交

radix tree: fix multi-order iteration race · 9f418224

由 Ross Zwisler 提交于 5月 18, 2018

Fix a race in the multi-order iteration code which causes the kernel to
hit a GP fault.  This was first seen with a production v4.15 based
kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used
order 9 PMD DAX entries.

The race has to do with how we tear down multi-order sibling entries
when we are removing an item from the tree.  Remember for example that
an order 2 entry looks like this:

  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]

where 'entry' is in some slot in the struct radix_tree_node, and the
three slots following 'entry' contain sibling pointers which point back
to 'entry.'

When we delete 'entry' from the tree, we call :

  radix_tree_delete()
    radix_tree_delete_item()
      __radix_tree_delete()
        replace_slot()

replace_slot() first removes the siblings in order from the first to the
last, then at then replaces 'entry' with NULL.  This means that for a
brief period of time we end up with one or more of the siblings removed,
so:

  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]

This causes an issue if you have a reader iterating over the slots in
the tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
mm/filemap.c.

The issue is that when __radix_tree_next_slot() => skip_siblings() tries
to skip over the sibling entries in the slots, it currently does so with
an exact match on the slot directly preceding our current slot.
Normally this works:

                                      V preceding slot
  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                              ^ current slot

This lets you find the first sibling, and you skip them all in order.

But in the case where one of the siblings is NULL, that slot is skipped
and then our sibling detection is interrupted:

                                             V preceding slot
  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                    ^ current slot

This means that the sibling pointers aren't recognized since they point
all the way back to 'entry', so we think that they are normal internal
radix tree pointers.  This causes us to think we need to walk down to a
struct radix_tree_node starting at the address of 'entry'.

In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at
'entry'.

We fix this race by fixing the way that skip_siblings() detects sibling
nodes.  Instead of testing against the preceding slot we instead look
for siblings via is_sibling_entry() which compares against the position
of the struct radix_tree_node.slots[] array.  This ensures that sibling
entries are properly identified, even if they are no longer contiguous
with the 'entry' they point to.

Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com
Fixes: 148deab2 ("radix-tree: improve multiorder iterators")
Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Reported-by: NCR, Sapthagirish <sapthagirish.cr@intel.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9f418224

12 4月, 2018 1 次提交

radix tree: use GFP_ZONEMASK bits of gfp_t for flags · fa290cda

由 Matthew Wilcox 提交于 4月 10, 2018

Patch series "XArray", v9.  (First part thereof).

This patchset is, I believe, appropriate for merging for 4.17.  It
contains the XArray implementation, to eventually replace the radix
tree, and converts the page cache to use it.

This conversion keeps the radix tree and XArray data structures in sync
at all times.  That allows us to convert the page cache one function at
a time and should allow for easier bisection.  Other than renaming some
elements of the structures, the data structures are fundamentally
unchanged; a radix tree walk and an XArray walk will touch the same
number of cachelines.  I have changes planned to the XArray data
structure, but those will happen in future patches.

Improvements the XArray has over the radix tree:

 - The radix tree provides operations like other trees do; 'insert' and
   'delete'. But what most users really want is an automatically
   resizing array, and so it makes more sense to give users an API that
   is like an array -- 'load' and 'store'. We still have an 'insert'
   operation for users that really want that semantic.

 - The XArray considers locking as part of its API. This simplifies a
   lot of users who formerly had to manage their own locking just for
   the radix tree. It also improves code generation as we can now tell
   RCU that we're holding a lock and it doesn't need to generate as much
   fencing code. The other advantage is that tree nodes can be moved
   (not yet implemented).

 - GFP flags are now parameters to calls which may need to allocate
   memory. The radix tree forced users to decide what the allocation
   flags would be at creation time. It's much clearer to specify them at
   allocation time.

 - Memory is not preloaded; we don't tie up dozens of pages on the off
   chance that the slab allocator fails. Instead, we drop the lock,
   allocate a new node and retry the operation. We have to convert all
   the radix tree, IDA and IDR preload users before we can realise this
   benefit, but I have not yet found a user which cannot be converted.

 - The XArray provides a cmpxchg operation. The radix tree forces users
   to roll their own (and at least four have).

 - Iterators take a 'max' parameter. That simplifies many users and will
   reduce the amount of iteration done.

 - Iteration can proceed backwards. We only have one user for this, but
   since it's called as part of the pagefault readahead algorithm, that
   seemed worth mentioning.

 - RCU-protected pointers are not exposed as part of the API. There are
   some fun bugs where the page cache forgets to use rcu_dereference()
   in the current codebase.

 - Value entries gain an extra bit compared to radix tree exceptional
   entries. That gives us the extra bit we need to put huge page swap
   entries in the page cache.

 - Some iterators now take a 'filter' argument instead of having
   separate iterators for tagged/untagged iterations.

The page cache is improved by this:

 - Shorter, easier to read code

 - More efficient iterations

 - Reduction in size of struct address_space

 - Fewer walks from the top of the data structure; the XArray API
   encourages staying at the leaf node and conducting operations there.

This patch (of 8):

None of these bits may be used for slab allocations, so we can use them
as radix tree flags as long as we mask them off before passing them to
the slab allocator. Move the IDR flag from the high bits to the
GFP_ZONEMASK bits.

Link: http://lkml.kernel.org/r/20180313132639.17387-3-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
Acked-by: NJeff Layton <jlayton@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fa290cda

22 2月, 2018 1 次提交

ida: do zeroing in ida_pre_get() · b1a8a7a7

由 Rasmus Villemoes 提交于 2月 21, 2018

As far as I can tell, the only place the per-cpu ida_bitmap is populated
is in ida_pre_get.  The pre-allocated element is stolen in two places in
ida_get_new_above, in both cases immediately followed by a memset(0).

Since ida_get_new_above is called with locks held, do the zeroing in
ida_pre_get, or rather let kmalloc() do it.  Also, apparently gcc
generates ~44 bytes of code to do a memset(, 0, 128):

  $ scripts/bloat-o-meter vmlinux.{0,1}
  add/remove: 0/0 grow/shrink: 2/1 up/down: 5/-88 (-83)
  Function                                     old     new   delta
  ida_pre_get                                  115     119      +4
  vermagic                                      27      28      +1
  ida_get_new_above                            715     627     -88

Link: http://lkml.kernel.org/r/20180108225634.15340-1-linux@rasmusvillemoes.dkSigned-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: NMatthew Wilcox <mawilcox@microsoft.com>
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b1a8a7a7

07 2月, 2018 1 次提交

idr: Remove idr_alloc_ext · 460488c5

由 Matthew Wilcox 提交于 11月 28, 2017

It has no more users, so remove it. Move idr_alloc() back into idr.c,
move the guts of idr_alloc_cmn() into idr_alloc_u32(), remove the
wrappers around idr_get_free_cmn() and rename it to idr_get_free().
While there is now no interface to allocate IDs larger than a u32,
the IDR internals remain ready to handle a larger ID should a need arise.

These changes make it possible to provide the guarantee that, if the
nextid pointer points into the object, the object's ID will be initialised
before a concurrent lookup can find the object.
Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>

460488c5

16 11月, 2017 1 次提交

mm, truncate: do not check mapping for every page being truncated · c7df8ad2

由 Mel Gorman 提交于 11月 15, 2017

During truncation, the mapping has already been checked for shmem and
dax so it's known that workingset_update_node is required.

This patch avoids the checks on mapping for each page being truncated.
In all other cases, a lookup helper is used to determine if
workingset_update_node() needs to be called.  The one danger is that the
API is slightly harder to use as calling workingset_update_node directly
without checking for dax or shmem mappings could lead to surprises.
However, the API rarely needs to be used and hopefully the comment is
enough to give people the hint.

sparsetruncate (tiny)
                              4.14.0-rc4             4.14.0-rc4
                             oneirq-v1r1        pickhelper-v1r1
Min          Time      141.00 (   0.00%)      140.00 (   0.71%)
1st-qrtle    Time      142.00 (   0.00%)      141.00 (   0.70%)
2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
Max-95%      Time      147.00 (   0.00%)      145.00 (   1.36%)
Max-99%      Time      195.00 (   0.00%)      191.00 (   2.05%)
Max          Time      230.00 (   0.00%)      205.00 (  10.87%)
Amean        Time      144.37 (   0.00%)      143.82 (   0.38%)
Stddev       Time       10.44 (   0.00%)        9.00 (  13.74%)
Coeff        Time        7.23 (   0.00%)        6.26 (  13.41%)
Best99%Amean Time      143.72 (   0.00%)      143.34 (   0.26%)
Best95%Amean Time      142.37 (   0.00%)      142.00 (   0.26%)
Best90%Amean Time      142.19 (   0.00%)      141.85 (   0.24%)
Best75%Amean Time      141.92 (   0.00%)      141.58 (   0.24%)
Best50%Amean Time      141.69 (   0.00%)      141.31 (   0.27%)
Best25%Amean Time      141.38 (   0.00%)      140.97 (   0.29%)

As you'd expect, the gain is marginal but it can be detected.  The
differences in bonnie are all within the noise which is not surprising
given the impact on the microbenchmark.

radix_tree_update_node_t is a callback for some radix operations that
optionally passes in a private field.  The only user of the callback is
workingset_update_node and as it no longer requires a mapping, the
private field is removed.

Link: http://lkml.kernel.org/r/20171018075952.10627-3-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c7df8ad2

09 9月, 2017 1 次提交

radix-tree: must check __radix_tree_preload() return value · bc9ae224

由 Eric Dumazet 提交于 9月 08, 2017

__radix_tree_preload() only disables preemption if no error is returned.

So we really need to make sure callers always check the return value.

idr_preload() contract is to always disable preemption, so we need
to add a missing preempt_disable() if an error happened.

Similarly, ida_pre_get() only needs to call preempt_enable() in the
case no error happened.

Link: http://lkml.kernel.org/r/1504637190.15310.62.camel@edumazet-glaptop3.roam.corp.google.com
Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree")
Fixes: 7ad3d4d8 ("ida: Move ida_bitmap to a percpu variable")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>    [4.11+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bc9ae224

31 8月, 2017 1 次提交

idr: Add new APIs to support unsigned long · 388f79fd

由 Chris Mi 提交于 8月 30, 2017

The following new APIs are added:

int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
                  unsigned long start, unsigned long end, gfp_t gfp);
void *idr_remove_ext(struct idr *idr, unsigned long id);
void *idr_find_ext(const struct idr *idr, unsigned long id);
void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);
Signed-off-by: NChris Mi <chrism@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

388f79fd

18 8月, 2017 1 次提交

drm/i915: Replace execbuf vma ht with an idr · d1b48c1e

由 Chris Wilson 提交于 8月 16, 2017

This was the competing idea long ago, but it was only with the rewrite
of the idr as an radixtree and using the radixtree directly ourselves,
along with the realisation that we can store the vma directly in the
radixtree and only need a list for the reverse mapping, that made the
patch performant enough to displace using a hashtable. Though the vma ht
is fast and doesn't require any extra allocation (as we can embed the node
inside the vma), it does require a thread for resizing and serialization
and will have the occasional slow lookup. That is hairy enough to
investigate alternatives and favour them if equivalent in peak performance.
One advantage of allocating an indirection entry is that we can support a
single shared bo between many clients, something that was done on a
first-come first-serve basis for shared GGTT vma previously. To offset
the extra allocations, we create yet another kmem_cache for them.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk

d1b48c1e

04 5月, 2017 1 次提交

lockdep: allow to disable reclaim lockup detection · 7e784422

由 Michal Hocko 提交于 5月 03, 2017

The current implementation of the reclaim lockup detection can lead to
false positives and those even happen and usually lead to tweak the code
to silence the lockdep by using GFP_NOFS even though the context can use
__GFP_FS just fine.

See

  http://lkml.kernel.org/r/20160512080321.GA18496@dastard

as an example.

  =================================
  [ INFO: inconsistent lock state ]
  4.5.0-rc2+ #4 Tainted: G           O
  ---------------------------------
  inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
  kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes:

  (&xfs_nondir_ilock_class){++++-+}, at: xfs_ilock+0x177/0x200 [xfs]

  {RECLAIM_FS-ON-R} state was registered at:
    mark_held_locks+0x79/0xa0
    lockdep_trace_alloc+0xb3/0x100
    kmem_cache_alloc+0x33/0x230
    kmem_zone_alloc+0x81/0x120 [xfs]
    xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs]
    __xfs_refcount_find_shared+0x75/0x580 [xfs]
    xfs_refcount_find_shared+0x84/0xb0 [xfs]
    xfs_getbmap+0x608/0x8c0 [xfs]
    xfs_vn_fiemap+0xab/0xc0 [xfs]
    do_vfs_ioctl+0x498/0x670
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x12/0x6f

         CPU0
         ----
    lock(&xfs_nondir_ilock_class);
    <Interrupt>
      lock(&xfs_nondir_ilock_class);

   *** DEADLOCK ***

  3 locks held by kswapd0/543:

  stack backtrace:
  CPU: 0 PID: 543 Comm: kswapd0 Tainted: G           O    4.5.0-rc2+ #4
  Call Trace:
   lock_acquire+0xd8/0x1e0
   down_write_nested+0x5e/0xc0
   xfs_ilock+0x177/0x200 [xfs]
   xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
   xfs_fs_evict_inode+0xdc/0x1e0 [xfs]
   evict+0xc5/0x190
   dispose_list+0x39/0x60
   prune_icache_sb+0x4b/0x60
   super_cache_scan+0x14f/0x1a0
   shrink_slab.part.63.constprop.79+0x1e9/0x4e0
   shrink_zone+0x15e/0x170
   kswapd+0x4f1/0xa80
   kthread+0xf2/0x110
   ret_from_fork+0x3f/0x70

To quote Dave:
 "Ignoring whether reflink should be doing anything or not, that's a
  "xfs_refcountbt_init_cursor() gets called both outside and inside
  transactions" lockdep false positive case. The problem here is lockdep
  has seen this allocation from within a transaction, hence a GFP_NOFS
  allocation, and now it's seeing it in a GFP_KERNEL context. Also note
  that we have an active reference to this inode.

  So, because the reclaim annotations overload the interrupt level
  detections and it's seen the inode ilock been taken in reclaim
  ("interrupt") context, this triggers a reclaim context warning where
  it thinks it is unsafe to do this allocation in GFP_KERNEL context
  holding the inode ilock..."

This sounds like a fundamental problem of the reclaim lock detection.
It is really impossible to annotate such a special usecase IMHO unless
the reclaim lockup detection is reworked completely.  Until then it is
much better to provide a way to add "I know what I am doing flag" and
mark problematic places.  This would prevent from abusing GFP_NOFS flag
which has a runtime effect even on configurations which have lockdep
disabled.

Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to
skip the current allocation request.

While we are at it also make sure that the radix tree doesn't
accidentaly override tags stored in the upper part of the gfp_mask.

Link: http://lkml.kernel.org/r/20170306131408.9828-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Suggested-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7e784422

08 3月, 2017 1 次提交

ida: Free correct IDA bitmap · 4ecd9542

由 Matthew Wilcox 提交于 3月 03, 2017

There's a relatively rare race where we look at the per-cpu preallocated
IDA bitmap, see it's NULL, allocate a new one, and atomically update it.
If the kmalloc() happened to sleep and we were rescheduled to a different
CPU, or an interrupt came in at the exact right time, another task
might have successfully allocated a bitmap and already deposited it.
I forgot what the semantics of cmpxchg() were and ended up freeing the
wrong bitmap leading to KASAN reporting a use-after-free.

Dmitry found the bug with syzkaller & wrote the patch. I wrote the test
case that will reproduce the bug without his patch being applied.
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>

4ecd9542

14 2月, 2017 2 次提交

radix-tree: Fix __rcu annotations · d7b62727

由 Matthew Wilcox 提交于 2月 13, 2017

Many places were missing __rcu annotations.  A few places needed a few
lines of explanation about why it was safe to not use RCU accessors.
Add a custom CFLAGS setting to the Makefile to ensure that new patches
don't miss RCU annotations.
Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>

d7b62727

radix-tree: Add rcu_dereference and rcu_assign_pointer calls · 12320d0f

由 Matthew Wilcox 提交于 2月 13, 2017

Some of these have been missing for many years.  Others were recently
introduced by me.  Fortunately, we have tools that help us find such
things.
Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>

12320d0f

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功