1. 22 Jan 2022, 1 commit
  2. 16 Apr 2021, 1 commit
  3. 25 Feb 2021, 1 commit
  4. 22 Jan 2021, 1 commit
  5. 17 Jan 2021, 1 commit
    • new helper: d_find_alias_rcu() · bca585d2
      Authored by Al Viro
      similar to d_find_alias(inode), except that
      	* the caller must be holding rcu_read_lock()
      	* inode must not be freed until matching rcu_read_unlock()
      	* result is *NOT* pinned and can only be dereferenced until
      the matching rcu_read_unlock().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      bca585d2
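      A minimal caller sketch for the new helper, following the constraints listed above; the function name example_peek_alias() and the printk are illustrative, not taken from the patch:

        #include <linux/dcache.h>
        #include <linux/printk.h>
        #include <linux/rcupdate.h>

        /* Sketch only: the caller must guarantee that 'inode' stays allocated
         * until the matching rcu_read_unlock(), as required above. */
        static void example_peek_alias(struct inode *inode)
        {
                struct dentry *alias;

                rcu_read_lock();
                alias = d_find_alias_rcu(inode);
                if (alias) {
                        /* not pinned: no dget(), no use past rcu_read_unlock() */
                        pr_info("alias of inode %lu: %pd\n", inode->i_ino, alias);
                }
                rcu_read_unlock();
        }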
  6. 11 Dec 2020, 1 commit
    • fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set · 77573fa3
      Authored by Hao Li
      If DCACHE_REFERENCED is set, fast_dput() will return true, and then
      retain_dentry() has no chance to check DCACHE_DONTCACHE. As a result,
      the dentry won't be killed and the corresponding inode can't be evicted.
      In the following example, the DAX policy can't take effect unless we
      do a drop_caches manually.
      
        # DCACHE_LRU_LIST will be set
        echo abcdefg > test.txt
      
        # DCACHE_REFERENCED will be set and DCACHE_DONTCACHE can't do anything
        xfs_io -c 'chattr +x' test.txt
      
        # Drop caches to make DAX changing take effects
        echo 2 > /proc/sys/vm/drop_caches
      
      What this patch does is prevent fast_dput() from returning true if
      DCACHE_DONTCACHE is set. retain_dentry() will then detect
      DCACHE_DONTCACHE and return false. As a result, the dentry will be
      killed and the inode will be evicted. In this way, if we change the
      per-file DAX policy, it will take effect automatically after the file
      is closed by all processes.
      
      I also added some comments to make the code clearer.
      Signed-off-by: Hao Li <lihao2018.fnst@cn.fujitsu.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      77573fa3
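      A simplified model of the decision described above (not the literal diff; the real fast_dput() in fs/dcache.c also plays lockref games): the fast path must be refused whenever DCACHE_DONTCACHE is set, so that retain_dentry() gets to see the flag and refuses to keep the dentry:

        #include <linux/dcache.h>

        /* Editor's sketch: true when dput() may skip killing the dentry. */
        static bool fast_dput_may_keep(unsigned int d_flags)
        {
                if (d_flags & DCACHE_DONTCACHE)
                        return false;   /* force the slow path; retain_dentry() kills it */
                if (d_flags & DCACHE_REFERENCED)
                        return true;    /* the old behaviour that hid DCACHE_DONTCACHE */
                return false;
        }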
  7. 29 Jul 2020, 1 commit
  8. 13 May 2020, 1 commit
  9. 27 Apr 2020, 1 commit
  10. 16 Nov 2019, 2 commits
    • fs/namei.c: fix missing barriers when checking positivity · 2fa6b1e0
      Authored by Al Viro
      Pinned negative dentries can, generally, be made positive
      by another thread.  Conditions that prevent that are
      	* ->d_lock on dentry in question
      	* parent directory held at least shared
      	* nobody else could have observed the address of dentry
      Most of the places working with those fall into one of these
      categories; however, d_lookup() and friends need to be used
      with some care.  Fortunately, there are not many call sites,
      and with few exceptions all of them fall under one of the
      cases above.
      
      The exceptions are all in fs/namei.c - in lookup_fast(), lookup_dcache()
      and mountpoint_last().  Another one is lookup_slow() - there the
      dcache lookup is done with the parent held shared, but the result
      is used after we've dropped the lock.  The same happens in do_last() -
      the lookup (in lookup_one()) is done with the parent locked, but the
      result is used after unlocking.
      
      lookup_fast(), do_last() and mountpoint_last() flat-out reject
      negatives.
      
      Most of lookup_dcache() calls are made with parent locked at least
      shared; the only exception is lookup_one_len_unlocked().  It might
      return pinned negative, needs serious care from callers.  Fortunately,
      almost nobody calls it directly anymore; all but two callers have
      converted to lookup_positive_unlocked(), which rejects negatives.
      
      lookup_slow() is called by the same lookup_one_len_unlocked() (see
      above), mountpoint_last() and walk_component().  In the latter two,
      negatives are rejected.
      
      In other words, there is a small set of places where we need to
      check carefully if a pinned, potentially negative dentry is, in
      fact, positive.  After that check we want to be sure that both
      ->d_inode and the type bits in ->d_flags are stable and observed.
      The set consists of follow_managed() (where the rejection happens
      for lookup_fast(), walk_component() and do_last()), mountpoint_last()
      and lookup_positive_unlocked().
      
      Solution:
      	1) transition from negative to positive (in __d_set_inode_and_type())
      stores ->d_inode, then uses smp_store_release() to set ->d_flags type bits.
      	2) the aforementioned 3 places in fs/namei.c fetch ->d_flags with
      smp_load_acquire() and bugger off if its type bits say "negative".
      That way anyone downstream of those checks knows the dentry is positive and pinned,
      with ->d_inode and the type bits of ->d_flags stable and observed.
      
      I considered splitting off d_lookup_positive(), so that the checks could
      be done right there, under ->d_lock.  However, that leads to massive
      duplication of rather subtle code in fs/namei.c and fs/dcache.c.  It's
      worse than it might seem, thanks to autofs ->d_manage() getting involved ;-/
      No matter what, autofs_d_manage()/autofs_d_automount() must live with
      the possibility of a pinned negative dentry being passed their way and
      becoming positive under them - that's the intended behaviour when a lookup
      comes in the middle of an automount in progress, so we can't keep them out
      of the area that has to deal with those, more's the pity...
      Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      2fa6b1e0
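      The release/acquire pairing from the solution can be sketched as follows (simplified; the real writer is __d_set_inode_and_type() and the readers are the three fs/namei.c call sites mentioned above):

        #include <linux/dcache.h>

        /* Writer: publish ->d_inode before the type bits become observable. */
        static void example_make_positive(struct dentry *dentry,
                                          struct inode *inode, unsigned type_flags)
        {
                unsigned flags;

                dentry->d_inode = inode;
                flags = (READ_ONCE(dentry->d_flags) & ~DCACHE_ENTRY_TYPE) | type_flags;
                smp_store_release(&dentry->d_flags, flags);
        }

        /* Reader: if the acquire load still shows "negative", bugger off;
         * otherwise ->d_inode and the type bits are stable and observed. */
        static bool example_pinned_positive(struct dentry *dentry)
        {
                unsigned flags = smp_load_acquire(&dentry->d_flags);

                return (flags & DCACHE_ENTRY_TYPE) != DCACHE_MISS_TYPE;
        }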
    • fix dget_parent() fastpath race · e8400933
      Authored by Al Viro
      We are overoptimistic about taking the fast path there; seeing
      the same value in ->d_parent after having grabbed a reference
      to that parent does *not* mean that it has remained our parent
      all along.
      
      That wouldn't be a big deal (in the end it is our parent and
      we have grabbed the reference we are about to return), but...
      the situation with barriers is messed up.
      
      We might have hit the following sequence:
      
      d is a dentry of /tmp/a/b
      CPU1:					CPU2:
      parent = d->d_parent (i.e. dentry of /tmp/a)
      					rename /tmp/a/b to /tmp/b
      					rmdir /tmp/a, making its dentry negative
      grab reference to parent,
      end up with cached parent->d_inode (NULL)
      					mkdir /tmp/a, rename /tmp/b to /tmp/a/b
      recheck d->d_parent, which is back to original
      decide that everything's fine and return the reference we'd got.
      
      The trouble is, the caller (on CPU1) will observe dget_parent()
      returning an apparently negative dentry.  It actually is positive,
      but CPU1 has a stale ->d_inode cached.
      
      Use d->d_seq to see if it has been moved, instead of rechecking ->d_parent.
      NOTE: we are *NOT* going to retry on any kind of ->d_seq mismatch;
      we just go into the slow path in that case.  We don't wait for ->d_seq
      to become even either - again, if we are racing with renames, we
      can bloody well go to the slow path anyway.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      e8400933
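      A simplified sketch of the fixed fast path (the locked slow path of the real dget_parent() is elided; returning NULL here stands for "fall back to it"):

        #include <linux/dcache.h>
        #include <linux/lockref.h>
        #include <linux/rcupdate.h>

        static struct dentry *example_dget_parent_fast(struct dentry *dentry)
        {
                struct dentry *parent;
                unsigned seq;
                int gotref;

                rcu_read_lock();
                seq = raw_seqcount_begin(&dentry->d_seq);
                parent = READ_ONCE(dentry->d_parent);
                gotref = lockref_get_not_zero(&parent->d_lockref);
                rcu_read_unlock();
                if (gotref) {
                        if (!read_seqcount_retry(&dentry->d_seq, seq))
                                return parent;  /* ->d_seq unchanged: parent is really ours */
                        dput(parent);           /* any mismatch: no retry, just go slow */
                }
                return NULL;                    /* caller takes the locked slow path */
        }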
  11. 26 Oct 2019, 1 commit
  12. 09 Oct 2019, 1 commit
    • locking/lockdep: Remove unused @nested argument from lock_release() · 5facae4f
      Authored by Qian Cai
      Since the following commit:
      
        b4adfe8e ("locking/lockdep: Remove unused argument in __lock_release")
      
      @nested is no longer used in lock_release(), so remove it from all
      lock_release() calls and friends.
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Will Deacon <will@kernel.org>
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: airlied@linux.ie
      Cc: akpm@linux-foundation.org
      Cc: alexander.levin@microsoft.com
      Cc: daniel@iogearbox.net
      Cc: davem@davemloft.net
      Cc: dri-devel@lists.freedesktop.org
      Cc: duyuyang@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: hannes@cmpxchg.org
      Cc: intel-gfx@lists.freedesktop.org
      Cc: jack@suse.com
      Cc: jlbec@evilplan.or
      Cc: joonas.lahtinen@linux.intel.com
      Cc: joseph.qi@linux.alibaba.com
      Cc: jslaby@suse.com
      Cc: juri.lelli@redhat.com
      Cc: maarten.lankhorst@linux.intel.com
      Cc: mark@fasheh.com
      Cc: mhocko@kernel.org
      Cc: mripard@kernel.org
      Cc: ocfs2-devel@oss.oracle.com
      Cc: rodrigo.vivi@intel.com
      Cc: sean@poorly.run
      Cc: st@kernel.org
      Cc: tj@kernel.org
      Cc: tytso@mit.edu
      Cc: vdavydov.dev@gmail.com
      Cc: vincent.guittot@linaro.org
      Cc: viro@zeniv.linux.org.uk
      Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pw
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5facae4f
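      As a hedged illustration of the mechanical conversion (the exact call site below is illustrative), every caller simply drops the middle argument:

        /* before: @nested was already ignored by lockdep */
        lock_release(&lock->dep_map, 1, _RET_IP_);

        /* after this patch */
        lock_release(&lock->dep_map, _RET_IP_);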
  13. 10 Jul 2019, 1 commit
    • Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists · 9bdebc2b
      Authored by Al Viro
      Currently, running into a shrink list that contains dentries from different
      filesystems can cause several unpleasant things for shrink_dcache_parent()
      and for umount(2).
      
      The first problem is that there's a window during shrink_dentry_list() between
      __dentry_kill() taking a victim out and the reference to its parent being dropped.
      During that window the parent looks like a genuine busy dentry.  shrink_dcache_parent()
      (or, worse yet, shrink_dcache_for_umount()) coming at that time will see no
      eviction candidates and no indication that it needs to wait for some
      shrink_dentry_list() to proceed further.
      
      That applies to any shrink list that might intersect with the subtree we are
      trying to shrink; the only reason it does not blow up on umount(2) in the mainline
      is that we unregister the memory shrinker before hitting shrink_dcache_for_umount().
      
      Another problem happens if something in a mixed-filesystem shrink list gets
      stuck in e.g. iput(), causing umount of an unrelated fs to spin waiting for
      the stuck shrinker to get around to our dentries.
      
      Solution:
              1) have shrink_dentry_list() decrement the parent's refcount and
      make sure it's on a shrink list (ours unless it already had been on some
      other) before calling __dentry_kill().  That eliminates the window when
      shrink_dcache_parent() would've blown past the entire subtree without
      noticing anything with zero refcount not on shrink lists.
      	2) when shrink_dcache_parent() has found no eviction candidates,
      but some dentries are still sitting on shrink lists, rather than
      repeating the scan in hope that shrinkers have progressed, scan looking
      for something on shrink lists with zero refcount.  If such a thing is
      found, grab rcu_read_lock() and stop the scan, with caller locking
      it for eviction, dropping out of RCU and doing __dentry_kill(), with
      the same treatment for parent as shrink_dentry_list() would do.
      
      Note that right now mixed-filesystem shrink lists do not occur, so this
      is not a mainline bug.  However, there's a bunch of uses for such
      beasts (e.g. the "try and evict everything we can out of a given page"
      patches; there are potential uses in mount-related code, considerably
      simplifying life in fs/namespace.c, etc.)
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      9bdebc2b
  14. 20 Jun 2019, 1 commit
    • fsnotify: move fsnotify_nameremove() hook out of d_delete() · 49246466
      Authored by Amir Goldstein
      d_delete() was piggybacked for the fsnotify_nameremove() hook, when
      in fact not all callers of d_delete() care about fsnotify events.
      
      For all callers of d_delete() that may be interested in fsnotify events,
      we made sure to call one of fsnotify_{unlink,rmdir}() hooks before
      calling d_delete().
      
      Now we can move the fsnotify_nameremove() call from d_delete() to the
      fsnotify_{unlink,rmdir}() hooks.
      
      Two explicit calls to fsnotify_nameremove() from nfs/afs sillyrename
      are also removed. This will cause a change of behavior - nfs/afs will
      NOT generate an fsnotify delete event when renaming over a positive
      dentry.  This change is desirable, because it is consistent with the
      behavior of all other filesystems.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      49246466
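      The resulting calling convention, per the description above, is that a caller interested in events issues the hook itself right before d_delete(); a minimal sketch (the caller shown is illustrative, error handling elided):

        #include <linux/dcache.h>
        #include <linux/fsnotify.h>

        /* illustrative unlink-path caller; 'dir' is the parent directory inode */
        static void example_remove_dentry(struct inode *dir, struct dentry *dentry)
        {
                fsnotify_unlink(dir, dentry);   /* event now comes from the hook ... */
                d_delete(dentry);               /* ... and no longer from d_delete() */
        }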
  15. 21 May 2019, 1 commit
  16. 27 Apr 2019, 1 commit
  17. 17 Apr 2019, 1 commit
  18. 10 Apr 2019, 2 commits
    • unexport d_alloc_pseudo() · ab1152dd
      Authored by Al Viro
      No modular uses since the introduction of alloc_file_pseudo(),
      and the only non-modular user not in alloc_file_pseudo()
      had actually been wrong - it should've been d_alloc_anon().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      ab1152dd
    • dcache: sort the freeing-without-RCU-delay mess for good. · 5467a68c
      Authored by Al Viro
      For lockless accesses to dentries we don't have pinned we rely
      (among other things) upon having an RCU delay between dropping
      the last reference and actually freeing the memory.
      
      On the other hand, for things like pipes and sockets we neither
      do that kind of lockless access, nor want to deal with the
      overhead of an RCU delay every time a socket gets closed.
      
      So the delay was made optional - setting DCACHE_RCUACCESS in ->d_flags
      made sure it would happen.  We tried to avoid setting it unless
      we knew we needed it.  Unfortunately, that had led to a recurring
      class of bugs in which we missed the need to set it.
      
      We only really need it for dentries that are created by
      d_alloc_pseudo(), so let's not bother with trying to be smart -
      just make having an RCU delay the default.  The ones that do
      *not* get it set the replacement flag (DCACHE_NORCU) and we'd
      better use that sparingly.  d_alloc_pseudo() is the only
      such user right now.
      
      FWIW, the race that finally prompted that switch had been
      between __lock_parent() of immediate subdirectory of what's
      currently the root of a disconnected tree (e.g. from
      open-by-handle in progress) racing with d_splice_alias()
      elsewhere picking another alias for the same inode, either
      on outright corrupted fs image, or (in case of open-by-handle
      on NFS) that subdirectory having been just moved on server.
      It's not easy to hit, so the sky is not falling, but that's
      not the first race caused by similar missed cases, and the logic
      for setting DCACHE_RCUACCESS has gotten ridiculously
      convoluted.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5467a68c
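      The resulting freeing rule can be sketched as below (simplified from dentry_free() in fs/dcache.c; external-name handling is omitted and __d_free() is internal to that file): RCU-delayed freeing is the default, and only DCACHE_NORCU dentries, i.e. those from d_alloc_pseudo(), may be freed immediately:

        if (dentry->d_flags & DCACHE_NORCU)
                __d_free(&dentry->d_u.d_rcu);           /* never seen by RCU lookups */
        else
                call_rcu(&dentry->d_u.d_rcu, __d_free); /* default: RCU delay */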
  19. 31 Jan 2019, 2 commits
    • fs/dcache: Track & report number of negative dentries · af0c9af1
      Authored by Waiman Long
      The current dentry number tracking code doesn't distinguish between
      positive & negative dentries.  It just reports the total number of
      dentries in the LRU lists.
      
      As an excessive number of negative dentries can have an impact on system
      performance, it is wise to track the number of positive and
      negative dentries separately.
      
      This patch adds tracking for the total number of negative dentries in
      the system LRU lists and reports it in the 5th field of the
      /proc/sys/fs/dentry-state file.  The number, however, does not include
      negative dentries that are in flight but not in the LRU yet, nor
      those in the shrinker lists which are on their way out anyway.
      
      The number of positive dentries in the LRU lists can be roughly found by
      subtracting the number of negative dentries from the unused count.
      
      Matthew Wilcox has confirmed that since the introduction of the
      dentry_stat structure in 2.1.60, the dummy array has been there, probably
      for future extension; its entries were not replacements for pre-existing
      fields.  So no sane application that reads /proc/sys/fs/dentry-state
      will misbehave if the last 2 fields of the sysctl parameter are not
      zero.  IOW, it is safe to use one of the dummy array entries for the
      negative dentry count.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      af0c9af1
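      A small userspace reader (an editor's sketch, not part of the patch) showing how the new 5th field can be consumed, including the rough positive-dentry estimate mentioned above:

        #include <stdio.h>

        int main(void)
        {
                long nr_dentry, nr_unused, age_limit, want_pages, nr_negative, dummy;
                FILE *f = fopen("/proc/sys/fs/dentry-state", "r");

                if (!f || fscanf(f, "%ld %ld %ld %ld %ld %ld", &nr_dentry, &nr_unused,
                                 &age_limit, &want_pages, &nr_negative, &dummy) != 6) {
                        perror("dentry-state");
                        return 1;
                }
                printf("unused: %ld  negative: %ld  positive (approx): %ld\n",
                       nr_unused, nr_negative, nr_unused - nr_negative);
                fclose(f);
                return 0;
        }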
    • fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb() · 1dbd449c
      Authored by Waiman Long
      The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
      lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
      
      The shrink_dcache_sb() function moves dentries from the LRU list to a
      shrink list and subtracts the dentry count from nr_dentry_unused.  This
      is incorrect as the nr_dentry_unused count will also be decremented in
      shrink_dentry_list() via d_shrink_del().
      
      To fix this double decrement, the decrement in the shrink_dcache_sb()
      function is taken out.
      
      Fixes: 4e717f5c ("list_lru: remove special case function list_lru_dispose_all.")
      Cc: stable@kernel.org
      Signed-off-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1dbd449c
  20. 31 Oct 2018, 1 commit
  21. 27 Oct 2018, 1 commit
  22. 18 Aug 2018, 1 commit
    • fs/dcache.c: fix kmemcheck splat at take_dentry_name_snapshot() · 6cd00a01
      Authored by Tetsuo Handa
      Since only dentry->d_name.len + 1 bytes out of DNAME_INLINE_LEN bytes
      are initialized at __d_alloc(), we can't copy the whole size
      unconditionally.
      
       WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff8fa27465ac50)
       636f6e66696766732e746d70000000000010000000000000020000000188ffff
        i i i i i i i i i i i i i u u u u u u u u u u i i i i i u u u u
                                        ^
       RIP: 0010:take_dentry_name_snapshot+0x28/0x50
       RSP: 0018:ffffa83000f5bdf8 EFLAGS: 00010246
       RAX: 0000000000000020 RBX: ffff8fa274b20550 RCX: 0000000000000002
       RDX: ffffa83000f5be40 RSI: ffff8fa27465ac50 RDI: ffffa83000f5be60
       RBP: ffffa83000f5bdf8 R08: ffffa83000f5be48 R09: 0000000000000001
       R10: ffff8fa27465ac00 R11: ffff8fa27465acc0 R12: ffff8fa27465ac00
       R13: ffff8fa27465acc0 R14: 0000000000000000 R15: 0000000000000000
       FS:  00007f79737ac8c0(0000) GS:ffffffff8fc30000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffff8fa274c0b000 CR3: 0000000134aa7002 CR4: 00000000000606f0
        take_dentry_name_snapshot+0x28/0x50
        vfs_rename+0x128/0x870
        SyS_rename+0x3b2/0x3d0
        entry_SYSCALL_64_fastpath+0x1a/0xa4
        0xffffffffffffffff
      
      Link: http://lkml.kernel.org/r/201709131912.GBG39012.QMJLOVFSFFOOtH@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6cd00a01
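      The inline-name branch of the fix can be sketched as below (simplified from take_dentry_name_snapshot(); the external-name branch is unchanged and omitted):

        /* before: copied DNAME_INLINE_LEN bytes, reading uninitialized memory */
        memcpy(name->inline_name, dentry->d_iname, DNAME_INLINE_LEN);

        /* after: copy only the initialized part - the name plus its NUL */
        memcpy(name->inline_name, dentry->d_iname, dentry->d_name.len + 1);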
  23. 10 Aug 2018, 1 commit
    • make sure that __dentry_kill() always invalidates d_seq, unhashed or not · 4c0d7cd5
      Authored by Al Viro
      RCU pathwalk relies upon the assumption that anything that changes
      ->d_inode of a dentry will invalidate its ->d_seq.  That's almost
      true - the one exception is that the final dput() of an already unhashed
      dentry does *not* touch ->d_seq at all.  Unhashing does, though,
      so for anything we'd found by RCU dcache lookup we are fine.
      Unfortunately, we can *start* with an unhashed dentry or jump into
      it.
      
      We could try and be careful in the (few) places where that could
      happen.  Or we could just make the final dput() invalidate the damn
      thing, unhashed or not.  The latter is much simpler and easier to
      backport, so let's do it that way.
      Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      4c0d7cd5
  24. 06 Aug 2018, 2 commits
    • root dentries need RCU-delayed freeing · 90bad5e0
      Authored by Al Viro
      Since mountpoint crossing can happen without leaving lazy mode,
      root dentries do need the same protection against having their
      memory freed without RCU delay as everything else in the tree.
      
      It's partially hidden by RCU delay between detaching from the
      mount tree and dropping the vfsmount reference, but the starting
      point of pathwalk can be on an already detached mount, in which
      case umount-caused RCU delay has already passed by the time the
      lazy pathwalk grabs rcu_read_lock().  If the starting point
      happens to be at the root of that vfsmount *and* that vfsmount
      covers the entire filesystem, we get trouble.
      
      Fixes: 48a066e7 ("RCU'd vfsmounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      90bad5e0
    • fs: dcache: Use true and false for boolean values · 7964410f
      Authored by Gustavo A. R. Silva
      Return statements in functions returning bool should use true or false
      instead of an integer value.
      
      This issue was detected with the help of Coccinelle.
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7964410f
  25. 04 Aug 2018, 1 commit
    • new primitive: discard_new_inode() · c2b6d621
      Authored by Al Viro
      	We don't want open-by-handle picking up a half-set-up in-core
      struct inode from e.g. a mkdir() that has failed halfway through.
      In other words, we don't want such inodes returned by iget_locked()
      on their way to extinction.  However, we can't just have them
      unhashed - otherwise open-by-handle immediately *after* that would've
      ended up creating a new in-core inode over the on-disk one that
      is in the process of being freed right under us.
      
      	Solution: new flag (I_CREATING) set by insert_inode_locked() and
      removed by unlock_new_inode() and a new primitive (discard_new_inode())
      to be used by such halfway-through-setup failure exits instead of
      unlock_new_inode() / iput() combinations.  That primitive unlocks new
      inode, but leaves I_CREATING in place.
      
      	iget_locked() treats finding an I_CREATING inode as failure
      (-ESTALE, once we sort out the error propagation).
      	insert_inode_locked() treats the same as instant -EBUSY.
      	ilookup() treats those as icache miss.
      
      [Fix by Dan Carpenter <dan.carpenter@oracle.com> folded in]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c2b6d621
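      An editor's sketch of the error-exit pattern described above, for a hypothetical filesystem create path; example_commit_entry() stands in for whatever setup step can still fail after the inode has been hashed via insert_inode_locked():

        #include <linux/fs.h>
        #include <linux/dcache.h>

        int example_commit_entry(struct inode *dir, struct dentry *dentry,
                                 struct inode *inode);  /* hypothetical */

        static int example_create_tail(struct inode *dir, struct dentry *dentry,
                                       struct inode *inode)
        {
                int err = example_commit_entry(dir, dentry, inode);

                if (err) {
                        /* not unlock_new_inode() + iput(): leave I_CREATING in place */
                        discard_new_inode(inode);
                        return err;
                }
                d_instantiate_new(dentry, inode);       /* annotate, instantiate, unlock */
                return 0;
        }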
  26. 02 Aug 2018, 1 commit
    • kill d_instantiate_no_diralias() · c971e6a0
      Authored by Al Viro
      The only user is fuse_create_new_entry(), and there it's used to
      mitigate the same mkdir/open-by-handle race as in nfs_mkdir().
      The same solution applies - unhash the mkdir argument, then
      call d_splice_alias(), and if that returns a reference to a preexisting
      alias, dput() it and report success.  The ->mkdir() argument left as an
      unhashed negative, with the preexisting alias moved into the right place,
      is just fine from the ->mkdir() caller's point of view.
      
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c971e6a0
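      A sketch of the replacement pattern described above, with fuse-specific details omitted (the function name is illustrative):

        #include <linux/dcache.h>
        #include <linux/err.h>

        static int example_instantiate(struct dentry *entry, struct inode *inode)
        {
                struct dentry *alias;

                d_drop(entry);                  /* unhash the ->mkdir() argument */
                alias = d_splice_alias(inode, entry);
                if (IS_ERR(alias))
                        return PTR_ERR(alias);
                if (alias)
                        dput(alias);            /* preexisting alias is now in place */
                return 0;                       /* 'entry' stays an unhashed negative */
        }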
  27. 24 Jun 2018, 1 commit
  28. 14 May 2018, 1 commit
    • get rid of dead code in d_find_alias() · 61fec493
      Authored by Al Viro
      All "try disconnected alias if nothing else fits" logics in d_find_alias()
      got accidentally disabled by Neil a while ago; for most of the callers it
      was the right thing to do, so fixes belong in few callers that *do* want
      disconnected aliases.  This just takes the now-dead code in d_find_alias()
      out.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      61fec493
  29. 12 May 2018, 1 commit
    • do d_instantiate/unlock_new_inode combinations safely · 1e2e547a
      Authored by Al Viro
      For anything NFS-exported we do _not_ want to unlock new inode
      before it has grown an alias; original set of fixes got the
      ordering right, but missed the nasty complication in case of
      lockdep being enabled - unlock_new_inode() does
      	lockdep_annotate_inode_mutex_key(inode)
      which can only be done before anyone gets a chance to touch
      ->i_mutex.  Unfortunately, flipping the order and doing
      unlock_new_inode() before d_instantiate() opens a window when
      mkdir can race with open-by-fhandle on a guessed fhandle, leading
      to multiple aliases for a directory inode and all the breakage
      that follows from that.
      
      	Correct solution: a new primitive (d_instantiate_new())
      combining these two in the right order - lockdep annotate, then
      d_instantiate(), then the rest of unlock_new_inode().  All
      combinations of d_instantiate() with unlock_new_inode() should
      be converted to that.
      
      Cc: stable@kernel.org	# 2.6.29 and later
      Tested-by: Mike Marshall <hubcap@omnibond.com>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      1e2e547a
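      The conversion described above is mechanical; as a hedged illustration for a filesystem's create/mkdir tail:

        /* old combinations, in either order, were problematic: this order breaks
         * the lockdep annotation, and flipping it opens the mkdir vs.
         * open-by-fhandle race described above */
        d_instantiate(dentry, inode);
        unlock_new_inode(inode);

        /* converted form: lockdep annotation, then d_instantiate(), then the
         * rest of unlock_new_inode(), in that order */
        d_instantiate_new(dentry, inode);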
  30. 20 Apr 2018, 1 commit
  31. 16 Apr 2018, 4 commits
  32. 12 Apr 2018, 2 commits
    • fs/dcache.c: add cond_resched() in shrink_dentry_list() · 32785c05
      Authored by Nikolay Borisov
      As previously reported (https://patchwork.kernel.org/patch/8642031/)
      it's possible to call shrink_dentry_list with a large number of dentries
      (> 10000).  This, in turn, could trigger the softlockup detector and
      possibly trigger a panic.  In addition to the unmount path being
      vulnerable to this scenario, at SuSE we've observed a similar situation
      happening during process exit for processes that touch a lot of dentries.
      Here is an excerpt from a crash dump.  The numbers after the colons are
      the number of dentries on the list passed to shrink_dentry_list:
      
      PID 99760: 10722
      PID 107530: 215
      PID 108809: 24134
      PID 108877: 21331
      PID 141708: 16487
      
      So we want to kill between 15k-25k dentries without yielding.
      
      And one possible call stack looks like:
      
      4 [ffff8839ece41db0] _raw_spin_lock at ffffffff8152a5f8
      5 [ffff8839ece41db0] evict at ffffffff811c3026
      6 [ffff8839ece41dd0] __dentry_kill at ffffffff811bf258
      7 [ffff8839ece41df0] shrink_dentry_list at ffffffff811bf593
      8 [ffff8839ece41e18] shrink_dcache_parent at ffffffff811bf830
      9 [ffff8839ece41e50] proc_flush_task at ffffffff8120dd61
      10 [ffff8839ece41ec0] release_task at ffffffff81059ebd
      11 [ffff8839ece41f08] do_exit at ffffffff8105b8ce
      12 [ffff8839ece41f78] sys_exit at ffffffff8105bd53
      13 [ffff8839ece41f80] system_call_fastpath at ffffffff81532909
      
      While some of the callers of shrink_dentry_list do use cond_resched,
      this is not sufficient to prevent softlockups.  So just move
      cond_resched into shrink_dentry_list from its callers.
      
      David said: I've found hundreds of occurrences of warnings that we emit
      when need_resched stays set for a prolonged period of time with the
      stack trace that is included in the change log.
      
      Link: http://lkml.kernel.org/r/1521718946-31521-1-git-send-email-nborisov@suse.com
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      32785c05
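      A minimal sketch of the change described above (the body of the real loop in fs/dcache.c is elided):

        static void shrink_dentry_list(struct list_head *list)
        {
                while (!list_empty(list)) {
                        struct dentry *dentry;

                        cond_resched();         /* moved here from the callers */
                        dentry = list_entry(list->prev, struct dentry, d_lru);
                        /* ... lock, unlink from the list and __dentry_kill() ... */
                }
        }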
    • dcache: account external names as indirectly reclaimable memory · f1782c9b
      Authored by Roman Gushchin
      I received a report about suspicious growth of unreclaimable slabs on
      some machines.  I've found that it happens on machines with low memory
      pressure, and these unreclaimable slabs are external names attached to
      dentries.
      
      External names are allocated using the generic kmalloc() function, so they
      are accounted as unreclaimable.  But they are held by dentries, which
      are reclaimable, and they will be reclaimed under memory pressure.
      
      In particular, this breaks the MemAvailable calculation, as it doesn't take
      unreclaimable slabs into account.  This leads to a silly situation where
      a machine is almost idle, has no memory pressure and therefore has a big
      dentry cache, yet the resulting MemAvailable is too low to start a new
      workload.
      
      To address the issue, the NR_INDIRECTLY_RECLAIMABLE_BYTES counter is
      used to track the amount of memory consumed by external names.  The
      counter is increased in the dentry allocation path if an external name
      structure is allocated, and decreased in the dentry freeing path.
      
      To reproduce the problem I've used the following Python script:
      
        import os
      
        for iter in range (0, 10000000):
            try:
                name = ("/some_long_name_%d" % iter) + "_" * 220
                os.stat(name)
            except Exception:
                pass
      
      Without this patch:
        $ cat /proc/meminfo | grep MemAvailable
        MemAvailable:    7811688 kB
        $ python indirect.py
        $ cat /proc/meminfo | grep MemAvailable
        MemAvailable:    2753052 kB
      
      With the patch:
        $ cat /proc/meminfo | grep MemAvailable
        MemAvailable:    7809516 kB
        $ python indirect.py
        $ cat /proc/meminfo | grep MemAvailable
        MemAvailable:    7749144 kB
      
      [guro@fb.com: fix indirectly reclaimable memory accounting for CONFIG_SLOB]
        Link: http://lkml.kernel.org/r/20180312194140.19517-1-guro@fb.com
      [guro@fb.com: fix indirectly reclaimable memory accounting]
        Link: http://lkml.kernel.org/r/20180313125701.7955-1-guro@fb.com
      Link: http://lkml.kernel.org/r/20180305133743.12746-5-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f1782c9b