Commits · b29b234801407a3e46ef64dafbe720116307d9f4 · openeuler / Kernel

18 Nov, 2022 1 commit

fs: Add missing umask strip in vfs_tmpfile · b29b2348

Committed by Yang Xu on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit 60a8f0e62aeb1a50383ab228f2281047bceadd9a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=60a8f0e62aeb1a50383ab228f2281047bceadd9a

--------------------------------

commit ac6800e2 upstream.

All creation paths except for O_TMPFILE handle umask in the vfs directly
if the filesystem doesn't support or enable POSIX ACLs. If the filesystem
does then umask handling is deferred until posix_acl_create().
Because O_TMPFILE misses umask handling in the vfs, it will not honor
umask settings. Fix this by adding the missing umask handling.
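
The rule described above can be sketched in userspace C. This is only a model, not the kernel change itself: `sketch_create_mode`, `umask_bits` and `dir_has_posix_acl` are hypothetical names standing in for the mode computation, the current umask and the "filesystem supports and enables POSIX ACLs" condition.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical model of the mode computation for a newly created inode.
 * dir_has_posix_acl stands in for the POSIX ACL check on the parent
 * directory; umask_bits stands in for the caller's umask.
 */
mode_t sketch_create_mode(mode_t requested, mode_t umask_bits,
                          bool dir_has_posix_acl)
{
    /* With POSIX ACLs, umask handling is deferred to ACL creation. */
    if (dir_has_posix_acl)
        return requested;
    /* Otherwise the umask bits are stripped directly. */
    return requested & ~umask_bits;
}
```

Under these assumptions, an O_TMPFILE creation path that skips this step would hand the filesystem the unstripped mode, which is the behavior the commit message describes.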

Link: https://lore.kernel.org/r/1657779088-2242-2-git-send-email-xuyang2018.jy@fujitsu.com
Fixes: 60545d0d ("[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...")
Cc: <stable@vger.kernel.org> # 4.19+
Reported-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

28 Jun, 2022 1 commit

fs, mm: fix race in unlinking swapfile · d1fb1e79

Committed by Hugh Dickins on Jun 28, 2022

mainline inclusion
from mainline-v5.15-rc1
commit 51cc3a66
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5CANU
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/namei.c?id=51cc3a6620a6ca934d468bda345678768493f5d8

--------------------------------

We had a recurring situation in which admin procedures setting up
swapfiles would race with test preparation clearing away swapfiles; and
just occasionally that got stuck on a swapfile "(deleted)" which could
never be swapped off.  That is not supposed to be possible.

2.6.28 commit f9454548 ("don't unlink an active swapfile") admitted
that it was leaving a race window open: now close it.

may_delete() makes the IS_SWAPFILE check (amongst many others) before
inode_lock has been taken on target: now repeat just that simple check in
vfs_unlink() and vfs_rename(), after taking inode_lock.

Which goes most of the way to fixing the race, but swapon() must also
check after it acquires inode_lock, that the file just opened has not
already been unlinked.

Link: https://lkml.kernel.org/r/e17b91ad-a578-9a15-5e3-4989e0f999b5@google.com
Fixes: f9454548 ("don't unlink an active swapfile")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

10 May, 2022 1 commit

fsnotify: invalidate dcache before IN_DELETE event · 05f5779c

Committed by Amir Goldstein on May 10, 2022

stable inclusion
from stable-v5.10.96
commit 0b4e82403c84c88fb42972687774ae3a699d047d
bugzilla: https://gitee.com/openeuler/kernel/issues/I55NWB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0b4e82403c84c88fb42972687774ae3a699d047d

--------------------------------

commit a37d9a17 upstream.

Apparently, there are some applications that use IN_DELETE event as an
invalidation mechanism and expect that if they try to open a file with
the name reported with the delete event, that it should not contain the
content of the deleted file.

Commit 49246466 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify
will have access to a positive dentry.

This allowed a race where opening the deleted file via cached dentry
is now possible after receiving the IN_DELETE event.

To fix the regression, create a new hook fsnotify_delete() that takes
the unlinked inode as an argument and use a helper d_delete_notify() to
pin the inode, so we can pass it to fsnotify_delete() after d_delete().

Backporting hint: this regression is from v5.3. Although patch will
apply with only trivial conflicts to v5.4 and v5.10, it won't build,
because fsnotify_delete() implementation is different in each of those
versions (see fsnotify_link()).

A follow up patch will fix the fsnotify_unlink/rmdir() calls in pseudo
filesystem that do not need to call d_delete().

Link: https://lore.kernel.org/r/20220120215305.282577-1-amir73il@gmail.com
Reported-by: Ivan Delalande <colona@arista.com>
Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
Fixes: 49246466 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

19 Oct, 2021 2 commits

take LOOKUP_{ROOT,ROOT_GRABBED,JUMPED} out of LOOKUP_... space · 42d3b3b2

Committed by Al Viro on Oct 19, 2021

mainline inclusion
from mainline-5.14-rc1
commit bcba1e7d
category: bugfix
bugzilla: 181657 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bcba1e7d0d520adba895d9e0800a056f734b0a6a

---------------------------

Separate field in nameidata (nd->state) holding the flags that
should be internal-only - that way we both get some spare bits
in LOOKUP_... and get simpler rules for nd->root lifetime,
since we can set the replacement of LOOKUP_ROOT (ND_ROOT_PRESET)
at the same time we set nd->root.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Conflicts:
	fs/namei.c
	[ Bugfix 7d01ef75("Make sure nd->path.mnt and nd->path.dentry
	  are always valid pointers") is not applied, the problem it
	  fixes does not exist here.
	  Feature 6c6ec2b0("fs: add support for LOOKUP_CACHED") is
	  not applied. ]
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

switch file_open_root() to struct path · d16f801d

Committed by Al Viro on Oct 19, 2021

mainline inclusion
from mainline-5.14-rc1
commit ffb37ca3
category: bugfix
bugzilla: 181657 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ffb37ca3bd16ce6ea2df2f87fde9a31e94ebb54b

---------------------------

... and provide file_open_root_mnt(), using the root of given mount.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Conflicts:
	Documentation/filesystems/porting.rst
	[ Non-bugfix 14e43bf4("vfs: don't unnecessarily clone
	  write access for writable fd") is not applied. ]
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
[Roberto Sassu: Adjust file_open_root() called by load_digest_list()]
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

26 Apr, 2021 1 commit

LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late · c16e60f1

Committed by Al Viro on Apr 21, 2021

stable inclusion
from stable-5.10.30
commit 43908139368e03d1ceda49ef2296e396605dfefd
bugzilla: 51791

--------------------------------

commit 4f0ed93f upstream.

That (and traversals in case of umount .) should be done before
complete_walk().  Either a braino or mismerge damage on queue
reorders - either way, I should've spotted that much earlier.
Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk>
X-Paperbag: Brown
Fixes: 161aff1d "LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()"
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

09 Apr, 2021 1 commit

fs: make unlazy_walk() error handling consistent · 6be5093d

Committed by Jens Axboe on Mar 18, 2021

stable inclusion
from stable-5.10.21
commit 01fd84a436b501243a8031d19041efcd7fe80ba9
bugzilla: 50609

--------------------------------

[ Upstream commit e36cffed ]

Most callers check for non-zero return, and assume it's -ECHILD (which
it always will be). One caller uses the actual error return. Clean this
up and make it fully consistent, by having unlazy_walk() return a bool
instead. Rename it to try_to_unlazy() and return true on success, and
false on error. That's easier to read.

No functional changes in this patch.
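
A minimal sketch of the calling convention after the rename, written as userspace C. The `sketch_` names and the `can_leave_rcu` field are hypothetical stand-ins; the real helper works on struct nameidata inside the kernel.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nameidata; it only records whether
 * leaving RCU mode would succeed. */
struct sketch_nd {
    bool can_leave_rcu;
};

/* Modelled on the renamed helper: true on success, false on failure. */
bool sketch_try_to_unlazy(struct sketch_nd *nd)
{
    return nd->can_leave_rcu;
}

/* The caller maps failure to -ECHILD itself, which is the only value
 * the old int return could carry on error. */
int sketch_caller(struct sketch_nd *nd)
{
    if (!sketch_try_to_unlazy(nd))
        return -ECHILD;
    return 0;
}
```

With a bool return, every call site reads the same way, and the error code is chosen where it is returned rather than passed through.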

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

25 Sep, 2020 1 commit

fs: remove the unused SB_I_MULTIROOT flag · 402dd2cf

Committed by Christoph Hellwig on Sep 24, 2020

The last user of SB_I_MULTIROOT disappeared with commit f2aedb71
("NFS: Add fs_context support.")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

28 Aug, 2020 1 commit

Add a "nosymfollow" mount option. · dab741e0

Committed by Mattias Nissler on Aug 27, 2020

For mounts that have the new "nosymfollow" option, don't follow symlinks
when resolving paths. The new option is similar in spirit to the
existing "nodev", "noexec", and "nosuid" options, as well as to the
LOOKUP_NO_SYMLINKS resolve flag in the openat2(2) syscall. Various BSD
variants have been supporting the "nosymfollow" mount option for a long
time with equivalent implementations.

Note that symlinks may still be created on file systems mounted with
the "nosymfollow" option present. readlink() remains functional, so
user space code that is aware of symlinks can still choose to follow
them explicitly.

Setting the "nosymfollow" mount option helps prevent privileged
writers from modifying files unintentionally in case there is an
unexpected link along the accessed path. The "nosymfollow" option is
thus useful as a defensive measure for systems that need to deal with
untrusted file systems in privileged contexts.

More information on the history and motivation for this patch can be
found here:

https://sites.google.com/a/chromium.org/dev/chromium-os/chromiumos-design-docs/hardening-against-malicious-stateful-data#TOC-Restricting-symlink-traversal

Signed-off-by: Mattias Nissler <mnissler@chromium.org>
Signed-off-by: Ross Zwisler <zwisler@google.com>
Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

15 Aug, 2020 1 commit

exec: restore EACCES of S_ISDIR execve() · fc4177be

Committed by Kees Cook on Aug 14, 2020

Patch series "Fix S_ISDIR execve() errno".

Fix an errno change for execve() of directories, noticed by Marc Zyngier.
Along with the fix, include a regression test to avoid seeing this return
in the future.

This patch (of 2):

The return code for attempting to execute a directory has always been
EACCES.  Adjust the S_ISDIR exec test to reflect the old errno instead of
the general EISDIR for other kinds of "open" attempts on directories.
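
The errno in question can be observed from userspace with a small probe like the one below. It is only an illustration: `exec_errno` is a hypothetical helper name, and which value comes back depends on the running kernel (EACCES with this fix applied, EISDIR on a kernel that has the regression).

```c
#include <errno.h>
#include <unistd.h>

/* Try to execute `path` and report the errno of the failed execve().
 * The function only returns if execve() fails, which is expected
 * when `path` names a directory. */
int exec_errno(const char *path)
{
    char *const argv[] = { (char *)path, NULL };
    char *const envp[] = { NULL };

    execve(path, argv, envp);
    return errno;
}
```

Calling `exec_errno("/")` is one way to try this, since the root directory always exists.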

Fixes: 633fb6ac ("exec: move S_ISREG() check earlier")
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Greg Kroah-Hartman <gregkh@android.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@google.com>
Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org
Link: https://lore.kernel.org/lkml/20200813151305.6191993b@why
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

13 Aug, 2020 3 commits

exec: move path_noexec() check earlier · 0fd338b2

Committed by Kees Cook on Aug 11, 2020

The path_noexec() check, like the regular file check, was happening too
late, letting LSMs see impossible execve()s.  Check it earlier as well in
may_open() and collect the redundant fs/exec.c path_noexec() test under
the same robustness comment as the S_ISREG() check.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
    struct open_flags open_exec_flags = {
        .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
        .acc_mode = MAY_EXEC,
        ...
    do_filp_open(dfd, filename, open_flags)
        path_openat(nameidata, open_flags, flags)
            file = alloc_empty_file(open_flags, current_cred());
            do_open(nameidata, file, open_flags)
                may_open(path, acc_mode, open_flag)
                    /* new location of MAY_EXEC vs path_noexec() test */
                    inode_permission(inode, MAY_OPEN | acc_mode)
                        security_inode_permission(inode, acc_mode)
                vfs_open(path, file)
                    do_dentry_open(file, path->dentry->d_inode, open)
                        security_file_open(f)
                        open()
    /* old location of path_noexec() test */
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: http://lkml.kernel.org/r/20200605160013.3954297-4-keescook@chromium.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

exec: move S_ISREG() check earlier · 633fb6ac

Committed by Kees Cook on Aug 11, 2020

The execve(2)/uselib(2) syscalls have always rejected non-regular files.
Recently, it was noticed that a deadlock was introduced when trying to
execute pipes, as the S_ISREG() test was happening too late.  This was
fixed in commit 73601ea5 ("fs/open.c: allow opening only regular files
during execve()"), but it was added after inode_permission() had already
run, which meant LSMs could see bogus attempts to execute non-regular
files.

Move the test into the other inode type checks (which already look for
other pathological conditions[1]).  Since there is no need to use
FMODE_EXEC while we still have access to "acc_mode", also switch the test
to MAY_EXEC.

Also include a comment with the redundant S_ISREG() checks at the end of
execve(2)/uselib(2) to note that they are present to avoid any mistakes.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
    struct open_flags open_exec_flags = {
        .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
        .acc_mode = MAY_EXEC,
        ...
    do_filp_open(dfd, filename, open_flags)
        path_openat(nameidata, open_flags, flags)
            file = alloc_empty_file(open_flags, current_cred());
            do_open(nameidata, file, open_flags)
                may_open(path, acc_mode, open_flag)
                    /* new location of MAY_EXEC vs S_ISREG() test */
                    inode_permission(inode, MAY_OPEN | acc_mode)
                        security_inode_permission(inode, acc_mode)
                vfs_open(path, file)
                    do_dentry_open(file, path->dentry->d_inode, open)
                        /* old location of FMODE_EXEC vs S_ISREG() test */
                        security_file_open(f)
                        open()

[1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: http://lkml.kernel.org/r/20200605160013.3954297-3-keescook@chromium.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fix breakage in do_rmdir() · 24fb33d4

Committed by Al Viro on Aug 12, 2020

syzbot reported and bisected a use-after-free due to the recent init
cleanups.

The putname() should happen only after we'd *not* branched to retry,
same as it's done in do_unlinkat().

Reported-by: syzbot+bbeb1c88016c7db4aa24@syzkaller.appspotmail.com
Fixes: e24ab0ef "fs: push the getname from do_rmdir into the callers"
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

31 Jul, 2020 5 commits

init: add an init_mknod helper · 5fee64fc

Committed by Christoph Hellwig on Jul 22, 2020

Add a simple helper to mknod with a kernel space file name and switch
the early init code over to it.  Remove the now unused ksys_mknod.
Signed-off-by: Christoph Hellwig <hch@lst.de>

init: add an init_mkdir helper · 83ff98c3

Committed by Christoph Hellwig on Jul 22, 2020

Add a simple helper to mkdir with a kernel space file name and switch
the early init code over to it.  Remove the now unused ksys_mkdir.
Signed-off-by: Christoph Hellwig <hch@lst.de>

init: add an init_symlink helper · cd3acb6a

Committed by Christoph Hellwig on Jul 22, 2020

Add a simple helper to symlink with a kernel space file name and switch
the early init code over to it. Remove the now unused ksys_symlink.
Signed-off-by: Christoph Hellwig <hch@lst.de>

init: add an init_link helper · 812931d6

Committed by Christoph Hellwig on Jul 22, 2020

Add a simple helper to link with a kernel space file name and switch
the early init code over to it.  Remove the now unused ksys_link.
Signed-off-by: Christoph Hellwig <hch@lst.de>

fs: push the getname from do_rmdir into the callers · e24ab0ef

Committed by Christoph Hellwig on Jul 21, 2020

This mirrors do_unlinkat and will make life a little easier for
the init code to reuse the whole function with a kernel filename.
Signed-off-by: Christoph Hellwig <hch@lst.de>

09 Jun, 2020 2 commits

vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCK · 63d72b93

Committed by Linus Torvalds on Jun 07, 2020

posix_acl_permission() does not care about MAY_NOT_BLOCK, and in fact
the permission logic internally must not check that bit (it's only for
upper layers to decide whether they can block to do IO to look up the
acl information or not).

But the way the code was written, it _looked_ like it cared, since the
function explicitly did not mask that bit off.

But it has exactly two callers: one for when that bit is set, which
first clears the bit before calling posix_acl_permission(), and the
other call site when that bit was clear.

So stop the silly games "saving" the MAY_NOT_BLOCK bit that must not be
used for the actual permission test, and that currently is pointlessly
cleared by the callers when the function itself should just not care.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vfs: do not do group lookup when not necessary · 5fc475b7

Committed by Linus Torvalds on Jun 05, 2020

Rasmus Villemoes points out that the 'in_group_p()' tests can be a
noticeable expense, and often completely unnecessary.  A common
situation is that the 'group' bits are the same as the 'other' bits
wrt the permissions we want to test.

So rewrite 'acl_permission_check()' to not bother checking for group
ownership when the permission check doesn't care.

For example, if we're asking for read permissions, and both 'group' and
'other' allow reading, there's really no reason to check if we're part
of the group or not: either way, we'll allow it.
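
The idea can be sketched in userspace C as follows. This is a simplified model under stated assumptions, not the kernel function: the `sketch_` names are hypothetical, group membership is passed in as a flag instead of being looked up, and a counter records how often the group check would actually run.

```c
#include <errno.h>
#include <stdbool.h>

/* Counts how often the (hypothetical) group lookup is consulted. */
int sketch_group_lookups;

/* Stand-in for the group membership test; the answer is supplied by
 * the caller so the sketch stays self-contained. */
static bool sketch_in_group(bool member)
{
    sketch_group_lookups++;
    return member;
}

/*
 * mode: the usual 9 permission bits (e.g. 0644).
 * mask: requested access, rwx encoded as 4|2|1.
 */
int sketch_permission_check(unsigned mode, unsigned mask,
                            bool is_owner, bool group_member)
{
    mask &= 7;
    if (is_owner)
        return (mask & ~(mode >> 6)) ? -EACCES : 0;

    /* Only consult the group list when the group and other bits
     * differ in a bit that was actually requested. */
    if (mask & (mode ^ (mode >> 3))) {
        if (sketch_in_group(group_member))
            mode >>= 3;
    }
    /* Any requested bit that is clear in the selected triplet? */
    return (mask & ~mode) ? -EACCES : 0;
}
```

For a 0644 file and a read request from a non-owner, the group and other bits agree on read, so the counter stays at zero; for a 0640 file the same request has to consult group membership.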

Rasmus says:
 "On a bog-standard Ubuntu 20.04 install, a workload consisting of
  compiling lots of userspace programs (i.e., calling lots of
  short-lived programs that all need to get their shared libs mapped in,
  and the compilers poking around looking for system headers - lots of
  /usr/lib, /usr/bin, /usr/include/ accesses) puts in_group_p around
  0.1% according to perf top.

  System-installed files are almost always 0755 (directories and
  binaries) or 0644, so in most cases, we can avoid the binary search
  and the cost of pulling the cred->groups array and in_group_p() .text
  into the cpu cache"
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 May, 2020 1 commit

vfs: allow unprivileged whiteout creation · a3c751a5

Committed by Miklos Szeredi on May 14, 2020

Whiteouts, unlike real device nodes, should not require privileges to create.

The general concern with device nodes is that opening them can have side
effects.  The kernel already avoids zero major (see
Documentation/admin-guide/devices.txt).  To be on the safe side the patch
explicitly forbids registering a char device with 0/0 number (see
cdev_add()).

This guarantees that a non-O_PATH open on a whiteout will fail with ENODEV;
i.e. it won't have any side effect.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

06 Apr, 2020 1 commit

fix a braino in legitimize_path() · 5bd73286

Committed by Al Viro on Apr 05, 2020

brown paperbag time... wrong order of arguments ended up confusing
the values to check dentry and mount_lock seqcounts against.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: 2aa38470 ("non-RCU analogue of the previous commit")
Tested-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

02 Apr, 2020 18 commits

lookup_open(): don't bother with fallbacks to lookup+create · 99a4a90c

Committed by Al Viro on Mar 12, 2020

We fall back to lookup+create (instead of atomic_open) in several cases:
	1) we don't have write access to filesystem and O_TRUNC is
present in the flags.  It's not something we want ->atomic_open() to
see - it just might go ahead and truncate the file.  However, we can
pass it the flags sans O_TRUNC - eventually do_open() will call
handle_truncate() anyway.
	2) we have O_CREAT | O_EXCL and we can't write to parent.
That's going to be an error, of course, but we want to know _which_
error should that be - might be EEXIST (if file exists), might be
EACCES or EROFS.  Simply stripping O_CREAT (and checking if we see
ENOENT) would suffice, if not for O_EXCL.  However, we used to have
->atomic_open() fully responsible for rejecting O_CREAT | O_EXCL
on existing file and just stripping O_CREAT would've disarmed
those checks.  With nothing downstream to catch the problem -
FMODE_OPENED used to be "don't bother with EEXIST checks,
->atomic_open() has done those".  Now EEXIST checks downstream
are skipped only if FMODE_CREATED is set - FMODE_OPENED alone
is not enough.  That has eliminated the need to fall back onto
lookup+create path in this case.
	3) O_WRONLY or O_RDWR when we have no write access to
filesystem, with nothing else objectionable.  Fallback is
(and had always been) pointless.

IOW, we don't really need that fallback; all we need in such
cases is to trim O_TRUNC and O_CREAT properly.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

atomic_open(): no need to pass struct open_flags anymore · d489cf9a

Committed by Al Viro on Mar 11, 2020

argument had been unused since 1643b43f (lookup_open(): lift the
"fallback to !O_CREAT" logics from atomic_open()) back in 2016
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

open_last_lookups(): move complete_walk() into do_open() · ff326a32

Committed by Al Viro on Mar 10, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open() · b94e0b32

Committed by Al Viro on Mar 10, 2020

Currently path_openat() has "EEXIST on O_EXCL|O_CREAT" checks done on one
of the ways out of open_last_lookups().  There are 4 cases:
	1) the last component is . or ..; check is not done.
	2) we had FMODE_OPENED or FMODE_CREATED set while in lookup_open();
check is not done.
	3) symlink to be traversed is found; check is not done (nor
should it be)
	4) everything else: check done (before complete_walk(), even).

In case (1) O_EXCL|O_CREAT ends up failing with -EISDIR - that's
	open("/tmp/.", O_CREAT|O_EXCL, 0600)
Note that in the same conditions
	open("/tmp", O_CREAT|O_EXCL, 0600)
would have yielded EEXIST.  Either error is allowed, switching to -EEXIST
in these cases would've been more consistent.

Case (2) is more subtle; first of all, if we have FMODE_CREATED set, the
object hadn't existed prior to the call.  The check should not be done in
such a case.  The rest is problematic, though - we have
	FMODE_OPENED set (i.e. it went through ->atomic_open() and got
successfully opened there)
	FMODE_CREATED is *NOT* set
	O_CREAT and O_EXCL are both set.
Any such case is a bug - either we failed to set FMODE_CREATED when we
had, in fact, created an object (no such instances in the tree) or
we have opened a pre-existing file despite having had both O_CREAT and
O_EXCL passed.  One of those was, in fact caught (and fixed) while
sorting out this mess (gfs2 on cold dcache).  And in such situations
we should fail with EEXIST.

Note that for (1) and (4) FMODE_CREATED is not set - for (1) there's nothing
in handle_dots() to set it, for (4) we'd explicitly checked that.

And (1), (2) and (4) are exactly the cases when we leave the loop in
the caller, with do_open() called immediately after that loop.  IOW, we
can move the check over there, and make it

	If we have O_CREAT|O_EXCL and after successful pathname resolution
FMODE_CREATED is *not* set, we must have run into a preexisting file and
should fail with EEXIST.
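
The consolidated rule stated above amounts to a single predicate. A minimal userspace sketch, assuming a hypothetical `sketch_excl_check` helper in which the `created` flag stands in for FMODE_CREATED being set after pathname resolution:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* `created` models FMODE_CREATED: true only if this open call created
 * the object. Returns -EEXIST when O_CREAT|O_EXCL hit a pre-existing
 * file, 0 otherwise. */
int sketch_excl_check(int open_flag, bool created)
{
    if ((open_flag & O_CREAT) && (open_flag & O_EXCL) && !created)
        return -EEXIST;
    return 0;
}
```

Because the test depends only on the flags and on whether anything was created, it no longer matters which of the cases (1), (2) or (4) led to the end of the lookup loop.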
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

open_last_lookups(): don't abuse complete_walk() when all we want is unlazy · 72287417

Committed by Al Viro on Mar 10, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

open_last_lookups(): consolidate fsnotify_create() calls · f7bb959d

Committed by Al Viro on Mar 05, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

take post-lookup part of do_last() out of loop · c5971b8c

Committed by Al Viro on Mar 05, 2020

now we can have open_last_lookups() directly from the loop in
path_openat() - the rest of do_last() never returns a symlink
to follow, so we can bloody well leave the loop first.

Rename the rest of that thing from do_last() to do_open() and
make it return an int.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

link_path_walk(): sample parent's i_uid and i_mode for the last component · 0f705953

Committed by Al Viro on Mar 05, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

__nd_alloc_stack(): make it return bool · 60ef60c7

Committed by Al Viro on Mar 03, 2020

... and adjust the caller (reserve_stack()).  Rename to nd_alloc_stack(),
while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

reserve_stack(): switch to __nd_alloc_stack() · 4542576b

Committed by Al Viro on Mar 03, 2020

expand the call of nd_alloc_stack() into it (and don't
recheck the depth on the second call)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

pick_link(): take reserving space on stack into a new helper · 49055906

Committed by Al Viro on Mar 03, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

pick_link(): more straightforward handling of allocation failures · aef9404d

Committed by Al Viro on Mar 02, 2020

pick_link() needs to push onto stack; we start with using two-element
array embedded into struct nameidata and the first time we need
more than that we switch to separately allocated array.

Allocation can fail, of course, and handling of that would be simple
enough - we need to drop 'link' and bugger off.  However, the things
get more complicated in RCU mode.  There we must do GFP_ATOMIC
allocation.  If that fails, we try to switch to non-RCU mode and
repeat the allocation.

To switch to non-RCU mode we need to grab references to 'link' and
to everything in nameidata.  The latter done by unlazy_walk();
the former - legitimize_path().  'link' must go first - after
unlazy_walk() we are out of RCU-critical period and it's too
late to call legitimize_path() since the references in link->mnt
and link->dentry might be pointing to freed and reused memory.

So we do legitimize_path(), then unlazy_walk().  And that's where
it gets too subtle: what to do if the former fails?  We MUST
do path_put(link) to avoid leaks.  And we can't do that under
rcu_read_lock().  Solution in mainline was to empty the nameidata
manually, drop out of RCU mode and then do path_put().

In effect, we open-code the things eventual terminate_walk()
would've done on error in RCU mode.  That looks badly out of place
and confusing.  We could add a comment along the lines of the
explanation above, but... there's a simpler solution.  Call
unlazy_walk() even if legitimize_path() fails.  It will take
us out of RCU mode, so we'll be able to do path_put(link).

Yes, it will do unnecessary work - attempt to grab references
on the stuff in nameidata, only to have them dropped as soon
as we return the error to upper layer and get terminate_walk()
called there.  So what?  We are thoroughly off the fast path
by that point - we had GFP_ATOMIC allocation fail, we had
->d_seq or mount_lock mismatch and we are about to try walking
the same path from scratch in non-RCU mode.  Which will need
to do the same allocation, this time with GFP_KERNEL, so it will
be able to apply memory pressure for blocking stuff.

Compared to that the cost of several lockref_get_not_dead()
is noise.  And the logics become much easier to understand
that way.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fold path_to_nameidata() into its only remaining caller · c99687a0

Committed by Al Viro on Mar 03, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

pick_link(): pass it struct path already with normal refcounting rules · 84f0cd9e

Committed by Al Viro on Mar 03, 2020

step_into() tries to avoid grabbing and dropping mount references
on the steps that do not involve crossing mountpoints (which is
obviously the majority of cases).  So it uses a local struct path
with unusual refcounting rules - path.mnt is pinned if and only if
it's not equal to nd->path.mnt.

We used to have similar beasts all over the place and we had quite
a few bugs crop up in their handling - it's easy to get confused
when changing e.g. cleanup on failure exits (or adding a new check,
etc.)

Now that's mostly gone - the step_into() instance (which is what
we need them for) is the only one left.  It is exposed to mount
traversal and it's (shortly) seen by pick_link().  Since pick_link()
needs to store it in link stack, where the normal rules apply,
it has to make sure that mount is pinned regardless of nd->path.mnt
value.  That's done on all calls of pick_link() and very early
in those.  Let's do that in the caller (step_into()) instead -
that way the fewer places need to be aware of such struct path
instances.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs/namei.c: kill follow_mount() · 19f6028a

Committed by Al Viro on Feb 26, 2020

The only remaining caller (path_pts()) should be using follow_down()
anyway.  And clean path_pts() a bit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

non-RCU analogue of the previous commit · 2aa38470

Committed by Al Viro on Feb 26, 2020

new helper: choose_mountpoint(). Wrapper around choose_mountpoint_rcu(),
similar to lookup_mnt() vs. __lookup_mnt(). follow_dotdot() switched to
it. Now we don't grab mount_lock exclusive anymore; note that the
primitive used for non-RCU mount traversals in the other direction
(lookup_mnt()) doesn't bother with that either - it uses mount_lock
seqcount instead.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

helper for mount rootwards traversal · 7ef482fa

Committed by Al Viro on Feb 26, 2020

The loops in follow_dotdot{_rcu()} are doing the same thing:
we have a mount and we want to find out how far up the chain
of mounts do we need to go.

We follow the chain of mount until we find one that is not
directly overmounting the root of another mount.  If such
a mount is found, we want the location it's mounted upon.
If we run out of chain (i.e. get to a mount that is not
mounted on anything else) or run into process' root, we
report failure.

On success, we want (in RCU case) d_seq of resulting location
sampled or (in non-RCU case) references to that location
acquired.
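
The traversal described above can be modelled in userspace C roughly as follows. All `sketch_` types and fields are hypothetical simplifications of the kernel's mount and dentry structures, and the sketch leaves out the d_seq sampling (RCU case) and reference acquisition (non-RCU case) entirely.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_dentry { const char *name; };

struct sketch_mount {
    struct sketch_mount *parent;       /* NULL when not mounted on anything */
    struct sketch_dentry *mountpoint;  /* dentry in parent we are mounted on */
    struct sketch_dentry *root;        /* root dentry of this mount */
};

struct sketch_path {
    struct sketch_mount *mnt;
    struct sketch_dentry *dentry;
};

/* Walk rootwards from m; on success store in *out the location the
 * first non-overmounting mount in the chain is mounted upon. */
bool sketch_choose_mountpoint(struct sketch_mount *m,
                              const struct sketch_path *root,
                              struct sketch_path *out)
{
    while (m->parent) {
        struct sketch_dentry *mountpoint = m->mountpoint;

        m = m->parent;
        /* Ran into the process' root: report failure. */
        if (root->mnt == m && root->dentry == mountpoint)
            break;
        /* Not directly overmounting the parent's root: found it. */
        if (mountpoint != m->root) {
            out->mnt = m;
            out->dentry = mountpoint;
            return true;
        }
    }
    /* Ran out of chain, or hit the process' root. */
    return false;
}
```

A mount stacked directly on another mount's root is skipped, so the walk continues until it finds a mountpoint that is an ordinary directory inside its parent mount.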

This commit introduces such primitive for RCU case and
switches follow_dotdot_rcu() to it; non-RCU case will
go in the next commit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

follow_dotdot(): be lazy about changing nd->path · 165200d6

Committed by Al Viro on Feb 28, 2020

Change nd->path only after the loop is done and only in case we hadn't
ended up finding ourselves in root.  Same for NO_XDEV check.

That separates the "check how far back do we need to go through the
mount stack" logics from the rest of .. traversal.

NOTE: path_get/path_put introduced here are temporary.  They will
go away later in the series.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

openeuler / Kernel synced successfully almost 2 years ago