提交 · e7849e33cf5d785568b181e3c15236e32c7dfdb2 · openeuler / Kernel

08 4月, 2021 4 次提交

namei: make sure nd->depth is always valid · 7962c7d1

由 Al Viro 提交于 4月 03, 2021

Zero it in set_nameidata() rather than in path_init().  That way
it always matches the number of valid nd->stack[] entries.
Since terminate_walk() does zero it (after having emptied the
stack), we don't need to reinitialize it in subsequent path_init().
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7962c7d1

teach set_nameidata() to handle setting the root as well · 06422964

由 Al Viro 提交于 4月 01, 2021

That way we don't need the callers to mess with manually setting any fields
of nameidata instances.  Old set_nameidata() gets renamed (__set_nameidata()),
new becomes an inlined helper that takes a struct path pointer and deals
with setting nd->root and putting ND_ROOT_PRESET in nd->state when new
argument is non-NULL.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

06422964

take LOOKUP_{ROOT,ROOT_GRABBED,JUMPED} out of LOOKUP_... space · bcba1e7d

由 Al Viro 提交于 4月 01, 2021

Separate field in nameidata (nd->state) holding the flags that
should be internal-only - that way we both get some spare bits
in LOOKUP_... and get simpler rules for nd->root lifetime rules,
since we can set the replacement of LOOKUP_ROOT (ND_ROOT_PRESET)
at the same time we set nd->root.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

bcba1e7d

switch file_open_root() to struct path · ffb37ca3

由 Al Viro 提交于 4月 01, 2021

... and provide file_open_root_mnt(), using the root of given mount.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ffb37ca3

07 4月, 2021 2 次提交

LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late · 4f0ed93f

由 Al Viro 提交于 4月 06, 2021

That (and traversals in case of umount .) should be done before
complete_walk().  Either a braino or mismerge damage on queue
reorders - either way, I should've spotted that much earlier.
Fucked-up-by: NAl Viro <viro@zeniv.linux.org.uk>
X-Paperbag: Brown
Fixes: 161aff1d "LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()"
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4f0ed93f

Make sure nd->path.mnt and nd->path.dentry are always valid pointers · 7d01ef75

由 Al Viro 提交于 4月 06, 2021

Initialize them in set_nameidata() and make sure that terminate_walk() clears them
once the pointers become potentially invalid (i.e. we leave RCU mode or drop them
in non-RCU one).  Currently we have "path_init() always initializes them and nobody
accesses them outside of path_init()/terminate_walk() segments", which is asking
for trouble.

With that change we would have nd->path.{mnt,dentry}
	1) always valid - NULL or pointing to currently allocated objects.
	2) non-NULL while we are successfully walking
	3) NULL when we are not walking at all
	4) contributing to refcounts whenever non-NULL outside of RCU mode.

Fixes: 6c6ec2b0 ("fs: add support for LOOKUP_CACHED")
Reported-by: syzbot+c88a7030da47945a3cc3@syzkaller.appspotmail.com
Tested-by: NChristian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7d01ef75

23 3月, 2021 3 次提交

namei: fix kernel-doc for struct renamedata and more · 2111c3c0

由 Randy Dunlap 提交于 2月 15, 2021

Fix kernel-doc warnings in namei.c:

../fs/namei.c:1149: warning: Excess function parameter 'dir_mode' description in 'may_create_in_sticky'
../fs/namei.c:1149: warning: Excess function parameter 'dir_uid' description in 'may_create_in_sticky'
../fs/namei.c:3396: warning: Function parameter or member 'open_flag' not described in 'vfs_tmpfile'
../fs/namei.c:3396: warning: Excess function parameter 'open_flags' description in 'vfs_tmpfile'
../fs/namei.c:4460: warning: Function parameter or member 'rd' not described in 'vfs_rename'
../fs/namei.c:4460: warning: Excess function parameter 'old_mnt_userns' description in 'vfs_rename'
../fs/namei.c:4460: warning: Excess function parameter 'old_dir' description in 'vfs_rename'
../fs/namei.c:4460: warning: Excess function parameter 'old_dentry' description in 'vfs_rename'
../fs/namei.c:4460: warning: Excess function parameter 'new_mnt_userns' description in 'vfs_rename'
../fs/namei.c:4460: warning: Excess function parameter 'new_dir' description in 'vfs_rename'
../fs/namei.c:4460: warning: Excess function parameter 'new_dentry' description in 'vfs_rename'
../fs/namei.c:4460: warning: Excess function parameter 'delegated_inode' description in 'vfs_rename'
../fs/namei.c:4460: warning: Excess function parameter 'flags' description in 'vfs_rename'

Link: https://lore.kernel.org/r/20210216042929.8931-3-rdunlap@infradead.org
Fixes: 9fe61450 ("namei: introduce struct renamedata")
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

2111c3c0

fs: introduce fsuidgid_has_mapping() helper · 8e538913

由 Christian Brauner 提交于 3月 20, 2021

Don't open-code the checks and instead move them into a clean little
helper we can call. This also reduces the risk that if we ever change
something we forget to change all locations.

Link: https://lore.kernel.org/r/20210320122623.599086-4-christian.brauner@ubuntu.comInspired-by: NVivek Goyal <vgoyal@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

8e538913

fs: document and rename fsid helpers · a65e58e7

由 Christian Brauner 提交于 3月 20, 2021

Vivek pointed out that the fs{g,u}id_into_mnt() naming scheme can be
misleading as it could be understood as implying they do the exact same
thing as i_{g,u}id_into_mnt(). The original motivation for this naming
scheme was to signal to callers that the helpers will always take care
to map the k{g,u}id such that the ownership is expressed in terms of the
mnt_users.
Get rid of the confusion by renaming those helpers to something more
sensible. Al suggested mapped_fs{g,u}id() which seems a really good fit.
Usually filesystems don't need to bother with these helpers directly
only in some cases where they allocate objects that carry {g,u}ids which
are either filesystem specific (e.g. xfs quota objects) or don't have a
clean set of helpers as inodes have.

Link: https://lore.kernel.org/r/20210320122623.599086-3-christian.brauner@ubuntu.comInspired-by: NVivek Goyal <vgoyal@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

a65e58e7

21 2月, 2021 1 次提交

fix handling of nd->depth on LOOKUP_CACHED failures in try_to_unlazy* · eacd9aa8

由 Al Viro 提交于 2月 15, 2021

After switching to non-RCU mode, we want nd->depth to match the number
of entries in nd->stack[] that need eventual path_put().
legitimize_links() takes care of that on failures; unfortunately,
failure exits added for LOOKUP_CACHED do not.

We could add the logics for that into those failure exits, both in
try_to_unlazy() and in try_to_unlazy_next(), but since both checks
are immediately followed by legitimize_links() and there's no calls
of legitimize_links() other than those two...  It's easier to
move the check (and required handling of nd->depth on failure) into
legitimize_links() itself.

[caught by Jens: ... and since we are zeroing ->depth here, we need
to do drop_links() first]

Fixes: 6c6ec2b0 "fs: add support for LOOKUP_CACHED"
Tested-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

eacd9aa8

24 1月, 2021 9 次提交

ima: handle idmapped mounts · a2d2329e

由 Christian Brauner 提交于 1月 21, 2021

IMA does sometimes access the inode's i_uid and compares it against the
rules' fowner. Enable IMA to handle idmapped mounts by passing down the
mount's user namespace. We simply make use of the helpers we introduced
before. If the initial user namespace is passed nothing changes so
non-idmapped mounts will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-27-christian.brauner@ubuntu.comSigned-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

a2d2329e

fs: make helpers idmap mount aware · 549c7297

由 Christian Brauner 提交于 1月 21, 2021

Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.

As requested we simply extend the exisiting inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesnt't leave us with additional inode methods.

Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

549c7297

open: handle idmapped mounts in do_truncate() · 643fe55a

由 Christian Brauner 提交于 1月 21, 2021

When truncating files the vfs will verify that the caller is privileged
over the inode. Extend it to handle idmapped mounts. If the inode is
accessed through an idmapped mount it is mapped according to the mount's
user namespace. Afterwards the permissions checks are identical to
non-idmapped mounts. If the initial user namespace is passed nothing
changes so non-idmapped mounts will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-16-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

643fe55a

namei: prepare for idmapped mounts · 6521f891

由 Christian Brauner 提交于 1月 21, 2021

The various vfs_*() helpers are called by filesystems or by the vfs
itself to perform core operations such as create, link, mkdir, mknod, rename,
rmdir, tmpfile and unlink. Enable them to handle idmapped mounts. If the
inode is accessed through an idmapped mount map it into the
mount's user namespace and pass it down. Afterwards the checks and
operations are identical to non-idmapped mounts. If the initial user
namespace is passed nothing changes so non-idmapped mounts will see
identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-15-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

6521f891

namei: introduce struct renamedata · 9fe61450

由 Christian Brauner 提交于 1月 21, 2021

In order to handle idmapped mounts we will extend the vfs rename helper
to take two new arguments in follow up patches. Since this operations
already takes a bunch of arguments add a simple struct renamedata and
make the current helper use it before we extend it.

Link: https://lore.kernel.org/r/20210121131959.646623-14-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

9fe61450

namei: handle idmapped mounts in may_*() helpers · ba73d987

由 Christian Brauner 提交于 1月 21, 2021

The may_follow_link(), may_linkat(), may_lookup(), may_open(),
may_o_create(), may_create_in_sticky(), may_delete(), and may_create()
helpers determine whether the caller is privileged enough to perform the
associated operations. Let them handle idmapped mounts by mapping the
inode or fsids according to the mount's user namespace. Afterwards the
checks are identical to non-idmapped inodes. The patch takes care to
retrieve the mount's user namespace right before performing permission
checks and passing it down into the fileystem so the user namespace
can't change in between by someone idmapping a mount that is currently
not idmapped. If the initial user namespace is passed nothing changes so
non-idmapped mounts will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-13-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJames Morris <jamorris@linux.microsoft.com>
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

ba73d987

inode: make init and permission helpers idmapped mount aware · 21cb47be

由 Christian Brauner 提交于 1月 21, 2021

The inode_owner_or_capable() helper determines whether the caller is the
owner of the inode or is capable with respect to that inode. Allow it to
handle idmapped mounts. If the inode is accessed through an idmapped
mount it according to the mount's user namespace. Afterwards the checks
are identical to non-idmapped mounts. If the initial user namespace is
passed nothing changes so non-idmapped mounts will see identical
behavior as before.

Similarly, allow the inode_init_owner() helper to handle idmapped
mounts. It initializes a new inode on idmapped mounts by mapping the
fsuid and fsgid of the caller from the mount's user namespace. If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJames Morris <jamorris@linux.microsoft.com>
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

21cb47be

namei: make permission helpers idmapped mount aware · 47291baa

由 Christian Brauner 提交于 1月 21, 2021

The two helpers inode_permission() and generic_permission() are used by
the vfs to perform basic permission checking by verifying that the
caller is privileged over an inode. In order to handle idmapped mounts
we extend the two helpers with an additional user namespace argument.
On idmapped mounts the two helpers will make sure to map the inode
according to the mount's user namespace and then peform identical
permission checks to inode_permission() and generic_permission(). If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJames Morris <jamorris@linux.microsoft.com>
Acked-by: NSerge Hallyn <serge@hallyn.com>
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

47291baa

capability: handle idmapped mounts · 0558c1bf

由 Christian Brauner 提交于 1月 21, 2021

In order to determine whether a caller holds privilege over a given
inode the capability framework exposes the two helpers
privileged_wrt_inode_uidgid() and capable_wrt_inode_uidgid(). The former
verifies that the inode has a mapping in the caller's user namespace and
the latter additionally verifies that the caller has the requested
capability in their current user namespace.
If the inode is accessed through an idmapped mount map it into the
mount's user namespace. Afterwards the checks are identical to
non-idmapped inodes. If the initial user namespace is passed all
operations are a nop so non-idmapped mounts will not see a change in
behavior.

Link: https://lore.kernel.org/r/20210121131959.646623-5-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJames Morris <jamorris@linux.microsoft.com>
Acked-by: NSerge Hallyn <serge@hallyn.com>
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

0558c1bf

05 1月, 2021 3 次提交

fs: add support for LOOKUP_CACHED · 6c6ec2b0

由 Jens Axboe 提交于 12月 17, 2020

io_uring always punts opens to async context, since there's no control
over whether the lookup blocks or not. Add LOOKUP_CACHED to support
just doing the fast RCU based lookups, which we know will not block. If
we can do a cached path resolution of the filename, then we don't have
to always punt lookups for a worker.

During path resolution, we always do LOOKUP_RCU first. If that fails and
we terminate LOOKUP_RCU, then fail a LOOKUP_CACHED attempt as well.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6c6ec2b0

saner calling conventions for unlazy_child() · ae66db45

由 Al Viro 提交于 1月 04, 2021

same as for the previous commit - instead of 0/-ECHILD make
it return true/false, rename to try_to_unlazy_child().
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ae66db45

fs: make unlazy_walk() error handling consistent · e36cffed

由 Jens Axboe 提交于 12月 17, 2020

Most callers check for non-zero return, and assume it's -ECHILD (which
it always will be). One caller uses the actual error return. Clean this
up and make it fully consistent, by having unlazy_walk() return a bool
instead. Rename it to try_to_unlazy() and return true on success, and
failure on error. That's easier to read.

No functional changes in this patch.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e36cffed

04 1月, 2021 2 次提交

fs/namei.c: Remove unlikely of status being -ECHILD in lookup_fast() · 26ddb45e

由 Steven Rostedt (VMware) 提交于 12月 09, 2020

Running my yearly branch profiling code, it detected a 100% wrong branch
condition in name.c for lookup_fast(). The code in question has:

		status = d_revalidate(dentry, nd->flags);
		if (likely(status > 0))
			return dentry;
		if (unlazy_child(nd, dentry, seq))
			return ERR_PTR(-ECHILD);
		if (unlikely(status == -ECHILD))
			/* we'd been told to redo it in non-rcu mode */
			status = d_revalidate(dentry, nd->flags);

If the status of the d_revalidate() is greater than zero, then the function
finishes. Otherwise, if it is an "unlazy_child" it returns with -ECHILD.
After the above two checks, the status is compared to -ECHILD, as that is
what is returned if the original d_revalidate() needed to be done in a
non-rcu mode.

Especially this path is called in a condition of:

	if (nd->flags & LOOKUP_RCU) {

And most of the d_revalidate() functions have:

	if (flags & LOOKUP_RCU)
		return -ECHILD;

It appears that that is the only case that this if statement is triggered
on two of my machines, running in production.

As it is dependent on what filesystem mix is configured in the running
kernel, simply remove the unlikely() from the if statement.
Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

26ddb45e

A
do_tmpfile(): don't mess with finish_open() · 1e8f44f1
由 Al Viro 提交于 3月 11, 2020
```
use vfs_open() instead
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
1e8f44f1

11 12月, 2020 1 次提交

Make sure that make_create_in_sticky() never sees uninitialized value of dir_mode · 1a97d899

由 Al Viro 提交于 9月 19, 2020

make sure nd->dir_mode is always initialized after success exit from
link_path_walk(); in case of empty path it did not happen.
Reported-by: NAnant Thazhemadam <anant.thazhemadam@gmail.com>
Tested-by: NAnant Thazhemadam <anant.thazhemadam@gmail.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1a97d899

10 12月, 2020 1 次提交

fs: make do_renameat2() take struct filename · e886663c

由 Jens Axboe 提交于 9月 26, 2020

Pass in the struct filename pointers instead of the user string, and
update the three callers to do the same.

This behaves like do_unlinkat(), which also takes a filename struct and
puts it when it is done. Converting callers is then trivial.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e886663c

25 9月, 2020 1 次提交

fs: remove the unused SB_I_MULTIROOT flag · 402dd2cf

由 Christoph Hellwig 提交于 9月 24, 2020

The last user of SB_I_MULTIROOT is disappeared with commit f2aedb71
("NFS: Add fs_context support.")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

402dd2cf

28 8月, 2020 1 次提交

Add a "nosymfollow" mount option. · dab741e0

由 Mattias Nissler 提交于 8月 27, 2020

For mounts that have the new "nosymfollow" option, don't follow symlinks
when resolving paths. The new option is similar in spirit to the
existing "nodev", "noexec", and "nosuid" options, as well as to the
LOOKUP_NO_SYMLINKS resolve flag in the openat2(2) syscall. Various BSD
variants have been supporting the "nosymfollow" mount option for a long
time with equivalent implementations.

Note that symlinks may still be created on file systems mounted with
the "nosymfollow" option present. readlink() remains functional, so
user space code that is aware of symlinks can still choose to follow
them explicitly.

Setting the "nosymfollow" mount option helps prevent privileged
writers from modifying files unintentionally in case there is an
unexpected link along the accessed path. The "nosymfollow" option is
thus useful as a defensive measure for systems that need to deal with
untrusted file systems in privileged contexts.

More information on the history and motivation for this patch can be
found here:

https://sites.google.com/a/chromium.org/dev/chromium-os/chromiumos-design-docs/hardening-against-malicious-stateful-data#TOC-Restricting-symlink-traversalSigned-off-by: NMattias Nissler <mnissler@chromium.org>
Signed-off-by: NRoss Zwisler <zwisler@google.com>
Reviewed-by: NAleksa Sarai <cyphar@cyphar.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

dab741e0

15 8月, 2020 1 次提交

exec: restore EACCES of S_ISDIR execve() · fc4177be

由 Kees Cook 提交于 8月 14, 2020

Patch series "Fix S_ISDIR execve() errno".

Fix an errno change for execve() of directories, noticed by Marc Zyngier.
Along with the fix, include a regression test to avoid seeing this return
in the future.

This patch (of 2):

The return code for attempting to execute a directory has always been
EACCES.  Adjust the S_ISDIR exec test to reflect the old errno instead of
the general EISDIR for other kinds of "open" attempts on directories.

Fixes: 633fb6ac ("exec: move S_ISREG() check earlier")
Reported-by: NMarc Zyngier <maz@kernel.org>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Tested-by: NGreg Kroah-Hartman <gregkh@android.com>
Reviewed-by: NGreg Kroah-Hartman <gregkh@google.com>
Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org
Link: https://lore.kernel.org/lkml/20200813151305.6191993b@whySigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fc4177be

13 8月, 2020 3 次提交

exec: move path_noexec() check earlier · 0fd338b2

由 Kees Cook 提交于 8月 11, 2020

The path_noexec() check, like the regular file check, was happening too
late, letting LSMs see impossible execve()s.  Check it earlier as well in
may_open() and collect the redundant fs/exec.c path_noexec() test under
the same robustness comment as the S_ISREG() check.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
    struct open_flags open_exec_flags = {
        .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
        .acc_mode = MAY_EXEC,
        ...
    do_filp_open(dfd, filename, open_flags)
        path_openat(nameidata, open_flags, flags)
            file = alloc_empty_file(open_flags, current_cred());
            do_open(nameidata, file, open_flags)
                may_open(path, acc_mode, open_flag)
                    /* new location of MAY_EXEC vs path_noexec() test */
                    inode_permission(inode, MAY_OPEN | acc_mode)
                        security_inode_permission(inode, acc_mode)
                vfs_open(path, file)
                    do_dentry_open(file, path->dentry->d_inode, open)
                        security_file_open(f)
                        open()
    /* old location of path_noexec() test */
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: http://lkml.kernel.org/r/20200605160013.3954297-4-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0fd338b2

exec: move S_ISREG() check earlier · 633fb6ac

由 Kees Cook 提交于 8月 11, 2020

The execve(2)/uselib(2) syscalls have always rejected non-regular files.
Recently, it was noticed that a deadlock was introduced when trying to
execute pipes, as the S_ISREG() test was happening too late.  This was
fixed in commit 73601ea5 ("fs/open.c: allow opening only regular files
during execve()"), but it was added after inode_permission() had already
run, which meant LSMs could see bogus attempts to execute non-regular
files.

Move the test into the other inode type checks (which already look for
other pathological conditions[1]).  Since there is no need to use
FMODE_EXEC while we still have access to "acc_mode", also switch the test
to MAY_EXEC.

Also include a comment with the redundant S_ISREG() checks at the end of
execve(2)/uselib(2) to note that they are present to avoid any mistakes.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
    struct open_flags open_exec_flags = {
        .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
        .acc_mode = MAY_EXEC,
        ...
    do_filp_open(dfd, filename, open_flags)
        path_openat(nameidata, open_flags, flags)
            file = alloc_empty_file(open_flags, current_cred());
            do_open(nameidata, file, open_flags)
                may_open(path, acc_mode, open_flag)
		    /* new location of MAY_EXEC vs S_ISREG() test */
                    inode_permission(inode, MAY_OPEN | acc_mode)
                        security_inode_permission(inode, acc_mode)
                vfs_open(path, file)
                    do_dentry_open(file, path->dentry->d_inode, open)
                        /* old location of FMODE_EXEC vs S_ISREG() test */
                        security_file_open(f)
                        open()

[1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: http://lkml.kernel.org/r/20200605160013.3954297-3-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

633fb6ac

fix breakage in do_rmdir() · 24fb33d4

由 Al Viro 提交于 8月 12, 2020

syzbot reported and bisected a use-after-free due to the recent init
cleanups.

The putname() should happen only after we'd *not* branched to retry,
same as it's done in do_unlinkat().

Reported-by: syzbot+bbeb1c88016c7db4aa24@syzkaller.appspotmail.com
Fixes: e24ab0ef "fs: push the getname from do_rmdir into the callers"
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

24fb33d4

31 7月, 2020 5 次提交

init: add an init_mknod helper · 5fee64fc

由 Christoph Hellwig 提交于 7月 22, 2020

Add a simple helper to mknod with a kernel space file name and switch
the early init code over to it.  Remove the now unused ksys_mknod.
Signed-off-by: NChristoph Hellwig <hch@lst.de>

5fee64fc

init: add an init_mkdir helper · 83ff98c3

由 Christoph Hellwig 提交于 7月 22, 2020

Add a simple helper to mkdir with a kernel space file name and switch
the early init code over to it.  Remove the now unused ksys_mkdir.
Signed-off-by: NChristoph Hellwig <hch@lst.de>

83ff98c3

init: add an init_symlink helper · cd3acb6a

由 Christoph Hellwig 提交于 7月 22, 2020

Add a simple helper to symlink with a kernel space file name and switch
the early init code over to it. Remove the now unused ksys_symlink.
Signed-off-by: NChristoph Hellwig <hch@lst.de>

cd3acb6a

init: add an init_link helper · 812931d6

由 Christoph Hellwig 提交于 7月 22, 2020

Add a simple helper to link with a kernel space file name and switch
the early init code over to it.  Remove the now unused ksys_link.
Signed-off-by: NChristoph Hellwig <hch@lst.de>

812931d6

fs: push the getname from do_rmdir into the callers · e24ab0ef

由 Christoph Hellwig 提交于 7月 21, 2020

This mirrors do_unlinkat and will make life a little easier for
the init code to reuse the whole function with a kernel filename.
Signed-off-by: NChristoph Hellwig <hch@lst.de>

e24ab0ef

09 6月, 2020 2 次提交

vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCK · 63d72b93

由 Linus Torvalds 提交于 6月 07, 2020

posix_acl_permission() does not care about MAY_NOT_BLOCK, and in fact
the permission logic internally must not check that bit (it's only for
upper layers to decide whether they can block to do IO to look up the
acl information or not).

But the way the code was written, it _looked_ like it cared, since the
function explicitly did not mask that bit off.

But it has exactly two callers: one for when that bit is set, which
first clears the bit before calling posix_acl_permission(), and the
other call site when that bit was clear.

So stop the silly games "saving" the MAY_NOT_BLOCK bit that must not be
used for the actual permission test, and that currently is pointlessly
cleared by the callers when the function itself should just not care.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

63d72b93

vfs: do not do group lookup when not necessary · 5fc475b7

由 Linus Torvalds 提交于 6月 05, 2020

Rasmus Villemoes points out that the 'in_group_p()' tests can be a
noticeable expense, and often completely unnecessary.  A common
situation is that the 'group' bits are the same as the 'other' bits
wrt the permissions we want to test.

So rewrite 'acl_permission_check()' to not bother checking for group
ownership when the permission check doesn't care.

For example, if we're asking for read permissions, and both 'group' and
'other' allow reading, there's really no reason to check if we're part
of the group or not: either way, we'll allow it.

Rasmus says:
 "On a bog-standard Ubuntu 20.04 install, a workload consisting of
  compiling lots of userspace programs (i.e., calling lots of
  short-lived programs that all need to get their shared libs mapped in,
  and the compilers poking around looking for system headers - lots of
  /usr/lib, /usr/bin, /usr/include/ accesses) puts in_group_p around
  0.1% according to perf top.

  System-installed files are almost always 0755 (directories and
  binaries) or 0644, so in most cases, we can avoid the binary search
  and the cost of pulling the cred->groups array and in_group_p() .text
  into the cpu cache"
Reported-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5fc475b7

14 5月, 2020 1 次提交

vfs: allow unprivileged whiteout creation · a3c751a5

由 Miklos Szeredi 提交于 5月 14, 2020

Whiteouts, unlike real device node should not require privileges to create.

The general concern with device nodes is that opening them can have side
effects.  The kernel already avoids zero major (see
Documentation/admin-guide/devices.txt).  To be on the safe side the patch
explicitly forbids registering a char device with 0/0 number (see
cdev_add()).

This guarantees that a non-O_PATH open on a whiteout will fail with ENODEV;
i.e. it won't have any side effect.
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

a3c751a5

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功