提交 · 867a23eab52847d41a0a6eae41a64d76de7782a8 · openeuler / Kernel

20 8月, 2020 4 次提交

io_uring: kill extra iovec=NULL in import_iovec() · 867a23ea

由 Pavel Begunkov 提交于 8月 20, 2020

If io_import_iovec() returns an error, return iovec is undefined and
must not be used, so don't set it to NULL when failing.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

867a23ea

io_uring: comment on kfree(iovec) checks · f261c168

由 Pavel Begunkov 提交于 8月 20, 2020

kfree() handles NULL pointers well, but io_{read,write}() checks it
because of performance reasons. Leave a comment there for those who are
tempted to patch it.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f261c168

io_uring: fix racy req->flags modification · bb175342

由 Pavel Begunkov 提交于 8月 20, 2020

Setting and clearing REQ_F_OVERFLOW in io_uring_cancel_files() and
io_cqring_overflow_flush() are racy, because they might be called
asynchronously.

REQ_F_OVERFLOW flag in only needed for files cancellation, so if it can
be guaranteed that requests _currently_ marked inflight can't be
overflown, the problem will be solved with removing the flag
altogether.

That's how the patch works, it removes inflight status of a request
in io_cqring_fill_event() whenever it should be thrown into CQ-overflow
list. That's Ok to do, because no opcode specific handling can be done
after io_cqring_fill_event(), the same assumption as with "struct
io_completion" patches.
And it already have a good place for such cleanups, which is
io_clean_op(). A nice side effect of this is removing this inflight
check from the hot path.

note on synchronisation: now __io_cqring_fill_event() may be taking two
spinlocks simultaneously, completion_lock and inflight_lock. It's fine,
because we never do that in reverse order, and CQ-overflow of inflight
requests shouldn't happen often.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bb175342

io_uring: use system_unbound_wq for ring exit work · fc666777

由 Jens Axboe 提交于 8月 19, 2020

We currently use system_wq, which is unbounded in terms of number of
workers. This means that if we're exiting tons of rings at the same
time, then we'll briefly spawn tons of event kworkers just for a very
short blocking time as the rings exit.

Use system_unbound_wq instead, which has a sane cap on the concurrency
level.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fc666777

19 8月, 2020 1 次提交

io_uring: cleanup io_import_iovec() of pre-mapped request · 8452fd0c

由 Jens Axboe 提交于 8月 18, 2020

io_rw_prep_async() goes through a dance of clearing req->io, calling
the iovec import, then re-setting req->io. Provide an internal helper
that does the right thing without needing state tweaked to get there.

This enables further cleanups in io_read, io_write, and
io_resubmit_prep(), but that's left for another time.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8452fd0c

17 8月, 2020 2 次提交

io_uring: get rid of kiocb_wait_page_queue_init() · 3b2a4439

由 Jens Axboe 提交于 8月 16, 2020

The 5.9 merge moved this function io_uring, which means that we don't
need to retain the generic nature of it. Clean up this part by removing
redundant checks, and just inlining the small remainder in
io_rw_should_retry().

No functional changes in this patch.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3b2a4439

io_uring: find and cancel head link async work on files exit · b711d4ea

由 Jens Axboe 提交于 8月 16, 2020

Commit f254ac04 ("io_uring: enable lookup of links holding inflight files")
only handled 2 out of the three head link cases we have, we also need to
lookup and cancel work that is blocked in io-wq if that work has a link
that's holding a reference to the files structure.

Put the "cancel head links that hold this request pending" logic into
io_attempt_cancel(), which will to through the motions of finding and
canceling head links that hold the current inflight files stable request
pending.

Cc: stable@vger.kernel.org
Reported-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b711d4ea

16 8月, 2020 2 次提交

io_uring: short circuit -EAGAIN for blocking read attempt · f91daf56

由 Jens Axboe 提交于 8月 15, 2020

One case was missed in the short IO retry handling, and that's hitting
-EAGAIN on a blocking attempt read (eg from io-wq context). This is a
problem on sockets that are marked as non-blocking when created, they
don't carry any REQ_F_NOWAIT information to help us terminate them
instead of perpetually retrying.

Fixes: 227c0c96 ("io_uring: internally retry short reads")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f91daf56

io_uring: sanitize double poll handling · d4e7cd36

由 Jens Axboe 提交于 8月 15, 2020

There's a bit of confusion on the matching pairs of poll vs double poll,
depending on if the request is a pure poll (IORING_OP_POLL_ADD) or
poll driven retry.

Add io_poll_get_double() that returns the double poll waitqueue, if any,
and io_poll_get_single() that returns the original poll waitqueue. With
that, remove the argument to io_poll_remove_double().

Finally ensure that wait->private is cleared once the double poll handler
has run, so that remove knows it's already been seen.

Cc: stable@vger.kernel.org # v5.8
Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com
Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d4e7cd36

15 8月, 2020 2 次提交

fs: autofs: delete repeated words in comments · c734124c

由 Randy Dunlap 提交于 8月 14, 2020

Drop duplicated words {the, at} in comments.
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NIan Kent <raven@themaw.net>
Link: http://lkml.kernel.org/r/20200811021817.24982-1-rdunlap@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c734124c

exec: restore EACCES of S_ISDIR execve() · fc4177be

由 Kees Cook 提交于 8月 14, 2020

Patch series "Fix S_ISDIR execve() errno".

Fix an errno change for execve() of directories, noticed by Marc Zyngier.
Along with the fix, include a regression test to avoid seeing this return
in the future.

This patch (of 2):

The return code for attempting to execute a directory has always been
EACCES.  Adjust the S_ISDIR exec test to reflect the old errno instead of
the general EISDIR for other kinds of "open" attempts on directories.

Fixes: 633fb6ac ("exec: move S_ISREG() check earlier")
Reported-by: NMarc Zyngier <maz@kernel.org>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Tested-by: NGreg Kroah-Hartman <gregkh@android.com>
Reviewed-by: NGreg Kroah-Hartman <gregkh@google.com>
Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org
Link: https://lore.kernel.org/lkml/20200813151305.6191993b@whySigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fc4177be

14 8月, 2020 3 次提交

SMB3: Fix mkdir when idsfromsid configured on mount · c8c412f9

由 Steve French 提交于 8月 13, 2020

mkdir uses a compounded create operation which was not setting
the security descriptor on create of a directory. Fix so
mkdir now sets the mode and owner info properly when idsfromsid
and modefromsid are configured on the mount.
Signed-off-by: NSteve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org> # v5.8
Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>

c8c412f9

io_uring: internally retry short reads · 227c0c96

由 Jens Axboe 提交于 8月 13, 2020

We've had a few application cases of not handling short reads properly,
and it is understandable as short reads aren't really expected if the
application isn't doing non-blocking IO.

Now that we retain the iov_iter over retries, we can implement internal
retry pretty trivially. This ensures that we don't return a short read,
even for buffered reads on page cache conflicts.

Cleanup the deep nesting and hard to read nature of io_read() as well,
it's much more straight forward now to read and understand. Added a
few comments explaining the logic as well.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

227c0c96

io_uring: retain iov_iter state over io_read/io_write calls · ff6165b2

由 Jens Axboe 提交于 8月 13, 2020

Instead of maintaining (and setting/remembering) iov_iter size and
segment counts, just put the iov_iter in the async part of the IO
structure.

This is mostly a preparation patch for doing appropriate internal retries
for short reads, but it also cleans up the state handling nicely and
simplifies it quite a bit.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ff6165b2

13 8月, 2020 25 次提交

io_uring: enable lookup of links holding inflight files · f254ac04

由 Jens Axboe 提交于 8月 12, 2020

When a process exits, we cancel whatever requests it has pending that
are referencing the file table. However, if a link is holding a
reference, then we cannot find it by simply looking at the inflight
list.

Enable checking of the poll and timeout list to find the link, and
cancel it appropriately.

Cc: stable@vger.kernel.org
Reported-by: NJosef <josef.grieb@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f254ac04

mm/gup: remove task_struct pointer for all gup code · 64019a2e

由 Peter Xu 提交于 8月 11, 2020

After the cleanup of page fault accounting, gup does not need to pass
task_struct around any more. Remove that parameter in the whole gup
stack.
Signed-off-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

64019a2e

exec: move path_noexec() check earlier · 0fd338b2

由 Kees Cook 提交于 8月 11, 2020

The path_noexec() check, like the regular file check, was happening too
late, letting LSMs see impossible execve()s.  Check it earlier as well in
may_open() and collect the redundant fs/exec.c path_noexec() test under
the same robustness comment as the S_ISREG() check.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
    struct open_flags open_exec_flags = {
        .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
        .acc_mode = MAY_EXEC,
        ...
    do_filp_open(dfd, filename, open_flags)
        path_openat(nameidata, open_flags, flags)
            file = alloc_empty_file(open_flags, current_cred());
            do_open(nameidata, file, open_flags)
                may_open(path, acc_mode, open_flag)
                    /* new location of MAY_EXEC vs path_noexec() test */
                    inode_permission(inode, MAY_OPEN | acc_mode)
                        security_inode_permission(inode, acc_mode)
                vfs_open(path, file)
                    do_dentry_open(file, path->dentry->d_inode, open)
                        security_file_open(f)
                        open()
    /* old location of path_noexec() test */
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: http://lkml.kernel.org/r/20200605160013.3954297-4-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0fd338b2

exec: move S_ISREG() check earlier · 633fb6ac

由 Kees Cook 提交于 8月 11, 2020

The execve(2)/uselib(2) syscalls have always rejected non-regular files.
Recently, it was noticed that a deadlock was introduced when trying to
execute pipes, as the S_ISREG() test was happening too late.  This was
fixed in commit 73601ea5 ("fs/open.c: allow opening only regular files
during execve()"), but it was added after inode_permission() had already
run, which meant LSMs could see bogus attempts to execute non-regular
files.

Move the test into the other inode type checks (which already look for
other pathological conditions[1]).  Since there is no need to use
FMODE_EXEC while we still have access to "acc_mode", also switch the test
to MAY_EXEC.

Also include a comment with the redundant S_ISREG() checks at the end of
execve(2)/uselib(2) to note that they are present to avoid any mistakes.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
    struct open_flags open_exec_flags = {
        .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
        .acc_mode = MAY_EXEC,
        ...
    do_filp_open(dfd, filename, open_flags)
        path_openat(nameidata, open_flags, flags)
            file = alloc_empty_file(open_flags, current_cred());
            do_open(nameidata, file, open_flags)
                may_open(path, acc_mode, open_flag)
		    /* new location of MAY_EXEC vs S_ISREG() test */
                    inode_permission(inode, MAY_OPEN | acc_mode)
                        security_inode_permission(inode, acc_mode)
                vfs_open(path, file)
                    do_dentry_open(file, path->dentry->d_inode, open)
                        /* old location of FMODE_EXEC vs S_ISREG() test */
                        security_file_open(f)
                        open()

[1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: http://lkml.kernel.org/r/20200605160013.3954297-3-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

633fb6ac

exec: change uselib(2) IS_SREG() failure to EACCES · db19c91c

由 Kees Cook 提交于 8月 11, 2020

Patch series "Relocate execve() sanity checks", v2.

While looking at the code paths for the proposed O_MAYEXEC flag, I saw
some things that looked like they should be fixed up.

  exec: Change uselib(2) IS_SREG() failure to EACCES
	This just regularizes the return code on uselib(2).

  exec: Move S_ISREG() check earlier
	This moves the S_ISREG() check even earlier than it was already.

  exec: Move path_noexec() check earlier
	This adds the path_noexec() check to the same place as the
	S_ISREG() check.

This patch (of 3):

Change uselib(2)' S_ISREG() error return to EACCES instead of EINVAL so
the behavior matches execve(2), and the seemingly documented value.  The
"not a regular file" failure mode of execve(2) is explicitly
documented[1], but it is not mentioned in uselib(2)[2] which does,
however, say that open(2) and mmap(2) errors may apply.  The documentation
for open(2) does not include a "not a regular file" error[3], but mmap(2)
does[4], and it is EACCES.

[1] http://man7.org/linux/man-pages/man2/execve.2.html#ERRORS
[2] http://man7.org/linux/man-pages/man2/uselib.2.html#ERRORS
[3] http://man7.org/linux/man-pages/man2/open.2.html#ERRORS
[4] http://man7.org/linux/man-pages/man2/mmap.2.html#ERRORSSigned-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: http://lkml.kernel.org/r/20200605160013.3954297-1-keescook@chromium.org
Link: http://lkml.kernel.org/r/20200605160013.3954297-2-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

db19c91c

coredump: add %f for executable filename · f38c85f1

由 Lepton Wu 提交于 8月 11, 2020

The document reads "%e" should be "executable filename" while actually it
could be changed by things like pr_ctl PR_SET_NAME. People who uses "%e"
in core_pattern get surprised when they find out they get thread name
instead of executable filename.

This is either a bug of document or a bug of code. Since the behavior of
"%e" is there for long time, it could bring another surprise for users if
we "fix" the code.

So we just "fix" the document. And more, for users who really need the
"executable filename" in core_pattern, we introduce a new "%f" for the
real executable filename. We already have "%E" for executable path in
kernel, so just reuse most of its code for the new added "%f" format.
Signed-off-by: NLepton Wu <ytht.net@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200701031432.2978761-1-ytht.net@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f38c85f1

fs/signalfd.c: fix inconsistent return codes for signalfd4 · a089e3fd

由 Helge Deller 提交于 8月 11, 2020

The kernel signalfd4() syscall returns different error codes when called
either in compat or native mode. This behaviour makes correct emulation
in qemu and testing programs like LTP more complicated.

Fix the code to always return -in both modes- EFAULT for unaccessible user
memory, and EINVAL when called with an invalid signal mask.
Signed-off-by: NHelge Deller <deller@gmx.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Laurent Vivier <laurent@vivier.eu>
Link: http://lkml.kernel.org/r/20200530100707.GA10159@ls3530.fritz.boxSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a089e3fd

fat: fix fat_ra_init() for data clusters == 0 · a090a5a7

由 OGAWA Hirofumi 提交于 8月 11, 2020

If data clusters == 0, fat_ra_init() calls the ->ent_blocknr() for the
cluster beyond ->max_clusters.

This checks the limit before initialization to suppress the warning.

Reported-by: syzbot+756199124937b31a9b7e@syzkaller.appspotmail.com
Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/87mu462sv4.fsf@mail.parknet.co.jpSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a090a5a7

VFAT/FAT/MSDOS FILESYSTEM: replace HTTP links with HTTPS ones · 4ecfed61

由 Alexander A. Klimov 提交于 8月 11, 2020

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `xmlns`:
        For each link, `http://[^# 	]*(?:\w|/)`:
	  If neither `gnu\.org/license`, nor `mozilla\.org/MPL`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.
Signed-off-by: NAlexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Link: http://lkml.kernel.org/r/20200708200409.22293-1-grandmaster@al2klimov.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4ecfed61

fatfs: switch write_lock to read_lock in fat_ioctl_get_attributes · e348e65a

由 Yubo Feng 提交于 8月 11, 2020

There is no need to hold write_lock in fat_ioctl_get_attributes.
write_lock may make an impact on concurrency of fat_ioctl_get_attributes.
Signed-off-by: NYubo Feng <fengyubo3@huawei.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Link: http://lkml.kernel.org/r/1593308053-12702-1-git-send-email-fengyubo3@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e348e65a

fs/ufs: avoid potential u32 multiplication overflow · 88b2e9b0

由 Colin Ian King 提交于 8月 11, 2020

The 64 bit ino is being compared to the product of two u32 values,
however, the multiplication is being performed using a 32 bit multiply so
there is a potential of an overflow.  To be fully safe, cast uspi->s_ncg
to a u64 to ensure a 64 bit multiplication occurs to avoid any chance of
overflow.

Fixes: f3e2a520 ("ufs: NFS support")
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Link: http://lkml.kernel.org/r/20200715170355.1081713-1-colin.king@canonical.com
Addresses-Coverity: ("Unintentional integer overflow")
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

88b2e9b0

nilfs2: use a more common logging style · a1d0747a

由 Joe Perches 提交于 8月 11, 2020

Add macros for nilfs_<level>(sb, fmt, ...) and convert the uses of
'nilfs_msg(sb, KERN_<LEVEL>, ...)' to 'nilfs_<level>(sb, ...)' so nilfs2
uses a logging style more like the typical kernel logging style.

Miscellanea:

o Realign arguments for these uses
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1595860111-3920-4-git-send-email-konishi.ryusuke@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a1d0747a

nilfs2: convert __nilfs_msg to integrate the level and format · 2987a4cf

由 Joe Perches 提交于 8月 11, 2020

Reduce object size a bit by removing the KERN_<LEVEL> as a separate
argument and adding it to the format string.

Reduce overall object size by about ~.5% (x86-64 defconfig w/ nilfs2)

old:
$ size -t fs/nilfs2/built-in.a | tail -1
 191738	   8676	     44	 200458	  30f0a	(TOTALS)

new:
$ size -t fs/nilfs2/built-in.a | tail -1
 190971	   8676	     44	 199691	  30c0b	(TOTALS)
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1595860111-3920-3-git-send-email-konishi.ryusuke@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2987a4cf

nilfs2: only call unlock_new_inode() if I_NEW · 1b0e3186