Commits · 919e2bb8b63c6084593d7e77d71351e02b941aab · openeuler / Kernel

05 May, 2020 9 commits

docs: filesystems: convert configfs.txt to ReST · 98264991

Committed by Mauro Carvalho Chehab on Apr 27, 2020

- Add a SPDX header;
- Adjust document and section titles;
- Use copyright symbol;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Also, as this file is alone on its own dir, and it doesn't
seem too likely that other documents will follow it, let's
move it to the filesystems/ root documentation dir.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/c2424ec2ad4d735751434ff7f52144c44aa02d5a.1588021877.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

docs: filesystems: convert mandatory-locking.txt to ReST · a02dcdf6

Committed by Mauro Carvalho Chehab on Apr 27, 2020

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Use notes markups;
- Add it to filesystems/index.rst.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/aecd6259fe9f99b2c2b3440eab6a2b989125e00d.1588021877.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

docs: filesystems: convert coda.txt to ReST · f476c6ed

Committed by Mauro Carvalho Chehab on Apr 27, 2020

This document has its own style. It seems to be print output
for old dot-matrix printers, where backspace was used to
do double prints.

For the conversion, I used several regular expressions to get
rid of some weird stuff. The patch also does almost all possible
conversions in order to get a nice output document, while keeping
it readable/editable as is:

- Add a SPDX header;
- Add a document title;
- Adjust document title;
- Adjust section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Adjust list markups;
- Mark some unnumbered titles with bold font;
- Use footnote markups;
- Add table markups;
- Use notes markups;
- Add it to filesystems/index.rst.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/25c06c40c3d7b947a131c3be124ce0e93cc00ae3.1588021877.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
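
The overstrike cleanup mentioned in the coda.txt commit can be illustrated with a small sketch. These are not the regular expressions the author used (the commit does not show them). The sketch assumes the common printer convention where a double print is encoded as a character, a backspace, and the same character again; the helper name `strip_overstrike` is hypothetical.

```python
import re

# Assumption: a "double print" is stored as <char><backspace><char>.
# Removing every character that is immediately followed by a backspace
# (together with the backspace itself) leaves the text printed once.
_OVERSTRIKE = re.compile(r".\x08")


def strip_overstrike(text: str) -> str:
    """Return text with backspace-overstrike pairs collapsed."""
    return _OVERSTRIKE.sub("", text)


if __name__ == "__main__":
    sample = "N\x08NA\x08AM\x08ME\x08E  coda"
    print(strip_overstrike(sample))
```

Text that contains no backspace characters passes through unchanged, so a helper like this could be run over a whole file before the manual ReST markup is added.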

docs: filesystems: caching/backend-api.txt: convert it to ReST · 0e822145

Committed by Mauro Carvalho Chehab on Apr 27, 2020

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/caching/index.rst.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/5d0a61abaa87bfe913b9e2f321e74ef7af0f3dfc.1588021877.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

docs: filesystems: caching/cachefiles.txt: convert to ReST · d74802ad

Committed by Mauro Carvalho Chehab on Apr 27, 2020

- Add a SPDX header;
- Adjust document title;
- Mark literal blocks as such;
- Add table markups;
- Comment out text ToC for html/pdf output;
- Add lists markups;
- Add it to filesystems/caching/index.rst.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/eec0cfc268e8dca348f760224685100c9c2caba6.1588021877.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

docs: filesystems: caching/operations.txt: convert it to ReST · 09eac7c5

Committed by Mauro Carvalho Chehab on Apr 27, 2020

- Add a SPDX header;
- Adjust document and section titles;
- Comment out text ToC for html/pdf output;
- Mark literal blocks as such;
- Add it to filesystems/caching/index.rst.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/97e71cc598a4f61df484ebda3ec06b63530ceb62.1588021877.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

docs: filesystems: caching/netfs-api.txt: convert it to ReST · efc930fa

Committed by Mauro Carvalho Chehab on Apr 27, 2020

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/caching/index.rst.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/cfe4cb1bf8e1f0093d44c30801ec42e74721e543.1588021877.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

docs: filesystems: convert caching/fscache.txt to ReST format · fd299b2a

Committed by Mauro Carvalho Chehab on Apr 27, 2020

- Add a SPDX header;
- Adjust document and section titles;
- Comment out text ToC for html/pdf output;
- Some whitespace fixes and new line breaks;
- Add table markups;
- Add it to filesystems/index.rst.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/e33ec382a53cf10ffcbd802f6de3f384159cddba.1588021877.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

docs: filesystems: convert caching/object.txt to ReST · 67145c23

Committed by Mauro Carvalho Chehab on Apr 27, 2020

- Add a SPDX header;
- Adjust document and section titles;
- Comment out text ToC for html/pdf output;
- Some whitespace fixes and new line breaks;
- Adjust the events list to make them look better for html output;
- Add it to filesystems/index.rst.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/49026a8ea7e714c2e0f003aa26b975b1025476b7.1588021877.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

21 Apr, 2020 3 commits

fs: inode.c: get rid of docs warnings · 2b8e8b55

Committed by Mauro Carvalho Chehab on Apr 14, 2020

Using *foo makes the toolchain think that this is emphasis, causing
these warnings:

	./fs/inode.c:1609: WARNING: Inline emphasis start-string without end-string.
	./fs/inode.c:1609: WARNING: Inline emphasis start-string without end-string.
	./fs/inode.c:1615: WARNING: Inline emphasis start-string without end-string.

So use ``*foo`` instead, in order to mark it as an inline literal.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/e8da46a0e57f2af6d63a0c53665495075698e28a.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
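
The kind of rewrite this commit applies by hand can be sketched as a text filter. This is only an illustration and not part of the kernel documentation toolchain: the function name `literalize_pointers` and the regular expression are assumptions, and the pattern only handles simple `*identifier` references in comment text.

```python
import re

# A bare "*name" that is not already inside an inline literal and is not part
# of *emphasis* or **strong** markup. The lookbehind rejects a preceding
# backtick, word character or asterisk; the lookahead rejects a closing
# asterisk or backtick.
_BARE_POINTER = re.compile(r"(?<![`\w*])\*(\w+)\b(?![*`])")


def literalize_pointers(comment: str) -> str:
    """Wrap bare *name references in double backticks (ReST inline literal)."""
    return _BARE_POINTER.sub(r"``*\1``", comment)


if __name__ == "__main__":
    print(literalize_pointers("stores the result in *time if it changed"))
```

Text that is already marked up, such as ``*time`` or *emphasis*, is left alone, so the filter can be applied more than once without stacking backticks.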

docs: filesystems: fix renamed references · 0c1bc6b8

Committed by Mauro Carvalho Chehab on Apr 14, 2020

Some filesystem references got broken by a previous patch
series I submitted. Address those.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: David Sterba <dsterba@suse.com> # fs/affs/Kconfig
Link: https://lore.kernel.org/r/57318c53008dbda7f6f4a5a9e5787f4d37e8565a.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

docs: fix broken references to text files · 72ef5e52

Committed by Mauro Carvalho Chehab on Apr 14, 2020

Several references got broken due to txt to ReST conversion.

Several of them can be automatically fixed with:

scripts/documentation-file-ref-check --fix

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> # hwtracing/coresight/Kconfig
Reviewed-by: Paul E. McKenney <paulmck@kernel.org> # memory-barrier.txt
Acked-by: Alex Shi <alex.shi@linux.alibaba.com> # translations/zh_CN
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it> # translations/it_IT
Acked-by: Marc Zyngier <maz@kernel.org> # kvm/arm64
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6f919ddb83a33b5f2a63b6b5f0575737bb2b36aa.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

11 Apr, 2020 7 commits

pNFS: Fix RCU lock leakage · 27d231c0

Committed by Trond Myklebust on Apr 11, 2020

Another brown paper bag moment. pnfs_alloc_ds_commits_list() is leaking
the RCU lock.

Fixes: a9901899 ("pNFS: Add infrastructure for cleaning up per-layout commit structures")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

fs/seq_file.c: seq_read(): add info message about buggy .next functions · 3bfa7e14

Committed by Vasily Averin on Apr 10, 2020

Patch series "seq_file .next functions should increase position index".

In Aug 2018, NeilBrown noted in commit 1f4aace6 ("fs/seq_file.c:
simplify seq_file iteration code and interface"):

"Some ->next functions do not increment *pos when they return NULL...
Note that such ->next functions are buggy and should be fixed.  A simple
demonstration is

  dd if=/proc/swaps bs=1000 skip=1

Choose any block size larger than the size of /proc/swaps.  This will
always show the whole last line of /proc/swaps"

The described problem still exists.  If you lseek into the middle of the
last output line, the following read will output the end of the last line
and then the whole last line once again.

  $ dd if=/proc/swaps bs=1  # usual output
  Filename				Type		Size	Used	Priority
  /dev/dm-0                             partition	4194812	97536	-2
  104+0 records in
  104+0 records out
  104 bytes copied

  $ dd if=/proc/swaps bs=40 skip=1    # last line was generated twice
  dd: /proc/swaps: cannot skip to specified offset
  v/dm-0                                partition	4194812	97536	-2
  /dev/dm-0                             partition	4194812	97536	-2
  3+1 records in
  3+1 records out
  131 bytes copied

There are a lot of other affected files; I've found 30+, including
/proc/net/ip_tables_matches and /proc/sysvipc/*.

I've already sent patches to the mailing lists of the affected subsystems.
This patch set fixes the problem in files related to pstore, tracing, gcov,
sysvipc and other subsystems handled via the linux-kernel@ mailing list
directly.

https://bugzilla.kernel.org/show_bug.cgi?id=206283

This patch (of 4):

Add debug code to seq_read() to detect missed or out-of-tree incorrect
.next seq_file functions.

[akpm@linux-foundation.org: s/pr_info/pr_info_ratelimited/, per Qian Cai]
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: NeilBrown <neilb@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Waiman Long <longman@redhat.com>
Link: http://lkml.kernel.org/r/244674e5-760c-86bd-d08a-047042881748@virtuozzo.com
Link: http://lkml.kernel.org/r/7c24087c-e280-e580-5b0c-0cdaeb14cd18@virtuozzo.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
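
The duplicated last line can be reproduced with a toy model of a record iterator. This is a deliberately simplified sketch, not the kernel's seq_read() or traverse() code: the names `start`, `make_next` and `read_after_seek` are hypothetical, and the model only assumes that the reader resumes from a saved position index that the ->next stand-in is expected to advance.

```python
def start(records, pos):
    """Return the record at index pos, or None past the end (like ->start)."""
    return records[pos] if pos < len(records) else None


def make_next(records, bump_at_end):
    """Build a ->next stand-in; a buggy one leaves pos untouched at the end."""
    def next_(pos):
        if pos + 1 < len(records):
            return records[pos + 1], pos + 1
        return None, (pos + 1 if bump_at_end else pos)
    return next_


def read_after_seek(records, offset, bump_at_end):
    """Seek to a byte offset, then read everything that follows."""
    next_ = make_next(records, bump_at_end)
    out, consumed, pos = [], 0, 0
    item = start(records, pos)
    # Walk forward to the record that contains the offset and emit its tail.
    while item is not None:
        hit = consumed + len(item) > offset
        if hit:
            out.append(item[offset - consumed:])
        consumed += len(item)
        item, pos = next_(pos)
        if hit:
            break
    # Resume from the saved position index, as a later read would.
    item = start(records, pos)
    while item is not None:
        out.append(item)
        prev = pos
        item, pos = next_(pos)
        if pos == prev:  # reader-side guard so the toy loop terminates
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    swaps = ["Filename Type\n", "/dev/dm-0 partition\n"]
    offset = len(swaps[0]) + 3  # three bytes into the last line
    print(repr(read_after_seek(swaps, offset, bump_at_end=False)))
    print(repr(read_after_seek(swaps, offset, bump_at_end=True)))
```

With the buggy stand-in, the saved index still points at the last record after its tail has been emitted, so resuming shows that record a second time. With a stand-in that advances the index even when it returns None, only the tail is produced.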

change email address for Pali Rohár · 149ed3d4

Committed by Pali Rohár on Apr 10, 2020

For security reasons I stopped using my gmail account, and my kernel.org
address is now an up-to-date alias to my personal address.

People periodically send me emails at the address they found in the source
code of drivers, so this change reflects where people can contact
me.

[ Added .mailmap entry as per Joe Perches  - Linus ]
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Link: http://lkml.kernel.org/r/20200307104237.8199-1-pali@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() · 26c5d78c

Committed by Eric Biggers on Apr 10, 2020

After request_module(), nothing is stopping the module from being
unloaded until someone takes a reference to it via try_module_get().

The WARN_ONCE() in get_fs_type() is thus user-reachable, via userspace
running 'rmmod' concurrently.

Since WARN_ONCE() is for kernel bugs only, not for user-reachable
situations, downgrade this warning to pr_warn_once().

Keep it printed once only, since the intent of this warning is to detect
a bug in modprobe at boot time.  Printing the warning more than once
wouldn't really provide any useful extra information.

Fixes: 41124db8 ("fs: warn in case userspace lied about modprobe return")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jessica Yu <jeyu@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: NeilBrown <neilb@suse.com>
Cc: <stable@vger.kernel.org>		[4.13+]
Link: http://lkml.kernel.org/r/20200312202552.241885-3-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: no need try to truncate file beyond i_size · 783fda85

Committed by Changwei Ge on Apr 10, 2020

With Linux fallocate(2) in FALLOC_FL_PUNCH_HOLE mode, the offset can
exceed the inode size.  Ocfs2 currently doesn't allow an offset beyond the
inode size.  This restriction is not necessary and violates fallocate(2)
semantics.

If the fallocate(2) offset is beyond the inode size, just return success and
do nothing further.

Otherwise, ocfs2 will crash the kernel.

  kernel BUG at fs/ocfs2//alloc.c:7264!
   ocfs2_truncate_inline+0x20f/0x360 [ocfs2]
   ocfs2_remove_inode_range+0x23c/0xcb0 [ocfs2]
   __ocfs2_change_file_space+0x4a5/0x650 [ocfs2]
   ocfs2_fallocate+0x83/0xa0 [ocfs2]
   vfs_fallocate+0x148/0x230
   SyS_fallocate+0x48/0x80
   do_syscall_64+0x79/0x170
Signed-off-by: Changwei Ge <chge@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200407082754.17565-1-chge@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
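
The intended behaviour (succeed and do nothing when the hole starts at or past the end of the file) can be sketched on an in-memory buffer. This models only the semantics described in the commit message, not ocfs2's extent handling; the function `punch_hole` and the use of a `bytearray` as the "file" are assumptions made for the illustration.

```python
def punch_hole(data: bytearray, offset: int, length: int) -> int:
    """Zero data[offset:offset+length] without changing the size; return 0."""
    size = len(data)
    if offset >= size:
        # Nothing is stored at or beyond the end: report success, touch nothing.
        return 0
    end = min(offset + length, size)
    data[offset:end] = bytes(end - offset)  # same-length slice keeps the size
    return 0


if __name__ == "__main__":
    buf = bytearray(b"abcdef")
    print(punch_hole(buf, 10, 4), buf)  # offset beyond the end
    print(punch_hole(buf, 4, 10), buf)  # hole clipped to the current size
```

The early return is the part that corresponds to the fix: the range beyond the end is never handed to the truncation path.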

hfsplus: fix crash and filesystem corruption when deleting files · 25efb2ff

Committed by Simon Gander on Apr 10, 2020

When removing files containing extended attributes, the hfsplus driver may
remove the wrong entries from the attributes b-tree, causing major
filesystem damage and in some cases even kernel crashes.

To remove a file, all its extended attributes have to be removed as well.
The driver does this by looking up all keys in the attributes b-tree with
the cnid of the file. Each of these entries then gets deleted using the
key used for searching, which doesn't contain the attribute's name when it
should. Since the key doesn't contain the name, the deletion routine will
not find the correct entry and instead remove the one in front of it. If
parent nodes have to be modified, these become corrupt as well. This
causes invalid links and unsorted entries that not even macOS's fsck_hfs
is able to fix.

To fix this, modify the search key before an entry is deleted from the
attributes b-tree by copying the found entry's key into the search key,
therefore ensuring that the correct entry gets removed from the tree.
Signed-off-by: Simon Gander <simon@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200327155541.1521-1-simon@tuxera.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
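
The failure mode described in the hfsplus commit can be modelled with a sorted list standing in for the attributes b-tree. This is a toy sketch under stated assumptions, not the hfsplus code: keys are `(cnid, name)` tuples, and the hypothetical `delete_by_key` removes whatever entry a "last key less than or equal to the search key" lookup lands on.

```python
import bisect


def find_le(keys, key):
    """Index of the last entry whose key is <= key, or -1 if there is none."""
    return bisect.bisect_right(keys, key) - 1


def delete_by_key(keys, key):
    """Remove the entry the lookup lands on (the toy deletion routine)."""
    i = find_le(keys, key)
    return keys.pop(i) if i >= 0 else None


def remove_attrs(keys, cnid, copy_found_key):
    """Delete every (cnid, name) entry of one file from the sorted key list."""
    removed = []
    for found in [k for k in keys if k[0] == cnid]:
        # Buggy variant: keep using the name-less search key.
        # Fixed variant: copy the found entry's key into the search key.
        search = found if copy_found_key else (cnid, "")
        removed.append(delete_by_key(keys, search))
    return removed


if __name__ == "__main__":
    tree = [(16, "a"), (16, "b"), (17, "x"), (17, "y"), (18, "z")]
    print(remove_attrs(list(tree), 17, copy_found_key=False))
    print(remove_attrs(list(tree), 17, copy_found_key=True))
```

In the buggy variant the name-less key sorts just before the file's first attribute, so each deletion lands on an entry in front of it that belongs to another file. Copying the found key makes the lookup hit the intended entry.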

smb3: enable swap on SMB3 mounts · 4e8aea30

Committed by Steve French on Apr 09, 2020

Add experimental support for allowing a swap file to be on an SMB3
mount.  There are use cases where swapping over a secure network
filesystem is preferable. In some cases there are no local
block devices large enough, and network block devices can be
hard to set up and secure.  And in some cases there are no
local block devices at all (e.g. with the recent addition of
remote boot over SMB3 mounts).

There are various enhancements that can be added later e.g.:
- doing a mandatory byte range lock over the swapfile (until
the Linux VFS is modified to notify the file system that an open
is for a swapfile, when the file can be opened "DENY_ALL" to prevent
others from opening it).
- pinning more buffers in the underlying transport to minimize memory
allocations in the TCP stack under the fs
- documenting how to create ACLs (on the server) to secure the
swapfile (or adding additional tools to cifs-utils to make it easier)
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>

10 Apr, 2020 3 commits

io_uring: punt final io_ring_ctx wait-and-free to workqueue · 85faa7b8

Committed by Jens Axboe on Apr 09, 2020

We can't reliably wait in io_ring_ctx_wait_and_kill(), since the
task_works list isn't ordered (in fact it's LIFO ordered). We could
either fix this with a separate task_works list for io_uring work, or
just punt the wait-and-free to async context. This ensures that
task_work that comes in while we're shutting down is processed
correctly. If we don't go async, we could have work past the fput()
work for the ring that depends on work that won't be executed until
after we're done with the wait-and-free. But as this operation is
blocking, it'll never get a chance to run.

This was reproduced with hundreds of thousands of sockets running
memcached; I haven't been able to reproduce it synthetically.
Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

smb3: change noisy error message to FYI · 1dc94b73

Committed by Steve French on Apr 09, 2020

The noisy posix error message in readdir was supposed
to be an FYI (not enabled by default)
  CIFS VFS: XXX dev 66306, reparse 0, mode 755
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>

proc: Use a dedicated lock in struct pid · 63f818f4

Committed by Eric W. Biederman on Apr 07, 2020

syzbot wrote:
> ========================================================
> WARNING: possible irq lock inversion dependency detected
> 5.6.0-syzkaller #0 Not tainted
> --------------------------------------------------------
> swapper/1/0 just changed the state of lock:
> ffffffff898090d8 (tasklist_lock){.+.?}-{2:2}, at: send_sigurg+0x9f/0x320 fs/fcntl.c:840
> but this lock took another, SOFTIRQ-unsafe lock in the past:
>  (&pid->wait_pidfd){+.+.}-{2:2}
>
>
> and interrupts could create inverse lock ordering between them.
>
>
> other info that might help us debug this:
>  Possible interrupt unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&pid->wait_pidfd);
>                                local_irq_disable();
>                                lock(tasklist_lock);
>                                lock(&pid->wait_pidfd);
>   <Interrupt>
>     lock(tasklist_lock);
>
>  *** DEADLOCK ***
>
> 4 locks held by swapper/1/0:

The problem is that, because wait_pidfd.lock is taken under the tasklist
lock, it must always be taken with irqs disabled: tasklist_lock can be
taken from interrupt context, and if wait_pidfd.lock was already taken this
would create a lock order inversion.

Oleg suggested just disabling irqs where I have added extra calls to
wait_pidfd.lock.  That should be safe and I think the code will eventually
do that.  It was rightly pointed out by Christian that sharing the
wait_pidfd.lock was a premature optimization.

It is also true that my pre-merge window testing was insufficient.  So
remove the premature optimization and give struct pid a dedicated lock of
its own for struct pid things.  I have verified that lockdep sees all 3
paths where we take the new pid->lock and lockdep does not complain.

It is my current day dream that one day pid->lock can be used to guard the
task lists as well and then the tasklist_lock won't need to be held to
deliver signals.  That will require taking pid->lock with irqs disabled.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/00000000000011d66805a25cd73f@google.com/
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Reported-by: syzbot+343f75cdeea091340956@syzkaller.appspotmail.com
Reported-by: syzbot+832aabf700bc3ec920b9@syzkaller.appspotmail.com
Reported-by: syzbot+f675f964019f884dbd0f@syzkaller.appspotmail.com
Reported-by: syzbot+a9fb1457d720a55d6dc5@syzkaller.appspotmail.com
Fixes: 7bc3e6e5 ("proc: Use a list of inodes to flush from proc")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

09 Apr, 2020 1 commit

io_uring: fix fs cleanup on cqe overflow · c398ecb3

Committed by Pavel Begunkov on Apr 09, 2020

If completion queue overflow occurs, __io_cqring_fill_event() will
update req->cflags, which is in a union with req->work and happens to
be aliased to req->work.fs. The following io_free_req() ->
io_req_work_drop_env() may then hit a range of problems (miscounted
fs->users, segfault, etc.) when cleaning up @fs.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
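
The aliasing problem can be shown in miniature with `ctypes`. The layout below is a hypothetical two-member union, not the real struct io_kiocb; it only illustrates that a store to one union member overwrites bytes of another member that starts at the same address.

```python
import ctypes


class Work(ctypes.Structure):
    # Stand-in for req->work; only the pointer-sized fs member is modelled.
    _fields_ = [("fs", ctypes.c_void_p)]


class Req(ctypes.Union):
    # cflags and work share storage, as in the union described above.
    _fields_ = [("cflags", ctypes.c_uint32), ("work", Work)]


req = Req()
req.work.fs = 0x12345678      # pretend this is a valid pointer
before = req.work.fs
req.cflags = 1                # the completion path stores its flags
after = req.work.fs           # the "pointer" no longer holds the old value

if __name__ == "__main__":
    print(hex(before), hex(after))
```

Any cleanup code that later follows the clobbered pointer would operate on garbage, which is consistent with the miscounts and crashes listed in the commit message.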

08 Apr, 2020 17 commits

io_uring: don't read user-shared sqe flags twice · 9c280f90

Committed by Pavel Begunkov on Apr 08, 2020

Don't re-read the userspace-shared sqe->flags; a second read can be
exploited. sqe->flags are copied into req->flags in io_submit_sqe(), so
check them there instead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
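
The double-fetch hazard behind this commit can be sketched with a stand-in for memory that userspace may rewrite between two reads. Everything here is hypothetical (the `SharedSqe` class, the `VALID_FLAGS` mask, and both submit functions); the sketch only shows why validating one read and then using a second read is unsafe.

```python
VALID_FLAGS = 0x3F  # hypothetical mask of flag bits that are accepted


class SharedSqe:
    """Stand-in for memory shared with userspace: each read may differ."""

    def __init__(self, *reads):
        self._reads = iter(reads)

    @property
    def flags(self):
        return next(self._reads)


def submit_double_fetch(sqe):
    if sqe.flags & ~VALID_FLAGS:   # first read: validation
        raise ValueError("invalid flags")
    return sqe.flags               # second read: the value actually used


def submit_single_fetch(sqe):
    flags = sqe.flags              # read once into private storage
    if flags & ~VALID_FLAGS:
        raise ValueError("invalid flags")
    return flags


if __name__ == "__main__":
    print(hex(submit_double_fetch(SharedSqe(0x01, 0x80))))
    print(hex(submit_single_fetch(SharedSqe(0x01, 0x80))))
```

The first variant ends up using a value that never passed the check, while the second can only use the value it validated.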

io_uring: remove req init from io_get_req() · 0553b8bd

Committed by Pavel Begunkov on Apr 08, 2020

io_get_req() does two different things: io_kiocb allocation and
initialisation. Move the init part out of it and rename it to
io_alloc_req(). It's simpler this way and also has better data
locality.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: alloc req only after getting sqe · b1e50e54

Committed by Pavel Begunkov on Apr 08, 2020

As io_get_sqe() is split into two stages, get and consume, get an sqe
before allocating io_kiocb, so no free_req*() is needed for the failure
case, and inline __io_req_do_free() back, as it has only 1 user.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: simplify io_get_sqring · 709b302f

Committed by Pavel Begunkov on Apr 08, 2020

Make io_get_sqring() care only about the sqes themselves, not about
initialising the io_kiocb. Also, split it into get + consume, which will
be helpful in the future.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: do not always copy iovec in io_req_map_rw() · 45097dae

Committed by Xiaoguang Wang on Apr 08, 2020

In io_read_prep() or io_write_prep(), io_req_map_rw() takes
struct io_async_rw's fast_iov as the argument for io_import_iovec().
If io_import_iovec() uses struct io_async_rw's fast_iov as the
valid iovec array, io_req_map_rw() does not need to do the
memcpy operation later, because both are the same pointer.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: ensure openat sets O_LARGEFILE if needed · 08a1d26e

Committed by Jens Axboe on Apr 08, 2020

OPENAT2 correctly sets O_LARGEFILE if it has to, but that escaped the
OPENAT opcode. Dmitry reports that his test case that compares openat()
and IORING_OP_OPENAT sees failures on large files:

*** sync openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write succeeded

*** sync openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write succeeded

*** io_uring openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write failed: File too large

*** io_uring openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write failed: File too large

Ensure we set O_LARGEFILE, if force_o_largefile() is true.

Cc: stable@vger.kernel.org # v5.6
Fixes: 15b71abe ("io_uring: add support for IORING_OP_OPENAT")
Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
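
The test output in this commit follows from the large-file limit that applies when O_LARGEFILE is not set. The sketch below is a simplified model, not io_uring or VFS code: `prep_open_flags` and `write_check` are hypothetical helpers, the flag value is the x86 one used only as a marker bit, and the limit mirrors the 2^31 - 1 bound applied to files opened without large-file support.

```python
O_LARGEFILE = 0o100000        # x86 value; used here only as a marker bit
MAX_NON_LFS = (1 << 31) - 1   # offset limit without large-file support


def prep_open_flags(flags, force_o_largefile):
    """Add O_LARGEFILE when the platform forces it (the fixed behaviour)."""
    return flags | O_LARGEFILE if force_o_largefile else flags


def write_check(flags, offset):
    """Mimic the size check applied to writes on the opened file."""
    if not flags & O_LARGEFILE and offset >= MAX_NON_LFS:
        return "write failed: File too large"
    return "write succeeded"


if __name__ == "__main__":
    for forced in (False, True):
        flags = prep_open_flags(0, forced)
        print(forced, write_check(flags, 0), write_check(flags, 4294967296))
```

Without the flag, the write at offset 4294967296 is rejected as in the failing cases of the test output; with the flag set at open time, both writes pass.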

orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush · 0e393a9a

Committed by Mike Marshall on Apr 08, 2020

Christoph Hellwig noticed that we were doing some unnecessary
work in orangefs_flush:

  orangefs_flush just writes out data on every close(2) call.  There is
  no need to change anything about the dirty state, especially as
  orangefs doesn't treat I_DIRTY_TIMES special in any way.  The code
  seems to come from partially open coding vfs_fsync.

He sent in a patch with the above commit message and also a
patch that was a reversion of another Orangefs patch I had
sent upstream a while ago. I had to fix his reversion patch
so that it would compile which caused his "don't mess with
I_DIRTY_TIMES" patch to fail to apply. So here I have just
remade his patch and applied it after the fixed reversion patch.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>

orangefs: get rid of knob code... · ec95f1de

Committed by Mike Marshall on Apr 08, 2020

Christoph Hellwig sent in a reversion of "orangefs: remember count
when reading." because:

  ->read_iter calls can race with each other and one or
  more ->flush calls. Remove the scheme to store the read
  count in the file private data as it is completely racy and
  can cause use-after-free or double-free conditions

Christoph's reversion caused Orangefs not to work or to compile. I
added a patch that fixed that, but Intel's kbuild test robot pointed
out that sending Christoph's patch followed by my patch upstream
would break bisection because of the failure to compile. So I have
combined the reversion plus my patch... here's the commit message
that was in my patch:

  Logically, optimal Orangefs "pages" are 4 megabytes. Reading
  large Orangefs files 4096 bytes at a time is like trying to
  kick a dead whale down the beach. Before Christoph's "Revert
  orangefs: remember count when reading." I tried to give users
  a knob whereby they could, for example, use "count" in
  read(2) or bs with dd(1) to get whatever they considered an
  appropriate amount of bytes at a time from Orangefs and fill
  as many page cache pages as they could at once.

  Without the racy code that Christoph reverted Orangefs won't
  even compile, much less work. So this replaces the logic that
  used the private file data that Christoph reverted with
  a static number of bytes to read from Orangefs.

  I ran tests like the following to determine what a
  reasonable static number of bytes might be:

  dd if=/pvfsmnt/asdf of=/dev/null count=128 bs=4194304
  dd if=/pvfsmnt/asdf of=/dev/null count=256 bs=2097152
  dd if=/pvfsmnt/asdf of=/dev/null count=512 bs=1048576
                            .
                            .
                            .
  dd if=/pvfsmnt/asdf of=/dev/null count=4194304 bs=128

  Reads seem faster using the static number, so my "knob code"
  wasn't just racy, it wasn't even a good idea...
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Reported-by: kbuild test robot <lkp@intel.com>

smb3: smbdirect support can be configured by default · 2bcb4fd6

Committed by Steve French on Apr 07, 2020

smbdirect support (SMB3 over RDMA) should be enabled by
default in many configurations.

It is not experimental and is stable enough and has enough
performance benefits to recommend that it be configured by
default.  Change the "If unsure N" to "If unsure Y" in
the description of the configuration parameter.
Acked-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

reiserfs: clean up several indentation issues · 5404e7e0

Committed by Colin Ian King on Apr 06, 2020

There are several places where code is indented incorrectly. Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200325135018.113431-1-colin.king@canonical.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path · aa0d1564

Committed by Alexey Dobriyan on Apr 06, 2020

Static executables don't need to free a NULL pointer.

It doesn't really matter, because a static executable is not a common
scenario, but do it anyway out of pedantry.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200219185330.GA4933@avx2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/binfmt_elf.c: allocate less for static executable · 0693ffeb

Committed by Alexey Dobriyan on Apr 06, 2020

PT_INTERP ELF header can be spared if executable is static.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200219185012.GB4871@avx2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/binfmt_elf.c: delete "loc" variable · c69bcc93

Committed by Alexey Dobriyan on Apr 06, 2020

"loc" variable became just a wrapper for PT_INTERP ELF header after main
ELF header was moved to "bprm->buf". Delete it.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200219184847.GA4871@avx2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/epoll: make nesting accounting safe for -rt kernel · efcdd350

Committed by Jason Baron on Apr 06, 2020

Davidlohr Bueso pointed out that when CONFIG_DEBUG_LOCK_ALLOC is set
ep_poll_safewake() can take several non-raw spinlocks after disabling
interrupts.  Since a spinlock can block in the -rt kernel, we can't take a
spinlock after disabling interrupts.  So let's re-work how we determine
the nesting level such that it plays nicely with the -rt kernel.

Let's introduce a 'nests' field in struct eventpoll that records the
current nesting level during ep_poll_callback().  Then, if we nest again
we can find the previous struct eventpoll that we were called from and
increase our count by 1.  The 'nests' field is protected by
ep->poll_wait.lock.

I've also moved the visited field to reduce the size of struct eventpoll
from 184 bytes to 176 bytes on x86_64 for !CONFIG_DEBUG_LOCK_ALLOC, which
is typical for a production config.
Reported-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/1582739816-13167-1-git-send-email-jbaron@akamai.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>