提交 · aaa3aef34d3ab9499a5c7633823429f7a24e6dff · openeuler / Kernel

01 6月, 2020 3 次提交

cifs: set up next DFS target before generic_ip_connect() · aaa3aef3

由 Paulo Alcantara 提交于 5月 19, 2020

If we mount a very specific DFS link

    \\FS0.FOO.COM\dfs\link -> \FS0\share1, \FS1\share2

where its target list contains NB names ("FS0" & "FS1") rather than
FQDN ones ("FS0.FOO.COM" & "FS1.FOO.COM"), we end up connecting to
\FOO\share1 but server->hostname will have "FOO.COM".  The reason is
because both "FS0" and "FS0.FOO.COM" resolve to same IP address and
they share same TCP server connection, but "FS0.FOO.COM" was the first
hostname set -- which is OK.

However, if the echo thread timeouts and we still have a good
connection to "FS0", in cifs_reconnect()

    rc = generic_ip_connect(server) -> success
    if (rc) {
            ...
            reconn_inval_dfs_target(server, cifs_sb, &tgt_list,
	                            &tgt_it);
            ...
     }
     ...

it successfully reconnects to "FS0" server but does not set up next
DFS target - which should be the same target server "\FS0\share1" -
and server->hostname remains set to "FS0.FOO.COM" rather than "FS0",
as reconn_inval_dfs_target() would have it set to "FS0" if called
earlier.

Finally, in __smb2_reconnect(), the reconnect of tcons would fail
because tcon->ses->server->hostname (FS0.FOO.COM) does not match DFS
target's hostname (FS0).

Fix that by calling reconn_inval_dfs_target() before
generic_ip_connect() so server->hostname will get updated correctly
prior to reconnecting its tcons in __smb2_reconnect().

With "cifs: handle hostnames that resolve to same ip in failover"
patch

    - The above problem would not occur.
    - We could save an DNS query to find out that they both resolve to
      the same ip address.
Signed-off-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

aaa3aef3

cifs: remove redundant initialization of variable rc · 136a5dc3

由 Colin Ian King 提交于 5月 27, 2020

The variable rc is being initialized with a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

136a5dc3

cifs: handle "nolease" option for vers=1.0 · 8fd6e1d6

由 Kenneth D'souza 提交于 5月 18, 2020

The "nolease" mount option is only supported for SMB2+ mounts.
Fail with appropriate error message if vers=1.0 option is passed.
Signed-off-by: NKenneth D'souza <kdsouza@redhat.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

8fd6e1d6

29 5月, 2020 2 次提交

gfs2: Even more gfs2_find_jhead fixes · 20be493b

由 Andreas Gruenbacher 提交于 5月 26, 2020

Fix several issues in the previous gfs2_find_jhead fix:
* When updating @blocks_submitted, @block refers to the first block block not
  submitted yet, not the last block submitted, so fix an off-by-one error.
* We want to ensure that @blocks_submitted is far enough ahead of @blocks_read
  to guarantee that there is in-flight I/O.  Otherwise, we'll eventually end up
  waiting for pages that haven't been submitted, yet.
* It's much easier to compare the number of blocks added with the number of
  blocks submitted to limit the maximum bio size.
* Even with bio chaining, we can keep adding blocks until we reach the maximum
  bio size, as long as we stop at a page boundary.  This simplifies the logic.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: NBob Peterson <rpeterso@redhat.com>

20be493b

fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info() · 1d605416

由 Alexander Potapenko 提交于 5月 27, 2020

KMSAN reported uninitialized data being written to disk when dumping
core.  As a result, several kilobytes of kmalloc memory may be written
to the core file and then read by a non-privileged user.
Reported-by: Nsam <sunhaoyl@outlook.com>
Signed-off-by: NAlexander Potapenko <glider@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NKees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200419100848.63472-1-glider@google.com
Link: https://github.com/google/kmsan/issues/76Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1d605416

28 5月, 2020 1 次提交

fanotify: turn off support for FAN_DIR_MODIFY · f1793699

由 Amir Goldstein 提交于 5月 27, 2020

FAN_DIR_MODIFY has been enabled by commit 44d705b0 ("fanotify:
report name info for FAN_DIR_MODIFY event") in 5.7-rc1. Now we are
planning further extensions to the fanotify API and during that we
realized that FAN_DIR_MODIFY may behave slightly differently to be more
consistent with extensions we plan. So until we finalize these
extensions, let's not bind our hands with exposing FAN_DIR_MODIFY to
userland.
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>

f1793699

27 5月, 2020 1 次提交

ceph: flush release queue when handling caps for unknown inode · fb33c114

由 Jeff Layton 提交于 5月 20, 2020

It's possible for the VFS to completely forget about an inode, but for
it to still be sitting on the cap release queue. If the MDS sends the
client a cap message for such an inode, it just ignores it today, which
can lead to a stall of up to 5s until the cap release queue is flushed.

If we get a cap message for an inode that can't be located, then go
ahead and flush the cap release queue.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/45532
Fixes: 1e9c2eb6 ("ceph: delete stale dentry when last reference is dropped")
Reported-and-Tested-by: NAndrej Filipčič <andrej.filipcic@ijs.si>
Suggested-by: NYan, Zheng <zyan@redhat.com>
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

fb33c114

23 5月, 2020 1 次提交

rxrpc: Fix a warning · 8a1d24e1

由 David Howells 提交于 5月 22, 2020

Fix a warning due to an uninitialised variable.

le included from ../fs/afs/fs_probe.c:11:
../fs/afs/fs_probe.c: In function 'afs_fileserver_probe_result':
../fs/afs/internal.h:1453:2: warning: 'rtt_us' may be used uninitialized in this function [-Wmaybe-uninitialized]
 1453 |  printk("[%-6.6s] "FMT"\n", current->comm ,##__VA_ARGS__)
      |  ^~~~~~
../fs/afs/fs_probe.c:35:15: note: 'rtt_us' was declared here
Signed-off-by: NDavid Howells <dhowells@redhat.com>

8a1d24e1

22 5月, 2020 1 次提交

exfat: add the dummy mount options to be backward compatible with staging/exfat · 907fa893

由 Namjae Jeon 提交于 5月 22, 2020

As Ubuntu and Fedora release new version used kernel version equal to or
higher than v5.4, They started to support kernel exfat filesystem.

Linus reported a mount error with new version of exfat on Fedora:

        exfat: Unknown parameter 'namecase'

This is because there is a difference in mount option between old
staging/exfat and new exfat.  And utf8, debug, and codepage options as
well as namecase have been removed from new exfat.

This patch add the dummy mount options as deprecated option to be
backward compatible with old one.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

907fa893

21 5月, 2020 1 次提交

pipe: Fix pipe_full() test in opipe_prep(). · 566d1362

由 Tetsuo Handa 提交于 5月 20, 2020

syzbot is reporting that splice()ing from non-empty read side to
already-full write side causes unkillable task, for opipe_prep() is by
error not inverting pipe_full() test.

  CPU: 0 PID: 9460 Comm: syz-executor.5 Not tainted 5.6.0-rc3-next-20200228-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:rol32 include/linux/bitops.h:105 [inline]
  RIP: 0010:iterate_chain_key kernel/locking/lockdep.c:369 [inline]
  RIP: 0010:__lock_acquire+0x6a3/0x5270 kernel/locking/lockdep.c:4178
  Call Trace:
     lock_acquire+0x197/0x420 kernel/locking/lockdep.c:4720
     __mutex_lock_common kernel/locking/mutex.c:956 [inline]
     __mutex_lock+0x156/0x13c0 kernel/locking/mutex.c:1103
     pipe_lock_nested fs/pipe.c:66 [inline]
     pipe_double_lock+0x1a0/0x1e0 fs/pipe.c:104
     splice_pipe_to_pipe fs/splice.c:1562 [inline]
     do_splice+0x35f/0x1520 fs/splice.c:1141
     __do_sys_splice fs/splice.c:1447 [inline]
     __se_sys_splice fs/splice.c:1427 [inline]
     __x64_sys_splice+0x2b5/0x320 fs/splice.c:1427
     do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:295
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported-by: syzbot+b48daca8639150bc5e73@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=9386d051e11e09973d5a4cf79af5e8cedf79386d
Fixes: 8cefc107 ("pipe: Use head and tail pointers for the ring, not cursor and length")
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

566d1362

20 5月, 2020 6 次提交

io_uring: reset -EBUSY error when io sq thread is waken up · d4ae271d

由 Xiaoguang Wang 提交于 5月 20, 2020

In io_sq_thread(), currently if we get an -EBUSY error and go to sleep,
we will won't clear it again, which will result in io_sq_thread() will
never have a chance to submit sqes again. Below test program test.c
can reveal this bug:

int main(int argc, char *argv[])
{
        struct io_uring ring;
        int i, fd, ret;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct iovec *iovecs;
        void *buf;
        struct io_uring_params p;

        if (argc < 2) {
                printf("%s: file\n", argv[0]);
                return 1;
        }

        memset(&p, 0, sizeof(p));
        p.flags = IORING_SETUP_SQPOLL;
        ret = io_uring_queue_init_params(4, &ring, &p);
        if (ret < 0) {
                fprintf(stderr, "queue_init: %s\n", strerror(-ret));
                return 1;
        }

        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        iovecs = calloc(10, sizeof(struct iovec));
        for (i = 0; i < 10; i++) {
                if (posix_memalign(&buf, 4096, 4096))
                        return 1;
                iovecs[i].iov_base = buf;
                iovecs[i].iov_len = 4096;
        }

        ret = io_uring_register_files(&ring, &fd, 1);
        if (ret < 0) {
                fprintf(stderr, "%s: register %d\n", __FUNCTION__, ret);
                return ret;
        }

        for (i = 0; i < 10; i++) {
                sqe = io_uring_get_sqe(&ring);
                if (!sqe)
                        break;

                io_uring_prep_readv(sqe, 0, &iovecs[i], 1, 0);
                sqe->flags |= IOSQE_FIXED_FILE;

                ret = io_uring_submit(&ring);
                sleep(1);
                printf("submit %d\n", i);
        }

        for (i = 0; i < 10; i++) {
                io_uring_wait_cqe(&ring, &cqe);
                printf("receive: %d\n", i);
                if (cqe->res != 4096) {
                        fprintf(stderr, "ret=%d, wanted 4096\n", cqe->res);
                        ret = 1;
                }
                io_uring_cqe_seen(&ring, cqe);
        }

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
}
sudo ./test testfile
above command will hang on the tenth request, to fix this bug, when io
sq_thread is waken up, we reset the variable 'ret' to be zero.
Suggested-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d4ae271d

io_uring: don't add non-IO requests to iopoll pending list · b532576e

由 Jens Axboe 提交于 5月 19, 2020

We normally disable any commands that aren't specifically poll commands
for a ring that is setup for polling, but we do allow buffer provide and
remove commands to support buffer selection for polled IO. Once a
request is issued, we add it to the poll list to poll for completion. But
we should not do that for non-IO commands, as those request complete
inline immediately and aren't pollable. If we do, we can leave requests
on the iopoll list after they are freed.

Fixes: ddf0322d ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b532576e

fix multiplication overflow in copy_fdtable() · 4e89b721

由 Al Viro 提交于 5月 19, 2020

cpy and set really should be size_t; we won't get an overflow on that,
since sysctl_nr_open can't be set above ~(size_t)0 / sizeof(void *),
so nr that would've managed to overflow size_t on that multiplication
won't get anywhere near copy_fdtable() - we'll fail with EMFILE
before that.

Cc: stable@kernel.org # v2.6.25+
Fixes: 9cfe015a (get rid of NR_OPEN and introduce a sysctl_nr_open)
Reported-by: NThiago Macieira <thiago.macieira@intel.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4e89b721

io_uring: don't use kiocb.private to store buf_index · 4f4eeba8

由 Bijan Mottahedeh 提交于 5月 19, 2020

kiocb.private is used in iomap_dio_rw() so store buf_index separately.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>

Move 'buf_index' to a hole in io_kiocb.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4f4eeba8

ext4: fix fiemap size checks for bitmap files · 959f7584

由 Christoph Hellwig 提交于 5月 05, 2020

Add an extra validation of the len parameter, as for ext4 some files
might have smaller file size limits than others. This also means the
redundant size check in ext4_ioctl_get_es_cache can go away, as all
size checking is done in the shared fiemap handler.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200505154324.3226743-3-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

959f7584

ext4: fix EXT4_MAX_LOGICAL_BLOCK macro · 9f44eda1

由 Ritesh Harjani 提交于 5月 05, 2020

ext4 supports max number of logical blocks in a file to be 0xffffffff.
(This is since ext4_extent's ee_block is __le32).
This means that EXT4_MAX_LOGICAL_BLOCK should be 0xfffffffe (starting
from 0 logical offset). This patch fixes this.

The issue was seen when ext4 moved to iomap_fiemap API and when
overlayfs was mounted on top of ext4. Since overlayfs was missing
filemap_check_ranges(), so it could pass a arbitrary huge length which
lead to overflow of map.m_len logic.

This patch fixes that.

Fixes: d3b6f23f ("ext4: move ext4_fiemap to use iomap framework")
Reported-by: syzbot+77fa5bdb65cc39711820@syzkaller.appspotmail.com
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200505154324.3226743-2-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

9f44eda1

19 5月, 2020 2 次提交

afs: Don't unlock fetched data pages until the op completes successfully · 9d1be4f4

由 David Howells 提交于 5月 17, 2020

Don't call req->page_done() on each page as we finish filling it with
the data coming from the network.  Whilst this might speed up the
application a bit, it's a problem if there's a network failure and the
operation has to be reissued.

If this happens, an oops occurs because afs_readpages_page_done() clears
the pointer to each page it unlocks and when a retry happens, the
pointers to the pages it wants to fill are now NULL (and the pages have
been unlocked anyway).

Instead, wait till the operation completes successfully and only then
release all the pages after clearing any terminal gap (the server can
give us less data than we requested as we're allowed to ask for more
than is available).

KASAN produces a bug like the following, and even without KASAN, it can
oops and panic.

    BUG: KASAN: wild-memory-access in _copy_to_iter+0x323/0x5f4
    Write of size 1404 at addr 0005088000000000 by task md5sum/5235

    CPU: 0 PID: 5235 Comm: md5sum Not tainted 5.7.0-rc3-fscache+ #250
    Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
    Call Trace:
     memcpy+0x39/0x58
     _copy_to_iter+0x323/0x5f4
     __skb_datagram_iter+0x89/0x2a6
     skb_copy_datagram_iter+0x129/0x135
     rxrpc_recvmsg_data.isra.0+0x615/0xd42
     rxrpc_kernel_recv_data+0x1e9/0x3ae
     afs_extract_data+0x139/0x33a
     yfs_deliver_fs_fetch_data64+0x47a/0x91b
     afs_deliver_to_call+0x304/0x709
     afs_wait_for_call_to_complete+0x1cc/0x4ad
     yfs_fs_fetch_data+0x279/0x288
     afs_fetch_data+0x1e1/0x38d
     afs_readpages+0x593/0x72e
     read_pages+0xf5/0x21e
     __do_page_cache_readahead+0x128/0x23f
     ondemand_readahead+0x36e/0x37f
     generic_file_buffered_read+0x234/0x680
     new_sync_read+0x109/0x17e
     vfs_read+0xe6/0x138
     ksys_read+0xd8/0x14d
     do_syscall_64+0x6e/0x8a
     entry_SYSCALL_64_after_hwframe+0x49/0xb3

Fixes: 196ee9cd ("afs: Make afs_fs_fetch_data() take a list of pages")
Fixes: 30062bd1 ("afs: Implement YFS support in the fs client")
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9d1be4f4

io_uring: cancel work if task_work_add() fails · e3aabf95

由 Jens Axboe 提交于 5月 18, 2020

We currently move it to the io_wqe_manager for execution, but we cannot
safely do so as we may lack some of the state to execute it out of
context. As we cancel work anyway when the ring/task exits, just mark
this request as canceled and io_async_task_func() will do the right
thing.

Fixes: aa96bf8a ("io_uring: use io-wq manager as backup task if task is exiting")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e3aabf95

18 5月, 2020 4 次提交

exfat: fix possible memory leak in exfat_find() · 94182167

由 Wei Yongjun 提交于 5月 06, 2020

'es' is malloced from exfat_get_dentry_set() in exfat_find() and should
be freed before leaving from the error handling cases, otherwise it will
cause memory leak.

Fixes: 5f2aa075 ("exfat: add inode operations")
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>

94182167

exfat: use iter_file_splice_write · 03577948

由 Eric Sandeen 提交于 5月 01, 2020

Doing copy_file_range() on exfat with a file opened for direct IO leads
to an -EFAULT:

# xfs_io -f -d -c "truncate 32768" \
       -c "copy_range -d 16384 -l 16384 -f 0" /mnt/test/junk
copy_range: Bad address

and the reason seems to be that we go through:

default_file_splice_write
 splice_from_pipe
  __splice_from_pipe
   write_pipe_buf
    __kernel_write
     new_sync_write
      generic_file_write_iter
       generic_file_direct_write
        exfat_direct_IO
         do_blockdev_direct_IO
          iov_iter_get_pages

and land in iterate_all_kinds(), which does "return -EFAULT" for our kvec
iter.

Setting exfat's splice_write to iter_file_splice_write fixes this and lets
fsx (which originally detected the problem) run to success from
the xfstests harness.
Signed-off-by: NEric Sandeen <sandeen@sandeen.net>
Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>

03577948

ubifs: fix wrong use of crypto_shash_descsize() · 3c3c32f8

由 Eric Biggers 提交于 5月 01, 2020

crypto_shash_descsize() returns the size of the shash_desc context
needed to compute the hash, not the size of the hash itself.

crypto_shash_digestsize() would be correct, or alternatively using
c->hash_len and c->hmac_desc_len which already store the correct values.
But actually it's simpler to just use stack arrays, so do that instead.

Fixes: 49525e5e ("ubifs: Add helper functions for authentication support")
Fixes: da8ef65f ("ubifs: Authenticate replayed journal")
Cc: <stable@vger.kernel.org> # v4.20+
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: NEric Biggers <ebiggers@google.com>
Acked-by: NSascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: NRichard Weinberger <richard@nod.at>

3c3c32f8

io_uring: remove dead check in io_splice() · 948a7749

由 Jens Axboe 提交于 5月 17, 2020

We checked for 'force_nonblock' higher up, so it's definitely false
at this point. Kill the check, it's a remnant of when we tried to do
inline splice without always punting to async context.

Fixes: 2fb3e822 ("io_uring: punt splice async because of inode mutex")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

948a7749

17 5月, 2020 4 次提交

exec: Move would_dump into flush_old_exec · f87d1c95

由 Eric W. Biederman 提交于 5月 16, 2020

I goofed when I added mm->user_ns support to would_dump.  I missed the
fact that in the case of binfmt_loader, binfmt_em86, binfmt_misc, and
binfmt_script bprm->file is reassigned.  Which made the move of
would_dump from setup_new_exec to __do_execve_file before exec_binprm
incorrect as it can result in would_dump running on the script instead
of the interpreter of the script.

The net result is that the code stopped making unreadable interpreters
undumpable.  Which allows them to be ptraced and written to disk
without special permissions.  Oops.

The move was necessary because the call in set_new_exec was after
bprm->mm was no longer valid.

To correct this mistake move the misplaced would_dump from
__do_execve_file into flos_old_exec, before exec_mmap is called.

I tested and confirmed that without this fix I can attach with gdb to
a script with an unreadable interpreter, and with this fix I can not.

Cc: stable@vger.kernel.org
Fixes: f84df2a6 ("exec: Ensure mm->user_ns contains the execed files")
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

f87d1c95

io_uring: fix FORCE_ASYNC req preparation · bd2ab18a

由 Pavel Begunkov 提交于 5月 17, 2020

As for other not inlined requests, alloc req->io for FORCE_ASYNC reqs,
so they can be prepared properly.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bd2ab18a

io_uring: don't prepare DRAIN reqs twice · 650b5481

由 Pavel Begunkov 提交于 5月 17, 2020

If req->io is not NULL, it's already prepared. Don't do it again,
it's dangerous.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

650b5481

io_uring: initialize ctx->sqo_wait earlier · 583863ed

由 Jens Axboe 提交于 5月 17, 2020

Ensure that ctx->sqo_wait is initialized as soon as the ctx is allocated,
instead of deferring it to the offload setup. This fixes a syzbot
reported lockdep complaint, which is really due to trying to wake_up
on an uninitialized wait queue:

RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441319
RDX: 0000000000000001 RSI: 0000000020000140 RDI: 000000000000047b
RBP: 0000000000010475 R08: 0000000000000001 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402260
R13: 00000000004022f0 R14: 0000000000000000 R15: 0000000000000000
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 7090 Comm: syz-executor222 Not tainted 5.7.0-rc1-next-20200415-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 assign_lock_key kernel/locking/lockdep.c:913 [inline]
 register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
 __lock_acquire+0x104/0x4c50 kernel/locking/lockdep.c:4234
 lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4934
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
 __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:122
 io_cqring_ev_posted+0xa5/0x1e0 fs/io_uring.c:1160
 io_poll_remove_all fs/io_uring.c:4357 [inline]
 io_ring_ctx_wait_and_kill+0x2bc/0x5a0 fs/io_uring.c:7305
 io_uring_create fs/io_uring.c:7843 [inline]
 io_uring_setup+0x115e/0x22b0 fs/io_uring.c:7870
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x441319
Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9

Reported-by: syzbot+8c91f5d054e998721c57@syzkaller.appspotmail.com
Signed-off-by: NJens Axboe <axboe@kernel.dk>

583863ed

15 5月, 2020 3 次提交

cifs: fix leaked reference on requeued write · a4813799

由 Adam McCoy 提交于 5月 13, 2020

Failed async writes that are requeued may not clean up a refcount
on the file, which can result in a leaked open. This scenario arises
very reliably when using persistent handles and a reconnect occurs
while writing.

cifs_writev_requeue only releases the reference if the write fails
(rc != 0). The server->ops->async_writev operation will take its own
reference, so the initial reference can always be released.
Signed-off-by: NAdam McCoy <adam@forsedomani.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>

a4813799

NFSv3: fix rpc receive buffer size for MOUNT call · 8eed292b

由 Olga Kornievskaia 提交于 5月 14, 2020

Prior to commit e3d3ab64dd66 ("SUNRPC: Use au_rslack when
computing reply buffer size"), there was enough slack in the reply
buffer to commodate filehandles of size 60bytes. However, the real
problem was that the reply buffer size for the MOUNT operation was
not correctly calculated. Received buffer size used the filehandle
size for NFSv2 (32bytes) which is much smaller than the allowed
filehandle size for the v3 mounts.

Fix the reply buffer size (decode arguments size) for the MNT command.

Fixes: 2c94b8ec ("SUNRPC: Use au_rslack when computing reply buffer size")
Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>

8eed292b

epoll: call final ep_events_available() check under the lock · 65759097

由 Roman Penyaev 提交于 5月 13, 2020

There is a possible race when ep_scan_ready_list() leaves ->rdllist and
->obflist empty for a short period of time although some events are
pending. It is quite likely that ep_events_available() observes empty
lists and goes to sleep.

Since commit 339ddb53 ("fs/epoll: remove unnecessary wakeups of
nested epoll") we are conservative in wakeups (there is only one place
for wakeup and this is ep_poll_callback()), thus ep_events_available()
must always observe correct state of two lists.

The easiest and correct way is to do the final check under the lock.
This does not impact the performance, since lock is taken anyway for
adding a wait entry to the wait queue.

The discussion of the problem can be found here:

https://lore.kernel.org/linux-fsdevel/a2f22c3c-c25a-4bda-8339-a7bdaf17849e@akamai.com/

In this patch barrierless __set_current_state() is used. This is safe
since waitqueue_active() is called under the same lock on wakeup side.

Short-circuit for fatal signals (i.e. fatal_signal_pending() check) is
moved to the line just before actual events harvesting routine. This is
fully compliant to what is said in the comment of the patch where the
actual fatal_signal_pending() check was added: c257a340 ("fs, epoll:
short circuit fetching events if thread has been killed").

Fixes: 339ddb53 ("fs/epoll: remove unnecessary wakeups of nested epoll")
Reported-by: NJason Baron <jbaron@akamai.com>
Reported-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NRoman Penyaev <rpenyaev@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NJason Baron <jbaron@akamai.com>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200505145609.1865152-1-rpenyaev@suse.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

65759097

14 5月, 2020 2 次提交

cifs: Fix null pointer check in cifs_read · 9bd21d4b

由 Steve French 提交于 5月 13, 2020

Coverity scan noted a redundant null check

Coverity-id: 728517
Reported-by: NCoverity <scan-admin@coverity.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
Reviewed-by: NShyam Prasad N <nspmangalore@gmail.com>

9bd21d4b

io_uring: polled fixed file must go through free iteration · 9d9e88a2

由 Jens Axboe 提交于 5月 13, 2020

When we changed the file registration handling, it became important to
iterate the bulk request freeing list for fixed files as well, or we
miss dropping the fixed file reference. If not, we're leaking references,
and we'll get a kworker stuck waiting for file references to disappear.

This also means we can remove the special casing of fixed vs non-fixed
files, we need to iterate for both and we can just rely on
__io_req_aux_free() doing io_put_file() instead of doing it manually.

Fixes: 05589553 ("io_uring: refactor file register/unregister/update handling")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9d9e88a2

13 5月, 2020 2 次提交

NFS/pnfs: Don't use RPC_TASK_CRED_NOREF with pnfs · 4fa7ef69

由 Trond Myklebust 提交于 5月 13, 2020

When we're doing pnfs then the credential being used for the RPC call
is not necessarily the same as the one used in the open context, so
don't use RPC_TASK_CRED_NOREF.

Fixes: 61296507 ("NFSv4: Avoid referencing the cred unnecessarily during NFSv4 I/O")
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>

4fa7ef69

ovl: potential crash in ovl_fid_to_fh() · 9aafc1b0

由 Dan Carpenter 提交于 5月 05, 2020

The "buflen" value comes from the user and there is a potential that it
could be zero.  In do_handle_to_path() we know that "handle->handle_bytes"
is non-zero and we do:

	handle_dwords = handle->handle_bytes >> 2;

So values 1-3 become zero.  Then in ovl_fh_to_dentry() we do:

	int len = fh_len << 2;

So now len is in the "0,4-128" range and a multiple of 4.  But if
"buflen" is zero it will try to copy negative bytes when we do the
memcpy in ovl_fid_to_fh().

	memcpy(&fh->fb, fid, buflen - OVL_FH_WIRE_OFFSET);

And that will lead to a crash.  Thanks to Amir Goldstein for his help
with this patch.

Fixes: cbe7fba8 ("ovl: make sure that real fid is 32bit aligned in memory")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Cc: <stable@vger.kernel.org> # v5.5
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

9aafc1b0

12 5月, 2020 2 次提交

NFS: Don't use RPC_TASK_CRED_NOREF with delegreturn · f304a809

由 Trond Myklebust 提交于 5月 11, 2020

We are not guaranteed that the credential will remain pinned.

Fixes: 61296507 ("NFSv4: Avoid referencing the cred unnecessarily during NFSv4 I/O")
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>

f304a809

nfs: fix NULL deference in nfs4_get_valid_delegation · 29fe8399

由 J. Bruce Fields 提交于 5月 11, 2020

We add the new state to the nfsi->open_states list, making it
potentially visible to other threads, before we've finished initializing
it.

That wasn't a problem when all the readers were also taking the i_lock
(as we do here), but since we switched to RCU, there's now a possibility
that a reader could see the partially initialized state.

Symptoms observed were a crash when another thread called
nfs4_get_valid_delegation() on a NULL inode, resulting in an oops like:

	BUG: unable to handle page fault for address: ffffffffffffffb0 ...
	RIP: 0010:nfs4_get_valid_delegation+0x6/0x30 [nfsv4] ...
	Call Trace:
	 nfs4_open_prepare+0x80/0x1c0 [nfsv4]
	 __rpc_execute+0x75/0x390 [sunrpc]
	 ? finish_task_switch+0x75/0x260
	 rpc_async_schedule+0x29/0x40 [sunrpc]
	 process_one_work+0x1ad/0x370
	 worker_thread+0x30/0x390
	 ? create_worker+0x1a0/0x1a0
	 kthread+0x10c/0x130
	 ? kthread_park+0x80/0x80
	 ret_from_fork+0x22/0x30

Fixes: 9ae075fd "NFSv4: Convert open state lookup to use RCU"
Reviewed-by: NSeiichi Ikarashi <s.ikarashi@fujitsu.com>
Tested-by: NDaisuke Matsuda <matsuda-daisuke@fujitsu.com>
Tested-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>

29fe8399

11 5月, 2020 1 次提交

rxrpc: Fix the excessive initial retransmission timeout · c410bf01

由 David Howells 提交于 5月 11, 2020

rxrpc currently uses a fixed 4s retransmission timeout until the RTT is
sufficiently sampled. This can cause problems with some fileservers with
calls to the cache manager in the afs filesystem being dropped from the
fileserver because a packet goes missing and the retransmission timeout is
greater than the call expiry timeout.

Fix this by:

(1) Copying the RTT/RTO calculation code from Linux's TCP implementation
and altering it to fit rxrpc.

(2) Altering the various users of the RTT to make use of the new SRTT
value.

(3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO
value instead (which is needed in jiffies), along with a backoff.

Notes:

(1) rxrpc provides RTT samples by matching the serial numbers on outgoing
DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets
against the reference serial number in incoming REQUESTED ACK and
PING-RESPONSE ACK packets.

(2) Each packet that is transmitted on an rxrpc connection gets a new
per-connection serial number, even for retransmissions, so an ACK can
be cross-referenced to a specific trigger packet. This allows RTT
information to be drawn from retransmitted DATA packets also.

(3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than
on an rxrpc_call because many RPC calls won't live long enough to
generate more than one sample.

(4) The calculated SRTT value is in units of 8ths of a microsecond rather
than nanoseconds.

The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers.

Fixes: 17926a79 ([AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both"")
Signed-off-by: NDavid Howells <dhowells@redhat.com>

c410bf01

10 5月, 2020 2 次提交

bdi: use bdi_dev_name() to get device name · d51cfc53

由 Yufen Yu 提交于 5月 04, 2020

Use the common interface bdi_dev_name() to get device name.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>

Add missing <linux/backing-dev.h> include BFQ
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d51cfc53

io_uring: fix zero len do_splice() · c9687426

由 Pavel Begunkov 提交于 5月 04, 2020

do_splice() doesn't expect len to be 0. Just always return 0 in this
case as splice(2) does.

Fixes: 7d67af2c ("io_uring: add splice(2) support")
Reported-by: NJann Horn <jannh@google.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c9687426

09 5月, 2020 2 次提交

cachefiles: Fix race between read_waiter and read_copier involving op->to_do · 7bb0c533

由 Lei Xue 提交于 5月 07, 2020

There is a potential race in fscache operation enqueuing for reading and
copying multiple pages from cachefiles to netfs.  The problem can be seen
easily on a heavy loaded system (for example many processes reading files
continually on an NFS share covered by fscache triggered this problem within
a few minutes).

The race is due to cachefiles_read_waiter() adding the op to the monitor
to_do list and then then drop the object->work_lock spinlock before
completing fscache_enqueue_operation().  Once the lock is dropped,
cachefiles_read_copier() grabs the op, completes processing it, and
makes it through fscache_retrieval_complete() which sets the op->state to
the final state of FSCACHE_OP_ST_COMPLETE(4).  When cachefiles_read_waiter()
finally gets through the remainder of fscache_enqueue_operation()
it sees the invalid state, and hits the ASSERTCMP and the following
oops is seen:
[ 2259.612361] FS-Cache:
[ 2259.614785] FS-Cache: Assertion failed
[ 2259.618639] FS-Cache: 4 == 5 is false
[ 2259.622456] ------------[ cut here ]------------
[ 2259.627190] kernel BUG at fs/fscache/operation.c:70!
...
[ 2259.791675] RIP: 0010:[<ffffffffc061b4cf>]  [<ffffffffc061b4cf>] fscache_enqueue_operation+0xff/0x170 [fscache]
[ 2259.802059] RSP: 0000:ffffa0263d543be0  EFLAGS: 00010046
[ 2259.807521] RAX: 0000000000000019 RBX: ffffa01a4d390480 RCX: 0000000000000006
[ 2259.814847] RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffffa0263d553890
[ 2259.822176] RBP: ffffa0263d543be8 R08: 0000000000000000 R09: ffffa0263c2d8708
[ 2259.829502] R10: 0000000000001e7f R11: 0000000000000000 R12: ffffa01a4d390480
[ 2259.844483] R13: ffff9fa9546c5920 R14: ffffa0263d543c80 R15: ffffa0293ff9bf10
[ 2259.859554] FS:  00007f4b6efbd700(0000) GS:ffffa0263d540000(0000) knlGS:0000000000000000
[ 2259.875571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2259.889117] CR2: 00007f49e1624ff0 CR3: 0000012b38b38000 CR4: 00000000007607e0
[ 2259.904015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2259.918764] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2259.933449] PKRU: 55555554
[ 2259.943654] Call Trace:
[ 2259.953592]  <IRQ>
[ 2259.955577]  [<ffffffffc03a7c12>] cachefiles_read_waiter+0x92/0xf0 [cachefiles]
[ 2259.978039]  [<ffffffffa34d3942>] __wake_up_common+0x82/0x120
[ 2259.991392]  [<ffffffffa34d3a63>] __wake_up_common_lock+0x83/0xc0
[ 2260.004930]  [<ffffffffa34d3510>] ? task_rq_unlock+0x20/0x20
[ 2260.017863]  [<ffffffffa34d3ab3>] __wake_up+0x13/0x20
[ 2260.030230]  [<ffffffffa34c72a0>] __wake_up_bit+0x50/0x70
[ 2260.042535]  [<ffffffffa35bdcdb>] unlock_page+0x2b/0x30
[ 2260.054495]  [<ffffffffa35bdd09>] page_endio+0x29/0x90
[ 2260.066184]  [<ffffffffa368fc81>] mpage_end_io+0x51/0x80

CPU1
cachefiles_read_waiter()
 20 static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
 21                                   int sync, void *_key)
 22 {
...
 61         spin_lock(&object->work_lock);
 62         list_add_tail(&monitor->op_link, &op->to_do);
 63         spin_unlock(&object->work_lock);
<begin race window>
 64
 65         fscache_enqueue_retrieval(op);
182 static inline void fscache_enqueue_retrieval(struct fscache_retrieval *op)
183 {
184         fscache_enqueue_operation(&op->op);
185 }
 58 void fscache_enqueue_operation(struct fscache_operation *op)
 59 {
 60         struct fscache_cookie *cookie = op->object->cookie;
 61
 62         _enter("{OBJ%x OP%x,%u}",
 63                op->object->debug_id, op->debug_id, atomic_read(&op->usage));
 64
 65         ASSERT(list_empty(&op->pend_link));
 66         ASSERT(op->processor != NULL);
 67         ASSERT(fscache_object_is_available(op->object));
 68         ASSERTCMP(atomic_read(&op->usage), >, 0);
<end race window>

CPU2
cachefiles_read_copier()
168         while (!list_empty(&op->to_do)) {
...
202                 fscache_end_io(op, monitor->netfs_page, error);
203                 put_page(monitor->netfs_page);
204                 fscache_retrieval_complete(op, 1);

CPU1
 58 void fscache_enqueue_operation(struct fscache_operation *op)
 59 {
...
 69         ASSERTIFCMP(op->state != FSCACHE_OP_ST_IN_PROGRESS,
 70                     op->state, ==,  FSCACHE_OP_ST_CANCELLED);
Signed-off-by: NLei Xue <carmark.dlut@gmail.com>
Signed-off-by: NDave Wysochanski <dwysocha@redhat.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>

7bb0c533

NFSv4: Fix fscache cookie aux_data to ensure change_attr is included · 50eaa652

由 Dave Wysochanski 提交于 4月 16, 2020

Commit 402cb8dd ("fscache: Attach the index key and aux data to
the cookie") added the aux_data and aux_data_len to parameters to
fscache_acquire_cookie(), and updated the callers in the NFS client.
In the process it modified the aux_data to include the change_attr,
but missed adding change_attr to a couple places where aux_data was
used.  Specifically, when opening a file and the change_attr is not
added, the following attempt to lookup an object will fail inside
cachefiles_check_object_xattr() = -116 due to
nfs_fscache_inode_check_aux() failing memcmp on auxdata and returning
FSCACHE_CHECKAUX_OBSOLETE.

Fix this by adding nfs_fscache_update_auxdata() to set the auxdata
from all relevant fields in the inode, including the change_attr.

Fixes: 402cb8dd ("fscache: Attach the index key and aux data to the cookie")
Signed-off-by: NDave Wysochanski <dwysocha@redhat.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>

50eaa652

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功