1. 17 3月, 2020 13 次提交
  2. 14 3月, 2020 1 次提交
  3. 13 3月, 2020 4 次提交
    • M
      ovl: fix lockdep warning for async write · c8536804
      Miklos Szeredi 提交于
      Lockdep reports "WARNING: lock held when returning to user space!" due to
      async write holding freeze lock over the write.  Apparently aio.c already
      deals with this by lying to lockdep about the state of the lock.
      
      Do the same here.  No need to check for S_IFREG() here since these file ops
      are regular-only.
      
      Reported-by: syzbot+9331a354f4f624a52a55@syzkaller.appspotmail.com
      Fixes: 2406a307 ("ovl: implement async IO routines")
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      c8536804
    • A
      ovl: fix some xino configurations · 53afcd31
      Amir Goldstein 提交于
      Fix up two bugs in the coversion to xino_mode:
      1. xino=off does not always end up in disabled mode
      2. xino=auto on 32bit arch should end up in disabled mode
      
      Take a proactive approach to disabling xino on 32bit kernel:
      1. Disable XINO_AUTO config during build time
      2. Disable xino with a warning on mount time
      
      As a by product, xino=on on 32bit arch also ends up in disabled mode.
      We never intended to enable xino on 32bit arch and this will make the
      rest of the logic simpler.
      
      Fixes: 0f831ec8 ("ovl: simplify ovl_same_sb() helper")
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      53afcd31
    • A
      cifs_atomic_open(): fix double-put on late allocation failure · d9a9f484
      Al Viro 提交于
      several iterations of ->atomic_open() calling conventions ago, we
      used to need fput() if ->atomic_open() failed at some point after
      successful finish_open().  Now (since 2016) it's not needed -
      struct file carries enough state to make fput() work regardless
      of the point in struct file lifecycle and discarding it on
      failure exits in open() got unified.  Unfortunately, I'd missed
      the fact that we had an instance of ->atomic_open() (cifs one)
      that used to need that fput(), as well as the stale comment in
      finish_open() demanding such late failure handling.  Trivially
      fixed...
      
      Fixes: fe9ec829 "do_last(): take fput() on error after opening to out:"
      Cc: stable@kernel.org # v4.7+
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d9a9f484
    • A
      gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache · 21039132
      Al Viro 提交于
      with the way fs/namei.c:do_last() had been done, ->atomic_open()
      instances needed to recognize the case when existing file got
      found with O_EXCL|O_CREAT, either by falling back to finish_no_open()
      or failing themselves.  gfs2 one didn't.
      
      Fixes: 6d4ade98 (GFS2: Add atomic_open support)
      Cc: stable@kernel.org # v3.11
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      21039132
  4. 12 3月, 2020 1 次提交
  5. 09 3月, 2020 1 次提交
    • J
      io_uring: ensure RCU callback ordering with rcu_barrier() · 805b13ad
      Jens Axboe 提交于
      After more careful studying, Paul informs me that we cannot rely on
      ordering of RCU callbacks in the way that the the tagged commit did.
      The current construct looks like this:
      
      	void C(struct rcu_head *rhp)
      	{
      		do_something(rhp);
      		call_rcu(&p->rh, B);
      	}
      
      	call_rcu(&p->rh, A);
      	call_rcu(&p->rh, C);
      
      and we're relying on ordering between A and B, which isn't guaranteed.
      Make this explicit instead, and have a work item issue the rcu_barrier()
      to ensure that A has run before we manually execute B.
      
      While thorough testing never showed this issue, it's dependent on the
      per-cpu load in terms of RCU callbacks. The updated method simplifies
      the code as well, and eliminates the need to maintain an rcu_head in
      the fileset data.
      
      Fixes: c1e2148f ("io_uring: free fixed_file_data after RCU grace period")
      Reported-by: NPaul E. McKenney <paulmck@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      805b13ad
  6. 08 3月, 2020 1 次提交
    • E
      fscrypt: don't evict dirty inodes after removing key · 2b4eae95
      Eric Biggers 提交于
      After FS_IOC_REMOVE_ENCRYPTION_KEY removes a key, it syncs the
      filesystem and tries to get and put all inodes that were unlocked by the
      key so that unused inodes get evicted via fscrypt_drop_inode().
      Normally, the inodes are all clean due to the sync.
      
      However, after the filesystem is sync'ed, userspace can modify and close
      one of the files.  (Userspace is *supposed* to close the files before
      removing the key.  But it doesn't always happen, and the kernel can't
      assume it.)  This causes the inode to be dirtied and have i_count == 0.
      Then, fscrypt_drop_inode() failed to consider this case and indicated
      that the inode can be dropped, causing the write to be lost.
      
      On f2fs, other problems such as a filesystem freeze could occur due to
      the inode being freed while still on f2fs's dirty inode list.
      
      Fix this bug by making fscrypt_drop_inode() only drop clean inodes.
      
      I've written an xfstest which detects this bug on ext4, f2fs, and ubifs.
      
      Fixes: b1c0ec35 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl")
      Cc: <stable@vger.kernel.org> # v5.4+
      Link: https://lore.kernel.org/r/20200305084138.653498-1-ebiggers@kernel.orgSigned-off-by: NEric Biggers <ebiggers@google.com>
      2b4eae95
  7. 07 3月, 2020 3 次提交
    • P
      io_uring: fix lockup with timeouts · f0e20b89
      Pavel Begunkov 提交于
      There is a recipe to deadlock the kernel: submit a timeout sqe with a
      linked_timeout (e.g.  test_single_link_timeout_ception() from liburing),
      and SIGKILL the process.
      
      Then, io_kill_timeouts() takes @ctx->completion_lock, but the timeout
      isn't flagged with REQ_F_COMP_LOCKED, and will try to double grab it
      during io_put_free() to cancel the linked timeout. Probably, the same
      can happen with another io_kill_timeout() call site, that is
      io_commit_cqring().
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f0e20b89
    • J
      io_uring: free fixed_file_data after RCU grace period · c1e2148f
      Jens Axboe 提交于
      The percpu refcount protects this structure, and we can have an atomic
      switch in progress when exiting. This makes it unsafe to just free the
      struct normally, and can trigger the following KASAN warning:
      
      BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
      Read of size 1 at addr ffff888181a19a30 by task swapper/0/0
      
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc4+ #5747
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       <IRQ>
       dump_stack+0x76/0xa0
       print_address_description.constprop.0+0x3b/0x60
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       __kasan_report.cold+0x1a/0x3d
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       rcu_core+0x370/0x830
       ? percpu_ref_exit+0x50/0x50
       ? rcu_note_context_switch+0x7b0/0x7b0
       ? run_rebalance_domains+0x11d/0x140
       __do_softirq+0x10a/0x3e9
       irq_exit+0xd5/0xe0
       smp_apic_timer_interrupt+0x86/0x200
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      RIP: 0010:default_idle+0x26/0x1f0
      
      Fix this by punting the final exit and free of the struct to RCU, then
      we know that it's safe to do so. Jann suggested the approach of using a
      double rcu callback to achieve this. It's important that we do a nested
      call_rcu() callback, as otherwise the free could be ordered before the
      atomic switch, even if the latter was already queued.
      
      Reported-by: syzbot+e017e49c39ab484ac87a@syzkaller.appspotmail.com
      Suggested-by: NJann Horn <jannh@google.com>
      Reviewed-by: NPaul E. McKenney <paulmck@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c1e2148f
    • Y
      locks: fix a potential use-after-free problem when wakeup a waiter · 6d390e4b
      yangerkun 提交于
      '16306a61 ("fs/locks: always delete_block after waiting.")' add the
      logic to check waiter->fl_blocker without blocked_lock_lock. And it will
      trigger a UAF when we try to wakeup some waiter:
      
      Thread 1 has create a write flock a on file, and now thread 2 try to
      unlock and delete flock a, thread 3 try to add flock b on the same file.
      
      Thread2                         Thread3
                                      flock syscall(create flock b)
      	                        ...flock_lock_inode_wait
      				    flock_lock_inode(will insert
      				    our fl_blocked_member list
      				    to flock a's fl_blocked_requests)
      				   sleep
      flock syscall(unlock)
      ...flock_lock_inode_wait
          locks_delete_lock_ctx
          ...__locks_wake_up_blocks
              __locks_delete_blocks(
      	b->fl_blocker = NULL)
      	...
                                         break by a signal
      				   locks_delete_block
      				    b->fl_blocker == NULL &&
      				    list_empty(&b->fl_blocked_requests)
      	                            success, return directly
      				 locks_free_lock b
      	wake_up(&b->fl_waiter)
      	trigger UAF
      
      Fix it by remove this logic, and this patch may also fix CVE-2019-19769.
      
      Cc: stable@vger.kernel.org
      Fixes: 16306a61 ("fs/locks: always delete_block after waiting.")
      Signed-off-by: Nyangerkun <yangerkun@huawei.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      6d390e4b
  8. 06 3月, 2020 2 次提交
  9. 03 3月, 2020 3 次提交
    • K
      fcntl: Distribute switch variables for initialization · 0a68ff5e
      Kees Cook 提交于
      Variables declared in a switch statement before any case statements
      cannot be automatically initialized with compiler instrumentation (as
      they are not part of any execution flow). With GCC's proposed automatic
      stack variable initialization feature, this triggers a warning (and they
      don't get initialized). Clang's automatic stack variable initialization
      (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
      doesn't initialize such variables[1]. Note that these warnings (or silent
      skipping) happen before the dead-store elimination optimization phase,
      so even when the automatic initializations are later elided in favor of
      direct initializations, the warnings remain.
      
      To avoid these problems, move such variables into the "case" where
      they're used or lift them up into the main function body.
      
      fs/fcntl.c: In function ‘send_sigio_to_task’:
      fs/fcntl.c:738:20: warning: statement will never be executed [-Wswitch-unreachable]
        738 |   kernel_siginfo_t si;
            |                    ^~
      
      [1] https://bugs.llvm.org/show_bug.cgi?id=44916Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      0a68ff5e
    • O
      btrfs: fix RAID direct I/O reads with alternate csums · e7a04894
      Omar Sandoval 提交于
      btrfs_lookup_and_bind_dio_csum() does pointer arithmetic which assumes
      32-bit checksums. If using a larger checksum, this leads to spurious
      failures when a direct I/O read crosses a stripe. This is easy
      to reproduce:
      
        # mkfs.btrfs -f --checksum blake2 -d raid0 /dev/vdc /dev/vdd
        ...
        # mount /dev/vdc /mnt
        # cd /mnt
        # dd if=/dev/urandom of=foo bs=1M count=1 status=none
        # dd if=foo of=/dev/null bs=1M iflag=direct status=none
        dd: error reading 'foo': Input/output error
        # dmesg | tail -1
        [  135.821568] BTRFS warning (device vdc): csum failed root 5 ino 257 off 421888 ...
      
      Fix it by using the actual checksum size.
      
      Fixes: 1e25a2e3 ("btrfs: don't assume ordered sums to be 4 bytes")
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e7a04894
    • P
      io-wq: remove io_wq_flush and IO_WQ_WORK_INTERNAL · 80ad8943
      Pavel Begunkov 提交于
      io_wq_flush() is buggy, during cancelation of a flush, the associated
      work may be passed to the caller's (i.e. io_uring) @match callback. That
      callback is expecting it to be embedded in struct io_kiocb. Cancelation
      of internal work probably doesn't make a lot of sense to begin with.
      
      As the flush helper is no longer used, just delete it and the associated
      work flag.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      80ad8943
  10. 02 3月, 2020 1 次提交
  11. 01 3月, 2020 2 次提交
    • D
      ext4: potential crash on allocation error in ext4_alloc_flex_bg_array() · 37b0b6b8
      Dan Carpenter 提交于
      If sbi->s_flex_groups_allocated is zero and the first allocation fails
      then this code will crash.  The problem is that "i--" will set "i" to
      -1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1
      is type promoted to unsigned and becomes UINT_MAX.  Since UINT_MAX
      is more than zero, the condition is true so we call kvfree(new_groups[-1]).
      The loop will carry on freeing invalid memory until it crashes.
      
      Fixes: 7c990728 ("ext4: fix potential race between s_flex_groups online resizing and access")
      Reviewed-by: NSuraj Jitindar Singh <surajjs@amazon.com>
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountainSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      37b0b6b8
    • Q
      jbd2: fix data races at struct journal_head · 6c5d9112
      Qian Cai 提交于
      journal_head::b_transaction and journal_head::b_next_transaction could
      be accessed concurrently as noticed by KCSAN,
      
       LTP: starting fsync04
       /dev/zero: Can't open blockdev
       EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
       EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
       ==================================================================
       BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]
      
       write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
        __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
        __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
        jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
        (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
        kjournald2+0x13b/0x450 [jbd2]
        kthread+0x1cd/0x1f0
        ret_from_fork+0x27/0x50
      
       read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
        jbd2_write_access_granted+0x1b2/0x250 [jbd2]
        jbd2_write_access_granted at fs/jbd2/transaction.c:1155
        jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
        __ext4_journal_get_write_access+0x50/0x90 [ext4]
        ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
        ext4_mb_new_blocks+0x54f/0xca0 [ext4]
        ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
        ext4_map_blocks+0x3b4/0x950 [ext4]
        _ext4_get_block+0xfc/0x270 [ext4]
        ext4_get_block+0x3b/0x50 [ext4]
        __block_write_begin_int+0x22e/0xae0
        __block_write_begin+0x39/0x50
        ext4_write_begin+0x388/0xb50 [ext4]
        generic_perform_write+0x15d/0x290
        ext4_buffered_write_iter+0x11f/0x210 [ext4]
        ext4_file_write_iter+0xce/0x9e0 [ext4]
        new_sync_write+0x29c/0x3b0
        __vfs_write+0x92/0xa0
        vfs_write+0x103/0x260
        ksys_write+0x9d/0x130
        __x64_sys_write+0x4c/0x60
        do_syscall_64+0x91/0xb05
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       5 locks held by fsync04/25724:
        #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
        #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
        #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
        #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
        #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
       irq event stamp: 1407125
       hardirqs last  enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
       hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
       softirqs last  enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
       softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      The plain reads are outside of jh->b_state_lock critical section which result
      in data races. Fix them by adding pairs of READ|WRITE_ONCE().
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NQian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pwSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      6c5d9112
  12. 28 2月, 2020 1 次提交
  13. 27 2月, 2020 2 次提交
    • T
      io_uring: define and set show_fdinfo only if procfs is enabled · bebdb65e
      Tobias Klauser 提交于
      Follow the pattern used with other *_show_fdinfo functions and only
      define and use io_uring_show_fdinfo and its helper functions if
      CONFIG_PROC_FS is set.
      
      Fixes: 87ce955b ("io_uring: add ->show_fdinfo() for the io_uring file descriptor")
      Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      bebdb65e
    • J
      io_uring: drop file set ref put/get on switch · dd3db2a3
      Jens Axboe 提交于
      Dan reports that he triggered a warning on ring exit doing some testing:
      
      percpu ref (io_file_data_ref_zero) <= 0 (0) after switching to atomic
      WARNING: CPU: 3 PID: 0 at lib/percpu-refcount.c:160 percpu_ref_switch_to_atomic_rcu+0xe8/0xf0
      Modules linked in:
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.6.0-rc3+ #5648
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:percpu_ref_switch_to_atomic_rcu+0xe8/0xf0
      Code: e7 ff 55 e8 eb d2 80 3d bd 02 d2 00 00 75 8b 48 8b 55 d8 48 c7 c7 e8 70 e6 81 c6 05 a9 02 d2 00 01 48 8b 75 e8 e8 3a d0 c5 ff <0f> 0b e9 69 ff ff ff 90 55 48 89 fd 53 48 89 f3 48 83 ec 28 48 83
      RSP: 0018:ffffc90000110ef8 EFLAGS: 00010292
      RAX: 0000000000000045 RBX: 7fffffffffffffff RCX: 0000000000000000
      RDX: 0000000000000045 RSI: ffffffff825be7a5 RDI: ffffffff825bc32c
      RBP: ffff8881b75eac38 R08: 000000042364b941 R09: 0000000000000045
      R10: ffffffff825beb40 R11: ffffffff825be78a R12: 0000607e46005aa0
      R13: ffff888107dcdd00 R14: 0000000000000000 R15: 0000000000000009
      FS:  0000000000000000(0000) GS:ffff8881b9d80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f49e6a5ea20 CR3: 00000001b747c004 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       rcu_core+0x1e4/0x4d0
       __do_softirq+0xdb/0x2f1
       irq_exit+0xa0/0xb0
       smp_apic_timer_interrupt+0x60/0x140
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      RIP: 0010:default_idle+0x23/0x170
      Code: ff eb ab cc cc cc cc 0f 1f 44 00 00 41 54 55 53 65 8b 2d 10 96 92 7e 0f 1f 44 00 00 e9 07 00 00 00 0f 00 2d 21 d0 51 00 fb f4 <65> 8b 2d f6 95 92 7e 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 e5 95
      
      Turns out that this is due to percpu_ref_switch_to_atomic() only
      grabbing a reference to the percpu refcount if it's not already in
      atomic mode. io_uring drops a ref and re-gets it when switching back to
      percpu mode. We attempt to protect against this with the FFD_F_ATOMIC
      bit, but that isn't reliable.
      
      We don't actually need to juggle these refcounts between atomic and
      percpu switch, we can just do them when we've switched to atomic mode.
      This removes the need for FFD_F_ATOMIC, which wasn't reliable.
      
      Fixes: 05f3fb3c ("io_uring: avoid ring quiesce for fixed file set unregister and update")
      Reported-by: NDan Melnic <dmm@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      dd3db2a3
  14. 26 2月, 2020 5 次提交