1. 25 Nov 2019, 23 commits
  2. 24 Nov 2019, 1 commit
  3. 23 Nov 2019, 3 commits
  4. 20 Nov 2019, 1 commit
    • afs: Fix missing timeout reset · c74386d5
      David Howells authored
      In afs_wait_for_call_to_complete(), rather than immediately aborting an
      operation if a signal occurs, the code attempts to wait for it to
      complete, using a schedule timeout of 2*RTT (or min 2 jiffies) and a
      check that we're still receiving relevant packets from the server before
      we consider aborting the call.  We may even ping the server to check on
      the status of the call.
      
      However, the timeout is not reset in the event that we do actually get
      a packet to process, so a couple of short stalls in a row can make the
      call time out even though progress is actually being made.
      
      Fix this by resetting the timeout any time we get something to process.
      If it's the failure of the call then the call state will get changed and
      we'll exit the loop shortly thereafter.
      
      A symptom of this is data fetches and stores failing with EINTR when
      they really shouldn't.
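      
      As an illustration, here is a minimal sketch of the wait-loop pattern
      (call_has_pending_work(), process_call_packets() and call_is_complete()
      are hypothetical stand-ins for the afs call-delivery machinery, not the
      exact kernel code):
      
      	for (;;) {
      		set_current_state(TASK_UNINTERRUPTIBLE);
      
      		if (call_has_pending_work(call)) {
      			__set_current_state(TASK_RUNNING);
      			process_call_packets(call);
      			timeout = rtt2;	/* the fix: restart the 2*RTT window */
      			continue;
      		}
      
      		if (call_is_complete(call))
      			break;
      
      		/* No progress: burn down what is left of the window. */
      		timeout = schedule_timeout(timeout);
      		if (!timeout)
      			break;	/* 2*RTT elapsed with no progress */
      	}
      	__set_current_state(TASK_RUNNING);
      
      Without the reset, timeout keeps shrinking across iterations, so several
      short stalls add up to a spurious expiry even while packets arrive.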
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c74386d5
  5. 16 Nov 2019, 1 commit
    • afs: Fix race in commit bulk status fetch · a28f239e
      David Howells authored
      When a lookup is done, the afs filesystem will perform a bulk status-fetch
      operation on the requested vnode (file) plus the next 49 other vnodes from
      the directory list (in AFS, directory contents are downloaded as blobs and
      parsed locally).  When the results are received, it will speculatively
      populate the inode cache from the extra data.
      
      However, if the lookup races with another lookup on the same directory,
      but for a different file - one that is among the 49 extra fetches - and
      the bulk status-fetch operation finishes first, it will try to update
      the inode belonging to the other lookup.
      
      If this other inode is still in the throes of being created, however, this
      will cause an assertion failure in afs_apply_status():
      
      	BUG_ON(test_bit(AFS_VNODE_UNSET, &vnode->flags));
      
      on or about fs/afs/inode.c:175, because it expects data to already be
      present that it can compare against.
      
      Fix this by skipping the update if the inode is being created as the
      creator will presumably set up the inode with the same information.
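      
      A minimal sketch of the guard (the flag and helper names follow the afs
      code, but the exact call signature here is an approximation):
      
      	vnode = AFS_FS_I(inode);
      	if (test_bit(AFS_VNODE_UNSET, &vnode->flags)) {
      		/* A racing lookup is still creating this inode; its
      		 * creator will populate it from its own status record,
      		 * so don't apply the speculative bulk-fetch result. */
      	} else {
      		afs_vnode_commit_status(fc, vnode, cb_break, NULL, scb);
      	}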
      
      Fixes: 39db9815 ("afs: Fix application of the results of a inline bulk status fetch")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a28f239e
  6. 15 Nov 2019, 2 commits
    • ceph: increment/decrement dio counter on async requests · 6a81749e
      Jeff Layton authored
      Ceph can in some cases issue an async DIO request, in which case we can
      end up calling ceph_end_io_direct before the I/O is actually complete.
      That may allow buffered operations to proceed while DIO requests are
      still in flight.
      
      Fix this by incrementing the i_dio_count when issuing an async DIO
      request, and decrementing it when tearing down the aio_req.
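      
      inode_dio_begin() and inode_dio_end() are the generic VFS helpers that
      manage this counter; a minimal sketch of how the fix pairs them (the
      placement comments approximate the ceph code):
      
      	/* When the DIO request goes async, before submission returns: */
      	inode_dio_begin(inode);	/* bump i_dio_count; buffered I/O must wait */
      
      	/* ... OSD requests complete asynchronously ... */
      
      	/* In the aio completion path, when tearing down the aio_req: */
      	inode_dio_end(inode);	/* drop i_dio_count, wake inode_dio_wait() */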
      
      Fixes: 321fe13c ("ceph: add buffered/direct exclusionary locking for reads and writes")
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      6a81749e
    • ceph: take the inode lock before acquiring cap refs · a81bc310
      Jeff Layton authored
      Most of the time, we (or the VFS layer) take the inode_lock and then
      acquire caps, but ceph_read_iter does the opposite, and that can lead
      to a deadlock.
      
      When there are multiple clients treading over the same data, we can end
      up in a situation where a reader takes caps and then tries to acquire
      the inode_lock. Another task holds the inode_lock and issues a request
      to the MDS which needs to revoke the caps, but that can't happen until
      the inode_lock is unwedged.
      
      Fix this by having ceph_read_iter take the inode_lock earlier, before
      attempting to acquire caps.
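      
      A minimal sketch of the reordering (argument lists elided; the helper
      names approximate the ceph code):
      
      	/* Before: ceph_get_caps(...) ran first and the inode lock was
      	 * taken afterwards, which deadlocks against an MDS revoke issued
      	 * on behalf of a task already holding the inode_lock. */
      
      	/* After: lock first, then take cap references. */
      	ceph_start_io_read(inode);	/* takes the inode lock */
      	ret = ceph_get_caps(...);	/* then acquire CEPH_CAP_FILE_RD caps */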
      
      Fixes: 321fe13c ("ceph: add buffered/direct exclusionary locking for reads and writes")
      Link: https://tracker.ceph.com/issues/36348
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      a81bc310
  7. 14 Nov 2019, 2 commits
  8. 12 Nov 2019, 2 commits
    • io_uring: make timeout sequence == 0 mean no sequence · 93bd25bb
      Jens Axboe authored
      Currently we make sequence == 0 be the same as sequence == 1, but that's
      not super useful if the intent is really to have a timeout that's just
      a pure timeout.
      
      If the user passes in sqe->off == 0, then don't apply any sequence logic
      to the request, let it purely be driven by the timeout specified.
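      
      A minimal sketch of the new semantics (REQ_F_TIMEOUT_NOSEQ is the flag
      the commit introduces; the surrounding structure is approximated):
      
      	count = READ_ONCE(sqe->off);
      	if (!count) {
      		/* Pure timeout, never tied to a completion count. */
      		req->flags |= REQ_F_TIMEOUT_NOSEQ;
      	} else {
      		/* Fires after 'count' completions or when the timer expires. */
      		req->sequence = ctx->cached_sq_head + count - 1;
      	}
      
      From userspace this corresponds to passing a count of 0 to liburing's
      io_uring_prep_timeout(), leaving expiry driven purely by the timespec.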
      Reported-by: 李通洲 <carter.li@eoitek.com>
      Reviewed-by: 李通洲 <carter.li@eoitek.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      93bd25bb
    • Btrfs: fix log context list corruption after rename exchange operation · e6c61710
      Filipe Manana authored
      During a rename exchange we might successfully log the new name in the
      source root's log tree, in which case we leave our log context
      (allocated on the stack) in the root's list of log contexts. However,
      we might then fail to log the new name in the destination root, in
      which case we fall back to a transaction commit later and never sync
      the log of the source root, so the source root's log context remains
      in the list of log contexts. This later causes invalid memory accesses,
      because the context was allocated on the stack and, after the rename
      exchange finishes, the stack gets reused and overwritten for other
      purposes.
      
      The kernel's linked list corruption detector (CONFIG_DEBUG_LIST=y) can
      detect this and report something like the following:
      
        [  691.489929] ------------[ cut here ]------------
        [  691.489947] list_add corruption. prev->next should be next (ffff88819c944530), but was ffff8881c23f7be4. (prev=ffff8881c23f7a38).
        [  691.489967] WARNING: CPU: 2 PID: 28933 at lib/list_debug.c:28 __list_add_valid+0x95/0xe0
        (...)
        [  691.489998] CPU: 2 PID: 28933 Comm: fsstress Not tainted 5.4.0-rc6-btrfs-next-62 #1
        [  691.490001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        [  691.490003] RIP: 0010:__list_add_valid+0x95/0xe0
        (...)
        [  691.490007] RSP: 0018:ffff8881f0b3faf8 EFLAGS: 00010282
        [  691.490010] RAX: 0000000000000000 RBX: ffff88819c944530 RCX: 0000000000000000
        [  691.490011] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa2c497e0
        [  691.490013] RBP: ffff8881f0b3fe68 R08: ffffed103eaa4115 R09: ffffed103eaa4114
        [  691.490015] R10: ffff88819c944000 R11: ffffed103eaa4115 R12: 7fffffffffffffff
        [  691.490016] R13: ffff8881b4035610 R14: ffff8881e7b84728 R15: 1ffff1103e167f7b
        [  691.490019] FS:  00007f4b25ea2e80(0000) GS:ffff8881f5500000(0000) knlGS:0000000000000000
        [  691.490021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [  691.490022] CR2: 00007fffbb2d4eec CR3: 00000001f2a4a004 CR4: 00000000003606e0
        [  691.490025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [  691.490027] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [  691.490029] Call Trace:
        [  691.490058]  btrfs_log_inode_parent+0x667/0x2730 [btrfs]
        [  691.490083]  ? join_transaction+0x24a/0xce0 [btrfs]
        [  691.490107]  ? btrfs_end_log_trans+0x80/0x80 [btrfs]
        [  691.490111]  ? dget_parent+0xb8/0x460
        [  691.490116]  ? lock_downgrade+0x6b0/0x6b0
        [  691.490121]  ? rwlock_bug.part.0+0x90/0x90
        [  691.490127]  ? do_raw_spin_unlock+0x142/0x220
        [  691.490151]  btrfs_log_dentry_safe+0x65/0x90 [btrfs]
        [  691.490172]  btrfs_sync_file+0x9f1/0xc00 [btrfs]
        [  691.490195]  ? btrfs_file_write_iter+0x1800/0x1800 [btrfs]
        [  691.490198]  ? rcu_read_lock_any_held.part.11+0x20/0x20
        [  691.490204]  ? __do_sys_newstat+0x88/0xd0
        [  691.490207]  ? cp_new_stat+0x5d0/0x5d0
        [  691.490218]  ? do_fsync+0x38/0x60
        [  691.490220]  do_fsync+0x38/0x60
        [  691.490224]  __x64_sys_fdatasync+0x32/0x40
        [  691.490228]  do_syscall_64+0x9f/0x540
        [  691.490233]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [  691.490235] RIP: 0033:0x7f4b253ad5f0
        (...)
        [  691.490239] RSP: 002b:00007fffbb2d6078 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
        [  691.490242] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f4b253ad5f0
        [  691.490244] RDX: 00007fffbb2d5fe0 RSI: 00007fffbb2d5fe0 RDI: 0000000000000003
        [  691.490245] RBP: 000000000000000d R08: 0000000000000001 R09: 00007fffbb2d608c
        [  691.490247] R10: 00000000000002e8 R11: 0000000000000246 R12: 00000000000001f4
        [  691.490248] R13: 0000000051eb851f R14: 00007fffbb2d6120 R15: 00005635a498bda0
      
      This started happening recently when running some test cases from
      fstests, such as btrfs/004, because support for rename exchange was
      added to fsstress (part of fstests) last week.
      
      So fix this by deleting the log context for the source root from the list
      if we have logged the new name in the source root.
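      
      A minimal sketch of the cleanup (modeled on the fix; the helper shape
      and the variable names sync_log_root/ctx_root are approximations):
      
      	static inline void btrfs_remove_log_ctx(struct btrfs_root *root,
      						struct btrfs_log_ctx *ctx)
      	{
      		mutex_lock(&root->log_mutex);
      		list_del_init(&ctx->list);	/* ctx is on the caller's stack */
      		mutex_unlock(&root->log_mutex);
      	}
      
      	/* On the fallback path of the rename exchange: the new name was
      	 * logged in the source root, but we are committing the transaction
      	 * instead of syncing the log, so unlink the on-stack context. */
      	if (sync_log_root && commit_transaction)
      		btrfs_remove_log_ctx(root, &ctx_root);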
      Reported-by: Su Yue <Damenly_Su@gmx.com>
      Fixes: d4682ba0 ("Btrfs: sync log after logging new name")
      CC: stable@vger.kernel.org # 4.19+
      Tested-by: Su Yue <Damenly_Su@gmx.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      e6c61710
  9. 11 Nov 2019, 4 commits
    • ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either · 762c6968
      Al Viro authored
      We need to get the underlying dentry of the parent; sure, absent races
      it is the parent of the underlying dentry, but there's nothing to
      prevent losing a timeslice to preemption in the middle of evaluating
      lower_dentry->d_parent->d_inode, having another process move
      lower_dentry around, and having its (ex-)parent no longer pinned and
      freed under memory pressure.  Then we regain the CPU and try to fetch
      ->d_inode from memory that has been freed by that point.
      
      dentry->d_parent *is* stable here - it's an argument of ->lookup() and
      we are guaranteed that it won't be moved anywhere until we feed it
      to d_add/d_splice_alias.  So we safely go that way to get to its
      underlying dentry.
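      
      A minimal sketch of the change (the shapes approximate the ecryptfs
      code):
      
      	/* Racy: lower_dentry may be moved concurrently, so its ->d_parent
      	 * can become unpinned and freed between these dereferences. */
      	lower_dir_inode = ecryptfs_inode_to_lower(lower_dentry->d_parent->d_inode);
      
      	/* Safe: dentry->d_parent is the ->lookup() argument and cannot
      	 * move until d_add()/d_splice_alias(), so go through it instead. */
      	lower_dir_inode = ecryptfs_inode_to_lower(d_inode(dentry->d_parent));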
      
      Cc: stable@vger.kernel.org # since 2009 or so
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      762c6968
    • ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable · e72b9dd6
      Al Viro authored
      lower_dentry can't go from positive to negative (we have it pinned),
      but it *can* go from negative to positive.  So fetching ->d_inode
      into a local variable, doing a blocking allocation, checking that
      now ->d_inode is non-NULL and feeding the value we'd fetched
      earlier to a function that won't accept NULL is not a good idea.
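      
      A minimal sketch of the hazard and the fix (names approximate the
      ecryptfs code):
      
      	struct inode *lower_inode = d_inode(lower_dentry); /* may be NULL */
      
      	/* ... blocking allocation: lower_dentry can go positive here ... */
      
      	if (!d_is_negative(lower_dentry))	/* ->d_inode is non-NULL now */
      		inode = __ecryptfs_get_inode(lower_inode, sb); /* stale NULL! */
      
      	/* The fix: fetch ->d_inode at the point of use instead. */
      	if (!d_is_negative(lower_dentry))
      		inode = __ecryptfs_get_inode(d_inode(lower_dentry), sb);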
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      e72b9dd6
    • ecryptfs: fix unlink and rmdir in face of underlying fs modifications · bcf0d9d4
      Al Viro authored
      A problem similar to the one caught in commit 74dd7c97 ("ecryptfs_rename():
      verify that lower dentries are still OK after lock_rename()") exists for
      unlink/rmdir as well.
      
      Instead of playing with dget_parent() of the underlying dentry of the
      victim and hoping it's the same as the underlying dentry of our
      directory, do the following:
              * find the underlying dentry of the victim
              * find the underlying directory of the victim's parent (stable,
                since the victim is an ecryptfs dentry and the inode of its
                parent is held exclusive by the caller)
              * lock the inode of the dentry underlying the victim's parent
              * check that the underlying dentry of the victim is still
                hashed and has the right parent - it can be moved, but it
                can't be moved to/from the directory we are holding
                exclusive, so while ->d_parent itself might not be stable,
                the result of the comparison is
      
      If the check passes, everything is fine - the underlying directory is
      locked, the underlying victim is still a child of that directory, and
      we can go ahead and feed them to vfs_unlink().  As in current mainline,
      we need to pin the underlying dentry of the victim so that it won't go
      negative under us, but that's the only temporary reference that needs
      to be grabbed there.  The underlying dentry of the parent won't go away
      (it's pinned by the parent, which is held by the caller), so there's
      no need to grab it.
      
      The same problem (with the same solution) exists for rmdir.  Moreover,
      rename gets simpler and more robust with the same "don't bother with
      dget_parent()" approach.
      
      Fixes: 74dd7c97 ("ecryptfs_rename(): verify that lower dentries are still OK after lock_rename()")
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      bcf0d9d4
  10. 09 Nov 2019, 1 commit