1. 21 Jun, 2020: 2 commits
    • afs: Fix hang on rmmod due to outstanding timer · 5481fc6e
      David Howells committed
      The fileserver probe timer, net->fs_probe_timer, isn't cancelled when
      the kafs module is being removed and so the count it holds on
      net->servers_outstanding doesn't get dropped.
      
      This causes rmmod to wait forever.  The hung process shows a stack like:
      
      	afs_purge_servers+0x1b5/0x23c [kafs]
      	afs_net_exit+0x44/0x6e [kafs]
      	ops_exit_list+0x72/0x93
      	unregister_pernet_operations+0x14c/0x1ba
      	unregister_pernet_subsys+0x1d/0x2a
      	afs_exit+0x29/0x6f [kafs]
      	__do_sys_delete_module.isra.0+0x1a2/0x24b
      	do_syscall_64+0x51/0x95
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fix this by:
      
       (1) Attempting to cancel the probe timer and, if successful, drop the
           count that the timer was holding.
      
       (2) Making the timer function just drop the count and not schedule the
           prober if the afs portion of net namespace is being destroyed.
      
      Also, whilst we're at it, make the following changes:
      
       (3) Initialise net->servers_outstanding to 1 and decrement it before
           waiting on it so that it doesn't generate wake up events by being
           decremented to 0 until we're cleaning up.
      
       (4) Switch the atomic_dec() on ->servers_outstanding for ->fs_timer in
           afs_purge_servers() to use the helper function for that.
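
      The counting scheme in (1)-(3) can be modelled in userspace C. This is a
      minimal sketch under stated assumptions, not the kafs code: struct
      net_state, its fields and every function below are hypothetical, and the
      timer and prober are reduced to flags so that only the reference
      counting is shown.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-namespace AFS state. */
struct net_state {
	atomic_int servers_outstanding;	/* starts at 1: a bias held by the namespace */
	atomic_bool live;		/* cleared once teardown begins */
	bool timer_pending;		/* models an armed fs_probe_timer */
	bool prober_queued;		/* models a queued prober work item */
	int wakeups;			/* number of "count reached 0" events */
};

static void net_init(struct net_state *n)
{
	atomic_init(&n->servers_outstanding, 1);
	atomic_init(&n->live, true);
	n->timer_pending = false;
	n->prober_queued = false;
	n->wakeups = 0;
}

/* Drop one count; only the transition to 0 generates a wake-up. */
static void dec_outstanding(struct net_state *n)
{
	if (atomic_fetch_sub(&n->servers_outstanding, 1) == 1)
		n->wakeups++;
}

/* Arming the timer takes a count on the timer's behalf. */
static void arm_probe_timer(struct net_state *n)
{
	atomic_fetch_add(&n->servers_outstanding, 1);
	n->timer_pending = true;
}

/* Timer expiry: hand the count to the prober, unless teardown has begun,
 * in which case just drop it (point 2). */
static void probe_timer_fired(struct net_state *n)
{
	n->timer_pending = false;
	if (!atomic_load(&n->live)) {
		dec_outstanding(n);
		return;
	}
	n->prober_queued = true;
}

/* The prober finishes and releases the count it inherited. */
static void prober_done(struct net_state *n)
{
	n->prober_queued = false;
	dec_outstanding(n);
}

/* Teardown: cancel the timer and, if that succeeded, drop its count
 * (point 1); then drop the initial bias (point 3). Returns the remaining
 * count, which a real implementation would wait on until it reaches 0. */
static int net_exit(struct net_state *n)
{
	atomic_store(&n->live, false);
	if (n->timer_pending) {		/* models a successful timer cancel */
		n->timer_pending = false;
		dec_outstanding(n);
	}
	dec_outstanding(n);
	return atomic_load(&n->servers_outstanding);
}
```

      With the bias of 1 held by the namespace, the count cannot reach 0 (and
      cause a wake-up) during normal operation, only after net_exit() drops it.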
      
      Fixes: f6cbb368 ("afs: Actively poll fileservers to maintain NAT or firewall openings")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • afs: Fix afs_do_lookup() to call correct fetch-status op variant · f8ea5c7b
      David Howells committed
      Fix afs_do_lookup()'s fallback case for when FS.InlineBulkStatus isn't
      supported by the server.
      
      In the fallback, it calls FS.FetchStatus for the specific vnode it's
      meant to be looking up.  Commit b6489a49 broke this by renaming one
      of the two identically-named afs_fetch_status_operation descriptors to
      something else so that one of them could be made non-static.  The site
      that used the renamed one, however, wasn't renamed and didn't produce
      any warning because the other was declared in a header.
      
      Fix this by making afs_do_lookup() use the renamed variant.
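
      The renaming pitfall described above can be reproduced in a single C
      file. This is an illustrative sketch only; struct op_desc and all three
      identifiers are hypothetical stand-ins, not the afs descriptors. The
      unfixed call site compiles without any warning because the old name is
      still visible through the extern declaration.

```c
/* Hypothetical descriptor type, reduced to a name for illustration. */
struct op_desc {
	const char *name;
};

/* What the header exposes after the rename: the non-static descriptor. */
extern const struct op_desc fetch_status_operation;

/* The renamed, file-local descriptor intended for the lookup path. */
static const struct op_desc lookup_fetch_status_operation = {
	"lookup variant"
};

/* Call site left unchanged by the rename: it still compiles, because the
 * old name now resolves to the header-declared descriptor. */
static const struct op_desc *pick_op_unfixed(void)
{
	return &fetch_status_operation;
}

/* Call site after the fix: it names the renamed variant explicitly. */
static const struct op_desc *pick_op_fixed(void)
{
	return &lookup_fetch_status_operation;
}

/* Definition that would normally live in another source file. */
const struct op_desc fetch_status_operation = { "iget variant" };
```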
      
      Note that there are two variants of the success method because one is
      called from ->lookup() where we may or may not have an inode, but can't
      call iget until after we've talked to the server - whereas the other is
      called from within iget where we have an inode, but it may or may not be
      initialised.
      
      The latter variant expects there to be an inode, but because it's being
      called from the former case, there might not be - resulting in an oops
      like the following:
      
        BUG: kernel NULL pointer dereference, address: 00000000000000b0
        ...
        RIP: 0010:afs_fetch_status_success+0x27/0x7e
        ...
        Call Trace:
          afs_wait_for_operation+0xda/0x234
          afs_do_lookup+0x2fe/0x3c1
          afs_lookup+0x3c5/0x4bd
          __lookup_slow+0xcd/0x10f
          walk_component+0xa2/0x10c
          path_lookupat.isra.0+0x80/0x110
          filename_lookup+0x81/0x104
          vfs_statx+0x76/0x109
          __do_sys_newlstat+0x39/0x6b
          do_syscall_64+0x4c/0x78
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: b6489a49 ("afs: Fix silly rename")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 18 Jun, 2020: 7 commits
  3. 17 Jun, 2020: 3 commits
    • proc/bootconfig: Fix to use correct quotes for value · 4e264ffd
      Masami Hiramatsu committed
      Fix /proc/bootconfig to select double or single quotes
      correctly according to the value.
      
      If a bootconfig value includes a double quote character,
      we must use single-quotes to quote that value.
      
      This modifies the if() condition and its blocks in two places so
      that single values also go through the double-quote check. This
      is safe because xbc_array_for_each_value() also handles an array
      that has only a single node correctly. Thus,
      
      if (vnode && xbc_node_is_array(vnode)) {
      	xbc_array_for_each_value(vnode)	/* vnode->next != NULL */
      		...
      } else {
      	snprintf(val); /* val is an empty string if !vnode */
      }
      
      is equivalent to
      
      if (vnode) {
      	xbc_array_for_each_value(vnode)	/* vnode->next can be NULL */
      		...
      } else {
      	snprintf("");	/* value is always empty */
      }
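
      The quoting rule and the single-node equivalence can be sketched as
      follows. This is a hypothetical userspace model, not the
      /proc/bootconfig code: struct val_node, pick_quote() and format_kv()
      are assumptions, and a plain linked list stands in for the XBC node
      tree. One loop serves both single values and arrays.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical value node: an array is a singly linked list, and a
 * scalar value is simply a list with one node. */
struct val_node {
	const char *data;
	const struct val_node *next;
};

/* A value containing a double quote must be wrapped in single quotes. */
static char pick_quote(const char *val)
{
	return strchr(val, '"') ? '\'' : '"';
}

/* Format "key = v1, v2, ..." into buf. A NULL vnode yields an empty
 * quoted value; otherwise every node gets its own quote choice. */
static int format_kv(char *buf, size_t size, const char *key,
		     const struct val_node *vnode)
{
	int len = snprintf(buf, size, "%s = ", key);

	if (len < 0 || (size_t)len >= size)
		return len;		/* error or truncated */
	if (!vnode)
		return len + snprintf(buf + len, size - len, "\"\"");
	for (; vnode; vnode = vnode->next) {
		char q = pick_quote(vnode->data);
		int ret = snprintf(buf + len, size - len, "%c%s%c%s", q,
				   vnode->data, q, vnode->next ? ", " : "");

		if (ret < 0)
			return ret;
		len += ret;
		if ((size_t)len >= size)
			break;		/* output truncated */
	}
	return len;
}
```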
      
      Link: http://lkml.kernel.org/r/159230244786.65555.3763894451251622488.stgit@devnote2
      
      Cc: stable@vger.kernel.org
      Fixes: c1a3c360 ("proc: bootconfig: Add /proc/bootconfig to show boot config list")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • afs: Fix silly rename · b6489a49
      David Howells committed
      Fix AFS's silly rename by the following means:
      
       (1) Set the destination directory in afs_do_silly_rename() so as to avoid
           misbehaviour and indicate that the directory data version will
           increment by 1 so as to avoid warnings about unexpected changes in the
           DV.  Also indicate that the ctime should be updated to avoid xfstest
           grumbling.
      
       (2) Note when the server indicates that a directory changed more than we
           expected (AFS_OPERATION_DIR_CONFLICT), indicating a conflict with a
           third party change, checking on successful completion of unlink and
           rename.
      
           The problem is that the FS.RemoveFile RPC op doesn't report the status
           of the unlinked file, though YFS.RemoveFile2 does.  This can be
           mitigated by the assumption that if the directory DV cranked by
           exactly 1, we can be sure we removed one link from the file; further,
           ordinarily in AFS, files cannot be hardlinked across directories, so
           if we reduce nlink to 0, the file is deleted.
      
           However, if the directory DV jumps by more than 1, we cannot know if a
           third party intervened by adding or removing a link on the file we
           just removed a link from.
      
           The same also goes for any vnode that is at the destination of the
           FS.Rename RPC op.
      
       (3) Make afs_vnode_commit_status() apply the nlink drop inside the cb_lock
           section along with the other attribute updates if ->op_unlinked is set
           on the descriptor for the appropriate vnode.
      
       (4) Issue a follow-up status fetch to the unlinked file in the event of a
           third party conflict that makes it impossible for us to know if we
           actually deleted the file or not.
      
       (5) Provide a flag, AFS_VNODE_SILLY_DELETED, to make afs_getattr() lie to
           the user about the nlink of a silly deleted file so that it appears as
           0, not 1.
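
      The reasoning in (2) and (4) about the directory data version can be
      expressed as a small decision function. This is a sketch under
      assumptions: the names are hypothetical, the DV is treated as a plain
      64-bit counter, and nlink is a cached integer rather than inode state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes of an unlink, judged by the directory DV. */
enum unlink_outcome {
	UNLINK_NLINK_KNOWN,	/* DV moved by exactly 1: only our change happened */
	UNLINK_DIR_CONFLICT,	/* DV jumped further: a third party intervened */
};

static enum unlink_outcome classify_dir_change(uint64_t dv_before,
					       uint64_t dv_after)
{
	return dv_after == dv_before + 1 ? UNLINK_NLINK_KNOWN
					  : UNLINK_DIR_CONFLICT;
}

/* Apply the unlink result to a cached nlink. Returns true when a
 * follow-up status fetch is needed because the outcome is unknown. */
static bool apply_unlink(uint64_t dv_before, uint64_t dv_after,
			 unsigned int *nlink)
{
	if (classify_dir_change(dv_before, dv_after) == UNLINK_DIR_CONFLICT)
		return true;		/* cannot tell: refetch the file's status */
	if (*nlink > 0)
		(*nlink)--;		/* we removed exactly one link */
	return false;
}
```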
      
      Found with the generic/035 and generic/084 xfstests.
      
      Fixes: e49c7b2f ("afs: Build an abstraction around an "operation" concept")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • block: Fix use-after-free in blkdev_get() · 2d3a8e2d
      Jason Yan committed
      In blkdev_get() we call __blkdev_get() to do some internal jobs, and if
      there are errors in __blkdev_get(), bdput() is called, which
      means we have released the refcount of the bdev (actually the refcount of
      the bdev inode). This means we cannot access bdev after that point. But
      bdev is actually still accessed in blkdev_get() after calling
      __blkdev_get(). This results in a use-after-free if the reference
      released in __blkdev_get() was the last one. Let's take a look at the
      following scenario:
      
        CPU0            CPU1                    CPU2
      blkdev_open     blkdev_open           Remove disk
                        bd_acquire
      		  blkdev_get
      		    __blkdev_get      del_gendisk
      					bdev_unhash_inode
        bd_acquire          bdev_get_gendisk
          bd_forget           failed because of unhashed
      	  bdput
      	              bdput (the last one)
      		        bdev_evict_inode
      
      	  	    access bdev => use after free
      
      [  459.350216] BUG: KASAN: use-after-free in __lock_acquire+0x24c1/0x31b0
      [  459.351190] Read of size 8 at addr ffff88806c815a80 by task syz-executor.0/20132
      [  459.352347]
      [  459.352594] CPU: 0 PID: 20132 Comm: syz-executor.0 Not tainted 4.19.90 #2
      [  459.353628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [  459.354947] Call Trace:
      [  459.355337]  dump_stack+0x111/0x19e
      [  459.355879]  ? __lock_acquire+0x24c1/0x31b0
      [  459.356523]  print_address_description+0x60/0x223
      [  459.357248]  ? __lock_acquire+0x24c1/0x31b0
      [  459.357887]  kasan_report.cold+0xae/0x2d8
      [  459.358503]  __lock_acquire+0x24c1/0x31b0
      [  459.359120]  ? _raw_spin_unlock_irq+0x24/0x40
      [  459.359784]  ? lockdep_hardirqs_on+0x37b/0x580
      [  459.360465]  ? _raw_spin_unlock_irq+0x24/0x40
      [  459.361123]  ? finish_task_switch+0x125/0x600
      [  459.361812]  ? finish_task_switch+0xee/0x600
      [  459.362471]  ? mark_held_locks+0xf0/0xf0
      [  459.363108]  ? __schedule+0x96f/0x21d0
      [  459.363716]  lock_acquire+0x111/0x320
      [  459.364285]  ? blkdev_get+0xce/0xbe0
      [  459.364846]  ? blkdev_get+0xce/0xbe0
      [  459.365390]  __mutex_lock+0xf9/0x12a0
      [  459.365948]  ? blkdev_get+0xce/0xbe0
      [  459.366493]  ? bdev_evict_inode+0x1f0/0x1f0
      [  459.367130]  ? blkdev_get+0xce/0xbe0
      [  459.367678]  ? destroy_inode+0xbc/0x110
      [  459.368261]  ? mutex_trylock+0x1a0/0x1a0
      [  459.368867]  ? __blkdev_get+0x3e6/0x1280
      [  459.369463]  ? bdev_disk_changed+0x1d0/0x1d0
      [  459.370114]  ? blkdev_get+0xce/0xbe0
      [  459.370656]  blkdev_get+0xce/0xbe0
      [  459.371178]  ? find_held_lock+0x2c/0x110
      [  459.371774]  ? __blkdev_get+0x1280/0x1280
      [  459.372383]  ? lock_downgrade+0x680/0x680
      [  459.373002]  ? lock_acquire+0x111/0x320
      [  459.373587]  ? bd_acquire+0x21/0x2c0
      [  459.374134]  ? do_raw_spin_unlock+0x4f/0x250
      [  459.374780]  blkdev_open+0x202/0x290
      [  459.375325]  do_dentry_open+0x49e/0x1050
      [  459.375924]  ? blkdev_get_by_dev+0x70/0x70
      [  459.376543]  ? __x64_sys_fchdir+0x1f0/0x1f0
      [  459.377192]  ? inode_permission+0xbe/0x3a0
      [  459.377818]  path_openat+0x148c/0x3f50
      [  459.378392]  ? kmem_cache_alloc+0xd5/0x280
      [  459.379016]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  459.379802]  ? path_lookupat.isra.0+0x900/0x900
      [  459.380489]  ? __lock_is_held+0xad/0x140
      [  459.381093]  do_filp_open+0x1a1/0x280
      [  459.381654]  ? may_open_dev+0xf0/0xf0
      [  459.382214]  ? find_held_lock+0x2c/0x110
      [  459.382816]  ? lock_downgrade+0x680/0x680
      [  459.383425]  ? __lock_is_held+0xad/0x140
      [  459.384024]  ? do_raw_spin_unlock+0x4f/0x250
      [  459.384668]  ? _raw_spin_unlock+0x1f/0x30
      [  459.385280]  ? __alloc_fd+0x448/0x560
      [  459.385841]  do_sys_open+0x3c3/0x500
      [  459.386386]  ? filp_open+0x70/0x70
      [  459.386911]  ? trace_hardirqs_on_thunk+0x1a/0x1c
      [  459.387610]  ? trace_hardirqs_off_caller+0x55/0x1c0
      [  459.388342]  ? do_syscall_64+0x1a/0x520
      [  459.388930]  do_syscall_64+0xc3/0x520
      [  459.389490]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  459.390248] RIP: 0033:0x416211
      [  459.390720] Code: 75 14 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83
      04 19 00 00 c3 48 83 ec 08 e8 0a fa ff ff 48 89 04 24 b8 02 00 00 00 0f
         05 <48> 8b 3c 24 48 89 c2 e8 53 fa ff ff 48 89 d0 48 83 c4 08 48 3d
            01
      [  459.393483] RSP: 002b:00007fe45dfe9a60 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
      [  459.394610] RAX: ffffffffffffffda RBX: 00007fe45dfea6d4 RCX: 0000000000416211
      [  459.395678] RDX: 00007fe45dfe9b0a RSI: 0000000000000002 RDI: 00007fe45dfe9b00
      [  459.396758] RBP: 000000000076bf20 R08: 0000000000000000 R09: 000000000000000a
      [  459.397930] R10: 0000000000000075 R11: 0000000000000293 R12: 00000000ffffffff
      [  459.399022] R13: 0000000000000bd9 R14: 00000000004cdb80 R15: 000000000076bf2c
      [  459.400168]
      [  459.400430] Allocated by task 20132:
      [  459.401038]  kasan_kmalloc+0xbf/0xe0
      [  459.401652]  kmem_cache_alloc+0xd5/0x280
      [  459.402330]  bdev_alloc_inode+0x18/0x40
      [  459.402970]  alloc_inode+0x5f/0x180
      [  459.403510]  iget5_locked+0x57/0xd0
      [  459.404095]  bdget+0x94/0x4e0
      [  459.404607]  bd_acquire+0xfa/0x2c0
      [  459.405113]  blkdev_open+0x110/0x290
      [  459.405702]  do_dentry_open+0x49e/0x1050
      [  459.406340]  path_openat+0x148c/0x3f50
      [  459.406926]  do_filp_open+0x1a1/0x280
      [  459.407471]  do_sys_open+0x3c3/0x500
      [  459.408010]  do_syscall_64+0xc3/0x520
      [  459.408572]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  459.409415]
      [  459.409679] Freed by task 1262:
      [  459.410212]  __kasan_slab_free+0x129/0x170
      [  459.410919]  kmem_cache_free+0xb2/0x2a0
      [  459.411564]  rcu_process_callbacks+0xbb2/0x2320
      [  459.412318]  __do_softirq+0x225/0x8ac
      
      Fix this by delaying bdput() to the end of blkdev_get() which means we
      have finished accessing bdev.
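
      The ordering rule behind this fix (finish every access, then drop the
      reference) can be sketched with a hypothetical refcounted object in
      userspace C. struct obj, obj_put(), inner_get() and outer_get() are
      illustrative stand-ins for the bdev inode, bdput(), __blkdev_get() and
      blkdev_get(), not their real implementations.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for the bdev inode. */
struct obj {
	int refs;
	int opens;
	int last_result;
};

static int objects_freed;

static void obj_put(struct obj *o)
{
	if (--o->refs == 0) {
		free(o);
		objects_freed++;
	}
}

/* Inner helper: on failure it reports the error but no longer drops
 * the caller's reference itself. */
static int inner_get(struct obj *o, int fail)
{
	if (fail)
		return -1;
	o->opens++;
	return 0;
}

/* Outer helper: every access to o happens before the final put. */
static int outer_get(struct obj *o, int fail)
{
	int ret = inner_get(o, fail);

	o->last_result = ret;	/* still safe: our reference is held */
	if (ret)
		obj_put(o);	/* dropped only after the last access */
	return ret;
}
```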
      
      Fixes: 77ea887e ("implement in-kernel gendisk events handling")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 16 Jun, 2020: 8 commits
  5. 15 Jun, 2020: 13 commits
    • io_uring: cancel by ->task not pid · 801dd57b
      Pavel Begunkov committed
      For an exiting process, it tries to cancel all of its inflight requests.
      Use req->task to match them instead of work.pid. req->task is always
      set, and it will be valid because we only match against the current,
      exiting task.
      
      Also, remove work.pid and everything related, it's useless now.
      Reported-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: lazy get task · 4dd2824d
      Pavel Begunkov committed
      There will be multiple places where req->task is used, so refcount-pin
      it lazily with introduced *io_{get,put}_req_task(). We need to always
      have valid ->task for cancellation reasons, but don't care about pinning
      it in some cases. That's why it sets req->task in io_req_init() and
      implements get/put laziness with a flag.
      
      This also removes using @current from polling io_arm_poll_handler(),
      etc., but doesn't change observable behaviour.
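
      The lazy pinning described above can be modelled with a flag next to
      the task pointer. In this hypothetical sketch, struct task, struct req,
      req_get_task() and req_put_task() only loosely mirror
      io_{get,put}_req_task(); a plain int stands in for the task's refcount.

```c
#include <stdbool.h>

/* Hypothetical task with a plain reference count. */
struct task {
	int usage;
};

/* Hypothetical request: the task is always recorded, pinned on demand. */
struct req {
	struct task *task;	/* always valid, used for cancellation matching */
	bool task_pinned;	/* true once a reference has actually been taken */
};

static void req_init(struct req *r, struct task *current_task)
{
	r->task = current_task;	/* recorded, but not pinned yet */
	r->task_pinned = false;
}

/* Take a reference only the first time some path really needs one. */
static void req_get_task(struct req *r)
{
	if (!r->task_pinned) {
		r->task->usage++;
		r->task_pinned = true;
	}
}

/* Drop the reference only if one was taken. */
static void req_put_task(struct req *r)
{
	if (r->task_pinned) {
		r->task->usage--;
		r->task_pinned = false;
	}
}
```
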
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: batch cancel in io_uring_cancel_files() · 67c4d9e6
      Pavel Begunkov committed
      Instead of waiting for each request one by one, first try to cancel all
      of them in a batched manner, and then go over inflight_list/etc to reap
      leftovers.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: cancel all task's requests on exit · 44e728b8
      Pavel Begunkov committed
      If a process is going away, io_uring_flush() will cancel only 1
      request with a matching pid. Cancel all of them.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: add an option to cancel all matched reqs · 4f26bda1
      Pavel Begunkov committed
      This adds support for cancelling all io-wq works matching a predicate.
      It isn't used yet, so no change in observable behaviour.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: reorder cancellation pending -> running · f4c2665e
      Pavel Begunkov committed
      Go over all pending lists and cancel works there, and only then
      try to match running requests. No functional changes here, just a
      preparation for bulk cancellation.
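
      Predicate-based bulk cancellation that sweeps pending work before
      running work (as in this commit and 4f26bda1), matching on the owning
      task rather than a pid (as in 801dd57b), can be sketched as below. All
      types and names are hypothetical; a plain array stands in for the io-wq
      lists, and running work is only flagged, not interrupted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item states and layout. */
enum work_state { WORK_PENDING, WORK_RUNNING, WORK_DONE, WORK_CANCELLED };

struct work {
	const void *task;	/* owning task, used for matching */
	enum work_state state;
	bool cancel_requested;	/* running work can only be asked to stop */
};

typedef bool (*match_fn)(const struct work *w, const void *data);

/* Example predicate: match by owning task pointer, not by pid. */
static bool match_task(const struct work *w, const void *task)
{
	return w->task == task;
}

/* Cancel every matching item: first sweep the pending ones, which can be
 * cancelled outright, then signal the running ones. Returns the number of
 * items affected. */
static size_t cancel_all_matching(struct work *works, size_t n,
				  match_fn match, const void *data)
{
	size_t affected = 0;

	for (size_t i = 0; i < n; i++) {
		if (works[i].state == WORK_PENDING && match(&works[i], data)) {
			works[i].state = WORK_CANCELLED;
			affected++;
		}
	}
	for (size_t i = 0; i < n; i++) {
		if (works[i].state == WORK_RUNNING && match(&works[i], data)) {
			works[i].cancel_requested = true;
			affected++;
		}
	}
	return affected;
}
```
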
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • afs: Fix the mapping of the UAEOVERFLOW abort code · 4ec89596
      David Howells committed
      Abort code UAEOVERFLOW is returned when we try and set a time that's out of
      range, but it's currently mapped to EREMOTEIO by the default case.
      
      Fix UAEOVERFLOW to map instead to EOVERFLOW.
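
      A minimal sketch of the mapping, assuming hypothetical symbolic abort
      codes (the numeric UAE table values are not reproduced here) and
      returning negative errno values in kernel style:

```c
#include <errno.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value, for platforms whose errno.h lacks it */
#endif

/* Hypothetical symbolic abort codes. */
enum abort_code {
	ABORT_UAEOVERFLOW,
	ABORT_OTHER,
};

static int abort_to_error(enum abort_code abort_code)
{
	switch (abort_code) {
	case ABORT_UAEOVERFLOW:
		return -EOVERFLOW;	/* e.g. a time value out of range */
	default:
		return -EREMOTEIO;	/* unrecognised remote failure */
	}
}
```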
      
      Found with the generic/258 xfstest.  Note that the test is wrong as it
      assumes that the filesystem will support a pre-UNIX-epoch date.
      
      Fixes: 1eda8bab ("afs: Add support for the UAE error table")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix truncation issues and mmap writeback size · 793fe82e
      David Howells committed
      Fix the following issues:
      
       (1) Fix writeback to reduce the size of a store operation to i_size,
           effectively discarding the extra data.
      
           The problem comes when afs_page_mkwrite() records that a page is about
           to be modified by mmap().  It doesn't know what bits of the page are
           going to be modified, so it records the whole page as being dirty
           (this is stored in page->private as start and end offsets).
      
           Without this, the marshalling for the store to the server extends the
           size of the file to the end of the page (in afs_fs_store_data() and
           yfs_fs_store_data()).
      
       (2) Fix setattr to actually truncate the pagecache, thereby clearing
           the discarded part of a file.
      
       (3) Fix setattr to check that the new size is okay and to disable
           ATTR_SIZE if i_size wouldn't change.
      
       (4) Force i_size to be updated as the result of a truncate.
      
       (5) Don't truncate if ATTR_SIZE is not set.
      
       (6) Call pagecache_isize_extended() if the file was enlarged.
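
      Points (1) and (3) reduce to simple arithmetic on the dirty range and
      the file size. The sketch below is an assumption-labelled model, not
      afs_fs_store_data(): clamp_store_len() and size_change_needed() are
      hypothetical helpers operating on byte offsets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clamp a dirty range [start, end) recorded for a page to the current
 * file size, so a store never extends the file past i_size. Returns the
 * number of bytes to send (0 if the whole range lies beyond EOF). */
static uint64_t clamp_store_len(uint64_t start, uint64_t end, uint64_t i_size)
{
	if (start >= i_size)
		return 0;
	if (end > i_size)
		end = i_size;
	return end - start;
}

/* A size change is only worth issuing if ATTR_SIZE was requested and the
 * size would actually differ. */
static bool size_change_needed(bool attr_size_set, uint64_t new_size,
			       uint64_t i_size)
{
	return attr_size_set && new_size != i_size;
}
```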
      
      Note that truncate_set_size() isn't used because the setting of i_size is
      done inside afs_vnode_commit_status() under the vnode->cb_lock.
      
      Found with the generic/029 and generic/393 xfstests.
      
      Fixes: 31143d5d ("AFS: implement basic file write support")
      Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Concoct ctimes · da8d0755
      David Howells committed
      The in-kernel afs filesystem ignores ctime because the AFS fileserver
      protocol doesn't support ctimes.  This, however, causes various xfstests to
      fail.
      
      Work around this by:
      
       (1) Setting ctime to attr->ia_ctime in afs_setattr().
      
       (2) Not ignoring ATTR_MTIME_SET, ATTR_TIMES_SET and ATTR_TOUCH settings.
      
       (3) Setting the ctime on the target file from the server mtime when
           creating a hard link to it.
      
       (4) Setting the ctime on directories from their revised mtimes when
           renaming/moving a file.
      
      Found by the generic/221 and generic/309 xfstests.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix EOF corruption · 3f4aa981
      David Howells committed
      When doing a partial writeback, afs_write_back_from_locked_page() may
      generate an FS.StoreData RPC request that writes out part of a file when a
      file has been constructed from pieces by doing seek, write, seek, write,
      ... as is done by ld.
      
      The FS.StoreData RPC is given the current i_size as the file length, but
      the server basically ignores it unless the data length is 0 (in which case
      it's just a truncate operation).  The revised file length returned in the
      result of the RPC may then not reflect what we suggested - and this leads
      to i_size getting moved backwards - which causes issues later.
      
      Fix the client to take account of this by ignoring the returned file size
      unless the data version number jumped unexpectedly - in which case we're
      going to have to clear the pagecache and reload anyway.
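
      The size-update rule can be sketched as a pure function. The name
      update_i_size() and its parameters are hypothetical; in this model the
      expected data version is the one the client predicted for its own
      store.

```c
#include <stdint.h>

/* Trust the server's file length only when the data version moved
 * unexpectedly (a third party changed the file, so the pagecache will be
 * invalidated anyway); otherwise keep the local i_size. */
static uint64_t update_i_size(uint64_t i_size, uint64_t server_size,
			      uint64_t expected_dv, uint64_t server_dv)
{
	if (server_dv != expected_dv)
		return server_size;	/* unexpected change: take the server's view */
	return i_size;			/* ignore a possibly smaller, stale length */
}
```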
      
      This can be observed when doing a kernel build on an AFS mount.  The
      following pair of commands produces the issue:
      
        ld -m elf_x86_64 -z max-page-size=0x200000 --emit-relocs \
            -T arch/x86/realmode/rm/realmode.lds \
            arch/x86/realmode/rm/header.o \
            arch/x86/realmode/rm/trampoline_64.o \
            arch/x86/realmode/rm/stack.o \
            arch/x86/realmode/rm/reboot.o \
            -o arch/x86/realmode/rm/realmode.elf
        arch/x86/tools/relocs --realmode \
            arch/x86/realmode/rm/realmode.elf \
            >arch/x86/realmode/rm/realmode.relocs
      
      This results in the latter giving:
      
      	Cannot read ELF section headers 0/18: Success
      
      as the realmode.elf file got corrupted.
      
      The sequence of events can also be driven with:
      
      	xfs_io -t -f \
      		-c "pwrite -S 0x58 0 0x58" \
      		-c "pwrite -S 0x59 10000 1000" \
      		-c "close" \
      		/afs/example.com/scratch/a
      
      Fixes: 31143d5d ("AFS: implement basic file write support")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: afs_write_end() should change i_size under the right lock · 1f32ef79
      David Howells committed
      Fix afs_write_end() to change i_size under vnode->cb_lock rather than
      ->wb_lock so that it doesn't race with afs_vnode_commit_status() and
      afs_getattr().
      
      The ->wb_lock is only meant to guard access to ->wb_keys which isn't
      accessed by that piece of code.
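
      A userspace sketch of the locking rule, assuming a pthread mutex as a
      stand-in for vnode->cb_lock; struct vnode and both functions are
      hypothetical. The point is that the writer and the readers of i_size
      take the same lock.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical vnode: i_size is guarded by the same lock that status
 * updates and getattr use. */
struct vnode {
	pthread_mutex_t cb_lock;
	uint64_t i_size;
};

/* Extend i_size after a write, under the lock that readers also take. */
static void write_end_update_size(struct vnode *v, uint64_t pos)
{
	pthread_mutex_lock(&v->cb_lock);
	if (pos > v->i_size)
		v->i_size = pos;
	pthread_mutex_unlock(&v->cb_lock);
}

/* Read i_size consistently with concurrent updates. */
static uint64_t getattr_size(struct vnode *v)
{
	uint64_t size;

	pthread_mutex_lock(&v->cb_lock);
	size = v->i_size;
	pthread_mutex_unlock(&v->cb_lock);
	return size;
}
```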
      
      Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix non-setting of mtime when writing into mmap · bb413489
      David Howells committed
      The mtime on an inode needs to be updated when a write is made into an
      mmap'ed section.  There are three ways in which this could be done: update
      it when page_mkwrite is called, update it when a page is changed from dirty
      to writeback or leave it to the server and fix the mtime up from the reply
      to the StoreData RPC.
      
      Found with the generic/215 xfstest.
      
      Fixes: 1cf7a151 ("afs: Implement shared-writeable mmap")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • io_uring: fix lazy work init · 59960b9d
      Pavel Begunkov committed
      Don't leave garbage in req.work before punting async on -EAGAIN
      in io_iopoll_queue().
      
      [  140.922099] general protection fault, probably for non-canonical
           address 0xdead000000000100: 0000 [#1] PREEMPT SMP PTI
      ...
      [  140.922105] RIP: 0010:io_worker_handle_work+0x1db/0x480
      ...
      [  140.922114] Call Trace:
      [  140.922118]  ? __next_timer_interrupt+0xe0/0xe0
      [  140.922119]  io_wqe_worker+0x2a9/0x360
      [  140.922121]  ? _raw_spin_unlock_irqrestore+0x24/0x40
      [  140.922124]  kthread+0x12c/0x170
      [  140.922125]  ? io_worker_handle_work+0x480/0x480
      [  140.922126]  ? kthread_park+0x90/0x90
      [  140.922127]  ret_from_fork+0x22/0x30
      
      Fixes: 7cdaf587 ("io_uring: avoid whole io_wq_work copy for requests completed inline")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 14 Jun, 2020: 2 commits
    • Revert "btrfs: switch to iomap_dio_rw() for dio" · 55e20bd1
      David Sterba committed
      This reverts commit a43a67a2.
      
      This patch reverts the main part of switching the direct io
      implementation to the iomap infrastructure. There's a problem with
      page invalidation that couldn't be solved as a regression fix in this
      development cycle.
      
      The problem occurs when buffered and direct io are mixed, and the ranges
      overlap. Although this is not recommended, filesystems implement
      measures or fallbacks to make it somehow work. In this case, fallback to
      buffered IO would be an option for btrfs (this already happens when
      direct io is done on compressed data), but the change would be needed in
      the iomap code, bringing new semantics to other filesystems.
      
      Another problem arises when again the buffered and direct ios are mixed,
      invalidation fails, then -EIO is set on the mapping and fsync will fail,
      though there's no real error.
      
      There have been discussions on how to fix that, but a revert seems to be the
      least intrusive option.
      
      Link: https://lore.kernel.org/linux-btrfs/20200528192103.xm45qoxqmkw7i5yl@fiona/
      Signed-off-by: David Sterba <dsterba@suse.com>
    • treewide: replace '---help---' in Kconfig files with 'help' · a7f7f624
      Masahiro Yamada committed
      Since commit 84af7a61 ("checkpatch: kconfig: prefer 'help' over
      '---help---'"), the number of '---help---' has been gradually
      decreasing, but there are still more than 2400 instances.
      
      This commit finishes the conversion. While I touched the lines,
      I also fixed the indentation.
      
      A variety of indentation styles were found:
      
        a) 4 spaces + '---help---'
        b) 7 spaces + '---help---'
        c) 8 spaces + '---help---'
        d) 1 space + 1 tab + '---help---'
        e) 1 tab + '---help---'    (correct indentation)
        f) 1 tab + 1 space + '---help---'
        g) 1 tab + 2 spaces + '---help---'
      
      In order to convert all of them to 1 tab + 'help', I ran the
      following command:
      
        $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
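
      The effect of that sed expression on a single line can be mirrored in
      C, which makes the edge cases explicit: only leading whitespace
      followed by '---help---' is replaced, and the rest of the line is kept.
      This is an illustrative sketch; convert_help_line() is hypothetical and
      was not part of the conversion.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Rewrite one line as s/^[[:space:]]*---help---/\thelp/ would: leading
 * whitespace plus '---help---' becomes a tab plus 'help', and the rest of
 * the line is kept. Lines that don't match are copied unchanged. */
static void convert_help_line(const char *in, char *out, size_t size)
{
	const char *p = in;

	while (isspace((unsigned char)*p))
		p++;
	if (strncmp(p, "---help---", 10) == 0)
		snprintf(out, size, "\thelp%s", p + 10);
	else
		snprintf(out, size, "%s", in);
}
```
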
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
  7. 13 Jun, 2020: 5 commits
    • smb3: Add debug message for new file creation with idsfromsid mount option · a7a519a4
      Steve French committed
      Pavel noticed that a debug message (disabled by default) when creating the
      security descriptor context could be useful for the owner fields of newly
      created files (as we already have for the mode) when using the idsfromsid
      mount parameter.
      
      [38120.392272] CIFS: FYI: owner S-1-5-88-1-0, group S-1-5-88-2-0
      [38125.792637] CIFS: FYI: owner S-1-5-88-1-1000, group S-1-5-88-2-1000
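
      The two debug lines suggest that owners and groups map to SIDs of the
      form S-1-5-88-1-<uid> and S-1-5-88-2-<gid>. The helpers below format
      and parse that layout; they are hypothetical, and the layout is
      inferred only from these sample lines.

```c
#include <stdio.h>

/* Format a special SID; kind 1 = owner (uid), kind 2 = group (gid),
 * as inferred from the log lines above. */
static int format_special_sid(char *buf, size_t size, unsigned int kind,
			      unsigned int id)
{
	return snprintf(buf, size, "S-1-5-88-%u-%u", kind, id);
}

/* Parse the same layout back; returns 0 on success, -1 otherwise. */
static int parse_special_sid(const char *sid, unsigned int *kind,
			     unsigned int *id)
{
	return sscanf(sid, "S-1-5-88-%u-%u", kind, id) == 2 ? 0 : -1;
}
```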
      
      Also cleans up a typo in a comment.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
    • proc: Use new_inode not new_inode_pseudo · ef1548ad
      Eric W. Biederman committed
      Recently syzbot reported that unmounting proc when there is an ongoing
      inotify watch on the root directory of proc could result in a use
      after free when the watch is removed after the unmount of proc
      when the watcher exits.
      
      Commit 69879c01 ("proc: Remove the now unnecessary internal mount
      of proc") made it easier to unmount proc and allowed syzbot to see the
      problem, but looking at the code it has been around for a long time.
      
      Looking at the code, the fsnotify watch should have been removed by
      fsnotify_sb_delete in generic_shutdown_super.  Unfortunately, the inode
      was allocated with new_inode_pseudo instead of new_inode, so the inode
      was not on the sb->s_inodes list.  This prevented
      fsnotify_unmount_inodes from finding the inode and removing the watch,
      and also meant that the "VFS: Busy inodes after unmount" warning
      could not find the inodes to warn about them.
      
      Make all of the inodes in proc visible to generic_shutdown_super,
      and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo.
      The only functional difference is that new_inode places the inodes
      on the sb->s_inodes list.
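
      The difference can be modelled with a hypothetical superblock that keeps
      a singly linked inode list: only inodes linked in (the new_inode() way)
      are visited by the shutdown walk. The types and helpers are
      illustrative stand-ins, not the VFS structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature inode and superblock. */
struct inode {
	bool has_watch;
	struct inode *next_in_sb;
};

struct super_block {
	struct inode *inodes;	/* stands in for sb->s_inodes */
};

/* Like new_inode(): the inode is linked into the superblock list. */
static void alloc_on_list(struct super_block *sb, struct inode *inode)
{
	inode->next_in_sb = sb->inodes;
	sb->inodes = inode;
}

/* Like new_inode_pseudo(): the inode is not linked anywhere. */
static void alloc_pseudo(struct inode *inode)
{
	inode->next_in_sb = NULL;
}

/* Shutdown walk: remove watches from every inode the superblock knows. */
static size_t remove_watches(struct super_block *sb)
{
	size_t removed = 0;

	for (struct inode *i = sb->inodes; i; i = i->next_in_sb) {
		if (i->has_watch) {
			i->has_watch = false;
			removed++;
		}
	}
	return removed;
}
```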
      
      I wrote a small test program and I can verify that without changes it
      can trigger this issue, and by replacing new_inode_pseudo with
      new_inode the issue goes away.
      
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/000000000000d788c905a7dfa3f4@google.com
      Reported-by: syzbot+7d2debdcdb3cb93c1e5e@syzkaller.appspotmail.com
      Fixes: 0097875b ("proc: Implement /proc/thread-self to point at the directory of the current thread")
      Fixes: 021ada7d ("procfs: switch /proc/self away from proc_dir_entry")
      Fixes: 51f0885e ("vfs,proc: guarantee unique inodes in /proc")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers · 7b97d868
      zhangyi (F) committed
      In the ext4 filesystem with errors=panic, if one process is recording
      errno in the superblock when invoking jbd2_journal_abort() due to some
      error cases, it could be raced by another __ext4_abort() which is
      setting the SB_RDONLY flag but missing panic because errno has not been
      recorded.
      
      jbd2_journal_commit_transaction()
       jbd2_journal_abort()
        journal->j_flags |= JBD2_ABORT;
        jbd2_journal_update_sb_errno()
                                          | ext4_journal_check_start()
                                          |  __ext4_abort()
                                          |   sb->s_flags |= SB_RDONLY;
                                          |   if (!JBD2_REC_ERR)
                                          |        return;
        journal->j_flags |= JBD2_REC_ERR;
      
      Finally, it will no longer trigger panic because the filesystem has
      already been set read-only. Fix this by introducing j_abort_mutex to make
      sure journal abort is completed before panic, and remove JBD2_REC_ERR
      flag.
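
      The ordering that j_abort_mutex enforces can be sketched in userspace
      with a pthread mutex. This is a hypothetical model, not the jbd2 code:
      struct journal, journal_abort() and fs_error_should_panic() are
      illustrative, and -EIO is an arbitrary example error.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical journal state: the aborted flag and the recorded errno
 * are only ever changed together, under abort_mutex. */
struct journal {
	pthread_mutex_t abort_mutex;
	bool aborted;
	int sb_errno;
};

/* Abort the journal and record the error before releasing the mutex, so
 * nobody holding the mutex can see "aborted" without the errno. */
static void journal_abort(struct journal *j, int err)
{
	pthread_mutex_lock(&j->abort_mutex);
	if (!j->aborted) {
		j->aborted = true;
		j->sb_errno = err;
	}
	pthread_mutex_unlock(&j->abort_mutex);
}

/* Error handler with errors=panic: calling journal_abort() first waits for
 * any in-progress abort to finish. Returns true if it would panic. */
static bool fs_error_should_panic(struct journal *j)
{
	bool panic;

	journal_abort(j, -EIO);
	pthread_mutex_lock(&j->abort_mutex);
	panic = j->aborted && j->sb_errno != 0;
	pthread_mutex_unlock(&j->abort_mutex);
	return panic;
}
```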
      
      Fixes: 4327ba52 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20200609073540.3810702-1-yi.zhang@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • cifs: fix chown and chgrp when idsfromsid mount option enabled · a6603398
      Steve French committed
      idsfromsid was ignored in chown and chgrp, causing them to fail
      when upcalls were not configured for lookup.  idsfromsid allows
      mapping users when setting user or group ownership using a
      "special SID" (reserved for this purpose).  Add support for chown and
      chgrp when the idsfromsid mount option is enabled.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
    • smb3: allow uid and gid owners to be set on create with idsfromsid mount option · 975221ec
      Steve French committed
      Currently the idsfromsid mount option allows querying owner information
      from the special SIDs used to represent POSIX uids and gids, but changes
      were needed to populate the security descriptor context with the owner
      information when the idsfromsid mount option was used.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>