1. 26 9月, 2019 1 次提交
  2. 24 9月, 2019 2 次提交
  3. 19 9月, 2019 6 次提交
    • J
      io_uring: IORING_OP_TIMEOUT support · 5262f567
      Jens Axboe 提交于
      There's been a few requests for functionality similar to io_getevents()
      and epoll_wait(), where the user can specify a timeout for waiting on
      events. I deliberately did not add support for this through the system
      call initially to avoid overloading the args, but I can see that the use
      cases for this are valid.
      
      This adds support for IORING_OP_TIMEOUT. If a user wants to get woken
      when waiting for events, simply submit one of these timeout commands
      with your wait call (or before). This ensures that the application
      sleeping on the CQ ring waiting for events will get woken. The timeout
      command is passed in as a pointer to a struct timespec. Timeouts are
      relative. The timeout command also includes a way to auto-cancel after
      N events has passed.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5262f567
    • J
      io_uring: use cond_resched() in sqthread · 9831a90c
      Jens Axboe 提交于
      If preempt isn't enabled in the kernel, we can run into hang issues with
      sqthread submissions. Use cond_resched() to play nice instead of
      cpu_relax(), if we end up starting the loop and not having any events
      pending for submissions.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9831a90c
    • J
      io_uring: fix potential crash issue due to io_get_req failure · a1041c27
      Jackie Liu 提交于
      Sometimes io_get_req will return a NUL, then we need to do the
      correct error handling, otherwise it will cause the kernel null
      pointer exception.
      
      Fixes: 4fe2c963 ("io_uring: add support for link with drain")
      Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a1041c27
    • J
      io_uring: ensure poll commands clear ->sqe · 6cc47d1d
      Jens Axboe 提交于
      If we end up getting woken in poll (due to a signal), then we may need
      to punt the poll request to an async worker. When we do that, we look up
      the list to queue at, deferefencing req->submit.sqe, however that is
      only set for requests we initially decided to queue async.
      
      This fixes a crash with poll command usage and wakeups that need to punt
      to async context.
      
      Fixes: 54a91f3b ("io_uring: limit parallelism of buffered writes")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6cc47d1d
    • J
      io_uring: fix use-after-free of shadow_req · 5f5ad9ce
      Jackie Liu 提交于
      There is a potential dangling pointer problem. we never clean
      shadow_req, if there are multiple link lists in this series of
      sqes, then the shadow_req will not reallocate, and continue to
      use the last one. but in the previous, his memory has been
      released, thus forming a dangling pointer. let's clean up him
      and make sure that every new link list can reapply for a new
      shadow_req.
      
      Fixes: 4fe2c963 ("io_uring: add support for link with drain")
      Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5f5ad9ce
    • J
      io_uring: use kmemdup instead of kmalloc and memcpy · 954dab19
      Jackie Liu 提交于
      Just clean up the code, no function changes.
      Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      954dab19
  4. 15 9月, 2019 1 次提交
  5. 13 9月, 2019 2 次提交
  6. 10 9月, 2019 5 次提交
    • J
      io_uring: limit parallelism of buffered writes · 54a91f3b
      Jens Axboe 提交于
      All the popular filesystems need to grab the inode lock for buffered
      writes. With io_uring punting buffered writes to async context, we
      observe a lot of contention with all workers hamming this mutex.
      
      For buffered writes, we generally don't need a lot of parallelism on
      the submission side, as the flushing will take care of that for us.
      Hence we don't need a deep queue on the write side, as long as we
      can safely punt from the original submission context.
      
      Add a workqueue with a limit of 2 that we can use for buffered writes.
      This greatly improves the performance and efficiency of higher queue
      depth buffered async writes with io_uring.
      Reported-by: NAndres Freund <andres@anarazel.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      54a91f3b
    • J
      io_uring: add io_queue_async_work() helper · 18d9be1a
      Jens Axboe 提交于
      Add a helper for queueing a request for async execution, in preparation
      for optimizing it.
      
      No functional change in this patch.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      18d9be1a
    • J
      io_uring: optimize submit_and_wait API · c5766668
      Jens Axboe 提交于
      For some applications that end up using a submit-and-wait type of
      approach for certain batches of IO, we can make that a bit more
      efficient by allowing the application to block for the last IO
      submission. This prevents an async when we don't need it, as the
      application will be blocking for the completion event(s) anyway.
      
      Typical use cases are using the liburing
      io_uring_submit_and_wait() API, or just using io_uring_enter()
      doing both submissions and completions. As a specific example,
      RocksDB doing MultiGet() is sped up quite a bit with this
      change.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c5766668
    • J
      io_uring: add support for link with drain · 4fe2c963
      Jackie Liu 提交于
      To support the link with drain, we need to do two parts.
      
      There is an sqes:
      
          0     1     2     3     4     5     6
       +-----+-----+-----+-----+-----+-----+-----+
       |  N  |  L  |  L  | L+D |  N  |  N  |  N  |
       +-----+-----+-----+-----+-----+-----+-----+
      
      First, we need to ensure that the io before the link is completed,
      there is a easy way is set drain flag to the link list's head, so
      all subsequent io will be inserted into the defer_list.
      
      	+-----+
          (0) |  N  |
      	+-----+
                 |          (2)         (3)         (4)
      	+-----+     +-----+     +-----+     +-----+
          (1) | L+D | --> |  L  | --> | L+D | --> |  N  |
      	+-----+     +-----+     +-----+     +-----+
                 |
      	+-----+
          (5) |  N  |
      	+-----+
                 |
      	+-----+
          (6) |  N  |
      	+-----+
      
      Second, ensure that the following IO will not be completed first,
      an easy way is to create a mirror of drain io and insert it into
      defer_list, in this way, as long as drain io is not processed, the
      following io in the defer_list will not be actively process.
      
      	+-----+
          (0) |  N  |
      	+-----+
                 |          (2)         (3)         (4)
      	+-----+     +-----+     +-----+     +-----+
          (1) | L+D | --> |  L  | --> | L+D | --> |  N  |
      	+-----+     +-----+     +-----+     +-----+
                 |
      	+-----+
         ('3) |  D  |   <== This is a shadow of (3)
      	+-----+
                 |
      	+-----+
          (5) |  N  |
      	+-----+
                 |
      	+-----+
          (6) |  N  |
      	+-----+
      Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4fe2c963
    • J
      io_uring: fix wrong sequence setting logic · 8776f3fa
      Jackie Liu 提交于
      Sqo_thread will get sqring in batches, which will cause
      ctx->cached_sq_head to be added in batches. if one of these
      sqes is set with the DRAIN flag, then he will never get a
      chance to process, and finally sqo_thread will not exit.
      
      Fixes: de0617e4 ("io_uring: add support for marking commands as draining")
      Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8776f3fa
  7. 07 9月, 2019 1 次提交
    • J
      io_uring: expose single mmap capability · ac90f249
      Jens Axboe 提交于
      After commit 75b28aff we can get by with just a single mmap to
      map both the sq and cq ring. However, userspace doesn't know that.
      
      Add a features variable to io_uring_params, and notify userspace
      that the kernel has this ability. This can then be used in liburing
      (or in applications directly) to avoid the second mmap.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ac90f249
  8. 28 8月, 2019 2 次提交
  9. 25 8月, 2019 1 次提交
  10. 23 8月, 2019 2 次提交
    • D
      xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT · 1fb254aa
      Darrick J. Wong 提交于
      Benjamin Moody reported to Debian that XFS partially wedges when a chgrp
      fails on account of being out of disk quota.  I ran his reproducer
      script:
      
      # adduser dummy
      # adduser dummy plugdev
      
      # dd if=/dev/zero bs=1M count=100 of=test.img
      # mkfs.xfs test.img
      # mount -t xfs -o gquota test.img /mnt
      # mkdir -p /mnt/dummy
      # chown -c dummy /mnt/dummy
      # xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt
      
      (and then as user dummy)
      
      $ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo
      $ chgrp plugdev /mnt/dummy/foo
      
      and saw:
      
      ================================================
      WARNING: lock held when returning to user space!
      5.3.0-rc5 #rc5 Tainted: G        W
      ------------------------------------------------
      chgrp/47006 is leaving the kernel with locks still held!
      1 lock held by chgrp/47006:
       #0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs]
      
      ...which is clearly caused by xfs_setattr_nonsize failing to unlock the
      ILOCK after the xfs_qm_vop_chown_reserve call fails.  Add the missing
      unlock.
      
      Reported-by: benjamin.moody@gmail.com
      Fixes: 253f4911 ("xfs: better xfs_trans_alloc interface")
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NSalvatore Bonaccorso <carnil@debian.org>
      1fb254aa
    • J
      io_uring: add need_resched() check in inner poll loop · 08f5439f
      Jens Axboe 提交于
      The outer poll loop checks for whether we need to reschedule, and
      returns to userspace if we do. However, it's possible to get stuck
      in the inner loop as well, if the CPU we are running on needs to
      reschedule to finish the IO work.
      
      Add the need_resched() check in the inner loop as well. This fixes
      a potential hang if the kernel is configured with
      CONFIG_PREEMPT_VOLUNTARY=y.
      Reported-by: NSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Tested-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      08f5439f
  11. 22 8月, 2019 11 次提交
    • L
      ubifs: Limit the number of pages in shrink_liability · 0af83abb
      Liu Song 提交于
      If the number of dirty pages to be written back is large,
      then writeback_inodes_sb will block waiting for a long time,
      causing hung task detection alarm. Therefore, we should limit
      the maximum number of pages written back this time, which let
      the budget be completed faster. The remaining dirty pages
      tend to rely on the writeback mechanism to complete the
      synchronization.
      
      Fixes: b6e51316 ("writeback: separate starting of sync vs opportunistic writeback")
      Signed-off-by: NLiu Song <liu.song11@zte.com.cn>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      0af83abb
    • R
      ubifs: Correctly initialize c->min_log_bytes · 377e208f
      Richard Weinberger 提交于
      Currently on a freshly mounted UBIFS, c->min_log_bytes is 0.
      This can lead to a log overrun and make commits fail.
      
      Recent kernels will report the following assert:
      UBIFS assert failed: c->lhead_lnum != c->ltail_lnum, in fs/ubifs/log.c:412
      
      c->min_log_bytes can have two states, 0 and c->leb_size.
      It controls how much bytes of the log area are reserved for non-bud
      nodes such as commit nodes.
      
      After a commit it has to be set to c->leb_size such that we have always
      enough space for a commit. While a commit runs it can be 0 to make the
      remaining bytes of the log available to writers.
      
      Having it set to 0 right after mount is wrong since no space for commits
      is reserved.
      
      Fixes: 1e51764a ("UBIFS: add new flash file system")
      Reported-and-tested-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      377e208f
    • R
      ubifs: Fix double unlock around orphan_delete() · 4dd75b33
      Richard Weinberger 提交于
      We unlock after orphan_delete(), so no need to unlock
      in the function too.
      Reported-by: NHan Xu <han.xu@nxp.com>
      Fixes: 8009ce95 ("ubifs: Don't leak orphans on memory during commit")
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      4dd75b33
    • Y
      afs: use correct afs_call_type in yfs_fs_store_opaque_acl2 · 7533be85
      YueHaibing 提交于
      It seems that 'yfs_RXYFSStoreOpaqueACL2' should be use in
      yfs_fs_store_opaque_acl2().
      
      Fixes: f5e45463 ("afs: Implement YFS ACL setting")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      7533be85
    • M
      afs: Fix possible oops in afs_lookup trace event · c4c613ff
      Marc Dionne 提交于
      The afs_lookup trace event can cause the following:
      
      [  216.576777] BUG: kernel NULL pointer dereference, address: 000000000000023b
      [  216.576803] #PF: supervisor read access in kernel mode
      [  216.576813] #PF: error_code(0x0000) - not-present page
      ...
      [  216.576913] RIP: 0010:trace_event_raw_event_afs_lookup+0x9e/0x1c0 [kafs]
      
      If the inode from afs_do_lookup() is an error other than ENOENT, or if it
      is ENOENT and afs_try_auto_mntpt() returns an error, the trace event will
      try to dereference the error pointer as a valid pointer.
      
      Use IS_ERR_OR_NULL to only pass a valid pointer for the trace, or NULL.
      
      Ideally the trace would include the error value, but for now just avoid
      the oops.
      
      Fixes: 80548b03 ("afs: Add more tracepoints")
      Signed-off-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      c4c613ff
    • D
      afs: Fix leak in afs_lookup_cell_rcu() · a5fb8e6c
      David Howells 提交于
      Fix a leak on the cell refcount in afs_lookup_cell_rcu() due to
      non-clearance of the default error in the case a NULL cell name is passed
      and the workstation default cell is used.
      
      Also put a bit at the end to make sure we don't leak a cell ref if we're
      going to be returning an error.
      
      This leak results in an assertion like the following when the kafs module is
      unloaded:
      
      	AFS: Assertion failed
      	2 == 1 is false
      	0x2 == 0x1 is false
      	------------[ cut here ]------------
      	kernel BUG at fs/afs/cell.c:770!
      	...
      	RIP: 0010:afs_manage_cells+0x220/0x42f [kafs]
      	...
      	 process_one_work+0x4c2/0x82c
      	 ? pool_mayday_timeout+0x1e1/0x1e1
      	 ? do_raw_spin_lock+0x134/0x175
      	 worker_thread+0x336/0x4a6
      	 ? rescuer_thread+0x4af/0x4af
      	 kthread+0x1de/0x1ee
      	 ? kthread_park+0xd4/0xd4
      	 ret_from_fork+0x24/0x30
      
      Fixes: 989782dc ("afs: Overhaul cell database management")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      a5fb8e6c
    • J
      ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply · 28a28261
      Jeff Layton 提交于
      When ceph_mdsc_do_request returns an error, we can't assume that the
      filelock_reply pointer will be set. Only try to fetch fields out of
      the r_reply_info when it returns success.
      
      Cc: stable@vger.kernel.org
      Reported-by: NHector Martin <hector@marcansoft.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      28a28261
    • E
      ceph: clear page dirty before invalidate page · c95f1c5f
      Erqi Chen 提交于
      clear_page_dirty_for_io(page) before mapping->a_ops->invalidatepage().
      invalidatepage() clears page's private flag, if dirty flag is not
      cleared, the page may cause BUG_ON failure in ceph_set_page_dirty().
      
      Cc: stable@vger.kernel.org
      Link: https://tracker.ceph.com/issues/40862Signed-off-by: NErqi Chen <chenerqi@gmail.com>
      Reviewed-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      c95f1c5f
    • L
      ceph: fix buffer free while holding i_ceph_lock in fill_inode() · af8a85a4
      Luis Henriques 提交于
      Calling ceph_buffer_put() in fill_inode() may result in freeing the
      i_xattrs.blob buffer while holding the i_ceph_lock.  This can be fixed by
      postponing the call until later, when the lock is released.
      
      The following backtrace was triggered by fstests generic/070.
      
        BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
        in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: kworker/0:4
        6 locks held by kworker/0:4/3852:
         #0: 000000004270f6bb ((wq_completion)ceph-msgr){+.+.}, at: process_one_work+0x1b8/0x5f0
         #1: 00000000eb420803 ((work_completion)(&(&con->work)->work)){+.+.}, at: process_one_work+0x1b8/0x5f0
         #2: 00000000be1c53a4 (&s->s_mutex){+.+.}, at: dispatch+0x288/0x1476
         #3: 00000000559cb958 (&mdsc->snap_rwsem){++++}, at: dispatch+0x2eb/0x1476
         #4: 000000000d5ebbae (&req->r_fill_mutex){+.+.}, at: dispatch+0x2fc/0x1476
         #5: 00000000a83d0514 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: fill_inode.isra.0+0xf8/0xf70
        CPU: 0 PID: 3852 Comm: kworker/0:4 Not tainted 5.2.0+ #441
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
        Workqueue: ceph-msgr ceph_con_workfn
        Call Trace:
         dump_stack+0x67/0x90
         ___might_sleep.cold+0x9f/0xb1
         vfree+0x4b/0x60
         ceph_buffer_release+0x1b/0x60
         fill_inode.isra.0+0xa9b/0xf70
         ceph_fill_trace+0x13b/0xc70
         ? dispatch+0x2eb/0x1476
         dispatch+0x320/0x1476
         ? __mutex_unlock_slowpath+0x4d/0x2a0
         ceph_con_workfn+0xc97/0x2ec0
         ? process_one_work+0x1b8/0x5f0
         process_one_work+0x244/0x5f0
         worker_thread+0x4d/0x3e0
         kthread+0x105/0x140
         ? process_one_work+0x5f0/0x5f0
         ? kthread_park+0x90/0x90
         ret_from_fork+0x3a/0x50
      Signed-off-by: NLuis Henriques <lhenriques@suse.com>
      Reviewed-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      af8a85a4
    • L
      ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob() · 12fe3dda
      Luis Henriques 提交于
      Calling ceph_buffer_put() in __ceph_build_xattrs_blob() may result in
      freeing the i_xattrs.blob buffer while holding the i_ceph_lock.  This can
      be fixed by having this function returning the old blob buffer and have
      the callers of this function freeing it when the lock is released.
      
      The following backtrace was triggered by fstests generic/117.
      
        BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
        in_atomic(): 1, irqs_disabled(): 0, pid: 649, name: fsstress
        4 locks held by fsstress/649:
         #0: 00000000a7478e7e (&type->s_umount_key#19){++++}, at: iterate_supers+0x77/0xf0
         #1: 00000000f8de1423 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: ceph_check_caps+0x7b/0xc60
         #2: 00000000562f2b27 (&s->s_mutex){+.+.}, at: ceph_check_caps+0x3bd/0xc60
         #3: 00000000f83ce16a (&mdsc->snap_rwsem){++++}, at: ceph_check_caps+0x3ed/0xc60
        CPU: 1 PID: 649 Comm: fsstress Not tainted 5.2.0+ #439
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x67/0x90
         ___might_sleep.cold+0x9f/0xb1
         vfree+0x4b/0x60
         ceph_buffer_release+0x1b/0x60
         __ceph_build_xattrs_blob+0x12b/0x170
         __send_cap+0x302/0x540
         ? __lock_acquire+0x23c/0x1e40
         ? __mark_caps_flushing+0x15c/0x280
         ? _raw_spin_unlock+0x24/0x30
         ceph_check_caps+0x5f0/0xc60
         ceph_flush_dirty_caps+0x7c/0x150
         ? __ia32_sys_fdatasync+0x20/0x20
         ceph_sync_fs+0x5a/0x130
         iterate_supers+0x8f/0xf0
         ksys_sync+0x4f/0xb0
         __ia32_sys_sync+0xa/0x10
         do_syscall_64+0x50/0x1c0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7fc6409ab617
      Signed-off-by: NLuis Henriques <lhenriques@suse.com>
      Reviewed-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      12fe3dda
    • L
      ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr() · 86968ef2
      Luis Henriques 提交于
      Calling ceph_buffer_put() in __ceph_setxattr() may end up freeing the
      i_xattrs.prealloc_blob buffer while holding the i_ceph_lock.  This can be
      fixed by postponing the call until later, when the lock is released.
      
      The following backtrace was triggered by fstests generic/117.
      
        BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
        in_atomic(): 1, irqs_disabled(): 0, pid: 650, name: fsstress
        3 locks held by fsstress/650:
         #0: 00000000870a0fe8 (sb_writers#8){.+.+}, at: mnt_want_write+0x20/0x50
         #1: 00000000ba0c4c74 (&type->i_mutex_dir_key#6){++++}, at: vfs_setxattr+0x55/0xa0
         #2: 000000008dfbb3f2 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: __ceph_setxattr+0x297/0x810
        CPU: 1 PID: 650 Comm: fsstress Not tainted 5.2.0+ #437
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x67/0x90
         ___might_sleep.cold+0x9f/0xb1
         vfree+0x4b/0x60
         ceph_buffer_release+0x1b/0x60
         __ceph_setxattr+0x2b4/0x810
         __vfs_setxattr+0x66/0x80
         __vfs_setxattr_noperm+0x59/0xf0
         vfs_setxattr+0x81/0xa0
         setxattr+0x115/0x230
         ? filename_lookup+0xc9/0x140
         ? rcu_read_lock_sched_held+0x74/0x80
         ? rcu_sync_lockdep_assert+0x2e/0x60
         ? __sb_start_write+0x142/0x1a0
         ? mnt_want_write+0x20/0x50
         path_setxattr+0xba/0xd0
         __x64_sys_lsetxattr+0x24/0x30
         do_syscall_64+0x50/0x1c0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7ff23514359a
      Signed-off-by: NLuis Henriques <lhenriques@suse.com>
      Reviewed-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      86968ef2
  12. 21 8月, 2019 2 次提交
    • J
      io_uring: don't enter poll loop if we have CQEs pending · a3a0e43f
      Jens Axboe 提交于
      We need to check if we have CQEs pending before starting a poll loop,
      as those could be the events we will be spinning for (and hence we'll
      find none). This can happen if a CQE triggers an error, or if it is
      found by eg an IRQ before we get a chance to find it through polling.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a3a0e43f
    • J
      io_uring: fix potential hang with polled IO · 500f9fba
      Jens Axboe 提交于
      If a request issue ends up being punted to async context to avoid
      blocking, we can get into a situation where the original application
      enters the poll loop for that very request before it has been issued.
      This should not be an issue, except that the polling will hold the
      io_uring uring_ctx mutex for the duration of the poll. When the async
      worker has actually issued the request, it needs to acquire this mutex
      to add the request to the poll issued list. Since the application
      polling is already holding this mutex, the workqueue sleeps on the
      mutex forever, and the application thus never gets a chance to poll for
      the very request it was interested in.
      
      Fix this by ensuring that the polling drops the uring_ctx occasionally
      if it's not making any progress.
      Reported-by: NJeffrey M. Birnbaum <jmbnyc@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      500f9fba
  13. 20 8月, 2019 1 次提交
  14. 19 8月, 2019 2 次提交
    • E
      signal: Allow cifs and drbd to receive their terminating signals · 33da8e7c
      Eric W. Biederman 提交于
      My recent to change to only use force_sig for a synchronous events
      wound up breaking signal reception cifs and drbd.  I had overlooked
      the fact that by default kthreads start out with all signals set to
      SIG_IGN.  So a change I thought was safe turned out to have made it
      impossible for those kernel thread to catch their signals.
      
      Reverting the work on force_sig is a bad idea because what the code
      was doing was very much a misuse of force_sig.  As the way force_sig
      ultimately allowed the signal to happen was to change the signal
      handler to SIG_DFL.  Which after the first signal will allow userspace
      to send signals to these kernel threads.  At least for
      wake_ack_receiver in drbd that does not appear actively wrong.
      
      So correct this problem by adding allow_kernel_signal that will allow
      signals whose siginfo reports they were sent by the kernel through,
      but will not allow userspace generated signals, and update cifs and
      drbd to call allow_kernel_signal in an appropriate place so that their
      thread can receive this signal.
      
      Fixing things this way ensures that userspace won't be able to send
      signals and cause problems, that it is clear which signals the
      threads are expecting to receive, and it guarantees that nothing
      else in the system will be affected.
      
      This change was partly inspired by similar cifs and drbd patches that
      added allow_signal.
      Reported-by: Nronnie sahlberg <ronniesahlberg@gmail.com>
      Reported-by: NChristoph Böhmwalder <christoph.boehmwalder@linbit.com>
      Tested-by: NChristoph Böhmwalder <christoph.boehmwalder@linbit.com>
      Cc: Steve French <smfrench@gmail.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Fixes: 247bc947 ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes")
      Fixes: 72abe3bc ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig")
      Fixes: fee10990 ("signal/drbd: Use send_sig not force_sig")
      Fixes: 3cf5d076 ("signal: Remove task parameter from force_sig")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      33da8e7c
    • D
      xfs: fix reflink source file racing with directio writes · 5d888b48
      Darrick J. Wong 提交于
      While trawling through the dedupe file comparison code trying to fix
      page deadlocking problems, Dave Chinner noticed that the reflink code
      only takes shared IOLOCK/MMAPLOCKs on the source file.  Because
      page_mkwrite and directio writes do not take the EXCL versions of those
      locks, this means that reflink can race with writer processes.
      
      For pure remapping this can lead to undefined behavior and file
      corruption; for dedupe this means that we cannot be sure that the
      contents are identical when we decide to go ahead with the remapping.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      5d888b48
  15. 17 8月, 2019 1 次提交