1. 24 Aug 2021 (17 commits)
  2. 21 Aug 2021 (1 commit)
  3. 18 Aug 2021 (1 commit)
  4. 15 Aug 2021 (1 commit)
    • io_uring: only assign io_uring_enter() SQPOLL error in actual error case · 21f96522
      Jens Axboe authored
      If an SQPOLL based ring is newly created and an application issues an
      io_uring_enter(2) system call on it, then we can return a spurious
      -EOWNERDEAD error. This happens because there's nothing to submit, and
      if the caller doesn't specify any other action, the initial error
      assignment of -EOWNERDEAD never gets overwritten. This causes us to
      return it directly, even if it isn't valid.
      
      Move the error assignment into the actual failure case instead.
      
      Cc: stable@vger.kernel.org
      Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
      Reported-by: Sherlock Holo <sherlockya@gmail.com>
      Link: https://github.com/axboe/liburing/issues/413
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      21f96522
  5. 10 Aug 2021 (3 commits)
    • io_uring: fix ctx-exit io_rsrc_put_work() deadlock · 43597aac
      Pavel Begunkov authored
      __io_rsrc_put_work() might need ->uring_lock, so nobody should wait for
      rsrc nodes holding the mutex. However, that's exactly what
      io_ring_ctx_free() does with io_wait_rsrc_data().
      
      Split it into rsrc wait + dealloc, and move the first one out of the
      lock.
      
      Cc: stable@vger.kernel.org
      Fixes: b60c8dce ("io_uring: preparation for rsrc tagging")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/0130c5c2693468173ec1afab714e0885d2c9c363.1628559783.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      43597aac
    • io_uring: drop ctx->uring_lock before flushing work item · c018db4a
      Jens Axboe authored
      Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags
      from the regression suite:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      5.14.0-rc3-bluetea-test-00249-gc7d10223 #5 Tainted: G           OE
      ------------------------------------------------------
      kworker/2:4/2684 is trying to acquire lock:
      ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0
      
      but task is already holding lock:
      ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}:
             __flush_work+0x31b/0x490
             io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0
             __do_sys_io_uring_register+0x45b/0x1060
             do_syscall_64+0x35/0xb0
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      -> #0 (&ctx->uring_lock){+.+.}-{3:3}:
             __lock_acquire+0x119a/0x1e10
             lock_acquire+0xc8/0x2f0
             __mutex_lock+0x86/0x740
             io_rsrc_put_work+0x13d/0x1a0
             process_one_work+0x236/0x530
             worker_thread+0x52/0x3b0
             kthread+0x135/0x160
             ret_from_fork+0x1f/0x30
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock((work_completion)(&(&ctx->rsrc_put_work)->work));
                                     lock(&ctx->uring_lock);
                                     lock((work_completion)(&(&ctx->rsrc_put_work)->work));
        lock(&ctx->uring_lock);
      
       *** DEADLOCK ***
      
      2 locks held by kworker/2:4/2684:
       #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
       #1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
      
      stack backtrace:
      CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G           OE     5.14.0-rc3-bluetea-test-00249-gc7d10223 #5
      Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015
      Workqueue: events io_rsrc_put_work
      Call Trace:
       dump_stack_lvl+0x6a/0x9a
       check_noncircular+0xfe/0x110
       __lock_acquire+0x119a/0x1e10
       lock_acquire+0xc8/0x2f0
       ? io_rsrc_put_work+0x13d/0x1a0
       __mutex_lock+0x86/0x740
       ? io_rsrc_put_work+0x13d/0x1a0
       ? io_rsrc_put_work+0x13d/0x1a0
       ? io_rsrc_put_work+0x13d/0x1a0
       ? process_one_work+0x1ce/0x530
       io_rsrc_put_work+0x13d/0x1a0
       process_one_work+0x236/0x530
       worker_thread+0x52/0x3b0
       ? process_one_work+0x530/0x530
       kthread+0x135/0x160
       ? set_kthread_struct+0x40/0x40
       ret_from_fork+0x1f/0x30
      
      which is due to holding the ctx->uring_lock when flushing existing
      pending work, while the pending work flushing may need to grab the uring
      lock if we're using IOPOLL.
      
      Fix this by dropping the uring_lock a bit earlier as part of the flush.
      
      Cc: stable@vger.kernel.org
      Link: https://github.com/axboe/liburing/issues/404
      Tested-by: Ammar Faizi <ammarfaizi2@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c018db4a
    • io_uring: rsrc ref lock needs to be IRQ safe · 4956b9ea
      Jens Axboe authored
      Nadav reports running into the below splat on re-enabling softirqs:
      
      WARNING: CPU: 2 PID: 1777 at kernel/softirq.c:364 __local_bh_enable_ip+0xaa/0xe0
      Modules linked in:
      CPU: 2 PID: 1777 Comm: umem Not tainted 5.13.1+ #161
      Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
      RIP: 0010:__local_bh_enable_ip+0xaa/0xe0
      Code: a9 00 ff ff 00 74 38 65 ff 0d a2 21 8c 7a e8 ed 1a 20 00 fb 66 0f 1f 44 00 00 5b 41 5c 5d c3 65 8b 05 e6 2d 8c 7a 85 c0 75 9a <0f> 0b eb 96 e8 2d 1f 20 00 eb a5 4c 89 e7 e8 73 4f 0c 00 eb ae 65
      RSP: 0018:ffff88812e58fcc8 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000
      RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffff8898c5ac
      RBP: ffff88812e58fcd8 R08: ffffffff8575dbbf R09: ffffed1028ef14f9
      R10: ffff88814778a7c3 R11: ffffed1028ef14f8 R12: ffffffff85c9e9ae
      R13: ffff88814778a000 R14: ffff88814778a7b0 R15: ffff8881086db890
      FS:  00007fbcfee17700(0000) GS:ffff8881e0300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000c0402a5008 CR3: 000000011c1ac003 CR4: 00000000003706e0
      Call Trace:
       _raw_spin_unlock_bh+0x31/0x40
       io_rsrc_node_ref_zero+0x13e/0x190
       io_dismantle_req+0x215/0x220
       io_req_complete_post+0x1b8/0x720
       __io_complete_rw.isra.0+0x16b/0x1f0
       io_complete_rw+0x10/0x20
      
      where it's clear we end up calling the percpu count release directly
      from the completion path, as it's in atomic mode and we drop the last
      ref. For file/block IO, this can be from IRQ context already, and the
      softirq locking for rsrc isn't enough.
      
      Just make the lock fully IRQ safe, and ensure we correctly save state
      from the release path, as we don't know the full context there.
      Reported-by: Nadav Amit <nadav.amit@gmail.com>
      Tested-by: Nadav Amit <nadav.amit@gmail.com>
      Link: https://lore.kernel.org/io-uring/C187C836-E78B-4A31-B24C-D16919ACA093@gmail.com/
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4956b9ea
  6. 09 Aug 2021 (2 commits)
    • io_uring: Use WRITE_ONCE() when writing to sq_flags · 20c0b380
      Nadav Amit authored
      The compiler should be forbidden from any strange optimization for async
      writes to user visible data-structures. Without proper protection, the
      compiler can cause write-tearing or invent writes that would confuse
      userspace.
      
      However, there are writes to sq_flags which are not protected by
      WRITE_ONCE(). Use WRITE_ONCE() for these writes.
      
      This is purely a theoretical issue. Presumably, any compiler is very
      unlikely to do such optimizations.
      
      Fixes: 75b28aff ("io_uring: allocate the two rings together")
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Link: https://lore.kernel.org/r/20210808001342.964634-3-namit@vmware.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      20c0b380
    • io_uring: clear TIF_NOTIFY_SIGNAL when running task work · ef98eb04
      Nadav Amit authored
      When using SQPOLL, the submission queue polling thread calls
      task_work_run() to run queued work. However, when work is added with
      TWA_SIGNAL - as done by io_uring itself - the TIF_NOTIFY_SIGNAL remains
      set afterwards and is never cleared.
      
      Consequently, when the submission queue polling thread checks whether
      signal_pending(), it may always find a pending signal, if
      task_work_add() was ever called before.
      
      The impact of this bug might be different on different kernel versions.
      It appears that on 5.14 it would only cause unnecessary calculation and
      prevent the polling thread from sleeping. On 5.13, where the bug was
      found, it stops the polling thread from finding newly submitted work.
      
      Instead of task_work_run(), use tracehook_notify_signal() that clears
      TIF_NOTIFY_SIGNAL. Test for TIF_NOTIFY_SIGNAL in addition to
      current->task_works to avoid a race in which task_works is cleared but
      the TIF_NOTIFY_SIGNAL is set.
      
      Fixes: 685fe7fe ("io-wq: eliminate the need for a manager thread")
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Link: https://lore.kernel.org/r/20210808001342.964634-2-namit@vmware.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ef98eb04
  7. 28 Jul 2021 (3 commits)
  8. 27 Jul 2021 (1 commit)
  9. 26 Jul 2021 (1 commit)
  10. 24 Jul 2021 (2 commits)
  11. 23 Jul 2021 (1 commit)
  12. 20 Jul 2021 (3 commits)
    • io_uring: fix memleak in io_init_wq_offload() · 362a9e65
      Yang Yingliang authored
      I got memory leak report when doing fuzz test:
      
      BUG: memory leak
      unreferenced object 0xffff888107310a80 (size 96):
      comm "syz-executor.6", pid 4610, jiffies 4295140240 (age 20.135s)
      hex dump (first 32 bytes):
      01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
      backtrace:
      [<000000001974933b>] kmalloc include/linux/slab.h:591 [inline]
      [<000000001974933b>] kzalloc include/linux/slab.h:721 [inline]
      [<000000001974933b>] io_init_wq_offload fs/io_uring.c:7920 [inline]
      [<000000001974933b>] io_uring_alloc_task_context+0x466/0x640 fs/io_uring.c:7955
      [<0000000039d0800d>] __io_uring_add_tctx_node+0x256/0x360 fs/io_uring.c:9016
      [<000000008482e78c>] io_uring_add_tctx_node fs/io_uring.c:9052 [inline]
      [<000000008482e78c>] __do_sys_io_uring_enter fs/io_uring.c:9354 [inline]
      [<000000008482e78c>] __se_sys_io_uring_enter fs/io_uring.c:9301 [inline]
      [<000000008482e78c>] __x64_sys_io_uring_enter+0xabc/0xc20 fs/io_uring.c:9301
      [<00000000b875f18f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      [<00000000b875f18f>] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
      [<000000006b0a8484>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      CPU0                          CPU1
      io_uring_enter                io_uring_enter
      io_uring_add_tctx_node        io_uring_add_tctx_node
      __io_uring_add_tctx_node      __io_uring_add_tctx_node
      io_uring_alloc_task_context   io_uring_alloc_task_context
      io_init_wq_offload            io_init_wq_offload
      hash = kzalloc                hash = kzalloc
      ctx->hash_map = hash          ctx->hash_map = hash <- one of the hash is leaked
      
      When io_uring_enter() is called in parallel, one of the 'hash_map'
      allocations is leaked; take uring_lock to protect 'hash_map'.
      
      Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/20210720083805.3030730-1-yangyingliang@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      362a9e65
    • io_uring: remove double poll entry on arm failure · 46fee9ab
      Pavel Begunkov authored
      __io_queue_proc() can enqueue both poll entries and still fail
      afterwards, so the callers trying to cancel it should also try to remove
      the second poll entry (if any).
      
      For example, it may leave the request alive referencing an io_uring
      context but not accessible for cancellation:
      
      [  282.599913][ T1620] task:iou-sqp-23145   state:D stack:28720 pid:23155 ppid:  8844 flags:0x00004004
      [  282.609927][ T1620] Call Trace:
      [  282.613711][ T1620]  __schedule+0x93a/0x26f0
      [  282.634647][ T1620]  schedule+0xd3/0x270
      [  282.638874][ T1620]  io_uring_cancel_generic+0x54d/0x890
      [  282.660346][ T1620]  io_sq_thread+0xaac/0x1250
      [  282.696394][ T1620]  ret_from_fork+0x1f/0x30
      
      Cc: stable@vger.kernel.org
      Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
      Reported-and-tested-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/0ec1228fc5eda4cb524eeda857da8efdc43c331c.1626774457.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      46fee9ab
    • io_uring: explicitly count entries for poll reqs · 68b11e8b
      Pavel Begunkov authored
      If __io_queue_proc() fails to add a second poll entry, e.g. kmalloc()
      failed, but it goes on with a third waitqueue, it may succeed and
      overwrite the error status. Count the number of poll entries we added,
      so we can set pt->error to zero at the beginning and find out when the
      mentioned scenario happens.
      
      Cc: stable@vger.kernel.org
      Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/9d6b9e561f88bcc0163623b74a76c39f712151c3.1626774457.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      68b11e8b
  13. 12 Jul 2021 (2 commits)
  14. 09 Jul 2021 (2 commits)