1. 27 Nov 2021, 1 commit
    • io_uring: fix soft lockup when call __io_remove_buffers · 1d0254e6
      Committed by Ye Bin
      I hit the following issue:
      [ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
      [  594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
      [  594.364987] Modules linked in:
      [  594.365405] irq event stamp: 604180238
      [  594.365906] hardirqs last  enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
      [  594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
      [  594.368420] softirqs last  enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
      [  594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
      [  594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G            L    5.15.0-next-20211112+ #88
      [  594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      [  594.373604] Workqueue: events_unbound io_ring_exit_work
      [  594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
      [  594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
      [  594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
      [  594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
      [  594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
      [  594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
      [  594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
      [  594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
      [  594.382787] FS:  0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
      [  594.383851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
      [  594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  594.387403] Call Trace:
      [  594.387738]  <TASK>
      [  594.388042]  find_and_remove_object+0x118/0x160
      [  594.389321]  delete_object_full+0xc/0x20
      [  594.389852]  kfree+0x193/0x470
      [  594.390275]  __io_remove_buffers.part.0+0xed/0x147
      [  594.390931]  io_ring_ctx_free+0x342/0x6a2
      [  594.392159]  io_ring_exit_work+0x41e/0x486
      [  594.396419]  process_one_work+0x906/0x15a0
      [  594.399185]  worker_thread+0x8b/0xd80
      [  594.400259]  kthread+0x3bf/0x4a0
      [  594.401847]  ret_from_fork+0x22/0x30
      [  594.402343]  </TASK>
      
      Message from syslogd@localhost at Nov 13 09:09:54 ...
      kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
      [  596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
      
      We can reproduce this issue with the following syzkaller log:
      r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
      sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
      syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
      io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)
      
      The cause of the above issue is that 'buf->list' has 2,100,000 nodes;
      freeing them all in one uninterrupted loop monopolizes the CPU and
      triggers the soft lockup. To solve this, add a scheduling point to the
      loop in '__io_remove_buffers', as sketched below.
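      A minimal sketch of the fix, assuming the 5.16-era shape of
      __io_remove_buffers() (illustrative, not the verbatim patch); the
      functional change is the cond_resched() call inside the potentially
      huge free loop:

          static int __io_remove_buffers(struct io_ring_ctx *ctx,
                                         struct io_buffer *buf,
                                         __u16 bgid, unsigned nbufs)
          {
              unsigned i = 0;

              if (!nbufs)
                  return 0;

              /* the head kbuf is the list itself; it can chain millions
               * of nodes (2,100,000 in the log above) */
              while (!list_empty(&buf->list)) {
                  struct io_buffer *nxt;

                  nxt = list_first_entry(&buf->list, struct io_buffer, list);
                  list_del(&nxt->list);
                  kfree(nxt);
                  if (++i == nbufs)
                      return i;
                  cond_resched();    /* the fix: yield so the watchdog can't fire */
              }
              i++;
              kfree(buf);
              xa_erase(&ctx->io_buffers, bgid);
              return i;
          }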
      With the schedule point added, we reran the regression and got the
      following data:
      [  240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
      [  268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
      [  275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
      [  296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
      [  305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
      [  325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
      [  333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
      ...
      
      Fixes: 8bab4c09 ("io_uring: allow conditional reschedule for intensive iterators")
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 26 Nov 2021, 2 commits
    • io_uring: fix link traversal locking · 6af3f48b
      Committed by Pavel Begunkov
      WARNING: inconsistent lock state
      5.16.0-rc2-syzkaller #0 Not tainted
      inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
      ffff888078e11418 (&ctx->timeout_lock
      ){?.+.}-{2:2}
      , at: io_timeout_fn+0x6f/0x360 fs/io_uring.c:5943
      {HARDIRQ-ON-W} state was registered at:
        [...]
        spin_unlock_irq include/linux/spinlock.h:399 [inline]
        __io_poll_remove_one fs/io_uring.c:5669 [inline]
        __io_poll_remove_one fs/io_uring.c:5654 [inline]
        io_poll_remove_one+0x236/0x870 fs/io_uring.c:5680
        io_poll_remove_all+0x1af/0x235 fs/io_uring.c:5709
        io_ring_ctx_wait_and_kill+0x1cc/0x322 fs/io_uring.c:9534
        io_uring_release+0x42/0x46 fs/io_uring.c:9554
        __fput+0x286/0x9f0 fs/file_table.c:280
        task_work_run+0xdd/0x1a0 kernel/task_work.c:164
        exit_task_work include/linux/task_work.h:32 [inline]
        do_exit+0xc14/0x2b40 kernel/exit.c:832
      
      674ee8e1 ("io_uring: correct link-list traversal locking") fixed a
      data race but introduced a possible deadlock and inconsistency in irq
      states. E.g.
      
      io_poll_remove_all()
          spin_lock_irq(timeout_lock)
          io_poll_remove_one()
              spin_lock/unlock_irq(poll_lock);
          spin_unlock_irq(timeout_lock)
      
      Another type of problem is freeing a request while holding
      ->timeout_lock, which may lead to a deadlock in
      io_commit_cqring() -> io_flush_timeouts() and other places.

      Having 3 nested locks is also too ugly. Add io_match_task_safe(), which
      briefly takes and releases timeout_lock internally for race prevention,
      so the actual request cancellation / free / etc. code doesn't run with
      it held.
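      A sketch of the new helper, following the shape of the upstream patch
      (field and helper names assume that era's fs/io_uring.c, not
      re-verified verbatim): the timeout_lock critical section is confined
      to the link walk itself, so no caller holds it across cancellation or
      freeing.

          /* Walk the link chain; caller must guard against racing timeouts. */
          static bool io_match_linked(struct io_kiocb *head)
          {
              struct io_kiocb *req;

              io_for_each_link(req, head) {
                  if (req->flags & REQ_F_INFLIGHT)
                      return true;
              }
              return false;
          }

          /* As io_match_task(), but safe against racing linked timeouts.
           * The caller must NOT hold ->timeout_lock. */
          static bool io_match_task_safe(struct io_kiocb *head,
                                         struct task_struct *task,
                                         bool cancel_all)
          {
              bool matched;

              if (task && head->task != task)
                  return false;
              if (cancel_all)
                  return true;

              if (head->flags & REQ_F_LINK_TIMEOUT) {
                  struct io_ring_ctx *ctx = head->ctx;

                  /* protect against races with linked timeouts */
                  spin_lock_irq(&ctx->timeout_lock);
                  matched = io_match_linked(head);
                  spin_unlock_irq(&ctx->timeout_lock);
              } else {
                  matched = io_match_linked(head);
              }
              return matched;
          }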
      
      Reported-by: syzbot+ff49a3059d49b0ca0eec@syzkaller.appspotmail.com
      Reported-by: syzbot+847f02ec20a6609a328b@syzkaller.appspotmail.com
      Reported-by: syzbot+3368aadcd30425ceb53b@syzkaller.appspotmail.com
      Reported-by: syzbot+51ce8887cdef77c9ac83@syzkaller.appspotmail.com
      Reported-by: syzbot+3cb756a49d2f394a9ee3@syzkaller.appspotmail.com
      Fixes: 674ee8e1 ("io_uring: correct link-list traversal locking")
      Cc: stable@kernel.org # 5.15+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/397f7ebf3f4171f1abe41f708ac1ecb5766f0b68.1637937097.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fail cancellation for EXITING tasks · 617a8948
      Committed by Pavel Begunkov
      WARNING: CPU: 1 PID: 20 at fs/io_uring.c:6269 io_try_cancel_userdata+0x3c5/0x640 fs/io_uring.c:6269
      CPU: 1 PID: 20 Comm: kworker/1:0 Not tainted 5.16.0-rc1-syzkaller #0
      Workqueue: events io_fallback_req_func
      RIP: 0010:io_try_cancel_userdata+0x3c5/0x640 fs/io_uring.c:6269
      Call Trace:
       <TASK>
       io_req_task_link_timeout+0x6b/0x1e0 fs/io_uring.c:6886
       io_fallback_req_func+0xf9/0x1ae fs/io_uring.c:1334
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
       kthread+0x405/0x4f0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
       </TASK>
      
      We need the original task's context to do cancellations, so if that
      task is dying and the callback is executed in fallback mode, fail the
      cancellation attempt.
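      A sketch of the check, assuming the shape of io_req_task_link_timeout()
      at the time (illustrative; the default errno below is an assumption):

          static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
          {
              struct io_kiocb *prev = req->timeout.prev;
              int ret = -ENOENT;    /* assumed default when cancel is skipped */

              if (prev) {
                  /* the fix: a dying task lacks the context needed for
                   * cancellation, so don't attempt it from fallback mode */
                  if (!(req->task->flags & PF_EXITING))
                      ret = io_try_cancel_userdata(req, prev->user_data);
                  io_req_complete_post(req, ret ?: -ETIME, 0);
                  io_put_req(prev);
              } else {
                  io_req_complete_post(req, -ETIME, 0);
              }
          }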
      
      Fixes: 89b263f6 ("io_uring: run linked timeouts from task_work")
      Cc: stable@kernel.org # 5.15+
      Reported-by: syzbot+ab0cfe96c2b3cd1c1153@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/4c41c5f379c6941ad5a07cd48cb66ed62199cf7e.1637937097.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 23 Nov 2021, 1 commit
  4. 17 Nov 2021, 1 commit
  5. 08 Nov 2021, 1 commit
  6. 05 Nov 2021, 1 commit
  7. 03 Nov 2021, 1 commit
  8. 02 Nov 2021, 1 commit
  9. 29 Oct 2021, 1 commit
    • io_uring: harder fdinfo sq/cq ring iterating · f75d1183
      Committed by Jens Axboe
      The ring iteration is racy, which isn't necessarily a problem, except
      that it can cause us to iterate the whole ring. That isn't desired or
      ideal, and it can lead to excessive runtimes when reading fdinfo.

      Cap the iteration at tail - head or the ring size, whichever is
      smaller. While in there, clean up the ring masking and just dump the
      raw values along with the masks. That provides more useful debug info;
      a sketch of the capped iteration follows.
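      A sketch of the capped SQ iteration, assuming the fdinfo variable
      names of that era's fs/io_uring.c (r is ctx->rings; treat the names as
      approximations rather than the verbatim patch):

          unsigned int sq_head = READ_ONCE(r->sq.head);
          unsigned int sq_tail = READ_ONCE(r->sq.tail);
          unsigned int sq_mask = ctx->sq_entries - 1;
          /* cap at tail - head OR the ring size, whichever is smaller */
          unsigned int sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
          unsigned int i;

          for (i = 0; i < sq_entries; i++) {
              unsigned int entry = i + sq_head;
              /* mask only when indexing; dump the raw values as-is */
              unsigned int sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
              struct io_uring_sqe *sqe;

              if (sq_idx > sq_mask)
                  continue;
              sqe = &ctx->sq_sqes[sq_idx];
              seq_printf(m, "%5u: opcode:%d, fd:%d, user_data:%llu\n",
                         sq_idx, sqe->opcode, sqe->fd,
                         (unsigned long long) sqe->user_data);
          }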
      
      Fixes: 83f84356 ("io_uring: add more uring info to fdinfo for debug")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. 27 Oct 2021, 1 commit
  11. 26 Oct 2021, 1 commit
  12. 25 Oct 2021, 7 commits
  13. 23 Oct 2021, 1 commit
    • io_uring: implement async hybrid mode for pollable requests · 90fa0288
      Committed by Hao Xu
      The current logic of requests with IOSQE_ASYNC is first queueing it to
      io-worker, then execute it in a synchronous way. For unbound works like
      pollable requests(e.g. read/write a socketfd), the io-worker may stuck
      there waiting for events for a long time. And thus other works wait in
      the list for a long time too.
      Let's introduce a new way for unbound works (currently pollable
      requests), with this a request will first be queued to io-worker, then
      executed in a nonblock try rather than a synchronous way. Failure of
      that leads it to arm poll stuff and then the worker can begin to handle
      other works.
      The detail process of this kind of requests is:
      
      step1: original context:
                 queue it to io-worker
      step2: io-worker context:
                 nonblock try(the old logic is a synchronous try here)
                     |
                     |--fail--> arm poll
                                  |
                                  |--(fail/ready)-->synchronous issue
                                  |
                                  |--(succeed)-->worker finish it's job, tw
                                                 take over the req
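      In C, a schematic version of that flow (the wrapper name
      io_wq_submit_work_hybrid() is hypothetical; io_issue_sqe(),
      io_arm_poll_handler() and the IO_APOLL_* results follow that era's
      fs/io_uring.c, so this is a sketch rather than the exact patch):

          /* io-worker context: step 2 of the diagram above */
          static void io_wq_submit_work_hybrid(struct io_kiocb *req)
          {
              /* nonblocking try instead of the old synchronous issue */
              int ret = io_issue_sqe(req, IO_URING_F_NONBLOCK);

              if (ret != -EAGAIN)
                  return;    /* completed (or failed) without blocking */

              /* nonblock try failed: arm poll so this worker is freed up */
              switch (io_arm_poll_handler(req)) {
              case IO_APOLL_OK:
                  return;    /* task_work takes over when the fd is ready */
              case IO_APOLL_READY:
              case IO_APOLL_ABORTED:
                  io_issue_sqe(req, 0);    /* synchronous fallback */
                  return;
              }
          }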
      
      This works much better than the old IOSQE_ASYNC logic in cases where
      unbound max_worker is relatively small. In that case, the number of
      io-workers easily climbs to max_worker, new workers cannot be created,
      and the running workers are stuck handling old work in IOSQE_ASYNC
      mode.

      On my 64-core machine, with unbound max_worker set to 20, running an
      echo server (arguments: register_file, 1000 connections, 12-byte
      messages) turns out:
      original IOSQE_ASYNC: 76664.151 tps
      after this patch: 166934.985 tps
      Suggested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20211018133445.103438-1-haoxu@linux.alibaba.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. 22 Oct 2021, 1 commit
  15. 20 Oct 2021, 4 commits
  16. 19 Oct 2021, 15 commits