1. 26 11月, 2019 28 次提交
  2. 15 11月, 2019 1 次提交
    • J
      io_uring: make POLL_ADD/POLL_REMOVE scale better · eac406c6
      Jens Axboe 提交于
      One of the obvious use cases for these commands is networking, where
      it's not uncommon to have tons of sockets open and polled for. The
      current implementation uses a list for insertion and lookup, which works
      fine for file based use cases where the count is usually low, it breaks
      down somewhat for higher number of files / sockets. A test case with
      30k sockets being polled for and cancelled takes:
      
      real    0m6.968s
      user    0m0.002s
      sys     0m6.936s
      
      with the patch it takes:
      
      real    0m0.233s
      user    0m0.010s
      sys     0m0.176s
      
      If you go to 50k sockets, it gets even more abysmal with the current
      code:
      
      real    0m40.602s
      user    0m0.010s
      sys     0m40.555s
      
      with the patch it takes:
      
      real    0m0.398s
      user    0m0.000s
      sys     0m0.341s
      
      Change is pretty straight forward, just replace the cancel_list with
      a red/black tree instead.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      eac406c6
  3. 14 11月, 2019 7 次提交
    • P
      io_uring: Fix getting file for non-fd opcodes · a320e9fa
      Pavel Begunkov 提交于
      For timeout requests and bunch of others io_uring tries to grab a file
      with specified fd, which is usually stdin/fd=0.
      Update io_op_needs_file()
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a320e9fa
    • B
      io_uring: introduce req_need_defer() · 9d858b21
      Bob Liu 提交于
      Makes the code easier to read.
      Signed-off-by: NBob Liu <bob.liu@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9d858b21
    • B
      io_uring: clean up io_uring_cancel_files() · 2f6d9b9d
      Bob Liu 提交于
      We don't use the return value anymore, drop it. Also drop the
      unecessary double cancel_req value check.
      Signed-off-by: NBob Liu <bob.liu@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      2f6d9b9d
    • J
      io_uring: ensure registered buffer import returns the IO length · 5e559561
      Jens Axboe 提交于
      A test case was reported where two linked reads with registered buffers
      failed the second link always. This is because we set the expected value
      of a request in req->result, and if we don't get this result, then we
      fail the dependent links. For some reason the registered buffer import
      returned -ERROR/0, while the normal import returns -ERROR/length. This
      broke linked commands with registered buffers.
      
      Fix this by making io_import_fixed() correctly return the mapped length.
      
      Cc: stable@vger.kernel.org # v5.3
      Reported-by: N李通洲 <carter.li@eoitek.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5e559561
    • P
      io_uring: Fix getting file for timeout · 5683e540
      Pavel Begunkov 提交于
      For timeout requests io_uring tries to grab a file with specified fd,
      which is usually stdin/fd=0.
      Update io_op_needs_file()
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5683e540
    • J
      io_wq: add get/put_work handlers to io_wq_create() · 7d723065
      Jens Axboe 提交于
      For cancellation, we need to ensure that the work item stays valid for
      as long as ->cur_work is valid. Right now we can't safely dereference
      the work item even under the wqe->lock, because while the ->cur_work
      pointer will remain valid, the work could be completing and be freed
      in parallel.
      
      Only invoke ->get/put_work() on items we know that the caller queued
      themselves. Add IO_WQ_WORK_INTERNAL for io-wq to use, which is needed
      when we're queueing a flush item, for instance.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7d723065
    • J
      io_uring: check for validity of ->rings in teardown · 15dff286
      Jens Axboe 提交于
      Normally the rings are always valid, the exception is if we failed to
      allocate the rings at setup time. syzbot reports this:
      
      RSP: 002b:00007ffd6e8aa078 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441229
      RDX: 0000000000000002 RSI: 0000000020000140 RDI: 0000000000000d0d
      RBP: 00007ffd6e8aa090 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
      R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 8903 Comm: syz-executor410 Not tainted 5.4.0-rc7-next-20191113
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline]
      RIP: 0010:__io_commit_cqring fs/io_uring.c:496 [inline]
      RIP: 0010:io_commit_cqring+0x1e1/0xdb0 fs/io_uring.c:592
      Code: 03 0f 8e df 09 00 00 48 8b 45 d0 4c 8d a3 c0 00 00 00 4c 89 e2 48 c1
      ea 03 44 8b b8 c0 01 00 00 48 b8 00 00 00 00 00 fc ff df <0f> b6 14 02 4c
      89 e0 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 61
      RSP: 0018:ffff88808f51fc08 EFLAGS: 00010006
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff815abe4a
      RDX: 0000000000000018 RSI: ffffffff81d168d5 RDI: ffff8880a9166100
      RBP: ffff88808f51fc70 R08: 0000000000000004 R09: ffffed1011ea3f7d
      R10: ffffed1011ea3f7c R11: 0000000000000003 R12: 00000000000000c0
      R13: ffff8880a91661c0 R14: 1ffff1101522cc10 R15: 0000000000000000
      FS:  0000000001e7a880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000140 CR3: 000000009a74c000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        io_cqring_overflow_flush+0x6b9/0xa90 fs/io_uring.c:673
        io_ring_ctx_wait_and_kill+0x24f/0x7c0 fs/io_uring.c:4260
        io_uring_create fs/io_uring.c:4600 [inline]
        io_uring_setup+0x1256/0x1cc0 fs/io_uring.c:4626
        __do_sys_io_uring_setup fs/io_uring.c:4639 [inline]
        __se_sys_io_uring_setup fs/io_uring.c:4636 [inline]
        __x64_sys_io_uring_setup+0x54/0x80 fs/io_uring.c:4636
        do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x441229
      Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffd6e8aa078 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441229
      RDX: 0000000000000002 RSI: 0000000020000140 RDI: 0000000000000d0d
      RBP: 00007ffd6e8aa090 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
      R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      ---[ end trace b0f5b127a57f623f ]---
      RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline]
      RIP: 0010:__io_commit_cqring fs/io_uring.c:496 [inline]
      RIP: 0010:io_commit_cqring+0x1e1/0xdb0 fs/io_uring.c:592
      Code: 03 0f 8e df 09 00 00 48 8b 45 d0 4c 8d a3 c0 00 00 00 4c 89 e2 48 c1
      ea 03 44 8b b8 c0 01 00 00 48 b8 00 00 00 00 00 fc ff df <0f> b6 14 02 4c
      89 e0 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 61
      RSP: 0018:ffff88808f51fc08 EFLAGS: 00010006
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff815abe4a
      RDX: 0000000000000018 RSI: ffffffff81d168d5 RDI: ffff8880a9166100
      RBP: ffff88808f51fc70 R08: 0000000000000004 R09: ffffed1011ea3f7d
      R10: ffffed1011ea3f7c R11: 0000000000000003 R12: 00000000000000c0
      R13: ffff8880a91661c0 R14: 1ffff1101522cc10 R15: 0000000000000000
      FS:  0000000001e7a880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000140 CR3: 000000009a74c000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      which is exactly the case of failing to allocate the SQ/CQ rings, and
      then entering shutdown. Check if the rings are valid before trying to
      access them at shutdown time.
      
      Reported-by: syzbot+21147d79607d724bd6f3@syzkaller.appspotmail.com
      Fixes: 1d7bb1d5 ("io_uring: add support for backlogged CQ ring")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      15dff286
  4. 13 11月, 2019 1 次提交
    • J
      io_uring: fix potential deadlock in io_poll_wake() · 7c9e7f0f
      Jens Axboe 提交于
      We attempt to run the poll completion inline, but we're using trylock to
      do so. This avoids a deadlock since we're grabbing the locks in reverse
      order at this point, we already hold the poll wq lock and we're trying
      to grab the completion lock, while the normal rules are the reverse of
      that order.
      
      IO completion for a timeout link will need to grab the completion lock,
      but that's not safe from this context. Put the completion under the
      completion_lock in io_poll_wake(), and mark the request as entering
      the completion with the completion_lock already held.
      
      Fixes: 2665abfd ("io_uring: add support for linked SQE timeouts")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7c9e7f0f
  5. 12 11月, 2019 3 次提交
    • J
      io_uring: use correct "is IO worker" helper · 960e432d
      Jens Axboe 提交于
      Since we switched to io-wq, the dependent link optimization for when to
      pass back work inline has been broken. Fix this by providing a suitable
      io-wq helper for io_uring to use to detect when to do this.
      
      Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      960e432d
    • J
      io_uring: make timeout sequence == 0 mean no sequence · 93bd25bb
      Jens Axboe 提交于
      Currently we make sequence == 0 be the same as sequence == 1, but that's
      not super useful if the intent is really to have a timeout that's just
      a pure timeout.
      
      If the user passes in sqe->off == 0, then don't apply any sequence logic
      to the request, let it purely be driven by the timeout specified.
      Reported-by: N李通洲 <carter.li@eoitek.com>
      Reviewed-by: N李通洲 <carter.li@eoitek.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      93bd25bb
    • J
      io_uring: fix -ENOENT issue with linked timer with short timeout · 76a46e06
      Jens Axboe 提交于
      If you prep a read (for example) that needs to get punted to async
      context with a timer, if the timeout is sufficiently short, the timer
      request will get completed with -ENOENT as it could not find the read.
      
      The issue is that we prep and start the timer before we start the read.
      Hence the timer can trigger before the read is even started, and the end
      result is then that the timer completes with -ENOENT, while the read
      starts instead of being cancelled by the timer.
      
      Fix this by splitting the linked timer into two parts:
      
      1) Prep and validate the linked timer
      2) Start timer
      
      The read is then started between steps 1 and 2, so we know that the
      timer will always have a consistent view of the read request state.
      Reported-by: NHrvoje Zeba <zeba.hrvoje@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      76a46e06
新手
引导
客服 返回
顶部