1. 20 Oct, 2022 2 commits
    • R
      io-wq: Fix memory leak in worker creation · 996d3efe
      Rafael Mendonca committed
      If the CPU mask allocation for a node fails, then the memory allocated for
      the 'io_wqe' struct of the current node doesn't get freed on the error
      handling path, since it has not yet been added to the 'wqes' array.
      
      This was spotted when fuzzing v6.1-rc1 with Syzkaller:
      BUG: memory leak
      unreferenced object 0xffff8880093d5000 (size 1024):
        comm "syz-executor.2", pid 7701, jiffies 4295048595 (age 13.900s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000cb463369>] __kmem_cache_alloc_node+0x18e/0x720
          [<00000000147a3f9c>] kmalloc_node_trace+0x2a/0x130
          [<000000004e107011>] io_wq_create+0x7b9/0xdc0
          [<00000000c38b2018>] io_uring_alloc_task_context+0x31e/0x59d
          [<00000000867399da>] __io_uring_add_tctx_node.cold+0x19/0x1ba
          [<000000007e0e7a79>] io_uring_setup.cold+0x1b80/0x1dce
          [<00000000b545e9f6>] __x64_sys_io_uring_setup+0x5d/0x80
          [<000000008a8a7508>] do_syscall_64+0x5d/0x90
          [<000000004ac08bec>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 0e03496d ("io-wq: use private CPU mask")
      Cc: stable@vger.kernel.org
      Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
      Link: https://lore.kernel.org/r/20221020014710.902201-1-rafaelmendsr@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
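The leak pattern described in this commit can be sketched in user-space C. This is a minimal illustration, not the kernel code: `struct node_ctx`, `struct wq`, `tracked_alloc()`, `tracked_free()`, `fail_mask_at` and `live_allocs` are all hypothetical names introduced here. The point it shows is that an object allocated inside the loop but not yet stored in the array is invisible to the common error path, so it has to be freed before jumping there.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_NODES 4

/* Hypothetical stand-ins for the per-node and top-level structures. */
struct node_ctx {
    unsigned long *cpu_mask;
};

struct wq {
    struct node_ctx *nodes[NR_NODES];
};

/* Test hook (assumption): index of the node whose mask allocation fails. */
int fail_mask_at = -1;
/* Number of live allocations, so that a leak becomes observable. */
int live_allocs;

static void *tracked_alloc(size_t size)
{
    void *p = calloc(1, size);

    if (p)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (!p)
        return;
    assert(live_allocs > 0); /* catches a double free in this model */
    live_allocs--;
    free(p);
}

struct wq *wq_create(void)
{
    struct wq *wq = tracked_alloc(sizeof(*wq));

    if (!wq)
        return NULL;

    for (int i = 0; i < NR_NODES; i++) {
        struct node_ctx *n = tracked_alloc(sizeof(*n));

        if (!n)
            goto err;
        n->cpu_mask = (i == fail_mask_at) ? NULL
                                          : tracked_alloc(sizeof(unsigned long));
        if (!n->cpu_mask) {
            /* n is not in wq->nodes[] yet, so the err loop cannot see it. */
            tracked_free(n);
            goto err;
        }
        wq->nodes[i] = n;
    }
    return wq;

err:
    /* Only nodes already published in the array are released here. */
    for (int i = 0; i < NR_NODES; i++) {
        if (!wq->nodes[i])
            continue;
        tracked_free(wq->nodes[i]->cpu_mask);
        tracked_free(wq->nodes[i]);
    }
    tracked_free(wq);
    return NULL;
}
```

Removing the `tracked_free(n)` call before `goto err` leaves one allocation outstanding after a failed `wq_create()`, which is the same shape as the leak in the report above.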
    • H
      io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd() · 16bbdfe5
      Harshit Mogalapalli committed
      Syzkaller produced the below call trace:
      
       BUG: KASAN: null-ptr-deref in io_msg_ring+0x3cb/0x9f0
       Write of size 8 at addr 0000000000000070 by task repro/16399
      
       CPU: 0 PID: 16399 Comm: repro Not tainted 6.1.0-rc1 #28
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7
       Call Trace:
        <TASK>
        dump_stack_lvl+0xcd/0x134
        ? io_msg_ring+0x3cb/0x9f0
        kasan_report+0xbc/0xf0
        ? io_msg_ring+0x3cb/0x9f0
        kasan_check_range+0x140/0x190
        io_msg_ring+0x3cb/0x9f0
        ? io_msg_ring_prep+0x300/0x300
        io_issue_sqe+0x698/0xca0
        io_submit_sqes+0x92f/0x1c30
        __do_sys_io_uring_enter+0xae4/0x24b0
      ....
       RIP: 0033:0x7f2eaf8f8289
       RSP: 002b:00007fff40939718 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
       RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2eaf8f8289
       RDX: 0000000000000000 RSI: 0000000000006f71 RDI: 0000000000000004
       RBP: 00007fff409397a0 R08: 0000000000000000 R09: 0000000000000039
       R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004006d0
       R13: 00007fff40939880 R14: 0000000000000000 R15: 0000000000000000
        </TASK>
       Kernel panic - not syncing: panic_on_warn set ...
      
      We don't have a NULL check on file_ptr in the io_msg_send_fd() function,
      so when file_ptr is NULL, src_file is also NULL and get_file()
      dereferences a NULL pointer, leading to the above crash.
      
      Add a NULL check to fix this issue.
      
      Fixes: e6130eba ("io_uring: add support for passing fixed file descriptors")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
      Link: https://lore.kernel.org/r/20221019171218.1337614-1-harshit.m.mogalapalli@oracle.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
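The missing check can be illustrated with a small user-space model. Everything here is an assumption made for the example: `struct file_table`, `send_fd()` and the simplified `get_file()` are hypothetical stand-ins, and returning `-EBADF` for an empty slot is a choice made in this sketch. It shows only the idea that a table slot may legitimately be empty and must be tested before a reference is taken on it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_SLOTS 4

/* Simplified file object: taking a reference writes through the pointer. */
struct file {
    long refcount;
};

/* Hypothetical fixed-file table; unused slots are NULL. */
struct file_table {
    struct file *slots[NR_SLOTS];
};

static struct file *get_file(struct file *f)
{
    f->refcount++; /* would fault here if f were NULL */
    return f;
}

int send_fd(struct file_table *t, unsigned int idx, struct file **out)
{
    struct file *src;

    assert(t != NULL && out != NULL);
    if (idx >= NR_SLOTS)
        return -EINVAL;

    src = t->slots[idx];
    if (!src)           /* the check this sketch is about: empty slot */
        return -EBADF;

    *out = get_file(src);
    return 0;
}
```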
  2. 17 Oct, 2022 5 commits
  3. 13 Oct, 2022 10 commits
  4. 08 Oct, 2022 3 commits
  5. 30 Sep, 2022 6 commits
  6. 29 Sep, 2022 3 commits
  7. 28 Sep, 2022 1 commit
  8. 27 Sep, 2022 4 commits
  9. 26 Sep, 2022 1 commit
    • P
      io_uring/net: fix cleanup double free free_iov init · 4c17a496
      Pavel Begunkov committed
      Having ->async_data doesn't mean it's initialised and previously we were
      relying on setting F_CLEANUP at the right moment. With zc sendmsg
      though, we set F_CLEANUP early in prep when we alloc a notif and so we
      may allocate async_data, fail in copy_msg_hdr() leaving
      struct io_async_msghdr not initialised correctly but with F_CLEANUP
      set, which causes a ->free_iov double free and probably other nastiness.
      
      Always initialise ->free_iov. Also, it may now point to fast_iov on
      failure, so avoid freeing it during cleanup.
      
      Reported-by: syzbot+edfd15cd4246a3fc615a@syzkaller.appspotmail.com
      Fixes: 493108d9 ("io_uring/net: zerocopy sendmsg")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
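A rough user-space sketch of the two rules in this commit: give the pointer a defined value before any step that can fail, and never free it while it still aliases the inline array. The names `struct async_msghdr`, `msg_setup()`, `msg_cleanup()` and the `fail_copy` flag are hypothetical and only loosely mirror the kernel structures; the real setup path differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define FAST_IOV 8

struct iovec_s {
    void *base;
    size_t len;
};

/* Hypothetical model: small requests use fast_iov, large ones the heap. */
struct async_msghdr {
    struct iovec_s fast_iov[FAST_IOV];
    struct iovec_s *free_iov; /* heap array to free, NULL, or fast_iov */
};

int msg_setup(struct async_msghdr *m, size_t nr, bool fail_copy)
{
    assert(m != NULL);
    /* Defined value first, so a failure below never leaves garbage. */
    m->free_iov = m->fast_iov;

    if (fail_copy)      /* models the header copy failing */
        return -1;

    if (nr <= FAST_IOV) {
        m->free_iov = NULL; /* inline array is enough, nothing to free */
        return 0;
    }

    m->free_iov = calloc(nr, sizeof(*m->free_iov));
    return m->free_iov ? 0 : -1;
}

void msg_cleanup(struct async_msghdr *m)
{
    /* After a failed setup free_iov may alias fast_iov: do not free that. */
    if (m->free_iov != m->fast_iov)
        free(m->free_iov);
    m->free_iov = NULL;
}
```

With this shape, cleanup is safe to run after any outcome of `msg_setup()`, including a failure, and running it twice does not free anything twice.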
  10. 24 Sep, 2022 3 commits
    • J
      io_uring: ensure that cached task references are always put on exit · e775f93f
      Jens Axboe committed
      io_uring caches task references to avoid doing atomics for each of them
      per request. If a request is put from the same task that allocated it,
      then we can maintain a per-ctx cache of them. This obviously relies
      on io_uring always pruning caches in a reliable way, and there's
      currently a case in io_uring fd release where we can miss that.
      
      One example is a ring setup with IOPOLL, which relies on the task
      polling for completions, which will free them. However, if such a task
      submits a request and then exits or closes the ring without reaping
      the completion, then ring release will reap and put. If release happens
      from that very same task, the completed request task refs will get
      put back into the cache pool. This is problematic, as we're now beyond
      the point of pruning caches.
      
      Manually drop these caches after doing an IOPOLL reap. This releases
      references from the current task, which is enough. If another task
      happens to be doing the release, then the caching will not be
      triggered and there's no issue.
      
      Cc: stable@vger.kernel.org
      Fixes: e98e49b2 ("io_uring: extend task put optimisations")
      Reported-by: Homin Rhee <hominlab@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
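The caching problem can be modelled in a few lines of user-space C. This is a heavily simplified sketch under stated assumptions: `struct task`, `struct ring_ctx`, `put_task_ref()`, `drop_cached_refs()` and `ring_release()` are hypothetical names, references are plain counters rather than atomics, and the release path is reduced to "prune, reap, prune again". It shows why a reap that happens after the normal pruning point needs its own flush.

```c
#include <assert.h>

/* Each in-flight request holds one reference on its task. */
struct task {
    long refs;
};

/* Hypothetical ring context with a cache of not-yet-returned references. */
struct ring_ctx {
    struct task *submitter;
    long cached_refs;
};

/* Puts from the submitting task itself are batched in the cache. */
static void put_task_ref(struct ring_ctx *ctx, struct task *req_task,
                         struct task *current)
{
    if (req_task == current && req_task == ctx->submitter)
        ctx->cached_refs++;
    else
        req_task->refs--;
}

/* Return every cached reference to the task. */
static void drop_cached_refs(struct ring_ctx *ctx)
{
    assert(ctx->cached_refs <= ctx->submitter->refs);
    ctx->submitter->refs -= ctx->cached_refs;
    ctx->cached_refs = 0;
}

void ring_release(struct ring_ctx *ctx, struct task *current, int unreaped)
{
    drop_cached_refs(ctx);  /* the usual pruning point */

    /* Late reap of polled completions: may refill the cache. */
    for (int i = 0; i < unreaped; i++)
        put_task_ref(ctx, ctx->submitter, current);

    drop_cached_refs(ctx);  /* flush again after the reap */
}
```

Without the second `drop_cached_refs()` call, a release performed by the submitting task would leave `cached_refs` non-zero and the task's count permanently elevated. When a different task performs the release, the cache is bypassed and the second flush is a no-op.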
    • P
      io_uring: fix CQE reordering · aa1df3a3
      Pavel Begunkov committed
      Overflowing CQEs may result in reordering, which is buggy in case of
      links, F_MORE and so on. If we guarantee that we don't reorder for
      the unlikely event of a CQ ring overflow, then we can further extend
      this to not have to terminate multishot requests if it happens. For
      other operations, like zerocopy sends, we have no choice but to honor
      CQE ordering.
      
      Reported-by: Dylan Yudaken <dylany@fb.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/ec3bc55687b0768bbe20fb62d7d06cfced7d7e70.1663892031.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
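One way to avoid the reordering described above is to refuse direct posting to the ring while any overflowed entry is still pending. The sketch below is a user-space model of that idea only; `struct cq`, `post_cqe()`, `reap_cqe()` and `flush_overflow()` are hypothetical names, the sizes are arbitrary, and the actual kernel change may be structured differently.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SZ 2
#define OVF_SZ  8

/* Hypothetical completion queue: a small ring plus a FIFO overflow list. */
struct cq {
    int ring[RING_SZ];
    size_t ring_len;
    int ovf[OVF_SZ];
    size_t ovf_len;
};

int post_cqe(struct cq *q, int v)
{
    /* Post directly only if nothing older is waiting in overflow. */
    if (q->ovf_len == 0 && q->ring_len < RING_SZ) {
        q->ring[q->ring_len++] = v;
        return 0;
    }
    if (q->ovf_len == OVF_SZ)
        return -1;
    q->ovf[q->ovf_len++] = v;
    return 0;
}

/* Consumer side: take the oldest entry from the ring. */
int reap_cqe(struct cq *q, int *v)
{
    assert(v != NULL);
    if (q->ring_len == 0)
        return -1;
    *v = q->ring[0];
    for (size_t i = 1; i < q->ring_len; i++)
        q->ring[i - 1] = q->ring[i];
    q->ring_len--;
    return 0;
}

/* Move overflowed entries into the ring, oldest first, while space lasts. */
void flush_overflow(struct cq *q)
{
    size_t moved = 0;

    while (moved < q->ovf_len && q->ring_len < RING_SZ)
        q->ring[q->ring_len++] = q->ovf[moved++];
    for (size_t i = moved; i < q->ovf_len; i++)
        q->ovf[i - moved] = q->ovf[i];
    q->ovf_len -= moved;
}
```

If `post_cqe()` tested only `ring_len`, an entry posted after a reap could land in the ring ahead of older entries still sitting in the overflow list.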
    • P
      io_uring/net: fix UAF in io_sendrecv_fail() · a75155fa
      Pavel Begunkov committed
      We should not assume anything about ->free_iov just from
      REQ_F_ASYNC_DATA but rather rely on REQ_F_NEED_CLEANUP, as we may
      allocate ->async_data and then fail init, which would leave the field
      in an inconsistent state. The easiest solution is to stop clearing
      REQ_F_NEED_CLEANUP (and so stop deallocating ->async_data) in
      io_sendrecv_fail() and let io_send_zc_cleanup() do the job. The catch
      is that we also need to prevent double notif flushing: just test it
      for NULL and zero it where needed.
      
      BUG: KASAN: use-after-free in io_sendrecv_fail+0x3b0/0x3e0 io_uring/net.c:1221
      Write of size 8 at addr ffff8880771b4080 by task syz-executor.3/30199
      
      CPU: 1 PID: 30199 Comm: syz-executor.3 Not tainted 6.0.0-rc6-next-20220923-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:284 [inline]
       print_report+0x15e/0x45d mm/kasan/report.c:395
       kasan_report+0xbb/0x1f0 mm/kasan/report.c:495
       io_sendrecv_fail+0x3b0/0x3e0 io_uring/net.c:1221
       io_req_complete_failed+0x155/0x1b0 io_uring/io_uring.c:873
       io_drain_req io_uring/io_uring.c:1648 [inline]
       io_queue_sqe_fallback.cold+0x29f/0x788 io_uring/io_uring.c:1931
       io_submit_sqe io_uring/io_uring.c:2160 [inline]
       io_submit_sqes+0x1180/0x1df0 io_uring/io_uring.c:2276
       __do_sys_io_uring_enter+0xac6/0x2410 io_uring/io_uring.c:3216
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: c4c0009e ("io_uring/net: combine fail handlers")
      Reported-by: syzbot+4c597a574a3f5a251bda@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/23ab8346e407ea50b1198a172c8a97e1cf22915b.1663945875.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 22 Sep, 2022 2 commits
    • J
      io_uring: ensure local task_work marks task as running · ec7fd256
      Jens Axboe committed
      io_uring will run task_work from contexts that have been prepared for
      waiting, and in doing so it'll implicitly set the task running again
      to avoid issues with blocking conditions. The new deferred local
      task_work doesn't do that, which can result in spews on this being
      an invalid condition:
      
      [  112.917576] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000ad64af64>] prepare_to_wait_exclusive+0x3f/0xd0
      [  112.983088] WARNING: CPU: 1 PID: 190 at kernel/sched/core.c:9819 __might_sleep+0x5a/0x60
      [  112.987240] Modules linked in:
      [  112.990504] CPU: 1 PID: 190 Comm: io_uring Not tainted 6.0.0-rc6+ #1617
      [  113.053136] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
      [  113.133650] RIP: 0010:__might_sleep+0x5a/0x60
      [  113.136507] Code: ee 48 89 df 5b 31 d2 5d e9 33 ff ff ff 48 8b 90 30 0b 00 00 48 c7 c7 90 de 45 82 c6 05 20 8b 79 01 01 48 89 d1 e8 3a 49 77 00 <0f> 0b eb d1 66 90 0f 1f 44 00 00 9c 58 f6 c4 02 74 35 65 8b 05 ed
      [  113.223940] RSP: 0018:ffffc90000537ca0 EFLAGS: 00010286
      [  113.232903] RAX: 0000000000000000 RBX: ffffffff8246782c RCX: ffffffff8270bcc8
      IOPS=133.15K, BW=520MiB/s, IOS/call=32/31
      [  113.353457] RDX: ffffc90000537b50 RSI: 00000000ffffdfff RDI: 0000000000000001
      [  113.358970] RBP: 00000000000003bc R08: 0000000000000000 R09: c0000000ffffdfff
      [  113.361746] R10: 0000000000000001 R11: ffffc90000537b48 R12: ffff888103f97280
      [  113.424038] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
      [  113.428009] FS:  00007f67ae7fc700(0000) GS:ffff88842fc80000(0000) knlGS:0000000000000000
      [  113.432794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  113.503186] CR2: 00007f67b8b9b3b0 CR3: 0000000102b9b005 CR4: 0000000000770ee0
      [  113.507291] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  113.512669] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  113.574374] PKRU: 55555554
      [  113.576800] Call Trace:
      [  113.578325]  <TASK>
      [  113.579799]  set_page_dirty_lock+0x1b/0x90
      [  113.582411]  __bio_release_pages+0x141/0x160
      [  113.673078]  ? set_next_entity+0xd7/0x190
      [  113.675632]  blk_rq_unmap_user+0xaa/0x210
      [  113.678398]  ? timerqueue_del+0x2a/0x40
      [  113.679578]  nvme_uring_task_cb+0x94/0xb0
      [  113.683025]  __io_run_local_work+0x8a/0x150
      [  113.743724]  ? io_cqring_wait+0x33d/0x500
      [  113.746091]  io_run_local_work.part.76+0x2e/0x60
      [  113.750091]  io_cqring_wait+0x2e7/0x500
      [  113.752395]  ? trace_event_raw_event_io_uring_req_failed+0x180/0x180
      [  113.823533]  __x64_sys_io_uring_enter+0x131/0x3c0
      [  113.827382]  ? switch_fpu_return+0x49/0xc0
      [  113.830753]  do_syscall_64+0x34/0x80
      [  113.832620]  entry_SYSCALL_64_after_hwframe+0x5e/0xc8
      
      Ensure that we mark current as TASK_RUNNING for deferred task_work
      as well.
      
      Fixes: c0e0d6ba ("io_uring: add IORING_SETUP_DEFER_TASKRUN")
      Reported-by: Stefan Roesch <shr@fb.com>
      Reviewed-by: Dylan Yudaken <dylany@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • P