1. 11 2月, 2020 33 次提交
  2. 04 2月, 2020 7 次提交
    • J
      io_uring: add need_resched() check in inner poll loop · a28be734
      Jens Axboe 提交于
      commit 08f5439f1df25a6cf6cf4c72cf6c13025599ce67 upstream.
      
      The outer poll loop checks for whether we need to reschedule, and
      returns to userspace if we do. However, it's possible to get stuck
      in the inner loop as well, if the CPU we are running on needs to
      reschedule to finish the IO work.
      
      Add the need_resched() check in the inner loop as well. This fixes
      a potential hang if the kernel is configured with
      CONFIG_PREEMPT_VOLUNTARY=y.
      Reported-by: NSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Tested-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      a28be734
    • J
      io_uring: don't enter poll loop if we have CQEs pending · ac7d1176
      Jens Axboe 提交于
      commit a3a0e43fd77013819e4b6f55e37e0efe8e35d805 upstream.
      
      We need to check if we have CQEs pending before starting a poll loop,
      as those could be the events we will be spinning for (and hence we'll
      find none). This can happen if a CQE triggers an error, or if it is
      found by eg an IRQ before we get a chance to find it through polling.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      ac7d1176
    • J
      io_uring: fix potential hang with polled IO · bea30534
      Jens Axboe 提交于
      commit 500f9fbadef86466a435726192f4ca4df7d94236 upstream.
      
      If a request issue ends up being punted to async context to avoid
      blocking, we can get into a situation where the original application
      enters the poll loop for that very request before it has been issued.
      This should not be an issue, except that the polling will hold the
      io_uring uring_ctx mutex for the duration of the poll. When the async
      worker has actually issued the request, it needs to acquire this mutex
      to add the request to the poll issued list. Since the application
      polling is already holding this mutex, the workqueue sleeps on the
      mutex forever, and the application thus never gets a chance to poll for
      the very request it was interested in.
      
      Fix this by ensuring that the polling drops the uring_ctx occasionally
      if it's not making any progress.
      Reported-by: NJeffrey M. Birnbaum <jmbnyc@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      bea30534
    • J
      io_uring: fix an issue when IOSQE_IO_LINK is inserted into defer list · 340538dc
      Jackie Liu 提交于
      commit a982eeb09b6030e567b8b815277c8c9197168040 upstream.
      
      This patch may fix two issues:
      
      First, when IOSQE_IO_DRAIN set, the next IOs need to be inserted into
      defer list to delay execution, but link io will be actively scheduled to
      run by calling io_queue_sqe.
      
      Second, when multiple LINK_IOs are inserted together with defer_list,
      the LINK_IO is no longer keep order.
      
         |-------------|
         |   LINK_IO   |      ----> insert to defer_list  -----------
         |-------------|                                            |
         |   LINK_IO   |      ----> insert to defer_list  ----------|
         |-------------|                                            |
         |   LINK_IO   |      ----> insert to defer_list  ----------|
         |-------------|                                            |
         |   NORMAL_IO |      ----> insert to defer_list  ----------|
         |-------------|                                            |
                                                                    |
                                    queue_work at same time   <-----|
      
      Fixes: 9e645e1105c ("io_uring: add support for sqe links")
      Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      340538dc
    • A
      io_uring: fix manual setup of iov_iter for fixed buffers · 19dfcec1
      Aleix Roca Nonell 提交于
      commit 99c79f6692ccdc42e04deea8a36e22bb48168a62 upstream.
      
      Commit bd11b3a391e3 ("io_uring: don't use iov_iter_advance() for fixed
      buffers") introduced an optimization to avoid using the slow
      iov_iter_advance by manually populating the iov_iter iterator in some
      cases.
      
      However, the computation of the iterator count field was erroneous: The
      first bvec was always accounted for an extent of page size even if the
      bvec length was smaller.
      
      In consequence, some I/O operations on fixed buffers were unable to
      operate on the full extent of the buffer, consistently skipping some
      bytes at the end of it.
      
      Fixes: bd11b3a391e3 ("io_uring: don't use iov_iter_advance() for fixed buffers")
      Cc: stable@vger.kernel.org
      Signed-off-by: NAleix Roca Nonell <aleix.rocanonell@bsc.es>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      19dfcec1
    • J
      io_uring: fix KASAN use after free in io_sq_wq_submit_work · 5707a64b
      Jackie Liu 提交于
      commit d0ee879187df966ef638031b5f5183078d672141 upstream.
      
      [root@localhost ~]# ./liburing/test/link
      
      QEMU Standard PC report that:
      
      [   29.379892] CPU: 0 PID: 84 Comm: kworker/u2:2 Not tainted 5.3.0-rc2-00051-g4010b622f1d2-dirty #86
      [   29.379902] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
      [   29.379913] Workqueue: io_ring-wq io_sq_wq_submit_work
      [   29.379929] Call Trace:
      [   29.379953]  dump_stack+0xa9/0x10e
      [   29.379970]  ? io_sq_wq_submit_work+0xbf4/0xe90
      [   29.379986]  print_address_description.cold.6+0x9/0x317
      [   29.379999]  ? io_sq_wq_submit_work+0xbf4/0xe90
      [   29.380010]  ? io_sq_wq_submit_work+0xbf4/0xe90
      [   29.380026]  __kasan_report.cold.7+0x1a/0x34
      [   29.380044]  ? io_sq_wq_submit_work+0xbf4/0xe90
      [   29.380061]  kasan_report+0xe/0x12
      [   29.380076]  io_sq_wq_submit_work+0xbf4/0xe90
      [   29.380104]  ? io_sq_thread+0xaf0/0xaf0
      [   29.380152]  process_one_work+0xb59/0x19e0
      [   29.380184]  ? pwq_dec_nr_in_flight+0x2c0/0x2c0
      [   29.380221]  worker_thread+0x8c/0xf40
      [   29.380248]  ? __kthread_parkme+0xab/0x110
      [   29.380265]  ? process_one_work+0x19e0/0x19e0
      [   29.380278]  kthread+0x30b/0x3d0
      [   29.380292]  ? kthread_create_on_node+0xe0/0xe0
      [   29.380311]  ret_from_fork+0x3a/0x50
      
      [   29.380635] Allocated by task 209:
      [   29.381255]  save_stack+0x19/0x80
      [   29.381268]  __kasan_kmalloc.constprop.6+0xc1/0xd0
      [   29.381279]  kmem_cache_alloc+0xc0/0x240
      [   29.381289]  io_submit_sqe+0x11bc/0x1c70
      [   29.381300]  io_ring_submit+0x174/0x3c0
      [   29.381311]  __x64_sys_io_uring_enter+0x601/0x780
      [   29.381322]  do_syscall_64+0x9f/0x4d0
      [   29.381336]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [   29.381633] Freed by task 84:
      [   29.382186]  save_stack+0x19/0x80
      [   29.382198]  __kasan_slab_free+0x11d/0x160
      [   29.382210]  kmem_cache_free+0x8c/0x2f0
      [   29.382220]  io_put_req+0x22/0x30
      [   29.382230]  io_sq_wq_submit_work+0x28b/0xe90
      [   29.382241]  process_one_work+0xb59/0x19e0
      [   29.382251]  worker_thread+0x8c/0xf40
      [   29.382262]  kthread+0x30b/0x3d0
      [   29.382272]  ret_from_fork+0x3a/0x50
      
      [   29.382569] The buggy address belongs to the object at ffff888067172140
                      which belongs to the cache io_kiocb of size 224
      [   29.384692] The buggy address is located 120 bytes inside of
                      224-byte region [ffff888067172140, ffff888067172220)
      [   29.386723] The buggy address belongs to the page:
      [   29.387575] page:ffffea00019c5c80 refcount:1 mapcount:0 mapping:ffff88806ace5180 index:0x0
      [   29.387587] flags: 0x100000000000200(slab)
      [   29.387603] raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806ace5180
      [   29.387617] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
      [   29.387624] page dumped because: kasan: bad access detected
      
      [   29.387920] Memory state around the buggy address:
      [   29.388771]  ffff888067172080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [   29.390062]  ffff888067172100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
      [   29.391325] >ffff888067172180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   29.392578]                                         ^
      [   29.393480]  ffff888067172200: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
      [   29.394744]  ffff888067172280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   29.396003] ==================================================================
      [   29.397260] Disabling lock debugging due to kernel taint
      
      io_sq_wq_submit_work free and read req again.
      
      Cc: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
      Cc: linux-block@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: f7b76ac9d17e ("io_uring: fix counter inc/dec mismatch in async_list")
      Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      5707a64b
    • J
      io_uring: ensure ->list is initialized for poll commands · 2d4ff975
      Jens Axboe 提交于
      commit 36703247d5f52a679df9da51192b6950fe81689f upstream.
      
      Daniel reports that when testing an http server that uses io_uring
      to poll for incoming connections, sometimes it hard crashes. This is
      due to an uninitialized list member for the io_uring request. Normally
      this doesn't trigger and none of the test cases caught it.
      Reported-by: NDaniel Kozak <kozzi11@gmail.com>
      Tested-by: NDaniel Kozak <kozzi11@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      2d4ff975