1. 03 Sep, 2021 3 commits
  2. 01 Sep, 2021 4 commits
    • io_uring: don't submit half-prepared drain request · b8ce1b9d
      Pavel Begunkov committed
      [ 3784.910888] BUG: kernel NULL pointer dereference, address: 0000000000000020
      [ 3784.910904] RIP: 0010:__io_file_supports_nowait+0x5/0xc0
      [ 3784.910926] Call Trace:
      [ 3784.910928]  ? io_read+0x17c/0x480
      [ 3784.910945]  io_issue_sqe+0xcb/0x1840
      [ 3784.910953]  __io_queue_sqe+0x44/0x300
      [ 3784.910959]  io_req_task_submit+0x27/0x70
      [ 3784.910962]  tctx_task_work+0xeb/0x1d0
      [ 3784.910966]  task_work_run+0x61/0xa0
      [ 3784.910968]  io_run_task_work_sig+0x53/0xa0
      [ 3784.910975]  __x64_sys_io_uring_enter+0x22/0x30
      [ 3784.910977]  do_syscall_64+0x3d/0x90
      [ 3784.910981]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      io_drain_req() goes before checks for REQ_F_FAIL, which protect us from
      submitting an under-prepared request (e.g. one that failed in
      io_init_req()). Fail such drained requests as well.
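
The ordering problem described above can be illustrated with a toy model. This is only a sketch, not the kernel patch: the `TOY_F_*` flags, the `toy_action` enum, and both functions are hypothetical names invented for illustration, and the real submission path has many more steps.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch; not the kernel's REQ_F_* values. */
enum {
    TOY_F_FAIL  = 1u << 0, /* request failed during init/prep */
    TOY_F_DRAIN = 1u << 1, /* request must wait for earlier requests */
};

enum toy_action { TOY_ISSUE, TOY_DEFER, TOY_FAIL_NOW };

/* Ordering before the fix: the drain check runs first, so a failed
 * (half-prepared) request is deferred and may later be issued. */
static enum toy_action toy_queue_buggy(unsigned flags)
{
    if (flags & TOY_F_DRAIN)
        return TOY_DEFER;
    if (flags & TOY_F_FAIL)
        return TOY_FAIL_NOW;
    return TOY_ISSUE;
}

/* Ordering after the fix: a failed request is completed with an error
 * before any drain handling can defer it. */
static enum toy_action toy_queue_fixed(unsigned flags)
{
    if (flags & TOY_F_FAIL)
        return TOY_FAIL_NOW;
    if (flags & TOY_F_DRAIN)
        return TOY_DEFER;
    return TOY_ISSUE;
}

/* Usage: a request that is both failed and drained. */
static void toy_queue_selfcheck(void)
{
    unsigned flags = TOY_F_FAIL | TOY_F_DRAIN;

    assert(toy_queue_buggy(flags) == TOY_DEFER);    /* the bug */
    assert(toy_queue_fixed(flags) == TOY_FAIL_NOW); /* the intended result */
}
```

In this model the only difference between the two variants is which check runs first, which is the property the commit message calls out.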
      
      Fixes: a8295b98 ("io_uring: fix failed linkchain code logic")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/e411eb9924d47a131b1e200b26b675df0c2b7627.1630415423.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fix queueing half-created requests · c6d3d9cb
      Pavel Begunkov committed
      [   27.259845] general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI
      [   27.261043] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
      [   27.263730] RIP: 0010:sock_from_file+0x20/0x90
      [   27.272444] Call Trace:
      [   27.272736]  io_sendmsg+0x98/0x600
      [   27.279216]  io_issue_sqe+0x498/0x68d0
      [   27.281142]  __io_queue_sqe+0xab/0xb50
      [   27.285830]  io_req_task_submit+0xbf/0x1b0
      [   27.286306]  tctx_task_work+0x178/0xad0
      [   27.288211]  task_work_run+0xe2/0x190
      [   27.288571]  exit_to_user_mode_prepare+0x1a1/0x1b0
      [   27.289041]  syscall_exit_to_user_mode+0x19/0x50
      [   27.289521]  do_syscall_64+0x48/0x90
      [   27.289871]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      io_req_complete_failed() -> io_req_complete_post() ->
      io_req_task_queue() would still try to enqueue a hard-linked request,
      which can be half-prepared (e.g. failed init), so we can't allow
      that to happen.
      
      Fixes: a8295b98 ("io_uring: fix failed linkchain code logic")
      Reported-by: syzbot+f9704d1878e290eddf73@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/70b513848c1000f88bd75965504649c6bb1415c0.1630415423.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: retry in case of short read on block device · 7db30437
      Ming Lei committed
      For buffered reads from a block device, when a short read happens we
      should retry to read more; otherwise the IO completes only partially.
      For example, the following fio job expects to read 2MB, but it can
      only read 1MB or less:
      
          fio --name=onessd --filename=/dev/nvme0n1 --filesize=2M \
              --rw=randread --bs=2M --direct=0 --overwrite=0 --numjobs=1 \
              --iodepth=1 --time_based=0 --runtime=2 --ioengine=io_uring \
              --registerfiles --fixedbufs --gtod_reduce=1 --group_reporting
      
      Fix the issue by allowing short-read retries for block devices, which
      do in fact set FMODE_BUF_RASYNC.
      
      Fixes: 9a173346 ("io_uring: fix short read retries for non-reg files")
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/20210821150751.1290434-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: IORING_OP_WRITE needs hash_reg_file set · 7b3188e7
      Jens Axboe committed
      During some testing, it became evident that using IORING_OP_WRITE doesn't
      hash buffered writes like the other write commands do. That's simply
      an oversight, and can cause performance regressions when doing buffered
      writes with this command.
      
      Correct that and add the flag, so that buffered writes are correctly
      hashed when using the non-iovec based write command.
      
      Cc: stable@vger.kernel.org
      Fixes: 3a6820f2 ("io_uring: add non-vectored read/write commands")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 30 Aug, 2021 2 commits
  4. 29 Aug, 2021 2 commits
    • io_uring: support CLOCK_BOOTTIME/REALTIME for timeouts · 50c1df2b
      Jens Axboe committed
      Certain use cases want to use CLOCK_BOOTTIME or CLOCK_REALTIME instead of
      the default CLOCK_MONOTONIC.
      
      Add an IORING_TIMEOUT_BOOTTIME and IORING_TIMEOUT_REALTIME flag that
      allows timeouts and linked timeouts to use the selected clock source.
      
      Only one clock source may be selected, and we -EINVAL the request if more
      than one is given. If neither BOOTTIME nor REALTIME is selected, the
      previous default of MONOTONIC is used.
      
      Link: https://github.com/axboe/liburing/issues/369
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: provide a way to limit max number of workers · 2e480058
      Jens Axboe committed
      io-wq divides work into two categories:
      
      1) Work that completes in a bounded time, like reading from a regular file
         or a block device. This type of work is limited based on the size of
         the SQ ring.
      
      2) Work that may never complete; we call this unbounded work. The number
         of workers here is limited only by RLIMIT_NPROC.
      
      For various use cases, it's handy to have the kernel limit the maximum
      number of pending workers for both categories. Provide a way to do so
      with a new IORING_REGISTER_IOWQ_MAX_WORKERS operation.
      
      IORING_REGISTER_IOWQ_MAX_WORKERS takes an array of two integers and sets
      the max worker count to what is being passed in for each category. The
      old values are returned into that same array. If 0 is being passed in for
      either category, it simply returns the current value.
      
      The value is capped at RLIMIT_NPROC. This actually isn't that important
      as it's more of a hint: if we're exceeding the value, then our attempt
      to fork a new worker will fail. This happens naturally already if more
      than one node is in the system, as these values are per-node internally
      for io-wq.
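
The in/out array semantics described above can be modelled in a few lines. This is a single-node userspace sketch, not the kernel implementation: `struct sk_iowq`, `sk_set_max_workers()`, the index order (bounded first, unbounded second), and the `nproc_limit` parameter standing in for RLIMIT_NPROC are all assumptions made for illustration.

```c
#include <assert.h>

/* Assumed index order for this sketch: bounded first, unbounded second. */
enum { SK_BOUND = 0, SK_UNBOUND = 1 };

/* Hypothetical model of the per-category worker limits. */
struct sk_iowq {
    unsigned max_workers[2];
};

/* counts[] is in/out: new limits go in, previous limits come back out.
 * A value of 0 leaves that category unchanged and only reports it.
 * New values are capped at nproc_limit (standing in for RLIMIT_NPROC). */
static void sk_set_max_workers(struct sk_iowq *wq, unsigned counts[2],
                               unsigned nproc_limit)
{
    for (int i = 0; i < 2; i++) {
        unsigned prev = wq->max_workers[i];
        unsigned want = counts[i];

        if (want > nproc_limit)
            want = nproc_limit;
        if (want)
            wq->max_workers[i] = want;
        counts[i] = prev;
    }
}

/* Usage: raise the bounded limit, only query the unbounded one. */
static void sk_set_max_workers_selfcheck(void)
{
    struct sk_iowq wq = { .max_workers = { 8, 64 } };
    unsigned counts[2] = { 16, 0 };

    sk_set_max_workers(&wq, counts, 32);
    assert(counts[SK_BOUND] == 8 && counts[SK_UNBOUND] == 64);
    assert(wq.max_workers[SK_BOUND] == 16);
    assert(wq.max_workers[SK_UNBOUND] == 64);
}
```

Passing `{ 0, 0 }` in this model is a pure query: both limits are reported and neither changes.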

      Reported-by: Johannes Lundberg <johalun0@gmail.com>
      Link: https://github.com/axboe/liburing/issues/420
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 27 Aug, 2021 5 commits
  6. 26 Aug, 2021 1 commit
  7. 25 Aug, 2021 3 commits
  8. 24 Aug, 2021 20 commits