  1. 28 Apr, 2020 1 commit
    • J
      io_uring: statx must grab the file table for valid fd · 5b0bbee4
      Jens Axboe committed
      Clay reports that OP_STATX fails for a test case with a valid fd
      and empty path:
      
       -- Test 0: statx:fd 3: SUCCEED, file mode 100755
       -- Test 1: statx:path ./uring_statx: SUCCEED, file mode 100755
       -- Test 2: io_uring_statx:fd 3: FAIL, errno 9: Bad file descriptor
       -- Test 3: io_uring_statx:path ./uring_statx: SUCCEED, file mode 100755
      
      This is due to statx not grabbing the process file table, so we can't
      look up the fd in async context. If the fd is valid, ensure that we grab
      the file table so we can get the file from async context.
      
      Cc: stable@vger.kernel.org # v5.6
      Reported-by: Clay Harris <bugs@claycon.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 20 Apr, 2020 1 commit
    • X
      io_uring: only restore req->work for req that needs do completion · 44575a67
      Xiaoguang Wang committed
      When testing the io_uring IORING_FEAT_FAST_POLL feature, I got the panic below:
      BUG: kernel NULL pointer dereference, address: 0000000000000030
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP PTI
      CPU: 5 PID: 2154 Comm: io_uring_echo_s Not tainted 5.6.0+ #359
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
      RIP: 0010:io_wq_submit_work+0xf/0xa0
      Code: ff ff ff be 02 00 00 00 e8 ae c9 19 00 e9 58 ff ff ff 66 0f 1f
      84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 2f <8b>
      45 30 48 8d 9d 48 ff ff ff 25 01 01 00 00 83 f8 01 75 07 eb 2a
      RSP: 0018:ffffbef543e93d58 EFLAGS: 00010286
      RAX: ffffffff84364f50 RBX: ffffa3eb50f046b8 RCX: 0000000000000000
      RDX: ffffa3eb0efc1840 RSI: 0000000000000006 RDI: ffffa3eb50f046b8
      RBP: 0000000000000000 R08: 00000000fffd070d R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffa3eb50f046b8
      R13: ffffa3eb0efc2088 R14: ffffffff85b69be0 R15: ffffa3eb0effa4b8
      FS:  00007fe9f69cc4c0(0000) GS:ffffa3eb5ef40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000030 CR3: 0000000020410000 CR4: 00000000000006e0
      Call Trace:
       task_work_run+0x6d/0xa0
       do_exit+0x39a/0xb80
       ? get_signal+0xfe/0xbc0
       do_group_exit+0x47/0xb0
       get_signal+0x14b/0xbc0
       ? __x64_sys_io_uring_enter+0x1b7/0x450
       do_signal+0x2c/0x260
       ? __x64_sys_io_uring_enter+0x228/0x450
       exit_to_usermode_loop+0x87/0xf0
       do_syscall_64+0x209/0x230
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x7fe9f64f8df9
      Code: Bad RIP value.
      
      task_work_run() calls io_wq_submit_work() unexpectedly, so it's clear that
      struct callback_head's func member has been changed. After looking into
      the code, I found this issue is still due to the union definition:
          union {
              /*
               * Only commands that never go async can use the below fields,
               * obviously. Right now only IORING_OP_POLL_ADD uses them, and
               * async armed poll handlers for regular commands. The latter
               * restore the work, if needed.
               */
              struct {
                  struct callback_head	task_work;
                  struct hlist_node	hash_node;
                  struct async_poll	*apoll;
              };
              struct io_wq_work	work;
          };
      
      When task_work_run() has multiple work items to execute, the work that
      calls io_poll_remove_all() always restores req->work for non-poll
      requests. But if a non-poll request has already been added to a new
      callback_head, a subsequent callback will call io_async_task_func() to
      handle this request, which means we should not restore req->work
      for such a non-poll request. Meanwhile, io_async_task_func() should
      drop the submit ref when the req has been canceled.
      
      Fix both issues.
      
      Fixes: b1f573bd ("io_uring: restore req->work when canceling poll request")
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      
      Use io_double_put_req()
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
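The aliasing problem described above can be modeled in plain C. This is a simplified sketch with hypothetical stand-in types (work_state, poll_state, request) and helpers, not the kernel's definitions: arming poll overwrites the bytes shared with the work state, and a cancel path restores the saved copy only when the request has not already been handed to a new callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structs; only the overlap matters. */
struct work_state { void (*func)(void *); const void *creds; };
struct poll_state { void *next; void (*func)(void *); };

struct request {
    union {
        struct poll_state task_work; /* valid while the request is poll-armed */
        struct work_state work;      /* valid when issued through the worker */
    };
    int work_saved;                  /* nonzero while 'saved' holds a copy */
    struct work_state saved;         /* copy of 'work' taken before arming */
};

static void submit_work(void *arg) { (void)arg; }
static void poll_callback(void *arg) { (void)arg; }

/* Arming poll reuses the union storage, so keep a copy of 'work' first. */
static void arm_poll(struct request *req, void (*cb)(void *))
{
    assert(req != NULL && cb != NULL);
    req->saved = req->work;
    req->work_saved = 1;
    memset(&req->task_work, 0, sizeof(req->task_work));
    req->task_work.func = cb;
}

/* Restore 'work' only if no new callback owns the request. Restoring while
 * a callback is queued would overwrite the callback's function pointer. */
static void cancel_poll(struct request *req, int queued_for_callback)
{
    if (queued_for_callback)
        return;
    if (req->work_saved) {
        req->work = req->saved;
        req->work_saved = 0;
    }
}
```

In this model, calling cancel_poll() with queued_for_callback set leaves task_work.func intact, which is the property the fix is after.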
  3. 15 Apr, 2020 3 commits
  4. 14 Apr, 2020 3 commits
    • J
      io_uring: only post events in io_poll_remove_all() if we completed some · 8e2e1faf
      Jens Axboe committed
      syzbot reports this crash:
      
      BUG: unable to handle page fault for address: ffffffffffffffe8
      PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0
      Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: G    B        L    5.7.0-rc1-next-20200413 #4
      Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/12/2017
      RIP: 0010:__wake_up_common+0x98/0x290
      el/sched/wait.c:87
      Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
      RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
      RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
      RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
      R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
      R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
      FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
      Call Trace:
       __wake_up_common_lock+0xea/0x150
      ommon_lock at kernel/sched/wait.c:124
       ? __wake_up_common+0x290/0x290
       ? lockdep_hardirqs_on+0x16/0x2c0
       __wake_up+0x13/0x20
       io_cqring_ev_posted+0x75/0xe0
      v_posted at fs/io_uring.c:1160
       io_ring_ctx_wait_and_kill+0x1c0/0x2f0
      l at fs/io_uring.c:7305
       io_uring_create+0xa8d/0x13b0
       ? io_req_defer_prep+0x990/0x990
       ? __kasan_check_write+0x14/0x20
       io_uring_setup+0xb8/0x130
       ? io_uring_create+0x13b0/0x13b0
       ? check_flags.part.28+0x220/0x220
       ? lockdep_hardirqs_on+0x16/0x2c0
       __x64_sys_io_uring_setup+0x31/0x40
       do_syscall_64+0xcc/0xaf0
       ? syscall_return_slowpath+0x580/0x580
       ? lockdep_hardirqs_off+0x1f/0x140
       ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
       ? trace_hardirqs_off_caller+0x3a/0x150
       ? trace_hardirqs_off_thunk+0x1a/0x1c
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x7fdcb9dd76ed
      Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffe7fd4e4f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
      RAX: ffffffffffffffda RBX: 00000000000001a9 RCX: 00007fdcb9dd76ed
      RDX: fffffffffffffffc RSI: 0000000000000000 RDI: 0000000000005d54
      RBP: 00000000000001a9 R08: 0000000e31d3caa7 R09: 0082400004004000
      R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000002
      R13: 00007fdcb842e058 R14: 00007fdcba4c46c0 R15: 00007fdcb842e000
      Modules linked in: bridge stp llc nfnetlink cn brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_intel kvm irqbypass intel_cstate intel_uncore dax_pmem intel_rapl_perf dax_pmem_core ip_tables x_tables xfs sd_mod tg3 firmware_class libphy hpsa scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: binfmt_misc]
      CR2: ffffffffffffffe8
      ---[ end trace f9502383d57e0e22 ]---
      RIP: 0010:__wake_up_common+0x98/0x290
      Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
      RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
      RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
      RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
      R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
      R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
      FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
      Kernel panic - not syncing: Fatal exception
      Kernel Offset: 0x29800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      which is due to error injection (or allocation failure) preventing the
      rings from being set up. On shutdown, we attempt to remove any pending
      requests, and for poll requests, we call io_cqring_ev_posted() when we've
      killed poll requests. However, since the rings aren't set up, we won't
      find any poll requests. Make the call to io_cqring_ev_posted()
      dependent on actually having completed requests. This fixes this setup
      corner case, and removes spurious calls if we remove poll requests and
      don't find any.
      Reported-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
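The pattern described above, counting what was actually removed and notifying only when the count is nonzero, can be sketched as follows. The ring, poll_req, and helper names are hypothetical simplifications, not the kernel's structures.

```c
#include <assert.h>
#include <stddef.h>

struct poll_req { int armed; };

struct ring {
    struct poll_req *polls; /* may be NULL if setup failed early */
    size_t nr_polls;
    int wakeups;            /* number of times waiters were notified */
};

/* Stand-in for the "event posted" notification. */
static void notify_waiters(struct ring *r)
{
    r->wakeups++;
}

/* Returns 1 if an armed request was removed, 0 if there was nothing to do. */
static int remove_one(struct poll_req *p)
{
    if (!p->armed)
        return 0;
    p->armed = 0;
    return 1;
}

/* Remove every pending poll request; notify only if something completed. */
static int remove_all(struct ring *r)
{
    int posted = 0;

    assert(r != NULL);
    for (size_t i = 0; i < r->nr_polls; i++)
        posted += remove_one(&r->polls[i]);
    if (posted)
        notify_waiters(r);
    return posted;
}
```

With a ring that was never populated (NULL array, zero entries), remove_all() returns 0 and never touches the notification path.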
    • J
      io_uring: io_async_task_func() should check and honor cancelation · 2bae047e
      Jens Axboe committed
      If the request has been marked as canceled, don't try and issue it.
      Instead just fill a canceled event and finish the request.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • J
      io_uring: check for need to re-wait in polled async handling · 74ce6ce4
      Jens Axboe committed
      We added this for just the regular poll requests in commit a6ba632d
      ("io_uring: retry poll if we got woken with non-matching mask"); we
      should do the same for the poll handler used for pollable async requests.
      Move the re-wait check and arm into a helper, and call it from
      io_async_task_func() as well.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 13 Apr, 2020 1 commit
    • J
      io_uring: correct O_NONBLOCK check for splice punt · 88357580
      Jens Axboe committed
      The splice file punt check uses file->f_mode to check for O_NONBLOCK,
      but it should be checking file->f_flags. This leads to punting even
      for files that have O_NONBLOCK set, which isn't necessary. This equates
      to checking for FMODE_PATH, which will never be set on the fd in
      question.
      
      Fixes: 7d67af2c ("io_uring: add splice(2) support")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
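As a loose userspace counterpart to checking the flags word rather than the mode word, O_NONBLOCK is a file status flag and is read with fcntl(F_GETFL). The helper names below are hypothetical, and the sketch only illustrates which word the bit lives in; it does not reproduce the kernel's splice punt logic.

```c
#include <assert.h>
#include <fcntl.h>   /* fcntl(), F_GETFL, F_SETFL, O_NONBLOCK */
#include <unistd.h>  /* pipe(), close() */

/* Returns 1 if O_NONBLOCK is set on fd, 0 if not, -1 on error. */
static int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}

/* Sets O_NONBLOCK on fd while keeping the other status flags.
 * Returns 0 on success, -1 on error. */
static int fd_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}
```

A freshly created pipe end reports 0 from fd_is_nonblocking() and 1 after fd_set_nonblocking(); the other end is a separate open file description and is unaffected.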
  6. 12 Apr, 2020 6 commits
    • X
      io_uring: restore req->work when canceling poll request · b1f573bd
      Xiaoguang Wang committed
      When running the liburing test case 'accept', I got the warning below:
      CRED: Invalid credentials
      CRED: At include/linux/cred.h:285
      CRED: Specified credentials: 00000000d02474a0
      CRED: ->magic=4b, put_addr=000000005b4f46e9
      CRED: ->usage=-1699227648, subscr=-25693
      CRED: ->*uid = { 256,-25693,-25693,65534 }
      CRED: ->*gid = { 0,-1925859360,-1789740800,-1827028688 }
      CRED: ->security is 00000000258c136e
      general protection fault, probably for non-canonical address 0xdead4ead00000000: 0000 [#1] SMP PTI
      CPU: 21 PID: 2037 Comm: accept Not tainted 5.6.0+ #318
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
      RIP: 0010:dump_invalid_creds+0x16f/0x184
      Code: 48 8b 83 88 00 00 00 48 3d ff 0f 00 00 76 29 48 89 c2 81 e2 00 ff ff ff 48
      81 fa 00 6b 6b 6b 74 17 5b 48 c7 c7 4b b1 10 8e 5d <8b> 50 04 41 5c 8b 30 41 5d
      e9 67 e3 04 00 5b 5d 41 5c 41 5d c3 0f
      RSP: 0018:ffffacc1039dfb38 EFLAGS: 00010087
      RAX: dead4ead00000000 RBX: ffff9ba39319c100 RCX: 0000000000000007
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8e10b14b
      RBP: ffffffff8e108476 R08: 0000000000000000 R09: 0000000000000001
      R10: 0000000000000000 R11: ffffacc1039df9e5 R12: 000000009552b900
      R13: 000000009319c130 R14: ffff9ba39319c100 R15: 0000000000000246
      FS:  00007f96b2bfc4c0(0000) GS:ffff9ba39f340000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000401870 CR3: 00000007db7a4000 CR4: 00000000000006e0
      Call Trace:
      __invalid_creds+0x48/0x4a
      __io_req_aux_free+0x2e8/0x3b0
      ? io_poll_remove_one+0x2a/0x1d0
      __io_free_req+0x18/0x200
      io_free_req+0x31/0x350
      io_poll_remove_one+0x17f/0x1d0
      io_poll_cancel.isra.80+0x6c/0x80
      io_async_find_and_cancel+0x111/0x120
      io_issue_sqe+0x181/0x10e0
      ? __lock_acquire+0x552/0xae0
      ? lock_acquire+0x8e/0x310
      ? fs_reclaim_acquire.part.97+0x5/0x30
      __io_queue_sqe.part.100+0xc4/0x580
      ? io_submit_sqes+0x751/0xbd0
      ? rcu_read_lock_sched_held+0x32/0x40
      io_submit_sqes+0x9ba/0xbd0
      ? __x64_sys_io_uring_enter+0x2b2/0x460
      ? __x64_sys_io_uring_enter+0xaf/0x460
      ? find_held_lock+0x2d/0x90
      ? __x64_sys_io_uring_enter+0x111/0x460
      __x64_sys_io_uring_enter+0x2d7/0x460
      do_syscall_64+0x5a/0x230
      entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      After looking into the code, it turns out that this issue occurs because we
      didn't restore req->work, which is changed in io_arm_poll_handler(). req->work
      is in a union with the struct below:
      	struct {
      		struct callback_head	task_work;
      		struct hlist_node	hash_node;
      		struct async_poll	*apoll;
      	};
      If we forget to restore it, the members of struct io_wq_work will be invalid.
      Restore req->work to fix this issue.
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      
      Get rid of the unneeded 'need_restore' variable.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • P
      io_uring: move all request init code in one place · ef4ff581
      Pavel Begunkov committed
      Request initialisation is scattered across several functions, namely
      io_init_req(), io_submit_sqes(), io_submit_sqe(). Put it
      in io_init_req() for better data locality and code clarity.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • P
      io_uring: keep all sqe->flags in req->flags · dea3b49c
      Pavel Begunkov committed
      It's a good idea to not read sqe->flags twice, as it's prone to security
      bugs. Instead of passing them around, embed them in req->flags. It's
      already so except for IOSQE_IO_LINK.
      1. rename former REQ_F_LINK -> REQ_F_LINK_HEAD
      2. introduce and copy REQ_F_LINK, which mimics IOSQE_IO_LINK
      
      And leave req_set_fail_links() using new REQ_F_LINK, because it's more
      sensible.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • P
      io_uring: early submission req fail code · 1d4240cc
      Pavel Begunkov committed
      Having only one place for cleaning up a request after a link assembly/
      submission failure will come in handy in the future. At least it allows
      removing a duplicated cleanup sequence.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • P
      io_uring: track mm through current->mm · bf9c2f1c
      Pavel Begunkov committed
      As a preparation for extracting request init bits, remove self-coded mm
      tracking from io_submit_sqes(), and rely on current->mm instead. It's more
      convenient than passing this piece of state to other functions.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • P
      io_uring: remove obsolete @mm_fault · dccc587f
      Pavel Begunkov committed
      If io_submit_sqes() can't grab an mm, it fails and exits right away.
      There is no need to track the fact of the failure. Remove @mm_fault.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 10 Apr, 2020 1 commit
    • J
      io_uring: punt final io_ring_ctx wait-and-free to workqueue · 85faa7b8
      Jens Axboe committed
      We can't reliably wait in io_ring_ctx_wait_and_kill(), since the
      task_works list isn't ordered (in fact it's LIFO ordered). We could
      either fix this with a separate task_works list for io_uring work, or
      just punt the wait-and-free to async context. This ensures that
      task_work that comes in while we're shutting down is processed
      correctly. If we don't go async, we could have work past the fput()
      work for the ring that depends on work that won't be executed until
      after we're done with the wait-and-free. But as this operation is
      blocking, it'll never get a chance to run.
      
      This was reproduced with hundreds of thousands of sockets running
      memcached; I haven't been able to reproduce it synthetically.
      Reported-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
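The ordering remark above can be illustrated with a tiny model: when new callbacks are pushed at the head of a singly linked list and the list is then run front to back, the most recently added callback runs first. The struct and function names are hypothetical and this is not the kernel's task_work implementation, only a sketch of LIFO behavior.

```c
#include <assert.h>
#include <stddef.h>

struct cb {
    struct cb *next;
    int id;
};

/* Push at the head: the newest entry becomes the first one to run. */
static void cb_add(struct cb **head, struct cb *n)
{
    assert(head != NULL && n != NULL);
    n->next = *head;
    *head = n;
}

/* Detach the whole list and walk it front to back, recording the ids in
 * execution order. Returns how many callbacks were on the list. */
static size_t cb_run(struct cb **head, int *order, size_t cap)
{
    size_t n = 0;
    struct cb *c = *head;

    *head = NULL;
    while (c != NULL) {
        if (n < cap)
            order[n] = c->id;
        n++;
        c = c->next;
    }
    return n;
}
```

Adding callbacks 1, 2, 3 and then running the list visits them as 3, 2, 1, so work queued late can run before the work it depends on.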
  8. 09 Apr, 2020 1 commit
  9. 08 Apr, 2020 6 commits
  10. 07 Apr, 2020 2 commits
    • X
      io_uring: initialize fixed_file_data lock · f7fe9346
      Xiaoguang Wang committed
      syzbot reports the warning below:
      INFO: trying to register non-static key.
      the code is fine but needs lockdep annotation.
      turning off the locking correctness validator.
      CPU: 1 PID: 7099 Comm: syz-executor897 Not tainted 5.6.0-next-20200406-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       assign_lock_key kernel/locking/lockdep.c:913 [inline]
       register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
       __lock_acquire+0x104/0x4e00 kernel/locking/lockdep.c:4223
       lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4923
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
       io_sqe_files_register fs/io_uring.c:6599 [inline]
       __io_uring_register+0x1fe8/0x2f00 fs/io_uring.c:8001
       __do_sys_io_uring_register fs/io_uring.c:8081 [inline]
       __se_sys_io_uring_register fs/io_uring.c:8063 [inline]
       __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:8063
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x440289
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffff1bbf558 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440289
      RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
      R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000401b10
      R13: 0000000000401ba0 R14: 0000000000000000 R15: 0000000000000000
      
      Initialize struct fixed_file_data's lock to fix this issue.
      
      Reported-by: syzbot+e6eeca4a035da76b3065@syzkaller.appspotmail.com
      Fixes: 05589553 ("io_uring: refactor file register/unregister/update handling")
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • C
      io_uring: remove redundant variable pointer nxt and io_wq_assign_next call · 211fea18
      Colin Ian King committed
      An earlier commit "io_uring: remove @nxt from handlers" removed the
      setting of pointer nxt and now it is always null, hence the non-null
      check and call to io_wq_assign_next is redundant and can be removed.
      
      Addresses-Coverity: ("'Constant' variable guard")
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 06 Apr, 2020 1 commit
  12. 04 Apr, 2020 5 commits
  13. 01 Apr, 2020 1 commit
  14. 31 Mar, 2020 1 commit
    • X
      io_uring: refactor file register/unregister/update handling · 05589553
      Xiaoguang Wang committed
      While diving into the io_uring fileset register/unregister/update code, we
      found a bug in the fileset update handling. The fileset update path uses
      a percpu_ref variable to check whether we can put the previously
      registered files: only when the refcount of the percpu_ref variable
      reaches zero can we safely put these files. But this doesn't work so
      well. If applications issue requests continually, this
      percpu_ref will never have a chance to reach zero, and it'll always be
      in atomic mode. That also defeats the gains of the fileset
      register/unregister/update feature, which exists to reduce the atomic
      operation overhead of fput/fget.
      
      To fix this issue, when applications do IORING_REGISTER_FILES or
      IORING_REGISTER_FILES_UPDATE operations, we allocate a new percpu_ref
      and kill the old percpu_ref; new requests will use the new percpu_ref.
      Once all previous requests complete, the old percpu_refs will be dropped
      and the registered files will be put safely.
      
      Link: https://lore.kernel.org/io-uring/5a8dac33-4ca2-4847-b091-f7dcd3ad0ff3@linux.alibaba.com/T/#t
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
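The scheme described above, where each update installs a fresh reference and retires the old one, can be modeled with plain counters. This is a single-threaded sketch with hypothetical names (ref_node, file_table, and the helpers); the kernel uses percpu_ref, which this does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* One generation of registered files. */
struct ref_node {
    int refs;      /* in-flight requests, plus one while the node is current */
    int released;  /* set once this generation's files have been put */
};

struct file_table {
    struct ref_node *cur;  /* generation that new requests attach to */
};

static void node_init(struct ref_node *n)
{
    n->refs = 1;           /* the table's own reference */
    n->released = 0;
}

/* Drop one reference; the last drop releases the generation's files. */
static void node_put(struct ref_node *n)
{
    assert(n->refs > 0);
    if (--n->refs == 0)
        n->released = 1;
}

/* A new request pins whichever generation is current. */
static struct ref_node *table_get(struct file_table *t)
{
    t->cur->refs++;
    return t->cur;
}

/* Register/update: install a fresh node, then drop the table's reference
 * on the old one so it can drain independently of new requests. */
static void table_switch(struct file_table *t, struct ref_node *fresh)
{
    struct ref_node *old = t->cur;

    node_init(fresh);
    t->cur = fresh;
    node_put(old);
}
```

Because new requests attach to the fresh node, the old node's count can only go down after the switch, so it reaches zero as soon as the requests that were already in flight complete.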
  15. 27 Mar, 2020 1 commit
  16. 25 Mar, 2020 1 commit
  17. 23 Mar, 2020 3 commits
    • H
      io-uring: drop 'free_pfile' in struct io_file_put · a5318d3c
      Hillf Danton committed
      Synchronous removal of a file is only used when a GFP_KERNEL kmalloc
      fails, at the cost of io_file_put::done and a work flush, while a
      glitch like that can be handled at the call site without too much pain.
      
      That said, what is proposed is to drop synchronous file removal, and
      the kink in the neck along with it.
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • H
      io-uring: drop completion when removing file · 4afdb733
      Hillf Danton committed
      A hung task was reported by syzbot:
      
      INFO: task syz-executor975:9880 blocked for more than 143 seconds.
            Not tainted 5.6.0-rc6-syzkaller #0
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor975 D27576  9880   9878 0x80004000
      Call Trace:
       schedule+0xd0/0x2a0 kernel/sched/core.c:4154
       schedule_timeout+0x6db/0xba0 kernel/time/timer.c:1871
       do_wait_for_common kernel/sched/completion.c:83 [inline]
       __wait_for_common kernel/sched/completion.c:104 [inline]
       wait_for_common kernel/sched/completion.c:115 [inline]
       wait_for_completion+0x26a/0x3c0 kernel/sched/completion.c:136
       io_queue_file_removal+0x1af/0x1e0 fs/io_uring.c:5826
       __io_sqe_files_update.isra.0+0x3a1/0xb00 fs/io_uring.c:5867
       io_sqe_files_update fs/io_uring.c:5918 [inline]
       __io_uring_register+0x377/0x2c00 fs/io_uring.c:7131
       __do_sys_io_uring_register fs/io_uring.c:7202 [inline]
       __se_sys_io_uring_register fs/io_uring.c:7184 [inline]
       __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:7184
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      and bisect pointed to 05f3fb3c ("io_uring: avoid ring quiesce for
      fixed file set unregister and update").
      
      It is down to the order in which we wait for the work to be done before
      flushing it, while nobody is likely to wake us up.
      
      We can drop that on-stack completion, as flushing the work is itself the
      sync operation we need and nothing more is left behind it.
      
      To that end, io_file_put::done is re-used for indicating if it can be
      freed in the workqueue worker context.
      Reported-and-Inspired-by: syzbot <syzbot+538d1957ce178382a394@syzkaller.appspotmail.com>
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      
      Rename ->done to ->free_pfile
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • P
      io_uring: Fix ->data corruption on re-enqueue · 18a542ff
      Pavel Begunkov committed
      work->data and work->list are shared in a union. io_wq_assign_next() sets
      ->data if a req has a linked_timeout, but then io-wq may want to use
      work->list, e.g. to re-enqueue a request, thus corrupting ->data.
      
      ->data is not necessary, so just remove it and extract linked_timeout
      through @link_list.
      
      Fixes: 60cf46ae ("io-wq: hash dependent work")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  18. 21 Mar, 2020 1 commit
    • J
      io_uring: honor original task RLIMIT_FSIZE · 4ed734b0
      Jens Axboe committed
      With the previous fixes for number of files open checking, I added some
      debug code to see if we had other spots where we're checking rlimit()
      against the async io-wq workers. The only one I found was file size
      checking, which we should also honor.
      
      During write and fallocate prep, store the max file size and override
      that for the current task if we're in io-wq worker context.
      
      Cc: stable@vger.kernel.org # 5.1+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
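The "capture the limit at prep time, use it later" idea can be sketched from userspace with getrlimit(). The write_req type and helper names are hypothetical, and the sketch only shows the capture-then-check pattern, not how the kernel overrides the limit inside a worker.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>  /* getrlimit(), RLIMIT_FSIZE, RLIM_INFINITY */

/* Hypothetical request that remembers the submitter's file size limit. */
struct write_req {
    rlim_t fsize;
};

/* Prep step, run in the submitting task: record its RLIMIT_FSIZE.
 * Returns 0 on success, -1 on error. */
static int write_prep(struct write_req *req)
{
    struct rlimit lim;

    assert(req != NULL);
    if (getrlimit(RLIMIT_FSIZE, &lim) < 0)
        return -1;
    req->fsize = lim.rlim_cur;
    return 0;
}

/* Later check, possibly run elsewhere: consult the recorded limit rather
 * than the limit of whichever thread happens to execute the write. */
static int write_allowed(const struct write_req *req, rlim_t end_offset)
{
    return req->fsize == RLIM_INFINITY || end_offset <= req->fsize;
}
```

A request prepared under a 4096-byte limit permits a write ending at offset 4096 and rejects one ending at 4097, regardless of the limit in effect where the check runs.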
  19. 20 Mar, 2020 1 commit