1. 04 Feb, 2020 8 commits
    • io_uring: add a memory barrier before atomic_read · 6a9af3c7
      Zhengyuan Liu committed
      commit c0e48f9dea9129aa11bec3ed13803bcc26e96e49 upstream.
      
      fio hangs while running a basic test. The issue is easy to reproduce
      with the script below:
      
              while true
              do
                      fio  --ioengine=io_uring  -rw=write -bs=4k -numjobs=1 \
                           -size=1G -iodepth=64 -name=uring   --filename=/dev/zero
              done
      
      After several minutes (or more), fio blocks in
      io_uring_enter->io_cqring_wait, waiting for previously committed sqes
      to complete, and cannot return to user space until we send it a
      SIGTERM. After receiving SIGTERM, fio hangs in
      io_ring_ctx_wait_and_kill with a backtrace like this:
      
              [54133.243816] Call Trace:
              [54133.243842]  __schedule+0x3a0/0x790
              [54133.243868]  schedule+0x38/0xa0
              [54133.243880]  schedule_timeout+0x218/0x3b0
              [54133.243891]  ? sched_clock+0x9/0x10
              [54133.243903]  ? wait_for_completion+0xa3/0x130
              [54133.243916]  ? _raw_spin_unlock_irq+0x2c/0x40
              [54133.243930]  ? trace_hardirqs_on+0x3f/0xe0
              [54133.243951]  wait_for_completion+0xab/0x130
              [54133.243962]  ? wake_up_q+0x70/0x70
              [54133.243984]  io_ring_ctx_wait_and_kill+0xa0/0x1d0
              [54133.243998]  io_uring_release+0x20/0x30
              [54133.244008]  __fput+0xcf/0x270
              [54133.244029]  ____fput+0xe/0x10
              [54133.244040]  task_work_run+0x7f/0xa0
              [54133.244056]  do_exit+0x305/0xc40
              [54133.244067]  ? get_signal+0x13b/0xbd0
              [54133.244088]  do_group_exit+0x50/0xd0
              [54133.244103]  get_signal+0x18d/0xbd0
              [54133.244112]  ? _raw_spin_unlock_irqrestore+0x36/0x60
              [54133.244142]  do_signal+0x34/0x720
              [54133.244171]  ? exit_to_usermode_loop+0x7e/0x130
              [54133.244190]  exit_to_usermode_loop+0xc0/0x130
              [54133.244209]  do_syscall_64+0x16b/0x1d0
              [54133.244221]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that we had added a req to ctx->pending_async at the very
      end, but it didn't get a chance to be processed. How could this happen?
      
              fio#cpu0                                        wq#cpu1
      
              io_add_to_prev_work                    io_sq_wq_submit_work
      
                atomic_read() <<< 1
      
                                                        atomic_dec_return() << 1->0
                                                        list_empty();    <<< true;
      
                list_add_tail()
                atomic_read() << 0 or 1?
      
      As atomic_ops.rst states, atomic_read does not guarantee that the
      runtime modification by any other thread is visible yet, so we must take
      care of that with a proper implicit or explicit memory barrier.
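
The interleaving above can be modelled in user space. This is only a sketch with C11 atomics, not the kernel code: struct async_list_sketch, worker_exit(), add_to_prev_work() and the `between` hook are hypothetical names, and the full fence stands in for whatever barrier the fix actually places before the second atomic_read.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the async list: a worker count plus a queue depth. */
struct async_list_sketch {
	atomic_int cnt;      /* active workers, like the counter in the diagram */
	atomic_int pending;  /* queued reqs; replaces the real linked list */
};

/* Worker side: drop the count, then report whether work is still queued. */
static bool worker_exit(struct async_list_sketch *l)
{
	atomic_fetch_sub(&l->cnt, 1);          /* seq_cst read-modify-write */
	return atomic_load(&l->pending) != 0;  /* plays the role of !list_empty() */
}

/*
 * Submitter side. `between` is a test hook, not part of any kernel code: it
 * runs where a concurrent worker could interleave. Pass NULL to skip it.
 */
static bool add_to_prev_work(struct async_list_sketch *l,
			     bool (*between)(struct async_list_sketch *))
{
	if (atomic_load_explicit(&l->cnt, memory_order_relaxed) == 0)
		return false;                  /* no worker to hand the req to */
	if (between)
		(void)between(l);
	atomic_fetch_add_explicit(&l->pending, 1, memory_order_relaxed);
	/* Full fence before re-reading the counter (assumed barrier analogue). */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&l->cnt, memory_order_relaxed) == 0) {
		/* The worker already left: take the req back. */
		atomic_fetch_sub_explicit(&l->pending, 1, memory_order_relaxed);
		return false;
	}
	return true;
}
```

Calling add_to_prev_work(&l, worker_exit) with cnt == 1 replays the diagram: the worker drops the count to 0 and sees nothing queued, and the re-check after the fence makes the submitter take its req back instead of leaving it stranded.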
      
      This issue was detected with the help of Jackie Liu <liuyun01@kylinos.cn>.
      
      Fixes: 31b515106428 ("io_uring: allow workqueue item to handle multiple buffered requests")
      Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
    • signal: simplify set_user_sigmask/restore_user_sigmask · a4dd0237
      Oleg Nesterov committed
      commit b772434be0891ed1081a08ae7cfd4666728f8e82 upstream.
      
      task->saved_sigmask and ->restore_sigmask are only used in the ret-from-
      syscall paths.  This means that set_user_sigmask() can save ->blocked in
      ->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked
      was modified.
      
      This way the callers do not need two sigset_t's passed to set/restore,
      and restore_user_sigmask(), renamed to restore_saved_sigmask_unless(),
      turns into a trivial helper that just calls restore_saved_sigmask().
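
The save/restore flow described above can be sketched with a toy task structure. Everything here is a user-space model under stated assumptions: struct task_sketch, mask_bits and the *_sketch functions are hypothetical, and the `interrupted` parameter is only inferred from the helper's name.

```c
#include <stdbool.h>

typedef unsigned long mask_bits;  /* one bit per signal; stands in for sigset_t */

/* Hypothetical model of the three task fields the message mentions. */
struct task_sketch {
	mask_bits blocked;
	mask_bits saved_sigmask;
	bool restore_sigmask;
};

/* Save ->blocked, flag that it was modified, then install the new mask. */
static void set_user_sigmask_sketch(struct task_sketch *t, mask_bits newmask)
{
	t->saved_sigmask = t->blocked;
	t->restore_sigmask = true;
	t->blocked = newmask;
}

/* Put the saved mask back if, and only if, one was saved. */
static void restore_saved_sigmask_sketch(struct task_sketch *t)
{
	if (t->restore_sigmask) {
		t->restore_sigmask = false;
		t->blocked = t->saved_sigmask;
	}
}

/* Assumed semantics: restore unless the call was interrupted by a signal. */
static void restore_saved_sigmask_unless_sketch(struct task_sketch *t,
						bool interrupted)
{
	if (!interrupted)
		restore_saved_sigmask_sketch(t);
}
```

Because the task itself carries the saved mask, callers in this model pass no sigset_t at all to the restore side.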
      
      Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.com
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Eric Wong <e@80x24.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: David Laight <David.Laight@aculab.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
    • io_uring: fix counter inc/dec mismatch in async_list · d7f2d971
      Zhengyuan Liu committed
      commit f7b76ac9d17e16e44feebb6d2749fec92bfd6dd4 upstream.
      
      A work item can be queued for each req in the defer and link lists
      without incrementing async_list->cnt, so we should not decrement it on
      exit from the workqueue either if we did not process the req from the
      async list.
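
The balance rule can be illustrated with a small model: a work item remembers whether it took a count and gives back only what it took. This is a sketch, not the kernel change; struct work_sketch and the three functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work item that records which counter, if any, it bumped. */
struct work_sketch {
	atomic_int *list_cnt;  /* counter this item incremented, or NULL */
};

/* Queued through the async list: the counter is incremented. */
static void queue_from_async_list(struct work_sketch *w, atomic_int *cnt)
{
	atomic_fetch_add(cnt, 1);
	w->list_cnt = cnt;
}

/* Queued from the defer or link list: no counter is touched. */
static void queue_from_defer_or_link(struct work_sketch *w)
{
	w->list_cnt = NULL;
}

/* On exit, decrement only if this item incremented in the first place. */
static void work_exit(struct work_sketch *w)
{
	if (w->list_cnt)
		atomic_fetch_sub(w->list_cnt, 1);
}
```

With one item queued each way, the counter goes 0 -> 1 -> 0 and never goes negative, which is the mismatch the message describes.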
      
      Thanks to Jens Axboe <axboe@kernel.dk> for his guidance.
      
      Fixes: 31b515106428 ("io_uring: allow workqueue item to handle multiple buffered requests")
      Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
    • io_uring: fix the sequence comparison in io_sequence_defer · 170a1188
      Zhengyuan Liu committed
      commit dbd0f6d6c2a11eb9c31ca9cd454f95bb5713e92e upstream.
      
      sq->cached_sq_head and cq->cached_cq_tail are both unsigned int. If
      cached_sq_head overflows before cached_cq_tail, then we may miss a
      barrier req. As cached_cq_tail always follows cached_sq_head, a
      not-equal (NQ) comparison should be enough.
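
A small example of the wraparound, assuming the earlier check was a greater-than comparison between a sequence derived from the sq head and the cq tail. The function names are hypothetical, and the exact operands of the kernel comparison are not given here.

```c
#include <limits.h>
#include <stdbool.h>

/* Ordered comparison: breaks once the sequence wraps past UINT_MAX. */
static bool defer_greater_than(unsigned int seq, unsigned int cq_tail)
{
	return seq > cq_tail;
}

/* Not-equal comparison: the tail only follows the head, so "different"
 * already means "not caught up yet", wrapped or not. */
static bool defer_not_equal(unsigned int seq, unsigned int cq_tail)
{
	return seq != cq_tail;
}
```

With cq_tail = UINT_MAX - 1 and a sequence three entries ahead, the sequence wraps to 1: defer_greater_than() returns false and the barrier req would be missed, while defer_not_equal() still returns true.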
      
      Cc: stable@vger.kernel.org
      Fixes: de0617e46717 ("io_uring: add support for marking commands as draining")
      Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
    • io_uring: fix io_sq_thread_stop running in front of io_sq_thread · db4c235e
      Jackie Liu committed
      commit a4c0b3decb33fb4a2b5ecc6234a50680f0b21e7d upstream.
      
      INFO: task syz-executor.5:8634 blocked for more than 143 seconds.
             Not tainted 5.2.0-rc5+ #3
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor.5  D25632  8634   8224 0x00004004
      Call Trace:
        context_switch kernel/sched/core.c:2818 [inline]
        __schedule+0x658/0x9e0 kernel/sched/core.c:3445
        schedule+0x131/0x1d0 kernel/sched/core.c:3509
        schedule_timeout+0x9a/0x2b0 kernel/time/timer.c:1783
        do_wait_for_common+0x35e/0x5a0 kernel/sched/completion.c:83
        __wait_for_common kernel/sched/completion.c:104 [inline]
        wait_for_common kernel/sched/completion.c:115 [inline]
        wait_for_completion+0x47/0x60 kernel/sched/completion.c:136
        kthread_stop+0xb4/0x150 kernel/kthread.c:559
        io_sq_thread_stop fs/io_uring.c:2252 [inline]
        io_finish_async fs/io_uring.c:2259 [inline]
        io_ring_ctx_free fs/io_uring.c:2770 [inline]
        io_ring_ctx_wait_and_kill+0x268/0x880 fs/io_uring.c:2834
        io_uring_release+0x5d/0x70 fs/io_uring.c:2842
        __fput+0x2e4/0x740 fs/file_table.c:280
        ____fput+0x15/0x20 fs/file_table.c:313
        task_work_run+0x17e/0x1b0 kernel/task_work.c:113
        tracehook_notify_resume include/linux/tracehook.h:185 [inline]
        exit_to_usermode_loop arch/x86/entry/common.c:168 [inline]
        prepare_exit_to_usermode+0x402/0x4f0 arch/x86/entry/common.c:199
        syscall_return_slowpath+0x110/0x440 arch/x86/entry/common.c:279
        do_syscall_64+0x126/0x140 arch/x86/entry/common.c:304
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x412fb1
      Code: 80 3b 7c 0f 84 c7 02 00 00 c7 85 d0 00 00 00 00 00 00 00 48 8b 05 cf
      a6 24 00 49 8b 14 24 41 b9 cb 2a 44 00 48 89 ee 48 89 df <48> 85 c0 4c 0f
      45 c8 45 31 c0 31 c9 e8 0e 5b 00 00 85 c0 41 89 c7
      RSP: 002b:00007ffe7ee6a180 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000412fb1
      RDX: 0000001b2d920000 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 0000000000000001 R08: 00000000f3a3e1f8 R09: 00000000f3a3e1fc
      R10: 00007ffe7ee6a260 R11: 0000000000000293 R12: 000000000075c9a0
      R13: 000000000075c9a0 R14: 0000000000024c00 R15: 000000000075bf2c
      
      =============================================
      
      The logic is wrong when kthread_park() runs before io_sq_thread()
      has started.
      
      CPU#0					CPU#1
      
      io_sq_thread_stop:			int kthread(void *_create):
      
      kthread_park()
      					__kthread_parkme(self);	 <<< Wrong
      kthread_stop()
          << wait for self->exited
          << clear_bit KTHREAD_SHOULD_PARK
      
      					ret = threadfn(data);
      					   |
      					   |- io_sq_thread
      					       |- kthread_should_park()	<< false
      					       |- schedule() <<< nobody wake up
      
      stuck CPU#0				stuck CPU#1
      
      So, use a new variable sqo_thread_started to ensure that io_sq_thread
      runs first and io_sq_thread_stop only proceeds after it.
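
The ordering can be sketched in user space with pthreads. This is a model of the idea only: the `started` flag plays the role of sqo_thread_started, the struct and function names are hypothetical, and the kernel's park/stop machinery is not reproduced. In this model the mutex alone would already keep the stop flag safe; the point is only to show the stop path waiting for the thread's first step.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context for a submission thread. */
struct sq_ctx_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool started;      /* set by the thread once it is running */
	bool should_stop;  /* set by the stop path */
	pthread_t thread;
};

static void *sq_thread(void *arg)
{
	struct sq_ctx_sketch *ctx = arg;

	pthread_mutex_lock(&ctx->lock);
	ctx->started = true;                 /* announce "I have run" first */
	pthread_cond_broadcast(&ctx->cond);
	while (!ctx->should_stop)            /* then sleep until told to stop */
		pthread_cond_wait(&ctx->cond, &ctx->lock);
	pthread_mutex_unlock(&ctx->lock);
	return NULL;
}

static int sq_thread_start(struct sq_ctx_sketch *ctx)
{
	pthread_mutex_init(&ctx->lock, NULL);
	pthread_cond_init(&ctx->cond, NULL);
	ctx->started = false;
	ctx->should_stop = false;
	return pthread_create(&ctx->thread, NULL, sq_thread, ctx);
}

static void sq_thread_stop(struct sq_ctx_sketch *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	while (!ctx->started)                /* never stop a thread that has not run */
		pthread_cond_wait(&ctx->cond, &ctx->lock);
	ctx->should_stop = true;
	pthread_cond_broadcast(&ctx->cond);
	pthread_mutex_unlock(&ctx->lock);
	pthread_join(ctx->thread, NULL);
}
```

Calling sq_thread_start() and then sq_thread_stop() immediately is safe in this model: the stop path blocks until the thread has announced itself, so neither side is left sleeping with nobody to wake it.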
      
      Reported-by: syzbot+94324416c485d422fe15@syzkaller.appspotmail.com
      Suggested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
    • io_uring: add support for recvmsg() · 7cabfcb1
      Jens Axboe committed
      commit aa1fa28fc73ea6b740ee7b62bf3b07141883dbb8 upstream.
      
      This is done through IORING_OP_RECVMSG. This opcode uses the same
      sqe->msg_flags that IORING_OP_SENDMSG added, and we pass in the
      msghdr struct in the sqe->addr field as well.
      
      We use MSG_DONTWAIT to force an inline fast path if recvmsg() doesn't
      block, and punt to async execution if it would have.
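
From user space, the field layout described above could be filled in roughly as follows. This is a sketch that assumes a <linux/io_uring.h> header recent enough to define IORING_OP_RECVMSG and IORING_OP_SENDMSG; prep_msg_sqe() is a hypothetical helper, it sets only the fields named in this message, and actually submitting the SQE is left out.

```c
#include <linux/io_uring.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill one SQE for IORING_OP_RECVMSG or
 * IORING_OP_SENDMSG. Any further fields a given kernel or library expects
 * are not covered here.
 */
static void prep_msg_sqe(struct io_uring_sqe *sqe, int opcode, int sockfd,
			 struct msghdr *msg, unsigned int flags)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = (__u8)opcode;            /* which operation to run */
	sqe->fd = sockfd;                      /* socket to operate on */
	sqe->addr = (__u64)(uintptr_t)msg;     /* msghdr goes in sqe->addr */
	sqe->msg_flags = flags;                /* flags argument of the call */
}
```

The same helper covers both opcodes, since both use sqe->msg_flags and pass the msghdr in sqe->addr.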

      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
    • io_uring: add support for sendmsg() · ce630b99
      Jens Axboe committed
      commit 0fa03c624d8fc9932d0f27c39a9deca6a37e0e17 upstream.
      
      This is done through IORING_OP_SENDMSG. There's a new sqe->msg_flags
      for the flags argument, and the msghdr struct is passed in the
      sqe->addr field.
      
      We use MSG_DONTWAIT to force an inline fast path if sendmsg() doesn't
      block, and punt to async execution if it would have.

      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
    • block: never take page references for ITER_BVEC · 3f9da4d9
      Christoph Hellwig committed
      Cherry-pick from commit b620743077e291ae7d0debd21f50413a8c266229 upstream.
      
      If we pass pages through an iov_iter we always already have a reference
      in the caller.  Thus remove ITER_BVEC_FLAG_NO_REF and don't take
      references to pages by default for bvec-backed iov_iters.
      
      [Joseph] Resolve conflicts since we don't have:
      81ba6abd2bcd "block: loop: mark bvec as ITER_BVEC_FLAG_NO_REF"
      7321ecbfc7cf "block: change how we get page references in bio_iov_iter_get_pages"

      Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
  2. 03 Feb, 2020 24 commits
  3. 19 Jan, 2020 8 commits