1. 03 6月, 2021 2 次提交
  2. 26 4月, 2021 1 次提交
  3. 22 4月, 2021 7 次提交
  4. 19 4月, 2021 1 次提交
  5. 13 4月, 2021 4 次提交
    • J
      io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return · c968d1d8
      Jens Axboe 提交于
      stable inclusion
      from stable-5.10.26
      commit 76f496681d6a125d28321deda355ca14d0e4ad23
      bugzilla: 51363
      
      --------------------------------
      
      [ Upstream commit b5b0ecb7 ]
      
      The callback can only be armed, if we get -EIOCBQUEUED returned. It's
      important that we clear the WAITQ bit for other cases, otherwise we can
      queue for async retry and filemap will assume that we're armed and
      return -EAGAIN instead of just blocking for the IO.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: N  Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      c968d1d8
    • J
      io_uring: don't attempt IO reissue from the ring exit path · 99f98c5a
      Jens Axboe 提交于
      stable inclusion
      from stable-5.10.26
      commit 3c08f772ad0db70876021aa5d276e14747f77512
      bugzilla: 51363
      
      --------------------------------
      
      [ Upstream commit 7c977a58 ]
      
      If we're exiting the ring, just let the IO fail with -EAGAIN as nobody
      will care anyway. It's not the right context to reissue from.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: N  Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      99f98c5a
    • P
      io_uring: fix inconsistent lock state · 58a77c0f
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.26
      commit 1c20e9040f49687ba2ccc2ffd4411351a6c2ebff
      bugzilla: 51363
      
      --------------------------------
      
      [ Upstream commit 9ae1f8dd ]
      
      WARNING: inconsistent lock state
      
      inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
      syz-executor217/8450 [HC1[1]:SC0[0]:HE0:SE1] takes:
      ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
      ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_req_clean_work fs/io_uring.c:1398 [inline]
      ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&fs->lock);
        <Interrupt>
          lock(&fs->lock);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor217/8450:
       #0: ffff88802417c3e8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x1071/0x1f30 fs/io_uring.c:9442
      
      stack backtrace:
      CPU: 1 PID: 8450 Comm: syz-executor217 Not tainted 5.11.0-rc5-next-20210129-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
      [...]
       _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:354 [inline]
       io_req_clean_work fs/io_uring.c:1398 [inline]
       io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029
       __io_free_req+0x3d/0x2e0 fs/io_uring.c:2046
       io_free_req fs/io_uring.c:2269 [inline]
       io_double_put_req fs/io_uring.c:2392 [inline]
       io_put_req+0xf9/0x570 fs/io_uring.c:2388
       io_link_timeout_fn+0x30c/0x480 fs/io_uring.c:6497
       __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
       __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583
       hrtimer_interrupt+0x334/0x940 kernel/time/hrtimer.c:1645
       local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1085 [inline]
       __sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1102
       asm_call_irq_on_stack+0xf/0x20
       </IRQ>
       __run_sysvec_on_irqstack arch/x86/include/asm/irq_stack.h:37 [inline]
       run_sysvec_on_irqstack_cond arch/x86/include/asm/irq_stack.h:89 [inline]
       sysvec_apic_timer_interrupt+0xbd/0x100 arch/x86/kernel/apic/apic.c:1096
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629
      RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
      RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:199
       spin_unlock_irq include/linux/spinlock.h:404 [inline]
       io_queue_linked_timeout+0x194/0x1f0 fs/io_uring.c:6525
       __io_queue_sqe+0x328/0x1290 fs/io_uring.c:6594
       io_queue_sqe+0x631/0x10d0 fs/io_uring.c:6639
       io_queue_link_head fs/io_uring.c:6650 [inline]
       io_submit_sqe fs/io_uring.c:6697 [inline]
       io_submit_sqes+0x19b5/0x2720 fs/io_uring.c:6960
       __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9443
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Don't free requests from under hrtimer context (softirq) as it may sleep
      or take spinlocks improperly (e.g. non-irq versions).
      
      Cc: stable@vger.kernel.org # 5.6+
      Reported-by: syzbot+81d17233a2b02eafba33@syzkaller.appspotmail.com
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: N  Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      58a77c0f
    • J
      io_uring: ensure that SQPOLL thread is started for exit · 6d7783c6
      Jens Axboe 提交于
      stable inclusion
      from stable-5.10.26
      commit 6cae8095490caae12875300243ec94b39b6a2a78
      bugzilla: 51363
      
      --------------------------------
      
      commit 3ebba796 upstream.
      
      If we create it in a disabled state because IORING_SETUP_R_DISABLED is
      set on ring creation, we need to ensure that we've kicked the thread if
      we're exiting before it's been explicitly disabled. Otherwise we can run
      into a deadlock where exit is waiting go park the SQPOLL thread, but the
      SQPOLL thread itself is waiting to get a signal to start.
      
      That results in the below trace of both tasks hung, waiting on each other:
      
      INFO: task syz-executor458:8401 blocked for more than 143 seconds.
            Not tainted 5.11.0-next-20210226-syzkaller #0
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:syz-executor458 state:D stack:27536 pid: 8401 ppid:  8400 flags:0x00004004
      Call Trace:
       context_switch kernel/sched/core.c:4324 [inline]
       __schedule+0x90c/0x21a0 kernel/sched/core.c:5075
       schedule+0xcf/0x270 kernel/sched/core.c:5154
       schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868
       do_wait_for_common kernel/sched/completion.c:85 [inline]
       __wait_for_common kernel/sched/completion.c:106 [inline]
       wait_for_common kernel/sched/completion.c:117 [inline]
       wait_for_completion+0x168/0x270 kernel/sched/completion.c:138
       io_sq_thread_park fs/io_uring.c:7115 [inline]
       io_sq_thread_park+0xd5/0x130 fs/io_uring.c:7103
       io_uring_cancel_task_requests+0x24c/0xd90 fs/io_uring.c:8745
       __io_uring_files_cancel+0x110/0x230 fs/io_uring.c:8840
       io_uring_files_cancel include/linux/io_uring.h:47 [inline]
       do_exit+0x299/0x2a60 kernel/exit.c:780
       do_group_exit+0x125/0x310 kernel/exit.c:922
       __do_sys_exit_group kernel/exit.c:933 [inline]
       __se_sys_exit_group kernel/exit.c:931 [inline]
       __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x43e899
      RSP: 002b:00007ffe89376d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 00000000004af2f0 RCX: 000000000043e899
      RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
      RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000010000000
      R10: 0000000000008011 R11: 0000000000000246 R12: 00000000004af2f0
      R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
      INFO: task iou-sqp-8401:8402 can't die for more than 143 seconds.
      task:iou-sqp-8401    state:D stack:30272 pid: 8402 ppid:  8400 flags:0x00004004
      Call Trace:
       context_switch kernel/sched/core.c:4324 [inline]
       __schedule+0x90c/0x21a0 kernel/sched/core.c:5075
       schedule+0xcf/0x270 kernel/sched/core.c:5154
       schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868
       do_wait_for_common kernel/sched/completion.c:85 [inline]
       __wait_for_common kernel/sched/completion.c:106 [inline]
       wait_for_common kernel/sched/completion.c:117 [inline]
       wait_for_completion+0x168/0x270 kernel/sched/completion.c:138
       io_sq_thread+0x27d/0x1ae0 fs/io_uring.c:6717
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      INFO: task iou-sqp-8401:8402 blocked for more than 143 seconds.
      
      Reported-by: syzbot+fb5458330b4442f2090d@syzkaller.appspotmail.com
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: N  Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      6d7783c6
  6. 09 4月, 2021 2 次提交
    • J
      io_uring: ignore double poll add on the same waitqueue head · aaa28cf6
      Jens Axboe 提交于
      stable inclusion
      from stable-5.10.22
      commit a2501d87663b84c288188d63da4632018b627829
      bugzilla: 50796
      
      --------------------------------
      
      commit 1c3b3e65 upstream.
      
      syzbot reports a deadlock, attempting to lock the same spinlock twice:
      
      ============================================
      WARNING: possible recursive locking detected
      5.11.0-syzkaller #0 Not tainted
      --------------------------------------------
      swapper/1/0 is trying to acquire lock:
      ffff88801b2b1130 (&runtime->sleep){..-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
      ffff88801b2b1130 (&runtime->sleep){..-.}-{2:2}, at: io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4960
      
      but task is already holding lock:
      ffff88801b2b3130 (&runtime->sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&runtime->sleep);
        lock(&runtime->sleep);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by swapper/1/0:
       #0: ffff888147474908 (&group->lock){..-.}-{2:2}, at: _snd_pcm_stream_lock_irqsave+0x9f/0xd0 sound/core/pcm_native.c:170
       #1: ffff88801b2b3130 (&runtime->sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137
      
      stack backtrace:
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.11.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0xfa/0x151 lib/dump_stack.c:120
       print_deadlock_bug kernel/locking/lockdep.c:2829 [inline]
       check_deadlock kernel/locking/lockdep.c:2872 [inline]
       validate_chain kernel/locking/lockdep.c:3661 [inline]
       __lock_acquire.cold+0x14c/0x3b4 kernel/locking/lockdep.c:4900
       lock_acquire kernel/locking/lockdep.c:5510 [inline]
       lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:354 [inline]
       io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4960
       __wake_up_common+0x147/0x650 kernel/sched/wait.c:108
       __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:138
       snd_pcm_update_state+0x46a/0x540 sound/core/pcm_lib.c:203
       snd_pcm_update_hw_ptr0+0xa75/0x1a50 sound/core/pcm_lib.c:464
       snd_pcm_period_elapsed+0x160/0x250 sound/core/pcm_lib.c:1805
       dummy_hrtimer_callback+0x94/0x1b0 sound/drivers/dummy.c:378
       __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
       __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583
       hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1600
       __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
       invoke_softirq kernel/softirq.c:221 [inline]
       __irq_exit_rcu kernel/softirq.c:422 [inline]
       irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
       sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
       </IRQ>
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
      RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline]
      RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:70 [inline]
      RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:137 [inline]
      RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline]
      RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516
      Code: dd 38 6e f8 84 db 75 ac e8 54 32 6e f8 e8 0f 1c 74 f8 e9 0c 00 00 00 e8 45 32 6e f8 0f 00 2d 4e 4a c5 00 e8 39 32 6e f8 fb f4 <9c> 5b 81 e3 00 02 00 00 fa 31 ff 48 89 de e8 14 3a 6e f8 48 85 db
      RSP: 0018:ffffc90000d47d18 EFLAGS: 00000293
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff8880115c3780 RSI: ffffffff89052537 RDI: 0000000000000000
      RBP: ffff888141127064 R08: 0000000000000001 R09: 0000000000000001
      R10: ffffffff81794168 R11: 0000000000000000 R12: 0000000000000001
      R13: ffff888141127000 R14: ffff888141127064 R15: ffff888143331804
       acpi_idle_enter+0x361/0x500 drivers/acpi/processor_idle.c:647
       cpuidle_enter_state+0x1b1/0xc80 drivers/cpuidle/cpuidle.c:237
       cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:351
       call_cpuidle kernel/sched/idle.c:158 [inline]
       cpuidle_idle_call kernel/sched/idle.c:239 [inline]
       do_idle+0x3e1/0x590 kernel/sched/idle.c:300
       cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:397
       start_secondary+0x274/0x350 arch/x86/kernel/smpboot.c:272
       secondary_startup_64_no_verify+0xb0/0xbb
      
      which is due to the driver doing poll_wait() twice on the same
      wait_queue_head. That is perfectly valid, but from checking the rest
      of the kernel tree, it's the only driver that does this.
      
      We can handle this just fine, we just need to ignore the second addition
      as we'll get woken just fine on the first one.
      
      Cc: stable@vger.kernel.org # 5.8+
      Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
      Reported-by: syzbot+28abd693db9e92c160d8@syzkaller.appspotmail.com
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: N  Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      aaa28cf6
    • H
      io_uring: fix possible deadlock in io_uring_poll · dffdcdcd
      Hao Xu 提交于
      stable inclusion
      from stable-5.10.20
      commit 81dfee4731c0e91d18b8292158192917144467c1
      bugzilla: 50608
      
      --------------------------------
      
      [ Upstream commit ed670c3f ]
      
      Abaci reported follow issue:
      
      [   30.615891] ======================================================
      [   30.616648] WARNING: possible circular locking dependency detected
      [   30.617423] 5.11.0-rc3-next-20210115 #1 Not tainted
      [   30.618035] ------------------------------------------------------
      [   30.618914] a.out/1128 is trying to acquire lock:
      [   30.619520] ffff88810b063868 (&ep->mtx){+.+.}-{3:3}, at: __ep_eventpoll_poll+0x9f/0x220
      [   30.620505]
      [   30.620505] but task is already holding lock:
      [   30.621218] ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
      [   30.622349]
      [   30.622349] which lock already depends on the new lock.
      [   30.622349]
      [   30.623289]
      [   30.623289] the existing dependency chain (in reverse order) is:
      [   30.624243]
      [   30.624243] -> #1 (&ctx->uring_lock){+.+.}-{3:3}:
      [   30.625263]        lock_acquire+0x2c7/0x390
      [   30.625868]        __mutex_lock+0xae/0x9f0
      [   30.626451]        io_cqring_overflow_flush.part.95+0x6d/0x70
      [   30.627278]        io_uring_poll+0xcb/0xd0
      [   30.627890]        ep_item_poll.isra.14+0x4e/0x90
      [   30.628531]        do_epoll_ctl+0xb7e/0x1120
      [   30.629122]        __x64_sys_epoll_ctl+0x70/0xb0
      [   30.629770]        do_syscall_64+0x2d/0x40
      [   30.630332]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.631187]
      [   30.631187] -> #0 (&ep->mtx){+.+.}-{3:3}:
      [   30.631985]        check_prevs_add+0x226/0xb00
      [   30.632584]        __lock_acquire+0x1237/0x13a0
      [   30.633207]        lock_acquire+0x2c7/0x390
      [   30.633740]        __mutex_lock+0xae/0x9f0
      [   30.634258]        __ep_eventpoll_poll+0x9f/0x220
      [   30.634879]        __io_arm_poll_handler+0xbf/0x220
      [   30.635462]        io_issue_sqe+0xa6b/0x13e0
      [   30.635982]        __io_queue_sqe+0x10b/0x550
      [   30.636648]        io_queue_sqe+0x235/0x470
      [   30.637281]        io_submit_sqes+0xcce/0xf10
      [   30.637839]        __x64_sys_io_uring_enter+0x3fb/0x5b0
      [   30.638465]        do_syscall_64+0x2d/0x40
      [   30.638999]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.639643]
      [   30.639643] other info that might help us debug this:
      [   30.639643]
      [   30.640618]  Possible unsafe locking scenario:
      [   30.640618]
      [   30.641402]        CPU0                    CPU1
      [   30.641938]        ----                    ----
      [   30.642664]   lock(&ctx->uring_lock);
      [   30.643425]                                lock(&ep->mtx);
      [   30.644498]                                lock(&ctx->uring_lock);
      [   30.645668]   lock(&ep->mtx);
      [   30.646321]
      [   30.646321]  *** DEADLOCK ***
      [   30.646321]
      [   30.647642] 1 lock held by a.out/1128:
      [   30.648424]  #0: ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
      [   30.649954]
      [   30.649954] stack backtrace:
      [   30.650592] CPU: 1 PID: 1128 Comm: a.out Not tainted 5.11.0-rc3-next-20210115 #1
      [   30.651554] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [   30.652290] Call Trace:
      [   30.652688]  dump_stack+0xac/0xe3
      [   30.653164]  check_noncircular+0x11e/0x130
      [   30.653747]  ? check_prevs_add+0x226/0xb00
      [   30.654303]  check_prevs_add+0x226/0xb00
      [   30.654845]  ? add_lock_to_list.constprop.49+0xac/0x1d0
      [   30.655564]  __lock_acquire+0x1237/0x13a0
      [   30.656262]  lock_acquire+0x2c7/0x390
      [   30.656788]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.657379]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.658014]  __mutex_lock+0xae/0x9f0
      [   30.658524]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.659112]  ? mark_held_locks+0x5a/0x80
      [   30.659648]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.660229]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
      [   30.660885]  ? trace_hardirqs_on+0x46/0x110
      [   30.661471]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.662102]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.662696]  __ep_eventpoll_poll+0x9f/0x220
      [   30.663273]  ? __ep_eventpoll_poll+0x220/0x220
      [   30.663875]  __io_arm_poll_handler+0xbf/0x220
      [   30.664463]  io_issue_sqe+0xa6b/0x13e0
      [   30.664984]  ? __lock_acquire+0x782/0x13a0
      [   30.665544]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.666170]  ? __io_queue_sqe+0x10b/0x550
      [   30.666725]  __io_queue_sqe+0x10b/0x550
      [   30.667252]  ? __fget_files+0x131/0x260
      [   30.667791]  ? io_req_prep+0xd8/0x1090
      [   30.668316]  ? io_queue_sqe+0x235/0x470
      [   30.668868]  io_queue_sqe+0x235/0x470
      [   30.669398]  io_submit_sqes+0xcce/0xf10
      [   30.669931]  ? xa_load+0xe4/0x1c0
      [   30.670425]  __x64_sys_io_uring_enter+0x3fb/0x5b0
      [   30.671051]  ? lockdep_hardirqs_on_prepare+0xde/0x180
      [   30.671719]  ? syscall_enter_from_user_mode+0x2b/0x80
      [   30.672380]  do_syscall_64+0x2d/0x40
      [   30.672901]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.673503] RIP: 0033:0x7fd89c813239
      [   30.673962] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
      [   30.675920] RSP: 002b:00007ffc65a7c628 EFLAGS: 00000217 ORIG_RAX: 00000000000001aa
      [   30.676791] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd89c813239
      [   30.677594] RDX: 0000000000000000 RSI: 0000000000000014 RDI: 0000000000000003
      [   30.678678] RBP: 00007ffc65a7c720 R08: 0000000000000000 R09: 0000000003000000
      [   30.679492] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400ff0
      [   30.680282] R13: 00007ffc65a7c840 R14: 0000000000000000 R15: 0000000000000000
      
      This might happen if we do epoll_wait on a uring fd while reading/writing
      the former epoll fd in a sqe in the former uring instance.
      So let's don't flush cqring overflow list, just do a simple check.
      Reported-by: NAbaci <abaci@linux.alibaba.com>
      Fixes: 6c503150 ("io_uring: patch up IOPOLL overflow_flush sync")
      Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
      Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      dffdcdcd
  7. 09 3月, 2021 17 次提交
    • P
      io_uring: drop mm/files between task_work_submit · 05f1f25e
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit 5592eae7846ca1279591624ecf89513dfb5840bb
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit aec18a57 ]
      
      Since SQPOLL task can be shared and so task_work entries can be a mix of
      them, we need to drop mm and files before trying to issue next request.
      
      Cc: stable@vger.kernel.org # 5.10+
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      05f1f25e
    • P
      io_uring: reinforce cancel on flush during exit · 443ab3ec
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit 88dbd085a51ec78c83dde79ad63bca8aa4272a9d
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit 3a7efd1a ]
      
      What 84965ff8 ("io_uring: if we see flush on exit, cancel related tasks")
      really wants is to cancel all relevant REQ_F_INFLIGHT requests reliably.
      That can be achieved by io_uring_cancel_files(), but we'll miss it
      calling io_uring_cancel_task_requests(files=NULL) from io_uring_flush(),
      because it will go through __io_uring_cancel_task_requests().
      
      Just always call io_uring_cancel_files() during cancel, it's good enough
      for now.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      443ab3ec
    • P
      io_uring: fix sqo ownership false positive warning · 356c0370
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit aa435155d396ccccef1af1ba7b1f0b650e1c80e0
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit 70b2c60d ]
      
      WARNING: CPU: 0 PID: 21359 at fs/io_uring.c:9042
          io_uring_cancel_task_requests+0xe55/0x10c0 fs/io_uring.c:9042
      Call Trace:
       io_uring_flush+0x47b/0x6e0 fs/io_uring.c:9227
       filp_close+0xb4/0x170 fs/open.c:1295
       close_files fs/file.c:403 [inline]
       put_files_struct fs/file.c:418 [inline]
       put_files_struct+0x1cc/0x350 fs/file.c:415
       exit_files+0x7e/0xa0 fs/file.c:435
       do_exit+0xc22/0x2ae0 kernel/exit.c:820
       do_group_exit+0x125/0x310 kernel/exit.c:922
       get_signal+0x427/0x20f0 kernel/signal.c:2773
       arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Now io_uring_cancel_task_requests() can be called not through file
      notes but directly, remove a WARN_ONCE() there that give us false
      positives. That check is not very important and we catch it in other
      places.
      
      Fixes: 84965ff8 ("io_uring: if we see flush on exit, cancel related tasks")
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: syzbot+3e3d9bd0c6ce9efbc3ef@syzkaller.appspotmail.com
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      356c0370
    • P
      io_uring: fix list corruption for splice file_get · 6497d080
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit 8c7febfc919a370b502714958e88b186df5538c4
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit f609cbb8 ]
      
      kernel BUG at lib/list_debug.c:29!
      Call Trace:
       __list_add include/linux/list.h:67 [inline]
       list_add include/linux/list.h:86 [inline]
       io_file_get+0x8cc/0xdb0 fs/io_uring.c:6466
       __io_splice_prep+0x1bc/0x530 fs/io_uring.c:3866
       io_splice_prep fs/io_uring.c:3920 [inline]
       io_req_prep+0x3546/0x4e80 fs/io_uring.c:6081
       io_queue_sqe+0x609/0x10d0 fs/io_uring.c:6628
       io_submit_sqe fs/io_uring.c:6705 [inline]
       io_submit_sqes+0x1495/0x2720 fs/io_uring.c:6953
       __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9353
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      io_file_get() may be called from splice, and so REQ_F_INFLIGHT may
      already be set.
      
      Fixes: 02a13674 ("io_uring: account io_uring internal files as REQ_F_INFLIGHT")
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: syzbot+6879187cf57845801267@syzkaller.appspotmail.com
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      6497d080
    • H
      io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE · a320cd86
      Hao Xu 提交于
      stable inclusion
      from stable-5.10.16
      commit 7250f333ce03a41791c00b0975b64799838751a6
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit 6195ba09 ]
      
      Abaci reported the follow warning:
      
      [   27.073425] do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_exclusive+0x3a/0xc0
      [   27.075805] WARNING: CPU: 0 PID: 951 at kernel/sched/core.c:7853 __might_sleep+0x80/0xa0
      [   27.077604] Modules linked in:
      [   27.078379] CPU: 0 PID: 951 Comm: a.out Not tainted 5.11.0-rc3+ #1
      [   27.079637] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [   27.080852] RIP: 0010:__might_sleep+0x80/0xa0
      [   27.081835] Code: 65 48 8b 04 25 80 71 01 00 48 8b 90 c0 15 00 00 48 8b 70 18 48 c7 c7 08 39 95 82 c6 05 f9 5f de 08 01 48 89 d1 e8 00 c6 fa ff  0b eb bf 41 0f b6 f5 48 c7 c7 40 23 c9 82 e8 f3 48 ec 00 eb a7
      [   27.084521] RSP: 0018:ffffc90000fe3ce8 EFLAGS: 00010286
      [   27.085350] RAX: 0000000000000000 RBX: ffffffff82956083 RCX: 0000000000000000
      [   27.086348] RDX: ffff8881057a0000 RSI: ffffffff8118cc9e RDI: ffff88813bc28570
      [   27.087598] RBP: 00000000000003a7 R08: 0000000000000001 R09: 0000000000000001
      [   27.088819] R10: ffffc90000fe3e00 R11: 00000000fffef9f0 R12: 0000000000000000
      [   27.089819] R13: 0000000000000000 R14: ffff88810576eb80 R15: ffff88810576e800
      [   27.091058] FS:  00007f7b144cf740(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
      [   27.092775] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   27.093796] CR2: 00000000022da7b8 CR3: 000000010b928002 CR4: 00000000003706f0
      [   27.094778] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   27.095780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   27.097011] Call Trace:
      [   27.097685]  __mutex_lock+0x5d/0xa30
      [   27.098565]  ? prepare_to_wait_exclusive+0x71/0xc0
      [   27.099412]  ? io_cqring_overflow_flush.part.101+0x6d/0x70
      [   27.100441]  ? lockdep_hardirqs_on_prepare+0xe9/0x1c0
      [   27.101537]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
      [   27.102656]  ? trace_hardirqs_on+0x46/0x110
      [   27.103459]  ? io_cqring_overflow_flush.part.101+0x6d/0x70
      [   27.104317]  io_cqring_overflow_flush.part.101+0x6d/0x70
      [   27.105113]  io_cqring_wait+0x36e/0x4d0
      [   27.105770]  ? find_held_lock+0x28/0xb0
      [   27.106370]  ? io_uring_remove_task_files+0xa0/0xa0
      [   27.107076]  __x64_sys_io_uring_enter+0x4fb/0x640
      [   27.107801]  ? rcu_read_lock_sched_held+0x59/0xa0
      [   27.108562]  ? lockdep_hardirqs_on_prepare+0xe9/0x1c0
      [   27.109684]  ? syscall_enter_from_user_mode+0x26/0x70
      [   27.110731]  do_syscall_64+0x2d/0x40
      [   27.111296]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   27.112056] RIP: 0033:0x7f7b13dc8239
      [   27.112663] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
      [   27.115113] RSP: 002b:00007ffd6d7f5c88 EFLAGS: 00000286 ORIG_RAX: 00000000000001aa
      [   27.116562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b13dc8239
      [   27.117961] RDX: 000000000000478e RSI: 0000000000000000 RDI: 0000000000000003
      [   27.118925] RBP: 00007ffd6d7f5cb0 R08: 0000000020000040 R09: 0000000000000008
      [   27.119773] R10: 0000000000000001 R11: 0000000000000286 R12: 0000000000400480
      [   27.120614] R13: 00007ffd6d7f5d90 R14: 0000000000000000 R15: 0000000000000000
      [   27.121490] irq event stamp: 5635
      [   27.121946] hardirqs last  enabled at (5643): [] console_unlock+0x5c4/0x740
      [   27.123476] hardirqs last disabled at (5652): [] console_unlock+0x4e7/0x740
      [   27.125192] softirqs last  enabled at (5272): [] __do_softirq+0x3c5/0x5aa
      [   27.126430] softirqs last disabled at (5267): [] asm_call_irq_on_stack+0xf/0x20
      [   27.127634] ---[ end trace 289d7e28fa60f928 ]---
      
      This is caused by calling io_cqring_overflow_flush() which may sleep
      after calling prepare_to_wait_exclusive() which set task state to
      TASK_INTERRUPTIBLE
      Reported-by: NAbaci <abaci@linux.alibaba.com>
      Fixes: 6c503150 ("io_uring: patch up IOPOLL overflow_flush sync")
      Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      a320cd86
    • P
      io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE · db6b5315
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit d300d03a93a221e8665ae4a26985b07f7bdc375b
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit ca70f00b ]
      
      do not call blocking ops when !TASK_RUNNING; state=2 set at
      	[<00000000ced9dbfc>] prepare_to_wait+0x1f4/0x3b0
      	kernel/sched/wait.c:262
      WARNING: CPU: 1 PID: 19888 at kernel/sched/core.c:7853
      	__might_sleep+0xed/0x100 kernel/sched/core.c:7848
      RIP: 0010:__might_sleep+0xed/0x100 kernel/sched/core.c:7848
      Call Trace:
       __mutex_lock_common+0xc4/0x2ef0 kernel/locking/mutex.c:935
       __mutex_lock kernel/locking/mutex.c:1103 [inline]
       mutex_lock_nested+0x1a/0x20 kernel/locking/mutex.c:1118
       io_wq_submit_work+0x39a/0x720 fs/io_uring.c:6411
       io_run_cancel fs/io-wq.c:856 [inline]
       io_wqe_cancel_pending_work fs/io-wq.c:990 [inline]
       io_wq_cancel_cb+0x614/0xcb0 fs/io-wq.c:1027
       io_uring_cancel_files fs/io_uring.c:8874 [inline]
       io_uring_cancel_task_requests fs/io_uring.c:8952 [inline]
       __io_uring_files_cancel+0x115d/0x19e0 fs/io_uring.c:9038
       io_uring_files_cancel include/linux/io_uring.h:51 [inline]
       do_exit+0x2e6/0x2490 kernel/exit.c:780
       do_group_exit+0x168/0x2d0 kernel/exit.c:922
       get_signal+0x16b5/0x2030 kernel/signal.c:2770
       arch_do_signal_or_restart+0x8e/0x6a0 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x48/0x190 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Rewrite io_uring_cancel_files() to mimic __io_uring_task_cancel()'s
      counting scheme, so it does all the heavy work before setting
      TASK_UNINTERRUPTIBLE.
      
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: syzbot+f655445043a26a7cfab8@syzkaller.appspotmail.com
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      [axboe: fix inverted task check]
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      db6b5315
    • P
      io_uring: replace inflight_wait with tctx->wait · d4409157
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit 52382df81d292607f0dfabf6daa32b7d5e56b633
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit c98de08c ]
      
      As tasks now cancel only theirs requests, and inflight_wait is awaited
      only in io_uring_cancel_files(), which should be called with ->in_idle
      set, instead of keeping a separate inflight_wait use tctx->wait.
      
      That will add some spurious wakeups but actually is safer from point of
      not hanging the task.
      
      e.g.
      task1                   | IRQ
                              | *start* io_complete_rw_common(link)
                              |        link: req1 -> req2 -> req3(with files)
      *cancel_files()         |
      io_wq_cancel(), etc.    |
                              | put_req(link), adds to io-wq req2
      schedule()              |
      
      So, task1 will never try to cancel req2 or req3. If req2 is
      long-standing (e.g. read(empty_pipe)), this may hang.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      d4409157
    • P
      io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE · b2c9b0da
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit b462a7beab3fb9fdec832bcf064077828125abf0
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit a1bb3cd5 ]
      
      If the tctx inflight number haven't changed because of cancellation,
      __io_uring_task_cancel() will continue leaving the task in
      TASK_UNINTERRUPTIBLE state, that's not expected by
      __io_uring_files_cancel(). Ensure we always call finish_wait() before
      retrying.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      b2c9b0da
    • J
      io_uring: if we see flush on exit, cancel related tasks · 8cf46738
      Jens Axboe 提交于
      stable inclusion
      from stable-5.10.16
      commit f0ff1a95bfa873e9b5e5883cc07d37fc4ae6bbca
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit 84965ff8 ]
      
      Ensure we match tasks that belong to a dead or dying task as well, as we
      need to reap those in addition to those belonging to the exiting task.
      
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: NJosef Grieb <josef.grieb@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      8cf46738
    • J
      io_uring: account io_uring internal files as REQ_F_INFLIGHT · e45cea26
      Jens Axboe 提交于
      stable inclusion
      from stable-5.10.16
      commit d16692a34e8e60c76e0064ee7805bd5db1b0ef3b
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit 02a13674 ]
      
      We need to actively cancel anything that introduces a potential circular
      loop, where io_uring holds a reference to itself. If the file in question
      is an io_uring file, then add the request to the inflight list.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      e45cea26
    • P
      io_uring: fix files cancellation · 508d0afa
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit 1e7eb063a0f084cbed2cd8db39e9644642130ff0
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit bee749b1 ]
      
      io_uring_cancel_files()'s task check condition mistakenly got flipped.
      
      1. There can't be a request in the inflight list without
      IO_WQ_WORK_FILES, kill this check to keep the whole condition simpler.
      2. Also, don't call the function for files==NULL to not do such a check,
      all that staff is already handled well by its counter part,
      __io_uring_cancel_task_requests().
      
      With that just flip the task check.
      
      Also, it iowq-cancels all request of current task there, don't forget to
      set right ->files into struct io_task_cancel.
      
      Fixes: c1973b38bf639 ("io_uring: cancel only requests of current task")
      Reported-by: syzbot+c0d52d0b3c0c3ffb9525@syzkaller.appspotmail.com
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      508d0afa
    • P
      io_uring: always batch cancel in *cancel_files() · 1dd57a2c
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit dbdcde4422dfb1a3da6d41abffc546c74190c25a
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit f6edbabb ]
      
      Instead of iterating over each request and cancelling it individually in
      io_uring_cancel_files(), try to cancel all matching requests and use
      ->inflight_list only to check if there anything left.
      
      In many cases it should be faster, and we can reuse a lot of code from
      task cancellation.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      1dd57a2c
    • P
      io_uring: pass files into kill timeouts/poll · 994da681
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit f8fbdbb6079314f5f4076303cb0552f815a47aa0
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit 6b81928d ]
      
      Make io_poll_remove_all() and io_kill_timeouts() to match against files
      as well. A preparation patch, effectively not used by now.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      994da681
    • P
      io_uring: don't iterate io_uring_cancel_files() · efff6923
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit 49250f33bb436a29387f80cc64d1f40eba1ae19e
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit b52fda00 ]
      
      io_uring_cancel_files() guarantees to cancel all matching requests,
      that's not necessary to do that in a loop. Move it up in the callchain
      into io_uring_cancel_task_requests().
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      efff6923
    • P
      io_uring: add a {task,files} pair matching helper · 250a87c6
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit f6d93f855553b96d7b53ceddc0438d28de5b94df
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit 08d23634 ]
      
      Add io_match_task() that matches both task and files.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      250a87c6
    • P
      io_uring: simplify io_task_match() · 131cd6f3
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.16
      commit fe9334186a50166f4d5f1e9bfedd257d22e6c4a9
      bugzilla: 48168
      
      --------------------------------
      
      [ Upstream commit 06de5f59 ]
      
      If IORING_SETUP_SQPOLL is set all requests belong to the corresponding
      SQPOLL task, so skip task checking in that case and always match.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      131cd6f3
    • X
      io_uring: don't modify identity's files uncess identity is cowed · 99876e02
      Xiaoguang Wang 提交于
      stable inclusion
      from stable-5.10.15
      commit 4f25d448d947fcf3afd288acb842dc4abeb4134c
      bugzilla: 48167
      
      --------------------------------
      
      commit d7e10d47 upstream.
      
      Abaci Robot reported following panic:
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 800000010ef3f067 P4D 800000010ef3f067 PUD 10d9df067 PMD 0
      Oops: 0002 [#1] SMP PTI
      CPU: 0 PID: 1869 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc3+ #1
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      RIP: 0010:put_files_struct+0x1b/0x120
      Code: 24 18 c7 00 f4 ff ff ff e9 4d fd ff ff 66 90 0f 1f 44 00 00 41 57 41 56 49 89 fe 41 55 41 54 55 53 48 83 ec 08 e8 b5 6b db ff  41 ff 0e 74 13 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f e9 9c
      RSP: 0000:ffffc90002147d48 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: ffff88810d9a5300 RCX: 0000000000000000
      RDX: ffff88810d87c280 RSI: ffffffff8144ba6b RDI: 0000000000000000
      RBP: 0000000000000080 R08: 0000000000000001 R09: ffffffff81431500
      R10: ffff8881001be000 R11: 0000000000000000 R12: ffff88810ac2f800
      R13: ffff88810af38a00 R14: 0000000000000000 R15: ffff8881057130c0
      FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 000000010dbaa002 CR4: 00000000003706f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __io_clean_op+0x10c/0x2a0
       io_dismantle_req+0x3c7/0x600
       __io_free_req+0x34/0x280
       io_put_req+0x63/0xb0
       io_worker_handle_work+0x60e/0x830
       ? io_wqe_worker+0x135/0x520
       io_wqe_worker+0x158/0x520
       ? __kthread_parkme+0x96/0xc0
       ? io_worker_handle_work+0x830/0x830
       kthread+0x134/0x180
       ? kthread_create_worker_on_cpu+0x90/0x90
       ret_from_fork+0x1f/0x30
      Modules linked in:
      CR2: 0000000000000000
      ---[ end trace c358ca86af95b1e7 ]---
      
      I guess case below can trigger above panic: there're two threads which
      operates different io_uring ctxs and share same sqthread identity, and
      later one thread exits, io_uring_cancel_task_requests() will clear
      task->io_uring->identity->files to be NULL in sqpoll mode, then another
      ctx that uses same identity will panic.
      
      Indeed we don't need to clear task->io_uring->identity->files here,
      io_grab_identity() should handle identity->files changes well, if
      task->io_uring->identity->files is not equal to current->files,
      io_cow_identity() should handle this changes well.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: NAbaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      99876e02
  8. 09 2月, 2021 1 次提交
  9. 08 2月, 2021 5 次提交
    • P
      io_uring: fix sleeping under spin in __io_clean_op · 805fa5dc
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.12
      commit d92d00861e98db178bd721876d0a06d1e8d5ff1a
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 9d5c8190 ]
      
      [   27.629441] BUG: sleeping function called from invalid context
      	at fs/file.c:402
      [   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0,
      	pid: 1012, name: io_wqe_worker-0
      [   27.633220] 1 lock held by io_wqe_worker-0/1012:
      [   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock)
      	{....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
      [   27.649249] Call Trace:
      [   27.649874]  dump_stack+0xac/0xe3
      [   27.650666]  ___might_sleep+0x284/0x2c0
      [   27.651566]  put_files_struct+0xb8/0x120
      [   27.652481]  __io_clean_op+0x10c/0x2a0
      [   27.653362]  __io_cqring_fill_event+0x2c1/0x350
      [   27.654399]  __io_req_complete.part.102+0x41/0x70
      [   27.655464]  io_openat2+0x151/0x300
      [   27.656297]  io_issue_sqe+0x6c/0x14e0
      [   27.660991]  io_wq_submit_work+0x7f/0x240
      [   27.662890]  io_worker_handle_work+0x501/0x8a0
      [   27.664836]  io_wqe_worker+0x158/0x520
      [   27.667726]  kthread+0x134/0x180
      [   27.669641]  ret_from_fork+0x1f/0x30
      
      Instead of cleaning files on overflow, return back overflow cancellation
      into io_uring_cancel_files(). Previously it was racy to clean
      REQ_F_OVERFLOW flag, but we got rid of it, and can do it through
      repetitive attempts targeting all matching requests.
      
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: NAbaci <abaci@linux.alibaba.com>
      Reported-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      805fa5dc
    • P
      io_uring: dont kill fasync under completion_lock · b3a9880e
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.12
      commit 7bccd1c19128140b9fefaa43808924c6932bef5b
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 4aa84f2f ]
      
            CPU0                    CPU1
             ----                    ----
        lock(&new->fa_lock);
                                     local_irq_disable();
                                     lock(&ctx->completion_lock);
                                     lock(&new->fa_lock);
        <Interrupt>
          lock(&ctx->completion_lock);
      
       *** DEADLOCK ***
      
      Move kill_fasync() out of io_commit_cqring() to io_cqring_ev_posted(),
      so it doesn't hold completion_lock while doing it. That saves from the
      reported deadlock, and it's just nice to shorten the locking time and
      untangle nested locks (compl_lock -> wq_head::lock).
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: syzbot+91ca3f25bd7f795f019c@syzkaller.appspotmail.com
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      b3a9880e
    • P
      io_uring: fix skipping disabling sqo on exec · 417ce493
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.12
      commit 186725a80c4e931b6fe31b94d66c989d5f2354c1
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 0b5cd6c3 ]
      
      If there are no requests at the time __io_uring_task_cancel() is called,
      tctx_inflight() returns zero and and it terminates not getting a chance
      to go through __io_uring_files_cancel() and do
      io_disable_sqo_submit(). And we absolutely want them disabled by the
      time cancellation ends.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      417ce493
    • P
      io_uring: fix uring_flush in exit_files() warning · 8940e1a8
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.12
      commit 54b4c4f9aba9e5d1ef6877f42a57895b189107c9
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 4325cb49 ]
      
      WARNING: CPU: 1 PID: 11100 at fs/io_uring.c:9096
      	io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
      RIP: 0010:io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
      Call Trace:
       filp_close+0xb4/0x170 fs/open.c:1280
       close_files fs/file.c:401 [inline]
       put_files_struct fs/file.c:416 [inline]
       put_files_struct+0x1cc/0x350 fs/file.c:413
       exit_files+0x7e/0xa0 fs/file.c:433
       do_exit+0xc22/0x2ae0 kernel/exit.c:820
       do_group_exit+0x125/0x310 kernel/exit.c:922
       get_signal+0x3e9/0x20a0 kernel/signal.c:2770
       arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      An SQPOLL ring creator task may have gotten rid of its file note during
      exit and called io_disable_sqo_submit(), but the io_uring is still left
      referenced through fdtable, which will be put during close_files() and
      cause a false positive warning.
      
      First split the warning into two for more clarity when is hit, and the
      add sqo_dead check to handle the described case.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: syzbot+a32b546d58dde07875a1@syzkaller.appspotmail.com
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      8940e1a8
    • P
      io_uring: fix false positive sqo warning on flush · 79ee5666
      Pavel Begunkov 提交于
      stable inclusion
      from stable-5.10.12
      commit 0682759126bc761c325325ca809ae99c93fda2a0
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 6b393a1f ]
      
      WARNING: CPU: 1 PID: 9094 at fs/io_uring.c:8884
      	io_disable_sqo_submit+0x106/0x130 fs/io_uring.c:8884
      Call Trace:
       io_uring_flush+0x28b/0x3a0 fs/io_uring.c:9099
       filp_close+0xb4/0x170 fs/open.c:1280
       close_fd+0x5c/0x80 fs/file.c:626
       __do_sys_close fs/open.c:1299 [inline]
       __se_sys_close fs/open.c:1297 [inline]
       __x64_sys_close+0x2f/0xa0 fs/open.c:1297
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      io_uring's final close() may be triggered by any task not only the
      creator. It's well handled by io_uring_flush() including SQPOLL case,
      though a warning in io_disable_sqo_submit() will fallaciously fire by
      moving this warning out to the only call site that matters.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: syzbot+2f5d1785dc624932da78@syzkaller.appspotmail.com
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      79ee5666