    io_uring: check kthread stopped flag when sq thread is unparked · c52fc272
    Committed by Xiaoguang Wang
    stable inclusion
    from stable-5.10.5
    commit b5a2f093b6b16db004619d6403f68c75ee85d794
    bugzilla: 46931
    
    --------------------------------
    
    commit 65b2b213 upstream.
    
    syzbot reports the following issue:
    INFO: task syz-executor.2:12399 can't die for more than 143 seconds.
    task:syz-executor.2  state:D stack:28744 pid:12399 ppid:  8504 flags:0x00004004
    Call Trace:
     context_switch kernel/sched/core.c:3773 [inline]
     __schedule+0x893/0x2170 kernel/sched/core.c:4522
     schedule+0xcf/0x270 kernel/sched/core.c:4600
     schedule_timeout+0x1d8/0x250 kernel/time/timer.c:1847
     do_wait_for_common kernel/sched/completion.c:85 [inline]
     __wait_for_common kernel/sched/completion.c:106 [inline]
     wait_for_common kernel/sched/completion.c:117 [inline]
     wait_for_completion+0x163/0x260 kernel/sched/completion.c:138
     kthread_stop+0x17a/0x720 kernel/kthread.c:596
     io_put_sq_data fs/io_uring.c:7193 [inline]
     io_sq_thread_stop+0x452/0x570 fs/io_uring.c:7290
     io_finish_async fs/io_uring.c:7297 [inline]
     io_sq_offload_create fs/io_uring.c:8015 [inline]
     io_uring_create fs/io_uring.c:9433 [inline]
     io_uring_setup+0x19b7/0x3730 fs/io_uring.c:9507
     do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x45deb9
    Code: Unable to access opcode bytes at RIP 0x45de8f.
    RSP: 002b:00007f174e51ac78 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
    RAX: ffffffffffffffda RBX: 0000000000008640 RCX: 000000000045deb9
    RDX: 0000000000000000 RSI: 0000000020000140 RDI: 00000000000050e5
    RBP: 000000000118bf58 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
    R13: 00007ffed9ca723f R14: 00007f174e51b9c0 R15: 000000000118bf2c
    INFO: task syz-executor.2:12399 blocked for more than 143 seconds.
          Not tainted 5.10.0-rc3-next-20201110-syzkaller #0
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    
    We don't have a reproducer yet, but there seems to be a race in the
    current code:
    => io_put_sq_data
          ctx_list is empty now.       |
    ==> kthread_park(sqd->thread);     |
                                       | T1: sq thread is parked now.
    ==> kthread_stop(sqd->thread);     |
        KTHREAD_SHOULD_STOP is set now.|
    Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
    
    ===> kthread_unpark(k);            |
                                       | T2: sq thread is now unparked and runs again.
                                       |
                                       | T3: sq thread is now preempted out.
                                       |
    ===> wake_up_process(k);           |
                                       |
                                       | T4: Since the sqd ctx_list is empty, needs_sched will be true,
                                       | so the sq thread sets its task state to TASK_INTERRUPTIBLE
                                       | and schedules; the sq thread will now never be woken up.
    ===> wait_for_completion           |
    
    I artificially used mdelay() to simulate the race above and got the same
    stack as in this syzbot report, but to be honest, I'm not sure this race
    is what triggers the syzbot report.
    
    To fix this possible race, check whether the sq thread has been stopped
    when it is unparked.
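
The interleaving and the proposed check can be sketched as a small, deterministic userspace model. This is not the kernel patch: the struct, the enum, and every model_* name below are hypothetical, and the real sq thread loop is reduced to the single step that matters here (what the thread does when it resumes after being unparked). In the kernel, the corresponding check would presumably use kthread_should_stop() right after kthread_parkme() returns.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the race; not kernel code. */
enum model_task_state { MODEL_RUNNING, MODEL_SLEEPING, MODEL_EXITED };

struct model_sq_thread {
    bool should_stop;             /* models KTHREAD_SHOULD_STOP */
    bool ctx_list_empty;          /* models an empty sqd ctx_list */
    enum model_task_state state;
};

/* Models wake_up_process(): it only affects a sleeping task. */
static void model_wake_up(struct model_sq_thread *t)
{
    if (t->state == MODEL_SLEEPING)
        t->state = MODEL_RUNNING;
}

/* Models the sq thread resuming after unpark (T4 in the diagram). */
static void model_resume_after_unpark(struct model_sq_thread *t,
                                      bool check_stop_after_unpark)
{
    /* The proposed fix: re-check the stop flag once unparked. */
    if (check_stop_after_unpark && t->should_stop) {
        t->state = MODEL_EXITED;
        return;
    }
    /* Empty ctx_list means needs_sched is true, so the thread sleeps. */
    if (t->ctx_list_empty)
        t->state = MODEL_SLEEPING;
}

/* Returns true if the stopper would wait forever for the thread to exit. */
static bool model_stop_hangs(bool check_stop_after_unpark)
{
    struct model_sq_thread t = {
        .should_stop = false,
        .ctx_list_empty = true,
        .state = MODEL_RUNNING,
    };

    t.should_stop = true;   /* kthread_stop() sets the flag while parked */
    /* T2/T3: thread is unparked and runnable, but preempted. */
    model_wake_up(&t);      /* wakeup hits a running task: no effect */
    model_resume_after_unpark(&t, check_stop_after_unpark);

    /* wait_for_completion() only returns once the thread has exited. */
    return t.state != MODEL_EXITED;
}
```

In this model, model_stop_hangs(false) reports a hang because the only wakeup is consumed before the thread goes to sleep, while model_stop_hangs(true) does not, because the thread exits before it can sleep.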
    
    Reported-by: syzbot+03beeb595f074db9cfd1@syzkaller.appspotmail.com
    Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Chen Jun <chenjun102@huawei.com>