    Btrfs: eliminate races in worker stopping code · 964fb15a
    Committed by Ilya Dryomov
    The current implementation of worker threads in Btrfs has races in the
    worker stopping code, which cause all kinds of panics and lockups when
    running the btrfs/011 xfstest in a loop.  The problem is that
    btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
    check_busy_worker and __btrfs_start_workers.
    
    E.g., check_idle_worker race flow:
    
           btrfs_stop_workers():            check_idle_worker(aworker):
    - grabs the lock
    - splices the idle list into the
      working list
    - removes the first worker from the
      working list
    - releases the lock to wait for
      its kthread's completion
                                      - grabs the lock
                                      - if aworker is on the working list,
                                        moves aworker from the working list
                                        to the idle list
                                      - releases the lock
    - grabs the lock
    - puts the worker
    - removes the second worker from the
      working list
                                  ......
            btrfs_stop_workers returns, aworker is on the idle list
                     FS is umounted, memory is freed
                                  ......
                  aworker is woken up, fireworks ensue
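
    To make the interleaving above concrete, here is a hypothetical,
    single-threaded C model that replays it step by step. It is not the
    btrfs code: struct pool, check_idle(), stop_all(), leaks_idle_worker()
    and the `stopping` flag are illustrative names, and consulting a flag
    set under the lock is only one assumed way of closing this particular
    race, not necessarily how this commit does it. List membership is
    modelled as an enum rather than real lists, and the "wait for its
    kthread's completion" step is simply the window in which the lock is
    dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the race; names are illustrative, not btrfs code. */
enum wlist { ON_NONE, ON_IDLE, ON_WORKING };

struct pool {
    pthread_mutex_t lock;
    bool stopping;           /* set under lock when teardown starts */
    bool use_stopping_flag;  /* assumed fix: consult `stopping` in check_idle */
    enum wlist list[4];      /* which list each worker is on */
    size_t n;                /* number of workers in use */
};

/* Models check_idle_worker(): move a worker from the working to the idle list. */
static void check_idle(struct pool *p, size_t w)
{
    pthread_mutex_lock(&p->lock);
    bool blocked = p->use_stopping_flag && p->stopping;
    if (!blocked && p->list[w] == ON_WORKING)
        p->list[w] = ON_IDLE;
    pthread_mutex_unlock(&p->lock);
}

/* Models btrfs_stop_workers(); worker `aw` is idle-checked once while the lock is dropped. */
static void stop_all(struct pool *p, size_t aw)
{
    bool raced = false;

    assert(aw < p->n);
    pthread_mutex_lock(&p->lock);
    p->stopping = true;
    for (size_t i = 0; i < p->n; i++)        /* splice idle list into working list */
        if (p->list[i] == ON_IDLE)
            p->list[i] = ON_WORKING;
    for (;;) {
        size_t i = 0;
        while (i < p->n && p->list[i] != ON_WORKING)
            i++;
        if (i == p->n)
            break;                            /* working list is empty */
        p->list[i] = ON_NONE;                 /* remove the first working worker */
        pthread_mutex_unlock(&p->lock);       /* window: waiting for its kthread */
        if (!raced) {
            check_idle(p, aw);                /* the interleaved idle check */
            raced = true;
        }
        pthread_mutex_lock(&p->lock);         /* relock before taking the next one */
    }
    pthread_mutex_unlock(&p->lock);
}

/* Returns true if any worker is still on the idle list after stop_all(). */
static bool leaks_idle_worker(bool use_stopping_flag)
{
    struct pool p = { .use_stopping_flag = use_stopping_flag, .n = 3 };

    pthread_mutex_init(&p.lock, NULL);
    for (size_t i = 0; i < p.n; i++)
        p.list[i] = ON_IDLE;
    stop_all(&p, 2);                          /* worker 2 plays "aworker" */
    pthread_mutex_destroy(&p.lock);
    for (size_t i = 0; i < p.n; i++)
        if (p.list[i] == ON_IDLE)
            return true;
    return false;
}
```

    With the flag ignored, leaks_idle_worker(false) returns true: worker 2
    is moved to the idle list during the first unlocked window and is never
    revisited, so stop_all() returns with it still there, as in the flow
    above. With the flag consulted, leaks_idle_worker(true) returns false.
    The model only covers the check_idle_worker case; check_busy_worker and
    __btrfs_start_workers would need the same kind of synchronization.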
    
    With this applied, I wasn't able to trigger the problem in 48 hours,
    whereas previously I could reliably reproduce at least one of these
    races within an hour.

    Reported-by: David Sterba <dsterba@suse.cz>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    Signed-off-by: Josef Bacik <jbacik@fusionio.com>