• O
    btrfs: fix double __endio_write_update_ordered in direct I/O · c36cac28
    Omar Sandoval 提交于
    In btrfs_submit_direct(), if we fail to allocate the btrfs_dio_private,
    we complete the ordered extent range. However, we don't mark that the
    range doesn't need to be cleaned up from btrfs_direct_IO() until later.
    Therefore, if we fail to allocate the btrfs_dio_private, we complete the
    ordered extent range twice. We could fix this by updating
    unsubmitted_oe_range earlier, but it's cleaner to reorganize the code so
    that creating the btrfs_dio_private and submitting the bios are
    separate, and once the btrfs_dio_private is created, cleanup always
    happens through the btrfs_dio_private.
    
    The logic around unsubmitted_oe_range_end and unsubmitted_oe_range_start
    is really subtle. We have the following:
    
      1. btrfs_direct_IO sets those two to the same value.
    
      2. When we call __blockdev_direct_IO unless
         btrfs_get_blocks_direct->btrfs_get_blocks_direct_write is called to
         modify unsubmitted_oe_range_start so that start < end. Cleanup
         won't happen.
    
      3. We come into btrfs_submit_direct - if it dip allocation fails we'd
         return with oe_range_end now modified so cleanup will happen.
    
      4. If we manage to allocate the dip we reset the unsubmitted range
         members to be equal so that cleanup happens from
         btrfs_endio_direct_write.
    
    This 4-step logic is not really obvious, especially given it's scattered
    across 3 functions.
    
    Fixes: f28a4928 ("Btrfs: fix leaking of ordered extents after direct IO write error")
    Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
    Reviewed-by: NNikolay Borisov <nborisov@suse.com>
    Signed-off-by: NOmar Sandoval <osandov@fb.com>
    [ add range start/end logic explanation from Nikolay ]
    Signed-off-by: NDavid Sterba <dsterba@suse.com>
    c36cac28
inode.c 289.0 KB