    btrfs: don't allow large NOWAIT direct reads · 79d3d1d1
    Committed by Josef Bacik
    Dylan and Jens reported a problem where they had an io_uring test that
    was returning short reads, and bisected it to ee5b46a3 ("btrfs:
    increase direct io read size limit to 256 sectors").
    
    The root cause is that their test was doing larger reads via io_uring
    with NOWAIT and async.  This triggered a page fault during the direct
    read; however, the first page worked just fine, and thus we submitted a
    4k read for a larger iocb.
    
    Btrfs allows partial IOs in this case specifically because we don't
    allow page faults during the IO: we attempt whatever IO we can, submit
    it, then come back, fault in the rest of the range, and try to do the
    remaining IO.
    
    However, for !is_sync_kiocb() we'll call ->ki_complete() as soon as the
    partial dio is done, which is incorrect.  In the sync case we can exit
    the iomap code, submit more IOs, and return with the amount of IO we
    were able to complete successfully.
    
    We were always doing short reads in this case, but for NOWAIT we were
    getting saved by the fact that we were limiting direct reads to
    sectorsize, and if we were larger than that we would return EAGAIN.
    
    Fix the regression by simply returning EAGAIN in the NOWAIT case with
    larger reads.  That way io_uring can retry, get the larger IO, and have
    the fault logic handle everything properly.
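
The check described above can be sketched in user-space C as follows. This
is only an illustration under stated assumptions, not the actual btrfs
change: the function name, the SKETCH_PAGE_SIZE constant, and the choice of
one 4 KiB page as the threshold for a "larger" read are all hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE; 4 KiB is assumed. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical pre-check run before a direct IO is started.
 *
 * Returns 0 if the IO may proceed, or -EAGAIN if the caller should
 * retry from a context that is allowed to block and fault pages in.
 */
static int sketch_dio_read_precheck(bool is_write, bool nowait, size_t length)
{
	/*
	 * A NOWAIT read spanning more than one page could hit a page
	 * fault after the first page was already submitted, which would
	 * complete the iocb with a short read. Refuse it up front.
	 */
	if (!is_write && nowait && length > SKETCH_PAGE_SIZE)
		return -EAGAIN;

	return 0;
}
```

With these assumptions, an 8 KiB NOWAIT read is rejected with -EAGAIN,
while a 4 KiB NOWAIT read, any blocking read, and any write pass the check.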
    
    This still leaves the AIO short read case, but that existed before this
    change.  The way to properly fix this would be to handle partial iocb
    completions, but that's a lot of work; for now, deal with the regression
    in the most straightforward way possible.

    Reported-by: Dylan Yudaken <dylany@fb.com>
    Fixes: ee5b46a3 ("btrfs: increase direct io read size limit to 256 sectors")
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
inode.c 326.7 KB