1. 01 3月, 2019 1 次提交
    • C
      fs: fix guard_bio_eod to check for real EOD errors · dce30ca9
      Carlos Maiolino 提交于
      guard_bio_eod() can truncate a segment in bio to allow it to do IO on
      odd last sectors of a device.
      
      It already checks if the IO starts past EOD, but it does not consider
      the possibility of an IO request starting within device boundaries can
      contain more than one segment past EOD.
      
      In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
      underflow bvec->bv_len.
      
      Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
      
      This situation has been found on filesystems such as isofs and vfat,
      which doesn't check the device size before mount, if the device is
      smaller than the filesystem itself, a readahead on such filesystem,
      which spans EOD, can trigger this situation, leading a call to
      zero_user() with a wrong size possibly corrupting memory.
      
      I didn't see any crash, or didn't let the system run long enough to
      check if memory corruption will be hit somewhere, but adding
      instrumentation to guard_bio_end() to check truncated_bytes size, was
      enough to see the error.
      
      The following script can trigger the error.
      
      MNT=/mnt
      IMG=./DISK.img
      DEV=/dev/loop0
      
      mkfs.vfat $IMG
      mount $IMG $MNT
      cp -R /etc $MNT &> /dev/null
      umount $MNT
      
      losetup -D
      
      losetup --find --show --sizelimit 16247280 $IMG
      mount $DEV $MNT
      
      find $MNT -type f -exec cat {} + >/dev/null
      
      Kudos to Eric Sandeen for coming up with the reproducer above
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      dce30ca9
  2. 15 2月, 2019 1 次提交
  3. 07 2月, 2019 1 次提交
    • T
      fs: ratelimit __find_get_block_slow() failure message. · 43636c80
      Tetsuo Handa 提交于
      When something let __find_get_block_slow() hit all_mapped path, it calls
      printk() for 100+ times per a second. But there is no need to print same
      message with such high frequency; it is just asking for stall warning, or
      at least bloating log files.
      
        [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.873324][T15342] b_state=0x00000029, b_size=512
        [  399.878403][T15342] device loop0 blocksize: 4096
        [  399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.890400][T15342] b_state=0x00000029, b_size=512
        [  399.895595][T15342] device loop0 blocksize: 4096
        [  399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.907471][T15342] b_state=0x00000029, b_size=512
        [  399.912506][T15342] device loop0 blocksize: 4096
      
      This patch reduces frequency to up to once per a second, in addition to
      concatenating three lines into one.
      
        [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096
      Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      43636c80
  4. 05 1月, 2019 1 次提交
  5. 08 12月, 2018 1 次提交
  6. 02 11月, 2018 1 次提交
  7. 21 10月, 2018 1 次提交
  8. 22 9月, 2018 1 次提交
  9. 30 8月, 2018 1 次提交
  10. 18 8月, 2018 1 次提交
  11. 20 6月, 2018 2 次提交
  12. 02 6月, 2018 1 次提交
  13. 12 4月, 2018 2 次提交
  14. 11 4月, 2018 1 次提交
  15. 06 4月, 2018 1 次提交
  16. 19 3月, 2018 1 次提交
    • M
      buffer.c: call thaw_super during emergency thaw · 08fdc8a0
      Mateusz Guzik 提交于
      There are 2 distinct freezing mechanisms - one operates on block
      devices and another one directly on super blocks. Both end up with the
      same result, but thaw of only one of these does not thaw the other.
      
      In particular fsfreeze --freeze uses the ioctl variant going to the
      super block. Since prior to this patch emergency thaw was not doing
      a relevant thaw, filesystems frozen with this method remained
      unaffected.
      
      The patch is a hack which adds blind unfreezing.
      
      In order to keep the super block write-locked the whole time the code
      is shuffled around and the newly introduced __iterate_supers is
      employed.
      Signed-off-by: NMateusz Guzik <mguzik@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      08fdc8a0
  17. 26 1月, 2018 1 次提交
  18. 07 1月, 2018 1 次提交
  19. 16 11月, 2017 1 次提交
  20. 11 11月, 2017 1 次提交
    • G
      fs: guard_bio_eod() needs to consider partitions · 67f2519f
      Greg Edwards 提交于
      guard_bio_eod() needs to look at the partition capacity, not just the
      capacity of the whole device, when determining if truncation is
      necessary.
      
      [   60.268688] attempt to access beyond end of device
      [   60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506
      [   60.268693] buffer_io_error: 2 callbacks suppressed
      [   60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read
      
      Fixes: 74d46992 ("block: replace bi_bdev with a gendisk pointer and partitions index")
      Cc: stable@vger.kernel.org # v4.13
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NGreg Edwards <gedwards@ddn.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      67f2519f
  21. 25 10月, 2017 1 次提交
    • M
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland 提交于
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6aa7de05
  22. 03 10月, 2017 3 次提交
  23. 02 10月, 2017 1 次提交
  24. 07 9月, 2017 4 次提交
  25. 24 8月, 2017 1 次提交
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig 提交于
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74d46992
  26. 11 7月, 2017 1 次提交
  27. 06 7月, 2017 2 次提交
  28. 05 7月, 2017 1 次提交
    • A
      fs: generic_block_bmap(): initialize all of the fields in the temp bh · 2a527d68
      Alexander Potapenko 提交于
      KMSAN (KernelMemorySanitizer, a new error detection tool) reports the
      use of uninitialized memory in ext4_update_bh_state():
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory
      CPU: 3 PID: 1 Comm: swapper/0 Tainted: G    B           4.8.0-rc6+ #597
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
       0000000000000282 ffff88003cc96f68 ffffffff81f30856 0000003000000008
       ffff88003cc96f78 0000000000000096 ffffffff8169742a ffff88003cc96ff8
       ffffffff812fc1fc 0000000000000008 ffff88003a1980e8 0000000100000000
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81f30856>] dump_stack+0xa6/0xc0 lib/dump_stack.c:51
       [<ffffffff812fc1fc>] kmsan_report+0x1ec/0x300 mm/kmsan/kmsan.c:?
       [<ffffffff812fc33b>] __msan_warning+0x2b/0x40 ??:?
       [<     inline     >] ext4_update_bh_state fs/ext4/inode.c:727
       [<ffffffff8169742a>] _ext4_get_block+0x6ca/0x8a0 fs/ext4/inode.c:759
       [<ffffffff81696d4c>] ext4_get_block+0x8c/0xa0 fs/ext4/inode.c:769
       [<ffffffff814a2d36>] generic_block_bmap+0x246/0x2b0 fs/buffer.c:2991
       [<ffffffff816ca30e>] ext4_bmap+0x5ee/0x660 fs/ext4/inode.c:3177
      ...
      origin description: ----tmp@generic_block_bmap
      ==================================================================
      
      (the line numbers are relative to 4.8-rc6, but the bug persists
      upstream)
      
      The local |tmp| is created in generic_block_bmap() and then passed into
      ext4_bmap() => ext4_get_block() => _ext4_get_block() =>
      ext4_update_bh_state(). Along the way tmp.b_page is never initialized
      before ext4_update_bh_state() checks its value.
      
      [ Use the approach suggested by Kees Cook of initializing the whole bh
        structure.]
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      2a527d68
  29. 03 7月, 2017 1 次提交
    • A
      vfs: Add page_cache_seek_hole_data helper · 334fd34d
      Andreas Gruenbacher 提交于
      Both ext4 and xfs implement seeking for the next hole or piece of data
      in unwritten extents by scanning the page cache, and both versions share
      the same bug when iterating the buffers of a page: the start offset into
      the page isn't taken into account, so when a page fits more than two
      filesystem blocks, things will go wrong.  For example, on a filesystem
      with a block size of 1k, the following command will fail:
      
        xfs_io -f -c "falloc 0 4k" \
                  -c "pwrite 1k 1k" \
                  -c "pwrite 3k 1k" \
                  -c "seek -a -r 0" foo
      
      In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
      SEEK_DATA) will return the correct result.
      
      Introduce a generic vfs helper for seeking in the page cache that gets
      this right.  The next commits will replace the filesystem specific
      implementations.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      [hch: dropped the export]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      334fd34d
  30. 28 6月, 2017 1 次提交
  31. 09 6月, 2017 1 次提交
  32. 09 5月, 2017 1 次提交