1. 30 4月, 2017 1 次提交
  2. 28 2月, 2017 1 次提交
  3. 10 2月, 2017 1 次提交
    • J
      ext4: fix stripe-unaligned allocations · d9b22cf9
      Jan Kara 提交于
      When a filesystem is created using:
      
      	mkfs.ext4 -b 4096 -E stride=512 <dev>
      
      and we try to allocate 64MB extent, we will end up directly in
      ext4_mb_complex_scan_group(). This is because the request is detected
      as power-of-two allocation (so we start in ext4_mb_regular_allocator()
      with ac_criteria == 0) however the check before
      ext4_mb_simple_scan_group() refuses the direct buddy scan because the
      allocation request is too large. Since cr == 0, the check whether we
      should use ext4_mb_scan_aligned() fails as well and we fall back to
      ext4_mb_complex_scan_group().
      
      Fix the problem by checking for upper limit on power-of-two requests
      directly when detecting them.
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      d9b22cf9
  4. 28 1月, 2017 1 次提交
  5. 23 1月, 2017 1 次提交
    • T
      ext4: replace BUG_ON with WARN_ON in mb_find_extent() · 43c73221
      Theodore Ts'o 提交于
      The last BUG_ON in mb_find_extent() is apparently triggering in some
      rare cases.  Most of the time it indicates a bug in the buddy bitmap
      algorithms, but there are some weird cases where it can trigger when
      buddy bitmap is still in memory, but the block bitmap has to be read
      from disk, and there is disk or memory corruption such that the block
      bitmap and the buddy bitmap are out of sync.
      
      Google-Bug-Id: #33702157
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      43c73221
  6. 15 11月, 2016 2 次提交
  7. 15 7月, 2016 1 次提交
    • V
      ext4: fix reference counting bug on block allocation error · 554a5ccc
      Vegard Nossum 提交于
      If we hit this error when mounted with errors=continue or
      errors=remount-ro:
      
          EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata
      
      then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
      continue. However, ext4_mb_release_context() is the wrong thing to call
      here since we are still actually using the allocation context.
      
      Instead, just error out. We could retry the allocation, but there is a
      possibility of getting stuck in an infinite loop instead, so this seems
      safer.
      
      [ Fixed up so we don't return EAGAIN to userspace. --tytso ]
      
      Fixes: 8556e8f3 ("ext4: Don't allow new groups to be added during block allocation")
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      554a5ccc
  8. 27 6月, 2016 1 次提交
  9. 30 5月, 2016 1 次提交
  10. 06 5月, 2016 2 次提交
    • N
      ext4: silence UBSAN in ext4_mb_init() · 935244cd
      Nicolai Stange 提交于
      Currently, in ext4_mb_init(), there's a loop like the following:
      
        do {
          ...
          offset += 1 << (sb->s_blocksize_bits - i);
          i++;
        } while (i <= sb->s_blocksize_bits + 1);
      
      Note that the updated offset is used in the loop's next iteration only.
      
      However, at the last iteration, that is at i == sb->s_blocksize_bits + 1,
      the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3))
      and UBSAN reports
      
        UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15
        shift exponent 4294967295 is too large for 32-bit type 'int'
        [...]
        Call Trace:
         [<ffffffff818c4d25>] dump_stack+0xbc/0x117
         [<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169
         [<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e
         [<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
         [<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
         [<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390
         [<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0
         [<ffffffff814293c7>] ? create_cache+0x57/0x1f0
         [<ffffffff8142948a>] ? create_cache+0x11a/0x1f0
         [<ffffffff821c2168>] ? mutex_lock+0x38/0x60
         [<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50
         [<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0
         [<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0
         [<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0
         [...]
      
      Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1.
      
      Unless compilers start to do some fancy transformations (which at least
      GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
      such calculated value of offset is never used again.
      
      Silence UBSAN by introducing another variable, offset_incr, holding the
      next increment to apply to offset and adjust that one by right shifting it
      by one position per loop iteration.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      935244cd
    • N
      ext4: address UBSAN warning in mb_find_order_for_block() · b5cb316c
      Nicolai Stange 提交于
      Currently, in mb_find_order_for_block(), there's a loop like the following:
      
        while (order <= e4b->bd_blkbits + 1) {
          ...
          bb += 1 << (e4b->bd_blkbits - order);
        }
      
      Note that the updated bb is used in the loop's next iteration only.
      
      However, at the last iteration, that is at order == e4b->bd_blkbits + 1,
      the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports
      
        UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11
        shift exponent -1 is negative
        [...]
        Call Trace:
         [<ffffffff818c4d35>] dump_stack+0xbc/0x117
         [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169
         [<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e
         [<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
         [<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
         [<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590
         [<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80
         [<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240
         [...]
      
      Unless compilers start to do some fancy transformations (which at least
      GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
      such calculated value of bb is never used again.
      
      Silence UBSAN by introducing another variable, bb_incr, holding the next
      increment to apply to bb and adjust that one by right shifting it by one
      position per loop iteration.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b5cb316c
  11. 27 4月, 2016 1 次提交
  12. 05 4月, 2016 2 次提交
    • K
      mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage · ea1754a0
      Kirill A. Shutemov 提交于
      Mostly direct substitution with occasional adjustment or removing
      outdated comments.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea1754a0
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  13. 14 3月, 2016 1 次提交
  14. 10 3月, 2016 1 次提交
  15. 22 2月, 2016 1 次提交
  16. 12 2月, 2016 1 次提交
  17. 10 11月, 2015 1 次提交
    • A
      remove abs64() · 79211c8e
      Andrew Morton 提交于
      Switch everything to the new and more capable implementation of abs().
      Mainly to give the new abs() a bit of a workout.
      
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      79211c8e
  18. 19 10月, 2015 1 次提交
  19. 18 10月, 2015 2 次提交
    • D
      ext4: fix xfstest generic/269 double revoked buffer bug with bigalloc · 9c02ac97
      Daeho Jeong 提交于
      When you repeatly execute xfstest generic/269 with bigalloc_1k option
      enabled using the below command:
      
      "./kvm-xfstests -c bigalloc_1k -m nodelalloc -C 1000 generic/269"
      
      you can easily see the below bug message.
      
      "JBD2 unexpected failure: jbd2_journal_revoke: !buffer_revoked(bh);"
      
      This means that an already revoked buffer is erroneously revoked again
      and it is caused by doing revoke for the buffer at the wrong position
      in ext4_free_blocks(). We need to re-position the buffer revoke
      procedure for an unspecified buffer after checking the cluster boundary
      for bigalloc option. If not, some part of the cluster can be doubly
      revoked.
      Signed-off-by: NDaeho Jeong <daeho.jeong@samsung.com>
      9c02ac97
    • D
      ext4: make the bitmap read routines return real error codes · 9008a58e
      Darrick J. Wong 提交于
      Make the bitmap reaading routines return real error codes (EIO,
      EFSCORRUPTED, EFSBADCRC) which can then be reflected back to
      userspace for more precise diagnosis work.
      
      In particular, this means that mballoc no longer claims that we're out
      of memory if the block bitmaps become corrupt.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      9008a58e
  20. 24 9月, 2015 1 次提交
  21. 06 7月, 2015 1 次提交
  22. 15 6月, 2015 1 次提交
  23. 08 6月, 2015 2 次提交
    • L
      ext4: return error code from ext4_mb_good_group() · 42ac1848
      Lukas Czerner 提交于
      Currently ext4_mb_good_group() only returns 0 or 1 depending on whether
      the allocation group is suitable for use or not. However we might get
      various errors and fail while initializing new group including -EIO
      which would never get propagated up the call chain. This might lead to
      an endless loop at writeback when we're trying to find a good group to
      allocate from and we fail to initialize new group (read error for
      example).
      
      Fix this by returning proper error code from ext4_mb_good_group() and
      using it in ext4_mb_regular_allocator(). In ext4_mb_regular_allocator()
      we will always return only the first occurred error from
      ext4_mb_good_group() and we only propagate it back  to the caller if we
      do not get any other errors and we fail to allocate any blocks.
      
      Note that with other modes than errors=continue, we will fail
      immediately in ext4_mb_good_group() in case of error, however with
      errors=continue we should try to continue using the file system, that's
      why we're not going to fail immediately when we see an error from
      ext4_mb_good_group(), but rather when we fail to find a suitable block
      group to allocate from due to an problem in group initialization.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      42ac1848
    • L
      ext4: try to initialize all groups we can in case of failure on ppc64 · bbdc322f
      Lukas Czerner 提交于
      Currently on the machines with page size > block size when initializing
      block group buddy cache we initialize it for all the block group bitmaps
      in the page. However in the case of read error, checksum error, or if
      a single bitmap is in any way corrupted we would fail to initialize all
      of the bitmaps. This is problematic because we will not have access to
      the other allocation groups even though those might be perfectly fine
      and usable.
      
      Fix this by reading all the bitmaps instead of error out on the first
      problem and simply skip the bitmaps which were either not read properly,
      or are not valid.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      bbdc322f
  24. 02 6月, 2015 1 次提交
    • T
      writeback: separate out include/linux/backing-dev-defs.h · 66114cad
      Tejun Heo 提交于
      With the planned cgroup writeback support, backing-dev related
      declarations will be more widely used across block and cgroup;
      unfortunately, including backing-dev.h from include/linux/blkdev.h
      makes cyclic include dependency quite likely.
      
      This patch separates out backing-dev-defs.h which only has the
      essential definitions and updates blkdev.h to include it.  c files
      which need access to more backing-dev details now include
      backing-dev.h directly.  This takes backing-dev.h off the common
      include dependency chain making it a lot easier to use it across block
      and cgroup.
      
      v2: fs/fat build failure fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      66114cad
  25. 26 11月, 2014 2 次提交
  26. 21 11月, 2014 1 次提交
  27. 02 10月, 2014 1 次提交
  28. 05 9月, 2014 2 次提交
  29. 27 8月, 2014 1 次提交
  30. 24 8月, 2014 1 次提交
    • T
      ext4: fix BUG_ON in mb_free_blocks() · c99d1e6e
      Theodore Ts'o 提交于
      If we suffer a block allocation failure (for example due to a memory
      allocation failure), it's possible that we will call
      ext4_discard_allocated_blocks() before we've actually allocated any
      blocks.  In that case, fe_len and fe_start in ac->ac_f_ex will still
      be zero, and this will result in mb_free_blocks(inode, e4b, 0, 0)
      triggering the BUG_ON on mb_free_blocks():
      
      	BUG_ON(last >= (sb->s_blocksize << 3));
      
      Fix this by bailing out of ext4_discard_allocated_blocks() if fs_len
      is zero.
      
      Also fix a missing ext4_mb_unload_buddy() call in
      ext4_discard_allocated_blocks().
      
      Google-Bug-Id: 16844242
      
      Fixes: 86f0afd4Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      c99d1e6e
  31. 31 7月, 2014 1 次提交
    • T
      ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct · 86f0afd4
      Theodore Ts'o 提交于
      If there is a failure while allocating the preallocation structure, a
      number of blocks can end up getting marked in the in-memory buddy
      bitmap, and then not getting released.  This can result in the
      following corruption getting reported by the kernel:
      
      EXT4-fs error (device sda3): ext4_mb_generate_buddy:758: group 1126,
      12793 clusters in bitmap, 12729 in gd
      
      In that case, we need to release the blocks using mb_free_blocks().
      
      Tested: fs smoke test; also demonstrated that with injected errors,
      	the file system is no longer getting corrupted
      
      Google-Bug-Id: 16657874
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      86f0afd4
  32. 28 7月, 2014 1 次提交
  33. 15 7月, 2014 1 次提交