1. 22 5月, 2017 1 次提交
    • K
      ext4: handle the rest of ext4_mb_load_buddy() ENOMEM errors · 9651e6b2
      Konstantin Khlebnikov 提交于
      I've got another report about breaking ext4 by ENOMEM error returned from
      ext4_mb_load_buddy() caused by memory shortage in memory cgroup.
      This time inside ext4_discard_preallocations().
      
      This patch replaces ext4_error() with ext4_warning() where errors returned
      from ext4_mb_load_buddy() are not fatal and handled by caller:
      * ext4_mb_discard_group_preallocations() - called before generating ENOSPC,
        we'll try to discard other group or return ENOSPC into user-space.
      * ext4_trim_all_free() - just stop trimming and return ENOMEM from ioctl.
      
      Some callers cannot handle errors, thus __GFP_NOFAIL is used for them:
      * ext4_discard_preallocations()
      * ext4_mb_discard_lg_preallocations()
      
      Fixes: adb7ef60 ("ext4: use __GFP_NOFAIL in ext4_free_blocks()")
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      9651e6b2
  2. 09 5月, 2017 1 次提交
    • M
      mm: introduce kv[mz]alloc helpers · a7c3e901
      Michal Hocko 提交于
      Patch series "kvmalloc", v5.
      
      There are many open coded kmalloc with vmalloc fallback instances in the
      tree.  Most of them are not careful enough or simply do not care about
      the underlying semantic of the kmalloc/page allocator which means that
      a) some vmalloc fallbacks are basically unreachable because the kmalloc
      part will keep retrying until it succeeds b) the page allocator can
      invoke a really disruptive steps like the OOM killer to move forward
      which doesn't sound appropriate when we consider that the vmalloc
      fallback is available.
      
      As it can be seen implementing kvmalloc requires quite an intimate
      knowledge if the page allocator and the memory reclaim internals which
      strongly suggests that a helper should be implemented in the memory
      subsystem proper.
      
      Most callers, I could find, have been converted to use the helper
      instead.  This is patch 6.  There are some more relying on __GFP_REPEAT
      in the networking stack which I have converted as well and Eric Dumazet
      was not opposed [2] to convert them as well.
      
      [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
      [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com
      
      This patch (of 9):
      
      Using kmalloc with the vmalloc fallback for larger allocations is a
      common pattern in the kernel code.  Yet we do not have any common helper
      for that and so users have invented their own helpers.  Some of them are
      really creative when doing so.  Let's just add kv[mz]alloc and make sure
      it is implemented properly.  This implementation makes sure to not make
      a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
      to not warn about allocation failures.  This also rules out the OOM
      killer as the vmalloc is a more approapriate fallback than a disruptive
      user visible action.
      
      This patch also changes some existing users and removes helpers which
      are specific for them.  In some cases this is not possible (e.g.
      ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
      require GFP_NO{FS,IO} context which is not vmalloc compatible in general
      (note that the page table allocation is GFP_KERNEL).  Those need to be
      fixed separately.
      
      While we are at it, document that __vmalloc{_node} about unsupported gfp
      mask because there seems to be a lot of confusion out there.
      kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
      superset) flags to catch new abusers.  Existing ones would have to die
      slowly.
      
      [sfr@canb.auug.org.au: f2fs fixup]
        Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>	[ext4 part]
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a7c3e901
  3. 30 4月, 2017 2 次提交
  4. 28 2月, 2017 1 次提交
  5. 10 2月, 2017 1 次提交
    • J
      ext4: fix stripe-unaligned allocations · d9b22cf9
      Jan Kara 提交于
      When a filesystem is created using:
      
      	mkfs.ext4 -b 4096 -E stride=512 <dev>
      
      and we try to allocate 64MB extent, we will end up directly in
      ext4_mb_complex_scan_group(). This is because the request is detected
      as power-of-two allocation (so we start in ext4_mb_regular_allocator()
      with ac_criteria == 0) however the check before
      ext4_mb_simple_scan_group() refuses the direct buddy scan because the
      allocation request is too large. Since cr == 0, the check whether we
      should use ext4_mb_scan_aligned() fails as well and we fall back to
      ext4_mb_complex_scan_group().
      
      Fix the problem by checking for upper limit on power-of-two requests
      directly when detecting them.
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      d9b22cf9
  6. 28 1月, 2017 1 次提交
  7. 23 1月, 2017 1 次提交
    • T
      ext4: replace BUG_ON with WARN_ON in mb_find_extent() · 43c73221
      Theodore Ts'o 提交于
      The last BUG_ON in mb_find_extent() is apparently triggering in some
      rare cases.  Most of the time it indicates a bug in the buddy bitmap
      algorithms, but there are some weird cases where it can trigger when
      buddy bitmap is still in memory, but the block bitmap has to be read
      from disk, and there is disk or memory corruption such that the block
      bitmap and the buddy bitmap are out of sync.
      
      Google-Bug-Id: #33702157
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      43c73221
  8. 15 11月, 2016 2 次提交
  9. 15 7月, 2016 1 次提交
    • V
      ext4: fix reference counting bug on block allocation error · 554a5ccc
      Vegard Nossum 提交于
      If we hit this error when mounted with errors=continue or
      errors=remount-ro:
      
          EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata
      
      then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
      continue. However, ext4_mb_release_context() is the wrong thing to call
      here since we are still actually using the allocation context.
      
      Instead, just error out. We could retry the allocation, but there is a
      possibility of getting stuck in an infinite loop instead, so this seems
      safer.
      
      [ Fixed up so we don't return EAGAIN to userspace. --tytso ]
      
      Fixes: 8556e8f3 ("ext4: Don't allow new groups to be added during block allocation")
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      554a5ccc
  10. 27 6月, 2016 1 次提交
  11. 30 5月, 2016 1 次提交
  12. 06 5月, 2016 2 次提交
    • N
      ext4: silence UBSAN in ext4_mb_init() · 935244cd
      Nicolai Stange 提交于
      Currently, in ext4_mb_init(), there's a loop like the following:
      
        do {
          ...
          offset += 1 << (sb->s_blocksize_bits - i);
          i++;
        } while (i <= sb->s_blocksize_bits + 1);
      
      Note that the updated offset is used in the loop's next iteration only.
      
      However, at the last iteration, that is at i == sb->s_blocksize_bits + 1,
      the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3))
      and UBSAN reports
      
        UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15
        shift exponent 4294967295 is too large for 32-bit type 'int'
        [...]
        Call Trace:
         [<ffffffff818c4d25>] dump_stack+0xbc/0x117
         [<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169
         [<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e
         [<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
         [<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
         [<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390
         [<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0
         [<ffffffff814293c7>] ? create_cache+0x57/0x1f0
         [<ffffffff8142948a>] ? create_cache+0x11a/0x1f0
         [<ffffffff821c2168>] ? mutex_lock+0x38/0x60
         [<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50
         [<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0
         [<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0
         [<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0
         [...]
      
      Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1.
      
      Unless compilers start to do some fancy transformations (which at least
      GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
      such calculated value of offset is never used again.
      
      Silence UBSAN by introducing another variable, offset_incr, holding the
      next increment to apply to offset and adjust that one by right shifting it
      by one position per loop iteration.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      935244cd
    • N
      ext4: address UBSAN warning in mb_find_order_for_block() · b5cb316c
      Nicolai Stange 提交于
      Currently, in mb_find_order_for_block(), there's a loop like the following:
      
        while (order <= e4b->bd_blkbits + 1) {
          ...
          bb += 1 << (e4b->bd_blkbits - order);
        }
      
      Note that the updated bb is used in the loop's next iteration only.
      
      However, at the last iteration, that is at order == e4b->bd_blkbits + 1,
      the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports
      
        UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11
        shift exponent -1 is negative
        [...]
        Call Trace:
         [<ffffffff818c4d35>] dump_stack+0xbc/0x117
         [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169
         [<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e
         [<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
         [<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
         [<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590
         [<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80
         [<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240
         [...]
      
      Unless compilers start to do some fancy transformations (which at least
      GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
      such calculated value of bb is never used again.
      
      Silence UBSAN by introducing another variable, bb_incr, holding the next
      increment to apply to bb and adjust that one by right shifting it by one
      position per loop iteration.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b5cb316c
  13. 27 4月, 2016 1 次提交
  14. 05 4月, 2016 2 次提交
    • K
      mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage · ea1754a0
      Kirill A. Shutemov 提交于
      Mostly direct substitution with occasional adjustment or removing
      outdated comments.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea1754a0
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  15. 14 3月, 2016 1 次提交
  16. 10 3月, 2016 1 次提交
  17. 22 2月, 2016 1 次提交
  18. 12 2月, 2016 1 次提交
  19. 10 11月, 2015 1 次提交
    • A
      remove abs64() · 79211c8e
      Andrew Morton 提交于
      Switch everything to the new and more capable implementation of abs().
      Mainly to give the new abs() a bit of a workout.
      
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      79211c8e
  20. 19 10月, 2015 1 次提交
  21. 18 10月, 2015 2 次提交
    • D
      ext4: fix xfstest generic/269 double revoked buffer bug with bigalloc · 9c02ac97
      Daeho Jeong 提交于
      When you repeatly execute xfstest generic/269 with bigalloc_1k option
      enabled using the below command:
      
      "./kvm-xfstests -c bigalloc_1k -m nodelalloc -C 1000 generic/269"
      
      you can easily see the below bug message.
      
      "JBD2 unexpected failure: jbd2_journal_revoke: !buffer_revoked(bh);"
      
      This means that an already revoked buffer is erroneously revoked again
      and it is caused by doing revoke for the buffer at the wrong position
      in ext4_free_blocks(). We need to re-position the buffer revoke
      procedure for an unspecified buffer after checking the cluster boundary
      for bigalloc option. If not, some part of the cluster can be doubly
      revoked.
      Signed-off-by: NDaeho Jeong <daeho.jeong@samsung.com>
      9c02ac97
    • D
      ext4: make the bitmap read routines return real error codes · 9008a58e
      Darrick J. Wong 提交于
      Make the bitmap reaading routines return real error codes (EIO,
      EFSCORRUPTED, EFSBADCRC) which can then be reflected back to
      userspace for more precise diagnosis work.
      
      In particular, this means that mballoc no longer claims that we're out
      of memory if the block bitmaps become corrupt.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      9008a58e
  22. 24 9月, 2015 1 次提交
  23. 06 7月, 2015 1 次提交
  24. 15 6月, 2015 1 次提交
  25. 08 6月, 2015 2 次提交
    • L
      ext4: return error code from ext4_mb_good_group() · 42ac1848
      Lukas Czerner 提交于
      Currently ext4_mb_good_group() only returns 0 or 1 depending on whether
      the allocation group is suitable for use or not. However we might get
      various errors and fail while initializing new group including -EIO
      which would never get propagated up the call chain. This might lead to
      an endless loop at writeback when we're trying to find a good group to
      allocate from and we fail to initialize new group (read error for
      example).
      
      Fix this by returning proper error code from ext4_mb_good_group() and
      using it in ext4_mb_regular_allocator(). In ext4_mb_regular_allocator()
      we will always return only the first occurred error from
      ext4_mb_good_group() and we only propagate it back  to the caller if we
      do not get any other errors and we fail to allocate any blocks.
      
      Note that with other modes than errors=continue, we will fail
      immediately in ext4_mb_good_group() in case of error, however with
      errors=continue we should try to continue using the file system, that's
      why we're not going to fail immediately when we see an error from
      ext4_mb_good_group(), but rather when we fail to find a suitable block
      group to allocate from due to an problem in group initialization.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      42ac1848
    • L
      ext4: try to initialize all groups we can in case of failure on ppc64 · bbdc322f
      Lukas Czerner 提交于
      Currently on the machines with page size > block size when initializing
      block group buddy cache we initialize it for all the block group bitmaps
      in the page. However in the case of read error, checksum error, or if
      a single bitmap is in any way corrupted we would fail to initialize all
      of the bitmaps. This is problematic because we will not have access to
      the other allocation groups even though those might be perfectly fine
      and usable.
      
      Fix this by reading all the bitmaps instead of error out on the first
      problem and simply skip the bitmaps which were either not read properly,
      or are not valid.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      bbdc322f
  26. 02 6月, 2015 1 次提交
    • T
      writeback: separate out include/linux/backing-dev-defs.h · 66114cad
      Tejun Heo 提交于
      With the planned cgroup writeback support, backing-dev related
      declarations will be more widely used across block and cgroup;
      unfortunately, including backing-dev.h from include/linux/blkdev.h
      makes cyclic include dependency quite likely.
      
      This patch separates out backing-dev-defs.h which only has the
      essential definitions and updates blkdev.h to include it.  c files
      which need access to more backing-dev details now include
      backing-dev.h directly.  This takes backing-dev.h off the common
      include dependency chain making it a lot easier to use it across block
      and cgroup.
      
      v2: fs/fat build failure fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      66114cad
  27. 26 11月, 2014 2 次提交
  28. 21 11月, 2014 1 次提交
  29. 02 10月, 2014 1 次提交
  30. 05 9月, 2014 2 次提交
  31. 27 8月, 2014 1 次提交
  32. 24 8月, 2014 1 次提交
    • T
      ext4: fix BUG_ON in mb_free_blocks() · c99d1e6e
      Theodore Ts'o 提交于
      If we suffer a block allocation failure (for example due to a memory
      allocation failure), it's possible that we will call
      ext4_discard_allocated_blocks() before we've actually allocated any
      blocks.  In that case, fe_len and fe_start in ac->ac_f_ex will still
      be zero, and this will result in mb_free_blocks(inode, e4b, 0, 0)
      triggering the BUG_ON on mb_free_blocks():
      
      	BUG_ON(last >= (sb->s_blocksize << 3));
      
      Fix this by bailing out of ext4_discard_allocated_blocks() if fs_len
      is zero.
      
      Also fix a missing ext4_mb_unload_buddy() call in
      ext4_discard_allocated_blocks().
      
      Google-Bug-Id: 16844242
      
      Fixes: 86f0afd4Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      c99d1e6e