1. 21 4月, 2013 1 次提交
  2. 10 4月, 2013 1 次提交
    • L
      ext4: introduce reserved space · 27dd4385
      Lukas Czerner 提交于
      Currently in ENOSPC condition when writing into unwritten space, or
      punching a hole, we might need to split the extent and grow extent tree.
      However since we can not allocate any new metadata blocks we'll have to
      zero out unwritten part of extent or punched out part of extent, or in
      the worst case return ENOSPC even though use actually does not allocate
      any space.
      
      Also in delalloc path we do reserve metadata and data blocks for the
      time we're going to write out, however metadata block reservation is
      very tricky especially since we expect that logical connectivity implies
      physical connectivity, however that might not be the case and hence we
      might end up allocating more metadata blocks than previously reserved.
      So in future, metadata reservation checks should be removed since we can
      not assure that we do not under reserve.
      
      And this is where reserved space comes into the picture. When mounting
      the file system we slice off a little bit of the file system space (2%
      or 4096 clusters, whichever is smaller) which can be then used for the
      cases mentioned above to prevent costly zeroout, or unexpected ENOSPC.
      
      The number of reserved clusters can be set via sysfs, however it can
      never be bigger than number of free clusters in the file system.
      
      Note that this patch fixes the failure of xfstest 274 as expected.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      27dd4385
  3. 04 4月, 2013 2 次提交
    • L
      ext4: introduce ext4_get_group_number() · bd86298e
      Lukas Czerner 提交于
      Currently on many places in ext4 we're using
      ext4_get_group_no_and_offset() even though we're only interested in
      knowing the block group of the particular block, not the offset within
      the block group so we can use more efficient way to compute block
      group.
      
      This patch introduces ext4_get_group_number() which computes block
      group for a given block much more efficiently. Use this function
      instead of ext4_get_group_no_and_offset() everywhere where we're only
      interested in knowing the block group.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      bd86298e
    • L
      ext4: make ext4_block_in_group() much more efficient · 68911009
      Lukas Czerner 提交于
      Currently in when getting the block group number for a particular
      block in ext4_block_in_group() we're using
      ext4_get_group_no_and_offset() which uses do_div() to get the block
      group and the remainer which is offset within the group.
      
      We don't need all of that in ext4_block_in_group() as we only need to
      figure out the group number.
      
      This commit changes ext4_block_in_group() to calculate group number
      directly. This shows as a big improvement with regards to cpu
      utilization. Measuring fallocate -l 15T on fresh file system with perf
      showed that 23% of cpu time was spend in the
      ext4_get_group_no_and_offset(). With this change it completely
      disappears from the list only bumping the occurrence of
      ext4_init_block_bitmap() which is the biggest user of
      ext4_block_in_group() by 4%. As the result of this change on my system
      the fallocate call was approx. 10% faster.
      
      However since there is '-g' option in mkfs which allow us setting
      different groups size (mostly for developers) I've introduced new per
      file system flag whether we have a standard block group size or
      not. The flag is used to determine whether we can use the bit shift
      optimization or not.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      68911009
  4. 03 3月, 2013 1 次提交
  5. 23 2月, 2013 1 次提交
    • L
      ext4: fix free clusters calculation in bigalloc filesystem · 304e220f
      Lukas Czerner 提交于
      ext4_has_free_clusters() should tell us whether there is enough free
      clusters to allocate, however number of free clusters in the file system
      is converted to blocks using EXT4_C2B() which is not only wrong use of
      the macro (we should have used EXT4_NUM_B2C) but it's also completely
      wrong concept since everything else is in cluster units.
      
      Moreover when calculating number of root clusters we should be using
      macro EXT4_NUM_B2C() instead of EXT4_B2C() otherwise the result might be
      off by one. However r_blocks_count should always be a multiple of the
      cluster ratio so doing a plain bit shift should be enough here. We
      avoid using EXT4_B2C() because it's confusing.
      
      As a result of the first problem number of free clusters is much bigger
      than it should have been and ext4_has_free_clusters() would return 1 even
      if there is really not enough free clusters available.
      
      Fix this by removing the EXT4_C2B() conversion of free clusters and
      using bit shift when calculating number of root clusters. This bug
      affects number of xfstests tests covering file system ENOSPC situation
      handling. With this patch most of the ENOSPC problems with bigalloc file
      system disappear, especially the errors caused by delayed allocation not
      having enough space when the actual allocation is finally requested.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      304e220f
  6. 13 1月, 2013 1 次提交
    • E
      ext4: check bh in ext4_read_block_bitmap() · 15b49132
      Eryu Guan 提交于
      Validate the bh pointer before using it, since
      ext4_read_block_bitmap_nowait() might return NULL.
      
      I've seen this in fsfuzz testing.
      
       EXT4-fs error (device loop0): ext4_read_block_bitmap_nowait:385: comm touch: Cannot get buffer for block bitmap - block_group = 0, block_bitmap = 3925999616
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: [<ffffffff8121de25>] ext4_wait_block_bitmap+0x25/0xe0
       ...
       Call Trace:
        [<ffffffff8121e1e5>] ext4_read_block_bitmap+0x35/0x60
        [<ffffffff8125e9c6>] ext4_free_blocks+0x236/0xb80
        [<ffffffff811d0d36>] ? __getblk+0x36/0x70
        [<ffffffff811d0a5f>] ? __find_get_block+0x8f/0x210
        [<ffffffff81191ef3>] ? kmem_cache_free+0x33/0x140
        [<ffffffff812678e5>] ext4_xattr_release_block+0x1b5/0x1d0
        [<ffffffff812679be>] ext4_xattr_delete_inode+0xbe/0x100
        [<ffffffff81222a7c>] ext4_free_inode+0x7c/0x4d0
        [<ffffffff812277b8>] ? ext4_mark_inode_dirty+0x88/0x230
        [<ffffffff8122993c>] ext4_evict_inode+0x32c/0x490
        [<ffffffff811b8cd7>] evict+0xa7/0x1c0
        [<ffffffff811b8ed3>] iput_final+0xe3/0x170
        [<ffffffff811b8f9e>] iput+0x3e/0x50
        [<ffffffff812316fd>] ext4_add_nondir+0x4d/0x90
        [<ffffffff81231d0b>] ext4_create+0xeb/0x170
        [<ffffffff811aae9c>] vfs_create+0xac/0xd0
        [<ffffffff811ac845>] lookup_open+0x185/0x1c0
        [<ffffffff8129e3b9>] ? selinux_inode_permission+0xa9/0x170
        [<ffffffff811acb54>] do_last+0x2d4/0x7a0
        [<ffffffff811af743>] path_openat+0xb3/0x480
        [<ffffffff8116a8a1>] ? handle_mm_fault+0x251/0x3b0
        [<ffffffff811afc49>] do_filp_open+0x49/0xa0
        [<ffffffff811bbaad>] ? __alloc_fd+0xdd/0x150
        [<ffffffff8119da28>] do_sys_open+0x108/0x1f0
        [<ffffffff8119db51>] sys_open+0x21/0x30
        [<ffffffff81618959>] system_call_fastpath+0x16/0x1b
      
      Also fix comment for ext4_read_block_bitmap_nowait()
      Signed-off-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      15b49132
  7. 22 10月, 2012 1 次提交
  8. 17 8月, 2012 1 次提交
  9. 01 7月, 2012 1 次提交
  10. 08 6月, 2012 1 次提交
    • T
      ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg · b0dd6b70
      Theodore Ts'o 提交于
      Ext3 filesystems that are converted to use as many ext4 file system
      features as possible will enable uninit_bg to speed up e2fsck times.
      These file systems will have a native ext3 layout of inode tables and
      block allocation bitmaps (as opposed to ext4's flex_bg layout).
      Unfortunately, in these cases, when first allocating a block in an
      uninitialized block group, ext4 would incorrectly calculate the number
      of free blocks in that block group, and then errorneously report that
      the file system was corrupt:
      
      EXT4-fs error (device vdd): ext4_mb_generate_buddy:741: group 30, 32254 clusters in bitmap, 32258 in gd
      
      This problem can be reproduced via:
      
          mke2fs -q -t ext4 -O ^flex_bg /dev/vdd 5g
          mount -t ext4 /dev/vdd /mnt
          fallocate -l 4600m /mnt/test
      
      The problem was caused by a bone headed mistake in the check to see if a
      particular metadata block was part of the block group.
      
      Many thanks to Kees Cook for finding and bisecting the buggy commit
      which introduced this bug (commit fd034a84, present since v3.2).
      Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
      Reported-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Tested-by: NKees Cook <keescook@chromium.org>
      Cc: stable@kernel.org
      b0dd6b70
  11. 16 5月, 2012 1 次提交
  12. 30 4月, 2012 2 次提交
  13. 21 2月, 2012 2 次提交
  14. 05 1月, 2012 1 次提交
  15. 22 11月, 2011 1 次提交
  16. 10 9月, 2011 11 次提交
  17. 28 6月, 2011 1 次提交
  18. 25 5月, 2011 1 次提交
    • A
      ext4: add flag to ext4_has_free_blocks · 55f020db
      Allison Henderson 提交于
      This patch adds an allocation request flag to the ext4_has_free_blocks
      function which enables the use of reserved blocks.  This will allow a
      punch hole to proceed even if the disk is full.  Punching a hole may
      require additional blocks to first split the extents.
      
      Because ext4_has_free_blocks is a low level function, the flag needs
      to be passed down through several functions listed below:
      
      ext4_ext_insert_extent
      ext4_ext_create_new_leaf
      ext4_ext_grow_indepth
      ext4_ext_split
      ext4_ext_new_meta_block
      ext4_mb_new_blocks
      ext4_claim_free_blocks
      ext4_has_free_blocks
      
      [ext4 punch hole patch series 1/5 v7]
      Signed-off-by: NAllison Henderson <achender@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NMingming Cao <cmm@us.ibm.com>
      55f020db
  19. 09 5月, 2011 1 次提交
  20. 02 5月, 2011 1 次提交
  21. 31 3月, 2011 1 次提交
  22. 22 3月, 2011 1 次提交
  23. 11 1月, 2011 1 次提交
  24. 28 10月, 2010 2 次提交
  25. 12 6月, 2010 1 次提交
    • T
      ext4: Clean up s_dirt handling · a0375156
      Theodore Ts'o 提交于
      We don't need to set s_dirt in most of the ext4 code when journaling
      is enabled.  In ext3/4 some of the summary statistics for # of free
      inodes, blocks, and directories are calculated from the per-block
      group statistics when the file system is mounted or unmounted.  As a
      result the superblock doesn't have to be updated, either via the
      journal or by setting s_dirt.  There are a few exceptions, most
      notably when resizing the file system, where the superblock needs to
      be modified --- and in that case it should be done as a journalled
      operation if possible, and s_dirt set only in no-journal mode.
      
      This patch will optimize out some unneeded disk writes when using ext4
      with a journal.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a0375156
  26. 16 5月, 2010 1 次提交
    • E
      ext4: don't use quota reservation for speculative metadata · 72b8ab9d
      Eric Sandeen 提交于
      Because we can badly over-reserve metadata when we
      calculate worst-case, it complicates things for quota, since
      we must reserve and then claim later, retry on EDQUOT, etc.
      Quota is also a generally smaller pool than fs free blocks,
      so this over-reservation hurts more, and more often.
      
      I'm of the opinion that it's not the worst thing to allow
      metadata to push a user slightly over quota.  This simplifies
      the code and avoids the false quota rejections that result
      from worst-case speculation.
      
      This patch stops the speculative quota-charging for
      worst-case metadata requirements, and just charges quota
      when the blocks are allocated at writeout.  It also is
      able to remove the try-again loop on EDQUOT.
      
      This patch has been tested indirectly by running the xfstests
      suite with a hack to mount & enable quota prior to the test.
      
      I also did a more specific test of fragmenting freespace
      and then doing a large delalloc write under quota; quota
      stopped me at the right amount of file IO, and then the
      writeout generated enough metadata (due to the fragmentation)
      that it put me slightly over quota, as expected.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      72b8ab9d