1. 09 10月, 2008 1 次提交
    • A
      ext4: Make sure all the block allocation paths reserve blocks · a30d542a
      Aneesh Kumar K.V 提交于
      With delayed allocation we need to make sure block are reserved before
      we attempt to allocate them. Otherwise we get block allocation failure
      (ENOSPC) during writepages which cannot be handled. This would mean
      silent data loss (We do a printk stating data will be lost). This patch
      updates the DIO and fallocate code path to do block reservation before
      block allocation. This is needed to make sure parallel DIO and fallocate
      request doesn't take block out of delayed reserve space.
      
      When free blocks count go below a threshold we switch to a slow patch
      which looks at other CPU's accumulated percpu counter values.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a30d542a
  2. 09 9月, 2008 2 次提交
  3. 10 10月, 2008 1 次提交
  4. 20 8月, 2008 1 次提交
  5. 03 8月, 2008 2 次提交
    • E
      ext4: lock block groups when initializing · b5f10eed
      Eric Sandeen 提交于
      I noticed when filling a 1T filesystem with 4 threads using the
      fs_mark benchmark:
      
      fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0
      
      that I occasionally got checksum mismatch errors:
      
      EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 6935
      
      etc.  I'd reliably get 4-5 of them during the run.
      
      It appears that the problem is likely a race to init the bg's
      when the uninit_bg feature is enabled.
      
      With the patch below, which adds sb_bgl_locking around initialization,
      I was able to complete several runs with no errors or warnings.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b5f10eed
    • E
      ext4: sync up block and inode bitmap reading functions · e29d1cde
      Eric Sandeen 提交于
      ext4_read_block_bitmap and read_inode_bitmap do essentially
      the same thing, and yet they are structured quite differently.
      I came across this difference while looking at doing bg locking
      during bg initialization.
      
      This patch:
      
      * removes unnecessary casts in the error messages
      * renames read_inode_bitmap to ext4_read_inode_bitmap
      * and more substantially, restructures the inode bitmap
        reading function to be more like the block bitmap counterpart.
      
      The change to the inode bitmap reader simplifies the locking
      to be applied in the next patch.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e29d1cde
  6. 15 7月, 2008 1 次提交
    • M
      ext4: delayed allocation ENOSPC handling · d2a17637
      Mingming Cao 提交于
      This patch does block reservation for delayed
      allocation, to avoid ENOSPC later at page flush time.
      
      Blocks(data and metadata) are reserved at da_write_begin()
      time, the freeblocks counter is updated by then, and the number of
      reserved blocks is store in per inode counter.
              
      At the writepage time, the unused reserved meta blocks are returned
      back. At unlink/truncate time, reserved blocks are properly released.
      
      Updated fix from  Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      to fix the oldallocator block reservation accounting with delalloc, added
      lock to guard the counters and also fix the reservation for meta blocks.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      d2a17637
  7. 12 7月, 2008 9 次提交
  8. 04 6月, 2008 1 次提交
  9. 16 5月, 2008 1 次提交
  10. 30 4月, 2008 1 次提交
  11. 17 4月, 2008 3 次提交
  12. 07 2月, 2008 2 次提交
  13. 29 1月, 2008 6 次提交
  14. 14 11月, 2007 1 次提交
    • L
      Revert "ext2/ext3/ext4: add block bitmap validation" · 0b832a4b
      Linus Torvalds 提交于
      This reverts commit 7c9e69fa, fixing up
      conflicts in fs/ext4/balloc.c manually.
      
      The cost of doing the bitmap validation on each lookup - even when the
      bitmap is cached - is absolutely prohibitive.  We could, and probably
      should, do it only when adding the bitmap to the buffer cache.  However,
      right now we are better off just reverting it.
      
      Peter Zijlstra measured the cost of this extra validation as a 85%
      decrease in cached iozone, and while I had a patch that took it down to
      just 17% by not being _quite_ so stupid in the validation, it was still
      a big slowdown that could have been avoided by just doing it right.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andreas Dilger <adilger@clusterfs.com>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0b832a4b
  15. 18 10月, 2007 2 次提交
    • A
      ext4: Convert bg_inode_bitmap and bg_inode_table · 5272f837
      Aneesh Kumar K.V 提交于
      Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo
      and bg_inode_table_lo.  This helps in finding BUGs due to
      direct partial access of these split 64 bit values
      
      Also fix one direct partial access
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      5272f837
    • A
      Ext4: Uninitialized Block Groups · 717d50e4
      Andreas Dilger 提交于
      In pass1 of e2fsck, every inode table in the fileystem is scanned and checked,
      regardless of whether it is in use.  This is this the most time consuming part
      of the filesystem check.  The unintialized block group feature can greatly
      reduce e2fsck time by eliminating checking of uninitialized inodes.
      
      With this feature, there is a a high water mark of used inodes for each block
      group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
      group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
      of each group descriptor is used to ensure that corruption in the group
      descriptor's bit flags does not cause incorrect operation.
      
      The feature is enabled through a mkfs option
      
      	mke2fs /dev/ -O uninit_groups
      
      A patch adding support for uninitialized block groups to e2fsprogs tools has
      been posted to the linux-ext4 mailing list.
      
      The patches have been stress tested with fsstress and fsx.  In performance
      tests testing e2fsck time, we have seen that e2fsck time on ext3 grows
      linearly with the total number of inodes in the filesytem.  In ext4 with the
      uninitialized block groups feature, the e2fsck time is constant, based
      solely on the number of used inodes rather than the total inode count.
      Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
      greatly reduce e2fsck time for users.  With performance improvement of 2-20
      times, depending on how full the filesystem is.
      
      The attached graph shows the major improvements in e2fsck times in filesystems
      with a large total inode count, but few inodes in use.
      
      In each group descriptor if we have
      
      EXT4_BG_INODE_UNINIT set in bg_flags:
              Inode table is not initialized/used in this group. So we can skip
              the consistency check during fsck.
      EXT4_BG_BLOCK_UNINIT set in bg_flags:
              No block in the group is used. So we can skip the block bitmap
              verification for this group.
      
      We also add two new fields to group descriptor as a part of
      uninitialized group patch.
      
              __le16  bg_itable_unused;       /* Unused inodes count */
              __le16  bg_checksum;            /* crc16(sb_uuid+group+desc) */
      
      bg_itable_unused:
      
      If we have EXT4_BG_INODE_UNINIT not set in bg_flags
      then bg_itable_unused will give the offset within
      the inode table till the inodes are used. This can be
      used by fsck to skip list of inodes that are marked unused.
      
      bg_checksum:
      Now that we depend on bg_flags and bg_itable_unused to determine
      the block and inode usage, we need to make sure group descriptor
      is not corrupt. We add checksum to group descriptor to
      detect corruption. If the descriptor is found to be corrupt, we
      mark all the blocks and inodes in the group used.
      Signed-off-by: NAvantika Mathur <mathur@us.ibm.com>
      Signed-off-by: NAndreas Dilger <adilger@clusterfs.com>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      717d50e4
  16. 17 10月, 2007 3 次提交
  17. 18 7月, 2007 1 次提交
  18. 17 7月, 2007 1 次提交
  19. 01 6月, 2007 1 次提交