1. 22 2月, 2009 1 次提交
    • T
      ext4: Add fallback for find_group_flex · 05bf9e83
      Theodore Ts'o 提交于
      This is a workaround for find_group_flex() which badly needs to be
      replaced.  One of its problems (besides ignoring the Orlov algorithm)
      is that it is a bit hyperactive about returning failure under
      suspicious circumstances.  This can lead to spurious ENOSPC failures
      even when there are inodes still available.
      
      Work around this for now by retrying the search using
      find_group_other() if find_group_flex() returns -1.  If
      find_group_other() succeeds when find_group_flex() has failed, log a
      warning message.
      
      A better block/inode allocator that will fix this problem for real has
      been queued up for the next merge window.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      05bf9e83
  2. 07 1月, 2009 1 次提交
    • T
      ext4: Remove "extents" mount option · 83982b6f
      Theodore Ts'o 提交于
      This mount option is largely superfluous, and in fact the way it was
      implemented was buggy; if a filesystem which did not have the extents
      feature flag was mounted -o extents, the filesystem would attempt to
      create and use extents-based file even though the extents feature flag
      was not eabled.  The simplest thing to do is to nuke the mount option
      entirely.  It's not all that useful to force the non-creation of new
      extent-based files if the filesystem can support it.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      83982b6f
  3. 04 1月, 2009 1 次提交
  4. 06 1月, 2009 3 次提交
    • A
      ext4: mark the blocks/inode bitmap beyond end of group as used · 648f5879
      Aneesh Kumar K.V 提交于
      We need to mark the block/inode bitmap beyond the end of the group
      with '1'.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      648f5879
    • A
      ext4: Use new buffer_head flag to check uninit group bitmaps initialization · 2ccb5fb9
      Aneesh Kumar K.V 提交于
      For uninit block group, the on-disk bitmap is not initialized. That
      implies we cannot depend on the uptodate flag on the bitmap
      buffer_head to find bitmap validity.  Use a new buffer_head flag which
      would be set after we properly initialize the bitmap.  This also
      prevents (re-)initializing the uninit group bitmap every time we call 
      ext4_read_block_bitmap().
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      2ccb5fb9
    • A
      ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() · 39341867
      Aneesh Kumar K.V 提交于
      We need to make sure we update the inode bitmap and clear
      EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since
      ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide
      whether to initialize the inode bitmap each time it is called.
      (introduced by commit c806e68f.)
      
      ext4_read_inode_bitmap does:
      
      spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
      if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
      	ext4_init_inode_bitmap(sb, bh, block_group, desc);
      
      and ext4_new_inode does
      if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
                         ino, inode_bitmap_bh->b_data))
      		   ......
      		   ...
      spin_lock(sb_bgl_lock(sbi, group));
      
      gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
      i.e., on allocation we update the bitmap then we take the sb_bgl_lock
      and clear the EXT4_BG_INODE_UNINIT flag. What can happen is a
      parallel ext4_read_inode_bitmap can zero out the bitmap in between
      the above ext4_set_bit_atomic and spin_lock(sb_bg_lock..)
      
      The race results in below user visible errors
      EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
      EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
      EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
      # ls -al /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71
      ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      39341867
  5. 04 1月, 2009 1 次提交
  6. 06 1月, 2009 1 次提交
  7. 01 1月, 2009 1 次提交
  8. 14 11月, 2008 1 次提交
  9. 07 11月, 2008 1 次提交
  10. 06 1月, 2009 2 次提交
  11. 07 1月, 2009 1 次提交
    • F
      ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar 提交于
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which create and do reads and writes on
      a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
      to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: NFrank Mayhar <fmayhar@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0390131b
  12. 10 10月, 2008 2 次提交
    • F
      ext4: fix initialization of UNINIT bitmap blocks · c806e68f
      Frederic Bohe 提交于
      This fixes a bug which caused on-line resizing of filesystems with a
      1k blocksize to fail.  The root cause of this bug was the fact that if
      an uninitalized bitmap block gets read in by userspace (which
      e2fsprogs does try to avoid, but can happen when the blocksize is less
      than the pagesize and an adjacent blocks is read into memory)
      ext4_read_block_bitmap() was erroneously depending on the buffer
      uptodate flag to decide whether it needed to initialize the bitmap
      block in memory --- i.e., to set the standard set of blocks in use by
      a block group (superblock, bitmaps, inode table, etc.).  Essentially,
      ext4_read_block_bitmap() assumed it was the only routine that might
      try to read a block containing a block bitmap, which is simply not
      true.  
      
      To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap()
      must always initialize uninitialized bitmap blocks.  Once a block or
      inode is allocated out of that bitmap, it will be marked as
      initialized in the block group descriptor, so in general this won't
      result any extra unnecessary work.
      Signed-off-by: NFrederic Bohe <frederic.bohe@bull.net>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c806e68f
    • T
      ext4: Remove old legacy block allocator · c2ea3fde
      Theodore Ts'o 提交于
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c2ea3fde
  13. 09 9月, 2008 2 次提交
  14. 20 8月, 2008 1 次提交
    • E
      ext4: Fix bug where we return ENOSPC even though we have plenty of inodes · c001077f
      Eric Sandeen 提交于
      The find_group_flex() function starts with best_flex as the
      parent_fbg_group, which happens to have 0 inodes free.  Some of the
      flex groups searched have free blocks and free inodes, but the
      flex_freeb_ratio is < 10, so they're skipped.  Then when a group is
      compared to the current "best" flex group, it does not have more free
      blocks than "best", so it is skipped as well.
      
      This continues until no flex group with free inodes is found which has
      a proper ratio or which has more free blocks than the "best" group,
      and we're left with a "best" group that has 0 inodes free, and we
      return -ENOSPC.
      
      We fix this by changing the logic so that if the current "best" flex
      group has no inodes free, and the current one does have room, it is
      promoted to the next "best."
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c001077f
  15. 03 8月, 2008 2 次提交
    • E
      ext4: lock block groups when initializing · b5f10eed
      Eric Sandeen 提交于
      I noticed when filling a 1T filesystem with 4 threads using the
      fs_mark benchmark:
      
      fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0
      
      that I occasionally got checksum mismatch errors:
      
      EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 6935
      
      etc.  I'd reliably get 4-5 of them during the run.
      
      It appears that the problem is likely a race to init the bg's
      when the uninit_bg feature is enabled.
      
      With the patch below, which adds sb_bgl_locking around initialization,
      I was able to complete several runs with no errors or warnings.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b5f10eed
    • E
      ext4: sync up block and inode bitmap reading functions · e29d1cde
      Eric Sandeen 提交于
      ext4_read_block_bitmap and read_inode_bitmap do essentially
      the same thing, and yet they are structured quite differently.
      I came across this difference while looking at doing bg locking
      during bg initialization.
      
      This patch:
      
      * removes unnecessary casts in the error messages
      * renames read_inode_bitmap to ext4_read_inode_bitmap
      * and more substantially, restructures the inode bitmap
        reading function to be more like the block bitmap counterpart.
      
      The change to the inode bitmap reader simplifies the locking
      to be applied in the next patch.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e29d1cde
  16. 12 7月, 2008 4 次提交
  17. 30 4月, 2008 2 次提交
  18. 22 4月, 2008 1 次提交
  19. 17 4月, 2008 2 次提交
  20. 29 4月, 2008 1 次提交
  21. 26 2月, 2008 1 次提交
  22. 08 2月, 2008 1 次提交
  23. 29 1月, 2008 5 次提交
  24. 18 10月, 2007 2 次提交
    • A
      Ext4: Uninitialized Block Groups · 717d50e4
      Andreas Dilger 提交于
      In pass1 of e2fsck, every inode table in the fileystem is scanned and checked,
      regardless of whether it is in use.  This is this the most time consuming part
      of the filesystem check.  The unintialized block group feature can greatly
      reduce e2fsck time by eliminating checking of uninitialized inodes.
      
      With this feature, there is a a high water mark of used inodes for each block
      group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
      group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
      of each group descriptor is used to ensure that corruption in the group
      descriptor's bit flags does not cause incorrect operation.
      
      The feature is enabled through a mkfs option
      
      	mke2fs /dev/ -O uninit_groups
      
      A patch adding support for uninitialized block groups to e2fsprogs tools has
      been posted to the linux-ext4 mailing list.
      
      The patches have been stress tested with fsstress and fsx.  In performance
      tests testing e2fsck time, we have seen that e2fsck time on ext3 grows
      linearly with the total number of inodes in the filesytem.  In ext4 with the
      uninitialized block groups feature, the e2fsck time is constant, based
      solely on the number of used inodes rather than the total inode count.
      Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
      greatly reduce e2fsck time for users.  With performance improvement of 2-20
      times, depending on how full the filesystem is.
      
      The attached graph shows the major improvements in e2fsck times in filesystems
      with a large total inode count, but few inodes in use.
      
      In each group descriptor if we have
      
      EXT4_BG_INODE_UNINIT set in bg_flags:
              Inode table is not initialized/used in this group. So we can skip
              the consistency check during fsck.
      EXT4_BG_BLOCK_UNINIT set in bg_flags:
              No block in the group is used. So we can skip the block bitmap
              verification for this group.
      
      We also add two new fields to group descriptor as a part of
      uninitialized group patch.
      
              __le16  bg_itable_unused;       /* Unused inodes count */
              __le16  bg_checksum;            /* crc16(sb_uuid+group+desc) */
      
      bg_itable_unused:
      
      If we have EXT4_BG_INODE_UNINIT not set in bg_flags
      then bg_itable_unused will give the offset within
      the inode table till the inodes are used. This can be
      used by fsck to skip list of inodes that are marked unused.
      
      bg_checksum:
      Now that we depend on bg_flags and bg_itable_unused to determine
      the block and inode usage, we need to make sure group descriptor
      is not corrupt. We add checksum to group descriptor to
      detect corruption. If the descriptor is found to be corrupt, we
      mark all the blocks and inodes in the group used.
      Signed-off-by: NAvantika Mathur <mathur@us.ibm.com>
      Signed-off-by: NAndreas Dilger <adilger@clusterfs.com>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      717d50e4
    • C
      ext4: Remove (partial, never completed) fragment support · f077d0d7
      Coly Li 提交于
      Fragment support in ext2/3/4 was never implemented, and it probably will
      never be implemented.   So remove it from ext4.
      Signed-off-by: NColy Li <coyli@suse.de>
      Acked-by: NAndreas Dilger <adilger@clusterfs.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f077d0d7