1. 06 1月, 2009 2 次提交
    • A
      ext4: Use new buffer_head flag to check uninit group bitmaps initialization · 2ccb5fb9
      Aneesh Kumar K.V 提交于
      For uninit block group, the on-disk bitmap is not initialized. That
      implies we cannot depend on the uptodate flag on the bitmap
      buffer_head to find bitmap validity.  Use a new buffer_head flag which
      would be set after we properly initialize the bitmap.  This also
      prevents (re-)initializing the uninit group bitmap every time we call 
      ext4_read_block_bitmap().
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      2ccb5fb9
    • A
      ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() · 39341867
      Aneesh Kumar K.V 提交于
      We need to make sure we update the inode bitmap and clear
      EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since
      ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide
      whether to initialize the inode bitmap each time it is called.
      (introduced by commit c806e68f.)
      
      ext4_read_inode_bitmap does:
      
      spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
      if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
      	ext4_init_inode_bitmap(sb, bh, block_group, desc);
      
      and ext4_new_inode does
      if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
                         ino, inode_bitmap_bh->b_data))
      		   ......
      		   ...
      spin_lock(sb_bgl_lock(sbi, group));
      
      gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
      i.e., on allocation we update the bitmap then we take the sb_bgl_lock
      and clear the EXT4_BG_INODE_UNINIT flag. What can happen is a
      parallel ext4_read_inode_bitmap can zero out the bitmap in between
      the above ext4_set_bit_atomic and spin_lock(sb_bg_lock..)
      
      The race results in below user visible errors
      EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
      EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
      EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
      # ls -al /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71
      ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      39341867
  2. 04 1月, 2009 1 次提交
  3. 06 1月, 2009 3 次提交
  4. 07 1月, 2009 1 次提交
    • F
      ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar 提交于
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which create and do reads and writes on
      a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
      to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: NFrank Mayhar <fmayhar@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0390131b
  5. 01 1月, 2009 1 次提交
  6. 14 11月, 2008 1 次提交
  7. 07 11月, 2008 1 次提交
  8. 10 10月, 2008 2 次提交
    • F
      ext4: fix initialization of UNINIT bitmap blocks · c806e68f
      Frederic Bohe 提交于
      This fixes a bug which caused on-line resizing of filesystems with a
      1k blocksize to fail.  The root cause of this bug was the fact that if
      an uninitalized bitmap block gets read in by userspace (which
      e2fsprogs does try to avoid, but can happen when the blocksize is less
      than the pagesize and an adjacent blocks is read into memory)
      ext4_read_block_bitmap() was erroneously depending on the buffer
      uptodate flag to decide whether it needed to initialize the bitmap
      block in memory --- i.e., to set the standard set of blocks in use by
      a block group (superblock, bitmaps, inode table, etc.).  Essentially,
      ext4_read_block_bitmap() assumed it was the only routine that might
      try to read a block containing a block bitmap, which is simply not
      true.  
      
      To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap()
      must always initialize uninitialized bitmap blocks.  Once a block or
      inode is allocated out of that bitmap, it will be marked as
      initialized in the block group descriptor, so in general this won't
      result any extra unnecessary work.
      Signed-off-by: NFrederic Bohe <frederic.bohe@bull.net>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c806e68f
    • T
      ext4: Remove old legacy block allocator · c2ea3fde
      Theodore Ts'o 提交于
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c2ea3fde
  9. 09 9月, 2008 2 次提交
  10. 20 8月, 2008 1 次提交
    • E
      ext4: Fix bug where we return ENOSPC even though we have plenty of inodes · c001077f
      Eric Sandeen 提交于
      The find_group_flex() function starts with best_flex as the
      parent_fbg_group, which happens to have 0 inodes free.  Some of the
      flex groups searched have free blocks and free inodes, but the
      flex_freeb_ratio is < 10, so they're skipped.  Then when a group is
      compared to the current "best" flex group, it does not have more free
      blocks than "best", so it is skipped as well.
      
      This continues until no flex group with free inodes is found which has
      a proper ratio or which has more free blocks than the "best" group,
      and we're left with a "best" group that has 0 inodes free, and we
      return -ENOSPC.
      
      We fix this by changing the logic so that if the current "best" flex
      group has no inodes free, and the current one does have room, it is
      promoted to the next "best."
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c001077f
  11. 03 8月, 2008 2 次提交
    • E
      ext4: lock block groups when initializing · b5f10eed
      Eric Sandeen 提交于
      I noticed when filling a 1T filesystem with 4 threads using the
      fs_mark benchmark:
      
      fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0
      
      that I occasionally got checksum mismatch errors:
      
      EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 6935
      
      etc.  I'd reliably get 4-5 of them during the run.
      
      It appears that the problem is likely a race to init the bg's
      when the uninit_bg feature is enabled.
      
      With the patch below, which adds sb_bgl_locking around initialization,
      I was able to complete several runs with no errors or warnings.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b5f10eed
    • E
      ext4: sync up block and inode bitmap reading functions · e29d1cde
      Eric Sandeen 提交于
      ext4_read_block_bitmap and read_inode_bitmap do essentially
      the same thing, and yet they are structured quite differently.
      I came across this difference while looking at doing bg locking
      during bg initialization.
      
      This patch:
      
      * removes unnecessary casts in the error messages
      * renames read_inode_bitmap to ext4_read_inode_bitmap
      * and more substantially, restructures the inode bitmap
        reading function to be more like the block bitmap counterpart.
      
      The change to the inode bitmap reader simplifies the locking
      to be applied in the next patch.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e29d1cde
  12. 12 7月, 2008 4 次提交
  13. 30 4月, 2008 2 次提交
  14. 22 4月, 2008 1 次提交
  15. 17 4月, 2008 2 次提交
  16. 29 4月, 2008 1 次提交
  17. 26 2月, 2008 1 次提交
  18. 08 2月, 2008 1 次提交
  19. 29 1月, 2008 5 次提交
  20. 18 10月, 2007 2 次提交
    • A
      Ext4: Uninitialized Block Groups · 717d50e4
      Andreas Dilger 提交于
      In pass1 of e2fsck, every inode table in the fileystem is scanned and checked,
      regardless of whether it is in use.  This is this the most time consuming part
      of the filesystem check.  The unintialized block group feature can greatly
      reduce e2fsck time by eliminating checking of uninitialized inodes.
      
      With this feature, there is a a high water mark of used inodes for each block
      group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
      group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
      of each group descriptor is used to ensure that corruption in the group
      descriptor's bit flags does not cause incorrect operation.
      
      The feature is enabled through a mkfs option
      
      	mke2fs /dev/ -O uninit_groups
      
      A patch adding support for uninitialized block groups to e2fsprogs tools has
      been posted to the linux-ext4 mailing list.
      
      The patches have been stress tested with fsstress and fsx.  In performance
      tests testing e2fsck time, we have seen that e2fsck time on ext3 grows
      linearly with the total number of inodes in the filesytem.  In ext4 with the
      uninitialized block groups feature, the e2fsck time is constant, based
      solely on the number of used inodes rather than the total inode count.
      Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
      greatly reduce e2fsck time for users.  With performance improvement of 2-20
      times, depending on how full the filesystem is.
      
      The attached graph shows the major improvements in e2fsck times in filesystems
      with a large total inode count, but few inodes in use.
      
      In each group descriptor if we have
      
      EXT4_BG_INODE_UNINIT set in bg_flags:
              Inode table is not initialized/used in this group. So we can skip
              the consistency check during fsck.
      EXT4_BG_BLOCK_UNINIT set in bg_flags:
              No block in the group is used. So we can skip the block bitmap
              verification for this group.
      
      We also add two new fields to group descriptor as a part of
      uninitialized group patch.
      
              __le16  bg_itable_unused;       /* Unused inodes count */
              __le16  bg_checksum;            /* crc16(sb_uuid+group+desc) */
      
      bg_itable_unused:
      
      If we have EXT4_BG_INODE_UNINIT not set in bg_flags
      then bg_itable_unused will give the offset within
      the inode table till the inodes are used. This can be
      used by fsck to skip list of inodes that are marked unused.
      
      bg_checksum:
      Now that we depend on bg_flags and bg_itable_unused to determine
      the block and inode usage, we need to make sure group descriptor
      is not corrupt. We add checksum to group descriptor to
      detect corruption. If the descriptor is found to be corrupt, we
      mark all the blocks and inodes in the group used.
      Signed-off-by: NAvantika Mathur <mathur@us.ibm.com>
      Signed-off-by: NAndreas Dilger <adilger@clusterfs.com>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      717d50e4
    • C
      ext4: Remove (partial, never completed) fragment support · f077d0d7
      Coly Li 提交于
      Fragment support in ext2/3/4 was never implemented, and it probably will
      never be implemented.   So remove it from ext4.
      Signed-off-by: NColy Li <coyli@suse.de>
      Acked-by: NAndreas Dilger <adilger@clusterfs.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f077d0d7
  21. 17 10月, 2007 1 次提交
  22. 18 7月, 2007 1 次提交
    • K
      ext4: Add nanosecond timestamps · ef7f3835
      Kalpak Shah 提交于
      This patch adds nanosecond timestamps for ext4. This involves adding
      *time_extra fields to the ext4_inode to extend the timestamps to
      64-bits.  Creation time is also added by this patch.
      
      These extended fields will fit into an inode if the filesystem was
      formatted with large inodes (-I 256 or larger) and there are currently
      no EAs consuming all of the available space. For new inodes we always
      reserve enough space for the kernel's known extended fields, but for
      inodes created with an old kernel this might not have been the case. So
      this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature
      flag(ro-compat so that older kernels can't create inodes with a smaller
      extra_isize). which indicates if the fields fitting inside
      s_min_extra_isize are available or not.  If the expansion of inodes if
      unsuccessful then this feature will be disabled.  This feature is only
      enabled if requested by the sysadmin.
      
      None of the extended inode fields is critical for correct filesystem
      operation.
      Signed-off-by: NAndreas Dilger <adilger@clusterfs.com>
      Signed-off-by: NKalpak Shah <kalpak@clusterfs.com>
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      ef7f3835
  23. 12 10月, 2006 2 次提交
    • A
      [PATCH] ext4 64 bit divide fix · f4e5bc24
      Andrew Morton 提交于
      With CONFIG_LBD=n, sector_div() expands to a plain old divide.  But ext4 is
      _not_ passing in a sector_t as the first argument, so...
      
      fs/built-in.o: In function `ext4_get_group_no_and_offset':
      fs/ext4/balloc.c:39: undefined reference to `__umoddi3'
      fs/ext4/balloc.c:41: undefined reference to `__udivdi3'
      fs/built-in.o: In function `find_group_orlov':
      fs/ext4/ialloc.c:278: undefined reference to `__udivdi3'
      fs/built-in.o: In function `ext4_fill_super':
      fs/ext4/super.c:1488: undefined reference to `__udivdi3'
      fs/ext4/super.c:1488: undefined reference to `__umoddi3'
      fs/ext4/super.c:1594: undefined reference to `__udivdi3'
      fs/ext4/super.c:1601: undefined reference to `__umoddi3'
      
      Fix that up by calling do_div() directly.
      
      Also cast the arg to u64.  do_div() is only defined on u64, and ext4_fsblk_t
      is supposed to be opaque.
      
      Note especially the changes to find_group_orlov().  It was attempting to do
      
      	do_div(int, unsigned long long);
      
      which is royally screwed up.  Switched it to plain old divide.
      
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f4e5bc24
    • A
      [PATCH] ext4: move block number hi bits · 8fadc143
      Alexandre Ratchov 提交于
      move '_hi' bits of block numbers in the larger part of the
      block group descriptor structure
      Signed-off-by: NAlexandre Ratchov <alexandre.ratchov@bull.net>
      Signed-off-by: NDave Kleikamp <shaggy@austin.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8fadc143