1. 31 3月, 2009 1 次提交
  2. 13 3月, 2009 1 次提交
    • T
      ext4: New inode/block allocation algorithms for flex_bg filesystems · a4912123
      Theodore Ts'o 提交于
      The find_group_flex() inode allocator is now only used if the
      filesystem is mounted using the "oldalloc" mount option.  It is
      replaced with the original Orlov allocator that has been updated for
      flex_bg filesystems (it should behave the same way if flex_bg is
      disabled).  The inode allocator now functions by taking into account
      each flex_bg group, instead of each block group, when deciding whether
      or not it's time to allocate a new directory into a fresh flex_bg.
      
      The block allocator has also been changed so that the first block
      group in each flex_bg is preferred for use for storing directory
      blocks.  This keeps directory blocks close together, which is good for
      speeding up e2fsck since large directories are more likely to look
      like this:
      
      debugfs:  stat /home/tytso/Maildir/cur
      Inode: 1844562   Type: directory    Mode:  0700   Flags: 0x81000
      Generation: 1132745781    Version: 0x00000000:0000ad71
      User: 15806   Group: 15806   Size: 1060864
      File ACL: 0    Directory ACL: 0
      Links: 2   Blockcount: 2072
      Fragment:  Address: 0    Number: 0    Size: 0
       ctime: 0x499c0ff4:164961f4 -- Wed Feb 18 08:41:08 2009
       atime: 0x499c0ff4:00000000 -- Wed Feb 18 08:41:08 2009
       mtime: 0x49957f51:00000000 -- Fri Feb 13 09:10:25 2009
      crtime: 0x499c0f57:00d51440 -- Wed Feb 18 08:38:31 2009
      Size of extra inode fields: 28
      BLOCKS:
      (0):7348651, (1-258):7348654-7348911
      TOTAL: 259
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a4912123
  3. 26 3月, 2009 2 次提交
  4. 17 3月, 2009 1 次提交
    • E
      ext4: fix bb_prealloc_list corruption due to wrong group locking · d33a1976
      Eric Sandeen 提交于
      This is for Red Hat bug 490026: EXT4 panic, list corruption in
      ext4_mb_new_inode_pa
      
      ext4_lock_group(sb, group) is supposed to protect this list for
      each group, and a common code flow to remove an album is like
      this:
      
          ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
          ext4_lock_group(sb, grp);
          list_del(&pa->pa_group_list);
          ext4_unlock_group(sb, grp);
      
      so it's critical that we get the right group number back for
      this prealloc context, to lock the right group (the one 
      associated with this pa) and prevent concurrent list manipulation.
      
      however, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a 
      comment, "-1 is to protect from crossing allocation group".
      
      This makes sense for the group_pa, where pa_pstart is advanced
      by the length which has been used (in ext4_mb_release_context()),
      and when the entire length has been used, pa_pstart has been
      advanced to the first block of the next group.
      
      However, for inode_pa, pa_pstart is never advanced; it's just
      set once to the first block in the group and not moved after
      that.  So in this case, if we subtract one in ext4_mb_put_pa(),
      we are actually locking the *previous* group, and opening the
      race with the other threads which do not subtract off the extra
      block.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d33a1976
  5. 14 3月, 2009 1 次提交
    • E
      ext4: fix bogus BUG_ONs in in mballoc code · 8d03c7a0
      Eric Sandeen 提交于
      Thiemo Nagel reported that:
      
      # dd if=/dev/zero of=image.ext4 bs=1M count=2
      # mkfs.ext4 -v -F -b 1024 -m 0 -g 512 -G 4 -I 128 -N 1 \
        -O large_file,dir_index,flex_bg,extent,sparse_super image.ext4
      # mount -o loop image.ext4 mnt/
      # dd if=/dev/zero of=mnt/file
      
      oopsed, with a BUG_ON in ext4_mb_normalize_request because
      size == EXT4_BLOCKS_PER_GROUP
      
      It appears to me (esp. after talking to Andreas) that the BUG_ON
      is bogus; a request of exactly EXT4_BLOCKS_PER_GROUP should
      be allowed, though larger sizes do indicate a problem.
      
      Fix that an another (apparently rare) codepath with a similar check.
      Reported-by: NThiemo Nagel <thiemo.nagel@ph.tum.de>
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      8d03c7a0
  6. 14 2月, 2009 1 次提交
  7. 11 2月, 2009 1 次提交
  8. 27 1月, 2009 1 次提交
  9. 04 1月, 2009 1 次提交
  10. 06 1月, 2009 5 次提交
  11. 04 1月, 2009 1 次提交
  12. 06 1月, 2009 3 次提交
    • A
      ext4: Use high 16 bits of the block group descriptor's free counts fields · 560671a0
      Aneesh Kumar K.V 提交于
      Rename the lower bits with suffix _lo and add helper
      to access the values. Also rename bg_itable_unused_hi
      to bg_pad as in e2fsprogs.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      560671a0
    • A
      ext4: Fix race between read_block_bitmap() and mark_diskspace_used() · e8134b27
      Aneesh Kumar K.V 提交于
      We need to make sure we update the block bitmap and clear
      EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
      ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
      whether to initialize the block bitmap each time it is called
      (introduced by commit c806e68f), and this can race with block
      allocations in ext4_mb_mark_diskspace_used().
      
      ext4_read_block_bitmap does:
      
      spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
      if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	ext4_init_block_bitmap(sb, bh, block_group, desc);
      
      Now on the block allocation side we do
      
      mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
      			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
      ....
      spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
      if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
      
      ie on allocation we update the bitmap then we take the sb_bgl_lock
      and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is a
      parallel ext4_read_block_bitmap can zero out the bitmap in between
      the above mb_set_bits and spin_lock(sb_bg_lock..)
      
      The race results in below user visible errors
      EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
      EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      e8134b27
    • A
      ext4: fix BUG when calling ext4_error with locked block group · 5d1b1b3f
      Aneesh Kumar K.V 提交于
      The mballoc code likes to call ext4_error while it is holding locked
      block groups.  This can causes a scheduling in atomic context BUG.  We
      can't just unlock the block group and relock it after/if ext4_error
      returns since that might result in race conditions in the case where
      the filesystem is set to continue after finding errors.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5d1b1b3f
  13. 24 11月, 2008 1 次提交
    • A
      ext4: Fix lockdep recursive locking warning · b7be019e
      Aneesh Kumar K.V 提交于
      In ext4_mb_init_group(), if the filesystem block size is less than
      PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block
      groups in a loop.  We need to allow for this by using
      down_write_nested() and passing in the loop index as a lock subclass
      number.  This works because no other code path needs to take multiple
      alloc_sem's.  Note that lockdep will fail for filesystem blocksize
      smaller than to PAGE_SIZE/16k.  (e.g., a 1k filesystem blocksize with
      a 32k page size, or a 2k filesystem blocksize with a 64k blocksize,
      etc.)
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b7be019e
  14. 06 1月, 2009 1 次提交
  15. 26 11月, 2008 1 次提交
  16. 06 1月, 2009 1 次提交
  17. 23 11月, 2008 1 次提交
  18. 05 11月, 2008 1 次提交
    • T
      ext4: Change unsigned long to unsigned int · 498e5f24
      Theodore Ts'o 提交于
      Convert the unsigned longs that are most responsible for bloating the
      stack usage on 64-bit systems.
      
      Nearly all places in the ext3/4 code which uses "unsigned long" is
      probably a bug, since on 32-bit systems a ulong a 32-bits, which means
      we are wasting stack space on 64-bit systems.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      498e5f24
  19. 06 1月, 2009 3 次提交
  20. 07 1月, 2009 1 次提交
    • F
      ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar 提交于
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which create and do reads and writes on
      a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
      to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: NFrank Mayhar <fmayhar@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0390131b
  21. 17 12月, 2008 1 次提交
  22. 04 11月, 2008 1 次提交
  23. 17 10月, 2008 1 次提交
    • T
      ext4: Replace hackish ext4_mb_poll_new_transaction with commit callback · 3e624fc7
      Theodore Ts'o 提交于
      The multiblock allocator needs to be able to release blocks (and issue
      a blkdev discard request) when the transaction which freed those
      blocks is committed.  Previously this was done via a polling mechanism
      when blocks are allocated or freed.  A much better way of doing things
      is to create a jbd2 callback function and attaching the list of blocks
      to be freed directly to the transaction structure.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      3e624fc7
  24. 18 10月, 2008 1 次提交
  25. 16 10月, 2008 2 次提交
  26. 14 10月, 2008 1 次提交
  27. 10 10月, 2008 2 次提交
    • F
      ext4: fix initialization of UNINIT bitmap blocks · c806e68f
      Frederic Bohe 提交于
      This fixes a bug which caused on-line resizing of filesystems with a
      1k blocksize to fail.  The root cause of this bug was the fact that if
      an uninitalized bitmap block gets read in by userspace (which
      e2fsprogs does try to avoid, but can happen when the blocksize is less
      than the pagesize and an adjacent blocks is read into memory)
      ext4_read_block_bitmap() was erroneously depending on the buffer
      uptodate flag to decide whether it needed to initialize the bitmap
      block in memory --- i.e., to set the standard set of blocks in use by
      a block group (superblock, bitmaps, inode table, etc.).  Essentially,
      ext4_read_block_bitmap() assumed it was the only routine that might
      try to read a block containing a block bitmap, which is simply not
      true.  
      
      To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap()
      must always initialize uninitialized bitmap blocks.  Once a block or
      inode is allocated out of that bitmap, it will be marked as
      initialized in the block group descriptor, so in general this won't
      result any extra unnecessary work.
      Signed-off-by: NFrederic Bohe <frederic.bohe@bull.net>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c806e68f
    • T
      ext4: Remove old legacy block allocator · c2ea3fde
      Theodore Ts'o 提交于
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c2ea3fde
  28. 24 9月, 2008 1 次提交
  29. 23 9月, 2008 1 次提交