1. 27 Jan, 2009 1 commit
  2. 04 Jan, 2009 1 commit
  3. 06 Jan, 2009 5 commits
  4. 04 Jan, 2009 1 commit
  5. 06 Jan, 2009 3 commits
    • A
      ext4: Use high 16 bits of the block group descriptor's free counts fields · 560671a0
      Aneesh Kumar K.V authored
      Rename the lower bits with the suffix _lo and add helpers
      to access the values. Also rename bg_itable_unused_hi
      to bg_pad as in e2fsprogs.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      560671a0
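
The lo/hi split described in this commit can be pictured with a small userspace sketch. It is not the kernel's code: the struct `group_desc_sketch` and both helper names are hypothetical, the field names are assumed from the "_lo" naming the message mentions, and the little-endian conversion a real on-disk structure needs is left out.

```c
#include <stdint.h>

/* Hypothetical, simplified descriptor: only the free-blocks fields are shown.
 * Field names are assumptions; endianness conversion is omitted. */
struct group_desc_sketch {
    uint16_t bg_free_blocks_count_lo;  /* low 16 bits of the count  */
    uint16_t bg_free_blocks_count_hi;  /* high 16 bits of the count */
};

/* Combine the two 16-bit halves into one 32-bit value. */
static uint32_t free_blocks_count(const struct group_desc_sketch *gd)
{
    return (uint32_t)gd->bg_free_blocks_count_lo |
           ((uint32_t)gd->bg_free_blocks_count_hi << 16);
}

/* Split a 32-bit value back into the two on-disk halves. */
static void free_blocks_set(struct group_desc_sketch *gd, uint32_t count)
{
    gd->bg_free_blocks_count_lo = (uint16_t)(count & 0xFFFFu);
    gd->bg_free_blocks_count_hi = (uint16_t)(count >> 16);
}
```

Routing every access through a pair of helpers like these keeps callers from reading only the low half by accident.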
    • A
      ext4: Fix race between read_block_bitmap() and mark_diskspace_used() · e8134b27
      Aneesh Kumar K.V authored
      We need to make sure we update the block bitmap and clear
      EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
      ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
      whether to initialize the block bitmap each time it is called
      (introduced by commit c806e68f), and this can race with block
      allocations in ext4_mb_mark_diskspace_used().
      
      ext4_read_block_bitmap does:
      
      spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
      if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	ext4_init_block_bitmap(sb, bh, block_group, desc);
      
      Now on the block allocation side we do
      
      mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
      			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
      ....
      spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
      if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
      
      i.e., on allocation we update the bitmap, then take the sb_bgl_lock
      and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is that a
      parallel ext4_read_block_bitmap() can zero out the bitmap in between
      the above mb_set_bits() and spin_lock(sb_bgl_lock(..)).
      
      The race results in the user-visible errors below:
      EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
      EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      e8134b27
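
The ordering this commit asks for (set the bits and clear the flag inside one critical section) can be sketched in userspace as follows. This is only an illustration under stated assumptions: a pthread mutex stands in for sb_bgl_lock, `group_sketch` and every function here are hypothetical, and `BG_BLOCK_UNINIT` is a stand-in constant rather than the kernel definition.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define BG_BLOCK_UNINIT 0x2u   /* stand-in for EXT4_BG_BLOCK_UNINIT */
#define BITMAP_BYTES    16

struct group_sketch {
    pthread_mutex_t lock;      /* stand-in for sb_bgl_lock() */
    uint16_t        flags;
    uint8_t         bitmap[BITMAP_BYTES];
};

static void group_setup(struct group_sketch *g)
{
    pthread_mutex_init(&g->lock, NULL);
    g->flags = BG_BLOCK_UNINIT;
    memset(g->bitmap, 0, sizeof(g->bitmap));
}

/* Reader side: (re)initialize the bitmap only while the flag is still set. */
static void read_block_bitmap(struct group_sketch *g)
{
    pthread_mutex_lock(&g->lock);
    if (g->flags & BG_BLOCK_UNINIT)
        memset(g->bitmap, 0, sizeof(g->bitmap));
    pthread_mutex_unlock(&g->lock);
}

/* Allocation side: set the bits AND clear the flag under the same lock,
 * so a reader can never wipe bits that were set just before the flag
 * was cleared. */
static void mark_diskspace_used(struct group_sketch *g,
                                unsigned start, unsigned len)
{
    pthread_mutex_lock(&g->lock);
    for (unsigned i = start; i < start + len && i < BITMAP_BYTES * 8; i++)
        g->bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
    g->flags &= (uint16_t)~BG_BLOCK_UNINIT;
    pthread_mutex_unlock(&g->lock);
}

static int bit_is_set(const struct group_sketch *g, unsigned bit)
{
    return (g->bitmap[bit / 8] >> (bit % 8)) & 1;
}
```

In the racy version the bit-setting loop would run before the lock is taken, leaving a window in which the reader still sees the flag and zeroes the freshly set bits.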
    • A
      ext4: fix BUG when calling ext4_error with locked block group · 5d1b1b3f
      Aneesh Kumar K.V authored
      The mballoc code likes to call ext4_error while it is holding locked
      block groups.  This can cause a scheduling-in-atomic-context BUG.  We
      can't just unlock the block group and relock it after/if ext4_error
      returns since that might result in race conditions in the case where
      the filesystem is set to continue after finding errors.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      5d1b1b3f
  6. 24 Nov, 2008 1 commit
    • A
      ext4: Fix lockdep recursive locking warning · b7be019e
      Aneesh Kumar K.V authored
      In ext4_mb_init_group(), if the filesystem block size is less than
      PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block
      groups in a loop.  We need to allow for this by using
      down_write_nested() and passing in the loop index as a lock subclass
      number.  This works because no other code path needs to take multiple
      alloc_sem's.  Note that lockdep will fail for a filesystem blocksize
      smaller than PAGE_SIZE/16 (e.g., a 1k filesystem blocksize with
      a 32k page size, or a 2k filesystem blocksize with a 64k page size,
      etc.).
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      b7be019e
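
The blocksize limit mentioned at the end of this message can be checked with a little arithmetic. The sketch below rests on two assumptions that the commit text does not state: that each block group occupies two blocks of the page (so a page covers PAGE_SIZE / blocksize / 2 groups), and that lockdep accepts at most 8 subclasses. Both function names are hypothetical.

```c
#include <stdbool.h>

/* Assumed limit on lockdep subclass numbers; not stated in the commit. */
#define LOCKDEP_SUBCLASS_LIMIT 8u

/* Number of block groups whose alloc_sem is taken for one page,
 * assuming two blocks per group share the page. */
static unsigned int groups_per_page(unsigned int page_size,
                                    unsigned int block_size)
{
    unsigned int blocks_per_page = page_size / block_size;
    return blocks_per_page / 2;
}

/* True when the loop index used as the subclass stays within the limit. */
static bool nesting_fits_lockdep(unsigned int page_size,
                                 unsigned int block_size)
{
    return groups_per_page(page_size, block_size) <= LOCKDEP_SUBCLASS_LIMIT;
}
```

Under these assumptions a 1k blocksize on a 4k page nests only two locks, while 1k on 32k or 2k on 64k would need sixteen subclasses, which matches the examples in the message.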
  7. 06 Jan, 2009 1 commit
  8. 26 Nov, 2008 1 commit
  9. 06 Jan, 2009 1 commit
  10. 23 Nov, 2008 1 commit
  11. 05 Nov, 2008 1 commit
    • T
      ext4: Change unsigned long to unsigned int · 498e5f24
      Theodore Ts'o authored
      Convert the unsigned longs that are most responsible for bloating the
      stack usage on 64-bit systems.
      
      Nearly every place in the ext3/4 code that uses "unsigned long" is
      probably a bug, since on 32-bit systems a ulong is 32 bits, which means
      we are wasting stack space on 64-bit systems.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      498e5f24
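
The stack cost described here is easy to see by comparing two otherwise identical sets of locals. The struct names and fields below are hypothetical and are not taken from the ext4 source; the saving depends on the platform's type widths.

```c
#include <stddef.h>

/* Four hypothetical locals declared as unsigned long ... */
struct locals_long { unsigned long group, offset, count, len; };

/* ... and the same four declared as unsigned int. */
struct locals_int  { unsigned int  group, offset, count, len; };

/* Bytes saved per stack frame by the narrower type on this platform. */
static size_t bytes_saved(void)
{
    return sizeof(struct locals_long) - sizeof(struct locals_int);
}
```

On a typical LP64 system `unsigned long` is twice as wide as `unsigned int`, so each converted variable halves its footprint; on a 32-bit system the two types are usually the same size and nothing is lost.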
  12. 06 Jan, 2009 3 commits
  13. 07 Jan, 2009 1 commit
    • F
      ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar authored
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which creates and does reads and writes on
      a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
      to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      0390131b
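
To put "slightly faster" into numbers, the relative gain of the no-journal column over the default ext4 column can be computed from the table above. The figures are copied from that table; the helper names are hypothetical.

```c
#define ROWS 6

/* Throughput in MB/s, copied from the results table, in table order:
 * initial writes, rewrites, reads, re-reads, random readers, random writers. */
static const double ext4_default[ROWS]    = {15.4, 15.6, 16.9, 16.9, 5.6, 5.3};
static const double ext4_no_journal[ROWS] = {15.7, 15.9, 17.2, 17.2, 5.7, 5.4};

/* Relative improvement of candidate over baseline, in percent. */
static double percent_gain(double baseline, double candidate)
{
    return (candidate - baseline) / baseline * 100.0;
}

/* Gain of "ext4, no journal" over "ext4, default" for one table row. */
static double no_journal_gain(int row)
{
    return percent_gain(ext4_default[row], ext4_no_journal[row]);
}
```

Every row comes out as a small positive gain of a few percent at most, consistent with the message's wording.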
  14. 17 Dec, 2008 1 commit
  15. 04 Nov, 2008 1 commit
  16. 17 Oct, 2008 1 commit
    • T
      ext4: Replace hackish ext4_mb_poll_new_transaction with commit callback · 3e624fc7
      Theodore Ts'o authored
      The multiblock allocator needs to be able to release blocks (and issue
      a blkdev discard request) when the transaction which freed those
      blocks is committed.  Previously this was done via a polling mechanism
      when blocks are allocated or freed.  A much better way of doing things
      is to create a jbd2 callback function and attach the list of blocks
      to be freed directly to the transaction structure.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      3e624fc7
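
The callback approach described above can be sketched in plain C. This shows the general pattern only, not the jbd2 interface: `txn_sketch`, `freed_extent` and all function names are hypothetical, and "releasing" a block is reduced to counting it.

```c
#include <stdlib.h>

/* One run of blocks freed during the transaction (hypothetical type). */
struct freed_extent {
    unsigned start, len;
    struct freed_extent *next;
};

/* Hypothetical transaction: carries its own list of freed extents and a
 * callback that runs when the transaction commits. */
struct txn_sketch {
    struct freed_extent *freed;
    void (*commit_cb)(struct txn_sketch *);
    unsigned released_blocks;   /* stands in for the real release/discard */
};

/* Attach a freed extent to the transaction instead of releasing it now. */
static int txn_add_freed(struct txn_sketch *t, unsigned start, unsigned len)
{
    struct freed_extent *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->start = start;
    e->len = len;
    e->next = t->freed;
    t->freed = e;
    return 0;
}

/* Commit callback: release everything that was attached. */
static void release_freed_blocks(struct txn_sketch *t)
{
    while (t->freed) {
        struct freed_extent *e = t->freed;
        t->freed = e->next;
        t->released_blocks += e->len;
        free(e);
    }
}

/* Committing invokes the callback once; no polling is needed. */
static void txn_commit(struct txn_sketch *t)
{
    if (t->commit_cb)
        t->commit_cb(t);
}
```

Because the list travels with the transaction, the release happens exactly when that transaction commits rather than whenever an unrelated allocation or free happens to poll.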
  17. 18 Oct, 2008 1 commit
  18. 16 Oct, 2008 2 commits
  19. 14 Oct, 2008 1 commit
  20. 10 Oct, 2008 2 commits
    • F
      ext4: fix initialization of UNINIT bitmap blocks · c806e68f
      Frederic Bohe authored
      This fixes a bug which caused on-line resizing of filesystems with a
      1k blocksize to fail.  The root cause of this bug was the fact that if
      an uninitialized bitmap block gets read in by userspace (which
      e2fsprogs does try to avoid, but can happen when the blocksize is less
      than the pagesize and an adjacent block is read into memory)
      ext4_read_block_bitmap() was erroneously depending on the buffer
      uptodate flag to decide whether it needed to initialize the bitmap
      block in memory --- i.e., to set the standard set of blocks in use by
      a block group (superblock, bitmaps, inode table, etc.).  Essentially,
      ext4_read_block_bitmap() assumed it was the only routine that might
      try to read a block containing a block bitmap, which is simply not
      true.  
      
      To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap()
      must always initialize uninitialized bitmap blocks.  Once a block or
      inode is allocated out of that bitmap, it will be marked as
      initialized in the block group descriptor, so in general this won't
      result in any unnecessary extra work.
      Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      c806e68f
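
The decision this fix changes can be illustrated with a simplified model: the bitmap is initialized whenever the descriptor says it is uninitialized, whatever the buffer's uptodate state. All types and names below are hypothetical, and "standard blocks in use" is reduced to the first `metadata_blocks` bits.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BG_BLOCK_UNINIT 0x2u   /* stand-in for EXT4_BG_BLOCK_UNINIT */
#define BITMAP_BYTES    8

struct desc_sketch {
    uint16_t flags;
    unsigned metadata_blocks;  /* blocks always in use by the group */
};

struct buffer_sketch {
    bool    uptodate;          /* may already be set by another reader */
    uint8_t data[BITMAP_BYTES];
};

/* Build the in-memory bitmap: only the metadata blocks are marked used. */
static void init_block_bitmap(struct buffer_sketch *bh,
                              const struct desc_sketch *d)
{
    memset(bh->data, 0, sizeof(bh->data));
    for (unsigned i = 0; i < d->metadata_blocks && i < BITMAP_BYTES * 8; i++)
        bh->data[i / 8] |= (uint8_t)(1u << (i % 8));
    bh->uptodate = true;
}

/* Decide from the descriptor flag, never from bh->uptodate alone. */
static void read_block_bitmap(struct buffer_sketch *bh,
                              const struct desc_sketch *d)
{
    if (d->flags & BG_BLOCK_UNINIT) {
        init_block_bitmap(bh, d);
        return;
    }
    if (!bh->uptodate)
        bh->uptodate = true;   /* a real implementation would read the disk */
}
```

A buffer that was pulled into memory by some other reader arrives with `uptodate` set and arbitrary contents; keying on the descriptor flag still gives it a correct bitmap.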
    • T
      ext4: Remove old legacy block allocator · c2ea3fde
      Theodore Ts'o authored
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      c2ea3fde
  21. 24 Sep, 2008 1 commit
  22. 23 Sep, 2008 1 commit
  23. 14 Sep, 2008 2 commits
  24. 10 Oct, 2008 1 commit
  25. 09 Sep, 2008 1 commit
  26. 09 Oct, 2008 1 commit
    • A
      ext4: Make sure all the block allocation paths reserve blocks · a30d542a
      Aneesh Kumar K.V authored
      With delayed allocation we need to make sure blocks are reserved before
      we attempt to allocate them. Otherwise we get a block allocation failure
      (ENOSPC) during writepages, which cannot be handled. This would mean
      silent data loss (we do a printk stating that data will be lost). This patch
      updates the DIO and fallocate code paths to do block reservation before
      block allocation. This is needed to make sure parallel DIO and fallocate
      requests don't take blocks out of the delayed reserve space.
      
      When the free blocks count goes below a threshold we switch to a slow path
      which looks at other CPUs' accumulated percpu counter values.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      a30d542a
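
The fast path and slow path mentioned in the last paragraph of this message can be modelled with a toy counter. This is a sketch under assumptions, not the kernel's percpu_counter: `pcpu_counter_sketch`, the fixed `NCPU`, and the exact threshold test are all hypothetical.

```c
#include <stdbool.h>

#define NCPU 4

/* Toy per-CPU counter: a shared value plus per-CPU deltas that have not
 * been folded into it yet. */
struct pcpu_counter_sketch {
    long global;
    long delta[NCPU];
};

/* Fast path: cheap, but may be off by the unfolded deltas. */
static long counter_approx(const struct pcpu_counter_sketch *c)
{
    return c->global;
}

/* Slow path: walk every CPU's delta for an accurate value. */
static long counter_exact(const struct pcpu_counter_sketch *c)
{
    long sum = c->global;
    for (int i = 0; i < NCPU; i++)
        sum += c->delta[i];
    return sum;
}

/* Use the cheap value while space is plentiful; once the request would
 * bring it under the threshold, pay for the accurate sum. */
static bool has_free_blocks(const struct pcpu_counter_sketch *c,
                            long nblocks, long threshold)
{
    long free_blocks = counter_approx(c);

    if (free_blocks - nblocks < threshold)
        free_blocks = counter_exact(c);
    return free_blocks >= nblocks;
}
```

The threshold has to be large enough to cover the worst-case sum of unfolded deltas; otherwise the fast path can still approve a request that the accurate count would refuse.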
  27. 09 Sep, 2008 1 commit
  28. 19 Aug, 2008 1 commit
  29. 24 Jul, 2008 1 commit
    • A
      ext4: Don't allow lg prealloc list to be grow large. · 6be2ded1
      Aneesh Kumar K.V authored
      Currently, the locality group prealloc list is freed only when there
      is a block allocation failure. This can result in a large number of
      entries in the preallocation list, making ext4_mb_use_preallocated()
      expensive.
      
      To fix this, we convert the locality group prealloc list to a hash
      list. The hash index is the order of the number of blocks in the prealloc
      space, with a max order of 9. When adding prealloc space to the list we
      make sure the total entries for each order do not exceed 8. If there are
      more than 8, we discard a few entries and make sure we have only <= 5
      entries.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      6be2ded1
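
The bucketing and trimming rule described in this commit can be sketched with per-order counters. The numbers 9, 8 and 5 come from the message; everything else is an assumption: "order" is taken to mean floor(log2(blocks)), the names are hypothetical, and discarding is modelled by lowering a count rather than freeing real preallocations.

```c
#define PREALLOC_ORDERS 10   /* buckets for orders 0..9 */
#define MAX_PER_ORDER   8    /* limit from the commit message */
#define TRIM_TARGET     5    /* entries kept after a trim */

/* Hypothetical locality group: only the per-order entry counts. */
struct lg_sketch {
    unsigned count[PREALLOC_ORDERS];
};

/* Hash index: floor(log2(free_blocks)), capped at order 9. */
static int order_of(unsigned free_blocks)
{
    int order = 0;

    while (free_blocks > 1 && order < PREALLOC_ORDERS - 1) {
        free_blocks >>= 1;
        order++;
    }
    return order;
}

/* Add one prealloc space; trim the bucket if it grew past the limit. */
static void lg_add(struct lg_sketch *lg, unsigned free_blocks)
{
    int o = order_of(free_blocks);

    lg->count[o]++;
    if (lg->count[o] > MAX_PER_ORDER)
        lg->count[o] = TRIM_TARGET;   /* stands in for discarding entries */
}
```

Trimming down to 5 rather than to 8 leaves headroom, so the next few additions to the same bucket do not each trigger another discard pass.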