1. 18 Aug, 2009 1 commit
    • ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks() · 487caeef
      Committed by Jan Kara
      During truncate we are sometimes forced to start a new transaction as
      the amount of blocks to be journaled is both quite large and hard to
      predict. So far we restarted a transaction while holding i_data_sem
      and that violates lock ordering because i_data_sem ranks below a
      transaction start (and it can lead to a real deadlock with
      ext4_get_blocks() mapping blocks in some page while having a
      transaction open).
      
      We fix the problem by dropping i_data_sem before restarting the
      transaction and reacquiring it afterwards. It's slightly subtle that
      this works:
      
      1) By the time ext4_truncate() is called, all the page cache for the
      truncated part of the file is dropped so get_block() should not be
      called on it (we only have to invalidate the extent cache after we
      reacquire i_data_sem, because an extent from the non-truncated part
      could also extend into the part we are going to truncate).
      
      2) Writes, migrate and defrag hold i_mutex, so they are blocked for
      the whole duration of the truncate.
      
      This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
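The lock-ordering rule described above can be illustrated with a small single-threaded model. This is only a sketch, not the kernel code: `toy_trunc_inode`, `toy_restart_transaction()` and `toy_truncate_restart()` are hypothetical names, and a boolean flag stands in for i_data_sem so that the invariant "never restart a transaction while holding the lock" can be checked.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model; lock_held stands in for i_data_sem. */
struct toy_trunc_inode {
    bool lock_held;            /* true while the i_data_sem stand-in is held */
    bool extent_cache_valid;   /* stand-in for the cached extent being usable */
    int  restarts;             /* number of simulated transaction restarts */
    int  restarts_under_lock;  /* restarts attempted with the lock held (a bug) */
};

/* Simulated transaction restart. In the real code this can block, which is
 * why it must rank above the lock in the ordering. */
static void toy_restart_transaction(struct toy_trunc_inode *inode)
{
    if (inode->lock_held)
        inode->restarts_under_lock++;
    inode->restarts++;
}

/* Caller holds the lock on entry and holds it again on return. */
static void toy_truncate_restart(struct toy_trunc_inode *inode)
{
    inode->lock_held = false;          /* drop the lock before restarting */
    toy_restart_transaction(inode);
    inode->lock_held = true;           /* reacquire it afterwards */
    inode->extent_cache_valid = false; /* the cache may be stale by now */
}
```

The cache invalidation after reacquiring the lock mirrors point 1) of the commit message: anything cached before the lock was dropped can no longer be trusted.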
  2. 19 Sep, 2009 1 commit
  3. 01 Sep, 2009 1 commit
    • ext4: Compile warning fix when EXT_DEBUG enabled · 84fe3bef
      Committed by Mingming
      When EXT_DEBUG is enabled I received the following compile warning on
      PPC64:
      
        CC [M]  fs/ext4/inode.o
        CC [M]  fs/ext4/extents.o
      fs/ext4/extents.c: In function ‘ext4_ext_rm_leaf’:
      fs/ext4/extents.c:2097: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_lblk_t’
      fs/ext4/extents.c: In function ‘ext4_ext_get_blocks’:
      fs/ext4/extents.c:2789: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
      fs/ext4/extents.c:2852: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘ext4_lblk_t’
      fs/ext4/extents.c:2953: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘unsigned int’
        CC [M]  fs/ext4/migrate.o
      
      This patch fixes the compile warnings.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
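Warnings of this kind usually go away once each argument is made to match its conversion specifier. The following is a minimal user-space sketch, not the actual patch: `toy_lblk_t` is a hypothetical 32-bit stand-in for ext4_lblk_t, and `format_block()` is an invented helper.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Assumption: a 32-bit logical block number, similar to ext4_lblk_t. */
typedef uint32_t toy_lblk_t;

/* Formats a block number and a count into buf.
 * %u expects unsigned int and %lu expects unsigned long, so each argument
 * is cast (or already typed) to match its specifier on every ABI. */
static int format_block(char *buf, size_t len, toy_lblk_t lblk,
                        unsigned long count)
{
    return snprintf(buf, len, "block %u, count %lu",
                    (unsigned int)lblk, count);
}
```

Either the specifier or the argument can be changed; casting the argument keeps the output format stable when the underlying typedef differs between architectures.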
  4. 18 Jun, 2009 1 commit
  5. 11 Jun, 2009 1 commit
  6. 09 Jun, 2009 1 commit
    • ext4: Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle() · 03f5d8bc
      Committed by Jan Kara
      Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle(). This
      seems to be a relic from some old days and setting disksize in this
      function does not make much sense.  Currently it is set only by
      ext4_getblk().  Since the parameter has some effect only if create ==
      1, it is easy to check by grepping through the sources that the three
      callers which end up calling ext4_getblk() with create == 1
      (ext4_append, ext4_quota_write, ext4_mkdir) do the right thing and set
      disksize themselves.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  7. 06 Jul, 2009 1 commit
  8. 18 May, 2009 2 commits
  9. 15 May, 2009 1 commit
    • ext4: Fix race in ext4_inode_info.i_cached_extent · 2ec0ae3a
      Committed by Theodore Ts'o
      If two CPUs call ext4_ext_get_blocks() at the same
      time, there is nothing protecting the i_cached_extent structure from
      being used and updated at the same time.  This could potentially cause
      the wrong location on disk to be read or written to, including
      potentially causing the corruption of the block group descriptors
      and/or inode table.
      
      This bug has been in the ext4 code since almost the very beginning of
      ext4's development.  Fortunately once the data is stored in the page
      cache, ext4_get_blocks() doesn't need to be called, so trying to
      replicate this problem to the point where we could identify its root
      cause was *extremely* difficult.  Many thanks to Kevin Shanahan for
      working over several months to be able to reproduce this easily so we
      could finally nail down the cause of the corruption.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
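The general shape of such a fix is to take a lock around every read and every update of the cached extent. The sketch below is a user-space analogy only: the struct layout, the function names and the use of a pthread mutex (in place of whatever lock the kernel patch uses) are all assumptions.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical one-entry extent cache. */
struct toy_cached_extent {
    uint32_t lblk;  /* first logical block covered */
    uint32_t len;   /* number of blocks covered; 0 means empty */
    uint64_t pblk;  /* physical block backing lblk */
};

struct toy_inode_info {
    pthread_mutex_t lock;  /* protects 'cached' */
    struct toy_cached_extent cached;
};

/* Replace the cached extent atomically with respect to lookups. */
static void toy_cache_put(struct toy_inode_info *ei, uint32_t lblk,
                          uint32_t len, uint64_t pblk)
{
    pthread_mutex_lock(&ei->lock);
    ei->cached.lblk = lblk;
    ei->cached.len = len;
    ei->cached.pblk = pblk;
    pthread_mutex_unlock(&ei->lock);
}

/* Returns 1 and fills *pblk on a hit, 0 on a miss. */
static int toy_cache_lookup(struct toy_inode_info *ei, uint32_t lblk,
                            uint64_t *pblk)
{
    int hit = 0;

    pthread_mutex_lock(&ei->lock);
    if (ei->cached.len != 0 && lblk >= ei->cached.lblk &&
        lblk - ei->cached.lblk < ei->cached.len) {
        *pblk = ei->cached.pblk + (lblk - ei->cached.lblk);
        hit = 1;
    }
    pthread_mutex_unlock(&ei->lock);
    return hit;
}
```

Without the lock, a lookup could see the start block of one extent combined with the physical block of another, which is the kind of mismatch that would send I/O to the wrong location on disk.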
  10. 14 May, 2009 3 commits
  11. 03 May, 2009 1 commit
  12. 02 May, 2009 1 commit
  13. 13 May, 2009 1 commit
    • ext4: Mark the unwritten buffer_head as mapped during write_begin · 29fa89d0
      Committed by Aneesh Kumar K.V
      Setting BH_Unwritten buffer_heads as BH_Mapped avoids multiple
      (unnecessary) calls to get_block() during the call to the write(2)
      system call.  Setting BH_Unwritten buffer heads as BH_Mapped requires
      that the writepages() functions can handle BH_Unwritten buffer_heads.
      
      After this commit, things work as follows:
      
      ext4_ext_get_block() returns unmapped, unwritten, buffer head when
      called with create = 0 for prealloc space. This makes sure we handle
      the read path and non-delayed allocation case correctly.  Even though
      the buffer head is marked unmapped we have valid b_blocknr and b_bdev
      values in the buffer_head.
      
      ext4_da_get_block_prep() called for block reservation will now return
      mapped, unwritten, new buffer_head for prealloc space. This avoids
      multiple calls to get_block() for writes to the same offset. By marking
      such buffers as BH_New, we also ensure that sub-block zeroing of
      buffered writes happens correctly.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  14. 14 May, 2009 1 commit
  15. 23 Apr, 2009 1 commit
  16. 15 Apr, 2009 1 commit
  17. 05 Apr, 2009 1 commit
  18. 11 Mar, 2009 1 commit
    • ext4: fix header check in ext4_ext_search_right() for deep extent trees. · 395a87bf
      Committed by Eric Sandeen
      The ext4_ext_search_right() function is confusing; it uses a
      "depth" variable which is 0 at the root and maximum at the leaves, 
      but the on-disk metadata uses a "depth" (actually eh_depth) which
      is opposite: maximum at the root, and 0 at the leaves.
      
      The ext4_ext_check_header() function is given a depth and checks
      the header against that depth; it expects the on-disk semantics,
      but we are giving it the opposite in the while loop in this
      function.  We should be giving it the on-disk notion of "depth"
      which we can get from (p_depth - depth) - and if you look, the last
      (more commonly hit) call to ext4_ext_check_header() does just this.
      
      Sending in the wrong depth results in (incorrect) messages
      about corruption:
      
      EXT4-fs error (device sdb1): ext4_ext_search_right: bad header
      in inode #2621457: unexpected eh_depth - magic f30a, entries 340,
      max 340(0), depth 1(2)
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12821
      Reported-by: David Dindorp <ddi@dubex.dk>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
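The conversion between the two notions of depth is a single subtraction. As a small sketch (the function and parameter names here are illustrative, not taken from the kernel source):

```c
/* tree_depth: depth of the whole extent tree (p_depth in the message above).
 * path_depth: position along the path, 0 at the root, tree_depth at a leaf.
 * Returns the depth as stored on disk: tree_depth at the root, 0 at a leaf. */
static int ondisk_depth(int tree_depth, int path_depth)
{
    return tree_depth - path_depth;
}
```

For a tree of depth 2, the root maps to on-disk depth 2, the middle level to 1 and the leaves to 0. Passing the path depth unconverted happens to agree only at the middle level of an even-depth tree and is wrong everywhere else.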
  19. 28 Mar, 2009 1 commit
  20. 12 Mar, 2009 1 commit
  21. 13 Mar, 2009 1 commit
    • ext4: New inode/block allocation algorithms for flex_bg filesystems · a4912123
      Committed by Theodore Ts'o
      The find_group_flex() inode allocator is now only used if the
      filesystem is mounted using the "oldalloc" mount option.  It is
      replaced with the original Orlov allocator that has been updated for
      flex_bg filesystems (it should behave the same way if flex_bg is
      disabled).  The inode allocator now functions by taking into account
      each flex_bg group, instead of each block group, when deciding whether
      or not it's time to allocate a new directory into a fresh flex_bg.
      
      The block allocator has also been changed so that the first block
      group in each flex_bg is preferred for use for storing directory
      blocks.  This keeps directory blocks close together, which is good for
      speeding up e2fsck since large directories are more likely to look
      like this:
      
      debugfs:  stat /home/tytso/Maildir/cur
      Inode: 1844562   Type: directory    Mode:  0700   Flags: 0x81000
      Generation: 1132745781    Version: 0x00000000:0000ad71
      User: 15806   Group: 15806   Size: 1060864
      File ACL: 0    Directory ACL: 0
      Links: 2   Blockcount: 2072
      Fragment:  Address: 0    Number: 0    Size: 0
       ctime: 0x499c0ff4:164961f4 -- Wed Feb 18 08:41:08 2009
       atime: 0x499c0ff4:00000000 -- Wed Feb 18 08:41:08 2009
       mtime: 0x49957f51:00000000 -- Fri Feb 13 09:10:25 2009
      crtime: 0x499c0f57:00d51440 -- Wed Feb 18 08:38:31 2009
      Size of extra inode fields: 28
      BLOCKS:
      (0):7348651, (1-258):7348654-7348911
      TOTAL: 259
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  22. 27 Jan, 2009 1 commit
  23. 09 Jan, 2009 1 commit
  24. 07 Jan, 2009 1 commit
    • ext4: Remove "extents" mount option · 83982b6f
      Committed by Theodore Ts'o
      This mount option is largely superfluous, and in fact the way it was
      implemented was buggy; if a filesystem which did not have the extents
      feature flag was mounted -o extents, the filesystem would attempt to
      create and use extent-based files even though the extents feature flag
      was not enabled.  The simplest thing to do is to nuke the mount option
      entirely.  It's not all that useful to force the non-creation of new
      extent-based files if the filesystem can support it.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  25. 23 Nov, 2008 1 commit
  26. 05 Nov, 2008 2 commits
  27. 07 Jan, 2009 1 commit
    • ext4: Allow ext4 to run without a journal · 0390131b
      Committed by Frank Mayhar
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which creates a 2G file and does
      reads and writes on it, with a buffer size of 256K, using O_DIRECT for
      all file opens to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  28. 26 Nov, 2008 1 commit
  29. 13 Dec, 2008 1 commit
    • ext4: remove do_blk_alloc() · 97df5d15
      Committed by Theodore Ts'o
      The convenience function do_blk_alloc() is a static function with only
      one caller, so fold it into ext4_new_meta_blocks() to simplify the
      code and to make it easier to understand.
      
      To save more stack space, if count is a null pointer in
      ext4_new_meta_blocks() assume that caller wanted a single block (and
      if there is an error, no blocks were allocated).
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
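The "null count means one block" convention can be sketched with a toy bump allocator. Everything here is hypothetical (the function name, the `next_free` cursor and the lack of error handling); it only illustrates how an optional in/out count parameter lets single-block callers avoid keeping a count variable on their stack.

```c
#include <stddef.h>

/* Toy allocator: hands out consecutive block numbers from *next_free.
 * count is an optional in/out parameter:
 *   NULL     -> the caller wants exactly one block
 *   non-NULL -> *count is the number wanted on entry and the number
 *               actually allocated on return
 * Returns the first allocated block number. */
static unsigned long toy_new_meta_blocks(unsigned long *next_free,
                                         unsigned long *count)
{
    unsigned long want = count ? *count : 1;
    unsigned long first = *next_free;

    *next_free += want;
    if (count)
        *count = want;  /* this toy version always satisfies the request */
    return first;
}
```

A single-block caller then writes `toy_new_meta_blocks(&cursor, NULL)` and needs no local count at all.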
  30. 08 Dec, 2008 1 commit
    • ext4: remove ext4_new_meta_block() · cfe82c85
      Committed by Theodore Ts'o
      There were only two callers of the function ext4_new_meta_block(),
      which is just a very simple wrapper function around
      ext4_new_meta_blocks().  Change those two callers to call
      ext4_new_meta_blocks() directly, to save code and stack space usage.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  31. 07 Oct, 2008 1 commit
  32. 10 Oct, 2008 1 commit
  33. 14 Sep, 2008 1 commit
    • ext4: Properly update i_disksize. · cf17fea6
      Committed by Aneesh Kumar K.V
      With delayed allocation we use i_data_sem to update i_disksize.  We need
      to update i_disksize only if the new size specified is greater than the
      current value and we need to make sure we don't race with other
      i_disksize updates.  With delayed allocation we will switch to the
      write_begin function for non-delayed allocation if we are low on free
      blocks.  This means the write_begin function for non-delayed allocation
      also needs to use the same locking.
      
      We also need to check and update i_disksize even if the new size is less
      than inode.i_size because of delayed allocation.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
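The update rule described above (only ever grow the on-disk size, and do the compare and the store under one lock) can be sketched in user space as follows. The struct, the function name and the pthread mutex standing in for i_data_sem are assumptions for illustration, not the kernel implementation.

```c
#include <pthread.h>

/* Hypothetical inode: data_sem stands in for i_data_sem. */
struct toy_size_inode {
    pthread_mutex_t data_sem;
    long long disksize;  /* stand-in for i_disksize */
};

/* Grow disksize to newsize if newsize is larger; never shrink it.
 * The compare and the store happen under the same lock, so two
 * concurrent callers cannot overwrite a larger value with a smaller one. */
static void toy_update_disksize(struct toy_size_inode *inode,
                                long long newsize)
{
    pthread_mutex_lock(&inode->data_sem);
    if (newsize > inode->disksize)
        inode->disksize = newsize;
    pthread_mutex_unlock(&inode->data_sem);
}
```

Every write path that may extend the file would call the same helper, which is the point the commit message makes about the non-delayed-allocation write_begin path needing the same locking.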
  34. 09 Sep, 2008 2 commits
  35. 20 Aug, 2008 1 commit
    • ext4: journal credit fix for the delayed allocation's writepages() function · 525f4ed8
      Committed by Mingming Cao
      The previous delalloc writepages implementation started a new
      transaction outside of a loop which called get_block() to do the block
      allocation.  Since we didn't know exactly how many blocks would need to
      be allocated, the estimate of the required journal credits was very
      conservative and caused many issues.
      
      With the reworked delayed allocation, a new transaction is created for
      each get_block(), thus we don't need to guess how many credits are
      needed for multiple chunks of allocation.  We start every transaction
      with enough credits for inserting a single extent.  When estimating the
      credits for indirect blocks to allocate a chunk of blocks, we need to
      know the number of data blocks to allocate.  We use the total number of
      reserved delalloc data blocks; if that is too big, for non-extent
      files, we need to limit the number of blocks to EXT4_MAX_TRANS_BLOCKS.
      
      Code cleanup from Aneesh.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>