1. 05 Mar 2010 · 1 commit
    • ext4: use ext4_get_block_write in buffer write · 744692dc
      Committed by Jiaying Zhang
      Allocate an uninitialized extent before an ext4 buffer write and
      convert the extent to initialized after the I/O completes.
      The purpose is to make sure an extent can only be marked
      initialized after it has been written with new data so
      we can safely drop the i_mutex lock in ext4 DIO read without
      exposing stale data. This helps to improve multi-thread DIO
      read performance on high-speed disks.
      
      Skip the nobh and data=journal mount cases to make things simple for now.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  2. 03 Mar 2010 · 1 commit
  3. 02 Mar 2010 · 2 commits
  4. 25 Feb 2010 · 1 commit
  5. 16 Feb 2010 · 2 commits
  6. 01 Jan 2010 · 1 commit
    • ext4: Calculate metadata requirements more accurately · 9d0be502
      Committed by Theodore Ts'o
      In the past, ext4_calc_metadata_amount(), and its sub-functions
      ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount()
      badly over-estimated the number of metadata blocks that might be
      required for delayed allocation blocks.  This didn't matter as much
      when functions which managed the reserved metadata blocks were more
      aggressive about dropping reserved metadata blocks as delayed
      allocation blocks were written, but unfortunately they were too
      aggressive.  This was fixed in commit 0637c6f4, but as a result the
      over-estimation by ext4_calc_metadata_amount() would lead to reserving
      2-3 times the number of pending delayed allocation blocks as
      potentially required metadata blocks.  So if there is 1 megabyte of
      blocks which have not yet been allocated, up to 3 megabytes of
      space would get reserved out of the user's quota and from the file
      system free space pool until all of the inode's data blocks have been
      allocated.
      
      This commit addresses this problem by much more accurately estimating
      the number of metadata blocks that will be required.  It will still
      somewhat over-estimate the number of blocks needed, since it must make
      a worst case estimate not knowing which physical blocks will be
      needed, but it is much more accurate than before.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
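To make "worst case" concrete, here is a hypothetical count for an extent-mapped file; it is not ext4_calc_metadata_amount(). It assumes 4 KiB blocks, 12-byte extent headers and entries (so (4096 - 12) / 12 = 340 entries per tree block), 4 entries in the in-inode root, and that every delayed block ends up as its own extent.

```c
#include <assert.h>

/* Assumptions: 4 KiB blocks, 12-byte header, 12-byte entries. */
#define ENTRIES_PER_BLOCK ((4096UL - 12) / 12)   /* 340 */
#define ROOT_ENTRIES 4UL                         /* fit inside the inode */

static unsigned long ceil_div(unsigned long a, unsigned long b)
{
    return (a + b - 1) / b;
}

/* Hypothetical worst case: count leaf blocks for nblocks separate
 * extents, then index blocks above them, until the remaining entries
 * fit in the in-inode root. */
static unsigned long worst_case_meta_blocks(unsigned long nblocks)
{
    unsigned long total = 0, entries = nblocks;

    while (entries > ROOT_ENTRIES) {
        unsigned long level = ceil_div(entries, ENTRIES_PER_BLOCK);

        total += level;
        entries = level;
    }
    return total;
}
```

For 1 MB of pending 4 KiB blocks (256 blocks) this pessimistic count is a single metadata block, whereas reserving 2-3 times the pending blocks would set aside 512-768.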
  7. 23 Dec 2009 · 2 commits
  8. 16 Dec 2009 · 1 commit
    • tree-wide: convert open calls to remove spaces to skip_spaces() lib function · e7d2860b
      Committed by André Goddard Rosa
      Makes use of skip_spaces() defined in lib/string.c for removing leading
      spaces from strings all over the tree.
      
      It decreases lib.a code size by 47 bytes and reuses the function tree-wide:
         text    data     bss     dec     hex filename
        64688     584     592   65864   10148 (TOTALS-BEFORE)
        64641     584     592   65817   10119 (TOTALS-AFTER)
      
      Also, while at it, if we see (*str && isspace(*str)), we can be sure to
      remove the first condition (*str) as the second one (isspace(*str)) also
      evaluates to 0 whenever *str == 0, making it redundant. In other words,
      "a char equals zero is never a space".
      
      Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below,
      and found occurrences of this pattern on 3 more files:
          drivers/leds/led-class.c
          drivers/leds/ledtrig-timer.c
          drivers/video/output.c
      
      @@
      expression str;
      @@
      
      ( // ignore skip_spaces cases
      while (*str &&  isspace(*str)) { \(str++;\|++str;\) }
      |
      - *str &&
      isspace(*str)
      )
      Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: David Howells <dhowells@redhat.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
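A userspace sketch of such a helper, modeled on the description above. The kernel's lib/string.c uses its own isspace(); here <ctype.h> stands in, with the usual unsigned char cast. It also shows why the `*str &&` guard is redundant.

```c
#include <assert.h>
#include <ctype.h>

/* Skip leading whitespace. No separate *str != '\0' test is needed:
 * isspace('\0') is false, so the loop stops at the terminator. */
static char *skip_spaces(const char *str)
{
    while (isspace((unsigned char)*str))
        ++str;
    return (char *)str;
}
```

Callers that previously open-coded `while (*str && isspace(*str)) str++;` can write `str = skip_spaces(str);` instead.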
  9. 21 Dec 2009 · 1 commit
  10. 10 Dec 2009 · 2 commits
  11. 09 Dec 2009 · 2 commits
    • ext4: Wait for proper transaction commit on fsync · b436b9be
      Committed by Jan Kara
      We cannot rely on buffer dirty bits during fsync because pdflush can come
      before fsync is called and clear dirty bits without forcing a transaction
      commit. Instead, we track which transaction last changed the inode
      and which transaction last changed its allocation, and force that
      transaction to disk on fsync.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
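The tracking scheme can be sketched as a per-inode record of transaction IDs. This is a hypothetical model: the struct and function names are illustrative, not the jbd2/ext4 API, and tid wraparound (which jbd2 handles with helpers such as tid_gt()) is ignored.

```c
#include <assert.h>

typedef unsigned int toy_tid_t;   /* hypothetical transaction ID */

struct toy_journal {
    toy_tid_t running;     /* transaction currently accepting changes */
    toy_tid_t committed;   /* newest transaction known to be on disk */
};

struct toy_inode {
    toy_tid_t sync_tid;      /* last transaction that changed the inode */
    toy_tid_t datasync_tid;  /* last transaction that changed allocation */
};

/* Called whenever the inode is modified in the running transaction. */
static void toy_inode_changed(struct toy_inode *ino,
                              const struct toy_journal *j, int alloc_changed)
{
    ino->sync_tid = j->running;
    if (alloc_changed)
        ino->datasync_tid = j->running;
}

/* fsync: force the recorded transaction to disk instead of trusting
 * buffer dirty bits that writeback may already have cleared.
 * Returns the transaction ID that had to reach the disk. */
static toy_tid_t toy_fsync(struct toy_journal *j,
                           const struct toy_inode *ino, int datasync)
{
    toy_tid_t need = datasync ? ino->datasync_tid : ino->sync_tid;

    if (j->committed < need) {
        j->committed = need;          /* stand-in for forcing a commit */
        if (j->running == need)
            j->running = need + 1;    /* a new transaction opens */
    }
    return need;
}
```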
    • ext4: wait for log to commit when umounting · d4edac31
      Committed by Josef Bacik
      There is a potential race when a transaction is committing right when
      the file system is being unmounted.  This could result in a race
      because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super
      before the commit code calls a callback so the mballoc code can
      release freed blocks in the transaction, resulting in a panic trying
      to access the freed s_group_info.
      
      The fix is to wait for the transaction to finish committing before we
      shut down the multiblock allocator.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
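The fix is purely an ordering change, which a hypothetical model makes visible: the commit callback still needs the allocator state, so put_super must drain the commit before freeing it. The `toy_` names are illustrative, and the assert() in the callback marks where the original code would have touched freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical superblock state. */
struct toy_sb {
    int *group_info;      /* stands in for s_group_info */
    int commit_pending;   /* a commit whose callback still needs it */
};

/* Commit callback: releases freed blocks through the allocator state. */
static void toy_commit_callback(struct toy_sb *sb)
{
    assert(sb->group_info != NULL);   /* would be a use-after-free */
    sb->group_info[0]++;
    sb->commit_pending = 0;
}

/* Stand-in for "wait for the committing transaction to finish". */
static void toy_wait_for_commit(struct toy_sb *sb)
{
    if (sb->commit_pending)
        toy_commit_callback(sb);
}

/* Fixed ordering: finish the commit, then tear down the allocator. */
static void toy_put_super(struct toy_sb *sb)
{
    toy_wait_for_commit(sb);
    free(sb->group_info);
    sb->group_info = NULL;
}
```

Swapping the two steps in toy_put_super() reproduces the bug: the callback would run after group_info was freed.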
  12. 08 Dec 2009 · 1 commit
  13. 20 Nov 2009 · 2 commits
  14. 23 Nov 2009 · 3 commits
  15. 03 Nov 2009 · 1 commit
    • Revert "ext4: Remove journal_checksum mount option and enable it by default" · d4da6c9c
      Committed by Linus Torvalds
      This reverts commit d0646f7b, as
      requested by Eric Sandeen.
      
      It can basically cause an ext4 filesystem to miss recovery (and thus get
      mounted with errors) if the journal checksum does not match.
      
      Quoth Eric:
      
         "My hand-wavy hunch about what is happening is that we're finding a
          bad checksum on the last partially-written transaction, which is
          not surprising, but if we have a wrapped log and we're doing the
          initial scan for head/tail, and we abort scanning on that bad
          checksum, then we are essentially running an unrecovered filesystem.
      
          But that's hand-wavy and I need to go look at the code.
      
          We lived without journal checksums on by default until now, and at
          this point they're doing more harm than good, so we should revert
          the default-changing commit until we can fix it and do some good
          power-fail testing with the fixes in place."
      
      See
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=14354
      
      for all the gory details.
      Requested-by: Eric Sandeen <sandeen@redhat.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Alexey Fisher <bug-track@fisher-privat.net>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Mathias Burén <mathias.buren@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 01 Oct 2009 · 1 commit
  17. 30 Sep 2009 · 2 commits
    • ext4: Use tracepoints for mb_history trace file · 296c355c
      Committed by Theodore Ts'o
      The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
      number of problems: it required a largish amount of memory to be
      allocated for each ext4 filesystem, and the s_mb_history_lock
      introduced a CPU contention problem.  
      
      By ripping out the mb_history code and replacing it with ftrace
      tracepoints, we get more functionality: timestamps, event
      filtering, the ability to correlate mballoc history with other ext4
      tracepoints, etc.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4, jbd2: Drop unneeded printks at mount and unmount time · 90576c0b
      Committed by Theodore Ts'o
      There are a number of kernel printk's which are printed when an ext4
      filesystem is mounted and unmounted.  Disable them to economize space
      in the system logs.  In addition, disabling the mballoc stats by
      default saves a number of unneeded atomic operations for every block
      allocation or deallocation.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  18. 29 Sep 2009 · 3 commits
    • ext4: Handle nested ext4_journal_start/stop calls without a journal · d3d1faf6
      Committed by Curt Wohlgemuth
      This patch fixes a problem with handling nested calls to
      ext4_journal_start/ext4_journal_stop, when there is no journal present.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
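One way to picture the problem: with no journal there is no real handle, but nested start/stop pairs must still balance, so that an inner stop does not end the outer caller's "transaction". The nesting counter below is a hypothetical illustration of that requirement, not the patch's actual code; a single global stands in for per-task state.

```c
#include <assert.h>

/* Hypothetical per-task nesting depth for the no-journal case. */
static unsigned long toy_nojournal_depth;

/* No-journal "start": record one more level of nesting. */
static void toy_journal_start_nojournal(void)
{
    toy_nojournal_depth++;
}

/* Matching "stop": only the outermost stop ends the pseudo-handle.
 * Returns 1 when the outermost level has been closed. */
static int toy_journal_stop_nojournal(void)
{
    assert(toy_nojournal_depth > 0);   /* an unbalanced stop is a bug */
    return --toy_nojournal_depth == 0;
}
```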
    • ext4: async direct IO for holes and fallocate support · 8d5d02e6
      Committed by Mingming Cao
      For async direct IO that covers holes or fallocated regions, the end_io
      callback function now queues the conversion work on a workqueue but
      does not flush the work right away, as that might take too long.
      
      But when fsync is called after all the data is completed, the user
      expects the metadata to be updated as well before fsync returns.
      
      Thus we need to flush the conversion work when fsync() is called.
      This patch keeps track of a list of completed async direct I/Os that
      have work queued on the workqueue.  When fsync() is called, it goes
      through the list and does the conversion.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
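The bookkeeping described above can be sketched as a per-inode list that end_io pushes onto and fsync drains. This is a hypothetical model: the struct and function names are illustrative, and "conversion" is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of a completed async DIO awaiting conversion. */
struct toy_io_end {
    struct toy_io_end *next;
    int converted;   /* 1 once the extent is marked initialized */
};

/* Per-inode list of completions whose conversion is still queued. */
struct toy_dio_inode {
    struct toy_io_end *pending;
};

/* end_io: defer the conversion instead of doing it inline. */
static void toy_dio_end_io(struct toy_dio_inode *inode, struct toy_io_end *io)
{
    io->converted = 0;
    io->next = inode->pending;
    inode->pending = io;
}

/* fsync: walk the list and convert everything before returning, so the
 * metadata is current. Returns the number of conversions done. */
static int toy_fsync_flush_conversions(struct toy_dio_inode *inode)
{
    int n = 0;

    while (inode->pending) {
        struct toy_io_end *io = inode->pending;

        inode->pending = io->next;
        io->converted = 1;
        n++;
    }
    return n;
}
```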
    • ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O · 4c0425ff
      Committed by Mingming Cao
      Currently the DIO VFS code passes create = 0 when writing to the
      middle of file.  It does this to avoid block allocation for holes, so
      as not to expose stale data out when there is a parallel buffered read
      (which does not hold the i_mutex lock).  Direct I/O writes into holes
      fall back to buffered IO for this reason.
      
      Since preallocated extents are treated as holes when doing a
      get_block() look up (buffer is not mapped), direct IO over fallocate
      also falls back to buffered IO.  Thus ext4 actually silently falls
      back to buffered IO in the two cases above, which is undesirable.
      
      To fix this, this patch creates uninitialized extents when a direct I/O
      writes into holes in sparse files, and registers an end_io callback that
      converts the uninitialized extent to an initialized extent after the
      I/O is completed.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
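The before/after decision can be summarized as a small classification. This is a hypothetical sketch (the enum and function names are illustrative): previously, holes and preallocated extents both looked unmapped to get_block() with create = 0, so the write fell back to buffered I/O; afterwards the write stays direct and an end_io conversion is registered instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the range a DIO write targets. */
enum toy_mapping { TOY_MAPPED, TOY_HOLE, TOY_PREALLOCATED };

enum toy_dio_path { TOY_DIO_DIRECT, TOY_DIO_BUFFERED_FALLBACK };

/* Old behaviour: anything not already mapped falls back silently. */
static enum toy_dio_path toy_old_path(enum toy_mapping m)
{
    return m == TOY_MAPPED ? TOY_DIO_DIRECT : TOY_DIO_BUFFERED_FALLBACK;
}

/* New behaviour: holes get an uninitialized extent and preallocated
 * extents already are one; either way the write stays direct and
 * *needs_end_io says whether a conversion callback must be registered. */
static enum toy_dio_path toy_new_path(enum toy_mapping m, bool *needs_end_io)
{
    *needs_end_io = (m != TOY_MAPPED);
    return TOY_DIO_DIRECT;
}
```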
  19. 30 Sep 2009 · 1 commit
    • ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks · 55138e0b
      Committed by Theodore Ts'o
      Work around problems in the writeback code to force out writebacks in
      larger chunks than just 4 MB, which is just too small.  This also works
      around limitations in the ext4 block allocator, which can't allocate
      more than 2048 blocks at a time.  So we need to defeat the round-robin
      characteristics of the writeback code and try to write out as many
      blocks as possible from one inode before allowing the writeback code
      to move on to another inode.  We add a new per-filesystem tunable,
      max_writeback_mb_bump, which caps this to a default of 128 MB per
      inode.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
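The arithmetic behind the cap, as a hypothetical helper (not ext4_da_writepages() itself): with 4 KiB pages, 4 MB is 1024 pages and the 128 MB default is 32768 pages. The sketch only ever raises the request, up to whichever is smaller of the inode's dirty pages and the cap.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Hypothetical: pages to write from one inode before moving on.
 * requested: what the generic writeback code asked for.
 * dirty:     the inode's dirty page count.
 * bump_mb:   plays the role of max_writeback_mb_bump. */
static long toy_pages_to_write(long requested, long dirty, long bump_mb)
{
    long cap = bump_mb << (20 - TOY_PAGE_SHIFT);   /* MB -> pages */
    long want = dirty < cap ? dirty : cap;

    return requested > want ? requested : want;    /* never lower it */
}
```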
  20. 22 Sep 2009 · 2 commits
  21. 17 Sep 2009 · 1 commit
    • ext4: limit block allocations for indirect-block files to < 2^32 · fb0a387d
      Committed by Eric Sandeen
      Today, the ext4 allocator will happily allocate blocks past
      2^32 for indirect-block files, which results in the block
      numbers getting truncated, and corruption ensues.
      
      This patch limits such allocations to < 2^32, and adds
      BUG_ONs if we do get blocks larger than that.
      
      This should address RH Bug 519471, ext4 bitmap allocator 
      must limit blocks to < 2^32
      
      * ext4_find_goal() is modified to choose a goal < UINT_MAX,
        so that our starting point is in an acceptable range.
      
      * ext4_xattr_block_set() is modified such that the goal block
        is < UINT_MAX, as above.
      
      * ext4_mb_regular_allocator() is modified so that the group
        search does not continue into groups which are too high
      
      * ext4_mb_use_preallocated() has a check that we don't use
        preallocated space which is too far out
      
      * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs
      
      No attempt has been made to limit inode locations to < 2^32,
      so we may wind up with blocks far from their inodes.  Doing
      this much already will lead to some odd ENOSPC issues when the
      "lower 32" gets full, and further restricting inodes could
      make that even weirder.
      
      For high inodes, choosing a goal of the original, % UINT_MAX,
      may be a bit odd, but then we're in an odd situation anyway,
      and I don't know of a better heuristic.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
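A sketch of the two ideas above, assuming a 32-bit unsigned int: pick the goal as the original goal % UINT_MAX, and refuse to store a block number that does not fit in a 32-bit indirect pointer. The function names are hypothetical, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Goal heuristic for indirect-block files: original goal % UINT_MAX,
 * so the search always starts below the 32-bit limit. */
static uint64_t toy_indirect_goal(uint64_t goal)
{
    return goal % UINT_MAX;
}

/* Storing a block into a 32-bit indirect pointer: without the check
 * the high bits would be silently dropped and the file corrupted. */
static uint32_t toy_store_indirect_ptr(uint64_t block)
{
    assert(block <= UINT32_MAX);
    return (uint32_t)block;
}
```

The truncation being guarded against is easy to see directly: casting block number 2^32 + 7 to 32 bits yields 7, an unrelated block.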
  22. 15 Sep 2009 · 1 commit
  23. 12 Sep 2009 · 1 commit
    • ext4: Fix initalization of s_flex_groups · 7ad9bb65
      Committed by Theodore Ts'o
      The s_flex_groups array should have been initialized using atomic_add
      to sum up the free counts from the block groups that make up a
      flex_bg.  By using atomic_set, the value of the s_flex_groups array
      was set to the values of the last block group in the flex_bg.  
      
      The impact of this bug is that the block and inode allocation
      algorithms might not pick the best flex_bg for new allocation.
      
      Thanks to Damien Guibouret for pointing out this problem!
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
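The difference between the buggy and fixed initialization, sketched with C11 <stdatomic.h> standing in for the kernel's atomic_t (the `toy_` functions are illustrative): setting per group keeps only the last group's count, while adding sums the whole flex_bg.

```c
#include <assert.h>
#include <stdatomic.h>

/* Buggy: each group overwrites the total, like atomic_set(). */
static int toy_flex_free_buggy(const int *group_free, int ngroups)
{
    atomic_int total = 0;

    for (int i = 0; i < ngroups; i++)
        atomic_store(&total, group_free[i]);
    return atomic_load(&total);
}

/* Fixed: accumulate every group's free count, like atomic_add(). */
static int toy_flex_free_fixed(const int *group_free, int ngroups)
{
    atomic_int total = 0;

    for (int i = 0; i < ngroups; i++)
        atomic_fetch_add(&total, group_free[i]);
    return atomic_load(&total);
}
```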
  24. 11 Sep 2009 · 1 commit
  25. 06 Sep 2009 · 1 commit
  26. 18 Aug 2009 · 2 commits
    • ext4: Add feature set check helper for mount & remount paths · a13fb1a4
      Committed by Eric Sandeen
      A user reported that although his root ext4 filesystem was mounting
      fine, other filesystems would not mount, with the:
      
      "Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF"
      
      error on his 32-bit box built without CONFIG_LBDAF.  This is because
      the test at mount time for this situation was not being re-checked
      on remount, and the normal boot process makes an ro->rw transition,
      so this was being missed.
      
      Refactor to make a common helper function to test the filesystem
      features against the type of mount request (RO vs. RW) so that we 
      stay consistent.
      
      Addresses Red-Hat-Bugzilla: #517650
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
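The refactoring pattern is one check function shared by both entry points, so an ro->rw remount is tested exactly like a read-write mount. The feature bit, config flag, and function names below are hypothetical stand-ins, not ext4's definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FEATURE_HUGE_FILE 0x1u                /* hypothetical bit */
static const bool toy_have_lbdaf = false;          /* built without LBDAF */

/* Shared helper: returns true if the requested mount mode is allowed. */
static bool toy_feature_set_ok(unsigned int ro_compat, bool readonly)
{
    if (readonly)
        return true;   /* a read-only mount is fine */
    if ((ro_compat & TOY_FEATURE_HUGE_FILE) && !toy_have_lbdaf)
        return false;  /* "huge files ... RDWR without CONFIG_LBDAF" */
    return true;
}

static bool toy_mount(unsigned int ro_compat, bool readonly)
{
    return toy_feature_set_ok(ro_compat, readonly);
}

/* Before the fix, a remount path like this skipped the check entirely. */
static bool toy_remount(unsigned int ro_compat, bool to_readonly)
{
    return toy_feature_set_ok(ro_compat, to_readonly);
}
```

With the shared helper, the boot-time sequence (mount read-only, then remount read-write) is rejected at the remount instead of slipping through.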
    • ext4: reject too-large filesystems on 32-bit kernels · bf43d84b
      Committed by Eric Sandeen
      ext4 will happily mount a > 16T filesystem on a 32-bit box, but
      this is not safe; writes to the block device will wrap past 16T
      and the page cache can't index past 16T (2^32 index * 4k pages).
      
      Adding another test to the existing "too many sectors" test
      should do the trick.
      
      Add a comment, a relevant return value, and fix the reference
      to the CONFIG_LBD(AF) option as well.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
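The 16T figure is 2^32 page indices times 4096 bytes = 2^44 bytes. A hypothetical size test in that spirit (not ext4's actual check) compares the block count against the limit divided by the block size, which avoids overflowing the multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 32-bit page-cache index, 4 KiB pages. */
#define TOY_PAGE_SIZE   4096ULL
#define TOY_PGOFF_LIMIT (1ULL << 32)   /* number of indexable pages */

/* Returns 1 if `blocks` blocks of `blocksize` bytes (blocksize > 0)
 * fit in what a 32-bit page cache can index. */
static int toy_fs_size_ok_32bit(uint64_t blocks, uint64_t blocksize)
{
    return blocks <= (TOY_PGOFF_LIMIT * TOY_PAGE_SIZE) / blocksize;
}
```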
  27. 28 Jul 2009 · 1 commit