  1. 26 Aug, 2009 1 commit
    • A
      ext4: Add missing unlock_new_inode() call in extent migration code · a8526e84
      Aneesh Kumar K.V committed
      We need to unlock the new inode before iput.  This patch fixes the
      following warning when calling chattr +e to migrate a file to use
      extents.  It also fixes problems when e4defrag attempts to
      defragment an inode.
      
      [  470.400044] ------------[ cut here ]------------
      [  470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a()
      [  470.400072] Hardware name: N/A
      .....
      ...
      [  470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4
      [  470.400359] Call Trace:
      [  470.400372]  [<ffffffff81037771>] warn_slowpath_common+0x77/0x8f
      [  470.400385]  [<ffffffff81037798>] warn_slowpath_null+0xf/0x11
      [  470.400395]  [<ffffffff810b7f28>] generic_delete_inode+0x65/0x16a
      [  470.400405]  [<ffffffff810b8044>] generic_drop_inode+0x17/0x1bd
      [  470.400413]  [<ffffffff810b7083>] iput+0x61/0x65
      [  470.400455]  [<ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4]
      [  470.400492]  [<ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4]
      [  470.400507]  [<ffffffff810b1a91>] vfs_ioctl+0x1d/0x82
      [  470.400517]  [<ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9
      [  470.400527]  [<ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf
      [  470.400537]  [<ffffffff810b2087>] sys_ioctl+0x51/0x74
      [  470.400549]  [<ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b
      [  470.400557] ---[ end trace ab85723542352dac ]---
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  2. 18 Aug, 2009 5 commits
    • E
      ext4: Add feature set check helper for mount & remount paths · a13fb1a4
      Eric Sandeen committed
      A user reported that although his root ext4 filesystem was mounting
      fine, other filesystems would not mount, with the:
      
      "Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF"
      
      error on his 32-bit box built without CONFIG_LBDAF.  This is because
      the test at mount time for this situation was not being re-checked
      on remount, and the normal boot process makes an ro->rw transition,
      so this was being missed.
      
      Refactor to make a common helper function to test the filesystem
      features against the type of mount request (RO vs. RW) so that we 
      stay consistent.
      
      Addresses Red-Hat-Bugzilla: #517650
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • E
      simplify some logic in ext4_mb_normalize_request · 38877f4e
      Eric Sandeen committed
      While reading through some of the mballoc code it seems that a couple
      spots in the size normalization function could be streamlined.
      
      The test for non-overlapping PAs can be or'd for the start & end
      conditions, and the tests for adjacent PAs can be else-if'd - 
      it's essentially independently testing:
      
      	if (A + B <= C)
      		...
      	if (A > C)
      		...
      
      These cannot both be true so it seems like the else-if might
      be slightly more efficient and/or informative.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
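The claim in the message above, that the two tests cannot both be true, can be illustrated with a small stand-alone sketch. This is not the mballoc code: `classify()`, the enum, and the parameter names are hypothetical, and the sketch assumes unsigned values where `a + b` does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome labels; not taken from mballoc. */
enum pa_side { PA_ENDS_BEFORE, PA_STARTS_AFTER, PA_NEITHER };

/*
 * With b >= 0 and no overflow, (a + b <= c) implies (a <= c), so the
 * second test can only succeed when the first one failed. An else-if
 * therefore behaves the same as two independent ifs, while skipping
 * the second comparison whenever the first one matched.
 */
enum pa_side classify(uint64_t a, uint64_t b, uint64_t c)
{
	assert(a + b >= a);	/* assumption: no wrap-around */

	if (a + b <= c)
		return PA_ENDS_BEFORE;
	else if (a > c)
		return PA_STARTS_AFTER;
	return PA_NEITHER;
}
```

For example, `classify(10, 5, 15)` takes the first branch and `classify(20, 5, 15)` takes the second; no input reaches both.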
    • E
      ext4: open-code ext4_mb_update_group_info · 0373130d
      Eric Sandeen committed
      ext4_mb_update_group_info is only called in one place, and it's
      extremely simple.  There's no reason to have it in a separate function
      in a separate file as far as I can tell, it just obfuscates what's
      really going on.
      
      Perhaps it was intended to keep the grp->bb_* manipulation local to
      mballoc.c but we're already accessing other grp-> fields in balloc.c
      directly so this seems ok.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • E
      ext4: reject too-large filesystems on 32-bit kernels · bf43d84b
      Eric Sandeen committed
      ext4 will happily mount a > 16T filesystem on a 32-bit box, but
      this is not safe; writes to the block device will wrap past 16T
      and the page cache can't index past 16T (2^32 index * 4k pages).
      
      Adding another test to the existing "too many sectors" test
      should do the trick.
      
      Add a comment, a relevant return value, and fix the reference
      to the CONFIG_LBD(AF) option as well.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
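The 16T figure in the message above follows from a 32-bit page index multiplied by 4k pages. A minimal sketch of that arithmetic is shown here; the function names are hypothetical and this is not the check the kernel actually performs at mount time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes addressable by a page cache whose index has index_bits bits
 * and whose pages are 2^page_shift bytes each.
 */
uint64_t page_cache_limit_bytes(unsigned index_bits, unsigned page_shift)
{
	assert(index_bits + page_shift < 64);
	return (uint64_t)1 << (index_bits + page_shift);
}

/*
 * Hypothetical size test: does a filesystem of `blocks` blocks, each
 * 2^blkbits bytes, extend past limit_bytes? Shifting the limit down
 * instead of the block count up avoids overflow.
 */
int fs_exceeds_limit(uint64_t blocks, unsigned blkbits, uint64_t limit_bytes)
{
	return blocks > (limit_bytes >> blkbits);
}
```

With `index_bits = 32` and `page_shift = 12` the limit is 2^44 bytes, i.e. 16 TiB, so a filesystem with more than 2^32 4k blocks would be rejected by this sketch.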
    • J
      ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks() · 487caeef
      Jan Kara committed
      During truncate we are sometimes forced to start a new transaction as
      the amount of blocks to be journaled is both quite large and hard to
      predict. So far we restarted a transaction while holding i_data_sem
      and that violates lock ordering because i_data_sem ranks below a
      transaction start (and it can lead to a real deadlock with
      ext4_get_blocks() mapping blocks in some page while having a
      transaction open).
      
      We fix the problem by dropping the i_data_sem before restarting the
      transaction and reacquiring it afterwards. It's slightly subtle that this
      works:
      
      1) By the time ext4_truncate() is called, all the page cache for the
      truncated part of the file is dropped so get_block() should not be
      called on it (we only have to invalidate extent cache after we
      reacquire i_data_sem because some extent from not-truncated part could
      extend also into the part we are going to truncate).
      
      2) Writes, migrate or defrag hold i_mutex so they are stopped for the
      whole duration of the truncate.
      
      This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  3. 19 Sep, 2009 1 commit
  4. 01 Sep, 2009 1 commit
    • M
      ext4: Compile warning fix when EXT_DEBUG enabled · 84fe3bef
      Mingming committed
      When EXT_DEBUG is enabled I received the following compile warning on
      PPC64:
      
        CC [M]  fs/ext4/inode.o
        CC [M]  fs/ext4/extents.o
      fs/ext4/extents.c: In function ‘ext4_ext_rm_leaf’:
      fs/ext4/extents.c:2097: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_lblk_t’
      fs/ext4/extents.c: In function ‘ext4_ext_get_blocks’:
      fs/ext4/extents.c:2789: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
      fs/ext4/extents.c:2852: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘ext4_lblk_t’
      fs/ext4/extents.c:2953: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘unsigned int’
        CC [M]  fs/ext4/migrate.o
      
      The patch fixes these compile warnings.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  5. 19 Sep, 2009 1 commit
    • T
      ext4: Avoid group preallocation for closed files · 50797481
      Theodore Ts'o committed
      Currently the group preallocation code tries to find a large (512-block)
      free chunk from which to do per-cpu group allocation for small files.
      The problem with this scheme is that it leaves the filesystem horribly
      fragmented.  In the worst case, if the filesystem is unmounted and
      remounted (after a system shutdown, for example) we forget the fact
      that we were using a particular (now-partially filled) 512 block
      extent.  So the next time we try to allocate space for a small file,
      we will find *another* completely free 512 block chunk to allocate
      small files.  Given that there are 32,768 blocks in a block group,
      after 64 iterations of "mount, write one 4k file in a directory,
      unmount", the block group will have 64 files, each separated by 511
      blocks, and the block group will no longer have any completely free
      512-block chunks for group preallocation space.
      
      So if we try to allocate blocks for a file that has been closed, such
      that we know the final size of the file, and the filesystem is not
      busy, avoid using group preallocation.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
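The worst case described in the message above (64 mount/write/unmount cycles exhausting every free 512-block chunk in a 32,768-block group) can be checked with a toy simulation. This is only a sketch of the arithmetic under the message's assumptions; the names are hypothetical and it does not model the real allocator.

```c
#include <assert.h>

#define BLOCKS_PER_GROUP 32768u
#define CHUNK_BLOCKS     512u
#define N_CHUNKS         (BLOCKS_PER_GROUP / CHUNK_BLOCKS)	/* 64 */

/*
 * Each cycle models "mount, write one small file, unmount": since the
 * previously used chunk is forgotten, the write lands in the first
 * chunk that is still completely free, which then stops being so.
 * Returns how many completely free chunks remain after `cycles`.
 */
unsigned free_chunks_after(unsigned cycles)
{
	unsigned char touched[N_CHUNKS] = {0};
	unsigned free_chunks = N_CHUNKS;

	for (unsigned i = 0; i < cycles; i++) {
		for (unsigned c = 0; c < N_CHUNKS; c++) {
			if (!touched[c]) {
				touched[c] = 1;
				free_chunks--;
				break;
			}
		}
	}
	assert(free_chunks <= N_CHUNKS);
	return free_chunks;
}
```

After 64 cycles the function returns 0: every chunk holds one small file followed by 511 free blocks, matching the fragmentation pattern the message describes.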
  6. 10 Aug, 2009 2 commits
    • T
      ext4: Fix bugs in mballoc's stream allocation mode · 4ba74d00
      Theodore Ts'o committed
      The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all
      screwed up.  These fields were getting set unconditionally all the time,
      even when stream allocation had not taken place, and they were being
      used even when the file was smaller than s_mb_stream_request, which
      is when the allocation should _not_ be doing stream allocation.
      
      Fix this by determining whether or not stream allocation should
      take place once, in ext4_mb_group_or_file(), and setting a flag which
      gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found().
      This simplifies the code and assures that we are consistently using
      (or not using) the stream allocation logic.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • T
      ext4: Display the mballoc flags in mb_history in hex instead of decimal · 0ef90db9
      Theodore Ts'o committed
      Displaying the flags in base 16 makes it easier to see which flags
      have been set.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  7. 19 Sep, 2009 1 commit
  8. 11 Aug, 2009 2 commits
  9. 28 Jul, 2009 1 commit
  10. 06 Jul, 2009 2 commits
  11. 17 Jul, 2009 1 commit
    • C
      ext4: More buffer head reference leaks · 6487a9d3
      Curt Wohlgemuth committed
      After the patch I posted last week regarding buffer head ref leaks in
      no-journal mode, I looked at all the code that uses buffer heads and
      searched for more potential leaks.
      
      The patch below fixes the issues I found; these can occur even when a
      journal is present.
      
      The change to inode.c fixes a double release if
      ext4_journal_get_create_access() fails.
      
      The changes to namei.c are more complicated.  add_dirent_to_buf() will
      release the input buffer head EXCEPT when it returns -ENOSPC.  There are
      some callers of this routine that don't always do the brelse() in the event
      that -ENOSPC is returned.  Unfortunately, putting this fix into ext4_add_entry()
      required capturing the return values of make_indexed_dir() and
      add_dirent_to_buf().
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  12. 28 Jul, 2009 2 commits
  13. 17 Jul, 2009 1 commit
  14. 14 Sep, 2009 1 commit
  15. 09 Sep, 2009 1 commit
  16. 13 Jul, 2009 4 commits
    • T
      ext4: Fix ext4_mb_initialize_context() to initialize all fields · 833576b3
      Theodore Ts'o committed
      Pavel Roskin pointed out that kmemcheck indicated that
      ext4_mb_store_history() was accessing uninitialized values of
      ac->ac_tail and ac->ac_buddy leading to garbage in the mballoc
      history.  Fix this by initializing the entire structure to all zeros
      first.
      
      Also, two fields were getting doubly initialized by the caller of
      ext4_mb_initialize_context, so remove them for efficiency's sake.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • P
      ext4: fix null handler of ioctls in no journal mode · ac046f1d
      Peng Tao committed
      The EXT4_IOC_GROUP_ADD and EXT4_IOC_GROUP_EXTEND ioctls should not
      flush the journal in no_journal mode.  Otherwise, running resize2fs on
      a mounted no_journal partition triggers the following error messages:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000014
      IP: [<c039d282>] _spin_lock+0x8/0x19
      *pde = 00000000 
      Oops: 0002 [#1] SMP
      Signed-off-by: Peng Tao <bergwolf@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • C
      ext4: Fix buffer head reference leak in no-journal mode · e6b5d301
      Curt Wohlgemuth committed
      We found a problem with buffer head reference leaks when using an ext4
      partition without a journal.  In particular, calls to ext4_forget() would
      not do a brelse() on the input buffer head, which causes the pages they
      belong to to be unreclaimable.
      
      Further investigation showed that all places where ext4_journal_forget() and
      ext4_journal_revoke() are called are subject to the same problem.  The patch
      below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit
      release of the buffer head when the journal handle isn't valid.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • A
      headers: smp_lock.h redux · 405f5571
      Alexey Dobriyan committed
      * Remove smp_lock.h from files which don't need it (including some headers!)
      * Add smp_lock.h to files which do need it
      * Make smp_lock.h include conditional in hardirq.h
        It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT
      
        This will make hardirq.h inclusion cheaper for every PREEMPT=n config
        (which includes allmodconfig/allyesconfig, BTW)
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 24 Jun, 2009 2 commits
  18. 19 Jun, 2009 1 commit
  19. 17 Jun, 2009 1 commit
    • T
      ext4: avoid unnecessary spinlock in critical POSIX ACL path · 210ad6ae
      Theodore Ts'o committed
      If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem
      to do POSIX ACL checks on any files not owned by the caller, and it does
      this for every single pathname component that it looks up.
      
      That obviously can be pretty expensive if the filesystem isn't careful
      about it, especially with locking. That's doubly sad, since the common
      case tends to be that there are no ACL's associated with the files in
      question.
      
      ext4 already caches the ACL data so that it doesn't have to look it up
      over and over again, but it does so by taking the inode->i_lock spinlock
      on every lookup. Which is a noticeable overhead even if it's a private
      lock, especially on CPU's where the serialization is expensive (eg Intel
      Netburst aka 'P4').
      
      For the special case of not actually having any ACL's, all that locking is
      unnecessary. Even if somebody else were to be changing the ACL's on
      another CPU, we simply don't care - if we've seen a NULL ACL, we might as
      well use it.
      
      So just load the ACL speculatively without any locking, and if it was
      NULL, just use it. If it's non-NULL (either because we had a cached
      entry, or because the cache hasn't been filled in at all), it means that
      we'll need to get the lock and re-load it properly.
      
      (This commit was ported from a patch originally authored by Linus for
      ext3.)
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
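The speculative load described in the message above can be sketched in user space with C11 atomics. This is an illustration of the pattern only, not the ext4 code: `struct inode_like`, `ACL_NOT_CACHED`, the spin lock helpers, and the `lock_acquisitions` counter are all hypothetical, and the reference counting a real implementation would need under the lock is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct acl { int dummy; };

/* Hypothetical sentinel meaning "the cache has not been filled in". */
#define ACL_NOT_CACHED ((struct acl *)-1)

struct inode_like {
	atomic_flag lock;		/* stand-in for a spinlock */
	_Atomic(struct acl *) acl;	/* NULL, ACL_NOT_CACHED, or cached ACL */
	unsigned lock_acquisitions;	/* instrumentation for this sketch */
};

static void inode_lock(struct inode_like *ino)
{
	while (atomic_flag_test_and_set_explicit(&ino->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
	ino->lock_acquisitions++;
}

static void inode_unlock(struct inode_like *ino)
{
	atomic_flag_clear_explicit(&ino->lock, memory_order_release);
}

struct acl *get_cached_acl(struct inode_like *ino)
{
	assert(ino != NULL);

	/* Speculative load without the lock: a NULL result is usable as-is. */
	struct acl *acl = atomic_load_explicit(&ino->acl, memory_order_acquire);
	if (acl == NULL)
		return NULL;

	/* Non-NULL: take the lock and re-load the pointer properly. */
	inode_lock(ino);
	acl = atomic_load_explicit(&ino->acl, memory_order_relaxed);
	inode_unlock(ino);
	return acl;
}
```

In the common no-ACL case the function returns without touching the lock; only a cached entry or the not-yet-filled sentinel pays for the locked re-load.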
  20. 15 Jun, 2009 6 commits
  21. 13 Jun, 2009 3 commits