1. 25 4月, 2009 3 次提交
    • T
      ext4: Do not try to validate extents on special files · c4b5a614
      Theodore Ts'o 提交于
      The EXTENTS_FL flag should never be set on special files, but if it
      is, don't bother trying to validate that the extents tree is valid,
      since only files, directories, and non-fast symlinks will ever have an
      extent data structure.  We perhaps should flag the filesystem as being
      corrupted if we see a special file (named pipes, device nodes, Unix
      domain sockets, etc.) with the EXTENTS_FL flag, but e2fsck doesn't
      currently check this case, so we'll just ignore this for now, since
      it's harmless.
      
      Without this fix, a special device with the extents flag is flagged as
      an error by the kernel, so it is impossible to access or delete the
      inode, but e2fsck doesn't see it as a problem, leading to
      confused/frustrated users.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c4b5a614
    • T
      ext4: Ignore i_file_acl_high unless EXT4_FEATURE_INCOMPAT_64BIT is present · a9e81742
      Theodore Ts'o 提交于
      Don't try to look at i_file_acl_high unless the INCOMPAT_64BIT feature
      bit is set.  The field is normally zero, but older versions of e2fsck
      didn't automatically check to make sure of this, so in the spirit of
      "be liberal in what you accept", don't look at i_file_acl_high unless
      we are using a 64-bit filesystem.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a9e81742
    • T
      ext4: Fix softlockup caused by illegal i_file_acl value in on-disk inode · 485c26ec
      Theodore Ts'o 提交于
      If the block containing external extended attributes (which is stored
      in i_file_acl and i_file_acl_high) is larger than the on-disk
      filesystem, the process which tried to access the extended attributes
      will endlessly issue kernel printks complaining that
      "__find_get_block_slow() failed", locking up that CPU until the system
      is forcibly rebooted.
      
      So when we read in the inode, make sure the i_file_acl value is legal,
      and if not, flag the filesystem as being corrupted.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      485c26ec
  2. 08 4月, 2009 1 次提交
  3. 01 4月, 2009 1 次提交
    • N
      mm: page_mkwrite change prototype to match fault · c2ec175c
      Nick Piggin 提交于
      Change the page_mkwrite prototype to take a struct vm_fault, and return
      VM_FAULT_xxx flags.  There should be no functional change.
      
      This makes it possible to return much more detailed error information to
      the VM (and also can provide more information eg.  virtual_address to the
      driver, which might be important in some special cases).
      
      This is required for a subsequent fix.  And will also make it easier to
      merge page_mkwrite() with fault() in future.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Artem Bityutskiy <dedekind@infradead.org>
      Cc: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c2ec175c
  4. 31 3月, 2009 1 次提交
  5. 26 3月, 2009 3 次提交
  6. 17 3月, 2009 1 次提交
    • T
      ext4: Add auto_da_alloc mount option · afd4672d
      Theodore Ts'o 提交于
      Add a mount option which allows the user to disable automatic
      allocation of blocks whose allocation by delayed allocation when the
      file was originally truncated or when the file is renamed over an
      existing file.  This feature is intended to save users from the
      effects of naive application writers, but it reduces the effectiveness
      of the delayed allocation code.  This mount option disables this
      safety feature, which may be desirable for prodcutions systems where
      the risk of unclean shutdowns or unexpected system crashes is low.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      afd4672d
  7. 31 3月, 2009 1 次提交
  8. 28 3月, 2009 1 次提交
  9. 26 2月, 2009 1 次提交
  10. 24 2月, 2009 1 次提交
    • T
      ext4: Automatically allocate delay allocated blocks on close · 7d8f9f7d
      Theodore Ts'o 提交于
      When closing a file that had been previously truncated, force any
      delay allocated blocks that to be allocated so that if the filesystem
      is mounted with data=ordered, the data blocks will be pushed out to
      disk along with the journal commit.  Many application programs expect
      this, so we do this to avoid zero length files if the system crashes
      unexpectedly.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7d8f9f7d
  11. 26 2月, 2009 1 次提交
    • T
      ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl · ccd2506b
      Theodore Ts'o 提交于
      Add an ioctl which forces all of the delay allocated blocks to be
      allocated.  This also provides a function ext4_alloc_da_blocks() which
      will be used by the following commits to force files to be fully
      allocated to preserve application-expected ext3 behaviour.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      ccd2506b
  12. 24 2月, 2009 1 次提交
  13. 23 2月, 2009 2 次提交
  14. 28 3月, 2009 1 次提交
  15. 13 3月, 2009 1 次提交
    • T
      ext4: New inode/block allocation algorithms for flex_bg filesystems · a4912123
      Theodore Ts'o 提交于
      The find_group_flex() inode allocator is now only used if the
      filesystem is mounted using the "oldalloc" mount option.  It is
      replaced with the original Orlov allocator that has been updated for
      flex_bg filesystems (it should behave the same way if flex_bg is
      disabled).  The inode allocator now functions by taking into account
      each flex_bg group, instead of each block group, when deciding whether
      or not it's time to allocate a new directory into a fresh flex_bg.
      
      The block allocator has also been changed so that the first block
      group in each flex_bg is preferred for use for storing directory
      blocks.  This keeps directory blocks close together, which is good for
      speeding up e2fsck since large directories are more likely to look
      like this:
      
      debugfs:  stat /home/tytso/Maildir/cur
      Inode: 1844562   Type: directory    Mode:  0700   Flags: 0x81000
      Generation: 1132745781    Version: 0x00000000:0000ad71
      User: 15806   Group: 15806   Size: 1060864
      File ACL: 0    Directory ACL: 0
      Links: 2   Blockcount: 2072
      Fragment:  Address: 0    Number: 0    Size: 0
       ctime: 0x499c0ff4:164961f4 -- Wed Feb 18 08:41:08 2009
       atime: 0x499c0ff4:00000000 -- Wed Feb 18 08:41:08 2009
       mtime: 0x49957f51:00000000 -- Fri Feb 13 09:10:25 2009
      crtime: 0x499c0f57:00d51440 -- Wed Feb 18 08:38:31 2009
      Size of extra inode fields: 28
      BLOCKS:
      (0):7348651, (1-258):7348654-7348911
      TOTAL: 259
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a4912123
  16. 23 2月, 2009 1 次提交
  17. 14 2月, 2009 1 次提交
  18. 11 2月, 2009 1 次提交
    • J
      jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() · 7f5aa215
      Jan Kara 提交于
      If we race with commit code setting i_transaction to NULL, we could
      possibly dereference it.  Proper locking requires the journal pointer
      (to access journal->j_list_lock), which we don't have.  So we have to
      change the prototype of the function so that filesystem passes us the
      journal pointer.  Also add a more detailed comment about why the
      function jbd2_journal_begin_ordered_truncate() does what it does and
      how it should be used.
      
      Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
      suspitious code.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Acked-by: NJoel Becker <joel.becker@oracle.com>
      CC: linux-ext4@vger.kernel.org
      CC: ocfs2-devel@oss.oracle.com
      CC: mfasheh@suse.de
      CC: Dan Carpenter <error27@gmail.com>
      7f5aa215
  19. 30 1月, 2009 1 次提交
    • T
      ext4: Remove bogus BUG() check in ext4_bmap() · b9ec63f7
      Theodore Ts'o 提交于
      The code to support journal-less ext4 operation added a BUG to
      ext4_bmap() which fired if there was no journal and the
      EXT4_STATE_JDATA bit was set in the i_state field.  This caused
      running the filefrag program (which uses the FIMBAP ioctl) to trigger
      a BUG().
      
      The EXT4_STATE_JDATA bit is only used for ext4_bmap(), and it's
      harmless for the bit to be set.  We could add a check in
      __ext4_journalled_writepage() and ext4_journalled_write_end() to only
      set the EXT4_STATE_JDATA bit if the journal is present, but that adds
      an extra test and jump instruction.  It's easier to simply remove the
      BUG check.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12568Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      b9ec63f7
  20. 20 1月, 2009 1 次提交
  21. 18 1月, 2009 1 次提交
    • T
      ext4: only use i_size_high for regular files · 06a279d6
      Theodore Ts'o 提交于
      Directories are not allowed to be bigger than 2GB, so don't use
      i_size_high for anything other than regular files.  E2fsck should
      complain about these inodes, but the simplest thing to do for the
      kernel is to only use i_size_high for regular files.
      
      This prevents an intentially corrupted filesystem from causing the
      kernel to burn a huge amount of CPU and issuing error messages such
      as:
      
      EXT4-fs warning (device loop0): ext4_block_to_path: block 135090028 > max
      
      Thanks to David Maciejak from Fortinet's FortiGuard Global Security
      Research Team for reporting this issue.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12375Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      06a279d6
  22. 07 1月, 2009 1 次提交
  23. 05 1月, 2009 1 次提交
    • N
      fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin 提交于
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in ints write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54566b2c
  24. 04 1月, 2009 1 次提交
  25. 06 1月, 2009 1 次提交
  26. 01 1月, 2009 1 次提交
  27. 07 11月, 2008 1 次提交
  28. 23 11月, 2008 1 次提交
  29. 07 11月, 2008 1 次提交
    • T
      ext4: calculate journal credits correctly · ac51d837
      Theodore Ts'o 提交于
      This fixes a 2.6.27 regression which was introduced in commit a02908f1.
      
      We weren't passing the chunk parameter down to the two subections,
      ext4_indirect_trans_blocks() and ext4_ext_index_trans_blocks(), with
      the result that massively overestimate the amount of credits needed by
      ext4_da_writepages, especially in the non-extents case.  This causes
      failures especially on /boot partitions, which tend to be small and
      non-extent using since GRUB doesn't handle extents.
      
      This patch fixes the bug reported by Joseph Fannin at:
      http://bugzilla.kernel.org/show_bug.cgi?id=11964Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      ac51d837
  30. 05 11月, 2008 1 次提交
    • T
      ext4: Change unsigned long to unsigned int · 498e5f24
      Theodore Ts'o 提交于
      Convert the unsigned longs that are most responsible for bloating the
      stack usage on 64-bit systems.
      
      Nearly all places in the ext3/4 code which uses "unsigned long" is
      probably a bug, since on 32-bit systems a ulong a 32-bits, which means
      we are wasting stack space on 64-bit systems.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      498e5f24
  31. 07 1月, 2009 1 次提交
    • F
      ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar 提交于
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which create and do reads and writes on
      a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
      to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: NFrank Mayhar <fmayhar@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0390131b
  32. 06 1月, 2009 1 次提交
  33. 05 11月, 2008 1 次提交
    • T
      ext4: tone down ext4_da_writepages warnings · 2a21e37e
      Theodore Ts'o 提交于
      If the filesystem has errors, ext4_da_writepages() will return a *lot*
      of errors, including lots and lots of stack dumps.  While it's true
      that we are dropping user data on the floor, which is unfortunate, the
      stack dumps aren't helpful, and they tend to obscure the true original
      root cause of the problem.  So in the case where the filesystem has
      aborted, return an EROFS right away.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2a21e37e
  34. 02 1月, 2009 1 次提交
  35. 30 10月, 2008 1 次提交