1. 14 5月, 2009 2 次提交
  2. 12 5月, 2009 2 次提交
  3. 01 5月, 2009 1 次提交
    • T
      ext4: Avoid races caused by on-line resizing and SMP memory reordering · 8df9675f
      Theodore Ts'o 提交于
      Ext4's on-line resizing adds a new block group and then, only at the
      last step adjusts s_groups_count.  However, it's possible on SMP
      systems that another CPU could see the updated the s_group_count and
      not see the newly initialized data structures for the just-added block
      group.  For this reason, it's important to insert a SMP read barrier
      after reading s_groups_count and before reading any (for example) the
      new block group descriptors allowed by the increased value of
      s_groups_count.
      
      Unfortunately, we rather blatently violate this locking protocol
      documented in fs/ext4/resize.c.  Fortunately, (1) on-line resizes
      happen relatively rarely, and (2) it seems rare that the filesystem
      code will immediately try to use just-added block group before any
      memory ordering issues resolve themselves.  So apparently problems
      here are relatively hard to hit, since ext3 has been vulnerable to the
      same issue for years with no one apparently complaining.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      8df9675f
  4. 13 5月, 2009 1 次提交
    • A
      ext4: Mark the unwritten buffer_head as mapped during write_begin · 29fa89d0
      Aneesh Kumar K.V 提交于
      Setting BH_Unwritten buffer_heads as BH_Mapped avoids multiple
      (unnecessary) calls to get_block() during the call to the write(2)
      system call.  Setting BH_Unwritten buffer heads as BH_Mapped requires
      that the writepages() functions can handle BH_Unwritten buffer_heads.
      
      After this commit, things work as follows:
      
      ext4_ext_get_block() returns unmapped, unwritten, buffer head when
      called with create = 0 for prealloc space. This makes sure we handle
      the read path and non-delayed allocation case correctly.  Even though
      the buffer head is marked unmapped we have valid b_blocknr and b_bdev
      values in the buffer_head.
      
      ext4_da_get_block_prep() called for block resrevation will now return
      mapped, unwritten, new buffer_head for prealloc space. This avoids
      multiple calls to get_block() for write to same offset. By making such
      buffers as BH_New, we also assure that sub-block zeroing of buffered
      writes happens correctly.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      29fa89d0
  5. 14 5月, 2009 1 次提交
  6. 15 5月, 2009 1 次提交
    • A
      ext4: Clear the unwritten buffer_head flag after the extent is initialized · 2a8964d6
      Aneesh Kumar K.V 提交于
      The BH_Unwritten flag indicates that the buffer is allocated on disk
      but has not been written; that is, the disk was part of a persistent
      preallocation area.  That flag should only be set when a get_blocks()
      function is looking up a inode's logical to physical block mapping.
      
      When ext4_get_blocks_wrap() is called with create=1, the uninitialized
      extent is converted into an initialized one, so the BH_Unwritten flag
      is no longer appropriate.  Hence, we need to make sure the
      BH_Unwritten is not left set, since the combination of BH_Mapped and
      BH_Unwritten is not allowed; among other things, it will result ext4's
      get_block() to be called over and over again during the write_begin
      phase of write(2).
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2a8964d6
  7. 13 5月, 2009 1 次提交
  8. 14 5月, 2009 1 次提交
  9. 25 4月, 2009 3 次提交
    • T
      ext4: Do not try to validate extents on special files · c4b5a614
      Theodore Ts'o 提交于
      The EXTENTS_FL flag should never be set on special files, but if it
      is, don't bother trying to validate that the extents tree is valid,
      since only files, directories, and non-fast symlinks will ever have an
      extent data structure.  We perhaps should flag the filesystem as being
      corrupted if we see a special file (named pipes, device nodes, Unix
      domain sockets, etc.) with the EXTENTS_FL flag, but e2fsck doesn't
      currently check this case, so we'll just ignore this for now, since
      it's harmless.
      
      Without this fix, a special device with the extents flag is flagged as
      an error by the kernel, so it is impossible to access or delete the
      inode, but e2fsck doesn't see it as a problem, leading to
      confused/frustrated users.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c4b5a614
    • T
      ext4: Ignore i_file_acl_high unless EXT4_FEATURE_INCOMPAT_64BIT is present · a9e81742
      Theodore Ts'o 提交于
      Don't try to look at i_file_acl_high unless the INCOMPAT_64BIT feature
      bit is set.  The field is normally zero, but older versions of e2fsck
      didn't automatically check to make sure of this, so in the spirit of
      "be liberal in what you accept", don't look at i_file_acl_high unless
      we are using a 64-bit filesystem.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a9e81742
    • T
      ext4: Fix softlockup caused by illegal i_file_acl value in on-disk inode · 485c26ec
      Theodore Ts'o 提交于
      If the block containing external extended attributes (which is stored
      in i_file_acl and i_file_acl_high) is larger than the on-disk
      filesystem, the process which tried to access the extended attributes
      will endlessly issue kernel printks complaining that
      "__find_get_block_slow() failed", locking up that CPU until the system
      is forcibly rebooted.
      
      So when we read in the inode, make sure the i_file_acl value is legal,
      and if not, flag the filesystem as being corrupted.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      485c26ec
  10. 08 4月, 2009 1 次提交
  11. 01 4月, 2009 1 次提交
    • N
      mm: page_mkwrite change prototype to match fault · c2ec175c
      Nick Piggin 提交于
      Change the page_mkwrite prototype to take a struct vm_fault, and return
      VM_FAULT_xxx flags.  There should be no functional change.
      
      This makes it possible to return much more detailed error information to
      the VM (and also can provide more information eg.  virtual_address to the
      driver, which might be important in some special cases).
      
      This is required for a subsequent fix.  And will also make it easier to
      merge page_mkwrite() with fault() in future.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Artem Bityutskiy <dedekind@infradead.org>
      Cc: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c2ec175c
  12. 31 3月, 2009 1 次提交
  13. 26 3月, 2009 3 次提交
  14. 17 3月, 2009 1 次提交
    • T
      ext4: Add auto_da_alloc mount option · afd4672d
      Theodore Ts'o 提交于
      Add a mount option which allows the user to disable automatic
      allocation of blocks whose allocation by delayed allocation when the
      file was originally truncated or when the file is renamed over an
      existing file.  This feature is intended to save users from the
      effects of naive application writers, but it reduces the effectiveness
      of the delayed allocation code.  This mount option disables this
      safety feature, which may be desirable for prodcutions systems where
      the risk of unclean shutdowns or unexpected system crashes is low.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      afd4672d
  15. 31 3月, 2009 1 次提交
  16. 28 3月, 2009 1 次提交
  17. 26 2月, 2009 1 次提交
  18. 24 2月, 2009 1 次提交
    • T
      ext4: Automatically allocate delay allocated blocks on close · 7d8f9f7d
      Theodore Ts'o 提交于
      When closing a file that had been previously truncated, force any
      delay allocated blocks that to be allocated so that if the filesystem
      is mounted with data=ordered, the data blocks will be pushed out to
      disk along with the journal commit.  Many application programs expect
      this, so we do this to avoid zero length files if the system crashes
      unexpectedly.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7d8f9f7d
  19. 26 2月, 2009 1 次提交
    • T
      ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl · ccd2506b
      Theodore Ts'o 提交于
      Add an ioctl which forces all of the delay allocated blocks to be
      allocated.  This also provides a function ext4_alloc_da_blocks() which
      will be used by the following commits to force files to be fully
      allocated to preserve application-expected ext3 behaviour.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      ccd2506b
  20. 24 2月, 2009 1 次提交
  21. 23 2月, 2009 2 次提交
  22. 28 3月, 2009 1 次提交
  23. 13 3月, 2009 1 次提交
    • T
      ext4: New inode/block allocation algorithms for flex_bg filesystems · a4912123
      Theodore Ts'o 提交于
      The find_group_flex() inode allocator is now only used if the
      filesystem is mounted using the "oldalloc" mount option.  It is
      replaced with the original Orlov allocator that has been updated for
      flex_bg filesystems (it should behave the same way if flex_bg is
      disabled).  The inode allocator now functions by taking into account
      each flex_bg group, instead of each block group, when deciding whether
      or not it's time to allocate a new directory into a fresh flex_bg.
      
      The block allocator has also been changed so that the first block
      group in each flex_bg is preferred for use for storing directory
      blocks.  This keeps directory blocks close together, which is good for
      speeding up e2fsck since large directories are more likely to look
      like this:
      
      debugfs:  stat /home/tytso/Maildir/cur
      Inode: 1844562   Type: directory    Mode:  0700   Flags: 0x81000
      Generation: 1132745781    Version: 0x00000000:0000ad71
      User: 15806   Group: 15806   Size: 1060864
      File ACL: 0    Directory ACL: 0
      Links: 2   Blockcount: 2072
      Fragment:  Address: 0    Number: 0    Size: 0
       ctime: 0x499c0ff4:164961f4 -- Wed Feb 18 08:41:08 2009
       atime: 0x499c0ff4:00000000 -- Wed Feb 18 08:41:08 2009
       mtime: 0x49957f51:00000000 -- Fri Feb 13 09:10:25 2009
      crtime: 0x499c0f57:00d51440 -- Wed Feb 18 08:38:31 2009
      Size of extra inode fields: 28
      BLOCKS:
      (0):7348651, (1-258):7348654-7348911
      TOTAL: 259
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a4912123
  24. 23 2月, 2009 1 次提交
  25. 14 2月, 2009 1 次提交
  26. 11 2月, 2009 1 次提交
    • J
      jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() · 7f5aa215
      Jan Kara 提交于
      If we race with commit code setting i_transaction to NULL, we could
      possibly dereference it.  Proper locking requires the journal pointer
      (to access journal->j_list_lock), which we don't have.  So we have to
      change the prototype of the function so that filesystem passes us the
      journal pointer.  Also add a more detailed comment about why the
      function jbd2_journal_begin_ordered_truncate() does what it does and
      how it should be used.
      
      Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
      suspitious code.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Acked-by: NJoel Becker <joel.becker@oracle.com>
      CC: linux-ext4@vger.kernel.org
      CC: ocfs2-devel@oss.oracle.com
      CC: mfasheh@suse.de
      CC: Dan Carpenter <error27@gmail.com>
      7f5aa215
  27. 30 1月, 2009 1 次提交
    • T
      ext4: Remove bogus BUG() check in ext4_bmap() · b9ec63f7
      Theodore Ts'o 提交于
      The code to support journal-less ext4 operation added a BUG to
      ext4_bmap() which fired if there was no journal and the
      EXT4_STATE_JDATA bit was set in the i_state field.  This caused
      running the filefrag program (which uses the FIMBAP ioctl) to trigger
      a BUG().
      
      The EXT4_STATE_JDATA bit is only used for ext4_bmap(), and it's
      harmless for the bit to be set.  We could add a check in
      __ext4_journalled_writepage() and ext4_journalled_write_end() to only
      set the EXT4_STATE_JDATA bit if the journal is present, but that adds
      an extra test and jump instruction.  It's easier to simply remove the
      BUG check.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12568Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      b9ec63f7
  28. 20 1月, 2009 1 次提交
  29. 18 1月, 2009 1 次提交
    • T
      ext4: only use i_size_high for regular files · 06a279d6
      Theodore Ts'o 提交于
      Directories are not allowed to be bigger than 2GB, so don't use
      i_size_high for anything other than regular files.  E2fsck should
      complain about these inodes, but the simplest thing to do for the
      kernel is to only use i_size_high for regular files.
      
      This prevents an intentially corrupted filesystem from causing the
      kernel to burn a huge amount of CPU and issuing error messages such
      as:
      
      EXT4-fs warning (device loop0): ext4_block_to_path: block 135090028 > max
      
      Thanks to David Maciejak from Fortinet's FortiGuard Global Security
      Research Team for reporting this issue.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12375Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      06a279d6
  30. 07 1月, 2009 1 次提交
  31. 05 1月, 2009 1 次提交
    • N
      fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin 提交于
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in ints write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54566b2c
  32. 04 1月, 2009 1 次提交
  33. 06 1月, 2009 1 次提交