1. 05 Mar, 2009 1 commit
  2. 31 Mar, 2009 2 commits
  3. 01 Mar, 2009 1 commit
  4. 28 Mar, 2009 1 commit
  5. 24 Feb, 2009 2 commits
    • ext4: Automatically allocate delay allocated blocks on rename · 8750c6d5
      Theodore Ts'o authored
      When renaming a file such that a link to another inode is overwritten,
      force any delay allocated blocks to be allocated so that if the
      filesystem is mounted with data=ordered, the data blocks will be
      pushed out to disk along with the journal commit.  Many application
      programs expect this, so we do this to avoid zero length files if the
      system crashes unexpectedly.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      8750c6d5
    • ext4: Automatically allocate delay allocated blocks on close · 7d8f9f7d
      Theodore Ts'o authored
      When closing a file that had been previously truncated, force any
      delay allocated blocks to be allocated so that if the filesystem
      is mounted with data=ordered, the data blocks will be pushed out to
      disk along with the journal commit.  Many application programs expect
      this, so we do this to avoid zero length files if the system crashes
      unexpectedly.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      7d8f9f7d
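The two commits above make the kernel cope with applications that replace a file by truncating it or by renaming a new file over it without flushing first. An application that does not want to depend on this behaviour can flush explicitly before the rename. The following is a minimal userspace sketch of that pattern, not code from either commit; the helper name `replace_file_durably` and the `.tmp` suffix are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace `path` with `len` bytes of `data`.
 * The data is written to "<path>.tmp", flushed with fsync(), and only
 * then renamed over `path`. Returns 0 on success, -1 on failure.
 */
int replace_file_durably(const char *path, const char *data, size_t len)
{
    assert(path != NULL && data != NULL);

    char tmp[4096];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is written. */
    size_t off = 0;
    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        off += (size_t)w;
    }

    /* Force the data blocks to disk before the rename makes them visible. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

A caller would use it as `replace_file_durably("config.txt", buf, buf_len)`. The sketch does not retry on `EINTR` and does not fsync the containing directory, both of which a production version might want.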
  6. 26 Feb, 2009 1 commit
    • ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl · ccd2506b
      Theodore Ts'o authored
      Add an ioctl which forces all of the delay allocated blocks to be
      allocated.  This also provides a function ext4_alloc_da_blocks() which
      will be used by the following commits to force files to be fully
      allocated to preserve application-expected ext3 behaviour.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      ccd2506b
  7. 24 Feb, 2009 1 commit
  8. 23 Feb, 2009 2 commits
  9. 28 Mar, 2009 1 commit
  10. 12 Mar, 2009 1 commit
  11. 23 Feb, 2009 1 commit
    • ext4: return -EIO not -ESTALE on directory traversal through deleted inode · e6f009b0
      Bryan Donlan authored
      ext4_iget() returns -ESTALE if invoked on a deleted inode, in order to
      report errors to NFS properly.  However, in ext4_lookup(), this
      -ESTALE can be propagated to userspace if the filesystem is corrupted
      such that a directory entry references a deleted inode.  This leads to
      a misleading error message - "Stale NFS file handle" - and confusion
      on the part of the admin.
      
      The bug can be easily reproduced by creating a new filesystem, making
      a link to an unused inode using debugfs, then mounting and attempting
      to ls -l said link.
      
      This patch thus changes ext4_lookup to return -EIO if it receives
      -ESTALE from ext4_iget(), as ext4 does for other filesystem metadata
      corruption; and also invokes the appropriate ext*_error functions when
      this case is detected.
      Signed-off-by: Bryan Donlan <bdonlan@gmail.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      e6f009b0
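The error translation described in the commit above can be sketched in isolation. This is a userspace illustration only, not the kernel patch: the function name `lookup_error_for_userspace` is hypothetical, and the real change also reports the corruption through the ext4 error functions, which is omitted here.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for the error handling in a lookup path.
 * `iget_err` plays the role of the result of ext4_iget(): 0 on success
 * or a negative errno value. -ESTALE is meaningful to NFS, but when it
 * arises from a directory entry pointing at a deleted inode it really
 * indicates on-disk corruption, so it is reported as -EIO instead.
 */
int lookup_error_for_userspace(int iget_err)
{
    assert(iget_err <= 0);

    if (iget_err == -ESTALE)
        return -EIO;
    return iget_err; /* success and all other errors pass through */
}
```

With this mapping, `ls -l` on the corrupted link would report an I/O error rather than "Stale NFS file handle".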
  12. 13 Mar, 2009 1 commit
    • ext4: New inode/block allocation algorithms for flex_bg filesystems · a4912123
      Theodore Ts'o authored
      The find_group_flex() inode allocator is now only used if the
      filesystem is mounted using the "oldalloc" mount option.  It is
      replaced with the original Orlov allocator that has been updated for
      flex_bg filesystems (it should behave the same way if flex_bg is
      disabled).  The inode allocator now functions by taking into account
      each flex_bg group, instead of each block group, when deciding whether
      or not it's time to allocate a new directory into a fresh flex_bg.
      
      The block allocator has also been changed so that the first block
      group in each flex_bg is preferred for use for storing directory
      blocks.  This keeps directory blocks close together, which is good for
      speeding up e2fsck since large directories are more likely to look
      like this:
      
      debugfs:  stat /home/tytso/Maildir/cur
      Inode: 1844562   Type: directory    Mode:  0700   Flags: 0x81000
      Generation: 1132745781    Version: 0x00000000:0000ad71
      User: 15806   Group: 15806   Size: 1060864
      File ACL: 0    Directory ACL: 0
      Links: 2   Blockcount: 2072
      Fragment:  Address: 0    Number: 0    Size: 0
       ctime: 0x499c0ff4:164961f4 -- Wed Feb 18 08:41:08 2009
       atime: 0x499c0ff4:00000000 -- Wed Feb 18 08:41:08 2009
       mtime: 0x49957f51:00000000 -- Fri Feb 13 09:10:25 2009
      crtime: 0x499c0f57:00d51440 -- Wed Feb 18 08:38:31 2009
      Size of extra inode fields: 28
      BLOCKS:
      (0):7348651, (1-258):7348654-7348911
      TOTAL: 259
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      a4912123
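The commit above reasons in terms of flex_bg groups rather than individual block groups. The mapping between the two is a simple shift, sketched below under the assumption that a flex_bg contains a power-of-two number of block groups, expressed as a log2 value. The function names and the example value of 16 groups per flex_bg are illustrative and not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index of the flex_bg that contains `block_group`, where each flex_bg
 * holds (1 << log_groups_per_flex) consecutive block groups.
 */
uint32_t flex_group_of(uint32_t block_group, unsigned log_groups_per_flex)
{
    assert(log_groups_per_flex < 32);
    return block_group >> log_groups_per_flex;
}

/*
 * First block group of flex_bg number `flex_group`. According to the
 * commit message, this is the group preferred for directory blocks.
 */
uint32_t first_group_in_flex(uint32_t flex_group, unsigned log_groups_per_flex)
{
    assert(log_groups_per_flex < 32);
    return flex_group << log_groups_per_flex;
}
```

For example, with 16 block groups per flex_bg (log2 value 4), block group 35 falls in flex_bg 2, whose first block group is 32.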
  13. 16 Feb, 2009 3 commits
  14. 15 Feb, 2009 2 commits
  15. 07 Feb, 2009 2 commits
  16. 26 Mar, 2009 3 commits
  17. 17 Mar, 2009 1 commit
    • ext4: fix bb_prealloc_list corruption due to wrong group locking · d33a1976
      Eric Sandeen authored
      This is for Red Hat bug 490026: EXT4 panic, list corruption in
      ext4_mb_new_inode_pa
      
      ext4_lock_group(sb, group) is supposed to protect this list for
      each group, and a common code flow to remove a pa is like
      this:
      
          ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
          ext4_lock_group(sb, grp);
          list_del(&pa->pa_group_list);
          ext4_unlock_group(sb, grp);
      
      so it's critical that we get the right group number back for
      this prealloc context, to lock the right group (the one 
      associated with this pa) and prevent concurrent list manipulation.
      
      However, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a
      comment, "-1 is to protect from crossing allocation group".
      
      This makes sense for the group_pa, where pa_pstart is advanced
      by the length which has been used (in ext4_mb_release_context()),
      and when the entire length has been used, pa_pstart has been
      advanced to the first block of the next group.
      
      However, for inode_pa, pa_pstart is never advanced; it's just
      set once to the first block in the group and not moved after
      that.  So in this case, if we subtract one in ext4_mb_put_pa(),
      we are actually locking the *previous* group, and opening the
      race with the other threads which do not subtract off the extra
      block.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      d33a1976
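The off-by-one described in the commit above is easy to see with plain arithmetic. The sketch below is a simplified model, not the kernel code: it assumes 32768 blocks per group, ignores the first-data-block offset, and uses hypothetical names (`group_of_block`, `group_to_lock`). It shows one way to express the distinction the message draws between group and inode preallocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCKS_PER_GROUP 32768u /* assumed value, for illustration only */

/* Simplified block-to-group mapping (first data block assumed to be 0). */
uint32_t group_of_block(uint64_t block)
{
    return (uint32_t)(block / BLOCKS_PER_GROUP);
}

/*
 * Which group should be locked when releasing a preallocation?
 * - group pa: pa_pstart advances as blocks are used and may end up on
 *   the first block of the next group, so step back by one block.
 * - inode pa: pa_pstart never moves, so use it unchanged; subtracting
 *   one here would select the previous group.
 */
uint32_t group_to_lock(uint64_t pa_pstart, bool is_group_pa)
{
    assert(!is_group_pa || pa_pstart > 0);

    uint64_t blk = is_group_pa ? pa_pstart - 1 : pa_pstart;
    return group_of_block(blk);
}
```

With these numbers, block 98304 is the first block of group 3, and block 98303 still belongs to group 2, which is exactly the wrong group an inode pa would lock if one block were subtracted.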
  18. 14 Mar, 2009 1 commit
    • ext4: fix bogus BUG_ONs in mballoc code · 8d03c7a0
      Eric Sandeen authored
      Thiemo Nagel reported that:
      
      # dd if=/dev/zero of=image.ext4 bs=1M count=2
      # mkfs.ext4 -v -F -b 1024 -m 0 -g 512 -G 4 -I 128 -N 1 \
        -O large_file,dir_index,flex_bg,extent,sparse_super image.ext4
      # mount -o loop image.ext4 mnt/
      # dd if=/dev/zero of=mnt/file
      
      oopsed, with a BUG_ON in ext4_mb_normalize_request because
      size == EXT4_BLOCKS_PER_GROUP
      
      It appears to me (esp. after talking to Andreas) that the BUG_ON
      is bogus; a request of exactly EXT4_BLOCKS_PER_GROUP should
      be allowed, though larger sizes do indicate a problem.
      
      Fix that and another (apparently rare) codepath with a similar check.
      Reported-by: Thiemo Nagel <thiemo.nagel@ph.tum.de>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      8d03c7a0
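The boundary condition discussed in the commit above can be stated as a one-line predicate. This is an illustrative sketch with a hypothetical function name, not the kernel check itself: a request of exactly one group's worth of blocks is accepted, and only larger requests are rejected.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if an allocation request of `size` blocks is acceptable
 * for a filesystem with `blocks_per_group` blocks in each group.
 * The comparison is <=, not <: size == blocks_per_group is legal.
 */
bool request_size_ok(unsigned size, unsigned blocks_per_group)
{
    assert(blocks_per_group > 0);
    return size <= blocks_per_group;
}
```

For the reproducer in the report, which creates the filesystem with `-g 512`, a request of 512 blocks passes this check while 513 does not.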
  19. 13 Mar, 2009 1 commit
  20. 11 Mar, 2009 1 commit
    • ext4: fix header check in ext4_ext_search_right() for deep extent trees. · 395a87bf
      Eric Sandeen authored
      The ext4_ext_search_right() function is confusing; it uses a
      "depth" variable which is 0 at the root and maximum at the leaves, 
      but the on-disk metadata uses a "depth" (actually eh_depth) which
      is opposite: maximum at the root, and 0 at the leaves.
      
      The ext4_ext_check_header() function is given a depth and checks
      the header against that depth; it expects the on-disk semantics,
      but we are giving it the opposite in the while loop in this 
      function.  We should be giving it the on-disk notion of "depth"
      which we can get from (p_depth - depth) - and if you look, the last
      (more commonly hit) call to ext4_ext_check_header() does just this.
      
      Sending in the wrong depth results in (incorrect) messages
      about corruption:
      
      EXT4-fs error (device sdb1): ext4_ext_search_right: bad header
      in inode #2621457: unexpected eh_depth - magic f30a, entries 340,
      max 340(0), depth 1(2)
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12821
      Reported-by: David Dindorp <ddi@dubex.dk>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      395a87bf
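The two depth conventions in the commit above differ only by a subtraction. The following sketch shows the conversion the message names, `(p_depth - depth)`, as a standalone function; the function name and the example tree depth of 3 are illustrative assumptions.

```c
#include <assert.h>

/*
 * Convert a level counted from the root (0 at the root, p_depth at the
 * leaves) into the on-disk eh_depth convention (p_depth at the root,
 * 0 at the leaves).
 */
int ondisk_depth(int p_depth, int depth)
{
    assert(p_depth >= 0 && depth >= 0 && depth <= p_depth);
    return p_depth - depth;
}
```

In a tree with `p_depth` of 3, the node two levels below the root has an on-disk `eh_depth` of 1. Passing the in-memory value 2 to a checker that expects the on-disk value would produce a spurious mismatch of the kind quoted in the error message.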
  21. 05 Mar, 2009 1 commit
    • ext4: fix ext4_free_inode() vs. ext4_claim_inode() race · 7ce9d5d1
      Eric Sandeen authored
      I was seeing fsck errors on inode bitmaps after a 4 thread
      dbench run on a 4 cpu machine:
      
      Inode bitmap differences: -50736 -(50752--50753) etc...
      
      I believe that this is because ext4_free_inode() uses atomic
      bitops, and although ext4_new_inode() *used* to also use atomic
      bitops for synchronization, commit 39341867 changed this to use
      the sb_bgl_lock, so that we could also synchronize against
      read_inode_bitmap and initialization of uninit inode tables.
      
      However, that change left ext4_free_inode using atomic bitops,
      which I think leaves no synchronization between setting & 
      unsetting bits in the inode table.
      
      The below patch fixes it for me, although I wonder if we're 
      getting at all heavy-handed with this spinlock...
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      7ce9d5d1
  22. 26 Feb, 2009 1 commit
  23. 28 Feb, 2009 1 commit
  24. 23 Feb, 2009 1 commit
  25. 22 Feb, 2009 1 commit
    • ext4: Add fallback for find_group_flex · 05bf9e83
      Theodore Ts'o authored
      This is a workaround for find_group_flex() which badly needs to be
      replaced.  One of its problems (besides ignoring the Orlov algorithm)
      is that it is a bit hyperactive about returning failure under
      suspicious circumstances.  This can lead to spurious ENOSPC failures
      even when there are inodes still available.
      
      Work around this for now by retrying the search using
      find_group_other() if find_group_flex() returns -1.  If
      find_group_other() succeeds when find_group_flex() has failed, log a
      warning message.
      
      A better block/inode allocator that will fix this problem for real has
      been queued up for the next merge window.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      05bf9e83
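The retry logic described in the commit above follows a common "try primary, fall back, warn" shape. The sketch below models it with function pointers in userspace. All names are hypothetical, the stub finders exist only so the example is self-contained, and the real kernel functions take filesystem arguments that are omitted here.

```c
#include <assert.h>
#include <stdio.h>

/* A finder returns a group number, or -1 if it found nothing. */
typedef int (*group_finder)(void);

/*
 * Try `primary` first. If it reports failure (-1), retry with
 * `fallback`, and log a warning when the fallback succeeds where the
 * primary did not.
 */
int find_group_with_fallback(group_finder primary, group_finder fallback)
{
    assert(primary != NULL && fallback != NULL);

    int group = primary();
    if (group != -1)
        return group;

    group = fallback();
    if (group != -1)
        fprintf(stderr,
                "warning: primary finder failed, fallback found group %d\n",
                group);
    return group;
}

/* Stub finders, purely for illustration. */
int finder_fails(void) { return -1; }
int finder_returns_7(void) { return 7; }
```

If both finders fail, the caller still sees -1 and can report the out-of-space condition, so the fallback only removes the spurious failures.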
  26. 16 Feb, 2009 1 commit
  27. 14 Feb, 2009 2 commits
  28. 11 Feb, 2009 1 commit
  29. 10 Feb, 2009 1 commit
    • ext4: Fix to read empty directory blocks correctly in 64k · 7be2baaa
      Wei Yongjun authored
      The rec_len field in the directory entry is 16 bits, so there was a
      problem representing rec_len for filesystems with a 64k block size in
      the case where the directory entry takes the entire 64k block.
      Unfortunately, there were two schemes that were proposed; one where
      all zeros meant 65536 and one where all ones (65535) meant 65536.
      E2fsprogs used 0, whereas the kernel used 65535.  Oops.  Fortunately
      this case happens extremely rarely, with the most common case being
      the lost+found directory, created by mke2fs.
      
      So we will be liberal in what we accept, and accept both encodings,
      but we will continue to encode 65536 as 65535.  This will require a
      change in e2fsprogs, but fortunately ext4 filesystems normally
      have the dir_index feature enabled, which precludes having a
      completely empty directory block.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      7be2baaa
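The "accept both encodings, emit one" rule from the commit above can be sketched as a pair of conversion functions. This is a userspace illustration with hypothetical names; it ignores the little-endian conversion that real on-disk fields need.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REC_LEN 65535u /* largest value a 16-bit rec_len can hold */

/*
 * Decode an on-disk rec_len. Both 0 (the e2fsprogs scheme) and 65535
 * (the kernel scheme) are accepted as meaning 65536.
 */
unsigned rec_len_from_disk(uint16_t dlen)
{
    if (dlen == MAX_REC_LEN || dlen == 0)
        return 1u << 16;
    return dlen;
}

/* Encode a rec_len for disk. 65536 is always written as 65535. */
uint16_t rec_len_to_disk(unsigned len)
{
    assert(len <= (1u << 16));

    if (len == (1u << 16))
        return MAX_REC_LEN;
    return (uint16_t)len;
}
```

Decoding is liberal and encoding is strict, so a value written by either tool is read back as 65536.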
  30. 11 Feb, 2009 1 commit
    • jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() · 7f5aa215
      Jan Kara authored
      If we race with commit code setting i_transaction to NULL, we could
      possibly dereference it.  Proper locking requires the journal pointer
      (to access journal->j_list_lock), which we don't have.  So we have to
      change the prototype of the function so that filesystem passes us the
      journal pointer.  Also add a more detailed comment about why the
      function jbd2_journal_begin_ordered_truncate() does what it does and
      how it should be used.
      
      Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
      suspicious code.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Acked-by: Joel Becker <joel.becker@oracle.com>
      CC: linux-ext4@vger.kernel.org
      CC: ocfs2-devel@oss.oracle.com
      CC: mfasheh@suse.de
      CC: Dan Carpenter <error27@gmail.com>
      7f5aa215