1. 10 2月, 2009 1 次提交
    • W
      ext4: Fix to read empty directory blocks correctly in 64k · 7be2baaa
      Wei Yongjun 提交于
      The rec_len field in the directory entry is 16 bits, so there was a
      problem representing rec_len for filesystems with a 64k block size in
      the case where the directory entry takes the entire 64k block.
      Unfortunately, there were two schemes that were proposed; one where
      all zeros meant 65536 and one where all ones (65535) meant 65536.
      E2fsprogs used 0, whereas the kernel used 65535.  Oops.  Fortunately
      this case happens extremely rarely, with the most common case being
      the lost+found directory, created by mke2fs.
      
      So we will be liberal in what we accept, and accept both encodings,
      but we will continue to encode 65536 as 65535.  This will require a
      change in e2fsprogs, but with fortunately ext4 filesystems normally
      have the dir_index feature enabled, which precludes having a
      completely empty directory block.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7be2baaa
  2. 11 2月, 2009 1 次提交
    • J
      jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() · 7f5aa215
      Jan Kara 提交于
      If we race with commit code setting i_transaction to NULL, we could
      possibly dereference it.  Proper locking requires the journal pointer
      (to access journal->j_list_lock), which we don't have.  So we have to
      change the prototype of the function so that filesystem passes us the
      journal pointer.  Also add a more detailed comment about why the
      function jbd2_journal_begin_ordered_truncate() does what it does and
      how it should be used.
      
      Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
      suspitious code.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Acked-by: NJoel Becker <joel.becker@oracle.com>
      CC: linux-ext4@vger.kernel.org
      CC: ocfs2-devel@oss.oracle.com
      CC: mfasheh@suse.de
      CC: Dan Carpenter <error27@gmail.com>
      7f5aa215
  3. 10 2月, 2009 1 次提交
  4. 30 1月, 2009 1 次提交
    • T
      ext4: Remove bogus BUG() check in ext4_bmap() · b9ec63f7
      Theodore Ts'o 提交于
      The code to support journal-less ext4 operation added a BUG to
      ext4_bmap() which fired if there was no journal and the
      EXT4_STATE_JDATA bit was set in the i_state field.  This caused
      running the filefrag program (which uses the FIMBAP ioctl) to trigger
      a BUG().
      
      The EXT4_STATE_JDATA bit is only used for ext4_bmap(), and it's
      harmless for the bit to be set.  We could add a check in
      __ext4_journalled_writepage() and ext4_journalled_write_end() to only
      set the EXT4_STATE_JDATA bit if the journal is present, but that adds
      an extra test and jump instruction.  It's easier to simply remove the
      BUG check.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12568Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      b9ec63f7
  5. 27 1月, 2009 2 次提交
  6. 20 1月, 2009 1 次提交
  7. 17 1月, 2009 1 次提交
  8. 18 1月, 2009 1 次提交
    • T
      ext4: only use i_size_high for regular files · 06a279d6
      Theodore Ts'o 提交于
      Directories are not allowed to be bigger than 2GB, so don't use
      i_size_high for anything other than regular files.  E2fsck should
      complain about these inodes, but the simplest thing to do for the
      kernel is to only use i_size_high for regular files.
      
      This prevents an intentially corrupted filesystem from causing the
      kernel to burn a huge amount of CPU and issuing error messages such
      as:
      
      EXT4-fs warning (device loop0): ext4_block_to_path: block 135090028 > max
      
      Thanks to David Maciejak from Fortinet's FortiGuard Global Security
      Research Team for reporting this issue.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12375Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      06a279d6
  9. 10 1月, 2009 1 次提交
    • T
      filesystem freeze: add error handling of write_super_lockfs/unlockfs · c4be0c1d
      Takashi Sato 提交于
      Currently, ext3 in mainline Linux doesn't have the freeze feature which
      suspends write requests.  So, we cannot take a backup which keeps the
      filesystem's consistency with the storage device's features (snapshot and
      replication) while it is mounted.
      
      In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
      and it would be used to get the consistent backup.
      
      If Linux's standard filesystem ext3 has the freeze feature, we can do it
      without a commercial filesystem.
      
      So I have implemented the ioctls of the freeze feature.
      I think we can take the consistent backup with the following steps.
      1. Freeze the filesystem with the freeze ioctl.
      2. Separate the replication volume or create the snapshot
         with the storage device's feature.
      3. Unfreeze the filesystem with the unfreeze ioctl.
      4. Take the backup from the separated replication volume
         or the snapshot.
      
      This patch:
      
      VFS:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that they can return an error.
      Rename write_super_lockfs and unlockfs of the super block operation
      freeze_fs and unfreeze_fs to avoid a confusion.
      
      ext3, ext4, xfs, gfs2, jfs:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that write_super_lockfs returns an error if needed,
      and unlockfs always returns 0.
      
      reiserfs:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that they always return 0 (success) to keep a current behavior.
      Signed-off-by: NTakashi Sato <t-sato@yk.jp.nec.com>
      Signed-off-by: NMasayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
      Cc: <xfs-masters@oss.sgi.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c4be0c1d
  10. 09 1月, 2009 2 次提交
  11. 07 1月, 2009 2 次提交
  12. 06 1月, 2009 1 次提交
  13. 07 1月, 2009 1 次提交
  14. 06 1月, 2009 3 次提交
  15. 05 1月, 2009 2 次提交
    • N
      fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin 提交于
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in ints write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54566b2c
    • P
      fs: introduce bgl_lock_ptr() · c644f0e4
      Pekka Enberg 提交于
      As suggested by Andreas Dilger, introduce a bgl_lock_ptr() helper in
      <linux/blockgroup_lock.h> and add separate sb_bgl_lock() helpers to
      filesystem specific header files to break the hidden dependency to
      struct ext[234]_sb_info.
      
      Also, while at it, convert the macros to static inlines to try make up
      for all the times I broke Andrew Morton's tree.
      Acked-by: NAndreas Dilger <adilger@sun.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c644f0e4
  16. 04 1月, 2009 1 次提交
  17. 07 1月, 2009 1 次提交
  18. 06 1月, 2009 7 次提交
  19. 04 1月, 2009 1 次提交
  20. 06 1月, 2009 3 次提交
    • A
      ext4: Use high 16 bits of the block group descriptor's free counts fields · 560671a0
      Aneesh Kumar K.V 提交于
      Rename the lower bits with suffix _lo and add helper
      to access the values. Also rename bg_itable_unused_hi
      to bg_pad as in e2fsprogs.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      560671a0
    • A
      ext4: Fix race between read_block_bitmap() and mark_diskspace_used() · e8134b27
      Aneesh Kumar K.V 提交于
      We need to make sure we update the block bitmap and clear
      EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
      ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
      whether to initialize the block bitmap each time it is called
      (introduced by commit c806e68f), and this can race with block
      allocations in ext4_mb_mark_diskspace_used().
      
      ext4_read_block_bitmap does:
      
      spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
      if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	ext4_init_block_bitmap(sb, bh, block_group, desc);
      
      Now on the block allocation side we do
      
      mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
      			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
      ....
      spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
      if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
      
      ie on allocation we update the bitmap then we take the sb_bgl_lock
      and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is a
      parallel ext4_read_block_bitmap can zero out the bitmap in between
      the above mb_set_bits and spin_lock(sb_bg_lock..)
      
      The race results in below user visible errors
      EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
      EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      e8134b27
    • A
      ext4: fix BUG when calling ext4_error with locked block group · 5d1b1b3f
      Aneesh Kumar K.V 提交于
      The mballoc code likes to call ext4_error while it is holding locked
      block groups.  This can causes a scheduling in atomic context BUG.  We
      can't just unlock the block group and relock it after/if ext4_error
      returns since that might result in race conditions in the case where
      the filesystem is set to continue after finding errors.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5d1b1b3f
  21. 01 1月, 2009 2 次提交
  22. 29 12月, 2008 1 次提交
    • J
      Get rid of CONFIG_LSF · b3a6ffe1
      Jens Axboe 提交于
      We have two seperate config entries for large devices/files. One
      is CONFIG_LBD that guards just the devices, the other is CONFIG_LSF
      that handles large files. This doesn't make a lot of sense, you typically
      want both or none. So get rid of CONFIG_LSF and change CONFIG_LBD wording
      to indicate that it covers both.
      Acked-by: NJean Delvare <khali@linux-fr.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      b3a6ffe1
  23. 11 12月, 2008 2 次提交
    • A
      revert "percpu_counter: new function percpu_counter_sum_and_set" · 02d21168
      Andrew Morton 提交于
      Revert
      
          commit e8ced39d
          Author: Mingming Cao <cmm@us.ibm.com>
          Date:   Fri Jul 11 19:27:31 2008 -0400
      
              percpu_counter: new function percpu_counter_sum_and_set
      
      As described in
      
      	revert "percpu counter: clean up percpu_counter_sum_and_set()"
      
      the new percpu_counter_sum_and_set() is racy against updates to the
      cpu-local accumulators on other CPUs.  Revert that change.
      
      This means that ext4 will be slow again.  But correct.
      Reported-by: NEric Dumazet <dada1@cosmosbay.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: <stable@kernel.org>		[2.6.27.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      02d21168
    • A
      revert "percpu counter: clean up percpu_counter_sum_and_set()" · 71c5576f
      Andrew Morton 提交于
      Revert
      
          commit 1f7c14c6
          Author: Mingming Cao <cmm@us.ibm.com>
          Date:   Thu Oct 9 12:50:59 2008 -0400
      
              percpu counter: clean up percpu_counter_sum_and_set()
      
      Before this patch we had the following:
      
      percpu_counter_sum(): return the percpu_counter's value
      
      percpu_counter_sum_and_set(): return the percpu_counter's value, copying
      that value into the central value and zeroing the per-cpu counters before
      returning.
      
      After this patch, percpu_counter_sum_and_set() has gone, and
      percpu_counter_sum() gets the old percpu_counter_sum_and_set()
      functionality.
      
      Problem is, as Eric points out, the old percpu_counter_sum_and_set()
      functionality was racy and wrong.  It zeroes out counters on "other" cpus,
      without holding any locks which will prevent races agaist updates from
      those other CPUS.
      
      This patch reverts 1f7c14c6.  This means
      that percpu_counter_sum_and_set() still has the race, but
      percpu_counter_sum() does not.
      
      Note that this is not a simple revert - ext4 has since started using
      percpu_counter_sum() for its dirty_blocks counter as well.
      
      Note that this revert patch changes percpu_counter_sum() semantics.
      
      Before the patch, a call to percpu_counter_sum() will bring the counter's
      central counter mostly up-to-date, so a following percpu_counter_read()
      will return a close value.
      
      After this patch, a call to percpu_counter_sum() will leave the counter's
      central accumulator unaltered, so a subsequent call to
      percpu_counter_read() can now return a significantly inaccurate result.
      
      If there is any code in the tree which was introduced after
      e8ced39d was merged, and which depends
      upon the new percpu_counter_sum() semantics, that code will break.
      Reported-by: NEric Dumazet <dada1@cosmosbay.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      71c5576f
  24. 24 11月, 2008 1 次提交
    • A
      ext4: Fix lockdep recursive locking warning · b7be019e
      Aneesh Kumar K.V 提交于
      In ext4_mb_init_group(), if the filesystem block size is less than
      PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block
      groups in a loop.  We need to allow for this by using
      down_write_nested() and passing in the loop index as a lock subclass
      number.  This works because no other code path needs to take multiple
      alloc_sem's.  Note that lockdep will fail for filesystem blocksize
      smaller than to PAGE_SIZE/16k.  (e.g., a 1k filesystem blocksize with
      a 32k page size, or a 2k filesystem blocksize with a 64k blocksize,
      etc.)
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b7be019e