1. 09 9月, 2009 1 次提交
  2. 24 8月, 2009 2 次提交
    • J
      ext3: Improve error message that changing journaling mode on remount is not possible · 3c4cec65
      Jan Kara 提交于
      This patch makes the error message about changing journaling mode on remount
      more descriptive. Some people are going to hit this error now due to commit
      bbae8bcc if they configure a kernel to default
      to data=writeback mode. The problem happens if they have data=ordered set for
      the root filesystem in /etc/fstab but not in the kernel command line (and they
      don't use initrd). Their filesystem then gets mounted as data=writeback by
      kernel but then their boot fails because init scripts won't be able to remount
      the filesystem rw. Better error message will hopefully make it easier for them
      to find the error in their setup and bother us less with error reports :).
      Signed-off-by: NJan Kara <jack@suse.cz>
      3c4cec65
    • T
      ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED · 6d418076
      Theodore Ts'o 提交于
      The old description for this configuration option was perhaps not
      completely balanced in terms of describing the tradeoffs of using a
      default of data=writeback vs. data=ordered.  Despite the fact that old
      description very strongly recomended disabling this feature, all of
      the major distributions have elected to preserve the existing 'legacy'
      default, which is a strong hint that it perhaps wasn't telling the
      whole story.
      
      This revised description has been vetted by a number of ext3
      developers as being better at informing the user about the tradeoffs
      of enabling or disabling this configuration feature.
      
      Cc: linux-ext4@vger.kernel.org
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NJan Kara <jack@suse.cz>
      6d418076
  3. 16 7月, 2009 2 次提交
    • J
      ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() · 43237b54
      Jan Kara 提交于
      Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to
      be a relict from some old days and setting disksize in this function does not
      make much sence. Currently it was set only by ext3_getblk().  Since the
      parameter has some effect only if create == 1, it is easy to check that the
      three callers which end up calling ext3_getblk() with create == 1 (ext3_append,
      ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves.
      Signed-off-by: NJan Kara <jack@suse.cz>
      43237b54
    • J
      ext3: Fix truncation of symlinks after failed write · 9eaaa2d5
      Jan Kara 提交于
      Contents of long symlinks is written via standard write methods. So when the
      write fails, we add inode to orphan list. But symlinks don't have .truncate
      method defined so nobody properly removes them from the orphan list (both on
      disk and in memory).
      
      Fix this by calling ext3_truncate() directly instead of calling vmtruncate()
      (which is saner anyway since we don't need anything vmtruncate() does except
      from calling .truncate in these paths).  We also add inode to orphan list only
      if ext3_can_truncate() is true (currently, it can be false for symlinks when
      there are no blocks allocated) - otherwise orphan list processing will complain
      and ext3_truncate() will not remove inode from on-disk orphan list.
      Signed-off-by: NJan Kara <jack@suse.cz>
      9eaaa2d5
  4. 24 6月, 2009 2 次提交
  5. 19 6月, 2009 3 次提交
  6. 17 6月, 2009 1 次提交
    • L
      ext3: avoid unnecessary spinlock in critical POSIX ACL path · 9c64daff
      Linus Torvalds 提交于
      If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem
      to do POSIX ACL checks on any files not owned by the caller, and it does
      this for every single pathname component that it looks up.
      
      That obviously can be pretty expensive if the filesystem isn't careful
      about it, especially with locking. That's doubly sad, since the common
      case tends to be that there are no ACL's associated with the files in
      question.
      
      ext3 already caches the ACL data so that it doesn't have to look it up
      over and over again, but it does so by taking the inode->i_lock spinlock
      on every lookup. Which is a noticeable overhead even if it's a private
      lock, especially on CPU's where the serialization is expensive (eg Intel
      Netburst aka 'P4').
      
      For the special case of not actually having any ACL's, all that locking is
      unnecessary. Even if somebody else were to be changing the ACL's on
      another CPU, we simply don't care - if we've seen a NULL ACL, we might as
      well use it.
      
      So just load the ACL speculatively without any locking, and if it was
      NULL, just use it. If it's non-NULL (either because we had a cached
      entry, or because the cache hasn't been filled in at all), it means that
      we'll need to get the lock and re-load it properly.
      
      This is noticeable even on Nehalem, which does locking quite well (much
      better than P4). From lmbench:
      
      	Processor, Processes - times in microseconds - smaller is better
      	--------------------------------------------------------------------
      	Host                 OS  Mhz null null      open slct fork exec sh
      	                             call  I/O stat clos TCP  proc proc proc
      	--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ----
       - before:
      	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.45 2.18 69.1 273. 1141
      	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.48 2.28 69.9 253. 1140
      	nehalem.l Linux 2.6.30- 3193 0.04 0.10 0.95 1.42 2.19 68.6 284. 1141
       - after:
      	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.44 2.12 68.3 282. 1094
      	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.20 67.0 308. 1123
      	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.36 67.4 293. 1148
      
      where you can see what appears to be a roughly 3% improvement in stat
      and open/close latencies from just the removal of the locking overhead.
      
      Of course, this only matters for files you don't own (the owner never
      needs to do the ACL checks), but that's the common case for libraries,
      header files, and executables. As well as for the base components of any
      absolute pathname, even if you are the owner of the final file.
      
      [ At some point we probably want to move this ACL caching logic entirely
        into the VFS layer (and only call down to the filesystem when
        uncached), but in the meantime this improves ext3 a bit.
      
        A similar fix to btrfs makes a much bigger difference (15x improvement
        in lmbench) due to broken caching. ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9c64daff
  7. 12 6月, 2009 4 次提交
  8. 23 5月, 2009 1 次提交
  9. 18 5月, 2009 1 次提交
  10. 28 4月, 2009 1 次提交
    • L
      ext3: avoid unnecessary spinlock in critical POSIX ACL path · 96159f25
      Linus Torvalds 提交于
      If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem 
      to do POSIX ACL checks on any files not owned by the caller, and it does 
      this for every single pathname component that it looks up.
      
      That obviously can be pretty expensive if the filesystem isn't careful 
      about it, especially with locking. That's doubly sad, since the common 
      case tends to be that there are no ACL's associated with the files in 
      question.
      
      ext3 already caches the ACL data so that it doesn't have to look it up 
      over and over again, but it does so by taking the inode->i_lock spinlock 
      on every lookup. Which is a noticeable overhead even if it's a private 
      lock, especially on CPU's where the serialization is expensive (eg Intel 
      Netburst aka 'P4').
      
      For the special case of not actually having any ACL's, all that locking is 
      unnecessary. Even if somebody else were to be changing the ACL's on 
      another CPU, we simply don't care - if we've seen a NULL ACL, we might as 
      well use it.
      
      So just load the ACL speculatively without any locking, and if it was 
      NULL, just use it. If it's non-NULL (either because we had a cached 
      entry, or because the cache hasn't been filled in at all), it means that 
      we'll need to get the lock and re-load it properly.
      
      This is noticeable even on Nehalem, which does locking quite well (much 
      better than P4). From lmbench:
      
      	Processor, Processes - times in microseconds - smaller is better
      	--------------------------------------------------------------------
      	Host                 OS  Mhz null null      open slct fork exec sh  
      	                             call  I/O stat clos TCP  proc proc proc
      	--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ----
       - before:
      	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.45 2.18 69.1 273. 1141
      	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.48 2.28 69.9 253. 1140
      	nehalem.l Linux 2.6.30- 3193 0.04 0.10 0.95 1.42 2.19 68.6 284. 1141
       - after:
      	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.44 2.12 68.3 282. 1094
      	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.20 67.0 308. 1123
      	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.36 67.4 293. 1148
      
      where you can see what appears to be a roughly 3% improvement in stat
      and open/close latencies from just the removal of the locking overhead. 
      
      Of course, this only matters for files you don't own (the owner never 
      needs to do the ACL checks), but that's the common case for libraries, 
      header files, and executables. As well as for the base components of any 
      absolute pathname, even if you are the owner of the final file.
      
      [ At some point we probably want to move this ACL caching logic entirely
        into the VFS layer (and only call down to the filesystem when
        uncached), but in the meantime this improves ext3 a bit.
      
        A similar fix to btrfs makes a much bigger difference (15x improvement
        in lmbench) due to broken caching. ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      96159f25
  11. 09 4月, 2009 1 次提交
  12. 07 4月, 2009 1 次提交
  13. 03 4月, 2009 6 次提交
  14. 01 4月, 2009 1 次提交
  15. 27 3月, 2009 1 次提交
  16. 26 3月, 2009 2 次提交
  17. 12 2月, 2009 1 次提交
  18. 17 1月, 2009 1 次提交
  19. 10 1月, 2009 1 次提交
    • T
      filesystem freeze: add error handling of write_super_lockfs/unlockfs · c4be0c1d
      Takashi Sato 提交于
      Currently, ext3 in mainline Linux doesn't have the freeze feature which
      suspends write requests.  So, we cannot take a backup which keeps the
      filesystem's consistency with the storage device's features (snapshot and
      replication) while it is mounted.
      
      In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
      and it would be used to get the consistent backup.
      
      If Linux's standard filesystem ext3 has the freeze feature, we can do it
      without a commercial filesystem.
      
      So I have implemented the ioctls of the freeze feature.
      I think we can take the consistent backup with the following steps.
      1. Freeze the filesystem with the freeze ioctl.
      2. Separate the replication volume or create the snapshot
         with the storage device's feature.
      3. Unfreeze the filesystem with the unfreeze ioctl.
      4. Take the backup from the separated replication volume
         or the snapshot.
      
      This patch:
      
      VFS:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that they can return an error.
      Rename write_super_lockfs and unlockfs of the super block operation
      freeze_fs and unfreeze_fs to avoid a confusion.
      
      ext3, ext4, xfs, gfs2, jfs:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that write_super_lockfs returns an error if needed,
      and unlockfs always returns 0.
      
      reiserfs:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that they always return 0 (success) to keep a current behavior.
      Signed-off-by: NTakashi Sato <t-sato@yk.jp.nec.com>
      Signed-off-by: NMasayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
      Cc: <xfs-masters@oss.sgi.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c4be0c1d
  20. 09 1月, 2009 4 次提交
  21. 06 1月, 2009 2 次提交
  22. 05 1月, 2009 1 次提交
    • N
      fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin 提交于
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in ints write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54566b2c