1. 17 Jun, 2009 1 commit
    • ext4: avoid unnecessary spinlock in critical POSIX ACL path · 210ad6ae
      Theodore Ts'o committed
      If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem
      to do POSIX ACL checks on any files not owned by the caller, and it does
      this for every single pathname component that it looks up.
      
      That obviously can be pretty expensive if the filesystem isn't careful
      about it, especially with locking. That's doubly sad, since the common
      case tends to be that there are no ACL's associated with the files in
      question.
      
      ext4 already caches the ACL data so that it doesn't have to look it up
      over and over again, but it does so by taking the inode->i_lock spinlock
      on every lookup, which is a noticeable overhead even if it's a private
      lock, especially on CPUs where the serialization is expensive (e.g. Intel
      NetBurst, aka 'P4').
      
      For the special case of not actually having any ACL's, all that locking is
      unnecessary. Even if somebody else were to be changing the ACL's on
      another CPU, we simply don't care - if we've seen a NULL ACL, we might as
      well use it.
      
      So just load the ACL speculatively without any locking, and if it was
      NULL, just use it. If it's non-NULL (either because we had a cached
      entry, or because the cache hasn't been filled in at all), it means that
      we'll need to get the lock and re-load it properly.
      
      (This commit was ported from a patch originally authored by Linus for
      ext3.)
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
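The speculative load described in this commit message can be sketched in userspace C. This is only an illustration under stated assumptions: `struct inode_sketch`, `get_cached_acl()`, the `ACL_NOT_CACHED` sentinel and the `lock_taken` counter are hypothetical names invented for the sketch, and a pthread mutex stands in for the inode->i_lock spinlock. It is not the ext4 code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a POSIX ACL object; not the kernel type. */
struct acl { int refcount; };

/* Sentinel meaning "cache not filled in yet" (an assumption of this sketch). */
#define ACL_NOT_CACHED ((struct acl *)-1)

struct inode_sketch {
    pthread_mutex_t lock;          /* plays the role of inode->i_lock */
    _Atomic(struct acl *) cached;  /* NULL means "known to have no ACL" */
    int lock_taken;                /* counts slow-path entries, for illustration */
};

static struct acl *get_cached_acl(struct inode_sketch *inode)
{
    assert(inode != NULL);

    /* Fast path: load the pointer speculatively, without the lock. */
    struct acl *acl = atomic_load_explicit(&inode->cached, memory_order_acquire);
    if (acl == NULL)
        return NULL;  /* a NULL ACL was seen, so just use it */

    /* Slow path: a cached entry or an unfilled cache; re-load under the lock. */
    pthread_mutex_lock(&inode->lock);
    inode->lock_taken++;
    acl = atomic_load_explicit(&inode->cached, memory_order_relaxed);
    if (acl != NULL && acl != ACL_NOT_CACHED)
        acl->refcount++;  /* take a reference while the pointer is stable */
    pthread_mutex_unlock(&inode->lock);
    return acl;
}
```

In the common no-ACL case the function returns before touching the mutex, which is the saving the commit message is after. A caller that gets `ACL_NOT_CACHED` back would still have to read the ACL from disk.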
  2. 15 Jun, 2009 3 commits
  3. 13 Jun, 2009 5 commits
  4. 18 Jun, 2009 1 commit
  5. 12 Jun, 2009 6 commits
    • Push BKL down into ->remount_fs() · 337eb00a
      Alessio Igor Bogani committed
      [xfs, btrfs, capifs, shmem don't need BKL, exempt]
      Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ->write_super lock_super pushdown · ebc1ac16
      Christoph Hellwig committed
      Push down lock_super into ->write_super instances and remove it from the
      caller.
      
      The following filesystems don't need ->s_lock in ->write_super and are skipped:
      
       * bfs, nilfs2 - no other uses of s_lock and have internal locks in
         ->write_super
       * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock
       * reiserfs - no other uses of s_lock as it has reiserfs_write_lock (BKL) in
         ->write_super
       * xfs - no other uses of s_lock and uses internal lock (buffer lock on
         superblock buffer) to serialize ->write_super.  Also xfs_fs_write_super
         is superfluous and will go away in the next merge window
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
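The general shape of a lock pushdown like the one above can be sketched in userspace C: the generic caller stops taking the lock, and each method instance that needs it takes it itself. All names here (`struct super_sketch`, `foofs_write_super()`, `sync_super_sketch()`) are hypothetical, and a pthread mutex stands in for ->s_lock. This is not the VFS code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct super_sketch;

/* Hypothetical method table, modelled loosely on super_operations. */
struct super_ops_sketch {
    void (*write_super)(struct super_sketch *sb);
};

struct super_sketch {
    pthread_mutex_t s_lock;              /* plays the role of ->s_lock */
    const struct super_ops_sketch *ops;
    int dirty;                           /* superblock needs writing */
    int writes;                          /* number of writeouts, for illustration */
};

/* After the pushdown: a filesystem that needs s_lock takes it itself. */
static void foofs_write_super(struct super_sketch *sb)
{
    pthread_mutex_lock(&sb->s_lock);
    if (sb->dirty) {
        sb->dirty = 0;
        sb->writes++;
    }
    pthread_mutex_unlock(&sb->s_lock);
}

/* After the pushdown: the generic caller no longer touches s_lock. */
static void sync_super_sketch(struct super_sketch *sb)
{
    assert(sb != NULL && sb->ops != NULL);
    if (sb->dirty && sb->ops->write_super)
        sb->ops->write_super(sb);
}
```

A filesystem with its own internal locking would simply provide a `write_super` that skips the mutex, which is why the commit can leave some filesystems untouched.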
    • Push lock_super() into the ->remount_fs() of filesystems that care about it · bbd6851a
      Al Viro committed
      Note that since we can't run into contention between remount_fs and write_super
      (due to exclusion on s_umount), we have to care only about filesystems that
      touch lock_super() on their own.  Out of those ext3, ext4, hpfs, sysv and ufs
      do need it; fat doesn't since its ->remount_fs() only accesses assign-once
      data (basically, it's "we have no atime on directories and only have atime on
      files for vfat; force nodiratime and possibly noatime into *flags").
      
      [folded a build fix from hch]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • push BKL down into ->put_super · 6cfd0148
      Christoph Hellwig committed
      Move BKL into ->put_super from the only caller.  A couple of
      filesystems had trivial enough ->put_super (only kfree and NULLing of
      s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
      hugetlbfs, omfs, qnx4, shmem; all others got the full treatment.  Most
      of them probably don't need it, but I'd rather sort that out individually.
      Preferably after all the other BKL pushdowns in that area.
      
      [AV: original used to move lock_super() down as well; these changes are
      removed since we don't do lock_super() at all in generic_shutdown_super()
      now]
      [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • No need to do lock_super() for exclusion in generic_shutdown_super() · a9e220f8
      Al Viro committed
      We can't run into contention on it.  All other callers of lock_super()
      either hold s_umount (and we have it exclusive) or hold an active
      reference to superblock in question, which prevents the call of
      generic_shutdown_super() while the reference is held.  So we can
      replace lock_super(s) with get_fs_excl() in generic_shutdown_super()
      (and corresponding change for unlock_super(), of course).
      
      Since ext4 expects s_lock held for its put_super, take lock_super()
      into it.  The rest of filesystems do not care at all.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • remove ->write_super call in generic_shutdown_super · 8c85e125
      Christoph Hellwig committed
      We just did a full fs writeout using sync_filesystem before, and if
      that's not enough for the filesystem it can perform its own writeout
      in ->put_super, which many filesystems already do.
      
      Move a call to foofs_write_super into every foofs_put_super for now to
      guarantee identical behaviour until it's cleaned up by the individual
      filesystem maintainers.
      
      Exceptions:
      
       - affs already has identical copy & pasted code at the beginning of
         affs_put_super so no need to do it twice.
       - xfs does the right thing without it and I have changes pending for
         the xfs tree touching this area, so I don't really need conflicts
         here.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 11 Jun, 2009 1 commit
  7. 09 Jun, 2009 2 commits
  8. 05 Jun, 2009 3 commits
  9. 09 Jun, 2009 1 commit
    • ext4: Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle() · 03f5d8bc
      Jan Kara committed
      Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle(). This
      seems to be a relic from some old days and setting disksize in this
      function does not make much sense.  Currently it was set only by
      ext4_getblk().  Since the parameter has some effect only if create ==
      1, it is easy to check by grepping through the sources that the three
      callers which end up calling ext4_getblk() with create == 1
      (ext4_append, ext4_quota_write, ext4_mkdir) do the right thing and set
      disksize themselves.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  10. 04 Jun, 2009 1 commit
  11. 25 May, 2009 2 commits
  12. 23 May, 2009 1 commit
  13. 18 May, 2009 3 commits
  14. 15 May, 2009 3 commits
    • ext4: Fix race in ext4_inode_info.i_cached_extent · 2ec0ae3a
      Theodore Ts'o committed
      If two CPUs call ext4_ext_get_blocks() at the same time, there is
      nothing protecting the i_cached_extent structure from
      being used and updated at the same time.  This could potentially cause
      the wrong location on disk to be read or written to, including
      potentially causing the corruption of the block group descriptors
      and/or inode table.
      
      This bug has been in the ext4 code since almost the very beginning of
      ext4's development.  Fortunately once the data is stored in the page
      cache, ext4_get_blocks() doesn't need to be called, so trying to
      replicate this problem to the point where we could identify its root
      cause was *extremely* difficult.  Many thanks to Kevin Shanahan for
      working over several months to be able to reproduce this easily so we
      could finally nail down the cause of the corruption.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
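The kind of protection such a race calls for can be sketched in userspace C: every read and every update of a single-entry extent cache happens under one lock, so a reader never sees a half-updated entry. The commit message does not say which lock the actual fix uses. The names below (`struct inode_cache_sketch`, `put_in_cache()`, `lookup_cache()`) are hypothetical and a pthread mutex stands in for a kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-entry cache: logical range -> first physical block. */
struct cached_extent_sketch {
    uint32_t lblock;  /* first logical block covered */
    uint32_t len;     /* number of blocks; 0 means "cache empty" */
    uint64_t pblock;  /* physical block backing lblock */
};

struct inode_cache_sketch {
    pthread_mutex_t lock;  /* guards every access to `cache` */
    struct cached_extent_sketch cache;
};

static void put_in_cache(struct inode_cache_sketch *ic,
                         uint32_t lblock, uint32_t len, uint64_t pblock)
{
    assert(ic != NULL);
    pthread_mutex_lock(&ic->lock);
    /* All three fields change together, invisible to readers until unlock. */
    ic->cache.lblock = lblock;
    ic->cache.len = len;
    ic->cache.pblock = pblock;
    pthread_mutex_unlock(&ic->lock);
}

static bool lookup_cache(struct inode_cache_sketch *ic,
                         uint32_t lblock, uint64_t *pblock)
{
    bool hit = false;

    assert(ic != NULL && pblock != NULL);
    pthread_mutex_lock(&ic->lock);
    if (ic->cache.len != 0 && lblock >= ic->cache.lblock &&
        lblock - ic->cache.lblock < ic->cache.len) {
        *pblock = ic->cache.pblock + (lblock - ic->cache.lblock);
        hit = true;
    }
    pthread_mutex_unlock(&ic->lock);
    return hit;
}
```

Without the lock, a lookup racing with an update could pair a new `lblock` with an old `pblock` and compute the wrong on-disk location, which is the failure mode the commit message describes.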
    • ext4: Clear the unwritten buffer_head flag after the extent is initialized · 2a8964d6
      Aneesh Kumar K.V committed
      The BH_Unwritten flag indicates that the buffer is allocated on disk
      but has not been written; that is, the disk was part of a persistent
      preallocation area.  That flag should only be set when a get_blocks()
      function is looking up an inode's logical to physical block mapping.
      
      When ext4_get_blocks_wrap() is called with create=1, the uninitialized
      extent is converted into an initialized one, so the BH_Unwritten flag
      is no longer appropriate.  Hence, we need to make sure the
      BH_Unwritten is not left set, since the combination of BH_Mapped and
      BH_Unwritten is not allowed; among other things, it will cause ext4's
      get_block() to be called over and over again during the write_begin
      phase of write(2).
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
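The invariant stated above, that a mapped buffer must not also be flagged unwritten, can be illustrated with a small bit-flag sketch in C. The flag values and the `map_block_state()` helper are hypothetical, and the model is deliberately simplified (it assumes the block always lies in an allocated extent and ignores holes). It is not the ext4 implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for this sketch; not the kernel's buffer state bits. */
#define SK_BH_MAPPED    (1u << 0)
#define SK_BH_UNWRITTEN (1u << 1)

/*
 * Return the buffer state after a block-mapping call.
 * `extent_uninit`: the backing extent is preallocated but never written.
 * `create`: mirrors create=1 in the text, i.e. the caller intends to write.
 */
static unsigned int map_block_state(unsigned int state,
                                    bool extent_uninit, bool create)
{
    if (extent_uninit && !create) {
        /* Pure lookup of a preallocated block: report it as unwritten. */
        state |= SK_BH_UNWRITTEN;
        state &= ~SK_BH_MAPPED;
    } else {
        /* Initialized extent, or an uninitialized one converted by create=1:
         * the block is mapped and the unwritten flag must not stay set. */
        state |= SK_BH_MAPPED;
        state &= ~SK_BH_UNWRITTEN;
    }
    /* Invariant from the commit message: never both flags at once. */
    assert((state & (SK_BH_MAPPED | SK_BH_UNWRITTEN)) !=
           (SK_BH_MAPPED | SK_BH_UNWRITTEN));
    return state;
}
```

The second call in a lookup-then-create sequence is the interesting one: the state coming in still carries the unwritten bit from the lookup, and the sketch clears it explicitly rather than relying on the caller to have reset the state.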
    • ext4: Clean up ext4_get_blocks() so it does not depend on bh_result->b_state · 2ac3b6e0
      Theodore Ts'o committed
      The ext4_get_blocks() function was depending on the value of
      bh_result->b_state as an input parameter to decide whether or not to
      update the delalloc accounting statistics by calling
      ext4_da_update_reserve_space().  We now use a separate flag,
      EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE, to request this update, so that
      all callers of ext4_get_blocks() can clear map_bh.b_state before
      calling ext4_get_blocks() without worrying about any consistency
      issues.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  15. 14 May, 2009 1 commit
  16. 13 May, 2009 1 commit
  17. 14 May, 2009 1 commit
  18. 13 May, 2009 1 commit
  19. 14 May, 2009 3 commits
    • ext4: Add documentation to the ext4_*get_block* functions · b920c755
      Theodore Ts'o committed
      This adds more documentation to various internal functions in
      fs/ext4/inode.c, most notably ext4_ind_get_blocks(),
      ext4_da_get_block_write(), ext4_da_get_block_prep(),
      ext4_normal_get_block_write().
      
      In addition, the static function ext4_normal_get_block_write() has
      been renamed noalloc_get_block_write(), since it is used in many
      places far beyond ext4_normal_writepage().
      
      Plenty of warnings have been added to the noalloc_get_block_write()
      function, since the way it is used is amazingly fragile.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Define a new set of flags for ext4_get_blocks() · c2177057
      Theodore Ts'o committed
      The functions ext4_get_blocks(), ext4_ext_get_blocks(), and
      ext4_ind_get_blocks() took an ad-hoc set of integer variables, used as
      boolean flags, passed in as arguments.  Use a single flags parameter
      and a standard set of bitfield flags instead.  This saves space on
      the call stack, and it also makes the code a bit more understandable.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
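The refactoring pattern described here, replacing several integer "boolean" arguments with one flags word, can be sketched as follows. The `SK_` flag names, their values, and `sk_decode_flags()` are made up for illustration. The real flag names and values live in the ext4 headers and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; one bit per former boolean argument. */
#define SK_GET_BLOCKS_CREATE               0x0001
#define SK_GET_BLOCKS_UNINIT_EXT           0x0002
#define SK_GET_BLOCKS_UPDATE_RESERVE_SPACE 0x0004

/* What the callee wants to know, decoded from the single flags word. */
struct sk_request {
    bool create;          /* allocate blocks if the range is unmapped */
    bool uninit_ext;      /* allocate as an uninitialized extent */
    bool update_reserve;  /* update delayed-allocation accounting */
};

/* Before: f(create, uninit_ext, update_reserve).  After: f(flags). */
static struct sk_request sk_decode_flags(int flags)
{
    struct sk_request req = {
        .create         = (flags & SK_GET_BLOCKS_CREATE) != 0,
        .uninit_ext     = (flags & SK_GET_BLOCKS_UNINIT_EXT) != 0,
        .update_reserve = (flags & SK_GET_BLOCKS_UPDATE_RESERVE_SPACE) != 0,
    };
    return req;
}
```

A call site then reads as `sk_decode_flags(SK_GET_BLOCKS_CREATE | SK_GET_BLOCKS_UPDATE_RESERVE_SPACE)`, which names each option explicitly instead of passing a row of bare 0s and 1s.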
    • ext4: Rename ext4_get_blocks_wrap() to be ext4_get_blocks() · 12b7ac17
      Theodore Ts'o committed
      Another function rename for clarity's sake.  The _wrap suffix simply
      confuses people, and didn't add much for people trying to follow the code
      paths.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>