1. 23 6月, 2009 1 次提交
    • J
      vfs: Set special lockdep map for dirs only if not set by fs · 9a7aa12f
      Jan Kara 提交于
      Some filesystems need to set lockdep map for i_mutex differently for
      different directories. For example OCFS2 has system directories (for
      orphan inode tracking and for gathering all system files like journal
      or quota files into a single place) which have different locking
      locking rules than standard directories. For a filesystem setting
      lockdep map is naturaly done when the inode is read but we have to
      modify unlock_new_inode() not to overwrite the lockdep map the filesystem
      has set.
      
      Acked-by: peterz@infradead.org
      CC: mingo@redhat.com
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      9a7aa12f
  2. 13 6月, 2009 1 次提交
  3. 12 6月, 2009 3 次提交
    • N
      fs: introduce mnt_clone_write · 96029c4e
      npiggin@suse.de 提交于
      This patch speeds up lmbench lat_mmap test by about another 2% after the
      first patch.
      
      Before:
       avg = 462.286
       std = 5.46106
      
      After:
       avg = 453.12
       std = 9.58257
      
      (50 runs of each, stddev gives a reasonable confidence)
      
      It does this by introducing mnt_clone_write, which avoids some heavyweight
      operations of mnt_want_write if called on a vfsmount which we know already
      has a write count; and mnt_want_write_file, which can call mnt_clone_write
      if the file is open for write.
      
      After these two patches, mnt_want_write and mnt_drop_write go from 7% on
      the profile down to 1.3% (including mnt_clone_write).
      
      [AV: mnt_want_write_file() should take file alone and derive mnt from it;
      not only all callers have that form, but that's the only mnt about which
      we know that it's already held for write if file is opened for write]
      
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      96029c4e
    • E
      fsnotify: handle filesystem unmounts with fsnotify marks · 164bc619
      Eric Paris 提交于
      When an fs is unmounted with an fsnotify mark entry attached to one of its
      inodes we need to destroy that mark entry and we also (like inotify) send
      an unmount event.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      164bc619
    • E
      fsnotify: add marks to inodes so groups can interpret how to handle those inodes · 3be25f49
      Eric Paris 提交于
      This patch creates a way for fsnotify groups to attach marks to inodes.
      These marks have little meaning to the generic fsnotify infrastructure
      and thus their meaning should be interpreted by the group that attached
      them to the inode's list.
      
      dnotify and inotify  will make use of these markings to indicate which
      inodes are of interest to their respective groups.  But this implementation
      has the useful property that in the future other listeners could actually
      use the marks for the exact opposite reason, aka to indicate which inodes
      it had NO interest in.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      3be25f49
  4. 07 6月, 2009 1 次提交
  5. 06 6月, 2009 1 次提交
  6. 09 5月, 2009 1 次提交
  7. 15 4月, 2009 1 次提交
    • M
      splice: add helpers for locking pipe inode · 61e0d47c
      Miklos Szeredi 提交于
      There are lots of sequences like this, especially in splice code:
      
      	if (pipe->inode)
      		mutex_lock(&pipe->inode->i_mutex);
      	/* do something */
      	if (pipe->inode)
      		mutex_unlock(&pipe->inode->i_mutex);
      
      so introduce helpers which do the conditional locking and unlocking.
      Also replace the inode_double_lock() call with a pipe_double_lock()
      helper to avoid spreading the use of this functionality beyond the
      pipe code.
      
      This patch is just a cleanup, and should cause no behavioral changes.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      61e0d47c
  8. 28 3月, 2009 1 次提交
    • N
      fs: avoid I_NEW inodes · aabb8fdb
      Nick Piggin 提交于
      To be on the safe side, it should be less fragile to exclude I_NEW inodes
      from inode list scans by default (unless there is an important reason to
      have them).
      
      Normally they will get excluded (eg.  by zero refcount or writecount etc),
      however it is a bit fragile for list walkers to know exactly what parts of
      the inode state is set up and valid to test when in I_NEW.  So along these
      lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
      checks with them too -- this shouldn't be a problem should it?)
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      aabb8fdb
  9. 27 3月, 2009 1 次提交
  10. 26 3月, 2009 1 次提交
  11. 13 3月, 2009 1 次提交
    • N
      fs: new inode i_state corruption fix · 7ef0d737
      Nick Piggin 提交于
      There was a report of a data corruption
      http://lkml.org/lkml/2008/11/14/121.  There is a script included to
      reproduce the problem.
      
      During testing, I encountered a number of strange things with ext3, so I
      tried ext2 to attempt to reduce complexity of the problem.  I found that
      fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
      cleared, even though instrumentation showed that unlock_new_inode had
      already been called for that inode.  This points to memory scribble, or
      synchronisation problme.
      
      i_state of I_NEW inodes is not protected by inode_lock because other
      processes are not supposed to touch them until I_LOCK (and I_NEW) is
      cleared.  Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
      i_state revealed that generic_sync_sb_inodes is picking up new inodes from
      the inode lists and passing them to __writeback_single_inode without
      waiting for I_NEW.  Subsequently modifying i_state causes corruption.  In
      my case it would look like this:
      
      CPU0                            CPU1
      unlock_new_inode()              __sync_single_inode()
       reg <- inode->i_state
       reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
       reg -> inode->i_state          reg -> reg | I_SYNC
                                      reg -> inode->i_state
      
      Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.
      
      Fix for this is rather than wait for I_NEW inodes, just skip over them:
      inodes concurrently being created are not subject to data integrity
      operations, and should not significantly contribute to dirty memory
      either.
      
      After this change, I'm unable to reproduce any of the added warnings or
      hangs after ~1hour of running.  Previously, the new warnings would start
      immediately and hang would happen in under 5 minutes.
      
      I'm also testing on ext3 now, and so far no problems there either.  I
      don't know whether this fixes the problem reported above, but it fixes a
      real problem for me.
      
      Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
      Reported-by: NAdrian Hunter <ext-adrian.hunter@nokia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: <stable@kernel.org>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ef0d737
  12. 06 2月, 2009 1 次提交
  13. 10 1月, 2009 1 次提交
  14. 08 1月, 2009 1 次提交
  15. 07 1月, 2009 2 次提交
  16. 06 1月, 2009 1 次提交
  17. 01 1月, 2009 1 次提交
    • A
      nfsd/create race fixes, infrastructure · 261bca86
      Al Viro 提交于
      new helpers - insert_inode_locked() and insert_inode_locked4().
      Hash new inode, making sure that there's no such inode in icache
      already.  If there is and it does not end up unhashed (as would
      happen if we have nfsd trying to resolve a bogus fhandle), fail.
      Otherwise insert our inode into hash and succeed.
      
      In either case have i_state set to new+locked; cleanup ends up
      being simpler with such calling conventions.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      261bca86
  18. 10 11月, 2008 1 次提交
  19. 30 10月, 2008 3 次提交
  20. 15 8月, 2008 1 次提交
    • C
      fs/inode.c: properly init address_space->writeback_index · 7d455e00
      Chris Mason 提交于
      write_cache_pages() uses i_mapping->writeback_index to pick up where it
      left off the last time a given inode was found by pdflush or
      balance_dirty_pages (or anyone else who sets wbc->range_cyclic)
      
      alloc_inode() should set it to a sane value so that writeback doesn't
      start in the middle of a file.  It is somewhat difficult to notice the bug
      since write_cache_pages will loop around to the start of the file and the
      elevator helps hide the resulting seeks.
      
      For whatever reason, Btrfs hits this often.  Unpatched, untarring 30
      copies of the linux kernel in series runs at 47MB/s on a single sata
      drive.  With this fix, it jumps to 62MB/s.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d455e00
  21. 27 7月, 2008 2 次提交
  22. 07 5月, 2008 2 次提交
  23. 29 4月, 2008 1 次提交
  24. 19 4月, 2008 2 次提交
  25. 08 2月, 2008 1 次提交
    • D
      iget: remove iget() and the read_inode() super op as being obsolete · 12debc42
      David Howells 提交于
      Remove the old iget() call and the read_inode() superblock operation it uses
      as these are really obsolete, and the use of read_inode() does not produce
      proper error handling (no distinction between ENOMEM and EIO when marking an
      inode bad).
      
      Furthermore, this removes the temptation to use iget() to find an inode by
      number in a filesystem from code outside that filesystem.
      
      iget_locked() should be used instead.  A new function is added in an earlier
      patch (iget_failed) that is to be called to mark an inode as bad, unlock it
      and release it should the get routine fail.  Mark iget() and read_inode() as
      being obsolete and remove references to them from the documentation.
      
      Typically a filesystem will be modified such that the read_inode function
      becomes an internal iget function, for example the following:
      
      	void thingyfs_read_inode(struct inode *inode)
      	{
      		...
      	}
      
      would be changed into something like:
      
      	struct inode *thingyfs_iget(struct super_block *sp, unsigned long ino)
      	{
      		struct inode *inode;
      		int ret;
      
      		inode = iget_locked(sb, ino);
      		if (!inode)
      			return ERR_PTR(-ENOMEM);
      		if (!(inode->i_state & I_NEW))
      			return inode;
      
      		...
      		unlock_new_inode(inode);
      		return inode;
      	error:
      		iget_failed(inode);
      		return ERR_PTR(ret);
      	}
      
      and then thingyfs_iget() would be called rather than iget(), for example:
      
      	ret = -EINVAL;
      	inode = iget(sb, ino);
      	if (!inode || is_bad_inode(inode))
      		goto error;
      
      becomes:
      
      	inode = thingyfs_iget(sb, ino);
      	if (IS_ERR(inode)) {
      		ret = PTR_ERR(inode);
      		goto error;
      	}
      
      Note that is_bad_inode() does not need to be called.  The error returned by
      thingyfs_iget() should render it unnecessary.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12debc42
  26. 29 1月, 2008 2 次提交
  27. 17 10月, 2007 4 次提交
  28. 14 10月, 2007 1 次提交
    • P
      lockdep: annotate dir vs file i_mutex · 14358e6d
      Peter Zijlstra 提交于
      On Mon, 2007-09-24 at 22:13 -0400, Steven Rostedt wrote:
      > The circular lock seems to be this:
      > 
      > #1:
      > 
      >   sys_mmap2:              down_write(&mm->mmap_sem);
      >   nfs_revalidate_mapping: mutex_lock(&inode->i_mutex);
      > 
      > 
      > #0:
      > 
      >   vfs_readdir:     mutex_lock(&inode->i_mutex);
      >    - during the readdir (filldir64), we take a user fault (missing page?)
      >     and call do_page_fault -
      >   do_page_fault:   down_read(&mm->mmap_sem);
      > 
      > 
      > So it does indeed look like a circular locking. Now the question is, "is
      > this a bug?".  Looking like the inode of #1 must be a file or something
      > else that you can mmap and the inode of #0 seems it must be a directory.
      > I would say "no".
      > 
      > Now if you can readdir on a file or mmap a directory, then this could be
      > an issue.
      > 
      > Otherwise, I'd love to see someone teach lockdep about this issue! ;-)
      
      Make a distinction between file and dir usage of i_mutex.
      The inode should be complete and unused at unlock_new_inode(), re-init
      i_mutex depending on its type.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      14358e6d