  1. 28 Mar, 2009 1 commit
    • fs: avoid I_NEW inodes · aabb8fdb
      Nick Piggin committed
      To be on the safe side, it should be less fragile to exclude I_NEW inodes
      from inode list scans by default (unless there is an important reason to
      have them).
      
      Normally they will get excluded (eg.  by zero refcount or writecount etc),
      however it is a bit fragile for list walkers to know exactly what parts of
      the inode state is set up and valid to test when in I_NEW.  So along these
      lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
      checks with them too -- this shouldn't be a problem should it?)
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
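
The idea of testing the inode state before anything else can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct toy_inode`, `count_scannable()` and the numeric flag values are hypothetical and are not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative state bits; the numeric values are assumptions, not copied
 * from any particular kernel version. */
#define I_NEW        0x08u
#define I_WILL_FREE  0x10u
#define I_FREEING    0x20u
#define I_CLEAR      0x40u

/* Hypothetical, heavily simplified inode. */
struct toy_inode {
    unsigned int i_state;
    int i_count;              /* only trustworthy once I_NEW is cleared */
    struct toy_inode *next;   /* singly linked stand-in for the inode list */
};

/* Count the inodes a list walker may safely look at. */
size_t count_scannable(const struct toy_inode *head)
{
    size_t n = 0;
    const struct toy_inode *inode;

    for (inode = head; inode != NULL; inode = inode->next) {
        /* The state check comes first, before any other field is read. */
        if (inode->i_state & (I_NEW | I_FREEING | I_CLEAR | I_WILL_FREE))
            continue;
        /* Past the state check the refcount is meaningful. */
        assert(inode->i_count >= 0);
        if (inode->i_count == 0)
            continue;
        n++;
    }
    return n;
}
```

Because the state test is the first thing the loop does, the walker never has to reason about which other fields of a half-constructed inode are valid.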
  2. 19 Feb, 2009 1 commit
    • inotify: fix GFP_KERNEL related deadlock · f04b30de
      Ingo Molnar committed
      Enhanced lockdep coverage of __GFP_NOFS turned up this new lockdep
      assert:
      
      [ 1093.677775]
      [ 1093.677781] =================================
      [ 1093.680031] [ INFO: inconsistent lock state ]
      [ 1093.680031] 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
      [ 1093.680031] ---------------------------------
      [ 1093.680031] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      [ 1093.680031] kswapd0/308 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [ 1093.680031]  (&inode->inotify_mutex){+.+.?.}, at: [<c0205942>] inotify_inode_is_dead+0x20/0x80
      [ 1093.680031] {RECLAIM_FS-ON-W} state was registered at:
      [ 1093.680031]   [<c01696b9>] mark_held_locks+0x43/0x5b
      [ 1093.680031]   [<c016baa4>] lockdep_trace_alloc+0x6c/0x6e
      [ 1093.680031]   [<c01cf8b0>] kmem_cache_alloc+0x20/0x150
      [ 1093.680031]   [<c040d0ec>] idr_pre_get+0x27/0x6c
      [ 1093.680031]   [<c02056e3>] inotify_handle_get_wd+0x25/0xad
      [ 1093.680031]   [<c0205f43>] inotify_add_watch+0x7a/0x129
      [ 1093.680031]   [<c020679e>] sys_inotify_add_watch+0x20f/0x250
      [ 1093.680031]   [<c010389e>] sysenter_do_call+0x12/0x35
      [ 1093.680031]   [<ffffffff>] 0xffffffff
      [ 1093.680031] irq event stamp: 60417
      [ 1093.680031] hardirqs last  enabled at (60417): [<c018d5f5>] call_rcu+0x53/0x59
      [ 1093.680031] hardirqs last disabled at (60416): [<c018d5b9>] call_rcu+0x17/0x59
      [ 1093.680031] softirqs last  enabled at (59656): [<c0146229>] __do_softirq+0x157/0x16b
      [ 1093.680031] softirqs last disabled at (59651): [<c0106293>] do_softirq+0x74/0x15d
      [ 1093.680031]
      [ 1093.680031] other info that might help us debug this:
      [ 1093.680031] 2 locks held by kswapd0/308:
      [ 1093.680031]  #0:  (shrinker_rwsem){++++..}, at: [<c01b0502>] shrink_slab+0x36/0x189
      [ 1093.680031]  #1:  (&type->s_umount_key#4){+++++.}, at: [<c01e6d77>] shrink_dcache_memory+0x110/0x1fb
      [ 1093.680031]
      [ 1093.680031] stack backtrace:
      [ 1093.680031] Pid: 308, comm: kswapd0 Not tainted 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
      [ 1093.680031] Call Trace:
      [ 1093.680031]  [<c016947a>] valid_state+0x12a/0x13d
      [ 1093.680031]  [<c016954e>] mark_lock+0xc1/0x1e9
      [ 1093.680031]  [<c016a5b4>] ? check_usage_forwards+0x0/0x3f
      [ 1093.680031]  [<c016ab74>] __lock_acquire+0x2c6/0xac8
      [ 1093.680031]  [<c01688d9>] ? register_lock_class+0x17/0x228
      [ 1093.680031]  [<c016b3d3>] lock_acquire+0x5d/0x7a
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c08824c4>] __mutex_lock_common+0x3a/0x4cb
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c08829ed>] mutex_lock_nested+0x2e/0x36
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c0205942>] inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c01e6672>] dentry_iput+0x90/0xc2
      [ 1093.680031]  [<c01e67a3>] d_kill+0x21/0x45
      [ 1093.680031]  [<c01e6a46>] __shrink_dcache_sb+0x27f/0x355
      [ 1093.680031]  [<c01e6dc5>] shrink_dcache_memory+0x15e/0x1fb
      [ 1093.680031]  [<c01b05ed>] shrink_slab+0x121/0x189
      [ 1093.680031]  [<c01b0d12>] kswapd+0x39f/0x561
      [ 1093.680031]  [<c01ae499>] ? isolate_pages_global+0x0/0x233
      [ 1093.680031]  [<c0157eae>] ? autoremove_wake_function+0x0/0x43
      [ 1093.680031]  [<c01b0973>] ? kswapd+0x0/0x561
      [ 1093.680031]  [<c0157daf>] kthread+0x41/0x82
      [ 1093.680031]  [<c0157d6e>] ? kthread+0x0/0x82
      [ 1093.680031]  [<c01043ab>] kernel_thread_helper+0x7/0x10
      
      inotify_handle_get_wd() does idr_pre_get() which does a
      kmem_cache_alloc() with GFP_KERNEL, i.e. with __GFP_FS set, and is hence
      deadlockable under extreme MM pressure.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: MinChan Kim <minchan.kim@gmail.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 01 Jan, 2009 1 commit
  4. 11 Dec, 2008 1 commit
    • inotify: fix IN_ONESHOT unmount event watcher · 6ee5a399
      Dmitri Monakhov committed
      On umount two events will be dispatched to the watcher:
      
      1: inotify_dev_queue_event(.., IN_UNMOUNT,..)
      2: remove_watch(watch, dev)
          ->inotify_dev_queue_event(.., IN_IGNORED, ..)
      
      But if the watcher has the IN_ONESHOT bit set, the watch will be released
      inside the first event, which results in accessing an invalid object
      later.  IMHO this is not a pure regression: the bug wasn't triggered
      during the initial inotify interface testing phase because of another
      bug in the IN_ONESHOT handling logic :)
      
        commit ac74c00e
        Author: Ulisses Furquim <ulissesf@gmail.com>
        Date:   Fri Feb 8 04:18:16 2008 -0800
          inotify: fix check for one-shot watches before destroying them
          As the IN_ONESHOT bit is never set when an event is sent we must check it
          in the watch's mask and not in the event's mask.
      
      TESTCASE:
      mkdir mnt
      mount -ttmpfs none mnt
      mkdir mnt/d
      ./inotify mnt/d&
      umount mnt ## << lockup or crash here
      
      TESTSOURCE:
      /* gcc -oinotify inotify.c */
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>             /* read() */
      #include <sys/inotify.h>
      
      int main(int argc, char **argv)
      {
              char buf[1024];
              struct inotify_event *ie;
              char *p;
              int i;
              ssize_t l;
      
              p = argv[1];
              i = inotify_init();
              inotify_add_watch(i, p, ~0);    /* all bits set, including IN_ONESHOT */
      
              l = read(i, buf, sizeof(buf));
              printf("read %zd bytes\n", l);  /* l is ssize_t, so use %zd */
              ie = (struct inotify_event *) buf;
              printf("event mask: %u\n", ie->mask);   /* mask is an unsigned 32-bit field */
              return 0;
      }
      Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Robert Love <rlove@google.com>
      Cc: Ulisses Furquim <ulissesf@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 16 Nov, 2008 1 commit
    • Fix inotify watch removal/umount races · 8f7b0ba1
      Al Viro committed
      Inotify watch removals suck violently.
      
      To kick the watch out we need (in this order) inode->inotify_mutex and
      ih->mutex.  That's fine if we have a hold on inode; however, for all
      other cases we need to make damn sure we don't race with umount.  We can
      *NOT* just grab a reference to a watch - inotify_unmount_inodes() will
      happily sail past it and we'll end with reference to inode potentially
      outliving its superblock.
      
      Ideally we just want to grab an active reference to superblock if we
      can; that will make sure we won't go into inotify_unmount_inodes() until
      we are done.  Cleanup is just deactivate_super().
      
      However, that leaves a messy case - what if we *are* racing with
      umount() and active references to superblock can't be acquired anymore?
      We can bump ->s_count, grab ->s_umount, which will almost certainly wait
      until the superblock is shut down and the watch in question is pining
      for fjords.  That's fine, but there is a problem - we might have hit the
      window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
      the moment when superblock is past the point of no return and is heading
      for shutdown) and the moment when deactivate_super() acquires
      ->s_umount.
      
      We could just do drop_super() yield() and retry, but that's rather
      antisocial and this stuff is luser-triggerable.  OTOH, having grabbed
      ->s_umount and having found that we'd got there first (i.e.  that
      ->s_root is non-NULL) we know that we won't race with
      inotify_unmount_inodes().
      
      So we could grab a reference to watch and do the rest as above, just
      with drop_super() instead of deactivate_super(), right? Wrong.  We had
      to drop ih->mutex before we could grab ->s_umount.  So the watch
      could've been gone already.
      
      That still can be dealt with - we need to save watch->wd, do idr_find()
      and compare its result with our pointer.  If they match, we either have
      the damn thing still alive or we'd lost not one but two races at once,
      the watch had been killed and a new one got created with the same ->wd
      at the same address.  That couldn't have happened in inotify_destroy(),
      but inotify_rm_wd() could run into that.  Still, "new one got created"
      is not a problem - we have every right to kill it or leave it alone,
      whatever's more convenient.
      
      So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
      "grab it and kill it" check.  If it's been our original watch, we are
      fine, if it's a newcomer - nevermind, just pretend that we'd won the
      race and kill the fscker anyway; we are safe since we know that its
      superblock won't be going away.
      
      And yes, this is far beyond mere "not very pretty"; so's the entire
      concept of inotify to start with.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Greg KH <greg@kroah.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
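
The "grab it and kill it" check described in this message (look the saved ->wd up again and compare the result with the original pointer and superblock) might be sketched in userspace like this. It is a simplified model, not the kernel code: `struct handle` is a fixed array standing in for the idr, and `handle_find()` and `watch_may_be_killed()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WD 8

struct super_block { int id; };
struct inode { struct super_block *i_sb; };
struct watch { int wd; struct inode *inode; };

/* Stand-in for the idr: a fixed array indexed by watch descriptor. */
struct handle { struct watch *slots[MAX_WD]; };

/* Plays the role of idr_find(): returns the live watch for wd, or NULL. */
struct watch *handle_find(const struct handle *ih, int wd)
{
    if (wd < 0 || wd >= MAX_WD)
        return NULL;
    return ih->slots[wd];
}

/*
 * Revalidate after the lock was dropped and retaken.  saved_wd was copied
 * from the watch while it was still known to be alive; the watch pointer
 * itself is only compared here, never dereferenced.
 */
int watch_may_be_killed(const struct handle *ih, int saved_wd,
                        const struct watch *watch,
                        const struct super_block *sb)
{
    struct watch *found;

    assert(watch != NULL);
    found = handle_find(ih, saved_wd);
    /* found is dereferenced only when it is a live entry in the table. */
    return found != NULL && found == watch && found->inode->i_sb == sb;
}
```

If the lookup returns a different pointer or nothing at all, the original watch is already gone and must not be touched; if it returns the same pointer on the same superblock, it is safe to proceed even when the entry is technically a newcomer.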
  6. 07 Feb, 2008 2 commits
    • inotify: remove debug code · 0d71bd59
      Nick Piggin committed
      The inotify debugging code is supposed to verify that the
      DCACHE_INOTIFY_PARENT_WATCHED scalability optimisation does not result in
      notifications getting lost nor extra needless locking generated.
      
      Unfortunately there are also some races in the debugging code.  And it isn't
      very good at finding problems anyway.  So remove it for now.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Robert Love <rlove@google.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: Yan Zheng <yanzheng@21cn.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • inotify: fix race · d599e36a
      Nick Piggin committed
      There is a race between setting an inode's children's "parent watched" flag
      when placing the first watch on a parent, and instantiating new children of
      that parent: a child could miss having its flags set by
      set_dentry_child_flags, but then inotify_d_instantiate might still see
      !inotify_inode_watched.
      
      The solution is to set_dentry_child_flags after adding the watch.  Locking is
      taken care of, because both set_dentry_child_flags and inotify_d_instantiate
      hold dcache_lock and child->d_locks.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Robert Love <rlove@google.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: Yan Zheng <yanzheng@21cn.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 21 Oct, 2007 2 commits
  8. 09 May, 2007 1 commit
    • Introduce a handy list_first_entry macro · b5e61818
      Pavel Emelianov committed
      There are many places in the kernel where a construction like
      
         foo = list_entry(head->next, struct foo_struct, list);
      
      is used.
      The code might look more descriptive and neat if using the macro
      
         list_first_entry(head, type, member) \
                   list_entry((head)->next, type, member)
      
      Here is the macro itself and examples of its usage in the generic code.
      If it turns out to be useful, I can prepare a set of patches to
      inject it into arch-specific code, drivers, networking, etc.
      Signed-off-by: Pavel Emelianov <xemul@openvz.org>
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
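
To see the macro from this message in action outside the kernel, here is a small self-contained sketch. The list primitives are simplified userspace stand-ins written for this example (an assumption, not the kernel headers), and `first_value()` is a hypothetical helper; only `list_first_entry()` itself follows the definition quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's list primitives. */
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The macro proposed in the commit message. */
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

/* Append entry at the tail of the circular list rooted at head. */
void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Payload type, named after the example in the commit message. */
struct foo_struct {
    int value;
    struct list_head list;
};

/* Returns the value stored in the first element of a non-empty list. */
int first_value(struct list_head *head)
{
    struct foo_struct *foo;

    /* On an empty list head->next is the head itself, not a foo_struct. */
    assert(head->next != head);
    foo = list_first_entry(head, struct foo_struct, list);
    return foo->value;
}
```

Note that the macro performs no emptiness check of its own, so callers have to make sure the list holds at least one element before using it.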
  9. 04 Dec, 2006 1 commit
  10. 20 Jun, 2006 4 commits
  11. 22 May, 2006 2 commits
  12. 11 Apr, 2006 1 commit
  13. 29 Mar, 2006 1 commit
  14. 27 Mar, 2006 1 commit
  15. 26 Mar, 2006 1 commit
    • [PATCH] inotify: lock avoidance with parent watch status in dentry · c32ccd87
      Nick Piggin committed
      Previous inotify work avoidance is good when inotify is completely unused,
      but it breaks down if even a single watch is in place anywhere in the
      system.  Robin Holt notices that udev is one such culprit - it slows down a
      512-thread application on a 512 CPU system from 6 seconds to 22 minutes.
      
      Solve this by adding a flag in the dentry that tells inotify whether or not
      its parent inode has a watch on it.  Event queueing to parent will skip
      taking locks if this flag is cleared.  Setting and clearing of this flag on
      all child dentries versus event delivery: this is no in terms of race
      cases, and that was shown to be equivalent to always performing the check.
      
      The essential behaviour is that activity occurring _after_ a watch has been
      added and _before_ it has been removed, will generate events.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Robert Love <rml@novell.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
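
The flag-based fast path described above can be modelled roughly as follows. This is a toy sketch under stated assumptions: the `toy_` structures, both functions, the lock counter that stands in for real locking, and the flag's numeric value are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Flag bit meaning "my parent inode is watched"; the value is a placeholder. */
#define DCACHE_INOTIFY_PARENT_WATCHED 0x0020u

struct toy_parent {
    int lock_acquisitions;   /* counts how often the simulated lock was taken */
    int queued_events;
};

struct toy_dentry {
    unsigned int d_flags;
    struct toy_parent *d_parent;
};

/* Run for each child when the parent gains its first or loses its last watch. */
void set_child_flag(struct toy_dentry *child, int watched)
{
    if (watched)
        child->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
    else
        child->d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED;
}

/* Returns 1 if an event was queued, 0 if the fast path skipped all locking. */
int queue_event_to_parent(struct toy_dentry *child)
{
    assert(child->d_parent != NULL);

    /* Fast path: a clear flag means no watch on the parent, so no lock. */
    if (!(child->d_flags & DCACHE_INOTIFY_PARENT_WATCHED))
        return 0;

    child->d_parent->lock_acquisitions++;   /* stands in for lock and unlock */
    child->d_parent->queued_events++;
    return 1;
}
```

With the flag clear, the per-event cost is a single bit test on the child, which is the property that removes lock contention for unwatched directories in this model.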
  16. 23 Mar, 2006 2 commits
  17. 08 Feb, 2006 1 commit
  18. 19 Jan, 2006 1 commit
  19. 13 Dec, 2005 1 commit
  20. 09 Nov, 2005 1 commit
  21. 24 Oct, 2005 1 commit
    • [PATCH] inotify/idr leak fix · 8d3b3591
      Andrew Morton committed
      Fix a bug which was reported and diagnosed by
      Stefan Jones <stefan.jones@churchillrandoms.co.uk>
      
      IDR trees include a cache of idr_layer objects.  There's no way to destroy
      this cache, so when we discard an overall idr tree we end up leaking some
      memory.
      
      Add and use idr_destroy() for this.  v9fs and infiniband also need to use
      idr_destroy() to avoid leaks.
      
      Or, we make the cache global, like radix_tree_preload().  Which is probably
      better.  Later.
      
      Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
      Cc: Roland Dreier <rolandd@cisco.com>
      Cc: Robert Love <rml@novell.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  22. 08 Sep, 2005 2 commits
  23. 27 Aug, 2005 1 commit
  24. 16 Aug, 2005 1 commit
  25. 02 Aug, 2005 1 commit
  26. 27 Jul, 2005 7 commits