1. 22 7月, 2009 4 次提交
    • E
      inotify: use GFP_NOFS under potential memory pressure · f44aebcc
      Eric Paris 提交于
      inotify can have a watchs removed under filesystem reclaim.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.31-rc2 #16
      ---------------------------------
      inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
      khubd/217 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (iprune_mutex){+.+.?.}, at: [<c10ba899>] invalidate_inodes+0x20/0xe3
      {IN-RECLAIM_FS-W} state was registered at:
        [<c10536ab>] __lock_acquire+0x2c9/0xac4
        [<c1053f45>] lock_acquire+0x9f/0xc2
        [<c1308872>] __mutex_lock_common+0x2d/0x323
        [<c1308c00>] mutex_lock_nested+0x2e/0x36
        [<c10ba6ff>] shrink_icache_memory+0x38/0x1b2
        [<c108bfb6>] shrink_slab+0xe2/0x13c
        [<c108c3e1>] kswapd+0x3d1/0x55d
        [<c10449b5>] kthread+0x66/0x6b
        [<c1003fdf>] kernel_thread_helper+0x7/0x10
        [<ffffffff>] 0xffffffff
      
      Two things are needed to fix this.  First we need a method to tell
      fsnotify_create_event() to use GFP_NOFS and second we need to stop using
      one global IN_IGNORED event and allocate them one at a time.  This solves
      current issues with multiple IN_IGNORED on a queue having tail drop
      problems and simplifies the allocations since we don't have to worry about
      two tasks opperating on the IGNORED event concurrently.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      f44aebcc
    • E
      inotify: fix error paths in inotify_update_watch · 7e790dd5
      Eric Paris 提交于
      inotify_update_watch could leave things in a horrid state on a number of
      error paths.  We could try to remove idr entries that didn't exist, we
      could send an IN_IGNORED to userspace for watches that don't exist, and a
      bit of other stupidity.  Clean these up by doing the idr addition before we
      put the mark on the inode since we can clean that up on error and getting
      off the inode's mark list is hard.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      7e790dd5
    • E
      inotify: do not leak inode marks in inotify_add_watch · 75fe2b26
      Eric Paris 提交于
      inotify_add_watch had a couple of problems.  The biggest being that if
      inotify_add_watch was called on the same inode twice (to update or change the
      event mask) a refence was taken on the original inode mark by
      fsnotify_find_mark_entry but was not being dropped at the end of the
      inotify_add_watch call.  Thus if inotify_rm_watch was called although the mark
      was removed from the inode, the refcnt wouldn't hit zero and we would leak
      memory.
      Reported-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      75fe2b26
    • E
      inotify: drop user watch count when a watch is removed · 5549f7cd
      Eric Paris 提交于
      The inotify rewrite forgot to drop the inotify watch use cound when a watch
      was removed.  This means that a single inotify fd can only ever register a
      maximum of /proc/sys/fs/max_user_watches even if some of those had been
      freed.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      5549f7cd
  2. 02 7月, 2009 1 次提交
  3. 20 6月, 2009 1 次提交
    • E
      inotify: inotify_destroy_mark_entry could get called twice · 528da3e9
      Eric Paris 提交于
      inotify_destroy_mark_entry could get called twice for the same mark since it
      is called directly in inotify_rm_watch and when the mark is being destroyed for
      another reason.  As an example assume that the file being watched was just
      deleted so inotify_destroy_mark_entry would get called from the path
      fsnotify_inoderemove() -> fsnotify_destroy_marks_by_inode() ->
      fsnotify_destroy_mark_entry() -> inotify_destroy_mark_entry().  If this
      happened at the same time as userspace tried to remove a watch via
      inotify_rm_watch we could attempt to remove the mark from the idr twice and
      could thus double dec the ref cnt and potentially could be in a use after
      free/double free situation.  The fix is to have inotify_rm_watch use the
      generic recursive safe fsnotify_destroy_mark_by_entry() so we are sure the
      inotify_destroy_mark_entry() function can only be called one.
      
      This patch also renames the function to inotify_ingored_remove_idr() so it is
      clear what is actually going on in the function.
      
      Hopefully this fixes:
      [   20.342058] idr_remove called for id=20 which is not allocated.
      [   20.348000] Pid: 1860, comm: udevd Not tainted 2.6.30-tip #1077
      [   20.353933] Call Trace:
      [   20.356410]  [<ffffffff811a82b7>] idr_remove+0x115/0x18f
      [   20.361737]  [<ffffffff8134259d>] ? _spin_lock+0x6d/0x75
      [   20.367061]  [<ffffffff8111640a>] ? inotify_destroy_mark_entry+0xa3/0xcf
      [   20.373771]  [<ffffffff8111641e>] inotify_destroy_mark_entry+0xb7/0xcf
      [   20.380306]  [<ffffffff81115913>] inotify_freeing_mark+0xe/0x10
      [   20.386238]  [<ffffffff8111410d>] fsnotify_destroy_mark_by_entry+0x143/0x170
      [   20.393293]  [<ffffffff811163a3>] inotify_destroy_mark_entry+0x3c/0xcf
      [   20.399829]  [<ffffffff811164d1>] sys_inotify_rm_watch+0x9b/0xc6
      [   20.405850]  [<ffffffff8100bcdb>] system_call_fastpath+0x16/0x1b
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Tested-by: NPeter Ziljlstra <peterz@infradead.org>
      528da3e9
  4. 12 6月, 2009 1 次提交
    • E
      inotify: reimplement inotify using fsnotify · 63c882a0
      Eric Paris 提交于
      Reimplement inotify_user using fsnotify.  This should be feature for feature
      exactly the same as the original inotify_user.  This does not make any changes
      to the in kernel inotify feature used by audit.  Those patches (and the eventual
      removal of in kernel inotify) will come after the new inotify_user proves to be
      working correctly.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      63c882a0
  5. 07 5月, 2009 1 次提交
    • W
      inotify: use GFP_NOFS in kernel_event() to work around a lockdep false-positive · 381a80e6
      Wu Fengguang 提交于
      There is what we believe to be a false positive reported by lockdep.
      
      inotify_inode_queue_event() => take inotify_mutex => kernel_event() =>
      kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim =>
      dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock
      
      The plan is to fix this via lockdep annotation, but that is proving to be
      quite involved.
      
      The patch flips the allocation over to GFP_NFS to shut the warning up, for
      the 2.6.30 release.
      
      Hopefully we will fix this for real in 2.6.31.  I'll queue a patch in -mm
      to switch it back to GFP_KERNEL so we don't forget.
      
        =================================
        [ INFO: inconsistent lock state ]
        2.6.30-rc2-next-20090417 #203
        ---------------------------------
        inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
        kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes:
         (&inode->inotify_mutex){+.+.?.}, at: [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
        {RECLAIM_FS-ON-W} state was registered at:
          [<ffffffff81079188>] mark_held_locks+0x68/0x90
          [<ffffffff810792a5>] lockdep_trace_alloc+0xf5/0x100
          [<ffffffff810f5261>] __kmalloc_node+0x31/0x1e0
          [<ffffffff81130652>] kernel_event+0xe2/0x190
          [<ffffffff81130826>] inotify_dev_queue_event+0x126/0x230
          [<ffffffff8112f096>] inotify_inode_queue_event+0xc6/0x110
          [<ffffffff8110444d>] vfs_create+0xcd/0x140
          [<ffffffff8110825d>] do_filp_open+0x88d/0xa20
          [<ffffffff810f6b68>] do_sys_open+0x98/0x140
          [<ffffffff810f6c50>] sys_open+0x20/0x30
          [<ffffffff8100c272>] system_call_fastpath+0x16/0x1b
          [<ffffffffffffffff>] 0xffffffffffffffff
        irq event stamp: 690455
        hardirqs last  enabled at (690455): [<ffffffff81564fe4>] _spin_unlock_irqrestore+0x44/0x80
        hardirqs last disabled at (690454): [<ffffffff81565372>] _spin_lock_irqsave+0x32/0xa0
        softirqs last  enabled at (690178): [<ffffffff81052282>] __do_softirq+0x202/0x220
        softirqs last disabled at (690157): [<ffffffff8100d50c>] call_softirq+0x1c/0x50
      
        other info that might help us debug this:
        2 locks held by kswapd0/380:
         #0:  (shrinker_rwsem){++++..}, at: [<ffffffff810d0bd7>] shrink_slab+0x37/0x180
         #1:  (&type->s_umount_key#17){++++..}, at: [<ffffffff8110cfbf>] shrink_dcache_memory+0x11f/0x1e0
      
        stack backtrace:
        Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203
        Call Trace:
         [<ffffffff810789ef>] print_usage_bug+0x19f/0x200
         [<ffffffff81018bff>] ? save_stack_trace+0x2f/0x50
         [<ffffffff81078f0b>] mark_lock+0x4bb/0x6d0
         [<ffffffff810799e0>] ? check_usage_forwards+0x0/0xc0
         [<ffffffff8107b142>] __lock_acquire+0xc62/0x1ae0
         [<ffffffff810f478c>] ? slob_free+0x10c/0x370
         [<ffffffff8107c0a1>] lock_acquire+0xe1/0x120
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff81562d43>] mutex_lock_nested+0x63/0x420
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff81012fe9>] ? sched_clock+0x9/0x10
         [<ffffffff81077165>] ? lock_release_holdtime+0x35/0x1c0
         [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
         [<ffffffff8110c9dc>] dentry_iput+0xbc/0xe0
         [<ffffffff8110cb23>] d_kill+0x33/0x60
         [<ffffffff8110ce23>] __shrink_dcache_sb+0x2d3/0x350
         [<ffffffff8110cffa>] shrink_dcache_memory+0x15a/0x1e0
         [<ffffffff810d0cc5>] shrink_slab+0x125/0x180
         [<ffffffff810d1540>] kswapd+0x560/0x7a0
         [<ffffffff810ce160>] ? isolate_pages_global+0x0/0x2c0
         [<ffffffff81065a30>] ? autoremove_wake_function+0x0/0x40
         [<ffffffff8107953d>] ? trace_hardirqs_on+0xd/0x10
         [<ffffffff810d0fe0>] ? kswapd+0x0/0x7a0
         [<ffffffff8106555b>] kthread+0x5b/0xa0
         [<ffffffff8100d40a>] child_rip+0xa/0x20
         [<ffffffff8100cdd0>] ? restore_args+0x0/0x30
         [<ffffffff81065500>] ? kthread+0x0/0xa0
         [<ffffffff8100d400>] ? child_rip+0x0/0x20
      
      [eparis@redhat.com: fix audit too]
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      381a80e6
  6. 27 1月, 2009 1 次提交
    • V
      inotify: clean up inotify_read and fix locking problems · 3632dee2
      Vegard Nossum 提交于
      If userspace supplies an invalid pointer to a read() of an inotify
      instance, the inotify device's event list mutex is unlocked twice.
      This causes an unbalance which effectively leaves the data structure
      unprotected, and we can trigger oopses by accessing the inotify
      instance from different tasks concurrently.
      
      The best fix (contributed largely by Linus) is a total rewrite
      of the function in question:
      
      On Thu, Jan 22, 2009 at 7:05 AM, Linus Torvalds wrote:
      > The thing to notice is that:
      >
      >  - locking is done in just one place, and there is no question about it
      >   not having an unlock.
      >
      >  - that whole double-while(1)-loop thing is gone.
      >
      >  - use multiple functions to make nesting and error handling sane
      >
      >  - do error testing after doing the things you always need to do, ie do
      >   this:
      >
      >        mutex_lock(..)
      >        ret = function_call();
      >        mutex_unlock(..)
      >
      >        .. test ret here ..
      >
      >   instead of doing conditional exits with unlocking or freeing.
      >
      > So if the code is written in this way, it may still be buggy, but at least
      > it's not buggy because of subtle "forgot to unlock" or "forgot to free"
      > issues.
      >
      > This _always_ unlocks if it locked, and it always frees if it got a
      > non-error kevent.
      
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Robert Love <rlove@google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3632dee2
  7. 14 1月, 2009 2 次提交
  8. 06 1月, 2009 1 次提交
    • M
      inotify: fix type errors in interfaces · 4ae8978c
      Michael Kerrisk 提交于
      The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes.
      
      For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is
      currently 'u32', it should be '__s32' .  That is Robert's suggestion, and
      is consistent with the other declarations of watch descriptors in the
      kernel source, in particular, the inotify_event structure in
      include/linux/inotify.h:
      
      struct inotify_event {
              __s32           wd;             /* watch descriptor */
              __u32           mask;           /* watch mask */
              __u32           cookie;         /* cookie to synchronize two events */
              __u32           len;            /* length (including nulls) of name */
              char            name[0];        /* stub for possible name */
      };
      
      The patch makes the changes needed for inotify_rm_watch().
      Signed-off-by: NMichael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Robert Love <rlove@google.com>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4ae8978c
  9. 01 1月, 2009 2 次提交
  10. 14 11月, 2008 1 次提交
  11. 02 11月, 2008 1 次提交
    • A
      saner FASYNC handling on file close · 233e70f4
      Al Viro 提交于
      As it is, all instances of ->release() for files that have ->fasync()
      need to remember to evict file from fasync lists; forgetting that
      creates a hole and we actually have a bunch that *does* forget.
      
      So let's keep our lives simple - let __fput() check FASYNC in
      file->f_flags and call ->fasync() there if it's been set.  And lose that
      crap in ->release() instances - leaving it there is still valid, but we
      don't have to bother anymore.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      233e70f4
  12. 03 10月, 2008 1 次提交
  13. 27 7月, 2008 2 次提交
  14. 25 7月, 2008 3 次提交
    • U
      flag parameters: check magic constants · e38b36f3
      Ulrich Drepper 提交于
      This patch adds test that ensure the boundary conditions for the various
      constants introduced in the previous patches is met.  No code is generated.
      
      [akpm@linux-foundation.org: fix alpha]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e38b36f3
    • U
      flag parameters: NONBLOCK in inotify_init · 510df2dd
      Ulrich Drepper 提交于
      This patch adds non-blocking support for inotify_init1.  The
      additional changes needed are minimal.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_inotify_init1
      # ifdef __x86_64__
      #  define __NR_inotify_init1 294
      # elif defined __i386__
      #  define __NR_inotify_init1 332
      # else
      #  error "need __NR_inotify_init1"
      # endif
      #endif
      
      #define IN_NONBLOCK O_NONBLOCK
      
      int
      main (void)
      {
        int fd = syscall (__NR_inotify_init1, 0);
        if (fd == -1)
          {
            puts ("inotify_init1(0) failed");
            return 1;
          }
        int fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (fl & O_NONBLOCK)
          {
            puts ("inotify_init1(0) set non-blocking mode");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_inotify_init1, IN_NONBLOCK);
        if (fd == -1)
          {
            puts ("inotify_init1(IN_NONBLOCK) failed");
            return 1;
          }
        fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((fl & O_NONBLOCK) == 0)
          {
            puts ("inotify_init1(IN_NONBLOCK) set non-blocking mode");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      510df2dd
    • U
      flag parameters: inotify_init · 4006553b
      Ulrich Drepper 提交于
      This patch introduces the new syscall inotify_init1 (note: the 1 stands for
      the one parameter the syscall takes, as opposed to no parameter before).  The
      values accepted for this parameter are function-specific and defined in the
      inotify.h header.  Here the values must match the O_* flags, though.  In this
      patch CLOEXEC support is introduced.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_inotify_init1
      # ifdef __x86_64__
      #  define __NR_inotify_init1 294
      # elif defined __i386__
      #  define __NR_inotify_init1 332
      # else
      #  error "need __NR_inotify_init1"
      # endif
      #endif
      
      #define IN_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd;
        fd = syscall (__NR_inotify_init1, 0);
        if (fd == -1)
          {
            puts ("inotify_init1(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("inotify_init1(0) set close-on-exit");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_inotify_init1, IN_CLOEXEC);
        if (fd == -1)
          {
            puts ("inotify_init1(IN_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("inotify_init1(O_CLOEXEC) does not set close-on-exit");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4006553b
  15. 29 4月, 2008 1 次提交
  16. 15 2月, 2008 3 次提交
  17. 09 2月, 2008 1 次提交
  18. 07 2月, 2008 2 次提交
  19. 17 10月, 2007 1 次提交
  20. 20 7月, 2007 1 次提交
    • P
      mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt 提交于
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      20c2df83
  21. 13 2月, 2007 1 次提交
  22. 09 12月, 2006 1 次提交
  23. 08 12月, 2006 1 次提交
  24. 01 8月, 2006 1 次提交
    • A
      [PATCH] inotify: fix deadlock found by lockdep · 5b6509aa
      Arjan van de Ven 提交于
      This is a real deadlock, a nice complex one:
      (warning: long explanation follows so that Andrew can have a complete
      patch description)
      
      it's an ABCDA deadlock:
      
      A iprune_mutex
      B inode->inotify_mutex
      C ih->mutex
      D dev->ev_mutex
      
      The AB relationship comes straight from invalidate_inodes()
      
      int invalidate_inodes(struct super_block * sb)
      {
              int busy;
              LIST_HEAD(throw_away);
      
              mutex_lock(&iprune_mutex);
              spin_lock(&inode_lock);
              inotify_unmount_inodes(&sb->s_inodes);
      
      where inotify_umount_inodes() takes the
                      mutex_lock(&inode->inotify_mutex);
      
      The BC relationship comes directly from inotify_find_update_watch():
      s32 inotify_find_update_watch(struct inotify_handle *ih, struct inode *inode,
                                    u32 mask)
      {
         ...
              mutex_lock(&inode->inotify_mutex);
              mutex_lock(&ih->mutex);
      
      The CD relationship comes from inotify_rm_wd:
      inotify_rm_wd does
              mutex_lock(&inode->inotify_mutex);
              mutex_lock(&ih->mutex)
      and then calls inotify_remove_watch_locked() which calls
      notify_dev_queue_event() which does
      	        mutex_lock(&dev->ev_mutex);
      
      (this strictly is a BCD relationship)
      
      The DA relationship comes from the most interesting part:
      
        [<ffffffff8022d9f2>] shrink_icache_memory+0x42/0x270
        [<ffffffff80240dc4>] shrink_slab+0x11d/0x1c9
        [<ffffffff802b5104>] try_to_free_pages+0x187/0x244
        [<ffffffff8020efed>] __alloc_pages+0x1cd/0x2e0
        [<ffffffff8025e1f8>] cache_alloc_refill+0x3f8/0x821
        [<ffffffff8020a5e5>] kmem_cache_alloc+0x85/0xcb
        [<ffffffff802db027>] kernel_event+0x2e/0x122
        [<ffffffff8021d61c>] inotify_dev_queue_event+0xcc/0x140
      
      inotify_dev_queue_event schedules a kernel_event which does a
      kmem_cache_alloc( , GFP_KERNEL) which may try to shrink slabs, including
      the inode cache .. which then takes iprune_mutex.
      
      And voila, there is an AB, a BC, a CD relationship (even a direct BCD),
      and also now a DA relationship -> a circular type AB-BA deadlock but
      involving 4 locks.
      
      The solution is simple: kernel_event() is NOT allowed to use GFP_KERNEL,
      but must use GFP_NOFS to not cause recursion into the VFS.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NRobert Love <rml@novell.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5b6509aa
  25. 23 6月, 2006 1 次提交
    • D
      [PATCH] VFS: Permit filesystem to override root dentry on mount · 454e2398
      David Howells 提交于
      Extend the get_sb() filesystem operation to take an extra argument that
      permits the VFS to pass in the target vfsmount that defines the mountpoint.
      
      The filesystem is then required to manually set the superblock and root dentry
      pointers.  For most filesystems, this should be done with simple_set_mnt()
      which will set the superblock pointer and then set the root dentry to the
      superblock's s_root (as per the old default behaviour).
      
      The get_sb() op now returns an integer as there's now no need to return the
      superblock pointer.
      
      This patch permits a superblock to be implicitly shared amongst several mount
      points, such as can be done with NFS to avoid potential inode aliasing.  In
      such a case, simple_set_mnt() would not be called, and instead the mnt_root
      and mnt_sb would be set directly.
      
      The patch also makes the following changes:
      
       (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
           pointer argument and return an integer, so most filesystems have to change
           very little.
      
       (*) If one of the convenience function is not used, then get_sb() should
           normally call simple_set_mnt() to instantiate the vfsmount. This will
           always return 0, and so can be tail-called from get_sb().
      
       (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
           dcache upon superblock destruction rather than shrink_dcache_anon().
      
           This is required because the superblock may now have multiple trees that
           aren't actually bound to s_root, but that still need to be cleaned up. The
           currently called functions assume that the whole tree is rooted at s_root,
           and that anonymous dentries are not the roots of trees which results in
           dentries being left unculled.
      
           However, with the way NFS superblock sharing are currently set to be
           implemented, these assumptions are violated: the root of the filesystem is
           simply a dummy dentry and inode (the real inode for '/' may well be
           inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
           with child trees.
      
           [*] Anonymous until discovered from another tree.
      
       (*) The documentation has been adjusted, including the additional bit of
           changing ext2_* into foo_* in the documentation.
      
      [akpm@osdl.org: convert ipath_fs, do other stuff]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      454e2398
  26. 20 6月, 2006 3 次提交