1. 14 5月, 2010 3 次提交
  2. 12 5月, 2010 1 次提交
  3. 01 5月, 2010 1 次提交
    • R
      Inotify: Fix build failure in inotify user support · 12b1b321
      Ralf Baechle 提交于
      CONFIG_INOTIFY_USER defined but CONFIG_ANON_INODES undefined will result
      in the following build failure:
      
          LD      vmlinux
        fs/built-in.o: In function 'sys_inotify_init1':
        (.text.sys_inotify_init1+0x22c): undefined reference to 'anon_inode_getfd'
        fs/built-in.o: In function `sys_inotify_init1':
        (.text.sys_inotify_init1+0x22c): relocation truncated to fit: R_MIPS_26 against 'anon_inode_getfd'
        make[2]: *** [vmlinux] Error 1
        make[1]: *** [sub-make] Error 2
        make: *** [all] Error 2
      Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12b1b321
  4. 19 2月, 2010 1 次提交
  5. 16 1月, 2010 2 次提交
    • E
      inotify: only warn once for inotify problems · 976ae32b
      Eric Paris 提交于
      inotify will WARN() if it finds that the idr and the fsnotify internals
      somehow got out of sync.  It was only supposed to do this once but due
      to this stupid bug it would warn every single time a problem was
      detected.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      976ae32b
    • E
      inotify: do not reuse watch descriptors · 9e572cc9
      Eric Paris 提交于
      Since commit 7e790dd5 ("inotify: fix
      error paths in inotify_update_watch") inotify changed the manor in which
      it gave watch descriptors back to userspace.  Previous to this commit
      inotify acted like the following:
      
        inotify_add_watch(X, Y, Z) = 1
        inotify_rm_watch(X, 1);
        inotify_add_watch(X, Y, Z) = 2
      
      but after this patch inotify would return watch descriptors like so:
      
        inotify_add_watch(X, Y, Z) = 1
        inotify_rm_watch(X, 1);
        inotify_add_watch(X, Y, Z) = 1
      
      which I saw as equivalent to opening an fd where
      
        open(file) = 1;
        close(1);
        open(file) = 1;
      
      seemed perfectly reasonable.  The issue is that quite a bit of userspace
      apparently relies on the behavior in which watch descriptors will not be
      quickly reused.  KDE relies on it, I know some selinux packages rely on
      it, and I have heard complaints from other random sources such as debian
      bug 558981.
      
      Although the man page implies what we do is ok, we broke userspace so
      this patch almost reverts us to the old behavior.  It is still slightly
      racey and I have patches that would fix that, but they are rather large
      and this will fix it for all real world cases.  The race is as follows:
      
       - task1 creates a watch and blocks in idr_new_watch() before it updates
         the hint.
       - task2 creates a watch and updates the hint.
       - task1 updates the hint with it's older wd
       - task removes the watch created by task2
       - task adds a new watch and will reuse the wd originally given to task2
      
      it requires moving some locking around the hint (last_wd) but this should
      solve it for the real world and be -stable safe.
      
      As a side effect this patch papers over a bug in the lib/idr code which
      is causing a large number WARN's to pop on people's system and many
      reports in kerneloops.org.  I'm working on the root cause of that idr
      bug seperately but this should make inotify immune to that issue.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e572cc9
  6. 17 12月, 2009 2 次提交
  7. 04 12月, 2009 1 次提交
  8. 19 11月, 2009 1 次提交
  9. 12 11月, 2009 1 次提交
  10. 29 8月, 2009 1 次提交
  11. 28 8月, 2009 2 次提交
  12. 27 8月, 2009 4 次提交
  13. 18 8月, 2009 2 次提交
    • E
      inotify: start watch descriptor count at 1 · 08e53fcb
      Eric Paris 提交于
      The inotify_add_watch man page specifies that inotify_add_watch() will
      return a non-negative integer.  However, historically the inotify
      watches started at 1, not at 0.
      
      Turns out that the inotifywait program provided by the inotify-tools
      package doesn't properly handle a 0 watch descriptor.  In 7e790dd5 we
      changed from starting at 1 to starting at 0.  This patch starts at 1,
      just like in previous kernels, but also just like in previous kernels
      it's possible for it to wrap back to 0.  This preserves the kernel
      functionality exactly like it was before the patch (neither method broke
      the spec)
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08e53fcb
    • E
      notify: unused event private race · eef3a116
      Eric Paris 提交于
      inotify decides if private data it passed to get added to an event was
      used by checking list_empty().  But it's possible that the event may
      have been dequeued and the private event removed so it would look empty.
      
      The fix is to use the return code from fsnotify_add_notify_event rather
      than looking at the list.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eef3a116
  14. 22 7月, 2009 5 次提交
    • E
      inotify: use GFP_NOFS under potential memory pressure · f44aebcc
      Eric Paris 提交于
      inotify can have a watchs removed under filesystem reclaim.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.31-rc2 #16
      ---------------------------------
      inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
      khubd/217 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (iprune_mutex){+.+.?.}, at: [<c10ba899>] invalidate_inodes+0x20/0xe3
      {IN-RECLAIM_FS-W} state was registered at:
        [<c10536ab>] __lock_acquire+0x2c9/0xac4
        [<c1053f45>] lock_acquire+0x9f/0xc2
        [<c1308872>] __mutex_lock_common+0x2d/0x323
        [<c1308c00>] mutex_lock_nested+0x2e/0x36
        [<c10ba6ff>] shrink_icache_memory+0x38/0x1b2
        [<c108bfb6>] shrink_slab+0xe2/0x13c
        [<c108c3e1>] kswapd+0x3d1/0x55d
        [<c10449b5>] kthread+0x66/0x6b
        [<c1003fdf>] kernel_thread_helper+0x7/0x10
        [<ffffffff>] 0xffffffff
      
      Two things are needed to fix this.  First we need a method to tell
      fsnotify_create_event() to use GFP_NOFS and second we need to stop using
      one global IN_IGNORED event and allocate them one at a time.  This solves
      current issues with multiple IN_IGNORED on a queue having tail drop
      problems and simplifies the allocations since we don't have to worry about
      two tasks opperating on the IGNORED event concurrently.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      f44aebcc
    • E
      fsnotify: use def_bool in kconfig instead of letting the user choose · 520dc2a5
      Eric Paris 提交于
      fsnotify doens't give the user anything.  If someone chooses inotify or
      dnotify it should build fsnotify, if they don't select one it shouldn't be
      built.  This patch changes fsnotify to be a def_bool=n and makes everything
      else select it.  Also fixes the issue people complained about on lwn where
      gdm hung because they didn't have inotify and they didn't get the inotify
      build option.....
      Signed-off-by: NEric Paris <eparis@redhat.com>
      520dc2a5
    • E
      inotify: fix error paths in inotify_update_watch · 7e790dd5
      Eric Paris 提交于
      inotify_update_watch could leave things in a horrid state on a number of
      error paths.  We could try to remove idr entries that didn't exist, we
      could send an IN_IGNORED to userspace for watches that don't exist, and a
      bit of other stupidity.  Clean these up by doing the idr addition before we
      put the mark on the inode since we can clean that up on error and getting
      off the inode's mark list is hard.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      7e790dd5
    • E
      inotify: do not leak inode marks in inotify_add_watch · 75fe2b26
      Eric Paris 提交于
      inotify_add_watch had a couple of problems.  The biggest being that if
      inotify_add_watch was called on the same inode twice (to update or change the
      event mask) a refence was taken on the original inode mark by
      fsnotify_find_mark_entry but was not being dropped at the end of the
      inotify_add_watch call.  Thus if inotify_rm_watch was called although the mark
      was removed from the inode, the refcnt wouldn't hit zero and we would leak
      memory.
      Reported-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      75fe2b26
    • E
      inotify: drop user watch count when a watch is removed · 5549f7cd
      Eric Paris 提交于
      The inotify rewrite forgot to drop the inotify watch use cound when a watch
      was removed.  This means that a single inotify fd can only ever register a
      maximum of /proc/sys/fs/max_user_watches even if some of those had been
      freed.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      5549f7cd
  15. 02 7月, 2009 1 次提交
  16. 20 6月, 2009 1 次提交
    • E
      inotify: inotify_destroy_mark_entry could get called twice · 528da3e9
      Eric Paris 提交于
      inotify_destroy_mark_entry could get called twice for the same mark since it
      is called directly in inotify_rm_watch and when the mark is being destroyed for
      another reason.  As an example assume that the file being watched was just
      deleted so inotify_destroy_mark_entry would get called from the path
      fsnotify_inoderemove() -> fsnotify_destroy_marks_by_inode() ->
      fsnotify_destroy_mark_entry() -> inotify_destroy_mark_entry().  If this
      happened at the same time as userspace tried to remove a watch via
      inotify_rm_watch we could attempt to remove the mark from the idr twice and
      could thus double dec the ref cnt and potentially could be in a use after
      free/double free situation.  The fix is to have inotify_rm_watch use the
      generic recursive safe fsnotify_destroy_mark_by_entry() so we are sure the
      inotify_destroy_mark_entry() function can only be called one.
      
      This patch also renames the function to inotify_ingored_remove_idr() so it is
      clear what is actually going on in the function.
      
      Hopefully this fixes:
      [   20.342058] idr_remove called for id=20 which is not allocated.
      [   20.348000] Pid: 1860, comm: udevd Not tainted 2.6.30-tip #1077
      [   20.353933] Call Trace:
      [   20.356410]  [<ffffffff811a82b7>] idr_remove+0x115/0x18f
      [   20.361737]  [<ffffffff8134259d>] ? _spin_lock+0x6d/0x75
      [   20.367061]  [<ffffffff8111640a>] ? inotify_destroy_mark_entry+0xa3/0xcf
      [   20.373771]  [<ffffffff8111641e>] inotify_destroy_mark_entry+0xb7/0xcf
      [   20.380306]  [<ffffffff81115913>] inotify_freeing_mark+0xe/0x10
      [   20.386238]  [<ffffffff8111410d>] fsnotify_destroy_mark_by_entry+0x143/0x170
      [   20.393293]  [<ffffffff811163a3>] inotify_destroy_mark_entry+0x3c/0xcf
      [   20.399829]  [<ffffffff811164d1>] sys_inotify_rm_watch+0x9b/0xc6
      [   20.405850]  [<ffffffff8100bcdb>] system_call_fastpath+0x16/0x1b
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Tested-by: NPeter Ziljlstra <peterz@infradead.org>
      528da3e9
  17. 12 6月, 2009 3 次提交
    • E
      inotify/dnotify: should_send_event shouldn't match on FS_EVENT_ON_CHILD · e42e2773
      Eric Paris 提交于
      inotify and dnotify will both indicate that they want any event which came
      from a child inode.  The fix is to mask off FS_EVENT_ON_CHILD when deciding
      if inotify or dnotify is interested in a given event.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      e42e2773
    • E
      inotify: reimplement inotify using fsnotify · 63c882a0
      Eric Paris 提交于
      Reimplement inotify_user using fsnotify.  This should be feature for feature
      exactly the same as the original inotify_user.  This does not make any changes
      to the in kernel inotify feature used by audit.  Those patches (and the eventual
      removal of in kernel inotify) will come after the new inotify_user proves to be
      working correctly.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      63c882a0
    • E
      fsnotify: unified filesystem notification backend · 90586523
      Eric Paris 提交于
      fsnotify is a backend for filesystem notification.  fsnotify does
      not provide any userspace interface but does provide the basis
      needed for other notification schemes such as dnotify.  fsnotify
      can be extended to be the backend for inotify or the upcoming
      fanotify.  fsnotify provides a mechanism for "groups" to register for
      some set of filesystem events and to then deliver those events to
      those groups for processing.
      
      fsnotify has a number of benefits, the first being actually shrinking the size
      of an inode.  Before fsnotify to support both dnotify and inotify an inode had
      
              unsigned long           i_dnotify_mask; /* Directory notify events */
              struct dnotify_struct   *i_dnotify; /* for directory notifications */
              struct list_head        inotify_watches; /* watches on this inode */
              struct mutex            inotify_mutex;  /* protects the watches list
      
      But with fsnotify this same functionallity (and more) is done with just
      
              __u32                   i_fsnotify_mask; /* all events for this inode */
              struct hlist_head       i_fsnotify_mark_entries; /* marks on this inode */
      
      That's right, inotify, dnotify, and fanotify all in 64 bits.  We used that
      much space just in inotify_watches alone, before this patch set.
      
      fsnotify object lifetime and locking is MUCH better than what we have today.
      inotify locking is incredibly complex.  See 8f7b0ba1 as an example of
      what's been busted since inception.  inotify needs to know internal semantics
      of superblock destruction and unmounting to function.  The inode pinning and
      vfs contortions are horrible.
      
      no fsnotify implementers do allocation under locks.  This means things like
      f04b30de which (due to an overabundance of caution) changes GFP_KERNEL to
      GFP_NOFS can be reverted.  There are no longer any allocation rules when using
      or implementing your own fsnotify listener.
      
      fsnotify paves the way for fanotify.  In brief fanotify is a notification
      mechanism that delivers the lisener both an 'event' and an open file descriptor
      to the object in question.  This means that fanotify is pathname agnostic.
      Some on lkml may not care for the original companies or users that pushed for
      TALPA, but fanotify was designed with flexibility and input for other users in
      mind.  The readahead group expressed interest in fanotify as it could be used
      to profile disk access on boot without breaking the audit system.  The desktop
      search groups have also expressed interest in fanotify as it solves a number
      of the race conditions and problems present with managing inotify when more
      than a limited number of specific files are of interest.  fanotify can provide
      for a userspace access control system which makes it a clean interface for AV
      vendors to hook without trying to do binary patching on the syscall table,
      LSM, and everywhere else they do their things today.  With this patch series
      fanotify can be implemented in less than 1200 lines of easy to review code.
      Almost all of which is the socket based user interface.
      
      This patch series builds fsnotify to the point that it can implement
      dnotify and inotify_user.  Patches exist and will be sent soon after
      acceptance to finish the in kernel inotify conversion (audit) and implement
      fanotify.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      90586523
  18. 07 5月, 2009 1 次提交
    • W
      inotify: use GFP_NOFS in kernel_event() to work around a lockdep false-positive · 381a80e6
      Wu Fengguang 提交于
      There is what we believe to be a false positive reported by lockdep.
      
      inotify_inode_queue_event() => take inotify_mutex => kernel_event() =>
      kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim =>
      dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock
      
      The plan is to fix this via lockdep annotation, but that is proving to be
      quite involved.
      
      The patch flips the allocation over to GFP_NFS to shut the warning up, for
      the 2.6.30 release.
      
      Hopefully we will fix this for real in 2.6.31.  I'll queue a patch in -mm
      to switch it back to GFP_KERNEL so we don't forget.
      
        =================================
        [ INFO: inconsistent lock state ]
        2.6.30-rc2-next-20090417 #203
        ---------------------------------
        inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
        kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes:
         (&inode->inotify_mutex){+.+.?.}, at: [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
        {RECLAIM_FS-ON-W} state was registered at:
          [<ffffffff81079188>] mark_held_locks+0x68/0x90
          [<ffffffff810792a5>] lockdep_trace_alloc+0xf5/0x100
          [<ffffffff810f5261>] __kmalloc_node+0x31/0x1e0
          [<ffffffff81130652>] kernel_event+0xe2/0x190
          [<ffffffff81130826>] inotify_dev_queue_event+0x126/0x230
          [<ffffffff8112f096>] inotify_inode_queue_event+0xc6/0x110
          [<ffffffff8110444d>] vfs_create+0xcd/0x140
          [<ffffffff8110825d>] do_filp_open+0x88d/0xa20
          [<ffffffff810f6b68>] do_sys_open+0x98/0x140
          [<ffffffff810f6c50>] sys_open+0x20/0x30
          [<ffffffff8100c272>] system_call_fastpath+0x16/0x1b
          [<ffffffffffffffff>] 0xffffffffffffffff
        irq event stamp: 690455
        hardirqs last  enabled at (690455): [<ffffffff81564fe4>] _spin_unlock_irqrestore+0x44/0x80
        hardirqs last disabled at (690454): [<ffffffff81565372>] _spin_lock_irqsave+0x32/0xa0
        softirqs last  enabled at (690178): [<ffffffff81052282>] __do_softirq+0x202/0x220
        softirqs last disabled at (690157): [<ffffffff8100d50c>] call_softirq+0x1c/0x50
      
        other info that might help us debug this:
        2 locks held by kswapd0/380:
         #0:  (shrinker_rwsem){++++..}, at: [<ffffffff810d0bd7>] shrink_slab+0x37/0x180
         #1:  (&type->s_umount_key#17){++++..}, at: [<ffffffff8110cfbf>] shrink_dcache_memory+0x11f/0x1e0
      
        stack backtrace:
        Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203
        Call Trace:
         [<ffffffff810789ef>] print_usage_bug+0x19f/0x200
         [<ffffffff81018bff>] ? save_stack_trace+0x2f/0x50
         [<ffffffff81078f0b>] mark_lock+0x4bb/0x6d0
         [<ffffffff810799e0>] ? check_usage_forwards+0x0/0xc0
         [<ffffffff8107b142>] __lock_acquire+0xc62/0x1ae0
         [<ffffffff810f478c>] ? slob_free+0x10c/0x370
         [<ffffffff8107c0a1>] lock_acquire+0xe1/0x120
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff81562d43>] mutex_lock_nested+0x63/0x420
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff81012fe9>] ? sched_clock+0x9/0x10
         [<ffffffff81077165>] ? lock_release_holdtime+0x35/0x1c0
         [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
         [<ffffffff8110c9dc>] dentry_iput+0xbc/0xe0
         [<ffffffff8110cb23>] d_kill+0x33/0x60
         [<ffffffff8110ce23>] __shrink_dcache_sb+0x2d3/0x350
         [<ffffffff8110cffa>] shrink_dcache_memory+0x15a/0x1e0
         [<ffffffff810d0cc5>] shrink_slab+0x125/0x180
         [<ffffffff810d1540>] kswapd+0x560/0x7a0
         [<ffffffff810ce160>] ? isolate_pages_global+0x0/0x2c0
         [<ffffffff81065a30>] ? autoremove_wake_function+0x0/0x40
         [<ffffffff8107953d>] ? trace_hardirqs_on+0xd/0x10
         [<ffffffff810d0fe0>] ? kswapd+0x0/0x7a0
         [<ffffffff8106555b>] kthread+0x5b/0xa0
         [<ffffffff8100d40a>] child_rip+0xa/0x20
         [<ffffffff8100cdd0>] ? restore_args+0x0/0x30
         [<ffffffff81065500>] ? kthread+0x0/0xa0
         [<ffffffff8100d400>] ? child_rip+0x0/0x20
      
      [eparis@redhat.com: fix audit too]
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      381a80e6
  19. 28 3月, 2009 1 次提交
    • N
      fs: avoid I_NEW inodes · aabb8fdb
      Nick Piggin 提交于
      To be on the safe side, it should be less fragile to exclude I_NEW inodes
      from inode list scans by default (unless there is an important reason to
      have them).
      
      Normally they will get excluded (eg.  by zero refcount or writecount etc),
      however it is a bit fragile for list walkers to know exactly what parts of
      the inode state is set up and valid to test when in I_NEW.  So along these
      lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
      checks with them too -- this shouldn't be a problem should it?)
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      aabb8fdb
  20. 19 2月, 2009 1 次提交
    • I
      inotify: fix GFP_KERNEL related deadlock · f04b30de
      Ingo Molnar 提交于
      Enhanced lockdep coverage of __GFP_NOFS turned up this new lockdep
      assert:
      
      [ 1093.677775]
      [ 1093.677781] =================================
      [ 1093.680031] [ INFO: inconsistent lock state ]
      [ 1093.680031] 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
      [ 1093.680031] ---------------------------------
      [ 1093.680031] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      [ 1093.680031] kswapd0/308 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [ 1093.680031]  (&inode->inotify_mutex){+.+.?.}, at: [<c0205942>] inotify_inode_is_dead+0x20/0x80
      [ 1093.680031] {RECLAIM_FS-ON-W} state was registered at:
      [ 1093.680031]   [<c01696b9>] mark_held_locks+0x43/0x5b
      [ 1093.680031]   [<c016baa4>] lockdep_trace_alloc+0x6c/0x6e
      [ 1093.680031]   [<c01cf8b0>] kmem_cache_alloc+0x20/0x150
      [ 1093.680031]   [<c040d0ec>] idr_pre_get+0x27/0x6c
      [ 1093.680031]   [<c02056e3>] inotify_handle_get_wd+0x25/0xad
      [ 1093.680031]   [<c0205f43>] inotify_add_watch+0x7a/0x129
      [ 1093.680031]   [<c020679e>] sys_inotify_add_watch+0x20f/0x250
      [ 1093.680031]   [<c010389e>] sysenter_do_call+0x12/0x35
      [ 1093.680031]   [<ffffffff>] 0xffffffff
      [ 1093.680031] irq event stamp: 60417
      [ 1093.680031] hardirqs last  enabled at (60417): [<c018d5f5>] call_rcu+0x53/0x59
      [ 1093.680031] hardirqs last disabled at (60416): [<c018d5b9>] call_rcu+0x17/0x59
      [ 1093.680031] softirqs last  enabled at (59656): [<c0146229>] __do_softirq+0x157/0x16b
      [ 1093.680031] softirqs last disabled at (59651): [<c0106293>] do_softirq+0x74/0x15d
      [ 1093.680031]
      [ 1093.680031] other info that might help us debug this:
      [ 1093.680031] 2 locks held by kswapd0/308:
      [ 1093.680031]  #0:  (shrinker_rwsem){++++..}, at: [<c01b0502>] shrink_slab+0x36/0x189
      [ 1093.680031]  #1:  (&type->s_umount_key#4){+++++.}, at: [<c01e6d77>] shrink_dcache_memory+0x110/0x1fb
      [ 1093.680031]
      [ 1093.680031] stack backtrace:
      [ 1093.680031] Pid: 308, comm: kswapd0 Not tainted 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
      [ 1093.680031] Call Trace:
      [ 1093.680031]  [<c016947a>] valid_state+0x12a/0x13d
      [ 1093.680031]  [<c016954e>] mark_lock+0xc1/0x1e9
      [ 1093.680031]  [<c016a5b4>] ? check_usage_forwards+0x0/0x3f
      [ 1093.680031]  [<c016ab74>] __lock_acquire+0x2c6/0xac8
      [ 1093.680031]  [<c01688d9>] ? register_lock_class+0x17/0x228
      [ 1093.680031]  [<c016b3d3>] lock_acquire+0x5d/0x7a
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c08824c4>] __mutex_lock_common+0x3a/0x4cb
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c08829ed>] mutex_lock_nested+0x2e/0x36
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c0205942>] inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c01e6672>] dentry_iput+0x90/0xc2
      [ 1093.680031]  [<c01e67a3>] d_kill+0x21/0x45
      [ 1093.680031]  [<c01e6a46>] __shrink_dcache_sb+0x27f/0x355
      [ 1093.680031]  [<c01e6dc5>] shrink_dcache_memory+0x15e/0x1fb
      [ 1093.680031]  [<c01b05ed>] shrink_slab+0x121/0x189
      [ 1093.680031]  [<c01b0d12>] kswapd+0x39f/0x561
      [ 1093.680031]  [<c01ae499>] ? isolate_pages_global+0x0/0x233
      [ 1093.680031]  [<c0157eae>] ? autoremove_wake_function+0x0/0x43
      [ 1093.680031]  [<c01b0973>] ? kswapd+0x0/0x561
      [ 1093.680031]  [<c0157daf>] kthread+0x41/0x82
      [ 1093.680031]  [<c0157d6e>] ? kthread+0x0/0x82
      [ 1093.680031]  [<c01043ab>] kernel_thread_helper+0x7/0x10
      
      inotify_handle_get_wd() does idr_pre_get() which does a
      kmem_cache_alloc() without __GFP_FS - and is hence deadlockable under
      extreme MM pressure.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: MinChan Kim <minchan.kim@gmail.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f04b30de
  21. 27 1月, 2009 1 次提交
    • V
      inotify: clean up inotify_read and fix locking problems · 3632dee2
      Vegard Nossum 提交于
      If userspace supplies an invalid pointer to a read() of an inotify
      instance, the inotify device's event list mutex is unlocked twice.
      This causes an unbalance which effectively leaves the data structure
      unprotected, and we can trigger oopses by accessing the inotify
      instance from different tasks concurrently.
      
      The best fix (contributed largely by Linus) is a total rewrite
      of the function in question:
      
      On Thu, Jan 22, 2009 at 7:05 AM, Linus Torvalds wrote:
      > The thing to notice is that:
      >
      >  - locking is done in just one place, and there is no question about it
      >   not having an unlock.
      >
      >  - that whole double-while(1)-loop thing is gone.
      >
      >  - use multiple functions to make nesting and error handling sane
      >
      >  - do error testing after doing the things you always need to do, ie do
      >   this:
      >
      >        mutex_lock(..)
      >        ret = function_call();
      >        mutex_unlock(..)
      >
      >        .. test ret here ..
      >
      >   instead of doing conditional exits with unlocking or freeing.
      >
      > So if the code is written in this way, it may still be buggy, but at least
      > it's not buggy because of subtle "forgot to unlock" or "forgot to free"
      > issues.
      >
      > This _always_ unlocks if it locked, and it always frees if it got a
      > non-error kevent.
      
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Robert Love <rlove@google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3632dee2
  22. 14 1月, 2009 2 次提交
  23. 06 1月, 2009 1 次提交
    • M
      inotify: fix type errors in interfaces · 4ae8978c
      Michael Kerrisk 提交于
      The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes.
      
      For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is
      currently 'u32', it should be '__s32' .  That is Robert's suggestion, and
      is consistent with the other declarations of watch descriptors in the
      kernel source, in particular, the inotify_event structure in
      include/linux/inotify.h:
      
      struct inotify_event {
              __s32           wd;             /* watch descriptor */
              __u32           mask;           /* watch mask */
              __u32           cookie;         /* cookie to synchronize two events */
              __u32           len;            /* length (including nulls) of name */
              char            name[0];        /* stub for possible name */
      };
      
      The patch makes the changes needed for inotify_rm_watch().
      Signed-off-by: NMichael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Robert Love <rlove@google.com>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4ae8978c
  24. 01 1月, 2009 1 次提交