1. 10 Apr, 2017 5 commits
    • fsnotify: Remove indirection from mark list addition · 755b5bc6
      Committed by Jan Kara
      Adding a notification mark to the object list is currently done through
      fsnotify_add_{inode|vfsmount}_mark() helpers from
      fsnotify_add_mark_locked() which call fsnotify_add_mark_list(). Remove
      this unnecessary indirection to simplify the code.
      
      Pushing all the locking to fsnotify_add_mark_list() also allows us to
      allocate the connector structure with GFP_KERNEL mode.
      Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • fsnotify: Make fsnotify_mark_connector hold inode reference · e911d8af
      Committed by Jan Kara
      Currently inode reference is held by fsnotify marks. Change the rules so
      that inode reference is held by fsnotify_mark_connector structure
      whenever the list is non-empty. This simplifies the code and is more
      logical.
      Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • fsnotify: Move object pointer to fsnotify_mark_connector · 86ffe245
      Committed by Jan Kara
      Move pointer to inode / vfsmount from mark itself to the
      fsnotify_mark_connector structure. This is another step on the path
      towards decoupling inode / vfsmount lifetime from notification mark
      lifetime.
      Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • fsnotify: Move mark list head from object into dedicated structure · 9dd813c1
      Committed by Jan Kara
      Currently notification marks are attached to object (inode or vfsmnt) by
      a hlist_head in the object. The list is also protected by a spinlock in
      the object. So while there is any mark attached to the list of marks,
      the object must be pinned in memory (and thus e.g. last iput() deleting
      inode cannot happen). Also for list iteration in fsnotify() to work, we
      must hold fsnotify_mark_srcu lock so that mark itself and
      mark->obj_list.next cannot get freed. Thus we are required to wait for
      response to fanotify events from userspace process with
      fsnotify_mark_srcu lock held. That causes issues when userspace process
      is buggy and does not reply to some event - basically the whole
      notification subsystem gets eventually stuck.
      
      So to be able to drop fsnotify_mark_srcu lock while waiting for
      response, we have to pin the mark in memory and make sure it stays in
      the object list (as removing the mark waiting for response could lead to
      lost notification events for groups later in the list). However we don't
      want inode reclaim to block on such mark as that would lead to system
      just locking up elsewhere.
      
      This commit is the first in the series that paves the way towards solving
      these conflicting lifetime needs. Instead of anchoring the list of marks
      directly in the object, we anchor it in a dedicated structure
      (fsnotify_mark_connector) and just point to that structure from the
      object. The following commits will also add spinlock protecting the list
      and object pointer to the structure.
      Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
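The indirection described in 9dd813c1 can be pictured with a small userspace sketch. This is not the kernel code: `toy_object`, `toy_connector`, `toy_mark` and both helpers are hypothetical names, and allocating the connector lazily on the first attach is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; these are not the kernel definitions. */
struct toy_mark {
    struct toy_mark *next;
    unsigned int mask;
};

/* Dedicated structure that anchors the list of marks. */
struct toy_connector {
    struct toy_mark *head;
};

/* The object (inode or vfsmount in the commit) only points at the connector. */
struct toy_object {
    struct toy_connector *conn;   /* NULL until the first mark is attached */
};

/* Attach a mark, allocating the connector on first use (an assumption).
 * Returns 0 on success, -1 if the allocation fails. */
static int toy_object_add_mark(struct toy_object *obj, struct toy_mark *m)
{
    assert(obj != NULL && m != NULL);
    if (obj->conn == NULL) {
        obj->conn = calloc(1, sizeof(*obj->conn));
        if (obj->conn == NULL)
            return -1;
    }
    m->next = obj->conn->head;
    obj->conn->head = m;
    return 0;
}

/* OR together the masks of all marks reachable through the connector. */
static unsigned int toy_object_mask(const struct toy_object *obj)
{
    unsigned int mask = 0;

    assert(obj != NULL);
    if (obj->conn == NULL)
        return 0;
    for (const struct toy_mark *m = obj->conn->head; m != NULL; m = m->next)
        mask |= m->mask;
    return mask;
}
```

Because the list head lives outside the object, the connector's lifetime can in principle be managed separately from the object's, which is the direction the commit message describes for the rest of the series.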
    • fsnotify: Update comments · c1f33073
      Committed by Jan Kara
      Add a comment that lifetime of a notification mark is protected by SRCU
      and remove a comment about clearing of marks attached to the inode. It
      is stale and a more up-to-date version is at fsnotify_destroy_marks(), which
      is the function handling this case.
      Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  2. 24 Dec, 2016 1 commit
    • fsnotify: Remove fsnotify_duplicate_mark() · e3ba7307
      Committed by Jan Kara
      There are only two call sites of fsnotify_duplicate_mark(). Those are
      in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused
      for audit tree, inode pointer and group gets set in
      fsnotify_add_mark_locked() later anyway, mask and free_mark are already
      set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is
      actively harmful because following fsnotify_add_mark_locked() will leak
      group reference by overwriting the group pointer. So just remove the two
      calls to fsnotify_duplicate_mark() and the function.
      Signed-off-by: Jan Kara <jack@suse.cz>
      [PM: line wrapping to fit in 80 chars]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  3. 20 May, 2016 1 commit
    • fsnotify: avoid spurious EMFILE errors from inotify_init() · 35e48176
      Committed by Jan Kara
      Inotify instance is destroyed when all references to it are dropped.
      That not only means that the corresponding file descriptor needs to be
      closed but also that all corresponding instance marks are freed (as each
      mark holds a reference to the inotify instance).  However marks are
      freed only after SRCU period ends which can take some time and thus if
      user rapidly creates and frees inotify instances, number of existing
      inotify instances can exceed max_user_instances limit although from user
      point of view there is always at most one existing instance.  Thus
      inotify_init() returns EMFILE error which is hard to justify from user
      point of view.  This problem is exposed by LTP inotify06 testcase on
      some machines.
      
      We fix the problem by making sure all group marks are properly freed
      while destroying inotify instance.  We wait for SRCU period to end in
      that path anyway since we have to make sure there is no event being
      added to the instance while we are tearing down the instance.  So it
      takes only some plumbing to allow for marks to be destroyed in that path
      as well and not from a dedicated work item.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Tested-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 19 Feb, 2016 2 commits
  5. 15 Jan, 2016 1 commit
    • fsnotify: destroy marks with call_srcu instead of dedicated thread · c510eff6
      Committed by Jeff Layton
      At the time that this code was originally written, call_srcu didn't
      exist, so this thread was required to ensure that we waited for that
      SRCU grace period to settle before finally freeing the object.
      
      It does exist now however and we can much more efficiently use call_srcu
      to handle this.  That also allows us to potentially use srcu_barrier to
      ensure that all of the callbacks have run before proceeding.
      In order to conserve space, we union the rcu_head with the g_list.
      
      This will be necessary for nfsd which will allocate marks from a
      dedicated slabcache.  We have to be able to ensure that all of the
      objects are destroyed before destroying the cache.  That's fairly
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Reviewed-by: Jan Kara <jack@suse.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
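The space saving mentioned in c510eff6 ("we union the rcu_head with the g_list") can be illustrated outside the kernel. The types below are userspace stand-ins with hypothetical names, and the sketch assumes the list linkage and the deferred-free callback head are never needed at the same time.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's list_head and rcu_head. */
struct toy_list_head {
    struct toy_list_head *next, *prev;
};

struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *head);
};

struct toy_mark {
    unsigned int mask;
    union {
        struct toy_list_head g_list; /* linkage while the mark sits on a group list */
        struct toy_rcu_head rcu;     /* callback head, assumed to be used only after removal */
    } u;
};

/* Same fields without the union, for size comparison only. */
struct toy_mark_no_union {
    unsigned int mask;
    struct toy_list_head g_list;
    struct toy_rcu_head rcu;
};

/* Recover the enclosing mark from its embedded callback head, as a
 * deferred-free callback would have to do. */
static struct toy_mark *toy_mark_from_rcu(struct toy_rcu_head *head)
{
    assert(head != NULL);
    return (struct toy_mark *)((char *)head - offsetof(struct toy_mark, u.rcu));
}
```

The two members share storage, so the structure carrying the union is smaller than one carrying both fields side by side.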
  6. 05 Sep, 2015 2 commits
  7. 07 Aug, 2015 1 commit
  8. 22 Jul, 2015 1 commit
    • Revert "fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()" · d725e66c
      Committed by Linus Torvalds
      This reverts commit a2673b6e.
      
      Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms:
      
       "Thanks for report! You are right that my patch introduces a race
        between fsnotify kthread and fsnotify_destroy_group() which can result
        in leaking inotify event on group destruction.
      
        I haven't yet decided whether the right fix is not to queue events for
        dying notification group (as that is pointless anyway) or whether we
        should just fix the original problem differently...  Whenever I look
        at fsnotify code mark handling I get lost in the maze of locks, lists,
        and subtle differences between how different notification systems
        handle notification marks :( I'll think about it over night"
      
      and after thinking about it, Jan says:
      
       "OK, I have looked into the code some more and I found another
        relatively simple way of fixing the original oops.  It will be IMHO
        better than trying to fixup this issue which has more potential for
        breakage.  I'll ask Linus to revert the fsnotify fix he already merged
        and send a new fix"
      Reported-by: Kinglong Mee <kinglongmee@gmail.com>
      Requested-by: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 18 Jul, 2015 1 commit
  10. 14 Dec, 2014 2 commits
  11. 14 Nov, 2014 1 commit
  12. 05 Jun, 2014 1 commit
  13. 10 Jul, 2013 1 commit
  14. 12 Dec, 2012 7 commits
    • fsnotify: change locking order · 6960b0d9
      Committed by Lino Sanfilippo
      On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote:
      >
      > I finally built and tested a v3.0 kernel with these patches (I know I'm
      > SOOOOOO far behind).  Not what I hoped for:
      >
      > > [  150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds.  Have a nice day...
      > > [  150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
      > > [  150.946012] IP: [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
      > > [  150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0
      > > [  150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      > > [  150.946012] CPU 0
      > > [  150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan]
      > > [  150.946012]
      > > [  150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM
      > > [  150.946012] RIP: 0010:[<ffffffff810ffd58>]  [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
      > > [  150.946012] RSP: 0018:ffff88002c2e5df8  EFLAGS: 00010282
      > > [  150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438
      > > [  150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240
      > > [  150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000
      > > [  150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428
      > > [  150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610
      > > [  150.946012] FS:  00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
      > > [  150.946012] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      > > [  150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0
      > > [  150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > > [  150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > > [  150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760)
      > > [  150.946012] Stack:
      > > [  150.946012]  ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76
      > > [  150.946012]  ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250
      > > [  150.946012]  ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438
      > > [  150.946012] Call Trace:
      > > [  150.946012]  [<ffffffff81102f76>] shmem_evict_inode+0x76/0x130
      > > [  150.946012]  [<ffffffff8115e9be>] evict+0x7e/0x170
      > > [  150.946012]  [<ffffffff8115ee40>] iput_final+0xd0/0x190
      > > [  150.946012]  [<ffffffff8115ef33>] iput+0x33/0x40
      > > [  150.946012]  [<ffffffff81180205>] fsnotify_destroy_mark_locked+0x145/0x160
      > > [  150.946012]  [<ffffffff81180316>] fsnotify_destroy_mark+0x36/0x50
      > > [  150.946012]  [<ffffffff81181937>] sys_inotify_rm_watch+0x77/0xd0
      > > [  150.946012]  [<ffffffff815aca52>] system_call_fastpath+0x16/0x1b
      > > [  150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00
      > > [  150.946012]  83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a
      > > [  150.946012] RIP  [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
      > > [  150.946012]  RSP <ffff88002c2e5df8>
      > > [  150.946012] CR2: 0000000000000070
      >
      > Looks an awful lot like the problem from:
      > http://www.spinics.net/lists/linux-fsdevel/msg46101.html
      >
      
      I tried to reproduce this bug with your test program, but without success.
      However, if I understand correctly, this occurs because we don't hold any locks when
      we call iput() in mark_destroy(), right?
      With the patches you tested, iput() is also not called under any lock, since the
      group's mark_mutex is released temporarily before iput() is called.  This is because
      the original code behaves similarly.
      However, since we now have a mutex as the biggest lock, we can do what you
      suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and
      call iput() with the mutex held to avoid the race.
      The patch below implements this. It uses nested locking to avoid deadlock in case
      we do the final iput() on an inode which still holds marks and thus would take
      the mutex again when calling fsnotify_inode_delete() in destroy_inode().
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • fsnotify: dont put marks on temporary list when clearing marks by group · 64c20d2a
      Committed by Lino Sanfilippo
      In clear_marks_by_group_flags() the mark list of a group is iterated and the
      marks are put on a temporary list.
      Since we introduced fsnotify_destroy_mark_locked() we don't need the temporary list
      any more and are able to remove the marks while the mark list is iterated and
      the mark list mutex is held.
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark() · d5a335b8
      Committed by Lino Sanfilippo
      This patch introduces fsnotify_add_mark_locked() and fsnotify_remove_mark_locked(),
      which are essentially the same as fsnotify_add_mark() and fsnotify_remove_mark() but
      assume that the caller has already taken the group's mark mutex.
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • fsnotify: pass group to fsnotify_destroy_mark() · e2a29943
      Committed by Lino Sanfilippo
      In fsnotify_destroy_mark() don't get the group from the passed mark anymore,
      but pass the group itself as an additional parameter to the function.
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • fsnotify: use a mutex instead of a spinlock to protect a groups mark list · 986ab098
      Committed by Lino Sanfilippo
      Replace the group's mark_lock spinlock with a mutex. Using a mutex instead
      of a spinlock gives more flexibility (i.e. it allows sleeping while the
      lock is held).
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • fsnotify: take groups mark_lock before mark lock · 104d06f0
      Committed by Lino Sanfilippo
      Race-free addition and removal of a mark to a group's mark list would be easier
      if we could lock the mark list of the group before we lock the specific mark.
      This patch changes the order used to add/remove marks to/from mark lists from
      
      1. mark->lock
      2. group->mark_lock
      3. inode->i_lock
      
      to
      
      1. group->mark_lock
      2. mark->lock
      3. inode->i_lock
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • fsnotify: use reference counting for groups · 23e964c2
      Committed by Lino Sanfilippo
      Get a group ref for each mark that is added to the group's list and release that
      ref when the mark is freed in fsnotify_put_mark().
      We also get a group reference for duplicated marks and for private event
      data.
      Now we don't free a group any more when the number of marks becomes 0 but when
      the group's ref count does. Since this will only happen when all marks are removed
      from a group's mark list, we don't have to set the group's number of marks to 1 at
      group creation.

      Besides clearing all marks in fsnotify_destroy_group() we also flush the
      group's event queue. This is because events may hold references to groups (due to
      private event data) and we have to put those references first before we get a
      chance to put the final ref, which will result in a call to
      fsnotify_final_destroy_group().
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
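A minimal single-threaded sketch of the rule in 23e964c2: the group goes away when its reference count reaches zero, not when its mark count does. All names are hypothetical, the counter is a plain int rather than an atomic, and "destroyed" is just a flag standing in for the final teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_group {
    int refcnt;       /* one per mark, plus the creator's reference */
    int num_marks;    /* starts at 0; no artificial initial value of 1 */
    bool destroyed;   /* set when the last reference is dropped */
};

static void toy_group_init(struct toy_group *g)
{
    assert(g != NULL);
    g->refcnt = 1;    /* reference held by whoever created the group */
    g->num_marks = 0;
    g->destroyed = false;
}

static void toy_group_get(struct toy_group *g)
{
    assert(g != NULL && !g->destroyed);
    g->refcnt++;
}

static void toy_group_put(struct toy_group *g)
{
    assert(g != NULL && g->refcnt > 0);
    if (--g->refcnt == 0)
        g->destroyed = true;   /* stand-in for the final destroy call */
}

/* Each mark added to the group's list pins the group. */
static void toy_group_add_mark(struct toy_group *g)
{
    toy_group_get(g);
    g->num_marks++;
}

/* Freeing a mark drops the reference it held. */
static void toy_group_free_mark(struct toy_group *g)
{
    assert(g != NULL && g->num_marks > 0);
    g->num_marks--;
    toy_group_put(g);
}
```

With this shape, a group whose last mark has just been freed stays alive until the creator's own reference is put.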
  15. 15 Jan, 2012 1 commit
  16. 27 Jul, 2011 1 commit
  17. 31 Mar, 2011 1 commit
  18. 25 Mar, 2011 1 commit
  19. 28 Jul, 2010 9 commits
    • fsnotify: remove global fsnotify groups lists · 02436668
      Committed by Eric Paris
      The global fsnotify groups lists were invented as a way to increase the
      performance of fsnotify by shortcutting events which were not interesting.
      With the changes to walk the object lists rather than global groups lists
      these shortcuts are not useful.
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • fsnotify: Exchange list heads instead of moving elements · 8778abb9
      Committed by Andreas Gruenbacher
      Instead of moving list elements from destroy_list to &private_destroy_list,
      exchange the list heads.
      Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
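The idea in 8778abb9 is that a whole circular list can change owners in constant time by re-pointing the head, instead of unlinking and relinking each element. The sketch below is a userspace illustration with hypothetical `toy_` helpers. It is not the kernel list API and does not claim to reproduce the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list. */
struct toy_node {
    struct toy_node *next, *prev;
};

static void toy_list_init(struct toy_node *head)
{
    head->next = head;
    head->prev = head;
}

static int toy_list_empty(const struct toy_node *head)
{
    return head->next == head;
}

static void toy_list_add_tail(struct toy_node *n, struct toy_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hand every element of `from` to the empty head `to` in O(1),
 * leaving `from` empty. No element is touched except the first and last. */
static void toy_list_take_all(struct toy_node *from, struct toy_node *to)
{
    assert(toy_list_empty(to));
    if (toy_list_empty(from))
        return;
    to->next = from->next;
    to->prev = from->prev;
    to->next->prev = to;
    to->prev->next = to;
    toy_list_init(from);
}

static size_t toy_list_length(const struct toy_node *head)
{
    size_t n = 0;

    for (const struct toy_node *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

A caller would typically do the exchange while holding the lock that protects the shared list and then walk the private list after dropping it.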
    • fsnotify: srcu to protect read side of inode and vfsmount locks · 75c1be48
      Committed by Eric Paris
      Currently reading the inode->i_fsnotify_marks or
      vfsmount->mnt_fsnotify_marks lists is protected by a spinlock on both the
      read and the write side.  This patch protects the read side of those lists
      with a new single srcu.
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called · 700307a2
      Committed by Eric Paris
      Currently fsnotify checks if mark->group is NULL to decide if
      fsnotify_destroy_mark() has already been called or not.  With the upcoming
      rcu work it is a heck of a lot easier to use an explicit flag than worry
      about group being set to NULL.
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • fsnotify: call iput on inodes when no longer marked · b31d397e
      Committed by Eric Paris
      fsnotify takes an igrab on an inode when it adds a mark.  The code was
      supposed to drop the reference when the mark was removed but didn't.
      This caused problems when an fs was unmounted because those inodes would
      clearly not be gone.  Thus resulting in the most devastating of messages:
      
      VFS: Busy inodes after unmount of loop0. Self-destruct in 5 seconds.
      >>> Have a nice day...
      
      Jiri Slaby bisected the problem to a patch in the fsnotify tree.  The
      code snippets below show my stupidity quite clearly.
      
      void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
      {
      	...
      	mark->inode = NULL;
      	...
      }
      
      void fsnotify_destroy_mark(struct fsnotify_mark *mark)
      {
      	struct inode *inode = NULL;
      	...
      	if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
      		fsnotify_destroy_inode_mark(mark);
      		inode = mark->i.inode;
      	}
      	...
      	if (inode)
      		iput(inode);
      	...
      }
      
      Obviously the intent was to capture the inode before it was set to NULL in
      fsnotify_destroy_inode_mark() so we wouldn't be leaking inodes forever.
      Instead we leaked them (and exploded on umount).
      Reported-by: Jiri Slaby <jirislaby@gmail.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
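The ordering problem described in b31d397e can be reproduced in a few lines of userspace C. This is a toy model and not the actual patch: `toy_inode`, `toy_mark` and the helpers are hypothetical, and the "fixed" variant simply follows the stated intent of capturing the inode pointer before it is cleared.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode {
    int refcount;
};

struct toy_mark {
    struct toy_inode *inode;
};

/* Drop one reference, like iput() in spirit. */
static void toy_iput(struct toy_inode *inode)
{
    assert(inode != NULL && inode->refcount > 0);
    inode->refcount--;
}

/* Detach the mark from its inode; this clears the pointer. */
static void toy_detach(struct toy_mark *mark)
{
    mark->inode = NULL;
}

/* Leaky order: the pointer is read after it was cleared, so it is
 * always NULL and the reference is never dropped. */
static void toy_destroy_mark_leaky(struct toy_mark *mark)
{
    struct toy_inode *inode = NULL;

    toy_detach(mark);
    inode = mark->inode;
    if (inode)
        toy_iput(inode);
}

/* Intended order: capture the pointer first, then detach, then drop. */
static void toy_destroy_mark_fixed(struct toy_mark *mark)
{
    struct toy_inode *inode = mark->inode;

    toy_detach(mark);
    if (inode)
        toy_iput(inode);
}
```

In the leaky variant the inode keeps its reference forever, which matches the "Busy inodes after unmount" symptom quoted in the commit message.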
    • fanotify: clear all fanotify marks · 4d92604c
      Committed by Eric Paris
      fanotify listeners may want to clear all marks.  They may want to do this
      to destroy all of their inode marks which have nothing but ignores.
      Realistically this is useful for av vendors who update policy and want to
      clear all of their cached allows.
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • fsnotify: ignored_mask - excluding notification · 33af5e32
      Committed by Eric Paris
      The ignored_mask is a new mask which is part of fsnotify marks.  A group's
      should_send_event() function can use the ignored mask to determine that
      certain events are not of interest.  In particular if a group registers a
      mask including FS_OPEN on a vfsmount they could add FS_OPEN to the
      ignored_mask for individual inodes and not send open events for those
      inodes.
      Signed-off-by: Eric Paris <eparis@redhat.com>
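The FS_OPEN example from 33af5e32 can be sketched as a small filter function. Everything here is hypothetical: the `TOY_` event bits are placeholder values, `toy_should_send_event()` is not the kernel callback, and combining the vfsmount and inode marks by OR-ing their masks is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder event bits for this sketch only. */
#define TOY_FS_ACCESS 0x01u
#define TOY_FS_OPEN   0x20u

struct toy_mark {
    uint32_t mask;          /* events the group wants */
    uint32_t ignored_mask;  /* events the group wants suppressed */
};

/* Decide whether an event should be delivered, given an optional
 * vfsmount mark and an optional inode mark (either may be NULL). */
static bool toy_should_send_event(const struct toy_mark *mnt_mark,
                                  const struct toy_mark *inode_mark,
                                  uint32_t event)
{
    uint32_t wanted = 0;
    uint32_t ignored = 0;

    if (mnt_mark) {
        wanted |= mnt_mark->mask;
        ignored |= mnt_mark->ignored_mask;
    }
    if (inode_mark) {
        wanted |= inode_mark->mask;
        ignored |= inode_mark->ignored_mask;
    }
    return (event & wanted & ~ignored) != 0;
}
```

A group that registers TOY_FS_OPEN on the mount and adds TOY_FS_OPEN to the ignored mask of one inode would then see open events for every inode on that mount except the ignored one.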
    • fsnotify: allow marks to not pin inodes in core · 90b1e7a5
      Committed by Eric Paris
      inotify marks must pin inodes in core.  dnotify doesn't technically need to
      since they are closed when the directory is closed.  fanotify also needs to
      pin inodes in core as it works today.  But the next step is to introduce
      the concept of 'ignored masks' which is actually a mask of events for an
      inode of no interest.  I claim that these should be liberally sent to the
      kernel and should not pin the inode in core.  If the inode is brought back
      in the listener will get an event it may have thought excluded, but this is
      not a serious situation and one any listener should deal with.
      
      This patch lays the groundwork for non-pinning inode marks by using lazy
      inode pinning.  We do not pin a mark until it has a non-zero mask entry.  If a
      listener never sets a mask we never pin the inode.
      Signed-off-by: Eric Paris <eparis@redhat.com>
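Lazy pinning as described in 90b1e7a5 might look roughly like the sketch below. It is a userspace toy with hypothetical names. The pin is modelled as a counter on the inode, and what happens when a mask later returns to zero is deliberately left out because the commit message does not say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_inode {
    int pins;   /* how many marks currently keep this inode in core */
};

struct toy_mark {
    struct toy_inode *inode;
    uint32_t mask;
    uint32_t ignored_mask;
    bool pinned;
};

static void toy_mark_init(struct toy_mark *m, struct toy_inode *inode)
{
    assert(m != NULL && inode != NULL);
    m->inode = inode;
    m->mask = 0;
    m->ignored_mask = 0;
    m->pinned = false;   /* attaching alone does not pin */
}

/* Setting a non-zero mask pins the inode, once per mark. */
static void toy_mark_set_mask(struct toy_mark *m, uint32_t mask)
{
    assert(m != NULL);
    m->mask = mask;
    if (mask != 0 && !m->pinned) {
        m->inode->pins++;
        m->pinned = true;
    }
}

/* Setting only an ignored mask never pins the inode. */
static void toy_mark_set_ignored_mask(struct toy_mark *m, uint32_t mask)
{
    assert(m != NULL);
    m->ignored_mask = mask;
}
```

A listener that only ever sets ignored masks therefore leaves the inode free to be evicted, at the cost of possibly receiving an event it thought it had excluded once the inode is brought back.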
    • fsnotify: vfsmount marks generic functions · 0d48b7f0
      Committed by Eric Paris
      Much like inode-mark.c has all of the code dealing with marks on inodes
      this patch adds a vfsmount-mark.c which has similar code but is intended
      for marks on vfsmounts.
      Signed-off-by: Eric Paris <eparis@redhat.com>