1. 10 Apr, 2017 5 commits
    • fsnotify: Remove indirection from mark list addition · 755b5bc6
      Committed by Jan Kara
      Adding a notification mark to the object list is currently done through
      the fsnotify_add_{inode|vfsmount}_mark() helpers called from
      fsnotify_add_mark_locked(), which in turn call fsnotify_add_mark_list().
      Remove this unnecessary indirection to simplify the code.
      
      Pushing all the locking to fsnotify_add_mark_list() also allows us to
      allocate the connector structure with GFP_KERNEL mode.
      Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • fsnotify: Make fsnotify_mark_connector hold inode reference · e911d8af
      Committed by Jan Kara
      Currently inode reference is held by fsnotify marks. Change the rules so
      that inode reference is held by fsnotify_mark_connector structure
      whenever the list is non-empty. This simplifies the code and is more
      logical.
      Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
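The rule described in this commit (the connector holds the inode reference whenever its mark list is non-empty) can be illustrated with a minimal userspace sketch. All types and function names below are hypothetical stand-ins, not the kernel's actual structures or helpers, and locking is omitted entirely.

```c
#include <assert.h>

/* Hypothetical stand-ins; not the kernel's struct inode or connector. */
struct example_inode {
	int refcount;
};

struct example_connector {
	struct example_inode *inode;
	int nr_marks;	/* number of marks currently on the list */
};

/* Adding the first mark makes the list non-empty: take the inode reference. */
void example_add_mark(struct example_connector *conn)
{
	if (conn->nr_marks++ == 0)
		conn->inode->refcount++;
}

/* Removing the last mark empties the list: drop the inode reference. */
void example_remove_mark(struct example_connector *conn)
{
	assert(conn->nr_marks > 0);
	if (--conn->nr_marks == 0)
		conn->inode->refcount--;
}
```

In this sketch the inode reference count changes only on the empty/non-empty transitions, regardless of how many marks are attached in between.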
    • fsnotify: Move object pointer to fsnotify_mark_connector · 86ffe245
      Committed by Jan Kara
      Move pointer to inode / vfsmount from mark itself to the
      fsnotify_mark_connector structure. This is another step on the path
      towards decoupling inode / vfsmount lifetime from notification mark
      lifetime.
      Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • fsnotify: Move mark list head from object into dedicated structure · 9dd813c1
      Committed by Jan Kara
      Currently notification marks are attached to object (inode or vfsmnt) by
      a hlist_head in the object. The list is also protected by a spinlock in
      the object. So while there is any mark attached to the list of marks,
      the object must be pinned in memory (and thus e.g. last iput() deleting
      inode cannot happen). Also for list iteration in fsnotify() to work, we
      must hold fsnotify_mark_srcu lock so that mark itself and
      mark->obj_list.next cannot get freed. Thus we are required to wait for
      response to fanotify events from userspace process with
      fsnotify_mark_srcu lock held. That causes issues when userspace process
      is buggy and does not reply to some event - basically the whole
      notification subsystem gets eventually stuck.
      
      So to be able to drop fsnotify_mark_srcu lock while waiting for
      response, we have to pin the mark in memory and make sure it stays in
      the object list (as removing the mark waiting for response could lead to
      lost notification events for groups later in the list). However we don't
      want inode reclaim to block on such mark as that would lead to system
      just locking up elsewhere.
      
      This commit is the first in the series that paves way towards solving
      these conflicting lifetime needs. Instead of anchoring the list of marks
      directly in the object, we anchor it in a dedicated structure
      (fsnotify_mark_connector) and just point to that structure from the
      object. The following commits will also add spinlock protecting the list
      and object pointer to the structure.
      Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
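As a rough illustration of the layout change this commit describes, the sketch below anchors the list of marks in a separate connector structure and leaves only a pointer in the watched object. The struct and function names are invented for this example, the list is a plain singly linked list instead of the kernel's hlist, and no locking or SRCU protection is modeled.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mark: one entry in the connector's list. */
struct example_mark {
	struct example_mark *next;
	unsigned int mask;
};

/* Dedicated structure that anchors the list of marks. */
struct example_connector {
	struct example_mark *marks;
};

/* Watched object (inode or mount): it only points to the connector. */
struct example_object {
	struct example_connector *conn;	/* NULL while no mark was ever attached */
};

/* Attach a mark, allocating the connector on first use. Returns 0 or -1. */
int example_attach_mark(struct example_object *obj, struct example_mark *mark)
{
	if (!obj->conn) {
		obj->conn = calloc(1, sizeof(*obj->conn));
		if (!obj->conn)
			return -1;
	}
	mark->next = obj->conn->marks;
	obj->conn->marks = mark;
	return 0;
}

/* Count the marks reachable from the object through its connector. */
size_t example_count_marks(const struct example_object *obj)
{
	size_t n = 0;
	const struct example_mark *m;

	if (!obj->conn)
		return 0;
	for (m = obj->conn->marks; m; m = m->next)
		n++;
	return n;
}
```

Because the list head lives outside the object, the connector's lifetime can in principle be managed separately from the object's, which is the direction the commit message says the series is heading.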
    • fsnotify: Update comments · c1f33073
      Committed by Jan Kara
      Add a comment that lifetime of a notification mark is protected by SRCU
      and remove a comment about clearing of marks attached to the inode. It
      is stale, and a more up-to-date version is at fsnotify_destroy_marks(),
      which is the function handling this case.
      Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  2. 03 Apr, 2017 3 commits
  3. 02 Mar, 2017 2 commits
  4. 09 Feb, 2017 1 commit
  5. 24 Jan, 2017 1 commit
    • inotify: Convert to using per-namespace limits · 1cce1eea
      Committed by Nikolay Borisov
      This patchset converts inotify to using the newly introduced
      per-userns sysctl infrastructure.
      
      Currently the inotify instances/watches are being accounted in the
      user_struct structure. This means that in setups where multiple
      users in unprivileged containers map to the same underlying
      real user (i.e. pointing to the same user_struct) the inotify limits
      are going to be shared as well, allowing one user (or application) to
      exhaust all the others' limits.
      
      Fix this by switching the inotify sysctls to using the
      per-namespace/per-user limits. This will allow the server admin to
      set sensible global limits, which can further be tuned inside every
      individual user namespace. Additionally, in order to preserve the
      sysctl ABI make the existing inotify instances/watches sysctls
      modify the values of the initial user namespace.
      Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
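One way to picture limits that are set globally and tuned further inside each user namespace is a counter charged at every level of the namespace hierarchy. The sketch below is an assumption about the general technique, not the kernel's per-userns accounting code; `example_ns`, `example_charge` and `example_uncharge` are hypothetical names, and no locking or atomics are modeled.

```c
#include <stddef.h>

/* Hypothetical per-namespace counter with its own limit. */
struct example_ns {
	struct example_ns *parent;	/* NULL for the initial namespace */
	int count;			/* objects currently charged here */
	int max;			/* limit configured for this level */
};

/*
 * Charge one object to ns and every ancestor.
 * Returns 0 on success, or -1 if any level is already at its limit,
 * in which case the partial charges are rolled back.
 */
int example_charge(struct example_ns *ns)
{
	struct example_ns *iter, *undo;

	for (iter = ns; iter; iter = iter->parent) {
		if (iter->count >= iter->max)
			goto fail;
		iter->count++;
	}
	return 0;
fail:
	for (undo = ns; undo != iter; undo = undo->parent)
		undo->count--;
	return -1;
}

/* Release one previously charged object from ns and every ancestor. */
void example_uncharge(struct example_ns *ns)
{
	for (; ns; ns = ns->parent)
		ns->count--;
}
```

With this shape, a container that exhausts its own limit is refused at its own level, while the limit of the initial namespace still caps the total across all containers.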
  6. 24 Dec, 2016 1 commit
    • fsnotify: Remove fsnotify_duplicate_mark() · e3ba7307
      Committed by Jan Kara
      There are only two call sites of fsnotify_duplicate_mark(). Those are
      in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused
      for audit tree, inode pointer and group gets set in
      fsnotify_add_mark_locked() later anyway, mask and free_mark are already
      set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is
      actively harmful because following fsnotify_add_mark_locked() will leak
      group reference by overwriting the group pointer. So just remove the two
      calls to fsnotify_duplicate_mark() and the function.
      Signed-off-by: Jan Kara <jack@suse.cz>
      [PM: line wrapping to fit in 80 chars]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  7. 13 Dec, 2016 1 commit
    • fsnotify: Fix possible use-after-free in inode iteration on umount · 5716863e
      Committed by Jan Kara
      fsnotify_unmount_inodes() plays complex tricks to pin next inode in the
      sb->s_inodes list when iterating over all inodes. Furthermore the code has a
      bug that if the current inode is the last on i_sb_list that does not have e.g.
      I_FREEING set, then we leave next_i pointing to inode which may get removed
      from the i_sb_list once we drop s_inode_list_lock thus resulting in
      use-after-free issues (usually manifesting as infinite looping in
      fsnotify_unmount_inodes()).
      
      Fix the problem by keeping current inode pinned somewhat longer. Then we can
      make the code much simpler and standard.
      
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
  8. 06 Dec, 2016 3 commits
  9. 08 Oct, 2016 5 commits
  10. 20 Sep, 2016 2 commits
  11. 20 May, 2016 1 commit
    • fsnotify: avoid spurious EMFILE errors from inotify_init() · 35e48176
      Committed by Jan Kara
      Inotify instance is destroyed when all references to it are dropped.
      That not only means that the corresponding file descriptor needs to be
      closed but also that all corresponding instance marks are freed (as each
      mark holds a reference to the inotify instance).  However marks are
      freed only after SRCU period ends which can take some time and thus if
      user rapidly creates and frees inotify instances, number of existing
      inotify instances can exceed max_user_instances limit although from user
      point of view there is always at most one existing instance.  Thus
      inotify_init() returns EMFILE error which is hard to justify from user
      point of view.  This problem is exposed by LTP inotify06 testcase on
      some machines.
      
      We fix the problem by making sure all group marks are properly freed
      while destroying inotify instance.  We wait for SRCU period to end in
      that path anyway since we have to make sure there is no event being
      added to the instance while we are tearing down the instance.  So it
      takes only some plumbing to allow for marks to be destroyed in that path
      as well and not from a dedicated work item.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Tested-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 19 Feb, 2016 2 commits
  13. 15 Jan, 2016 2 commits
  14. 06 Nov, 2015 2 commits
    • inotify: actually check for invalid bits in sys_inotify_add_watch() · d30e2c05
      Committed by Dave Hansen
      The comment here says that it is checking for invalid bits.  But, the mask
      is *actually* checking to ensure that _any_ valid bit is set, which is
      quite different.
      
      Without this check, an unexpected bit could get set on an inotify object.
      Since these bits are also interpreted by the fsnotify/dnotify code, there
      is the potential for an object to be mishandled inside the kernel.  For
      instance, can we be sure that setting the dnotify flag FS_DN_RENAME on an
      inotify watch is harmless?
      
      Add the actual check which was intended.  Retain the existing check as
      well, since it ensures that at least some inotify bits are being added to
      the watch.  Plus, this is existing behavior which would be nice to preserve.
      
      I did a quick sniff test that inotify functions and that my
      'inotify-tools' package passes 'make check'.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
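The difference between "no invalid bit is set" and "some valid bit is set" can be shown with a small userspace sketch. The constant `EXAMPLE_VALID_BITS` and the function name are made up for illustration and do not correspond to the kernel's actual mask definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative set of bits the API accepts; not the kernel's real constant. */
#define EXAMPLE_VALID_BITS 0x00000fffu

/* Returns 0 if the mask is acceptable, -EINVAL otherwise. */
int example_check_mask(uint32_t mask)
{
	/* The check the comment promised: reject any bit outside the valid set. */
	if (mask & ~EXAMPLE_VALID_BITS)
		return -EINVAL;
	/* The pre-existing check: require at least one valid bit to be set. */
	if (!(mask & EXAMPLE_VALID_BITS))
		return -EINVAL;
	return 0;
}
```

With only the second check in place, a mask such as `0x1001` would be accepted even though bit `0x1000` lies outside the valid set; the first check is what rejects it.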
    • inotify: hide internal kernel bits from fdinfo · 69335996
      Committed by Dave Hansen
      There was a report that my patch:
      
          inotify: actually check for invalid bits in sys_inotify_add_watch()
      
      broke CRIU.
      
      The reason is that CRIU looks up raw flags in /proc/$pid/fdinfo/* to
      figure out how to rebuild inotify watches and then passes those flags
      directly back in to the inotify API.  One of those flags
      (FS_EVENT_ON_CHILD) is set in mark->mask, but is not part of the inotify
      API.  It is used inside the kernel to _implement_ inotify but it is not
      and has never been part of the API.
      
      My patch above ensured that we only allow bits which are part of the API
      (IN_ALL_EVENTS).  This broke CRIU.
      
      FS_EVENT_ON_CHILD is really internal to the kernel.  It is set _anyway_ on
      all inotify marks.  So, CRIU was really just trying to set a bit that was
      already set.
      
      This patch hides that bit from fdinfo.  CRIU will not see the bit, not try
      to set it, and should work as before.  We should not have been exposing
      this bit in the first place, so this is a good patch independent of the
      CRIU problem.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reported-by: Andrey Wagin <avagin@gmail.com>
      Acked-by: Andrey Vagin <avagin@openvz.org>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: Eric Paris <eparis@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
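The round trip that broke CRIU, and why hiding the internal bit repairs it, can be sketched as follows. The bit values and function names are illustrative assumptions chosen for this example; the sketch does not reproduce the kernel's fdinfo code.

```c
#include <errno.h>
#include <stdint.h>

#define EXAMPLE_API_BITS     0x00000fffu	/* illustrative user-visible bits */
#define EXAMPLE_INTERNAL_BIT 0x08000000u	/* illustrative kernel-internal bit */

/* Mask as it would be reported to userspace, with the internal bit hidden. */
uint32_t example_exported_mask(uint32_t mark_mask)
{
	return mark_mask & ~EXAMPLE_INTERNAL_BIT;
}

/* Strict validation in the spirit of the earlier add_watch fix. */
int example_validate_mask(uint32_t mask)
{
	if (mask & ~EXAMPLE_API_BITS)
		return -EINVAL;
	if (!(mask & EXAMPLE_API_BITS))
		return -EINVAL;
	return 0;
}
```

A checkpoint tool that reads the raw mark mask and passes it straight back would fail validation, while the exported mask round-trips cleanly.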
  15. 05 Sep, 2015 4 commits
  16. 18 Aug, 2015 1 commit
  17. 07 Aug, 2015 1 commit
  18. 22 Jul, 2015 1 commit
    • Revert "fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()" · d725e66c
      Committed by Linus Torvalds
      This reverts commit a2673b6e.
      
      Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms:
      
       "Thanks for report! You are right that my patch introduces a race
        between fsnotify kthread and fsnotify_destroy_group() which can result
        in leaking inotify event on group destruction.
      
        I haven't yet decided whether the right fix is not to queue events for
        dying notification group (as that is pointless anyway) or whether we
        should just fix the original problem differently...  Whenever I look
        at fsnotify code mark handling I get lost in the maze of locks, lists,
        and subtle differences between how different notification systems
        handle notification marks :( I'll think about it over night"
      
      and after thinking about it, Jan says:
      
       "OK, I have looked into the code some more and I found another
        relatively simple way of fixing the original oops.  It will be IMHO
        better than trying to fixup this issue which has more potential for
        breakage.  I'll ask Linus to revert the fsnotify fix he already merged
        and send a new fix"
      Reported-by: Kinglong Mee <kinglongmee@gmail.com>
      Requested-by: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 18 Jul, 2015 1 commit
  20. 17 Jun, 2015 1 commit
    • fs/notify: don't use module_init for non-modular inotify_user code · c013d5a4
      Committed by Paul Gortmaker
      The INOTIFY_USER option is bool, and hence this code is either
      present or absent.  It will never be modular, so using
      module_init as an alias for __initcall is rather misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.
      
      Note that direct use of __initcall is discouraged, vs. one
      of the priority categorized subgroups.  As __initcall gets
      mapped onto device_initcall, our use of fs_initcall (which
      makes sense for fs code) will thus change this registration
      from level 6-device to level 5-fs (i.e. slightly earlier).
      However no observable impact of that small difference has
      been observed during testing, or is expected.
      
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>