1. 08 10月, 2016 2 次提交
  2. 20 9月, 2016 2 次提交
  3. 20 5月, 2016 1 次提交
    • J
      fsnotify: avoid spurious EMFILE errors from inotify_init() · 35e48176
      Jan Kara 提交于
      Inotify instance is destroyed when all references to it are dropped.
      That not only means that the corresponding file descriptor needs to be
      closed but also that all corresponding instance marks are freed (as each
      mark holds a reference to the inotify instance).  However marks are
      freed only after SRCU period ends which can take some time and thus if
      user rapidly creates and frees inotify instances, number of existing
      inotify instances can exceed max_user_instances limit although from user
      point of view there is always at most one existing instance.  Thus
      inotify_init() returns EMFILE error which is hard to justify from user
      point of view.  This problem is exposed by LTP inotify06 testcase on
      some machines.
      
      We fix the problem by making sure all group marks are properly freed
      while destroying inotify instance.  We wait for SRCU period to end in
      that path anyway since we have to make sure there is no event being
      added to the instance while we are tearing down the instance.  So it
      takes only some plumbing to allow for marks to be destroyed in that path
      as well and not from a dedicated work item.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reported-by: NXiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Tested-by: NXiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      35e48176
  4. 19 2月, 2016 2 次提交
  5. 15 1月, 2016 2 次提交
  6. 06 11月, 2015 2 次提交
    • D
      inotify: actually check for invalid bits in sys_inotify_add_watch() · d30e2c05
      Dave Hansen 提交于
      The comment here says that it is checking for invalid bits.  But, the mask
      is *actually* checking to ensure that _any_ valid bit is set, which is
      quite different.
      
      Without this check, an unexpected bit could get set on an inotify object.
      Since these bits are also interpreted by the fsnotify/dnotify code, there
      is the potential for an object to be mishandled inside the kernel.  For
      instance, can we be sure that setting the dnotify flag FS_DN_RENAME on an
      inotify watch is harmless?
      
      Add the actual check which was intended.  Retain the existing inotify bits
      are being added to the watch.  Plus, this is existing behavior which would
      be nice to preserve.
      
      I did a quick sniff test that inotify functions and that my
      'inotify-tools' package passes 'make check'.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d30e2c05
    • D
      inotify: hide internal kernel bits from fdinfo · 69335996
      Dave Hansen 提交于
      There was a report that my patch:
      
          inotify: actually check for invalid bits in sys_inotify_add_watch()
      
      broke CRIU.
      
      The reason is that CRIU looks up raw flags in /proc/$pid/fdinfo/* to
      figure out how to rebuild inotify watches and then passes those flags
      directly back in to the inotify API.  One of those flags
      (FS_EVENT_ON_CHILD) is set in mark->mask, but is not part of the inotify
      API.  It is used inside the kernel to _implement_ inotify but it is not
      and has never been part of the API.
      
      My patch above ensured that we only allow bits which are part of the API
      (IN_ALL_EVENTS).  This broke CRIU.
      
      FS_EVENT_ON_CHILD is really internal to the kernel.  It is set _anyway_ on
      all inotify marks.  So, CRIU was really just trying to set a bit that was
      already set.
      
      This patch hides that bit from fdinfo.  CRIU will not see the bit, not try
      to set it, and should work as before.  We should not have been exposing
      this bit in the first place, so this is a good patch independent of the
      CRIU problem.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reported-by: NAndrey Wagin <avagin@gmail.com>
      Acked-by: NAndrey Vagin <avagin@openvz.org>
      Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: NEric Paris <eparis@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69335996
  7. 05 9月, 2015 4 次提交
  8. 18 8月, 2015 1 次提交
  9. 07 8月, 2015 1 次提交
  10. 22 7月, 2015 1 次提交
    • L
      Revert "fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()" · d725e66c
      Linus Torvalds 提交于
      This reverts commit a2673b6e.
      
      Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms:
      
       "Thanks for report! You are right that my patch introduces a race
        between fsnotify kthread and fsnotify_destroy_group() which can result
        in leaking inotify event on group destruction.
      
        I haven't yet decided whether the right fix is not to queue events for
        dying notification group (as that is pointless anyway) or whether we
        should just fix the original problem differently...  Whenever I look
        at fsnotify code mark handling I get lost in the maze of locks, lists,
        and subtle differences between how different notification systems
        handle notification marks :( I'll think about it over night"
      
      and after thinking about it, Jan says:
      
       "OK, I have looked into the code some more and I found another
        relatively simple way of fixing the original oops.  It will be IMHO
        better than trying to fixup this issue which has more potential for
        breakage.  I'll ask Linus to revert the fsnotify fix he already merged
        and send a new fix"
      Reported-by: NKinglong Mee <kinglongmee@gmail.com>
      Requested-by: NJan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d725e66c
  11. 18 7月, 2015 1 次提交
  12. 17 6月, 2015 1 次提交
    • P
      fs/notify: don't use module_init for non-modular inotify_user code · c013d5a4
      Paul Gortmaker 提交于
      The INOTIFY_USER option is bool, and hence this code is either
      present or absent.  It will never be modular, so using
      module_init as an alias for __initcall is rather misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.
      
      Note that direct use of __initcall is discouraged, vs. one
      of the priority categorized subgroups.  As __initcall gets
      mapped onto device_initcall, our use of fs_initcall (which
      makes sense for fs code) will thus change this registration
      from level 6-device to level 5-fs (i.e. slightly earlier).
      However no observable impact of that small difference has
      been observed during testing, or is expected.
      
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      c013d5a4
  13. 13 3月, 2015 1 次提交
    • S
      fanotify: fix event filtering with FAN_ONDIR set · b3c1030d
      Suzuki K. Poulose 提交于
      With FAN_ONDIR set, the user can end up getting events, which it hasn't
      marked.  This was revealed with fanotify04 testcase failure on
      Linux-4.0-rc1, and is a regression from 3.19, revealed with 66ba93c0
      ("fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask").
      
         # /opt/ltp/testcases/bin/fanotify04
         [ ... ]
        fanotify04    7  TPASS  :  event generated properly for type 100000
        fanotify04    8  TFAIL  :  fanotify04.c:147: got unexpected event 30
        fanotify04    9  TPASS  :  No event as expected
      
      The testcase sets the adds the following marks : FAN_OPEN | FAN_ONDIR for
      a fanotify on a dir.  Then does an open(), followed by close() of the
      directory and expects to see an event FAN_OPEN(0x20).  However, the
      fanotify returns (FAN_OPEN|FAN_CLOSE_NOWRITE(0x10)).  This happens due to
      the flaw in the check for event_mask in fanotify_should_send_event() which
      does:
      
      	if (event_mask & marks_mask & ~marks_ignored_mask)
      		return true;
      
      where, event_mask == (FAN_ONDIR | FAN_CLOSE_NOWRITE),
             marks_mask == (FAN_ONDIR | FAN_OPEN),
             marks_ignored_mask == 0
      
      Fix this by masking the outgoing events to the user, as we already take
      care of FAN_ONDIR and FAN_EVENT_ON_CHILD.
      Signed-off-by: NSuzuki K. Poulose <suzuki.poulose@arm.com>
      Tested-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3c1030d
  14. 23 2月, 2015 2 次提交
    • D
      fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions · 54f2a2f4
      David Howells 提交于
      Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup()
      rather than d_is_dir() when checking a dir watch and give an error on fake
      directories.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      54f2a2f4
    • D
      VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8
      David Howells 提交于
      Convert the following where appropriate:
      
       (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
      
       (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
      
       (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
           complicated than it appears as some calls should be converted to
           d_can_lookup() instead.  The difference is whether the directory in
           question is a real dir with a ->lookup op or whether it's a fake dir with
           a ->d_automount op.
      
      In some circumstances, we can subsume checks for dentry->d_inode not being
      NULL into this, provided we the code isn't in a filesystem that expects
      d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
      use d_inode() rather than d_backing_inode() to get the inode pointer).
      
      Note that the dentry type field may be set to something other than
      DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
      manages the fall-through from a negative dentry to a lower layer.  In such a
      case, the dentry type of the negative union dentry is set to the same as the
      type of the lower dentry.
      
      However, if you know d_inode is not NULL at the call site, then you can use
      the d_is_xxx() functions even in a filesystem.
      
      There is one further complication: a 0,0 chardev dentry may be labelled
      DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
      intended for special directory entry types that don't have attached inodes.
      
      The following perl+coccinelle script was used:
      
      use strict;
      
      my @callers;
      open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
          die "Can't grep for S_ISDIR and co. callers";
      @callers = <$fd>;
      close($fd);
      unless (@callers) {
          print "No matches\n";
          exit(0);
      }
      
      my @cocci = (
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISLNK(E->d_inode->i_mode)',
          '+ d_is_symlink(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISDIR(E->d_inode->i_mode)',
          '+ d_is_dir(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISREG(E->d_inode->i_mode)',
          '+ d_is_reg(E)' );
      
      my $coccifile = "tmp.sp.cocci";
      open($fd, ">$coccifile") || die $coccifile;
      print($fd "$_\n") || die $coccifile foreach (@cocci);
      close($fd);
      
      foreach my $file (@callers) {
          chomp $file;
          print "Processing ", $file, "\n";
          system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
      	die "spatch failed";
      }
      
      [AV: overlayfs parts skipped]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e36cb0b8
  15. 11 2月, 2015 3 次提交
  16. 09 1月, 2015 1 次提交
  17. 07 1月, 2015 1 次提交
    • P
      rcu: Make SRCU optional by using CONFIG_SRCU · 83fe27ea
      Pranith Kumar 提交于
      SRCU is not necessary to be compiled by default in all cases. For tinification
      efforts not compiling SRCU unless necessary is desirable.
      
      The current patch tries to make compiling SRCU optional by introducing a new
      Kconfig option CONFIG_SRCU which is selected when any of the components making
      use of SRCU are selected.
      
      If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.
      
         text    data     bss     dec     hex filename
         2007       0       0    2007     7d7 kernel/rcu/srcu.o
      
      Size of arch/powerpc/boot/zImage changes from
      
         text    data     bss     dec     hex filename
       831552   64180   23944  919676   e087c arch/powerpc/boot/zImage : before
       829504   64180   23952  917636   e0084 arch/powerpc/boot/zImage : after
      
      so the savings are about ~2000 bytes.
      Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      CC: Josh Triplett <josh@joshtriplett.org>
      CC: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
      83fe27ea
  18. 14 12月, 2014 2 次提交
  19. 14 11月, 2014 1 次提交
  20. 06 11月, 2014 1 次提交
  21. 04 11月, 2014 1 次提交
  22. 30 10月, 2014 1 次提交
    • J
      fsnotify: next_i is freed during fsnotify_unmount_inodes. · 6424babf
      Jerry Hoemann 提交于
      During file system stress testing on 3.10 and 3.12 based kernels, the
      umount command occasionally hung in fsnotify_unmount_inodes in the
      section of code:
      
                      spin_lock(&inode->i_lock);
                      if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
                              spin_unlock(&inode->i_lock);
                              continue;
                      }
      
      As this section of code holds the global inode_sb_list_lock, eventually
      the system hangs trying to acquire the lock.
      
      Multiple crash dumps showed:
      
      The inode->i_state == 0x60 and i_count == 0 and i_sb_list would point
      back at itself.  As this is not the value of list upon entry to the
      function, the kernel never exits the loop.
      
      To help narrow down problem, the call to list_del_init in
      inode_sb_list_del was changed to list_del.  This poisons the pointers in
      the i_sb_list and causes a kernel to panic if it transverse a freed
      inode.
      
      Subsequent stress testing paniced in fsnotify_unmount_inodes at the
      bottom of the list_for_each_entry_safe loop showing next_i had become
      free.
      
      We believe the root cause of the problem is that next_i is being freed
      during the window of time that the list_for_each_entry_safe loop
      temporarily releases inode_sb_list_lock to call fsnotify and
      fsnotify_inode_delete.
      
      The code in fsnotify_unmount_inodes attempts to prevent the freeing of
      inode and next_i by calling __iget.  However, the code doesn't do the
      __iget call on next_i
      
      	if i_count == 0 or
      	if i_state & (I_FREEING | I_WILL_FREE)
      
      The patch addresses this issue by advancing next_i in the above two cases
      until we either find a next_i which we can __iget or we reach the end of
      the list.  This makes the handling of next_i more closely match the
      handling of the variable "inode."
      
      The time to reproduce the hang is highly variable (from hours to days.) We
      ran the stress test on a 3.10 kernel with the proposed patch for a week
      without failure.
      
      During list_for_each_entry_safe, next_i is becoming free causing
      the loop to never terminate.  Advance next_i in those cases where
      __iget is not done.
      Signed-off-by: NJerry Hoemann <jerry.hoemann@hp.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ken Helias <kenhelias@firemail.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6424babf
  23. 28 10月, 2014 1 次提交
    • P
      sched, inotify: Deal with nested sleeps · e23738a7
      Peter Zijlstra 提交于
      inotify_read is a wait loop with sleeps in. Wait loops rely on
      task_struct::state and sleeps do too, since that's the only means of
      actually sleeping. Therefore the nested sleeps destroy the wait loop
      state and the wait loop breaks the sleep functions that assume
      TASK_RUNNING (mutex_lock).
      
      Fix this by using the new woken_wake_function and wait_woken() stuff,
      which registers wakeups in wait and thereby allows shrinking the
      task_state::state changes to the actual sleep part.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: tglx@linutronix.de
      Cc: ilya.dryomov@inktank.com
      Cc: umgwanakikbuti@gmail.com
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/20140924082242.254858080@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e23738a7
  24. 10 10月, 2014 3 次提交
  25. 11 9月, 2014 2 次提交
新手
引导
客服 返回
顶部