1. 10 4月, 2017 7 次提交
  2. 24 1月, 2017 1 次提交
    • N
      inotify: Convert to using per-namespace limits · 1cce1eea
      Nikolay Borisov 提交于
      This patchset converts inotify to using the newly introduced
      per-userns sysctl infrastructure.
      
      Currently the inotify instances/watches are being accounted in the
      user_struct structure. This means that in setups where multiple
      users in unprivileged containers map to the same underlying
      real user (i.e. pointing to the same user_struct) the inotify limits
      are going to be shared as well, allowing one user(or application) to exhaust
      all others limits.
      
      Fix this by switching the inotify sysctls to using the
      per-namespace/per-user limits. This will allow the server admin to
      set sensible global limits, which can further be tuned inside every
      individual user namespace. Additionally, in order to preserve the
      sysctl ABI make the existing inotify instances/watches sysctls
      modify the values of the initial user namespace.
      Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Acked-by: NSerge Hallyn <serge@hallyn.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      1cce1eea
  3. 24 12月, 2016 1 次提交
    • J
      fsnotify: Remove fsnotify_duplicate_mark() · e3ba7307
      Jan Kara 提交于
      There are only two calls sites of fsnotify_duplicate_mark(). Those are
      in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused
      for audit tree, inode pointer and group gets set in
      fsnotify_add_mark_locked() later anyway, mask and free_mark are already
      set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is
      actively harmful because following fsnotify_add_mark_locked() will leak
      group reference by overwriting the group pointer. So just remove the two
      calls to fsnotify_duplicate_mark() and the function.
      Signed-off-by: NJan Kara <jack@suse.cz>
      [PM: line wrapping to fit in 80 chars]
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      e3ba7307
  4. 06 12月, 2016 3 次提交
  5. 08 10月, 2016 2 次提交
  6. 20 9月, 2016 2 次提交
  7. 30 5月, 2016 2 次提交
    • A
      trim fsnotify hooks a bit · affda484
      Al Viro 提交于
      fsnotify_d_move()/__fsnotify_d_instantiate()/__fsnotify_update_dcache_flags()
      are identical to each other, regardless of the config.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      affda484
    • A
      undo "fs: allow d_instantiate to be called with negative parent dentry" · 2853908a
      Al Viro 提交于
      Background: spufs used to mangle the order in which it had been building
      dentry trees.  It was broken in a lot of ways, but most of them required
      the right timing to trigger until an fsnotify change had added one more
      - the one that was always triggered.
      
      Unfortunately, insteading of fixing their long-standing bug the spufs
      folks had chosen to paper over the fsnotify trigger.  Eventually said
      bug had been spotted and killed off, but the pointless check in
      fsnotify has remained, complete with the implication that one *could*
      do that kind of crap.
      
      Again, a parent of any dentry should always be positive.  Any code
      can rely upon that and anything violating that assert is a bug,
      *not* something to be accomodated.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2853908a
  8. 20 5月, 2016 1 次提交
    • J
      fsnotify: avoid spurious EMFILE errors from inotify_init() · 35e48176
      Jan Kara 提交于
      Inotify instance is destroyed when all references to it are dropped.
      That not only means that the corresponding file descriptor needs to be
      closed but also that all corresponding instance marks are freed (as each
      mark holds a reference to the inotify instance).  However marks are
      freed only after SRCU period ends which can take some time and thus if
      user rapidly creates and frees inotify instances, number of existing
      inotify instances can exceed max_user_instances limit although from user
      point of view there is always at most one existing instance.  Thus
      inotify_init() returns EMFILE error which is hard to justify from user
      point of view.  This problem is exposed by LTP inotify06 testcase on
      some machines.
      
      We fix the problem by making sure all group marks are properly freed
      while destroying inotify instance.  We wait for SRCU period to end in
      that path anyway since we have to make sure there is no event being
      added to the instance while we are tearing down the instance.  So it
      takes only some plumbing to allow for marks to be destroyed in that path
      as well and not from a dedicated work item.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reported-by: NXiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Tested-by: NXiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      35e48176
  9. 14 3月, 2016 1 次提交
  10. 19 2月, 2016 1 次提交
    • J
      Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread" · 13d34ac6
      Jeff Layton 提交于
      This reverts commit c510eff6 ("fsnotify: destroy marks with
      call_srcu instead of dedicated thread").
      
      Eryu reported that he was seeing some OOM kills kick in when running a
      testcase that adds and removes inotify marks on a file in a tight loop.
      
      The above commit changed the code to use call_srcu to clean up the
      marks.  While that does (in principle) work, the srcu callback job is
      limited to cleaning up entries in small batches and only once per jiffy.
      It's easily possible to overwhelm that machinery with too many call_srcu
      callbacks, and Eryu's reproduer did just that.
      
      There's also another potential problem with using call_srcu here.  While
      you can obviously sleep while holding the srcu_read_lock, the callbacks
      run under local_bh_disable, so you can't sleep there.
      
      It's possible when putting the last reference to the fsnotify_mark that
      we'll end up putting a chain of references including the fsnotify_group,
      uid, and associated keys.  While I don't see any obvious ways that that
      could occurs, it's probably still best to avoid using call_srcu here
      after all.
      
      This patch reverts the above patch.  A later patch will take a different
      approach to eliminated the dedicated thread here.
      Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
      Reported-by: NEryu Guan <guaneryu@gmail.com>
      Tested-by: NEryu Guan <guaneryu@gmail.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      13d34ac6
  11. 15 1月, 2016 1 次提交
    • J
      fsnotify: destroy marks with call_srcu instead of dedicated thread · c510eff6
      Jeff Layton 提交于
      At the time that this code was originally written, call_srcu didn't
      exist, so this thread was required to ensure that we waited for that
      SRCU grace period to settle before finally freeing the object.
      
      It does exist now however and we can much more efficiently use call_srcu
      to handle this.  That also allows us to potentially use srcu_barrier to
      ensure that they are all of the callbacks have run before proceeding.
      In order to conserve space, we union the rcu_head with the g_list.
      
      This will be necessary for nfsd which will allocate marks from a
      dedicated slabcache.  We have to be able to ensure that all of the
      objects are destroyed before destroying the cache.  That's fairly
      Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Reviewed-by: NJan Kara <jack@suse.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c510eff6
  12. 05 9月, 2015 3 次提交
  13. 18 8月, 2015 1 次提交
  14. 25 6月, 2015 1 次提交
  15. 14 12月, 2014 2 次提交
  16. 07 8月, 2014 2 次提交
  17. 04 4月, 2014 1 次提交
  18. 25 2月, 2014 1 次提交
    • J
      fsnotify: Allocate overflow events with proper type · ff57cd58
      Jan Kara 提交于
      Commit 7053aee2 "fsnotify: do not share events between notification
      groups" used overflow event statically allocated in a group with the
      size of the generic notification event. This causes problems because
      some code looks at type specific parts of event structure and gets
      confused by a random data it sees there and causes crashes.
      
      Fix the problem by allocating overflow event with type corresponding to
      the group type so code cannot get confused.
      Signed-off-by: NJan Kara <jack@suse.cz>
      ff57cd58
  19. 18 2月, 2014 1 次提交
    • J
      inotify: Fix reporting of cookies for inotify events · 45a22f4c
      Jan Kara 提交于
      My rework of handling of notification events (namely commit 7053aee2
      "fsnotify: do not share events between notification groups") broke
      sending of cookies with inotify events. We didn't propagate the value
      passed to fsnotify() properly and passed 4 uninitialized bytes to
      userspace instead (so it is also an information leak). Sadly I didn't
      notice this during my testing because inotify cookies aren't used very
      much and LTP inotify tests ignore them.
      
      Fix the problem by passing the cookie value properly.
      
      Fixes: 7053aee2Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      45a22f4c
  20. 29 1月, 2014 1 次提交
  21. 22 1月, 2014 2 次提交
    • J
      fsnotify: remove .should_send_event callback · 83c4c4b0
      Jan Kara 提交于
      After removing event structure creation from the generic layer there is
      no reason for separate .should_send_event and .handle_event callbacks.
      So just remove the first one.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83c4c4b0
    • J
      fsnotify: do not share events between notification groups · 7053aee2
      Jan Kara 提交于
      Currently fsnotify framework creates one event structure for each
      notification event and links this event into all interested notification
      groups.  This is done so that we save memory when several notification
      groups are interested in the event.  However the need for event
      structure shared between inotify & fanotify bloats the event structure
      so the result is often higher memory consumption.
      
      Another problem is that fsnotify framework keeps path references with
      outstanding events so that fanotify can return open file descriptors
      with its events.  This has the undesirable effect that filesystem cannot
      be unmounted while there are outstanding events - a regression for
      inotify compared to a situation before it was converted to fsnotify
      framework.  For fanotify this problem is hard to avoid and users of
      fanotify should kind of expect this behavior when they ask for file
      descriptors from notified files.
      
      This patch changes fsnotify and its users to create separate event
      structure for each group.  This allows for much simpler code (~400 lines
      removed by this patch) and also smaller event structures.  For example
      on 64-bit system original struct fsnotify_event consumes 120 bytes, plus
      additional space for file name, additional 24 bytes for second and each
      subsequent group linking the event, and additional 32 bytes for each
      inotify group for private data.  After the conversion inotify event
      consumes 48 bytes plus space for file name which is considerably less
      memory unless file names are long and there are several groups
      interested in the events (both of which are uncommon).  Fanotify event
      fits in 56 bytes after the conversion (fanotify doesn't care about file
      names so its events don't have to have it allocated).  A win unless
      there are four or more fanotify groups interested in the event.
      
      The conversion also solves the problem with unmount when only inotify is
      used as we don't have to grab path references for inotify events.
      
      [hughd@google.com: fanotify: fix corruption preventing startup]
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7053aee2
  22. 30 4月, 2013 1 次提交
  23. 12 12月, 2012 2 次提交
    • E
      fsnotify: make fasync generic for both inotify and fanotify · 0a6b6bd5
      Eric Paris 提交于
      inotify is supposed to support async signal notification when information
      is available on the inotify fd.  This patch moves that support to generic
      fsnotify functions so it can be used by all notification mechanisms.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      0a6b6bd5
    • L
      fsnotify: change locking order · 6960b0d9
      Lino Sanfilippo 提交于
      On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote:
      >
      > I finally built and tested a v3.0 kernel with these patches (I know I'm
      > SOOOOOO far behind).  Not what I hoped for:
      >
      > > [  150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds.  Have a nice day...
      > > [  150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
      > > [  150.946012] IP: [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
      > > [  150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0
      > > [  150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      > > [  150.946012] CPU 0
      > > [  150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan]
      > > [  150.946012]
      > > [  150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM
      > > [  150.946012] RIP: 0010:[<ffffffff810ffd58>]  [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
      > > [  150.946012] RSP: 0018:ffff88002c2e5df8  EFLAGS: 00010282
      > > [  150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438
      > > [  150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240
      > > [  150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000
      > > [  150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428
      > > [  150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610
      > > [  150.946012] FS:  00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
      > > [  150.946012] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      > > [  150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0
      > > [  150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > > [  150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > > [  150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760)
      > > [  150.946012] Stack:
      > > [  150.946012]  ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76
      > > [  150.946012]  ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250
      > > [  150.946012]  ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438
      > > [  150.946012] Call Trace:
      > > [  150.946012]  [<ffffffff81102f76>] shmem_evict_inode+0x76/0x130
      > > [  150.946012]  [<ffffffff8115e9be>] evict+0x7e/0x170
      > > [  150.946012]  [<ffffffff8115ee40>] iput_final+0xd0/0x190
      > > [  150.946012]  [<ffffffff8115ef33>] iput+0x33/0x40
      > > [  150.946012]  [<ffffffff81180205>] fsnotify_destroy_mark_locked+0x145/0x160
      > > [  150.946012]  [<ffffffff81180316>] fsnotify_destroy_mark+0x36/0x50
      > > [  150.946012]  [<ffffffff81181937>] sys_inotify_rm_watch+0x77/0xd0
      > > [  150.946012]  [<ffffffff815aca52>] system_call_fastpath+0x16/0x1b
      > > [  150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00
      > > [  150.946012]  83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a
      > > [  150.946012] RIP  [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
      > > [  150.946012]  RSP <ffff88002c2e5df8>
      > > [  150.946012] CR2: 0000000000000070
      >
      > Looks at aweful lot like the problem from:
      > http://www.spinics.net/lists/linux-fsdevel/msg46101.html
      >
      
      I tried to reproduce this bug with your test program, but without success.
      However, if I understand correctly, this occurs since we dont hold any locks when
      we call iput() in mark_destroy(), right?
      With the patches you tested, iput() is also not called within any lock, since the
      groups mark_mutex is released temporarily before iput() is called.  This is, since
      the original codes behaviour is similar.
      However since we now have a mutex as the biggest lock, we can do what you
      suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and
      call iput() with the mutex held to avoid the race.
      The patch below implements this. It uses nested locking to avoid deadlock in case
      we do the final iput() on an inode which still holds marks and thus would take
      the mutex again when calling fsnotify_inode_delete() in destroy_inode().
      Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      6960b0d9