Commits · 7053aee26a3548ebaba046ae2e52396ccf56ac6c · openeuler / raspberrypi-kernel

22 Jan, 2014 · 1 commit

fsnotify: do not share events between notification groups · 7053aee2

Committed by Jan Kara on Jan 21, 2014

Currently fsnotify framework creates one event structure for each
notification event and links this event into all interested notification
groups. This is done so that we save memory when several notification
groups are interested in the event. However the need for event
structure shared between inotify & fanotify bloats the event structure
so the result is often higher memory consumption.

Another problem is that fsnotify framework keeps path references with
outstanding events so that fanotify can return open file descriptors
with its events. This has the undesirable effect that filesystem cannot
be unmounted while there are outstanding events - a regression for
inotify compared to a situation before it was converted to fsnotify
framework. For fanotify this problem is hard to avoid and users of
fanotify should kind of expect this behavior when they ask for file
descriptors from notified files.

This patch changes fsnotify and its users to create separate event
structure for each group. This allows for much simpler code (~400 lines
removed by this patch) and also smaller event structures. For example
on 64-bit system original struct fsnotify_event consumes 120 bytes, plus
additional space for file name, additional 24 bytes for second and each
subsequent group linking the event, and additional 32 bytes for each
inotify group for private data. After the conversion inotify event
consumes 48 bytes plus space for file name which is considerably less
memory unless file names are long and there are several groups
interested in the events (both of which are uncommon). Fanotify event
fits in 56 bytes after the conversion (fanotify doesn't care about file
names so its events don't have to have it allocated). A win unless
there are four or more fanotify groups interested in the event.
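
The figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming the byte counts quoted in this message and one reading of how they combine; the function and constant names are hypothetical and do not come from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes quoted in the commit message (64-bit system), in bytes. */
enum {
    OLD_EVENT_BASE      = 120, /* shared struct fsnotify_event */
    OLD_PER_EXTRA_GROUP = 24,  /* for the second and each subsequent group */
    OLD_INOTIFY_PRIVATE = 32,  /* private data per inotify group */
    NEW_INOTIFY_EVENT   = 48,  /* per-group inotify event */
    NEW_FANOTIFY_EVENT  = 56   /* per-group fanotify event */
};

/* Old scheme: one shared event linked into every interested group. */
size_t old_event_bytes(size_t n_inotify, size_t n_fanotify, size_t name_len)
{
    size_t groups = n_inotify + n_fanotify;

    assert(groups > 0); /* an event always has at least one interested group */
    return OLD_EVENT_BASE + name_len
         + OLD_PER_EXTRA_GROUP * (groups - 1)
         + OLD_INOTIFY_PRIVATE * n_inotify;
}

/* New scheme: one event per group; only inotify events carry the name. */
size_t new_event_bytes(size_t n_inotify, size_t n_fanotify, size_t name_len)
{
    return n_inotify * (NEW_INOTIFY_EVENT + name_len)
         + n_fanotify * NEW_FANOTIFY_EVENT;
}
```

Under these assumptions a single inotify group drops from 152 to 48 bytes (ignoring the name), three fanotify groups break even at 168 bytes, and four fanotify groups cost 224 bytes instead of 192.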

The conversion also solves the problem with unmount when only inotify is
used as we don't have to grab path references for inotify events.

[hughd@google.com: fanotify: fix corruption preventing startup]
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Dec, 2012 · 5 commits

fsnotify: make fasync generic for both inotify and fanotify · 0a6b6bd5

Committed by Eric Paris on Oct 14, 2011

inotify is supposed to support async signal notification when information
is available on the inotify fd. This patch moves that support to generic
fsnotify functions so it can be used by all notification mechanisms.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: use a mutex instead of a spinlock to protect a groups mark list · 986ab098

Committed by Lino Sanfilippo on Jun 14, 2011

Replace the group's mark_lock spinlock with a mutex. Using a mutex instead
of a spinlock gives more flexibility (i.e. it allows sleeping while the
lock is held).

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: use reference counting for groups · 23e964c2

Committed by Lino Sanfilippo on Jun 14, 2011

Get a group ref for each mark that is added to the group's list and release that
ref when the mark is freed in fsnotify_put_mark().
We also get a group reference for duplicated marks and for private event
data.
Now we don't free a group any more when the number of marks becomes 0 but when
the group's ref count does. Since this will only happen when all marks are removed
from a group's mark list, we don't have to set the group's number of marks to 1 at
group creation.

Besides clearing all marks in fsnotify_destroy_group() we also flush the
group's event queue. This is because events may hold references to groups (due to
private event data) and we have to put those references first before we get a
chance to put the final ref, which will result in a call to
fsnotify_final_destroy_group().

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
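
The lifetime rule described in this commit (every mark pins its group, and the group is freed only when the last reference is dropped) can be illustrated in userspace. This is a minimal sketch using C11 atomics, not the kernel code; all `demo_*` names are hypothetical, and `num_marks` is left unsynchronized because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct demo_group {
    atomic_int refcnt; /* one ref for the creator plus one per mark */
    int num_marks;     /* bookkeeping only; no longer decides lifetime */
};

struct demo_group *demo_group_alloc(void)
{
    struct demo_group *g = calloc(1, sizeof(*g));

    if (g)
        atomic_init(&g->refcnt, 1); /* the creator's reference */
    return g;
}

void demo_group_get(struct demo_group *g)
{
    assert(g != NULL);
    atomic_fetch_add(&g->refcnt, 1);
}

/* Returns 1 if this call dropped the last reference and freed the group. */
int demo_group_put(struct demo_group *g)
{
    assert(g != NULL);
    if (atomic_fetch_sub(&g->refcnt, 1) == 1) {
        free(g);
        return 1;
    }
    return 0;
}

/* Adding a mark takes a group reference. */
void demo_mark_add(struct demo_group *g)
{
    demo_group_get(g);
    g->num_marks++;
}

/* Freeing a mark releases its group reference; may free the group. */
int demo_mark_free(struct demo_group *g)
{
    g->num_marks--;
    return demo_group_put(g);
}
```

In this sketch the creator can drop its reference while marks still exist; the group survives until the last mark is freed.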

fsnotify: introduce fsnotify_get_group() · 98612952

Committed by Lino Sanfilippo on Jun 14, 2011

Introduce fsnotify_get_group() which increments the reference counter of a group.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group() · d8153d4d

Committed by Lino Sanfilippo on Jun 14, 2011

Currently in fsnotify_put_group() the ref count of a group is decremented and if
it becomes 0 fsnotify_destroy_group() is called. Since a group's ref count is set
to 1 only at group creation and never increased after that, a call to fsnotify_put_group()
always results in a call to fsnotify_destroy_group().
With this patch fsnotify_destroy_group() is called directly.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

27 Jul, 2011 · 1 commit

atomic: use <linux/atomic.h> · 60063497

Committed by Arun Sharma on Jul 26, 2011

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

28 Jul, 2010 · 17 commits

fsnotify: remove global fsnotify groups lists · 02436668

Committed by Eric Paris on Jul 28, 2010

The global fsnotify groups lists were invented as a way to increase the
performance of fsnotify by shortcutting events which were not interesting.
With the changes to walk the object lists rather than global groups lists
these shortcuts are not useful.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: remove group->mask · 43709a28

Committed by Eric Paris on Jul 28, 2010

group->mask is now useless. It was originally a shortcut for fsnotify to
save on performance. These checks are now redundant, so we remove them.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: remove the global masks · 03930979

Committed by Eric Paris on Jul 28, 2010

Because we walk the object->fsnotify_marks list instead of the global
fsnotify groups list we don't need the fsnotify_inode_mask and
fsnotify_vfsmount_mask as these were simply shortcuts in fsnotify() for
performance. They are now extra checks, so rip them out.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: srcu to protect read side of inode and vfsmount locks · 75c1be48

Committed by Eric Paris on Jul 28, 2010

Currently reading the inode->i_fsnotify_marks or
vfsmount->mnt_fsnotify_marks lists is protected by a spinlock on both the
read and the write side. This patch protects the read side of those lists
with a new single srcu.

Signed-off-by: Eric Paris <eparis@redhat.com>

fanotify: drop the useless priority argument · 08ae8938

Committed by Eric Paris on May 27, 2010

The priority argument in fanotify is useless. Kill it.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: add group priorities · cb2d429f

Committed by Eric Paris on Dec 17, 2009

This introduces an ordering to fsnotify groups. With purely asynchronous
notification based "things" implementing fsnotify (inotify, dnotify) ordering
isn't particularly important. But if people want to use fsnotify as the
basis of synchronous notification or blocking notification, ordering becomes
important.

E.g. a Hierarchical Storage Management listener would need to get its event
before an AV scanner could get its event (since the HSM would need to
bring the data in for the AV scanner to scan). Typically asynchronous notification
would want to run after the AV scanner made any relevant access decisions
so as to not send notification about an event that was denied.

Signed-off-by: Eric Paris <eparis@redhat.com>
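
One simple way to obtain such an ordering is to keep the list of groups sorted at insertion time. The sketch below is only an illustration of that idea: the `demo_*` names are hypothetical, and the convention that a lower priority value is notified first is an assumption made for the example, not something stated in this commit message.

```c
#include <assert.h>
#include <stddef.h>

struct demo_group {
    unsigned int priority; /* assumption: lower value is notified first */
    const char *name;
    struct demo_group *next;
};

/*
 * Insert g so the list stays sorted by ascending priority.
 * Groups with equal priority keep their insertion order.
 */
void demo_group_insert(struct demo_group **head, struct demo_group *g)
{
    assert(head != NULL && g != NULL);

    struct demo_group **pos = head;

    while (*pos && (*pos)->priority <= g->priority)
        pos = &(*pos)->next;
    g->next = *pos;
    *pos = g;
}
```

With an HSM listener at priority 1, an AV scanner at 2, and an asynchronous listener at 3, walking the list from the head delivers events in the order the message describes, regardless of registration order.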

fsnotify: rename mark_entry to just mark · 841bdc10

Committed by Eric Paris on Dec 17, 2009

Previously I used mark_entry when talking about marks on inodes. The
_entry is pretty useless. Just use "mark" instead.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: rename fsnotify_mark_entry to just fsnotify_mark · e61ce867

Committed by Eric Paris on Dec 17, 2009

The name is long and it serves no real purpose. So rename
fsnotify_mark_entry to just fsnotify_mark.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: mount point listeners list and global mask · 7131485a

Committed by Eric Paris on Dec 17, 2009

Currently all of the notification systems implemented select which inodes
they care about and receive messages only about those inodes (or the
children of those inodes). This patch begins to flesh out fsnotify support
for the concept of listeners that want to hear notification for an inode
accessed below a given mount point. This patch implements a second list
of fsnotify groups to hold these types of groups and a second global mask
to hold the events of interest for this type of group.

The reason we want a second group list and mask is the inode-based
notification should_send_event support, which makes each group look for a mark
on the given inode. With one vfsmount listener that means that every group would
have to take the inode->i_lock, look for their mark, not find one, and return
for every operation. By separating vfsmount from inode listeners, only when
there is an inode listener will the inode groups have to look for their
mark and take the inode lock. vfsmount listeners will have to grab the lock and
look for a mark but there should be fewer of them, and one vfsmount listener
won't cause the i_lock to be grabbed and released for every fsnotify group
on every io operation.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: add groups to fsnotify_inode_groups when registering inode watch · 4ca76352

Committed by Eric Paris on Dec 17, 2009

Currently all fsnotify groups are added immediately to the
fsnotify_inode_groups list upon creation. This means even groups with no
watches (common for audit) will be on the global tracking list and will
get checked for every event. This patch adds groups to the global list only
when the first inode mark is added to the group.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: initialize the group->num_marks in a better place · 36fddeba

Committed by Eric Paris on Dec 17, 2009

Currently the comments say that group->num_marks is held because the group
is on the fsnotify_group list. This isn't strictly the case; we really
just hold the num_marks for the life of the group (any time group->refcnt
is != 0). This patch moves the initialization stuff and makes it clear when
it is really being held.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: rename fsnotify_groups to fsnotify_inode_groups · 19c2a0e1

Committed by Eric Paris on Dec 17, 2009

Simple renaming patch. fsnotify is about to support mount point listeners
so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists
used only for groups which have watches on inodes.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: drop mask argument from fsnotify_alloc_group · 0d2e2a1d

Committed by Eric Paris on Dec 17, 2009

Nothing uses the mask argument to fsnotify_alloc_group. This patch drops
that argument.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: fsnotify_obtain_group should be fsnotify_alloc_group · ffab8340

Committed by Eric Paris on Dec 17, 2009

fsnotify_obtain_group was intended to be able to find an already existing
group. Nothing uses that functionality. This just renames it to
fsnotify_alloc_group so it is clear what it is doing.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: fsnotify_obtain_group kzalloc cleanup · cd7752ce

Committed by Eric Paris on Dec 17, 2009

fsnotify_obtain_group uses kzalloc but then proceeds to set things to 0.
This patch just deletes those useless lines.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: remove group_num altogether · 74be0cc8

Committed by Eric Paris on Dec 17, 2009

The original fsnotify interface has a group_num which was intended to be
able to find a group after it was added. I no longer think this is a
necessary thing to do and so we remove the group_num.

Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: kzalloc fsnotify groups · f0553af0

Committed by Eric Paris on Dec 17, 2009

Use kzalloc for fsnotify_groups so that none of the fields can leak any
information accidentally.

Signed-off-by: Eric Paris <eparis@redhat.com>

12 Jun, 2009 · 3 commits

fsnotify: generic notification queue and waitq · a2d8bc6c

Committed by Eric Paris on May 21, 2009

inotify needs to do async notification in which event information is stored
on a queue until the listener is ready to receive it. This patch
implements a generic notification queue for inotify (and later fanotify) to
store events to be sent at a later time.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
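
The core of such a queue is a bounded FIFO: producers append events, and the listener later removes them in arrival order. The following is a userspace sketch of that idea only. It omits the locking and the wait queue a real implementation needs, and the `demo_*` names and the `max_events` limit are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

struct demo_event {
    unsigned int mask;
    struct demo_event *next;
};

struct demo_queue {
    struct demo_event *head;
    struct demo_event *tail;
    unsigned int q_len;      /* events currently queued */
    unsigned int max_events; /* assumed upper bound on queued events */
};

/* Append ev to the queue. Returns 0 on success, -1 if the queue is full. */
int demo_queue_add(struct demo_queue *q, struct demo_event *ev)
{
    assert(q != NULL && ev != NULL);
    if (q->q_len >= q->max_events)
        return -1;
    ev->next = NULL;
    if (q->tail)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
    q->q_len++;
    return 0;
}

/* Remove and return the oldest event, or NULL if the queue is empty. */
struct demo_event *demo_queue_remove_first(struct demo_queue *q)
{
    assert(q != NULL);

    struct demo_event *ev = q->head;

    if (!ev)
        return NULL;
    q->head = ev->next;
    if (!q->head)
        q->tail = NULL;
    ev->next = NULL;
    q->q_len--;
    return ev;
}
```

A reader of the notification fd would call the remove function repeatedly until it returns NULL, then sleep until a producer signals that new events are available.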

fsnotify: add marks to inodes so groups can interpret how to handle those inodes · 3be25f49

Committed by Eric Paris on May 21, 2009

This patch creates a way for fsnotify groups to attach marks to inodes.
These marks have little meaning to the generic fsnotify infrastructure
and thus their meaning should be interpreted by the group that attached
them to the inode's list.

dnotify and inotify will make use of these markings to indicate which
inodes are of interest to their respective groups. But this implementation
has the useful property that in the future other listeners could actually
use the marks for the exact opposite reason, i.e. to indicate which inodes
they have NO interest in.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

fsnotify: unified filesystem notification backend · 90586523

Committed by Eric Paris on May 21, 2009

fsnotify is a backend for filesystem notification. fsnotify does
not provide any userspace interface but does provide the basis
needed for other notification schemes such as dnotify. fsnotify
can be extended to be the backend for inotify or the upcoming
fanotify. fsnotify provides a mechanism for "groups" to register for
some set of filesystem events and to then deliver those events to
those groups for processing.
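
The register-then-deliver model can be sketched in a few lines: each group carries a mask of events it cares about plus a callback, and a notify routine hands an event only to groups whose mask matches. This is a simplified userspace illustration, not the kernel API; the `demo_*` names and the `DEMO_*` event bit values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits for the example. */
#define DEMO_ACCESS 0x1u
#define DEMO_MODIFY 0x2u
#define DEMO_OPEN   0x4u

struct demo_group {
    unsigned int mask; /* events this group registered for */
    void (*handle_event)(struct demo_group *g, unsigned int event_mask);
    unsigned int delivered;  /* number of events handled so far */
    unsigned int last_event; /* mask of the most recent event */
};

/* Example handler: records what was delivered. */
void demo_count_event(struct demo_group *g, unsigned int event_mask)
{
    g->delivered++;
    g->last_event = event_mask;
}

/* Deliver event_mask to every interested group; returns how many got it. */
int demo_notify(struct demo_group **groups, size_t n, unsigned int event_mask)
{
    int sent = 0;

    for (size_t i = 0; i < n; i++) {
        struct demo_group *g = groups[i];

        assert(g != NULL && g->handle_event != NULL);
        if (g->mask & event_mask) {
            g->handle_event(g, event_mask);
            sent++;
        }
    }
    return sent;
}
```

Each notification scheme (dnotify, inotify, and so on) would supply its own handler, which is what lets the backend stay free of any userspace interface.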

fsnotify has a number of benefits, the first being actually shrinking the size
of an inode. Before fsnotify, to support both dnotify and inotify an inode had

unsigned long i_dnotify_mask; /* Directory notify events */
struct dnotify_struct *i_dnotify; /* for directory notifications */
struct list_head inotify_watches; /* watches on this inode */
struct mutex inotify_mutex; /* protects the watches list */

But with fsnotify this same functionality (and more) is done with just

__u32 i_fsnotify_mask; /* all events for this inode */
struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */

That's right, inotify, dnotify, and fanotify all in 64 bits. We used that
much space just in inotify_watches alone, before this patch set.

fsnotify object lifetime and locking is MUCH better than what we have today.
inotify locking is incredibly complex. See 8f7b0ba1 as an example of
what's been busted since inception. inotify needs to know internal semantics
of superblock destruction and unmounting to function. The inode pinning and
vfs contortions are horrible.

No fsnotify implementers do allocation under locks. This means things like
f04b30de which (due to an overabundance of caution) changes GFP_KERNEL to
GFP_NOFS can be reverted. There are no longer any allocation rules when using
or implementing your own fsnotify listener.

fsnotify paves the way for fanotify. In brief fanotify is a notification
mechanism that delivers the listener both an 'event' and an open file descriptor
to the object in question. This means that fanotify is pathname agnostic.
Some on lkml may not care for the original companies or users that pushed for
TALPA, but fanotify was designed with flexibility and input for other users in
mind. The readahead group expressed interest in fanotify as it could be used
to profile disk access on boot without breaking the audit system. The desktop
search groups have also expressed interest in fanotify as it solves a number
of the race conditions and problems present with managing inotify when more
than a limited number of specific files are of interest. fanotify can provide
for a userspace access control system which makes it a clean interface for AV
vendors to hook without trying to do binary patching on the syscall table,
LSM, and everywhere else they do their things today. With this patch series
fanotify can be implemented in less than 1200 lines of easy to review code.
Almost all of which is the socket based user interface.

This patch series builds fsnotify to the point that it can implement
dnotify and inotify_user. Patches exist and will be sent soon after
acceptance to finish the in kernel inotify conversion (audit) and implement
fanotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>