提交 · ac7e22dcfafd04c842a02057afd6541c1d613ef9 · openeuler / raspberrypi-kernel

29 10月, 2010 6 次提交

fanotify: allow userspace to override max marks · ac7e22dc

由 Eric Paris 提交于 10月 28, 2010

Some fanotify groups, especially those like AV scanners, will need to place
lots of marks, particularly ignore marks. Since ignore marks do not pin
inodes in cache and are cleared if the inode is removed from core (usually
under memory pressure) we expose an interface for listeners, with
CAP_SYS_ADMIN, to override the maximum number of marks and be allowed to
set and 'unlimited' number of marks. Programs which make use of this
feature will be able to OOM a machine.
Signed-off-by: NEric Paris <eparis@redhat.com>

ac7e22dc

fanotify: limit the number of marks in a single fanotify group · e7099d8a

由 Eric Paris 提交于 10月 28, 2010

There is currently no limit on the number of marks a given fanotify group
can have. Since fanotify is gated on CAP_SYS_ADMIN this was not seen as
a serious DoS threat. This patch implements a default of 8192, the same as
inotify to work towards removing the CAP_SYS_ADMIN gating and eliminating
the default DoS'able status.
Signed-off-by: NEric Paris <eparis@redhat.com>

e7099d8a

fanotify: allow userspace to override max queue depth · 5dd03f55

由 Eric Paris 提交于 10月 28, 2010

fanotify has a defualt max queue depth. This patch allows processes which
explicitly request it to have an 'unlimited' queue depth. These processes
need to be very careful to make sure they cannot fall far enough behind
that they OOM the box. Thus this flag is gated on CAP_SYS_ADMIN.
Signed-off-by: NEric Paris <eparis@redhat.com>

5dd03f55

fsnotify: implement a default maximum queue depth · 2529a0df

由 Eric Paris 提交于 10月 28, 2010

Currently fanotify has no maximum queue depth.  Since fanotify is
CAP_SYS_ADMIN only this does not pose a normal user DoS issue, but it
certianly is possible that an fanotify listener which can't keep up could
OOM the box.  This patch implements a default 16k depth.  This is the same
default depth used by inotify, but given fanotify's better queue merging in
many situations this queue will contain many additional useful events by
comparison.
Signed-off-by: NEric Paris <eparis@redhat.com>

2529a0df

fanotify: ignore fanotify ignore marks if open writers · 5322a59f

由 Eric Paris 提交于 10月 28, 2010

fanotify will clear ignore marks if a task changes the contents of an
inode.  The problem is with the races around when userspace finishes
checking a file and when that result is actually attached to the inode.
This race was described as such:

Consider the following scenario with hostile processes A and B, and
victim process C:
1. Process A opens new file for writing. File check request is generated.
2. File check is performed in userspace. Check result is "file has no malware".
3. The "permit" response is delivered to kernel space.
4. File ignored mark set.
5. Process A writes dummy bytes to the file. File ignored flags are cleared.
6. Process B opens the same file for reading. File check request is generated.
7. File check is performed in userspace. Check result is "file has no malware".
8. Process A writes malware bytes to the file. There is no cached response yet.
9. The "permit" response is delivered to kernel space and is cached in fanotify.
10. File ignored mark set.
11. Now any process C will be permitted to open the malware file.
There is a race between steps 8 and 10

While fanotify makes no strong guarantees about systems with hostile
processes there is no reason we cannot harden against this race.  We do
that by simply ignoring any ignore marks if the inode has open writers (aka
i_writecount > 0).  (We actually do not ignore ignore marks if the
FAN_MARK_SURV_MODIFY flag is set)
Reported-by: NVasily Novikov <vasily.novikov@kaspersky.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

5322a59f

fanotify: implement fanotify listener ordering · 4231a235

由 Eric Paris 提交于 10月 28, 2010

The fanotify listeners needs to be able to specify what types of operations
they are going to perform so they can be ordered appropriately between other
listeners doing other types of operations.  They need this to be able to make
sure that things like hierarchichal storage managers will get access to inodes
before processes which need the data.  This patch defines 3 possible uses
which groups must indicate in the fanotify_init() flags.

FAN_CLASS_PRE_CONTENT
FAN_CLASS_CONTENT
FAN_CLASS_NOTIF

Groups will receive notification in that order.  The order between 2 groups in
the same class is undeterministic.

FAN_CLASS_PRE_CONTENT is intended to be used by listeners which need access to
the inode before they are certain that the inode contains it's final data.  A
hierarchical storage manager should choose to use this class.

FAN_CLASS_CONTENT is intended to be used by listeners which need access to the
inode after it contains its intended contents.  This would be the appropriate
level for an AV solution or document control system.

FAN_CLASS_NOTIF is intended for normal async notification about access, much the
same as inotify and dnotify.  Syncronous permissions events are not permitted
at this class.
Signed-off-by: NEric Paris <eparis@redhat.com>

4231a235

15 10月, 2010 1 次提交

llseek: automatically add .llseek fop · 6038f373

由 Arnd Bergmann 提交于 8月 15, 2010

All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time.  Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
//   but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
   *off = E
|
   *off += E
|
   func(..., off, ...)
|
   E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
  *off = E
|
  *off += E
|
  func(..., off, ...)
|
  E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
 ...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
 .llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
 .read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
 .write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
 .open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
...  .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
...  .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
...  .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+	.llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
 .write = write_f,
 .read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>

6038f373

28 8月, 2010 1 次提交

fanotify: Return EPERM when a process is not privileged · a2f13ad0

由 Andreas Gruenbacher 提交于 8月 24, 2010

The appropriate error code when privileged operations are denied is
EPERM, not EACCES.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <paris@paris.rdu.redhat.com>

a2f13ad0

23 8月, 2010 1 次提交

fanotify: flush outstanding perm requests on group destroy · 2eebf582

由 Eric Paris 提交于 8月 18, 2010

When an fanotify listener is closing it may cause a deadlock between the
listener and the original task doing an fs operation. If the original task
is waiting for a permissions response it will be holding the srcu lock. The
listener cannot clean up and exit until after that srcu lock is syncronized.
Thus deadlock. The fix introduced here is to stop accepting new permissions
events when a listener is shutting down and to grant permission for all
outstanding events. Thus the original task will eventually release the srcu
lock and the listener can complete shutdown.
Reported-by: NAndreas Gruenbacher <agruen@suse.de>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

2eebf582

13 8月, 2010 1 次提交

Revert "fsnotify: store struct file not struct path" · 2069601b

由 Linus Torvalds 提交于 8月 12, 2010

This reverts commit 3bcf3860 (and the
accompanying commit c1e5c954 "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).

The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.

Fix up various conflicts due to later fsnotify work.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2069601b

28 7月, 2010 30 次提交

fsnotify: remove group->mask · 43709a28

由 Eric Paris 提交于 7月 28, 2010

group->mask is now useless. It was originally a shortcut for fsnotify to
save on performance. These checks are now redundant, so we remove them.
Signed-off-by: NEric Paris <eparis@redhat.com>

43709a28

fsnotify: store struct file not struct path · 3bcf3860

由 Eric Paris 提交于 7月 28, 2010

Al explains that calling dentry_open() with a mnt/dentry pair is only
garunteed to be safe if they are already used in an open struct file.  To
make sure this is the case don't store and use a struct path in fsnotify,
always use a struct file.
Signed-off-by: NEric Paris <eparis@redhat.com>

3bcf3860

fanotify: groups can specify their f_flags for new fd · 80af2588

由 Eric Paris 提交于 7月 28, 2010

Currently fanotify fds opened for thier listeners are done with f_flags
equal to O_RDONLY | O_LARGEFILE.  This patch instead takes f_flags from the
fanotify_init syscall and uses those when opening files in the context of
the listener.
Signed-off-by: NEric Paris <eparis@redhat.com>

80af2588

fsnotify: update gfp/slab.h includes · e4e047a2

由 Tejun Heo 提交于 5月 20, 2010

Implicit slab.h inclusion via percpu.h is about to go away.  Make sure
gfp.h or slab.h is included as necessary.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

e4e047a2

fanotify: drop the useless priority argument · 08ae8938

由 Eric Paris 提交于 5月 27, 2010

The priority argument in fanotify is useless.  Kill it.
Signed-off-by: NEric Paris <eparis@redhat.com>

08ae8938

fanotify: do not return 0 in a void function · 8860f060

由 Eric Paris 提交于 12月 23, 2009

remove_access_response() is supposed to have a void return, but was
returning 0;
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NEric Paris <eparis@redhat.com>

8860f060

fanotify: userspace interface for permission responses · b2d87909

由 Eric Paris 提交于 12月 17, 2009

fanotify groups need to respond to events which include permissions types.
To do so groups will send a response using write() on the fanotify_fd they
have open.
Signed-off-by: NEric Paris <eparis@redhat.com>

b2d87909

fanotify: permissions and blocking · 9e66e423

由 Eric Paris 提交于 12月 17, 2009

This is the backend work needed for fanotify to support the new
FS_OPEN_PERM and FS_ACCESS_PERM fsnotify events.  This is done using the
new fsnotify secondary queue.  No userspace interface is provided actually
respond to or request these events.
Signed-off-by: NEric Paris <eparis@redhat.com>

9e66e423

fsnotify: add group priorities · cb2d429f

由 Eric Paris 提交于 12月 17, 2009

This introduces an ordering to fsnotify groups. With purely asynchronous
notification based "things" implementing fsnotify (inotify, dnotify) ordering
isn't particularly important. But if people want to use fsnotify for the
basis of sycronous notification or blocking notification ordering becomes
important.

eg. A Hierarchical Storage Management listener would need to get its event
before an AV scanner could get its event (since the HSM would need to
bring the data in for the AV scanner to scan.) Typically asynchronous notification
would want to run after the AV scanner made any relevant access decisions
so as to not send notification about an event that was denied.
Signed-off-by: NEric Paris <eparis@redhat.com>

cb2d429f

fanotify: clear all fanotify marks · 4d92604c

由 Eric Paris 提交于 12月 17, 2009

fanotify listeners may want to clear all marks.  They may want to do this
to destroy all of their inode marks which have nothing but ignores.
Realistically this is useful for av vendors who update policy and want to
clear all of their cached allows.
Signed-off-by: NEric Paris <eparis@redhat.com>

4d92604c

fanotify: allow ignored_masks to survive modify · c9778a98

由 Eric Paris 提交于 12月 17, 2009

Some users may want to truely ignore an inode even if it has been modified.
Say you are wanting a mount which contains a log file and you really don't
want any notification about that file.  This patch allows the listener to
do that.
Signed-off-by: NEric Paris <eparis@redhat.com>

c9778a98

fanotify: allow users to set an ignored_mask · b9e4e3bd

由 Eric Paris 提交于 12月 17, 2009

Change the sys_fanotify_mark() system call so users can set ignored_masks
on inodes. Remember, if a user new sets a real mask, and only sets ignored
masks, the ignore will never be pinned in memory. Thus ignored_masks can
be lost under memory pressure and the user may again get events they
previously thought were ignored.
Signed-off-by: NEric Paris <eparis@redhat.com>

b9e4e3bd

fsnotify: allow marks to not pin inodes in core · 90b1e7a5

由 Eric Paris 提交于 12月 17, 2009

inotify marks must pin inodes in core. dnotify doesn't technically need to
since they are closed when the directory is closed. fanotify also need to
pin inodes in core as it works today. But the next step is to introduce
the concept of 'ignored masks' which is actually a mask of events for an
inode of no interest. I claim that these should be liberally sent to the
kernel and should not pin the inode in core. If the inode is brought back
in the listener will get an event it may have thought excluded, but this is
not a serious situation and one any listener should deal with.

This patch lays the ground work for non-pinning inode marks by using lazy
inode pinning. We do not pin a mark until it has a non-zero mask entry. If a
listener new sets a mask we never pin the inode.
Signed-off-by: NEric Paris <eparis@redhat.com>

90b1e7a5

fanotify: remove outgoing function checks in fanotify.h · 33d3dfff

由 Andreas Gruenbacher 提交于 12月 17, 2009

A number of validity checks on outgoing data are done in static inlines but
are only used in one place.  Instead just do them where they are used for
readability.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

33d3dfff

fanotify: remove fanotify.h declarations · 88380fe6

由 Andreas Gruenbacher 提交于 12月 17, 2009

fanotify_mark_validate functions are all needlessly declared in headers as
static inlines.  Instead just do the checks where they are needed for code
readability.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

88380fe6

fanotify: split fanotify_remove_mark · f3640192

由 Andreas Gruenbacher 提交于 12月 17, 2009

split fanotify_remove_mark into fanotify_remove_inode_mark and
fanotify_remove_vfsmount_mark.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

f3640192

fanotify: rename FAN_MARK_ON_VFSMOUNT to FAN_MARK_MOUNT · eac8e9e8

由 Andreas Gruenbacher 提交于 12月 17, 2009

the term 'vfsmount' isn't sensicle to userspace.  instead call is 'mount.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

eac8e9e8

fanotify: hooks the fanotify_mark syscall to the vfsmount code · 0ff21db9

由 Eric Paris 提交于 12月 17, 2009

Create a new fanotify_mark flag which indicates we should attach the mark
to the vfsmount holding the object referenced by dfd and pathname rather
than the inode itself.
Signed-off-by: NEric Paris <eparis@redhat.com>

0ff21db9

fanotify: remove fanotify_add_mark · 90dd201d

由 Andreas Gruenbacher 提交于 12月 17, 2009

fanotify_add_mark now does nothing useful anymore, drop it.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

90dd201d

fanotify: do not return pointer from fanotify_add_*_mark · 52202dfb

由 Andreas Gruenbacher 提交于 12月 17, 2009

No need to return the mark from fanotify_add_*_mark to fanotify_add_mark
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

52202dfb

fanotify: do not call fanotify_update_object_mask in fanotify_add_mark · 912ee394

由 Andreas Gruenbacher 提交于 12月 17, 2009

Recalculate masks in fanotify_add_mark, don't use
fanotify_update_object_mask.  This gets us one step closers to readable
code.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

912ee394

fanotify: do not call fanotify_update_object_mask in fanotify_remove_mark · 088b09b0

由 Andreas Gruenbacher 提交于 12月 17, 2009

Recalculate masks in fanotify_remove_mark, don't use
fanotify_update_object_mask.  This gets us one step closers to readable
code.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

088b09b0

fanotify: remove fanotify_update_mark · c6223f46

由 Andreas Gruenbacher 提交于 12月 17, 2009

fanotify_update_mark() doesn't do much useful;  remove it.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

c6223f46

fanotify: infrastructure to add an remove marks on vfsmounts · 88826276

由 Eric Paris 提交于 12月 17, 2009

infrastructure work to add and remove marks on vfsmounts.  This should get
every set up except wiring the functions to the syscalls.
Signed-off-by: NEric Paris <eparis@redhat.com>

88826276

fsnotify: split generic and inode specific mark code · 5444e298

由 Eric Paris 提交于 12月 17, 2009

currently all marking is done by functions in inode-mark.c. Some of this
is pretty generic and should be instead done in a generic function and we
should only put the inode specific code in inode-mark.c
Signed-off-by: NEric Paris <eparis@redhat.com>

5444e298

fanotify: Add pids to events · 32c32632

由 Andreas Gruenbacher 提交于 12月 17, 2009

Pass the process identifiers of the triggering processes to fanotify
listeners: this information is useful for event filtering and logging.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

32c32632

fanotify: create_fd cleanup · 22aa425d

由 Andreas Gruenbacher 提交于 12月 17, 2009

Code cleanup which does the fd creation work seperately from the userspace
metadata creation.  It fits better with the other code.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Signed-off-by: NEric Paris <eparis@redhat.com>

22aa425d

fanotify: CONFIG_HAVE_SYSCALL_WRAPPERS for sys_fanotify_mark · 9bbfc964

由 Heiko Carstens 提交于 12月 17, 2009

Please note that you need the patch below in addition, otherwise the
syscall wrapper stuff won't work on those 32 bit architectures which enable
the wrappers.

When enabled the syscall wrapper defines always take long parameters and then
cast them to whatever is needed. This approach doesn't work for the 32 bit
case where the original syscall takes a long long parameter, since we would
lose the upper 32 bits.
So syscalls with 64 bit arguments are special cases wrt to syscall wrappers
and enp up in the ugliness below (see also sys_fallocate). In addition these
special cased syscall wrappers have the drawback that ftrace syscall tracing
doesn't work on them, since they don't get defined by using the usual macros.
Signed-off-by: NEric Paris <eparis@redhat.com>

9bbfc964

fanotify: send events using read · a1014f10

由 Eric Paris 提交于 12月 17, 2009

Send events to userspace by reading the file descriptor from fanotify_init().
One will get blocks of data which look like:

struct fanotify_event_metadata {
	__u32 event_len;
	__u32 vers;
	__s32 fd;
	__u64 mask;
	__s64 pid;
	__u64 cookie;
} __attribute__ ((packed));

Simple code to retrieve and deal with events is below

	while ((len = read(fan_fd, buf, sizeof(buf))) > 0) {
		struct fanotify_event_metadata *metadata;

		metadata = (void *)buf;
		while(FAN_EVENT_OK(metadata, len)) {
			[PROCESS HERE!!]
			if (metadata->fd >= 0 && close(metadata->fd) != 0)
				goto fail;
			metadata = FAN_EVENT_NEXT(metadata, len);
		}
	}
Signed-off-by: NEric Paris <eparis@redhat.com>

a1014f10

fanotify: fanotify_mark syscall implementation · 2a3edf86

由 Eric Paris 提交于 12月 17, 2009

NAME
	fanotify_mark - add, remove, or modify an fanotify mark on a
filesystem object

SYNOPSIS
	int fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask,
			  int dfd, const char *pathname)

DESCRIPTION
	fanotify_mark() is used to add remove or modify a mark on a filesystem
	object.  Marks are used to indicate that the fanotify group is
	interested in events which occur on that object.  At this point in
	time marks may only be added to files and directories.

	fanotify_fd must be a file descriptor returned by fanotify_init()

	The flags field must contain exactly one of the following:

	FAN_MARK_ADD - or the bits in mask and ignored mask into the mark
	FAN_MARK_REMOVE - bitwise remove the bits in mask and ignored mark
		from the mark

	The following values can be OR'd into the flags field:

	FAN_MARK_DONT_FOLLOW - same meaning as O_NOFOLLOW as described in open(2)
	FAN_MARK_ONLYDIR - same meaning as O_DIRECTORY as described in open(2)

	dfd may be any of the following:
	AT_FDCWD: the object will be lookup up based on pathname similar
		to open(2)

	file descriptor of a directory: if pathname is not NULL the
		object to modify will be lookup up similar to openat(2)

	file descriptor of the final object: if pathname is NULL the
		object to modify will be the object referenced by dfd

	The mask is the bitwise OR of the set of events of interest such as:
	FAN_ACCESS		- object was accessed (read)
	FAN_MODIFY		- object was modified (write)
	FAN_CLOSE_WRITE		- object was writable and was closed
	FAN_CLOSE_NOWRITE	- object was read only and was closed
	FAN_OPEN		- object was opened
	FAN_EVENT_ON_CHILD	- interested in objected that happen to
				  children.  Only relavent when the object
				  is a directory
	FAN_Q_OVERFLOW		- event queue overflowed (not implemented)

RETURN VALUE
	On success, this system call returns 0. On error, -1 is
	returned, and errno is set to indicate the error.

ERRORS
	EINVAL An invalid value was specified in flags.

	EINVAL An invalid value was specified in mask.

	EINVAL An invalid value was specified in ignored_mask.

	EINVAL fanotify_fd is not a file descriptor as returned by
	fanotify_init()

	EBADF fanotify_fd is not a valid file descriptor

	EBADF dfd is not a valid file descriptor and path is NULL.

	ENOTDIR dfd is not a directory and path is not NULL

	EACCESS no search permissions on some part of the path

	ENENT file not found

	ENOMEM Insufficient kernel memory is available.

CONFORMING TO
	These system calls are Linux-specific.
Signed-off-by: NEric Paris <eparis@redhat.com>

2a3edf86