Commits · ac7e22dcfafd04c842a02057afd6541c1d613ef9 · openeuler / raspberrypi-kernel

29 Oct, 2010 11 commits

fanotify: allow userspace to override max marks · ac7e22dc

Authored by Eric Paris on Oct 28, 2010

Some fanotify groups, especially those like AV scanners, will need to place
lots of marks, particularly ignore marks. Since ignore marks do not pin
inodes in cache and are cleared if the inode is removed from core (usually
under memory pressure), we expose an interface for listeners with
CAP_SYS_ADMIN to override the maximum number of marks and be allowed to
set an 'unlimited' number of marks. Programs which make use of this
feature will be able to OOM a machine.
Signed-off-by: Eric Paris <eparis@redhat.com>

ac7e22dc

fanotify: limit the number of marks in a single fanotify group · e7099d8a

Authored by Eric Paris on Oct 28, 2010

There is currently no limit on the number of marks a given fanotify group
can have. Since fanotify is gated on CAP_SYS_ADMIN this was not seen as
a serious DoS threat. This patch implements a default limit of 8192 marks,
the same as inotify, to work towards removing the CAP_SYS_ADMIN gating and
eliminating the default DoS'able status.
Signed-off-by: Eric Paris <eparis@redhat.com>

e7099d8a
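
The default mark limit described above, together with the privileged override from the newer commit ac7e22dc, can be pictured with a small user-space model. This is only a sketch: `mark_group`, `mark_group_init` and `mark_group_add` are hypothetical names, the capability check is reduced to a boolean, and the `EPERM`/`ENOSPC` return values are assumptions rather than something the commit messages state.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

#define DEFAULT_MAX_MARKS 8192u /* default named in the commit message */

/* Hypothetical model of a notification group; not the kernel structure. */
struct mark_group {
    unsigned int num_marks; /* marks currently placed */
    unsigned int max_marks; /* limit enforced on mark_group_add() */
};

/* 'unlimited' models the override, which is only granted with CAP_SYS_ADMIN. */
static int mark_group_init(struct mark_group *g, bool unlimited,
                           bool has_cap_sys_admin)
{
    if (unlimited && !has_cap_sys_admin)
        return -EPERM; /* assumed error code */
    g->num_marks = 0;
    g->max_marks = unlimited ? UINT_MAX : DEFAULT_MAX_MARKS;
    return 0;
}

/* Place one more mark, refusing once the group's limit is reached. */
static int mark_group_add(struct mark_group *g)
{
    if (g->num_marks >= g->max_marks)
        return -ENOSPC; /* assumed error code */
    g->num_marks++;
    return 0;
}
```

A group created without the override accepts 8192 marks and rejects the next one; a group created with the override keeps accepting marks, which is why the commit warns that such programs can OOM a machine.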

fanotify: allow userspace to override max queue depth · 5dd03f55

Authored by Eric Paris on Oct 28, 2010

fanotify has a default max queue depth. This patch allows processes which
explicitly request it to have an 'unlimited' queue depth. These processes
need to be very careful to make sure they cannot fall far enough behind
that they OOM the box. Thus this flag is gated on CAP_SYS_ADMIN.
Signed-off-by: Eric Paris <eparis@redhat.com>

5dd03f55

fsnotify: implement a default maximum queue depth · 2529a0df

Authored by Eric Paris on Oct 28, 2010

Currently fanotify has no maximum queue depth.  Since fanotify is
CAP_SYS_ADMIN only this does not pose a normal user DoS issue, but it
certainly is possible that an fanotify listener which can't keep up could
OOM the box.  This patch implements a default 16k depth.  This is the same
default depth used by inotify, but given fanotify's better queue merging in
many situations this queue will contain many additional useful events by
comparison.
Signed-off-by: Eric Paris <eparis@redhat.com>

2529a0df

fanotify: ignore fanotify ignore marks if open writers · 5322a59f

Authored by Eric Paris on Oct 28, 2010

fanotify will clear ignore marks if a task changes the contents of an
inode.  The problem is with the races around when userspace finishes
checking a file and when that result is actually attached to the inode.
This race was described as such:

Consider the following scenario with hostile processes A and B, and
victim process C:
1. Process A opens new file for writing. File check request is generated.
2. File check is performed in userspace. Check result is "file has no malware".
3. The "permit" response is delivered to kernel space.
4. File ignored mark set.
5. Process A writes dummy bytes to the file. File ignored flags are cleared.
6. Process B opens the same file for reading. File check request is generated.
7. File check is performed in userspace. Check result is "file has no malware".
8. Process A writes malware bytes to the file. There is no cached response yet.
9. The "permit" response is delivered to kernel space and is cached in fanotify.
10. File ignored mark set.
11. Now any process C will be permitted to open the malware file.
There is a race between steps 8 and 10.

While fanotify makes no strong guarantees about systems with hostile
processes there is no reason we cannot harden against this race.  We do
that by simply ignoring any ignore marks if the inode has open writers (aka
i_writecount > 0).  (We actually do not ignore ignore marks if the
FAN_MARK_SURV_MODIFY flag is set)
Reported-by: Vasily Novikov <vasily.novikov@kaspersky.com>
Signed-off-by: Eric Paris <eparis@redhat.com>

5322a59f
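
The hardening rule in the commit above (do not honour an ignore mark while the inode has open writers, unless the mark was set to survive modification) can be sketched as a small predicate. The struct and function names here are hypothetical, and the model only captures the decision, not the kernel's data structures.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the inode state that matters for this check. */
struct inode_model {
    int writecount; /* number of open writers, like i_writecount */
};

/* Hypothetical stand-in for an ignore mark on that inode. */
struct ignore_mark_model {
    bool ignored;         /* the event type is covered by the ignore mask */
    bool survives_modify; /* set with the flag the message calls FAN_MARK_SURV_MODIFY */
};

/* Returns true when the ignore mark may suppress the event. */
static bool ignore_mark_applies(const struct inode_model *inode,
                                const struct ignore_mark_model *mark)
{
    if (!mark->ignored)
        return false;
    if (mark->survives_modify)
        return true; /* explicitly requested to survive writes */
    /* With open writers the cached verdict may already be stale. */
    return inode->writecount <= 0;
}
```

In the scenario from the commit message, process A still holds the file open for writing at step 10, so the freshly cached "permit" verdict would not be applied to process C's open.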

fsnotify: call fsnotify_parent in perm events · 52420392

Authored by Eric Paris on Oct 28, 2010

fsnotify perm events do not call fsnotify_parent(). That means you cannot
register a perm event on a directory and enforce permissions on all inodes in
that directory. This patch fixes that situation.
Signed-off-by: Eric Paris <eparis@redhat.com>

52420392

fsnotify: correctly handle return codes from listeners · ff8bcbd0

Authored by Eric Paris on Oct 28, 2010

When fsnotify groups return errors they are ignored.  For permission
events these should be passed back up the stack, but for most events these
should continue to be ignored.
Signed-off-by: Eric Paris <eparis@redhat.com>

ff8bcbd0

fanotify: implement fanotify listener ordering · 4231a235

Authored by Eric Paris on Oct 28, 2010

fanotify listeners need to be able to specify what types of operations
they are going to perform so they can be ordered appropriately relative to
listeners doing other types of operations.  They need this to make sure
that things like hierarchical storage managers get access to inodes
before processes which need the data.  This patch defines 3 possible uses
which groups must indicate in the fanotify_init() flags.

FAN_CLASS_PRE_CONTENT
FAN_CLASS_CONTENT
FAN_CLASS_NOTIF

Groups will receive notification in that order.  The order between 2 groups in
the same class is nondeterministic.

FAN_CLASS_PRE_CONTENT is intended to be used by listeners which need access to
the inode before they are certain that the inode contains its final data.  A
hierarchical storage manager should choose to use this class.

FAN_CLASS_CONTENT is intended to be used by listeners which need access to the
inode after it contains its intended contents.  This would be the appropriate
level for an AV solution or document control system.

FAN_CLASS_NOTIF is intended for normal async notification about access, much the
same as inotify and dnotify.  Synchronous permission events are not permitted
at this class.
Signed-off-by: Eric Paris <eparis@redhat.com>

4231a235
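
The delivery order described in the commit above can be illustrated by sorting a set of listeners by class. This is a user-space sketch with hypothetical names (`listener`, `order_listeners`, `may_request_perm_events`); the enum values are chosen only so that a smaller value means earlier delivery and are not the kernel's flag values.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Smaller value means earlier delivery; values are illustrative only. */
enum listener_class {
    CLASS_PRE_CONTENT = 0, /* e.g. a hierarchical storage manager */
    CLASS_CONTENT     = 1, /* e.g. an AV scanner */
    CLASS_NOTIF       = 2  /* plain asynchronous notification */
};

struct listener {
    const char *name;
    enum listener_class cls;
};

static int by_class(const void *a, const void *b)
{
    const struct listener *la = a;
    const struct listener *lb = b;

    return (int)la->cls - (int)lb->cls;
}

/* qsort() is not stable, which matches the unspecified order within a class. */
static void order_listeners(struct listener *l, size_t n)
{
    qsort(l, n, sizeof(*l), by_class);
}

/* Synchronous permission events are not allowed for the notification class. */
static bool may_request_perm_events(enum listener_class cls)
{
    return cls != CLASS_NOTIF;
}
```

Sorting an indexer (notification), a scanner (content) and an HSM (pre-content) puts the HSM first and the indexer last, so the storage manager can restore the file's data before the scanner inspects it.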

fsnotify: implement ordering between notifiers · 6ad2d4e3

Authored by Eric Paris on Oct 28, 2010

fanotify needs to be able to specify that some groups get events before
others. It uses this to make sure that a hierarchical storage
manager gets access to files before programs which actually use them. This
is purely infrastructure. Everything will have a priority of 0, but the
infrastructure will exist for it to be non-zero.
Signed-off-by: Eric Paris <eparis@redhat.com>

6ad2d4e3

fanotify: allow fanotify to be built · 9343919c

Authored by Eric Paris on Oct 28, 2010

We disabled the ability to build fanotify in commit 7c534773.
This reverts that commit and allows people to build fanotify.
Signed-off-by: Eric Paris <eparis@redhat.com>

9343919c

ext4: fix compile with CONFIG_EXT4_FS_XATTR disabled · 19ef2014

Authored by Ingo Molnar on Oct 28, 2010

Commit 5dabfc78 ("ext4: rename {exit,init}_ext4_*() to
ext4_{exit,init}_*()") causes

  fs/ext4/super.c:4776: error: implicit declaration of function ‘ext4_init_xattr’

when CONFIG_EXT4_FS_XATTR is disabled.

It renamed init_ext4_xattr to ext4_init_xattr but forgot to update the
dummy definition in fs/ext4/xattr.h.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19ef2014

28 Oct, 2010 29 commits

9p: Add datasync to client side TFSYNC/RFSYNC for dotl · b165d601

Authored by Venkateswararao Jujjuri (JV) on Oct 22, 2010

SYNOPSIS
    size[4] Tfsync tag[2] fid[4] datasync[4]

    size[4] Rfsync tag[2]

DESCRIPTION

    The Tfsync transaction transfers ("flushes") all modified in-core data of
    the file identified by fid to the disk device (or other permanent storage
    device) where that file resides.

    If the datasync flag is specified, data will be flushed, but modified
    metadata is not flushed unless that metadata is needed in order to allow a
    subsequent data retrieval to be correctly handled.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

b165d601
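
The Tfsync request layout from the synopsis above can be sketched as a byte-level encoder. Several details are assumptions not stated in the commit message: the one-byte message type and its value (taken here to be 50, which should be checked against `include/net/9p/9p.h`), little-endian field encoding, and a `size` field that counts itself. `put_le` and `encode_tfsync` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

#define TFSYNC_TYPE 50u /* assumption: numeric value of the Tfsync message type */

/* Store the low 'nbytes' bytes of v in little-endian order. */
static void put_le(uint8_t *p, uint64_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Encode: size[4] Tfsync tag[2] fid[4] datasync[4]
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
static size_t encode_tfsync(uint8_t *buf, size_t cap, uint16_t tag,
                            uint32_t fid, uint32_t datasync)
{
    const size_t len = 4 + 1 + 2 + 4 + 4; /* 15 bytes in total */

    if (cap < len)
        return 0;
    put_le(buf, len, 4);           /* size[4], includes itself (assumed) */
    buf[4] = TFSYNC_TYPE;          /* message type byte */
    put_le(buf + 5, tag, 2);       /* tag[2] */
    put_le(buf + 7, fid, 4);       /* fid[4] */
    put_le(buf + 11, datasync, 4); /* datasync[4] */
    return len;
}
```

Passing `datasync = 1` asks for data-only flushing in the sense described above; `datasync = 0` corresponds to the original Tfsync behaviour from commit 920e65dc.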

fs/9p: Use generic_file_open with lookup_instantiate_filp · 877cb3d4

Authored by Aneesh Kumar K.V on Sep 22, 2010

We need to do the O_LARGEFILE check even in the case of 9p. Use the
generic_file_open helper.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

877cb3d4

fs/9p: Add missing iput in v9fs_vfs_lookup · 9856af8b

Authored by Aneesh Kumar K.V on Sep 22, 2010

Make sure we drop the inode reference in the error path.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

9856af8b

fs/9p: Use mknod 9p operation on create without open request · f5fc6145

Authored by Aneesh Kumar K.V on Oct 12, 2010

A create without the LOOKUP_OPEN flag set is due to mknod of regular
files. Use the mknod 9P operation for it.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

f5fc6145

9p: Implement TREADLINK operation for 9p2000.L · 329176cc

Authored by M. Mohan Kumar on Sep 28, 2010

Synopsis

	size[4] TReadlink tag[2] fid[4]
	size[4] RReadlink tag[2] target[s]

Description
	Readlink is used to return the contents of the symbolic link
	referred to by fid. The contents of the symbolic link are returned
	as the response.

	target[s] - Contents of the symbolic link referred to by fid.
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

329176cc

9p: Use V9FS_MAGIC in statfs · 368c09d2

Authored by M. Mohan Kumar on Sep 27, 2010

Use V9FS_MAGIC as the file system type while filling the kernel statfs
structure instead of using the host file system magic number. Also move
the definition of V9FS_MAGIC from v9fs.h to the standard magic.h file.
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

368c09d2

9p: Implement TGETLOCK · 1d769cd1

Authored by M. Mohan Kumar on Sep 27, 2010

Synopsis

    size[4] TGetlock tag[2] fid[4] getlock[n]
    size[4] RGetlock tag[2] getlock[n]

Description

TGetlock is used to test for the existence of byte range POSIX locks on a file
identified by the given fid. The reply contains a getlock structure. If the lock
could be placed, it returns F_UNLCK in the type field of the getlock structure.
Otherwise it returns the details of the conflicting locks in the getlock structure.

    getlock structure:
      type[1] - Type of lock: F_RDLCK, F_WRLCK
      start[8] - Starting offset for lock
      length[8] - Number of bytes to check for the lock
             If length is 0, check for lock in all bytes starting at the location
            'start' through to the end of file
      pid[4] - PID of the process that wants to take lock/owns the task
               in case of reply
      client[4] - Client id of the system that owns the process which
                  has the conflicting lock
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

1d769cd1

9p: Implement TLOCK · a099027c

Authored by M. Mohan Kumar on Sep 27, 2010

Synopsis

    size[4] TLock tag[2] fid[4] flock[n]
    size[4] RLock tag[2] status[1]

Description

Tlock is used to acquire/release byte range posix locks on a file
identified by given fid. The reply contains status of the lock request

    flock structure:
        type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK
        flags[4] - Flags could be either of
          P9_LOCK_FLAGS_BLOCK - Blocking lock request; if a conflicting
            lock exists, wait for that lock to be released.
          P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is
            trying to reclaim a lock after a server restart (due to crash)
        start[8] - Starting offset for lock
        length[8] - Number of bytes to lock
          If length is 0, lock all bytes starting at the location 'start'
          through to the end of file
        pid[4] - PID of the process that wants to take lock
        client_id[4] - Unique client id

        status[1] - Status of the lock request, can be
          P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or
          P9_LOCK_GRACE(3)
          P9_LOCK_SUCCESS - Request was successful
          P9_LOCK_BLOCKED - A conflicting lock is held by another process
          P9_LOCK_ERROR - Error while processing the lock request
          P9_LOCK_GRACE - Server is in grace period, it can't accept new lock
            requests in this period (except locks with
            P9_LOCK_FLAGS_RECLAIM flag set)
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

a099027c
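
The status byte returned in RLock, as listed in the commit above, might be handled on the client roughly as follows. The enum values come from the commit message; the mapping to errno values in the hypothetical `lock_status_to_errno` helper is an assumption made for illustration and is not part of the protocol description.

```c
#include <errno.h>
#include <stdint.h>

/* Status codes and values exactly as listed in the commit message. */
enum p9_lock_status {
    P9_LOCK_SUCCESS = 0, /* request was successful */
    P9_LOCK_BLOCKED = 1, /* a conflicting lock is held by another process */
    P9_LOCK_ERROR   = 2, /* error while processing the request */
    P9_LOCK_GRACE   = 3  /* server is in its grace period */
};

/* Hypothetical helper: the errno choices below are assumptions. */
static int lock_status_to_errno(uint8_t status)
{
    switch (status) {
    case P9_LOCK_SUCCESS:
        return 0;
    case P9_LOCK_BLOCKED:
        return -EAGAIN; /* caller may retry, or wait if it asked to block */
    case P9_LOCK_ERROR:
    case P9_LOCK_GRACE:
    default:
        return -ENOLCK; /* treat unknown values like a failed request */
    }
}
```

A caller that sent the request with P9_LOCK_FLAGS_BLOCK would typically loop on the `-EAGAIN` case, while a non-blocking caller would hand that error straight back to the application.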

[9p] Introduce client side TFSYNC/RFSYNC for dotl. · 920e65dc

Authored by Venkateswararao Jujjuri (JV) on Sep 22, 2010

SYNOPSIS
    size[4] Tfsync tag[2] fid[4]

    size[4] Rfsync tag[2]

DESCRIPTION

The Tfsync transaction transfers ("flushes") all modified in-core data of
the file identified by fid to the disk device (or other permanent storage
device) where that file resides.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

920e65dc

[fs/9p] Add file_operations for cached mode in dotl protocol. · b04faaf3

Authored by Venkateswararao Jujjuri (JV) on Sep 22, 2010

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

b04faaf3

fs/9p: Add access = client option to opt in acl evaluation. · 76381a42

Authored by Aneesh Kumar K.V on Sep 28, 2010

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

76381a42

fs/9p: Implement create time inheritance · ad77dbce

Authored by Aneesh Kumar K.V on Sep 28, 2010

Inherit the default ACL on create.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

ad77dbce

fs/9p: Update ACL on chmod · 6e8dc555

Authored by Aneesh Kumar K.V on Sep 28, 2010

We need to update the ACL value on chmod.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

6e8dc555

fs/9p: Implement setting posix acl · 22d8dcdf

Authored by Aneesh Kumar K.V on Sep 28, 2010

This patch also updates the mode bits, as a normal file system does.
I am not sure whether we should do that, considering that
a setxattr on the server will again update the ACL/mode value.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

22d8dcdf

fs/9p: Add xattr callbacks for POSIX ACL · 7a4566b0

Authored by Aneesh Kumar K.V on Sep 28, 2010

This patch implements fetching the POSIX ACL from the server.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

7a4566b0

fs/9p: Implement POSIX ACL permission checking function · 85ff872d

Authored by Aneesh Kumar K.V on Sep 28, 2010

The ACL value is fetched from the server as a part of inode
initialization, and the permission checking function uses the
cached value of the ACL.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

85ff872d

fs/9p: Remove the redundant rsize calculation in v9fs_file_write() · 8d40fa24

Authored by jvrao on Aug 30, 2010

The same calculation is done in p9_client_write.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

8d40fa24

9p: Add a Direct IO support for non-cached operations. · 3e24ad2f

Authored by jvrao on Aug 24, 2010

The presence of v9fs_direct_IO() in the address space ops vector
allows open() calls with the O_DIRECT flag, which would have failed otherwise.

In the non-cached mode, we shunt off direct read and write requests before
the VFS gets them, so this method should never be called.

Direct IO is not 'yet' supported in the cached mode. Hence when
this routine is called through generic_file_aio_read(), the read/write fails
with an error.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

3e24ad2f

fs/9p: mkdir fix for setting S_ISGID bit as per parent directory · 7c7298cf

Authored by Harsh Prateek Bora on Aug 18, 2010

The current implementation of 9p client mkdir function does not
set the S_ISGID mode bit for the directory being created if the
parent directory has this bit set. This patch fixes this problem
so that the newly created directory inherits the gid from parent
directory and not from the process creating this directory, when
the S_ISGID bit is set in parent directory.
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

7c7298cf

9p: Pass the correct end of buffer to p9dirent_read · 8812a3d5

Authored by Sripathi Kodi on Aug 09, 2010

A patch was accepted recently for sending the correct buffer size to p9stat_read.
We need a similar patch in v9fs_dir_readdir_dotl to send the correct end of buffer
to p9dirent_read.
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

8812a3d5

fs/9p: setrlimit fix for 9p write · 3834b12a

Authored by Harsh Prateek Bora on Aug 03, 2010

Current 9p client file write code does not check for RLIMIT_FSIZE resource.
This bug was found by running LTP test case for setrlimit. This bug is fixed
by calling generic_write_checks before sending the write request to the
server.
Without this patch: the write function is allowed to write above the
RLIMIT_FSIZE set by the user.
With this patch: the write function checks for RLIMIT_FSIZE and writes up to
the size limit.
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

3834b12a
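
The effect of checking RLIMIT_FSIZE before a write, as described in the commit above, can be sketched as a clamp on the requested byte count. `clamp_write_to_limit` is a hypothetical helper that models the outcome only; it is not the kernel's generic_write_checks(), and returning -1 for the "nothing may be written" case is a simplification.

```c
#include <stdint.h>

/*
 * Returns how many of 'count' bytes may be written at offset 'pos' without
 * exceeding 'fsize_limit', or -1 when the write starts at or beyond the
 * limit and nothing may be written.
 */
static int64_t clamp_write_to_limit(uint64_t pos, uint64_t count,
                                    uint64_t fsize_limit)
{
    if (count == 0)
        return 0;
    if (pos >= fsize_limit)
        return -1; /* the write would begin past the limit */

    uint64_t room = fsize_limit - pos;

    return (int64_t)(count < room ? count : room);
}
```

With a 50-byte limit, a 100-byte write at offset 0 is shortened to 50 bytes, and a further write at offset 50 is refused, which matches the "writes up to the size limit" behaviour in the message.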

9p: remove unneeded checks · 57ee047b

Authored by Dan Carpenter on Aug 04, 2010

gid_t is unsigned and can never be less than zero.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

57ee047b

ext4: optimize orphan_list handling for ext4_setattr · 3d287de3

Authored by Dmitry Monakhov on Oct 27, 2010

Surprisingly, chown() on ext4 is not an SMP-scalable operation.
The unconditional orphan_del(NULL, inode) in ext4_setattr() results in
significant performance overhead because of the global orphan mutex,
especially in no-journal mode (where orphan_add() is a no-op).
The explicit orphan_del can be skipped where possible.
Results of fchown() micro-benchmark in no-journal mode
while (1) {
   iteration++;
   fchown(fd, uid, gid);
   fchown(fd, uid + 1, gid + 1);
}
measured: iterations per millisecond
| nr_tasks | w/o patch | with patch |
|        1 |       142 |        185 |
|        4 |       109 |        642 |
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3d287de3

ext4: fix unbalanced mutex unlock in error path of ext4_li_request_new · beed5ecb

Authored by Nicolas Kaiser on Oct 27, 2010

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

beed5ecb

ext4: fix compile error in ext4_fallocate() · a6371b63

Authored by Kazuya Mio on Oct 27, 2010

When I compiled 2.6.36-rc3 kernel with EXT4FS_DEBUG definition, I got
the following compile error.

  CC [M]  fs/ext4/extents.o
fs/ext4/extents.c: In function 'ext4_fallocate':
fs/ext4/extents.c:3772: error: 'block' undeclared (first use in this function)
fs/ext4/extents.c:3772: error: (Each undeclared identifier is reported only once
fs/ext4/extents.c:3772: error: for each function it appears in.)
make[2]: *** [fs/ext4/extents.o] Error 1

The patch fixes this problem.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a6371b63

ext4: move ext4_mb_{get,put}_buddy_cache_lock and make them static · eee4adc7

Authored by Eric Sandeen on Oct 27, 2010

These functions are only used within fs/ext4/mballoc.c, so move them
so they are used after they are defined, and then make them static.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

eee4adc7

ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end() · 61d08673

Authored by Theodore Ts'o on Oct 27, 2010

Fix a namespace leak from fs/ext4.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

61d08673

ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static · 4a873a47

Authored by Theodore Ts'o on Oct 27, 2010

Fix a namespace leak by moving the function to the file where it is
used and making it static.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4a873a47

ext4: rename {ext,idx}_pblock and inline small extent functions · bf89d16f

Authored by Theodore Ts'o on Oct 27, 2010

Clean up namespace leaks from fs/ext4 and inline the trivial functions
ext4_{ext,idx}_pblock() and ext4_{ext,idx}_store_pblock(), since the
code size actually shrinks when we make these functions inline;
they're so trivial.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

bf89d16f