Commits · e7848683ae7ded0a4a8964122a47da9104a98337 · openanolis / cloud-kernel

Jul 31, 2012 · 12 commits

btrfs: Push mnt_want_write() outside of i_mutex · e7848683

Authored by Jan Kara on Jun 12, 2012

When mnt_want_write() starts to handle freezing, it will get full lock
semantics requiring proper lock ordering. So push mnt_want_write() calls
consistently outside of i_mutex.

CC: Chris Mason <chris.mason@oracle.com>
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e7848683

fat: Push mnt_want_write() outside of i_mutex · e24f17da

Authored by Jan Kara on Jun 12, 2012

When mnt_want_write() starts to handle freezing, it will get full lock
semantics requiring proper lock ordering. So push mnt_want_write() calls
outside of i_mutex as in other places.

CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e24f17da

fs: Push mnt_want_write() outside of i_mutex · c30dabfe

Authored by Jan Kara on Jun 12, 2012

Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes
without it. This isn't really a problem because mnt_want_write() is a
non-blocking operation (it essentially has trylock semantics), but when the
function also starts to handle frozen filesystems, it will get full lock
semantics and thus proper lock ordering has to be established. So move
all mnt_want_write() calls outside of i_mutex.

One non-trivial case needing conversion is kern_path_create() /
user_path_create(), which didn't include mnt_want_write() but now needs to
because it acquires i_mutex. Because there are virtual filesystems which
don't bother with freeze / remount-ro protection, we actually provide both
versions of the function: one which calls mnt_want_write() and one which does
not.

[AV: scratch the previous, mnt_want_write() has been moved to kern_path_create()
by now]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c30dabfe
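Once mnt_want_write() can block, the rule in this commit ("write access first, then i_mutex") becomes an ordinary lock-ordering invariant of the kind the kernel's lockdep validator checks. The sketch below is a toy Python model under stated assumptions: freeze protection and i_mutex are treated as plain named locks, and `LockOrderChecker`, `create_file()` and `old_style_create()` are hypothetical illustrations, not kernel code. It only flags direct two-lock inversions, not longer cycles.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order validator (hypothetical, loosely inspired by lockdep).

    Records an edge held -> acquired for every nested acquisition and
    reports an acquisition that reverses an edge seen earlier. Only direct
    two-lock inversions are detected, not longer cycles.
    """

    def __init__(self):
        self.after = defaultdict(set)  # lock -> locks observed taken while it was held
        self.held = []                 # locks currently held, in acquisition order

    def acquire(self, name):
        for h in self.held:
            # `name` was previously taken *before* `h`; taking it after `h` now
            # is the inverted order that can deadlock once both locks block.
            if h in self.after[name]:
                raise RuntimeError(f"lock inversion: {h} -> {name} after {name} -> {h}")
        for h in self.held:
            self.after[h].add(name)
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


def create_file(checker):
    # Order required by the commit: write access (mnt_want_write()) first, then i_mutex.
    checker.acquire("freeze_protection")
    checker.acquire("i_mutex")
    checker.release("i_mutex")
    checker.release("freeze_protection")


def old_style_create(checker):
    # Pre-patch pattern: mnt_want_write() called with i_mutex already held.
    checker.acquire("i_mutex")
    checker.acquire("freeze_protection")
    checker.release("freeze_protection")
    checker.release("i_mutex")
```

Running `create_file()` and then `old_style_create()` against the same checker raises `RuntimeError`: that mix of orders is exactly the inconsistency the patch removes before mnt_want_write() gains blocking semantics.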

mm: Make default vm_ops provide ->page_mkwrite handler · 4fcf1c62

Authored by Jan Kara on Jun 12, 2012

Make default vm_ops provide a ->page_mkwrite handler. Currently it only updates
the file's modification times and gets a locked page, but later it will also handle
filesystem freezing.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4fcf1c62

mm: Update file times from fault path only if .page_mkwrite is not set · 41c4d25f

Authored by Jan Kara on Jun 12, 2012

Filesystems wanting to properly support freezing need to have control over
when file_update_time() is called. After pushing file_update_time()
into all relevant .page_mkwrite implementations, we can just stop calling
file_update_time() when the filesystem implements .page_mkwrite.

Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

41c4d25f
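The rule in this commit fits in a few lines. A minimal Python sketch, assuming a toy inode that just counts timestamp updates; `write_fault()`, `default_page_mkwrite()` and `ToyInode` are hypothetical stand-ins for the fault path and a `.page_mkwrite` handler, not kernel functions. Whichever path runs, the time is updated exactly once, and when `.page_mkwrite` exists the filesystem decides when that happens (so it can take freeze protection first).

```python
class ToyInode:
    """Hypothetical inode stand-in that only counts timestamp updates."""

    def __init__(self):
        self.time_updates = 0


def file_update_time(inode):
    # Stand-in for the kernel's file_update_time(): bump mtime/ctime.
    inode.time_updates += 1


def default_page_mkwrite(inode):
    # Hypothetical handler: the filesystem updates times itself, so it could
    # first take freeze protection (not modeled here) before dirtying the inode.
    file_update_time(inode)


def write_fault(page_mkwrite, inode):
    """Write fault on a shared mapping; page_mkwrite is None if not implemented."""
    if page_mkwrite is not None:
        page_mkwrite(inode)        # handler owns the timestamp update
    else:
        file_update_time(inode)    # legacy path: the fault code updates times
```

A handler that forgets to call `file_update_time()` would silently stop updating times after this change, which is why the sysfs, gfs2, 9p, ceph, block and fb_defio commits in this series push the call into their `.page_mkwrite` handlers.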

sysfs: Push file_update_time() into bin_page_mkwrite() · 14ae417c

Authored by Jan Kara on Jun 12, 2012

CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

14ae417c

gfs2: Push file_update_time() into gfs2_page_mkwrite() · a63e9b2e

Authored by Jan Kara on Jun 12, 2012

CC: Steven Whitehouse <swhiteho@redhat.com>
CC: cluster-devel@redhat.com
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a63e9b2e

9p: Push file_update_time() into v9fs_vm_page_mkwrite() · 120c2bca

Authored by Jan Kara on Jun 12, 2012

CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Ron Minnich <rminnich@sandia.gov>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: v9fs-developer@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

120c2bca

ceph: Push file_update_time() into ceph_page_mkwrite() · 3ca9c3bd

Authored by Jan Kara on Jun 12, 2012

CC: Sage Weil <sage@newdream.net>
CC: ceph-devel@vger.kernel.org
Acked-by: Sage Weil <sage@newdream.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3ca9c3bd

fs: Push file_update_time() into __block_page_mkwrite() · 5e8830dc

Authored by Jan Kara on Jun 12, 2012

Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5e8830dc

fb_defio: Push file_update_time() into fb_deferred_io_mkwrite() · 183fef91

Authored by Jan Kara on Jun 12, 2012

CC: Jaya Kumar <jayalk@intworks.biz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

183fef91

simplify lookup_open()/atomic_open() - do the temporary mnt_want_write() early · 64894cf8

Authored by Al Viro on Jul 31, 2012

The write ref to vfsmount taken in lookup_open()/atomic_open() is going to
be dropped; we take the one to stay in dentry_open(). Just grab the temporary
one in the caller if it looks like we are going to need it (create/truncate/writable
open) and pass (by value) a "has it succeeded" flag. Instead of doing mnt_want_write()
inside, check that flag and treat "false" as "mnt_want_write() has just failed".
mnt_want_write() is cheap and things get considerably simpler and more robust
that way: we get it and drop it in the same function, to start with, rather
than passing a "has something in the guts of really scary functions taken it"
back to the caller.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

64894cf8
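The flag-passing scheme described in this commit can be modeled as follows. This is a toy Python sketch, not the kernel code: `ToyMount`, `do_last()` and `lookup_open()` are hypothetical simplifications that keep only the create decision (write access to existing files is checked later, at open time). The caller takes the temporary write reference when the flags suggest it will be needed, passes the result by value, and drops it in the same function.

```python
import errno
import os

ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR  # access-mode bits of the open flags


class ToyMount:
    """Hypothetical vfsmount stand-in with a write-reference counter."""

    def __init__(self, readonly=False):
        self.readonly = readonly
        self.writers = 0

    def want_write(self):
        # Models mnt_want_write(): fails with -EROFS on a read-only mount.
        if self.readonly:
            return -errno.EROFS
        self.writers += 1
        return 0

    def drop_write(self):
        # Models mnt_drop_write().
        self.writers -= 1


def lookup_open(exists, flags, got_write):
    """Never calls want_write(); got_write=False means "it has just failed"."""
    if exists:
        return "found"      # write access to existing files is checked later, at open time
    if not flags & os.O_CREAT:
        return -errno.ENOENT
    if not got_write:
        return -errno.EROFS
    return "created"


def do_last(mnt, exists, flags):
    # Grab the temporary reference only for create/truncate/writable opens.
    needs_write = bool(flags & (os.O_CREAT | os.O_TRUNC)) or (flags & ACCMODE) != os.O_RDONLY
    got_write = needs_write and mnt.want_write() == 0
    try:
        return lookup_open(exists, flags, got_write)
    finally:
        if got_write:
            mnt.drop_write()  # temporary reference dropped in the same function
```

Because the reference is taken and released in `do_last()` alone, every exit path balances it, which is the robustness argument the commit makes.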

Jul 30, 2012 · 24 commits

fix O_EXCL handling for devices · f8310c59

Authored by Al Viro on Jul 30, 2012

O_EXCL without O_CREAT has different semantics; it's "fail if already opened",
not "fail if already exists". commit 71574865 broke that...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f8310c59
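The two meanings of O_EXCL can be contrasted in a small model. A toy Python sketch, assuming a node is either `None` (path does not exist) or a dict with hypothetical keys `is_blockdev` and `in_use`; `toy_open()` is illustrative only. The block-device case follows the Linux open(2) documentation, where O_EXCL without O_CREAT on a block device fails with EBUSY if the device is in use.

```python
import errno
import os


def toy_open(node, flags):
    """Toy model of the O_EXCL rules; node is None if the path doesn't exist,
    else a dict with hypothetical keys "is_blockdev" and "in_use"."""
    if node is None:
        return "created" if flags & os.O_CREAT else -errno.ENOENT
    if flags & os.O_CREAT and flags & os.O_EXCL:
        return -errno.EEXIST          # O_CREAT|O_EXCL: "fail if already exists"
    if flags & os.O_EXCL and node["is_blockdev"] and node["in_use"]:
        return -errno.EBUSY           # O_EXCL alone on a device: "fail if already opened"
    return "opened"
```

An existing, idle block device opened with plain O_EXCL must still succeed; treating it like O_CREAT|O_EXCL (and returning EEXIST) is the breakage this commit fixes.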

lockd: handle lockowner allocation failure in nlmclnt_proc() · bf884891

Authored by Al Viro on Jul 29, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bf884891

lockd: shift grabbing a reference to nlm_host into nlm_alloc_call() · 446945ab

Authored by Al Viro on Jul 26, 2012

It's used both for client and server hosts; we can't do nlmclnt_release_host()
on failure exits, since the host might need nlmsvc_release_host(), with BUG_ON()
for calling the wrong one. Makes life simpler for callers, actually...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

446945ab

fs: add link restriction audit reporting · a51d9eaa

Authored by Kees Cook on Jul 25, 2012

Adds audit messages for unexpected link restriction violations so that
system owners will have some sort of potentially actionable information
about misbehaving processes.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a51d9eaa

fs: add link restrictions · 800179c9

Authored by Kees Cook on Jul 25, 2012

This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.
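The three conditions above can be written as a single predicate. A toy Python sketch, assuming the caller passes the parent directory's mode and owner, the symlink's owner and the follower's fsuid; `may_follow_link()` here is a hypothetical model, the `protected_symlinks` argument stands in for the sysctl this patch adds, and the `-EACCES` return value is an assumption of the sketch.

```python
import errno
import stat


def may_follow_link(protected_symlinks, dir_mode, dir_uid, link_uid, fsuid):
    """Toy model of the symlink-following rule; returns 0 or a negative errno."""
    if not protected_symlinks:           # stand-in for the sysctl the patch adds
        return 0
    sticky_world_writable = bool(dir_mode & stat.S_ISVTX) and bool(dir_mode & stat.S_IWOTH)
    if not sticky_world_writable:        # outside a sticky world-writable directory
        return 0
    if fsuid == link_uid:                # follower owns the symlink
        return 0
    if dir_uid == link_uid:              # directory owner owns the symlink
        return 0
    return -errno.EACCES                 # error code is an assumption of this sketch
```

For example, with `/tmp` at mode 1777 owned by root, root (fsuid 0) following a symlink owned by uid 1000 is refused, which is the classic privilege-crossing case; uid 1000 following its own link, or anyone following a root-owned link there, is still allowed.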

Some pointers to the history of earlier discussion that I could find:

1996 Aug, Zygo Blaxell
http://marc.info/?l=bugtraq&m=87602167419830&w=2
1996 Oct, Andrew Tridgell
http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
1997 Dec, Albert D Cahalan
http://lkml.org/lkml/1997/12/16/4
2005 Feb, Lorenzo Hernández García-Hierro
http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
2010 May, Kees Cook
https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

- Violates POSIX.
  - POSIX didn't consider this situation and it's not useful to follow
    a broken specification at the cost of security.
- Might break unknown applications that use this feature.
  - Applications that break because of the change are easy to spot and
    fix. Applications that are vulnerable to symlink ToCToU by not having
    the change aren't. Additionally, no applications have yet been found
    that rely on this behavior.
- Applications should just use mkstemp() or O_CREAT|O_EXCL.
  - True, but applications are not perfect, and new software is written
    all the time that makes these mistakes; blocking this flaw at the
    kernel is a single solution to the entire class of vulnerability.
- This should live in the core VFS.
  - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
- This should live in an LSM.
  - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.
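The hardlink rule as stated above reduces to an ownership-or-access test. A toy Python sketch that models only the two conditions named in this message; `may_linkat()` is hypothetical, `protected_hardlinks` stands in for the sysctl this patch adds, access checks are passed in as booleans, and the `-EPERM` return value is an assumption of the sketch.

```python
import errno


def may_linkat(protected_hardlinks, fsuid, file_uid, can_read, can_write):
    """Toy model of the hardlink-creation rule as stated above; 0 or -errno."""
    if not protected_hardlinks:          # stand-in for the sysctl the patch adds
        return 0
    if fsuid == file_uid:                # caller already owns the file
        return 0
    if can_read and can_write:           # caller already has read/write access
        return 0
    return -errno.EPERM                  # error code is an assumption of this sketch
```

Under this rule an unprivileged user can no longer "pin" a root-owned setuid binary that they can read but not write, since neither condition holds.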

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

800179c9

vfs: don't let do_last pass negative dentry to audit_inode · 3134f37e

Authored by Jeff Layton on Jul 25, 2012

I can reliably reproduce the following panic by simply setting an audit
rule on a recent 3.5.0+ kernel:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
 IP: [<ffffffff810d1250>] audit_copy_inode+0x10/0x90
 PGD 7acd9067 PUD 7b8fb067 PMD 0
 Oops: 0000 [#86] SMP
 Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc tpm_bios btrfs zlib_deflate libcrc32c kvm_amd kvm joydev virtio_net pcspkr i2c_piix4 floppy virtio_balloon microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
 CPU 0
 Pid: 1286, comm: abrt-dump-oops Tainted: G      D      3.5.0+ #1 Bochs Bochs
 RIP: 0010:[<ffffffff810d1250>]  [<ffffffff810d1250>] audit_copy_inode+0x10/0x90
 RSP: 0018:ffff88007aebfc38  EFLAGS: 00010282
 RAX: 0000000000000000 RBX: ffff88003692d860 RCX: 00000000000038c4
 RDX: 0000000000000000 RSI: ffff88006baf5d80 RDI: ffff88003692d860
 RBP: ffff88007aebfc68 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
 R13: ffff880036d30f00 R14: ffff88006baf5d80 R15: ffff88003692d800
 FS:  00007f7562634740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000040 CR3: 000000003643d000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process abrt-dump-oops (pid: 1286, threadinfo ffff88007aebe000, task ffff880079614530)
 Stack:
  ffff88007aebfdf8 ffff88007aebff28 ffff88007aebfc98 ffffffff81211358
  ffff88003692d860 0000000000000000 ffff88007aebfcc8 ffffffff810d4968
  ffff88007aebfcc8 ffff8800000038c4 0000000000000000 0000000000000000
 Call Trace:
  [<ffffffff81211358>] ? ext4_lookup+0xe8/0x160
  [<ffffffff810d4968>] __audit_inode+0x118/0x2d0
  [<ffffffff811955a9>] do_last+0x999/0xe80
  [<ffffffff81191fe8>] ? inode_permission+0x18/0x50
  [<ffffffff81171efa>] ? kmem_cache_alloc_trace+0x11a/0x130
  [<ffffffff81195b4a>] path_openat+0xba/0x420
  [<ffffffff81196111>] do_filp_open+0x41/0xa0
  [<ffffffff811a24bd>] ? alloc_fd+0x4d/0x120
  [<ffffffff811855cd>] do_sys_open+0xed/0x1c0
  [<ffffffff810d40cc>] ? __audit_syscall_entry+0xcc/0x300
  [<ffffffff811856c1>] sys_open+0x21/0x30
  [<ffffffff81611ca9>] system_call_fastpath+0x16/0x1b
  RSP <ffff88007aebfc38>
 CR2: 0000000000000040

The problem is that do_last is passing a negative dentry to audit_inode.
The comments on lookup_open note that it can pass back a negative dentry
if O_CREAT is not set.

This patch fixes the oops, but I'm not clear on whether there's a better
approach.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3134f37e

brcm80211: pointless current->files passed to filp_close() · 0b5306b3

Authored by Al Viro on Jul 22, 2012

... only needed if it's been in descriptor table
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0b5306b3

sound_firmware: don't pass crap to filp_close() · 58609306

Authored by Al Viro on Jul 22, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

58609306

gadgetfs: clean up · 20818a0c

Authored by Al Viro on Jul 22, 2012

sigh...
* opened files have non-NULL dentries and non-NULL inodes
* filp_close() needs current->files only if the file had been
in descriptor table.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

20818a0c

slightly reduce lossage in gdm72xx · 09fada5b

Authored by Al Viro on Jul 22, 2012

* filp_close() needs non-NULL second argument only if it'd been in descriptor
table
* opened files have non-NULL dentries, TYVM
* ... and those dentries are positive - it's kinda hard to open a file that
doesn't exist.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

09fada5b

slightly reduce idiocy in drivers/staging/bcm/Misc.c · 32aecdd3

Authored by Al Viro on Jul 22, 2012

a) vfs_llseek() does *not* access userland pointers of any kind
b) neither does filp_close(), for that matter
c) ... nor filp_open()
d) vfs_read() does, but we do have a wrapper for that (kernel_read()),
so there's no need to reinvent it.
e) passing current->files to filp_close() on something that never
had been in descriptor table is pointless.

ISAGN: voodoo dolls to be used on voodoo programmers...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

32aecdd3

consolidate pipe file creation · e4fad8e5

Authored by Al Viro on Jul 21, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e4fad8e5

take grabbing f->f_path to do_dentry_open() · b5bcdda3

Authored by Al Viro on Jul 20, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b5bcdda3

uninline file_free_rcu() · 5c33b183

Authored by Al Viro on Jul 20, 2012

What inline?  Its only use is passing its address to call_rcu(), for fuck sake!
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5c33b183

ecryptfs_lookup_interpose(): allocate dentry_info first · 0b1d9011

Authored by Al Viro on Jul 20, 2012

less work on failure that way
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0b1d9011

sanitize ecryptfs_lookup() · bc65a121

Authored by Al Viro on Jul 20, 2012

* ->lookup() never gets hit with . or ..
* dentry it gets is unhashed, so unless we had gone and hashed it ourselves, there's
no need to d_drop() the sucker.
* wrong name printed in one of the printks (NULL, in fact)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bc65a121

clean unix_bind() up a bit · faf02010

Authored by Al Viro on Jul 20, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

faf02010

pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp. · a8104a9f

Authored by Al Viro on Jul 20, 2012

One side effect - attempt to create a cross-device link on a read-only fs fails
with EROFS instead of EXDEV now. Makes more sense, POSIX allows, etc.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a8104a9f
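The EROFS-versus-EXDEV side effect follows purely from the order of the two checks. A toy Python sketch, assuming mounts are identified by name and a set lists the read-only ones; `toy_linkat()` is a hypothetical model of link(2) error ordering, not the kernel function. Because the write-access check now happens while looking up the destination, it fires before the cross-device comparison.

```python
import errno


def toy_linkat(old_mnt, new_mnt, readonly_mounts):
    """Hypothetical model of link(2) error ordering after this commit."""
    # 1. kern_path_create() on the destination now does mnt_want_write() first.
    if new_mnt in readonly_mounts:
        return -errno.EROFS
    # 2. Only afterwards does the link code compare the two mounts.
    if old_mnt != new_mnt:
        return -errno.EXDEV
    return 0
```

A cross-device link into a read-only mount therefore reports `EROFS`, while a cross-device link into a writable mount still reports `EXDEV`.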

mknod: take sanity checks on mode into the very beginning · 8e4bfca1

Authored by Al Viro on Jul 20, 2012

Note that applying umask can't affect their results. While
that affects errno in cases like
	mknod("/no_such_directory/a", 030000)
yielding -EINVAL (due to impossible mode_t) instead of
-ENOENT (due to nonexistent directory), IMO that makes a lot
more sense, POSIX allows returning either, and any software
that relies on getting -ENOENT instead of -EINVAL in that
case deserves everything it gets.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8e4bfca1
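Moving the mode check ahead of the path lookup can be sketched as below. A toy Python model: `mode_type_check()` and `toy_mknod()` are hypothetical, the set of accepted file types (and `EPERM` for `S_IFDIR`) is an assumption of the sketch, and directories are modeled as a plain set of path strings. Since umask only clears permission bits (0o777), it can never change the `S_IFMT` bits being checked.

```python
import errno
import stat


def mode_type_check(mode):
    """Toy version of the early mknod mode sanity check; 0 or -errno."""
    kind = stat.S_IFMT(mode)
    if kind in (0, stat.S_IFREG, stat.S_IFCHR, stat.S_IFBLK, stat.S_IFIFO, stat.S_IFSOCK):
        return 0
    if kind == stat.S_IFDIR:
        return -errno.EPERM          # directories are created with mkdir(), not mknod()
    return -errno.EINVAL             # impossible file type, e.g. 030000


def toy_mknod(path, mode, existing_dirs):
    err = mode_type_check(mode)      # checked first; umask only clears 0o777 bits
    if err:
        return err
    parent = path.rsplit("/", 1)[0] or "/"
    if parent not in existing_dirs:
        return -errno.ENOENT
    return 0
```

With this ordering, `toy_mknod("/no_such_directory/a", 0o30000, {"/"})` yields `-EINVAL` rather than `-ENOENT`, matching the example in the commit message.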

new helper: done_path_create() · 921a1650

Authored by Al Viro on Jul 20, 2012

releases what needs to be released after {kern,user}_path_create()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

921a1650
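The create/done pairing can be pictured as an acquire/release pair wrapped in a context manager. A toy Python sketch: `ToyMount`, `ToyDir`, `path_create()`, `finish_path_create()` and `creating()` are hypothetical names; it assumes the ordering from the surrounding commits (write access, then the parent's i_mutex) and releases in reverse. The real helpers also manage dentry and path references, which this model omits.

```python
import errno
import threading
from contextlib import contextmanager


class ToyMount:
    def __init__(self, readonly=False):
        self.readonly = readonly
        self.writers = 0                 # models outstanding mnt_want_write() refs


class ToyDir:
    def __init__(self, mnt):
        self.mnt = mnt
        self.i_mutex = threading.Lock()  # models the parent directory's i_mutex
        self.entries = set()


def path_create(parent):
    """Hypothetical model of kern_path_create(): write access, then i_mutex."""
    if parent.mnt.readonly:
        raise OSError(errno.EROFS, "read-only mount")
    parent.mnt.writers += 1              # mnt_want_write()
    parent.i_mutex.acquire()             # lock the parent directory
    return parent


def finish_path_create(parent):
    """Hypothetical model of done_path_create(): release in reverse order."""
    parent.i_mutex.release()
    parent.mnt.writers -= 1              # mnt_drop_write()


@contextmanager
def creating(parent):
    p = path_create(parent)
    try:
        yield p
    finally:
        finish_path_create(p)
```

Usage: `with creating(d) as p: p.entries.add("sock")` creates an entry and leaves both the lock and the write count balanced, even if the body raises.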

pull unlock+dput() out into do_spu_create() · 25b2692a

Authored by Al Viro on Jul 19, 2012

... and clean up spufs_create() a bit while we are at it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

25b2692a

spufs: pull unlock-and-dput() up into spufs_create() · 1ba44cc9

Authored by Al Viro on Jul 19, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1ba44cc9

spufs_create_context(): simplify failure exits · 66ec7b2c

Authored by Al Viro on Jul 19, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

66ec7b2c

move spu_forget() into spufs_rmdir() · 67cba9fd

Authored by Al Viro on Jul 19, 2012

now that __fput() is *not* done in any callchain containing mmput(),
we can do that...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

67cba9fd

Jul 23, 2012 · 4 commits

ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file() · 8cae6f71

Authored by Al Viro on Jul 19, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8cae6f71

btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file() · 11e62a8f

Authored by Al Viro on Jul 19, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

11e62a8f

switch dentry_open() to struct path, make it grab references itself · 765927b2

Authored by Al Viro on Jun 26, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

765927b2

spufs: shift dget/mntget towards dentry_open() · bf349a44

Authored by Al Viro on Jun 25, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bf349a44

openanolis / cloud-kernel: synced successfully more than 1 year ago
