提交 · 2c3de1c2d7d68c6ba4c1ecd82c68285f34d9609e · openanolis / cloud-kernel

28 3月, 2013 1 次提交

vfs/splice: Fix missed checks in new __kernel_write() helper · 3e84f48e

由 Al Viro 提交于 3月 27, 2013

Commit 06ae43f3 ("Don't bother with redoing rw_verify_area() from
default_file_splice_from()") lost the checks to test existence of the
write/aio_write methods.  My apologies ;-/

Eventually, we want that in fs/splice.c side of things (no point
repeating it for every buffer, after all), but for now this is the
obvious minimal fix.
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3e84f48e

27 3月, 2013 5 次提交

userns: Restrict when proc and sysfs can be mounted · 87a8ebd6

由 Eric W. Biederman 提交于 3月 24, 2013

Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.

proc and sysfs are interesting because they have content that is
per namespace, and so fresh mounts are needed when new namespaces
are created while at the same time proc and sysfs have content that
is shared between every instance.

Respect the policy of who may see the shared content of proc and sysfs
by only allowing new mounts if there was an existing mount at the time
the user namespace was created.

In practice there are only two interesting cases: proc and sysfs are
mounted at their usual places, proc and sysfs are not mounted at all
(some form of mount namespace jail).

Cc: stable@vger.kernel.org
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

87a8ebd6

vfs: Carefully propogate mounts across user namespaces · 132c94e3

由 Eric W. Biederman 提交于 3月 22, 2013

As a matter of policy MNT_READONLY should not be changable if the
original mounter had more privileges than creator of the mount
namespace.

Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
a mount namespace that requires more privileges to a mount namespace
that requires fewer privileges.

When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT
if any of the mnt flags that should never be changed are set.

This protects both mount propagation and the initial creation of a less
privileged mount namespace.

Cc: stable@vger.kernel.org
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

132c94e3

vfs: Add a mount flag to lock read only bind mounts · 90563b19

由 Eric W. Biederman 提交于 3月 22, 2013

When a read-only bind mount is copied from mount namespace in a higher
privileged user namespace to a mount namespace in a lesser privileged
user namespace, it should not be possible to remove the the read-only
restriction.

Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
remain read-only.

CC: stable@vger.kernel.org
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

90563b19

userns: Don't allow creation if the user is chrooted · 3151527e

由 Eric W. Biederman 提交于 3月 15, 2013

Guarantee that the policy of which files may be access that is
established by setting the root directory will not be violated
by user namespaces by verifying that the root directory points
to the root of the mount namespace at the time of user namespace
creation.

Changing the root is a privileged operation, and as a matter of policy
it serves to limit unprivileged processes to files below the current
root directory.

For reasons of simplicity and comprehensibility the privilege to
change the root directory is gated solely on the CAP_SYS_CHROOT
capability in the user namespace.  Therefore when creating a user
namespace we must ensure that the policy of which files may be access
can not be violated by changing the root directory.

Anyone who runs a processes in a chroot and would like to use user
namespace can setup the same view of filesystems with a mount
namespace instead.  With this result that this is not a practical
limitation for using user namespaces.

Cc: stable@vger.kernel.org
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

3151527e

Nest rename_lock inside vfsmount_lock · 7ea600b5

由 Al Viro 提交于 3月 26, 2013

... lest we get livelocks between path_is_under() and d_path() and friends.

The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks;
it is possible to have thread B spin on attempt to take lock shared while thread
A is already holding it shared, if B is on lower-numbered CPU than A and there's
a thread C spinning on attempt to take the same lock exclusive.

As the result, we need consistent ordering between vfsmount_lock (lglock) and
rename_lock (seq_lock), even though everything that takes both is going to take
vfsmount_lock only shared.
Spotted-by: NBrad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7ea600b5

23 3月, 2013 2 次提交

nfsd: fix bad offset use · e49dbbf3

由 Kent Overstreet 提交于 3月 22, 2013

vfs_writev() updates the offset argument - but the code then passes the
offset to vfs_fsync_range(). Since offset now points to the offset after
what was just written, this is probably not what was intended

Introduced by face1502 "nfsd: use
vfs_fsync_range(), not O_SYNC, for stable writes".
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Reviewed-by: NZach Brown <zab@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

e49dbbf3

vfs,proc: guarantee unique inodes in /proc · 51f0885e

由 Linus Torvalds 提交于 3月 22, 2013

Dave Jones found another /proc issue with his Trinity tool: thanks to
the namespace model, we can have multiple /proc dentries that point to
the same inode, aliasing directories in /proc/<pid>/net/ for example.

This ends up being a total disaster, because it acts like hardlinked
directories, and causes locking problems. We rely on the topological
sort of the inodes pointed to by dentries, and if we have aliased
directories, that odering becomes unreliable.

In short: don't do this. Multiple dentries with the same (directory)
inode is just a bad idea, and the namespace code should never have
exposed things this way. But we're kind of stuck with it.

This solves things by just always allocating a new inode during /proc
dentry lookup, instead of using "iget_locked()" to look up existing
inodes by superblock and number. That actually simplies the code a bit,
at the cost of potentially doing more inode [de]allocations.

That said, the inode lookup wasn't free either (and did a lot of locking
of inodes), so it is probably not that noticeable. We could easily keep
the old lookup model for non-directory entries, but rather than try to
be excessively clever this just implements the minimal and simplest
workaround for the problem.
Reported-and-tested-by: NDave Jones <davej@redhat.com>
Analyzed-by: NAl Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

51f0885e

22 3月, 2013 2 次提交

cifs: ignore everything in SPNEGO blob after mechTypes · f853c616

由 Jeff Layton 提交于 3月 11, 2013

We've had several reports of people attempting to mount Windows 8 shares
and getting failures with a return code of -EINVAL. The default sec=
mode changed recently to sec=ntlmssp. With that, we expect and parse a
SPNEGO blob from the server in the NEGOTIATE reply.

The current decode_negTokenInit function first parses all of the
mechTypes and then tries to parse the rest of the negTokenInit reply.
The parser however currently expects a mechListMIC or nothing to follow the
mechTypes, but Windows 8 puts a mechToken field there instead to carry
some info for the new NegoEx stuff.

In practice, we don't do anything with the fields after the mechTypes
anyway so I don't see any real benefit in continuing to parse them.
This patch just has the kernel ignore the fields after the mechTypes.
We'll probably need to reinstate some of this if we ever want to support
NegoEx.
Reported-by: NJason Burgess <jason@jacknife2.dns2go.com>
Reported-by: NYan Li <elliot.li.tech@gmail.com>
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

f853c616

Don't bother with redoing rw_verify_area() from default_file_splice_from() · 06ae43f3

由 Al Viro 提交于 3月 20, 2013

default_file_splice_from() ends up calling vfs_write() (via very convoluted
callchain). It's an overkill, since we already have done rw_verify_area()
in the caller by the time we call vfs_write() we are under set_fs(KERNEL_DS),
so access_ok() is also pointless. Add a new helper (__kernel_write()),
use it instead of kernel_write() in there.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

06ae43f3

21 3月, 2013 5 次提交

NFSv4.1: Add a helper pnfs_commit_and_return_layout · 24028672

由 Trond Myklebust 提交于 3月 20, 2013

In order to be able to safely return the layout in nfs4_proc_setattr,
we need to block new uses of the layout, wait for all outstanding
users of the layout to complete, commit the layout and then return it.

This patch adds a helper in order to do all this safely.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>

24028672

NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn · 24956804

由 Trond Myklebust 提交于 3月 20, 2013

Note that clearing NFS_INO_LAYOUTCOMMIT is tricky, since it requires
you to also clear the NFS_LSEG_LAYOUTCOMMIT bits from the layout
segments.
The only two sites that need to do this are the ones that call
pnfs_return_layout() without first doing a layout commit.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: NBenny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org

24956804

NFSv4.1: Fix a race in pNFS layoutcommit · a073dbff

由 Trond Myklebust 提交于 3月 20, 2013

We need to clear the NFS_LSEG_LAYOUTCOMMIT bits atomically with the
NFS_INO_LAYOUTCOMMIT bit, otherwise we may end up with situations
where the two are out of sync.
The first half of the problem is to ensure that pnfs_layoutcommit_inode
clears the NFS_LSEG_LAYOUTCOMMIT bit through pnfs_list_write_lseg.
We still need to keep the reference to those segments until the RPC call
is finished, so in order to make it clear _where_ those references come
from, we add a helper pnfs_list_write_lseg_done() that cleans up after
pnfs_list_write_lseg.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: NBenny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org

a073dbff

pnfs-block: removing DM device maybe cause oops when call dev_remove · 4376c946

由 fanchaoting 提交于 3月 21, 2013

when pnfs block using device mapper,if umounting later,it maybe
cause oops. we apply "1 + sizeof(bl_umount_request)" memory for
msg->data, the memory maybe overflow when we do "memcpy(&dataptr
[sizeof(bl_msg)], &bl_umount_request, sizeof(bl_umount_request))",
because the size of bl_msg is more than 1 byte.

Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com>
Cc: stable@vger.kernel.org
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

4376c946

NFSv4: Fix the string length returned by the idmapper · cf4ab538

由 Trond Myklebust 提交于 3月 08, 2013

Functions like nfs_map_uid_to_name() and nfs_map_gid_to_group() are
expected to return a string without any terminating NUL character.
Regression introduced by commit 57e62324
(NFS: Store the legacy idmapper result in the keyring).
Reported-by: NDave Chiluk <dave.chiluk@canonical.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org [>=3.4]

cf4ab538

20 3月, 2013 2 次提交

ext4: fix data=journal fast mount/umount hang · 2b405bfa

由 Theodore Ts'o 提交于 3月 20, 2013

In data=journal mode, if we unmount the file system before a
transaction has a chance to complete, when the journal inode is being
evicted, we can end up calling into jbd2_log_wait_commit() for the
last transaction, after the journalling machinery has been shut down.

Arguably we should adjust ext4_should_journal_data() to return FALSE
for the journal inode, but the only place it matters is
ext4_evict_inode(), and so to save a bit of CPU time, and to make the
patch much more obviously correct by inspection(tm), we'll fix it by
explicitly not trying to waiting for a journal commit when we are
evicting the journal inode, since it's guaranteed to never succeed in
this case.

This can be easily replicated via: 

     mount -t ext4 -o data=journal /dev/vdb /vdb ; umount /vdb

------------[ cut here ]------------
WARNING: at /usr/projects/linux/ext4/fs/jbd2/journal.c:542 __jbd2_log_start_commit+0xba/0xcd()
Hardware name: Bochs
JBD2: bad log_start_commit: 3005630206 3005630206 0 0
Modules linked in:
Pid: 2909, comm: umount Not tainted 3.8.0-rc3 #1020
Call Trace:
 [<c015c0ef>] warn_slowpath_common+0x68/0x7d
 [<c02b7e7d>] ? __jbd2_log_start_commit+0xba/0xcd
 [<c015c177>] warn_slowpath_fmt+0x2b/0x2f
 [<c02b7e7d>] __jbd2_log_start_commit+0xba/0xcd
 [<c02b8075>] jbd2_log_start_commit+0x24/0x34
 [<c0279ed5>] ext4_evict_inode+0x71/0x2e3
 [<c021f0ec>] evict+0x94/0x135
 [<c021f9aa>] iput+0x10a/0x110
 [<c02b7836>] jbd2_journal_destroy+0x190/0x1ce
 [<c0175284>] ? bit_waitqueue+0x50/0x50
 [<c028d23f>] ext4_put_super+0x52/0x294
 [<c020efe3>] generic_shutdown_super+0x48/0xb4
 [<c020f071>] kill_block_super+0x22/0x60
 [<c020f3e0>] deactivate_locked_super+0x22/0x49
 [<c020f5d6>] deactivate_super+0x30/0x33
 [<c0222795>] mntput_no_expire+0x107/0x10c
 [<c02233a7>] sys_umount+0x2cf/0x2e0
 [<c02233ca>] sys_oldumount+0x12/0x14
 [<c08096b8>] syscall_call+0x7/0xb
---[ end trace 6a954cc790501c1f ]---
jbd2_log_wait_commit: error: j_commit_request=-1289337090, tid=0
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org

2b405bfa

ext4: fix ext4_evict_inode() racing against workqueue processing code · 1ada47d9

由 Theodore Ts'o 提交于 3月 20, 2013

Commit 84c17543 (ext4: move work from io_end to inode) triggered a
regression when running xfstest #270 when the file system is mounted
with dioread_nolock.

The problem is that after ext4_evict_inode() calls ext4_ioend_wait(),
this guarantees that last io_end structure has been freed, but it does
not guarantee that the workqueue structure, which was moved into the
inode by commit 84c17543, is actually finished.  Once
ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this
will allow ext4_ioend_wait() to return on CPU #2, at which point the
evict_inode() codepath can race against the workqueue code on CPU #1
accessing EXT4_I(inode)->i_unwritten_work to find the next item of
work to do.

Fix this by calling cancel_work_sync() in ext4_ioend_wait(), which
will be renamed ext4_ioend_shutdown(), since it is only used by
ext4_evict_inode().  Also, move the call to ext4_ioend_shutdown()
until after truncate_inode_pages() and filemap_write_and_wait() are
called, to make sure all dirty pages have been written back and
flushed from the page cache first.

BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e
*pdpt = 0000000030bc3001 *pde = 0000000000000000 
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in:
Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c17543-dirty #91 Bochs Bochs
EIP: 0060:[<c01dda6a>] EFLAGS: 00010046 CPU: 0
EIP is at cwq_activate_delayed_work+0x3b/0x7e
EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000
ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000)
Stack:
 f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d
 f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0
 f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000
Call Trace:
 [<c01de3d7>] cwq_dec_nr_in_flight+0x71/0xfb
 [<c01def1d>] process_one_work+0x5d8/0x637
 [<c040a10b>] ? ext4_end_bio+0x300/0x300
 [<c01e3105>] worker_thread+0x249/0x3ef
 [<c01ea317>] kthread+0xd8/0xeb
 [<c01e2ebc>] ? manage_workers+0x4bb/0x4bb
 [<c023a370>] ? trace_hardirqs_on+0x27/0x37
 [<c0f1b4b7>] ret_from_kernel_thread+0x1b/0x28
 [<c01ea23f>] ? __init_kthread_worker+0x71/0x71
Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff <8b> 13 89 f0 83 05 b8 ff 6c c1
 6c c1 00 31 c9 83
EIP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84
CR2: 0000000000000000
---[ end trace a1923229da53d8a4 ]---
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>

1ada47d9

19 3月, 2013 5 次提交

nfsd: fix startup order in nfsd_reply_cache_init · ac534ff2

由 Jeff Layton 提交于 3月 15, 2013

If we end up doing "goto out_nomem" in this function, we'll call
nfsd_reply_cache_shutdown. That will attempt to walk the LRU list and
free entries, but that list may not be initialized yet if the server is
starting up for the first time. It's also possible for the shrinker to
kick in before we've initialized the LRU list.

Rearrange the initialization so that the LRU list_head and cache size
are initialized before doing any of the allocations that might fail.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

ac534ff2

nfsd: only unhash DRC entries that are in the hashtable · a517b608

由 Jeff Layton 提交于 3月 18, 2013

It's not safe to call hlist_del() on a newly initialized hlist_node.
That leads to a NULL pointer dereference. Only do that if the entry
is hashed.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

a517b608

xfs: ensure we capture IO errors correctly · e0018738

由 Dave Chinner 提交于 3月 12, 2013

Failed buffer readahead can leave the buffer in the cache marked
with an error. Most callers that then issue a subsequent read on the
buffer do not zero the b_error field out, and so we may incorectly
detect an error during IO completion due to the stale error value
left on the buffer.

Avoid this problem by zeroing the error before IO submission. This
ensures that the only IO errors that are detected those captured
from are those captured from bio submission or completion.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

(cherry picked from commit c163f9a1)

e0018738

xfs: fix xfs_iomap_eof_prealloc_initial_size type · 3325beed

由 Mark Tinguely 提交于 2月 24, 2013

Fix the return type of xfs_iomap_eof_prealloc_initial_size() to
xfs_fsblock_t to reflect the fact that the return value may be an
unsigned 64 bits if XFS_BIG_BLKNOS is defined.
Signed-off-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

(cherry picked from commit e8108ced)

3325beed

xfs: fix potential infinite loop in xfs_iomap_prealloc_size() · 83cdadd8

由 Brian Foster 提交于 2月 22, 2013

If freesp == 0, we could end up in an infinite loop while squashing
the preallocation. Break the loop when we've killed the prealloc
entirely.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

(cherry picked from commit e78c420b)

83cdadd8

18 3月, 2013 1 次提交

ext4: fix memory leakage in mext_check_coverage · 0e401101

由 Dmitry Monakhov 提交于 3月 18, 2013

Regression was introduced by following commit 8c854473
TESTCASE (git://oss.sgi.com/xfs/cmds/xfstests.git):
#while true;do ./check 301 || break ;done

Also fix potential memory leakage in get_ext_path() once
ext4_ext_find_extent() have failed.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

0e401101

16 3月, 2013 1 次提交

Btrfs: fix warning of free_extent_map · 3b277594

由 Liu Bo 提交于 3月 15, 2013

Users report that an extent map's list is still linked when it's actually
going to be freed from cache.

The story is that

a) when we're going to drop an extent map and may split this large one into
smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
that it's on the list to be logged, then the smaller ems split from it will also
be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.

b) we'll keep ems from unlinking the list and freeing when they are flagged with
EXTENT_FLAG_LOGGING, because the log code holds one reference.

The end result is the warning, but the truth is that we set the flag
EXTENT_FLAG_LOGGING only during fsync.

So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.
Reported-by: NJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Reported-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

3b277594

15 3月, 2013 6 次提交

Btrfs: fix warning when creating snapshots · 7c2ec3f0

由 Liu Bo 提交于 3月 13, 2013

Creating snapshot passes extent_root to commit its transaction,
but it can lead to the warning of checking root for quota in
the __btrfs_end_transaction() when someone else is committing
the current transaction.  Since we've recorded the needed root
in trans_handle, just use it to get rid of the warning.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

7c2ec3f0

Btrfs: return as soon as possible when edquot happens · 720f1e20

由 Wang Shilong 提交于 3月 06, 2013

If one of qgroup fails to reserve firstly, we should return immediately,
it is unnecessary to continue check.
Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

720f1e20

Btrfs: return EIO if we have extent tree corruption · 492104c8

由 Josef Bacik 提交于 3月 08, 2013

The callers of lookup_inline_extent_info all handle getting an error back
properly, so return an error if we have corruption instead of being a jerk and
panicing.  Still WARN_ON() since this is kind of crucial and I've been seeing it
a bit too much recently for my taste, I think we're doing something wrong
somewhere.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

492104c8

btrfs: use rcu_barrier() to wait for bdev puts at unmount · bc178622

由 Eric Sandeen 提交于 3月 09, 2013

Doing this would reliably fail with -EBUSY for me:

# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy

because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.

Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:

btrfs_close_devices
	__btrfs_close_devices
		call_rcu(&device->rcu, free_device);
			free_device
				INIT_WORK(&device->rcu_work, __free_device);
				schedule_work(&device->rcu_work);

so unmount might complete before __free_device fires & does its blkdev_put.

Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.

Cc: stable@vger.kernel.org
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

bc178622

Btrfs: remove btrfs_try_spin_lock · d340d247

由 Liu Bo 提交于 3月 11, 2013

Remove a useless function declaration
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

d340d247

Btrfs: get better concurrency for snapshot-aware defrag work · a09a0a70

由 Liu Bo 提交于 3月 11, 2013

Using spinning case instead of blocking will result in better concurrency
overall.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

a09a0a70

14 3月, 2013 4 次提交

nfsd: convert to idr_alloc() · ebd6c707

由 Tejun Heo 提交于 3月 13, 2013

idr_get_new*() and friends are about to be deprecated.  Convert to the
new idr_alloc() interface.

Only compile-tested.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NJ. Bruce Fields <bfields@redhat.com>
Tested-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ebd6c707

nfsd: remove unused get_new_stid() · 801cb2d6

由 Tejun Heo 提交于 3月 13, 2013

get_new_stid() is no longer used since commit 3abdb607 ("nfsd4:
simplify idr allocation").  Remove it.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

801cb2d6

cifs: delay super block destruction until all cifsFileInfo objects are gone · 24261fc2

由 Mateusz Guzik 提交于 3月 08, 2013

cifsFileInfo objects hold references to dentries and it is possible that
these will still be around in workqueues when VFS decides to kill super
block during unmount.

This results in panics like this one:
BUG: Dentry ffff88001f5e76c0{i=66b4a,n=1M-2} still in use (1) [unmount of cifs cifs]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:943!
[..]
Process umount (pid: 1781, threadinfo ffff88003d6e8000, task ffff880035eeaec0)
[..]
Call Trace:
 [<ffffffff811b44f3>] shrink_dcache_for_umount+0x33/0x60
 [<ffffffff8119f7fc>] generic_shutdown_super+0x2c/0xe0
 [<ffffffff8119f946>] kill_anon_super+0x16/0x30
 [<ffffffffa036623a>] cifs_kill_sb+0x1a/0x30 [cifs]
 [<ffffffff8119fcc7>] deactivate_locked_super+0x57/0x80
 [<ffffffff811a085e>] deactivate_super+0x4e/0x70
 [<ffffffff811bb417>] mntput_no_expire+0xd7/0x130
 [<ffffffff811bc30c>] sys_umount+0x9c/0x3c0
 [<ffffffff81657c19>] system_call_fastpath+0x16/0x1b

Fix this by making each cifsFileInfo object hold a reference to cifs
super block, which implicitly keeps VFS super block around as well.
Signed-off-by: NMateusz Guzik <mguzik@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-and-Tested-by: NBen Greear <greearb@candelatech.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

24261fc2

cifs: map NT_STATUS_SHARING_VIOLATION to EBUSY instead of ETXTBSY · 47c78f4a

由 Sachin Prabhu 提交于 3月 11, 2013

NT_SHARING_VIOLATION errors are mapped to ETXTBSY which is unexpected
for operations such as unlink where we can hit these errors.

The patch maps the error NT_SHARING_VIOLATION to EBUSY instead. The
patch also replaces all instances of ETXTBSY in
cifs_rename_pending_delete() with EBUSY.
Signed-off-by: NSachin Prabhu <sprabhu@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

47c78f4a

13 3月, 2013 4 次提交

ext2: Fix BUG_ON in evict() on inode deletion · c288d296

由 Jan Kara 提交于 3月 13, 2013

Commit 8e3dffc6 introduced a regression where deleting inode with
large extended attributes leads to triggering
  BUG_ON(inode->i_state != (I_FREEING | I_CLEAR))
in fs/inode.c:evict(). That happens because freeing of xattr block
dirtied the inode and it happened after clear_inode() has been called.

Fix the issue by moving removal of xattr block into ext2_evict_inode()
before clear_inode() call close to a place where data blocks are
truncated. That is also more logical place and removes surprising
requirement that ext2_free_blocks() mustn't dirty the inode.
Reported-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NJan Kara <jack@suse.cz>

c288d296

fs: Readd the fs module aliases. · fa7614dd

由 Eric W. Biederman 提交于 3月 12, 2013

I had assumed that the only use of module aliases for filesystems
prior to "fs: Limit sys_mount to only request filesystem modules."
was in request_module.  It turns out I was wrong.  At least mkinitcpio
in Arch linux uses these aliases.

So readd the preexising aliases, to keep from breaking userspace.

Userspace eventually will have to follow and use the same aliases the
kernel does.  So at some point we may be delete these aliases without
problems.  However that day is not today.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

fa7614dd

Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys · 8aec0f5d

由 Mathieu Desnoyers 提交于 2月 25, 2013

Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
compat_process_vm_rw() shows that the compatibility code requires an
explicit "access_ok()" check before calling
compat_rw_copy_check_uvector(). The same difference seems to appear when
we compare fs/read_write.c:do_readv_writev() to
fs/compat.c:compat_do_readv_writev().

This subtle difference between the compat and non-compat requirements
should probably be debated, as it seems to be error-prone. In fact,
there are two others sites that use this function in the Linux kernel,
and they both seem to get it wrong:

Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
also ends up calling compat_rw_copy_check_uvector() through
aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
be missing. Same situation for
security/keys/compat.c:compat_keyctl_instantiate_key_iov().

I propose that we add the access_ok() check directly into
compat_rw_copy_check_uvector(), so callers don't have to worry about it,
and it therefore makes the compat call code similar to its non-compat
counterpart. Place the access_ok() check in the same location where
copy_from_user() can trigger a -EFAULT error in the non-compat code, so
the ABI behaviors are alike on both compat and non-compat.

While we are here, fix compat_do_readv_writev() so it checks for
compat_rw_copy_check_uvector() negative return values.

And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
handling.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Acked-by: NAl Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8aec0f5d

ext4: use s_extent_max_zeroout_kb value as number of kb · 4f42f80a

由 Lukas Czerner 提交于 3月 12, 2013

Currently when converting extent to initialized, we have to decide
whether to zeroout part/all of the uninitialized extent in order to
avoid extent tree growing rapidly.

The decision is made by comparing the size of the extent with the
configurable value s_extent_max_zeroout_kb which is in kibibytes units.

However when converting it to number of blocks we currently use it as it
was in bytes. This is obviously bug and it will result in ext4 _never_
zeroout extents, but rather always split and convert parts to
initialized while leaving the rest uninitialized in default setting.

Fix this by using s_extent_max_zeroout_kb as kibibytes.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

4f42f80a

12 3月, 2013 2 次提交

vfs: fix pipe counter breakage · a930d879

由 Al Viro 提交于 3月 12, 2013

If you open a pipe for neither read nor write, the pipe code will not
add any usage counters to the pipe, causing the 'struct pipe_inode_info"
to be potentially released early.

That doesn't normally matter, since you cannot actually use the pipe,
but the pipe release code - particularly fasync handling - still expects
the actual pipe infrastructure to all be there.  And rather than adding
NULL pointer checks, let's just disallow this case, the same way we
already do for the named pipe ("fifo") case.

This is ancient going back to pre-2.4 days, and until trinity, nobody
naver noticed.
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a930d879

ext4: use atomic64_t for the per-flexbg free_clusters count · 90ba983f

由 Theodore Ts'o 提交于 3月 11, 2013

A user who was using a 8TB+ file system and with a very large flexbg
size (> 65536) could cause the atomic_t used in the struct flex_groups
to overflow.  This was detected by PaX security patchset:

http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551

This bug was introduced in commit 9f24e420, so it's been around
since 2.6.30.  :-(

Fix this by using an atomic64_t for struct orlav_stats's
free_clusters.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org

90ba983f

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功