Commits · f51a822c315e9d4c4c67247bea10e4b8eb795af1 · openeuler / raspberrypi-kernel

02 May 2013 · 1 commit

libceph: distinguish page array and pagelist count · d4b515fa

Submitted by Alex Elder on Feb 25, 2013

Use distinct fields for tracking the number of pages in a message's
page array and in a message's page list.  Currently only one or the
other is used at a time, but that will be changing soon.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

26 Apr 2013 · 1 commit

aio: fix possible invalid memory access when DEBUG is enabled · 91d80a84

Submitted by Zhao Hongjiang on Apr 26, 2013

dprintk() shouldn't access @ring after it's unmapped.
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19 Apr 2013 · 1 commit

Revert "block: add missing block_bio_complete() tracepoint" · 0a82a8d1

Submitted by Linus Torvalds on Apr 18, 2013

This reverts commit 3a366e61.

Wanlong Gao reports that it causes a kernel panic on his machine several
minutes after boot. Reverting it removes the panic.

Jens says:
 "It's not quite clear why that is yet, so I think we should just revert
  the commit for 3.9 final (which I'm assuming is pretty close).

  The wifi is crap at the LSF hotel, so sending this email instead of
  queueing up a revert and pull request."
Reported-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Requested-by: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

18 Apr 2013 · 3 commits

hfsplus: fix potential overflow in hfsplus_file_truncate() · 12f267a2

Submitted by Vyacheslav Dubeyko on Apr 17, 2013

Change a u32 to loff_t in hfsplus_file_truncate().
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
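
A minimal userspace sketch of the overflow being fixed, assuming the affected u32 variable holds the inode size: i_size is a 64-bit file size (loff_t in the kernel, modelled here as int64_t), so narrowing it to 32 bits silently drops the high bits once a file reaches 4 GiB. The function names are hypothetical.

```c
#include <stdint.h>

/* Pre-fix shape: the 64-bit size is narrowed to 32 bits. The conversion to
 * an unsigned type keeps only the low 32 bits (value modulo 2^32). */
uint32_t size_as_u32(int64_t i_size)
{
    uint32_t size = (uint32_t)i_size;
    return size;
}

/* Post-fix shape: the size keeps its full 64-bit width. */
int64_t size_as_loff(int64_t i_size)
{
    int64_t size = i_size;
    return size;
}
```

For a 5 GiB file, size_as_u32() yields exactly 1 GiB, which is the value any truncate logic relying on that variable would then see.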

fs/binfmt_elf.c: fix hugetlb memory check in vma_dump_size() · 23d9e482

Submitted by Naoya Horiguchi on Apr 17, 2013

Documentation/filesystems/proc.txt says about coredump_filter bitmask,

  Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
  effected by bit 5-6.

However, the current code can fall through to the subsequent flag checks of
bits 0-4 for vma(VM_HUGETLB). So this patch inserts a 'return' and makes it
work as written in the document.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>	[3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
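
A simplified sketch of the intended decision, assuming the coredump_filter bit layout from Documentation/filesystems/proc.txt (bits 0-3 for anonymous/file-backed private/shared memory, bit 4 for ELF headers, bits 5-6 for hugetlb). The vma_desc struct and should_dump_vma() are hypothetical, and the non-hugetlb branch is heavily reduced compared with the real vma_dump_size(). The point is the early return: a hugetlb VMA is decided by bits 5-6 alone.

```c
#include <stdbool.h>

/* coredump_filter bit numbers, as listed in Documentation/filesystems/proc.txt */
enum {
    FILTER_ANON_PRIVATE    = 0,
    FILTER_ANON_SHARED     = 1,
    FILTER_MAPPED_PRIVATE  = 2,
    FILTER_MAPPED_SHARED   = 3,
    FILTER_ELF_HEADERS     = 4,
    FILTER_HUGETLB_PRIVATE = 5,
    FILTER_HUGETLB_SHARED  = 6,
};

/* Hypothetical, reduced description of a VMA for this sketch. */
struct vma_desc {
    bool hugetlb;    /* VM_HUGETLB */
    bool shared;     /* VM_SHARED */
    bool anonymous;  /* no backing file */
};

static bool filter_bit(unsigned long filter, int bit)
{
    return (filter >> bit) & 1UL;
}

/* Returns true if the whole VMA should be written to the core file. */
bool should_dump_vma(const struct vma_desc *vma, unsigned long filter)
{
    if (vma->hugetlb) {
        /* Decided by bits 5-6 only; returning here keeps hugetlb VMAs
         * from falling through to the bit 0-4 checks below. */
        return filter_bit(filter, vma->shared ? FILTER_HUGETLB_SHARED
                                              : FILTER_HUGETLB_PRIVATE);
    }
    if (vma->anonymous)
        return filter_bit(filter, vma->shared ? FILTER_ANON_SHARED
                                              : FILTER_ANON_PRIVATE);
    return filter_bit(filter, vma->shared ? FILTER_MAPPED_SHARED
                                          : FILTER_MAPPED_PRIVATE);
}
```

With only bit 0 set (filter 0x1), an anonymous private hugetlb mapping is not dumped, while an ordinary anonymous private mapping is.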

hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB) · a2fce914

Submitted by Naoya Horiguchi on Apr 17, 2013

Currently we fail to include any data on hugepages in the coredump,
because VM_DONTDUMP is set on hugetlbfs's vma.  This behavior was
recently introduced by commit 314e51b9 ("mm: kill vma flag
VM_RESERVED and mm->reserved_vm counter").

This looks to me like a serious regression, so let's fix it.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>	[3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 Apr 2013 · 1 commit

vfs: Revert spurious fix to spinning prevention in prune_icache_sb · 5b55d708

Submitted by Suleiman Souhlal on Apr 13, 2013

Revert commit 62a3ddef ("vfs: fix spinning prevention in prune_icache_sb").

This commit doesn't look right: since we are looking at the tail of the
list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
it back at the head of the list instead of the tail, otherwise we will
keep spinning on it.

Discovered when investigating why prune_icache_sb came top in perf
reports of a swapping load.
Signed-off-by: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
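
A toy model of the scan-direction problem, assuming only what the message states: entries are taken from the tail of the list (sb->s_inode_lru.prev), so an entry that must be skipped has to be rotated to the head to get out of the scanner's way. The list helpers imitate <linux/list.h> but are reimplemented here, and struct item, busy and scan_lru() are hypothetical. Non-busy entries are simply unlinked, standing in for eviction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_unlink(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Insert right after h, i.e. at the head (like list_add()). */
static void list_add_head(struct list_head *e, struct list_head *h)
{
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}

/* Insert right before h, i.e. at the tail (like list_add_tail()). */
static void list_add_at_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

/* Hypothetical LRU entry: `busy` marks one the scanner must skip. */
struct item {
    struct list_head lru;
    bool busy;
    int visits;
};

/* Takes up to nr_to_scan entries from the tail (h->prev). Busy entries are
 * rotated back into the list; all others are unlinked. */
void scan_lru(struct list_head *h, int nr_to_scan, bool skip_to_head)
{
    for (int i = 0; i < nr_to_scan && h->prev != h; i++) {
        struct list_head *e = h->prev;
        struct item *it = (struct item *)((char *)e - offsetof(struct item, lru));

        it->visits++;
        list_unlink(e);
        if (!it->busy)
            continue;                   /* stands in for evicting the entry */
        if (skip_to_head)
            list_add_head(e, h);        /* out of the scanner's way */
        else
            list_add_at_tail(e, h);     /* right where the scan looks next */
    }
}
```

With one busy entry at the tail and nr_to_scan = 3, rotating to the tail visits that same entry three times and never reaches the others, while rotating to the head visits each entry once.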

13 Apr 2013 · 1 commit

Btrfs: make sure nbytes are right after log replay · 4bc4bee4

Submitted by Josef Bacik on Apr 05, 2013

While trying to track down a tree log replay bug I noticed that fsck was always
complaining about nbytes not being right for our fsynced file. That is because
the new fsync stuff doesn't wait for ordered extents to complete, so the inode's
nbytes is not necessarily updated properly when we log it. So to fix this we
need to set nbytes to whatever it is on the inode that is on disk, so when we
replay the extents we can just add the bytes that are being added as we replay
the extent. This makes it work for the case that we have the wrong nbytes or
the case that we logged everything and nbytes is actually correct. With this
I'm no longer getting nbytes errors out of btrfsck.

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

12 Apr 2013 · 1 commit

kthread: Prevent unpark race which puts threads on the wrong cpu · f2530dc7

Submitted by Thomas Gleixner on Apr 09, 2013

The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads to a particular core. However, the functionality is racy:

CPU0	       	 	CPU1  	     	    CPU2
unpark(T)				    wake_up_process(T)
  clear(SHOULD_PARK)	T runs
			leave parkme() due to !SHOULD_PARK  
  bind_to(CPU2)		BUG_ON(wrong CPU)						    

We cannot let the tasks move themselves to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.

The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.

Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.

Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, it's a good hint in ps
and friends that these tasks aren't really there for the moment.

The migration thread has another related issue.

CPU0	      	     	 CPU1
Bring up CPU2
create_thread(T)
park(T)
 wait_for_completion()
			 parkme()
			 complete()
sched_set_stop_task()
			 schedule(TASK_PARKED)

The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronization before
sched_set_stop_task().
Reported-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Dave Hansen <dave@sr71.net>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Peter Ziljstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: dhillf@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

11 Apr 2013 · 2 commits

cifs: Allow passwords which begin with a delimitor · c369c9a4

Submitted by Sachin Prabhu on Apr 09, 2013

Fixes a regression in cifs_parse_mount_options where a password
which begins with a delimiter is incorrectly parsed as a blank
password.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
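
A sketch of why a leading delimiter is tricky, assuming the cifs convention that ',' separates options and a doubled ",," inside the password stands for one literal ','. A password that starts with a comma is then written "pass=,,rest". The function below is hypothetical, not the kernel's parser; it only shows that the first ',' must be checked against the following byte before being treated as the end of the value.

```c
#include <stddef.h>

/* Parses the value of a "pass=" option; `val` points just past "pass=".
 * ",," collapses to one literal ','; a single ',' ends the value.
 * Returns the number of input bytes consumed, or -1 if `out` is too small. */
long parse_password_value(const char *val, char *out, size_t outlen)
{
    size_t i = 0, o = 0;

    if (outlen == 0)
        return -1;
    while (val[i] != '\0') {
        if (val[i] == ',') {
            if (val[i + 1] != ',')
                break;          /* single delimiter: end of this option */
            i++;                /* skip the first of the doubled pair */
        }
        if (o + 1 >= outlen)
            return -1;          /* leave room for the terminating NUL */
        out[o++] = val[i++];
    }
    out[o] = '\0';
    return (long)i;
}
```

A parser that stops at the first ',' without looking at the next byte returns an empty password for ",,secret", which is consistent with the symptom described above.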

NFSv4: Doh! Typo in the fix to nfs41_walk_client_list · eb04e0ac

Submitted by Trond Myklebust on Apr 10, 2013

Make sure that we set the status to 0 on success. Missed in testing
because it never appears when doing multiple mounts to _different_
servers.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: <stable@vger.kernel.org> # 3.7.x: 7b1f1fd1: NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list

10 Apr 2013 · 4 commits

mnt: release locks on error path in do_loopback · e9c5d8a5

Submitted by Andrey Vagin on Apr 09, 2013

do_loopback calls lock_mount(path) and forgets to call unlock_mount
if clone_mnt or copy_mnt fails.

[   77.661566] ================================================
[   77.662939] [ BUG: lock held when returning to user space! ]
[   77.664104] 3.9.0-rc5+ #17 Not tainted
[   77.664982] ------------------------------------------------
[   77.666488] mount/514 is leaving the kernel with locks still held!
[   77.668027] 2 locks held by mount/514:
[   77.668817]  #0:  (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff811cca22>] lock_mount+0x32/0xe0
[   77.671755]  #1:  (&namespace_sem){+++++.}, at: [<ffffffff811cca3a>] lock_mount+0x4a/0xe0
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
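
One common way to structure this kind of fix is to route every error path after lock_mount() through a single unlock label. A minimal sketch of that shape, with hypothetical stubs standing in for lock_mount(), unlock_mount() and the clone step; a counter records how many locks are still held so that a leak like the one in the lockdep report above would be visible.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins: the counter tracks how many locks are held. */
static int locks_held;

static int lock_mount_stub(void)    { locks_held++; return 0; }
static void unlock_mount_stub(void) { locks_held--; }

/* Stand-in for the clone step; fails when asked to. */
static int clone_stub(bool fail)    { return fail ? -ENOMEM : 0; }

/* Every exit after a successful lock goes through out_unlock, so the error
 * path can no longer return to user space with the locks still held. */
int do_loopback_sketch(bool clone_fails)
{
    int err = lock_mount_stub();
    if (err)
        return err;

    err = clone_stub(clone_fails);
    if (err)
        goto out_unlock;

    /* ... attach the new mount here ... */

out_unlock:
    unlock_mount_stub();
    return err;
}
```

Both the failing and the succeeding call leave locks_held at zero.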

procfs: add proc_remove_subtree() · 8ce584c7

Submitted by Al Viro on Mar 30, 2013

just what it sounds like; do that only to procfs subtrees you've
created - doing that to something shared with another driver is
not only antisocial, but might cause interesting races with
proc_create() and its ilk.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ecryptfs: close rmmod race · 52f21999

Submitted by Al Viro on Mar 28, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

NFSv4: Fix another potential state manager deadlock · fa332941

Submitted by Trond Myklebust on Apr 09, 2013

Don't hold the NFSv4 sequence id while we check for open permission.
The call to ACCESS may block due to reboot recovery.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

06 Apr 2013 · 3 commits

NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list · 7b1f1fd1

Submitted by Trond Myklebust on Apr 05, 2013

It is unsafe to use list_for_each_entry_safe() here, because
when we drop the nn->nfs_client_lock, we pin the _current_ list
entry and ensure that it stays in the list, but we don't do the
same for the _next_ list entry. Use of list_for_each_entry() is
therefore the correct thing to do.

Also fix the refcounting in nfs41_walk_client_list().

Finally, ensure that the nfs_client has finished being initialised
and, in the case of NFSv4.1, that the session is set up.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org [>= 3.7]

NFSv4: Fix a memory leak in nfs4_discover_server_trunking · b193d59a

Submitted by Trond Myklebust on Apr 04, 2013

When we assign a new rpc_client to clp->cl_rpcclient, we need to destroy
the old one.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org [>=3.7]

GFS2: Issue discards in 512b sectors · b2c87cae

Submitted by Bob Peterson on Mar 22, 2013

This patch changes GFS2's discard issuing code so that it calls
function sb_issue_discard rather than blkdev_issue_discard. The
code was calling blkdev_issue_discard and specifying the correct
sector offset and sector size, but blkdev_issue_discard expects
these values to be in terms of 512 byte sectors, even if the native
sector size for the device is different. Calling sb_issue_discard
with the BLOCK size instead ensures the correct block-to-512b-sector
translation. I verified that "minlen" is specified in blocks, so
comparing it to a number of blocks is correct.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
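
The block-to-512b-sector translation mentioned above is a shift by (block size bits - 9), since 512 = 2^9. A sketch of that arithmetic, assuming the inputs are in filesystem blocks and blkbits is log2 of the block size (12 for 4 KiB blocks); blocks_to_sectors() is a hypothetical helper that performs the same conversion sb_issue_discard() applies with sb->s_blocksize_bits before it calls blkdev_issue_discard().

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Converts a (start, length) range in filesystem blocks into the
 * 512-byte-sector units that blkdev_issue_discard() expects. */
void blocks_to_sectors(sector_t block, sector_t nr_blocks, unsigned int blkbits,
                       sector_t *sector, sector_t *nr_sects)
{
    *sector   = block << (blkbits - 9);
    *nr_sects = nr_blocks << (blkbits - 9);
}
```

With 4 KiB blocks, blocks 10-11 become sectors 80-95 (start 80, length 16); passing the block numbers straight through would have discarded sectors 10-11 instead.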

04 Apr 2013 · 5 commits

GFS2: Fix unlock of fcntl locks during withdrawn state · c2952d20

Submitted by Steven Whitehouse on Mar 14, 2013

When withdraw occurs, we need to continue to allow unlocks of fcntl
locks to occur; however, these will only be local, since the node has
withdrawn from the cluster. This prevents triggering a VFS level
bug trap due to locks remaining when a file is closed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: return error if malloc failed in gfs2_rs_alloc() · 441362d0

Submitted by Wei Yongjun on Mar 11, 2013

The error code in gfs2_rs_alloc() is set to ENOMEM on failure
but is never used; instead, gfs2_rs_alloc() always returns 0.
Fix it to return 'error'.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: use memchr_inv · 4146c3d4

Submitted by Akinobu Mita on Mar 07, 2013

Use memchr_inv to verify that the specified memory range is cleared.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
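
For readers unfamiliar with memchr_inv(): it returns a pointer to the first byte in the range that is not equal to the given value, or NULL if every byte matches, so "is this range cleared?" becomes a single NULL test. A userspace sketch with the same signature (memchr_inv_sketch and range_is_zero are hypothetical names); the kernel's lib/string.c version is optimised to compare several bytes at a time, while this one is a plain byte loop.

```c
#include <stddef.h>

/* Returns a pointer to the first byte in [start, start + bytes) that differs
 * from c, or NULL if all bytes equal c. */
void *memchr_inv_sketch(const void *start, int c, size_t bytes)
{
    const unsigned char *p = start;
    unsigned char value = (unsigned char)c;

    for (size_t i = 0; i < bytes; i++)
        if (p[i] != value)
            return (void *)(p + i);
    return NULL;
}

/* The typical use: verify that a memory range is entirely zero. */
int range_is_zero(const void *buf, size_t len)
{
    return memchr_inv_sketch(buf, 0, len) == NULL;
}
```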

GFS2: use kmalloc for lvb bitmap · 57c7310b

Submitted by David Teigland on Mar 05, 2013

The temp lvb bitmap was on the stack, which could
be an alignment problem for __set_bit_le.  Use
kmalloc for it instead.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

ext4: fix big-endian bugs which could cause fs corruptions · 8cde7ad1

Submitted by Zheng Liu on Apr 03, 2013

When an extent was zeroed out, we forgot to convert from cpu to le16.
It could make us hit a BUG_ON when we try to write dirty pages out.  So
fix it.

[ Also fix a bug found by Dmitry Monakhov where we were missing
  le32_to_cpu() calls in the new indirect punch hole code.

  There are a number of other big endian warnings found by static code
  analyzers, but we'll wait for the next merge window to fix them all
  up.  These fixes are designed to be Obviously Correct by code
  inspection, and easy to demonstrate that it won't make any
  difference (and hence, won't introduce any bugs) on little endian
  architectures such as x86.  --tytso ]
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
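
Why the missing conversion only bites on big-endian machines: on-disk ext4 extent fields are little-endian (__le16/__le32), and cpu_to_le16() is a no-op on a little-endian CPU but a byte swap on a big-endian one. A portable userspace sketch, using hypothetical put_le16()/get_le16() helpers that build the bytes explicitly, next to the buggy pattern of storing a host-order value directly.

```c
#include <stdint.h>
#include <string.h>

/* Little-endian encode/decode that gives the same bytes on any host. */
static void put_le16(unsigned char out[2], uint16_t v)
{
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)(v >> 8);
}

static uint16_t get_le16(const unsigned char in[2])
{
    return (uint16_t)(in[0] | (in[1] << 8));
}

/* The buggy pattern: copying a host-order value into an on-disk field.
 * Matches put_le16() on little-endian hosts; byte-swapped on big-endian. */
static void put_raw16(unsigned char out[2], uint16_t v)
{
    memcpy(out, &v, sizeof(v));
}
```

On x86 the two stores produce identical bytes, which matches the note above that these fixes make no difference on little-endian architectures.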

02 Apr 2013 · 1 commit

loop: prevent bdev freeing while device in use · c1681bf8

Submitted by Anatol Pomozov on Apr 01, 2013

The lifecycle of struct block_device is defined by its inode (see fs/block_dev.c):
a block_device is allocated the first time we access /dev/loopXX and deallocated
in bdev_destroy_inode. When we create the device with "losetup /dev/loopXX afile",
we want that block_device to stay alive until we destroy the loop device
with "losetup -d".

But because we do not hold a reference on the /dev/loopXX inode, its counter drops
to 0, and the inode/bdev can be destroyed at any moment. Usually this happens under
memory pressure or when the user drops the inode cache (as in the test below). When
loop_clr_fd() later tries to use the bdev, we hit a use-after-free with the following
stack:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
  bd_set_size+0x10/0xa0
  loop_clr_fd+0x1f8/0x420 [loop]
  lo_ioctl+0x200/0x7e0 [loop]
  lo_compat_ioctl+0x47/0xe0 [loop]
  compat_blkdev_ioctl+0x341/0x1290
  do_filp_open+0x42/0xa0
  compat_sys_ioctl+0xc1/0xf20
  do_sys_open+0x16e/0x1d0
  sysenter_dispatch+0x7/0x1a

To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().

The issue is reproducible on current Linus head and v3.3. Here is the test:

  dd if=/dev/zero of=loop.file bs=1M count=1
  while [ true ]; do
    losetup /dev/loop0 loop.file
    echo 2 > /proc/sys/vm/drop_caches
    losetup -d /dev/loop0
  done

[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
  time we call loop_set_fd() we check that loop_device->lo_state is
  Lo_unbound and set it to Lo_bound. If somebody tries to set_fd again,
  they will get EBUSY.  And if we try to loop_clr_fd() on an unbound loop
  device, we'll get ENXIO.

  loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
  loop_device->lo_ctl_mutex. ]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
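
A reference-count model of the fix. The *_model names are hypothetical stand-ins (the real code calls bdgrab()/bput() on a struct block_device): the loop device takes its own reference when it is bound and drops it only after loop_clr_fd() is done with the bdev, so dropping the inode cache in between can no longer free it. The EBUSY/ENXIO returns follow the state checks described in the bracketed note above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct block_device: freed when the last reference goes. */
struct bdev_model {
    int refcount;
    bool freed;
};

static void bdgrab_model(struct bdev_model *b) { b->refcount++; }

static void bput_model(struct bdev_model *b)
{
    if (--b->refcount == 0)
        b->freed = true;            /* models bdev_destroy_inode() */
}

enum lo_state_model { LO_UNBOUND, LO_BOUND };

struct loop_model {
    enum lo_state_model state;
    struct bdev_model *bdev;
};

/* Bind: only from the unbound state, and pin the bdev while bound. */
int loop_set_fd_model(struct loop_model *lo, struct bdev_model *b)
{
    if (lo->state != LO_UNBOUND)
        return -EBUSY;
    bdgrab_model(b);
    lo->bdev = b;
    lo->state = LO_BOUND;
    return 0;
}

/* Unbind: the bdev is still valid here because the loop device holds a
 * reference; that reference is dropped last. */
int loop_clr_fd_model(struct loop_model *lo)
{
    if (lo->state != LO_BOUND)
        return -ENXIO;
    /* ... loop_clr_fd() uses the bdev here, e.g. bd_set_size() ... */
    lo->state = LO_UNBOUND;
    bput_model(lo->bdev);
    lo->bdev = NULL;
    return 0;
}
```

If the initial reference models whoever opened /dev/loop0, dropping it (as drop_caches evicting the inode would) no longer frees the bdev while the loop device is bound; only loop_clr_fd_model() releases it.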

30 Mar 2013 · 1 commit

reiserfs: Fix warning and inode leak when deleting inode with xattrs · 35e5cbc0

Submitted by Jan Kara on Mar 29, 2013

After commit 21d8a15a (lookup_one_len: don't accept . and ..) reiserfs
started failing to delete xattrs from inode. This was due to a buggy
test for '.' and '..' in fill_with_dentries() which resulted in passing
'.' and '..' entries to lookup_one_len() in some cases. That returned
error and so we failed to iterate over all xattrs of an inode.

Fix the test in fill_with_dentries() along the lines of the one in
lookup_one_len().
Reported-by: Pawel Zawora <pzawora@gmail.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
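
A sketch of a length-based '.'/'..' test of the same shape as the one in lookup_one_len(): a name is rejected only if it is exactly "." or exactly "..", judged by its length rather than by a NUL terminator (directory entry names need not be terminated). is_dot_or_dotdot() is a hypothetical name, not the reiserfs function.

```c
#include <stdbool.h>
#include <stddef.h>

/* True only for the exact names "." and "..". Names that merely start with
 * '.' (".foo", "..bar") must not match. */
bool is_dot_or_dotdot(const char *name, size_t len)
{
    if (len == 0 || name[0] != '.')
        return false;
    return len == 1 || (len == 2 && name[1] == '.');
}
```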

29 Mar 2013 · 1 commit

Btrfs: don't drop path when printing out tree errors in scrub · d8fe29e9

Submitted by Josef Bacik on Mar 29, 2013

A user reported a panic where we were panicking somewhere in
tree_backref_for_extent from scrub_print_warning.  He only captured the trace
but looking at scrub_print_warning we drop the path right before we mess with
the extent buffer to print out a bunch of stuff, which isn't right.  So fix this
by dropping the path after we use the eb if we need to.  Thanks,

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

28 Mar 2013 · 9 commits

Btrfs: fix wrong return value of btrfs_lookup_csum() · 82d130ff

Submitted by Miao Xie on Mar 28, 2013

If we don't find the expected csum item but do find a csum item which is
adjacent to the specified extent, we should return -EFBIG; otherwise we should
return -ENOENT. But btrfs_lookup_csum() returned -EFBIG even when the csum item
was not adjacent to the specified extent. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
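
A simplified model of the three-way result described above, assuming the item found is the one with the largest start not beyond the requested byte. The csum_item_model struct and csum_lookup_result() are hypothetical, and the range is expressed in bytes for readability, whereas the real code compares checksum counts within the item. Only an item that ends exactly at the requested position is "adjacent" and yields -EFBIG; anything further away is -ENOENT.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical csum item covering checksums for [start, start + len). */
struct csum_item_model {
    uint64_t start;
    uint64_t len;
};

/*   0       the item covers bytenr
 *   -EFBIG  the item ends exactly at bytenr (adjacent, can be extended)
 *   -ENOENT anything else */
int csum_lookup_result(const struct csum_item_model *item, uint64_t bytenr)
{
    uint64_t end = item->start + item->len;

    if (bytenr >= item->start && bytenr < end)
        return 0;
    if (bytenr == end)
        return -EFBIG;
    return -ENOENT;
}
```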

Btrfs: fix wrong reservation of csums · 39847c4d

Submitted by Miao Xie on Mar 28, 2013

We reserve space for csums only when we write data into a file. In
the other cases, such as tree log and log replay, we don't do a reservation,
so we can use the reservation of the transaction handle only for the former.
For the latter, we should use the tree's own reservation. But the
function btrfs_csum_file_blocks() didn't differentiate between these
two types of cases; fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix double free in the btrfs_qgroup_account_ref() · a7975026

Submitted by Wang Shilong on Mar 25, 2013

The function btrfs_find_all_roots is responsible for allocating
memory for 'roots' and freeing it if errors happen, so the caller should not
free it again since the work has already been done.

Besides, 'tmp' is allocated after the call to btrfs_find_all_roots,
so we can return directly if btrfs_find_all_roots() fails.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: limit the global reserve to 512mb · fdf30d1c

Submitted by Josef Bacik on Mar 26, 2013

A user reported a problem where he was getting early ENOSPC with hundreds of
gigs of free data space and 6 gigs of free metadata space. This is because the
global block reserve was taking up the entire free metadata space. This is
ridiculous: we have infrastructure in place to throttle if we start using too
much of the global reserve, so instead of letting it get this huge just limit it
to 512mb so that users can still get work done. This allowed the user to
complete his rsync without issues. Thanks

Cc: stable@vger.kernel.org
Reported-and-tested-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
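
The sizing change amounts to a clamp. A sketch under the assumption that `computed` is whatever size the global reserve would otherwise be given (how that value is derived is not described in the message); global_reserve_size() is a hypothetical name.

```c
#include <stdint.h>

#define RESERVE_CAP (512ULL * 1024 * 1024)  /* 512 MiB */

/* Never let the global block reserve grow past 512 MiB. */
uint64_t global_reserve_size(uint64_t computed)
{
    return computed < RESERVE_CAP ? computed : RESERVE_CAP;
}
```

In the reported case, a reserve that had grown to the full 6 GiB of free metadata space would be capped at 512 MiB, leaving the rest available.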

Btrfs: hold the ordered operations mutex when waiting on ordered extents · db1d607d

Submitted by Josef Bacik on Mar 26, 2013

We need to hold the ordered_operations mutex while waiting on ordered extents
since we splice and run the ordered extents list. We need to make sure anybody
else who wants to wait on ordered extents does actually wait for them to be
completed. This will keep us from bailing out of flushing in case somebody is
already waiting on ordered extents to complete. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix space accounting for unlink and rename · 6e137ed3

Submitted by Josef Bacik on Mar 26, 2013

We are way over-reserving for unlink and rename. The rename reservation is just
some random huge number, and unlink accounts for tree log operations that don't actually
happen during unlink, not to mention the tree log doesn't take from the trans
block rsv anyway so it's completely useless. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix space leak when we fail to reserve metadata space · f4881bc7

Submitted by Josef Bacik on Mar 25, 2013

Dave reported a warning when running xfstest 275. We have been leaking delalloc
metadata space when our reservations fail. This is because we were improperly
calculating how much space to free for our checksum reservations. The problem
is we would sometimes free up space that had already been freed in another
thread and we would end up with negative usage for the delalloc space. This
patch fixes the problem by calculating how much space the other threads would
have already freed, then calculating how much space we would need to free had we
not done the reservation at all, and then freeing any excess space. This makes
xfstest 275 no longer leak space. Thanks

Cc: stable@vger.kernel.org
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holes · adaa4b8e

Submitted by Jan Schmidt on Mar 21, 2013

When you take a snapshot, punch a hole where there has been data, then take
another snapshot and try to send an incremental stream, btrfs send would
give you EIO. That is because is_extent_unchanged had no support for holes
being punched. With this patch, instead of returning EIO we just return
0 (== the extent is not unchanged) and we're good.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Cc: Alexander Block <ablock84@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

vfs/splice: Fix missed checks in new __kernel_write() helper · 3e84f48e

Submitted by Al Viro on Mar 27, 2013

Commit 06ae43f3 ("Don't bother with redoing rw_verify_area() from
default_file_splice_from()") lost the checks to test existence of the
write/aio_write methods.  My apologies ;-/

Eventually, we want that in fs/splice.c side of things (no point
repeating it for every buffer, after all), but for now this is the
obvious minimal fix.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
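
A sketch of the kind of check that was lost: before calling through a file's operations table, make sure it provides at least one of the write methods, and fail with -EINVAL otherwise instead of dereferencing a NULL function pointer. The fops_model/file_model structs and kernel_write_sketch() are hypothetical simplifications of struct file_operations and __kernel_write(); how the kernel actually dispatches to aio_write is not modelled.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, trimmed-down file_operations with only the two write hooks. */
struct fops_model {
    ssize_t (*write)(const char *buf, size_t count);
    ssize_t (*aio_write)(const char *buf, size_t count);
};

struct file_model {
    const struct fops_model *f_op;
};

/* Example write method: pretends everything was written. */
ssize_t accept_all_write(const char *buf, size_t count)
{
    (void)buf;
    return (ssize_t)count;
}

/* Refuse files that provide neither write method before calling either. */
ssize_t kernel_write_sketch(const struct file_model *file, const char *buf,
                            size_t count)
{
    if (!file->f_op || (!file->f_op->write && !file->f_op->aio_write))
        return -EINVAL;
    if (file->f_op->write)
        return file->f_op->write(buf, count);
    return file->f_op->aio_write(buf, count);   /* simplified dispatch */
}
```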

27 Mar 2013 · 5 commits

userns: Restrict when proc and sysfs can be mounted · 87a8ebd6

Submitted by Eric W. Biederman on Mar 24, 2013

Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.

proc and sysfs are interesting because they have content that is
per namespace, and so fresh mounts are needed when new namespaces
are created while at the same time proc and sysfs have content that
is shared between every instance.

Respect the policy of who may see the shared content of proc and sysfs
by only allowing new mounts if there was an existing mount at the time
the user namespace was created.

In practice there are only two interesting cases: proc and sysfs are
mounted at their usual places, proc and sysfs are not mounted at all
(some form of mount namespace jail).

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

vfs: Carefully propogate mounts across user namespaces · 132c94e3

Submitted by Eric W. Biederman on Mar 22, 2013

As a matter of policy, MNT_READONLY should not be changeable if the
original mounter had more privileges than the creator of the mount
namespace.

Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
a mount namespace that requires more privileges to a mount namespace
that requires fewer privileges.

When the CL_UNPRIVILEGED flag is set, cause clone_mnt to set MNT_NO_REMOUNT
if any of the mnt flags that should never be changed are set.

This protects both mount propagation and the initial creation of a less
privileged mount namespace.

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

vfs: Add a mount flag to lock read only bind mounts · 90563b19

Submitted by Eric W. Biederman on Mar 22, 2013

When a read-only bind mount is copied from a mount namespace in a more
privileged user namespace to a mount namespace in a less privileged
user namespace, it should not be possible to remove the read-only
restriction.

Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
remain read-only.

CC: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
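
A sketch of how a lock flag of this kind can be enforced, under stated assumptions: a read-only mount copied into a less privileged namespace gets the lock bit, and a later remount that would clear read-only is refused while the lock bit is set. The flag values and function names are hypothetical and chosen for this sketch only; the message above states just that the flag marks a mount that must remain read-only.

```c
#include <errno.h>

/* Arbitrary bit values for this sketch, not the kernel's. */
#define MOUNT_READONLY      0x1u
#define MOUNT_LOCK_READONLY 0x2u

/* Copying into a less privileged mount namespace: a read-only mount also
 * gets the lock bit, so it stays read-only there. */
unsigned int copy_to_less_privileged(unsigned int flags)
{
    if (flags & MOUNT_READONLY)
        flags |= MOUNT_LOCK_READONLY;
    return flags;
}

/* Remount check: 0 if allowed, -EPERM if it would drop a locked read-only. */
int check_remount(unsigned int cur_flags, unsigned int new_flags)
{
    if ((cur_flags & MOUNT_LOCK_READONLY) && !(new_flags & MOUNT_READONLY))
        return -EPERM;
    return 0;
}
```

A mount that was merely read-only, without the lock bit, can still be remounted read-write by someone with sufficient privilege.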

userns: Don't allow creation if the user is chrooted · 3151527e

Submitted by Eric W. Biederman on Mar 15, 2013

Guarantee that the policy of which files may be accessed that is
established by setting the root directory will not be violated
by user namespaces by verifying that the root directory points
to the root of the mount namespace at the time of user namespace
creation.

Changing the root is a privileged operation, and as a matter of policy
it serves to limit unprivileged processes to files below the current
root directory.

For reasons of simplicity and comprehensibility the privilege to
change the root directory is gated solely on the CAP_SYS_CHROOT
capability in the user namespace.  Therefore when creating a user
namespace we must ensure that the policy of which files may be accessed
can not be violated by changing the root directory.

Anyone who runs a process in a chroot and would like to use user
namespaces can set up the same view of filesystems with a mount
namespace instead.  As a result, this is not a practical
limitation for using user namespaces.

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Nest rename_lock inside vfsmount_lock · 7ea600b5

Submitted by Al Viro on Mar 26, 2013

... lest we get livelocks between path_is_under() and d_path() and friends.

The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks;
it is possible to have thread B spin on attempt to take lock shared while thread
A is already holding it shared, if B is on lower-numbered CPU than A and there's
a thread C spinning on attempt to take the same lock exclusive.

As a result, we need consistent ordering between vfsmount_lock (lglock) and
rename_lock (seq_lock), even though everything that takes both is going to take
vfsmount_lock only shared.
Spotted-by: Brad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
