提交 · 6f904ff0e39ea88f81eb77e8dfb4e1238492f0a8 · openeuler / raspberrypi-kernel

08 8月, 2010 12 次提交

writeback: harmonize writeback threads naming · 6f904ff0

由 Artem Bityutskiy 提交于 7月 25, 2010

The write-back code mixes words "thread" and "task" for the same things. This
is not a big deal, but still an inconsistency.

hch: a convention I tend to use and I've seen in various places
is to always use _task for the storage of the task_struct pointer,
and thread everywhere else.  This especially helps with having
foo_thread for the actual thread and foo_task for a global
variable keeping the task_struct pointer

This patch renames:
* 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()'
* 'bdi_forker_task()'              -> 'bdi_forker_thread()'

because bdi threads are 'bdi_writeback_thread()', so these names are more
consistent.

This patch also amends commentaries and makes them refer the forker and bdi
threads as "thread", not "task".

Also, while on it, make 'bdi_add_default_flusher_thread()' declaration use
'static void' instead of 'void static' and make checkpatch.pl happy.
Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

6f904ff0

coda: fixup clash with block layer REQ_* defines · 4aeefdc6

由 Jens Axboe 提交于 8月 03, 2010

CODA should not be using defines in the global name space of
that nature, prefix them with CODA_.
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

4aeefdc6

writeback: remove wb in get_next_work_item · 08852b6d

由 Minchan Kim 提交于 8月 03, 2010

83ba7b07 cleans up the writeback.
So we don't use wb any more in get_next_work_item.
Let's remove unnecessary argument.

CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

08852b6d

splice: fix misuse of SPLICE_F_NONBLOCK · 6965031d

由 Miklos Szeredi 提交于 8月 03, 2010

SPLICE_F_NONBLOCK is clearly documented to only affect blocking on the
pipe.  In __generic_file_splice_read(), however, it causes an EAGAIN
if the page is currently being read.

This makes it impossible to write an application that only wants
failure if the pipe is full.  For example if the same process is
handling both ends of a pipe and isn't otherwise able to determine
whether a splice to the pipe will fill it or not.

We could make the read non-blocking on O_NONBLOCK or some other splice
flag, but for now this is the simplest fix.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

6965031d

block: push down BKL into .open and .release · 6e9624b8

由 Arnd Bergmann 提交于 8月 07, 2010

The open and release block_device_operations are currently
called with the BKL held. In order to change that, we must
first make sure that all drivers that currently rely
on this have no regressions.

This blindly pushes the BKL into all .open and .release
operations for all block drivers to prepare for the
next step. The drivers can subsequently replace the BKL
with their own locks or remove it completely when it can
be shown that it is not needed.

The functions blkdev_get and blkdev_put are the only
remaining users of the big kernel lock in the block
layer, besides a few uses in the ioctl code, none
of which need to serialize with blkdev_{get,put}.

Most of these two functions is also under the protection
of bdev->bd_mutex, including the actual calls to
->open and ->release, and the common code does not
access any global data structures that need the BKL.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

6e9624b8

writeback: Add tracing to balance_dirty_pages · 028c2dd1

由 Dave Chinner 提交于 7月 07, 2010

Tracing high level background writeback events is good, but it doesn't
give the entire picture. Add visibility into write throttling to catch IO
dispatched by foreground throttling of processing dirtying lots of pages.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

028c2dd1

writeback: Initial tracing support · 455b2864

由 Dave Chinner 提交于 7月 07, 2010

Trace queue/sched/exec parts of the writeback loop. This provides
insight into when and why flusher threads are scheduled to run. e.g
a sync invocation leaves traces like:

sync-[...]: writeback_queue: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: writeback_exec: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0

This also lays the foundation for adding more writeback tracing to
provide deeper insight into the whole writeback path.

The original tracing code is from Jens Axboe, though this version is
a rewrite as a result of the code being traced changing
significantly.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

455b2864

gcc-4.6: fs: fix unused but set warnings · 1676effc

由 Andi Kleen 提交于 6月 21, 2010

No real bugs I believe, just some dead code, and some
shut up code.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

1676effc

writeback: merge bdi_writeback_task and bdi_start_fn · 08243900

由 Christoph Hellwig 提交于 6月 19, 2010

Move all code for the writeback thread into fs/fs-writeback.c instead of
splitting it over two functions in two files.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

08243900

writeback: remove wb_list · c1955ce3

由 Christoph Hellwig 提交于 6月 19, 2010

The wb_list member of struct backing_device_info always has exactly one
element.  Just use the direct bdi->wb pointer instead and simplify some
code.

Also remove bdi_task_init which is now trivial to prepare for the next
patch.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

c1955ce3

block: unify flags for struct bio and struct request · 7b6d91da

由 Christoph Hellwig 提交于 8月 07, 2010

Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

7b6d91da

block: BARRIER request should imply SYNC · 41f2df62

由 Christoph Hellwig 提交于 6月 17, 2010

A barrier request should by defintion have priority in get_request
and let the queue be unplugged immediately as it's blocking all forward
progress due to the queue draining.

Most filesystems already get this implicitly by the way how submit_bh
treats the buffer_ordered flag, and gfs2 sets it explicitly.  But btrfs
and XFS are still forgetting to set the flag, as is blkdev_issue_flush
and some places in DM/MD.

For XFS on metadata heavy workloads this gives a consistent speedup
in the 2-3% range.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

41f2df62

02 8月, 2010 1 次提交

NFS: Fix a typo in include/linux/nfs_fs.h · 77a63f3d

由 Trond Myklebust 提交于 8月 01, 2010

nfs_commit_inode() needs to be defined irrespectively of whether or not
we are supporting NFSv3 and NFSv4.

Allow the compiler to optimise away code in the NFSv2-only case by
converting it into an inlined stub function.
Reported-and-tested-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

77a63f3d

31 7月, 2010 4 次提交

CIFS: Remove __exit mark from cifs_exit_dns_resolver() · 51c20fcc

由 David Howells 提交于 7月 30, 2010

Remove the __exit mark from cifs_exit_dns_resolver() as it's called by the
module init routine in case of error, and so may have been discarded during
linkage.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

51c20fcc

T
NFS: Ensure that writepage respects the nonblock flag · cfb506e1
由 Trond Myklebust 提交于 7月 30, 2010
```
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
```
cfb506e1

NFS: kswapd must not block in nfs_release_page · b608b283

由 Trond Myklebust 提交于 7月 30, 2010

See https://bugzilla.kernel.org/show_bug.cgi?id=16056

If other processes are blocked waiting for kswapd to free up some memory so
that they can make progress, then we cannot allow kswapd to block on those
processes.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org

b608b283

nfs: include space for the NUL in root path · 674b2222

由 Dan Carpenter 提交于 7月 13, 2010

In root_nfs_name() it does the following:

        if (strlen(buf) + strlen(cp) > NFS_MAXPATHLEN) {
                printk(KERN_ERR "Root-NFS: Pathname for remote directory too long.\n");
                return -1;
        }
        sprintf(nfs_export_path, buf, cp);

In the original code if (strlen(buf) + strlen(cp) == NFS_MAXPATHLEN)
then the sprintf() would lead to an overflow.  Generally the rest of the
code assumes that the path can have NFS_MAXPATHLEN (1024) characters and
a NUL terminator so the fix is to add space to the nfs_export_path[]
buffer.
Signed-off-by: NDan Carpenter <error27@gmail.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

674b2222

30 7月, 2010 1 次提交

CRED: Fix get_task_cred() and task_state() to not resurrect dead credentials · de09a977

由 David Howells 提交于 7月 29, 2010

It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
credentials by incrementing their usage count after their replacement by the
task being accessed.

What happens is that get_task_cred() can race with commit_creds():

	TASK_1			TASK_2			RCU_CLEANER
	-->get_task_cred(TASK_2)
	rcu_read_lock()
	__cred = __task_cred(TASK_2)
				-->commit_creds()
				old_cred = TASK_2->real_cred
				TASK_2->real_cred = ...
				put_cred(old_cred)
				  call_rcu(old_cred)
		[__cred->usage == 0]
	get_cred(__cred)
		[__cred->usage == 1]
	rcu_read_unlock()
							-->put_cred_rcu()
							[__cred->usage == 1]
							panic()

However, since a tasks credentials are generally not changed very often, we can
reasonably make use of a loop involving reading the creds pointer and using
atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.

If successful, we can safely return the credentials in the knowledge that, even
if the task we're accessing has released them, they haven't gone to the RCU
cleanup code.

We then change task_state() in procfs to use get_task_cred() rather than
calling get_cred() on the result of __task_cred(), as that suffers from the
same problem.

Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be
tripped when it is noticed that the usage count is not zero as it ought to be,
for example:

kernel BUG at kernel/cred.c:168!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex
745
RIP: 0010:[<ffffffff81069881>]  [<ffffffff81069881>] __put_cred+0xc/0x45
RSP: 0018:ffff88019e7e9eb8  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
FS:  00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
Stack:
 ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
<0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
<0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
Call Trace:
 [<ffffffff810698cd>] put_cred+0x13/0x15
 [<ffffffff81069b45>] commit_creds+0x16b/0x175
 [<ffffffff8106aace>] set_current_groups+0x47/0x4e
 [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00
48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b
04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
RIP  [<ffffffff81069881>] __put_cred+0xc/0x45
 RSP <ffff88019e7e9eb8>
---[ end trace df391256a100ebdd ]---
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NJiri Olsa <jolsa@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

de09a977

29 7月, 2010 2 次提交

ecryptfs: Bugfix for error related to ecryptfs_hash_buckets · a6f80fb7

由 Andre Osterhues 提交于 7月 13, 2010

The function ecryptfs_uid_hash wrongly assumes that the
second parameter to hash_long() is the number of hash
buckets instead of the number of hash bits.
This patch fixes that and renames the variable
ecryptfs_hash_buckets to ecryptfs_hash_bits to make it
clearer.

Fixes: CVE-2010-2492
Signed-off-by: NAndre Osterhues <aosterhues@escrypt.com>
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a6f80fb7

GFS2: Use kmalloc when possible for ->readdir() · d2a97a4e

由 Steven Whitehouse 提交于 7月 28, 2010

If we don't need a huge amount of memory in ->readdir() then
we can use kmalloc rather than vmalloc to allocate it. This
should cut down on the greater overheads associated with
vmalloc for smaller directories.

We may be able to eliminate vmalloc entirely at some stage,
but this is easy to do right away.

Also using GFP_NOFS to avoid any issues wrt to deleting inodes
while under a glock, and suggestion from Linus to factor out
the alloc/dealloc.

I've given this a test with a variety of different sized
directories and it seems to work ok.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d2a97a4e

28 7月, 2010 2 次提交

ceph: use complete_all and wake_up_all · 03066f23

由 Yehuda Sadeh 提交于 7月 27, 2010

This fixes an issue triggered by running concurrent syncs. One of the syncs
would go through while the other would just hang indefinitely. In any case, we
never actually want to wake a single waiter, so the *_all functions should
be used.
Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: NSage Weil <sage@newdream.net>

03066f23

9p: Pass the correct end of buffer to p9stat_read · da7ddd32

由 Latchesar Ionkov 提交于 7月 19, 2010

Pass the correct end of the buffer to p9stat_read.
Signed-off-by: NLatchesar Ionkov <lucho@ionkov.net>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>

da7ddd32

27 7月, 2010 3 次提交

sysfs: allow creating symlinks from untagged to tagged directories · d3300212

由 Eric W. Biederman 提交于 7月 20, 2010

Supporting symlinks from untagged to tagged directories is reasonable,
and needed to support CONFIG_SYSFS_DEPRECATED.  So don't fail a prior
allowing that case to work.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

d3300212

sysfs: sysfs_delete_link handle symlinks from untagged to tagged directories. · 521d0453

由 Eric W. Biederman 提交于 7月 20, 2010

This happens for network devices when SYSFS_DEPRECATED is enabled.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

521d0453

sysfs: Don't allow the creation of symlinks we can't remove · 96d6523a

由 Eric W. Biederman 提交于 7月 08, 2010

Recently my tagged sysfs support revealed a flaw in the device core
that a few rare drivers are running into such that we don't always put
network devices in a class subdirectory named net/.

Since we are not creating the class directory the network devices wind
up in a non-tagged directory, but the symlinks to the network devices
from /sys/class/net are in a tagged directory.  All of which works
until we go to remove or rename the symlink.  When we remove or rename
a symlink we look in the namespace of the target of the symlink.
Since the target of the symlink is in a non-tagged sysfs directory we
don't have a namespace to look in, and we fail to remove the symlink.

Detect this problem up front and simply don't create symlinks we won't
be able to remove later.  This prevents symlink leakage and fails in
a much clearer and more understandable way.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

96d6523a

25 7月, 2010 1 次提交
- R
  ceph: Correct obvious typo of Kconfig variable "CRYPTO_AES" · 25848b3e
  由 Robert P. J. Day 提交于 7月 24, 2010
```
Signed-off-by: NRobert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: NSage Weil <sage@newdream.net>
```
  25848b3e
24 7月, 2010 4 次提交

ceph: fix dentry lease release · 1dadcce3

由 Sage Weil 提交于 7月 23, 2010

When we embed a dentry lease release notification in a request, invalidate
our lease so we don't think we still have it.  Otherwise we can get all
sorts of incorrect client behavior when multiple clients are interacting
with the same part of the namespace.
Signed-off-by: NSage Weil <sage@newdream.net>

1dadcce3

ceph: fix leak of dentry in ceph_init_dentry() error path · 8c696737

由 Sage Weil 提交于 7月 22, 2010

If we fail to allocate a ceph_dentry_info, don't leak the dn reference.
Signed-off-by: NSage Weil <sage@newdream.net>

8c696737

ceph: fix pg_mapping leak on pg_temp updates · bc4fdca8

由 Sage Weil 提交于 7月 20, 2010

Free the ceph_pg_mapping structs when they are removed from the pg_temp
rbtree.  Also fix a leak in the __insert_pg_mapping() error path.
Signed-off-by: NSage Weil <sage@newdream.net>

bc4fdca8

ceph: fix d_release dop for snapdir, snapped dentries · 252af521

由 Sage Weil 提交于 7月 22, 2010

We need to set the d_release dop for snapdir and snapped dentries so that
the ceph_dentry_info struct gets released. We also use the dcache to
cache readdir results when possible, which only works if we know when
dentries are dropped from the cache. Since we don't use the dcache for
readdir in the hidden snapdir, avoid that case in ceph_dentry_release.
Signed-off-by: NSage Weil <sage@newdream.net>

252af521

23 7月, 2010 2 次提交

ceph: avoid dcache readdir for snapdir · a0dff78d

由 Sage Weil 提交于 7月 22, 2010

We should always go to the MDS for readdir on the hidden snapdir.  The
set of snapshots can change at any time; the client can't trust its cache
for that.
Signed-off-by: NSage Weil <sage@newdream.net>

a0dff78d

CIFS: Fix a malicious redirect problem in the DNS lookup code · 4c0c03ca

由 David Howells 提交于 7月 22, 2010

Fix the security problem in the CIFS filesystem DNS lookup code in which a
malicious redirect could be installed by a random user by simply adding a
result record into one of their keyrings with add_key() and then invoking a
CIFS CFS lookup [CVE-2010-2524].

This is done by creating an internal keyring specifically for the caching of
DNS lookups.  To enforce the use of this keyring, the module init routine
creates a set of override credentials with the keyring installed as the thread
keyring and instructs request_key() to only install lookup result keys in that
keyring.

The override is then applied around the call to request_key().

This has some additional benefits when a kernel service uses this module to
request a key:

 (1) The result keys are owned by root, not the user that caused the lookup.

 (2) The result keys don't pop up in the user's keyrings.

 (3) The result keys don't come out of the quota of the user that caused the
     lookup.

The keyring can be viewed as root by doing cat /proc/keys:

2a0ca6c3 I-----     1 perm 1f030000     0     0 keyring   .dns_resolver: 1/4

It can then be listed with 'keyctl list' by root.

	# keyctl list 0x2a0ca6c3
	1 key in keyring:
	726766307: --alswrv     0     0 dns_resolver: foo.bar.com
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-and-Tested-by: NJeff Layton <jlayton@redhat.com>
Acked-by: NSteve French <smfrench@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4c0c03ca

22 7月, 2010 1 次提交

Fix up trivial spelling errors ('taht' -> 'that') · a4ce96ac

由 Linus Torvalds 提交于 7月 21, 2010

Pointed out by Lucas who found the new one in a comment in
setup_percpu.c. And then I fixed the others that I grepped
for.
Reported-by: NLucas <canolucas@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a4ce96ac

20 7月, 2010 5 次提交

xfs: track AGs with reclaimable inodes in per-ag radix tree · 16fd5367

由 Dave Chinner 提交于 7月 20, 2010

https://bugzilla.kernel.org/show_bug.cgi?id=16348

When the filesystem grows to a large number of allocation groups,
the summing of recalimable inodes gets expensive. In many cases,
most AGs won't have any reclaimable inodes and so we are wasting CPU
time aggregating over these AGs. This is particularly important for
the inode shrinker that gets called frequently under memory
pressure.

To avoid the overhead, track AGs with reclaimable inodes in the
per-ag radix tree so that we can find all the AGs with reclaimable
inodes via a simple gang tag lookup. This involves setting the tag
when the first reclaimable inode is tracked in the AG, and removing
the tag when the last reclaimable inode is removed from the tree.
Then the summation process becomes a loop walking the radix tree
summing AGs with the reclaim tag set.

This significantly reduces the overhead of scanning - a 6400 AG
filesystea now only uses about 25% of a cpu in kswapd while slab
reclaim progresses instead of being permanently stuck at 100% CPU
and making little progress. Clean filesystems filesystems will see
no overhead and the overhead only increases linearly with the number
of dirty AGs.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

16fd5367

xfs: convert inode shrinker to per-filesystem contexts · 70e60ce7

由 Dave Chinner 提交于 7月 20, 2010

Now the shrinker passes us a context, wire up a shrinker context per
filesystem. This allows us to remove the global mount list and the
locking problems that introduced. It also means that a shrinker call
does not need to traverse clean filesystems before finding a
filesystem with reclaimable inodes.  This significantly reduces
scanning overhead when lots of filesystems are present.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

70e60ce7

Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE · 2ebc3464

由 Dan Rosenberg 提交于 7月 19, 2010

1.  The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check
whether the donor file is append-only before writing to it.

2.  The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer
overflow that allows a user to specify an out-of-bounds range to copy
from the source file (if off + len wraps around).  I haven't been able
to successfully exploit this, but I'd imagine that a clever attacker
could use this to read things he shouldn't.  Even if it's not
exploitable, it couldn't hurt to be safe.
Signed-off-by: NDan Rosenberg <dan.j.rosenberg@gmail.com>
cc: stable@kernel.org
Signed-off-by: NChris Mason <chris.mason@oracle.com>

2ebc3464

Btrfs: fix CLONE ioctl destination file size expansion to block boundary · b5384d48

由 Sage Weil 提交于 6月 12, 2010

The CLONE and CLONE_RANGE ioctls round up the range of extents being
cloned to the block size when the range to clone extends to the end of file
(this is always the case with CLONE).  It was then using that offset when
extending the destination file's i_size.  Fix this by not setting i_size
beyond the originally requested ending offset.

This bug was introduced by a22285a6 (2.6.35-rc1).
Signed-off-by: NSage Weil <sage@newdream.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b5384d48

Btrfs: fix split_leaf double split corner case · 99d8f83c

由 Chris Mason 提交于 7月 07, 2010

split_leaf was not properly balancing leaves when it was forced to
split a leaf twice.  This commit adds an extra push left and right
before forcing the double split in hopes of getting the slot where
we want to insert at either the start or end of the leaf.

If the extra pushes do work, then we are able to avoid splitting twice
and we keep the tree properly balanced.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

99d8f83c

19 7月, 2010 2 次提交

[S390] dasd: use correct label location for diag fba disks · cffab6bc

由 Peter Oberparleiter 提交于 7月 19, 2010

Partition boundary calculation fails for DASD FBA disks under the
following conditions:
- disk is formatted with CMS FORMAT with a blocksize of more than
  512 bytes
- all of the disk is reserved to a single CMS file using CMS RESERVE
- the disk is accessed using the DIAG mode of the DASD driver

Under these circumstances, the partition detection code tries to
read the CMS label block containing partition-relevant information
from logical block offset 1, while it is in fact located at physical
block offset 1.

Fix this problem by using the correct CMS label block location
depending on the device type as determined by the DASD SENSE ID
information.
Signed-off-by: NPeter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

cffab6bc

mm: add context argument to shrinker callback · 7f8275d0

由 Dave Chinner 提交于 7月 19, 2010

The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

7f8275d0