Commits · 033193275b3ffcfe7f3fde7b569f3d207f6cd6a0 · openeuler / raspberrypi-kernel

23 Mar, 2011 9 commits

pagewalk: only split huge pages when necessary · 03319327

Authored by Dave Hansen on Mar 22, 2011

Right now, if a mm_walk has either ->pte_entry or ->pmd_entry set, it will
unconditionally split any transparent huge pages it runs into.  In
practice, that means that anyone doing a

	cat /proc/$pid/smaps

will unconditionally break down every huge page in the process and depend
on khugepaged to re-collapse it later.  This is fairly suboptimal.

This patch changes that behavior.  It teaches each ->pmd_entry handler
(there are five) that they must break down the THPs themselves.  Also, the
_generic_ code will never break down a THP unless a ->pte_entry handler is
actually set.

This means that the ->pmd_entry handlers can now choose to deal with THPs
without breaking them down.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Eric B Munson <emunson@mgebm.net>
Tested-by: Eric B Munson <emunson@mgebm.net>
Cc: Michael J Wolf <mjwolf@us.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

03319327

mm: hugetlbfs: change remove_from_page_cache · bd65cb86

Authored by Minchan Kim on Mar 22, 2011

This patch series changes remove_from_page_cache()'s page ref counting
rule.  Page cache ref count is decreased in delete_from_page_cache().  So
we don't need to decrease the page reference in callers.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: William Irwin <wli@holomorphy.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bd65cb86

mm: add replace_page_cache_page() function · ef6a3c63

Authored by Miklos Szeredi on Mar 22, 2011

This function basically does:

     remove_from_page_cache(old);
     page_cache_release(old);
     add_to_page_cache_locked(new);

Except it does this atomically, so there's no possibility for the "add" to
fail because of a race.

If memory cgroups are enabled, then the memory cgroup charge is also moved
from the old page to the new.

This function is currently used by fuse to move pages into the page cache
on read, instead of copying the page contents.

[minchan.kim@gmail.com: add freepage() hook to replace_page_cache_page()]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ef6a3c63

9p: use the updated offset given by generic_write_checks · aaf0ef1d

Authored by M. Mohan Kumar on Mar 16, 2011

Without this fix, even if a file is opened in O_APPEND mode, data will be
written at the current file position instead of at the end of the file.
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

aaf0ef1d

fs/9p: Add v9fs_dentry2v9ses · 42869c8a

Authored by Aneesh Kumar K.V on Mar 08, 2011

Add the new static inline helper and use it.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

42869c8a

fs/9p: Attach writeback_fid on first open with WR flag · 7add697a

Authored by Aneesh Kumar K.V on Mar 08, 2011

We don't need the writeback fid if we are only doing an O_RDONLY open.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

7add697a

fs/9p: Open writeback fid in O_SYNC mode · ea59bb75

Authored by Aneesh Kumar K.V on Mar 08, 2011

Older versions of the protocol don't support the tsyncfs operation,
so for them force the O_SYNC flag on the server.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

ea59bb75

fs/9p: Use truncate_setsize instead of vmtruncate · 059c138b

Authored by Aneesh Kumar K.V on Mar 08, 2011

Convert vmtruncate usage to truncate_setsize. We also write back
all dirty pages before doing 9p operations and call truncate_setsize on
success. This ensures that we continue sanely after a failed truncate on
the server. The disadvantage is that we now write back content that gets
thrown away later as part of the truncate.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

059c138b

fs/9p: Fix race in initializing writeback fid · 5a7e0a8c

Authored by Aneesh Kumar K.V on Mar 08, 2011

When two processes open the same file, we can end up with both of them
allocating the writeback_fid. Add a new mutex which can be used
for synchronizing v9fs_inode member values.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

5a7e0a8c

22 Mar, 2011 8 commits

pstore: use mount option instead sysfs to tweak kmsg_bytes · 366f7e7a

Authored by Luck, Tony on Mar 18, 2011

/sys/fs is a somewhat strange way to tweak what could more
obviously be tuned with a mount option.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

366f7e7a

ceph: rename dentry_release -> d_release, fix comment · 147851d2

Authored by Sage Weil on Mar 15, 2011

Just for consistency's sake.  Fix obsolete comment too.
Signed-off-by: Sage Weil <sage@newdream.net>

147851d2

ceph: add request to the tail of unsafe write list · 49bcb932

Authored by Henry C Chang on Mar 15, 2011

In sync_write_wait(), we assume that the newest request is at the
tail of the unsafe write list. We should maintain those semantics here.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

49bcb932

ceph: remove request from unsafe list if it is canceled/timed out · 78a25565

Authored by Henry C Chang on Mar 15, 2011

This fixes the list corruption warning like this:

------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x68/0x81()
Hardware name: X8DTU
list_add corruption. prev->next should be next (ffff880618931250), but was (null). (prev=ffff880c188b9130).
Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ceph libceph libcrc32c sunrpc ipv6 fuse igb i2c_i801 ioatdma i2c_core iTCO_wdt iTCO_vendor_support joydev dca serio_raw usb_storage [last unloaded: scsi_wait_scan]
Pid: 10977, comm: smbd Tainted: G        W  2.6.32.23-170.Elaster.xendom0.fc12.x86_64 #1
Call Trace:
[<ffffffff8105753c>] warn_slowpath_common+0x7c/0x94
[<ffffffff810575ab>] warn_slowpath_fmt+0x41/0x43
[<ffffffff812351a3>] __list_add+0x68/0x81
[<ffffffffa014799d>] ceph_aio_write+0x614/0x8a2 [ceph]
[<ffffffff8111d2a0>] do_sync_write+0xe8/0x125
[<ffffffff81075a1f>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811f21ec>] ? selinux_file_permission+0x5c/0xb3
[<ffffffff811e8521>] ? security_file_permission+0x16/0x18
[<ffffffff8111d864>] vfs_write+0xae/0x10b
[<ffffffff8111d91b>] sys_pwrite64+0x5a/0x76
[<ffffffff81012d32>] system_call_fastpath+0x16/0x1b
---[ end trace 08573eb9f07ff6f4 ]---
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

78a25565

ceph: move readahead default to fs/ceph from libceph · 80456f86

Authored by Sage Weil on Mar 10, 2011

Signed-off-by: Sage Weil <sage@newdream.net>

80456f86

ceph: add ino32 mount option · ad1fee96

Authored by Yehuda Sadeh on Jan 21, 2011

The ino32 mount option forces the ceph fs to report 32-bit
ino values.  This is useful for 64-bit kernels with 32-bit userspace.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

ad1fee96

ceph: remove debugfs debug cruft · 21f3b5f1

Authored by Sage Weil on Jan 19, 2011

Whoops!
Signed-off-by: Sage Weil <sage@newdream.net>

21f3b5f1

FS: lookup_mnt() is only used in the core fs routines now · 0f60f240

Authored by David Howells on Mar 21, 2011

lookup_mnt() is only used in the core fs routines now, so it doesn't need to
be globally declared anymore. It isn't exported to modules at the moment, so
nothing that can be modularised seems to be using it.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0f60f240

21 Mar, 2011 15 commits

fuse: make fuse_dentry_revalidate() RCU aware · e7c0a167

Authored by Miklos Szeredi on Mar 21, 2011

Only bail out of fuse_dentry_revalidate() on LOOKUP_RCU when blocking
is actually necessary.

CC: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

e7c0a167

fuse: make fuse_permission() RCU aware · 19690ddb

Authored by Miklos Szeredi on Mar 21, 2011

Only bail out of fuse_permission() on IPERM_FLAG_RCU when blocking is
actually necessary.

CC: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

19690ddb

fuse: wakeup pollers on connection release/abort · 357ccf2b

Authored by Bryan Green on Mar 01, 2011

If a fuse dev connection is broken, wake up any
processes that are blocked in a poll system call
on one of the files in the now-defunct filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

357ccf2b

fuse: reduce size of struct fuse_request · 07d5f69b

Authored by Miklos Szeredi on Mar 21, 2011

Reduce the size of struct fuse_request by removing cuse_init_out from
the request structure and allocating it dynamically instead.

CC: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

07d5f69b

bfs: fix bitmap size argument to find_first_zero_bit() · 69b195be

Authored by Akinobu Mita on Mar 21, 2011

The usage of find_first_zero_bit() in bfs_create() is wrong for two
reasons.

The bitmap size argument to find_first_zero_bit() is info->si_lasti but
the correct bitmap size is info->si_lasti + 1 as info->si_lasti is the
last valid index in info->si_imap bitmap.

Another problem is that it is impossible to detect that info->si_imap
bitmap is full because there is an off-by-one bug in the return value
check for find_first_zero_bit().  If no zero bits exist in info->si_imap,
find_first_zero_bit() returns info->si_lasti.  But the check can't catch
it due to the off-by-one.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: "Tigran A. Aivazian" <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

69b195be

fs: Use BUG_ON(!mnt) at dentry_open(). · c212f9aa

Authored by Tetsuo Handa on Jan 19, 2011

dentry_open() requires callers to pass a valid vfsmount.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c212f9aa

fs: devpts_pty_new() return -ENOMEM if dentry allocation failed · aa597bc1

Authored by Andrey Vagin on Feb 08, 2011

In this case nobody can open the slave, so it is better to return an
error from devpts_pty_new().

Now we no longer need to check the error code from d_find_alias() in
devpts_pty_kill(), because the dentry always exists.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aa597bc1

nfs: lock() vs unlock() typo · 1c34092a

Authored by Dan Carpenter on Mar 20, 2011

These should be spin_unlock() instead of spin_lock().  It's a typo.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1c34092a

pstore: fix leaking ->i_private · a872d510

Authored by Tony Luck on Mar 18, 2011

Move kfree() of i_private out of ->unlink() and into ->evict_inode()
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

a872d510

introduce sys_syncfs to sync a single file system · b7ed78f5

Authored by Sage Weil on Mar 10, 2011

It is frequently useful to sync a single file system, instead of all
mounted file systems via sync(2):

 - On machines with many mounts, it is not at all uncommon for some of
   them to hang (e.g. unresponsive NFS server).  sync(2) will get stuck on
   those and may never get to the one you do care about (e.g., /).
 - Some applications write lots of data to the file system and then
   want to make sure it is flushed to disk.  Calling fsync(2) on each
   file introduces unnecessary ordering constraints that result in a large
   amount of sub-optimal writeback/flush/commit behavior by the file
   system.

There are currently two ways (that I know of) to sync a single super_block:

 - BLKFLSBUF ioctl on the block device: That also invalidates the bdev
   mapping, which isn't usually desirable, and doesn't work for non-block
   file systems.
 - 'mount -o remount,rw' will call sync_filesystem as an artifact of the
   current implementation.  Relying on this little-known side effect for
   something like data safety sounds foolish.

Both of these approaches require root privileges, which some applications
do not have (nor should they need?) given that sync(2) is an unprivileged
operation.

This patch introduces a new system call syncfs(2) that takes an fd and
syncs only the file system it references.  Maybe someday we can

 $ sync /some/path

and not get

 sync: ignoring all arguments

The syscall is motivated by comments by Al and Christoph at the last LSF.
syncfs(2) seems like an appropriate name given statfs(2).

A similar ioctl was also proposed a while back, see
	http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b7ed78f5

Small typo fix... · 1bef8291

Authored by Holger Hans Peter Freyther on Feb 24, 2011

Hi,

I was backporting the coredump over pipe feature and noticed this small typo,
I wish I would have something bigger to contribute...

>From 15d6080e0ed4267da103c706917a33b1015e8804 Mon Sep 17 00:00:00 2001
From: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Date: Thu, 24 Feb 2011 17:42:50 +0100
Subject: [PATCH] fs: Fix a small typo in the comment

The function is called umh_pipe_setup not uhm_pipe_setup.
Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1bef8291

Filesystem: fifo: Fixed coding style issue. · ff38c083

Authored by David Jenni on Feb 23, 2011

Fixed coding style issue.
Signed-off-by: David Jenni <dave.j@gmx.ch>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ff38c083

fs/inode: Fix kernel-doc format for inode_init_owner · eaae668d

Authored by Ben Hutchings on Feb 15, 2011

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

eaae668d

select: remove unused MAX_SELECT_SECONDS · 2c3d44dc

Authored by Namhyung Kim on Jan 21, 2011

Remove the leftover from commit 8ff3e8e8 ("select:
switch select() and poll() over to hrtimers").
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2c3d44dc

vfs: cleanup do_vfs_ioctl() · 27a4f7e6

Authored by Namhyung Kim on Jan 17, 2011

Move the declaration of 'inode' to the beginning of the function. Since it
is referenced directly or indirectly (in case of FIFREEZE/FITHAW/
FS_IOC_FIEMAP) it's not harmful IMHO. Also remove unnecessary casts
by using 'argp' instead.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

27a4f7e6

18 Mar, 2011 8 commits

fs: call security_d_instantiate in d_obtain_alias V2 · 24ff6663

Authored by Josef Bacik on Nov 18, 2010

While trying to track down some NFS problems with BTRFS, I kept noticing I was
getting -EACCES for no apparent reason. Eric Paris and printk() helped me
figure out that it was SELinux that was giving me grief, with the following
denial:

type=AVC msg=audit(1290013638.413:95): avc: denied { 0x800000 } for pid=1772
comm="nfsd" name="" dev=sda1 ino=256 scontext=system_u:system_r:kernel_t:s0
tcontext=system_u:object_r:unlabeled_t:s0 tclass=file

Turns out this is because in d_obtain_alias if we can't find an alias we create
one and do all the normal instantiation stuff, but we don't do the
security_d_instantiate.

Usually we are protected from getting a hashed dentry that hasn't yet run
security_d_instantiate() by the parent's i_mutex, but obviously this isn't an
option there, so in order to deal with the case that a second thread comes in
and finds our new dentry before we get to run security_d_instantiate(), we go
ahead and call it if we find a dentry already. Eric assures me that this is ok
as the code checks to see if the dentry has been initialized already so calling
security_d_instantiate() against the same dentry multiple times is ok. With
this patch I'm no longer getting errant -EACCES values.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

24ff6663

lose 'mounting_here' argument in ->d_manage() · 1aed3e42

Authored by Al Viro on Mar 18, 2011

it's always false...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1aed3e42

don't pass 'mounting_here' flag to follow_down() · 7cc90cc3

Authored by Al Viro on Mar 18, 2011

it's always false now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7cc90cc3

change the locking order for namespace_sem · b12cea91

Authored by Al Viro on Mar 18, 2011

Have it nested inside ->i_mutex.  Instead of using follow_down()
under namespace_sem, followed by grabbing i_mutex and checking that
mountpoint to be is not dead, do the following:
	grab i_mutex
	check that it's not dead
	grab namespace_sem
	see if anything is mounted there
	if not, we've won
	otherwise
		drop locks
		put_path on what we had
		replace with what's mounted
		retry everything with new mountpoint to be

New helper (lock_mount()) does that.  do_add_mount(), do_move_mount(),
do_loopback() and pivot_root() switched to it; in case of the last
two that eliminates a race we used to have - original code didn't
do follow_down().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b12cea91

fix deadlock in pivot_root() · 27cb1572

Authored by Al Viro on Mar 18, 2011

Don't hold vfsmount_lock over the loop traversing ->mnt_parent;
do check_mnt(new.mnt) under namespace_sem instead; combined with
namespace_sem held over all that code it'll guarantee the stability
of ->mnt_parent chain all the way to the root.

Doing check_mnt() outside of namespace_sem in case of pivot_root()
is wrong anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

27cb1572

vfs: split off vfsmount-related parts of vfs_kern_mount() · 9d412a43

Authored by Al Viro on Mar 17, 2011

new function: mount_fs().  Does all work done by vfs_kern_mount()
except the allocation and filling of vfsmount; returns root dentry
or ERR_PTR().

vfs_kern_mount() switched to using it and taken to fs/namespace.c,
along with its wrappers.

alloc_vfsmnt()/free_vfsmnt() made static.

functions in namespace.c slightly reordered.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9d412a43

Some fixes for pstore · fbe0aa1f

Authored by Tony Luck on Mar 17, 2011

1) Change from ->get_sb() to ->mount()
2) Use mount_single() instead of mount_nodev()
3) Pulled in ramfs_get_inode() & trimmed to what I need for pstore
4) Drop the ugly pstore_writefile().  Just save data using kmalloc() and
   provide a pstore_file_read() that uses simple_read_from_buffer().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fbe0aa1f

kill simple_set_mnt() · 474a00ee

Authored by Al Viro on Mar 17, 2011

not needed anymore, since all users (->get_sb() instances) are gone.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

474a00ee