Commits · d6952123b53cc8b334df69bba2cd0063b0d88f68 · openanolis / cloud-kernel

01 Aug, 2011 8 commits

switch posix_acl_equiv_mode() to umode_t * · d6952123

Submitted by Al Viro on Jul 23, 2011

... so that &inode->i_mode could be passed to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch posix_acl_create() to umode_t * · d3fb6120

Submitted by Al Viro on Jul 23, 2011

so we can pass &inode->i_mode to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

block: initialise bd_super in bdget() · 782b94cd

Submitted by Lachlan McIlroy on Jun 30, 2011

bd_super is currently reset to NULL in kill_block_super() so we rely on previous
users of the block_device object to initialise this value for the next user.
This quirk was exposed on RHEL5 when a third party filesystem did not always use
kill_block_super() and therefore bd_super wasn't being reset when a block_device
object was recycled within the cache. This may not be a problem upstream but
makes sense to be defensive.
Signed-off-by: Lachlan McIlroy <lmcilroy@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: avoid call to inode_lru_list_del() if possible · c4ae0c65

Submitted by Eric Dumazet on Jul 28, 2011

inode_lru_list_del() is expensive because of per-superblock LRU locking,
while some inodes are not on the LRU list.

Adding a check in iput_final() can speed up pipe/socket workloads on
SMP.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: avoid taking inode_hash_lock on pipes and sockets · f2ee7abf

Submitted by Eric Dumazet on Jul 28, 2011

Some inodes (pipes, sockets, ...) are not hashed, so there is no need to take
the contended inode_hash_lock at dismantle time.

Nice speedup on SMP machines on socket-intensive workloads.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: conditionally call inode_wb_list_del() · b12362bd

Submitted by Eric Dumazet on Jul 28, 2011

Some inodes (pipes, sockets, ...) are not on the bdi writeback list.

evict() can avoid calling inode_wb_list_del() and its expensive spinlock
by checking whether the inode's i_wb_list is empty.

At this point, no other CPU/user can concurrently manipulate this inode's
i_wb_list.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
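
The emptiness check described in this commit can be sketched in userspace C. This is a minimal illustration and not the kernel code: the list helpers are reimplemented here, `struct demo_inode`, `demo_wb_list_del()` and `demo_evict()` are hypothetical names, and a counter stands in for taking the spinlock.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

static void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    init_list_head(entry);
}

/* Hypothetical stand-in for an inode; only the writeback link matters here. */
struct demo_inode {
    struct list_head i_wb_list;
};

/* Counts how often the "expensive" locked path runs. */
static int lock_acquisitions;

static void demo_wb_list_del(struct demo_inode *inode)
{
    lock_acquisitions++;               /* stands in for taking the list lock */
    list_del_init(&inode->i_wb_list);
}

/* Skip the locked removal when the inode was never put on the list. */
static void demo_evict(struct demo_inode *inode)
{
    if (!list_empty(&inode->i_wb_list))
        demo_wb_list_del(inode);
}
```

An inode that was never queued (a pipe or socket in the commit's terms) leaves `lock_acquisitions` untouched, while a queued one takes the locked path exactly once.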

VFS: Fix automount for negative autofs dentries · 5a30d8a2

Submitted by David Howells on Jul 11, 2011

Autofs may set the DCACHE_NEED_AUTOMOUNT flag on negative dentries.  These
need attention from the automounter daemon regardless of the LOOKUP_FOLLOW flag.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Btrfs: load the key from the dir item in readdir into a fake dentry · b4aff1f8

Submitted by Josef Bacik on Jun 28, 2011

In btrfs we have 2 indexes for inodes. One is for readdir; it's in this nice
sequential order and works out brilliantly for readdir. However, if you use ls,
it usually stats each file it gets from readdir. This is where the second
index comes in, which is based on a hash of the name of the file. So then the
lookup has to look up this index, and then look up the inode. The index lookup is
going to be in random order (since it's based on the name hash), which gives us
less than stellar performance. Since we know the inode location from the
readdir index, I create a dummy dentry and copy the location key into
dentry->d_fsdata. Then on lookup, if we have d_fsdata, we use that location to
look up the inode, avoiding looking up the other directory index. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

28 Jul, 2011 1 commit

hppfs: missing include · d6b722aa

Submitted by Al Viro on Jul 27, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

27 Jul, 2011 31 commits

atomic: use <linux/atomic.h> · 60063497

Submitted by Arun Sharma on Jul 26, 2011

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/exec.c:acct_arg_size(): ptl is no longer needed for add_mm_counter() · 32e107f7

Submitted by Oleg Nesterov on Jul 26, 2011

acct_arg_size() takes ->page_table_lock around add_mm_counter() if
!SPLIT_RSS_COUNTING.  This is not needed after commit 172703b0 ("mm:
delete non-atomic mm counter implementation").
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

exec: do not retry load_binary method if CONFIG_MODULES=n · b4edf8bd

Submitted by Tetsuo Handa on Jul 26, 2011

If CONFIG_MODULES=n, it makes no sense to retry the list of binary format
handlers because the list will not be modified by request_module().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Richard Weinberger <richard@nod.at>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

exec: do not call request_module() twice from search_binary_handler() · 91219352

Submitted by Tetsuo Handa on Jul 26, 2011

Currently, search_binary_handler() tries to load a binary loader module
using request_module() if a loader for the requested program is not yet
loaded. But a second attempt at request_module() does not affect the result
of search_binary_handler().

If request_module() triggered recursion, calling request_module() twice
causes 2 to the power of MAX_KMOD_CONCURRENT (= 50) repetitions. It is
not an infinite loop, but it is long enough for users to perceive it as a hang.

Therefore, this patch stops calling request_module() twice, making 1
to the power of MAX_KMOD_CONCURRENT repetitions in case of recursion.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Richard Weinberger <richard@nod.at>
Tested-by: Richard Weinberger <richard@nod.at>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
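
The growth this commit describes can be illustrated with a simplified counting model. This is not kernel code: `demo_count_requests()` is a hypothetical function that only counts how often the innermost level is reached when each level issues a fixed number of nested requests.

```c
#include <stdint.h>

/*
 * Count how many times the innermost level is reached when every level
 * issues `calls_per_level` nested requests and nesting stops at `max_depth`.
 * The result is calls_per_level raised to the power max_depth.
 */
static uint64_t demo_count_requests(unsigned calls_per_level, unsigned max_depth)
{
    uint64_t total = 0;
    unsigned i;

    if (max_depth == 0)
        return 1;
    for (i = 0; i < calls_per_level; i++)
        total += demo_count_requests(calls_per_level, max_depth - 1);
    return total;
}
```

With two calls per level and a depth of 10 the model already reaches 1024 repetitions; at a depth of 50 it would be 2^50, roughly 1.1e15, which is why the recursion looks like a hang. With one call per level the count stays at 1 for any depth.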

fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP · aacb3d17

Submitted by Michal Hocko on Jul 26, 2011

Commit a8bef8ff ("mm: migration: avoid race between
shift_arg_pages() and rmap_walk() during migration by not migrating
temporary stacks") introduced a BUG_ON() to ensure that VM_STACK_FLAGS
and VM_STACK_INCOMPLETE_SETUP do not overlap.  The check is a compile
time one, so BUILD_BUG_ON is more appropriate.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
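
The difference between a runtime and a compile-time check can be sketched in standard C11. The flag values below are hypothetical and chosen only for illustration, and `DEMO_BUILD_BUG_ON()` is a simplified stand-in built on `_Static_assert`, not the kernel's macro.

```c
/* Hypothetical flag values chosen for illustration; the kernel's differ. */
#define DEMO_VM_STACK_FLAGS            0x00000133UL
#define DEMO_VM_STACK_INCOMPLETE_SETUP 0x01000000UL

/* Simplified stand-in for BUILD_BUG_ON(): fail the build if cond is true. */
#define DEMO_BUILD_BUG_ON(cond) _Static_assert(!(cond), "BUILD_BUG_ON: " #cond)

/* Both operands are constants, so the compiler can check the overlap. */
DEMO_BUILD_BUG_ON(DEMO_VM_STACK_FLAGS & DEMO_VM_STACK_INCOMPLETE_SETUP);

/* Runtime equivalent: returns nonzero if the two masks share any bit. */
static int demo_flags_overlap(unsigned long a, unsigned long b)
{
    return (a & b) != 0;
}
```

Because the masks are compile-time constants, an overlap would stop the build instead of surfacing later as a runtime failure.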

proc: fix a race in do_io_accounting() · 293eb1e7

Submitted by Vasiliy Kulikov on Jul 26, 2011

If an inode's mode permits opening /proc/PID/io and the resulting file
descriptor is kept across execve() of a setuid or similar binary, the
ptrace_may_access() check tries to prevent using this fd against the
task with escalated privileges.

Unfortunately, there is a race in the check against execve().  If
execve() is processed after the ptrace check, but before the actual io
information gathering, io statistics will be gathered from the
privileged process.  At least in theory this might lead to gathering
sensitive information (like ssh/ftp password length) that wouldn't be
available otherwise.

Holding task->signal->cred_guard_mutex while gathering the io
information should protect against the race.

The order of locking is similar to the one inside of ptrace_attach():
first goes cred_guard_mutex, then lock_task_sighand().
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

procfs: return ENOENT on opening a being-removed proc entry · d2857e79

Submitted by Daisuke Ogino on Jul 26, 2011

Change the return value to ENOENT.  This value is then returned
when opening a proc entry that has been removed.  For example,
open("/proc/bus/pci/XX/YY") when the corresponding device is being
hot-removed.
Signed-off-by: Daisuke Ogino <ogino.daisuke@jp.fujitsu.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

do_coredump: fix the "ispipe" error check · 99b64567

Submitted by Oleg Nesterov on Jul 26, 2011

do_coredump() assumes that if format_corename() fails it should return
-ENOMEM.  This is not true, for example cn_print_exe_file() can propagate
the error from d_path.  Even if it was true, this is too fragile.  Change
the code to check "ispipe < 0".
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
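
A small userspace sketch shows why testing for one specific error code is fragile. `demo_format_corename()` is a hypothetical stand-in that follows the convention described in this commit (a negative errno on failure); `ENAMETOOLONG` is used only as an example of an error other than `ENOMEM`.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for format_corename(): returns 1 if the pattern
 * starts with '|' (pipe), 0 otherwise, or a negative errno on failure.
 * `forced_err` lets the caller simulate a failure with a given errno.
 */
static int demo_format_corename(const char *pattern, int forced_err)
{
    if (forced_err)
        return -forced_err;
    return pattern[0] == '|';
}

/* Fragile: only recognises one particular error code. */
static int demo_failed_fragile(int ispipe)
{
    return ispipe == -ENOMEM;
}

/* Robust: any negative value is treated as an error. */
static int demo_failed_robust(int ispipe)
{
    return ispipe < 0;
}
```

A failure reported as `-ENAMETOOLONG` slips past the fragile check but is caught by the `< 0` test.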

coredump: escape / in hostname and comm · 2c563731

Submitted by Jiri Slaby on Jul 26, 2011

Change every occurrence of / in comm and hostname to !.  If the process
changes its name to contain /, the core is not dumped (if the directory
tree doesn't exist like that).  The same with hostname being something
like myhost/3.  Fix this behaviour by using the escape loop used in %E.
(We extract it to a separate function.)

Now both with comm == myprocess/1 and hostname == myhost/1, the core is
dumped like (kernel.core_pattern='core.%p.%e.%h'):
core.2349.myprocess!1.myhost!1
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
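
The escaping step can be sketched in userspace C. This is an illustration under stated assumptions rather than the kernel implementation: `demo_escape_slashes()` and `demo_core_name()` are hypothetical names, and the name is built with `snprintf()` for the fixed pattern `core.%p.%e.%h` from the example in this commit.

```c
#include <stdio.h>

/* Replace every '/' in a NUL-terminated string with '!', in place. */
static void demo_escape_slashes(char *s)
{
    for (; *s; s++)
        if (*s == '/')
            *s = '!';
}

/*
 * Build "core.<pid>.<comm>.<hostname>" into `out`.
 * `comm` and `host` are escaped in place before formatting.
 * Returns the value of snprintf().
 */
static int demo_core_name(char *out, size_t len, int pid, char *comm, char *host)
{
    demo_escape_slashes(comm);
    demo_escape_slashes(host);
    return snprintf(out, len, "core.%d.%s.%s", pid, comm, host);
}
```

Called with pid 2349, comm `myprocess/1` and hostname `myhost/1`, the sketch produces the name shown in the commit message, so the dump no longer depends on a directory tree that does not exist.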

coredump: use task comm instead of (unknown) · 3141c8b1

Submitted by Jiri Slaby on Jul 26, 2011

If we don't know the file corresponding to the binary (i.e.  exe_file is
unknown), use "task->comm (path unknown)" instead of simple "(unknown)"
as suggested by ak.

The fallback is the same as %e except it will append "(path unknown)".
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

merge fchmod() and fchmodat() guts, kill ancient broken kludge · e57712eb

Submitted by Al Viro on Jul 26, 2011

The kludge in question is undocumented and doesn't work for 32bit
binaries on amd64, sparc64 and s390.  Passing (mode_t)-1 as
mode has (since 0.99.14v and contrary to behaviour of any
other Unix, prescriptions of POSIX, SuS and our own manpages)
been a kinda-sorta no-op.  Note that any software relying on
that (and looking for examples shows none) would be visibly
broken on sparc64, where practically all userland is built
32bit.  No such complaints noticed...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

xfs: fix misspelled S_IS...() · 03209378

Submitted by Al Viro on Jul 25, 2011

mode_t is not a bitmap...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

xfs: get rid of open-coded S_ISREG(), etc. · abbede1b

Submitted by Al Viro on Jul 26, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ceph: document unlocked d_parent accesses · d79698da

Submitted by Sage Weil on Jul 26, 2011

For the most part we don't care about racing with rename when directing
MDS requests; either the old or new parent is fine.  Document that, and
do some minor cleanup.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: explicitly reference rename old_dentry parent dir in request · 41b02e1f

Submitted by Sage Weil on Jul 26, 2011

We carry a pin on the parent directory for the rename source and dest
dentries.  For the source it's r_locked_dir; we need to explicitly
reference the old_dentry parent as well, since the dentry's d_parent may
change between when the request was created and pinned and when it is
freed.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: document locking for ceph_set_dentry_offset · 4f177264

Submitted by Sage Weil on Jul 26, 2011

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug · e5f86dc3

Submitted by Sage Weil on Jul 26, 2011

Have caller pass in a safely-obtained reference to the parent directory
for calculating a dentry's hash value.

While we're here, simplify the flow through ceph_encode_fh() so that there
is a single exit point and cleanup.

Also fix a bug with the dentry hash calculation: calculate the hash for the
dentry we were given, not its parent.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: protect d_parent access in ceph_d_revalidate · bf1c6aca

Submitted by Sage Weil on Jul 26, 2011

Protect d_parent with d_lock.  Carry a reference.  Simplify the flow so
that there is a single exit point and cleanup.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: protect access to d_parent · 5f21c96d

Submitted by Sage Weil on Jul 26, 2011

d_parent is protected by d_lock: use it when looking up a dentry's parent
directory inode.  Also take a reference and drop it in the caller to avoid
a use-after-free.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: handle racing calls to ceph_init_dentry · 48d0cbd1

Submitted by Sage Weil on Jul 26, 2011

The ->lookup() and prepopulate_readdir() callers are working with unhashed
dentries, so we don't have to worry.  The export.c callers, though, need
to initialize something they got back from d_obtain_alias() and are
potentially racing with other callers.  Make sure we don't return unless
the dentry is properly initialized (by us or someone else).
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: set dir complete frag after adding capability · dfabbed6

Submitted by Sage Weil on Jul 26, 2011

Currently ceph_add_cap clears the complete bit if we are newly issued the
FILE_SHARED cap, which is normally the case for a newly issued cap on a new
directory.  That means we clear the just-set bit.  Move the check that sets
the flag to after the cap is added/updated.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: set up readahead size when rsize is not passed · e9852227

Submitted by Yehuda Sadeh on Jul 22, 2011

This should improve the default read performance, as without it
readahead is practically disabled.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

ceph: ignore lease mask · 2f90b852

Submitted by Sage Weil on Jul 26, 2011

The lease mask is no longer used (and it changed a while back).  Instead,
use a non-zero duration to indicate that there is a lease being issued.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix ceph_lookup_open intent usage · 468640e3

Submitted by Sage Weil on Jul 26, 2011

We weren't properly calling lookup_instantiate_filp when setting up the
lookup intent, which could lead to file leakage on errors.  So:

 - use separate helper for the hidden snapdir translation, immediately
   following the mds request
 - use ceph_finish_lookup for the final dentry/return value dance in the
   exit path
 - lookup_instantiate_filp on success
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC · 9bae113a

Submitted by Sage Weil on Jul 26, 2011

We only need to put these on the directory unsafe list if they have
side effects that fsync(2) should flush out.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix bad parent_inode calc in ceph_lookup_open · acda7657

Submitted by Sage Weil on Jul 26, 2011

We were always getting NULL here because the intent file f_dentry is always
NULL at this point, which means we were always passing NULL to
ceph_mdsc_do_request.  In reality, this was fine, since this isn't
currently ever a write operation that needs to get strung on the dir's
unsafe list.

Use the dir explicitly, and only pass it if this open has side-effects that
a dir fsync should flush.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: avoid carrying Fw cap during write into page cache · d8de9ab6

Submitted by Sage Weil on Jul 26, 2011

The generic_file_aio_write call may block on balance_dirty_pages while we
flush data to the OSDs.  If we hold a reference to the FILE_WR cap during
that interval, revocation by the MDS (e.g., to do a stat(2)) may be very
slow.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: report f_bfree based on kb_avail rather than diffing. · 8f04d422

Submitted by Greg Farnum on Jul 26, 2011

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>

ceph: only queue capsnap if caps are dirty · e77dc3e9

Submitted by Sage Weil on Jul 26, 2011

We used to go into this branch if i_wrbuffer_ref_head was non-zero.  This
was an ancient check from before we were careful about dealing with all
kinds of caps (and not just dirty pages).  It is cleaner to only queue a
capsnap if there is an actual dirty cap.  If we are racing with...
something...we will end up here with ci->i_wrbuffer_refs but no dirty
caps.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix snap writeback when racing with writes · af0ed569

Submitted by Sage Weil on Jul 26, 2011

There are two problems that come up when we try to queue a capsnap while a
write is in progress:

 - The FILE_WR cap is held, but not yet dirty, so we may queue a capsnap
   with dirty == 0.  That will crash later in __ceph_flush_snaps().  Or
   on the FILE_WR cap if a write is in progress.
 - We may not have i_head_snapc set, which causes problems pretty quickly.
   Look to the snaprealm in this case.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: use flag bit for at_end readdir flag · 9cfa1098

Submitted by Sage Weil on Jul 26, 2011

This saves us a word of memory per file.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
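
Folding a boolean into an existing flags word can be sketched as follows. The structure, flag names and bit values here are hypothetical and chosen for illustration; they are not taken from the ceph sources.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real names and values may differ. */
#define DEMO_F_SYNC   (1u << 0)
#define DEMO_F_ATEND  (1u << 1)

struct demo_file_info {
    unsigned int flags;   /* several booleans share this one word */
};

/* Set or clear the at-end bit without disturbing the other flags. */
static void demo_set_at_end(struct demo_file_info *fi, bool at_end)
{
    if (at_end)
        fi->flags |= DEMO_F_ATEND;
    else
        fi->flags &= ~DEMO_F_ATEND;
}

static bool demo_at_end(const struct demo_file_info *fi)
{
    return (fi->flags & DEMO_F_ATEND) != 0;
}
```

Storing the state as one bit of an existing word, instead of a separate integer field, is what removes the extra word per file.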

openanolis / cloud-kernel synced successfully nearly 2 years ago
