Commits · 501ab32387533924b211cacff36d19296414ec0b · openeuler / Kernel

23 Feb, 2015 10 commits

xfs: use generic percpu counters for inode counter · 501ab323

Committed by Dave Chinner on Feb 23, 2015

XFS has hand-rolled per-cpu counters for the superblock since before
there was any generic implementation. There are some warts around
the use of them for the inode counter as the hand-rolled counter is
designed to be accurate at zero, but has no specific accuracy at
any other value. This design causes problems for the maximum inode
count threshold enforcement, as there is no trigger that balances
the counters as they get close to the maximum threshold.

Instead of designing new triggers for balancing, just replace the
hand-rolled per-cpu counter with a generic counter.  This enables us
to update the counter through the normal superblock modification
functions, but rather than do that we add a xfs_mod_icount() helper
function (from Christoph Hellwig) and keep the percpu counter
outside the superblock in the struct xfs_mount.

This means we still need to initialise the per-cpu counter
specifically when we read the superblock, and vice versa when we
log/write it, but it does mean that we don't need to change any
other code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

501ab323
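
The trade-off described in this commit message can be illustrated with a small userspace sketch of a batched counter. It is not the kernel's percpu_counter implementation: the names `sketch_counter`, `NR_SLOTS` and `BATCH` are hypothetical, the "CPUs" are plain array slots, and there is no locking. It only shows why a cheap read may lag behind the true value while an explicit sum stays exact, which is the property that matters when comparing against a maximum threshold.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SLOTS 4   /* stand-in for the number of CPUs (assumption) */
#define BATCH    32  /* fold threshold, chosen arbitrarily for this sketch */

struct sketch_counter {
    int64_t global;           /* shared count: cheap to read, may lag */
    int32_t local[NR_SLOTS];  /* per-slot deltas not yet folded in */
};

/* Add delta on behalf of one slot; fold the local delta into the global
 * count once it reaches the batch size in either direction. */
static void sketch_counter_add(struct sketch_counter *c, int slot, int32_t delta)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    int32_t v = c->local[slot] + delta;
    if (v >= BATCH || v <= -BATCH) {
        c->global += v;
        c->local[slot] = 0;
    } else {
        c->local[slot] = v;
    }
}

/* Cheap, approximate read: ignores deltas still held in the slots. */
static int64_t sketch_counter_read(const struct sketch_counter *c)
{
    return c->global;
}

/* Exact read: walk every slot and add the pending deltas. */
static int64_t sketch_counter_sum(const struct sketch_counter *c)
{
    int64_t sum = c->global;
    for (int i = 0; i < NR_SLOTS; i++)
        sum += c->local[i];
    return sum;
}

/* Usage: 100 increments spread over 4 slots never reach the batch size,
 * so the cheap read still sees 0 while the exact sum sees 100. */
static void sketch_selfcheck(void)
{
    struct sketch_counter c = {0};
    for (int i = 0; i < 100; i++)
        sketch_counter_add(&c, i % NR_SLOTS, 1);
    assert(sketch_counter_read(&c) == 0);
    assert(sketch_counter_sum(&c) == 100);
}
```

In this sketch the approximate read is off by less than NR_SLOTS * BATCH, so a threshold check only needs the exact sum when the cheap value is within that distance of the limit.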

autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation · 0a280962

Committed by Al Viro on Feb 21, 2015

X-Coverup: just ask spender
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0a280962

procfs: fix race between symlink removals and traversals · 7e0e953b

Committed by Al Viro on Feb 21, 2015

use_pde()/unuse_pde() in ->follow_link()/->put_link() resp.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7e0e953b

debugfs: leave freeing a symlink body until inode eviction · 0db59e59

Committed by Al Viro on Feb 21, 2015

As it is, we have debugfs_remove() racing with symlink traversals.
Supply ->evict_inode() and do freeing there - inode will remain
pinned until we are done with the symlink body.

And rip the idiocy with checking if dentry is positive right after
we'd verified debugfs_positive(), which is a stronger check...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0db59e59

trylock_super(): replacement for grab_super_passive() · eb6ef3df

Committed by Konstantin Khlebnikov on Feb 19, 2015

I've noticed significant locking contention in the memory reclaimer around
sb_lock inside grab_super_passive(). grab_super_passive() is called from
two places: in icache/dcache shrinkers (function super_cache_scan) and
from writeback (function __writeback_inodes_wb). Both are required for
progress in the memory allocator.

grab_super_passive() acquires sb_lock to increment sb->s_count and check
sb->s_instances. It seems sb->s_umount locked for read is enough here:
super-block deactivation always runs under sb->s_umount locked for write.
Protecting super-block itself isn't a problem: in super_cache_scan() sb
is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers
are still active. Inside writeback super-block comes from inode from bdi
writeback list under wb->list_lock.

This patch removes locking sb_lock and checks s_instances under s_umount:
generic_shutdown_super() unlinks it under sb->s_umount locked for write.
New variant is called trylock_super() and since it only locks semaphore,
callers must call up_read(&sb->s_umount) instead of drop_super(sb) when
they're done.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

eb6ef3df
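
The calling convention described above (try to take s_umount for read, check that the superblock is still linked, and have the caller drop the read lock itself) can be sketched in userspace as follows. This is only an illustration under stated assumptions: `sketch_rwsem` is a single-threaded toy stand-in for a read-write semaphore, and `on_instances` stands in for the s_instances check; neither is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy read-write semaphore: not thread-safe, for illustration only. */
struct sketch_rwsem {
    int  readers;
    bool writer;
};

static bool sketch_down_read_trylock(struct sketch_rwsem *sem)
{
    if (sem->writer)
        return false;   /* a writer (e.g. unmount) holds the lock */
    sem->readers++;
    return true;
}

static void sketch_up_read(struct sketch_rwsem *sem)
{
    assert(sem->readers > 0);
    sem->readers--;
}

struct sketch_super {
    struct sketch_rwsem s_umount;
    bool on_instances;  /* false once shutdown has unlinked the superblock */
};

/* Returns true with s_umount held for read; the caller must release it
 * with sketch_up_read() when done, as the commit message requires. */
static bool sketch_trylock_super(struct sketch_super *sb)
{
    if (sketch_down_read_trylock(&sb->s_umount)) {
        if (sb->on_instances)
            return true;
        sketch_up_read(&sb->s_umount);
    }
    return false;
}

/* Usage: a live superblock can be locked, an unlinked one cannot. */
static void sketch_selfcheck(void)
{
    struct sketch_super sb = { { 0, false }, true };
    assert(sketch_trylock_super(&sb));
    sketch_up_read(&sb.s_umount);
    sb.on_instances = false;
    assert(!sketch_trylock_super(&sb));
    assert(sb.s_umount.readers == 0);
}
```

Note that no global lock and no reference count appear in the sketch; the read lock alone keeps the superblock from being deactivated while the caller works on it.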

fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions · 54f2a2f4

Committed by David Howells on Jan 29, 2015

Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup()
rather than d_is_dir() when checking a dir watch and give an error on fake
directories.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

54f2a2f4

Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions · ce40fa78

Committed by David Howells on Jan 29, 2015

Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack
thereof) in cachefiles:

 (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as
     it doesn't want to deal with automounts in its cache.

 (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in
     cachefiles.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ce40fa78

VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8

Committed by David Howells on Jan 29, 2015

Convert the following where appropriate:

 (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

 (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

 (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
     complicated than it appears as some calls should be converted to
     d_can_lookup() instead.  The difference is whether the directory in
     question is a real dir with a ->lookup op or whether it's a fake dir with
     a ->d_automount op.

In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).

Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer.  In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.

However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.

There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
intended for special directory entry types that don't have attached inodes.

The following perl+coccinelle script was used:

use strict;

my @callers;
open(my $fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
    print "No matches\n";
    exit(0);
}

my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);

foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
	die "spatch failed";
}

[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e36cb0b8

VFS: Split DCACHE_FILE_TYPE into regular and special types · 44bdb5e5

Committed by David Howells on Jan 29, 2015

Split DCACHE_FILE_TYPE into DCACHE_REGULAR_TYPE (dentries representing regular
files) and DCACHE_SPECIAL_TYPE (representing blockdev, chardev, FIFO and
socket files).

d_is_reg() and d_is_special() are added to detect these subtypes and
d_is_file() is left as the union of the two.

This allows a number of places that use S_ISREG(dentry->d_inode->i_mode) to
use d_is_reg(dentry) instead.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

44bdb5e5
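
The relationship between the dentry type predicates discussed in the entries above can be sketched with a toy model. The enum values and the `sketch_` names are hypothetical; the kernel keeps the type in bits of d_flags with different values. The sketch only encodes two points from the commit messages: d_is_file() is the union of the regular and special types, and d_can_lookup() is stricter than d_is_dir() because it excludes automount-style fake directories.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative type tags only; not the kernel's DCACHE_*_TYPE values. */
enum sketch_dtype {
    SKETCH_MISS_TYPE,
    SKETCH_DIRECTORY_TYPE,  /* real directory with a ->lookup op */
    SKETCH_AUTODIR_TYPE,    /* fake directory with a ->d_automount op */
    SKETCH_REGULAR_TYPE,
    SKETCH_SPECIAL_TYPE,    /* blockdev, chardev, FIFO, socket */
    SKETCH_SYMLINK_TYPE
};

struct sketch_dentry {
    enum sketch_dtype type;
};

static bool sketch_is_reg(const struct sketch_dentry *d)
{
    return d->type == SKETCH_REGULAR_TYPE;
}

static bool sketch_is_special(const struct sketch_dentry *d)
{
    return d->type == SKETCH_SPECIAL_TYPE;
}

/* Union of the two file subtypes, as the commit message states. */
static bool sketch_is_file(const struct sketch_dentry *d)
{
    return sketch_is_reg(d) || sketch_is_special(d);
}

static bool sketch_can_lookup(const struct sketch_dentry *d)
{
    return d->type == SKETCH_DIRECTORY_TYPE;
}

static bool sketch_is_dir(const struct sketch_dentry *d)
{
    return sketch_can_lookup(d) || d->type == SKETCH_AUTODIR_TYPE;
}

/* Usage: a FIFO is a file but not regular; an automount point is a
 * directory that cannot be looked up in. */
static void sketch_selfcheck(void)
{
    struct sketch_dentry fifo = { SKETCH_SPECIAL_TYPE };
    struct sketch_dentry automount = { SKETCH_AUTODIR_TYPE };
    assert(sketch_is_file(&fifo) && !sketch_is_reg(&fifo));
    assert(sketch_is_dir(&automount) && !sketch_can_lookup(&automount));
}
```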

VFS: Add a fallthrough flag for marking virtual dentries · df1a085a

Committed by David Howells on Jan 29, 2015

Add a DCACHE_FALLTHRU flag to indicate that, in a layered filesystem, this is
a virtual dentry that covers another one in a lower layer that should be used
instead.  This may be recorded on medium if directory integration is stored
there.

The flag can be set with d_set_fallthru() and tested with d_is_fallthru().

Original-author: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

df1a085a

20 Feb, 2015 6 commits

posix_acl: fix reference leaks in posix_acl_create · fed0b588

Committed by Omar Sandoval on Feb 08, 2015

get_acl gets a reference which we must release in the error cases.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fed0b588

autofs4: Wrong format for printing dentry · 76bf3f6b

Committed by Rasmus Villemoes on Feb 06, 2015

%pD for struct file*, %pd for struct dentry*.

Fixes: a455589f ("assorted conversions to %p[dD]")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

76bf3f6b

coredump: Fix typo in comment · fcbc32bc

Committed by Bastien Nocera on Feb 05, 2015

Signed-off-by: Bastien Nocera <hadess@hadess.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fcbc32bc

fs/aio.c: Remove duplicate function name in pr_debug messages · acd88d4e

Committed by Kinglong Mee on Feb 04, 2015

pr_fmt is already defined as below in fs/aio.c, so remove the duplicate
function name from the pr_debug messages.

#define pr_fmt(fmt) "%s: " fmt, __func__

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

acd88d4e
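
The effect of the pr_fmt define quoted above can be reproduced in userspace. In this sketch `sketch_pr_debug` is a hypothetical stand-in that formats into a caller-supplied buffer with snprintf() instead of logging, and the two function names are invented for the example; only the pr_fmt line has the same shape as the one in the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the define quoted in the commit message. */
#define pr_fmt(fmt) "%s: " fmt, __func__

/* Hypothetical stand-in for pr_debug(): format into a buffer.
 * Requires at least one argument after the format string. */
#define sketch_pr_debug(buf, len, fmt, ...) \
    snprintf(buf, len, pr_fmt(fmt), __VA_ARGS__)

/* Message that repeats the function name by hand: it shows up twice. */
static int with_duplicate(char *buf, size_t len, int nr)
{
    return sketch_pr_debug(buf, len, "%s: nr_events=%d", __func__, nr);
}

/* Message that relies on pr_fmt for the prefix. */
static int without_duplicate(char *buf, size_t len, int nr)
{
    return sketch_pr_debug(buf, len, "nr_events=%d", nr);
}

/* Usage: compare the two formatted strings. */
static void sketch_selfcheck(void)
{
    char buf[96];
    with_duplicate(buf, sizeof(buf), 5);
    assert(strcmp(buf, "with_duplicate: with_duplicate: nr_events=5") == 0);
    without_duplicate(buf, sizeof(buf), 5);
    assert(strcmp(buf, "without_duplicate: nr_events=5") == 0);
}
```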

configfs: Fix potential NULL d_inode dereference · 112fc894

Committed by David Howells on Jan 27, 2015

Code that does this:

		if (!(d_unhashed(dentry) && dentry->d_inode)) {
			...
			simple_unlink(parent->d_inode, dentry);
		}

is broken because:

    !(d_unhashed(dentry) && dentry->d_inode)

is equivalent to:

    !d_unhashed(dentry) || !dentry->d_inode

so it is possible to get into simple_unlink() with dentry->d_inode == NULL.

simple_unlink(), however, assumes dentry->d_inode cannot be NULL.

I think that what was meant is this:

    !d_unhashed(dentry) && dentry->d_inode

and that the logical-not operator or the final close-bracket was misplaced.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

112fc894
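
The boolean argument in this commit message can be checked mechanically. The sketch below models the two conditions as plain predicates over two flags (`unhashed`, `has_inode`); the function names are hypothetical and nothing here touches real dentries. It counts the truth-table rows in which each condition would let execution reach simple_unlink() without an inode.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition as originally written: !(unhashed && has_inode). */
static bool buggy_should_unlink(bool unhashed, bool has_inode)
{
    return !(unhashed && has_inode);
}

/* The condition the commit message argues was intended. */
static bool fixed_should_unlink(bool unhashed, bool has_inode)
{
    return !unhashed && has_inode;
}

/* Count truth-table rows where the predicate is true although there is
 * no inode, i.e. rows that would dereference a NULL d_inode. */
static int null_inode_rows(bool (*pred)(bool, bool))
{
    int rows = 0;
    for (int unhashed = 0; unhashed <= 1; unhashed++)
        if (pred(unhashed != 0, false))
            rows++;
    return rows;
}

/* Usage: De Morgan's law holds for the buggy form, and only the buggy
 * form admits the NULL-inode case. */
static void sketch_selfcheck(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            assert(buggy_should_unlink(a != 0, b != 0) == (!a || !b));
    assert(null_inode_rows(buggy_should_unlink) == 2);
    assert(null_inode_rows(fixed_should_unlink) == 0);
}
```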

don't bother with most of the bad_file_ops methods · db671a8e

Committed by Al Viro on Feb 04, 2015

Only ->open() should be there (always failing, of course).  We never
replace ->f_op of an already opened struct file, so there's no way
for any of those methods to be called.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

db671a8e

19 Feb, 2015 22 commits

x86, mm/ASLR: Fix stack randomization on 64-bit systems · 4e7c22d4

Committed by Hector Marco-Gisbert on Feb 14, 2015

The issue is that the stack for processes is not properly randomized on
64 bit architectures due to an integer overflow.

The affected function is randomize_stack_top() in file
"fs/binfmt_elf.c":

  static unsigned long randomize_stack_top(unsigned long stack_top)
  {
           unsigned int random_variable = 0;

           if ((current->flags & PF_RANDOMIZE) &&
                   !(current->personality & ADDR_NO_RANDOMIZE)) {
                   random_variable = get_random_int() & STACK_RND_MASK;
                   random_variable <<= PAGE_SHIFT;
           }
  #ifdef CONFIG_STACK_GROWSUP
           return PAGE_ALIGN(stack_top) + random_variable;
  #else
           return PAGE_ALIGN(stack_top) - random_variable;
  #endif
  }

Note that it declares the "random_variable" variable as "unsigned int".
When the value masked with STACK_RND_MASK (which is 0x3fffff on x86_64,
22 bits) is shifted left by PAGE_SHIFT (which is 12 on x86_64):

	  random_variable <<= PAGE_SHIFT;

the two leftmost bits are dropped when storing the result in
"random_variable". This variable must be at least 34 bits long to hold
the (22+12)-bit result.

These two dropped bits have an impact on the entropy of process stack.
Concretely, the total stack entropy is reduced by a factor of four: from
2^30 to 2^28 (one fourth of the expected entropy).

This patch restores the entropy by correcting the types involved
in the operations in the functions randomize_stack_top() and
stack_maxrandom_size().

The successful fix can be tested with:

  $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
  7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
  7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
  7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
  7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
  ...

Once corrected, the leading bytes should be between 7ffc and 7fff,
rather than always being 7fff.

Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Ismael Ripoll <iripoll@upv.es>
[ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: CVE-2015-1593
Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
Signed-off-by: Borislav Petkov <bp@suse.de>

4e7c22d4
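
The truncation described in this commit message is easy to reproduce outside the kernel. The sketch below uses fixed-width types so that it behaves the same on any platform; the `SKETCH_` constants copy the x86_64 values quoted in the message, and the function names are hypothetical. It performs the same mask-and-shift once in a 32-bit variable and once in a 64-bit one.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_STACK_RND_MASK UINT32_C(0x3fffff) /* 22 bits, as quoted above */
#define SKETCH_PAGE_SHIFT     12

/* Mask and shift in a 32-bit variable, like the old "unsigned int". */
static uint64_t shift_in_u32(uint32_t rnd)
{
    uint32_t v = rnd & SKETCH_STACK_RND_MASK;
    v <<= SKETCH_PAGE_SHIFT;   /* bits 32 and 33 fall off the top */
    return v;
}

/* Same operation in a 64-bit variable, wide enough for 22 + 12 bits. */
static uint64_t shift_in_u64(uint32_t rnd)
{
    uint64_t v = rnd & SKETCH_STACK_RND_MASK;
    v <<= SKETCH_PAGE_SHIFT;
    return v;
}

/* Usage: with all 22 mask bits set, the 32-bit variant loses the top two. */
static void sketch_selfcheck(void)
{
    assert(shift_in_u32(0x3fffff) == UINT64_C(0xfffff000));
    assert(shift_in_u64(0x3fffff) == UINT64_C(0x3fffff000));
}
```

Two lost bits mean that four distinct random values collapse onto each stack offset, which is the factor-of-four reduction the message refers to.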

ceph: return error for traceless reply race · 4d41cef2

Committed by Yan, Zheng on Feb 04, 2015

When we receive a traceless reply for a request that created a new inode,
we re-send a lookup request to the MDS to get information about the newly
created inode. (The VFS expects the FS callback to return an inode in the
create case.) This breaks one request into two requests. Another client may
modify or move to the new inode in the middle.

When the race happens, ceph_handle_notrace_create() unconditionally
links the dentry for 'create' operation to the inode returned by lookup.
This may confuse VFS when the inode is a directory (VFS does not allow
multiple linkages for directory inode).

This patch makes ceph_handle_notrace_create() return an error when it
detects a race. This event should be rare and happens only when we talk to
an old MDS. A recent MDS does not send a traceless reply for a request that
creates a new inode.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

4d41cef2

ceph: fix dentry leaks · 5cba372c

Committed by Yan, Zheng on Feb 02, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

5cba372c

ceph: re-send requests when MDS enters reconnecting stage · 3de22be6

Committed by Yan, Zheng on Feb 04, 2015

So that MDS can check if any request is already completed and process
completed requests in clientreplay stage. When completed requests are
processed in clientreplay stage, MDS can avoid sending traceless
replies.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

3de22be6

ceph: show nocephx_require_signatures and notcp_nodelay options · 2a0b61ce

Committed by Ilya Dryomov on Feb 02, 2015

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>

2a0b61ce

ceph: fix atomic_open snapdir · bf91c315

Committed by Yan, Zheng on Jan 19, 2015

ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value
and creates snapdir inode if it's -ENOENT

Signed-off-by: Yan, Zheng <zyan@redhat.com>

bf91c315

ceph: properly mark empty directory as complete · 2f92b3d0

Committed by Yan, Zheng on Jan 19, 2015

ceph_add_cap() calls __check_cap_issue(), which clears the directory
inode's complete flag. So the complete flag for an empty directory
should be set after calling ceph_add_cap().

Signed-off-by: Yan, Zheng <zyan@redhat.com>

2f92b3d0

client: include kernel version in client metadata · a6a5ce4f

Committed by Yan, Zheng on Jan 16, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

a6a5ce4f

ceph: provide seperate {inode,file}_operations for snapdir · 38c48b5f

Committed by Yan, Zheng on Jan 14, 2015

remove all unsupported operations from {inode,file}_operations.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

38c48b5f

ceph: fix request time stamp encoding · 1f041a89

Committed by Yan, Zheng on Jan 13, 2015

struct timespec uses 'long' to represent seconds and nanoseconds. 'long'
is 64 bits on 64-bit machines. The ceph MDS expects the time stamp to be
encoded as struct ceph_timespec, which uses 'u32' to represent seconds
and nanoseconds.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

1f041a89
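
The size mismatch described in this commit message can be sketched in userspace. `sketch_wire_timespec` is a hypothetical stand-in for struct ceph_timespec, with two 32-bit fields; it ignores byte order, which a real wire format would also have to handle. The point is only that the fields must be narrowed explicitly rather than copied as raw 'long' values.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the on-wire layout: two 32-bit fields.
 * Byte-order conversion is deliberately left out of this sketch. */
struct sketch_wire_timespec {
    uint32_t tv_sec;
    uint32_t tv_nsec;
};

/* Narrow each field explicitly instead of copying struct timespec,
 * whose fields may be 64 bits wide. */
static struct sketch_wire_timespec
sketch_encode_timespec(const struct timespec *ts)
{
    struct sketch_wire_timespec w;
    w.tv_sec = (uint32_t)ts->tv_sec;
    w.tv_nsec = (uint32_t)ts->tv_nsec;
    return w;
}

/* Usage: the wire form is always 8 bytes, whatever sizeof(long) is. */
static void sketch_selfcheck(void)
{
    struct timespec ts = { .tv_sec = 1421107200, .tv_nsec = 500 };
    struct sketch_wire_timespec w = sketch_encode_timespec(&ts);
    assert(sizeof(w) == 8);
    assert(w.tv_sec == UINT32_C(1421107200));
    assert(w.tv_nsec == UINT32_C(500));
}
```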

ceph: fix reading inline data when i_size > PAGE_SIZE · fcc02d2a

Committed by Yan, Zheng on Jan 10, 2015

When an inode has inline data but its size > PAGE_SIZE (it was truncated
to a larger size), the previous direct read code returns -EIO. This patch
adds code to return zeros for data whose offset > PAGE_SIZE.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

fcc02d2a

ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions) · 86d8f67b

Committed by Yan, Zheng on Jan 09, 2015

Use an atomic variable to track the number of sessions; this avoids
blocking operations inside wait loops.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

86d8f67b

ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps) · c4d4a582

Committed by Yan, Zheng on Jan 09, 2015

We should not do blocking operations in wait_event_interruptible()'s
condition check function, but reading inline data can block. So move the
inline data read code to ceph_get_caps().

Signed-off-by: Yan, Zheng <zyan@redhat.com>

c4d4a582

ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) · d3383a8e

Committed by Yan, Zheng on Jan 08, 2015

check_cap_flush() calls mutex_lock(), which may block. So we can't
use it as the condition check function for wait_event().

Signed-off-by: Yan, Zheng <zyan@redhat.com>

d3383a8e

ceph: improve reference tracking for snaprealm · 982d6011

Committed by Yan, Zheng on Dec 23, 2014

When a snaprealm is created, its initial reference count is zero.
But in some rare cases, the newly created snaprealm is not referenced
by anyone. This causes a snaprealm with a zero reference count not to be
freed.

The fix is to set the reference count of a newly created snaprealm to 1.
The reference is returned to the function that requests creation of the
snaprealm. When the function finishes its job, it releases the reference.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

982d6011
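
The reference-counting rule this commit message adopts (a new object starts with one reference owned by its creator) can be sketched as follows. The `sketch_realm` type and its helpers are hypothetical and single-threaded; they are not the ceph snaprealm code, only an illustration of why starting at 1 guarantees that an otherwise unreferenced object is still freed by the creator's final put.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_realm {
    int nref;
};

/* Create with nref == 1: the creator owns the first reference. */
static struct sketch_realm *sketch_realm_create(void)
{
    struct sketch_realm *r = malloc(sizeof(*r));
    if (r)
        r->nref = 1;
    return r;
}

static void sketch_realm_get(struct sketch_realm *r)
{
    assert(r->nref > 0);
    r->nref++;
}

/* Drop one reference; returns true if this put freed the object. */
static bool sketch_realm_put(struct sketch_realm *r)
{
    assert(r->nref > 0);
    if (--r->nref == 0) {
        free(r);
        return true;
    }
    return false;
}

/* Usage: a realm nobody else picked up is freed by the creator's put. */
static void sketch_selfcheck(void)
{
    struct sketch_realm *r = sketch_realm_create();
    assert(r != NULL);
    assert(sketch_realm_put(r));
}
```

With an initial count of zero, the same unreferenced object would never see a put that reaches zero, which is the leak the message describes.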

ceph: properly zero data pages for file holes. · 1487a688

Committed by Yan, Zheng on Jan 06, 2015

A bug was found in striped_read() of fs/ceph/file.c. striped_read() calls
ceph_zero_page_vector_range(). The first argument, page_align + read + ret,
passed to ceph_zero_page_vector_range() is wrong.

When a file has holes, this wrong parameter may cause memory corruption
either in kernel space or user space. Kernel space memory may be corrupted in
the case of non direct IO; user space memory may be corrupted in the case of
direct IO. In the latter case, the application doing direct IO may crash due
to memory corruption, as we have experienced.

The correct value should be initial_align + read + ret, where initial_align =
o_direct ? buf_align : io_align. Compared with page_align, the current page
offset, initial_align is the initial page offset, which should be used to
calculate the page and offset in ceph_zero_page_vector_range().

Reported-by: caifeng zhu <zhucaifeng@unissoft-nj.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

1487a688

ceph: acl: Remove unused function · 671762f8

Committed by Rickard Strandqvist on Jan 04, 2015

Remove the function ceph_get_cached_acl() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

671762f8

ceph: handle SESSION_FORCE_RO message · 03f4fcb0

Committed by Yan, Zheng on Jan 05, 2015

mark session as readonly and wake up all cap waiters.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

03f4fcb0

NFSv4.1: Clean up bind_conn_to_session · 71a097c6

Committed by Trond Myklebust on Feb 18, 2015

We don't need to fake up an entire session in order to retrieve the arguments.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

71a097c6

NFSv4.1: Always set up a forward channel when binding the session · 7e9f0738

Committed by Trond Myklebust on Feb 18, 2015

Currently, the client requests a back channel or a bidirectional
connection when binding a new TCP channel to an existing session.
Fix that to ask for a forward channel or bidirectional.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7e9f0738

NFSv4.1: Don't set up a backchannel if the server didn't agree to do so · b1c0df5f

Committed by Trond Myklebust on Feb 18, 2015

If the server doesn't agree to our backchannel setup request, then
don't set one up.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b1c0df5f

NFSv4.1: Clean up create_session · 79969dd1

Committed by Trond Myklebust on Feb 18, 2015

Don't decode directly into the shared struct session

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

79969dd1

18 Feb, 2015 2 commits

pnfs: Refactor the *_layout_mark_request_commit to use pnfs_layout_mark_request_commit · 338d00cf

Committed by Tom Haynes on Feb 17, 2015

The File Layout's filelayout_mark_request_commit() is almost the same as
the Flex File Layout's ff_layout_mark_request_commit(). And that can
be reduced by calling into nfs_request_add_commit_list().

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

338d00cf

configfs_add_file: fold into its sole caller · 28444a2b

Committed by Al Viro on Jan 29, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

28444a2b

openeuler / Kernel synced successfully more than 1 year ago