Commits · 8a001af4bbb8a2e4e8ca6805f80b7b04db9aacc3 · openeuler / Kernel

03 Mar, 2015 1 commit

eCryptfs: don't pass fs-specific ioctl commands through · 6d65261a

Committed by Tyler Hicks on Feb 24, 2015

eCryptfs can't be aware of what to expect after passing an
arbitrary ioctl command through to the lower filesystem. The ioctl
command may trigger an action in the lower filesystem that is
incompatible with eCryptfs.

One specific example is when one attempts to use the Btrfs clone
ioctl command when the source file is in the Btrfs filesystem that
eCryptfs is mounted on top of and the destination fd is from a new file
created in the eCryptfs mount. The ioctl syscall incorrectly returns
success because the command is passed down to Btrfs which thinks that it
was able to do the clone operation. However, the result is an empty
eCryptfs file.

This patch allows the trim, {g,s}etflags, and {g,s}etversion ioctl
commands through and then copies up the inode metadata from the lower
inode to the eCryptfs inode to catch any changes made to the lower
inode's metadata. Those five ioctl commands are mostly common across all
filesystems but the whitelist may need to be further pruned in the
future.
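
The whitelist described above can be sketched in userspace C as follows. This is a minimal illustration rather than the patch itself: `ecryptfs_ioctl_allowed()` is a hypothetical helper name, and returning `-ENOTTY` for rejected commands is an assumption about how such a filter would report an unsupported command. The five command constants come from `<linux/fs.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <linux/fs.h>	/* FITRIM, FS_IOC_{GET,SET}FLAGS, FS_IOC_{GET,SET}VERSION */

/*
 * Hypothetical helper: return 0 if cmd is one of the five whitelisted
 * ioctl commands, or -ENOTTY (an assumed error code) for anything else.
 */
int ecryptfs_ioctl_allowed(unsigned int cmd)
{
	switch (cmd) {
	case FITRIM:
	case FS_IOC_GETFLAGS:
	case FS_IOC_SETFLAGS:
	case FS_IOC_GETVERSION:
	case FS_IOC_SETVERSION:
		return 0;
	default:
		/* fs-specific commands such as the Btrfs clone ioctl end up here */
		return -ENOTTY;
	}
}
```

A default-deny switch keeps any command that is not explicitly listed, including ones added to lower filesystems later, from reaching the lower filesystem.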

https://bugzilla.kernel.org/show_bug.cgi?id=93691
https://launchpad.net/bugs/1305335

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Rocko <rockorequin@hotmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: stable@vger.kernel.org # v2.6.36+: c43f7b8f eCryptfs: Handle ioctl calls with unlocked and compat functions

01 Mar, 2015 1 commit

nilfs2: fix potential memory overrun on inode · 957ed60b

Committed by Ryusuke Konishi on Feb 27, 2015

Each inode of nilfs2 stores a root node of a b-tree, and it turned out to
have a memory overrun issue:

Each b-tree node of nilfs2 stores a set of key-value pairs and the number
of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as
a few other "bn_*" members.

Since the value of "bn_nchildren" is used for operations on the key-values
within the b-tree node, it can cause memory access overrun if a large
number is incorrectly set to "bn_nchildren".

For instance, nilfs_btree_node_lookup() function determines the range of
binary search with it, and too large "bn_nchildren" leads
nilfs_btree_node_get_key() in that function to overrun.

As for intermediate b-tree nodes, this is prevented by a sanity check
performed when each node is read from a drive, however, no sanity check
has been done for root nodes stored in inodes.

This patch fixes the issue by adding missing sanity check against b-tree
root nodes so that it's called when on-memory inodes are read from ifile,
inode metadata file.
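
The kind of check being added can be sketched as below. This is a simplified illustration under stated assumptions: `struct btree_node_sketch` is a hypothetical, reduced stand-in for `nilfs_btree_node`, and `max_children` stands for the capacity of the space the inode reserves for the root node, which the real code derives from the on-disk layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct nilfs_btree_node. */
struct btree_node_sketch {
	uint8_t  bn_flags;
	uint8_t  bn_level;
	uint16_t bn_nchildren;	/* number of key-value pairs in the node */
};

/*
 * Return 0 if the root node header is plausible, -1 otherwise.
 * A bn_nchildren larger than the node can hold would make later
 * key lookups index past the end of the node.
 */
int root_node_sane(const struct btree_node_sketch *node,
		   unsigned int max_children)
{
	if (node->bn_nchildren > max_children)
		return -1;
	return 0;
}
```

Running the check once, when the inode is read, means that later lookups can trust `bn_nchildren` without re-validating it.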

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

27 Feb, 2015 1 commit

nfsd: fix clp->cl_revoked list deletion causing softlock in nfsd · c876486b

Committed by Andrew Elble on Feb 25, 2015

commit 2d4a532d ("nfsd: ensure that clp->cl_revoked list is
protected by clp->cl_lock") removed the use of the reaplist to
clean out clp->cl_revoked. It failed to change list_entry() to
walk clp->cl_revoked.next instead of reaplist.next

Fixes: 2d4a532d ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock")
Cc: stable@vger.kernel.org
Reported-by: Eric Meddaugh <etmsys@rit.edu>
Tested-by: Eric Meddaugh <etmsys@rit.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

25 Feb, 2015 1 commit

eCryptfs: ensure copy to crypt_stat->cipher does not overrun · 2a559a8b

Committed by Colin Ian King on Feb 23, 2015

The patch 237fead6: "[PATCH] ecryptfs: fs/Makefile and
fs/Kconfig" from Oct 4, 2006, leads to the following static checker
warning:

  fs/ecryptfs/crypto.c:846 ecryptfs_new_file_context()
  error: off-by-one overflow 'crypt_stat->cipher' size 32.  rl = '0-32'

There is a mismatch between the size of ecryptfs_crypt_stat.cipher
and ecryptfs_mount_crypt_stat.global_default_cipher_name, so copying
the cipher name can produce an off-by-one string copy error. This
fix ensures the space reserved for this string is the same size, including
the trailing zero at the end, throughout ecryptfs.

This fix avoids increasing the size of ecryptfs_crypt_stat.cipher
and also ecryptfs_parse_tag_70_packet_silly_stack.cipher_string and instead
reduces the value of ECRYPTFS_MAX_CIPHER_NAME_SIZE to 31 and includes the + 1
for the end of string terminator.

NOTE: An overflow is not possible in practice since the value copied
into global_default_cipher_name is validated by
ecryptfs_code_for_cipher_string() at mount time. None of the allowed
cipher strings are long enough to cause the potential buffer overflow
fixed by this patch.
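
The sizing rule can be illustrated with a small userspace sketch. The two structs are hypothetical, reduced versions of the kernel structures, kept only to show both buffers reserving `ECRYPTFS_MAX_CIPHER_NAME_SIZE + 1` bytes. The bounded `snprintf()` copy is swapped in for illustration and is not claimed to be what the kernel code uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ECRYPTFS_MAX_CIPHER_NAME_SIZE 31	/* value stated in the commit message */

/* Hypothetical, reduced versions of the two structures: both reserve
 * the same space, including the trailing NUL. */
struct crypt_stat_sketch {
	char cipher[ECRYPTFS_MAX_CIPHER_NAME_SIZE + 1];
};

struct mount_crypt_stat_sketch {
	char global_default_cipher_name[ECRYPTFS_MAX_CIPHER_NAME_SIZE + 1];
};

/* Copy the mount-wide default cipher name. snprintf() always terminates
 * the destination and never writes past sizeof(dst->cipher). */
void copy_default_cipher(struct crypt_stat_sketch *dst,
			 const struct mount_crypt_stat_sketch *src)
{
	snprintf(dst->cipher, sizeof(dst->cipher), "%s",
		 src->global_default_cipher_name);
}
```

With both arrays declared from the same constant, a name that fits in the source always fits in the destination, terminator included.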

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[tyhicks: Added the NOTE about the overflow not being triggerable]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>

24 Feb, 2015 2 commits

xfs: cancel failed transaction in xfs_fs_commit_blocks() · 83d5f018

Committed by Eric Sandeen on Feb 24, 2015

If xfs_trans_reserve() fails we don't cancel the transaction,
and we'll leak the allocated transaction pointer.

Spotted by Coverity.

Signed-off-by: Eric Sandeen <ssandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: Ensure we have target_ip for RENAME_EXCHANGE · fc921566

Committed by Eric Sandeen on Feb 24, 2015

We shouldn't get here with RENAME_EXCHANGE set and no
target_ip, but let's be defensive, because xfs_cross_rename()
will dereference it.

Spotted by Coverity.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

23 Feb, 2015 11 commits

xfs: ensure truncate forces zeroed blocks to disk · 5885ebda

Committed by Dave Chinner on Feb 23, 2015

A new fsync vs power fail test in xfstests indicated that XFS can
have unreliable data consistency when doing extending truncates that
require block zeroing. The blocks beyond EOF get zeroed in memory,
but we never force those changes to disk before we run the
transaction that extends the file size and exposes those blocks to
userspace. This can result in the blocks not being correctly zeroed
after a crash.

Because in-memory behaviour is correct, tools like fsx don't pick up
any coherency problems - it's not until the filesystem is shut down
or the system crashes after writing the truncate transaction to the
journal but before the zeroed data in the page cache is flushed that
the issue is exposed.

Fix this by also flushing the dirty data in the in-memory region between
the old size and new size when we've found blocks that need zeroing
in the truncate process.

Reported-by: Liu Bo <bo.li.liu@oracle.com>
cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: Fix quota type in quota structures when reusing quota file · dfcc70a8

Committed by Jan Kara on Feb 23, 2015

For filesystems without separate project quota inode field in the
superblock we just reuse project quota file for group quotas (and vice
versa) if project quota file is allocated and we need group quota file.
When we reuse the file, quota structures on disk suddenly have wrong
type stored in d_flags though. Nobody really cares about this (although
structure type reported to userspace was wrong as well) except
that after commit 14bf61ff (quota: Switch ->get_dqblk() and
->set_dqblk() to use bytes as space units) assertion in
xfs_qm_scall_getquota() started to trigger on xfs/106 test (apparently I
was testing without XFS_DEBUG so I didn't notice when submitting the
above commit).

Fix the problem by properly resetting ddq->d_flags when running quotacheck
for a quota file.

CC: stable@vger.kernel.org
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation · 0a280962

Committed by Al Viro on Feb 21, 2015

X-Coverup: just ask spender
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

procfs: fix race between symlink removals and traversals · 7e0e953b

Committed by Al Viro on Feb 21, 2015

use_pde()/unuse_pde() in ->follow_link()/->put_link() resp.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

debugfs: leave freeing a symlink body until inode eviction · 0db59e59

Committed by Al Viro on Feb 21, 2015

As it is, we have debugfs_remove() racing with symlink traversals.
Supply ->evict_inode() and do freeing there - inode will remain
pinned until we are done with the symlink body.

And rip the idiocy with checking if dentry is positive right after
we'd verified debugfs_positive(), which is a stronger check...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

trylock_super(): replacement for grab_super_passive() · eb6ef3df

Committed by Konstantin Khlebnikov on Feb 19, 2015

I've noticed significant locking contention in the memory reclaimer around
sb_lock inside grab_super_passive(). grab_super_passive() is called from
two places: in the icache/dcache shrinkers (function super_cache_scan()) and
from writeback (function __writeback_inodes_wb()). Both are required for
progress in the memory allocator.

grab_super_passive() acquires sb_lock to increment sb->s_count and check
sb->s_instances. It seems sb->s_umount locked for read is enough here:
super-block deactivation always runs under sb->s_umount locked for write.
Protecting the super-block itself isn't a problem: in super_cache_scan() sb
is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers
are still active. Inside writeback, the super-block comes from an inode on
the bdi writeback list under wb->list_lock.

This patch removes the locking of sb_lock and checks s_instances under
s_umount: generic_shutdown_super() unlinks it under sb->s_umount locked for
write. The new variant is called trylock_super() and, since it only locks the
semaphore, callers must call up_read(&sb->s_umount) instead of drop_super(sb)
when they're done.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions · 54f2a2f4

Committed by David Howells on Jan 29, 2015

Fanotify probably doesn't want to watch autodirs, so make it use d_can_lookup()
rather than d_is_dir() when checking a dir watch and give an error on fake
directories.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions · ce40fa78

Committed by David Howells on Jan 29, 2015

Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack
thereof) in cachefiles:

 (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as
     it doesn't want to deal with automounts in its cache.

 (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in
     cachefiles.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8

Committed by David Howells on Jan 29, 2015

Convert the following where appropriate:

 (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

 (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

 (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
     complicated than it appears as some calls should be converted to
     d_can_lookup() instead.  The difference is whether the directory in
     question is a real dir with a ->lookup op or whether it's a fake dir with
     a ->d_automount op.

In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (i.e. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).

Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer.  In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.

However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.

There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
intended for special directory entry types that don't have attached inodes.

The following perl+coccinelle script was used:

use strict;

my @callers;
open(my $fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
    print "No matches\n";
    exit(0);
}

my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);

foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
	die "spatch failed";
}

[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

VFS: Split DCACHE_FILE_TYPE into regular and special types · 44bdb5e5

Committed by David Howells on Jan 29, 2015

Split DCACHE_FILE_TYPE into DCACHE_REGULAR_TYPE (dentries representing regular
files) and DCACHE_SPECIAL_TYPE (representing blockdev, chardev, FIFO and
socket files).

d_is_reg() and d_is_special() are added to detect these subtypes and
d_is_file() is left as the union of the two.

This allows a number of places that use S_ISREG(dentry->d_inode->i_mode) to
use d_is_reg(dentry) instead.
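
The relationship between the three predicates can be modelled in plain C. This is a sketch under stated assumptions: the enum values and the `sketch_` names are hypothetical, and the kernel encodes the type as bits in `dentry->d_flags` rather than as an enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the kernel's actual bit assignments differ. */
enum dentry_type_sketch {
	TYPE_MISS,
	TYPE_DIRECTORY,
	TYPE_REGULAR,	/* models DCACHE_REGULAR_TYPE */
	TYPE_SPECIAL,	/* models DCACHE_SPECIAL_TYPE */
	TYPE_SYMLINK,
};

struct dentry_sketch {
	enum dentry_type_sketch type;
};

/* Regular files only. */
bool sketch_is_reg(const struct dentry_sketch *d)
{
	return d->type == TYPE_REGULAR;
}

/* Block devices, character devices, FIFOs and sockets. */
bool sketch_is_special(const struct dentry_sketch *d)
{
	return d->type == TYPE_SPECIAL;
}

/* Union of the two subtypes, matching what d_is_file() is left as. */
bool sketch_is_file(const struct dentry_sketch *d)
{
	return sketch_is_reg(d) || sketch_is_special(d);
}
```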

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

VFS: Add a fallthrough flag for marking virtual dentries · df1a085a

Committed by David Howells on Jan 29, 2015

Add a DCACHE_FALLTHRU flag to indicate that, in a layered filesystem, this is
a virtual dentry that covers another one in a lower layer that should be used
instead.  This may be recorded on medium if directory integration is stored
there.

The flag can be set with d_set_fallthru() and tested with d_is_fallthru().

Original-author: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

20 Feb, 2015 7 commits

Btrfs: fix allocation size calculations in alloc_btrfs_bio · e57cf21e

Committed by Chris Mason on Feb 19, 2015

Since commit 8e5cfb55 (Btrfs: Make raid_map array be inlined in
btrfs_bio structure), the raid map array is allocated along with the
btrfs bio in alloc_btrfs_bio.  The calculation used to decide how much
we need to allocate was using the wrong parameter passed into the
allocation function.

The passed in real_stripes will be zero if a target replace operation
is not currently running.  We want to use total_stripes instead.
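
The effect of sizing the inlined array from the wrong count can be shown with a small model. The struct layouts and the function name below are hypothetical; the real btrfs_bio has more fields and its exact size expression is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced structures for illustration only. */
struct stripe_sketch {
	uint64_t physical;
	uint64_t length;
};

struct bbio_sketch {
	int num_stripes;
	struct stripe_sketch stripes[];	/* flexible array member */
};

/*
 * Bytes needed for the header, the stripe array and an inlined raid map
 * holding one u64 per entry. raid_map_entries is the count under test.
 */
size_t bbio_alloc_size(int total_stripes, int raid_map_entries)
{
	return sizeof(struct bbio_sketch)
	     + sizeof(struct stripe_sketch) * (size_t)total_stripes
	     + sizeof(uint64_t) * (size_t)raid_map_entries;
}
```

If the raid-map count is taken from a value that is zero outside a replace operation, `bbio_alloc_size(3, 0)` reserves no room for the map, whereas `bbio_alloc_size(3, 3)` reserves one entry per stripe.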

Signed-off-by: Chris Mason <clm@fb.com>
Reported-by: David Sterba <dsterba@suse.cz>
Tested-by: David Sterba <dsterba@suse.cz>

posix_acl: fix reference leaks in posix_acl_create · fed0b588

Committed by Omar Sandoval on Feb 08, 2015

get_acl() gets a reference which we must release in the error cases.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

autofs4: Wrong format for printing dentry · 76bf3f6b

Committed by Rasmus Villemoes on Feb 06, 2015

%pD for struct file*, %pd for struct dentry*.

Fixes: a455589f ("assorted conversions to %p[dD]")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

coredump: Fix typo in comment · fcbc32bc

Committed by Bastien Nocera on Feb 05, 2015

Signed-off-by: Bastien Nocera <hadess@hadess.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs/aio.c: Remove duplicate function name in pr_debug messages · acd88d4e

Committed by Kinglong Mee on Feb 04, 2015

pr_fmt is already defined as below in fs/aio.c, so remove the duplicate
function name from the pr_debug messages.

#define pr_fmt(fmt) "%s: " fmt, __func__
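
A userspace sketch shows why the prefix would otherwise appear twice. `pr_debug_to()` and `demo_kill_ioctx()` are hypothetical names introduced for illustration. The macro formats into a buffer rather than the kernel log, and only the `pr_fmt` definition is taken from the text above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same definition as the one quoted from fs/aio.c. */
#define pr_fmt(fmt) "%s: " fmt, __func__

/*
 * Userspace stand-in for pr_debug() (hypothetical name). It formats into a
 * buffer instead of the kernel log and needs at least one argument after fmt.
 */
#define pr_debug_to(out, len, fmt, ...) \
	snprintf((out), (len), pr_fmt(fmt), __VA_ARGS__)

/* Hypothetical caller: the message itself no longer repeats the function name. */
void demo_kill_ioctx(char *out, size_t len, int id)
{
	pr_debug_to(out, len, "freeing ioctx %d", id);
}
```

Because `pr_fmt` already prepends `__func__`, a message that also spelled out the function name by hand would print it twice.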

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

configfs: Fix potential NULL d_inode dereference · 112fc894

Committed by David Howells on Jan 27, 2015

Code that does this:

		if (!(d_unhashed(dentry) && dentry->d_inode)) {
			...
			simple_unlink(parent->d_inode, dentry);
		}

is broken because:

    !(d_unhashed(dentry) && dentry->d_inode)

is equivalent to:

    !d_unhashed(dentry) || !dentry->d_inode

so it is possible to get into simple_unlink() with dentry->d_inode == NULL.

simple_unlink(), however, assumes dentry->d_inode cannot be NULL.

I think that what was meant is this:

    !d_unhashed(dentry) && dentry->d_inode

and that the logical-not operator or the final close-bracket was misplaced.
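
The two conditions can be compared directly with a pair of small functions. The function names are hypothetical, and the booleans stand in for `d_unhashed(dentry)` and `dentry->d_inode != NULL`.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as written in the broken code: !(unhashed && has_inode). */
bool cond_as_written(bool unhashed, bool has_inode)
{
	return !(unhashed && has_inode);
}

/* Condition the commit message suggests was intended. */
bool cond_as_intended(bool unhashed, bool has_inode)
{
	return !unhashed && has_inode;
}
```

For a hashed dentry without an inode (`unhashed == false`, `has_inode == false`), `cond_as_written()` is true, so the branch that calls simple_unlink() is taken with a NULL d_inode. `cond_as_intended()` is false for the same input.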

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

don't bother with most of the bad_file_ops methods · db671a8e

Committed by Al Viro on Feb 04, 2015

Only ->open() should be there (always failing, of course).  We never
replace ->f_op of an already opened struct file, so there's no way
for any of those methods to be called.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

19 Feb, 2015 16 commits

x86, mm/ASLR: Fix stack randomization on 64-bit systems · 4e7c22d4

Committed by Hector Marco-Gisbert on Feb 14, 2015

The issue is that the stack for processes is not properly randomized on
64 bit architectures due to an integer overflow.

The affected function is randomize_stack_top() in file
"fs/binfmt_elf.c":

  static unsigned long randomize_stack_top(unsigned long stack_top)
  {
           unsigned int random_variable = 0;

           if ((current->flags & PF_RANDOMIZE) &&
                   !(current->personality & ADDR_NO_RANDOMIZE)) {
                   random_variable = get_random_int() & STACK_RND_MASK;
                   random_variable <<= PAGE_SHIFT;
           }
  #ifdef CONFIG_STACK_GROWSUP
           return PAGE_ALIGN(stack_top) + random_variable;
  #else
           return PAGE_ALIGN(stack_top) - random_variable;
  #endif
  }

Note that it declares the "random_variable" variable as "unsigned int".
The value masked with STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits)
is then shifted left by PAGE_SHIFT (which is 12 on x86_64):

	  random_variable <<= PAGE_SHIFT;

so the two leftmost bits are dropped when the result is stored in
"random_variable". This variable must be at least 34 bits long to hold
the (22+12)-bit result.

These two dropped bits have an impact on the entropy of the process stack.
Concretely, the total stack entropy is reduced by a factor of four: from
2^30 to 2^28 (one fourth of the expected entropy).

This patch restores back the entropy by correcting the types involved
in the operations in the functions randomize_stack_top() and
stack_maxrandom_size().
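
The truncation can be reproduced in userspace with fixed-width types. This is an illustrative sketch: the function and macro names are hypothetical, while the mask and shift values are the x86_64 values quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_RND_MASK_X86_64 0x3fffffu	/* 22 bits, as stated above */
#define PAGE_SHIFT_X86_64     12

/* 32-bit arithmetic: models the "unsigned int" variable in the old code. */
uint32_t shift_in_32_bits(uint32_t rnd)
{
	return (rnd & STACK_RND_MASK_X86_64) << PAGE_SHIFT_X86_64;
}

/* 64-bit arithmetic: models "unsigned long" on x86_64. */
uint64_t shift_in_64_bits(uint64_t rnd)
{
	return (rnd & STACK_RND_MASK_X86_64) << PAGE_SHIFT_X86_64;
}
```

With all mask bits set, the 32-bit version yields 0xfffff000 and the 64-bit version yields 0x3fffff000. The two top bits survive only in the wider type.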

The successful fix can be tested with:

  $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
  7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
  7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
  7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
  7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
  ...

Once corrected, the leading bytes should be between 7ffc and 7fff,
rather than always being 7fff.

Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Ismael Ripoll <iripoll@upv.es>
[ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: CVE-2015-1593
Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
Signed-off-by: Borislav Petkov <bp@suse.de>

ceph: return error for traceless reply race · 4d41cef2

Committed by Yan, Zheng on Feb 04, 2015

When we receive a traceless reply for a request that created a new inode,
we re-send a lookup request to the MDS to get information about the newly
created inode. (The VFS expects the filesystem's callback to return an inode
in the create case.) This breaks one request into two requests. Another
client may modify or move the new inode in the middle.

When the race happens, ceph_handle_notrace_create() unconditionally
links the dentry for the 'create' operation to the inode returned by lookup.
This may confuse the VFS when the inode is a directory (the VFS does not allow
multiple linkages for a directory inode).

This patch makes ceph_handle_notrace_create() return an error when it
detects a race. This event should be rare, and it happens only when we talk
to an old MDS. Recent MDS versions do not send traceless replies for requests
that create a new inode.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix dentry leaks · 5cba372c

Committed by Yan, Zheng on Feb 02, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: re-send requests when MDS enters reconnecting stage · 3de22be6

Committed by Yan, Zheng on Feb 04, 2015

So that the MDS can check if any request is already completed and process
completed requests in the clientreplay stage. When completed requests are
processed in the clientreplay stage, the MDS can avoid sending traceless
replies.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: show nocephx_require_signatures and notcp_nodelay options · 2a0b61ce

Committed by Ilya Dryomov on Feb 02, 2015

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>

ceph: fix atomic_open snapdir · bf91c315

Committed by Yan, Zheng on Jan 19, 2015

ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value
and creates the snapdir inode if it's -ENOENT.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: properly mark empty directory as complete · 2f92b3d0

Committed by Yan, Zheng on Jan 19, 2015

ceph_add_cap() calls __check_cap_issue(), which clears the directory
inode's complete flag, so the complete flag for an empty directory
should be set after calling ceph_add_cap().

Signed-off-by: Yan, Zheng <zyan@redhat.com>

client: include kernel version in client metadata · a6a5ce4f

Committed by Yan, Zheng on Jan 16, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: provide separate {inode,file}_operations for snapdir · 38c48b5f

Committed by Yan, Zheng on Jan 14, 2015

Remove all unsupported operations from {inode,file}_operations.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix request time stamp encoding · 1f041a89

Committed by Yan, Zheng on Jan 13, 2015

struct timespec uses 'long' to represent seconds and nanoseconds. 'long'
is 64 bits on 64-bit machines. The ceph MDS expects the time stamp to be
encoded as struct ceph_timespec, which uses 'u32' to represent seconds
and nanoseconds.
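
The narrowing step can be sketched as follows. `struct wire_timespec_sketch` and `encode_timespec_sketch()` are hypothetical names that model only the two 32-bit fields. Byte-order conversion, which a real wire encoding would also need, is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical model of the wire format: two 32-bit fields. */
struct wire_timespec_sketch {
	uint32_t tv_sec;
	uint32_t tv_nsec;
};

/* Narrow each field explicitly instead of copying the wider in-memory
 * struct timespec verbatim. */
void encode_timespec_sketch(struct wire_timespec_sketch *dst,
			    const struct timespec *src)
{
	dst->tv_sec = (uint32_t)src->tv_sec;
	dst->tv_nsec = (uint32_t)src->tv_nsec;
}
```

Copying a `struct timespec` byte for byte on a 64-bit machine would place 16 bytes where the receiver expects 8, so each field is converted on its own.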

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix reading inline data when i_size > PAGE_SIZE · fcc02d2a

Committed by Yan, Zheng on Jan 10, 2015

When an inode has inline data but its size is greater than PAGE_SIZE (it was
truncated to a larger size), the previous direct read code returns -EIO. This
patch adds code to return zeros for data whose offset is beyond PAGE_SIZE.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions) · 86d8f67b

Committed by Yan, Zheng on Jan 09, 2015

Use an atomic variable to track the number of sessions; this avoids
blocking operations inside wait loops.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps) · c4d4a582

Committed by Yan, Zheng on Jan 09, 2015

We should not perform blocking operations in wait_event_interruptible()'s
condition check function, but reading inline data can block, so move the
inline-data read code to ceph_get_caps().

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) · d3383a8e

Committed by Yan, Zheng on Jan 08, 2015

check_cap_flush() calls mutex_lock(), which may block, so we can't
use it as the condition check function for wait_event().

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: improve reference tracking for snaprealm · 982d6011

Committed by Yan, Zheng on Dec 23, 2014

When a snaprealm is created, its initial reference count is zero.
But in some rare cases, the newly created snaprealm is not referenced
by anyone. This causes a snaprealm with a zero reference count to never
be freed.

The fix is to set the reference count of a newly created snaprealm to 1. The
reference is returned to the function that requested the creation of the
snaprealm. When the function finishes its job, it releases the reference.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: properly zero data pages for file holes. · 1487a688

Committed by Yan, Zheng on Jan 06, 2015

A bug was found in striped_read() in fs/ceph/file.c. striped_read() calls
ceph_zero_page_vector_range(). The first argument, page_align + read + ret,
passed to ceph_zero_page_vector_range() is wrong.

When a file has holes, this wrong parameter may cause memory corruption
either in kernel space or user space. Kernel space memory may be corrupted in
the case of non-direct IO; user space memory may be corrupted in the case of
direct IO. In the latter case, the application doing direct IO may crash due
to memory corruption, as we have experienced.

The correct value should be initial_align + read + ret, where initial_align =
o_direct ? buf_align : io_align. Compared with page_align, the current page
offset, initial_align is the initial page offset, which should be used to
calculate the page and offset in ceph_zero_page_vector_range().

Reported-by: caifeng zhu <zhucaifeng@unissoft-nj.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

openeuler / Kernel synced successfully almost 2 years ago