Commits · 3e0708b990f7e46d87d47b3b06de322490f2f2ee · openeuler / raspberrypi-kernel

25 Jun, 2015 13 commits

ceph: ratelimit warn messages for MDS closes session · 3e0708b9

Committed by Yan, Zheng on May 22, 2015
```
Signed-off-by: Yan, Zheng <zyan@redhat.com>
```

ceph: simplify two mount_timeout sites · 5be73034

Committed by Ilya Dryomov on May 19, 2015

No need to bifurcate wait now that we've got ceph_timeout_jiffies().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

libceph: store timeouts in jiffies, verify user input · a319bf56

Committed by Ilya Dryomov on May 15, 2015

There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.

There is also a bug in the way mount_timeout=0 is handled.  It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.

Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
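
The scheme described in this commit message can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: the `sketch_*` names, the `SKETCH_HZ` tick rate, and the plain multiplication used for the seconds-to-ticks conversion are all hypothetical. It shows the two ideas from the message: reject bad user input before storing the value in ticks, and map a stored 0 to "wait forever" at the point where the wait is issued.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT; the name is hypothetical. */
#define SKETCH_MAX_SCHEDULE_TIMEOUT ((unsigned long)LONG_MAX)

/* Assumed tick rate for this sketch; real kernels fix HZ at build time. */
#define SKETCH_HZ 250UL

/*
 * Validate a user-supplied timeout in seconds and store it in ticks.
 * Negative values and values large enough to risk overflow are rejected.
 * A value of 0 is accepted and stored as 0, meaning "no timeout".
 */
static int sketch_store_timeout(long secs, unsigned long *out_jiffies)
{
	assert(out_jiffies != NULL);

	if (secs < 0 || secs > INT_MAX / 1000)
		return -EINVAL;

	/* Simplification: the commit message says msecs_to_jiffies() is used. */
	*out_jiffies = (unsigned long)secs * SKETCH_HZ;
	return 0;
}

/* Translate the stored value into what a wait primitive should be given. */
static unsigned long sketch_timeout_jiffies(unsigned long timeout)
{
	return timeout ? timeout : SKETCH_MAX_SCHEDULE_TIMEOUT;
}
```

A caller would pass `sketch_timeout_jiffies(stored)` to every wait, so a stored 0 never reaches the wait primitive as a zero-length timeout.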

ceph: exclude setfilelock requests when calculating oldest tid · e8a7b8b1

Committed by Yan, Zheng on May 19, 2015

setfilelock requests can block for a long time, which can prevent
the client from advancing its oldest tid.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: don't pre-allocate space for cap release messages · 745a8e3b

Committed by Yan, Zheng on May 14, 2015

Previously we pre-allocated cap release messages for each cap. This
wastes a lot of memory when there is a large number of caps. This patch
makes the code not pre-allocate the cap release messages. Instead,
we add the corresponding ceph_cap struct to a list when releasing a
cap. Later, when flushing cap releases is needed, we allocate the cap
release messages dynamically.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: make sure syncfs flushes all cap snaps · affbc19a

Committed by Yan, Zheng on May 05, 2015
```
Signed-off-by: Yan, Zheng <zyan@redhat.com>
```

ceph: don't trim auth cap when there are cap snaps · 622f3e25

Committed by Yan, Zheng on May 07, 2015
```
Signed-off-by: Yan, Zheng <zyan@redhat.com>
```

ceph: take snap_rwsem when accessing snap realm's cached_context · 604d1b02

Committed by Yan, Zheng on May 01, 2015

When a ceph inode's i_head_snapc is NULL, __ceph_mark_dirty_caps()
accesses the snap realm's cached_context, so we need to take the
read lock of snap_rwsem.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: avoid sending unnessesary FLUSHSNAP message · 86056090

Committed by Yan, Zheng on May 01, 2015

When a snap notification contains no new snapshot, we can avoid
sending a FLUSHSNAP message to the MDS. But we still need to create
a cap_snap in some cases because it's required by the write path and
the page writeback path.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference · 5dda377c

Committed by Yan, Zheng on Apr 30, 2015

In most cases where the snap context is needed, we are holding
a reference to CEPH_CAP_FILE_WR. So we can set the ceph inode's
i_head_snapc when getting the CEPH_CAP_FILE_WR reference,
and make the code get the snap context from i_head_snapc. This makes
the code simpler.

Another benefit of this change is that we can handle snap
notifications more elegantly, especially when the snap context
is updated while someone else is doing a write. The old queue
cap_snap code may set cap_snap's context to either the old
context or the new snap context, depending on whether i_head_snapc
is set. The new queue cap_snap code always sets cap_snap's
context to the old snap context.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: use empty snap context for uninline_data and get_pool_perm · 7b06a826

Committed by Yan, Zheng on May 01, 2015

cached_context in ceph_snap_realm is directly accessed by
uninline_data() and get_pool_perm(). This is racy in theory.
Neither uninline_data() nor get_pool_perm() modifies existing
objects; they only create new objects. So we can pass the empty
snap context to them.  Unlike cached_context in ceph_snap_realm,
we do not need to protect the empty snap context.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: check OSD caps before read/write · 10183a69

Committed by Yan, Zheng on Apr 27, 2015
```
Signed-off-by: Yan, Zheng <zyan@redhat.com>
```

libceph: allow setting osd_req_op's flags · 144cba14

Committed by Yan, Zheng on Apr 27, 2015
```
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
```

22 Apr, 2015 3 commits

ceph: fix uninline data function · ec137c10

Committed by Yan, Zheng on Apr 13, 2015
```
For CEPH_OSD_CMPXATTR_MODE_U64, the OSD expects the u64 to be encoded
as a string in the object's xattr.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
```

ceph: rename snapshot support · 0ea611a3

Committed by Yan, Zheng on Apr 07, 2015
```
Signed-off-by: Yan, Zheng <zyan@redhat.com>
```

ceph: fix null pointer dereference in send_mds_reconnect() · c0bd50e2

Committed by Yan, Zheng on Apr 07, 2015
```
sb->s_root can be null when umounting

Signed-off-by: Yan, Zheng <zyan@redhat.com>
```

20 Apr, 2015 14 commits

ceph: hold on to exclusive caps on complete directories · 32ec4397

Committed by Yan, Zheng on Mar 26, 2015

If a directory is complete, we want to keep the exclusive
cap, so that the MDS does not end up revoking the shared cap
on every create/unlink operation.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: show non-default options only · ff7eeb82

Committed by Ilya Dryomov on Mar 25, 2015

Don't pollute /proc/mounts with default options (presently these are
dcache, nofsc and acl).  Leave the acl/noacl however - it's a bit of
a special case due to CONFIG_CEPH_FS_POSIX_ACL.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph, ceph: split ceph_show_options() · ff40f9ae

Committed by Ilya Dryomov on Mar 25, 2015

Split ceph_show_options() into two pieces and move the piece
responsible for printing client (libceph) options into net/ceph.  This
way people adding a libceph option wouldn't have to remember to update
code in fs/ceph.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: cleanup unsafe requests when reconnecting is denied · 1c841a96

Committed by Yan, Zheng on Mar 24, 2015
```
Signed-off-by: Yan, Zheng <zyan@redhat.com>
```

ceph: don't zero i_wrbuffer_ref when reconnecting is denied · a9f6eb61

Committed by Yan, Zheng on Mar 24, 2015

remove_session_caps_cb() does not truncate dirty data in the page
cache, but zeros i_wrbuffer_ref/i_wrbuffer_ref_head. This will
result in negative i_wrbuffer_ref/i_wrbuffer_ref_head.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: don't mark dirty caps when there is no auth cap · 571ade33

Committed by Yan, Zheng on Mar 24, 2015

No i_auth_cap means reconnecting to the MDS was denied. So don't
add new dirty caps.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: keep i_snap_realm while there are writers · db40cc17

Committed by Yan, Zheng on Mar 23, 2015

When reconnecting to the MDS is denied, we remove session caps
forcibly. But there may be ongoing writes, and the
write code needs to reference i_snap_realm. So if there are
ongoing writes, we keep i_snap_realm.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: kstrdup() memory handling · a149bb9a

Committed by Sanidhya Kashyap on Mar 21, 2015

Currently, the result of kstrdup() is not checked for r_path2,
r_path1 and snapdir_name at various locations, although kstrdup()
can fail under memory pressure. Therefore, return ENOMEM where
the checks were missing.

Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
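
The pattern this commit message describes, checking every string duplication and returning ENOMEM on failure, can be illustrated with a minimal userspace sketch. Everything here is an assumption made for illustration: `struct sketch_request`, `sketch_strdup()` (a stand-in for kstrdup() built on malloc()), and `sketch_set_paths()` are hypothetical and do not mirror the actual ceph request structures.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request holding two duplicated path strings. */
struct sketch_request {
	char *path1;
	char *path2;
};

/* Userspace stand-in for kstrdup(): returns NULL if allocation fails. */
static char *sketch_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Duplicate both paths; on failure, free what was allocated and return -ENOMEM. */
static int sketch_set_paths(struct sketch_request *req,
			    const char *p1, const char *p2)
{
	req->path1 = sketch_strdup(p1);
	if (!req->path1)
		return -ENOMEM;

	req->path2 = sketch_strdup(p2);
	if (!req->path2) {
		free(req->path1);
		req->path1 = NULL;
		return -ENOMEM;
	}
	return 0;
}
```

The point of the sketch is that the second failure path also undoes the first allocation, so the caller never sees a half-initialized request.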

ceph: properly release page upon error · c1d00b2d

Committed by Taesoo Kim on Mar 20, 2015

When ceph_update_writeable_page fails (including -EAGAIN), it
unlocks (w/ unlock_page) the page but does not 'release'
(w/ page_cache_release) properly.

Upon error, properly set *pagep to NULL, indicating an error.

Signed-off-by: Taesoo Kim <tsgatesv@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: match wait_for_completion_timeout return type · 57e95460

Committed by Nicholas Mc Guire on Mar 10, 2015

The return type of wait_for_completion_timeout() is unsigned long, not int. An
appropriately named unsigned long is added and the assignment fixed up.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: use msecs_to_jiffies for time conversion · 3563dbdd

Committed by Nicholas Mc Guire on Feb 06, 2015

This is only an API consolidation and should make things more readable:
it replaces var * HZ / 1000 with msecs_to_jiffies(var).

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
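
To make the two forms named in this message concrete, here is a small userspace model. It is a sketch under assumptions, not the kernel implementation: `SKETCH_HZ` is an assumed tick rate that divides 1000 evenly, and `sketch_msecs_to_jiffies()` only models the round-up-to-a-whole-tick behaviour, leaving out the clamping the kernel helper also performs.

```c
/* Assumed tick rate for this sketch; real kernels fix HZ at build time. */
#define SKETCH_HZ 250UL

/* The open-coded form from the commit message: truncates toward zero. */
static unsigned long sketch_open_coded(unsigned long ms)
{
	return ms * SKETCH_HZ / 1000;
}

/* Simplified model of msecs_to_jiffies(): rounds up to a whole tick. */
static unsigned long sketch_msecs_to_jiffies(unsigned long ms)
{
	unsigned long ms_per_tick = 1000 / SKETCH_HZ; /* 4 ms at 250 ticks/s */

	return (ms + ms_per_tick - 1) / ms_per_tick;
}
```

In this model the two forms agree whenever the input is a whole number of ticks (1000 ms is 250 ticks either way) and differ only for sub-tick remainders, where the open-coded form truncates.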

ceph: remove redundant declaration · e1eba3ea

Committed by Fabian Frederick on Mar 03, 2015

ceph_aops was already defined extern in addr.c section

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix dcache/nocache mount option · e2c3de04

Committed by Yan, Zheng on Mar 04, 2015
```
Signed-off-by: Yan, Zheng <zyan@redhat.com>
```

ceph: drop cap releases in requests composed before cap reconnect · 6e6f0923

Committed by Yan, Zheng on Feb 27, 2015

These cap releases are stale because the MDS will re-establish client
caps according to the cap reconnect messages.

Note: the MDS can detect stale cap messages, so these stale cap
releases are harmless even if we don't drop them.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

16 Apr, 2015 1 commit

VFS: normal filesystems (and lustre): d_inode() annotations · 2b0143b5

Committed by David Howells on Mar 17, 2015

that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

12 Apr, 2015 5 commits

mirror O_APPEND and O_DIRECT into iocb->ki_flags · 2ba48ce5

Committed by Al Viro on Apr 09, 2015
```
... avoiding write_iter/fcntl races.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

switch generic_write_checks() to iocb and iter · 3309dd04

Committed by Al Viro on Apr 09, 2015

... returning -E... upon error and amount of data left in iter after
(possible) truncation upon success.  Note that the normal case gives
a non-zero (positive) return value, so any tests for != 0 _must_ be
updated.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Conflicts:
	fs/ext4/file.c
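
The return convention described in this message (negative errno on error, otherwise the number of bytes left after possible truncation) changes how callers must test the result. The following userspace sketch illustrates that caller pattern only. `struct sketch_iter`, `sketch_write_checks()` and `sketch_write()` are hypothetical stand-ins and do not reproduce the real checks performed by generic_write_checks().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an iov_iter: only tracks the bytes left. */
struct sketch_iter {
	size_t count;
};

/*
 * Model of the new convention: a negative errno on error, otherwise the
 * number of bytes left in the iterator after truncating it to the limit.
 */
static long sketch_write_checks(struct sketch_iter *it,
				long long pos, long long limit)
{
	if (pos < 0)
		return -EINVAL;
	if (it->count == 0)
		return 0;
	if (pos >= limit)
		return -EFBIG;
	if ((long long)it->count > limit - pos)
		it->count = (size_t)(limit - pos);
	return (long)it->count;
}

/* Caller pattern: stop on <= 0; a positive value is the normal case. */
static long sketch_write(struct sketch_iter *it, long long pos, long long limit)
{
	long ret = sketch_write_checks(it, pos, limit);

	if (ret <= 0)
		return ret;
	/* A real filesystem would now write ret bytes starting at pos. */
	return ret;
}
```

A caller that kept the old `if (ret != 0) return ret;` test would bail out on every successful check, which is why the message insists that such tests be updated.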

generic_write_checks(): drop isblk argument · 0fa6b005

Committed by Al Viro on Apr 04, 2015

all remaining callers are passing 0; some just obscure that fact.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

direct_IO: remove rw from a_ops->direct_IO() · 22c6186e

Committed by Omar Sandoval on Mar 16, 2015

Now that no one is using rw, remove it completely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

make new_sync_{read,write}() static · 5d5d5689

Committed by Al Viro on Apr 03, 2015

All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

26 Mar, 2015 1 commit

fs: move struct kiocb to fs.h · e2e40f2c

Committed by Christoph Hellwig on Feb 22, 2015

struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

13 Mar, 2015 1 commit

fs: remove ki_nbytes · 66ee59af

Committed by Christoph Hellwig on Feb 11, 2015

There is no need to pass the total request length in the kiocb, as
we already get it passed in through the iov_iter argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

23 Feb, 2015 1 commit

VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8

Committed by David Howells on Jan 29, 2015

Convert the following where appropriate:

 (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

 (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

 (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
     complicated than it appears as some calls should be converted to
     d_can_lookup() instead.  The difference is whether the directory in
     question is a real dir with a ->lookup op or whether it's a fake dir with
     a ->d_automount op.

In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).

Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer.  In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.

However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.

There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
intended for special directory entry types that don't have attached inodes.

The following perl+coccinelle script was used:

use strict;

my @callers;
open(my $fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
    print "No matches\n";
    exit(0);
}

my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);

foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
	die "spatch failed";
}

[AV: overlayfs parts skipped]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

19 Feb, 2015 1 commit

ceph: return error for traceless reply race · 4d41cef2

Committed by Yan, Zheng on Feb 04, 2015

When we receive a traceless reply for a request that created a new inode,
we re-send a lookup request to the MDS to get information about the newly
created inode. (VFS expects the FS callback to return an inode in the create
case.) This breaks one request into two requests. Another client may modify
or move to the new inode in the middle.

When the race happens, ceph_handle_notrace_create() unconditionally
links the dentry for the 'create' operation to the inode returned by lookup.
This may confuse VFS when the inode is a directory (VFS does not allow
multiple linkages for a directory inode).

This patch makes ceph_handle_notrace_create() return an error when it
detects a race. This event should be rare, and it happens only when we talk
to an old MDS. Recent MDS versions do not send traceless replies for requests
that create a new inode.

Signed-off-by: Yan, Zheng <zyan@redhat.com>