Commits · 5d37ca1480a70f437e4c425ee5723c760cf6afac · openeuler / Kernel

07 Sep, 2017 1 commit

ceph: send LSSNAP request to auth mds of directory inode · 5d37ca14

Committed by Yan, Zheng on Jul 26, 2017

Snapdir inode has no capability. __choose_mds() should choose mds
based on capabilities of snapdir's parent inode.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

07 Jul, 2017 1 commit

ceph: avoid invalid memory dereference in the middle of umount · 62a65f36

Committed by Yan, Zheng on Jun 22, 2017

extra_mon_dispatch() and debugfs' foo_show functions dereference
fsc->mdsc. We should clean up fsc->client->extra_mon_dispatch
and debugfs before destroying fsc->mdsc.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

15 Jun, 2017 1 commit

ceph: use current_kernel_time() to get request time stamp · 56199016

Committed by Yan, Zheng on Jun 01, 2017

ceph uses ktime_get_real_ts() to get the request time stamp. In most
other cases, current_kernel_time() is used to get the time stamp for
filesystem operations (called by current_time()).

There is a granularity difference between ktime_get_real_ts() and
current_kernel_time(). The latter can be up to one jiffy behind
the former. This can cause an inode's ctime to go backwards.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

09 May, 2017 1 commit

fs: ceph: CURRENT_TIME with ktime_get_real_ts() · 1134e091

Committed by Deepa Dinamani on May 08, 2017

CURRENT_TIME is not y2038 safe.  The macro will be deleted and all the
references to it will be replaced by ktime_get_* apis.

struct timespec is also not y2038 safe.  Retain timespec for timestamp
representation here as ceph uses it internally everywhere.  These
references will be changed to use struct timespec64 in a separate patch.

The current_fs_time() api is being changed to use vfs struct inode* as
an argument instead of struct super_block*.

Set the new mds client request r_stamp field using ktime_get_real_ts()
instead of using current_fs_time().

Also, since r_stamp is used as mtime on the server, use timespec_trunc()
to truncate the timestamp, using the right granularity from the
superblock.

This api will be transitioned to be y2038 safe along with vfs.

Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
M:	Ilya Dryomov <idryomov@gmail.com>
M:	"Yan, Zheng" <zyan@redhat.com>
M:	Sage Weil <sage@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
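The commit message above says the request stamp is truncated with timespec_trunc() to the superblock's granularity. As a rough userspace illustration of what truncating to a granularity means, the sketch below rounds the nanosecond field down to a multiple of the granularity. The function name `trunc_to_granularity` and its `gran_ns` parameter are hypothetical, and this is not the kernel implementation.

```c
#include <time.h>

#define NSEC_PER_SEC_L 1000000000L

/*
 * Hypothetical helper: round t down to a multiple of gran_ns nanoseconds.
 * A granularity of 1 ns leaves the value unchanged; a granularity of one
 * second clears the nanosecond field entirely.
 */
static struct timespec trunc_to_granularity(struct timespec t, long gran_ns)
{
    if (gran_ns > 1 && gran_ns <= NSEC_PER_SEC_L)
        t.tv_nsec -= t.tv_nsec % gran_ns;
    return t;
}
```

A filesystem that stores timestamps with microsecond resolution would pass 1000 here, so that the client and server agree on the stored value.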

04 May, 2017 7 commits

ceph: handle epoch barriers in cap messages · 92475f05

Committed by Jeff Layton on Apr 13, 2017

Have the client store and update the osdc epoch_barrier when a cap
message comes in with one.

When sending cap messages, send the epoch barrier as well. This allows
clients to inform servers that their released caps may not be used until
a particular OSD map epoch.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: make seeky readdir more efficient · 79162547

Committed by Yan, Zheng on Apr 05, 2017

The current cephfs client uses a string to indicate the start position
of readdir. The string is the last entry of the previous readdir reply.
This approach does not work for seeky readdir because we cannot
easily convert the new position to a string. For seeky readdir,
mds needs to return dentries from the beginning. The client keeps
retrying if the reply does not contain the dentry it wants.

In the current version of ceph, mds sorts CDentry in its cache in
hash order. The client also uses the dentry hash to compose the dir
position. For seeky readdir, if the client passes the hash part of the
dir position to mds, mds can avoid replying with useless dentries.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: close stopped mds' session · 2827528d

Committed by Yan, Zheng on Mar 28, 2017

If an mds has stopped, close its session and clean up its session
requests/caps. The process is similar to handling SESSION_CLOSE
initiated by mds.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: fix potential use-after-free · 0a07fc8c

Committed by Yan, Zheng on Mar 29, 2017

__unregister_session() frees the session if it drops the last
reference. We should grab an extra reference if we want to use
the session after __unregister_session().

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
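The pattern described in the commit message above (take an extra reference before calling something that may drop the last one) can be sketched in plain C. All names here (`struct session`, `session_get`, `session_put`, `unregister_session`, `close_session`) are hypothetical stand-ins, the counter is not atomic, and this is not the ceph code itself.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for an MDS session. */
struct session {
    int refs;
    int id;
};

static struct session *session_get(struct session *s)
{
    s->refs++;
    return s;
}

/* Drop one reference; free the object when the last one goes away. */
static void session_put(struct session *s)
{
    if (--s->refs == 0)
        free(s);
}

/* Drops the reference held by the (imaginary) session table. */
static void unregister_session(struct session *s)
{
    session_put(s);
}

static int close_session(struct session *s)
{
    int id;

    session_get(s);         /* extra reference keeps s alive below */
    unregister_session(s);  /* may have dropped the table's last ref */
    id = s->id;             /* still safe: we hold our own reference */
    session_put(s);         /* now the object may really be freed */
    return id;
}
```

Without the `session_get()` call, the read of `s->id` after `unregister_session()` would touch freed memory whenever the table held the only reference.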

ceph: allow connecting to mds whose rank >= mdsmap::m_max_mds · 76201b63

Committed by Yan, Zheng on Mar 28, 2017

mdsmap::m_max_mds is the expected count of active mds. It's not the
max rank of active mds. A user can decrease mdsmap::m_max_mds without
stopping the mds whose rank >= mdsmap::m_max_mds.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: convert ceph_pagelist.refcnt from atomic_t to refcount_t · 0e1a5ee6

Committed by Elena Reshetova on Mar 17, 2017

The refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
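To illustrate why a dedicated reference-counter type helps against overflow, here is a simplified, non-atomic sketch of a saturating counter. The names (`struct sat_refcount`, `sat_refcount_inc`, `sat_refcount_dec_and_test`) are hypothetical; the kernel's refcount_t is atomic and differs in detail, so treat this only as a model of the idea.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical saturating counter; UINT_MAX means "saturated". */
struct sat_refcount {
    unsigned int refs;
};

/* Increment, but stick at the maximum instead of wrapping to zero. */
static void sat_refcount_inc(struct sat_refcount *r)
{
    if (r->refs != UINT_MAX)
        r->refs++;
}

/*
 * Decrement and report whether the object may be freed. A saturated
 * counter is never decremented, so the object is leaked rather than
 * freed while references may still exist.
 */
static bool sat_refcount_dec_and_test(struct sat_refcount *r)
{
    if (r->refs == UINT_MAX)
        return false;
    return --r->refs == 0;
}
```

With a plain wrapping counter, enough increments would bring the value back to a small number, and a later decrement could free an object that is still referenced. Saturation trades that use-after-free for a memory leak.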

ceph: convert ceph_mds_session.s_ref from atomic_t to refcount_t · 3997c01d

Committed by Elena Reshetova on Mar 03, 2017

The refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

24 Feb, 2017 1 commit

ceph: tidy some white space in get_nonsnap_parent() · f1075480

Committed by Dan Carpenter on Feb 23, 2017

The white space here seems slightly messed up.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

20 Feb, 2017 9 commits

ceph: remove req from unsafe list when unregistering it · df963ea8

Committed by Jeff Layton on Feb 14, 2017

There's no reason a request should ever be on a s_unsafe list but not
in the request tree.

Cc: stable@vger.kernel.org
Link: http://tracker.ceph.com/issues/18474
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: add a new flag to indicate whether parent is locked · 3dd69aab

Committed by Jeff Layton on Jan 31, 2017

struct ceph_mds_request has an r_locked_dir pointer, which is set to
indicate the parent inode and that its i_rwsem is locked.  In some
critical places, we need to be able to indicate the parent inode to the
request handling code, even when its i_rwsem may not be locked.

Most of the code that operates on r_locked_dir doesn't require that the
i_rwsem be locked. We only really need it to handle manipulation of the
dcache. The rest (filling of the inode, updating dentry leases, etc.)
already has its own locking.

Add a new r_req_flags bit that indicates whether the parent is locked
when doing the request, and rename the pointer to "r_parent". For now,
all the places that set r_parent also set this flag, but that will
change in a later patch.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: convert bools in ceph_mds_request to a new r_req_flags field · bc2de10d

Committed by Jeff Layton on Feb 01, 2017

Currently, we have a bunch of bool flags in struct ceph_mds_request. We
need more flags though, but each bool takes (at least) a byte. Those
add up over time.

Merge all of the existing bools in this struct into a single unsigned
long, and use the set/test/clear_bit macros to manipulate them. These
are atomic operations, but that is required here to prevent
load/modify/store races. The existing flags are protected by different
locks, so we can't rely on them for that purpose.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
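The idea in the commit message above, packing several bools into one unsigned long and updating individual bits atomically, can be modelled in userspace with C11 atomics. The bit names and helper functions below are hypothetical and only approximate what the kernel's set_bit/test_bit/clear_bit macros provide.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the real flag names are not shown here. */
enum {
    REQ_FLAG_GOT_UNSAFE = 0,
    REQ_FLAG_GOT_SAFE   = 1,
    REQ_FLAG_ABORTED    = 2,
};

struct request {
    atomic_ulong flags;   /* one word instead of one byte per bool */
};

/* Atomic read-modify-write, so concurrent updates to different bits
 * cannot lose each other's changes. */
static void req_set_flag(struct request *r, int bit)
{
    atomic_fetch_or(&r->flags, 1UL << bit);
}

static void req_clear_flag(struct request *r, int bit)
{
    atomic_fetch_and(&r->flags, ~(1UL << bit));
}

static bool req_test_flag(struct request *r, int bit)
{
    return (atomic_load(&r->flags) >> bit) & 1UL;
}
```

A plain `r->flags |= mask` would be a separate load, modify, and store. If two threads holding different locks did that on the same word, one update could overwrite the other, which is the race the commit message refers to.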

ceph: drop session argument to ceph_fill_trace · f5a03b08

Committed by Jeff Layton on Jan 31, 2017

Just get it from r_session since that's what's always passed in.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: cleanup ACCESS_ONCE -> READ_ONCE · 52953d55

Committed by Seraphime Kirkovski on Dec 26, 2016

This removes the uses of ACCESS_ONCE in favor of READ_ONCE.

Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: pass parent inode info to ceph_encode_dentry_release if we have it · ca6c8ae0

Committed by Jeff Layton on Dec 15, 2016

If we have a parent inode reference already, then we don't need to
go back up the directory tree to find one.

Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: pass parent dir ino info to build_dentry_path · fd36a717

Committed by Jeff Layton on Dec 15, 2016

In the event that we have a parent inode reference in the request, we
can use that instead of mucking about in the dcache. Pass any parent
inode info we have down to build_dentry_path so it can make use of it.

Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: clean up unsafe d_parent accesses in build_dentry_path · c6b0b656

Committed by Jeff Layton on Dec 15, 2016

While we hold a reference to the dentry when build_dentry_path is
called, we could end up racing with a rename that changes d_parent.
Handle that situation correctly, by using the rcu_read_lock to
ensure that the parent dentry and inode stick around long enough
to safely check ceph_snap and ceph_ino.

Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: clean up unsafe d_parent access in __choose_mds · 30c71233

Committed by Jeff Layton on Dec 15, 2016

__choose_mds exists to pick an MDS to use when issuing a call. Doing
that typically involves picking an inode and using the authoritative
MDS for it. In most cases, that's pretty straightforward, as we are
using an inode to which we hold a reference (usually represented by
r_dentry or r_inode in the request).

In the case of a snapshotted directory however, we need to fetch
the non-snapped parent, which involves walking back up the parents
in the tree. The dentries in the snapshot dir are effectively frozen
but the overall parent is _not_, and could vanish if a concurrent
rename were to occur.

Clean this code up and take special care to ensure the validity of
the entries we're working with. First, try to use the inode in
r_locked_dir if one exists. If not and all we have is r_dentry,
then we have to walk back up the tree. Use the rcu_read_lock for
this so we can ensure that any d_parent we find won't go away, and
take extra care to deal with the possibility that the dentries could
go negative.

Change get_nonsnap_parent to return an inode, and take a reference to
that inode before returning (if any). Change all of the other places
where we set "inode" in __choose_mds to also take a reference, and then
call iput on that inode before exiting the function.

Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

19 Jan, 2017 1 commit

ceph: fix bad endianness handling in parse_reply_info_extra · 6df8c9d8

Committed by Jeff Layton on Jan 12, 2017

sparse says:

    fs/ceph/mds_client.c:291:23: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:293:28: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:294:28: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:296:28: warning: restricted __le32 degrades to integer

The op value is __le32, so we need to convert it before comparing it.

Cc: stable@vger.kernel.org # needs backporting for < 3.14
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
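The class of bug that sparse flags above (comparing a little-endian wire value directly with a host-order constant) can be shown with a small portable sketch. The helper `le32_decode`, the function `reply_is_for_op`, and the opcode value in the usage note are hypothetical; kernel code would use le32_to_cpu() on the __le32 field instead.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from raw bytes, independent of
 * the host's byte order. */
static uint32_t le32_decode(const unsigned char *b)
{
    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}

/* Convert the wire value to host order first, then compare. */
static int reply_is_for_op(const unsigned char wire_op[4], uint32_t op)
{
    return le32_decode(wire_op) == op;
}
```

For example, the bytes `05 03 00 00` decode to `0x305` on any host. Comparing the raw 32-bit word with `0x305` would only work on little-endian machines, which is why the conversion must happen before the comparison.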

13 Jan, 2017 1 commit

ceph: fix mds cluster availability check · cc8e8342

Committed by Yan, Zheng on Jan 04, 2017

We should apply the check after getting the initial mdsmap.

Fixes: e9e427f0 ("ceph: check availability of mds cluster on mount")
Link: http://tracker.ceph.com/issues/18161
Signed-off-by: Yan, Zheng <zyan@redhat.com>

13 Dec, 2016 2 commits

ceph: check availability of mds cluster on mount · e9e427f0

Committed by Yan, Zheng on Nov 10, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

libceph: drop len argument of *verify_authorizer_reply() · 0dde5848

Committed by Ilya Dryomov on Dec 02, 2016

The length of the reply is protocol-dependent - for cephx it's
ceph_x_authorize_reply.  Nothing sensible can be passed from the
messenger layer anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

03 Oct, 2016 2 commits

ceph: use list_move instead of list_del/list_add · 8cdcc07d

Committed by Wei Yongjun on Aug 13, 2016

Use list_move() instead of list_del() + list_add().

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
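For readers unfamiliar with the list helpers named above, the following is a minimal userspace model of a circular doubly linked list showing that a move is a delete followed by an add. The definitions are local stand-ins that only borrow the kernel's names; they are not the kernel's implementation.

```c
/* Local stand-in for a circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

/* Insert n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink e from whatever list it is on and leave it self-linked. */
static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = e;
}

/* One call that expresses the intent of the two-step sequence. */
static void list_move(struct list_head *e, struct list_head *head)
{
    list_del(e);
    list_add(e, head);
}
```

The behavior is the same either way; the single call is shorter and makes it harder to forget one of the two steps.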

ceph: handle CEPH_SESSION_REJECT message · fcff415c

Committed by Yan, Zheng on Sep 14, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

09 Aug, 2016 1 commit

ceph: initialize pathbase in the !dentry case in encode_caps_cb() · 4eacd4cb

Committed by Ilya Dryomov on Aug 09, 2016

pathbase is the base inode; set it to 0 if we've got no path.

Coverity-id: 146348
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

28 Jul, 2016 9 commits

ceph: optimize cap flush waiting · c8799fc4

Committed by Yan, Zheng on Jul 07, 2016

Add a 'wake' flag to the ceph_cap_flush struct, which indicates whether
someone is waiting for it to finish. When getting a flush ack message,
we check the 'wake' flag in the corresponding ceph_cap_flush struct to
decide if we should wake up waiters. One corner case is that the
acked cap flush has its 'wake' flag set, but it is not the first one
on the flushing list. We do not wake up waiters in this case; we set
the 'wake' flag of the preceding ceph_cap_flush struct instead.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: unify cap flush and snapcap flush · 0e294387

Committed by Yan, Zheng on Jul 04, 2016

This patch includes the following changes:
- Assign flush tid to snapcap flush
- Remove session's s_cap_snaps_flushing list. Add inode to session's
  s_cap_flushing list instead. Inode is removed from the list when
  there is no pending snapcap flush or cap flush.
- Make __kick_flushing_caps() re-send both snapcap flushes and cap
  flushes.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: use list instead of rbtree to track cap flushes · e4500b5e

Committed by Yan, Zheng on Jul 06, 2016

We have no requirement to search for a cap flush by TID. In most cases,
we just need to know the TID of the oldest cap flush. A list is ideal
for this usage.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: include 'follows' of pending snapflush in cap reconnect message · 3469ed0d

Committed by Yan, Zheng on Jul 05, 2016

This helps the recovering MDS to reconstruct the internal states that
track pending snapflushes.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: update cap reconnect message to version 3 · 121f22a1

Committed by Yan, Zheng on Jul 04, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: mount non-default filesystem by name · 430afbad

Committed by Yan, Zheng on Jul 08, 2016

To mount a non-default filesystem, the user currently needs to provide
the mds namespace ID. This is inconvenient.

This patch lets the user mount a filesystem by name. If the user
wants to mount a non-default filesystem, the client first subscribes to
fsmap.user, then subscribes to mdsmap.<ID> after getting the ID of the
filesystem.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: remove ceph_mdsc_lease_release · 8aa152c7

Committed by Jeff Layton on Jul 01, 2016

Nothing calls it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

ceph: don't use ->d_time · 9b16f03c

Committed by Miklos Szeredi on Jun 22, 2016

Pretty simple: just use ceph_dentry_info.time instead (which was already
there, unused).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

ceph: rados pool namespace support · 779fe0fb

Committed by Yan, Zheng on Mar 07, 2016

This patch adds code that decodes pool namespace information in
cap messages and request replies. The pool namespace is saved in
i_layout and is passed to libceph when doing read/write.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

11 Jun, 2016 1 commit

vfs: make the string hashes salt the hash · 8387ff25

Committed by Linus Torvalds on Jun 10, 2016

We always mixed in the parent pointer into the dentry name hash, but we
did it late at lookup time.  It turns out that we can simplify that
lookup-time action by salting the hash with the parent pointer early
instead of late.

A few other users of our string hashes also wanted to mix in their own
pointers into the hash, and those are updated to use the same mechanism.

Hash users that don't have any particular initial salt can just use the
NULL pointer as a no-salt.

Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

26 May, 2016 2 commits

ceph: fix wake_up_session_cb() · e5360309

Committed by Yan, Zheng on May 19, 2016

We should reset i_requested_max_size before waking the waiters.
(A zero i_requested_max_size makes the waiters re-request the max size.)

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: using hash value to compose dentry offset · f3c4ebe6

Committed by Yan, Zheng on Apr 29, 2016

If MDS sorts dentries in dirfrag in hash order, we use the hash value to
compose the dentry offset. The dentry offset is:

  (0xff << 52) | ((24 bits hash) << 28) |
  (the nth entry has hash collision)

This offset is stable across directory fragmentation. This also means
there is no need to reset the readdir offset if the directory gets
fragmented in the middle of readdir.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
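The offset layout given in the commit message above can be written out as a short C sketch. The macro and function names (`HASH_ORDER_TAG`, `make_hash_order_pos`, `pos_hash`, `pos_nth`) are hypothetical and chosen for illustration; only the bit layout (a 0xff tag at bit 52, a 24-bit hash at bit 28, and the entry index in the low 28 bits) comes from the message.

```c
#include <stdint.h>

#define OFFSET_BITS    28
#define OFFSET_MASK    ((1ULL << OFFSET_BITS) - 1)
#define HASH_BITS      24
#define HASH_MASK      ((1ULL << HASH_BITS) - 1)
/* 0xff << 52: marks the position as hash-ordered. */
#define HASH_ORDER_TAG (0xffULL << (OFFSET_BITS + HASH_BITS))

/* Compose a directory position from a dentry hash and the index of
 * the entry among those sharing that hash. */
static uint64_t make_hash_order_pos(uint32_t hash, uint32_t nth)
{
    return HASH_ORDER_TAG |
           (((uint64_t)hash & HASH_MASK) << OFFSET_BITS) |
           ((uint64_t)nth & OFFSET_MASK);
}

/* Recover the hash part, e.g. to pass it along when seeking. */
static uint32_t pos_hash(uint64_t pos)
{
    return (uint32_t)((pos >> OFFSET_BITS) & HASH_MASK);
}

/* Recover the index among entries with the same hash. */
static uint32_t pos_nth(uint64_t pos)
{
    return (uint32_t)(pos & OFFSET_MASK);
}
```

Because the position depends only on the dentry's hash and its rank among colliding entries, and not on which fragment currently holds it, the same dentry keeps the same position when the directory is split or merged.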

openeuler / Kernel synced successfully more than 1 year ago