Commits · 235a09821c2bc71d9d07f12217ce2ac00db99eba · openeuler / raspberrypi-kernel

26 May, 2016 1 commit

ceph: multiple filesystem support · 235a0982

Authored by Yan, Zheng on Mar 30, 2016

To access non-default filesystem, we just need to subscribe to
mdsmap.<MDS_NAMESPACE_ID> and add a new mount option for mds
namespace id.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
[idryomov@gmail.com: switch to a new libceph API]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

26 Mar, 2016 4 commits

ceph: kill ceph_get_dentry_parent_inode() · 641235d8

Authored by Yan, Zheng on Mar 16, 2016

use vfs helper dget_parent() instead

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix security xattr deadlock · 315f2408

Authored by Yan, Zheng on Mar 07, 2016

When security is enabled, security module can call filesystem's
getxattr/setxattr callbacks during d_instantiate(). For cephfs,
d_instantiate() is usually called by MDS' dispatch thread, while
handling MDS reply. If the MDS reply does not include xattrs and
corresponding caps, getxattr/setxattr need to send a new request
to MDS and wait for the reply. This makes MDS' dispatch thread sleep,
so nobody handles later MDS replies.

The fix is to make sure lookup/atomic_open replies include xattrs and
corresponding caps, so getxattr can be handled by cached xattrs.
This requires some modification to both MDS and request message.
(Client tells MDS what caps it wants; MDS encodes proper caps in
the reply)

Smack security module may call setxattr during d_instantiate().
Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
to us. So just make setxattr return error when called by MDS'
dispatch thread.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: kill ceph_empty_snapc · 34b759b4

Authored by Ilya Dryomov on Feb 16, 2016

ceph_empty_snapc->num_snaps == 0 at all times.  Passing such a snapc to
ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is
equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only
for sizing the request message.

Further, in all four cases the subsequent ceph_osdc_build_request() is
passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps
and making ceph_empty_snapc entirely useless.  The two cases where it
actually mattered were removed in commits 86056090 ("ceph: avoid
sending unnessesary FLUSHSNAP message") and 23078637 ("ceph: fix
queuing inode to mdsdir's snaprealm").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

ceph: don't enable rbytes mount option by default · 133e9156

Authored by Yan, Zheng on Jan 25, 2016

When rbytes mount option is enabled, directory size is recursive
size. Recursive size is not updated instantly. This can cause
directory size to change between successive stat(1) calls.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

05 Mar, 2016 1 commit

ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 support · 5ea5c5e0

Authored by Yan, Zheng on Feb 14, 2016

Add support for the format change of MClientReply/MClientCaps.
Also add code that denies access to inodes with pool_ns layouts.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

03 Nov, 2015 1 commit

ceph: make fsync() wait unsafe requests that created/modified inode · 68cd5b4b

Authored by Yan, Zheng on Oct 27, 2015

If we get an unsafe reply for a request that created/modified an inode,
add the unsafe request to a list in the newly created/modified
inode, so that fsync() can wait for these unsafe requests.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

31 Jul, 2015 1 commit

ceph: always re-send cap flushes when MDS recovers · fc927cd3

Authored by Yan, Zheng on Jul 20, 2015

Commit e548e9b9 makes the kclient only re-send a cap flush once during
MDS failover. If the kclient sends a cap flush after the MDS enters the
reconnect stage but before the MDS recovers, the kclient will skip
re-sending the same cap flush when the MDS recovers.

This causes a problem for newly created inodes. The MDS handles cap
flushes before replaying unsafe requests, so it's possible that the MDS
finds the corresponding inode missing when handling a cap flush. The fix
is to revert to the old behaviour: always re-send when the MDS recovers.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

25 Jun, 2015 9 commits

ceph: rework dcache readdir · fdd4e158

Authored by Yan, Zheng on Jun 16, 2015

Previously our dcache readdir code relied on the child dentries in the
directory dentry's d_subdir list being sorted by dentry offset in
descending order. When adding dentries to the dcache, if a dentry
already exists, our readdir code moves it to the head of the directory
dentry's d_subdir list. This design relies on dcache internals.
Al Viro suggests using ncpfs's approach: keeping an array of pointers
to dentries in the page cache of the directory inode. The validity of
those pointers is represented by the directory inode's complete and
ordered flags. When a dentry gets pruned, we clear the directory inode's
complete flag in the d_prune() callback. Before moving a dentry to
another directory, we clear the ordered flag for both the old and the
new directory.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: pre-allocate data structure that tracks caps flushing · f66fd9f0

Authored by Yan, Zheng on Jun 10, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: re-send flushing caps (which are revoked) in reconnect stage · e548e9b9

Authored by Yan, Zheng on Jun 10, 2015

If flushing caps were revoked, we should re-send the cap flush in the
client reconnect stage. This guarantees that the MDS processes the cap
flush message before issuing the flushing caps to another client.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: track pending caps flushing globally · 8310b089

Authored by Yan, Zheng on Jun 09, 2015

So we know TID of the oldest pending caps flushing. Later patch will
send this information to MDS, so that MDS can trim its completed caps
flush list.

Tracking pending caps flushing globally also simplifies syncfs code.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: track pending caps flushing accurately · 553adfd9

Authored by Yan, Zheng on Jun 09, 2015

Previously we did not track the accurate TID for flushing caps. When
the MDS fails over, we have no choice but to re-send all flushing caps
with a new TID. This can cause problems because the MDS may have already
flushed some caps and issued the same caps to another client.
The re-sent cap flush has a new TID, which makes the MDS unable to
detect whether it has already processed the cap flush.

This patch adds code to track pending caps flushing accurately.
When re-sending a cap flush is needed, we use its original flush
TID.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
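
The idea behind 553adfd9 can be illustrated with a small model. This is only a sketch, not the kernel code: `ToyMds` and `ToyClient` are hypothetical names, and the toy MDS is assumed to remember the TIDs it has already processed. The client records each flush under its original TID and reuses that TID when re-sending, so the MDS can recognise a duplicate.

```python
class ToyMds:
    """Hypothetical MDS model: remembers which flush TIDs it has processed."""

    def __init__(self):
        self.completed = set()
        self.applied = 0  # number of flushes actually applied

    def handle_flush(self, tid):
        if tid in self.completed:
            return "duplicate"
        self.completed.add(tid)
        self.applied += 1
        return "applied"


class ToyClient:
    """Hypothetical client model: tracks pending flushes by original TID."""

    def __init__(self):
        self.last_tid = 0
        self.pending = {}  # tid -> caps being flushed, until acknowledged

    def flush(self, mds, caps):
        self.last_tid += 1
        self.pending[self.last_tid] = caps
        return mds.handle_flush(self.last_tid)

    def resend_all(self, mds):
        # Re-send every pending flush with the TID it was first sent with.
        return {tid: mds.handle_flush(tid) for tid in sorted(self.pending)}

    def oldest_pending_tid(self):
        # The oldest pending TID is what a global tracker could report.
        return min(self.pending) if self.pending else None
```

With a fresh TID on every re-send, the toy MDS would apply the same flush twice; with the original TID it reports a duplicate instead.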

ceph: don't pre-allocate space for cap release messages · 745a8e3b

Authored by Yan, Zheng on May 14, 2015

Previously we pre-allocated a cap release message for each cap. This
wastes a lot of memory when there is a large number of caps. This patch
makes the code not pre-allocate the cap release messages. Instead,
we add the corresponding ceph_cap struct to a list when releasing a
cap. Later, when flushing cap releases is needed, we allocate the cap
release messages dynamically.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
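
A minimal sketch of the "queue now, allocate later" approach described in 745a8e3b, under stated assumptions: `build_release_messages` and `max_items_per_msg` are hypothetical, and a message is modelled as a plain list of cap identifiers rather than a real wire message.

```python
def build_release_messages(released_caps, max_items_per_msg=4):
    """Pack queued cap releases into messages only when a flush is requested.

    released_caps: list of cap identifiers queued at release time.
    max_items_per_msg: assumed per-message capacity (illustrative value).
    """
    return [
        released_caps[i:i + max_items_per_msg]
        for i in range(0, len(released_caps), max_items_per_msg)
    ]
```

Nothing is allocated while the queue is empty, and the number of messages grows with the number of released caps rather than with the total number of caps held.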

ceph: avoid sending unnessesary FLUSHSNAP message · 86056090

Authored by Yan, Zheng on May 01, 2015

When a snap notification contains no new snapshot, we can avoid
sending a FLUSHSNAP message to the MDS. But we still need to create
a cap_snap in some cases because it's required by the write path and
the page writeback path.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: use empty snap context for uninline_data and get_pool_perm · 7b06a826

Authored by Yan, Zheng on May 01, 2015

The cached_context in ceph_snap_realm is directly accessed by
uninline_data() and get_pool_perm(). This is racy in theory.
Neither uninline_data() nor get_pool_perm() modifies existing
objects; they only create new objects. So we can pass the empty
snap context to them.  Unlike cached_context in ceph_snap_realm,
we do not need to protect the empty snap context.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: check OSD caps before read/write · 10183a69

Authored by Yan, Zheng on Apr 27, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

20 Apr, 2015 2 commits

ceph: remove redundant declaration · e1eba3ea

Authored by Fabian Frederick on Mar 03, 2015

ceph_aops was already defined extern in addr.c section

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix dcache/nocache mount option · e2c3de04

Authored by Yan, Zheng on Mar 04, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

19 Feb, 2015 2 commits

ceph: provide seperate {inode,file}_operations for snapdir · 38c48b5f

Authored by Yan, Zheng on Jan 14, 2015

remove all unsupported operations from {inode,file}_operations.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: improve reference tracking for snaprealm · 982d6011

Authored by Yan, Zheng on Dec 23, 2014

When a snaprealm is created, its initial reference count is zero.
But in some rare cases, the newly created snaprealm is not referenced
by anyone. This causes a snaprealm with zero reference count to never
be freed.

The fix is to set the reference count of a newly created snaprealm to 1.
The reference is returned to the function that requests creation of the
snaprealm. When the function finishes its job, it releases the reference.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
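
The reference-counting change in 982d6011 can be sketched as follows. This is a toy model, not the kernel implementation: `ToyRealm` and `create_realm` are hypothetical names, and "freed" is just a flag set when the last reference is dropped.

```python
class ToyRealm:
    """Hypothetical snaprealm model with an explicit reference count."""

    def __init__(self, ino):
        self.ino = ino
        self.nref = 1       # the creator starts out holding one reference
        self.freed = False

    def get(self):
        self.nref += 1

    def put(self):
        self.nref -= 1
        if self.nref == 0:
            self.freed = True  # last reference dropped: release the realm


def create_realm(ino, attach_to_inode):
    """Create a realm, optionally let an inode reference it, then drop
    the creator's reference once the creating function is done."""
    realm = ToyRealm(ino)
    if attach_to_inode:
        realm.get()   # some inode now references the realm
    realm.put()       # creator finishes its job and releases its reference
    return realm
```

If nobody else took a reference, the creator's final `put()` frees the realm instead of leaving it behind with a count of zero.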

18 Dec, 2014 6 commits

ceph: flush inline version · e20d258d

Authored by Yan, Zheng on Nov 14, 2014

After converting inline data to normal data, the client needs to flush
the new i_inline_version (CEPH_INLINE_NONE) to the MDS. This commit makes
cap messages (sent to the MDS) contain inline_version and inline_data.
The client always converts inline data to normal data before a data write,
so the inline data length part is always zero.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: convert inline data to normal data before data write · 28127bdd

Authored by Yan, Zheng on Nov 14, 2014

Before any data write, convert inline data to normal data and set
i_inline_version to CEPH_INLINE_NONE. The OSD request that saves
inline data to object contains 3 operations (CMPXATTR, WRITE and
SETXATTR). It compares an xattr named 'inline_version' to prevent
old data from overwriting newer data.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
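
The three-operation request from 28127bdd can be modelled roughly as a guarded write. This sketch makes two assumptions that the commit message does not spell out: the compare step passes only when the supplied version is greater than the stored one, and a missing xattr is treated as version 0. `ToyObject` and `uninline_write` are hypothetical names.

```python
class ToyObject:
    """Hypothetical stand-in for an OSD object: data plus xattrs."""

    def __init__(self):
        self.data = b""
        self.xattrs = {}


def uninline_write(obj, inline_version, inline_data):
    """Apply compare, write and set-xattr as one all-or-nothing step."""
    stored = obj.xattrs.get("inline_version", 0)  # assumption: missing == 0
    if not inline_version > stored:               # CMPXATTR step (assumed GT)
        return False                              # older data is rejected
    obj.data = inline_data                        # WRITE step
    obj.xattrs["inline_version"] = inline_version  # SETXATTR step
    return True
```

A late writer carrying an older inline version then fails the comparison and leaves the newer object data untouched.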

ceph: use getattr request to fetch inline data · 01deead0

Authored by Yan, Zheng on Nov 14, 2014

Add a new parameter 'locked_page' to ceph_do_getattr(). If the getattr
reply contains inline data, it will be copied to the page.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: add inline data to pagecache · 31c542a1

Authored by Yan, Zheng on Nov 14, 2014

Request replies and cap messages can contain inline data. Add inline data
to the page cache if there is an Fc cap.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: introduce global empty snap context · 97c85a82

Authored by Yan, Zheng on Nov 06, 2014

Current snapshot code does not properly handle moving an inode from one
empty snap realm to another empty snap realm. After changing the inode's
snap realm, some dirty pages' snap context may not be equal to the inode's
i_head_snap. This can trigger BUG() in ceph_put_wrbuffer_cap_refs().

The fix is to introduce a global empty snap context for all empty snap
realms. This avoids triggering the BUG() for filesystems with no snapshot.

Fixes: http://tracker.ceph.com/issues/9928
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

ceph: introduce a new inode flag indicating if cached dentries are ordered · 70db4f36

Authored by Yan, Zheng on Oct 21, 2014

After creating/deleting/renaming file, offsets of sibling dentries may
change. So we can not use cached dentries to satisfy readdir. But we can
still use the cached dentries to conclude -ENOENT for lookup.

This patch introduces a new inode flag indicating if child dentries are
ordered. The flag is set at the same time marking a directory complete.
After creating/deleting/renaming file, we clear the flag on directory
inode. This prevents ceph_readdir() from using cached dentries to satisfy
readdir syscall.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
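
The split between "complete" and "ordered" described in 70db4f36 can be illustrated with a toy directory model. `ToyDirInode` and its methods are hypothetical; the sketch only captures which cached answers each flag permits.

```python
class ToyDirInode:
    """Hypothetical model of a directory inode's dentry-cache flags."""

    def __init__(self):
        self.complete = False  # cache holds every entry of the directory
        self.ordered = False   # cached entries are in readdir order
        self.cached = []

    def mark_complete(self, names):
        # Both flags are set together once a full listing has been cached.
        self.cached = list(names)
        self.complete = True
        self.ordered = True

    def create(self, name):
        # Creating (likewise deleting or renaming) may shift sibling
        # offsets, so only the ordered flag is cleared.
        self.cached.append(name)
        self.ordered = False

    def readdir_from_cache_ok(self):
        return self.complete and self.ordered

    def lookup_is_enoent(self, name):
        # A complete cache can still prove that a name does not exist.
        return self.complete and name not in self.cached
```

After a create, readdir has to go back to the MDS, but a lookup of a missing name can still be answered from the cache.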

15 Oct, 2014 3 commits

ceph: additional debugfs output · 14ed9703

Authored by John Spray on Sep 12, 2014

MDS session state and client global ID are
useful instrumentation when testing.

Signed-off-by: John Spray <john.spray@redhat.com>

ceph: include the initial ACL in create/mkdir/mknod MDS requests · b1ee94aa

Authored by Yan, Zheng on Sep 16, 2014

Current code sets a new file/directory's initial ACL in a non-atomic
manner.
The client first sends a request to the MDS to create the new file/directory,
then sets the initial ACL after the new file/directory is successfully created.

The fix is to include the initial ACL in create/mkdir/mknod MDS requests,
so the MDS can handle creating the file/directory and setting the initial
ACL in one request.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

ceph: request xattrs if xattr_version is zero · 508b32d8

Authored by Yan, Zheng on Sep 16, 2014

The following sequence of events can happen.
  - Client releases an inode, queues cap release message.
  - A 'lookup' reply brings the same inode back, but the reply
    doesn't contain xattrs because MDS didn't receive the cap release
    message and thought client already has up-to-date xattrs.

The fix is to force sending a getattr request to MDS if xattrs_version
is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so MDS knows client
does not have xattr.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

06 Jun, 2014 1 commit

ceph: pre-allocate ceph_cap struct for ceph_add_cap() · d9df2783

Authored by Yan, Zheng on Apr 18, 2014

So that ceph_add_cap() can be used while i_ceph_lock is locked.
This simplifies the code that handles cap import/export.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

29 Apr, 2014 1 commit

ceph: clear directory's completeness when creating file · 0a8a70f9

Authored by Yan, Zheng on Apr 14, 2014

When creating a file, ceph_set_dentry_offset() puts the new dentry
at the end of directory's d_subdirs, then sets the dentry's offset
based on directory's max offset. The offset does not reflect the
real position of the dentry in directory. Later readdir reply from
MDS may change the dentry's position/offset. This inconsistency
can cause missing/duplicate entries in readdir result if readdir
is partly satisfied by dcache_readdir().

The fix is to clear directory's completeness after creating/renaming
file. It prevents later readdir from using dcache_readdir().

Fixes: http://tracker.ceph.com/issues/8025
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

05 Apr, 2014 1 commit

ceph: use fl->fl_file as owner identifier of flock and posix lock · eb13e832

Authored by Yan, Zheng on Mar 09, 2014

flock and posix lock should use fl->fl_file instead of process ID
as owner identifier. (posix lock uses fl->fl_owner. fl->fl_owner
is usually equal to fl->fl_file, but it also can be a customized
value). The process ID of who holds the lock is just for F_GETLK
fcntl(2).

The fix is rename the 'pid' fields of struct ceph_mds_request_args
and struct ceph_filelock to 'owner', rename 'pid_namespace' fields
to 'pid'. Assign fl->fl_file to the 'owner' field of lock messages.
We also set the most significant bit of the 'owner' field. MDS can
use that bit to distinguish between old and new clients.

The MDS counterpart of this patch modifies the flock code to not
take the 'pid_namespace' into consideration when checking conflict
locks.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
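
The "most significant bit" marker mentioned in eb13e832 can be sketched as below. The field width is an assumption (64 bits is used here for illustration), and `encode_owner` and `is_new_client_owner` are hypothetical helper names, not functions from the patch.

```python
OWNER_MSB = 1 << 63          # assumed: 'owner' is a 64-bit field
MASK64 = (1 << 64) - 1


def encode_owner(file_id):
    """Tag an owner identifier so the receiver can tell it is new-style."""
    return (file_id | OWNER_MSB) & MASK64


def is_new_client_owner(owner):
    """True when the top bit is set, i.e. the value came from a new client."""
    return bool(owner & OWNER_MSB)
```

A plain process ID from an old client never has the top bit set, which is what lets the receiving side distinguish the two.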

03 Apr, 2014 1 commit

ceph: fix ceph_dir_llseek() · f0494206

Authored by Yan, Zheng on Feb 27, 2014

Comparing offset with inode->i_sb->s_maxbytes doesn't make sense for
directory. For a fragmented directory, offset (frag_t, off) can be
larger than inode->i_sb->s_maxbytes.

At the very beginning of ceph_dir_llseek(), local variable old_offset
is initialized to parameter offset. This doesn't make sense either.
old_offset should be ceph_make_fpos(fi->frag, fi->next_offset).

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@linaro.org>
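
To see why a directory position can exceed a byte-size limit, here is a small sketch. It assumes the position packs the fragment into the upper 32 bits and the in-fragment offset into the lower 32 bits; `make_fpos`, the sample fragment value and `ASSUMED_S_MAXBYTES` are illustrative assumptions, not values taken from the patch.

```python
def make_fpos(frag, off):
    """Assumed encoding: frag in the upper 32 bits, offset in the lower 32."""
    return ((frag & 0xFFFFFFFF) << 32) | (off & 0xFFFFFFFF)


ASSUMED_S_MAXBYTES = 1 << 40  # hypothetical size limit, for illustration only

# A non-zero fragment pushes the position far past any byte-size limit,
# even though the offset inside the fragment is tiny.
pos = make_fpos(frag=0x01800000, off=2)
```

Under that encoding, comparing `pos` against a file-size limit says nothing useful about whether the seek target is valid.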

18 Feb, 2014 1 commit

ceph: make ceph_forget_all_cached_acls() static inline · c969d9bf

Authored by Guangliang Zhao on Feb 16, 2014

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Sage Weil <sage@inktank.com>

31 Jan, 2014 1 commit

ceph: remove duplicate declaration of ceph_setattr · 32d35d44

Authored by Peter Rosin on Jan 30, 2014

Signed-off-by: Peter Rosin <peda@lysator.liu.se>
Signed-off-by: Sage Weil <sage@inktank.com>

30 Jan, 2014 1 commit

ceph: fix posix ACL hooks · 72466d0b

Authored by Sage Weil on Jan 29, 2014

The merge of commit 7221fe4c ("ceph: add acl for cephfs") raced with
upstream changes in the generic POSIX ACL code (eg commit 2aeccbe9
"fs: add generic xattr_acl handlers" and others).

Some of the fallout was fixed in commit 4db658ea ("ceph: Fix up after
semantic merge conflict"), but it was incomplete: the set_acl
inode_operation wasn't getting set, and the prototype needed to be
adjusted a bit (it doesn't take a dentry anymore).

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

29 Jan, 2014 1 commit

ceph: Fix up after semantic merge conflict · 4db658ea

Authored by Linus Torvalds on Jan 28, 2014

The previous ceph-client merge resulted in ceph not even building,
because there was a merge conflict that wasn't visible as an actual data
conflict: commit 7221fe4c ("ceph: add acl for cephfs") added support
for POSIX ACL's into Ceph, but unluckily we also had the VFS tree change
a lot of the POSIX ACL helper functions to be much more helpful to
filesystems (see for example commits 2aeccbe9 "fs: add generic
xattr_acl handlers", 5bf3258f "fs: make posix_acl_chmod more useful"
and 37bc1539 "fs: make posix_acl_create more useful")

The reason this conflict wasn't obvious was many-fold: because it was a
semantic conflict rather than a data conflict, it wasn't visible in the
git merge as a conflict.  And because the VFS tree hadn't been in
linux-next, people hadn't become aware of it that way.  And because I
was at jury duty this morning, I was using my laptop and as a result not
doing constant "allmodconfig" builds.

Anyway, this fixes the build and generally removes a fair chunk of the
Ceph POSIX ACL support code, since the improved helpers seem to match
really well for Ceph too.  But I don't actually have any way to *test*
the end result, and I was really hoping for some ACK's for this.  Oh,
well.

Not compiling certainly doesn't make things easier to test, so I'm
committing this without the acks after having waited for four hours...
Plus it's what I would have done for the merge had I noticed the
semantic conflict..

Reported-by: Dave Jones <davej@redhat.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Guangliang Zhao <lucienchao@gmail.com>
Cc: Li Wang <li.wang@ubuntykylin.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

21 Jan, 2014 2 commits

ceph: add imported caps when handling cap export message · 11df2dfb

Authored by Yan, Zheng on Nov 24, 2013

Version 3 cap export message includes information about the imported
caps. It allows us to add the imported caps if the corresponding cap
import message still hasn't been received.

This allows us to handle the situation where the importer MDS crashes
and the cap import message is missing.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

ceph: check inode caps in ceph_d_revalidate · 9215aeea

Authored by Yan, Zheng on Nov 30, 2013

Some inodes in readdir reply may have no caps. Getattr mds request
for these inodes can return -ESTALE. The fix is to consider a dentry
that links to an inode with no caps as invalid. An invalid dentry causes
a lookup request to be sent to the MDS, and the MDS will send caps back.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>