Commits · 77310320c299b0dc050037ff8fc29fd1861fb005 · openeuler / raspberrypi-kernel

26 May, 2016 1 commit

ceph: renew caps for read/write if mds session got killed. · 77310320

Committed by Yan, Zheng on Apr 08, 2016

When an MDS session gets killed, read/write operations may hang.
The client waits for Frw caps, but the MDS does not know what caps the
client wants. To recover from this, the client sends an open request to
the MDS. The request tells the MDS what caps the client wants.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

05 Apr, 2016 1 commit

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

Committed by Kirill A. Shutemov on Apr 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the
page cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And it is unlikely that it ever will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE.  And it's a constant source of confusion whether a
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
the script below.  For some reason, coccinelle doesn't patch header
files.  I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code that coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation will
also be addressed in a separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
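
The rewrite rules listed in this commit message can be illustrated with a small userspace sketch. The macro values and the two helper functions below are assumptions made for illustration (the kernel defines PAGE_SHIFT per architecture, and `index_old()`/`index_new()` are hypothetical names); the sketch only shows why the shift by `PAGE_CACHE_SHIFT - PAGE_SHIFT` could be dropped once both macros were known to be equal.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel macros. */
#define PAGE_SHIFT        12
#define PAGE_SIZE         (1UL << PAGE_SHIFT)

/* Before the patch, the page-cache variants were plain aliases. */
#define PAGE_CACHE_SHIFT  PAGE_SHIFT
#define PAGE_CACHE_SIZE   PAGE_SIZE

/* Old style: scale an index by the (always zero) shift difference. */
unsigned long index_old(unsigned long idx)
{
	return idx << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* New style after the semantic patch: the no-op shift is gone. */
unsigned long index_new(unsigned long idx)
{
	return idx;
}
```

Because the shift amount is zero, both helpers return the same value for every input, which is what makes the mechanical replacement safe.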

26 Mar, 2016 1 commit

ceph: encode ctime in cap message · d1eee0c0

Committed by Yan, Zheng on Jan 22, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

05 Mar, 2016 1 commit

ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 support · 5ea5c5e0

Committed by Yan, Zheng on Feb 14, 2016

Add support for the format change of MClientReply/MClientCaps.
Also add code that denies access to inodes with pool_ns layouts.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

23 Jan, 2016 1 commit

wrappers for ->i_mutex access · 5955102c

Committed by Al Viro on Jan 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
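
The wrapper pattern described in this message (inode_foo(inode) being mutex_foo(&inode->i_mutex)) can be sketched in userspace as follows. Everything prefixed with `toy_` is hypothetical: the "mutex" is a single-threaded flag, not a real lock, and the types are stand-ins for the kernel's. The point is only that callers go through the inode-level helper, so the lock type behind it can change later without touching them.

```c
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for a mutex. */
struct toy_mutex {
	bool locked;
};

/* Hypothetical stand-in for an inode that embeds its lock. */
struct toy_inode {
	struct toy_mutex i_mutex;
};

void toy_mutex_lock(struct toy_mutex *m)      { m->locked = true; }
void toy_mutex_unlock(struct toy_mutex *m)    { m->locked = false; }
bool toy_mutex_is_locked(struct toy_mutex *m) { return m->locked; }

/* Wrappers: callers name the inode, never the lock field itself. */
void toy_inode_lock(struct toy_inode *inode)
{
	toy_mutex_lock(&inode->i_mutex);
}

void toy_inode_unlock(struct toy_inode *inode)
{
	toy_mutex_unlock(&inode->i_mutex);
}

bool toy_inode_is_locked(struct toy_inode *inode)
{
	return toy_mutex_is_locked(&inode->i_mutex);
}
```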

03 Nov, 2015 2 commits

ceph: make fsync() wait unsafe requests that created/modified inode · 68cd5b4b

Committed by Yan, Zheng on Oct 27, 2015

If we get an unsafe reply for a request that created/modified an inode,
add the unsafe request to a list in the newly created/modified
inode, so that fsync() can wait for these unsafe requests.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: don't invalidate page cache when inode is no longer used · 5e804ac4

Committed by Yan, Zheng on Oct 26, 2015

ceph_check_caps() invalidates the page cache when the inode is not used
by any open file. This behaviour is unfriendly to workloads
that repeatedly read files.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

09 Sep, 2015 1 commit

ceph: EIO all operations after forced umount · 48fec5d0

Committed by Yan, Zheng on Jul 01, 2015

This patch makes try_get_cap_refs() and __do_request() check
whether the file system was forcibly unmounted, and return -EIO if it
was. This patch also adds a helper function that drops dirty caps and
wakes up blocked operations.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
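
A minimal userspace sketch of the check this message describes, assuming a hypothetical `toy_fs` structure with a `forced_umount` flag and a hypothetical `toy_do_request()` entry point. The kernel's try_get_cap_refs() and __do_request() are not reproduced here; the sketch only shows the early -EIO return.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-filesystem state. */
struct toy_fs {
	bool forced_umount;	/* set once the mount was forcibly torn down */
};

/*
 * Hypothetical request entry point: refuse all work with -EIO after a
 * forced unmount, otherwise report success (0).
 */
int toy_do_request(const struct toy_fs *fs)
{
	if (fs->forced_umount)
		return -EIO;
	return 0;
}
```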

31 Jul, 2015 1 commit

ceph: always re-send cap flushes when MDS recovers · fc927cd3

Committed by Yan, Zheng on Jul 20, 2015

commit e548e9b9 makes the kclient
only re-send a cap flush once during MDS failover. If the kclient sends
a cap flush after the MDS enters the reconnect stage but before the MDS
recovers, the kclient will skip re-sending the same cap flush when the
MDS recovers.

This causes a problem for newly created inodes. The MDS handles cap
flushes before replaying unsafe requests, so it's possible that the MDS
finds the corresponding inode missing when handling a cap flush. The
fix is to revert to the old behaviour: always re-send when the MDS
recovers.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

25 Jun, 2015 15 commits

ceph: rework dcache readdir · fdd4e158

Committed by Yan, Zheng on Jun 16, 2015

Previously our dcache readdir code relied on the child dentries in the
directory dentry's d_subdir list being sorted by dentry offset in
descending order. When adding dentries to the dcache, if a dentry
already exists, our readdir code moves it to the head of the directory
dentry's d_subdir list. This design relies on dcache internals.
Al Viro suggests using ncpfs's approach: keeping an array of pointers
to dentries in the page cache of the directory inode. The validity of
those pointers is represented by the directory inode's complete and
ordered flags. When a dentry gets pruned, we clear the directory inode's
complete flag in the d_prune() callback. Before moving a dentry to
another directory, we clear the ordered flag for both the old and the
new directory.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: pre-allocate data structure that tracks caps flushing · f66fd9f0

Committed by Yan, Zheng on Jun 10, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: re-send flushing caps (which are revoked) in reconnect stage · e548e9b9

Committed by Yan, Zheng on Jun 10, 2015

If flushing caps were revoked, we should re-send the cap flush in the
client reconnect stage. This guarantees that the MDS processes the cap
flush message before issuing the flushing caps to another client.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: send TID of the oldest pending caps flush to MDS · a2971c8c

Committed by Yan, Zheng on Jun 10, 2015

Using this information, the MDS can trim its completed caps flush
list (which is used to detect duplicate cap flushes).
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: track pending caps flushing globally · 8310b089

Committed by Yan, Zheng on Jun 09, 2015

This lets us know the TID of the oldest pending caps flush. A later
patch will send this information to the MDS, so that the MDS can trim
its completed caps flush list.

Tracking pending caps flushing globally also simplifies the syncfs code.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
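
The idea of knowing the oldest pending flush TID can be sketched as below. This is an illustration under stated assumptions, not the kernel's data structure: the pending flushes are modelled as a plain array, the helper name `oldest_pending_flush_tid()` is hypothetical, and the use of 0 to mean "nothing pending" (with real TIDs assumed nonzero) is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest TID among the pending cap
 * flushes, or 0 when nothing is pending. TIDs are assumed nonzero, so
 * 0 can serve as the "none" sentinel in this sketch.
 */
uint64_t oldest_pending_flush_tid(const uint64_t *pending, size_t n)
{
	uint64_t oldest = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (oldest == 0 || pending[i] < oldest)
			oldest = pending[i];
	}
	return oldest;
}
```

A receiver that learns this value could discard its records of completed flushes with smaller TIDs, which is the trimming the message refers to.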

ceph: track pending caps flushing accurately · 553adfd9

Committed by Yan, Zheng on Jun 09, 2015

Previously we did not track the accurate TID for flushing caps. When
the MDS fails over, we have no choice but to re-send all flushing caps
with a new TID. This can cause problems because the MDS may have
already flushed some caps and issued the same caps to another client.
The re-sent cap flush has a new TID, which makes the MDS unable to
detect whether it has already processed the cap flush.

This patch adds code to track pending caps flushing accurately.
When re-sending a cap flush is needed, we use its original flush
TID.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix directory fsync · da819c81

Committed by Yan, Zheng on May 27, 2015

fsync() on a directory should flush dirty caps and wait for any
uncommitted directory operations to commit. But ceph_dir_fsync()
only waits for uncommitted directory operations.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix flushing caps · 89b52fe1

Committed by Yan, Zheng on May 27, 2015

The current ceph_fsync() only flushes dirty caps and waits for them to
be flushed. It doesn't wait for caps that are already being flushed.
This patch makes ceph_fsync() wait for pending flushing caps too.
In addition, this patch makes caps_are_flushed() properly handle
tid wrapping.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
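
Handling TID wrapping usually means comparing sequence numbers by their signed difference rather than with a plain `<=`. The sketch below shows that general technique for 16-bit counters; `tid_le()` is a hypothetical helper and is not the body of the kernel's caps_are_flushed().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if TID a is at or before TID b in 16-bit
 * sequence space, tolerating wraparound. The difference is reduced to
 * 16 bits and then read as signed; the unsigned-to-signed conversion
 * is implementation-defined in C but wraps on common compilers.
 */
bool tid_le(uint16_t a, uint16_t b)
{
	return (int16_t)(uint16_t)(a - b) <= 0;
}
```

With this comparison, a TID of 65530 counts as older than a TID of 3 issued shortly after the counter wrapped, whereas a plain `a <= b` would get that case wrong.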

ceph: don't include used caps in cap_wanted · 41445999

Committed by Yan, Zheng on May 25, 2015

When copying files to cephfs, file data may stay in the page cache after
the corresponding file is closed. Cached data uses the Fc capability. If
we include the Fc capability in cap_wanted, the MDS will treat files
with cached data as open files, and journal them in an EOpen event when
trimming a log segment.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: don't pre-allocate space for cap release messages · 745a8e3b

Committed by Yan, Zheng on May 14, 2015

Previously we pre-allocated cap release messages for each cap. This
wastes a lot of memory when there is a large number of caps. This patch
makes the code not pre-allocate the cap release messages. Instead,
we add the corresponding ceph_cap struct to a list when releasing a
cap. Later, when flushing cap releases is needed, we allocate the cap
release messages dynamically.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: make sure syncfs flushes all cap snaps · affbc19a

Committed by Yan, Zheng on May 05, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: take snap_rwsem when accessing snap realm's cached_context · 604d1b02

Committed by Yan, Zheng on May 01, 2015

When the ceph inode's i_head_snapc is NULL, __ceph_mark_dirty_caps()
accesses the snap realm's cached_context. So we need to take the read
lock of snap_rwsem.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: avoid sending unnessesary FLUSHSNAP message · 86056090

Committed by Yan, Zheng on May 01, 2015

When a snap notification contains no new snapshot, we can avoid
sending a FLUSHSNAP message to the MDS. But we still need to create a
cap_snap in some cases because it's required by the write path and the
page writeback path.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference · 5dda377c

Committed by Yan, Zheng on Apr 30, 2015

In most cases where the snap context is needed, we are holding a
reference to CEPH_CAP_FILE_WR. So we can set the ceph inode's
i_head_snapc when getting the CEPH_CAP_FILE_WR reference,
and make the code get the snap context from i_head_snapc. This makes
the code simpler.

Another benefit of this change is that we can handle snap
notifications more elegantly, especially when the snap context
is updated while someone else is writing. The old queue
cap_snap code may set cap_snap's context to either the old
context or the new snap context, depending on whether i_head_snapc
is set. The new queue cap_snap code always sets cap_snap's
context to the old snap context.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: check OSD caps before read/write · 10183a69

Committed by Yan, Zheng on Apr 27, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

20 Apr, 2015 3 commits

ceph: hold on to exclusive caps on complete directories · 32ec4397

Committed by Yan, Zheng on Mar 26, 2015

If a directory is complete, we want to keep the exclusive
cap, so that the MDS does not end up revoking the shared cap
on every create/unlink operation.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: don't mark dirty caps when there is no auth cap · 571ade33

Committed by Yan, Zheng on Mar 24, 2015

No i_auth_cap means reconnecting to the MDS was denied. So don't
add new dirty caps.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: keep i_snap_realm while there are writers · db40cc17

Committed by Yan, Zheng on Mar 23, 2015

When reconnecting to the MDS is denied, we remove session caps
forcibly. But there may be ongoing writes, and the
write code needs to reference i_snap_realm. So if there are
ongoing writes, we keep i_snap_realm.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

16 Apr, 2015 1 commit

VFS: normal filesystems (and lustre): d_inode() annotations · 2b0143b5

Committed by David Howells on Mar 17, 2015

that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

19 Feb, 2015 4 commits

ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps) · c4d4a582

Committed by Yan, Zheng on Jan 09, 2015

We should not do blocking operations in wait_event_interruptible()'s
condition check function, but reading inline data can block. So move the
inline data read code to ceph_get_caps().
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) · d3383a8e

Committed by Yan, Zheng on Jan 08, 2015

check_cap_flush() calls mutex_lock(), which may block. So we can't
use it as the condition check function for wait_event().
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: improve reference tracking for snaprealm · 982d6011

Committed by Yan, Zheng on Dec 23, 2014

When a snaprealm is created, its initial reference count is zero.
But in some rare cases, the newly created snaprealm is not referenced
by anyone. This causes a snaprealm with zero reference count to never
be freed.

The fix is to set the reference count of a newly created snaprealm to 1.
The reference is returned to the function that requested creation of the
snaprealm. When the function finishes its job, it releases the
reference.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
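
The fix described in this message follows a common reference-counting pattern: the constructor hands its caller one reference, and the caller drops it when done. A minimal userspace sketch, assuming a hypothetical `toy_realm` type and helper names (the kernel's snaprealm code, which uses atomic counters and locking, is not reproduced):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted realm object. */
struct toy_realm {
	int nref;
};

/* Create a realm that starts with one reference, owned by the caller. */
struct toy_realm *toy_realm_create(void)
{
	struct toy_realm *r = calloc(1, sizeof(*r));

	if (r)
		r->nref = 1;
	return r;
}

/* Take an additional reference. */
void toy_realm_get(struct toy_realm *r)
{
	r->nref++;
}

/* Drop one reference; free the object and return 1 when it was the last. */
int toy_realm_put(struct toy_realm *r)
{
	assert(r->nref > 0);
	if (--r->nref > 0)
		return 0;
	free(r);
	return 1;
}
```

With an initial count of 1, a realm that nobody else ever references is still freed when its creator calls the put helper, which is the leak the message describes.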

ceph: handle SESSION_FORCE_RO message · 03f4fcb0

Committed by Yan, Zheng on Jan 05, 2015

Mark the session as read-only and wake up all cap waiters.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

18 Dec, 2014 5 commits

ceph: flush inline version · e20d258d

Committed by Yan, Zheng on Nov 14, 2014

After converting inline data to normal data, the client needs to flush
the new i_inline_version (CEPH_INLINE_NONE) to the MDS. This commit
makes cap messages (sent to the MDS) contain inline_version and
inline_data. The client always converts inline data to normal data
before a data write, so the inline data length part is always zero.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fetch inline data when getting Fcr cap refs · 3738daa6

Committed by Yan, Zheng on Nov 14, 2014

We can't use getattr to fetch inline data after getting Fcr caps,
because it can cause deadlock. The solution is to try bringing inline
data into the page cache when not holding any cap, and hope the inline
data page is still there after getting the Fcr caps. If the page
is still there, pin it in the page cache for later IO.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: add inline data to pagecache · 31c542a1

Committed by Yan, Zheng on Nov 14, 2014

Request replies and cap messages can contain inline data. Add inline
data to the page cache if there is an Fc cap.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: parse inline data in MClientReply and MClientCaps · fb01d1f8

Committed by Yan, Zheng on Nov 14, 2014

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph, rbd: delete unnecessary checks before two function calls · e96a650a

Committed by SF Markus Elfring on Nov 02, 2014

The functions ceph_put_snap_context() and iput() test whether their
argument is NULL and then return immediately. Thus the test around the
call is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[idryomov@redhat.com: squashed rbd.c hunk, changelog]
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
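
The cleanup described here can be sketched with a hypothetical NULL-tolerant release helper. `toy_ctx` and `toy_ctx_put()` are stand-ins invented for this illustration; they only mimic the property the message relies on, namely that the callee itself returns immediately on NULL.

```c
#include <stddef.h>

/* Hypothetical reference-counted context. */
struct toy_ctx {
	int nref;
};

/* Like the functions named above, this helper returns at once on NULL. */
void toy_ctx_put(struct toy_ctx *c)
{
	if (!c)
		return;
	c->nref--;
}

/* Before the cleanup: the caller repeats the NULL test. */
void release_old(struct toy_ctx *c)
{
	if (c)
		toy_ctx_put(c);
}

/* After the cleanup: the redundant test is dropped, behaviour unchanged. */
void release_new(struct toy_ctx *c)
{
	toy_ctx_put(c);
}
```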

14 Nov, 2014 1 commit

ceph: fix flush tid comparision · 3231300b

Committed by Yan, Zheng on Oct 22, 2014

TID of cap flush ack is 64 bits, but ceph_inode_info::flushing_cap_tid
is only 16 bits. 16 bits should be plenty to let the cap flush updates
pipeline appropriately, but we need to cast in the proper direction when
comparing these differently-sized versions. So downcast the 64-bit one
to 16 bits.

Reflects ceph.git commit a5184cf46a6e867287e24aeb731634828467cd98.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
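
Why the cast direction matters can be shown with a small sketch. Both helper names are hypothetical and the second one is an illustration of the mismatch, not a quote of the code that was replaced: once the 64-bit TID exceeds 0xffff, widening the stored 16-bit value can never match it, while truncating the 64-bit value still can.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compare by truncating the 64-bit ack TID to the stored 16-bit width. */
bool flush_tid_matches(uint64_t ack_tid, uint16_t stored_tid)
{
	return (uint16_t)ack_tid == stored_tid;
}

/* Illustrative wrong direction: widen the 16-bit value to 64 bits. */
bool flush_tid_matches_widened(uint64_t ack_tid, uint16_t stored_tid)
{
	return ack_tid == (uint64_t)stored_tid;
}
```

For an ack TID of 0x10005 and a stored TID of 5, the truncating comparison matches and the widening comparison does not.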

15 Oct, 2014 2 commits

ceph: fix bool assignments · ab6c2c3e

Committed by Fabian Frederick on Oct 09, 2014

Fix some coccinelle warnings:
fs/ceph/caps.c:2400:6-10: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2401:6-15: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2402:6-17: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2403:6-22: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2404:6-22: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2405:6-19: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2440:4-20: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2469:3-16: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2490:2-18: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2519:3-7: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2549:3-12: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2575:2-6: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2589:3-7: WARNING: Assignment of bool to 0/1
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>

ceph: move ceph_find_inode() outside the s_mutex · 6cd3bcad

Committed by Yan, Zheng on Sep 17, 2014

ceph_find_inode() may wait on a freeing inode; using it inside the
s_mutex may cause deadlock (the freeing inode is waiting for an OSD read
reply, but the dispatch thread is blocked by the s_mutex).
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>