Commits · adf0d68701c7f3e50f21308c76f41e60956a6832 · openeuler / Kernel

20 Feb, 2017 · 1 commit

ceph: fix unsafe dcache access in ceph_encode_dentry_release · adf0d687

Committed by Jeff Layton on Dec 15, 2016

Accessing d_parent requires some sort of locking or it could vanish
out from under us. Since we take the d_lock anyway, use that to fetch
d_parent and take a reference to it, and then use that reference to
call ceph_encode_inode_release.

Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

19 Jan, 2017 · 1 commit

ceph: fix ceph_get_caps() interruption · 6e09d0fb

Committed by Yan, Zheng on Dec 22, 2016

Commit 5c341ee3 ("ceph: fix scheduler warning due to nested
blocking") causes an infinite loop when the process is interrupted. Fix it.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

13 Dec, 2016 · 8 commits

ceph: properly set issue_seq for cap release · dc24de82

Committed by Yan, Zheng on Nov 17, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: add flags parameter to send_cap_msg · 1e4ef0c6

Committed by Jeff Layton on Nov 10, 2016

Add a flags parameter to send_cap_msg, so we can request expedited
service from the MDS when we know we'll be waiting on the result.

Set that flag in the case of try_flush_caps. The callers of that
function generally wait synchronously on the result, so it's beneficial
to ask the server to expedite it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

ceph: update cap message struct version to 10 · 43b29673

Committed by Jeff Layton on Nov 10, 2016

The userland ceph has MClientCaps at struct version 10. This brings the
kernel up to the same version.

For now, all of the new stuff is set to default values, including
the flags field, which will be conditionally set in a later patch.

Note that we don't need to set the change_attr and btime to anything
since we aren't currently setting the feature flag. The MDS should
ignore those values.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

ceph: define new argument structure for send_cap_msg · 0ff8bfb3

Committed by Jeff Layton on Nov 10, 2016

When we get to this many arguments, it's hard to work with positional
parameters. send_cap_msg is already at 25 arguments, with more needed.

Define a new args structure and pass a pointer to it to send_cap_msg.
Eventually it might make sense to embed one of these inside
ceph_cap_snap instead of tracking individual fields.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
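
The move from a long positional parameter list to a single argument structure can be illustrated outside the kernel. The following is a toy Python model, not the kernel's code: the `CapMsgArgs` field names, the `FLAG_SYNC` constant, and the dictionary returned by `send_cap_msg` are all illustrative assumptions.

```python
from dataclasses import asdict, dataclass

# Hypothetical flag bit standing in for "please expedite this request".
FLAG_SYNC = 1 << 0


@dataclass
class CapMsgArgs:
    """Illustrative argument bundle; a real one would carry many more fields."""
    ino: int
    op: str
    caps: int
    wanted: int
    dirty: int
    flush_tid: int = 0
    oldest_flush_tid: int = 0
    flags: int = 0  # a newly added field with a default: old callers are unaffected


def send_cap_msg(args: CapMsgArgs) -> dict:
    """Build a toy message from the bundle instead of from positional arguments."""
    return asdict(args)


# Callers name every field, so adding one more later does not reorder anything.
msg = send_cap_msg(
    CapMsgArgs(ino=1, op="flush", caps=3, wanted=1, dirty=2, flags=FLAG_SYNC)
)
```

With named fields, a later addition such as a flags value only touches the callers that care about it.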

ceph: move xattr initialization before the encoding past the ceph_mds_caps · 9670079f

Committed by Jeff Layton on Nov 10, 2016

Just for clarity. This part is inside the header, so it makes sense to
group it with the rest of the stuff in the header.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

ceph: fix minor typo in unsafe_request_wait · 4945a084

Committed by Jeff Layton on Nov 16, 2016

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

ceph: try getting buffer capability for readahead/fadvise · 2b1ac852

Committed by Yan, Zheng on Oct 25, 2016

In the readahead/fadvise cases, the caller of ceph_readpages does not
hold the buffer capability, so pages can be added to the page cache
while no buffer capability is held. This can cause data integrity
issues.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: fix scheduler warning due to nested blocking · 5c341ee3

Committed by Nikolay Borisov on Oct 11, 2016

try_get_cap_refs can be used as a condition in wait_event* calls.
This is all fine until it has to call __ceph_do_pending_vmtruncate,
which in turn acquires the i_truncate_mutex. This leads to a situation
in which a task's state is !TASK_RUNNING and at the same time it's
trying to acquire a sleeping primitive. In essence, nested sleeping
primitives are being used. This causes the following warning:

WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109447d>] prepare_to_wait_event+0x5d/0x110
 ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6
CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G           O    4.4.20-clouder2 #6
Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015
 0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0
 ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c
 0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0
Call Trace:
 [<ffffffff812f4409>] dump_stack+0x67/0x9e
 [<ffffffff81052b46>] warn_slowpath_common+0x86/0xc0
 [<ffffffff81052bcc>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110
 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110
 [<ffffffff8107767f>] __might_sleep+0x9f/0xb0
 [<ffffffff81612d30>] mutex_lock+0x20/0x40
 [<ffffffffa04eea14>] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph]
 [<ffffffffa04fa692>] try_get_cap_refs+0xa2/0x320 [ceph]
 [<ffffffffa04fd6f5>] ceph_get_caps+0x255/0x2b0 [ceph]
 [<ffffffff81094370>] ? wait_woken+0xb0/0xb0
 [<ffffffffa04f2c11>] ceph_write_iter+0x2b1/0xde0 [ceph]
 [<ffffffff81613f22>] ? schedule_timeout+0x202/0x260
 [<ffffffff8117f01a>] ? kmem_cache_free+0x1ea/0x200
 [<ffffffff811b46ce>] ? iput+0x9e/0x230
 [<ffffffff81077632>] ? __might_sleep+0x52/0xb0
 [<ffffffff81156147>] ? __might_fault+0x37/0x40
 [<ffffffff8119e123>] ? cp_new_stat+0x153/0x170
 [<ffffffff81198cfa>] __vfs_write+0xaa/0xe0
 [<ffffffff81199369>] vfs_write+0xa9/0x190
 [<ffffffff811b6d01>] ? set_close_on_exec+0x31/0x70
 [<ffffffff8119a056>] SyS_write+0x46/0xa0

This happens since wait_event_interruptible can interfere with the
mutex locking code, since they both fiddle with the task state.

Fix the issue by using the newly-added nested blocking infrastructure
in 61ada528 ("sched/wait: Provide infrastructure to deal with
nested blocking").

Link: https://lwn.net/Articles/628628/
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

09 Aug, 2016 · 1 commit

ceph: fix null pointer dereference in ceph_flush_snaps() · e4d2b16a

Committed by Yan, Zheng on Aug 04, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

28 Jul, 2016 · 12 commits

ceph: optimize cap flush waiting · c8799fc4

Committed by Yan, Zheng on Jul 07, 2016

Add a 'wake' flag to the ceph_cap_flush struct, which indicates whether
someone is waiting for it to finish. When we get a flush ack message,
we check the 'wake' flag in the corresponding ceph_cap_flush struct to
decide whether to wake up waiters. One corner case is that the
acked cap flush has its 'wake' flag set but is not the first one
on the flushing list. We do not wake up waiters in this case; we set
the 'wake' flag of the preceding ceph_cap_flush struct instead.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
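
The corner case described above can be modelled in a few lines. This is a toy Python sketch under stated assumptions, not the kernel implementation: `CapFlush`, `finish_cap_flush`, and the plain Python list standing in for the flushing list are all hypothetical names, and the list is assumed to be ordered oldest-first.

```python
from dataclasses import dataclass


@dataclass
class CapFlush:
    tid: int
    wake: bool = False  # True when some task is waiting for this flush


def finish_cap_flush(flushing: list, tid: int) -> bool:
    """Remove the acked flush; return True if waiters should be woken now.

    `flushing` is assumed to be ordered oldest-first.
    """
    idx = next(i for i, cf in enumerate(flushing) if cf.tid == tid)
    acked = flushing.pop(idx)
    wake = acked.wake
    if wake and idx > 0:
        # Older flushes are still pending, so waking now would be premature.
        # Hand the wake-up over to the flush just before the acked one.
        flushing[idx - 1].wake = True
        wake = False
    return wake
```

In this model an out-of-order ack never wakes anyone; the wake-up is deferred until every older flush on the list has also been acked.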

ceph: cleanup ceph_flush_snaps() · ed9b430c

Committed by Yan, Zheng on Jul 05, 2016

This patch divides __ceph_flush_snaps() into two stages. In the first
stage, __ceph_flush_snaps() assigns snapcap flush TIDs and adds them
to the cap flush lists. __ceph_flush_snaps() keeps holding the
i_ceph_lock in this stage, so the inode's auth cap cannot change. In
the second stage, __ceph_flush_snaps() sends the flushsnap cap messages.
i_ceph_lock is unlocked before sending each cap message. If the auth cap
changes in the middle, __ceph_flush_snaps() just stops. This is OK
because kick_flushing_inode_caps() will re-send the flushsnap cap messages
to the inode's new auth MDS.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: kick cap flushes before sending other cap message · 7bc00fdd

Committed by Yan, Zheng on Jul 07, 2016

If ceph_check_caps() wants to send a cap message to a recovering MDS,
make sure it kicks cap flushes first.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: introduce an inode flag to indicate if snapflush is needed · 70220ac8

Committed by Yan, Zheng on Jul 06, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: avoid sending duplicated cap flush message · 13c2b57d

Committed by Yan, Zheng on Jul 05, 2016

Make ceph_kick_flushing_caps() ignore inodes whose cap flushes
have already been re-sent by ceph_early_kick_flushing_caps().

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: unify cap flush and snapcap flush · 0e294387

Committed by Yan, Zheng on Jul 04, 2016

This patch includes the following changes:
- Assign a flush TID to each snapcap flush.
- Remove the session's s_cap_snaps_flushing list. Add the inode to the
  session's s_cap_flushing list instead. The inode is removed from the
  list when there is no pending snapcap flush or cap flush.
- Make __kick_flushing_caps() re-send both snapcap flushes and cap
  flushes.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: use list instead of rbtree to track cap flushes · e4500b5e

Committed by Yan, Zheng on Jul 06, 2016

We have no need to search for a cap flush by TID. In most cases,
we just need to know the TID of the oldest cap flush. A list is ideal
for this usage.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: update types of some local variables · 3609404f

Committed by Yan, Zheng on Jul 06, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: wait unsafe sync writes for evicting inode · 9a5530c6

Committed by Yan, Zheng on Jun 15, 2016

Otherwise ceph_sync_write_unsafe() may access/modify a freed inode.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: reduce i_nr_by_mode array size · 774a6a11

Committed by Yan, Zheng on Jun 06, 2016

Track the usage count for each individual fmode bit. This can reduce the
array size by half.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
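
To see why counting per bit halves the table, here is a toy Python sketch. It assumes three mode bits (read, write, lazy) plus one extra slot that counts every open; the constants and function names are hypothetical and only illustrate the accounting, not the kernel's actual layout.

```python
# Hypothetical mode bits; a file mode is any OR-combination of these.
MODE_RD, MODE_WR, MODE_LAZY = 1, 2, 4

MODE_COMBOS = 8  # a per-combination table needs one slot per mode value (2**3)
MODE_BITS = 4    # a per-bit table needs one "any open" slot plus one per bit


def get_fmode(nr_by_bit: list, fmode: int) -> None:
    """Account for one open file with the given mode."""
    bits = (fmode << 1) | 1  # slot 0 is bumped for every open
    for i in range(MODE_BITS):
        if bits & (1 << i):
            nr_by_bit[i] += 1


def put_fmode(nr_by_bit: list, fmode: int) -> None:
    """Undo the accounting done by get_fmode() for the same mode."""
    bits = (fmode << 1) | 1
    for i in range(MODE_BITS):
        if bits & (1 << i):
            nr_by_bit[i] -= 1


def is_mode_in_use(nr_by_bit: list, mode_bit: int) -> bool:
    """True if any open file currently uses the given single mode bit."""
    slot = mode_bit.bit_length()  # MODE_RD -> 1, MODE_WR -> 2, MODE_LAZY -> 3
    return nr_by_bit[slot] > 0
```

A question such as "is anyone writing?" becomes a single slot lookup rather than a sum over every combination that includes the write bit.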

ceph: rados pool namespace support · 779fe0fb

Committed by Yan, Zheng on Mar 07, 2016

This patch adds code that decodes pool namespace information in
cap messages and request replies. The pool namespace is saved in i_layout
and is passed to libceph when doing reads/writes.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

libceph: define new ceph_file_layout structure · 7627151e

Committed by Yan, Zheng on Feb 03, 2016

Define new ceph_file_layout structure and rename old ceph_file_layout
to ceph_file_layout_legacy. This is preparation for adding namespace
to ceph_file_layout structure.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

01 Jun, 2016 · 2 commits

ceph: improve fscache revalidation · f7f7e7a0

Committed by Yan, Zheng on May 18, 2016

There are several issues in the fscache revalidation code.
- In ceph_revalidate_work(), fscache_invalidate() is called when
  fscache_check_consistency() returns 0. This is completely wrong
  because 0 means the cache is valid.
- handle_cap_grant() calls ceph_queue_revalidate() if the client
  already has CAP_FILE_CACHE. This code is confusing. The client
  should revalidate the cache each time it gets CAP_FILE_CACHE
  anew.
- In handle_cap_grant(), fscache_invalidate() is called if the MDS
  revokes CAP_FILE_CACHE. This is inconsistent with the case
  where the inode gets evicted. In the latter case, the cache is not
  discarded, and the client may use the cache when the inode is reloaded.

This patch moves the fscache revalidation into ceph_get_caps().
The client revalidates the cache after it gets CAP_FILE_CACHE.
i_rdcache_gen should stay constant while CAP_FILE_CACHE is
used. If i_fscache_gen is not equal to i_rdcache_gen, the client
needs to check the cache's consistency.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
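
The generation comparison described above can be sketched as a small state machine. This is a toy Python model with hypothetical names (`Inode`, `revalidate_after_get_caps`, the `CAP_FILE_CACHE` bit value, and the `cache_is_consistent` callback); it only illustrates the "check once per generation" idea and makes no claim about the real control flow.

```python
from dataclasses import dataclass

CAP_FILE_CACHE = 1 << 0  # hypothetical bit value


@dataclass
class Inode:
    rdcache_gen: int = 1    # stays constant while the cache cap is in use
    fscache_gen: int = 0    # generation at which the cache was last checked
    invalidations: int = 0  # stand-in for calls that discard the cache


def revalidate_after_get_caps(inode, got_caps, cache_is_consistent) -> bool:
    """Return True if a consistency check was performed on this call."""
    if not (got_caps & CAP_FILE_CACHE):
        return False  # without the cache cap there is nothing to revalidate
    if inode.fscache_gen == inode.rdcache_gen:
        return False  # already checked for this generation
    if not cache_is_consistent():
        inode.invalidations += 1  # only a failed check discards the cache
    inode.fscache_gen = inode.rdcache_gen
    return True
```

Note that, unlike the first bug listed above, a successful consistency check leaves the cache untouched.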

ceph: avoid unnecessary fscache invalidation/revalidation · 14649758

Committed by Yan, Zheng on May 20, 2016

ceph_fill_file_size() has already called ceph_fscache_invalidate()
if it returns true.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

26 May, 2016 · 2 commits

ceph: don't use truncate_pagecache() to invalidate read cache · 9abd4db7

Committed by Yan, Zheng on May 18, 2016

truncate_pagecache() drops dirty pages, so it's dangerous to use it
to invalidate the read cache. Besides, we shouldn't start invalidating
the read cache while there are buffer writers, because buffer writers
may add dirty pages later.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: renew caps for read/write if mds session got killed. · 77310320

Committed by Yan, Zheng on Apr 08, 2016

When the MDS session gets killed, read/write operations may hang.
The client waits for Frw caps, but the MDS does not know what caps the
client wants. To recover from this, the client sends an open request to
the MDS. The request tells the MDS what caps the client wants.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

05 Apr, 2016 · 1 commit

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

Committed by Kirill A. Shutemov on Apr 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the
page cache with bigger chunks than PAGE_SIZE.

This promise never materialized, and it is unlikely it ever will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE, and it's a constant source of confusion whether a
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
the script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code that coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation will
also be addressed in a separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

26 Mar, 2016 · 1 commit

ceph: encode ctime in cap message · d1eee0c0

Committed by Yan, Zheng on Jan 22, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

05 Mar, 2016 · 1 commit

ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 support · 5ea5c5e0

Committed by Yan, Zheng on Feb 14, 2016

Add support for the format change of MClientReply/MClientCaps.
Also add code that denies access to inodes with pool_ns layouts.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

23 Jan, 2016 · 1 commit

wrappers for ->i_mutex access · 5955102c

Committed by Al Viro on Jan 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
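
The value of such wrappers is that callers stop naming the lock field directly, so its type can change later without touching them. A toy Python analogue, assuming a hypothetical `Inode` class whose private `_i_mutex` is a `threading.Lock` (the nested-lock variant is left out):

```python
import threading


class Inode:
    def __init__(self):
        # Implementation detail: callers go through the helpers below, so this
        # could later become a different kind of lock without changing them.
        self._i_mutex = threading.Lock()


def inode_lock(inode: Inode) -> None:
    inode._i_mutex.acquire()


def inode_unlock(inode: Inode) -> None:
    inode._i_mutex.release()


def inode_trylock(inode: Inode) -> bool:
    """Try to take the lock without blocking; True on success."""
    return inode._i_mutex.acquire(blocking=False)


def inode_is_locked(inode: Inode) -> bool:
    return inode._i_mutex.locked()
```

Each helper is a one-line forwarder, mirroring the "inode_foo(inode) being mutex_foo(&inode->i_mutex)" description in the commit message.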

03 Nov, 2015 · 2 commits

ceph: make fsync() wait unsafe requests that created/modified inode · 68cd5b4b

Committed by Yan, Zheng on Oct 27, 2015

If we get an unsafe reply for a request that created/modified an inode,
add the unsafe request to a list in the newly created/modified
inode, so that fsync() can wait for these unsafe requests.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: don't invalidate page cache when inode is no longer used · 5e804ac4

Committed by Yan, Zheng on Oct 26, 2015

ceph_check_caps() invalidates the page cache when the inode is not used
by any open file. This behaviour is unfriendly to workloads
that repeatedly read files.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

09 Sep, 2015 · 1 commit

ceph: EIO all operations after forced umount · 48fec5d0

Committed by Yan, Zheng on Jul 01, 2015

This patch makes try_get_cap_refs() and __do_request() check
whether the file system was forcibly unmounted, and return -EIO if it was.
This patch also adds a helper function that drops dirty caps and
wakes up blocked operations.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

31 Jul, 2015 · 1 commit

ceph: always re-send cap flushes when MDS recovers · fc927cd3

Committed by Yan, Zheng on Jul 20, 2015

Commit e548e9b9 makes the kclient
only re-send a cap flush once during MDS failover. If the kclient sends
a cap flush after the MDS enters the reconnect stage but before the MDS
recovers, the kclient will skip re-sending the same cap flush when the
MDS recovers.

This causes a problem for newly created inodes. The MDS handles cap
flushes before replaying unsafe requests, so it's possible that the MDS
finds the corresponding inode missing when handling a cap flush. The fix
is to revert to the old behaviour: always re-send when the MDS recovers.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

25 Jun, 2015 · 5 commits

ceph: rework dcache readdir · fdd4e158

Committed by Yan, Zheng on Jun 16, 2015

Previously, our dcache readdir code relied on the child dentries in the
directory dentry's d_subdir list being sorted by dentry offset in
descending order. When adding dentries to the dcache, if a dentry
already existed, our readdir code moved it to the head of the directory
dentry's d_subdir list. This design relies on dcache internals.
Al Viro suggested using ncpfs's approach: keeping an array of pointers
to dentries in the page cache of the directory inode. The validity of
those pointers is represented by the directory inode's complete and
ordered flags. When a dentry gets pruned, we clear the directory inode's
complete flag in the d_prune() callback. Before moving a dentry to another
directory, we clear the ordered flag for both the old and new directories.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: pre-allocate data structure that tracks caps flushing · f66fd9f0

Committed by Yan, Zheng on Jun 10, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: re-send flushing caps (which are revoked) in reconnect stage · e548e9b9

Committed by Yan, Zheng on Jun 10, 2015

If flushing caps were revoked, we should re-send the cap flush in the
client reconnect stage. This guarantees that the MDS processes the cap
flush message before issuing the flushing caps to another client.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: send TID of the oldest pending caps flush to MDS · a2971c8c

Committed by Yan, Zheng on Jun 10, 2015

Based on this information, the MDS can trim its completed caps flush
list (which is used to detect duplicated cap flushes).

Signed-off-by: Yan, Zheng <zyan@redhat.com>
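
How the oldest pending TID lets the receiver trim its duplicate-detection state can be shown with a toy model. The `Session` class and `handle_cap_flush` method below are hypothetical, and the sketch assumes that a client never re-sends a flush whose TID is older than the oldest pending TID it advertises.

```python
class Session:
    """Toy server-side view of one client session."""

    def __init__(self):
        self.completed_flushes = set()  # TIDs that have already been applied

    def handle_cap_flush(self, tid: int, oldest_flush_tid: int) -> bool:
        """Return True if the flush was applied, False if it was a duplicate."""
        # Assumption: anything older than the client's oldest pending flush
        # will never be re-sent, so it no longer needs to be remembered.
        self.completed_flushes = {
            t for t in self.completed_flushes if t >= oldest_flush_tid
        }
        if tid in self.completed_flushes:
            return False  # a re-sent flush that was already applied
        self.completed_flushes.add(tid)
        return True
```

Without the advertised TID, the completed set in this model could only grow for the lifetime of the session.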

ceph: track pending caps flushing globally · 8310b089

Committed by Yan, Zheng on Jun 09, 2015

This lets us know the TID of the oldest pending caps flush. A later patch
will send this information to the MDS, so that the MDS can trim its
completed caps flush list.

Tracking pending caps flushing globally also simplifies the syncfs code.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

openeuler / Kernel · synced successfully nearly 2 years ago
