提交 · d6d5df1db6e9d7f8f76d2911707f7d5877251b02 · openeuler / Kernel

16 9月, 2019 11 次提交

ceph: remove incorrect comment above __send_cap · 98cd281a

由 Jeff Layton 提交于 7月 05, 2019

It doesn't do anything to invalidate the cache when dropping RD caps.
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

98cd281a

ceph: remove CEPH_I_NOFLUSH · daca8bda

由 Jeff Layton 提交于 7月 05, 2019

Nothing sets this flag.
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

daca8bda

ceph: remove unneeded test in try_flush_caps · 27b0a392

由 Jeff Layton 提交于 7月 05, 2019

cap->session is always non-NULL, so we can just do a single test for
equality w/o testing explicitly for a NULL pointer.
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

27b0a392

ceph: have __mark_caps_flushing return flush_tid · 9f3345d8

由 Jeff Layton 提交于 7月 08, 2019

Currently, this function returns ci->i_dirty_caps, but the callers have
to check that that isn't 0 before calling this function. Have the
callers grab that value directly out of the inode, and have
__mark_caps_flushing return the flush_tid instead.
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

9f3345d8

ceph: fix comments over ceph_add_cap · 354c63a0

由 Jeff Layton 提交于 7月 19, 2019

We actually need the ci->i_ceph_lock here. The necessity of the s_mutex
is less clear. Also add a lockdep assertion for the i_ceph_lock.
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

354c63a0

ceph: fetch cap_gen under spinlock in ceph_add_cap · 606d1023

由 Jeff Layton 提交于 7月 22, 2019

It's protected by the s_gen_ttl_lock, so we should fetch under it
and ensure that we're using the same generation in both places.
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

606d1023

ceph: remove ceph_get_cap_mds and __ceph_get_cap_mds · 5de16b30

由 Jeff Layton 提交于 7月 23, 2019

Nothing calls these routines.
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

5de16b30

ceph: invalidate all write mode filp after reconnect · 81f148a9

由 Yan, Zheng 提交于 7月 25, 2019

Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

81f148a9

ceph: return -EIO if read/write against filp that lost file locks · ff5d913d

由 Yan, Zheng 提交于 7月 25, 2019

After mds evicts session, file locks get lost sliently. It's not safe to
let programs continue to do read/write.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

ff5d913d

ceph: pass filp to ceph_get_caps() · 5e3ded1b

由 Yan, Zheng 提交于 7月 25, 2019

Also change several other functions' arguments, no logical changes.
This is preparetion for later patch that checks filp error.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

5e3ded1b

ceph: track and report error of async metadata operation · f4b97866

由 Yan, Zheng 提交于 7月 25, 2019

Use errseq_t to track and report errors of async metadata operations,
similar to how kernel handles errors during writeback.

If any dirty caps or any unsafe request gets dropped during session
eviction, record -EIO in corresponding inode's i_meta_err. The error
will be reported by subsequent fsync,
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

f4b97866

22 8月, 2019 1 次提交

ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob() · 12fe3dda

由 Luis Henriques 提交于 7月 19, 2019

Calling ceph_buffer_put() in __ceph_build_xattrs_blob() may result in
freeing the i_xattrs.blob buffer while holding the i_ceph_lock.  This can
be fixed by having this function returning the old blob buffer and have
the callers of this function freeing it when the lock is released.

The following backtrace was triggered by fstests generic/117.

  BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
  in_atomic(): 1, irqs_disabled(): 0, pid: 649, name: fsstress
  4 locks held by fsstress/649:
   #0: 00000000a7478e7e (&type->s_umount_key#19){++++}, at: iterate_supers+0x77/0xf0
   #1: 00000000f8de1423 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: ceph_check_caps+0x7b/0xc60
   #2: 00000000562f2b27 (&s->s_mutex){+.+.}, at: ceph_check_caps+0x3bd/0xc60
   #3: 00000000f83ce16a (&mdsc->snap_rwsem){++++}, at: ceph_check_caps+0x3ed/0xc60
  CPU: 1 PID: 649 Comm: fsstress Not tainted 5.2.0+ #439
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x67/0x90
   ___might_sleep.cold+0x9f/0xb1
   vfree+0x4b/0x60
   ceph_buffer_release+0x1b/0x60
   __ceph_build_xattrs_blob+0x12b/0x170
   __send_cap+0x302/0x540
   ? __lock_acquire+0x23c/0x1e40
   ? __mark_caps_flushing+0x15c/0x280
   ? _raw_spin_unlock+0x24/0x30
   ceph_check_caps+0x5f0/0xc60
   ceph_flush_dirty_caps+0x7c/0x150
   ? __ia32_sys_fdatasync+0x20/0x20
   ceph_sync_fs+0x5a/0x130
   iterate_supers+0x8f/0xf0
   ksys_sync+0x4f/0xb0
   __ia32_sys_sync+0xa/0x10
   do_syscall_64+0x50/0x1c0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7fc6409ab617
Signed-off-by: NLuis Henriques <lhenriques@suse.com>
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

12fe3dda

08 7月, 2019 7 次提交

ceph: more precise CEPH_CLIENT_CAPS_PENDING_CAPSNAP · 49ada6e8

由 Yan, Zheng 提交于 6月 20, 2019

Client uses this flag to tell mds if there is more cap snap need to
flush. It's mainly for the case that client needs to re-send cap/snap
flushes after mds failover, but CEPH_CAP_ANY_FILE_WR on corresponding
inodes are all released before mds failover.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

49ada6e8

ceph: kick flushing and flush snaps before sending normal cap message · d6cee9db

由 Yan, Zheng 提交于 6月 20, 2019

Otherwise client may send cap flush messages in wrong order.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

d6cee9db

ceph: clear CEPH_I_KICK_FLUSH flag inside __kick_flushing_caps() · 054f8d41

由 Yan, Zheng 提交于 6月 20, 2019

Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

054f8d41

ceph: handle change_attr in cap messages · 176c77c9

由 Jeff Layton 提交于 6月 06, 2019

Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

176c77c9

ceph: handle btime in cap messages · ec62b894

由 Jeff Layton 提交于 5月 29, 2019

Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

ec62b894

ceph: add selinux support · ac6713cc

由 Yan, Zheng 提交于 5月 26, 2019

When creating new file/directory, use security_dentry_init_security() to
prepare selinux context for the new inode, then send openc/mkdir request
to MDS, together with selinux xattr.

security_dentry_init_security() only supports single security module and
only selinux has dentry_init_security hook. So only selinux is supported
for now. We can add support for other security modules once kernel has a
generic version of dentry_init_security()
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

ac6713cc

ceph: hold i_ceph_lock when removing caps for freeing inode · d6e47819

由 Yan, Zheng 提交于 5月 23, 2019

ceph_d_revalidate(, LOOKUP_RCU) may call __ceph_caps_issued_mask()
on a freeing inode.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

d6e47819

06 6月, 2019 2 次提交

ceph: fix error handling in ceph_get_caps() · 7b2f936f

由 Yan, Zheng 提交于 5月 20, 2019

The function return 0 even when interrupted or try_get_cap_refs()
return error.

Fixes: 1199d7da ("ceph: simplify arguments and return semantics of try_get_cap_refs")
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

7b2f936f

ceph: avoid iput_final() while holding mutex or in dispatch thread · 3e1d0452

由 Yan, Zheng 提交于 5月 18, 2019

iput_final() may wait for reahahead pages. The wait can cause deadlock.
For example:

  Workqueue: ceph-msgr ceph_con_workfn [libceph]
    Call Trace:
     schedule+0x36/0x80
     io_schedule+0x16/0x40
     __lock_page+0x101/0x140
     truncate_inode_pages_range+0x556/0x9f0
     truncate_inode_pages_final+0x4d/0x60
     evict+0x182/0x1a0
     iput+0x1d2/0x220
     iterate_session_caps+0x82/0x230 [ceph]
     dispatch+0x678/0xa80 [ceph]
     ceph_con_workfn+0x95b/0x1560 [libceph]
     process_one_work+0x14d/0x410
     worker_thread+0x4b/0x460
     kthread+0x105/0x140
     ret_from_fork+0x22/0x40

  Workqueue: ceph-msgr ceph_con_workfn [libceph]
    Call Trace:
     __schedule+0x3d6/0x8b0
     schedule+0x36/0x80
     schedule_preempt_disabled+0xe/0x10
     mutex_lock+0x2f/0x40
     ceph_check_caps+0x505/0xa80 [ceph]
     ceph_put_wrbuffer_cap_refs+0x1e5/0x2c0 [ceph]
     writepages_finish+0x2d3/0x410 [ceph]
     __complete_request+0x26/0x60 [libceph]
     handle_reply+0x6c8/0xa10 [libceph]
     dispatch+0x29a/0xbb0 [libceph]
     ceph_con_workfn+0x95b/0x1560 [libceph]
     process_one_work+0x14d/0x410
     worker_thread+0x4b/0x460
     kthread+0x105/0x140
     ret_from_fork+0x22/0x40

In above example, truncate_inode_pages_range() waits for readahead pages
while holding s_mutex. ceph_check_caps() waits for s_mutex and blocks
OSD dispatch thread. Later OSD replies (for readahead) can't be handled.

ceph_check_caps() also may lock snap_rwsem for read. So similar deadlock
can happen if iput_final() is called while holding snap_rwsem.

In general, it's not good to call iput_final() inside MDS/OSD dispatch
threads or while holding any mutex.

The fix is introducing ceph_async_iput(), which calls iput_final() in
workqueue.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

3e1d0452

08 5月, 2019 4 次提交

ceph: print inode number in __caps_issued_mask debugging messages · 5ddc61fc

由 Jeff Layton 提交于 4月 23, 2019

To make it easier to correlate with MDS logs.
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

5ddc61fc

ceph: simplify arguments and return semantics of try_get_cap_refs · 1199d7da

由 Jeff Layton 提交于 4月 02, 2019

The return of this function is rather complex. It can return 0 or 1,
and in the case of a 1 return, the "err" pointer will be filled out.
This necessitates a lot of copying of values.

We can achieve the same effect by just returning 0, 1 or a negative
error code, and drop the "err" argument from this function.
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

1199d7da

ceph: fix comment over ceph_drop_caps_for_unlink · a452bc06

由 Jeff Layton 提交于 4月 02, 2019

It's not clear what AUTH_RDCACHE means in this context, and we're
clearly just dropping LINK caps here.
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

a452bc06

ceph: remove superfluous inode_lock in ceph_fsync · ffb61c55

由 Jeff Layton 提交于 4月 08, 2019

Originally, filemap_write_and_wait took the i_mutex internally, but
commit 02c24a82 pushed the mutex acquisition into the individual
fsync routines, leaving it up to the subsystem maintainers to remove
it if it wasn't needed.

For ceph, I see no reason to take the inode_lock here. All of the
operations inside that lock are protected by their own locking.
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

ffb61c55

06 3月, 2019 4 次提交

ceph: add mount option to limit caps count · fe33032d

由 Yan, Zheng 提交于 2月 01, 2019

If number of caps exceed the limit, ceph_trim_dentires() also trim
dentries with valid leases. Trimming dentry releases references to
associated inode, which may evict inode and release caps.

By default, there is no limit for caps count.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

fe33032d

ceph: touch existing cap when handling reply · 32f6511a

由 Yan, Zheng 提交于 1月 23, 2019

Move cap to tail of session->s_caps list. So ceph_trim_caps() will
trim older caps first.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

32f6511a

ceph: send cap releases more aggressively · e3ec8d68

由 Yan, Zheng 提交于 1月 14, 2019

When pending cap releases fill up one message, start a work to send
cap release message. (old way is sending cap releases every 5 seconds)
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

e3ec8d68

Y
ceph: split large reconnect into multiple messages · 81c5a148
由 Yan, Zheng 提交于 1月 01, 2019
```
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
```
81c5a148

21 1月, 2019 1 次提交

ceph: clear inode pointer when snap realm gets dropped by its inode · d95e674c

由 Yan, Zheng 提交于 1月 10, 2019

snap realm and corresponding inode have pointers to each other.
The two pointer should get clear at the same time. Otherwise,
snap realm's pointer may reference freed inode.

Cc: stable@vger.kernel.org # 4.17+
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NLuis Henriques <lhenriques@suse.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

d95e674c

26 12月, 2018 4 次提交

ceph: update wanted caps after resuming stale session · d2f8bb27

由 Yan, Zheng 提交于 12月 10, 2018

mds contains an optimization, it does not re-issue stale caps if
client does not want any cap.

A special case of the optimization is that client wants some caps,
but skipped updating 'wanted'. For this case, client needs to update
'wanted' when stale session get renewed.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

d2f8bb27

ceph: skip updating 'wanted' caps if caps are already issued · fdac94fa

由 Yan, Zheng 提交于 11月 22, 2018

When reading cached inode that already has Fscr caps, this can avoid
two cap messages (one updats 'wanted' caps, one clears 'wanted' caps).
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

fdac94fa

Y
ceph: don't request excl caps when mount is readonly · 8a2ac3a8
由 Yan, Zheng 提交于 12月 05, 2018
```
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
```
8a2ac3a8

ceph: don't update importing cap's mseq when handing cap export · 3c1392d4

由 Yan, Zheng 提交于 11月 29, 2018

Updating mseq makes client think importer mds has accepted all prior
cap messages and importer mds knows what caps client wants. Actually
some cap messages may have been dropped because of mseq mismatch.

If mseq is left untouched, importing cap's mds_wanted later will get
reset by cap import message.

Cc: stable@vger.kernel.org
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

3c1392d4

22 10月, 2018 3 次提交

ceph: add non-blocking parameter to ceph_try_get_caps() · 2ee9dd95

由 Luis Henriques 提交于 10月 15, 2018

ceph_try_get_caps currently calls try_get_cap_refs with the nonblock
parameter always set to 'true'.  This change adds a new parameter that
allows to set it's value.  This will be useful for a follow-up patch that
will need to get two sets of capabilities for two different inodes without
risking a deadlock.
Signed-off-by: NLuis Henriques <lhenriques@suse.com>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

2ee9dd95

ceph: set timeout conditionally in __cap_delay_requeue · 66802884

由 Xuehan Xu 提交于 10月 11, 2018

__cap_delay_requeue could be invoked through ceph_check_caps when there
exists caps that needs to be sent and are delayed by "i_hold_caps_min"
or "i_hold_caps_max". If __cap_delay_requeue sets timeout unconditionally,
there could be a chance that some "wanted" caps can not be release for a
long since their timeouts are reset every time they get delayed.

Fixes: http://tracker.ceph.com/issues/36369Signed-off-by: NXuehan Xu <xuxuehan@360.cn>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

66802884

ceph: reset cap hold timeout only for requeued inode · 3167893a

由 Chengguang Xu 提交于 7月 30, 2018

__cap_delay_requeue() only requeue inode which does not
have CEPH_I_FLUSH flag, so avoid reset cap hold timeout
for that inode.
Signed-off-by: NChengguang Xu <cgxu519@gmx.com>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

3167893a

13 8月, 2018 2 次提交

ceph: refactor error handling code in ceph_reserve_caps() · e5bc08d0

由 Chengguang Xu 提交于 7月 28, 2018

Call new helper __ceph_unreserve_caps() to reduce duplicated code.
Signed-off-by: NChengguang Xu <cgxu519@gmx.com>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

e5bc08d0

ceph: refactor ceph_unreserve_caps() · 7bf8f736

由 Chengguang Xu 提交于 7月 28, 2018

The code of ceph_unreserve_caps() and error handling in
ceph_reserve_caps() are duplicated, so introduce a helper
__ceph_unreserve_caps() to reduce duplicated code.
Signed-off-by: NChengguang Xu <cgxu519@gmx.com>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

7bf8f736

03 8月, 2018 1 次提交

ceph: use timespec64 for inode timestamp · 9bbeab41

由 Arnd Bergmann 提交于 7月 13, 2018

Since the vfs structures are all using timespec64, we can now
change the internal representation, using ceph_encode_timespec64 and
ceph_decode_timespec64.

In case of ceph_aux_inode however, we need to avoid doing a memcmp()
on uninitialized padding data, so the members of the i_mtime field get
copied individually into 64-bit integers.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

9bbeab41

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功