Commits · c843d13caefad9f2f182f38d6bfe492c9f00e086 · openeuler / Kernel

05 Jun, 2018 11 commits

libceph: make abort_on_full a per-osdc setting · c843d13c

Committed by Ilya Dryomov on May 30, 2018

The intent behind making it a per-request setting was that it would be
set for writes, but not for reads.  As it is, the flag is set for all
fs/ceph requests except for pool perm check stat request (technically
a read).

ceph_osdc_abort_on_full() skips reads since the previous commit and
I don't see a use case for marking individual requests.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>

ceph: flush pending works before shutdown super · a57d9064

Committed by Yan, Zheng on May 18, 2018

Pending work items hold inode references, which causes a "Busy inodes after
unmount" warning.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: abort osd requests on force umount · 12b69d5f

Committed by Yan, Zheng on May 11, 2018

This avoids a forced umount waiting on page writeback:

  io_schedule+0xd/0x30
  wait_on_page_bit_common+0xc6/0x130
  __filemap_fdatawait_range+0xbd/0x100
  filemap_fdatawait_keep_errors+0x15/0x40
  sync_inodes_sb+0x1cf/0x240
  sync_filesystem+0x52/0x90
  generic_shutdown_super+0x1d/0x110
  ceph_kill_sb+0x28/0x80 [ceph]
  deactivate_locked_super+0x35/0x60
  cleanup_mnt+0x36/0x70
  task_work_run+0x79/0xa0
  exit_to_usermode_loop+0x62/0x70
  do_syscall_64+0xdb/0xf0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  0xffffffffffffffff

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: fix st_nlink stat for directories · 8c6286f1

Committed by Luis Henriques on May 21, 2018

Currently, calling stat on a cephfs directory returns 1 for st_nlink.
This behaviour has recently changed in the fuse client, as some
applications seem to expect this value to be either 0 (if it's
unlinked) or 2 + number of subdirectories. This behaviour was changed
in the fuse client with commit 67c7e4619188 ("client: use common
interp of st_nlink for dirs").

This patch modifies the kernel client to have a similar behaviour.
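
The convention described above can be sketched in a few lines of Python. This is an illustration only, not the kernel code; the function and parameter names are hypothetical.

```python
def dir_nlink(num_subdirs: int, unlinked: bool = False) -> int:
    """Link count for a directory under the common st_nlink convention.

    Returns 0 if the directory has been unlinked; otherwise 2 (its own
    "." entry plus its entry in the parent) plus one ".." link for each
    subdirectory.
    """
    if unlinked:
        return 0
    return 2 + num_subdirs
```

For example, a directory with three subdirectories would report `dir_nlink(3)`, i.e. 5, instead of the previous constant 1.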

Link: https://tracker.ceph.com/issues/23873
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: support file lock on directory · 597817dd

Committed by Yan, Zheng on May 15, 2018

Link: http://tracker.ceph.com/issues/24028
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: show wsize only if non-default · 6dd4940b

Committed by Ilya Dryomov on May 03, 2018

This is how it was before commit 95cca2b4 ("ceph: limit osd write
size") went in.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: handle the new nfiles/nsubdirs fields in cap message · 4985d6f9

Committed by Yan, Zheng on Apr 27, 2018

Without these new fields, a stale st_size is returned in the following
case:

1. MDS modifies a directory
2. MDS issues CEPH_CAP_ANY_SHARED to client
3. The client satisfies stat(2) from its cached metadata and sets st_size
   to "i_files + i_subdirs".

Link: http://tracker.ceph.com/issues/23855
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: define argument structure for handle_cap_grant · a1c6b835

Committed by Yan, Zheng on Apr 27, 2018

The data structure includes the versioned fields of the cap message.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: update i_files/i_subdirs only when Fs cap is issued · 2af54a72

Committed by Yan, Zheng on Apr 27, 2018

In the MDS, the file/subdir counts of a directory inode are protected by
filelock. In a request reply without the Fs cap, nfiles/nsubdirs can be
stale.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: always get rstat from auth mds · 49a9f4f6

Committed by Yan, Zheng on Apr 25, 2018

rstat is not tracked by capability. The client can't know whether rstat
from a non-auth MDS is up to date or not.

Link: http://tracker.ceph.com/issues/23538
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: use bit flags to define vxattr attributes · 4e9906e7

Committed by Yan, Zheng on Apr 25, 2018

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

10 May, 2018 2 commits

ceph: fix iov_iter issues in ceph_direct_read_write() · fc218544

Committed by Ilya Dryomov on May 04, 2018

dio_get_pagev_size() and dio_get_pages_alloc() introduced in commit
b5b98989 ("ceph: combine as many iovec as possile into one OSD
request") assume that the passed iov_iter is ITER_IOVEC.  This isn't
the case with splice where it ends up poking into the guts of ITER_BVEC
or ITER_PIPE iterators, causing lockups and crashes easily reproduced
with generic/095.

Rather than trying to figure out gap alignment and stuff pages into
a page vector, add a helper for going from iov_iter to a bio_vec array
and make use of the new CEPH_OSD_DATA_TYPE_BVECS code.

Fixes: b5b98989 ("ceph: combine as many iovec as possile into one OSD request")
Link: http://tracker.ceph.com/issues/18130
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Tested-by: Luis Henriques <lhenriques@suse.com>

ceph: fix rsize/wsize capping in ceph_direct_read_write() · 3a15b38f

Committed by Ilya Dryomov on May 03, 2018

rsize/wsize cap should be applied before ceph_osdc_new_request() is
called.  Otherwise, if the size is limited by the cap instead of the
stripe unit, ceph_osdc_new_request() would set up an extent op that is
bigger than what dio_get_pages_alloc() would pin and add to the page
vector, triggering asserts in the messenger.
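
The ordering problem can be shown with a simplified Python model. It is a sketch under stated assumptions, not the kernel code: `new_request_extent()` merely stands in for the boundary trimming done during request setup, and every name and parameter here is hypothetical.

```python
def new_request_extent(length: int, to_boundary: int) -> int:
    """Stand-in for request setup: trims the extent at the stripe-unit boundary."""
    return min(length, to_boundary)


def plan_io_capped_late(remaining: int, size_cap: int, to_boundary: int):
    """Problematic order: the extent is sized first, the cap is applied later."""
    extent_len = new_request_extent(remaining, to_boundary)
    pinned_len = min(extent_len, size_cap)  # data actually pinned
    return extent_len, pinned_len


def plan_io(remaining: int, size_cap: int, to_boundary: int):
    """Fixed order: apply the rsize/wsize cap before sizing the extent."""
    length = min(remaining, size_cap)
    extent_len = new_request_extent(length, to_boundary)
    pinned_len = extent_len  # extent and pinned data now agree
    return extent_len, pinned_len
```

With an 8 MiB transfer, a 1 MiB cap and 4 MiB left before the boundary, `plan_io_capped_late()` yields a 4 MiB extent backed by only 1 MiB of pinned data, while `plan_io()` yields 1 MiB for both.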

Cc: stable@vger.kernel.org
Fixes: 95cca2b4 ("ceph: limit osd write size")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>

23 Apr, 2018 1 commit

ceph: check if mds create snaprealm when setting quota · f1919826

Committed by Yan, Zheng on Apr 08, 2018

If the MDS does not, return -EOPNOTSUPP.

Link: http://tracker.ceph.com/issues/23491
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

16 Apr, 2018 1 commit

ceph: always update atime/mtime/ctime for new inode · ffdeec7a

Committed by Yan, Zheng on Mar 26, 2018

For a new inode, atime/mtime/ctime are uninitialized.  Don't compare
against them.

Cc: stable@kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

06 Apr, 2018 1 commit

fscache: Pass object size in rather than calling back for it · ee1235a9

Committed by David Howells on Apr 04, 2018

Pass the object size in to fscache_acquire_cookie() and
fscache_write_page() rather than the netfs providing a callback by which it
can be received.  This makes it easier to update the size of the object
when a new page is written that extends the object.

The current object size is also passed by fscache to the check_aux
function, obviating the need to store it in the aux data.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
Tested-by: Steve Dickson <steved@redhat.com>

04 Apr, 2018 1 commit

fscache: Attach the index key and aux data to the cookie · 402cb8dd

Committed by David Howells on Apr 04, 2018

Attach copies of the index key and auxiliary data to the fscache cookie so
that:

 (1) The callbacks to the netfs for this stuff can be eliminated.  This
     can simplify things in the cache as the information is still
     available, even after the cache has relinquished the cookie.

 (2) Simplifies the locking requirements of accessing the information as we
     don't have to worry about the netfs object going away on us.

 (3) The cache can do lazy updating of the coherency information on disk.
     As long as the cache is flushed before reboot/poweroff, there's no
     need to update the coherency info on disk every time it changes.

 (4) Cookies can be hashed or put in a tree as the index key is easily
     available.  This allows:

     (a) Checks for duplicate cookies can be made at the top fscache layer
     	 rather than down in the bowels of the cache backend.

     (b) Caching can be added to a netfs object that has a cookie if the
     	 cache is brought online after the netfs object is allocated.

A certain amount of space is made in the cookie for inline copies of the
data, but if it won't fit there, extra memory will be allocated for it.

The downside of this is that live cache operation requires more memory.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
Tested-by: Steve Dickson <steved@redhat.com>

02 Apr, 2018 23 commits

ceph: quota: report root dir quota usage in statfs · 9122eed5

Committed by Luis Henriques on Jan 31, 2018

This commit changes statfs default behaviour when reporting usage
statistics.  Instead of using the overall filesystem usage, statfs now
reports the quota for the filesystem root, if ceph.quota.max_bytes has
been set for this inode.  If quota hasn't been set, it falls back to the
old statfs behaviour.

A new mount option is also added ('noquotadf') to disable this behaviour.
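
A minimal Python sketch of the selection logic described above. It is not the kernel implementation: the names are hypothetical, the values are plain byte counts, and clamping free space at zero is an assumption.

```python
def statfs_bytes(fs_total, fs_free, root_max_bytes, root_used_bytes,
                 noquotadf=False):
    """Return (total, free) bytes as statfs would report them.

    root_max_bytes is the ceph.quota.max_bytes value on the filesystem
    root, with 0 meaning that no quota is set.
    """
    if noquotadf or root_max_bytes == 0:
        # Old behaviour: overall filesystem usage.
        return fs_total, fs_free
    # New default: report the root quota instead.
    free = max(root_max_bytes - root_used_bytes, 0)
    return root_max_bytes, free
```

With a 100 GB root quota of which 30 GB is used, this sketch reports 100 GB total and 70 GB free regardless of the cluster size, unless `noquotadf` is passed.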

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: quota: add counter for snaprealms with quota · d557c48d

Committed by Luis Henriques on Jan 12, 2018

Keeping a counter of the number of snaprealms that have a quota set
allows optimizing the functions that need to walk through the realm
hierarchy looking for quotas.  Thus, if this counter is zero it's safe to
assume that there are no realms with quota.
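
The fast path this enables can be sketched as follows. The `Realm` class and `find_quota_realm()` are hypothetical stand-ins written for illustration, not the kernel's data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Realm:
    parent: Optional["Realm"] = None
    has_quota: bool = False


def find_quota_realm(realm: Optional[Realm], quota_realm_count: int) -> Optional[Realm]:
    """Return the nearest realm with a quota at or above `realm`, if any."""
    # Fast path: no realm anywhere has a quota, so skip the walk entirely.
    if quota_realm_count == 0:
        return None
    while realm is not None:
        if realm.has_quota:
            return realm
        realm = realm.parent
    return None
```

The counter only pays off if it is kept in sync whenever a quota is set or cleared on a realm; that bookkeeping is omitted here.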

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: quota: cache inode pointer in ceph_snap_realm · e3161f17

Committed by Luis Henriques on Jan 12, 2018

Keep a pointer to the inode in struct ceph_snap_realm.  This allows
optimizing functions that walk the realm hierarchy (e.g. in quotas).

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: fix root quota realm check · 0eb6bbe4

Committed by Yan, Zheng on Jan 12, 2018

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: don't check quota for snap inode · 25963669

Committed by Yan, Zheng on Jan 12, 2018

A snap inode's i_snap_realm does not point to a ceph_snap_realm.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: quota: update MDS when max_bytes is approaching · 1ab302a0

Committed by Luis Henriques on Jan 05, 2018

When we're reaching the ceph.quota.max_bytes limit, i.e., when writing
more than 1/16th of the space left in a quota realm, update the MDS with
the new file size.

This mirrors the fuse-client approach with commit 122c50315ed1 ("client:
Inform mds file size when approaching quota limit"), in the ceph git tree.
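
The 1/16th rule above can be written as a small predicate. This is a simplified Python illustration rather than the kernel code; the names are hypothetical, and treating an already-full realm as "approaching" is an assumption.

```python
def max_bytes_approaching(max_bytes: int, used: int, delta: int) -> bool:
    """True if writing `delta` more bytes should trigger an MDS update.

    max_bytes -- the realm's ceph.quota.max_bytes limit
    used      -- bytes currently accounted to the realm
    delta     -- bytes about to be written
    """
    if used >= max_bytes:
        return True
    # More than 1/16th of the remaining space is about to be consumed.
    return delta > ((max_bytes - used) >> 4)
```

With 1600 bytes left in a realm, the threshold is 100 bytes: a 101-byte write triggers the update, a 100-byte write does not.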

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: quota: support for ceph.quota.max_bytes · 2b83845f

Committed by Luis Henriques on Jan 05, 2018

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: quota: don't allow cross-quota renames · cafe21a4

Committed by Luis Henriques on Jan 05, 2018

This patch changes ceph_rename so that -EXDEV is returned if an attempt is
made to mv a file between two different dir trees with different quotas
set up.
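
As a rough model of that check, the Python sketch below compares the nearest quota-bearing ancestor of the source and destination directories. The `Dir` class and helper names are hypothetical, and treating "no quota" versus "some quota" as a mismatch is an assumption of this sketch.

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare directories by identity
class Dir:
    name: str
    parent: Optional["Dir"] = None
    has_quota: bool = False


def quota_root(d: Optional[Dir]) -> Optional[Dir]:
    """Nearest directory at or above `d` that has a quota set, if any."""
    while d is not None:
        if d.has_quota:
            return d
        d = d.parent
    return None


def check_rename(old_dir: Dir, new_dir: Dir) -> int:
    """Return 0 if the rename stays within one quota tree, else -EXDEV."""
    if quota_root(old_dir) is not quota_root(new_dir):
        return -errno.EXDEV
    return 0
```

Returning -EXDEV lets tools such as mv fall back to copy-and-delete, which is the standard reaction to a cross-device rename.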

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: quota: support for ceph.quota.max_files · b7a29217

Committed by Luis Henriques on Jan 05, 2018

This patch adds support for the max_files quota.  It hooks into all the
ceph functions that add new filesystem objects that need to be checked
against the quota limits.  When these limits are hit, -EDQUOT is returned.

Note that we're not checking quotas on ceph_link().  ceph_link doesn't
really create a new inode,  and since the MDS doesn't update the directory
statistics when a new (hard) link is created (only with symlinks), they
are not accounted as a new file.
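
A simplified Python model of the max_files check follows. It is only a sketch: `QuotaDir` and `check_create()` are hypothetical names, and counting files plus subdirectories against the limit with a "greater than or equal" comparison is an assumption here.

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaDir:
    parent: Optional["QuotaDir"] = None
    max_files: int = 0  # 0 means no limit is set
    entries: int = 0    # files + subdirectories beneath this directory


def check_create(d: Optional[QuotaDir]) -> int:
    """Return 0 if a new object may be created under `d`, else -EDQUOT."""
    while d is not None:
        if d.max_files and d.entries >= d.max_files:
            return -errno.EDQUOT
        d = d.parent
    return 0
```

Every ancestor with a limit is consulted, so a quota set higher up in the hierarchy also constrains creations deep inside it.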

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: quota: add initial infrastructure to support cephfs quotas · fb18a575

Committed by Luis Henriques on Jan 05, 2018

This patch adds the infrastructure required to support cephfs quotas as it
is currently implemented in the ceph fuse client.  Cephfs quotas can be
set on any directory, and can restrict the number of bytes or the number
of files stored beneath that point in the directory hierarchy.

Quotas are set using the extended attributes 'ceph.quota.max_files' and
'ceph.quota.max_bytes', and can be removed by setting these attributes to
'0'.
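
From user space, setting these attributes might look like the Python sketch below. The `set_quota()` helper and its injectable `setxattr` parameter are hypothetical conveniences; only `os.setxattr` (Linux only) and the two attribute names come from the standard library and the text above.

```python
import os


def set_quota(path, max_bytes=None, max_files=None, setxattr=None):
    """Set CephFS quota attributes on a directory.

    Passing 0 removes the corresponding quota. `setxattr` can be
    replaced for testing; by default os.setxattr is used.
    """
    if setxattr is None:
        setxattr = os.setxattr
    if max_bytes is not None:
        setxattr(path, "ceph.quota.max_bytes", str(max_bytes).encode())
    if max_files is not None:
        setxattr(path, "ceph.quota.max_files", str(max_files).encode())
```

For instance, `set_quota("/mnt/cephfs/projects", max_bytes=10 * 2**30)` would limit that (hypothetical) directory to 10 GiB, and `set_quota("/mnt/cephfs/projects", max_bytes=0)` would remove the limit again.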

Link: http://tracker.ceph.com/issues/22372
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: rename function drop_leases() to a more descriptive name · 7aac453a

Committed by Yan, Zheng on Mar 13, 2018

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: fix invalid point dereference for error case in mdsc destroy · 50c55aec

Committed by Chengguang Xu on Mar 14, 2018

1. Set fsc->mdsc only after all necessary memory has been successfully
allocated in mdsc init.
2. If fsc->mdsc is NULL, skip the destroy operation in mdsc destroy.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: return proper bool type to caller instead of pointer · 98cfda81

Committed by Chengguang Xu on Mar 13, 2018

Return only true/false from functions whose return type is bool.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: optimize memory usage · bb48bd4d

Committed by Chengguang Xu on Mar 13, 2018

In the current code, regular files and directories use the same struct
ceph_file_info to store fs-specific data, so the struct has to
include some fields that are only used for directories
(e.g., readdir-related info). With many regular files open,
this wastes memory.

This patch introduces a dedicated ceph_dir_file_info cache for
readdir-related things, so that regular files no longer carry those
unused fields.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: optimize mds session register · 47474d0b

Committed by Chengguang Xu on Mar 13, 2018

Do the memory allocation first, to avoid unnecessary
initialization of a newly allocated session in the error case.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph, ceph: add __init attribution to init funcitons · 57a35dfb

Committed by Chengguang Xu on Mar 10, 2018

Add the __init attribute to functions that are called only once
during initialization/registration, and delete unnecessary
symbol exports.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: filter out used flags when printing unused open flags · 51b10f3f

Committed by Chengguang Xu on Mar 09, 2018

Filter out used access mode flags when printing unused open flags.
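
The masking involved can be illustrated in Python with the `os` module's flag constants. The helper name and the `handled` parameter are hypothetical; the sketch only shows why the access mode bits need to be cleared before reporting leftovers.

```python
import os


def unused_open_flags(flags: int, handled: int) -> int:
    """Flags left over once handled bits and the access mode are masked out."""
    return flags & ~handled & ~os.O_ACCMODE
```

Without the `~os.O_ACCMODE` mask, a file opened with `os.O_RDWR` would always show a non-zero "unused" value even though the access mode had been consumed.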

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: don't wait on writeback when there is no more dirty pages · 1582af2e

Committed by Yan, Zheng on Mar 06, 2018

In sync mode, writepages() needs to write all dirty pages. But
it can only write dirty pages associated with the oldest snapc.
To write dirty pages associated with the next snapc, it needs to wait
until the current writes complete.

If there are no more dirty pages, writepages() should not wait on
writeback. Otherwise, dirty page writeback becomes very slow.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: invalidate pages that beyond EOF in ceph_writepages_start() · af9cc401

Committed by Yan, Zheng on Mar 04, 2018

Dirty pages can be associated with different capsnaps, and different
capsnaps may have different EOF values. So invalidating dirty pages
according to the largest EOF value is wrong: dirty pages beyond EOF, but
associated with other capsnaps, do not get invalidated.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: mark the cap cache as unreclaimable · bc4b5ad3

Committed by Chengguang Xu on Feb 27, 2018

Releasing caps is affected by many factors (e.g., avail_count/reserve_count/min_count),
and min_count can be set to a high value via a client mount option. Hence it's better
to mark the cap cache as unreclaimable, to avoid non-trivial discrepancies between memory
shown as reclaimable and what is actually reclaimed.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: change variable name to follow common rule · 73737682

Committed by Chengguang Xu on Feb 28, 2018

Variable name ci is mostly used for ceph_inode_info.
Variable name fi is mostly used for ceph_file_info.
Variable name cf is mostly used for ceph_cap_flush.

Change variable names to follow the above common rules
to avoid confusion.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: optimizing cap reservation · 79cd674a

Committed by Chengguang Xu on Feb 24, 2018

When caps_avail_count is low, most newly
trimmed caps will probably go into ->caps_list and
caps_avail_count will increase. Hence, after trimming,
recheck caps_avail_count to effectively reuse
newly trimmed caps. Also, when releasing unnecessary
caps, follow the same rule as ceph_put_cap.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: release unreserved caps if having enough available caps · b517c1d8

Committed by Chengguang Xu on Feb 25, 2018

When unreserving caps, check whether there are too many available caps
in the ->caps_list; if so, release the unreserved caps.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

openeuler / Kernel: synced successfully more than 1 year ago
