提交 · cdb897e3279ad1677138d6bdf1cfaf1393718a08 · openeuler / raspberrypi-kernel

07 9月, 2017 35 次提交

fscache: remove unused ->now_uncached callback · 26b433d0

由 Jan Kara 提交于 9月 06, 2017

Patch series "Ranged pagevec lookup", v2.

In this series I make pagevec_lookup() update the index (to be
consistent with pagevec_lookup_tag() and also as a preparation for
ranged lookups), provide ranged variant of pagevec_lookup() and use it
in places where it makes sense.  This not only removes some common code
but is also a measurable performance win for some use cases (see patch
4/10) where radix tree is sparse and searching & grabing of a page after
the end of the range has measurable overhead.

This patch (of 10):

The callback doesn't ever get called.  Remove it.

Link: http://lkml.kernel.org/r/20170726114704.7626-2-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

26b433d0

ceph: stop on-going cached readdir if mds revokes FILE_SHARED cap · 15b51bd6

由 Yan, Zheng 提交于 9月 06, 2017

If directory's FILE_SHARED cap get revoked, dentry in the directory
can get spliced into other directory (Eg, other client move the
dentry into directory B, then we do readdir on directory B). So we
should stop on-going cached readdir. this can be achieved by marking
dir not complete, because __dcache_readdir() checks dir completeness
before emitting each dentry.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

15b51bd6

ceph: wait on writeback after writing snapshot data · f275635e

由 Yan, Zheng 提交于 9月 01, 2017

In sync mode, writepages() needs to write all dirty pages. But
it can only write dirty pages associated with the oldest snapc.
To write dirty pages associated with next snapc, it needs to wait
until current writes complete.

Without this wait, writepages() keeps looking up dirty pages, but
the found dirty pages are not writeable. It wastes CPU time.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

f275635e

ceph: fix capsnap dirty pages accounting · 7e1ee54a

由 Yan, Zheng 提交于 9月 03, 2017

writepages_finish() calls ceph_put_wrbuffer_cap_refs() once for
all pages, parameter snapc is set to req->r_snapc. So writepages()
shouldn't write dirty pages associated with different snapc in
one OSD request.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

7e1ee54a

ceph: ignore wbc->range_{start,end} when write back snapshot data · 2a2d927e

由 Yan, Zheng 提交于 9月 01, 2017

writepages() needs to write dirty pages to OSD in strict order of
snapshot context. It must first write dirty pages associated with
the oldest snapshot context. In the write range case, dirty pages
in the specified range can be associated with newer snapc. They
are not writeable until we write all dirty pages associated with
the oldest snapc.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

2a2d927e

ceph: fix "range cyclic" mode writepages · 590e9d98

由 Yan, Zheng 提交于 9月 03, 2017

In range cyclic mode, writepages() should first write dirty pages
in range [writeback_index, (pgoff_t)-1], then write pages in range
[0, writeback_index -1]. Besides, if writepages() encounters a page
that beyond EOF, it should restart from the beginning.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

590e9d98

ceph: cleanup local variables in ceph_writepages_start() · 0e5ecac7

由 Yan, Zheng 提交于 8月 31, 2017

Remove two variables and define variables of same type together.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

0e5ecac7

ceph: optimize pagevec iterating in ceph_writepages_start() · 0713e5f2

由 Yan, Zheng 提交于 8月 31, 2017

ceph_writepages_start() supports writing non-continuous pages.
If it encounters a non-dirty or non-writeable page in pagevec,
it can continue to check the rest pages in pagevec.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

0713e5f2

ceph: make writepage_nounlock() invalidate page that beyonds EOF · 05455e11

由 Yan, Zheng 提交于 9月 02, 2017

Otherwise, the page left in state that page is associated with a
snapc, but (PageDirty(page) || PageWriteback(page)) is false.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

05455e11

ceph: properly get capsnap's size in get_oldest_context() · 1f934b00

由 Yan, Zheng 提交于 8月 30, 2017

capsnap's size is set by __ceph_finish_cap_snap(). If capsnap is under
writing, its size is zero. In this case, get_oldest_context() should
read i_size. Besides, ceph_writepages_start() should re-check capsnap's
size after dirty pages get locked.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

1f934b00

ceph: remove stale check in ceph_invalidatepage() · b072d774

由 Yan, Zheng 提交于 8月 30, 2017

Both set_page_dirty and truncate_complete_page should be called
for locked page, they can't race with each other.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

b072d774

ceph: queue cap snap only when snap realm's context changes · 3ae0bebc

由 Yan, Zheng 提交于 8月 28, 2017

If we create capsnap when snap realm's context does not change, the
new capsnap's snapc is equal to ci->i_head_snapc. Page writeback code
can't differentiates dirty pages associated with the new capsnap from
dirty pages associated with i_head_snapc.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

3ae0bebc

ceph: handle race between vmtruncate and queuing cap snap · c8fd0d37

由 Yan, Zheng 提交于 8月 28, 2017

It's possible that we create a cap snap while there is pending
vmtruncate (truncate hasn't been processed by worker thread).
We should truncate dirty pages beyond capsnap->size in that case.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

c8fd0d37

ceph: fix message order check in handle_cap_export() · fa0aa3b8

由 Yan, Zheng 提交于 8月 28, 2017

If caps for importer mds exists, but cap id mismatch, client should
have received corresponding import message. Because cap ID does not
change as long as client holds the caps.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

fa0aa3b8

Y
ceph: fix NULL pointer dereference in ceph_flush_snaps() · c858a070
由 Yan, Zheng 提交于 8月 28, 2017
```
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
```
c858a070

ceph: adjust 36 checks for NULL pointers · d37b1d99

由 Markus Elfring 提交于 8月 20, 2017

The script “checkpatch.pl” pointed information out like the following.

Comparison to NULL could be written ...

Thus fix the affected source code places.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Reviewed-by: NYan, Zheng <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

d37b1d99

ceph: delete an unnecessary return statement in update_dentry_lease() · b529d1b3

由 Markus Elfring 提交于 8月 20, 2017

The script "checkpatch.pl" pointed information out like the following.

WARNING: void function return statements are not generally useful

Thus remove such a statement in the affected function.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Reviewed-by: NYan, Zheng <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

b529d1b3

ceph: ENOMEM pr_err in __get_or_create_frag() is redundant · 51308806

由 Markus Elfring 提交于 8月 20, 2017

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Reviewed-by: NYan, Zheng <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

51308806

ceph: check negative offsets in ceph_llseek() · 397f2389

由 Luis Henriques 提交于 7月 28, 2017

When a user requests SEEK_HOLE or SEEK_DATA with a negative offset
ceph_llseek should return -ENXIO.  Currently -EINVAL is being returned for
SEEK_DATA and 0 for SEEK_HOLE.
Signed-off-by: NLuis Henriques <lhenriques@suse.com>
Reviewed-by: NIlya Dryomov <idryomov@gmail.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

397f2389

ceph: more accurate statfs · 06d74376

由 Douglas Fuller 提交于 8月 16, 2017

Improve accuracy of statfs reporting for Ceph filesystems comprising
exactly one data pool. In this case, the Ceph monitor can now report
the space usage for the single data pool instead of the global data
for the entire Ceph cluster. Include support for this message in
mon_client and leverage it in ceph/super.
Signed-off-by: NDouglas Fuller <dfuller@redhat.com>
Reviewed-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NIlya Dryomov <idryomov@gmail.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

06d74376

Y
ceph: properly set snap follows for cap reconnect · 92776fd2
由 Yan, Zheng 提交于 8月 16, 2017
```
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
```
92776fd2

ceph: don't use CEPH_OSD_FLAG_ORDERSNAP · b178cf43

由 Yan, Zheng 提交于 8月 16, 2017

Inode can be moved between snap realms. It's possible inode is moved
into a snap realm whose seq number is smaller than old snap realm's.
So there is no guarantee that seq number inode's snap context always
increases.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

b178cf43

ceph: include snapc in debug message of write · 1c0a9c2d

由 Yan, Zheng 提交于 8月 16, 2017

Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

1c0a9c2d

ceph: make sure flushsnap messages are sent in proper order · 24d063ac

由 Yan, Zheng 提交于 8月 15, 2017

Before sending new flushsnap message, check if there are old
flushsnap messages that need to be re-sent. If there are, re-send
old messages first. This guarantees ordering of flushsnap messages.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

24d063ac

ceph: fix -EOLDSNAPC handling · a5cd74ad

由 Yan, Zheng 提交于 8月 14, 2017

Need to drop cap reference before retry. Besides, it's better to
redo file write checks for each retry because we re-lock inode.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

a5cd74ad

ceph: send LSSNAP request to auth mds of directory inode · 5d37ca14

由 Yan, Zheng 提交于 7月 26, 2017

Snapdir inode has no capability. __choose_mds() should choose mds
base on capabilities of snapdir's parent inode.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

5d37ca14

Y
ceph: don't fill readdir cache for LSSNAP reply · 8d45b911
由 Yan, Zheng 提交于 7月 26, 2017
```
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
```
8d45b911

ceph: cleanup ceph_readdir_prepopulate() · 9a86962b

由 Yan, Zheng 提交于 7月 26, 2017

In LSSNAP case, req->r_dentry is already set to snapdir dentry.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

9a86962b

ceph: use errseq_t for writeback error reporting · b74fceae

由 Jeff Layton 提交于 7月 25, 2017

Ensure that when writeback errors are marked that we report those to all
file descriptions that were open at the time of the error.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

b74fceae

ceph: new cap message flags indicate if there is pending capsnap · 95569713

由 Yan, Zheng 提交于 7月 24, 2017

These flags tell mds if there is pending capsnap explicitly.
Without this explicit notification, mds can only conclude if
client has pending capsnap. The method mds use is inefficient
and error-prone.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

95569713

ceph: nuke startsync op · 3fb99d48

由 Yanhu Cao 提交于 7月 21, 2017

startsync is a no-op, has been for years.  Remove it.

Link: http://tracker.ceph.com/issues/20604Signed-off-by: NYanhu Cao <gmayyyha@gmail.com>
Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

3fb99d48

Y
ceph: validate correctness of some mount options · 4214fb15
由 Yan, Zheng 提交于 7月 11, 2017
```
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
```
4214fb15

ceph: limit osd write size · 95cca2b4

由 Yan, Zheng 提交于 7月 11, 2017

OSD has a configurable limitation of max write size. OSD return
error if write request size is larger than the limitation. For now,
set max write size to CEPH_MSG_MAX_DATA_LEN. It should be small
enough.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

95cca2b4

ceph: limit osd read size to CEPH_MSG_MAX_DATA_LEN · aa187926

由 Yan, Zheng 提交于 7月 11, 2017

libceph returns -EIO when read size > CEPH_MSG_MAX_DATA_LEN.

Link: http://tracker.ceph.com/issues/20528Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

aa187926

Y
ceph: remove unused cap_release_safety mount option · 2ae409dc
由 Yan, Zheng 提交于 7月 11, 2017
```
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
```
2ae409dc

01 9月, 2017 1 次提交

ceph: fix readpage from fscache · dd2bc473

由 Yan, Zheng 提交于 8月 04, 2017

ceph_readpage() unlocks page prematurely prematurely in the case
that page is reading from fscache. Caller of readpage expects that
page is uptodate when it get unlocked. So page shoule get locked
by completion callback of fscache_read_or_alloc_pages()

Cc: stable@vger.kernel.org # 4.1+, needs backporting for < 4.7
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

dd2bc473

17 7月, 2017 1 次提交

ceph: fix race in concurrent readdir · 84583cfb

由 Yan, Zheng 提交于 7月 06, 2017

For a large directory, program needs to issue multiple readdir
syscalls to get all dentries. When there are multiple programs
read the directory concurrently. Following sequence of events
can happen.

 - program calls readdir with pos = 2. ceph sends readdir request
   to mds. The reply contains N1 entries. ceph adds these N1 entries
   to readdir cache.
 - program calls readdir with pos = N1+2. The readdir is satisfied
   by the readdir cache, N2 entries are returned. (Other program
   calls readdir in the middle, which fills the cache)
 - program calls readdir with pos = N1+N2+2. ceph sends readdir
   request to mds. The reply contains N3 entries and it reaches
   directory end. ceph adds these N3 entries to the readdir cache
   and marks directory complete.

The second readdir call does not update fi->readdir_cache_idx.
ceph add the last N3 entries to wrong places.

Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

84583cfb

16 7月, 2017 1 次提交

fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks · 9d5b86ac

由 Benjamin Coddington 提交于 7月 16, 2017

Since commit c69899a1 "NFSv4: Update of VFS byte range lock must be
atomic with the stateid update", NFSv4 has been inserting locks in rpciod
worker context. The result is that the file_lock's fl_nspid is the
kworker's pid instead of the original userspace pid.

The fl_nspid is only used to represent the namespaced virtual pid number
when displaying locks or returning from F_GETLK. There's no reason to set
it for every inserted lock, since we can usually just look it up from
fl_pid. So, instead of looking up and holding struct pid for every lock,
let's just look up the virtual pid number from fl_pid when it is needed.
That means we can remove fl_nspid entirely.

The translaton and presentation of fl_pid should handle the following four
cases:

1 - F_GETLK on a remote file with a remote lock:
In this case, the filesystem should determine the l_pid to return here.
Filesystems should indicate that the fl_pid represents a non-local pid
value that should not be translated by returning an fl_pid <= 0.

2 - F_GETLK on a local file with a remote lock:
This should be the l_pid of the lock manager process, and translated.

3 - F_GETLK on a remote file with a local lock, and
4 - F_GETLK on a local file with a local lock:
These should be the translated l_pid of the local locking process.

Fuse was already doing the correct thing by translating the pid into the
caller's namespace. With this change we must update fuse to translate
to init's pid namespace, so that the locks API can then translate from
init's pid namespace into the pid namespace of the caller.

With this change, the locks API will expect that if a filesystem returns
a remote pid as opposed to a local pid for F_GETLK, that remote pid will
be <= 0. This signifies that the pid is remote, and the locks API will
forego translating that pid into the pid namespace of the local calling
process.

Finally, we convert remote filesystems to present remote pids using
negative numbers. Have lustre, 9p, ceph, cifs, and dlm negate the remote
pid returned for F_GETLK lock requests.

Since local pids will never be larger than PID_MAX_LIMIT (which is
currently defined as <= 4 million), but pid_t is an unsigned int, we
should have plenty of room to represent remote pids with negative
numbers if we assume that remote pid numbers are similarly limited.

If this is not the case, then we run the risk of having a remote pid
returned for which there is also a corresponding local pid. This is a
problem we have now, but this patch should reduce the chances of that
occurring, while also returning those remote pid numbers, for whatever
that may be worth.
Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NJeff Layton <jlayton@redhat.com>

9d5b86ac

07 7月, 2017 2 次提交

ceph: update ceph_dentry_info::lease_session when necessary · 481f001f

由 Yan, Zheng 提交于 7月 03, 2017

Current code does not update ceph_dentry_info::lease_session once
it is set. If auth mds of corresponding dentry changes, dentry lease
keeps in an invalid state.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

481f001f

ceph: new mount option that specifies fscache uniquifier · 1d8f8360

由 Yan, Zheng 提交于 6月 27, 2017

Current ceph uses FSID as primary index key of fscache data. This
allows ceph to retain cached data across remount. But this causes
problem (kernel opps, fscache does not support sharing data) when
a filesystem get mounted several times (with fscache enabled, with
different mount options).

The fix is adding a new mount option, which specifies uniquifier
for fscache.
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Acked-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

1d8f8360