Commits · cdb897e3279ad1677138d6bdf1cfaf1393718a08 · openeuler / raspberrypi-kernel

07 Sep, 2017 14 commits

ceph: wait on writeback after writing snapshot data · f275635e

Committed by Yan, Zheng on Sep 01, 2017

In sync mode, writepages() needs to write all dirty pages. But
it can only write dirty pages associated with the oldest snapc.
To write dirty pages associated with the next snapc, it needs to wait
until the current writes complete.

Without this wait, writepages() keeps looking up dirty pages, but
the dirty pages it finds are not writeable, which wastes CPU time.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: fix capsnap dirty pages accounting · 7e1ee54a

Committed by Yan, Zheng on Sep 03, 2017

writepages_finish() calls ceph_put_wrbuffer_cap_refs() once for
all pages, with the snapc parameter set to req->r_snapc. So writepages()
shouldn't write dirty pages associated with different snapcs in
one OSD request.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: ignore wbc->range_{start,end} when write back snapshot data · 2a2d927e

Committed by Yan, Zheng on Sep 01, 2017

writepages() needs to write dirty pages to the OSD in strict order of
snapshot context. It must first write dirty pages associated with
the oldest snapshot context. In the write range case, dirty pages
in the specified range can be associated with a newer snapc. They
are not writeable until we write all dirty pages associated with
the oldest snapc.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
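
The ordering rule in this message can be illustrated with a small userspace sketch. It is not the kernel code: `struct toy_page`, `oldest_dirty_seq()` and `collect_writeable()` are hypothetical names, and the sketch assumes that a smaller sequence number means an older snapshot context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy page: a dirty flag plus the sequence number of its snap context. */
struct toy_page {
    int dirty;
    uint64_t snap_seq;   /* assumption: smaller value means older context */
};

/* Return the smallest snap_seq among dirty pages, or 0 if none are dirty. */
static uint64_t oldest_dirty_seq(const struct toy_page *pages, size_t n)
{
    uint64_t oldest = 0;

    for (size_t i = 0; i < n; i++) {
        assert(pages[i].snap_seq != 0);   /* 0 is reserved for "none" */
        if (pages[i].dirty && (oldest == 0 || pages[i].snap_seq < oldest))
            oldest = pages[i].snap_seq;
    }
    return oldest;
}

/*
 * Collect the indices of pages that may be written now: dirty pages that
 * belong to the oldest context. The whole array is scanned; no caller
 * supplied range is applied.
 */
static size_t collect_writeable(const struct toy_page *pages, size_t n,
                                size_t *out)
{
    uint64_t oldest = oldest_dirty_seq(pages, n);
    size_t found = 0;

    for (size_t i = 0; i < n; i++)
        if (pages[i].dirty && pages[i].snap_seq == oldest)
            out[found++] = i;
    return found;
}
```

In this model a dirty page with a newer `snap_seq` is simply skipped until every older dirty page has been written, which is why restricting the scan to a caller-supplied range could leave nothing writeable.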

ceph: fix "range cyclic" mode writepages · 590e9d98

Committed by Yan, Zheng on Sep 03, 2017

In range cyclic mode, writepages() should first write dirty pages
in range [writeback_index, (pgoff_t)-1], then write pages in range
[0, writeback_index - 1]. Besides, if writepages() encounters a page
that is beyond EOF, it should restart from the beginning.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
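
The two-pass order described here can be sketched in userspace as follows. This is only an illustration: `cyclic_scan()`, `dirty` and `order` are hypothetical names, and a plain array bound stands in for the end of the file.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of range-cyclic writeback: visit dirty pages in
 * [start, nr_pages - 1] first, then wrap around to [0, start - 1].
 * Returns the number of dirty page indices stored in `order`.
 */
static size_t cyclic_scan(const int *dirty, size_t nr_pages, size_t start,
                          size_t *order)
{
    size_t found = 0;

    assert(start < nr_pages);

    /* First pass: from the saved writeback index to the end. */
    for (size_t i = start; i < nr_pages; i++)
        if (dirty[i])
            order[found++] = i;

    /* Second pass: wrap around and cover what the first pass skipped. */
    for (size_t i = 0; i < start; i++)
        if (dirty[i])
            order[found++] = i;

    return found;
}
```

With dirty pages at indices 0, 2, 3 and 5 and a start index of 3, the sketch visits 3 and 5 before wrapping to 0 and 2.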

ceph: cleanup local variables in ceph_writepages_start() · 0e5ecac7

Committed by Yan, Zheng on Aug 31, 2017

Remove two variables and define variables of the same type together.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: optimize pagevec iterating in ceph_writepages_start() · 0713e5f2

Committed by Yan, Zheng on Aug 31, 2017

ceph_writepages_start() supports writing non-contiguous pages.
If it encounters a non-dirty or non-writeable page in the pagevec,
it can continue to check the rest of the pages in the pagevec.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
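
A minimal sketch of the "skip and keep going" idea, assuming a batch of pages modelled as a plain array. `struct toy_page` and `collect_batch()` are hypothetical names and do not correspond to kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page with the two properties the message talks about. */
struct toy_page {
    int dirty;
    int writeable;
};

/*
 * Walk one batch of pages and collect every page that can be written.
 * An unsuitable page is skipped with `continue` rather than ending the
 * walk, so later pages in the same batch are still considered.
 */
static size_t collect_batch(const struct toy_page *vec, size_t n, size_t *out)
{
    size_t found = 0;

    assert(vec != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (!vec[i].dirty || !vec[i].writeable)
            continue;
        out[found++] = i;
    }
    return found;
}
```

The collected indices may have gaps, which is acceptable only because the writer is assumed to handle non-contiguous pages.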

ceph: make writepage_nounlock() invalidate page that beyonds EOF · 05455e11

Committed by Yan, Zheng on Sep 02, 2017

Otherwise, the page is left in a state where it is associated with a
snapc, but (PageDirty(page) || PageWriteback(page)) is false.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: properly get capsnap's size in get_oldest_context() · 1f934b00

Committed by Yan, Zheng on Aug 30, 2017

capsnap's size is set by __ceph_finish_cap_snap(). If the capsnap is
still being written, its size is zero. In this case, get_oldest_context()
should read i_size. Besides, ceph_writepages_start() should re-check
capsnap's size after dirty pages get locked.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: remove stale check in ceph_invalidatepage() · b072d774

Committed by Yan, Zheng on Aug 30, 2017

Both set_page_dirty and truncate_complete_page should be called
on a locked page, so they can't race with each other.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: adjust 36 checks for NULL pointers · d37b1d99

Committed by Markus Elfring on Aug 20, 2017

The script “checkpatch.pl” pointed out information like the following.

Comparison to NULL could be written ...

Thus fix the affected source code places.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: include snapc in debug message of write · 1c0a9c2d

Committed by Yan, Zheng on Aug 16, 2017

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: nuke startsync op · 3fb99d48

Committed by Yanhu Cao on Jul 21, 2017

startsync is a no-op, has been for years.  Remove it.

Link: http://tracker.ceph.com/issues/20604
Signed-off-by: Yanhu Cao <gmayyyha@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: limit osd write size · 95cca2b4

Committed by Yan, Zheng on Jul 11, 2017

The OSD has a configurable limit on the maximum write size. The OSD
returns an error if the write request size is larger than the limit.
For now, set the max write size to CEPH_MSG_MAX_DATA_LEN. It should be
small enough.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
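
Capping the size of each write can be sketched as below. The names `TOY_MAX_WRITE_LEN`, `next_write_len()` and `count_write_requests()` are hypothetical, and the 16 MiB value is an assumption chosen for the sketch rather than a statement about the kernel constant.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cap for this sketch; the commit uses CEPH_MSG_MAX_DATA_LEN. */
#define TOY_MAX_WRITE_LEN (16u * 1024u * 1024u)

/* Length of the next request when `remaining` bytes are still unwritten. */
static size_t next_write_len(size_t remaining)
{
    return remaining < TOY_MAX_WRITE_LEN ? remaining : TOY_MAX_WRITE_LEN;
}

/* Number of requests needed to write `total` bytes under the cap. */
static size_t count_write_requests(size_t total)
{
    size_t n = 0;

    while (total > 0) {
        size_t len = next_write_len(total);

        assert(len > 0 && len <= total);   /* each request makes progress */
        total -= len;
        n++;
    }
    return n;
}
```

A write slightly larger than twice the cap is split into three requests, none of which exceeds the limit.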

ceph: limit osd read size to CEPH_MSG_MAX_DATA_LEN · aa187926

Committed by Yan, Zheng on Jul 11, 2017

libceph returns -EIO when read size > CEPH_MSG_MAX_DATA_LEN.

Link: http://tracker.ceph.com/issues/20528
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

01 Sep, 2017 1 commit

ceph: fix readpage from fscache · dd2bc473

Committed by Yan, Zheng on Aug 04, 2017

ceph_readpage() unlocks the page prematurely in the case
that the page is being read from fscache. The caller of readpage expects
that the page is uptodate when it gets unlocked. So the page should get
unlocked by the completion callback of fscache_read_or_alloc_pages().

Cc: stable@vger.kernel.org # 4.1+, needs backporting for < 4.7
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

07 Jul, 2017 4 commits

ceph: cleanup writepage_nounlock() · 43986881

Committed by Yan, Zheng on May 23, 2017

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: redirty page when writepage_nounlock() skips unwritable page · fa71fefb

Committed by Yan, Zheng on May 23, 2017

Ceph needs to flush dirty pages in the order of the snap contexts
they belong to. Dirty pages belonging to an older snap context
should be flushed earlier. If writepage_nounlock() cannot flush a
page, it should redirty the page.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: remove useless page->mapping check in writepage_nounlock() · f2b0c45f

Committed by Yan, Zheng on May 23, 2017

Callers of writepage_nounlock() have already ensured non-null
page->mapping.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: update the 'approaching max_size' code · efb0ca76

Committed by Yan, Zheng on May 22, 2017

The old 'approaching max_size' code expects the MDS to set max_size to
'2 * reported_size'. This is no longer true. The new code reports the
file size when half of the previous max_size increment has been used.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
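
The "half of the previous increment" rule can be written down as a small predicate. This is a sketch under assumptions: `should_report_size()` and its parameters are hypothetical names, and any additional conditions the real check may apply are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * size:          current file size
 * max_size:      current limit granted by the MDS
 * reported_size: file size last reported to the MDS
 */
static bool should_report_size(uint64_t size, uint64_t max_size,
                               uint64_t reported_size)
{
    uint64_t increment, used;

    /* At or past the limit: always report. */
    if (size >= max_size)
        return true;

    /* No room was granted beyond the last report, so there is no midpoint. */
    if (max_size <= reported_size)
        return false;

    /* The file has not grown since the last report. */
    if (size < reported_size)
        return false;

    increment = max_size - reported_size;
    used = size - reported_size;
    assert(used < increment);   /* follows from size < max_size */

    /* Equivalent to 2 * used >= increment, written without overflow. */
    return used >= increment - used;
}
```

For example, with `reported_size` 100 and `max_size` 200, the predicate becomes true once the file reaches 150 bytes, regardless of how the increment relates to the reported size.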

04 May, 2017 3 commits

ceph: when seeing write errors on an inode, switch to sync writes · 26544c62

Committed by Jeff Layton on Apr 04, 2017

Currently, we don't have a real feedback mechanism in place for when we
start seeing buffered writeback errors. If writeback is failing, there
is nothing that prevents an application from continuing to dirty pages
that aren't being cleaned.

In the event that we're seeing write errors of any sort occur on an
inode, have the callback set a flag to force further writes to be
synchronous. When the next write succeeds, clear the flag to allow
buffered writeback to continue.

Since this is just a hint to the write submission mechanism, we only
take the i_ceph_lock when a lockless check shows that the flag needs to
be changed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
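
The "check without the lock, lock only to change" pattern from the last paragraph of this message can be sketched as follows. Everything here is hypothetical: `struct toy_inode_state`, `toy_lock()` and `note_write_result()` are illustrative names, the lock is a counter rather than a real spinlock, and the sketch is single-threaded (a concurrent version would need a proper lock and a marked read of the flag).

```c
#include <assert.h>
#include <stdbool.h>

struct toy_inode_state {
    bool error_write;          /* hint: force synchronous writes */
    unsigned int lock_taken;   /* how often the pretend lock was needed */
};

/* Stand-ins for taking and releasing the per-inode lock. */
static void toy_lock(struct toy_inode_state *s)   { s->lock_taken++; }
static void toy_unlock(struct toy_inode_state *s) { (void)s; }

/* Called from the write completion path with the write's result. */
static void note_write_result(struct toy_inode_state *s, int err)
{
    bool want = (err != 0);

    assert(s);

    /* Lockless check: nothing to do if the hint already has this value. */
    if (s->error_write == want)
        return;

    /* The flag must change, so pay for the lock only now. */
    toy_lock(s);
    s->error_write = want;
    toy_unlock(s);
}
```

Because the flag is only a hint, a stale lockless read costs at most one extra synchronous or buffered write, which is what makes skipping the lock on the common path tolerable.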

Revert "ceph: SetPageError() for writeback pages if writepages fails" · 6fc1fe5e

Committed by Jeff Layton on Apr 04, 2017

This reverts commit b109eec6.

If I'm filling up a filesystem with this sort of command:

    $ dd if=/dev/urandom of=/mnt/cephfs/fillfile bs=2M oflag=sync

...then I'll eventually get back EIO on a write. Further calls
will give us ENOSPC.

I'm not sure what prompted this change, but I don't think it's what we
want to do. If writepages failed, we will have already set the mapping
error appropriately, and that's what gets reported by fsync() or
close().

__filemap_fdatawait_range however, does this:

	wait_on_page_writeback(page);
	if (TestClearPageError(page))
		ret = -EIO;

...and that -EIO ends up trumping the mapping's error if one exists.

When writepages fails, we only want to set the error in the mapping,
and not flag the individual pages.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
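
The precedence problem described in this message can be reproduced with a toy model. `toy_wait_result()` is a hypothetical function: it assumes that a per-page error is turned into -EIO first and that the mapping's own error is used only when no earlier error was recorded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * page_error:  whether any waited-on page had its error flag set
 * mapping_err: error recorded on the mapping (0 or a negative errno)
 */
static int toy_wait_result(bool page_error, int mapping_err)
{
    int ret = 0;

    assert(mapping_err <= 0);

    /* A flagged page is reported as a generic I/O error. */
    if (page_error)
        ret = -EIO;

    /* The mapping's error is consulted only if nothing was found above. */
    if (ret == 0)
        ret = mapping_err;

    return ret;
}
```

With the page flagged, a mapping error of -ENOSPC is hidden behind -EIO; with only the mapping error set, the caller sees -ENOSPC, which is the behavior the revert aims for.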

libceph: allow requests to return immediately on full conditions if caller wishes · a1f4020a

Committed by Jeff Layton on Apr 04, 2017

Usually, when the osd map is flagged as full or the pool is at quota,
write requests just hang. This is not what we want for cephfs, where
it would be better to simply report -ENOSPC back to userland instead
of stalling.

If the caller knows that it will want an immediate error return instead
of blocking on a full or at-quota error condition then allow it to set a
flag to request that behavior.

Set that flag in ceph_osdc_new_request (since ceph.ko is the only caller),
and on any other write request from ceph.ko.

A later patch will deal with requests that were submitted before the new
map showing the full condition came in.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

21 Apr, 2017 1 commit

ceph: Convert to separately allocated bdi · 09dc9fc2

Committed by Jan Kara on Apr 12, 2017

Allocate struct backing_dev_info separately instead of embedding it
inside client structure. This unifies handling of bdi among users.

CC: Ilya Dryomov <idryomov@gmail.com>
CC: "Yan, Zheng" <zyan@redhat.com>
CC: Sage Weil <sage@redhat.com>
CC: ceph-devel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

02 Mar, 2017 1 commit

sched/headers: Prepare for the reduction of <linux/sched.h>'s signal API dependency · f361bf4a

Committed by Ingo Molnar on Feb 03, 2017

Instead of including the full <linux/signal.h>, we are going to include the
types-only <linux/signal_types.h> header in <linux/sched.h>, to further
decouple the scheduler header from the signal headers.

This means that various files which relied on the full <linux/signal.h> need
to be updated to gain an explicit dependency on it.

Update the code that relies on sched.h's inclusion of the <linux/signal.h> header.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

28 Feb, 2017 1 commit

fs: add i_blocksize() · 93407472

Committed by Fabian Frederick on Feb 27, 2017

Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
branch.

This patch also fixes multiple checkpatch warnings: WARNING: Prefer
'unsigned int' to bare use of 'unsigned'

Thanks to Andrew Morton for suggesting more appropriate function instead
of macro.

[geliangtang@gmail.com: truncate: use i_blocksize()]
  Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
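
The replacement this message describes amounts to wrapping the shift in one typed helper. The sketch below uses a toy structure, `struct toy_inode`, and a hypothetical `toy_i_blocksize()` so that it compiles outside the kernel; it only mirrors the idea of the helper.

```c
#include <assert.h>

/* Toy stand-in for the kernel's struct inode; only the field used here. */
struct toy_inode {
    unsigned char i_blkbits;   /* log2 of the block size in bytes */
};

/* One place for the shift, returning an explicit `unsigned int`. */
static inline unsigned int toy_i_blocksize(const struct toy_inode *inode)
{
    assert(inode->i_blkbits < 32);   /* keep the shift well defined */
    return 1U << inode->i_blkbits;
}
```

A caller then writes `toy_i_blocksize(inode)` where it previously open-coded `1 << inode->i_blkbits`; an inline function also gives the argument and result a checked type, which a macro would not.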

25 Feb, 2017 3 commits

mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf · 11bac800

Committed by Dave Jiang on Feb 24, 2017

->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
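
The signature change can be pictured with toy types. `struct toy_vma`, `struct toy_vm_fault` and the two handlers below are hypothetical stand-ins, reduced to the single relationship that matters here: the fault descriptor already carries a pointer to its vma.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vma {
    unsigned long vm_start;   /* first address covered by the mapping */
};

struct toy_vm_fault {
    struct toy_vma *vma;      /* the mapping the fault occurred in */
    unsigned long address;    /* faulting address */
};

/* Old style: the vma is passed next to the fault descriptor. */
static unsigned long fault_offset_old(struct toy_vma *vma,
                                      struct toy_vm_fault *vmf)
{
    return vmf->address - vma->vm_start;
}

/* New style: the handler takes only vmf and reaches the vma through it. */
static unsigned long fault_offset_new(struct toy_vm_fault *vmf)
{
    assert(vmf->vma != NULL);
    return vmf->address - vmf->vma->vm_start;
}
```

Both handlers compute the same offset; the second simply removes a parameter that could otherwise disagree with `vmf->vma`.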

libceph, rbd, ceph: WRITE | ONDISK -> WRITE · 54ea0046

Committed by Ilya Dryomov on Feb 11, 2017

CEPH_OSD_FLAG_ONDISK is set in account_request().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

ceph: remove special ack vs commit behavior · 55f2a045

Committed by Ilya Dryomov on Feb 13, 2017

- ask for a commit reply instead of an ack reply in
  __ceph_pool_perm_get()
- don't ask for both ack and commit replies in ceph_sync_write()
- since only one reply is requested now, i_unsafe_writes list
  will always be empty -- kill ceph_sync_write_wait() and go back to
  a standard ->evict_inode()

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

20 Feb, 2017 2 commits

ceph: update readpages osd request according to size of pages · d641df81

Committed by Yan, Zheng on Jan 19, 2017

add_to_page_cache_lru() can fail, so the actual number of pages to read
can be smaller than the initial size of the osd request. We need to
update the osd request size in that case.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
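
Shrinking the request to the pages that were actually added can be sketched as below. The names `TOY_PAGE_SHIFT`, `pages_added()` and `read_request_len()` are hypothetical, the page size is assumed to be 4 KiB, and the sketch assumes the request covers a contiguous run that ends at the first page that could not be added.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/*
 * Count the pages added before the first failure.
 * ok[i] != 0 means page i was added to the cache successfully.
 */
static unsigned int pages_added(const int *ok, unsigned int wanted)
{
    unsigned int i;

    for (i = 0; i < wanted; i++)
        if (!ok[i])
            break;
    return i;
}

/* Byte length the read request should carry after the adds were attempted. */
static uint64_t read_request_len(const int *ok, unsigned int wanted)
{
    unsigned int added = pages_added(ok, wanted);

    assert(added <= wanted);
    return (uint64_t)added << TOY_PAGE_SHIFT;
}
```

If the third of four pages fails to be added, the request length drops from four pages to two, so the reply never carries data for pages that are not in the cache.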

ceph: cleanup ACCESS_ONCE -> READ_ONCE · 52953d55

Committed by Seraphime Kirkovski on Dec 26, 2016

This removes the uses of ACCESS_ONCE in favor of READ_ONCE.

Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

13 Jan, 2017 1 commit

ceph: fix get_oldest_context() · 84fcc2d2

Committed by Geng, Jichao on Jan 05, 2017

For the no-snapshot case, we should use ci->truncate_{seq,size}.

Fixes: 5f743e45 ("ceph: record truncate size/seq for snap data writeback")
Signed-off-by: Geng, Jichao <geng.jichao@h3c.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

15 Dec, 2016 1 commit

ceph: avoid creating orphan object when checking pool permission · 80e80fbb

Committed by Yan, Zheng on Dec 13, 2016

The pool permission check needs to write to the first object. But for
a snapshot, the head of the first object may have already been deleted.
Skip the check for snapshot inodes to avoid creating an orphan object.

Link: http://tracker.ceph.com/issues/18211
Signed-off-by: Yan, Zheng <zyan@redhat.com>

13 Dec, 2016 2 commits

ceph: record truncate size/seq for snap data writeback · 5f743e45

Committed by Yan, Zheng on Nov 15, 2016

Dirty snapshot data needs to be flushed unconditionally. If it
was created before truncation, writeback should use the old truncate
size/seq.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: try getting buffer capability for readahead/fadvise · 2b1ac852

Committed by Yan, Zheng on Oct 25, 2016

For readahead/fadvise cases, the caller of ceph_readpages does not
hold the buffer capability. Pages can be added to the page cache while
there is no buffer capability. This can cause data integrity
issues.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

11 Dec, 2016 1 commit

fix ceph_write_end() · b9de313c

Committed by Al Viro on Sep 05, 2016

don't zero on short copies; if the page was uptodate it's just plain
wrong, and if it wasn't we'll be better off just returning 0 and
buggering off.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

03 Oct, 2016 2 commits

ceph: remove warning when ceph_releasepage() is called on dirty page · e55f1a18

Committed by NeilBrown on Aug 31, 2016

If O_DIRECT writes are racing with buffered writes, then
the call to invalidate_inode_pages2_range() can call ceph_releasepage()
on dirty pages.

Most filesystems hold inode_lock() across O_DIRECT writes so they do not
suffer this race, but cephfs deliberately drops the lock, and opens a window
for the race.

This race can be triggered with the generic/036 test from the xfstests
test suite.  It doesn't happen every time, but it does happen often.

As the possibility is expected, remove the warning, and instead include
the PageDirty() status in the debug message.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

ceph: fix error handling of start_read() · 1afe4785

Committed by Yan, Zheng on Aug 24, 2016

If start_read() fails to add a page to the page cache or fails to send
the OSD request, it should call put_page() (instead of free_page()) for
the relevant pages.

Besides, start_read() needs to cancel the fscache readpage if it fails
to send the OSD request.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reported-by: Zhi Zhang <zhang.david2011@gmail.com>

28 Jul, 2016 2 commits

ceph: rados pool namespace support · 779fe0fb

Committed by Yan, Zheng on Mar 07, 2016

This patch adds code that decodes pool namespace information in
cap messages and request replies. The pool namespace is saved in
i_layout and is passed to libceph when doing read/write.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

libceph: define new ceph_file_layout structure · 7627151e

Committed by Yan, Zheng on Feb 03, 2016

Define new ceph_file_layout structure and rename old ceph_file_layout
to ceph_file_layout_legacy. This is preparation for adding namespace
to ceph_file_layout structure.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

01 Jun, 2016 1 commit

ceph: disable fscache when inode is opened for write · 46b59b2b

Committed by Yan, Zheng on May 18, 2016

All other filesystems do not add dirty pages to fscache. They all
disable fscache when inode is opened for write. Only ceph adds
dirty pages to fscache, but the code is buggy.

Signed-off-by: Yan, Zheng <zyan@redhat.com>