- 07 July 2017 (4 commits)
-
-
Submitted by Yan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Submitted by Yan, Zheng
Ceph needs to flush dirty pages in the order of the snap contexts they belong to: dirty pages belonging to an older snap context should be flushed earlier. If writepage_nounlock() cannot flush a page, it should redirty the page. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
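A minimal sketch of the redirty pattern this entry describes; can_flush_now() and do_actual_writeback() are hypothetical stand-ins for the real snap-context ordering checks, and this is not the actual fs/ceph/addr.c code.

```c
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/writeback.h>

static bool can_flush_now(struct page *page);				/* hypothetical */
static int do_actual_writeback(struct page *page,
			       struct writeback_control *wbc);		/* hypothetical */

/* Sketch only: a ->writepage-style handler that refuses to write a page
 * whose snap context is not the oldest dirty one, redirtying it instead
 * so a later writeback pass retries once older contexts are flushed. */
static int example_writepage(struct page *page, struct writeback_control *wbc)
{
	if (!can_flush_now(page)) {
		redirty_page_for_writepage(wbc, page);
		unlock_page(page);
		return 0;		/* not an error: retry later */
	}
	return do_actual_writeback(page, wbc);
}
```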
-
Submitted by Yan, Zheng
Callers of writepage_nounlock() have already ensured that page->mapping is non-NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Submitted by Yan, Zheng
The old 'approaching max_size' code expected the MDS to set max_size to '2 * reported_size'. This is no longer true. The new code reports the file size once half of the previous max_size increment has been used. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
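One way to read that reporting rule, as a hedged arithmetic illustration (not the kernel's actual check): treat the span between the last reported size and the current max_size as the increment, and report once half of it has been consumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustration only: report the file size to the MDS once half of the
 * previous max_size increment (max_size - reported_size) has been used. */
static bool should_report_size(uint64_t size, uint64_t reported_size,
			       uint64_t max_size)
{
	uint64_t increment = max_size - reported_size;

	return size >= reported_size + increment / 2;
}
```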
-
- 04 May 2017 (3 commits)
-
-
Submitted by Jeff Layton
Currently, we don't have a real feedback mechanism in place for when we start seeing buffered writeback errors. If writeback is failing, there is nothing that prevents an application from continuing to dirty pages that aren't being cleaned. In the event that we see write errors of any sort on an inode, have the callback set a flag to force further writes to be synchronous. When the next write succeeds, clear the flag to allow buffered writeback to continue. Since this is just a hint to the write submission mechanism, we only take the i_ceph_lock when a lockless check shows that the flag needs to be changed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
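A generic illustration of the "lockless check, lock only to change" hint pattern described here, written as standalone C with invented names; it is not the ceph code, and a plain mutex stands in for i_ceph_lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Invented names: the flag is only a hint, so readers never take the lock
 * and writers take it only when the flag actually has to flip. */
struct wb_state {
	atomic_bool force_sync;		/* force synchronous writes after errors */
	pthread_mutex_t lock;		/* stands in for i_ceph_lock */
};

static void note_write_result(struct wb_state *s, bool write_failed)
{
	if (atomic_load(&s->force_sync) == write_failed)
		return;			/* lockless check: nothing to change */

	pthread_mutex_lock(&s->lock);
	atomic_store(&s->force_sync, write_failed);
	pthread_mutex_unlock(&s->lock);
}
```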
-
Submitted by Jeff Layton
This reverts commit b109eec6. If I'm filling up a filesystem with this sort of command: $ dd if=/dev/urandom of=/mnt/cephfs/fillfile bs=2M oflag=sync ...then I'll eventually get back EIO on a write. Further calls will give us ENOSPC. I'm not sure what prompted this change, but I don't think it's what we want to do. If writepages failed, we will have already set the mapping error appropriately, and that's what gets reported by fsync() or close(). __filemap_fdatawait_range(), however, does this: wait_on_page_writeback(page); if (TestClearPageError(page)) ret = -EIO; ...and that -EIO ends up trumping the mapping's error if one exists. When writepages fails, we only want to set the error in the mapping, and not flag the individual pages. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Submitted by Jeff Layton
Usually, when the osd map is flagged as full or the pool is at quota, write requests just hang. This is not what we want for cephfs, where it would be better to simply report -ENOSPC back to userland instead of stalling. If the caller knows that it will want an immediate error return instead of blocking on a full or at-quota error condition, then allow it to set a flag to request that behavior. Set that flag in ceph_osdc_new_request (since ceph.ko is the only caller), and on any other write request from ceph.ko. A later patch will deal with requests that were submitted before the new map showing the full condition came in. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 21 April 2017 (1 commit)
-
-
Submitted by Jan Kara
Allocate struct backing_dev_info separately instead of embedding it inside the client structure. This unifies the handling of bdi among users. CC: Ilya Dryomov <idryomov@gmail.com> CC: "Yan, Zheng" <zyan@redhat.com> CC: Sage Weil <sage@redhat.com> CC: ceph-devel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 02 March 2017 (1 commit)
-
-
Submitted by Ingo Molnar
Instead of including the full <linux/signal.h>, we are going to include the types-only <linux/signal_types.h> header in <linux/sched.h>, to further decouple the scheduler header from the signal headers. This means that various files which relied on the full <linux/signal.h> need to be updated to gain an explicit dependency on it. Update the code that relies on sched.h's inclusion of the <linux/signal.h> header. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 28 February 2017 (1 commit)
-
-
Submitted by Fabian Frederick
Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) uses in the fs branch with the i_blocksize() helper. This patch also fixes multiple checkpatch warnings: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Thanks to Andrew Morton for suggesting a more appropriate function instead of a macro. [geliangtang@gmail.com: truncate: use i_blocksize()] Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
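For reference, the helper this entry is about looks essentially like the sketch below (reproduced from memory, so treat it as a sketch of include/linux/fs.h rather than a verbatim copy), along with the mechanical before/after at call sites.

```c
/* Sketch of the i_blocksize() helper that replaces open-coded shifts. */
static inline unsigned int i_blocksize(const struct inode *node)
{
	return 1U << node->i_blkbits;
}

/*
 * before: blocksize = 1 << inode->i_blkbits;
 * after:  blocksize = i_blocksize(inode);
 */
```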
-
- 25 February 2017 (3 commits)
-
-
Submitted by Dave Jiang
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [arnd@arndb.de: fix ARM build] Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
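The shape of the interface change, as a hedged sketch; the handler body is invented, and the point is simply that the VMA is now reached through vmf->vma.

```c
#include <linux/mm.h>

/*
 * before: int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
 * after:  int (*fault)(struct vm_fault *vmf);
 */
static int example_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;	/* the vma now lives in vmf */

	/* ... fill in the page for vma/vmf as before ... */
	(void)vma;
	return VM_FAULT_SIGBUS;			/* placeholder result */
}
```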
-
Submitted by Ilya Dryomov
CEPH_OSD_FLAG_ONDISK is set in account_request(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
-
Submitted by Ilya Dryomov
- ask for a commit reply instead of an ack reply in __ceph_pool_perm_get() - don't ask for both ack and commit replies in ceph_sync_write() - since only one reply is requested now, the i_unsafe_writes list will always be empty -- kill ceph_sync_write_wait() and go back to a standard ->evict_inode() Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
-
- 20 February 2017 (2 commits)
-
-
Submitted by Yan, Zheng
add_to_page_cache_lru() can fail, so the actual number of pages to read can be smaller than the initial size of the OSD request. We need to update the OSD request size in that case. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
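A sketch of the adjustment described here, under the assumption that a helper such as osd_req_op_extent_update() is what shrinks the read op; the surrounding variable names are invented and the real fs/ceph/addr.c code differs in detail.

```c
#include <linux/pagemap.h>
#include <linux/ceph/osd_client.h>

/* Sketch: insert pages one by one and, if some cannot be added to the
 * page cache, shrink the OSD read op to match the pages we will fill. */
static int example_setup_read(struct ceph_osd_request *req,
			      struct address_space *mapping,
			      struct page **pages, int nr_pages, pgoff_t index)
{
	int i, added = 0;

	for (i = 0; i < nr_pages; i++) {
		if (add_to_page_cache_lru(pages[i], mapping, index + i,
					  GFP_KERNEL))
			break;		/* stop at the first failed insertion */
		added++;
	}

	if (added < nr_pages)		/* assumed helper for resizing the op */
		osd_req_op_extent_update(req, 0, (u64)added * PAGE_SIZE);

	return added;
}
```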
-
Submitted by Seraphime Kirkovski
This removes the uses of ACCESS_ONCE in favor of READ_ONCE. Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
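The substitution is mechanical; a generic before/after with an invented variable:

```c
/* before */
seq = ACCESS_ONCE(shared->seq);

/* after: READ_ONCE() provides the same single, non-torn read */
seq = READ_ONCE(shared->seq);
```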
-
- 13 January 2017 (1 commit)
-
-
Submitted by Geng, Jichao
For the no-snapshot case, we should use ci->truncate_{seq,size}. Fixes: 5f743e45 ("ceph: record truncate size/seq for snap data writeback") Signed-off-by: Geng, Jichao <geng.jichao@h3c.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
- 15 December 2016 (1 commit)
-
-
Submitted by Yan, Zheng
The pool permission check needs to write to the first object. But for a snapshot, the head of the first object may have already been deleted. Skip the check for snapshot inodes to avoid creating an orphan object. Link: http://tracker.ceph.com/issues/18211 Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
- 13 December 2016 (2 commits)
-
-
Submitted by Yan, Zheng
Dirty snapshot data needs to be flushed unconditionally. If the pages were created before truncation, writeback should use the old truncate size/seq. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Submitted by Yan, Zheng
For readahead/fadvise cases, the caller of ceph_readpages does not hold the buffer capability. Pages can be added to the page cache while there is no buffer capability, which can cause a data integrity issue. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
- 11 December 2016 (1 commit)
-
-
Submitted by Al Viro
Don't zero on short copies; if the page was uptodate it's just plain wrong, and if it wasn't we'll be better off just returning 0 and buggering off. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
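A hedged sketch of the ->write_end behaviour being described: on a short copy into a page that was not already uptodate, report zero bytes so the generic write path retries, instead of zeroing the missing tail. Unrelated bookkeeping is elided and this is not the full ceph write_end().

```c
#include <linux/fs.h>
#include <linux/pagemap.h>

/* Sketch only: the short-copy handling, not the complete write_end(). */
static int example_write_end(struct file *file, struct address_space *mapping,
			     loff_t pos, unsigned int len, unsigned int copied,
			     struct page *page, void *fsdata)
{
	if (copied < len && !PageUptodate(page))
		copied = 0;	/* let the caller repeat the copy */

	/* ... mark the page dirty, update i_size, etc. ... */
	unlock_page(page);
	put_page(page);
	return copied;
}
```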
-
- 03 October 2016 (2 commits)
-
-
Submitted by NeilBrown
If O_DIRECT writes are racing with buffered writes, then the call to invalidate_inode_pages2_range() can call ceph_releasepage() on dirty pages. Most filesystems hold inode_lock() across O_DIRECT writes so they do not suffer this race, but cephfs deliberately drops the lock, which opens a window for the race. This race can be triggered with the generic/036 test from the xfstests test suite. It doesn't happen every time, but it does happen often. As this possibility is expected, remove the warning and instead include the PageDirty() status in the debug message. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
-
Submitted by Yan, Zheng
If start_page() fails to add a page to the page cache or fails to send the OSD request, it should call put_page() (instead of free_page()) for the relevant pages. Besides, start_page() needs to cancel the fscache readpage if it fails to send the OSD request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reported-by: Zhi Zhang <zhang.david2011@gmail.com>
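A minimal sketch of the cleanup rule this entry describes: pages destined for the page cache carry a reference, so error paths release them with put_page() rather than freeing them directly. The function name is invented.

```c
#include <linux/mm.h>

/* Drop the references on pages that will no longer be submitted. */
static void example_cleanup_pages(struct page **pages, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		put_page(pages[i]);	/* not free_page(): these are refcounted pages */
}
```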
-
- 28 July 2016 (2 commits)
-
-
Submitted by Yan, Zheng
This patch adds code that decodes pool namespace information in cap messages and request replies. The pool namespace is saved in i_layout; it will be passed to libceph when doing reads/writes. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Submitted by Yan, Zheng
Define a new ceph_file_layout structure and rename the old ceph_file_layout to ceph_file_layout_legacy. This is preparation for adding a namespace to the ceph_file_layout structure. Signed-off-by: Yan, Zheng <zyan@redhat.com>
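From memory, the split looks roughly like the sketch below; field names are illustrative, not a verbatim copy of include/linux/ceph/ceph_fs.h. The idea is that the client keeps a host-order layout struct, while the old little-endian wire-format struct lives on as ceph_file_layout_legacy.

```c
/* Sketch only: host-order layout used inside the client after the rename.
 * The legacy struct keeps the original __le32 on-the-wire fields. */
struct ceph_file_layout_sketch {
	u32 stripe_unit;
	u32 stripe_count;
	u32 object_size;
	s64 pool_id;
	/* a pool namespace pointer is what the follow-up change adds here */
};
```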
-
- 01 June 2016 (2 commits)
-
-
Submitted by Yan, Zheng
Other filesystems do not add dirty pages to fscache; they all disable fscache when an inode is opened for write. Only ceph adds dirty pages to fscache, and that code is buggy. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Submitted by Yan, Zheng
If readpages fails, fscache needs to clean up its internal state. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
- 26 May 2016 (14 commits)
-
-
Submitted by Yan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Submitted by Yan, Zheng
writepage() can be interrupted when it's called by the direct memory reclaimer (i.e., when the direct memory reclaimer is killed). To avoid losing data, we redirty the page. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Submitted by Yan, Zheng
ceph_update_writeable_page() is used by ceph_write_begin(). It breaks the atomicity of the write operation if it's interruptible. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Submitted by Yan, Zheng
When ceph_update_writeable_page() returns -EAGAIN, the caller should lock the page and call ceph_update_writeable_page() again. Signed-off-by: Yan, Zheng <zyan@redhat.com>
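The retry contract, as a compact sketch; the helper's arguments are assumed from context, the real caller is ceph_write_begin(), and the "lock dropped on -EAGAIN" detail comes from the entry's own wording.

```c
/* Sketch of the caller-side loop around ceph_update_writeable_page(). */
static int example_write_begin_locked(struct file *file, loff_t pos,
				      unsigned int len, struct page *page)
{
	int ret;

	do {
		lock_page(page);
		/* Per the description above, -EAGAIN means the helper gave up
		 * the page lock, so lock the page and call it again. */
		ret = ceph_update_writeable_page(file, pos, len, page);
	} while (ret == -EAGAIN);

	return ret;
}
```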
-
Submitted by Yan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Submitted by Yan, Zheng
Fault and page_mkwrite are supposed to be uninterruptible, but they call ceph functions that are interruptible. So they should block signals before calling those interruptible functions. Signed-off-by: Yan, Zheng <zyan@redhat.com>
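A hedged sketch of the signal-blocking pattern; the helper names and the exact mask are assumptions in the spirit of this change, not the verbatim in-tree helpers.

```c
#include <linux/signal.h>

/* Assumption: block everything except fatal signals around the
 * interruptible call, then restore the original mask afterwards. */
static void example_block_sigs(sigset_t *oldset)
{
	sigset_t mask;

	siginitsetinv(&mask, sigmask(SIGKILL));
	sigprocmask(SIG_BLOCK, &mask, oldset);
}

static void example_restore_sigs(sigset_t *oldset)
{
	sigprocmask(SIG_SETMASK, oldset, NULL);
}
```

A fault or page_mkwrite handler would then bracket its interruptible ceph calls between example_block_sigs(&oldset) and example_restore_sigs(&oldset).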
-
Submitted by Yan, Zheng
truncate_pagecache() may drop the inode's reference count. This can cause a deadlock if the inode's last reference is dropped and iput_final() wants to evict the inode (evict() calls inode_wait_for_writeback(), which waits for ceph_writepages_start() to return). The fix is to use a work thread to truncate the dirty pages. Also add a 'forced umount' check to ceph_update_writeable_page(), which prevents new pages from getting dirty. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Submitted by Ilya Dryomov
If you specify ACK | ONDISK and set ->r_unsafe_callback, both ->r_callback and ->r_unsafe_callback(true) are called on ack. This is very confusing. Redo this so that only one of them is called: ->r_unsafe_callback(true) on ack and ->r_unsafe_callback(false) on commit, or ->r_callback on ack|commit. Decode everything in decode_MOSDOpReply() to reduce clutter. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Submitted by Ilya Dryomov
finish_read(), its only user, uses it to get to hdr.data_len, which is what ->r_result is set to on success. This gains us the ability to safely call callbacks from contexts other than reply, e.g. map check. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Submitted by Ilya Dryomov
The crux of this is getting rid of ceph_osdc_build_request(), so that MOSDOp can be encoded not before but after calc_target() calculates the actual target. Encoding now happens within ceph_osdc_start_request(). Also nuked is the accompanying bunch of pointers into the encoded buffer that was used to update fields on each send - instead, the entire front is re-encoded. If we want to support target->name_len != base->name_len in the future, there is no other way, because oid is surrounded by other fields in the encoded buffer. Encoding OSD ops and adding data items to the request message were mixed together in osd_req_encode_op(). While we want to re-encode OSD ops, we don't want to add duplicate data items to the message when resending, so all calls to ceph_osdc_msg_data_add() are factored out into a new setup_request_data(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Submitted by Ilya Dryomov
Introduce ceph_osd_request_target, containing all mapping-related fields of ceph_osd_request, and calc_target() for calculating mappings and populating it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Submitted by Ilya Dryomov
Currently ceph_object_id can hold object names of up to 100 (CEPH_MAX_OID_NAME_LEN) characters. This is enough for all use cases except one: long rbd image names: - a format 1 header is named "<imgname>.rbd" - an object that points to a format 2 header is named "rbd_id.<imgname>" We operate on these potentially long-named objects during rbd map and, for format 1 images, during header refresh. (A format 2 header name is a small system-generated string.) Lift this 100-character limit by making ceph_object_id able to point to an externally-allocated string. Apart from being able to work with almost arbitrarily long named objects, this allows us to reduce the size of ceph_object_id from >100 bytes to 64 bytes. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
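Roughly, the resulting structure looks like the sketch below; the inline-buffer size is illustrative, and the real definition lives in the ceph headers.

```c
/* Sketch: short names live in the inline buffer, long ones (e.g. rbd
 * header objects) in an externally allocated string that 'name' points to. */
struct example_object_id {
	char *name;		/* points at inline_name or an allocated buffer */
	char inline_name[52];	/* illustrative size; covers common short names */
	int name_len;
};
```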
-
Submitted by Ilya Dryomov
The size of ->r_request and ->r_reply messages depends on the size of the object name (ceph_object_id), while the size of ceph_osd_request is fixed. Move message allocation into a separate function that would have to be called after ceph_object_id and ceph_object_locator (which is also going to become variable in size with RADOS namespaces) have been filled in: req = ceph_osdc_alloc_request(...); <fill in req->r_base_oid> <fill in req->r_base_oloc> ceph_osdc_alloc_messages(req); Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Submitted by Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-