Commits · a380a031cbe4323a3f638a8468c862510ace1919 · openanolis / cloud-kernel

13 Dec, 2016 1 commit

ceph: fix printing wrong return variable in ceph_direct_read_write() · a380a031

Committed by Zhi Zhang on Nov 08, 2016

Fix printing wrong return variable for invalidate_inode_pages2_range in
ceph_direct_read_write().
Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a380a031

11 Nov, 2016 1 commit

ceph: use default file splice read callback · 8a8d5617

Committed by Yan, Zheng on Nov 09, 2016

Splice read/write implementation changed recently. When using
generic_file_splice_read(), iov_iter with type == ITER_PIPE is
passed to filesystem's read_iter callback. But ceph_sync_read()
can't serve ITER_PIPE iov_iter correctly (ITER_PIPE iov_iter
expects pages from page cache).

Fixing ceph_sync_read() requires a big patch. So use default
splice read callback for now.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

8a8d5617

16 Oct, 2016 1 commit

ceph: fix error handling in ceph_read_iter · 0d7718f6

Committed by Nikolay Borisov on Oct 10, 2016

In case __ceph_do_getattr returns an error and the retry_op in
ceph_read_iter is not READ_INLINE, then it's possible to invoke
__free_page on a page which is NULL, this naturally leads to a crash.
This can happen when, for example, a process waiting on a MDS reply
receives sigterm.

Fix this by explicitly checking whether the page is set or not.

Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0d7718f6
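
The guard described in this fix can be sketched in user space as follows. This is only an illustration of the pattern, not the kernel code: `free_page_stub()` and `read_cleanup()` are hypothetical names standing in for `__free_page()` and the error path of `ceph_read_iter()`, and the assumption is that the page is allocated only for the inline-data retry.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts frees so the behaviour can be observed from outside. */
static int pages_freed;

/* Hypothetical stand-in for __free_page(): it must never be given NULL. */
static void free_page_stub(void *page)
{
	assert(page != NULL);   /* mirrors the crash the commit describes */
	pages_freed++;
	free(page);
}

/*
 * Sketch of the fixed cleanup path. `page` is assumed to be allocated
 * only for the inline-data retry, so it may still be NULL when the
 * getattr step fails.
 */
static int read_cleanup(void *page, int getattr_err)
{
	if (getattr_err < 0) {
		if (page)               /* the explicit check added by the fix */
			free_page_stub(page);
		return getattr_err;
	}
	if (page)
		free_page_stub(page);
	return 0;
}
```

Calling `read_cleanup(NULL, -EINTR)` returns the error without touching the stub, which is the case that previously crashed.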

03 Oct, 2016 1 commit

ceph: ignore error from invalidate_inode_pages2_range() in direct write · 5d7eb1a3

Committed by NeilBrown on Sep 01, 2016

This call can fail if there are dirty pages.  The preceding call to
filemap_write_and_wait_range() will normally remove dirty pages, but
as inode_lock() is not held over calls to ceph_direct_read_write(), it
could race with non-direct writes and pages could be dirtied
immediately after filemap_write_and_wait_range() returns.

If there are dirty pages, they will be removed by the subsequent call
to truncate_inode_pages_range(), so having them here is not a problem.

If the 'ret' value is left holding an error, then in the async IO case
(aio_req is not NULL) the loop that would normally call
ceph_osdc_start_request() will see the error in 'ret' and abort all
requests.  This doesn't seem like correct behaviour.

So use separate 'ret2' instead of overloading 'ret'.
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

5d7eb1a3
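
The effect of keeping the invalidation result in a separate variable can be shown with a small user-space sketch. All names here (`invalidate_stub()`, `direct_write_sketch()`, the `overload_ret` switch) are hypothetical; the loop only models "start requests until `ret` holds an error" as described in the message above.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in: fails with -EBUSY when dirty pages remain. */
static int invalidate_stub(int dirty_pages)
{
	return dirty_pages ? -EBUSY : 0;
}

/*
 * Returns how many of nr_requests would be started.
 * overload_ret != 0 models the old behaviour (result stored in `ret`);
 * overload_ret == 0 models the fix (result stored in `ret2`, only logged).
 */
static int direct_write_sketch(int dirty_pages, int nr_requests,
			       int overload_ret)
{
	int inval = invalidate_stub(dirty_pages);
	int ret = 0, ret2 = 0, started = 0;

	if (overload_ret)
		ret = inval;
	else
		ret2 = inval;

	if (ret2 < 0)
		fprintf(stderr, "invalidate failed: %d (ignored)\n", ret2);

	for (int i = 0; i < nr_requests; i++) {
		if (ret < 0)    /* an error in `ret` aborts all remaining requests */
			break;
		started++;
	}
	return started;
}
```

With dirty pages present, the overloaded variant starts no requests, while the `ret2` variant starts all of them.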

28 Sep, 2016 1 commit

fs: Replace current_fs_time() with current_time() · c2050a45

Committed by Deepa Dinamani on Sep 14, 2016

current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c2050a45
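
The change in calling convention can be sketched with mock types. Everything below is a user-space approximation: the `_sketch` structs, `fs_time_from_sb()` and `time_from_inode()` are hypothetical names, and the current time is passed in as a parameter so the result is deterministic. The only point is that the new form reaches the timestamp granularity through the inode.

```c
#include <time.h>

#define NSEC_PER_SEC_SKETCH 1000000000L

/* Mock types; only the fields needed for the illustration. */
struct super_block_sketch { long s_time_gran; };          /* in ns */
struct inode_sketch { struct super_block_sketch *i_sb; };

/* Round a timestamp down to the filesystem's granularity. */
static struct timespec trunc_to_gran(struct timespec t, long gran)
{
	if (gran >= NSEC_PER_SEC_SKETCH)
		t.tv_nsec = 0;
	else if (gran > 1)
		t.tv_nsec -= t.tv_nsec % gran;
	return t;
}

/* Old shape: the caller hands over the superblock. */
static struct timespec fs_time_from_sb(const struct super_block_sketch *sb,
				       struct timespec now)
{
	return trunc_to_gran(now, sb->s_time_gran);
}

/* New shape: the caller hands over the inode being stamped. */
static struct timespec time_from_inode(const struct inode_sketch *inode,
				       struct timespec now)
{
	return trunc_to_gran(now, inode->i_sb->s_time_gran);
}
```

Both forms give the same timestamp; the difference is only which object the caller has to pass.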

28 Jul, 2016 5 commits

ceph: Correctly return NXIO errors from ceph_llseek · 955818cd

Committed by Phil Turnbull on Jul 21, 2016

ceph_llseek does not correctly return NXIO errors because the 'out' path
always returns 'offset'.

Fixes: 06222e49 ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

955818cd
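
The bug pattern, an `out` label that returns the wrong variable, can be reproduced in a few lines. This is a simplified sketch, assuming a file without holes and only the "offset at or beyond end of file" case; `llseek_data_sketch()` is a hypothetical name, not the kernel function.

```c
#include <errno.h>

/*
 * SEEK_DATA-like lookup on a file of size i_size that has no holes.
 * The error is stored in `ret`, so the common exit must return `ret`.
 */
static long long llseek_data_sketch(long long offset, long long i_size)
{
	long long ret;

	if (offset >= i_size) {
		ret = -ENXIO;
		goto out;
	}
	ret = offset;
out:
	return ret;     /* returning `offset` here would hide the error */
}
```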

ceph: wait unsafe sync writes for evicting inode · 9a5530c6

Committed by Yan, Zheng on Jun 15, 2016

Otherwise ceph_sync_write_unsafe() may access/modify freed inode.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

9a5530c6

ceph: fix use-after-free bug in ceph_direct_read_write() · fc8c3892

Committed by Yan, Zheng on Jun 14, 2016

ceph_aio_complete() can free the ceph_aio_request struct before
the code exits the while loop.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

fc8c3892

ceph: set user pages dirty after direct IO read · a22bd5ff

Committed by Yan, Zheng on May 26, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

a22bd5ff

libceph: define new ceph_file_layout structure · 7627151e

Committed by Yan, Zheng on Feb 03, 2016

Define new ceph_file_layout structure and rename old ceph_file_layout
to ceph_file_layout_legacy. This is preparation for adding namespace
to ceph_file_layout structure.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

7627151e

06 Jul, 2016 1 commit

Use the right predicate in ->atomic_open() instances · 00699ad8

Committed by Al Viro on Jul 05, 2016

->atomic_open() can be given an in-lookup dentry *or* a negative one
found in dcache.  Use d_in_lookup() to tell one from another, rather
than d_unhashed().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

00699ad8

01 Jun, 2016 1 commit

ceph: disable fscache when inode is opened for write · 46b59b2b

Committed by Yan, Zheng on May 18, 2016

All other filesystems do not add dirty pages to fscache. They all
disable fscache when inode is opened for write. Only ceph adds
dirty pages to fscache, but the code is buggy.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

46b59b2b

31 May, 2016 1 commit

libceph: change ceph_osdmap_flag() to take osdc · b7ec35b3

Committed by Ilya Dryomov on Apr 28, 2016

For the benefit of every single caller, take osdc instead of map.
Also, now that osdc->osdmap can't ever be NULL, drop the check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

b7ec35b3

26 May, 2016 7 commits

ceph: renew caps for read/write if mds session got killed. · 77310320

Committed by Yan, Zheng on Apr 08, 2016

When mds session gets killed, read/write operation may hang.
Client waits for Frw caps, but mds does not know what caps client
wants. To recover from this, client sends an open request to mds. The
request will tell mds what caps client wants.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

77310320

libceph: redo callbacks and factor out MOSDOpReply decoding · fe5da05e

Committed by Ilya Dryomov on Apr 28, 2016

If you specify ACK | ONDISK and set ->r_unsafe_callback, both
->r_callback and ->r_unsafe_callback(true) are called on ack.  This is
very confusing.  Redo this so that only one of them is called:

    ->r_unsafe_callback(true), on ack
    ->r_unsafe_callback(false), on commit

or

    ->r_callback, on ack|commit

Decode everything in decode_MOSDOpReply() to reduce clutter.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

fe5da05e
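
The "only one of them is called" rule can be sketched as a small dispatcher. This follows the commit message only, not the libceph source: `req_sketch`, `dispatch_reply()` and the counting callbacks are hypothetical, and the plain-callback case is simplified to "invoke `r_callback` for the reply that arrives".

```c
#include <stdbool.h>
#include <stddef.h>

enum reply_kind { REPLY_ACK, REPLY_COMMIT };

struct req_sketch {
	void (*r_callback)(struct req_sketch *req);
	void (*r_unsafe_callback)(struct req_sketch *req, bool unsafe);
};

/* Counters so the sketch's behaviour can be observed. */
static int unsafe_true_calls, unsafe_false_calls, plain_calls;

static void on_unsafe(struct req_sketch *req, bool unsafe)
{
	(void)req;
	if (unsafe)
		unsafe_true_calls++;
	else
		unsafe_false_calls++;
}

static void on_done(struct req_sketch *req)
{
	(void)req;
	plain_calls++;
}

/* Exactly one callback family is used per request. */
static void dispatch_reply(struct req_sketch *req, enum reply_kind kind)
{
	if (req->r_unsafe_callback)
		req->r_unsafe_callback(req, kind == REPLY_ACK);
	else if (req->r_callback)
		req->r_callback(req);
}
```

A request with `r_unsafe_callback` set sees `true` on ack and `false` on commit, and its `r_callback` is never invoked by this dispatcher.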

libceph: drop msg argument from ceph_osdc_callback_t · 85e084fe

Committed by Ilya Dryomov on Apr 28, 2016

finish_read(), its only user, uses it to get to hdr.data_len, which is
what ->r_result is set to on success. This gains us the ability to
safely call callbacks from contexts other than reply, e.g. map check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

85e084fe

libceph: switch to calc_target(), part 2 · bb873b53

Committed by Ilya Dryomov on May 26, 2016

The crux of this is getting rid of ceph_osdc_build_request(), so that
MOSDOp can be encoded not before but after calc_target() calculates the
actual target. Encoding now happens within ceph_osdc_start_request().

Also nuked is the accompanying bunch of pointers into the encoded
buffer that was used to update fields on each send - instead, the
entire front is re-encoded. If we want to support target->name_len !=
base->name_len in the future, there is no other way, because oid is
surrounded by other fields in the encoded buffer.

Encoding OSD ops and adding data items to the request message were
mixed together in osd_req_encode_op(). While we want to re-encode OSD
ops, we don't want to add duplicate data items to the message when
resending, so all calls to ceph_osdc_msg_data_add() are factored out
into a new setup_request_data().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

bb873b53

libceph: introduce ceph_osd_request_target, calc_target() · 63244fa1

Committed by Ilya Dryomov on Apr 28, 2016

Introduce ceph_osd_request_target, containing all mapping-related
fields of ceph_osd_request and calc_target() for calculating mappings
and populating it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

63244fa1

libceph: variable-sized ceph_object_id · d30291b9

Committed by Ilya Dryomov on Apr 29, 2016

Currently ceph_object_id can hold object names of up to 100
(CEPH_MAX_OID_NAME_LEN) characters.  This is enough for all use cases,
except one - long rbd image names:

- a format 1 header is named "<imgname>.rbd"
- an object that points to a format 2 header is named "rbd_id.<imgname>"

We operate on these potentially long-named objects during rbd map, and,
for format 1 images, during header refresh.  (A format 2 header name is
a small system-generated string.)

Lift this 100 character limit by making ceph_object_id be able to point
to an externally-allocated string.  Apart from being able to work with
almost arbitrarily-long named objects, this allows us to reduce the
size of ceph_object_id from >100 bytes to 64 bytes.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

d30291b9
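
The idea of an object id that holds short names inline and points to an externally allocated string for long ones can be sketched as below. This is a user-space approximation: `oid_sketch`, its helpers, and the inline length of 32 are assumptions chosen for illustration, not the kernel's definitions.

```c
#include <stdlib.h>
#include <string.h>

#define INLINE_NAME_LEN 32   /* hypothetical; not the kernel's value */

struct oid_sketch {
	char *name;                        /* inline_name or a heap buffer */
	char inline_name[INLINE_NAME_LEN];
	size_t name_len;
};

static void oid_init(struct oid_sketch *oid)
{
	oid->name = oid->inline_name;
	oid->inline_name[0] = '\0';
	oid->name_len = 0;
}

/* Store a name of any length; returns 0 on success, -1 on allocation failure. */
static int oid_set(struct oid_sketch *oid, const char *s)
{
	size_t len = strlen(s);
	char *buf = oid->inline_name;

	if (len + 1 > INLINE_NAME_LEN) {
		buf = malloc(len + 1);     /* long name: allocate externally */
		if (!buf)
			return -1;
	}
	memmove(buf, s, len + 1);
	if (oid->name != oid->inline_name)
		free(oid->name);           /* drop a previous external buffer */
	oid->name = buf;
	oid->name_len = len;
	return 0;
}

static void oid_destroy(struct oid_sketch *oid)
{
	if (oid->name != oid->inline_name)
		free(oid->name);
	oid_init(oid);
}
```

Short names such as `rbd_id.img` stay in the inline buffer, so the common case needs no allocation, while a long image name switches transparently to heap storage.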

libceph: move message allocation out of ceph_osdc_alloc_request() · 13d1ad16

Committed by Ilya Dryomov on Apr 27, 2016

The size of ->r_request and ->r_reply messages depends on the size of
the object name (ceph_object_id), while the size of ceph_osd_request is
fixed.  Move message allocation into a separate function that would
have to be called after ceph_object_id and ceph_object_locator (which
is also going to become variable in size with RADOS namespaces) have
been filled in:

    req = ceph_osdc_alloc_request(...);
    <fill in req->r_base_oid>
    <fill in req->r_base_oloc>
    ceph_osdc_alloc_messages(req);
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

13d1ad16
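
The two-step allocation order shown in the message can be mimicked in user space. The sketch below assumes a made-up fixed header size and hypothetical names (`req2_sketch`, `alloc_request_sketch()`, `alloc_messages_sketch()`); it only demonstrates why the message buffer cannot be sized until the object name is known.

```c
#include <stdlib.h>
#include <string.h>

#define FIXED_HDR_LEN 64   /* hypothetical fixed part of a message */

struct req2_sketch {
	const char *oid_name;   /* filled in by the caller after step 1 */
	size_t request_len;
	void *r_request;
};

/* Step 1: fixed-size request object, no message buffers yet. */
static struct req2_sketch *alloc_request_sketch(void)
{
	return calloc(1, sizeof(struct req2_sketch));
}

/* Step 2: size the message from the object name, which must be set. */
static int alloc_messages_sketch(struct req2_sketch *req)
{
	if (!req->oid_name)
		return -1;
	req->request_len = FIXED_HDR_LEN + strlen(req->oid_name);
	req->r_request = malloc(req->request_len);
	return req->r_request ? 0 : -1;
}

static void free_request_sketch(struct req2_sketch *req)
{
	free(req->r_request);
	free(req);
}
```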

02 May, 2016 1 commit

ceph: use generic_write_sync · 6aa657c8

Committed by Christoph Hellwig on Apr 07, 2016

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6aa657c8

05 Apr, 2016 1 commit

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

Committed by Kirill A. Shutemov on Apr 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09cbfeaf
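
Why these rewrites are behaviour-preserving can be checked with a tiny example. The sketch assumes a 4 KiB page (`PAGE_SHIFT` of 12) and defines the legacy names as plain aliases; the helper functions are hypothetical and exist only to compare the old and new spellings.

```c
/* Assumed 4 KiB pages for the illustration. */
#undef PAGE_SHIFT
#undef PAGE_SIZE
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Legacy names as plain aliases of the PAGE_* constants. */
#define PAGE_CACHE_SHIFT PAGE_SHIFT
#define PAGE_CACHE_SIZE  PAGE_SIZE

/* Old spelling: the shift distance is always zero. */
static unsigned long scale_index_old(unsigned long index)
{
	return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* New spelling after the semantic patch. */
static unsigned long scale_index_new(unsigned long index)
{
	return index;
}

/* Number of pages needed to hold len bytes, old and new spelling. */
static unsigned long pages_spanned_old(unsigned long len)
{
	return (len + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
}

static unsigned long pages_spanned_new(unsigned long len)
{
	return (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
}
```

Because the two sets of constants are identical, each old/new pair returns the same value for every input.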

26 Mar, 2016 4 commits

ceph: use kmem_cache_zalloc · 99ec2697

Committed by Geliang Tang on Mar 13, 2016

Use kmem_cache_zalloc() instead of kmem_cache_alloc() with flag GFP_ZERO.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

99ec2697

ceph: fix security xattr deadlock · 315f2408

Committed by Yan, Zheng on Mar 07, 2016

When security is enabled, security module can call filesystem's
getxattr/setxattr callbacks during d_instantiate(). For cephfs,
d_instantiate() is usually called by MDS' dispatch thread, while
handling MDS reply. If the MDS reply does not include xattrs and
corresponding caps, getxattr/setxattr need to send a new request
to MDS and wait for the reply. This makes MDS' dispatch sleep, so
nobody handles later MDS replies.

The fix is to make sure lookup/atomic_open replies include xattrs and
corresponding caps, so getxattr can be handled by cached xattrs.
This requires some modification to both MDS and request message.
(Client tells MDS what caps it wants; MDS encodes proper caps in
the reply)

Smack security module may call setxattr during d_instantiate().
Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
to us. So just make setxattr return error when called by MDS'
dispatch thread.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

315f2408

ceph: replace CURRENT_TIME by current_fs_time() · 8bbd4714

Committed by Deepa Dinamani on Feb 02, 2016

CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

8bbd4714

ceph: remove useless BUG_ON · a587d71b

Committed by Yan, Zheng on Jan 27, 2016

ceph_osdc_start_request() never returns -EOLDSNAP.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

a587d71b

05 Feb, 2016 2 commits

ceph: fix snap context leak in error path · db6aed70

Committed by Yan, Zheng on Jan 26, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

db6aed70

ceph: checking for IS_ERR instead of NULL · 1418bf07

Committed by Dan Carpenter on Jan 26, 2016

ceph_osdc_alloc_request() returns NULL on error, it never returns error
pointers.

Fixes: 5be0389d ('ceph: re-send AIO write request when getting -EOLDSNAP error')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

1418bf07
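
The difference between the two checks can be reproduced with a user-space re-creation of the error-pointer convention. The helpers below (`err_ptr()`, `is_err()`, `alloc_request_stub()`) are simplified approximations written for this sketch, modelled on the kernel's `ERR_PTR()`/`IS_ERR()` idea of encoding small negative errno values in the top of the address range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the top MAX_ERRNO addresses; NULL is not one. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that reports failure with NULL, never an error pointer. */
static void *alloc_request_stub(bool fail)
{
	static int dummy;

	return fail ? NULL : &dummy;
}

/* Wrong check for such an allocator: a NULL result slips through. */
static bool failure_detected_is_err(bool fail)
{
	return is_err(alloc_request_stub(fail));
}

/* Right check: compare against NULL. */
static bool failure_detected_null(bool fail)
{
	return alloc_request_stub(fail) == NULL;
}
```

With a failing allocation, `failure_detected_is_err(true)` reports no failure, which is the bug this commit fixes.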

23 Jan, 2016 1 commit

wrappers for ->i_mutex access · 5955102c

Committed by Al Viro on Jan 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5955102c

22 Jan, 2016 3 commits

ceph: use i_size_{read,write} to get/set i_size · 99c88e69

Committed by Yan, Zheng on Dec 30, 2015

Cap message from MDS can update i_size. In that case, we don't
hold i_mutex. So it's unsafe to directly access inode->i_size
while holding i_mutex.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

99c88e69

ceph: re-send AIO write request when getting -EOLDSNAP error · 5be0389d

Committed by Yan, Zheng on Dec 24, 2015

When receiving -EOLDSNAP from OSD, we need to re-send the corresponding
write request. Due to a locking issue, we can't send a new request inside
another OSD request's complete callback. So we use a worker to re-send
the request for AIO write.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

5be0389d

ceph: Asynchronous IO support · c8fe9b17

Committed by Yan, Zheng on Dec 23, 2015

The basic idea of AIO support is simple, just call kiocb::ki_complete()
in OSD request's complete callback. But there are several special cases.

When IO spans multiple objects, we need to wait until all OSD requests
are complete, then call kiocb::ki_complete(). Error handling in this case
is tricky too. For simplicity, AIO that both spans multiple objects and
extends i_size is not allowed.

Another special case is checking EOF for reads (another client can write to
the file and extend i_size concurrently). For simplicity, the direct-IO/AIO
code path does not do the check and falls back to a normal sync read instead.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

c8fe9b17
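
The "wait until all OSD requests are complete, then complete the kiocb" rule amounts to a countdown. The sketch below is single-threaded and uses a plain `int` counter where real code would need an atomic one. `aio_sketch` and `aio_complete_one()` are hypothetical names, and reporting the first error instead of a byte count is an assumption made for the illustration.

```c
struct aio_sketch {
	int pending;      /* OSD requests still outstanding */
	long error;       /* first error seen, 0 if none */
	long total;       /* bytes transferred so far */
	int completions;  /* how many times the final completion fired */
	long result;      /* value that would go to ki_complete() */
};

/* Called once per finished OSD request with its result code. */
static void aio_complete_one(struct aio_sketch *aio, long rc)
{
	if (rc < 0) {
		if (!aio->error)
			aio->error = rc;
	} else {
		aio->total += rc;
	}

	if (--aio->pending > 0)
		return;                 /* other requests are still in flight */

	/* Last request done: this is where ki_complete() would be called. */
	aio->result = aio->error ? aio->error : aio->total;
	aio->completions++;
}
```

The completion fires exactly once, after the last outstanding request reports back.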

03 Nov, 2015 1 commit

ceph: combine as many iovec as possile into one OSD request · b5b98989

Committed by Zhu, Caifeng on Oct 08, 2015

Both ceph_sync_direct_write and ceph_sync_read iterate iovec elements
one by one, sending one OSD request for each iovec. This is sub-optimal.
We can combine several iovecs into one page vector and send an OSD
request for the whole page vector.
Signed-off-by: Zhu, Caifeng <zhucaifeng@unissoft-nj.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

b5b98989

09 Sep, 2015 3 commits

ceph: get inode size for each append write · 55b0b31c

Committed by Yan, Zheng on Sep 07, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

55b0b31c

ceph: no need to get parent inode in ceph_open · e36d571d

Committed by Jianpeng Ma on Aug 18, 2015

parent inode is needed in creating new inode case.  For ceph_open,
the target inode already exists.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

e36d571d

ceph: remove the useless judgement · a43137f7

Committed by Jianpeng Ma on Aug 18, 2015

err != 0 is already handled. So skip this.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

a43137f7

25 Jun, 2015 4 commits

ceph: rework dcache readdir · fdd4e158

Committed by Yan, Zheng on Jun 16, 2015

Previously our dcache readdir code relied on child dentries in the
directory dentry's d_subdir list being sorted by dentry offset in
descending order. When adding dentries to the dcache, if a dentry
already existed, our readdir code moved it to the head of the directory
dentry's d_subdir list. This design relied on dcache internals.
Al Viro suggests using ncpfs's approach: keeping an array of pointers
to dentries in the page cache of the directory inode. The validity of
those pointers is represented by the directory inode's complete and
ordered flags. When a dentry gets pruned, we clear the directory inode's
complete flag in the d_prune() callback. Before moving a dentry to
another directory, we clear the ordered flag for both the old and the
new directory.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

fdd4e158

ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL · 687265e5

Committed by Yan, Zheng on Jun 13, 2015

GFP_NOFS memory allocation is required for page writeback path.
But there is no need to use GFP_NOFS in syscall path and readpage
path.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

687265e5

ceph: pre-allocate data structure that tracks caps flushing · f66fd9f0

Committed by Yan, Zheng on Jun 10, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>

f66fd9f0

ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference · 5dda377c

Committed by Yan, Zheng on Apr 30, 2015

In most cases that snap context is needed, we are holding
reference of CEPH_CAP_FILE_WR. So we can set ceph inode's
i_head_snapc when getting the CEPH_CAP_FILE_WR reference,
and make codes get snap context from i_head_snapc. This makes
the code simpler.

Another benefit of this change is that we can handle snap
notification more elegantly. Especially when snap context
is updated while someone else is doing write. The old queue
cap_snap code may set cap_snap's context to either the old
context or the new snap context, depending on if i_head_snapc
is set. The new queue cap_snap code always sets cap_snap's
context to the old snap context.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

5dda377c

openanolis / cloud-kernel synced successfully more than 1 year ago
