Commits · 005a07bf0a92e7f0e73fc9a6c9acc992c5dbd00c · openeuler / Kernel

Aug 25, 2016: 6 commits

rbd: add 'client_addr' sysfs rbd device attribute · 005a07bf

Committed by Ilya Dryomov on Aug 18, 2016

Export client addr/nonce, so userspace can check if an image is being
blacklisted.
Signed-off-by: Mike Christie <mchristi@redhat.com>
[idryomov@gmail.com: ceph_client_addr(), endianness fix]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: print capacity in decimal and features in hex · ca7909e8

Committed by Ilya Dryomov on Aug 18, 2016

With exclusive-lock added and more to come, print features into dmesg.
Change capacity to decimal while at it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>

rbd: support for exclusive-lock feature · ed95b21a

Committed by Ilya Dryomov on Aug 12, 2016

Add basic support for RBD_FEATURE_EXCLUSIVE_LOCK feature.  Maintenance
operations (resize, snapshot create, etc) are offloaded to librbd via
returning -EOPNOTSUPP - librbd should request the lock and execute the
operation.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
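
One reading of the -EOPNOTSUPP rule above is that it governs how the lock owner acknowledges a notify it will not handle. The following is a minimal C sketch under simplifying assumptions. The enum values, `notify_ack_result()` and the `is_lock_owner` flag are hypothetical. For brevity, every notify the kernel handles itself is acked with 0.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical notify kinds; names are illustrative, not rbd's. */
enum notify_op {
	NOTIFY_ACQUIRED_LOCK,
	NOTIFY_RELEASED_LOCK,
	NOTIFY_REQUEST_LOCK,
	NOTIFY_HEADER_UPDATE,
	NOTIFY_RESIZE,        /* maintenance op the kernel won't perform */
	NOTIFY_SNAP_CREATE,   /* likewise */
};

/*
 * Result to acknowledge a notify with.  Maintenance requests get
 * -EOPNOTSUPP from the lock owner, telling librbd to request the lock
 * and run the operation itself.  Non-owners ack without an error.
 */
static int notify_ack_result(enum notify_op op, bool is_lock_owner)
{
	switch (op) {
	case NOTIFY_ACQUIRED_LOCK:
	case NOTIFY_RELEASED_LOCK:
	case NOTIFY_REQUEST_LOCK:
	case NOTIFY_HEADER_UPDATE:
		return 0;     /* handled in the kernel (details omitted) */
	default:
		return is_lock_owner ? -EOPNOTSUPP : 0;
	}
}
```

Replying with an error code instead of doing the work keeps resize and snapshot logic in a single place (librbd).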

rbd: retry watch re-registration periodically · 99d16943

Committed by Ilya Dryomov on Aug 12, 2016

Revamp watch code to support retrying watch re-registration:

- add rbd_dev->watch_state for more robust errcb handling
- store watch cookie separately to avoid dereferencing watch_handle
  which is set to NULL on unwatch
- move re-register code into a delayed work and retry re-registration
  every second, unless the client is blacklisted
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
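
The watch bookkeeping from the list above can be sketched as follows. This is a sketch under stated assumptions: `struct watch_ctx`, the state and action enums, `unwatch()` and `reregister_done()` are all hypothetical. The delayed-work scheduling is reduced to a returned action.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the watch state described above. */
enum watch_state { WATCH_UNREGISTERED, WATCH_REGISTERED, WATCH_ERROR };
enum watch_action { ACTION_NONE, ACTION_RETRY_AFTER_1S, ACTION_GIVE_UP };

struct watch_ctx {
	enum watch_state state;
	void *handle;      /* set to NULL on unwatch */
	uint64_t cookie;   /* stored separately, still valid after unwatch */
};

static void unwatch(struct watch_ctx *w)
{
	w->handle = NULL;  /* cookie stays readable: no NULL dereference */
	w->state = WATCH_UNREGISTERED;
}

/* Called from the (delayed) re-register work with the attempt's result. */
static enum watch_action reregister_done(struct watch_ctx *w, int err,
					 bool blacklisted, void *new_handle,
					 uint64_t new_cookie)
{
	if (err == 0) {
		w->handle = new_handle;
		w->cookie = new_cookie;
		w->state = WATCH_REGISTERED;
		return ACTION_NONE;
	}
	w->state = WATCH_ERROR;
	/* A blacklisted client cannot re-register, so stop retrying. */
	return blacklisted ? ACTION_GIVE_UP : ACTION_RETRY_AFTER_1S;
}
```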

rbd: introduce a per-device ordered workqueue · 1643dfa4

Committed by Ilya Dryomov on Aug 12, 2016

This is going to be used for re-registering watch requests and
exclusive-lock tasks: acquire/request lock, notify-acquired, release
lock, notify-released.  Some refactoring in the map/unmap paths was
necessary to give this workqueue a meaningful name: "rbdX-tasks".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>

libceph: rename ceph_client_id() -> ceph_client_gid() · 033268a5

Committed by Ilya Dryomov on Aug 12, 2016

It's gid / global_id in other places.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

Aug 09, 2016: 2 commits

rbd: nuke the 32-bit pool id check · d8734849

Committed by Ilya Dryomov on Aug 08, 2016

ceph_file_layout::pool_id is now s64.  rbd_add_get_pool_id() and
ceph_pg_poolid_by_name() both return an int, so it's bogus anyway.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: destroy header_oloc in rbd_dev_release() · 6b6dddbe

Committed by Ilya Dryomov on Aug 05, 2016

Purely cosmetic at this point, as rbd doesn't use RADOS namespaces and
hence rbd_dev->header_oloc->pool_ns is always NULL.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Jul 28, 2016: 2 commits

libceph: rados pool namespace support · 30c156d9

Committed by Yan, Zheng on Feb 14, 2016

Add a pool namespace pointer to struct ceph_file_layout and struct
ceph_object_locator. The pool namespace is used when mapping an object
to a PG; it's also used when composing an OSD request.

The namespace pointer in struct ceph_file_layout is RCU protected,
so libceph can read the namespace without taking a lock.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
[idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: define new ceph_file_layout structure · 7627151e

Committed by Yan, Zheng on Feb 03, 2016

Define a new ceph_file_layout structure and rename the old ceph_file_layout
to ceph_file_layout_legacy. This is preparation for adding a namespace
to the ceph_file_layout structure.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

Jun 08, 2016: 1 commit

drivers: use req op accessor · c2df40df

Committed by Mike Christie on Jun 05, 2016

The req operation REQ_OP is separated from the rq_flag_bits
definition. This converts the block layer drivers to
use req_op to get the op from the request struct.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

May 26, 2016: 10 commits

libceph: replace ceph_monc_request_next_osdmap() · 7cca78c9

Committed by Ilya Dryomov on Apr 28, 2016

... with a wrapper around maybe_request_map() - no need for two
osdmap-specific functions.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: async MON client generic requests · d0b19705

Committed by Ilya Dryomov on Apr 28, 2016

For map check, we are going to need to send CEPH_MSG_MON_GET_VERSION
messages asynchronously and get a callback on completion.  Refactor MON
client to allow firing off generic requests asynchronously and add an
async variant of ceph_monc_get_version().  ceph_monc_do_statfs() is
switched over and remains sync.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
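
The async/sync split can be sketched as follows. `struct generic_request`, `start_generic_request()` and the sync waiter are hypothetical. The transport is faked so that the reply arrives immediately. The point is that the sync variant is just the async path plus a wait.

```c
#include <stdbool.h>

/* Hypothetical generic MON request with a completion callback. */
struct generic_request {
	int result;
	void (*complete)(struct generic_request *req);
	void *private_data;
};

/*
 * Stand-in transport: this sketch "receives" the reply at once and
 * fires the callback.  A real client would return immediately and
 * call complete() later, from the reply path.
 */
static void start_generic_request(struct generic_request *req, int reply)
{
	req->result = reply;
	req->complete(req);
}

struct sync_waiter {
	bool done;
	int result;
};

static void sync_complete(struct generic_request *req)
{
	struct sync_waiter *w = req->private_data;

	w->result = req->result;
	w->done = true;   /* kernel code would complete() a struct completion */
}

/* Sync call built on the async path: fire, then wait for the callback. */
static int do_request_sync(int reply)
{
	struct sync_waiter w = { false, 0 };
	struct generic_request req = { 0, sync_complete, &w };

	start_generic_request(&req, reply);
	/* wait_for_completion() would block here; in this sketch it's done */
	return w.done ? w.result : -1;
}
```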

libceph, rbd: ceph_osd_linger_request, watch/notify v2 · 922dab61

Committed by Ilya Dryomov on May 26, 2016

This adds support and switches rbd to a new, more reliable version of
watch/notify protocol.  As with the OSD client update, this is mostly
about getting the right structures linked into the right places so that
reconnects are properly sent when needed.  watch/notify v2 also
requires sending regular pings to the OSDs - send_linger_ping().

A major change from the old watch/notify implementation is the
introduction of ceph_osd_linger_request - linger requests no longer
piggyback on ceph_osd_request.  ceph_osd_event has been merged into
ceph_osd_linger_request.

All the details are now hidden within libceph, the interface consists
of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack().
ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep
the lifetime management simple.

ceph_osdc_notify_ack() accepts an optional data payload, which is
relayed back to the notifier.

Portions of this patch are loosely based on work by Douglas Fuller
<dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: rbd_dev_header_unwatch_sync() variant · c525f036

Committed by Ilya Dryomov on Apr 28, 2016

Introduce __rbd_dev_header_unwatch_sync(), which doesn't flush notify
callbacks.  This is for the new rados_watcherrcb_t, which would be
called from a notify callback.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: drop msg argument from ceph_osdc_callback_t · 85e084fe

Committed by Ilya Dryomov on Apr 28, 2016

finish_read(), its only user, uses it to get to hdr.data_len, which is
what ->r_result is set to on success. This gains us the ability to
safely call callbacks from contexts other than reply, e.g. map check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: switch to calc_target(), part 2 · bb873b53

Committed by Ilya Dryomov on May 26, 2016

The crux of this is getting rid of ceph_osdc_build_request(), so that
MOSDOp can be encoded not before but after calc_target() calculates the
actual target. Encoding now happens within ceph_osdc_start_request().

Also nuked is the accompanying bunch of pointers into the encoded
buffer that was used to update fields on each send - instead, the
entire front is re-encoded. If we want to support target->name_len !=
base->name_len in the future, there is no other way, because oid is
surrounded by other fields in the encoded buffer.

Encoding OSD ops and adding data items to the request message were
mixed together in osd_req_encode_op(). While we want to re-encode OSD
ops, we don't want to add duplicate data items to the message when
resending, so all calls to ceph_osdc_msg_data_add() are factored out
into a new setup_request_data().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
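
Here is a simplified model of the resulting send path, in which a string stands in for the encoded MOSDOp front. On each send, the target is recalculated, the front is re-encoded from scratch, and data items are attached only the first time. The helper names borrow from the text above, but their signatures and bodies are hypothetical and are not libceph's.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical request model; fields and helpers are illustrative. */
struct osd_req {
	char target[16];   /* filled in by calc_target() */
	char front[64];    /* encoded message front */
	int data_items;    /* data items attached to the message */
};

static void calc_target(struct osd_req *req, const char *osd)
{
	snprintf(req->target, sizeof(req->target), "%s", osd);
}

/* Re-encode the entire front on every send, after the target is known. */
static void encode_request(struct osd_req *req)
{
	snprintf(req->front, sizeof(req->front), "write target=%s", req->target);
}

/* Attach data items once; a resend must not add duplicates. */
static void setup_request_data(struct osd_req *req)
{
	if (req->data_items == 0)
		req->data_items = 1;
}

static void send_request(struct osd_req *req, const char *osd)
{
	calc_target(req, osd);
	encode_request(req);
	setup_request_data(req);
}
```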

rbd: use header_oid instead of header_name · c41d13a3

Committed by Ilya Dryomov on Apr 29, 2016

Switch to ceph_object_id and use ceph_oid_aprintf() instead of a bare
const char *.  This reduces noise in rbd_dev_header_name().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: variable-sized ceph_object_id · d30291b9

Committed by Ilya Dryomov on Apr 29, 2016

Currently ceph_object_id can hold object names of up to 100
(CEPH_MAX_OID_NAME_LEN) characters.  This is enough for all use cases,
except one - long rbd image names:

- a format 1 header is named "<imgname>.rbd"
- an object that points to a format 2 header is named "rbd_id.<imgname>"

We operate on these potentially long-named objects during rbd map, and,
for format 1 images, during header refresh.  (A format 2 header name is
a small system-generated string.)

Lift this 100 character limit by making ceph_object_id be able to point
to an externally-allocated string.  Apart from being able to work with
almost arbitrarily-long named objects, this allows us to reduce the
size of ceph_object_id from >100 bytes to 64 bytes.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
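
The following userspace sketch shows the inline-or-external name idea, modeled loosely on the description above. `struct object_id`, `oid_printf()` and the 52-byte inline buffer are assumptions. The buffer size was chosen so that the struct comes to 64 bytes on a typical LP64 target. None of this is claimed to match libceph exactly.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed inline size: 8 + 52 + 4 = 64 bytes on a typical LP64 ABI. */
#define OID_INLINE_LEN 52

struct object_id {
	char *name;                        /* inline_name or a heap buffer */
	char inline_name[OID_INLINE_LEN];
	int name_len;
};

static void oid_init(struct object_id *oid)
{
	oid->name = oid->inline_name;
	oid->inline_name[0] = '\0';
	oid->name_len = 0;
}

static void oid_destroy(struct object_id *oid)
{
	if (oid->name != oid->inline_name)
		free(oid->name);
	oid_init(oid);
}

/* printf-style setter: short names stay inline, long ones go to the heap. */
static int oid_printf(struct object_id *oid, const char *fmt, ...)
{
	va_list ap;
	int len;

	oid_destroy(oid);                  /* drop any previous heap name */

	va_start(ap, fmt);
	len = vsnprintf(oid->inline_name, sizeof(oid->inline_name), fmt, ap);
	va_end(ap);
	if (len < 0)
		return -1;

	if ((size_t)len >= sizeof(oid->inline_name)) {
		char *buf = malloc((size_t)len + 1);

		if (!buf) {
			oid->inline_name[0] = '\0';
			return -1;
		}
		va_start(ap, fmt);
		vsnprintf(buf, (size_t)len + 1, fmt, ap);
		va_end(ap);
		oid->name = buf;
	}
	oid->name_len = len;
	return 0;
}
```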

libceph: move message allocation out of ceph_osdc_alloc_request() · 13d1ad16

Committed by Ilya Dryomov on Apr 27, 2016

The size of ->r_request and ->r_reply messages depends on the size of
the object name (ceph_object_id), while the size of ceph_osd_request is
fixed.  Move message allocation into a separate function that would
have to be called after ceph_object_id and ceph_object_locator (which
is also going to become variable in size with RADOS namespaces) have
been filled in:

    req = ceph_osdc_alloc_request(...);
    <fill in req->r_base_oid>
    <fill in req->r_base_oloc>
    ceph_osdc_alloc_messages(req);
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: get/put img_request in rbd_img_request_submit() · 663ae2cc

Committed by Ilya Dryomov on May 16, 2016

By the time we get to checking for_each_obj_request_safe(img_request)
terminating condition, all obj_requests may be complete and img_request
ref, that rbd_img_request_submit() takes away from its caller, may be
put.  Moving the next_obj_request cursor is then a use-after-free on
img_request.

It's totally benign, as the value that's read is never used, but
I think it's still worth fixing.

Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
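
Here is a minimal sketch of the fix, using a hypothetical refcounted `struct img_request`. Without the extra get/put, the loop's `i < img->num_obj` check would read freed memory once the last completion drops the caller's reference.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted image request. */
struct img_request {
	int kref;
	int num_obj;
	bool *released;   /* lets the caller observe the final put */
};

static void img_request_get(struct img_request *img) { img->kref++; }

static void img_request_put(struct img_request *img)
{
	if (--img->kref == 0) {
		*img->released = true;
		free(img);
	}
}

/* Completing the last object request drops the submitter's reference. */
static void obj_request_complete(struct img_request *img, int which)
{
	if (which == img->num_obj - 1)
		img_request_put(img);
}

static void img_request_submit(struct img_request *img)
{
	int i;

	img_request_get(img);              /* pin img across the loop */
	for (i = 0; i < img->num_obj; i++)
		obj_request_complete(img, i);  /* may drop the caller's ref */
	img_request_put(img);              /* loop is done touching img */
}
```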

Apr 28, 2016: 2 commits

rbd: report unsupported features to syslog · d3767f0f

Committed by Ilya Dryomov on Apr 13, 2016

... instead of just returning an error.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>

rbd: fix rbd map vs notify races · 811c6688

Committed by Ilya Dryomov on Apr 15, 2016

A while ago, commit 9875201e ("rbd: fix use-after free of
rbd_dev->disk") fixed rbd unmap vs notify race by introducing
an exported wrapper for flushing notifies and sticking it into
do_rbd_remove().

A similar problem exists on the rbd map path, though: the watch is
registered in rbd_dev_image_probe(), while the disk is set up quite
a few steps later, in rbd_dev_device_setup().  Nothing prevents
a notify from coming in and crashing on a NULL rbd_dev->disk:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
    Call Trace:
     [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
     [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
     [<ffffffff8109d5db>] process_one_work+0x17b/0x470
     [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
     [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
     [<ffffffff810a5acf>] kthread+0xcf/0xe0
     [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
     [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
    RIP  [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]

If an error occurs during rbd map, we have to error out, potentially
tearing down a watch.  Just like on rbd unmap, notifies have to be
flushed, otherwise rbd_watch_cb() may end up trying to read in the
image header after rbd_dev_image_release() has run:

    Assertion failure in rbd_dev_header_info() at line 4722:

     rbd_assert(rbd_image_format_valid(rbd_dev->image_format));

    Call Trace:
     [<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150
     [<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390
     [<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290
     [<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0
     [<ffffffff81107799>] process_one_work+0x689/0x1a80
     [<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80
     [<ffffffff81132065>] ? finish_task_switch+0x225/0x640
     [<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
     [<ffffffff81108c69>] worker_thread+0xd9/0x1320
     [<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80
     [<ffffffff8111b02d>] kthread+0x21d/0x2e0
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
     [<ffffffff82022802>] ret_from_fork+0x22/0x40
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
    RIP  [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30

To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
revalidate_disk(), b) move ceph_osdc_flush_notifies() call into
rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
header read-in into a critical section.  The latter also happens to
take care of rbd map foo@bar vs rbd snap rm foo@bar race.

Fixes: http://tracker.ceph.com/issues/15490

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>

Apr 06, 2016: 1 commit

rbd: use GFP_NOIO consistently for request allocations · 2224d879

Committed by David Disseldorp on Apr 05, 2016

As of 5a60e876, RBD object request
allocations are made via rbd_obj_request_create() with GFP_NOIO.
However, subsequent OSD request allocations in rbd_osd_req_create*()
use GFP_ATOMIC.

With heavy page cache usage (e.g. OSDs running on same host as krbd
client), rbd_osd_req_create() order-1 GFP_ATOMIC allocations have been
observed to fail, where direct reclaim would have allowed GFP_NOIO
allocations to succeed.

Cc: stable@vger.kernel.org # 3.18+
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Neil Brown <neilb@suse.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Mar 26, 2016: 3 commits

rbd: use KMEM_CACHE macro · 03d94406

Committed by Geliang Tang on Mar 13, 2016

Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: enable large, variable-sized OSD requests · 3f1af42a

Committed by Ilya Dryomov on Feb 09, 2016

Turn r_ops into a flexible array member to enable large, consisting of
up to 16 ops, OSD requests.  The use case is scattered writeback in
cephfs and, as far as the kernel client is concerned, 16 is just a made
up number.

r_ops had size 3 for copyup+hint+write, but copyup is really a special
case - it can only happen once.  ceph_osd_request_cache is therefore
stuffed with num_ops=2 requests, anything bigger than that is allocated
with kmalloc().  req_mempool is backed by ceph_osd_request_cache, which
means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
that.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
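
Here is a sketch of a flexible-array-member request that follows the sizing rules above (a cache for requests of up to 2 ops, and exact-size allocation for up to 16 ops). `struct osd_request`, `alloc_request()` and the calloc() stand-in for the slab cache are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define CACHE_NUM_OPS 2   /* request size kept in the slab cache */
#define MAX_OPS 16        /* the "made up" upper bound */

struct osd_op {
	int opcode;
	int rval;             /* per-op reply fields can live with the op */
};

struct osd_request {
	unsigned int num_ops;
	struct osd_op ops[];  /* flexible array member */
};

static size_t request_size(unsigned int num_ops)
{
	return offsetof(struct osd_request, ops) + num_ops * sizeof(struct osd_op);
}

/*
 * Small requests are sized like a cache object (CACHE_NUM_OPS ops);
 * bigger ones get an exact-size allocation, like the kmalloc() path.
 */
static struct osd_request *alloc_request(unsigned int num_ops, int *cached)
{
	struct osd_request *req;

	if (num_ops == 0 || num_ops > MAX_OPS)
		return NULL;
	*cached = num_ops <= CACHE_NUM_OPS;
	req = calloc(1, request_size(*cached ? CACHE_NUM_OPS : num_ops));
	if (req)
		req->num_ops = num_ops;
	return req;
}
```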

libceph: move r_reply_op_{len,result} into struct ceph_osd_req_op · 7665d85b

Committed by Yan, Zheng on Jan 07, 2016

This avoids defining a large array of r_reply_op_{len,result} in
struct ceph_osd_request.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Jan 22, 2016: 1 commit

rbd: delete an unnecessary check before rbd_dev_destroy() · 1761b229

Committed by Markus Elfring on Nov 23, 2015

The rbd_dev_destroy() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
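
The idiom is the same as free(NULL). Here is a minimal sketch with a hypothetical destroy function; the struct and the call counter exist only for illustration.

```c
#include <stdlib.h>

struct rbd_dev_sketch { int id; };   /* hypothetical stand-in */
static int destroy_calls;

/* Tolerates NULL, like free(NULL), so callers need no guard. */
static void dev_destroy(struct rbd_dev_sketch *dev)
{
	if (!dev)
		return;
	destroy_calls++;
	free(dev);
}
```

A caller can then write `dev_destroy(dev);` unconditionally on its error path.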

Dec 04, 2015: 1 commit

rbd: don't put snap_context twice in rbd_queue_workfn() · 70b16db8

Committed by Ilya Dryomov on Nov 27, 2015

Commit 4e752f0a ("rbd: access snapshot context and mapping size
safely") moved ceph_get_snap_context() out of rbd_img_request_create()
and into rbd_queue_workfn(), adding a ceph_put_snap_context() to the
error path in rbd_queue_workfn().  However, rbd_img_request_create()
consumes a ref on snapc, so calling ceph_put_snap_context() after
a successful rbd_img_request_create() leads to an extra put.  Fix it.

Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
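
Here is a sketch of the ownership rule, using hypothetical types and helpers. Once `img_request_create()` succeeds, the reference belongs to the request and is dropped only when the request is destroyed. The caller puts it only if creation fails.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the snap context and image request. */
struct snap_context { int nref; };
struct img_request { struct snap_context *snapc; };

static void snapc_get(struct snap_context *sc) { sc->nref++; }
static void snapc_put(struct snap_context *sc) { sc->nref--; }

/* On success the new request owns the caller's snapc reference. */
static struct img_request *img_request_create(struct snap_context *snapc,
					      int fail)
{
	struct img_request *img;

	if (fail)
		return NULL;
	img = malloc(sizeof(*img));
	if (!img)
		return NULL;
	img->snapc = snapc;    /* reference consumed, not copied */
	return img;
}

static void img_request_destroy(struct img_request *img)
{
	snapc_put(img->snapc); /* the one and only put for that ref */
	free(img);
}

static int queue_request(struct snap_context *sc, int fail_create)
{
	struct img_request *img;

	snapc_get(sc);
	img = img_request_create(sc, fail_create);
	if (!img) {
		snapc_put(sc); /* creation failed: the ref is still ours */
		return -ENOMEM;
	}
	/* Any later error path must destroy img, not put sc again. */
	img_request_destroy(img);
	return 0;
}
```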

Nov 03, 2015: 5 commits

rbd: remove duplicate calls to rbd_dev_mapping_clear() · 4afb04c0

Committed by Ilya Dryomov on Oct 22, 2015

Commit d1cf5788 ("rbd: set mapping info earlier") defined
rbd_dev_mapping_clear(), but, just a few days after, commit
f35a4dee ("rbd: set the mapping size and features later") moved
rbd_dev_mapping_set() calls and added another rbd_dev_mapping_clear()
call instead of moving the old one. Around the same time, another
duplicate was introduced in rbd_dev_device_release() - kill both.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: set device_type::release instead of device::release · 6cac4695

Committed by Ilya Dryomov on Oct 16, 2015

No point in providing an empty device_type::release callback and then
setting device::release for each rbd_dev dynamically.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: don't free rbd_dev outside of the release callback · dd5ac32d

Committed by Ilya Dryomov on Oct 16, 2015

struct rbd_device has struct device embedded in it, which means it's
part of kobject universe and has an unpredictable life cycle.  Freeing
its memory outside of the release callback is flawed, yet commits
200a6a8b ("rbd: don't destroy rbd_dev in device release function")
and 8ad42cd0 ("rbd: don't have device release destroy rbd_dev")
moved rbd_dev_destroy() out to rbd_dev_image_release().

This commit reverts most of that, the key points are:

- rbd_dev->dev is initialized in rbd_dev_create(), making it possible
  to use rbd_dev_destroy() - which is just a put_device() - both before
  we register with device core and after.

- rbd_dev_release() (the release callback) is the only place we
  kfree(rbd_dev).  It's also where we do module_put(), keeping the
  module unload race window as small as possible.

- We pin the module in rbd_dev_create(), but only for mapping
  rbd_dev-s.  Moving image related stuff out of struct rbd_device into
  another struct which isn't tied with sysfs and device core is long
  overdue, but until that happens, this will keep rbd module refcount
  (which users can observe with lsmod) sane.

Fixes: http://tracker.ceph.com/issues/12697

Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails · b51c83c2

Committed by Ilya Dryomov on Oct 15, 2015

Returning pool id (i.e. >= 0) from a sysfs ->store() callback makes
userspace think it needs to retry the write.  Fix it - it's a leftover
from the times when the equivalent of rbd_dev_create() was the first
action in rbd_add().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
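
Here is a minimal sketch of the return-value contract. It assumes a hypothetical `add_store_result()` that models only the tail of the store callback. A store callback should return either the full count or a negative errno. Any non-negative value smaller than count looks like a short write, which write(2) callers typically retry.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical ->store() tail: return count when the whole write was
 * consumed, or a negative errno on failure.
 */
static ssize_t add_store_result(size_t count, int pool_id, int create_failed)
{
	if (pool_id < 0)
		return pool_id;        /* lookup error, already a -errno */
	if (create_failed)
		return -ENOMEM;        /* the bug was returning pool_id here */
	return (ssize_t)count;
}
```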

rbd: drop null test before destroy functions · 13bf2834

Committed by Julia Lawall on Sep 13, 2015

Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@ expression x; @@
-if (x != NULL) {
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
  x = NULL;
-}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Oct 31, 2015: 1 commit

rbd: require stable pages if message data CRCs are enabled · bae818ee

Committed by Ronny Hegewald on Oct 15, 2015

rbd requires stable pages, as it performs a crc of the page data before
they are sent to the OSDs.

But since kernel 3.9 (patch 1d1d1a76
"mm: only enforce stable page writes if the backing device requires
it") it is not assumed anymore that block devices require stable pages.

This patch sets the necessary flag to get stable pages back for rbd.

In a ceph installation that provides multiple ext4 formatted rbd
devices, "bad crc" messages appeared regularly (ca 1 message every 1-2
minutes on every OSD that provided the data for the rbd) in the
OSD logs before this patch. After this patch these messages are pretty
much gone (only ca 1-2 / month / OSD).

Cc: stable@vger.kernel.org # 3.9+, needs backporting
Signed-off-by: Ronny Hegewald <Ronny.Hegewald@online.de>
[idryomov@gmail.com: require stable pages only in crc case, changelog]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Oct 24, 2015: 2 commits

rbd: prevent kernel stack blow up on rbd map · 6d69bb53

Committed by Ilya Dryomov on Oct 11, 2015

Mapping an image with a long parent chain (e.g. image foo, whose parent
is bar, whose parent is baz, etc) currently leads to a kernel stack
overflow, due to the following recursion in the reply path:

  rbd_osd_req_callback()
    rbd_obj_request_complete()
      rbd_img_obj_callback()
        rbd_img_parent_read_callback()
          rbd_obj_request_complete()
            ...

Limit the parent chain to 16 images, which is ~5K worth of stack.  When
the above recursion is eliminated, this limit can be lifted.

Fixes: http://tracker.ceph.com/issues/12538

Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
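
Here is a sketch of the depth check. It counts parents only, which is one interpretation of "16 images". `struct image_sketch` and `probe_parents()` are hypothetical. Rejecting long chains at map time bounds how deep the completion recursion shown above can go.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_PARENT_CHAIN_LEN 16   /* the limit from the commit message */

struct image_sketch {             /* hypothetical */
	struct image_sketch *parent;
};

/* Walk the parent chain, failing once more than 16 parents are seen. */
static int probe_parents(const struct image_sketch *img, int depth)
{
	if (!img->parent)
		return 0;
	if (++depth > MAX_PARENT_CHAIN_LEN)
		return -EINVAL;
	return probe_parents(img->parent, depth);
}
```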

rbd: don't leak parent_spec in rbd_dev_probe_parent() · 1f2c6651

Committed by Ilya Dryomov on Oct 11, 2015

Currently we leak parent_spec and trigger a "parent reference
underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails.
The problem is we take the !parent out_err branch and that only drops
refcounts; parent_spec that would've been freed had we called
rbd_dev_unparent() remains and triggers rbd_warn() in
rbd_dev_parent_put() - at that point we have parent_spec != NULL and
parent_ref == 0, so counter ends up being -1 after the decrement.

Redo rbd_dev_probe_parent() to fix this.

Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

Oct 16, 2015: 2 commits

rbd: use writefull op for object size writes · e30b7577

Committed by Ilya Dryomov on Oct 07, 2015

This covers only the simplest case - an object size sized write, but
it's still useful in tiering setups when EC is used for the base tier
as writefull op can be proxied, saving an object promotion.

Even though updating ceph_osdc_new_request() to allow writefull should
just be a matter of fixing an assert, I didn't do it because its only
user is cephfs.  All other sites were updated.

Reflects ceph.git commit 7bfb7f9025a8ee0d2305f49bf0336d2424da5b5b.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
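
The selection rule for the simple case can be sketched as follows. The enum and `pick_write_op()` are hypothetical, and offsets are relative to the start of the object.

```c
#include <stdint.h>

enum osd_write_op { OSD_OP_WRITE, OSD_OP_WRITEFULL };  /* hypothetical names */

/* Use writefull only when the write covers the whole object. */
static enum osd_write_op pick_write_op(uint64_t obj_off, uint64_t len,
				       uint64_t obj_size)
{
	if (obj_off == 0 && len == obj_size)
		return OSD_OP_WRITEFULL;
	return OSD_OP_WRITE;
}
```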

rbd: set max_sectors explicitly · 0d9fde4f

Committed by Ilya Dryomov on Oct 07, 2015

Commit 30e2bc08 ("Revert "block: remove artifical max_hw_sectors
cap"") restored a clamp on max_sectors.  It's now 2560 sectors instead
of 1024, but it's not good enough: we set max_hw_sectors to rbd object
size because we don't want object sized I/Os to be split, and the
default object size is 4M.

So, set max_sectors to max_hw_sectors in rbd at queue init time.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
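
The arithmetic works out as follows, assuming 512-byte sectors:

- A 4M object is 8192 sectors.
- The restored clamp of 2560 sectors is 1.25M.
- An object-sized I/O would otherwise be split into ceil(8192 / 2560) = 4 requests.

Here is a sketch with a hypothetical subset of the queue limits:

```c
#include <stdint.h>

#define SECTOR_SHIFT 9   /* 512-byte sectors */

struct limits_sketch {   /* hypothetical subset of queue limits */
	unsigned int max_hw_sectors;
	unsigned int max_sectors;
};

static void set_rbd_limits(struct limits_sketch *q, uint64_t obj_size)
{
	q->max_hw_sectors = (unsigned int)(obj_size >> SECTOR_SHIFT);
	q->max_sectors = q->max_hw_sectors;  /* don't rely on the default clamp */
}

/* Number of requests an I/O of io_sectors is split into. */
static unsigned int split_count(unsigned int io_sectors, unsigned int max_sectors)
{
	return (io_sectors + max_sectors - 1) / max_sectors;
}
```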

Sep 09, 2015: 1 commit

rbd: plug rbd_dev->header.object_prefix memory leak · d194cd1d

Committed by Ilya Dryomov on Aug 31, 2015

Need to free object_prefix when rbd_dev_v2_snap_context() fails, but
only if this is the first time we are reading in the header.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
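
Here is a sketch of the "free only on first read" rule. `struct header_sketch`, `read_header()`, `dup_str()` and the fake prefix string are hypothetical, and `snapc_err` stands in for a failing rbd_dev_v2_snap_context().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct header_sketch { char *object_prefix; };  /* hypothetical */

static char *dup_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Read the header; a nonzero snapc_err simulates the snap context step failing. */
static int read_header(struct header_sketch *h, int snapc_err)
{
	int first_time = (h->object_prefix == NULL);

	if (first_time) {
		h->object_prefix = dup_str("rbd_data.1234");
		if (!h->object_prefix)
			return -ENOMEM;
	}
	if (snapc_err) {
		if (first_time) {      /* only free what this call allocated */
			free(h->object_prefix);
			h->object_prefix = NULL;
		}
		return snapc_err;
	}
	return 0;
}
```

On a refresh (not the first read), the prefix belongs to the live mapping, so a failure must leave it in place.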
