Commits · 0f2776e6151a123552fd06b666fe755fa780a967 · openanolis / cloud-kernel

30 Mar, 2014 1 commit

rbd: drop an unsafe assertion · 638c323c

Committed by Alex Elder on Mar 25, 2014

Olivier Bonvalet reported having repeated crashes due to a failed
assertion he was hitting in rbd_img_obj_callback():

    Assertion failure in rbd_img_obj_callback() at line 2165:
	rbd_assert(which >= img_request->next_completion);

With a lot of help from Olivier with reproducing the problem
we were able to determine the object and image requests had
already been completed (and often freed) at the point the
assertion failed.

There was a great deal of discussion on the ceph-devel mailing list
about this.  The problem only arose when there were two (or more)
object requests in an image request, and the problem was always
seen when the second request was being completed.

The problem is due to a race in the window between setting the
"done" flag on an object request and checking the image request's
next completion value.  When the first object request completes, it
checks to see if its successor request is marked "done", and if
so, that request is also completed.  In the process, the image
request's next_completion value is updated to reflect that both
the first and second requests are completed.  By the time the
second request is able to check the next_completion value, it
has been set to a value *greater* than its own "which" value,
which caused an assertion to fail.

Fix this problem by skipping over any completion processing
unless the completing object request is the next one expected.
Test only for inequality (not >=), and eliminate the bad
assertion.
Tested-by: Olivier Bonvalet <ob@daevel.fr>
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
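
The change described in this commit message can be pictured with a small single-threaded model. This is only a sketch under stated assumptions, not the driver code: the struct and function names ending in `_sketch` are hypothetical, the array-based layout is invented for illustration, and the locking the real driver needs is left out. It shows the one point the message makes: a completing request does nothing unless its `which` equals `next_completion`, rather than asserting `which >= next_completion`.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OBJ_REQUESTS_SKETCH 8

/* Hypothetical, simplified stand-ins for rbd object and image requests. */
struct obj_request_sketch {
	bool done;       /* the reply for this object request has arrived */
	bool completed;  /* completion processing has run for this request */
};

struct img_request_sketch {
	struct obj_request_sketch obj[MAX_OBJ_REQUESTS_SKETCH];
	unsigned int obj_count;        /* number of object requests in use */
	unsigned int next_completion;  /* index of the next request to complete */
};

/* Step 1 of a completion: mark object request `which` as done. */
void mark_done_sketch(struct img_request_sketch *img, unsigned int which)
{
	assert(which < img->obj_count);
	img->obj[which].done = true;
}

/*
 * Step 2 of a completion: run completion processing on behalf of object
 * request `which`.  Returns how many object requests this call completed.
 * The window between step 1 and step 2 is where the race occurred.
 */
unsigned int img_obj_complete_sketch(struct img_request_sketch *img,
				     unsigned int which)
{
	unsigned int completed = 0;

	assert(which < img->obj_count);

	/*
	 * Inequality test instead of asserting which >= next_completion:
	 * an earlier call may already have completed this request and moved
	 * next_completion past it, which is not an error.
	 */
	if (which != img->next_completion)
		return 0;

	/* Complete this request and every consecutive successor marked done. */
	while (which < img->obj_count && img->obj[which].done) {
		img->obj[which].completed = true;
		completed++;
		which++;
	}
	img->next_completion = which;

	return completed;
}
```

To replay the reported sequence with two object requests: mark request 1 done, mark request 0 done, then run completion for request 0 (which completes both and sets `next_completion` to 2), then run completion for request 1. With the inequality test the last call simply returns; with the old assertion it would have compared 1 against 2 and failed.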


28 Jan, 2014 4 commits

libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} · 3c972c95

Committed by Ilya Dryomov on Jan 27, 2014

Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before
introducing r_target_{oloc,oid} needed for redirects.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>


libceph: introduce and start using oid abstraction · 4295f221

Committed by Ilya Dryomov on Jan 27, 2014

In preparation for tiering support, which would require having two
(base and target) object names for each osd request and also copying
those names around, introduce struct ceph_object_id (oid) and a couple
helpers to facilitate those copies and encapsulate the fact that object
name is not necessarily a NUL-terminated string.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
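
As an illustration of what such an abstraction might look like, here is a minimal userspace sketch. It is not the libceph definition: the `_sketch` names are hypothetical, and the buffer size of 100 is an assumed value chosen for the example. The point it shows is the one the message makes: the name travels with an explicit length, so helpers copy and compare by length and never rely on a terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed size for illustration; the real limit is defined by libceph. */
#define OID_NAME_LEN_SKETCH 100

/* Hypothetical stand-in for an object id: name bytes plus explicit length. */
struct oid_sketch {
	char name[OID_NAME_LEN_SKETCH];
	size_t name_len;  /* the name is not necessarily NUL-terminated */
};

/* Set the name from a buffer and length.  Returns 0, or -1 if too long. */
int oid_set_name_sketch(struct oid_sketch *oid, const char *name, size_t len)
{
	if (len > sizeof(oid->name))
		return -1;
	memcpy(oid->name, name, len);
	oid->name_len = len;
	return 0;
}

/* Copy one oid into another, e.g. a base name into a target name. */
void oid_copy_sketch(struct oid_sketch *dest, const struct oid_sketch *src)
{
	assert(src->name_len <= sizeof(dest->name));
	memcpy(dest->name, src->name, src->name_len);
	dest->name_len = src->name_len;
}

/* Compare by length and bytes; returns nonzero when the names match. */
int oid_equal_sketch(const struct oid_sketch *a, const struct oid_sketch *b)
{
	return a->name_len == b->name_len &&
	       memcmp(a->name, b->name, a->name_len) == 0;
}
```

Because every helper works from `name_len`, a name containing an embedded NUL byte is copied and compared in full, which string functions such as `strcpy()` or `strcmp()` would not do.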


libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN · 2d0ebc5d

Committed by Ilya Dryomov on Jan 27, 2014

In preparation for adding oid abstraction, rename MAX_OBJ_NAME_SIZE to
CEPH_MAX_OID_NAME_LEN.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>


libceph: start using oloc abstraction · 22116525

Committed by Ilya Dryomov on Jan 27, 2014

Instead of relying on pool fields in ceph_file_layout (for mapping) and
ceph_pg (for encoding), start using ceph_object_locator (oloc)
abstraction.  Note that userspace oloc currently consists of pool, key,
nspace and hash fields, while this one contains only a pool.  This is
OK, because at this point we only send (i.e. encode) olocs and never
have to receive (i.e. decode) them.

This makes keeping a copy of ceph_file_layout in every osd request
unnecessary, so ceph_osd_request::r_file_layout field is nuked.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>


01 Jan, 2014 10 commits

rbd: tear down watch request if rbd_dev_device_setup() fails · e37180c0

Committed by Ilya Dryomov on Dec 16, 2013

Tear down watch request if rbd_dev_device_setup() fails.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>


rbd: introduce rbd_dev_header_unwatch_sync() and switch to it · fca27065

Committed by Ilya Dryomov on Dec 16, 2013

Rename rbd_dev_header_watch_sync() to __rbd_dev_header_watch_sync() and
introduce two helpers: rbd_dev_header_{,un}watch_sync() to make it more
clear what is going on.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
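
The refactoring pattern named in this message (one shared worker, two thin wrappers with self-describing names) might look roughly like the sketch below. Everything here is an assumption made for illustration: the `_sketch` names, the boolean `start` parameter on the worker, the toy device state, and the error message are hypothetical and are not taken from the driver. The worker is also named without the leading double underscore, because such identifiers are reserved in userspace C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, minimal device state for illustration only. */
struct rbd_device_sketch {
	bool watching;
};

/*
 * Shared worker (the role played by the "__"-prefixed function in the
 * commit): start == true sets the watch up, start == false tears it down.
 * The boolean parameter is an assumption of this sketch.
 */
static int do_header_watch_sync_sketch(struct rbd_device_sketch *dev,
					bool start)
{
	assert(dev != NULL);
	if (dev->watching == start)
		return -1;  /* already in the requested state */
	dev->watching = start;
	return 0;
}

/* Wrapper whose name says what the call does: set up the watch. */
int header_watch_sync_sketch(struct rbd_device_sketch *dev)
{
	return do_header_watch_sync_sketch(dev, true);
}

/* Wrapper for teardown; a failure is reported rather than returned. */
void header_unwatch_sync_sketch(struct rbd_device_sketch *dev)
{
	int ret = do_header_watch_sync_sketch(dev, false);

	if (ret)
		fprintf(stderr, "unable to tear down watch request: %d\n", ret);
}
```

Call sites then read as `header_watch_sync_sketch(dev)` and `header_unwatch_sync_sketch(dev)` instead of passing a bare `true` or `false` whose meaning has to be looked up.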


rbd: enable extended devt in single-major mode · 7e513d43

Committed by Ilya Dryomov on Dec 16, 2013

If single-major device number allocation scheme is turned on, instead
of reserving 256 minors per device, which imposes a limit of 4096
images mapped at once, reserve 16 minors per device and enable extended
devt feature.  This results in a theoretical limit of 65536 images
mapped at once, and still allows more than 15 partitions:
partitions starting with the 16th are mapped under major 259 (Block
Extended Major):

$ rbd showmapped
id pool image snap device
0  rbd  b5    -    /dev/rbd0    # no partitions
1  rbd  b2    -    /dev/rbd1    # 40 partitions
2  rbd  b3    -    /dev/rbd2    #  2 partitions

$ cat /proc/partitions
 251        0       1024 rbd0
 251       16       1024 rbd1
 251       17          0 rbd1p1
 251       18          0 rbd1p2
 ...
 251       30          0 rbd1p14
 251       31          0 rbd1p15
 259        0          0 rbd1p16
 259        1          0 rbd1p17
 ...
 259       23          0 rbd1p39
 259       24          0 rbd1p40
 251       32       1024 rbd2
 251       33          0 rbd2p1
 251       34          0 rbd2p2

(major 251 was assigned dynamically at module load time)
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
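
The arithmetic behind these limits can be checked with a short sketch. It assumes the 20-bit minor number space of the Linux dev_t, which is consistent with the 4096 and 65536 figures quoted in the message; the `_sketch` names and the shift constant are hypothetical and chosen only for this example.

```c
#include <assert.h>

/* Minor numbers occupy 20 bits of a Linux dev_t. */
#define MINORBITS_SKETCH 20

/* Assumed shift for illustration: 2^4 = 16 minors reserved per device. */
#define PART_SHIFT_SKETCH 4

/* First minor reserved for device `dev_id` with 2^shift minors per device. */
unsigned int dev_id_to_minor_sketch(unsigned int dev_id, unsigned int shift)
{
	return dev_id << shift;
}

/* Device id that owns a given minor. */
unsigned int minor_to_dev_id_sketch(unsigned int minor, unsigned int shift)
{
	return minor >> shift;
}

/* Partition number within the device's block; 0 is the whole device. */
unsigned int minor_to_partition_sketch(unsigned int minor, unsigned int shift)
{
	return minor & ((1U << shift) - 1);
}

/* How many devices fit into the minor space under one major. */
unsigned long max_devices_sketch(unsigned int shift)
{
	assert(shift <= MINORBITS_SKETCH);
	return 1UL << (MINORBITS_SKETCH - shift);
}
```

With a shift of 8 (256 minors per device) `max_devices_sketch()` gives 4096, and with a shift of 4 it gives 65536. Device ids 1 and 2 start at minors 16 and 32, and minor 31 decodes to device 1, partition 15, matching the rbd1, rbd2 and rbd1p15 rows in the listing above. Partitions from the 16th onward do not fit in the 16-minor block, so they get numbers under the extended major instead and are not covered by this arithmetic.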


rbd: add support for single-major device number allocation scheme · 9b60e70b