Commits · 9b60e70b3b6a8e4bc2d1b6d9f858a30e1cec496b · openeuler / Kernel

01 Jan, 2014 · 7 commits

rbd: add support for single-major device number allocation scheme · 9b60e70b

Committed by Ilya Dryomov on Dec 13, 2013

Currently each rbd device is allocated its own major number, which
leads to a hard limit of 230-250 images mapped at once.  This commit
adds support for a new single-major device number allocation scheme,
which is hidden behind a new single_major boolean module parameter and
is disabled by default for backwards compatibility reasons.  (Old
userspace cannot correctly unmap images mapped under single-major
scheme and would essentially just unmap a random image, if that.)

$ rbd showmapped
id pool image snap device
0  rbd  b100  -    /dev/rbd0
1  rbd  b101  -    /dev/rbd1
2  rbd  b102  -    /dev/rbd2
3  rbd  b103  -    /dev/rbd3

Old scheme (modprobe rbd):

$ ls -l /dev/rbd*
brw-rw---- 1 root disk 253, 0 Dec 10 12:24 /dev/rbd0
brw-rw---- 1 root disk 252, 0 Dec 10 12:28 /dev/rbd1
brw-rw---- 1 root disk 252, 1 Dec 10 12:28 /dev/rbd1p1
brw-rw---- 1 root disk 252, 2 Dec 10 12:28 /dev/rbd1p2
brw-rw---- 1 root disk 252, 3 Dec 10 12:28 /dev/rbd1p3
brw-rw---- 1 root disk 251, 0 Dec 10 12:28 /dev/rbd2
brw-rw---- 1 root disk 251, 1 Dec 10 12:28 /dev/rbd2p1
brw-rw---- 1 root disk 250, 0 Dec 10 12:24 /dev/rbd3

New scheme (modprobe rbd single_major=Y):

$ ls -l /dev/rbd*
brw-rw---- 1 root disk 253,   0 Dec 10 12:30 /dev/rbd0
brw-rw---- 1 root disk 253, 256 Dec 10 12:30 /dev/rbd1
brw-rw---- 1 root disk 253, 257 Dec 10 12:30 /dev/rbd1p1
brw-rw---- 1 root disk 253, 258 Dec 10 12:30 /dev/rbd1p2
brw-rw---- 1 root disk 253, 259 Dec 10 12:30 /dev/rbd1p3
brw-rw---- 1 root disk 253, 512 Dec 10 12:30 /dev/rbd2
brw-rw---- 1 root disk 253, 513 Dec 10 12:30 /dev/rbd2p1
brw-rw---- 1 root disk 253, 768 Dec 10 12:30 /dev/rbd3

(major 253 was assigned dynamically at module load time)

The new limit is 4096 images mapped at once, and it comes from the fact
that, as before, 256 minor numbers are reserved for each mapping.
(A follow-up commit changes the number of minors reserved and the way
we deal with partitions over that number.)

If single_major is set to true, two new sysfs interfaces show up:
/sys/bus/rbd/{add,remove}_single_major.  These are to be used instead
of /sys/bus/rbd/{add,remove}, which are disabled for backwards
compatibility reasons outlined above.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
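The minor numbers in the single-major listing follow from simple shifts. The sketch below is a userspace illustration, not the driver's code: the 20-bit minor width matches the kernel's MINORBITS, but the macro and function names are hypothetical and chosen only for this example.

```c
/* Linux stores the minor number in 20 bits (MINORBITS in <linux/kdev_t.h>). */
#define MINOR_BITS 20
/* Hypothetical name: 256 minors (1 << 8) are reserved per mapping here. */
#define MINORS_PER_IMAGE_SHIFT 8

/* First minor (the whole-disk node) that belongs to a given rbd id. */
static inline unsigned int rbd_id_to_minor(unsigned int dev_id)
{
    return dev_id << MINORS_PER_IMAGE_SHIFT;
}

/* Inverse mapping: which rbd id owns a whole-disk or partition minor. */
static inline unsigned int rbd_minor_to_id(unsigned int minor)
{
    return minor >> MINORS_PER_IMAGE_SHIFT;
}

/* Number of images that fit under a single major. */
static inline unsigned int rbd_max_images(void)
{
    return 1U << (MINOR_BITS - MINORS_PER_IMAGE_SHIFT);
}
```

With these definitions, id 1 maps to minor 256 (as for /dev/rbd1 in the listing), minor 259 (/dev/rbd1p3) maps back to id 1, and the limit works out to 4096 images.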

rbd: wire up is_visible() sysfs callback for rbd bus · 92c76dc0

Committed by Ilya Dryomov on Dec 13, 2013

In preparation for single-major device number allocation scheme, wire
up attribute_group::is_visible() callback for rbd bus.  This allows us
to make the new single-major attributes conditional.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: add 'minor' sysfs rbd device attribute · dd82fff1

Committed by Ilya Dryomov on Dec 13, 2013

Introduce /sys/bus/rbd/devices/<id>/minor sysfs attribute for exporting
rbd whole disk minor numbers.  This is a step towards single-major
device number allocation scheme, but also a good thing on its own.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: switch to ida for rbd id assignments · f8a22fc2

Committed by Ilya Dryomov on Dec 13, 2013

Currently rbd ids are allocated using an atomic variable that keeps
track of the highest id currently in use and each new id is simply one
more than the value of that variable.  That's nice and cheap, but it
does mean that rbd ids are allowed to grow boundlessly, and, more
importantly, it's completely unpredictable.  So, in preparation for
single-major device number allocation scheme, which is going to
establish and rely on a constant mapping between rbd ids and device
numbers, switch to ida for rbd id assignments.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
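The property that matters for the id switch is that freed ids get reused, lowest first, so ids stay small and predictable. The sketch below is a userspace stand-in for that behavior using a plain array; it does not use the kernel's ida API, and all names and the 64-id bound are assumptions made for illustration.

```c
#include <stdbool.h>

#define MAX_IDS 64 /* arbitrary bound for this sketch */

static bool id_in_use[MAX_IDS];

/* Return the lowest free id, or -1 if every id is taken. */
static inline int id_alloc(void)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (!id_in_use[i]) {
            id_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Release an id so that a later allocation can hand it out again. */
static inline void id_free(int id)
{
    if (id >= 0 && id < MAX_IDS)
        id_in_use[id] = false;
}
```

After allocating 0, 1 and 2 and freeing 1, the next allocation returns 1 again, whereas a "highest id plus one" counter would return 3 and keep growing.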

rbd: refactor rbd_init() a bit · e1b4d96d

Committed by Ilya Dryomov on Dec 13, 2013

Refactor rbd_init() a bit to make it more clear what's going on.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: tweak "loaded" message and module description · 90da258b

Committed by Ilya Dryomov on Dec 13, 2013

Tweak "loaded" message, so that it looks like

[   30.184235] rbd: loaded

instead of

[   38.056564] rbd: loaded rbd (rados block device)

Also move (and slightly tweak) MODULE_DESCRIPTION so that all authors
are next to each other in modinfo output.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: rbd_device::dev_id is an int, format it as such · 70eebd20

Committed by Ilya Dryomov on Dec 13, 2013

rbd_device::dev_id is an int, format it as such.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

12 Sep, 2013 · 1 commit

block: replace strict_strtoul() with kstrtoul() · bb8e0e84

Committed by Jingoo Han on Sep 11, 2013

The use of strict_strtoul() is not preferred, because strict_strtoul() is
obsolete.  Thus, kstrtoul() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
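For readers unfamiliar with the strict parsing style that kstrtoul() provides, here is a rough userspace analogue built on strtoul(). It is an approximation written for illustration: the function name is hypothetical, and only the general contract (0 on success, -EINVAL on malformed input, -ERANGE on overflow, one trailing newline tolerated) is modeled.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse the whole string or fail, never half-parse. */
static inline int parse_ulong_strict(const char *s, unsigned int base,
                                     unsigned long *res)
{
    char *end;
    unsigned long val;

    /* strtoul() accepts leading whitespace and a minus sign; reject both. */
    if (*s == '\0' || isspace((unsigned char)*s) || *s == '-')
        return -EINVAL;

    errno = 0;
    val = strtoul(s, &end, (int)base);
    if (errno == ERANGE)
        return -ERANGE;
    if (end == s)
        return -EINVAL; /* no digits were consumed */

    /* Tolerate a single trailing newline, as sysfs writes often carry one. */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL; /* trailing garbage */

    *res = val;
    return 0;
}
```

The point of the strict style is that input such as "42abc" is an error rather than silently parsing as 42.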

10 Sep, 2013 · 5 commits

rbd: fix error handling from rbd_snap_name() · da6a6b63

Committed by Josh Durgin on Sep 04, 2013

rbd_snap_name() calls rbd_dev_v{1,2}_snap_name() depending on the
format of the image. The format 1 version returns NULL on error, which
is handled by the caller. The format 2 version returns an ERR_PTR,
which the caller of rbd_snap_name() does not expect.

Fortunately this is unlikely to occur in practice because
rbd_snap_id_by_name() is called before rbd_snap_name(). This would hit
similar errors to rbd_snap_name() (like the snapshot not existing) and
return early, so rbd_snap_name() would not hit an error unless the
snapshot was removed between the two calls or memory was exhausted.

Use an ERR_PTR in rbd_dev_v1_snap_name() so that the specific error
can be propagated, and it is consistent with rbd_dev_v2_snap_name().
Handle the ERR_PTR in the only rbd_snap_name() caller.
Suggested-by: Alex Elder <alex.elder@linaro.org>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
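The mismatch described in this commit is easier to see with the error-pointer convention spelled out. The following userspace sketch re-creates simplified versions of ERR_PTR(), PTR_ERR() and IS_ERR(); the snapshot table, snap_name() and snap_name_len() are hypothetical and exist only to show a caller that handles an ERR_PTR instead of NULL.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* The top MAX_ERRNO addresses are reserved for encoded errno values. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical snapshot table used only for this example. */
struct snap { uint64_t id; const char *name; };
static const struct snap snaps[] = { { 4, "monday" }, { 7, "tuesday" } };

/* Lookup that reports failure as an ERR_PTR, never as NULL. */
static inline const char *snap_name(uint64_t snap_id)
{
    for (size_t i = 0; i < sizeof(snaps) / sizeof(snaps[0]); i++)
        if (snaps[i].id == snap_id)
            return snaps[i].name;
    return ERR_PTR(-ENOENT);
}

/* Caller that propagates the specific error instead of testing for NULL. */
static inline int snap_name_len(uint64_t snap_id)
{
    const char *name = snap_name(snap_id);

    if (IS_ERR(name))
        return (int)PTR_ERR(name);
    return (int)strlen(name);
}
```

A caller that only checked for NULL would treat the encoded error as a valid string and dereference it, which is the class of bug the commit closes.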

rbd: ignore unmapped snapshots that no longer exist · efadc98a

Committed by Josh Durgin on Aug 29, 2013

This prevents erroring out while adding a device when a snapshot
unrelated to the current mapping is deleted between reading the
snapshot context and reading the snapshot names. If the mapped
snapshot name is not found, an error still occurs as usual.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: fix use-after free of rbd_dev->disk · 9875201e

Committed by Josh Durgin on Aug 29, 2013

Removing a device deallocates the disk, unschedules the watch, and
finally cleans up the rbd_dev structure. rbd_dev_refresh(), called
from the watch callback, updates the disk size and rbd_dev
structure. With no locking between them, rbd_dev_refresh() may use the
device or rbd_dev after they've been freed.

To fix this, check whether RBD_DEV_FLAG_REMOVING is set before
updating the disk size in rbd_dev_refresh(). In order to prevent a
race where rbd_dev_refresh() is already revalidating the disk when
rbd_remove() is called, move the call to rbd_bus_del_dev() after the
watch is unregistered and all notifies are complete. It's safe to
defer deleting this structure because no new requests can be submitted
once the RBD_DEV_FLAG_REMOVING is set, since the device cannot be
opened.

Fixes: http://tracker.ceph.com/issues/5636
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: make rbd_obj_notify_ack() synchronous · 20e0af67

Committed by Josh Durgin on Aug 29, 2013

The only user of rbd_obj_notify_ack() is rbd_watch_cb(). It is used
asynchronously with no tracking of when the notify ack completes, so
it may still be in progress when the osd_client is shut down.  This
results in a BUG() since the osd client assumes no requests are in
flight when it stops. Since all notifies are flushed before the
osd_client is stopped, waiting for the notify ack to complete before
returning from the watch callback ensures there are no notify acks in
flight during shutdown.

Rename rbd_obj_notify_ack() to rbd_obj_notify_ack_sync() to reflect
its new synchronous nature.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: complete notifies before cleaning up osd_client and rbd_dev · 9abc5990

Committed by Josh Durgin on Aug 29, 2013

To ensure rbd_dev is not used after it's released, flush all pending
notify callbacks before calling rbd_dev_image_release(). No new
notifies can be added to the queue at this point because the watch has
already been unregistered with the osd_client.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

04 Sep, 2013 · 3 commits

rbd: fix null dereference in dout · c3545579

Committed by Josh Durgin on Aug 28, 2013

The order parameter is sometimes NULL in _rbd_dev_v2_snap_size(), but
the dout() always dereferences it. Move this to another dout() protected
by a check that order is non-NULL.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <alex.elder@linaro.org>

rbd: fix buffer size for writes to images with snapshots · 03507db6

Committed by Josh Durgin on Aug 27, 2013

rbd_osd_req_create() needs to know the snapshot context size to create
a buffer large enough to send it with the message front. It gets this
from the img_request, which was not set for the obj_request yet. This
resulted in trying to write past the end of the front payload, hitting
this BUG:

libceph: BUG_ON(p > msg->front.iov_base + msg->front.iov_len);

Fix this by associating the obj_request with its img_request
immediately after it's created, before the osd request is created.

Fixes: http://tracker.ceph.com/issues/5760
Suggested-by: Alex Elder <alex.elder@linaro.org>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <alex.elder@linaro.org>

rbd: fix I/O error propagation for reads · 17c1cc1d

Committed by Josh Durgin on Aug 26, 2013

When a request returns an error, the driver needs to report the entire
extent of the request as completed.  Writes already did this, since
they always set xferred = length, but reads were skipping that step if
an error other than -ENOENT occurred.  Instead, rbd would end up
passing 0 xferred to blk_end_request(), which would always report
needing more data.  This resulted in an assert failing when more data
was required by the block layer, but all the object requests were
done:

[ 1868.719077] rbd: obj_request read result -108 xferred 0
[ 1868.719077]
[ 1868.719518] end_request: I/O error, dev rbd1, sector 0
[ 1868.719739]
[ 1868.719739] Assertion failure in rbd_img_obj_callback() at line 1736:
[ 1868.719739]
[ 1868.719739]   rbd_assert(more ^ (which == img_request->obj_request_count));

Without this assert, reads that hit errors would hang forever, since
the block layer considered them incomplete.

Fixes: http://tracker.ceph.com/issues/5647
CC: stable@vger.kernel.org  # v3.10
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <alex.elder@linaro.org>
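The accounting rule from this commit can be sketched as a small completion helper. This is a simplified model under stated assumptions, not the driver's callback: the struct and function names are hypothetical, the zero-filling of data buffers is omitted, and only the result/xferred bookkeeping is shown.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, stripped-down view of a completed object read. */
struct obj_read {
    int result;       /* 0 or a negative errno */
    uint64_t xferred; /* bytes reported as transferred */
    uint64_t length;  /* bytes requested */
};

/* Normalize a read completion so the full extent is always accounted for. */
static inline void finish_read(struct obj_read *r)
{
    if (r->result == -ENOENT) {
        /* A missing object reads as zeros, so treat it as success. */
        r->result = 0;
        r->xferred = r->length;
    } else if (r->result < 0) {
        /* Any other error: keep the error, but report the whole extent. */
        r->xferred = r->length;
    } else if (r->xferred < r->length) {
        /* Short read: the remainder is treated as zero-filled. */
        r->xferred = r->length;
    }
}
```

With the -108 result from the log above, the request keeps its error code but reports the full length, so the block layer no longer waits for more data.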

28 Aug, 2013 · 1 commit

rbd: convert bus code to use bus_groups · b15a21dd

Committed by Greg Kroah-Hartman on Aug 23, 2013

The bus_attrs field of struct bus_type is going away soon; bus_groups
should be used instead.  This converts the RBD bus code to use the
correct field.

Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Acked-by: Alex Elder <elder@linaro.org>
Cc: <ceph-devel@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

10 Aug, 2013 · 1 commit

block: rbd: use NULL instead of 0 · a158073c

Committed by Jingoo Han on Aug 09, 2013

The local variables such as 'bio_list' and 'pages' are pointers;
thus, use NULL instead of 0 to fix the following sparse warnings.

drivers/block/rbd.c:2166:32: warning: Using plain integer as NULL pointer
drivers/block/rbd.c:2168:31: warning: Using plain integer as NULL pointer
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Reviewed-by: Sage Weil <sage@inktank.com>

04 Jul, 2013 · 9 commits

rbd: fix a couple warnings · e976cad0

Committed by Sage Weil on Jun 09, 2013

gcc isn't quite smart enough and generates these warnings:

drivers/block/rbd.c: In function 'rbd_img_request_fill':
drivers/block/rbd.c:1266:22: warning: 'bio_list' may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/block/rbd.c:2186:14: note: 'bio_list' was declared here
drivers/block/rbd.c:2247:10: warning: 'pages' may be used uninitialized in this function [-Wmaybe-uninitialized]

even though they are initialized for their respective code paths.
Signed-off-by: Sage Weil <sage@inktank.com>

rbd: take a little credit · d552c619

Committed by Alex Elder on May 31, 2013

Add a name to the list of authors.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: use rwsem to protect header updates · cfbf6377

Committed by Alex Elder on May 31, 2013

Updating an image header needs to be protected to ensure it's
done consistently.  However distinct headers can be updated
concurrently without a problem.  Instead of using the global
control lock to serialize header updates, just rely on the header
semaphore.  (It's already used, this just moves it out to cover
a broader section of the code.)

That leaves the control mutex protecting only the creation of rbd
clients, so rename it.

This resolves:
    http://tracker.ceph.com/issues/5222

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: don't hold ctl_mutex to get/put device · 1ba0f1e7

Committed by Alex Elder on May 31, 2013

When an rbd device is first getting mapped, its device registration
is protected by the control mutex.  There is no need to do that though,
because the device has already been assigned an id that's guaranteed
to be unique.

An unmap of an rbd device won't proceed if the device has a non-zero
open count or is already being unmapped.  So there's no need to hold
the control mutex in that case either.

Finally, an rbd device can't be opened if it is being removed, and
it won't go away if there is a non-zero open count.  So here too
there's no need to hold the control mutex while getting or putting a
reference to an rbd device's Linux device structure.

Drop the mutex calls in these cases.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: protect against concurrent unmaps · 82a442d2

Committed by Alex Elder on May 31, 2013

Make sure two concurrent unmap operations on the same rbd device
won't collide, by only proceeding with the removal and cleanup of a
device if it is not already underway.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
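The "only proceed if removal is not already underway" guard is a test-and-set pattern. The sketch below shows the idea in userspace with C11 atomics; it is an assumption-laden illustration (the struct and function names are hypothetical, and the open-count check that the driver also performs is left out).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device with a single "removal in progress" flag. */
struct hypothetical_rbd_dev {
    atomic_flag removing;
};

/*
 * Atomically mark the device as being removed.
 * Returns true for exactly one caller; every later caller gets false
 * and must back off instead of running the cleanup a second time.
 */
static inline bool begin_unmap(struct hypothetical_rbd_dev *dev)
{
    return !atomic_flag_test_and_set(&dev->removing);
}
```

Because the test and the set happen as one atomic step, two racing unmap requests cannot both observe the flag as clear.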

rbd: set removing flag while holding list lock · 751cc0e3

Committed by Alex Elder on May 31, 2013

When unmapping a device, its id is supplied, and that is used to
look up which rbd device should be unmapped.  Looking up the
device involves searching the rbd device list while holding
a spinlock that protects access to that list.

Currently all of this is done under protection of the control lock,
but that protection is going away soon.  To ensure the rbd_dev is
still valid (still on the list) while setting its REMOVING flag, do
so while still holding the list lock.  To do so, get rid of
__rbd_get_dev(), and open code what it did in the one place it
was used.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: protect against duplicate client creation · 08f75463

Committed by Alex Elder on May 29, 2013

If more than one rbd image has the same ceph cluster configuration
(same options, same set of monitors, same keys) they normally share
a single rbd client.

When an image is getting mapped, rbd looks to see if an existing
client can be used, and creates a new one if not.

The lookup and creation are not done under a common lock though, so
mapping two images concurrently could lead to duplicate clients
getting set up needlessly.  This isn't a major problem, but it's
wasteful and different from what's intended.

This patch fixes that by using the control mutex to protect
both the lookup and (if needed) creation of the client.  It
was previously used just when creating.

This resolves:
    http://tracker.ceph.com/issues/3094

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: clean up a few things in the refresh path · 3b5cf2a2

Committed by Alex Elder on May 29, 2013

This includes a few relatively small fixes I found while examining
the code that refreshes image information.

This resolves:
    http://tracker.ceph.com/issues/5040

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: flush dcache after zeroing page data · e2156054

Committed by Alex Elder on May 22, 2013

Neither zero_bio_chain() nor zero_pages() contains a call to flush
caches after zeroing a portion of a page.  This can cause problems
on architectures that have caches that allow virtual address
aliasing.

This resolves:
    http://tracker.ceph.com/issues/4777

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

02 Jul, 2013 · 2 commits

rbd: drop original request earlier for existence check · 912c317d

Committed by Alex Elder on May 13, 2013

The reference to the original request dropped at the end of
rbd_img_obj_exists_callback() corresponds to the reference taken
in rbd_img_obj_exists_submit() to account for the stat request
referring to it.  Move the put of that reference up right after
clearing that pointer to make its purpose more obvious.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: Use min_t() to fix comparison of distinct pointer types warning · 491205a8

Committed by Geert Uytterhoeven on May 13, 2013

drivers/block/rbd.c: In function ‘zero_pages’:
drivers/block/rbd.c:1102: warning: comparison of distinct pointer types lacks a cast

Remove the hackish casts and use min_t() to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Alex Elder <elder@inktank.com>
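To illustrate what min_t() buys here, the sketch below defines a simplified userspace version and uses it to clamp a byte count to the end of a page. The macro is deliberately reduced (the kernel's version uses typed temporaries to avoid evaluating arguments twice), and bytes_in_page() is a hypothetical helper, not the code from zero_pages().

```c
#include <stddef.h>

/* Simplified min_t(): cast both operands to one type, then compare. */
#define min_t(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))

/*
 * Hypothetical helper: how many bytes can be handled in the current page,
 * given the offset within the page and the bytes still remaining.
 * The operands have different types, which is what min_t() reconciles.
 */
static inline size_t bytes_in_page(size_t page_size, size_t offset,
                                   unsigned long long remaining)
{
    return min_t(size_t, page_size - offset, remaining);
}
```

Naming the comparison type once in min_t() replaces ad hoc casts on each operand, which is the cleanup the commit describes.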

27 Jun, 2013 · 1 commit

rbd: send snapshot context with writes · d2d1f17a

Committed by Josh Durgin on Jun 26, 2013

Sending the right snapshot context with each write is required for
snapshots to work. Due to the ordering of calls, the snapshot context
is never set for any requests. This causes writes to the current
version of the image to be reflected in all snapshots, which are
supposed to be read-only.

This happens because rbd_osd_req_format_write() sets the snapshot
context based on obj_request->img_request. At this point, however,
obj_request->img_request has not been set yet, to the snapshot context
is set to NULL. Fix this by moving rbd_img_obj_request_add(), which
sets obj_request->img_request, before the osd request formatting
calls.

This resolves:
    http://tracker.ceph.com/issues/5465

Reported-by: Karol Jurak <karol.jurak@gmail.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

26 Jun, 2013 · 1 commit

rbd: fetch object order before using it · 1617e40c

Committed by Josh Durgin on Jun 12, 2013

rbd_dev_v2_header_onetime() fetches striping information, and
checks whether the image can be read by comparing the stripe unit
to the object size. It determines the object size by shifting
the object order, which is 0 at this point since it has not been
read yet. Move the call to get the image size and object order
before rbd_dev_v2_header_onetime() so it is set before use.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
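The effect of using the object order before it has been read is easy to show numerically. In this userspace sketch the function names are hypothetical; it only models the "object size is 1 shifted by the order, then compared with the stripe unit" check described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Object size in bytes for a given order (order must be below 64). */
static inline uint64_t object_size(uint8_t obj_order)
{
    return (uint64_t)1 << obj_order;
}

/* Hypothetical check: the stripe unit must equal the object size. */
static inline bool stripe_unit_matches(uint8_t obj_order, uint64_t stripe_unit)
{
    return stripe_unit == object_size(obj_order);
}
```

With an order of 22 the object size is 4 MiB and a 4 MiB stripe unit passes the check. With the order still at its initial value of 0 the computed object size is 1 byte, so the same valid image is rejected.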

13 Jun, 2013 · 1 commit

rbd: use the correct length for format 2 object names · 3a96d5cd

Committed by Josh Durgin on Jun 12, 2013

Format 2 objects use 16 characters for the object name suffix to be
able to express the full 64-bit range of object numbers. Format 1
images only use 12 characters for this. Using 12-character names for
format 2 caused userspace and kernel rbd clients to read differently
named objects, which made an image written by one client look empty to
the other client.

CC: stable@vger.kernel.org  # 3.9+
Reported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
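The naming difference comes down to the width of the zero-padded hexadecimal suffix. The sketch below is a userspace illustration with a hypothetical function name; the object prefixes passed in by callers are example values, not taken from this commit.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build "<prefix>.<object number in hex>" with a 12-digit suffix for
 * format 1 images and a 16-digit suffix for format 2 images.
 * Returns what snprintf() returns.
 */
static inline int rbd_object_name(char *buf, size_t len, int image_format,
                                  const char *prefix, uint64_t objno)
{
    if (image_format == 2)
        return snprintf(buf, len, "%s.%016" PRIx64, prefix, objno);
    return snprintf(buf, len, "%s.%012" PRIx64, prefix, objno);
}
```

For the same prefix and object number the two formats produce names that differ by four padding zeros. A client that pads to the wrong width therefore reads and writes objects the other client never looks at.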

18 May, 2013 · 2 commits

rbd: fix cleanup in rbd_add() · 3abef3b3

Committed by Alex Elder on May 13, 2013

Bjorn Helgaas pointed out that a recent commit introduced a
use-after-free condition in an error path for rbd_add().
He correctly stated:

    I think b536f69a "rbd: set up devices only for mapped images"
    introduced a use-after-free error in rbd_add():
	...
    If rbd_dev_device_setup() returns an error, we call
    rbd_dev_image_release(), which ultimately kfrees rbd_dev.
    Then we call rbd_dev_destroy(), which references fields in
    the already-freed rbd_dev struct before kfreeing it again.

The simple fix is to return the error code after the call to
rbd_dev_image_release().

Closer examination revealed that there's no need to clean up
rbd_opts in that function, so fix that too.

Update some other comments that have also become out of date.
Reported-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: don't destroy ceph_opts in rbd_add() · 7262cfca

Committed by Alex Elder on May 16, 2013

Whether rbd_client_create() successfully creates a new client or
not, it takes responsibility for getting the ceph_opts structure
it's passed destroyed.  If successful, the structure becomes
associated with the created client; if not, rbd_client_create()
will destroy it.

Previously, rbd_get_client() would call ceph_destroy_options()
if rbd_get_client() failed, and that meant it got called twice.
That led to freeing various pointers more than once, which is never a
good idea.

This resolves:
    http://tracker.ceph.com/issues/4559

Cc: stable@vger.kernel.org # 3.8+
Reported-by: Dan van der Ster <dan@vanderster.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

14 May, 2013 · 6 commits

rbd: re-submit flattened write request (part 2) · 638f5abe

Committed by Alex Elder on May 06, 2013

Add code to rbd_img_obj_exists_callback() to detect when a clone's
parent image has disappeared, and re-submit the original write
request in that case.

Kill off some redundant assertions.

This completes the resolution for:
    http://tracker.ceph.com/issues/3763

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: re-submit write request for flattened clone · bbea1c1a

Committed by Alex Elder on May 06, 2013

Add code to rbd_img_parent_read_full_callback() to detect when a
clone's parent image has disappeared, and re-submit the original
write request in that case.  (See the previous commit for more
reasoning about why this is appropriate.)

Rename some variables in rbd_img_obj_parent_read_full_callback()
to match the convention used in the previous patch.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: re-submit read request for flattened clone · 02c74fba

Committed by Alex Elder on May 06, 2013

If a clone image gets flattened while a parent read request is
underway, the original rbd object request needs to be resubmitted.

The reason is that by the time we get the response to the parent
read request, the data read from the parent may be out of date.
In other words, we could see this sequence of events:

    rbd client                      parent image/osd
    ----------                      ----------------
    original object ENOENT;
        issue parent read
                                    respond to parent read
                                    child image flattened
    original image header refresh
             <--- original object written independently here
    parent read response received

Add code to rbd_img_parent_read_callback() to detect when a clone's
parent image has disappeared (as evidenced by its parent overlap
becoming 0), and re-submit the original read request in that case.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: detect when clone image is flattened · 392a9dad

Committed by Alex Elder on May 06, 2013

A format 2 clone image can be the subject of a "flatten" operation,
during which all of its data gets "copied up" from its parent image,
leaving the image fully populated.  Once this is complete, the
clone's association with the parent is abolished.

Since this can occur when a clone is mapped, we need to detect when
it has occurred and handle it accordingly.  We know an image has
been flattened when we know it at one time had a parent, but we have
learned (via a "get_parent" object class method call) it no longer
has one.

There might be in-flight requests at the point we learn an image has
been flattened, so we can't simply clean up parent data structures
right away.  Instead, we'll drop the initial parent reference when
the parent has disappeared (rather than when the image gets
destroyed), which will allow the last in-flight reference to clean
things up when it's complete.

We leverage the fact that a zero parent overlap renders an image
effectively unlayered.  We set the overlap to 0 at the point we
detect the clone image has flattened, which allows the unlayered
behavior to take effect immediately, while keeping other parent
structures in place until in-flight requests complete.

This and the next few patches resolve:
    http://tracker.ceph.com/issues/3763

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: reference count parent requests · a2acd00e

Committed by Alex Elder on May 08, 2013

Keep a reference count for uses of the parent information for an rbd
device.

An initial reference is set in rbd_img_request_create() if the
target image has a parent (with non-zero overlap).  Each image
request for an image with a non-zero parent overlap gets another
reference when it's created, and that reference is dropped when the
request is destroyed.

The initial reference is dropped when the image gets torn down.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: define parent image request routines · e93f3152

Committed by Alex Elder on May 08, 2013

Define rbd_parent_request_create() and rbd_parent_request_destroy()
to handle the creation of parent image requests submitted for
layered image objects.  For simplicity, let rbd_img_request_put()
handle dropping the reference to any image request (parent or not),
and call whichever destructor is appropriate on the last put.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
