提交 · d2d1f17a0dad823a4cb71583433d26cd7f734e08 · openeuler / Kernel

27 6月, 2013 1 次提交

rbd: send snapshot context with writes · d2d1f17a

由 Josh Durgin 提交于 6月 26, 2013

Sending the right snapshot context with each write is required for
snapshots to work. Due to the ordering of calls, the snapshot context
is never set for any requests. This causes writes to the current
version of the image to be reflected in all snapshots, which are
supposed to be read-only.

This happens because rbd_osd_req_format_write() sets the snapshot
context based on obj_request->img_request. At this point, however,
obj_request->img_request has not been set yet, to the snapshot context
is set to NULL. Fix this by moving rbd_img_obj_request_add(), which
sets obj_request->img_request, before the osd request formatting
calls.

This resolves:
    http://tracker.ceph.com/issues/5465Reported-by: NKarol Jurak <karol.jurak@gmail.com>
Signed-off-by: NJosh Durgin <josh.durgin@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

d2d1f17a

26 6月, 2013 1 次提交

rbd: fetch object order before using it · 1617e40c

由 Josh Durgin 提交于 6月 12, 2013

rbd_dev_v2_header_onetime() fetches striping information, and
checks whether the image can be read by compariing the stripe unit
to the object size. It determines the object size by shifting
the object order, which is 0 at this point since it has not been
read yet. Move the call to get the image size and object order
before rbd_dev_v2_header_onetime() so it is set before use.
Signed-off-by: NJosh Durgin <josh.durgin@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

1617e40c

13 6月, 2013 1 次提交

rbd: use the correct length for format 2 object names · 3a96d5cd

由 Josh Durgin 提交于 6月 12, 2013

Format 2 objects use 16 characters for the object name suffix to be
able to express the full 64-bit range of object numbers. Format 1
images only use 12 characters for this. Using 12-character names for
format 2 caused userspace and kernel rbd clients to read differently
named objects, which made an image written by one client look empty to
the other client.

CC: stable@vger.kernel.org  # 3.9+
Reported-by: NChris Dunlop <chris@onthe.net.au>
Signed-off-by: NJosh Durgin <josh.durgin@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

3a96d5cd

18 5月, 2013 5 次提交

rbd: fix cleanup in rbd_add() · 3abef3b3

由 Alex Elder 提交于 5月 13, 2013

Bjorn Helgaas pointed out that a recent commit introduced a
use-after-free condition in an error path for rbd_add().
He correctly stated:

    I think b536f69a "rbd: set up devices only for mapped images"
    introduced a use-after-free error in rbd_add():
	...
    If rbd_dev_device_setup() returns an error, we call
    rbd_dev_image_release(), which ultimately kfrees rbd_dev.
    Then we call rbd_dev_destroy(), which references fields in
    the already-freed rbd_dev struct before kfreeing it again.

The simple fix is to return the error code after the call to
rbd_dev_image_release().

Closer examination revealed that there's no need to clean up
rbd_opts in that function, so fix that too.

Update some other comments that have also become out of date.
Reported-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

3abef3b3

rbd: don't destroy ceph_opts in rbd_add() · 7262cfca

由 Alex Elder 提交于 5月 16, 2013

Whether rbd_client_create() successfully creates a new client or
not, it takes responsibility for getting the ceph_opts structure
it's passed destroyed.  If successful, the structure becomes
associated with the created client; if not, rbd_client_create()
will destroy it.

Previously, rbd_get_client() would call ceph_destroy_options()
if rbd_get_client() failed, and that meant it got called twice.
That led freeing various pointers more than once, which is never a
good idea.

This resolves:
    http://tracker.ceph.com/issues/4559

Cc: stable@vger.kernel.org # 3.8+
Reported-by: NDan van der Ster <dan@vanderster.com>
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

7262cfca

ceph: ceph_pagelist_append might sleep while atomic · 39be95e9

由 Jim Schutt 提交于 5月 15, 2013

Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
    . . .
[13439.376225] Call Trace:
[13439.378757]  [<ffffffff81076f4c>] __might_sleep+0xfc/0x110
[13439.384353]  [<ffffffffa03f4ce0>] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [<ffffffffa0448fe9>] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [<ffffffff814ee849>] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [<ffffffff811cadf5>] ? lock_flocks+0x15/0x20
[13439.409277]  [<ffffffffa045e2af>] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [<ffffffff81196748>] ? igrab+0x28/0x70
[13439.420610]  [<ffffffffa045e9f8>] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [<ffffffffa045ea25>] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [<ffffffffa045de90>] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [<ffffffffa0462888>] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [<ffffffffa0464542>] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [<ffffffffa0462e42>] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [<ffffffffa04631ad>] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [<ffffffff814ebc7e>] ? mutex_unlock+0xe/0x10
[13439.473934]  [<ffffffffa043c612>] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [<ffffffffa03f6c2c>] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [<ffffffffa03eec3d>] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [<ffffffffa03f1498>] ? read_partial_message+0x3e8/0x520 [libceph]
    . . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[elder@inktank.com: abbreviated the stack info a bit.]

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: NJim Schutt <jaschut@sandia.gov>
Reviewed-by: NAlex Elder <elder@inktank.com>

39be95e9

ceph: add cpu_to_le32() calls when encoding a reconnect capability · c420276a

由 Jim Schutt 提交于 5月 15, 2013

In his review, Alex Elder mentioned that he hadn't checked that
num_fcntl_locks and num_flock_locks were properly decoded on the
server side, from a le32 over-the-wire type to a cpu type.
I checked, and AFAICS it is done; those interested can consult
    Locker::_do_cap_update()
in src/mds/Locker.cc and src/include/encoding.h in the Ceph server
code (git://github.com/ceph/ceph).

I also checked the server side for flock_len decoding, and I believe
that also happens correctly, by virtue of having been declared
__le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h.

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: NJim Schutt <jaschut@sandia.gov>
Reviewed-by: NAlex Elder <elder@inktank.com>

c420276a

libceph: must hold mutex for reset_changed_osds() · 14d2f38d

由 Alex Elder 提交于 5月 15, 2013

An osd client has a red-black tree describing its osds, and
occasionally we would get crashes due to one of these trees tree
becoming corrupt somehow.

The problem turned out to be that reset_changed_osds() was being
called without protection of the osd client request mutex.  That
function would call __reset_osd() for any osd that had changed, and
__reset_osd() would call __remove_osd() for any osd with no
outstanding requests, and finally __remove_osd() would remove the
corresponding entry from the red-black tree.  Thus, the tree was
getting modified without having any lock protection, and was
vulnerable to problems due to concurrent updates.

This appears to be the only osd tree updating path that has this
problem.  It can be fairly easily fixed by moving the call up
a few lines, to just before the request mutex gets dropped
in kick_requests().

This resolves:
    http://tracker.ceph.com/issues/5043

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

14d2f38d

14 5月, 2013 13 次提交

rbd: re-submit flattened write request (part 2) · 638f5abe

由 Alex Elder 提交于 5月 06, 2013

Add code to rbd_img_obj_exists_callback() to detect when a clone's
parent image has disappeared, and re-submit the original write
request in that case.

Kill off some redundant assertions.

This completes the resolution for:
    http://tracker.ceph.com/issues/3763Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

638f5abe

rbd: re-submit write request for flattened clone · bbea1c1a

由 Alex Elder 提交于 5月 06, 2013

Add code to rbd_img_parent_read_full_callback() to detect when a
clone's parent image has disappeared, and re-submit the original
write request in that case.  (See the previous commit for more
reasoning about why this is appropriate.)

Rename some variables in rbd_img_obj_parent_read_full_callback()
to match the convention used in the previous patch.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

bbea1c1a

rbd: re-submit read request for flattened clone · 02c74fba

由 Alex Elder 提交于 5月 06, 2013

If a clone image gets flattened while a parent read request is
underway, the original rbd object request needs to be resubmitted.

The reason is that by the time we get the response to the parent
read request, the data read from the parent may be out of date.
In other words, we could see this sequence of events:

    rbd client                      parent image/osd
    ----------                      ----------------
    original object ENOENT;
        issue parent read
                                    respond to parent read
                                    child image flattened
    original image header refresh
             <--- original object written independently here
    parent read response received

Add code to rbd_img_parent_read_callback() to detect when a clone's
parent image has disappeared (as evidenced by its parent overlap
becoming 0), and re-submit the original read request in that case.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

02c74fba

rbd: detect when clone image is flattened · 392a9dad

由 Alex Elder 提交于 5月 06, 2013

A format 2 clone image can be the subject of a "flatten" operation,
during which all of its data gets "copied up" from its parent image,
leaving the image fully populated.  Once this is complete, the
clone's association with the parent is abolished.

Since this can occur when a clone is mapped, we need to detect when
it has occurred and handle it accordingly.  We know an image has
been flattened when we know it at one time had a parent, but we have
learned (via a "get_parent" object class method call) it no longer
has one.

There might be in-flight requests at the point we learn an image has
been flattened, so we can't simply clean up parent data structures
right away.  Instead, we'll drop the initial parent reference when
the parent has disappeared (rather than when the image gets
destroyed), which will allow the last in-flight reference to clean
things up when it's complete.

We leverage the fact that a zero parent overlap renders an image
effectively unlayered.  We set the overlap to 0 at the point we
detect the clone image has flattened, which allows the unlayered
behavior to take effect immediately, while keeping other parent
structures in place until in-flight requests to complete.

This and the next few patches resolve:
    http://tracker.ceph.com/issues/3763Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

392a9dad

rbd: reference count parent requests · a2acd00e

由 Alex Elder 提交于 5月 08, 2013

Keep a reference count for uses of the parent information for an rbd
device.

An initial reference is set in rbd_img_request_create() if the
target image has a parent (with non-zero overlap).  Each image
request for an image with a non-zero parent overlap gets another
reference when it's created, and that reference is dropped when the
request is destroyed.

The initial reference is dropped when the image gets torn down.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

a2acd00e

rbd: define parent image request routines · e93f3152

由 Alex Elder 提交于 5月 08, 2013

Define rbd_parent_request_create() and rbd_parent_request_destroy()
to handle the creation of parent image requests submitted for
layered image objects.  For simplicity, let rbd_img_request_put()
handle dropping the reference to any image request (parent or not),
and call whichever destructor is appropriate on the last put.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

e93f3152

rbd: define rbd_dev_unparent() · fb65d228

由 Alex Elder 提交于 5月 08, 2013

Define rbd_dev_unparent() to encapsulate cleaning up parent data
structures from a layered rbd image.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

fb65d228

rbd: don't release write request until necessary · 8785b1d4

由 Alex Elder 提交于 5月 09, 2013

Previously when a layered write was going to involve a copyup
request, the original osd request was released before submitting the
parent full-object read.  The osd request for the copyup would then
be allocated in rbd_img_obj_parent_read_full_callback().

Shortly we will be handling the event of mapped layered images
getting flattened, and when that occurs we need to resubmit the
original request.  We therefore don't want to release the osd
request until we really konw we're going to replace it--in the
callback function.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

8785b1d4

rbd: get parent info on refresh · 642a2537

由 Alex Elder 提交于 5月 06, 2013

Get parent info for format 2 images on every refresh (rather than
just during the initial probe).  This will be needed to detect the
disappearance of the parent image in the event a mapped image
becomes unlayered (i.e., flattened).  Avoid leaking the previous
parent spec on the second and subsequent times this information is
requested by dropping the previous one (if any) before updating it.
(Also, extract the pool id into a local variable before assigning
it into the parent spec.)

Switch to using a non-zero parent overlap value rather than the
existence of a parent (a non-null parent_spec pointer) to determine
whether to mark a request layered.  It will soon be possible for
a layered image to become unlayered while a request is in flight.

This means that the layered flag for an image request indicates that
there was a non-zero parent overlap at the time the image request
was created.  The parent overlap can change thereafter, which may
lead to special handling at request submission or completion time.

This and the next several patches are related to:
    http://tracker.ceph.com/issues/3763

NOTE:
If an error occurs while refreshing the parent info (i.e.,
requesting it after initial probe), the old parent info will
persist.  This is not really correct, and is a scenario that needs
to be addressed.  For now we'll assert that the failure mode is
unlikely, but the issue has been documented in tracker issue 5040.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

642a2537

rbd: ignore zero-overlap parent · 70cf49cf

由 Alex Elder 提交于 5月 06, 2013

An rbd clone image that has an overlap with its parent of 0 is
effectively not a layered image at all.  Detect this case and treat
such an image as non-layered.  Issue a warning to be sure the user
knows what's going on.

This resolves:
    http://tracker.ceph.com/issues/5028Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

70cf49cf

rbd: support reading parent page data for writes · b91f09f1

由 Alex Elder 提交于 5月 10, 2013

Currently, rbd_img_obj_parent_read_full() assumes the incoming
object request contains bio data.  But if a layered image is part of
a multi-layer stack of images it will result in read requests of
page data to parent images.

This is handling the same kind of issue as was resolved by this
commit:
    5b2ab72d  rbd: support reading parent page data

This resolves:
    http://tracker.ceph.com/issues/5027Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

b91f09f1

rbd: fix parent request size assumption · ebda6408

由 Alex Elder 提交于 5月 10, 2013

The code that reads object data from the parent for a copyup on
write request currently assumes that the size of that request is the
size of a "full" object from the original target image.

That is not necessarily the case.  The parent overlap could reduce
the request size below that.  To fix that assumption we need to
record the number of pages in the copyup_pages array, for both an
image request and an object request.  Rename a local variable in
rbd_img_obj_parent_read_full_callback() to reflect we're recording
the length of the parent read request, not the size of the target
object.

This resolves:
    http://tracker.ceph.com/issues/5038Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

ebda6408

libceph: init sent and completed when starting · c10ebbf5

由 Alex Elder 提交于 5月 09, 2013

The rbd code has a need to be able to restart an osd request that
has already been started and completed once before.  This currently
wouldn't work right because the osd client code assumes an osd
request will be started exactly once  Certain fields in a request
are never cleared and this leads to trouble if you try to reuse it.

Specifically, the r_sent, r_got_reply, and r_completed fields are
never cleared.  The r_sent field records the osd incarnation at the
time the request was sent to that osd.  If that's non-zero, the
message won't get re-mapped to a target osd properly, and won't be
put on the unsafe requests list the first time it's sent as it
should.  The r_got_reply field is used in handle_reply() to ensure
the reply to a request is processed only once.  And the r_completed
field is used for lingering requests to avoid calling the callback
function every time the osd client re-sends the request on behalf of
its initiator.

Each osd request passes through ceph_osdc_start_request() when
responsibility for the request is handed over to the osd client for
completion.  We can safely zero these three fields there each time a
request gets started.

One last related change--clear the r_linger flag when a request
is no longer registered as a linger request.

This resolves:
    http://tracker.ceph.com/issues/5026Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

c10ebbf5

09 5月, 2013 12 次提交

rbd: kill rbd_img_request_get() · c48f3f86

由 Alex Elder 提交于 5月 08, 2013

Get rid of rbd_img_request_get(), because it isn't used, and maybe
won't ever be needed.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

c48f3f86

rbd: only set up watch for mapped images · 1f3ef788

由 Alex Elder 提交于 5月 06, 2013

Any changes to parent images are immaterial to any mapped clone.
So there is no need to have a watch event registered on header
objects except for the header object of an image that is mapped.
In fact, a watch request is a write operation, and we may only
have read access to a parent image.

We can't set up the watch request until we know the name of the
header object though.  So pass a flag to rbd_dev_image_probe() to
indicate whether this probe is for a mapping or for a parent image.

Change the second parameter to rbd_dev_header_watch_sync() be
Boolean while we're at it.

This resolves:
    http://tracker.ceph.com/issues/4941Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

1f3ef788

rbd: set mapping read-only flag in rbd_add() · 7ce4eef7

由 Alex Elder 提交于 5月 06, 2013

The rbd_dev->mapping field for a parent image is not meaningful.
Since rbd_image_probe() is used both for images being mapped and
their parents, it doesn't make sense to set that flag in that
function.

So move the setting of the mapping.read_only flag out of
rbd_dev_image_probe() and into rbd_add() instead.

This resolves:
    http://tracker.ceph.com/issues/4940Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

7ce4eef7

rbd: support reading parent page data · 5b2ab72d

由 Alex Elder 提交于 5月 06, 2013

Currently, rbd_img_parent_read() assumes the incoming object request
contains bio data.  But if a layered image is part of a multi-layer
stack of images it will result in read requests of page data to parent
images.

Fortunately, it's not hard to add support for page data.

This resolves:
    http://tracker.ceph.com/issues/4939Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

5b2ab72d

rbd: fix an incorrect assertion condition · 91c6febb

由 Alex Elder 提交于 5月 06, 2013

In rbd_img_obj_parent_read_full_callback() there is an assertion
intended to verify the size of the image request for a full parent
read was the size of the original request's target object.  But
assertion was looking at the parent image order rather than the
original one, and these values can differ.

Fix that.

This resolves:
    http://tracker.ceph.com/issues/4938Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

91c6febb

rbd: define rbd_dev_v2_header_info() · 2df3fac7

由 Alex Elder 提交于 5月 06, 2013

This rearranges rbd_dev_v2_refresh() so it works more like
rbd_dev_v1_header_info().  While format 1 images need to read the
whole header object to get any information, format 2 can collect
almost all information selectively.  So the one-time initialization
will remain in a separate function--based on rbd_dev_v2_probe().

Rename rbd_dev_v2_refresh() to be rbd_dev_v2_header_info(), and have
it call rbd_dev_v2_header_onetime() if it's being called for the
first time for the given rbd device.

Rename rbd_dev_v2_probe() to be rbd_dev_v2_header_onetime() and
remove the image size and snapshot context calls it held in
common with the refresh function.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

2df3fac7

rbd: get rid of trivial v1 header wrappers · 99a41ebc

由 Alex Elder 提交于 5月 06, 2013

Get rid of the trivial wrapper functions rbd_dev_v1_refresh() and
rbd_dev_v1_probe(), substituting rbd_dev_v1_header_read() calls
in their place.

Rename rbd_dev_v1_header_read() to be rbd_dev_v1_header_info(), to
be more generic (it will better reflect what happens with format 2
images).
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

99a41ebc

rbd: simplify rbd_dev_v1_probe() · 30d60ba2

由 Alex Elder 提交于 5月 06, 2013

An rbd_dev structure's fields are all zero-filled for an initial
probe, so there's no need to explicitly zero the parent_spec
and parent_overlap fields in rbd_dev_v1_probe().  Removing these
assignments makes rbd_dev_v1_probe() *almost* trivial.

Move the dout() message that announces discovery of an image into
rbd_dev_image_probe(), generalize to support images in either format
and only show it if an image is fully discovered.

This highlights that are some unnecessary cleanups in the error
path for rbd_dev_v1_probe(), so they can be removed.

Now rbd_dev_v1_probe() *is* a trivial wrapper function.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

30d60ba2

rbd: update in-core header directly · 662518b1

由 Alex Elder 提交于 5月 06, 2013

Now that rbd_header_from_disk() only fills in one-time fields once,
we can extend it slightly so it releases the other fields before
replacing their values.  This way there's no need to pass a
temporary buffer and then copy all the results in.  Just use the rbd
device header structure in rbd_header_from_disk() so its values get
updated directly.

Note that this means we need to take the header semaphore at the
point we update things.  So pass the rbd_dev rather than the address
of its header as its first argument to rbd_header_from_disk(), and
have it return an error code.

As a result, rbd_dev_v1_header_read() does all the work,
rbd_read_header() becomes unnecessary, and rbd_dev_v1_refresh()
becomes a very simple wrapper.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

662518b1

rbd: refactor rbd_header_from_disk() · bb23e37a

由 Alex Elder 提交于 5月 06, 2013

This rearranges rbd_header_from_disk so that it:
    - allocates the snapshot context right away
    - keeps results in local variables, not changing the passed-in
      header until it's known we'll succeed
    - does initialization of set-once fields in a header only if
      they have not already been set

The last point is moot at the moment, because rbd_read_header()
(the only caller) always supplies a zero-filled header buffer.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

bb23e37a

rbd: zero format 1 header structure earlier · 46578dcd

由 Alex Elder 提交于 5月 06, 2013

The passed-in header structure is zeroed in rbd_header_from_disk().
Instead, have the caller do it.  Note that there are two callers,
rbd_dev_v1_refresh() and rbd_dev_v1_probe().  The latter already has
a zeroed header structure so zeroing it isn't necessary there.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

46578dcd

rbd: set the mapping size and features later · f35a4dee

由 Alex Elder 提交于 5月 06, 2013

Defer setting the size and features fields of a mapped image until
after the Linux disk structure is set up.  Set the capacity of the
disk after that.

Rearrange the definition of rbd_image_header, separating the fields
that are set only once from those that can be updated.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

f35a4dee

08 5月, 2013 7 次提交

rbd: always set read-only flag in rbd_add() · 51344a38

由 Alex Elder 提交于 5月 06, 2013

Hold off setting the read-only flag in rbd_add() for an image being
mapped until we have successfully probed the image. At that point
we know whether it's a snapshot mapping or not, so we can set the
read-only flag in that one place rather than doing so (for
snapshots) in rbd_dev_mapping_set(). To do this, pass a flag to the
image probe routine indicating whether we want a read-only mapping.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

51344a38

rbd: kill rbd_dev_clear_mapping() · 6d80b130

由 Alex Elder 提交于 5月 06, 2013

This function is a duplicate of rbd_dev_mapping_clear(), and was
added by mistake.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

6d80b130

rbd: don't look up snapshot id in rbd_dev_mapping_set() · 8f4b7d98

由 Alex Elder 提交于 5月 06, 2013

Currently rbd_dev_mapping_set() looks up the snapshot id for the
snapshot whose name is found in the rbd device's spec structure.

That function gets called by rbd_dev_device_setup(), which is
called by rbd_add() *after* rbd_dev_image_probe().  If the
image probe succeeds, the rbd device's spec will already have
been updated to include names and ids for all fields.

Therefore there's no need to look up the snapshot id in
rbd_dev_mapping_set().
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

8f4b7d98

rbd: don't print warning if not mapping a parent · c734b796

由 Alex Elder 提交于 5月 06, 2013

The presence of the LAYERING bit in an rbd image's feature mask does
not guarantee the image actually has a parent image.  Currently that
bit is set only when a clone (i.e., image with a parent) is created,
but it is (currently) not cleared if that clone gets flattened back
into a "normal" image.  A "parent_id" query will leave the
parent_spec for the image being mapped a null pointer, but will not
return an error.

Currently, whenever an image with the LAYERED feature gets mapped, a
warning about the use of layered images gets printed.  But we don't
want to do this for a flattened image, so print the warning only
if we find there is a parent spec after the probe.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

c734b796

rbd: kill rbd_update_mapping_size() · 29334ba4

由 Alex Elder 提交于 5月 06, 2013

Since rbd_update_mapping_size() is now a trivial wrapper, just open
code it in its two callers.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

29334ba4

rbd: update capacity in rbd_dev_refresh() · 00a653e2

由 Alex Elder 提交于 5月 06, 2013

When a mapped image changes size, we change the capacity recorded
for the Linux disk associated with it, in rbd_update_mapping_size().
That function is called in two places--the format 1 and format 2
refresh routines.

There is no need to set the capacity while holding the header
semaphore.  Instead, do it in the common rbd_dev_refresh(), using
the logic that's already there to initiate disk revalidation.

Add handling in the request function, just in case a request
that exceeds the capacity of the device comes in (perhaps one
that was started before a refresh shrunk the device).
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

00a653e2

rbd: revalidate only for mapping size changes · e627db08

由 Alex Elder 提交于 5月 06, 2013

This commit:
    d98df63e rbd: revalidate_disk upon rbd resize
instituted a call to revalidate_disk() to notify interested parties
that a mapped image has changed size.  This works well, as long as
the the rbd device doesn't map a snapshot.

A snapshot will never change size.  However, the base image the
snapshot is associated with can, and it can do so while the snapshot
is mapped.

The problem is that the test for the size is looking at the size of
the base image, not the size of the mapped snapshot.  This patch
corrects that.

Update the warning message shown in the event of error, and move
it into the callers.

This resolves:
    http://tracker.ceph.com/issues/4911Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

e627db08

openeuler / Kernel 12 个月 前同步成功

openeuler / Kernel
12 个月前同步成功