提交 · e8afad656cbcd06d02a7bacd4b318fa0e2907de0 · openeuler / Kernel

18 1月, 2013 7 次提交

libceph: pass length to ceph_calc_file_object_mapping() · e8afad65

由 Alex Elder 提交于 11月 14, 2012

ceph_calc_file_object_mapping() takes (among other things) a "file"
offset and length, and based on the layout, determines the object
number ("bno") backing the affected portion of the file's data and
the offset into that object where the desired range begins.  It also
computes the size that should be used for the request--either the
amount requested or something less if that would exceed the end of
the object.

This patch changes the input length parameter in this function so it
is used only for input.  That is, the argument will be passed by
value rather than by address, so the value provided won't get
updated by the function.

The value would only get updated if the length would surpass the
current object, and in that case the value it got updated to would
be exactly that returned in *oxlen.

Only one of the two callers is affected by this change.  Update
ceph_calc_raw_layout() so it records any updated value.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

e8afad65

libceph: pass length to ceph_osdc_build_request() · 0120be3c

由 Alex Elder 提交于 11月 14, 2012

The len argument to ceph_osdc_build_request() is set up to be
passed by address, but that function never updates its value
so there's no need to do this.  Tighten up the interface by
passing the length directly.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

0120be3c

libceph: kill op_needs_trail() · 5b9d1b1c

由 Alex Elder 提交于 11月 13, 2012

Since every osd message is now prepared to include trailing data,
there's no need to check ahead of time whether any operations will
make use of the trail portion of the message.

We can drop the second argument to get_num_ops(), and as a result we
can also get rid of op_needs_trail() which is no longer used.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

5b9d1b1c

libceph: always allow trail in osd request · c885837f

由 Alex Elder 提交于 11月 13, 2012

An osd request structure contains an optional trail portion, which
if present will contain data to be passed in the payload portion of
the message containing the request.  The trail field is a
ceph_pagelist pointer, and if null it indicates there is no trail.

A ceph_pagelist structure contains a length field, and it can
legitimately hold value 0.  Make use of this to change the
interpretation of the "trail" of an osd request so that every osd
request has trailing data, it just might have length 0.

This means we change the r_trail field in a ceph_osd_request
structure from a pointer to a structure that is always initialized.

Note that in ceph_osdc_start_request(), the trail pointer (or now
address of that structure) is assigned to a ceph message's trail
field.  Here's why that's still OK (looking at net/ceph/messenger.c):
    - What would have resulted in a null pointer previously will now
      refer to a 0-length page list.  That message trail pointer
      is used in two functions, write_partial_msg_pages() and
      out_msg_pos_next().
    - In write_partial_msg_pages(), a null page list pointer is
      handled the same as a message with 0-length trail, and both
      result in a "in_trail" variable set to false.  The trail
      pointer is only used if in_trail is true.
    - The only other place the message trail pointer is used is
      out_msg_pos_next().  That function is only called by
      write_partial_msg_pages() and only touches the trail pointer
      if the in_trail value it is passed is true.
Therefore a null ceph_msg->trail pointer is equivalent to a non-null
pointer referring to a 0-length page list structure.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

c885837f

rbd: drop oid parameters from ceph_osdc_build_request() · af77f26c

由 Alex Elder 提交于 11月 09, 2012

The last two parameters to ceph_osd_build_request() describe the
object id, but the values passed always come from the osd request
structure whose address is also provided.  Get rid of those last
two parameters.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

af77f26c

libceph: reformat __reset_osd() · c3acb181

由 Alex Elder 提交于 12月 07, 2012

Reformat __reset_osd() into three distinct blocks of code
handling the three return cases.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

c3acb181

ceph: re-calculate truncate_size for strip object · a41bad1a

由 Yan, Zheng 提交于 11月 30, 2012

Otherwise osd may truncate the object to larger size.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: NSage Weil <sage@inktank.com>

a41bad1a

28 12月, 2012 2 次提交

libceph: always reset osds when kicking · e6d50f67

由 Alex Elder 提交于 12月 26, 2012

When ceph_osdc_handle_map() is called to process a new osd map,
kick_requests() is called to ensure all affected requests are
updated if necessary to reflect changes in the osd map.  This
happens in two cases:  whenever an incremental map update is
processed; and when a full map update (or the last one if there is
more than one) gets processed.

In the former case, the kick_requests() call is followed immediately
by a call to reset_changed_osds() to ensure any connections to osds
affected by the map change are reset.  But for full map updates
this isn't done.

Both cases should be doing this osd reset.

Rather than duplicating the reset_changed_osds() call, move it into
the end of kick_requests().
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

e6d50f67

libceph: move linger requests sooner in kick_requests() · ab60b16d

由 Alex Elder 提交于 12月 19, 2012

The kick_requests() function is called by ceph_osdc_handle_map()
when an osd map change has been indicated.  Its purpose is to
re-queue any request whose target osd is different from what it
was when it was originally sent.

It is structured as two loops, one for incomplete but registered
requests, and a second for handling completed linger requests.
As a special case, in the first loop if a request marked to linger
has not yet completed, it is moved from the request list to the
linger list.  This is as a quick and dirty way to have the second
loop handle sending the request along with all the other linger
requests.

Because of the way it's done now, however, this quick and dirty
solution can result in these incomplete linger requests never
getting re-sent as desired.  The problem lies in the fact that
the second loop only arranges for a linger request to be sent
if it appears its target osd has changed.  This is the proper
handling for *completed* linger requests (it avoids issuing
the same linger request twice to the same osd).

But although the linger requests added to the list in the first loop
may have been sent, they have not yet completed, so they need to be
re-sent regardless of whether their target osd has changed.

The first required fix is we need to avoid calling __map_request()
on any incomplete linger request.  Otherwise the subsequent
__map_request() call in the second loop will find the target osd
has not changed and will therefore not re-send the request.

Second, we need to be sure that a sent but incomplete linger request
gets re-sent.  If the target osd is the same with the new osd map as
it was when the request was originally sent, this won't happen.
This can be fixed through careful handling when we move these
requests from the request list to the linger list, by unregistering
the request *before* it is registered as a linger request.  This
works because a side-effect of unregistering the request is to make
the request's r_osd pointer be NULL, and *that* will ensure the
second loop actually re-sends the linger request.

Processing of such a request is done at that point, so continue with
the next one once it's been moved.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

ab60b16d

21 12月, 2012 4 次提交

libceph: register request before unregister linger · c89ce05e

由 Alex Elder 提交于 12月 06, 2012

In kick_requests(), we need to register the request before we
unregister the linger request.  Otherwise the unregister will
reset the request's osd pointer to NULL.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

c89ce05e

libceph: don't use rb_init_node() in ceph_osdc_alloc_request() · a978fa20

由 Alex Elder 提交于 12月 17, 2012

The red-black node in the ceph osd request structure is initialized
in ceph_osdc_alloc_request() using rbd_init_node().  We do need to
initialize this, because in __unregister_request() we call
RB_EMPTY_NODE(), which expects the node it's checking to have
been initialized.  But rb_init_node() is apparently overkill, and
may in fact be on its way out.  So use RB_CLEAR_NODE() instead.

For a little more background, see this commit:
    4c199a93 rbtree: empty nodes have no color"
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

a978fa20

libceph: init event->node in ceph_osdc_create_event() · 3ee5234d

由 Alex Elder 提交于 12月 17, 2012

The red-black node node in the ceph osd event structure is not
initialized in create_osdc_create_event().  Because this node can
be the subject of a RB_EMPTY_NODE() call later on, we should ensure
the node is initialized properly for that.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

3ee5234d

libceph: init osd->o_node in create_osd() · f407731d

由 Alex Elder 提交于 12月 06, 2012

The red-black node node in the ceph osd structure is not initialized
in create_osd().  Because this node can be the subject of a
RB_EMPTY_NODE() call later on, we should ensure the node is
initialized properly for that.  Add a call to RB_CLEAR_NODE()
initialize it.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

f407731d

18 12月, 2012 1 次提交

rbd: remove linger unconditionally · 61c74035

由 Alex Elder 提交于 12月 06, 2012

In __unregister_linger_request(), the request is being removed
from the osd client's req_linger list only when the request
has a non-null osd pointer.  It should be done whether or not
the request currently has an osd.

This is most likely a non-issue because I believe the request
will always have an osd when this function is called.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

61c74035

17 12月, 2012 2 次提交

libceph: avoid using freed osd in __kick_osd_requests() · 685a7555

由 Alex Elder 提交于 12月 07, 2012

If an osd has no requests and no linger requests, __reset_osd()
will just remove it with a call to __remove_osd().  That drops
a reference to the osd, and therefore the osd may have been free
by the time __reset_osd() returns.  That function offers no
indication this may have occurred, and as a result the osd will
continue to be used even when it's no longer valid.

Change__reset_osd() so it returns an error (ENODEV) when it
deletes the osd being reset.  And change __kick_osd_requests() so it
returns immediately (before referencing osd again) if __reset_osd()
returns *any* error.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

685a7555

ceph: don't reference req after put · 7d5f2481

由 Alex Elder 提交于 11月 29, 2012

In __unregister_request(), there is a call to list_del_init()
referencing a request that was the subject of a call to
ceph_osdc_put_request() on the previous line.  This is not
safe, because the request structure could have been freed
by the time we reach the list_del_init().

Fix this by reversing the order of these lines.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-off-by: NSage Weil <sage@inktank.com>

7d5f2481

13 12月, 2012 1 次提交

libceph: remove 'osdtimeout' option · 83aff95e

由 Sage Weil 提交于 11月 28, 2012

This would reset a connection with any OSD that had an outstanding
request that was taking more than N seconds.  The idea was that if the
OSD was buggy, the client could compensate by resending the request.

In reality, this only served to hide server bugs, and we haven't
actually seen such a bug in quite a while.  Moreover, the userspace
client code never did this.

More importantly, often the request is taking a long time because the
OSD is trying to recover, or overloaded, and killing the connection
and retrying would only make the situation worse by giving the OSD
more work to do.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

83aff95e

02 10月, 2012 2 次提交

ceph: propagate layout error on osd request creation · 6816282d

由 Sage Weil 提交于 9月 24, 2012

If we are creating an osd request and get an invalid layout, return
an EINVAL to the caller.  We switch up the return to have an error
code instead of NULL implying -ENOMEM.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

6816282d

libceph: check for invalid mapping · d63b77f4

由 Sage Weil 提交于 9月 24, 2012

If we encounter an invalid (e.g., zeroed) mapping, return an error
and avoid a divide by zero.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

d63b77f4

31 7月, 2012 5 次提交

libceph: be less chatty about stray replies · 756a16a5

由 Sage Weil 提交于 7月 30, 2012

There are many (normal) conditions that can lead to us getting
unexpected replies, include cluster topology changes, osd failures,
and timeouts.  There's no need to spam the console about it.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

756a16a5

libceph: trivial fix for the incorrect debug output · 048a9d2d

由 Jiaju Zhang 提交于 7月 20, 2012

This is a trivial fix for the debug output, as it is inconsistent
with the function name so may confuse people when debugging.

[elder@inktank.com: switched to use __func__]
Signed-off-by: NJiaju Zhang <jjzhang@suse.de>
Reviewed-by: NAlex Elder <elder@inktank.com>

048a9d2d

libceph: resubmit linger ops when pg mapping changes · 6194ea89

由 Sage Weil 提交于 7月 30, 2012

The linger op registration (i.e., watch) modifies the object state.  As
such, the OSD will reply with success if it has already applied without
doing the associated side-effects (setting up the watch session state).
If we lose the ACK and resubmit, we will see success but the watch will not
be correctly registered and we won't get notifies.

To fix this, always resubmit the linger op with a new tid.  We accomplish
this by re-registering as a linger (i.e., 'registered') if we are not yet
registered.  Then the second loop will treat this just like a normal
case of re-registering.

This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and
ceph bug #2796.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>

6194ea89

libceph: initialize rb, list nodes in ceph_osd_request · cd43045c

由 Sage Weil 提交于 7月 09, 2012

These don't strictly need to be initialized based on how they are used, but
it is good practice to do so.
Reported-by: NAlex Elder <elder@inktank.com>
Signed-off-by: NSage Weil <sage@inktank.com>

cd43045c

libceph: initialize msgpool message types · d50b409f

由 Sage Weil 提交于 7月 09, 2012

Initialize the type field for messages in a msgpool.  The caller was doing
this for osd ops, but not for the reply messages.
Reported-by: NAlex Elder <elder@inktank.com>
Signed-off-by: NSage Weil <sage@inktank.com>

d50b409f

06 7月, 2012 1 次提交

libceph: set peer name on con_open, not init · b7a9e5dd

由 Sage Weil 提交于 6月 27, 2012

The peer name may change on each open attempt, even when the connection is
reused.
Signed-off-by: NSage Weil <sage@inktank.com>

b7a9e5dd

20 6月, 2012 2 次提交

libceph: use con get/put ops from osd_client · 88ed6ea0

由 Sage Weil 提交于 5月 31, 2012

There were a few direct calls to ceph_con_{get,put}() instead of the con
ops from osd_client.c.  This is a bug since those ops aren't defined to
be ceph_con_get/put.

This breaks refcounting on the ceph_osd structs that contain the
ceph_connections, and could lead to all manner of strangeness.

The purpose of the ->get and ->put methods in a ceph connection are
to allow the connection to indicate it has a reference to something
external to the messaging system, *not* to indicate something
external has a reference to the connection.

[elder@inktank.com: added that last sentence]
Signed-off-by: NSage Weil <sage@newdream.net>
Reviewed-by: NAlex Elder <elder@inktank.com>
(cherry picked from commit 0d47766f)

88ed6ea0

libceph: osd_client: don't drop reply reference too early · 680584fa

由 Alex Elder 提交于 6月 04, 2012

In ceph_osdc_release_request(), a reference to the r_reply message
is dropped.  But just after that, that same message is revoked if it
was in use to receive an incoming reply.  Reorder these so we are
sure we hold a reference until we're actually done with the message.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>
(cherry picked from commit ab8cb34a)

680584fa

06 6月, 2012 6 次提交

libceph: make ceph_con_revoke_message() a msg op · 8921d114

由 Alex Elder 提交于 6月 01, 2012

ceph_con_revoke_message() is passed both a message and a ceph
connection.  A ceph_msg allocated for incoming messages on a
connection always has a pointer to that connection, so there's no
need to provide the connection when revoking such a message.

Note that the existing logic does not preclude the message supplied
being a null/bogus message pointer.  The only user of this interface
is the OSD client, and the only value an osd client passes is a
request's r_reply field.  That is always non-null (except briefly in
an error path in ceph_osdc_alloc_request(), and that drops the
only reference so the request won't ever have a reply to revoke).
So we can safely assume the passed-in message is non-null, but add a
BUG_ON() to make it very obvious we are imposing this restriction.

Rename the function ceph_msg_revoke_incoming() to reflect that it is
really an operation on an incoming message.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

8921d114

libceph: make ceph_con_revoke() a msg operation · 6740a845

由 Alex Elder 提交于 6月 01, 2012

ceph_con_revoke() is passed both a message and a ceph connection.
Now that any message associated with a connection holds a pointer
to that connection, there's no need to provide the connection when
revoking a message.

This has the added benefit of precluding the possibility of the
providing the wrong connection pointer.  If the message's connection
pointer is null, it is not being tracked by any connection, so
revoking it is a no-op.  This is supported as a convenience for
upper layers, so they can revoke a message that is not actually
"in flight."

Rename the function ceph_msg_revoke() to reflect that it is really
an operation on a message, not a connection.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

6740a845

libceph: tweak ceph_alloc_msg() · 1c20f2d2

由 Alex Elder 提交于 6月 04, 2012

The function ceph_alloc_msg() is only used to allocate a message
that will be assigned to a connection's in_msg pointer.  Rename the
function so this implied usage is more clear.

In addition, make that assignment inside the function (again, since
that's precisely what it's intended to be used for).  This allows us
to return what is now provided via the passed-in address of a "skip"
variable.  The return type is now Boolean to be explicit that there
are only two possible outcomes.

Make sure the result of an ->alloc_msg method call always sets the
value of *skip properly.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

1c20f2d2

libceph: fully initialize connection in con_init() · 1bfd89f4

由 Alex Elder 提交于 5月 26, 2012

Move the initialization of a ceph connection's private pointer,
operations vector pointer, and peer name information into
ceph_con_init().  Rearrange the arguments so the connection pointer
is first.  Hide the byte-swapping of the peer entity number inside
ceph_con_init()
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

1bfd89f4

libceph: use con get/put ops from osd_client · 0d47766f

由 Sage Weil 提交于 5月 31, 2012

There were a few direct calls to ceph_con_{get,put}() instead of the con
ops from osd_client.c.  This is a bug since those ops aren't defined to
be ceph_con_get/put.

This breaks refcounting on the ceph_osd structs that contain the
ceph_connections, and could lead to all manner of strangeness.

The purpose of the ->get and ->put methods in a ceph connection are
to allow the connection to indicate it has a reference to something
external to the messaging system, *not* to indicate something
external has a reference to the connection.

[elder@inktank.com: added that last sentence]
Signed-off-by: NSage Weil <sage@newdream.net>
Reviewed-by: NAlex Elder <elder@inktank.com>

0d47766f

libceph: osd_client: don't drop reply reference too early · ab8cb34a

由 Alex Elder 提交于 6月 04, 2012

In ceph_osdc_release_request(), a reference to the r_reply message
is dropped.  But just after that, that same message is revoked if it
was in use to receive an incoming reply.  Reorder these so we are
sure we hold a reference until we're actually done with the message.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

ab8cb34a

01 6月, 2012 2 次提交

libceph: provide osd number when creating osd · e10006f8

由 Alex Elder 提交于 5月 26, 2012

Pass the osd number to the create_osd() routine, and move the
initialization of fields that depend on it therein.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

e10006f8

libceph: embed ceph messenger structure in ceph_client · 15d9882c

由 Alex Elder 提交于 5月 26, 2012

A ceph client has a pointer to a ceph messenger structure in it.
There is always exactly one ceph messenger for a ceph client, so
there is no need to allocate it separate from the ceph client
structure.

Switch the ceph_client structure to embed its ceph_messenger
structure.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

15d9882c

19 5月, 2012 1 次提交

libceph: avoid unregistering osd request when not registered · 35f9f8a0

由 Sage Weil 提交于 5月 16, 2012

There is a race between two __unregister_request() callers: the
reply path and the ceph_osdc_wait_request().  If we get a reply
*and* the timeout expires at roughly the same time, both callers
will try to unregister the request, and the second one will do bad
things.

Simply check if the request is still already unregistered; if so,
return immediately and do nothing.

Fixes http://tracker.newdream.net/issues/2420Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

35f9f8a0

17 5月, 2012 4 次提交

ceph: use info returned by get_authorizer · 8f43fb53

由 Alex Elder 提交于 5月 16, 2012

Rather than passing a bunch of arguments to be filled in with the
content of the ceph_auth_handshake buffer now returned by the
get_authorizer method, just use the returned information in the
caller, and drop the unnecessary arguments.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

8f43fb53

ceph: have get_authorizer methods return pointers · a3530df3

由 Alex Elder 提交于 5月 16, 2012

Have the get_authorizer auth_client method return a ceph_auth
pointer rather than an integer, pointer-encoding any returned
error value.  This is to pave the way for making use of the
returned value in an upcoming patch.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

a3530df3

ceph: ensure auth ops are defined before use · a255651d

由 Alex Elder 提交于 5月 16, 2012

In the create_authorizer method for both the mds and osd clients,
the auth_client->ops pointer is blindly dereferenced.  There is no
obvious guarantee that this pointer has been assigned.  And
furthermore, even if the ops pointer is non-null there is definitely
no guarantee that the create_authorizer or destroy_authorizer
methods are defined.

Add checks in both routines to make sure they are defined (non-null)
before use.  Add similar checks in a few other spots in these files
while we're at it.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

a255651d

ceph: messenger: reduce args to create_authorizer · 74f1869f

由 Alex Elder 提交于 5月 16, 2012

Make use of the new ceph_auth_handshake structure in order to reduce
the number of arguments passed to the create_authorizor method in
ceph_auth_client_ops.  Use a local variable of that type as a
shorthand in the get_authorizer method definitions.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

74f1869f

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功