Commits · 6e8575faa8fa680d59404a4d58d12190667be815 · openeuler / raspberrypi-kernel

28 Dec, 2012 2 commits

libceph: always reset osds when kicking · e6d50f67

Authored by Alex Elder on Dec 26, 2012

When ceph_osdc_handle_map() is called to process a new osd map,
kick_requests() is called to ensure all affected requests are
updated if necessary to reflect changes in the osd map.  This
happens in two cases:  whenever an incremental map update is
processed; and when a full map update (or the last one if there is
more than one) gets processed.

In the former case, the kick_requests() call is followed immediately
by a call to reset_changed_osds() to ensure any connections to osds
affected by the map change are reset.  But for full map updates
this isn't done.

Both cases should be doing this osd reset.

Rather than duplicating the reset_changed_osds() call, move it into
the end of kick_requests().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

e6d50f67

libceph: move linger requests sooner in kick_requests() · ab60b16d

Authored by Alex Elder on Dec 19, 2012

The kick_requests() function is called by ceph_osdc_handle_map()
when an osd map change has been indicated.  Its purpose is to
re-queue any request whose target osd is different from what it
was when it was originally sent.

It is structured as two loops, one for incomplete but registered
requests, and a second for handling completed linger requests.
As a special case, in the first loop if a request marked to linger
has not yet completed, it is moved from the request list to the
linger list.  This is a quick and dirty way to have the second
loop handle sending the request along with all the other linger
requests.

Because of the way it's done now, however, this quick and dirty
solution can result in these incomplete linger requests never
getting re-sent as desired.  The problem lies in the fact that
the second loop only arranges for a linger request to be sent
if it appears its target osd has changed.  This is the proper
handling for *completed* linger requests (it avoids issuing
the same linger request twice to the same osd).

But although the linger requests added to the list in the first loop
may have been sent, they have not yet completed, so they need to be
re-sent regardless of whether their target osd has changed.

The first required fix is we need to avoid calling __map_request()
on any incomplete linger request.  Otherwise the subsequent
__map_request() call in the second loop will find the target osd
has not changed and will therefore not re-send the request.

Second, we need to be sure that a sent but incomplete linger request
gets re-sent.  If the target osd is the same with the new osd map as
it was when the request was originally sent, this won't happen.
This can be fixed through careful handling when we move these
requests from the request list to the linger list, by unregistering
the request *before* it is registered as a linger request.  This
works because a side-effect of unregistering the request is to make
the request's r_osd pointer be NULL, and *that* will ensure the
second loop actually re-sends the linger request.

Processing of such a request is done at that point, so continue with
the next one once it's been moved.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ab60b16d
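
The ordering argument in this commit can be sketched in user space. This is only an illustration under stated assumptions: `struct request`, `NO_OSD`, `map_request()`, `move_to_linger_list()` and `kick_linger()` are simplified, hypothetical stand-ins rather than the kernel's definitions, and `NO_OSD` plays the role of a NULL `r_osd` pointer.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OSD (-1)             /* hypothetical: stands in for r_osd == NULL */

struct request {
    int osd;                    /* osd the request is mapped to, or NO_OSD */
    bool lingering;             /* true once the request is on the linger list */
};

/* Remap the request; report true only if the target changed. */
static bool map_request(struct request *req, int target)
{
    if (req->osd == target)
        return false;
    req->osd = target;
    return true;
}

/* Unregistering forgets the current mapping as a side effect. */
static void unregister_request(struct request *req)
{
    req->osd = NO_OSD;
}

static void register_linger(struct request *req)
{
    req->lingering = true;
}

/* First loop: move an incomplete linger request, unregistering it first. */
static void move_to_linger_list(struct request *req)
{
    unregister_request(req);
    register_linger(req);
}

/* Second loop: a linger request is re-sent only if its mapping changed. */
static bool kick_linger(struct request *req, int target)
{
    return map_request(req, target);
}
```

Because the moved request no longer remembers an osd, the second loop sees a changed mapping and re-sends it even when the target is the same, while a completed linger request that kept its mapping is left alone.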

21 Dec, 2012 4 commits

libceph: register request before unregister linger · c89ce05e

Authored by Alex Elder on Dec 06, 2012

In kick_requests(), we need to register the request before we
unregister the linger request.  Otherwise the unregister will
reset the request's osd pointer to NULL.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

c89ce05e

libceph: don't use rb_init_node() in ceph_osdc_alloc_request() · a978fa20

Authored by Alex Elder on Dec 17, 2012

The red-black node in the ceph osd request structure is initialized
in ceph_osdc_alloc_request() using rb_init_node().  We do need to
initialize this, because in __unregister_request() we call
RB_EMPTY_NODE(), which expects the node it's checking to have
been initialized.  But rb_init_node() is apparently overkill, and
may in fact be on its way out.  So use RB_CLEAR_NODE() instead.

For a little more background, see this commit:
    4c199a93 "rbtree: empty nodes have no color"
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

a978fa20
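
Why a merely zeroed node is not good enough for RB_EMPTY_NODE() can be shown with a small user-space model. This is a sketch based on the convention that an "empty" node records its own address in its parent field; the `MODEL_` macros and both struct types are illustrative names, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of an rbtree node: parent pointer and color share a word. */
struct rb_node_model {
    uintptr_t parent_color;
    struct rb_node_model *left, *right;
};

/* A node that is in no tree points at itself. */
#define MODEL_RB_CLEAR_NODE(n) ((n)->parent_color = (uintptr_t)(n))
#define MODEL_RB_EMPTY_NODE(n) ((n)->parent_color == (uintptr_t)(n))

/* Hypothetical object embedding a node, like an osd request or event. */
struct event {
    int id;
    struct rb_node_model node;
};

static void event_init(struct event *ev, int id)
{
    memset(ev, 0, sizeof(*ev));
    ev->id = id;
    /* Without this, the "is it in a tree?" check gives the wrong answer. */
    MODEL_RB_CLEAR_NODE(&ev->node);
}
```

In this model a zero-filled node reports "not empty", so an unregister path guarded by the empty check would try to erase a node that was never inserted.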

libceph: init event->node in ceph_osdc_create_event() · 3ee5234d

Authored by Alex Elder on Dec 17, 2012

The red-black node in the ceph osd event structure is not
initialized in ceph_osdc_create_event().  Because this node can
be the subject of a RB_EMPTY_NODE() call later on, we should ensure
the node is initialized properly for that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

3ee5234d

libceph: init osd->o_node in create_osd() · f407731d

Authored by Alex Elder on Dec 06, 2012

The red-black node in the ceph osd structure is not initialized
in create_osd().  Because this node can be the subject of a
RB_EMPTY_NODE() call later on, we should ensure the node is
initialized properly for that.  Add a call to RB_CLEAR_NODE() to
initialize it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

f407731d

18 Dec, 2012 1 commit

rbd: remove linger unconditionally · 61c74035

Authored by Alex Elder on Dec 06, 2012

In __unregister_linger_request(), the request is being removed
from the osd client's req_linger list only when the request
has a non-null osd pointer.  It should be done whether or not
the request currently has an osd.

This is most likely a non-issue because I believe the request
will always have an osd when this function is called.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

61c74035

17 Dec, 2012 2 commits

libceph: avoid using freed osd in __kick_osd_requests() · 685a7555

Authored by Alex Elder on Dec 07, 2012

If an osd has no requests and no linger requests, __reset_osd()
will just remove it with a call to __remove_osd().  That drops
a reference to the osd, and therefore the osd may have been freed
by the time __reset_osd() returns.  That function offers no
indication this may have occurred, and as a result the osd will
continue to be used even when it's no longer valid.

Change __reset_osd() so it returns an error (ENODEV) when it
deletes the osd being reset.  And change __kick_osd_requests() so it
returns immediately (before referencing osd again) if __reset_osd()
returns *any* error.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

685a7555
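
A minimal sketch of the calling convention this commit describes, written as ordinary user-space C. The `struct osd` fields and both function bodies are assumptions made for illustration; only the idea (report -ENODEV when the object was removed, and have the caller stop touching it) comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified osd object. */
struct osd {
    int nr_requests;    /* outstanding requests */
    int nr_linger;      /* outstanding linger requests */
    int resets;         /* how many times the connection was reset */
};

/*
 * Reset the osd's connection.  An idle osd is removed instead, and the
 * caller is told so with -ENODEV because the object no longer exists.
 */
static int reset_osd(struct osd *osd)
{
    if (osd->nr_requests == 0 && osd->nr_linger == 0) {
        free(osd);
        return -ENODEV;
    }
    osd->resets++;
    return 0;
}

/* Returns the number of requests to re-send, or a negative error. */
static int kick_osd_requests(struct osd *osd)
{
    int err = reset_osd(osd);

    if (err)
        return err;     /* osd may be gone: do not dereference it again */
    return osd->nr_requests + osd->nr_linger;
}
```

Returning on *any* error, not just -ENODEV, keeps the caller safe even if the reset path later grows other failure modes.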

ceph: don't reference req after put · 7d5f2481

Authored by Alex Elder on Nov 29, 2012

In __unregister_request(), there is a call to list_del_init()
referencing a request that was the subject of a call to
ceph_osdc_put_request() on the previous line.  This is not
safe, because the request structure could have been freed
by the time we reach the list_del_init().

Fix this by reversing the order of these lines.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

7d5f2481
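
The corrected ordering can be illustrated with a small reference-counted object in user space. Everything here is a simplified stand-in: the list helpers imitate the behavior of the kernel's list API, `request_put()` uses a plain integer count, and `freed_count` exists only so the example can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, in the style of the kernel's. */
struct list_node {
    struct list_node *prev, *next;
};

struct request {
    int refcount;               /* plain int for illustration only */
    struct list_node item;      /* membership in some list */
};

static int freed_count;         /* lets the example observe the free */

static void list_init(struct list_node *n)
{
    n->prev = n->next = n;
}

static void list_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink the node and leave it pointing at itself. */
static void list_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

static void request_put(struct request *req)
{
    if (--req->refcount == 0) {
        free(req);
        freed_count++;
    }
}

/* Safe order: finish every access to the request, then drop the reference. */
static void unregister_request(struct request *req)
{
    list_del_init(&req->item);
    request_put(req);           /* req may be freed here; no use after this */
}
```

With the two calls swapped, `list_del_init()` would write through a pointer into memory that may already have been freed.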

13 Dec, 2012 1 commit

libceph: remove 'osdtimeout' option · 83aff95e

Authored by Sage Weil on Nov 28, 2012

This would reset a connection with any OSD that had an outstanding
request that was taking more than N seconds.  The idea was that if the
OSD was buggy, the client could compensate by resending the request.

In reality, this only served to hide server bugs, and we haven't
actually seen such a bug in quite a while.  Moreover, the userspace
client code never did this.

More importantly, often the request is taking a long time because the
OSD is trying to recover, or overloaded, and killing the connection
and retrying would only make the situation worse by giving the OSD
more work to do.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

83aff95e

02 Oct, 2012 2 commits

ceph: propagate layout error on osd request creation · 6816282d

Authored by Sage Weil on Sep 24, 2012

If we are creating an osd request and get an invalid layout, return
-EINVAL to the caller.  Change the return value to carry an error
code instead of using NULL to imply -ENOMEM.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

6816282d
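
One common way to let a pointer-returning function carry an error code is to encode the negative errno in the pointer itself. The sketch below imitates that idiom in user space; `err_ptr()`, `ptr_err()`, `is_err()`, `struct layout` and `new_request()` are hypothetical names for this example, and whether the commit uses exactly this mechanism is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer near the top of the address space. */
static inline void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static inline int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical layout and request types. */
struct layout {
    unsigned int object_size;
};

struct osd_request {
    struct layout layout;
};

static struct osd_request *new_request(const struct layout *l)
{
    struct osd_request *req;

    if (l->object_size == 0)            /* assumed validity rule */
        return err_ptr(-EINVAL);

    req = malloc(sizeof(*req));
    if (!req)
        return err_ptr(-ENOMEM);

    req->layout = *l;
    return req;
}
```

A caller then checks `is_err()` rather than comparing against NULL, and can tell an invalid layout apart from an allocation failure.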

libceph: check for invalid mapping · d63b77f4

Authored by Sage Weil on Sep 24, 2012

If we encounter an invalid (e.g., zeroed) mapping, return an error
and avoid a divide by zero.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

d63b77f4
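
The guard described here amounts to validating the divisors before using them. A minimal sketch, assuming a hypothetical `struct file_layout` with two fields and a made-up `stripe_position()` helper; the real mapping code computes more than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of a striping layout. */
struct file_layout {
    uint32_t stripe_unit;       /* bytes per stripe unit */
    uint32_t stripe_count;      /* objects per stripe set */
};

/*
 * Compute which object within a stripe set holds byte offset `off`.
 * Returns 0 on success, -EINVAL if the layout would divide by zero.
 */
static int stripe_position(const struct file_layout *l, uint64_t off,
                           uint32_t *pos)
{
    if (l->stripe_unit == 0 || l->stripe_count == 0)
        return -EINVAL;

    *pos = (uint32_t)((off / l->stripe_unit) % l->stripe_count);
    return 0;
}
```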

31 Jul, 2012 5 commits

libceph: be less chatty about stray replies · 756a16a5

Authored by Sage Weil on Jul 30, 2012

There are many (normal) conditions that can lead to us getting
unexpected replies, including cluster topology changes, osd failures,
and timeouts.  There's no need to spam the console about it.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

756a16a5

libceph: trivial fix for the incorrect debug output · 048a9d2d

Authored by Jiaju Zhang on Jul 20, 2012

This is a trivial fix for the debug output, which is inconsistent
with the function name and so may confuse people when debugging.

[elder@inktank.com: switched to use __func__]
Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Reviewed-by: Alex Elder <elder@inktank.com>

048a9d2d

libceph: resubmit linger ops when pg mapping changes · 6194ea89

Authored by Sage Weil on Jul 30, 2012

The linger op registration (i.e., watch) modifies the object state.  As
such, the OSD will reply with success if it has already applied without
doing the associated side-effects (setting up the watch session state).
If we lose the ACK and resubmit, we will see success but the watch will not
be correctly registered and we won't get notifies.

To fix this, always resubmit the linger op with a new tid.  We accomplish
this by re-registering as a linger (i.e., 'registered') if we are not yet
registered.  Then the second loop will treat this just like a normal
case of re-registering.

This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and
ceph bug #2796.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

6194ea89

libceph: initialize rb, list nodes in ceph_osd_request · cd43045c

Authored by Sage Weil on Jul 09, 2012

These don't strictly need to be initialized based on how they are used, but
it is good practice to do so.
Reported-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

cd43045c

libceph: initialize msgpool message types · d50b409f

Authored by Sage Weil on Jul 09, 2012

Initialize the type field for messages in a msgpool.  The caller was doing
this for osd ops, but not for the reply messages.
Reported-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

d50b409f

06 Jul, 2012 1 commit

libceph: set peer name on con_open, not init · b7a9e5dd

Authored by Sage Weil on Jun 27, 2012

The peer name may change on each open attempt, even when the connection is
reused.
Signed-off-by: Sage Weil <sage@inktank.com>

b7a9e5dd

20 Jun, 2012 2 commits

libceph: use con get/put ops from osd_client · 88ed6ea0

Authored by Sage Weil on May 31, 2012

There were a few direct calls to ceph_con_{get,put}() instead of the con
ops from osd_client.c.  This is a bug since those ops aren't defined to
be ceph_con_get/put.

This breaks refcounting on the ceph_osd structs that contain the
ceph_connections, and could lead to all manner of strangeness.

The purpose of the ->get and ->put methods in a ceph connection is
to allow the connection to indicate it has a reference to something
external to the messaging system, *not* to indicate something
external has a reference to the connection.

[elder@inktank.com: added that last sentence]
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Alex Elder <elder@inktank.com>
(cherry picked from commit 0d47766f)

88ed6ea0

libceph: osd_client: don't drop reply reference too early · 680584fa

Authored by Alex Elder on Jun 04, 2012

In ceph_osdc_release_request(), a reference to the r_reply message
is dropped.  But just after that, that same message is revoked if it
was in use to receive an incoming reply.  Reorder these so we are
sure we hold a reference until we're actually done with the message.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ab8cb34a)

680584fa

06 Jun, 2012 6 commits

libceph: make ceph_con_revoke_message() a msg op · 8921d114

Authored by Alex Elder on Jun 01, 2012

ceph_con_revoke_message() is passed both a message and a ceph
connection.  A ceph_msg allocated for incoming messages on a
connection always has a pointer to that connection, so there's no
need to provide the connection when revoking such a message.

Note that the existing logic does not preclude the message supplied
being a null/bogus message pointer.  The only user of this interface
is the OSD client, and the only value an osd client passes is a
request's r_reply field.  That is always non-null (except briefly in
an error path in ceph_osdc_alloc_request(), and that drops the
only reference so the request won't ever have a reply to revoke).
So we can safely assume the passed-in message is non-null, but add a
BUG_ON() to make it very obvious we are imposing this restriction.

Rename the function ceph_msg_revoke_incoming() to reflect that it is
really an operation on an incoming message.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

8921d114

libceph: make ceph_con_revoke() a msg operation · 6740a845

Authored by Alex Elder on Jun 01, 2012

ceph_con_revoke() is passed both a message and a ceph connection.
Now that any message associated with a connection holds a pointer
to that connection, there's no need to provide the connection when
revoking a message.

This has the added benefit of precluding the possibility of
providing the wrong connection pointer.  If the message's connection
pointer is null, it is not being tracked by any connection, so
revoking it is a no-op.  This is supported as a convenience for
upper layers, so they can revoke a message that is not actually
"in flight."

Rename the function ceph_msg_revoke() to reflect that it is really
an operation on a message, not a connection.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

6740a845

libceph: tweak ceph_alloc_msg() · 1c20f2d2

Authored by Alex Elder on Jun 04, 2012

The function ceph_alloc_msg() is only used to allocate a message
that will be assigned to a connection's in_msg pointer.  Rename the
function so this implied usage is more clear.

In addition, make that assignment inside the function (again, since
that's precisely what it's intended to be used for).  This allows us
to return what is now provided via the passed-in address of a "skip"
variable.  The return type is now Boolean to be explicit that there
are only two possible outcomes.

Make sure the result of an ->alloc_msg method call always sets the
value of *skip properly.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

1c20f2d2

libceph: fully initialize connection in con_init() · 1bfd89f4

Authored by Alex Elder on May 26, 2012

Move the initialization of a ceph connection's private pointer,
operations vector pointer, and peer name information into
ceph_con_init().  Rearrange the arguments so the connection pointer
is first.  Hide the byte-swapping of the peer entity number inside
ceph_con_init().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

1bfd89f4

libceph: use con get/put ops from osd_client · 0d47766f

Authored by Sage Weil on May 31, 2012

There were a few direct calls to ceph_con_{get,put}() instead of the con
ops from osd_client.c.  This is a bug since those ops aren't defined to
be ceph_con_get/put.

This breaks refcounting on the ceph_osd structs that contain the
ceph_connections, and could lead to all manner of strangeness.

The purpose of the ->get and ->put methods in a ceph connection is
to allow the connection to indicate it has a reference to something
external to the messaging system, *not* to indicate something
external has a reference to the connection.

[elder@inktank.com: added that last sentence]
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Alex Elder <elder@inktank.com>

0d47766f

libceph: osd_client: don't drop reply reference too early · ab8cb34a

Authored by Alex Elder on Jun 04, 2012

In ceph_osdc_release_request(), a reference to the r_reply message
is dropped.  But just after that, that same message is revoked if it
was in use to receive an incoming reply.  Reorder these so we are
sure we hold a reference until we're actually done with the message.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ab8cb34a

01 Jun, 2012 2 commits

libceph: provide osd number when creating osd · e10006f8

Authored by Alex Elder on May 26, 2012

Pass the osd number to the create_osd() routine, and move the
initialization of fields that depend on it therein.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

e10006f8

libceph: embed ceph messenger structure in ceph_client · 15d9882c

Authored by Alex Elder on May 26, 2012

A ceph client has a pointer to a ceph messenger structure in it.
There is always exactly one ceph messenger for a ceph client, so
there is no need to allocate it separate from the ceph client
structure.

Switch the ceph_client structure to embed its ceph_messenger
structure.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

15d9882c

19 May, 2012 1 commit

libceph: avoid unregistering osd request when not registered · 35f9f8a0

Authored by Sage Weil on May 16, 2012

There is a race between two __unregister_request() callers: the
reply path and the ceph_osdc_wait_request().  If we get a reply
*and* the timeout expires at roughly the same time, both callers
will try to unregister the request, and the second one will do bad
things.

Simply check if the request is already unregistered; if so,
return immediately and do nothing.

Fixes http://tracker.newdream.net/issues/2420
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

35f9f8a0
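
The fix makes unregistration idempotent. A minimal single-threaded sketch follows, assuming both callers are already serialized by a lock (the check alone does not provide mutual exclusion); the `registered` flag and the two structs are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

struct client {
    int num_requests;           /* requests currently registered */
};

struct request {
    bool registered;            /* hypothetical "is in the tree" marker */
    int refcount;
};

static void register_request(struct client *c, struct request *r)
{
    r->registered = true;
    r->refcount++;              /* the registry holds a reference */
    c->num_requests++;
}

static void unregister_request(struct client *c, struct request *r)
{
    if (!r->registered)
        return;                 /* the other path got here first */

    r->registered = false;
    c->num_requests--;
    r->refcount--;              /* drop the registry's reference once */
}
```

Whichever of the reply path or the timeout path runs second simply returns, so the count and the reference are adjusted exactly once.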

17 May, 2012 5 commits

ceph: use info returned by get_authorizer · 8f43fb53

Authored by Alex Elder on May 16, 2012

Rather than passing a bunch of arguments to be filled in with the
content of the ceph_auth_handshake buffer now returned by the
get_authorizer method, just use the returned information in the
caller, and drop the unnecessary arguments.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

8f43fb53

ceph: have get_authorizer methods return pointers · a3530df3

Authored by Alex Elder on May 16, 2012

Have the get_authorizer auth_client method return a ceph_auth
pointer rather than an integer, pointer-encoding any returned
error value.  This is to pave the way for making use of the
returned value in an upcoming patch.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

a3530df3

ceph: ensure auth ops are defined before use · a255651d

Authored by Alex Elder on May 16, 2012

In the create_authorizer method for both the mds and osd clients,
the auth_client->ops pointer is blindly dereferenced.  There is no
obvious guarantee that this pointer has been assigned.  And
furthermore, even if the ops pointer is non-null there is definitely
no guarantee that the create_authorizer or destroy_authorizer
methods are defined.

Add checks in both routines to make sure they are defined (non-null)
before use.  Add similar checks in a few other spots in these files
while we're at it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

a255651d

ceph: messenger: reduce args to create_authorizer · 74f1869f

Authored by Alex Elder on May 16, 2012

Make use of the new ceph_auth_handshake structure in order to reduce
the number of arguments passed to the create_authorizer method in
ceph_auth_client_ops.  Use a local variable of that type as a
shorthand in the get_authorizer method definitions.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

74f1869f

ceph: define ceph_auth_handshake type · 6c4a1915

Authored by Alex Elder on May 16, 2012

The definitions for the ceph_mds_session and ceph_osd both contain
five fields related only to "authorizers."  Encapsulate those fields
into their own struct type, allowing for better isolation in some
upcoming patches.

Fix the #includes in "linux/ceph/osd_client.h" to lay out their more
complete canonical path.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

6c4a1915

15 May, 2012 1 commit

ceph: osd_client: fix endianness bug in osd_req_encode_op() · 065a68f9

Authored by Alex Elder on Apr 20, 2012

From Al Viro <viro@zeniv.linux.org.uk>

Al Viro noticed that we were using a non-cpu-encoded value in
a switch statement in osd_req_encode_op().  The result would
clearly not work correctly on a big-endian machine.
Signed-off-by: Alex Elder <elder@dreamhost.com>

065a68f9
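
The class of bug is easy to reproduce in portable C: a value stored in wire (little-endian) order must be converted to CPU order before it is compared against opcode constants. In this sketch the opcode values, `struct wire_op` and the helper names are all hypothetical, and the conversion is done byte by byte so the code behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode values, not the real protocol constants. */
enum {
    OP_READ  = 0x1201,
    OP_WRITE = 0x2201
};

/* On-the-wire form: 16-bit opcode stored little-endian. */
struct wire_op {
    uint8_t op[2];
};

static void put_le16(uint8_t out[2], uint16_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)(v >> 8);
}

static uint16_t get_le16(const uint8_t in[2])
{
    return (uint16_t)(in[0] | ((uint16_t)in[1] << 8));
}

static const char *op_name(const struct wire_op *w)
{
    /* Switch on the CPU-order value, never on the raw wire bytes. */
    switch (get_le16(w->op)) {
    case OP_READ:
        return "read";
    case OP_WRITE:
        return "write";
    default:
        return "unknown";
    }
}
```

Reinterpreting the two wire bytes directly as a native integer would happen to work on a little-endian host and silently fall into the default case on a big-endian one.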

16 Apr, 2012 1 commit

net: cleanup unsigned to unsigned int · 95c96174

Authored by Eric Dumazet on Apr 15, 2012

Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95c96174

11 Jan, 2012 1 commit

libceph: remove useless return value for osd_client __send_request() · 56e925b6

Authored by Sage Weil on Jan 03, 2012

Signed-off-by: Sage Weil <sage@newdream.net>

56e925b6

12 Nov, 2011 1 commit

libceph: Allocate larger oid buffer in request msgs · 224736d9

Authored by Stratos Psomadakis on Nov 10, 2011

ceph_osd_request struct allocates a 40-byte buffer for object names.
RBD image names can be up to 96 chars long (100 with the .rbd suffix),
which results in the object name for the image being truncated, and a
subsequent map failure.

Increase the oid buffer in request messages, in order to avoid the
truncation.
Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
Signed-off-by: Sage Weil <sage@newdream.net>

224736d9
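
The truncation problem can be reproduced with the sizes quoted in the commit message. This sketch uses a hypothetical `format_oid()` helper that appends the `.rbd` suffix and reports when the result does not fit; the 101-byte buffer in the usage below is chosen only to hold 100 characters plus the terminator and is not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<image>.rbd" in buf.  Returns 0 on success, or -1 if the name
 * would not fit in len bytes (including the terminating NUL), in which
 * case buf holds only a truncated prefix.
 */
static int format_oid(char *buf, size_t len, const char *image)
{
    int n = snprintf(buf, len, "%s.rbd", image);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

A 96-character image name yields a 100-character object name, which cannot fit in a 40-byte buffer but does fit once the buffer is large enough.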

26 Oct, 2011 2 commits

libceph: force resend of osd requests if we skip an osdmap · 38d6453c

Authored by Sage Weil on Oct 14, 2011

If we skip over one or more map epochs, we need to resend all osd requests
because it is possible they remapped to other servers and then back.
Signed-off-by: Sage Weil <sage@newdream.net>

38d6453c
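
The decision rule can be written as a small predicate: if the new map's epoch is more than one ahead of the one we hold, intermediate maps were never seen. The function name and the use of plain 32-bit epochs are assumptions for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when at least one epoch between the map we have and the map we
 * are about to install was skipped, so a request may have been remapped
 * away and back without us noticing.
 */
static bool must_resend_all(uint32_t have_epoch, uint32_t new_epoch)
{
    return new_epoch > have_epoch && new_epoch - have_epoch > 1;
}
```

Comparing only the old and new target of each request is not enough in that case, because both maps can agree while an unseen map in between pointed elsewhere.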

libceph: don't complain on msgpool alloc failures · b61c2763

Authored by Sage Weil on Aug 09, 2011

The pool allocation failures are masked by the pool; there is no need to
spam the console about them.  (That's the whole point of having the pool
in the first place.)

Mark msg allocations whose failure is safely handled as such.
Signed-off-by: Sage Weil <sage@newdream.net>

b61c2763