提交 · 89a86be0ce20022f6ede8bccec078dbb3d63caaa · openanolis / cloud-kernel

16 6月, 2012 1 次提交

libceph: transition socket state prior to actual connect · 89a86be0

由 Sage Weil 提交于 6月 09, 2012

Once we call ->connect(), we are racing against the actual
connection, and a subsequent transition from CONNECTING ->
CONNECTED.  Set the state to CONNECTING before that, under the
protection of the mutex, to avoid the race.

This was introduced in 928443cd,
with the original socket state code.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

89a86be0

07 6月, 2012 4 次提交

libceph: fix overflow in osdmap_apply_incremental() · a5506049

由 Xi Wang 提交于 6月 06, 2012

On 32-bit systems, a large `pglen' would overflow `pglen*sizeof(u32)'
and bypass the check ceph_decode_need(p, end, pglen*sizeof(u32), bad).
It would also overflow the subsequent kmalloc() size, leading to
out-of-bounds write.
Signed-off-by: NXi Wang <xi.wang@gmail.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

a5506049

libceph: fix overflow in osdmap_decode() · e91a9b63

由 Xi Wang 提交于 6月 06, 2012

On 32-bit systems, a large `n' would overflow `n * sizeof(u32)' and bypass
the check ceph_decode_need(p, end, n * sizeof(u32), bad).  It would also
overflow the subsequent kmalloc() size, leading to out-of-bounds write.
Signed-off-by: NXi Wang <xi.wang@gmail.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

e91a9b63

libceph: fix overflow in __decode_pool_names() · ad3b904c

由 Xi Wang 提交于 6月 06, 2012

`len' is read from network and thus needs validation.  Otherwise a
large `len' would cause out-of-bounds access via the memcpy() call.
In addition, len = 0xffffffff would overflow the kmalloc() size,
leading to out-of-bounds write.

This patch adds a check of `len' via ceph_decode_need().  Also use
kstrndup rather than kmalloc/memcpy.

[elder@inktank.com: added -ENOMEM return for null kstrndup() result]
Signed-off-by: NXi Wang <xi.wang@gmail.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

ad3b904c

rbd: Clear ceph_msg->bio_iter for retransmitted message · 43643528

由 Yan, Zheng 提交于 6月 06, 2012

The bug can cause NULL pointer dereference in write_partial_msg_pages
Signed-off-by: NZheng Yan <zheng.z.yan@intel.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

43643528

06 6月, 2012 13 次提交

libceph: make ceph_con_revoke_message() a msg op · 8921d114

由 Alex Elder 提交于 6月 01, 2012

ceph_con_revoke_message() is passed both a message and a ceph
connection.  A ceph_msg allocated for incoming messages on a
connection always has a pointer to that connection, so there's no
need to provide the connection when revoking such a message.

Note that the existing logic does not preclude the message supplied
being a null/bogus message pointer.  The only user of this interface
is the OSD client, and the only value an osd client passes is a
request's r_reply field.  That is always non-null (except briefly in
an error path in ceph_osdc_alloc_request(), and that drops the
only reference so the request won't ever have a reply to revoke).
So we can safely assume the passed-in message is non-null, but add a
BUG_ON() to make it very obvious we are imposing this restriction.

Rename the function ceph_msg_revoke_incoming() to reflect that it is
really an operation on an incoming message.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

8921d114

libceph: make ceph_con_revoke() a msg operation · 6740a845

由 Alex Elder 提交于 6月 01, 2012

ceph_con_revoke() is passed both a message and a ceph connection.
Now that any message associated with a connection holds a pointer
to that connection, there's no need to provide the connection when
revoking a message.

This has the added benefit of precluding the possibility of the
providing the wrong connection pointer.  If the message's connection
pointer is null, it is not being tracked by any connection, so
revoking it is a no-op.  This is supported as a convenience for
upper layers, so they can revoke a message that is not actually
"in flight."

Rename the function ceph_msg_revoke() to reflect that it is really
an operation on a message, not a connection.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

6740a845

libceph: have messages take a connection reference · 92ce034b

由 Alex Elder 提交于 6月 04, 2012

There are essentially two types of ceph messages: incoming and
outgoing.  Outgoing messages are always allocated via ceph_msg_new(),
and at the time of their allocation they are not associated with any
particular connection.  Incoming messages are always allocated via
ceph_con_in_msg_alloc(), and they are initially associated with the
connection from which incoming data will be placed into the message.

When an outgoing message gets sent, it becomes associated with a
connection and remains that way until the message is successfully
sent.  The association of an incoming message goes away at the point
it is sent to an upper layer via a con->ops->dispatch method.

This patch implements reference counting for all ceph messages, such
that every message holds a reference (and a pointer) to a connection
if and only if it is associated with that connection (as described
above).


For background, here is an explanation of the ceph message
lifecycle, emphasizing when an association exists between a message
and a connection.

Outgoing Messages
An outgoing message is "owned" by its allocator, from the time it is
allocated in ceph_msg_new() up to the point it gets queued for
sending in ceph_con_send().  Prior to that point the message's
msg->con pointer is null; at the point it is queued for sending its
message pointer is assigned to refer to the connection.  At that
time the message is inserted into a connection's out_queue list.

When a message on the out_queue list has been sent to the socket
layer to be put on the wire, it is transferred out of that list and
into the connection's out_sent list.  At that point it is still owned
by the connection, and will remain so until an acknowledgement is
received from the recipient that indicates the message was
successfully transferred.  When such an acknowledgement is received
(in process_ack()), the message is removed from its list (in
ceph_msg_remove()), at which point it is no longer associated with
the connection.

So basically, any time a message is on one of a connection's lists,
it is associated with that connection.  Reference counting outgoing
messages can thus be done at the points a message is added to the
out_queue (in ceph_con_send()) and the point it is removed from
either its two lists (in ceph_msg_remove())--at which point its
connection pointer becomes null.

Incoming Messages
When an incoming message on a connection is getting read (in
read_partial_message()) and there is no message in con->in_msg,
a new one is allocated using ceph_con_in_msg_alloc().  At that
point the message is associated with the connection.  Once that
message has been completely and successfully read, it is passed to
upper layer code using the connection's con->ops->dispatch method.
At that point the association between the message and the connection
no longer exists.

Reference counting of connections for incoming messages can be done
by taking a reference to the connection when the message gets
allocated, and releasing that reference when it gets handed off
using the dispatch method.

We should never fail to get a connection reference for a
message--the since the caller should already hold one.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

92ce034b

libceph: have messages point to their connection · 38941f80

由 Alex Elder 提交于 6月 01, 2012

When a ceph message is queued for sending it is placed on a list of
pending messages (ceph_connection->out_queue).  When they are
actually sent over the wire, they are moved from that list to
another (ceph_connection->out_sent).  When acknowledgement for the
message is received, it is removed from the sent messages list.

During that entire time the message is "in the possession" of a
single ceph connection.  Keep track of that connection in the
message.  This will be used in the next patch (and is a helpful
bit of information for debugging anyway).
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

38941f80

libceph: tweak ceph_alloc_msg() · 1c20f2d2

由 Alex Elder 提交于 6月 04, 2012

The function ceph_alloc_msg() is only used to allocate a message
that will be assigned to a connection's in_msg pointer.  Rename the
function so this implied usage is more clear.

In addition, make that assignment inside the function (again, since
that's precisely what it's intended to be used for).  This allows us
to return what is now provided via the passed-in address of a "skip"
variable.  The return type is now Boolean to be explicit that there
are only two possible outcomes.

Make sure the result of an ->alloc_msg method call always sets the
value of *skip properly.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

1c20f2d2

libceph: fully initialize connection in con_init() · 1bfd89f4

由 Alex Elder 提交于 5月 26, 2012

Move the initialization of a ceph connection's private pointer,
operations vector pointer, and peer name information into
ceph_con_init().  Rearrange the arguments so the connection pointer
is first.  Hide the byte-swapping of the peer entity number inside
ceph_con_init()
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

1bfd89f4

libceph: init monitor connection when opening · 20581c1f

由 Alex Elder 提交于 5月 26, 2012

Hold off initializing a monitor client's connection until just
before it gets opened for use.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

20581c1f

libceph: drop connection refcounting for mon_client · ec87ef43

由 Sage Weil 提交于 5月 31, 2012

All references to the embedded ceph_connection come from the msgr
workqueue, which is drained prior to mon_client destruction.  That
means we can ignore con refcounting entirely.
Signed-off-by: NSage Weil <sage@newdream.net>
Reviewed-by: NAlex Elder <elder@inktank.com>

ec87ef43

libceph: embed ceph connection structure in mon_client · 67130934

由 Alex Elder 提交于 5月 26, 2012

A monitor client has a pointer to a ceph connection structure in it.
This is the only one of the three ceph client types that do it this
way; the OSD and MDS clients embed the connection into their main
structures.  There is always exactly one ceph connection for a
monitor client, so there is no need to allocate it separate from the
monitor client structure.

So switch the ceph_mon_client structure to embed its
ceph_connection structure.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

67130934

libceph: use con get/put ops from osd_client · 0d47766f

由 Sage Weil 提交于 5月 31, 2012

There were a few direct calls to ceph_con_{get,put}() instead of the con
ops from osd_client.c.  This is a bug since those ops aren't defined to
be ceph_con_get/put.

This breaks refcounting on the ceph_osd structs that contain the
ceph_connections, and could lead to all manner of strangeness.

The purpose of the ->get and ->put methods in a ceph connection are
to allow the connection to indicate it has a reference to something
external to the messaging system, *not* to indicate something
external has a reference to the connection.

[elder@inktank.com: added that last sentence]
Signed-off-by: NSage Weil <sage@newdream.net>
Reviewed-by: NAlex Elder <elder@inktank.com>

0d47766f

libceph: osd_client: don't drop reply reference too early · ab8cb34a

由 Alex Elder 提交于 6月 04, 2012

In ceph_osdc_release_request(), a reference to the r_reply message
is dropped.  But just after that, that same message is revoked if it
was in use to receive an incoming reply.  Reorder these so we are
sure we hold a reference until we're actually done with the message.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

ab8cb34a

rbd: endian bug in rbd_req_cb() · 895cfcc8

由 Dan Carpenter 提交于 6月 06, 2012

Sparse complains about this because:
drivers/block/rbd.c:996:20: warning: cast to restricted __le32
drivers/block/rbd.c:996:20: warning: cast from restricted __le16

These are set in osd_req_encode_op() and they are le16.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

895cfcc8

rbd: Fix ceph_snap_context size calculation · f9f9a190

由 Yan, Zheng 提交于 6月 06, 2012

ceph_snap_context->snaps is an u64 array
Signed-off-by: NZheng Yan <zheng.z.yan@intel.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

f9f9a190

01 6月, 2012 10 次提交

libceph: set CLOSED state bit in con_init · a5988c49

由 Alex Elder 提交于 5月 29, 2012

Once a connection is fully initialized, it is really in a CLOSED
state, so make that explicit by setting the bit in its state field.

It is possible for a connection in NEGOTIATING state to get a
failure, leading to ceph_fault() and ultimately ceph_con_close().
Clear that bits if it is set in that case, to reflect that the
connection truly is closed and is no longer participating in a
connect sequence.

Issue a warning if ceph_con_open() is called on a connection that
is not in CLOSED state.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

a5988c49

libceph: provide osd number when creating osd · e10006f8

由 Alex Elder 提交于 5月 26, 2012

Pass the osd number to the create_osd() routine, and move the
initialization of fields that depend on it therein.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

e10006f8

libceph: start tracking connection socket state · ce2c8903

由 Alex Elder 提交于 5月 22, 2012

Start explicitly keeping track of the state of a ceph connection's
socket, separate from the state of the connection itself.  Create
placeholder functions to encapsulate the state transitions.

    --------
    | NEW* |  transient initial state
    --------
        | con_sock_state_init()
        v
    ----------
    | CLOSED |  initialized, but no socket (and no
    ----------  TCP connection)
     ^      \
     |       \ con_sock_state_connecting()
     |        ----------------------
     |                              \
     + con_sock_state_closed()       \
     |\                               \
     | \                               \
     |  -----------                     \
     |  | CLOSING |  socket event;       \
     |  -----------  await close          \
     |       ^                            |
     |       |                            |
     |       + con_sock_state_closing()   |
     |      / \                           |
     |     /   ---------------            |
     |    /                   \           v
     |   /                    --------------
     |  /    -----------------| CONNECTING |  socket created, TCP
     |  |   /                 --------------  connect initiated
     |  |   | con_sock_state_connected()
     |  |   v
    -------------
    | CONNECTED |  TCP connection established
    -------------

Make the socket state an atomic variable, reinforcing that it's a
distinct transtion with no possible "intermediate/both" states.
This is almost certainly overkill at this point, though the
transitions into CONNECTED and CLOSING state do get called via
socket callback (the rest of the transitions occur with the
connection mutex held).  We can back out the atomicity later.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: Sage Weil<sage@inktank.com>

ce2c8903

libceph: start separating connection flags from state · 928443cd

由 Alex Elder 提交于 5月 22, 2012

A ceph_connection holds a mixture of connection state (as in "state
machine" state) and connection flags in a single "state" field.  To
make the distinction more clear, define a new "flags" field and use
it rather than the "state" field to hold Boolean flag values.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: Sage Weil<sage@inktank.com>

928443cd

libceph: embed ceph messenger structure in ceph_client · 15d9882c

由 Alex Elder 提交于 5月 26, 2012

A ceph client has a pointer to a ceph messenger structure in it.
There is always exactly one ceph messenger for a ceph client, so
there is no need to allocate it separate from the ceph client
structure.

Switch the ceph_client structure to embed its ceph_messenger
structure.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

15d9882c

libceph: rename kvec_reset and kvec_add functions · e2200423

由 Alex Elder 提交于 5月 23, 2012

The functions ceph_con_out_kvec_reset() and ceph_con_out_kvec_add()
are entirely private functions, so drop the "ceph_" prefix in their
name to make them slightly more wieldy.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

e2200423

libceph: rename socket callbacks · 327800bd

由 Alex Elder 提交于 5月 22, 2012

Change the names of the three socket callback functions to make it
more obvious they're specifically associated with a connection's
socket (not the ceph connection that uses it).
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

327800bd

libceph: kill bad_proto ceph connection op · 6384bb8b

由 Alex Elder 提交于 5月 29, 2012

No code sets a bad_proto method in its ceph connection operations
vector, so just get rid of it.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>

6384bb8b

libceph: eliminate connection state "DEAD" · e5e372da

由 Alex Elder 提交于 5月 22, 2012

The ceph connection state "DEAD" is never set and is therefore not
needed.  Eliminate it.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>

e5e372da

ceph: check PG_Private flag before accessing page->private · 28c0254e

由 Yan, Zheng 提交于 5月 28, 2012

I got lots of NULL pointer dereference Oops when compiling kernel on ceph.
The bug is because the kernel page migration routine replaces some pages
in the page cache with new pages, these new pages' private can be non-zero.
Signed-off-by: NZheng Yan <zheng.z.yan@intel.com>
Signed-off-by: NSage Weil <sage@inktank.com>

28c0254e

22 5月, 2012 1 次提交

libceph: fix pg_temp updates · 6bd9adbd

由 Sage Weil 提交于 5月 21, 2012

Usually, we are adding pg_temp entries or removing them. Occasionally they
update. In that case, osdmap_apply_incremental() was failing because the
rbtree entry already exists.

Fix by removing the existing entry before inserting a new one.

Fixes http://tracker.newdream.net/issues/2446Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

6bd9adbd

19 5月, 2012 2 次提交

libceph: avoid unregistering osd request when not registered · 35f9f8a0

由 Sage Weil 提交于 5月 16, 2012

There is a race between two __unregister_request() callers: the
reply path and the ceph_osdc_wait_request().  If we get a reply
*and* the timeout expires at roughly the same time, both callers
will try to unregister the request, and the second one will do bad
things.

Simply check if the request is still already unregistered; if so,
return immediately and do nothing.

Fixes http://tracker.newdream.net/issues/2420Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

35f9f8a0

ceph: add auth buf in prepare_write_connect() · 3da54776

由 Alex Elder 提交于 5月 16, 2012

Move the addition of the authorizer buffer to a connection's
out_kvec out of get_connect_authorizer() and into its caller. This
way, the caller--prepare_write_connect()--can avoid adding the
connect header to out_kvec before it has been fully initialized.

Prior to this patch, it was possible for a connect header to be
sent over the wire before the authorizer protocol or buffer length
fields were initialized. An authorizer buffer associated with that
header could also be queued to send only after the connection header
that describes it was on the wire.

Fixes http://tracker.newdream.net/issues/2424Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

3da54776

17 5月, 2012 9 次提交

ceph: rename prepare_connect_authorizer() · dac1e716

由 Alex Elder 提交于 5月 16, 2012

Change the name of prepare_connect_authorizer().  The next
patch is going to make this function no longer add anything to the
connection's out_kvec, so it will no longer fit the pattern of
the rest of the prepare_connect_*() functions.

In addition, pass the address of a variable that will hold the
authorization protocol to use.  Move the assignment of that to the
connection's out_connect structure into prepare_write_connect().
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

dac1e716

ceph: return pointer from prepare_connect_authorizer() · 729796be

由 Alex Elder 提交于 5月 16, 2012

Change prepare_connect_authorizer() so it returns a pointer (or
pointer-coded error).
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

729796be

ceph: use info returned by get_authorizer · 8f43fb53

由 Alex Elder 提交于 5月 16, 2012

Rather than passing a bunch of arguments to be filled in with the
content of the ceph_auth_handshake buffer now returned by the
get_authorizer method, just use the returned information in the
caller, and drop the unnecessary arguments.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

8f43fb53

ceph: have get_authorizer methods return pointers · a3530df3

由 Alex Elder 提交于 5月 16, 2012

Have the get_authorizer auth_client method return a ceph_auth
pointer rather than an integer, pointer-encoding any returned
error value.  This is to pave the way for making use of the
returned value in an upcoming patch.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

a3530df3

ceph: ensure auth ops are defined before use · a255651d

由 Alex Elder 提交于 5月 16, 2012

In the create_authorizer method for both the mds and osd clients,
the auth_client->ops pointer is blindly dereferenced.  There is no
obvious guarantee that this pointer has been assigned.  And
furthermore, even if the ops pointer is non-null there is definitely
no guarantee that the create_authorizer or destroy_authorizer
methods are defined.

Add checks in both routines to make sure they are defined (non-null)
before use.  Add similar checks in a few other spots in these files
while we're at it.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

a255651d

ceph: messenger: reduce args to create_authorizer · 74f1869f

由 Alex Elder 提交于 5月 16, 2012

Make use of the new ceph_auth_handshake structure in order to reduce
the number of arguments passed to the create_authorizor method in
ceph_auth_client_ops.  Use a local variable of that type as a
shorthand in the get_authorizer method definitions.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

74f1869f

ceph: define ceph_auth_handshake type · 6c4a1915

由 Alex Elder 提交于 5月 16, 2012

The definitions for the ceph_mds_session and ceph_osd both contain
five fields related only to "authorizers."  Encapsulate those fields
into their own struct type, allowing for better isolation in some
upcoming patches.

Fix the #includes in "linux/ceph/osd_client.h" to lay out their more
complete canonical path.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

6c4a1915

ceph: messenger: check return from get_authorizer · ed96af64

由 Alex Elder 提交于 5月 16, 2012

In prepare_connect_authorizer(), a connection's get_authorizer
method is called but ignores its return value.  This function can
return an error, so check for it and return it if that ever occurs.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

ed96af64

ceph: messenger: rework prepare_connect_authorizer() · b1c6b980

由 Alex Elder 提交于 5月 16, 2012

Change prepare_connect_authorizer() so it returns without dropping
the connection mutex if the connection has no get_authorizer method.

Use the symbolic CEPH_AUTH_UNKNOWN instead of 0 when assigning
authorization protocols.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

b1c6b980

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功