提交 · a59b55a602b6c741052d79c1e3643f8440cddd27 · openanolis / cloud-kernel

31 7月, 2012 10 次提交

libceph: move ceph_con_send() closed check under the con mutex · a59b55a6

由 Sage Weil 提交于 7月 20, 2012

Take the con mutex before checking whether the connection is closed to
avoid racing with someone else closing it.
Signed-off-by: NSage Weil <sage@inktank.com>

a59b55a6

libceph: move msgr clear_standby under con mutex protection · 00650931

由 Sage Weil 提交于 7月 20, 2012

Avoid dropping and retaking con->mutex in the ceph_con_send() case by
leaving locking up to the caller.
Signed-off-by: NSage Weil <sage@inktank.com>

00650931

libceph: fix fault locking; close socket on lossy fault · 3b5ede07

由 Sage Weil 提交于 7月 20, 2012

If we fault on a lossy connection, we should still close the socket
immediately, and do so under the con mutex.

We should also take the con mutex before printing out the state bits in
the debug output.
Signed-off-by: NSage Weil <sage@inktank.com>

3b5ede07

libceph: reset connection retry on successfully negotiation · 85effe18

由 Sage Weil 提交于 7月 30, 2012

We exponentially back off when we encounter connection errors.  If several
errors accumulate, we will eventually wait ages before even trying to
reconnect.

Fix this by resetting the backoff counter after a successful negotiation/
connection with the remote node.  Fixes ceph issue #2802.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

85effe18

libceph: protect ceph_con_open() with mutex · 5469155f

由 Sage Weil 提交于 7月 30, 2012

Take the con mutex while we are initiating a ceph open.  This is necessary
because the may have previously been in use and then closed, which could
result in a racing workqueue running con_work().
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

5469155f

libceph: (re)initialize bio_iter on start of message receive · a4107026

由 Sage Weil 提交于 7月 30, 2012

Previously, we were opportunistically initializing the bio_iter if it
appeared to be uninitialized in the middle of the read path.  The problem
is that a sequence like:

 - start reading message
 - initialize bio_iter
 - read half a message
 - messenger fault, reconnect
 - restart reading message
 - ** bio_iter now non-NULL, not reinitialized **
 - read past end of bio, crash

Instead, initialize the bio_iter unconditionally when we allocate/claim
the message for read.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>

a4107026

libceph: fix mutex coverage for ceph_con_close · 8c50c817

由 Sage Weil 提交于 7月 30, 2012

Hold the mutex while twiddling all of the state bits to avoid possible
races.  While we're here, make not of why we cannot close the socket
directly.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>

8c50c817

libceph: report socket read/write error message · 3a140a0d

由 Sage Weil 提交于 7月 30, 2012

We need to set error_msg to something useful before calling ceph_fault();
do so here for try_{read,write}().  This is more informative than

libceph: osd0 192.168.106.220:6801 (null)
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>

3a140a0d

libceph: prevent the race of incoming work during teardown · a2a32584

由 Guanjun He 提交于 7月 08, 2012

Add an atomic variable 'stopping' as flag in struct ceph_messenger,
set this flag to 1 in function ceph_destroy_client(), and add the condition code
in function ceph_data_ready() to test the flag value, if true(1), just return.
Signed-off-by: NGuanjun He <gjhe@suse.com>
Reviewed-by: NSage Weil <sage@inktank.com>

a2a32584

libceph: fix messenger retry · a16cb1f7

由 Sage Weil 提交于 7月 10, 2012

In ancient times, the messenger could both initiate and accept connections.
An artifact if that was data structures to store/process an incoming
ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
Sadly, the negotiation code was referencing those structures and ignoring
important information (like the peer's connect_seq) from the correct ones.

Among other things, this fixes tight reconnect loops where the server sends
RETRY_SESSION and we (the client) retries with the same connect_seq as last
time. This bug pretty easily triggered by injecting socket failures on the
MDS and running some fs workload like workunits/direct_io/test_sync_io.
Signed-off-by: NSage Weil <sage@inktank.com>

a16cb1f7

06 7月, 2012 19 次提交

libceph: allow sock transition from CONNECTING to CLOSED · fbb85a47

由 Sage Weil 提交于 6月 27, 2012

It is possible to close a socket that is in the OPENING state.  For
example, it can happen if ceph_con_close() is called on the con before
the TCP connection is established.  con_work() will come around and shut
down the socket.
Signed-off-by: NSage Weil <sage@inktank.com>

fbb85a47

libceph: set peer name on con_open, not init · b7a9e5dd

由 Sage Weil 提交于 6月 27, 2012

The peer name may change on each open attempt, even when the connection is
reused.
Signed-off-by: NSage Weil <sage@inktank.com>

b7a9e5dd

libceph: add some fine ASCII art · bc18f4b1

由 Alex Elder 提交于 6月 20, 2012

Sage liked the state diagram I put in my commit description so
I'm putting it in with the code.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

bc18f4b1

libceph: small changes to messenger.c · 5821bd8c

由 Alex Elder 提交于 6月 11, 2012

This patch gathers a few small changes in "net/ceph/messenger.c":
  out_msg_pos_next()
    - small logic change that mostly affects indentation
  write_partial_msg_pages().
    - use a local variable trail_off to represent the offset into
      a message of the trail portion of the data (if present)
    - once we are in the trail portion we will always be there, so we
      don't always need to check against our data position
    - avoid computing len twice after we've reached the trail
    - get rid of the variable tmpcrc, which is not needed
    - trail_off and trail_len never change so mark them const
    - update some comments
  read_partial_message_bio()
    - bio_iovec_idx() will never return an error, so don't bother
      checking for it
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

5821bd8c

libceph: distinguish two phases of connect sequence · 7593af92

由 Alex Elder 提交于 5月 24, 2012

Currently a ceph connection enters a "CONNECTING" state when it
begins the process of (re-)connecting with its peer.  Once the two
ends have successfully exchanged their banner and addresses, an
additional NEGOTIATING bit is set in the ceph connection's state to
indicate the connection information exhange has begun.  The
CONNECTING bit/state continues to be set during this phase.

Rather than have the CONNECTING state continue while the NEGOTIATING
bit is set, interpret these two phases as distinct states.  In other
words, when NEGOTIATING is set, clear CONNECTING.  That way only
one of them will be active at a time.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

7593af92

libceph: separate banner and connect writes · ab166d5a

由 Alex Elder 提交于 5月 31, 2012

There are two phases in the process of linking together the two ends
of a ceph connection.  The first involves exchanging a banner and
IP addresses, and if that is successful a second phase exchanges
some detail about each side's connection capabilities.

When initiating a connection, the client side now queues to send
its information for both phases of this process at the same time.
This is probably a bit more efficient, but it is slightly messier
from a layering perspective in the code.

So rearrange things so that the client doesn't send the connection
information until it has received and processed the response in the
initial banner phase (in process_banner()).

Move the code (in the (con->sock == NULL) case in try_write()) that
prepares for writing the connection information, delaying doing that
until the banner exchange has completed.  Move the code that begins
the transition to this second "NEGOTIATING" phase out of
process_banner() and into its caller, so preparing to write the
connection information and preparing to read the response are
adjacent to each other.

Finally, preparing to write the connection information now requires
the output kvec to be reset in all cases, so move that into the
prepare_write_connect() and delete it from all callers.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

ab166d5a

libceph: define and use an explicit CONNECTED state · e27947c7

由 Alex Elder 提交于 5月 23, 2012

There is no state explicitly defined when a ceph connection is fully
operational.  So define one.

It's set when the connection sequence completes successfully, and is
cleared when the connection gets closed.

Be a little more careful when examining the old state when a socket
disconnect event is reported.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

e27947c7

libceph: clear NEGOTIATING when done · 3ec50d18

由 Alex Elder 提交于 5月 23, 2012

A connection state's NEGOTIATING bit gets set while in CONNECTING
state after we have successfully exchanged a ceph banner and IP
addresses with the connection's peer (the server).  But that bit
is not cleared again--at least not until another connection attempt
is initiated.

Instead, clear it as soon as the connection is fully established.
Also, clear it when a socket connection gets prematurely closed
in the midst of establishing a ceph connection (in case we had
reached the point where it was set).
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

3ec50d18

libceph: clear CONNECTING in ceph_con_close() · bb9e6bba

由 Alex Elder 提交于 6月 20, 2012

A connection that is closed will no longer be connecting.  So
clear the CONNECTING state bit in ceph_con_close().  Similarly,
if the socket has been closed we no longer are in connecting
state (a new connect sequence will need to be initiated).
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

bb9e6bba

libceph: don't touch con state in con_close_socket() · 456ea468

由 Alex Elder 提交于 6月 20, 2012

In con_close_socket(), a connection's SOCK_CLOSED flag gets set and
then cleared while its shutdown method is called and its reference
gets dropped.

Previously, that flag got set only if it had not already been set,
so setting it in con_close_socket() might have prevented additional
processing being done on a socket being shut down.  We no longer set
SOCK_CLOSED in the socket event routine conditionally, so setting
that bit here no longer provides whatever benefit it might have
provided before.

A race condition could still leave the SOCK_CLOSED bit set even
after we've issued the call to con_close_socket(), so we still clear
that bit after shutting the socket down.  Add a comment explaining
the reason for this.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

456ea468

libceph: just set SOCK_CLOSED when state changes · d65c9e0b

由 Alex Elder 提交于 6月 20, 2012

When a TCP_CLOSE or TCP_CLOSE_WAIT event occurs, the SOCK_CLOSED
connection flag bit is set, and if it had not been previously set
queue_con() is called to ensure con_work() will get a chance to
handle the changed state.

con_work() atomically checks--and if set, clears--the SOCK_CLOSED
bit if it was set.  This means that even if the bit were set
repeatedly, the related processing in con_work() only gets called
once per transition of the bit from 0 to 1.

What's important then is that we ensure con_work() gets called *at
least* once when a socket close event occurs, not that it gets
called *exactly* once.

The work queue mechanism already takes care of queueing work
only if it is not already queued, so there's no need for us
to call queue_con() conditionally.

So this patch just makes it so the SOCK_CLOSED flag gets set
unconditionally in ceph_sock_state_change().
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

d65c9e0b

libceph: don't change socket state on sock event · 188048bc

由 Alex Elder 提交于 6月 20, 2012

Currently the socket state change event handler records an error
message on a connection to distinguish a close while connecting from
a close while a connection was already established.

Changing connection information during handling of a socket event is
not very clean, so instead move this assignment inside con_work(),
where it can be done during normal connection-level processing (and
under protection of the connection mutex as well).

Move the handling of a socket closed event up to the top of the
processing loop in con_work(); there's no point in handling backoff
etc. if we have a newly-closed socket to take care of.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

188048bc

libceph: SOCK_CLOSED is a flag, not a state · a8d00e3c

由 Alex Elder 提交于 6月 20, 2012

The following commit changed it so SOCK_CLOSED bit was stored in
a connection's new "flags" field rather than its "state" field.

    libceph: start separating connection flags from state
    commit 928443cd

That bit is used in con_close_socket() to protect against setting an
error message more than once in the socket event handler function.

Unfortunately, the field being operated on in that function was not
updated to be "flags" as it should have been.  This fixes that
error.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

a8d00e3c

libceph: don't use bio_iter as a flag · abdaa6a8

由 Alex Elder 提交于 6月 11, 2012

Recently a bug was fixed in which the bio_iter field in a ceph
message was not being properly re-initialized when a message got
re-transmitted:
    commit 43643528
    Author: Yan, Zheng <zheng.z.yan@intel.com>
    rbd: Clear ceph_msg->bio_iter for retransmitted message

We are now only initializing the bio_iter field when we are about to
start to write message data (in prepare_write_message_data()),
rather than every time we are attempting to write any portion of the
message data (in write_partial_msg_pages()).  This means we no
longer need to use the msg->bio_iter field as a flag.

So just don't do that any more.  Trust prepare_write_message_data()
to ensure msg->bio_iter is properly initialized, every time we are
about to begin writing (or re-writing) a message's bio data.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

abdaa6a8

libceph: move init of bio_iter · 572c588e

由 Alex Elder 提交于 6月 11, 2012

If a message has a non-null bio pointer, its bio_iter field is
initialized in write_partial_msg_pages() if this has not been done
already.  This is really a one-time setup operation for sending a
message's (bio) data, so move that initialization code into
prepare_write_message_data() which serves that purpose.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

572c588e

libceph: move init_bio_*() functions up · df6ad1f9

由 Alex Elder 提交于 6月 11, 2012

Move init_bio_iter() and iter_bio_next() up in their source file so
the'll be defined before they're needed.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

df6ad1f9

libceph: don't mark footer complete before it is · fd154f3c

由 Alex Elder 提交于 6月 11, 2012

This is a nit, but prepare_write_message() sets the FOOTER_COMPLETE
flag before the CRC for the data portion (recorded in the footer)
has been completely computed.  Hold off setting the complete flag
until we've decided it's ready to send.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

fd154f3c

libceph: encapsulate advancing msg page · 84ca8fc8

由 Alex Elder 提交于 6月 11, 2012

In write_partial_msg_pages(), once all the data from a page has been
sent we advance to the next one.  Put the code that takes care of
this into its own function.

While modifying write_partial_msg_pages(), make its local variable
"in_trail" be Boolean, and use the local variable "msg" (which is
just the connection's current out_msg pointer) consistently.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

84ca8fc8

libceph: encapsulate out message data setup · 739c905b

由 Alex Elder 提交于 6月 11, 2012

Move the code that prepares to write the data portion of a message
into its own function.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

739c905b

22 6月, 2012 2 次提交

libceph: drop ceph_con_get/put helpers and nref member · d59315ca

由 Sage Weil 提交于 6月 21, 2012

These are no longer used.  Every ceph_connection instance is embedded in
another structure, and refcounts manipulated via the get/put ops.
Signed-off-by: NSage Weil <sage@inktank.com>

d59315ca

libceph: use con get/put methods · 36eb71aa

由 Sage Weil 提交于 6月 21, 2012

The ceph_con_get/put() helpers manipulate the embedded con ref
count, which isn't used now that ceph_connections are embedded in
other structures.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

36eb71aa

19 6月, 2012 1 次提交

libceph: fix NULL dereference in reset_connection() · 26ce1719

由 Dan Carpenter 提交于 6月 19, 2012

We dereference "con->in_msg" on the line after it was set to NULL.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

26ce1719

16 6月, 2012 1 次提交

libceph: transition socket state prior to actual connect · 89a86be0

由 Sage Weil 提交于 6月 09, 2012

Once we call ->connect(), we are racing against the actual
connection, and a subsequent transition from CONNECTING ->
CONNECTED.  Set the state to CONNECTING before that, under the
protection of the mutex, to avoid the race.

This was introduced in 928443cd,
with the original socket state code.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

89a86be0

07 6月, 2012 1 次提交

rbd: Clear ceph_msg->bio_iter for retransmitted message · 43643528

由 Yan, Zheng 提交于 6月 06, 2012

The bug can cause NULL pointer dereference in write_partial_msg_pages
Signed-off-by: NZheng Yan <zheng.z.yan@intel.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

43643528

06 6月, 2012 6 次提交

libceph: make ceph_con_revoke_message() a msg op · 8921d114

由 Alex Elder 提交于 6月 01, 2012

ceph_con_revoke_message() is passed both a message and a ceph
connection.  A ceph_msg allocated for incoming messages on a
connection always has a pointer to that connection, so there's no
need to provide the connection when revoking such a message.

Note that the existing logic does not preclude the message supplied
being a null/bogus message pointer.  The only user of this interface
is the OSD client, and the only value an osd client passes is a
request's r_reply field.  That is always non-null (except briefly in
an error path in ceph_osdc_alloc_request(), and that drops the
only reference so the request won't ever have a reply to revoke).
So we can safely assume the passed-in message is non-null, but add a
BUG_ON() to make it very obvious we are imposing this restriction.

Rename the function ceph_msg_revoke_incoming() to reflect that it is
really an operation on an incoming message.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

8921d114

libceph: make ceph_con_revoke() a msg operation · 6740a845

由 Alex Elder 提交于 6月 01, 2012

ceph_con_revoke() is passed both a message and a ceph connection.
Now that any message associated with a connection holds a pointer
to that connection, there's no need to provide the connection when
revoking a message.

This has the added benefit of precluding the possibility of the
providing the wrong connection pointer.  If the message's connection
pointer is null, it is not being tracked by any connection, so
revoking it is a no-op.  This is supported as a convenience for
upper layers, so they can revoke a message that is not actually
"in flight."

Rename the function ceph_msg_revoke() to reflect that it is really
an operation on a message, not a connection.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

6740a845

libceph: have messages take a connection reference · 92ce034b

由 Alex Elder 提交于 6月 04, 2012

There are essentially two types of ceph messages: incoming and
outgoing.  Outgoing messages are always allocated via ceph_msg_new(),
and at the time of their allocation they are not associated with any
particular connection.  Incoming messages are always allocated via
ceph_con_in_msg_alloc(), and they are initially associated with the
connection from which incoming data will be placed into the message.

When an outgoing message gets sent, it becomes associated with a
connection and remains that way until the message is successfully
sent.  The association of an incoming message goes away at the point
it is sent to an upper layer via a con->ops->dispatch method.

This patch implements reference counting for all ceph messages, such
that every message holds a reference (and a pointer) to a connection
if and only if it is associated with that connection (as described
above).


For background, here is an explanation of the ceph message
lifecycle, emphasizing when an association exists between a message
and a connection.

Outgoing Messages
An outgoing message is "owned" by its allocator, from the time it is
allocated in ceph_msg_new() up to the point it gets queued for
sending in ceph_con_send().  Prior to that point the message's
msg->con pointer is null; at the point it is queued for sending its
message pointer is assigned to refer to the connection.  At that
time the message is inserted into a connection's out_queue list.

When a message on the out_queue list has been sent to the socket
layer to be put on the wire, it is transferred out of that list and
into the connection's out_sent list.  At that point it is still owned
by the connection, and will remain so until an acknowledgement is
received from the recipient that indicates the message was
successfully transferred.  When such an acknowledgement is received
(in process_ack()), the message is removed from its list (in
ceph_msg_remove()), at which point it is no longer associated with
the connection.

So basically, any time a message is on one of a connection's lists,
it is associated with that connection.  Reference counting outgoing
messages can thus be done at the points a message is added to the
out_queue (in ceph_con_send()) and the point it is removed from
either its two lists (in ceph_msg_remove())--at which point its
connection pointer becomes null.

Incoming Messages
When an incoming message on a connection is getting read (in
read_partial_message()) and there is no message in con->in_msg,
a new one is allocated using ceph_con_in_msg_alloc().  At that
point the message is associated with the connection.  Once that
message has been completely and successfully read, it is passed to
upper layer code using the connection's con->ops->dispatch method.
At that point the association between the message and the connection
no longer exists.

Reference counting of connections for incoming messages can be done
by taking a reference to the connection when the message gets
allocated, and releasing that reference when it gets handed off
using the dispatch method.

We should never fail to get a connection reference for a
message--the since the caller should already hold one.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

92ce034b

libceph: have messages point to their connection · 38941f80

由 Alex Elder 提交于 6月 01, 2012

When a ceph message is queued for sending it is placed on a list of
pending messages (ceph_connection->out_queue).  When they are
actually sent over the wire, they are moved from that list to
another (ceph_connection->out_sent).  When acknowledgement for the
message is received, it is removed from the sent messages list.

During that entire time the message is "in the possession" of a
single ceph connection.  Keep track of that connection in the
message.  This will be used in the next patch (and is a helpful
bit of information for debugging anyway).
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

38941f80

libceph: tweak ceph_alloc_msg() · 1c20f2d2

由 Alex Elder 提交于 6月 04, 2012

The function ceph_alloc_msg() is only used to allocate a message
that will be assigned to a connection's in_msg pointer.  Rename the
function so this implied usage is more clear.

In addition, make that assignment inside the function (again, since
that's precisely what it's intended to be used for).  This allows us
to return what is now provided via the passed-in address of a "skip"
variable.  The return type is now Boolean to be explicit that there
are only two possible outcomes.

Make sure the result of an ->alloc_msg method call always sets the
value of *skip properly.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

1c20f2d2

libceph: fully initialize connection in con_init() · 1bfd89f4

由 Alex Elder 提交于 5月 26, 2012

Move the initialization of a ceph connection's private pointer,
operations vector pointer, and peer name information into
ceph_con_init().  Rearrange the arguments so the connection pointer
is first.  Hide the byte-swapping of the peer entity number inside
ceph_con_init()
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

1bfd89f4

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功