Commits · a53aab645c82f0146e35684b34692c69b5118121 · openanolis / cloud-kernel

31 Jul, 2012 28 commits

ceph: close old con before reopening on mds reconnect · a53aab64

Committed by Sage Weil on Jul 30, 2012

When we detect a mds session reset, close the old ceph_connection before
reopening it.  This ensures we clean up the old socket properly and keep
the ceph_connection state correct.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

libceph: (re)initialize bio_iter on start of message receive · a4107026

Committed by Sage Weil on Jul 30, 2012

Previously, we were opportunistically initializing the bio_iter if it
appeared to be uninitialized in the middle of the read path.  The problem
is that a sequence like:

 - start reading message
 - initialize bio_iter
 - read half a message
 - messenger fault, reconnect
 - restart reading message
 - ** bio_iter now non-NULL, not reinitialized **
 - read past end of bio, crash

Instead, initialize the bio_iter unconditionally when we allocate/claim
the message for read.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

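A minimal user-space sketch of the failure mode and the fix described in this commit. The struct and function names (`msg_sketch`, `lazy_init_iter`, `start_message_read`, `read_some`) are hypothetical stand-ins, and a plain pointer plays the role of bio_iter; this is not the libceph code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a message with a data read cursor. */
struct msg_sketch {
    const char *data;   /* start of the data buffer */
    size_t data_len;    /* total bytes in the buffer */
    const char *iter;   /* read cursor; NULL means "not initialized" */
};

/* Old approach: initialize lazily, only if the cursor looks unset. */
void lazy_init_iter(struct msg_sketch *m)
{
    if (m->iter == NULL)
        m->iter = m->data;
}

/* Fixed approach: reset unconditionally when a message read starts. */
void start_message_read(struct msg_sketch *m)
{
    m->iter = m->data;
}

/* Consume up to n bytes; returns how many bytes were consumed. */
size_t read_some(struct msg_sketch *m, size_t n)
{
    size_t left;

    assert(m->iter != NULL);
    left = m->data_len - (size_t)(m->iter - m->data);
    if (n > left)
        n = left;
    m->iter += n;
    return n;
}
```

With the lazy variant, a restart after a half-read message resumes from the stale cursor; the unconditional reset always starts again at the beginning of the data.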

libceph: resubmit linger ops when pg mapping changes · 6194ea89

Committed by Sage Weil on Jul 30, 2012

The linger op registration (i.e., watch) modifies the object state.  As
such, the OSD will reply with success if it has already applied without
doing the associated side-effects (setting up the watch session state).
If we lose the ACK and resubmit, we will see success but the watch will not
be correctly registered and we won't get notifies.

To fix this, always resubmit the linger op with a new tid.  We accomplish
this by re-registering as a linger (i.e., 'registered') if we are not yet
registered.  Then the second loop will treat this just like a normal
case of re-registering.

This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and
ceph bug #2796.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

libceph: fix mutex coverage for ceph_con_close · 8c50c817

Committed by Sage Weil on Jul 30, 2012

Hold the mutex while twiddling all of the state bits to avoid possible
races.  While we're here, make note of why we cannot close the socket
directly.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

libceph: report socket read/write error message · 3a140a0d

Committed by Sage Weil on Jul 30, 2012

We need to set error_msg to something useful before calling ceph_fault();
do so here for try_{read,write}().  This is more informative than

libceph: osd0 192.168.106.220:6801 (null)
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

libceph: support crush tunables · 546f04ef

Committed by Sage Weil on Jul 30, 2012

The server side recently added support for tuning some magic
crush variables. Decode these variables if they are present, or use the
default values if they are not present.

Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5.
Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

libceph: move feature bits to separate header · 1fe60e51

Committed by Sage Weil on Jul 30, 2012

This is simply cleanup that will keep things more closely synced with the
userland code.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

rbd: kill num_reply parameters · d1f57ea6

Committed by Alex Elder on Jun 26, 2012

Several functions include a num_reply parameter, but it is never
used.  Just get rid of it everywhere--it seems to be something
that never got fully implemented.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: option symbol renames · 43ae4701

Committed by Alex Elder on Jul 03, 2012

Use the name "ceph_opts" consistently (rather than just "opt") for
pointers to a ceph_options structure.

Change the few spots that don't use "rbd_opts" for a rbd_options
pointer to match the rest.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: more symbol renames · aded07ea

Committed by Alex Elder on Jul 03, 2012

Rename variables named "obj" which represent object names so they're
consistently named "object_name".

Rename the "cls" and "method" parameters in rbd_req_sync_exec()
to be "class_name" and "method_name", and make similar changes
to the names of local variables in that function representing
the lengths of those names.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: rename some fields in struct rbd_dev · 0bed54dc

Committed by Alex Elder on Jul 03, 2012

An rbd image is not a single object, but a logical construct made up
of an aggregation of objects.

Rename some fields in struct rbd_dev, in hopes of reinforcing this.
    obj         --> image_name
    obj_len     --> image_name_len
    obj_md_name --> header_name
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: use rbd_dev consistently · 0ce1a794

Committed by Alex Elder on Jul 03, 2012

Most variables that represent a struct rbd_device are named
"rbd_dev", but in some cases "dev" is used instead.  Change all the
"dev" references so they use "rbd_dev" consistently, to make it
clear from the name that we're working with an RBD device (as
opposed to, for example, a struct device).  Similarly, change the
name of the "dev" field in struct rbd_notify_info to be "rbd_dev".
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: dynamically allocate snapshot name · 820a5f3e

Committed by Alex Elder on Jul 09, 2012

There is no need to impose a small limit on the length of the snapshot
name recorded for an rbd image in a struct rbd_dev.  Remove the
limitation by allocating space for the snapshot name dynamically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: dynamically allocate image name · bf3e5ae1

Committed by Alex Elder on Jul 09, 2012

There is no need to impose a small limit on the length of the rbd image
name recorded in a struct rbd_dev.  Remove the limitation by
allocating space for the image name dynamically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: dynamically allocate image header name · cb8627c7

Committed by Alex Elder on Jul 09, 2012

There is no need to impose a small limit on the length of the header
name recorded for an rbd image in a struct rbd_dev.  Remove the
limitation by allocating space for the header name dynamically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: dynamically allocate object prefix · 849b4260

Committed by Alex Elder on Jul 09, 2012

There is no need to impose a small limit on the length of the object
prefix recorded for an rbd image in a struct rbd_image_header.
Remove the limitation by allocating space for the object prefix
dynamically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: dynamically allocate pool name · d22f76e7

Committed by Alex Elder on Jul 12, 2012

There is no need to impose a small limit on the length of the pool name
recorded for an rbd image in a struct rbd_device.  Remove the
limitation by allocating space for the pool name dynamically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: create pool_id device attribute · 9bb2f334

Committed by Alex Elder on Jul 12, 2012

Add an entry under /sys/bus/rbd/devices/<N>/ named "pool_id" that
provides the id for the pool the rbd image is associated with.  This
is in addition to the pool name already provided.

Rename the "poolid" field in struct rbd_device to be "pool_id".

Update the documentation to reflect the addition of this new entry.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: rename rbd_dev->block_name · ca1e49a6

Committed by Alex Elder on Jul 10, 2012

Each rbd image has a name that forms the basis of all data objects
backing the device.  Old (format 1) images refer to this name as the
"block name," while new (format 2) images use the term "object
prefix" for this.

Change the field name in the in-core rbd image header structure to
reflect the more modern usage.  We intentionally keep the name
"block_name" in the on-disk definition for format 1 image headers.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

rbd: define dup_token() · ea3352f4

Committed by Alex Elder on Jul 09, 2012

Define a new function dup_token(), to be used during argument
parsing for making dynamically-allocated copies of tokens being
parsed.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

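A rough user-space sketch of what such a token-duplicating helper could look like. It assumes tokens are separated by spaces, tabs, or newlines, uses `malloc()` in place of kernel allocation, and the name `dup_token_sketch` and its signature are hypothetical rather than taken from the rbd code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch.  Skips leading whitespace in *buf, copies the
 * next token into a malloc()ed NUL-terminated buffer, and advances
 * *buf past the token.  Stores the token length in *lenp if lenp is
 * non-NULL.  Returns NULL if no token remains or allocation fails.
 */
char *dup_token_sketch(const char **buf, size_t *lenp)
{
    static const char spaces[] = " \t\n";
    const char *start;
    size_t len;
    char *dup;

    assert(buf != NULL && *buf != NULL);

    start = *buf + strspn(*buf, spaces);  /* skip leading whitespace */
    len = strcspn(start, spaces);         /* length of the token */
    if (len == 0)
        return NULL;                      /* nothing left to parse */

    dup = malloc(len + 1);
    if (dup == NULL)
        return NULL;
    memcpy(dup, start, len);
    dup[len] = '\0';

    *buf = start + len;
    if (lenp != NULL)
        *lenp = len;
    return dup;
}
```

Because the copy is sized from the token itself, the caller no longer needs a fixed-size destination buffer; it does become responsible for freeing the result.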

libceph: define ceph_extract_encoded_string() · f8c36c58

Committed by Alex Elder on Jul 11, 2012

This adds a new utility routine which will return a dynamically-
allocated buffer containing a string that has been decoded from ceph
over-the-wire format.  It also returns the length of the string
if the address of a size variable is supplied to receive it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

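As a rough user-space illustration of the idea, the sketch below assumes the wire format of a string is a 32-bit little-endian length followed by that many bytes with no terminator. The name `extract_encoded_string` and its signature are hypothetical, and `malloc()` stands in for kernel allocation; this is not the kernel routine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch.  On success, returns a malloc()ed
 * NUL-terminated copy of the encoded string, advances *p past it,
 * and stores the length in *lenp if lenp is non-NULL.  Returns NULL
 * if the buffer is too short or allocation fails; *p is then left
 * unchanged.
 */
char *extract_encoded_string(const unsigned char **p,
                             const unsigned char *end,
                             size_t *lenp)
{
    const unsigned char *cur;
    uint32_t len;
    char *buf;

    assert(p != NULL && *p != NULL && end != NULL);
    cur = *p;

    /* Four bytes are needed for the length prefix. */
    if (cur > end || (size_t)(end - cur) < sizeof(len))
        return NULL;
    len = (uint32_t)cur[0] | ((uint32_t)cur[1] << 8) |
          ((uint32_t)cur[2] << 16) | ((uint32_t)cur[3] << 24);
    cur += sizeof(len);

    /* The payload must fit in what remains of the buffer. */
    if ((size_t)(end - cur) < len)
        return NULL;

    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, cur, len);
    buf[len] = '\0';

    *p = cur + len;
    if (lenp != NULL)
        *lenp = len;
    return buf;
}
```

Passing NULL for `lenp` corresponds to the case where the caller does not supply a size variable.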

rbd: drop a useless local variable · ad4f232f

Committed by Alex Elder on Jul 03, 2012

In rbd_req_sync_notify_ack(), a local variable was needlessly being
used to hold a null pointer.  Just pass NULL instead.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

libceph: fix off-by-one bug in ceph_encode_filepath() · c61a1abd

Committed by Alex Elder on Jul 03, 2012

There is a BUG_ON() call that doesn't account for the single byte
structure version at the start of an encoded filepath in
ceph_encode_filepath().  Fix that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

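The effect of the missing byte can be illustrated with a small size calculation. This is a sketch, not the kernel code: it assumes an encoded filepath consists of the one-byte structure version mentioned in the commit message, followed by a 64-bit inode number, a 32-bit length, and the path bytes. The layout beyond the version byte and the helper names `encoded_filepath_size` and `filepath_fits` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to encode a filepath under the assumed layout. */
size_t encoded_filepath_size(const char *path)
{
    size_t len = path ? strlen(path) : 0;

    /* version byte + inode number + length field + path bytes */
    return 1 + sizeof(uint64_t) + sizeof(uint32_t) + len;
}

/* Nonzero if the encoding fits in the buffer [p, end). */
int filepath_fits(const unsigned char *p, const unsigned char *end,
                  const char *path)
{
    assert(p != NULL && end != NULL && p <= end);
    return encoded_filepath_size(path) <= (size_t)(end - p);
}
```

Under these assumptions the path "abc" needs 16 bytes. A check that omits the leading version byte would accept a 15-byte buffer and let the encoder write one byte past its end.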

ceph: clean up useless d_parent checks · 8842b3be

Committed by Sage Weil on Jun 07, 2012

d_parent is never NULL, and IS_ROOT() is the proper way to check for a
(non-self-referential) parent.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: prevent the race of incoming work during teardown · a2a32584

Committed by Guanjun He on Jul 08, 2012

Add an atomic variable 'stopping' as a flag in struct ceph_messenger,
set this flag to 1 in ceph_destroy_client(), and test the flag in
ceph_data_ready(); if it is set (1), just return.
Signed-off-by: Guanjun He <gjhe@suse.com>
Reviewed-by: Sage Weil <sage@inktank.com>


libceph: fix messenger retry · a16cb1f7

Committed by Sage Weil on Jul 10, 2012

In ancient times, the messenger could both initiate and accept connections.
An artifact of that was data structures to store/process an incoming
ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
Sadly, the negotiation code was referencing those structures and ignoring
important information (like the peer's connect_seq) from the correct ones.

Among other things, this fixes tight reconnect loops where the server sends
RETRY_SESSION and we (the client) retry with the same connect_seq as last
time. This bug is pretty easily triggered by injecting socket failures on the
MDS and running some fs workload like workunits/direct_io/test_sync_io.
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: initialize rb, list nodes in ceph_osd_request · cd43045c

Committed by Sage Weil on Jul 09, 2012

These don't strictly need to be initialized based on how they are used, but
it is good practice to do so.
Reported-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: initialize msgpool message types · d50b409f

Committed by Sage Weil on Jul 09, 2012

Initialize the type field for messages in a msgpool.  The caller was doing
this for osd ops, but not for the reply messages.
Reported-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

06 Jul, 2012 12 commits

libceph: allow sock transition from CONNECTING to CLOSED · fbb85a47

Committed by Sage Weil on Jun 27, 2012

It is possible to close a socket that is in the OPENING state.  For
example, it can happen if ceph_con_close() is called on the con before
the TCP connection is established.  con_work() will come around and shut
down the socket.
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: initialize mon_client con only once · 735a72ef

Committed by Sage Weil on Jun 27, 2012

Do not re-initialize the con on every connection attempt. When we
ceph_con_close, there may still be work queued on the socket (e.g., to
close it), and re-initializing will clobber the work_struct state.
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: set peer name on con_open, not init · b7a9e5dd

Committed by Sage Weil on Jun 27, 2012

The peer name may change on each open attempt, even when the connection is
reused.
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: drop declaration of ceph_con_get() · 26103021

Committed by Alex Elder on Jun 21, 2012

For some reason the declaration of ceph_con_get() and
ceph_con_put() did not get deleted in this commit:
    d59315ca libceph: drop ceph_con_get/put helpers and nref member

Clean that up.
Signed-off-by: Alex Elder <elder@inktank.com>

libceph: add some fine ASCII art · bc18f4b1

Committed by Alex Elder on Jun 20, 2012

Sage liked the state diagram I put in my commit description so
I'm putting it in with the code.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: small changes to messenger.c · 5821bd8c

Committed by Alex Elder on Jun 11, 2012

This patch gathers a few small changes in "net/ceph/messenger.c":
  out_msg_pos_next()
    - small logic change that mostly affects indentation
  write_partial_msg_pages()
    - use a local variable trail_off to represent the offset into
      a message of the trail portion of the data (if present)
    - once we are in the trail portion we will always be there, so we
      don't always need to check against our data position
    - avoid computing len twice after we've reached the trail
    - get rid of the variable tmpcrc, which is not needed
    - trail_off and trail_len never change so mark them const
    - update some comments
  read_partial_message_bio()
    - bio_iovec_idx() will never return an error, so don't bother
      checking for it
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: distinguish two phases of connect sequence · 7593af92

Committed by Alex Elder on May 24, 2012

Currently a ceph connection enters a "CONNECTING" state when it
begins the process of (re-)connecting with its peer.  Once the two
ends have successfully exchanged their banner and addresses, an
additional NEGOTIATING bit is set in the ceph connection's state to
indicate the connection information exchange has begun.  The
CONNECTING bit/state continues to be set during this phase.

Rather than have the CONNECTING state continue while the NEGOTIATING
bit is set, interpret these two phases as distinct states.  In other
words, when NEGOTIATING is set, clear CONNECTING.  That way only
one of them will be active at a time.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

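The change in state handling can be sketched with two bit flags. The macro and function names below (`SK_CONNECTING`, `SK_NEGOTIATING`, `begin_connect`, `begin_negotiation`) are hypothetical and the bit values are arbitrary; the sketch only shows the "set one, clear the other" rule from the commit message, not the messenger's actual state machine.

```c
#include <assert.h>

/* Hypothetical state bits; the values are chosen for illustration. */
#define SK_CONNECTING  0x1u
#define SK_NEGOTIATING 0x2u

/* Start (or restart) a connection attempt. */
unsigned int begin_connect(unsigned int state)
{
    state &= ~SK_NEGOTIATING;
    state |= SK_CONNECTING;
    return state;
}

/*
 * Banner and addresses have been exchanged: enter the negotiation
 * phase and leave the connecting phase, so only one bit is set.
 */
unsigned int begin_negotiation(unsigned int state)
{
    assert(state & SK_CONNECTING);
    state &= ~SK_CONNECTING;
    state |= SK_NEGOTIATING;
    return state;
}
```

Keeping the two bits mutually exclusive means a single test of either bit identifies the phase, instead of having to check combinations.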

libceph: separate banner and connect writes · ab166d5a

Committed by Alex Elder on May 31, 2012

There are two phases in the process of linking together the two ends
of a ceph connection.  The first involves exchanging a banner and
IP addresses, and if that is successful a second phase exchanges
some detail about each side's connection capabilities.

When initiating a connection, the client side now queues to send
its information for both phases of this process at the same time.
This is probably a bit more efficient, but it is slightly messier
from a layering perspective in the code.

So rearrange things so that the client doesn't send the connection
information until it has received and processed the response in the
initial banner phase (in process_banner()).

Move the code (in the (con->sock == NULL) case in try_write()) that
prepares for writing the connection information, delaying doing that
until the banner exchange has completed.  Move the code that begins
the transition to this second "NEGOTIATING" phase out of
process_banner() and into its caller, so preparing to write the
connection information and preparing to read the response are
adjacent to each other.

Finally, preparing to write the connection information now requires
the output kvec to be reset in all cases, so move that into the
prepare_write_connect() and delete it from all callers.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: define and use an explicit CONNECTED state · e27947c7

Committed by Alex Elder on May 23, 2012

There is no state explicitly defined when a ceph connection is fully
operational.  So define one.

It's set when the connection sequence completes successfully, and is
cleared when the connection gets closed.

Be a little more careful when examining the old state when a socket
disconnect event is reported.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: clear NEGOTIATING when done · 3ec50d18

Committed by Alex Elder on May 23, 2012

A connection state's NEGOTIATING bit gets set while in CONNECTING
state after we have successfully exchanged a ceph banner and IP
addresses with the connection's peer (the server).  But that bit
is not cleared again--at least not until another connection attempt
is initiated.

Instead, clear it as soon as the connection is fully established.
Also, clear it when a socket connection gets prematurely closed
in the midst of establishing a ceph connection (in case we had
reached the point where it was set).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: clear CONNECTING in ceph_con_close() · bb9e6bba

Committed by Alex Elder on Jun 20, 2012

A connection that is closed will no longer be connecting.  So
clear the CONNECTING state bit in ceph_con_close().  Similarly,
if the socket has been closed we no longer are in connecting
state (a new connect sequence will need to be initiated).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: don't touch con state in con_close_socket() · 456ea468

Committed by Alex Elder on Jun 20, 2012

In con_close_socket(), a connection's SOCK_CLOSED flag gets set and
then cleared while its shutdown method is called and its reference
gets dropped.

Previously, that flag got set only if it had not already been set,
so setting it in con_close_socket() might have prevented additional
processing being done on a socket being shut down.  We no longer set
SOCK_CLOSED in the socket event routine conditionally, so setting
that bit here no longer provides whatever benefit it might have
provided before.

A race condition could still leave the SOCK_CLOSED bit set even
after we've issued the call to con_close_socket(), so we still clear
that bit after shutting the socket down.  Add a comment explaining
the reason for this.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
