提交 · ca8b3a69174b04376722672d7dd6b666a7f17c50 · openeuler / Kernel

02 5月, 2013 28 次提交

libceph: implement multiple data items in a message · ca8b3a69

由 Alex Elder 提交于 4月 05, 2013

This patch adds support to the messenger for more than one data item
in its data list.

A message data cursor has two more fields to support this:
    - a count of the number of bytes left to be consumed across
      all data items in the list, "total_resid"
    - a pointer to the head of the list (for validation only)

The cursor initialization routine has been split into two parts: the
outer one, which initializes the cursor for traversing the entire
list of data items; and the inner one, which initializes the cursor
to start processing a single data item.

When a message cursor is first initialized, the outer initialization
routine sets total_resid to the length provided.  The data pointer
is initialized to the first data item on the list.  From there, the
inner initialization routine finishes by setting up to process the
data item the cursor points to.

Advancing the cursor consumes bytes in total_resid.  If the resid
field reaches zero, it means the current data item is fully
consumed.  If total_resid indicates there is more data, the cursor
is advanced to point to the next data item, and then the inner
initialization routine prepares for using that.  (A check is made at
this point to make sure we don't wrap around the front of the list.)

The type-specific init routines are modified so they can be given a
length that's larger than what the data item can support.  The resid
field is initialized to the smaller of the provided length and the
length of the entire data item.

When total_resid reaches zero, we're done.

This resolves:
    http://tracker.ceph.com/issues/3761Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

ca8b3a69

libceph: replace message data pointer with list · 5240d9f9

由 Alex Elder 提交于 3月 14, 2013

In place of the message data pointer, use a list head which links
through message data items.  For now we only support a single entry
on that list.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

5240d9f9

libceph: have cursor point to data · 8ae4f4f5

由 Alex Elder 提交于 3月 14, 2013

Rather than having a ceph message data item point to the cursor it's
associated with, have the cursor point to a data item.  This will
allow a message cursor to be used for more than one data item.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

8ae4f4f5

libceph: move cursor into message · 36153ec9

由 Alex Elder 提交于 3月 14, 2013

A message will only be processing a single data item at a time, so
there's no need for each data item to have its own cursor.

Move the cursor embedded in the message data structure into the
message itself.  To minimize the impact, keep the data->cursor
field, but make it be a pointer to the cursor in the message.

Move the definition of ceph_msg_data above ceph_msg_data_cursor so
the cursor can point to the data without a forward definition rather
than vice-versa.

This and the upcoming patches are part of:
    http://tracker.ceph.com/issues/3761Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

36153ec9

libceph: record bio length · c851c495

由 Alex Elder 提交于 4月 05, 2013

The bio is the only data item type that doesn't record its full
length.  Fix that.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

c851c495

libceph: fix possible CONFIG_BLOCK build problem · ea96571f

由 Alex Elder 提交于 4月 05, 2013

This patch:
    15a0d7b libceph: record message data length
did not enclose some bio-specific code inside CONFIG_BLOCK as
it should have.  Fix that.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

ea96571f

libceph: record message data length · a1930804

由 Alex Elder 提交于 3月 14, 2013

Keep track of the length of the data portion for a message in a
separate field in the ceph_msg structure.  This information has
been maintained in wire byte order in the message header, but
that's going to change soon.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

a1930804

libceph: make message data be a pointer · 6644ed7b

由 Alex Elder 提交于 3月 11, 2013

Begin the transition from a single message data item to a list of
them by replacing the "data" structure in a message with a pointer
to a ceph_msg_data structure.

A null pointer will indicate the message has no data; replace the
use of ceph_msg_has_data() with a simple check for a null pointer.

Create functions ceph_msg_data_create() and ceph_msg_data_destroy()
to dynamically allocate and free a data item structure of a given type.

When a message has its data item "set," allocate one of these to
hold the data description, and free it when the last reference to
the message is dropped.

This partially resolves:
    http://tracker.ceph.com/issues/4429Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

6644ed7b

libceph: kill last of ceph_msg_pos · f5db90bc

由 Alex Elder 提交于 3月 11, 2013

The only remaining field in the ceph_msg_pos structure is
did_page_crc.  In the new cursor model of things that flag (or
something like it) belongs in the cursor.

Define a new field "need_crc" in the cursor (which applies to all
types of data) and initialize it to true whenever a cursor is
initialized.

In write_partial_message_data(), the data CRC still will be computed
as before, but it will check the cursor->need_crc field to determine
whether it's needed.  Any time the cursor is advanced to a new piece
of a data item, need_crc will be set, and this will cause the crc
for that entire piece to be accumulated into the data crc.

In write_partial_message_data() the intermediate crc value is now
held in a local variable so it doesn't have to be byte-swapped so
many times.  In read_partial_msg_data() we do something similar
(but mainly for consistency there).

With that, the ceph_msg_pos structure can go away,  and it no longer
needs to be passed as an argument to prepare_message_data().

This cleanup is related to:
    http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

f5db90bc

libceph: kill most of ceph_msg_pos · 859a35d5

由 Alex Elder 提交于 3月 11, 2013

All but one of the fields in the ceph_msg_pos structure are now
never used (only assigned), so get rid of them.  This allows
several small blocks of code to go away.

This is cleanup of old code related to:
    http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

859a35d5

libceph: collapse all data items into one · 4c59b4a2

由 Alex Elder 提交于 3月 11, 2013

It turns out that only one of the data item types is ever used at
any one time in a single message (currently).
    - A page array is used by the osd client (on behalf of the file
      system) and by rbd.  Only one osd op (and therefore at most
      one data item) is ever used at a time by rbd.  And the only
      time the file system sends two, the second op contains no
      data.
    - A bio is only used by the rbd client (and again, only one
      data item per message)
    - A page list is used by the file system and by rbd for outgoing
      data, but only one op (and one data item) at a time.

We can therefore collapse all three of our data item fields into a
single field "data", and depend on the messenger code to properly
handle it based on its type.

This allows us to eliminate quite a bit of duplicated code.

This is related to:
    http://tracker.ceph.com/issues/4429Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

4c59b4a2

libceph: kill ceph message bio_iter, bio_seg · 6518be47

由 Alex Elder 提交于 3月 11, 2013

The bio_iter and bio_seg fields in a message are no longer used, we
use the cursor instead.  So get rid of them and the functions that
operate on them them.

This is related to:
    http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

6518be47

libceph: record residual bytes for all message data types · 25aff7c5

由 Alex Elder 提交于 3月 11, 2013

All of the data types can use this, not just the page array.  Until
now, only the bio type doesn't have it available, and only the
initiator of the request (the rbd client) is able to supply the
length of the full request without re-scanning the bio list.  Change
the cursor init routines so the length is supplied based on the
message header "data_len" field, and use that length to intiialize
the "resid" field of the cursor.

In addition, change the way "last_piece" is defined so it is based
on the residual number of bytes in the original request.  This is
necessary (at least for bio messages) because it is possible for
a read request to succeed without consuming all of the space
available in the data buffer.

This resolves:
    http://tracker.ceph.com/issues/4427Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

25aff7c5

libceph: kill message trail · 9d2a06c2

由 Alex Elder 提交于 3月 08, 2013

The wart that is the ceph message trail can now be removed, because
its only user was the osd client, and the previous patch made that
no longer the case.

The result allows write_partial_msg_pages() to be simplified
considerably.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

9d2a06c2

libceph: implement pages array cursor · e766d7b5

由 Alex Elder 提交于 3月 07, 2013

Implement and use cursor routines for page array message data items
for outbound message data.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

e766d7b5

libceph: implement bio message data item cursor · 6aaa4511

由 Alex Elder 提交于 3月 06, 2013

Implement and use cursor routines for bio message data items for
outbound message data.

(See the previous commit for reasoning in support of the changes
in out_msg_pos_next().)
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

6aaa4511

libceph: prepare for other message data item types · dd236fcb

由 Alex Elder 提交于 3月 06, 2013

This just inserts some infrastructure in preparation for handling
other types of ceph message data items.  No functional changes,
just trying to simplify review by separating out some noise.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

dd236fcb

libceph: start defining message data cursor · fe38a2b6

由 Alex Elder 提交于 3月 06, 2013

This patch lays out the foundation for using generic routines to
manage processing items of message data.

For simplicity, we'll start with just the trail portion of a
message, because it stands alone and is only present for outgoing
data.

First some basic concepts.  We'll use the term "data item" to
represent one of the ceph_msg_data structures associated with a
message.  There are currently four of those, with single-letter
field names p, l, b, and t.  A data item is further broken into
"pieces" which always lie in a single page.  A data item will
include a "cursor" that will track state as the memory defined by
the item is consumed by sending data from or receiving data into it.

We define three routines to manipulate a data item's cursor: the
"init" routine; the "next" routine; and the "advance" routine.  The
"init" routine initializes the cursor so it points at the beginning
of the first piece in the item.  The "next" routine returns the
page, page offset, and length (limited by both the page and item
size) of the next unconsumed piece in the item.  It also indicates
to the caller whether the piece being returned is the last one in
the data item.

The "advance" routine consumes the requested number of bytes in the
item (advancing the cursor).  This is used to record the number of
bytes from the current piece that were actually sent or received by
the network code.  It returns an indication of whether the result
means the current piece has been fully consumed.  This is used by
the message send code to determine whether it should calculate the
CRC for the next piece processed.

The trail of a message is implemented as a ceph pagelist.  The
routines defined for it will be usable for non-trail pagelist data
as well.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

fe38a2b6

libceph: abstract message data · 43794509

由 Alex Elder 提交于 3月 01, 2013

Group the types of message data into an abstract structure with a
type indicator and a union containing fields appropriate to the
type of data it represents.  Use this to represent the pages,
pagelist, bio, and trail in a ceph message.

Verify message data is of type NONE in ceph_msg_data_set_*()
routines.  Since information about message data of type NONE really
should not be interpreted, get rid of the other assertions in those
functions.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

43794509

libceph: be explicit about message data representation · f9e15777

由 Alex Elder 提交于 3月 01, 2013

A ceph message has a data payload portion.  The memory for that data
(either the source of data to send or the location to place data
that is received) is specified in several ways.  The ceph_msg
structure includes fields for all of those ways, but this
mispresents the fact that not all of them are used at a time.

Specifically, the data in a message can be in:
    - an array of pages
    - a list of pages
    - a list of Linux bios
    - a second list of pages (the "trail")
(The two page lists are currently only ever used for outgoing data.)

Impose more structure on the ceph message, making the grouping of
some of these fields explicit.  Shorten the name of the
"page_alignment" field.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

f9e15777

libceph: define ceph_msg_has_*() data macros · 97fb1c7f

由 Alex Elder 提交于 3月 01, 2013

Define and use macros ceph_msg_has_*() to determine whether to
operate on the pages, pagelist, bio, and trail fields of a message.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

97fb1c7f

libceph: record message data byte length · 4a73ef27

由 Alex Elder 提交于 3月 07, 2013

Record the number of bytes of data in a page array rather than the
number of pages in the array.  It can be assumed that the page array
is of sufficient size to hold the number of bytes indicated (and
offset by the indicated alignment).
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

4a73ef27

libceph: isolate other message data fields · 27fa8385

由 Alex Elder 提交于 2月 14, 2013

Define ceph_msg_data_set_pagelist(), ceph_msg_data_set_bio(), and
ceph_msg_data_set_trail() to clearly abstract the assignment of the
remaining data-related fields in a ceph message structure.  Use the
new functions in the osd client and mds client.

This partially resolves:
    http://tracker.ceph.com/issues/4263Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

27fa8385

libceph: set page info with byte length · f1baeb2b

由 Alex Elder 提交于 3月 07, 2013

When setting page array information for message data, provide the
byte length rather than the page count ceph_msg_data_set_pages().
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

f1baeb2b

libceph: isolate message page field manipulation · 02afca6c

由 Alex Elder 提交于 2月 14, 2013

Define a function ceph_msg_data_set_pages(), which more clearly
abstracts the assignment page-related fields for data in a ceph
message structure.  Use this new function in the osd client and mds
client.

Ideally, these fields would never be set more than once (with
BUG_ON() calls to guarantee that).  At the moment though the osd
client sets these every time it receives a message, and in the event
of a communication problem this can happen more than once.  (This
will be resolved shortly, but setting up these helpers first makes
it all a bit easier to work with.)

Rearrange the field order in a ceph_msg structure to group those
that are used to define the possible data payloads.

This partially resolves:
    http://tracker.ceph.com/issues/4263Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

02afca6c

libceph: kill ceph_msg->pagelist_count · ec02a2f2

由 Alex Elder 提交于 3月 01, 2013

The pagelist_count field is never actually used, so get rid of it.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

ec02a2f2

libceph: distinguish page array and pagelist count · d4b515fa

由 Alex Elder 提交于 2月 25, 2013

Use distinct fields for tracking the number of pages in a message's
page array and in a message's page list.  Currently only one or the
other is used at a time, but that will be changing soon.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

d4b515fa

libceph: make ceph_msg->bio_seg be unsigned · 07c09b72

由 Alex Elder 提交于 2月 15, 2013

The bio_seg field is used by the ceph messenger in iterating through
a bio.  It should never have a negative value, so make it an
unsigned.  (I contemplated making it unsigned short to match the
struct bio definition, but it offered no benefit.)

Change variables used to hold bio_seg values to all be unsigned as
well.  Change two variable names in init_bio_iter() to match the
convention used everywhere else.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

07c09b72

14 2月, 2013 1 次提交

libceph: fix messenger CONFIG_BLOCK dependencies · 3ebc21f7

由 Alex Elder 提交于 1月 31, 2013

The ceph messenger has a few spots that are only used when
bio messages are supported, and that's only when CONFIG_BLOCK
is defined.  This surrounds a couple of spots with #ifdef's
that would cause a problem if CONFIG_BLOCK were not present
in the kernel configuration.

This resolves:
    http://tracker.ceph.com/issues/3976Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

3ebc21f7

03 10月, 2012 1 次提交

UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel system headers · a1ce3928

由 David Howells 提交于 10月 02, 2012

Convert #include "..." to #include <path/...> in kernel system headers.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NDave Jones <davej@redhat.com>

a1ce3928

31 7月, 2012 4 次提交

libceph: clean up con flags · 4a861692

由 Sage Weil 提交于 7月 20, 2012

Rename flags with CON_FLAG prefix, move the definitions into the c file,
and (better) document their meaning.
Signed-off-by: NSage Weil <sage@inktank.com>

4a861692

libceph: replace connection state bits with states · 8dacc7da

由 Sage Weil 提交于 7月 20, 2012

Use a simple set of 6 enumerated values for the socket states (CON_STATE_*)
and use those instead of the state bits. All of the con->state checks are
now under the protection of the con mutex, so this is safe. It also
simplifies many of the state checks because we can check for anything other
than the expected state instead of various bits for races we can think of.

This appears to hold up well to stress testing both with and without socket
failure injection on the server side.
Signed-off-by: NSage Weil <sage@inktank.com>

8dacc7da

libceph: prevent the race of incoming work during teardown · a2a32584

由 Guanjun He 提交于 7月 08, 2012

Add an atomic variable 'stopping' as flag in struct ceph_messenger,
set this flag to 1 in function ceph_destroy_client(), and add the condition code
in function ceph_data_ready() to test the flag value, if true(1), just return.
Signed-off-by: NGuanjun He <gjhe@suse.com>
Reviewed-by: NSage Weil <sage@inktank.com>

a2a32584

libceph: fix messenger retry · a16cb1f7

由 Sage Weil 提交于 7月 10, 2012

In ancient times, the messenger could both initiate and accept connections.
An artifact if that was data structures to store/process an incoming
ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
Sadly, the negotiation code was referencing those structures and ignoring
important information (like the peer's connect_seq) from the correct ones.

Among other things, this fixes tight reconnect loops where the server sends
RETRY_SESSION and we (the client) retries with the same connect_seq as last
time. This bug pretty easily triggered by injecting socket failures on the
MDS and running some fs workload like workunits/direct_io/test_sync_io.
Signed-off-by: NSage Weil <sage@inktank.com>

a16cb1f7

18 7月, 2012 1 次提交

libceph: fix messenger retry · 5bdca4e0

由 Sage Weil 提交于 7月 10, 2012

5bdca4e0

06 7月, 2012 3 次提交

libceph: set peer name on con_open, not init · b7a9e5dd

由 Sage Weil 提交于 6月 27, 2012

The peer name may change on each open attempt, even when the connection is
reused.
Signed-off-by: NSage Weil <sage@inktank.com>

b7a9e5dd

libceph: drop declaration of ceph_con_get() · 26103021

由 Alex Elder 提交于 6月 21, 2012

For some reason the declaration of ceph_con_get() and
ceph_con_put() did not get deleted in this commit:
    d59315ca libceph: drop ceph_con_get/put helpers and nref member

Clean that up.
Signed-off-by: NAlex Elder <elder@inktank.com>

26103021

libceph: define and use an explicit CONNECTED state · e27947c7

由 Alex Elder 提交于 5月 23, 2012

There is no state explicitly defined when a ceph connection is fully
operational.  So define one.

It's set when the connection sequence completes successfully, and is
cleared when the connection gets closed.

Be a little more careful when examining the old state when a socket
disconnect event is reported.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

e27947c7

22 6月, 2012 1 次提交

libceph: drop ceph_con_get/put helpers and nref member · d59315ca

由 Sage Weil 提交于 6月 21, 2012

These are no longer used.  Every ceph_connection instance is embedded in
another structure, and refcounts manipulated via the get/put ops.
Signed-off-by: NSage Weil <sage@inktank.com>

d59315ca

06 6月, 2012 1 次提交

libceph: make ceph_con_revoke_message() a msg op · 8921d114

由 Alex Elder 提交于 6月 01, 2012

ceph_con_revoke_message() is passed both a message and a ceph
connection.  A ceph_msg allocated for incoming messages on a
connection always has a pointer to that connection, so there's no
need to provide the connection when revoking such a message.

Note that the existing logic does not preclude the message supplied
being a null/bogus message pointer.  The only user of this interface
is the OSD client, and the only value an osd client passes is a
request's r_reply field.  That is always non-null (except briefly in
an error path in ceph_osdc_alloc_request(), and that drops the
only reference so the request won't ever have a reply to revoke).
So we can safely assume the passed-in message is non-null, but add a
BUG_ON() to make it very obvious we are imposing this restriction.

Rename the function ceph_msg_revoke_incoming() to reflect that it is
really an operation on an incoming message.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

8921d114

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功