提交 · 9516e45b25d9967c35d2e798496ec5e590aaa24f · openeuler / raspberrypi-kernel

02 5月, 2013 25 次提交

libceph: simplify new message initialization · 9516e45b

由 Alex Elder 提交于 3月 01, 2013

Rather than explicitly initializing many fields to 0, NULL, or false
in a newly-allocated message, just use kzalloc() for allocating new
messages.  This will become a much more convenient way of doing
things anyway for upcoming patches that abstract the data field.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

9516e45b

libceph: advance pagelist with list_rotate_left() · 35c7bfbc

由 Alex Elder 提交于 3月 06, 2013

While processing an outgoing pagelist (either the data pagelist or
trail) in a ceph message, the messenger cycles through each of the
pages on the list.  This is accomplished in out_msg_pos_next(), if
the end of the first page on the list is reached, the first page is
moved to the end of the list.

There is a list operation, list_rotate_left(), which performs
exactly this operation, and by using it, what's really going on
becomes more obvious.

So replace these two list_move_tail() calls with list_rotate_left().
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

35c7bfbc

libceph: define and use in_msg_pos_next() · e788182f

由 Alex Elder 提交于 3月 08, 2013

Define a new function in_msg_pos_next() to match out_msg_pos_next(),
and use it in place of code at the end of read_partial_message_pages()
and read_partial_message_bio().

Note that the page number is incremented and offset reset under
slightly different conditions from before.  The result is
equivalent, however, as explained below.

Each time an incoming message is going to arrive, we find out how
much room is left--not surpassing the current page--and provide that
as the number of bytes to receive.  So the amount we'll use is the
lesser of:  all that's left of the entire request; and all that's
left in the current page.

If we received exactly how many were requested, we either reached
the end of the request or the end of the page.  In the first case,
we're done, in the second, we move onto the next page in the array.

In all cases but (possibly) on the last page, after adding the
number of bytes received, page_pos == PAGE_SIZE.  On the last page,
it doesn't really matter whether we increment the page number and
reset the page position, because we're done and we won't come back
here again.  The code previously skipped over that last case,
basically.  The new code handles that case the same as the others,
incrementing and resetting.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

e788182f

libceph: kill args in read_partial_message_bio() · b3d56fab

由 Alex Elder 提交于 3月 08, 2013

There is only one caller for read_partial_message_bio(), and it
always passes &msg->bio_iter and &bio_seg as the second and third
arguments.  Furthermore, the message in question is always the
connection's in_msg, and we can get that inside the called function.

So drop those two parameters and use their derived equivalents.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

b3d56fab

libceph: change type of ceph_tcp_sendpage() "more" · e1dcb128

由 Alex Elder 提交于 3月 06, 2013

Change the type of the "more" parameter from int to bool.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

e1dcb128

libceph: minor byte order problems in read_partial_message() · 6ebc8b32

由 Alex Elder 提交于 3月 08, 2013

Some values printed are not (necessarily) in CPU order.  We already
have a copy of the converted versions, so use them.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

6ebc8b32

libceph: define CEPH_MSG_MAX_MIDDLE_LEN · 7b11ba37

由 Alex Elder 提交于 3月 08, 2013

This is probably unnecessary but the code read as if it were wrong
in read_partial_message().
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

7b11ba37

libceph: clean up skipped message logic · 4137577a

由 Alex Elder 提交于 3月 05, 2013

In ceph_con_in_msg_alloc() it is possible for a connection's
alloc_msg method to indicate an incoming message should be skipped.
By default, read_partial_message() initializes the skip variable
to 0 before it gets provided to ceph_con_in_msg_alloc().

The osd client, mon client, and mds client each supply an alloc_msg
method.  The mds client always assigns skip to be 0.

The other two leave the skip value of as-is, or assigns it to zero,
except:
    - if no (osd or mon) request having the given tid is found, in
      which case skip is set to 1 and NULL is returned; or
    - in the osd client, if the data of the reply message is not
      adequate to hold the message to be read, it assigns skip
      value 1 and returns NULL.
So the returned message pointer will always be NULL if skip is ever
non-zero.

Clean up the logic a bit in ceph_con_in_msg_alloc() to make this
state of affairs more obvious.  Add a comment explaining how a null
message pointer can mean either a message that should be skipped or
a problem allocating a message.

This resolves:
    http://tracker.ceph.com/issues/4324Reported-by: NGreg Farnum <greg@inktank.com>
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NGreg Farnum <greg@inktank.com>

4137577a

libceph: separate read and write data · 0fff87ec

由 Alex Elder 提交于 2月 14, 2013

An osd request defines information about where data to be read
should be placed as well as where data to write comes from.
Currently these are represented by common fields.

Keep information about data for writing separate from data to be
read by splitting these into data_in and data_out fields.

This is the key patch in this whole series, in that it actually
identifies which osd requests generate outgoing data and which
generate incoming data.  It's less obvious (currently) that an osd
CALL op generates both outgoing and incoming data; that's the focus
of some upcoming work.

This resolves:
    http://tracker.ceph.com/issues/4127Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

0fff87ec

libceph: distinguish page and bio requests · 2ac2b7a6

由 Alex Elder 提交于 2月 14, 2013

An osd request uses either pages or a bio list for its data.  Use a
union to record information about the two, and add a data type
tag to select between them.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

2ac2b7a6

libceph: separate osd request data info · 2794a82a

由 Alex Elder 提交于 2月 14, 2013

Pull the fields in an osd request structure that define the data for
the request out into a separate structure.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

2794a82a

libceph: don't assign page info in ceph_osdc_new_request() · 153e5167

由 Alex Elder 提交于 3月 01, 2013

Currently ceph_osdc_new_request() assigns an osd request's
r_num_pages and r_alignment fields.  The only thing it does
after that is call ceph_osdc_build_request(), and that doesn't
need those fields to be assigned.

Move the assignment of those fields out of ceph_osdc_new_request()
and into its caller.  As a result, the page_align parameter is no
longer used, so get rid of it.

Note that in ceph_sync_write(), the value for req->r_num_pages had
already been calculated earlier (as num_pages, and fortunately
it was computed the same way).  So don't bother recomputing it,
but because it's not needed earlier, move that calculation after the
call to ceph_osdc_new_request().  Hold off making the assignment to
r_alignment, doing it instead r_pages and r_num_pages are
getting set.

Similarly, in start_read(), nr_pages already holds the number of
pages in the array (and is calculated the same way), so there's no
need to recompute it.  Move the assignment of the page alignment
down with the others there as well.

This and the next few patches are preparation work for:
    http://tracker.ceph.com/issues/4127Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

153e5167

libceph: define mds_alloc_msg() method · 53ded495

由 Alex Elder 提交于 3月 01, 2013

The only user of the ceph messenger that doesn't define an alloc_msg
method is the mds client.  Define one, such that it works just like
it did before, and simplify ceph_con_in_msg_alloc() by assuming the
alloc_msg method is always present.

This and the next patch resolve:
    http://tracker.ceph.com/issues/4322Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NGreg Farnum <greg@inktank.com>

53ded495

libceph: drop mutex while allocating a message · 1d866d1c

由 Alex Elder 提交于 3月 01, 2013

In ceph_con_in_msg_alloc(), if no alloc_msg method is defined for a
connection a new message is allocated with ceph_msg_new().

Drop the mutex before making this call, and make sure we're still
connected when we get it back again.

This is preparing for the next patch, which ensures all connections
define an alloc_msg method, and then handles them all the same way.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NGreg Farnum <greg@inktank.com>

1d866d1c

libceph: rename ceph_calc_object_layout() · 41766f87

由 Alex Elder 提交于 3月 01, 2013

The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object id.

Currently that function takes a file layout parameter, but the only
thing used out of that is its pool number.

Change the function so it takes a pool number rather than the full
file layout structure.  Only update the ceph_pg if the pool is found
in the osd map.  Get rid of few useless lines of code from the
function while there.

Since the function now very clearly just fills in the ceph_pg
structure it's provided, rename it ceph_calc_ceph_pg().
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

41766f87

libceph: kill ceph_msg->pagelist_count · ec02a2f2

由 Alex Elder 提交于 3月 01, 2013

The pagelist_count field is never actually used, so get rid of it.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

ec02a2f2

libceph: fix wrong opcode use in osd_req_encode_op() · 8f63ca2d

由 Alex Elder 提交于 3月 04, 2013

The new cases added to osd_req_encode_op() caused a new sparse
error, which highlighted an existing problem that had been
overlooked since it was originally checked in.  When an unsupported
opcode is found the destination rather than the source opcode was
being used in the error message.  The two differ in their byte
order, and we want to be using the one in the source.

Fix the problem in both spots.
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

8f63ca2d

libceph: complete lingering requests only once · 0d5af164

由 Alex Elder 提交于 2月 27, 2013

An osd request marked to linger will be re-submitted in the event
a connection to the target osd gets dropped.  Currently, if there
is a callback function associated with a request it will be called
each time a request is submitted--which for lingering requests can
be more than once.

Change it so a request--including lingering ones--will get completed
(from the perspective of the user of the osd client) exactly once.

This resolves:
    http://tracker.ceph.com/issues/3967Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

0d5af164

libceph: set page alignment in start_request() · f51a822c

由 Alex Elder 提交于 2月 14, 2013

The page alignment field for a request is currently set in
ceph_osdc_build_request().  It's not needed at that point
nor do either of its callers need that value assigned at
any point before they call ceph_osdc_start_request().

So move that assignment into ceph_osdc_start_request().
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

f51a822c

libceph: distinguish page array and pagelist count · d4b515fa

由 Alex Elder 提交于 2月 25, 2013

Use distinct fields for tracking the number of pages in a message's
page array and in a message's page list.  Currently only one or the
other is used at a time, but that will be changing soon.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

d4b515fa

libceph: don't pass request to calc_layout() · 60cf5992

由 Alex Elder 提交于 2月 15, 2013

The only remaining reason to pass the osd request to calc_layout()
is to fill in its r_num_pages and r_page_alignment fields.  Once it
fills those in, it doesn't do anything more with them.

We can therefore move those assignments into the caller, and get rid
of the "req" parameter entirely.

Note, however, that the only caller is ceph_osdc_new_request(),
and that immediately overwrites those fields with values based on
its passed-in page offset.  So the assignment inside calc_layout()
was redundant anyway.

This resolves:
    http://tracker.ceph.com/issues/4262Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

60cf5992

libceph: format target object name in caller · dbe0fc41

由 Alex Elder 提交于 2月 15, 2013

Move the formatting of the object name (oid) to use for an object
request into the caller of calc_layout().  This makes the "vino"
parameter no longer necessary, so get rid of it.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

dbe0fc41

libceph: pass object number back to calc_layout() caller · 47a05811

由 Alex Elder 提交于 2月 15, 2013

Have calc_layout() pass the computed object number back to its
caller.  (This is a small step to simplify review.)
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

47a05811

libceph: make ceph_msg->bio_seg be unsigned · 07c09b72

由 Alex Elder 提交于 2月 15, 2013

The bio_seg field is used by the ceph messenger in iterating through
a bio.  It should never have a negative value, so make it an
unsigned.  (I contemplated making it unsigned short to match the
struct bio definition, but it offered no benefit.)

Change variables used to hold bio_seg values to all be unsigned as
well.  Change two variable names in init_bio_iter() to match the
convention used everywhere else.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

07c09b72

libceph: fix a osd request memory leak · 3ff5f385

由 Alex Elder 提交于 2月 15, 2013

If an invalid layout is provided to ceph_osdc_new_request(), its
call to calc_layout() might return an error.  At that point in the
function we've already allocated an osd request structure, so we
need to free it (drop a reference) in the event such an error
occurs.

The only other value calc_layout() will return is 0, so make that
explicit in the successful case.

This resolves:
    http://tracker.ceph.com/issues/4240Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

3ff5f385

12 3月, 2013 1 次提交

libceph: fix decoding of pgids · d6c0dd6b

由 Sage Weil 提交于 3月 06, 2013

In 4f6a7e5e we effectively dropped support
for the legacy encoding for the OSDMap and incremental.  However, we didn't
fix the decoding for the pgid.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>

d6c0dd6b

27 2月, 2013 6 次提交

libceph: add support for HASHPSPOOL pool flag · 83ca14fd

由 Sage Weil 提交于 2月 26, 2013

The legacy behavior adds the pgid seed and pool together as the input for
CRUSH.  That is problematic because each pool's PGs end up mapping to the
same OSDs: 1.5 == 2.4 == 3.3 == ...

Instead, if the HASHPSPOOL flag is set, we has the ps and pool together and
feed that into CRUSH.  This ensures that two adjacent pools will map to
an independent pseudorandom set of OSDs.

Advertise our support for this via a protocol feature flag.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

83ca14fd

libceph: update osd request/reply encoding · 1b83bef2

由 Sage Weil 提交于 2月 25, 2013

Use the new version of the encoding for osd requests and replies.  In the
process, update the way we are tracking request ops and reply lengths and
results in the struct ceph_osd_request.  Update the rbd and fs/ceph users
appropriately.

The main changes are:
 - we keep pointers into the request memory for fields we need to update
   each time the request is sent out over the wire
 - we keep information about the result in an array in the request struct
   where the users can easily get at it.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

1b83bef2

libceph: calculate placement based on the internal data types · 2169aea6

由 Sage Weil 提交于 2月 25, 2013

Instead of using the old ceph_object_layout struct, update our internal
ceph_calc_object_layout method to use the ceph_pg type.  This allows us to
pass the full 32-bit precision of the pgid.seed to the callers.  It also
allows some callers to avoid reaching into the request structures for the
struct ceph_object_layout fields.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

2169aea6

ceph: update support for PGID64, PGPOOL3, OSDENC protocol features · 4f6a7e5e

由 Sage Weil 提交于 2月 23, 2013

Support (and require) the PGID64, PGPOOL3, and OSDENC protocol features.
These have been present in ceph.git since v0.42, Feb 2012.  Require these
features to simplify support; nobody is running older userspace.

Note that the new request and reply encoding is still not in place, so the new
code is not yet functional.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

4f6a7e5e

libceph: decode into cpu-native ceph_pg type · 5b191d99

由 Sage Weil 提交于 2月 23, 2013

Always decode data into our cpu-native ceph_pg type that has the correct
field widths.  Limit any remaining uses of ceph_pg_v1 to dealing with the
legacy protocol.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

5b191d99

libceph: rename ceph_pg -> ceph_pg_v1 · 12979354

由 Sage Weil 提交于 1月 08, 2013

Rename the old version this type to distinguish it from the new version.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

12979354

26 2月, 2013 6 次提交

libceph: use a do..while loop in con_work() · 49659416

由 Alex Elder 提交于 2月 19, 2013

This just converts a manually-implemented loop into a do..while loop
in con_work().  It also moves handling of EAGAIN inside the blocks
where it's already been determined an error code was returned.

Also update a few dout() calls near the affected code for
consistency.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

49659416

libceph: use a flag to indicate a fault has occurred · b6e7b6a1

由 Alex Elder 提交于 2月 19, 2013

This just rearranges the logic in con_work() a little bit so that a
flag is used to indicate a fault has occurred.  This allows both the
fault and non-fault case to be handled the same way and avoids a
couple of nearly consecutive gotos.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

b6e7b6a1

libceph: separate non-locked fault handling · 93209264

由 Alex Elder 提交于 2月 19, 2013

An error occurring on a ceph connection is treated as a fault,
causing the connection to be reset.  The initial part of this fault
handling has to be done while holding the connection mutex, but
it must then be dropped for the last part.

Separate the part of this fault handling that executes without the
lock into its own function, con_fault_finish().  Move the call to
this new function, as well as call that drops the connection mutex,
into ceph_fault().  Rename that function con_fault() to reflect that
it's only handling the connection part of the fault handling.

The motivation for this was a warning from sparse about the locking
being done here.  Rearranging things this way keeps all the mutex
manipulation within ceph_fault(), and this stops sparse from
complaining.

This partially resolves:
    http://tracker.ceph.com/issues/4184Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

93209264

libceph: encapsulate connection backoff · f20a39fd

由 Alex Elder 提交于 2月 19, 2013

Collect the code that tests for and implements a backoff delay for a
ceph connection into a new function, ceph_backoff().

Make the debug output messages in that part of the code report
things consistently by reporting a message in the socket closed
case, and by making the one for PREOPEN state report the connection
pointer like the rest.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

f20a39fd

libceph: eliminate sparse warnings · 15417167

由 Alex Elder 提交于 2月 19, 2013

Eliminate most of the problems in the libceph code that cause sparse
to issue warnings.
    - Convert functions that are never referenced externally to have
      static scope.
    - Pass NULL rather than 0 for a pointer argument in one spot in
      ceph_monc_delete_snapid()

This partially resolves:
    http://tracker.ceph.com/issues/4184Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

15417167

libceph: define connection flag helpers · c9ffc77a

由 Alex Elder 提交于 2月 20, 2013

Define and use functions that encapsulate operations performed on
a connection's flags.

This resolves:
    http://tracker.ceph.com/issues/4234Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

c9ffc77a

20 2月, 2013 2 次提交

libceph: drop return value from page vector copy routines · 903bb32e

由 Alex Elder 提交于 2月 06, 2013

The return values provided for ceph_copy_to_page_vector() and
ceph_copy_from_page_vector() serve no purpose, so get rid of them.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

903bb32e

libceph: use void pointers in page vector functions · b324814e

由 Alex Elder 提交于 2月 06, 2013

The functions used for working with ceph page vectors are defined
with char pointers, but they're really intended to operate on
untyped data.  Change the types of these function parameters
to (void *) to reflect this.

(Note that the functions now assume void pointer arithmetic works
like arithmetic on char pointers.)
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

b324814e