Commits · ec02a2f2ffae13e038453ae89592a8c6210f7f4d · openeuler / raspberrypi-kernel

02 May, 2013 3 commits

libceph: kill ceph_msg->pagelist_count · ec02a2f2

Committed by Alex Elder on Mar 01, 2013

The pagelist_count field is never actually used, so get rid of it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

libceph: distinguish page array and pagelist count · d4b515fa

Committed by Alex Elder on Feb 25, 2013

Use distinct fields for tracking the number of pages in a message's
page array and in a message's page list.  Currently only one or the
other is used at a time, but that will be changing soon.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

libceph: make ceph_msg->bio_seg be unsigned · 07c09b72

Committed by Alex Elder on Feb 15, 2013

The bio_seg field is used by the ceph messenger in iterating through
a bio.  It should never have a negative value, so make it an
unsigned.  (I contemplated making it unsigned short to match the
struct bio definition, but it offered no benefit.)

Change variables used to hold bio_seg values to all be unsigned as
well.  Change two variable names in init_bio_iter() to match the
convention used everywhere else.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

26 Feb, 2013 6 commits

libceph: use a do..while loop in con_work() · 49659416

Committed by Alex Elder on Feb 19, 2013

This just converts a manually-implemented loop into a do..while loop
in con_work().  It also moves handling of EAGAIN inside the blocks
where it's already been determined an error code was returned.

Also update a few dout() calls near the affected code for
consistency.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

libceph: use a flag to indicate a fault has occurred · b6e7b6a1

Committed by Alex Elder on Feb 19, 2013

This just rearranges the logic in con_work() a little bit so that a
flag is used to indicate a fault has occurred.  This allows both the
fault and non-fault case to be handled the same way and avoids a
couple of nearly consecutive gotos.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

libceph: separate non-locked fault handling · 93209264

Committed by Alex Elder on Feb 19, 2013

An error occurring on a ceph connection is treated as a fault,
causing the connection to be reset.  The initial part of this fault
handling has to be done while holding the connection mutex, but
it must then be dropped for the last part.

Separate the part of this fault handling that executes without the
lock into its own function, con_fault_finish().  Move the call to
this new function, as well as the call that drops the connection mutex,
into ceph_fault().  Rename that function con_fault() to reflect that
it's only handling the connection part of the fault handling.

The motivation for this was a warning from sparse about the locking
being done here.  Rearranging things this way keeps all the mutex
manipulation within ceph_fault(), and this stops sparse from
complaining.

This partially resolves:
    http://tracker.ceph.com/issues/4184

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
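
The locking split described in this commit can be sketched in user space as follows. This is only an illustration under stated assumptions: `toy_mutex`, `handle_fault()` and the two boolean fields are hypothetical stand-ins, and only the names `con_fault()` and `con_fault_finish()` come from the commit message. The point is that one function takes and drops the lock, so a static checker sees balanced locking.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a real mutex, so the sketch stays single-threaded. */
struct toy_mutex {
    bool held;
};

static void toy_lock(struct toy_mutex *m)   { m->held = true; }
static void toy_unlock(struct toy_mutex *m) { m->held = false; }

struct connection {
    struct toy_mutex mutex;
    bool reset_pending;      /* set by the locked part of fault handling */
    bool finished_unlocked;  /* records whether the finish step ran without the lock */
};

/* First part of fault handling: must run with the mutex held. */
static void con_fault(struct connection *con)
{
    con->reset_pending = true;
}

/* Last part of fault handling: must run after the mutex is dropped. */
static void con_fault_finish(struct connection *con)
{
    con->finished_unlocked = !con->mutex.held;
    con->reset_pending = false;
}

/* Hypothetical caller: all lock manipulation happens in this one function. */
static void handle_fault(struct connection *con)
{
    toy_lock(&con->mutex);
    con_fault(con);
    toy_unlock(&con->mutex);
    con_fault_finish(con);
}
```

Calling `handle_fault()` on a zero-initialized `struct connection` leaves the mutex released and `finished_unlocked` set.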

libceph: encapsulate connection backoff · f20a39fd

Committed by Alex Elder on Feb 19, 2013

Collect the code that tests for and implements a backoff delay for a
ceph connection into a new function, ceph_backoff().

Make the debug output messages in that part of the code report
things consistently by reporting a message in the socket closed
case, and by making the one for PREOPEN state report the connection
pointer like the rest.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

libceph: eliminate sparse warnings · 15417167

Committed by Alex Elder on Feb 19, 2013

Eliminate most of the problems in the libceph code that cause sparse
to issue warnings.
    - Convert functions that are never referenced externally to have
      static scope.
    - Pass NULL rather than 0 for a pointer argument in one spot in
      ceph_monc_delete_snapid()

This partially resolves:
    http://tracker.ceph.com/issues/4184

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

libceph: define connection flag helpers · c9ffc77a

Committed by Alex Elder on Feb 20, 2013

Define and use functions that encapsulate operations performed on
a connection's flags.

This resolves:
    http://tracker.ceph.com/issues/4234

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
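
As a rough illustration of what flag helpers of this kind can look like, here is a minimal user-space sketch. The helper names, the flag names and the `struct connection` layout are assumptions made for the example, not the definitions from the commit. The sketch also uses plain, non-atomic bit operations, whereas kernel code would normally use atomic bit helpers.

```c
#include <stdbool.h>

/* Illustrative flag numbers (assumed for this sketch). */
enum con_flag {
    CON_FLAG_WRITE_PENDING,
    CON_FLAG_SOCK_CLOSED,
    CON_FLAG_BACKOFF,
};

struct connection {
    unsigned long flags;  /* one bit per enum con_flag value */
};

static void con_flag_set(struct connection *con, enum con_flag f)
{
    con->flags |= 1UL << f;
}

static void con_flag_clear(struct connection *con, enum con_flag f)
{
    con->flags &= ~(1UL << f);
}

static bool con_flag_test(const struct connection *con, enum con_flag f)
{
    return (con->flags >> f) & 1UL;
}

/* Returns the previous value of the flag and leaves it cleared. */
static bool con_flag_test_and_clear(struct connection *con, enum con_flag f)
{
    bool was_set = con_flag_test(con, f);

    con_flag_clear(con, f);
    return was_set;
}
```

Routing every flag access through a small set of helpers keeps the bit manipulation in one place and makes each call site state its intent.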

14 Feb, 2013 1 commit

libceph: fix messenger CONFIG_BLOCK dependencies · 3ebc21f7

Committed by Alex Elder on Jan 31, 2013

The ceph messenger has a few spots that are only used when
bio messages are supported, and that's only when CONFIG_BLOCK
is defined.  This surrounds a couple of spots with #ifdef's
that would cause a problem if CONFIG_BLOCK were not present
in the kernel configuration.

This resolves:
    http://tracker.ceph.com/issues/3976

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

28 Dec, 2012 2 commits

libceph: fix protocol feature mismatch failure path · 0fa6ebc6

Committed by Sage Weil on Dec 27, 2012

We should not set con->state to CLOSED here; that happens in
ceph_fault() in the caller, where it first asserts that the state
is not yet CLOSED.  Avoids a BUG when the features don't match.

Since fail_protocol() has become a trivial wrapper, replace
calls to it with direct calls to reset_connection().
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

libceph: WARN, don't BUG on unexpected connection states · 122070a2

Committed by Alex Elder on Dec 26, 2012

A number of assertions in the ceph messenger are implemented with
BUG_ON(), killing the system if a connection's state doesn't match
what's expected.  At this point our state model is (evidently) not
well understood enough for these assertions to trigger a BUG().
Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...)
so we learn about these issues without killing the machine.

We now recognize that a connection fault can occur due to a socket
closure at any time, regardless of the state of the connection.
There is therefore nothing we can assert about the state of the
connection at that point, so eliminate that assertion.
Reported-by: Ugis <ugis22@gmail.com>
Tested-by: Ugis <ugis22@gmail.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

21 Dec, 2012 1 commit

libceph: report connection fault with warning · 28362986

Committed by Alex Elder on Dec 14, 2012

When a connection's socket disconnects, or if there's a protocol
error of some kind on the connection, a fault is signaled and
the connection is reset (closed and reopened, basically).  We
currently get an error message in the log whenever this occurs.

A ceph connection will attempt to reestablish a socket connection
repeatedly if a fault occurs.  This means that these error messages
will get repeatedly added to the log, which is undesirable.

Change the error messages to warnings, so they don't get
logged by default.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

18 Dec, 2012 1 commit

libceph: socket can close in any connection state · 7bb21d68

Committed by Alex Elder on Dec 07, 2012

A connection's socket can close for any reason, independent of the
state of the connection (and irrespective of the connection
mutex).  As a result, the connection can be in pretty much any state
at the time its socket is closed.

Handle those other cases at the top of con_work().  Pull this whole
block of code into a separate function to reduce the clutter.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

27 Oct, 2012 1 commit

libceph: avoid NULL kref_put from NULL alloc_msg return · 7246240c

Committed by Sage Weil on Oct 25, 2012

The ceph_con_in_msg_alloc() method calls the ->alloc_msg() helper which
may return NULL. It also drops con->mutex while it allocates a message,
which means that the connection state may change (e.g., get closed). If
that happens, we clean up and bail out. Avoid calling ceph_msg_put() on
a NULL return value and triggering a crash.

This was observed when an ->alloc_msg() call races with a timeout that
resends a zillion messages and resets the connection, and ->alloc_msg()
returns NULL (because the request was resent to another target).

Fixes http://tracker.newdream.net/issues/3342

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

25 Oct, 2012 1 commit

libceph: avoid NULL kref_put when osd reset races with alloc_msg · 9bd95261

Committed by Sage Weil on Oct 24, 2012

The ceph_con_in_msg_alloc() method drops con->mutex while it allocates a
message. If that races with a timeout that resends a zillion messages and
resets the connection, and the ->alloc_msg() method returns a NULL message,
it will call ceph_msg_put(NULL) and BUG.

Fix by only calling put if msg is non-NULL.

Fixes http://tracker.newdream.net/issues/3142

Signed-off-by: Sage Weil <sage@inktank.com>

10 Oct, 2012 3 commits

rbd: define common queue_con_delay() · 802c6d96

Committed by Alex Elder on Oct 08, 2012

This patch defines a single function, queue_con_delay() to call
queue_delayed_work() for a connection.  It basically generalizes
what was previously queue_con() by adding the delay argument.
queue_con() is now a simple helper that passes 0 for its delay.
queue_con_delay() returns 0 if it queued work or an errno if it
did not for some reason.

If con_work() finds the BACKOFF flag set for a connection, it now
calls queue_con_delay() to handle arranging to start again after a
delay.

Note about connection reference counts:  con_work() only ever gets
called as a work item function.  At the time that work is scheduled,
a reference to the connection is acquired, and the corresponding
con_work() call is then responsible for dropping that reference
before it returns.

Previously, the backoff handling inside con_work() silently handed
off its reference to delayed work it scheduled.  Now that
queue_con_delay() is used, a new reference is acquired for the
newly-scheduled work, and the original reference is dropped by the
con->ops->put() call at the end of the function.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
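
The reference-counting rule described above (take a reference for newly queued work, give it back if nothing was queued) can be sketched in user space like this. Everything except the names `queue_con()` and `queue_con_delay()` is an assumption for illustration: `toy_queue_delayed_work()` merely imitates a work queue that refuses an already-queued item, and the specific errno values are chosen for the example, since the commit message only says "an errno".

```c
#include <errno.h>
#include <stdbool.h>

struct connection {
    int refcount;
    bool work_queued;            /* true while a work item is pending */
    unsigned long queued_delay;  /* delay recorded for the pending work */
};

/* Take a reference; fails if the connection is already unreferenced. */
static bool con_get(struct connection *con)
{
    if (con->refcount == 0)
        return false;
    con->refcount++;
    return true;
}

static void con_put(struct connection *con)
{
    con->refcount--;
}

/* Toy work queue: returns false if the item was already queued. */
static bool toy_queue_delayed_work(struct connection *con, unsigned long delay)
{
    if (con->work_queued)
        return false;
    con->work_queued = true;
    con->queued_delay = delay;
    return true;
}

/* Returns 0 if work was queued, or a negative errno if it was not. */
static int queue_con_delay(struct connection *con, unsigned long delay)
{
    if (!con_get(con))
        return -ENOENT;          /* assumed errno: no live reference */
    if (!toy_queue_delayed_work(con, delay)) {
        con_put(con);            /* nothing queued, so return the reference */
        return -EBUSY;           /* assumed errno: already queued */
    }
    return 0;                    /* the queued work now owns one reference */
}

/* Simple helper: queue with no delay. */
static int queue_con(struct connection *con)
{
    return queue_con_delay(con, 0);
}
```

With this shape, the work function can always drop exactly one reference when it returns, regardless of whether it rescheduled itself.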

rbd: let con_work() handle backoff · 8618e30b

Committed by Alex Elder on Oct 08, 2012

Both ceph_fault() and con_work() include handling for imposing a
delay before doing further processing on a faulted connection.
The latter is used only if ceph_fault() is unable to.

Instead, just let con_work() always be responsible for implementing
the delay.  After setting up the delay value, set the BACKOFF flag
on the connection unconditionally and call queue_con() to ensure
con_work() will get called to handle it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

rbd: reset BACKOFF if unable to re-queue · 588377d6

Committed by Alex Elder on Oct 08, 2012

If ceph_fault() is unable to queue work after a delay, it sets the
BACKOFF connection flag so con_work() will attempt to do so.

In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't
result in newly-queued work, it simply ignores this condition and
proceeds as if no backoff delay were desired.  There are two
problems with this--one of which is a bug.

The first problem is simply that the intended behavior is to back
off, and if we aren't able to queue the work item to run after a delay
we're not doing that.

The only reason queue_delayed_work() won't queue work is if the
provided work item is already queued.  In the messenger, this
means that con_work() is already scheduled to be run again.  So
if we simply set the BACKOFF flag again when this occurs, we know
the next con_work() call will again attempt to hold off activity
on the connection until after the delay.

The second problem--the bug--is a leak of a reference count.  If
queue_delayed_work() returns 0 in con_work(), con->ops->put() drops
the connection reference held on entry to con_work().  However,
processing is (was) allowed to continue, and at the end of the
function a second con->ops->put() is called.

This patch fixes both problems.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

22 Sep, 2012 1 commit

libceph: only kunmap kmapped pages · 5ce765a5

Committed by Alex Elder on Sep 21, 2012

In write_partial_msg_pages(), pages need to be kmapped in order to
perform a CRC-32c calculation on them.  As an artifact of the way
this code used to be structured, the kunmap() call was separated
from the kmap() call and both were done conditionally.  But the
conditions under which the kmap() and kunmap() calls were made
differed, so there was a chance a kunmap() call would be done on a
page that had not been mapped.

The symptom of this was tripping a BUG() in kunmap_high() when
pkmap_count[nr] became 0.
Reported-by: Bryan K. Wright <bryan@virginia.edu>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
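
The pairing problem described in this commit can be illustrated with a small user-space sketch. All names here are hypothetical: `toy_kmap()` and `toy_kunmap()` only count mappings, and `toy_checksum()` is a plain byte sum standing in for CRC-32c. The idea shown is that the map and unmap calls sit under the same condition, so an unmap can never run for a page that was not mapped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy analogue of a per-page mapping count; it must never go negative. */
static int map_count;

static const uint8_t *toy_kmap(const uint8_t *page)
{
    map_count++;
    return page;
}

static void toy_kunmap(void)
{
    map_count--;
}

/* Simple byte sum standing in for a real CRC-32c calculation. */
static uint32_t toy_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum += p[i];
    return sum;
}

/* Map, checksum and unmap under one condition so the calls always pair up. */
static uint32_t send_page(const uint8_t *page, size_t len, bool do_crc)
{
    uint32_t crc = 0;

    if (do_crc) {
        const uint8_t *kaddr = toy_kmap(page);

        crc = toy_checksum(kaddr, len);
        toy_kunmap();
    }
    return crc;
}
```

After any sequence of `send_page()` calls, with or without checksumming, `map_count` returns to zero.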

22 Aug, 2012 1 commit

libceph: avoid truncation due to racing banners · 6d4221b5

Committed by Jim Schutt on Aug 10, 2012

Because the Ceph client messenger uses a non-blocking connect, it is
possible for the sending of the client banner to race with the
arrival of the banner sent by the peer.

When ceph_sock_state_change() notices the connect has completed, it
schedules work to process the socket via con_work().  During this
time the peer is writing its banner, and arrival of the peer banner
races with con_work().

If con_work() calls try_read() before the peer banner arrives, there
is nothing for it to do, after which con_work() calls try_write() to
send the client's banner.  In this case Ceph's protocol negotiation
can complete successfully.

The server-side messenger immediately sends its banner and addresses
after accepting a connect request, *before* actually attempting to
read or verify the banner from the client.  As a result, it is
possible for the banner from the server to arrive before con_work()
calls try_read().  If that happens, try_read() will read the banner
and prepare protocol negotiation info via prepare_write_connect().
prepare_write_connect() calls con_out_kvec_reset(), which discards
the as-yet-unsent client banner.  Next, con_work() calls
try_write(), which sends the protocol negotiation info rather than
the banner that the peer is expecting.

The result is that the peer sees an invalid banner, and the client
reports "negotiation failed".

Fix this by moving con_out_kvec_reset() out of
prepare_write_connect() to its callers at all locations except the
one where the banner might still need to be sent.

[elder@inktak.com: added note about server-side behavior]
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@inktank.com>
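
A toy model may help show why the position of the reset matters. In this sketch the output queue is a small array of string pointers, and every name (`con_out_reset()`, `con_out_add()`, `prepare_write_banner()`, the simplified `prepare_write_connect()`, `BANNER`, `CONNECT`) is a hypothetical stand-in rather than the messenger's real code. It models the behavior after the fix: preparing the connect info no longer resets the queue, so a banner that has not been sent yet stays in front of it.

```c
#include <stddef.h>

#define OUT_MAX 4

static const char BANNER[]  = "banner";
static const char CONNECT[] = "connect";

struct connection {
    const char *out[OUT_MAX];  /* queued chunks that have not been sent yet */
    size_t out_count;
};

/* Discard everything that is queued but unsent. */
static void con_out_reset(struct connection *con)
{
    con->out_count = 0;
}

static void con_out_add(struct connection *con, const char *chunk)
{
    if (con->out_count < OUT_MAX)
        con->out[con->out_count++] = chunk;
}

/* Start of a connection attempt: begin with a clean queue, then the banner. */
static void prepare_write_banner(struct connection *con)
{
    con_out_reset(con);
    con_out_add(con, BANNER);
}

/*
 * Post-fix shape: only appends.  Callers that know the banner has already
 * gone out call con_out_reset() themselves before calling this.
 */
static void prepare_write_connect(struct connection *con)
{
    con_out_add(con, CONNECT);
}
```

If `prepare_write_connect()` reset the queue itself, calling it right after `prepare_write_banner()` would leave only the connect info queued, which corresponds to the failure described in the commit message.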

31 Jul, 2012 19 commits

libceph: recheck con state after allocating incoming message · 61399191

Committed by Sage Weil on Jul 30, 2012

We drop the lock when calling the ->alloc_msg() con op, which means
we need to (a) not clobber con->in_msg without the mutex held, and (b)
we need to verify that we are still in the OPEN state when we retake
it to avoid causing any mayhem.  If the state does change, -EAGAIN
will get us back to con_work() and loop.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
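
The two rules in this commit (do not touch `in_msg` while the lock is dropped, and recheck the state after retaking it) can be sketched in user space as below. The types, the `mutex_held` flag, the two demo callbacks and the two-value state enum are all assumptions for the example. Only the general shape follows the description: drop the lock around the allocation callback, retake it, then return `-EAGAIN` if the connection is no longer open.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum con_state { CON_CLOSED, CON_OPEN };

struct message {
    int id;
};

struct connection {
    enum con_state state;
    bool mutex_held;         /* toy stand-in for the connection mutex */
    struct message *in_msg;
};

/* Hypothetical allocation callback, invoked with the lock dropped. */
typedef struct message *(*alloc_fn)(struct connection *con);

static int con_in_msg_alloc(struct connection *con, alloc_fn alloc)
{
    struct message *msg;

    con->mutex_held = false;     /* drop the lock around the callback */
    msg = alloc(con);
    con->mutex_held = true;      /* retake it before touching con */

    if (con->state != CON_OPEN)
        return -EAGAIN;          /* state changed meanwhile; caller loops */
    if (!msg)
        return -ENOMEM;
    con->in_msg = msg;           /* assigned only with the lock held */
    return 0;
}

/* Demo callbacks: one plain success, one racing with a close. */
static struct message demo_msg = { 1 };

static struct message *alloc_ok(struct connection *con)
{
    (void)con;
    return &demo_msg;
}

static struct message *alloc_during_close(struct connection *con)
{
    con->state = CON_CLOSED;     /* simulates another thread closing con */
    return &demo_msg;
}
```

A real implementation would also have to release a message allocated during the race. The sketch leaves that out because `demo_msg` is static.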

libceph: change ceph_con_in_msg_alloc convention to be less weird · 4740a623

Committed by Sage Weil on Jul 30, 2012

This function's calling convention is very limiting.  In particular,
we can't return any error other than ENOMEM (and only implicitly),
which is a problem (see next patch).

Instead, return a normal 0 or error code, and make the skip a pointer
output parameter.  Drop the useless in_hdr argument (we have the con
pointer).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
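
The convention described here, an integer result plus a `skip` output parameter, might look roughly like the following user-space sketch. The struct fields `available` and `want_skip` are hypothetical knobs added so the three outcomes can be exercised. They are not part of the real interface.

```c
#include <errno.h>
#include <stddef.h>

struct message {
    int id;
};

struct connection {
    struct message *in_msg;     /* message being received, if any */
    struct message *available;  /* hypothetical: what the allocator hands back */
    int want_skip;              /* hypothetical: upper layer asks to skip */
};

/*
 * Returns 0 on success or a negative errno.  On success, *skip is non-zero
 * when the incoming message should be skipped and con->in_msg is left NULL.
 */
static int con_in_msg_alloc(struct connection *con, int *skip)
{
    *skip = 0;
    con->in_msg = NULL;

    if (con->want_skip) {
        *skip = 1;
        return 0;
    }
    if (!con->available)
        return -ENOMEM;

    con->in_msg = con->available;
    return 0;
}
```

Because the return value is an ordinary error code, further failure modes can be reported later without another interface change.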

libceph: avoid dropping con mutex before fault · 8636ea67

Committed by Sage Weil on Jul 30, 2012

The ceph_fault() function takes the con mutex, so we should avoid
dropping it before calling it.  This fixes a potential race with
another thread calling ceph_con_close(), or _open(), or similar (we
don't reverify con->state after retaking the lock).

Add annotation so that lockdep realizes we will drop the mutex before
returning.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

libceph: verify state after retaking con lock after dispatch · 7b862e07

Committed by Sage Weil on Jul 30, 2012

We drop the con mutex when delivering a message.  When we retake the
lock, we need to verify we are still in the OPEN state before
preparing to read the next tag, or else we risk stepping on a
connection that has been closed.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

libceph: fix handling of immediate socket connect failure · 8007b8d6

Committed by Sage Weil on Jul 30, 2012

If the connect() call immediately fails such that sock == NULL, we
still need con_close_socket() to reset our socket state to CLOSED.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

libceph: clear all flags on con_close · 43c7427d

Committed by Sage Weil on Jul 20, 2012

Signed-off-by: Sage Weil <sage@inktank.com>

libceph: clean up con flags · 4a861692

Committed by Sage Weil on Jul 20, 2012

Rename flags with CON_FLAG prefix, move the definitions into the c file,
and (better) document their meaning.
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: replace connection state bits with states · 8dacc7da

Committed by Sage Weil on Jul 20, 2012

Use a simple set of 6 enumerated values for the socket states (CON_STATE_*)
and use those instead of the state bits. All of the con->state checks are
now under the protection of the con mutex, so this is safe. It also
simplifies many of the state checks because we can check for anything other
than the expected state instead of various bits for races we can think of.

This appears to hold up well to stress testing both with and without socket
failure injection on the server side.
Signed-off-by: Sage Weil <sage@inktank.com>
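
To illustrate the simplification this commit describes, the sketch below uses a six-value enum and a single "anything other than the expected state" check. The state names and the function `check_ready_for_tag()` are assumptions for the example, since the commit message gives only the `CON_STATE_` prefix and the count. The check is meant to run with the connection mutex held, which the sketch does not model.

```c
#include <errno.h>

/* Six mutually exclusive states; names are illustrative. */
enum con_state {
    CON_STATE_CLOSED,
    CON_STATE_PREOPEN,
    CON_STATE_CONNECTING,
    CON_STATE_NEGOTIATING,
    CON_STATE_OPEN,
    CON_STATE_STANDBY,
};

struct connection {
    enum con_state state;  /* read and written only under the con mutex */
};

/*
 * Hypothetical check before reading the next tag: rather than testing a set
 * of individual bits, reject every state except the one that is expected.
 */
static int check_ready_for_tag(const struct connection *con)
{
    if (con->state != CON_STATE_OPEN)
        return -EAGAIN;
    return 0;
}
```

With independent state bits, the same check would have to enumerate each combination that might indicate a race.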

libceph: drop unnecessary CLOSED check in socket state change callback · d7353dd5

Committed by Sage Weil on Jul 20, 2012

If we are CLOSED, the socket is closed and we won't get these.
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: close socket directly from ceph_con_close() · ee76e073

Committed by Sage Weil on Jul 20, 2012

It is simpler to do this immediately, since we already hold the con mutex.
It also avoids the need to deal with a not-quite-CLOSED socket in con_work.
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: drop gratuitous socket close calls in con_work · 2e8cb100

Committed by Sage Weil on Jul 20, 2012

If the state is CLOSED or OPENING, we shouldn't have a socket.
Signed-off-by: NSage Weil <sage@inktank.com>

2e8cb100

libceph: move ceph_con_send() closed check under the con mutex · a59b55a6

Committed by Sage Weil on Jul 20, 2012

Take the con mutex before checking whether the connection is closed to
avoid racing with someone else closing it.
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: move msgr clear_standby under con mutex protection · 00650931

Committed by Sage Weil on Jul 20, 2012

Avoid dropping and retaking con->mutex in the ceph_con_send() case by
leaving locking up to the caller.
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: fix fault locking; close socket on lossy fault · 3b5ede07

Committed by Sage Weil on Jul 20, 2012

If we fault on a lossy connection, we should still close the socket
immediately, and do so under the con mutex.

We should also take the con mutex before printing out the state bits in
the debug output.
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: reset connection retry on successfully negotiation · 85effe18

Committed by Sage Weil on Jul 30, 2012

We exponentially back off when we encounter connection errors.  If several
errors accumulate, we will eventually wait ages before even trying to
reconnect.

Fix this by resetting the backoff counter after a successful negotiation/
connection with the remote node.  Fixes ceph issue #2802.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
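
The backoff behavior and its fix can be sketched in a few lines of user-space C. The constants `BASE_DELAY` and `MAX_DELAY` and both function names are assumptions for illustration, and the real delay units and limits may differ. The sketch doubles the delay on each fault up to a cap and clears it once negotiation succeeds, so the next fault starts again from the base delay.

```c
/* Illustrative units and limits (assumed for this sketch). */
#define BASE_DELAY 1UL
#define MAX_DELAY  64UL

struct connection {
    unsigned long delay;  /* 0 means no backoff is currently in effect */
};

/* Called on each fault: returns the delay to wait before reconnecting. */
static unsigned long next_backoff_delay(struct connection *con)
{
    if (con->delay == 0)
        con->delay = BASE_DELAY;
    else if (con->delay < MAX_DELAY)
        con->delay *= 2;
    return con->delay;
}

/* Called after a successful negotiation: forget the accumulated backoff. */
static void negotiation_succeeded(struct connection *con)
{
    con->delay = 0;
}
```

Without the reset, a connection that faulted several times long ago would still wait the maximum delay after its next, unrelated fault.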

libceph: protect ceph_con_open() with mutex · 5469155f

Committed by Sage Weil on Jul 30, 2012

Take the con mutex while we are initiating a ceph open.  This is necessary
because the connection may have previously been in use and then closed, which could
result in a racing workqueue running con_work().
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

libceph: (re)initialize bio_iter on start of message receive · a4107026

Committed by Sage Weil on Jul 30, 2012

Previously, we were opportunistically initializing the bio_iter if it
appeared to be uninitialized in the middle of the read path.  The problem
is a sequence like:

 - start reading message
 - initialize bio_iter
 - read half a message
 - messenger fault, reconnect
 - restart reading message
 - ** bio_iter now non-NULL, not reinitialized **
 - read past end of bio, crash

Instead, initialize the bio_iter unconditionally when we allocate/claim
the message for read.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
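
The failure sequence listed in this commit can be modeled with a simple read cursor. In the sketch below, `cursor` is a toy analogue of the bio iterator, and `claim_in_msg()` and `read_some()` are hypothetical names. The cursor is reset unconditionally whenever a message is claimed for reading, so a restart after a fault begins at offset zero instead of continuing from a stale position.

```c
#include <stddef.h>

struct message {
    const char *data;
    size_t len;
    size_t cursor;  /* toy analogue of the bio iterator */
};

struct connection {
    struct message *in_msg;
};

/* Claim a message for reading; always restart from the beginning. */
static void claim_in_msg(struct connection *con, struct message *msg)
{
    con->in_msg = msg;
    msg->cursor = 0;  /* unconditional, even if the message was half read */
}

/* Consume up to `want` bytes; never advances past the end of the data. */
static size_t read_some(struct connection *con, size_t want)
{
    struct message *msg = con->in_msg;
    size_t left = msg->len - msg->cursor;
    size_t n = want < left ? want : left;

    msg->cursor += n;
    return n;
}
```

Claiming the same message again after a simulated fault resets the cursor, so the full length can be read once more.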

libceph: fix mutex coverage for ceph_con_close · 8c50c817

Committed by Sage Weil on Jul 30, 2012

Hold the mutex while twiddling all of the state bits to avoid possible
races.  While we're here, make note of why we cannot close the socket
directly.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

libceph: report socket read/write error message · 3a140a0d

Committed by Sage Weil on Jul 30, 2012

We need to set error_msg to something useful before calling ceph_fault();
do so here for try_{read,write}().  This is more informative than

libceph: osd0 192.168.106.220:6801 (null)
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>