- 13 September 2012, 1 commit
-
-
Committed by David Howells
Give the key type the opportunity to preparse the payload prior to the instantiation and update routines being called. This is done with the provision of two new key type operations:

    int (*preparse)(struct key_preparsed_payload *prep);
    void (*free_preparse)(struct key_preparsed_payload *prep);

If the first operation is present, then it is called before key creation (in the add/update case) or before the key semaphore is taken (in the update and instantiate cases). The second operation is called to clean up if the first was called.

preparse() is given the opportunity to fill in the following structure:

    struct key_preparsed_payload {
        char        *description;
        void        *type_data[2];
        void        *payload;
        const void  *data;
        size_t      datalen;
        size_t      quotalen;
    };

Before the preparser is called, the first three fields will have been cleared, the payload pointer and size will be stored in data and datalen, and the default quota size from the key_type struct will be stored into quotalen.

The preparser may parse the payload in any way it likes and may store data in the type_data[] and payload fields for use by the instantiate() and update() ops.

The preparser may also propose a description for the key by attaching it as a string to the description field. This can be used by passing a NULL or "" description to the add_key() system call or the key_create_or_update() function. This cannot work with request_key() as that requires the description to tell the upcall about the key to be created. This, for example, permits keys that store PGP public keys to generate their own name from the user ID and public key fingerprint in the key.

The instantiate() and update() operations are then modified to look like this:

    int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
    int (*update)(struct key *key, struct key_preparsed_payload *prep);

and the new payload data is passed in *prep, whether or not it was preparsed.

Signed-off-by: David Howells <dhowells@redhat.com>
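To make the flow concrete, a key type might wire up the new hooks roughly as follows. This is a hedged sketch rather than an in-tree key type: the example_* names are hypothetical, the payload handling is deliberately minimal, the payload field access matches kernels of this era, and ops a real key type also needs (describe, destroy, match) are omitted.

    /* Hypothetical key type built on the preparse hooks described above. */
    static int example_preparse(struct key_preparsed_payload *prep)
    {
        void *copy;

        if (prep->datalen == 0 || prep->datalen > PAGE_SIZE)
            return -EINVAL;

        /* Parse/copy the raw payload once, before the key semaphore is taken. */
        copy = kmemdup(prep->data, prep->datalen, GFP_KERNEL);
        if (!copy)
            return -ENOMEM;
        prep->payload = copy;
        prep->quotalen = prep->datalen;    /* charge the real size against quota */

        /* Optionally propose a description for add_key(..., NULL, ...) callers. */
        prep->description = kasprintf(GFP_KERNEL, "example:%zu", prep->datalen);
        if (!prep->description) {
            kfree(copy);
            prep->payload = NULL;
            return -ENOMEM;
        }
        return 0;
    }

    static void example_free_preparse(struct key_preparsed_payload *prep)
    {
        /* Clean up whatever preparse() left behind; kfree(NULL) is a no-op. */
        kfree(prep->description);
        kfree(prep->payload);
    }

    static int example_instantiate(struct key *key, struct key_preparsed_payload *prep)
    {
        /* The payload was validated and copied in preparse(); take ownership. */
        key->payload.data = prep->payload;
        prep->payload = NULL;
        return 0;
    }

    static struct key_type key_type_example = {
        .name          = "example",
        .preparse      = example_preparse,
        .free_preparse = example_free_preparse,
        .instantiate   = example_instantiate,
    };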
-
- 22 August 2012, 1 commit
-
-
Committed by Jim Schutt
Because the Ceph client messenger uses a non-blocking connect, it is possible for the sending of the client banner to race with the arrival of the banner sent by the peer.

When ceph_sock_state_change() notices the connect has completed, it schedules work to process the socket via con_work(). During this time the peer is writing its banner, and arrival of the peer banner races with con_work().

If con_work() calls try_read() before the peer banner arrives, there is nothing for it to do, after which con_work() calls try_write() to send the client's banner. In this case Ceph's protocol negotiation can complete successfully.

The server-side messenger immediately sends its banner and addresses after accepting a connect request, *before* actually attempting to read or verify the banner from the client. As a result, it is possible for the banner from the server to arrive before con_work() calls try_read(). If that happens, try_read() will read the banner and prepare protocol negotiation info via prepare_write_connect(). prepare_write_connect() calls con_out_kvec_reset(), which discards the as-yet-unsent client banner. Next, con_work() calls try_write(), which sends the protocol negotiation info rather than the banner that the peer is expecting.

The result is that the peer sees an invalid banner, and the client reports "negotiation failed".

Fix this by moving con_out_kvec_reset() out of prepare_write_connect() to its callers at all locations except the one where the banner might still need to be sent.

[elder@inktank.com: added note about server-side behavior]

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@inktank.com>
-
- 21 August 2012, 1 commit
-
-
Committed by Sage Weil
The debugfs directory includes the cluster fsid and our unique global_id. We need to delay the initialization of the debug entry until we have learned both the fsid and our global_id from the monitor or else the second client can't create its debugfs entry and will fail (and multiple client instances aren't properly reflected in debugfs). Reported-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
-
- 03 August 2012, 1 commit
-
-
Committed by Sylvain Munaut
Avoid crashing if the crypto key payload was NULL, as when it was not correctly allocated and initialized. Also, avoid leaking it. Signed-off-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
-
- 31 July 2012, 29 commits
-
-
Committed by Sage Weil
We drop the lock when calling the ->alloc_msg() con op, which means we need to (a) not clobber con->in_msg without the mutex held, and (b) we need to verify that we are still in the OPEN state when we retake it to avoid causing any mayhem. If the state does change, -EAGAIN will get us back to con_work() and loop. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
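The pattern reads roughly like this (a condensed sketch, not the literal messenger code; hdr and skip stand in for the header and skip flag the real code carries around):

    mutex_unlock(&con->mutex);
    msg = con->ops->alloc_msg(con, hdr, &skip);   /* may sleep; others may touch con */
    mutex_lock(&con->mutex);

    if (con->state != CON_STATE_OPEN) {
        /* someone closed or reopened the con while the lock was dropped */
        if (msg)
            ceph_msg_put(msg);
        return -EAGAIN;                 /* back out to con_work() and loop */
    }
    con->in_msg = msg;                  /* safe: mutex held and still OPEN */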
-
Committed by Sage Weil
This function's calling convention is very limiting. In particular, we can't return any error other than ENOMEM (and only implicitly), which is a problem (see next patch). Instead, return a normal 0 or error code, and make the skip a pointer output parameter. Drop the useless in_hdr argument (we have the con pointer). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
-
Committed by Sage Weil
The ceph_fault() function takes the con mutex, so we should avoid dropping it before calling it. This fixes a potential race with another thread calling ceph_con_close(), or _open(), or similar (we don't reverify con->state after retaking the lock). Add annotation so that lockdep realizes we will drop the mutex before returning. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
-
Committed by Sage Weil
We drop the con mutex when delivering a message. When we retake the lock, we need to verify we are still in the OPEN state before preparing to read the next tag, or else we risk stepping on a connection that has been closed. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
-
Committed by Sage Weil
Revoke all mon_client messages when we shut down the old connection. This is mostly moot since we are re-using the same ceph_connection, but it is cleaner. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
-
Committed by Sage Weil
If the connect() call immediately fails such that sock == NULL, we still need con_close_socket() to reset our socket state to CLOSED. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
-
Committed by Sage Weil
There are many (normal) conditions that can lead to us getting unexpected replies, including cluster topology changes, osd failures, and timeouts. There's no need to spam the console about it. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
-
Committed by Sage Weil
Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Sage Weil
Rename flags with CON_FLAG prefix, move the definitions into the c file, and (better) document their meaning. Signed-off-by: Sage Weil <sage@inktank.com>
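For reference, the renamed flag set looks roughly like this in net/ceph/messenger.c (a sketch from this series; treat the exact bit comments as paraphrase):

    /* connection flag bits, stored in con->flags and tested with test_bit() & co. */
    #define CON_FLAG_LOSSYTX           0  /* we can close channel or drop messages on errors */
    #define CON_FLAG_KEEPALIVE_PENDING 1  /* we need to send a keepalive */
    #define CON_FLAG_WRITE_PENDING     2  /* we have data ready to send */
    #define CON_FLAG_SOCK_CLOSED       3  /* socket state changed to closed */
    #define CON_FLAG_BACKOFF           4  /* need to retry queuing delayed work */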
-
Committed by Sage Weil
Use a simple set of 6 enumerated values for the socket states (CON_STATE_*) and use those instead of the state bits. All of the con->state checks are now under the protection of the con mutex, so this is safe. It also simplifies many of the state checks because we can check for anything other than the expected state instead of various bits for races we can think of. This appears to hold up well to stress testing both with and without socket failure injection on the server side. Signed-off-by: Sage Weil <sage@inktank.com>
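The six states and their allowed transitions look roughly like this after the change (a sketch; the transition notes paraphrase the code comments):

    #define CON_STATE_CLOSED        1  /* -> PREOPEN */
    #define CON_STATE_PREOPEN       2  /* -> CONNECTING, CLOSED */
    #define CON_STATE_CONNECTING    3  /* -> NEGOTIATING, CLOSED */
    #define CON_STATE_NEGOTIATING   4  /* -> OPEN, CLOSED */
    #define CON_STATE_OPEN          5  /* -> STANDBY, CLOSED */
    #define CON_STATE_STANDBY       6  /* -> PREOPEN, CLOSED */

    /* checks become plain comparisons made under con->mutex, e.g.: */
    if (con->state != CON_STATE_OPEN)
        return;    /* anything other than the one expected state */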
-
Committed by Sage Weil
If we are CLOSED, the socket is closed and we won't get these. Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Sage Weil
It is simpler to do this immediately, since we already hold the con mutex. It also avoids the need to deal with a not-quite-CLOSED socket in con_work. Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Sage Weil
If the state is CLOSED or OPENING, we shouldn't have a socket. Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Sage Weil
Take the con mutex before checking whether the connection is closed to avoid racing with someone else closing it. Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Sage Weil
Avoid dropping and retaking con->mutex in the ceph_con_send() case by leaving locking up to the caller. Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Sage Weil
If we fault on a lossy connection, we should still close the socket immediately, and do so under the con mutex. We should also take the con mutex before printing out the state bits in the debug output. Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Jiaju Zhang
This is a trivial fix for the debug output: it is inconsistent with the function name and may confuse people when debugging. [elder@inktank.com: switched to use __func__] Signed-off-by: Jiaju Zhang <jjzhang@suse.de> Reviewed-by: Alex Elder <elder@inktank.com>
-
Committed by Sage Weil
We exponentially back off when we encounter connection errors. If several errors accumulate, we will eventually wait ages before even trying to reconnect. Fix this by resetting the backoff counter after a successful negotiation/connection with the remote node. Fixes ceph issue #2802. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
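In sketch form (constant and field names follow the messenger, but the code is condensed): the fault path keeps doubling con->delay, and the fix zeroes it once negotiation with the peer succeeds.

    /* fault path: exponential backoff before retrying the connection */
    if (con->delay == 0)
        con->delay = BASE_DELAY_INTERVAL;
    else if (con->delay < MAX_DELAY_INTERVAL)
        con->delay *= 2;
    queue_delayed_work(ceph_msgr_wq, &con->work, con->delay);

    /* the fix: once the peer reports a successful negotiation, forget the
     * accumulated backoff so the next fault starts from the base delay again */
    con->delay = 0;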
-
Committed by Sage Weil
Take the con mutex while we are initiating a ceph open. This is necessary because the con may have previously been in use and then closed, which could result in a racing workqueue running con_work(). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
-
Committed by Sage Weil
Previously, we were opportunistically initializing the bio_iter if it appeared to be uninitialized in the middle of the read path. The problem is that a sequence like:

 - start reading message
 - initialize bio_iter
 - read half a message
 - messenger fault, reconnect
 - restart reading message
 - ** bio_iter now non-NULL, not reinitialized **
 - read past end of bio, crash

Instead, initialize the bio_iter unconditionally when we allocate/claim the message for read.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
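A hedged before/after sketch of the change (helper and field names follow the messenger of this era; not the literal diff):

    /* before: only set up the iterator when it looked unset, so a stale,
     * non-NULL bio_iter survived a mid-message reconnect */
    if (!msg->bio_iter)
        init_bio_iter(msg->bio, &msg->bio_iter, &msg->bio_seg);

    /* after: reset it unconditionally at the point the incoming message is
     * allocated/claimed, so every (re)start begins at the first bio segment */
    init_bio_iter(msg->bio, &msg->bio_iter, &msg->bio_seg);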
-
Committed by Sage Weil
The linger op registration (i.e., watch) modifies the object state. As such, the OSD will reply with success if it has already applied without doing the associated side-effects (setting up the watch session state). If we lose the ACK and resubmit, we will see success but the watch will not be correctly registered and we won't get notifies. To fix this, always resubmit the linger op with a new tid. We accomplish this by re-registering as a linger (i.e., 'registered') if we are not yet registered. Then the second loop will treat this just like a normal case of re-registering. This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and ceph bug #2796. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
-
Committed by Sage Weil
Hold the mutex while twiddling all of the state bits to avoid possible races. While we're here, make note of why we cannot close the socket directly. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
-
Committed by Sage Weil
We need to set error_msg to something useful before calling ceph_fault(); do so here for try_{read,write}(). This is more informative than

    libceph: osd0 192.168.106.220:6801 (null)

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
-
Committed by Sage Weil
The server side recently added support for tuning some magic crush variables. Decode these variables if they are present, or use the default values if they are not present. Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5. Signed-off-by: caleb miles <caleb.miles@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
-
Committed by Sage Weil
This is simply cleanup that will keep things more closely synced with the userland code. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
-
Committed by Guanjun He
Add an atomic variable 'stopping' as a flag in struct ceph_messenger, set this flag to 1 in ceph_destroy_client(), and add a check in ceph_data_ready() that simply returns if the flag is set. Signed-off-by: Guanjun He <gjhe@suse.com> Reviewed-by: Sage Weil <sage@inktank.com>
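Sketched as fragments (simplified from the description above; the surrounding code is omitted):

    /* in struct ceph_messenger: */
    atomic_t stopping;

    /* in ceph_destroy_client(), before tearing the messenger down: */
    atomic_set(&client->msgr.stopping, 1);

    /* at the top of ceph_data_ready(): */
    if (atomic_read(&con->msgr->stopping))
        return;    /* client is shutting down; ignore the socket event */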
-
Committed by Sage Weil
In ancient times, the messenger could both initiate and accept connections. An artifact of that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retry with the same connect_seq as last time. This bug is pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Sage Weil
These don't strictly need to be initialized based on how they are used, but it is good practice to do so. Reported-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Sage Weil
Initialize the type field for messages in a msgpool. The caller was doing this for osd ops, but not for the reply messages. Reported-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
- 18 July 2012, 1 commit
-
-
Committed by Sage Weil
In ancient times, the messenger could both initiate and accept connections. An artifact of that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retry with the same connect_seq as last time. This bug is pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by: Sage Weil <sage@inktank.com>
-
- 11 July 2012, 1 commit
-
-
Committed by Ben Hutchings
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 06 July 2012, 5 commits
-
-
Committed by Sage Weil
It is possible to close a socket that is in the OPENING state. For example, it can happen if ceph_con_close() is called on the con before the TCP connection is established. con_work() will come around and shut down the socket. Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Sage Weil
Do not re-initialize the con on every connection attempt. When we ceph_con_close, there may still be work queued on the socket (e.g., to close it), and re-initializing will clobber the work_struct state. Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Sage Weil
The peer name may change on each open attempt, even when the connection is reused. Signed-off-by: Sage Weil <sage@inktank.com>
-
Committed by Alex Elder
Sage liked the state diagram I put in my commit description so I'm putting it in with the code. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
Committed by Alex Elder
This patch gathers a few small changes in "net/ceph/messenger.c":

out_msg_pos_next()
 - small logic change that mostly affects indentation

write_partial_msg_pages()
 - use a local variable trail_off to represent the offset into a message of the trail portion of the data (if present)
 - once we are in the trail portion we will always be there, so we don't always need to check against our data position
 - avoid computing len twice after we've reached the trail
 - get rid of the variable tmpcrc, which is not needed
 - trail_off and trail_len never change so mark them const
 - update some comments

read_partial_message_bio()
 - bio_iovec_idx() will never return an error, so don't bother checking for it

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
-