Commits · e27947c767f5bed15048f4e4dad3e2eb69133697 · openeuler / raspberrypi-kernel

06 Jul, 2012 · 1 commit

libceph: define and use an explicit CONNECTED state · e27947c7

Committed by Alex Elder on May 23, 2012

There is no state explicitly defined when a ceph connection is fully
operational.  So define one.

It's set when the connection sequence completes successfully, and is
cleared when the connection gets closed.

Be a little more careful when examining the old state when a socket
disconnect event is reported.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

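Below is a minimal userspace sketch of the idea. It assumes a simplified connection with a single state field. Only the CONNECTED state comes from the commit. The other state names and the sketch_* helpers are hypothetical and do not reproduce libceph's real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified connection states; only CONNECTED mirrors the commit. */
enum sketch_con_state {
	SKETCH_CLOSED,
	SKETCH_CONNECTING,
	SKETCH_NEGOTIATING,
	SKETCH_CONNECTED,
};

struct sketch_connection {
	enum sketch_con_state state;
};

/* The connect/negotiate sequence finished successfully. */
static void sketch_con_established(struct sketch_connection *con)
{
	assert(con->state == SKETCH_NEGOTIATING);
	con->state = SKETCH_CONNECTED;
}

/* Closing the connection clears CONNECTED. */
static void sketch_con_close(struct sketch_connection *con)
{
	con->state = SKETCH_CLOSED;
}

/* One comparison answers "is this connection fully operational?" */
static bool sketch_con_is_connected(const struct sketch_connection *con)
{
	return con->state == SKETCH_CONNECTED;
}
```

With an explicit state, callers can test for "fully operational" directly instead of inferring it from the absence of other states.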

22 Jun, 2012 · 1 commit

libceph: drop ceph_con_get/put helpers and nref member · d59315ca

Committed by Sage Weil on Jun 21, 2012

These are no longer used.  Every ceph_connection instance is embedded in
another structure, and refcounts manipulated via the get/put ops.

Signed-off-by: Sage Weil <sage@inktank.com>

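The embedding pattern can be sketched in userspace C as follows. The get/put callbacks show refcounting delegated to the owning structure through an ops vector. struct sketch_osd, its refcount field, and the container_of() stand-in are assumptions for illustration, not libceph's actual types.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_connection;

/* Ops vector: the embedding structure owns the reference count. */
struct sketch_con_ops {
	struct sketch_connection *(*get)(struct sketch_connection *con);
	void (*put)(struct sketch_connection *con);
};

struct sketch_connection {
	const struct sketch_con_ops *ops;
};

/* Hypothetical owner (e.g. an OSD session) that embeds its connection. */
struct sketch_osd {
	int refcount;
	struct sketch_connection con;
};

static struct sketch_connection *osd_get_con(struct sketch_connection *con)
{
	struct sketch_osd *osd = container_of(con, struct sketch_osd, con);

	osd->refcount++;
	return con;
}

static void osd_put_con(struct sketch_connection *con)
{
	struct sketch_osd *osd = container_of(con, struct sketch_osd, con);

	assert(osd->refcount > 0);
	osd->refcount--;
}

static const struct sketch_con_ops osd_con_ops = {
	.get = osd_get_con,
	.put = osd_put_con,
};
```

Because the connection never lives on its own, a separate nref counter inside it would only duplicate the owner's count.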

06 Jun, 2012 · 4 commits

libceph: make ceph_con_revoke_message() a msg op · 8921d114

Committed by Alex Elder on Jun 01, 2012

ceph_con_revoke_message() is passed both a message and a ceph
connection.  A ceph_msg allocated for incoming messages on a
connection always has a pointer to that connection, so there's no
need to provide the connection when revoking such a message.

Note that the existing logic does not preclude the message supplied
being a null/bogus message pointer.  The only user of this interface
is the OSD client, and the only value an osd client passes is a
request's r_reply field.  That is always non-null (except briefly in
an error path in ceph_osdc_alloc_request(), and that drops the
only reference so the request won't ever have a reply to revoke).
So we can safely assume the passed-in message is non-null, but add a
BUG_ON() to make it very obvious we are imposing this restriction.

Rename the function ceph_msg_revoke_incoming() to reflect that it is
really an operation on an incoming message.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

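Here is a simplified sketch of revoking an incoming message through its own connection pointer. The assert() plays the role of the BUG_ON() the commit adds. The in_msg/in_skip fields and the sketch_* names are hypothetical stand-ins for however the real receive path tracks and discards a partially read message.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_msg;

/* Hypothetical, simplified receive side of a connection. */
struct sketch_connection {
	struct sketch_msg *in_msg;   /* message currently being read, if any */
	int in_skip;                 /* nonzero: discard the rest of the payload */
};

struct sketch_msg {
	struct sketch_connection *con;   /* set when allocated for incoming data */
};

/* Revoke an incoming message using only the message pointer. */
static void sketch_msg_revoke_incoming(struct sketch_msg *msg)
{
	struct sketch_connection *con;

	assert(msg != NULL);   /* analogue of the BUG_ON() on a null message */
	con = msg->con;
	if (con == NULL)
		return;            /* defensive check in this sketch only */
	if (con->in_msg == msg) {
		/* Stop filling this message; drain the remaining bytes instead. */
		con->in_msg = NULL;
		con->in_skip = 1;
	}
}
```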

libceph: make ceph_con_revoke() a msg operation · 6740a845

Committed by Alex Elder on Jun 01, 2012

ceph_con_revoke() is passed both a message and a ceph connection.
Now that any message associated with a connection holds a pointer
to that connection, there's no need to provide the connection when
revoking a message.

This has the added benefit of precluding the possibility of
providing the wrong connection pointer.  If the message's connection
pointer is null, it is not being tracked by any connection, so
revoking it is a no-op.  This is supported as a convenience for
upper layers, so they can revoke a message that is not actually
"in flight."

Rename the function ceph_msg_revoke() to reflect that it is really
an operation on a message, not a connection.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

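A minimal sketch of revocation keyed on the message's own connection pointer follows. A null pointer makes revocation a no-op, as the commit describes. The out_count field is a hypothetical stand-in for removing the message from the connection's out_queue/out_sent lists.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_connection {
	int out_count;   /* messages queued or sent on this connection */
};

struct sketch_msg {
	struct sketch_connection *con;   /* owning connection while in flight */
};

static void sketch_con_send(struct sketch_connection *con, struct sketch_msg *msg)
{
	assert(msg->con == NULL);   /* a message belongs to at most one connection */
	msg->con = con;
	con->out_count++;
}

/* No connection argument: the message knows where it is queued. */
static void sketch_msg_revoke(struct sketch_msg *msg)
{
	struct sketch_connection *con = msg->con;

	if (con == NULL)
		return;             /* not in flight: revoking is a no-op */
	con->out_count--;       /* stands in for unlinking from the queue lists */
	msg->con = NULL;
}
```

Upper layers can therefore call the revoke function unconditionally, whether or not the message was ever sent.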

libceph: have messages point to their connection · 38941f80

Committed by Alex Elder on Jun 01, 2012

When a ceph message is queued for sending it is placed on a list of
pending messages (ceph_connection->out_queue).  When they are
actually sent over the wire, they are moved from that list to
another (ceph_connection->out_sent).  When acknowledgement for the
message is received, it is removed from the sent messages list.

During that entire time the message is "in the possession" of a
single ceph connection.  Keep track of that connection in the
message.  This will be used in the next patch (and is a helpful
bit of information for debugging anyway).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

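The message lifecycle described above (out_queue, then out_sent, then acknowledged) can be sketched with a tiny intrusive list in the style of the kernel's list_head. The sk_list helpers and the sketch_* functions are hypothetical. Only the out_queue/out_sent names and the per-message connection pointer come from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of list_head. */
struct sk_list { struct sk_list *prev, *next; };

static void sk_list_init(struct sk_list *h) { h->prev = h->next = h; }
static int sk_list_empty(const struct sk_list *h) { return h->next == h; }

static void sk_list_add_tail(struct sk_list *n, struct sk_list *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void sk_list_del_init(struct sk_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	sk_list_init(n);
}

struct sketch_connection {
	struct sk_list out_queue;   /* queued, not yet written */
	struct sk_list out_sent;    /* written, awaiting acknowledgement */
};

struct sketch_msg {
	struct sk_list list_head;
	struct sketch_connection *con;   /* owning connection while queued or sent */
};

static void sketch_con_send(struct sketch_connection *con, struct sketch_msg *msg)
{
	assert(msg->con == NULL);
	msg->con = con;
	sk_list_add_tail(&msg->list_head, &con->out_queue);
}

/* Message went out on the wire: move it from out_queue to out_sent. */
static void sketch_msg_written(struct sketch_msg *msg)
{
	sk_list_del_init(&msg->list_head);
	sk_list_add_tail(&msg->list_head, &msg->con->out_sent);
}

/* Peer acknowledged it: the connection no longer possesses the message. */
static void sketch_msg_acked(struct sketch_msg *msg)
{
	sk_list_del_init(&msg->list_head);
	msg->con = NULL;
}
```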

libceph: fully initialize connection in con_init() · 1bfd89f4

Committed by Alex Elder on May 26, 2012

Move the initialization of a ceph connection's private pointer,
operations vector pointer, and peer name information into
ceph_con_init().  Rearrange the arguments so the connection pointer
is first.  Hide the byte-swapping of the peer entity number inside
ceph_con_init().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

01 Jun, 2012 · 5 commits

libceph: start tracking connection socket state · ce2c8903

Committed by Alex Elder on May 22, 2012

Start explicitly keeping track of the state of a ceph connection's
socket, separate from the state of the connection itself.  Create
placeholder functions to encapsulate the state transitions.

    --------
    | NEW* |  transient initial state
    --------
        | con_sock_state_init()
        v
    ----------
    | CLOSED |  initialized, but no socket (and no
    ----------  TCP connection)
     ^      \
     |       \ con_sock_state_connecting()
     |        ----------------------
     |                              \
     + con_sock_state_closed()       \
     |\                               \
     | \                               \
     |  -----------                     \
     |  | CLOSING |  socket event;       \
     |  -----------  await close          \
     |       ^                            |
     |       |                            |
     |       + con_sock_state_closing()   |
     |      / \                           |
     |     /   ---------------            |
     |    /                   \           v
     |   /                    --------------
     |  /    -----------------| CONNECTING |  socket created, TCP
     |  |   /                 --------------  connect initiated
     |  |   | con_sock_state_connected()
     |  |   v
    -------------
    | CONNECTED |  TCP connection established
    -------------

Make the socket state an atomic variable, reinforcing that it's a
distinct transition with no possible "intermediate/both" states.
This is almost certainly overkill at this point, though the
transitions into CONNECTED and CLOSING state do get called via
socket callback (the rest of the transitions occur with the
connection mutex held).  We can back out the atomicity later.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

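The transitions in the diagram can be sketched with C11 atomics. Here atomic_exchange() plays the role the kernel's atomic_xchg() would, and assert() stands in for whatever check the kernel code applies to an unexpected old state. The con_sock_state_*() names come from the diagram. The struct, the SOCK_BIT helper, and sock_state_transition() are assumptions for this userspace approximation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Socket states from the diagram above. */
enum {
	CON_SOCK_STATE_NEW,        /* transient initial state */
	CON_SOCK_STATE_CLOSED,     /* no socket, no TCP connection */
	CON_SOCK_STATE_CONNECTING, /* socket created, connect initiated */
	CON_SOCK_STATE_CONNECTED,  /* TCP connection established */
	CON_SOCK_STATE_CLOSING,    /* socket event seen, awaiting close */
};

#define SOCK_BIT(s) (1 << (s))

struct sketch_connection {
	atomic_int sock_state;
};

/* Swap in the new state atomically, then check the state it replaced. */
static void sock_state_transition(struct sketch_connection *con,
				  int new_state, int allowed_old)
{
	int old = atomic_exchange(&con->sock_state, new_state);

	assert(SOCK_BIT(old) & allowed_old);
	(void)old;   /* only inspected by assert() */
}

static void con_sock_state_init(struct sketch_connection *con)
{
	sock_state_transition(con, CON_SOCK_STATE_CLOSED,
			      SOCK_BIT(CON_SOCK_STATE_NEW));
}

static void con_sock_state_connecting(struct sketch_connection *con)
{
	sock_state_transition(con, CON_SOCK_STATE_CONNECTING,
			      SOCK_BIT(CON_SOCK_STATE_CLOSED));
}

static void con_sock_state_connected(struct sketch_connection *con)
{
	sock_state_transition(con, CON_SOCK_STATE_CONNECTED,
			      SOCK_BIT(CON_SOCK_STATE_CONNECTING));
}

static void con_sock_state_closing(struct sketch_connection *con)
{
	sock_state_transition(con, CON_SOCK_STATE_CLOSING,
			      SOCK_BIT(CON_SOCK_STATE_CONNECTING) |
			      SOCK_BIT(CON_SOCK_STATE_CONNECTED));
}

static void con_sock_state_closed(struct sketch_connection *con)
{
	sock_state_transition(con, CON_SOCK_STATE_CLOSED,
			      SOCK_BIT(CON_SOCK_STATE_CONNECTED) |
			      SOCK_BIT(CON_SOCK_STATE_CLOSING));
}
```

Using a single exchange, rather than a load followed by a store, means a socket callback and a mutex-holding caller can never both observe the same old state.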

libceph: start separating connection flags from state · 928443cd

Committed by Alex Elder on May 22, 2012

A ceph_connection holds a mixture of connection state (as in "state
machine" state) and connection flags in a single "state" field.  To
make the distinction more clear, define a new "flags" field and use
it rather than the "state" field to hold Boolean flag values.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

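Here is a sketch of keeping state-machine values and Boolean flags in separate fields. The flag and state names are hypothetical. Kernel code would typically use the atomic set_bit()/clear_bit()/test_bit() helpers rather than the plain bit operations shown here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical flag bit numbers, independent of one another. */
enum {
	CON_FLAG_LOSSYTX,
	CON_FLAG_KEEPALIVE_PENDING,
	CON_FLAG_WRITE_PENDING,
	CON_FLAG_BACKOFF,
};

/* Hypothetical state values: exactly one applies at a time. */
enum { CON_STATE_CLOSED, CON_STATE_CONNECTING, CON_STATE_OPEN };

struct sketch_connection {
	int state;            /* "state machine" state */
	unsigned long flags;  /* Boolean flag bits */
};

static unsigned long con_flag_mask(int bit)
{
	assert(bit >= 0 && bit < (int)(sizeof(unsigned long) * CHAR_BIT));
	return 1UL << bit;
}

static void con_flag_set(struct sketch_connection *con, int bit)
{
	con->flags |= con_flag_mask(bit);
}

static void con_flag_clear(struct sketch_connection *con, int bit)
{
	con->flags &= ~con_flag_mask(bit);
}

static bool con_flag_test(const struct sketch_connection *con, int bit)
{
	return (con->flags & con_flag_mask(bit)) != 0;
}
```

Setting or clearing a flag can no longer disturb the state value, which was the risk when both shared one field.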

libceph: embed ceph messenger structure in ceph_client · 15d9882c

Committed by Alex Elder on May 26, 2012

A ceph client has a pointer to a ceph messenger structure in it.
There is always exactly one ceph messenger for a ceph client, so
there is no need to allocate it separate from the ceph client
structure.

Switch the ceph_client structure to embed its ceph_messenger
structure.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: kill bad_proto ceph connection op · 6384bb8b

Committed by Alex Elder on May 29, 2012

No code sets a bad_proto method in its ceph connection operations
vector, so just get rid of it.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

libceph: eliminate connection state "DEAD" · e5e372da

Committed by Alex Elder on May 22, 2012

The ceph connection state "DEAD" is never set and is therefore not
needed.  Eliminate it.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

17 May, 2012 · 2 commits

ceph: use info returned by get_authorizer · 8f43fb53

Committed by Alex Elder on May 16, 2012

Rather than passing a bunch of arguments to be filled in with the
content of the ceph_auth_handshake buffer now returned by the
get_authorizer method, just use the returned information in the
caller, and drop the unnecessary arguments.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ceph: have get_authorizer methods return pointers · a3530df3

Committed by Alex Elder on May 16, 2012

Have the get_authorizer auth_client method return a ceph_auth
pointer rather than an integer, pointer-encoding any returned
error value.  This is to pave the way for making use of the
returned value in an upcoming patch.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

22 Mar, 2012 · 3 commits

libceph: use "do" in CRC-related Boolean variables · bca064d2

Committed by Alex Elder on Feb 15, 2012

Change the name (and type) of a few CRC-related Boolean local
variables so they contain the word "do", to distinguish their purpose
from variables used for holding an actual CRC value.

Note that in the process of doing this I identified a fairly serious
logic error in write_partial_msg_pages(): the value of "do_crc"
assigned appears to be the opposite of what it should be. No
attempt to fix this is made here; this change preserves the
erroneous behavior. The problem I found is documented here:
http://tracker.newdream.net/issues/2064

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: make ceph_msgr_wq private · e0f43c94

Committed by Alex Elder on Feb 14, 2012

The messenger workqueue has no need to be public.  So give it static
scope.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: use a shared zero page rather than one per messenger · 57666519

Committed by Alex Elder on Jan 23, 2012

Each messenger allocates a page to be used when writing zeroes
out in the event of error or other abnormal condition.  Instead,
use the kernel ZERO_PAGE() for that purpose.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

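Below is a userspace analogue of writing zero padding from one shared, read-only page instead of a per-messenger allocation. In the kernel the shared page is ZERO_PAGE(). Here a static const buffer stands in for it. The memcpy() into out is a hypothetical substitute for sending bytes on a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* One shared page of zeroes, never written (analogue of ZERO_PAGE()). */
static const unsigned char sketch_zero_page[SKETCH_PAGE_SIZE];

/* Emit len zero bytes in page-sized chunks from the shared page. */
static size_t sketch_write_zeros(unsigned char *out, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t n = len - done;

		if (n > SKETCH_PAGE_SIZE)
			n = SKETCH_PAGE_SIZE;
		memcpy(out + done, sketch_zero_page, n);
		done += n;
	}
	assert(done == len);
	return done;
}
```

Because the source page is only ever read, every messenger can share it safely.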

26 Oct, 2011 · 1 commit

libceph: don't complain on msgpool alloc failures · b61c2763

Committed by Sage Weil on Aug 09, 2011

The pool allocation failures are masked by the pool; there is no need to
spam the console about them.  (That's the whole point of having the pool
in the first place.)

Mark msg allocations whose failure is safely handled as such.

Signed-off-by: Sage Weil <sage@newdream.net>

15 Sep, 2011 · 1 commit

Remove unneeded version.h includes from include/ · e81b1516

Committed by Jesper Juhl on Aug 01, 2011

It was pointed out by 'make versioncheck' that some includes of
linux/version.h are not needed in include/.
This patch removes them.

When I last posted the patch, the ceph bit was ACK'ed by Sage Weil, so
I've added that below.

The pwc-ioctl change generated quite a bit of discussion about V4L version
numbers in general, but as far as I can tell, no consensus was reached on
what the long-term solution should be, so in the meantime I think we
could start by just removing the unneeded include, which is why I'm
resending the patch with that hunk still included.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Sage Weil <sage@newdream.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

27 Jul, 2011 · 1 commit

libceph: don't time out osd requests that haven't been received · 4cf9d544

Committed by Sage Weil on Jul 26, 2011

Keep track of when an outgoing message is ACKed (i.e., the server fully
received it and, presumably, queued it for processing). Time out OSD
requests only if it's been too long since they've been received.

This prevents timeouts and connection thrashing when the OSDs are simply
busy and are throttling the requests they read off the network.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

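The timeout rule can be sketched as follows: a request is only eligible to time out once its outgoing message has been ACKed, and the clock starts at the ACK. The struct, field names, and tick units are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request bookkeeping; times are in arbitrary ticks. */
struct sketch_request {
	bool acked;                /* peer ACKed the outgoing message */
	unsigned long ack_stamp;   /* when that ACK arrived */
};

static void sketch_request_acked(struct sketch_request *req, unsigned long now)
{
	req->acked = true;
	req->ack_stamp = now;
}

/* Only requests the OSD has actually received can time out. */
static bool sketch_request_timed_out(const struct sketch_request *req,
				     unsigned long now, unsigned long timeout)
{
	if (!req->acked)
		return false;   /* still queued or throttled: keep waiting */
	assert(now >= req->ack_stamp);
	return now - req->ack_stamp > timeout;
}
```

A busy OSD that is slow to read requests off the network therefore never triggers a timeout for those unread requests.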

05 Mar, 2011 · 2 commits

libceph: fix msgr keepalive flag · e76661d0

Committed by Sage Weil on Mar 03, 2011

There was some broken keepalive code using a dead variable.  Shift to using
the proper bit flag.

Signed-off-by: Sage Weil <sage@newdream.net>

libceph: fix msgr backoff · 60bf8bf8

Committed by Sage Weil on Mar 04, 2011

With commit f363e45f we replaced a bunch of hacky workqueue mutual
exclusion logic with the WQ_NON_REENTRANT flag.  One piece of fallout is
that the exponential backoff breaks in certain cases:

 * con_work attempts to connect.
 * we get an immediate failure, and the socket state change handler queues
   immediate work.
 * con_work calls con_fault, we decide to back off, but can't queue delayed
   work.

In this case, we add a BACKOFF bit to make con_work reschedule delayed work
next time it runs (which should be immediately).

Signed-off-by: Sage Weil <sage@newdream.net>

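The three-step failure and the BACKOFF fix can be simulated with a toy work item whose delayed queueing fails while it is already pending, much as queue_delayed_work() refuses an already-pending item. All sketch_* names, the starting delay, and the doubling policy are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy work item: queueing fails if it is already pending. */
struct sketch_work {
	bool pending;
	unsigned long delay;
};

static bool sketch_queue_delayed(struct sketch_work *w, unsigned long delay)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->delay = delay;
	return true;
}

struct sketch_connection {
	struct sketch_work work;
	bool backoff;          /* the BACKOFF bit */
	unsigned long delay;   /* current backoff interval */
};

/* con_fault(): back off; if the work is already queued, remember to retry. */
static void sketch_con_fault(struct sketch_connection *con)
{
	con->delay = con->delay ? con->delay * 2 : 1;   /* exponential backoff */
	if (!sketch_queue_delayed(&con->work, con->delay))
		con->backoff = true;
}

/* con_work(): once running, the item can be requeued with the delay. */
static void sketch_con_work(struct sketch_connection *con)
{
	con->work.pending = false;
	if (con->backoff) {
		bool queued;

		con->backoff = false;
		queued = sketch_queue_delayed(&con->work, con->delay);
		assert(queued);
		(void)queued;
		return;
	}
	/* ... normal connection processing would go here ... */
}
```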

13 Jan, 2011 · 1 commit

net/ceph: make ceph_msgr_wq non-reentrant · f363e45f

Committed by Tejun Heo on Jan 03, 2011

ceph messenger code does a rather complex dancing around multithread
workqueue to make sure the same work item isn't executed concurrently
on different CPUs.  This restriction can be provided by workqueue with
WQ_NON_REENTRANT.

Make ceph_msgr_wq non-reentrant workqueue with the default concurrency
level and remove the QUEUED/BUSY logic.

* This removes backoff handling in con_work() but it couldn't reliably
  block execution of con_work() to begin with - queue_con() can be
  called after the work started but before BUSY is set.  It seems that
  it was an optimization for a rather cold path and can be safely
  removed.

* The number of concurrent work items is bound by the number of
connections and connections are independent of each other.  With
  the default concurrency level, different connections will be
  executed independently.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sage Weil <sage@newdream.net>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>

10 Nov, 2010 · 1 commit

ceph: explicitly specify page alignment in network messages · c5c6b19d

Committed by Sage Weil on Nov 09, 2010

The alignment used for reading data into or out of pages used to be taken
from the data_off field in the message header. This only worked as long
as the page alignment matched the object offset, breaking direct io to
non-page aligned offsets.

Instead, explicitly specify the page alignment next to the page vector
in the ceph_msg struct, and use that instead of the message header (which
probably shouldn't be trusted). The alloc_msg callback is responsible for
filling in this field properly when it sets up the page vector.

Signed-off-by: Sage Weil <sage@newdream.net>

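Here is a sketch of how an explicit alignment value determines the way a payload is split across a page vector: the first page starts at the aligned offset, and later pages start at zero. SKETCH_PAGE_SIZE and sketch_split_pages() are assumptions for illustration. The commit itself only states that the alignment sits next to the page vector in struct ceph_msg and is filled in by the alloc_msg callback.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/*
 * Split len bytes across pages when the data starts page_align bytes
 * into the first page. Stores each page's byte count in lens[] and
 * returns the number of pages used.
 */
static size_t sketch_split_pages(size_t len, size_t page_align,
				 size_t *lens, size_t max_pages)
{
	size_t n = 0;
	size_t off = page_align % SKETCH_PAGE_SIZE;   /* offset within first page */

	while (len > 0) {
		size_t chunk = SKETCH_PAGE_SIZE - off;

		if (chunk > len)
			chunk = len;
		assert(n < max_pages);
		lens[n++] = chunk;
		len -= chunk;
		off = 0;   /* subsequent pages are filled from their start */
	}
	return n;
}
```

For example, 5000 bytes at alignment 1000 occupy 3096 bytes of the first page and 1904 bytes of the second.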

21 Oct, 2010 · 2 commits

ceph: factor out libceph from Ceph file system · 3d14c5d2

Committed by Yehuda Sadeh on Apr 06, 2010

This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph.  This
is mostly a matter of moving files around.  However, a few key pieces
of the interface change as well:

 - ceph_client becomes ceph_fs_client and ceph_client, where the latter
   captures the mon and osd clients, and the fs_client gets the mds client
   and file system specific pieces.
 - Mount option parsing and debugfs setup is correspondingly broken into
   two pieces.
 - The mon client gets a generic handler callback for otherwise unknown
   messages (mds map, in this case).
 - The basic supported/required feature bits can be expanded (and are by
   ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.

Signed-off-by: Sage Weil <sage@newdream.net>

ceph: messenger and osdc changes for rbd · 68b4476b

Committed by Yehuda Sadeh on Apr 06, 2010

Allow the messenger to send/receive data in a bio.  This is added
so that we wouldn't need to copy the data into pages or some other buffer
when doing IO for an rbd block device.

We can now have trailing variable sized data for osd
ops.  Also osd ops encoding is more modular.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

30 May, 2010 · 1 commit

ceph: close out mds, osd connections before stopping auth · a922d38f

Committed by Sage Weil on May 29, 2010

The auth module (part of the mon_client) is needed to free any
ceph_authorizer(s) used by the mds and osd connections. Flush the msgr
workqueue before stopping monc to ensure that the destroy_authorizer
auth op is available when those connections are closed out.

Signed-off-by: Sage Weil <sage@newdream.net>

18 May, 2010 · 5 commits

ceph: all allocation functions should get gfp_mask · 34d23762

Committed by Yehuda Sadeh on Apr 06, 2010

This is essential, as for the rados block device we'll need
to run in different contexts that would need flags that
are other than GFP_NOFS.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: save peer feature bits in connection structure · aba558e2

Committed by Sage Weil on May 12, 2010

These are used for adjusting behavior, such as conditionally encoding a
newer message format.

Signed-off-by: Sage Weil <sage@newdream.net>

ceph: resync headers with userland · ca9d93a2

Committed by Sage Weil on May 12, 2010

Notable changes include pool op defines and types, FLOCK feature bit, and
new CMPXATTR osd ops.

Signed-off-by: Sage Weil <sage@newdream.net>

ceph: clean up connection reset · 6f2bc3ff

Committed by Sage Weil on Apr 02, 2010

Reset out_keepalive_pending and peer_global_seq, and drop unused var.

Signed-off-by: Sage Weil <sage@newdream.net>

ceph: simplify ceph_msg_new · bb257664

Committed by Sage Weil on Apr 01, 2010

We only need to pass in front_len.  Callers can attach any other payload
pieces (middle, data) as they see fit.

Signed-off-by: Sage Weil <sage@newdream.net>

12 May, 2010 · 1 commit

ceph: preserve seq # on requeued messages after transient transport errors · e84346b7

Committed by Sage Weil on May 11, 2010

If the tcp connection drops and we reconnect to reestablish a stateful
session (with the mds), we need to resend previously sent (and possibly
received) messages with the _same_ seq # so that they can be dropped on
the other end if needed.  Only assign a new seq once after the message is
queued.

Signed-off-by: Sage Weil <sage@newdream.net>

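A minimal sketch of "assign a sequence number once" follows. A per-message flag marks whether the message still needs a number, so a message resent after a reconnect keeps its original seq and the peer can drop the duplicate. The needs_out_seq field name and the sketch_* helpers are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_connection {
	uint64_t out_seq;      /* last sequence number handed out */
};

struct sketch_msg {
	uint64_t seq;
	bool needs_out_seq;    /* set when first queued, cleared once numbered */
};

/* First queueing of a message: it still needs a sequence number. */
static void sketch_con_send(struct sketch_msg *msg)
{
	msg->needs_out_seq = true;
}

/* Before writing: number the message only once; resends reuse the seq. */
static void sketch_prepare_write(struct sketch_connection *con,
				 struct sketch_msg *msg)
{
	if (msg->needs_out_seq) {
		msg->seq = ++con->out_seq;
		msg->needs_out_seq = false;
	}
	assert(msg->seq != 0);
}
```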

23 Mar, 2010 · 1 commit

ceph: avoid reopening osd connections when address hasn't changed · 87b315a5

Committed by Sage Weil on Mar 22, 2010

We get a fault callback on _every_ tcp connection fault.  Normally, we
want to reopen the connection when that happens.  If the address we have
is bad, however, and connection attempts always result in a connection
refused or similar error, explicitly closing and reopening the msgr
connection just prevents the messenger's backoff logic from kicking in.
The result can be a console full of

[ 3974.417106] ceph: osd11 10.3.14.138:6800 connection failed
[ 3974.423295] ceph: osd11 10.3.14.138:6800 connection failed
[ 3974.429709] ceph: osd11 10.3.14.138:6800 connection failed

Instead, if we get a fault, and have outstanding requests, but the osd
address hasn't changed and the connection never successfully connected in
the first place, do nothing to the osd connection.  The messenger layer
will back off and retry periodically, because we never connected and thus
the lossy bit is not set.

In that case, also touch each request's r_stamp so that handle_timeout can tell the
request is still alive and kicking.

Signed-off-by: Sage Weil <sage@newdream.net>

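The fault-handling decision can be sketched as follows. It applies only when requests are outstanding. If the address is unchanged and the connection never succeeded, the connection is left alone so the messenger's own backoff applies, and each request's r_stamp is refreshed. Addresses are simplified to strings, and struct sketch_osd and sketch_osd_fault() are hypothetical. Only r_stamp comes from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sketch_request {
	unsigned long r_stamp;     /* last sign of life, used by timeout handling */
};

struct sketch_osd {
	char addr[64];             /* address the connection was opened with */
	bool ever_connected;       /* did a connection attempt ever succeed? */
	struct sketch_request *reqs;
	size_t num_reqs;
};

/* Returns true if the OSD connection should be closed and reopened. */
static bool sketch_osd_fault(struct sketch_osd *osd, const char *map_addr,
			     unsigned long now)
{
	assert(osd->num_reqs > 0);   /* only considered with outstanding requests */
	if (strcmp(osd->addr, map_addr) == 0 && !osd->ever_connected) {
		/* Let the messenger back off and retry; mark requests alive. */
		for (size_t i = 0; i < osd->num_reqs; i++)
			osd->reqs[i].r_stamp = now;
		return false;
	}
	return true;
}
```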

02 Mar, 2010 · 1 commit

ceph: reset bits on connection close · 1679f876

Committed by Sage Weil on Feb 26, 2010

Clear LOSSYTX bit, so that if/when we reconnect, said reconnect
will retry on failure.

Clear _PENDING bits too, to avoid polluting subsequent
connection state.

Drop unused REGISTERED bit.

Signed-off-by: Sage Weil <sage@newdream.net>

11 Feb, 2010 · 1 commit

ceph: allow renewal of auth credentials · 9bd2e6f8

Committed by Sage Weil on Feb 02, 2010

Add infrastructure to allow the mon_client to periodically renew its auth
credentials.  Also add a messenger callback that will force such a renewal
if a peer rejects our authenticator.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

26 Jan, 2010 · 3 commits

ceph: keep reserved replies on the request structure · 0d59ab81

Committed by Yehuda Sadeh on Jan 13, 2010

This includes treating all the data preallocation and revocation
in the same place, not having to have a special case for
the reserved pages.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

ceph: alloc message data pages and check if tid exists · 0547a9b3

Committed by Yehuda Sadeh on Jan 11, 2010

Now doing it in the same callback that is also responsible for
allocating the 'front' part of the message. If we get a message
that we haven't got a corresponding tid for, mark it for skipping.

Moving the mutex unlock/lock from the osd alloc_msg callback
to the calling function in the messenger.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

ceph: allocate middle of message before stating to read · 2450418c

Committed by Yehuda Sadeh on Jan 08, 2010

Both front and middle parts of the message are now being
allocated at the ceph_alloc_msg().

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

24 Dec, 2009 · 2 commits

ceph: support ceph_pagelist for message payload · 58bb3b37

Committed by Sage Weil on Dec 23, 2009

The ceph_pagelist is a simple list of whole pages, strung together via
their lru list_head.  It facilitates encoding to a "buffer" of unknown
size.  Allow its use in place of the ceph_msg page vector.

This will be used to fix the huge buffer preallocation woes of MDS
reconnection.

Signed-off-by: Sage Weil <sage@newdream.net>

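A simplified userspace pagelist can be sketched as a chain of whole pages grown on demand, so data of unknown final size can be appended without preallocating one large buffer. The real ceph_pagelist strings kernel pages together through their lru list_head. This sketch uses a plain next pointer and malloc(), and its names and layout are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

struct sketch_page {
	struct sketch_page *next;
	unsigned char data[SKETCH_PAGE_SIZE];
};

struct sketch_pagelist {
	struct sketch_page *head, *tail;
	size_t length;   /* total bytes appended */
	size_t room;     /* free bytes left in the tail page */
};

/* Append len bytes, adding whole pages as needed. Returns 0 or -1. */
static int sketch_pagelist_append(struct sketch_pagelist *pl,
				  const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len > 0) {
		assert(pl->room <= SKETCH_PAGE_SIZE);
		if (pl->room == 0) {
			struct sketch_page *pg = malloc(sizeof(*pg));

			if (pg == NULL)
				return -1;
			pg->next = NULL;
			if (pl->tail)
				pl->tail->next = pg;
			else
				pl->head = pg;
			pl->tail = pg;
			pl->room = SKETCH_PAGE_SIZE;
		}
		size_t off = SKETCH_PAGE_SIZE - pl->room;
		size_t n = len < pl->room ? len : pl->room;

		memcpy(pl->tail->data + off, p, n);
		p += n;
		len -= n;
		pl->room -= n;
		pl->length += n;
	}
	return 0;
}

static void sketch_pagelist_release(struct sketch_pagelist *pl)
{
	struct sketch_page *pg = pl->head;

	while (pg) {
		struct sketch_page *next = pg->next;

		free(pg);
		pg = next;
	}
	memset(pl, 0, sizeof(*pl));
}
```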

ceph: control access to page vector for incoming data · 350b1c32

Committed by Sage Weil on Dec 22, 2009

When we issue an OSD read, we specify a vector of pages that the data is to
be read into. The request may be sent multiple times, to multiple OSDs, if
the osdmap changes, which means we can get more than one reply.

Only read data into the page vector if the reply is coming from the
OSD we last sent the request to. Keep track of which connection is using
the vector by taking a reference. If another connection was already
using the vector before and a new reply comes in on the right connection,
revoke the pages from the other connection.

Signed-off-by: Sage Weil <sage@newdream.net>

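The ownership rule for the page vector can be sketched as follows. Data is accepted only from the connection the request was last sent to. If a different connection previously owned the pages, they are revoked from it first. The struct fields and the revocation counter are hypothetical, and the sketch omits the reference counting the commit mentions.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_connection {
	int id;
};

/* Hypothetical request with one page vector reused across resends. */
struct sketch_request {
	struct sketch_connection *last_sent_to;   /* OSD connection last used */
	struct sketch_connection *pages_owner;    /* connection reading into pages */
	int revoked;                              /* revocations, for illustration */
};

/* A reply arrived on con: may its data be read into the page vector? */
static int sketch_claim_pages(struct sketch_request *req,
			      struct sketch_connection *con)
{
	assert(con != NULL);
	if (con != req->last_sent_to)
		return 0;   /* stale reply from an OSD we no longer target */
	if (req->pages_owner != NULL && req->pages_owner != con)
		req->revoked++;   /* take the pages back from the other connection */
	req->pages_owner = con;
	return 1;
}
```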