Commits · a255651d4cad89f1a606edd36135af892ada4f20 · openanolis / cloud-kernel

17 May, 2012 · 8 commits

ceph: messenger: check return from get_authorizer · ed96af64

Authored by Alex Elder on May 16, 2012

In prepare_connect_authorizer(), a connection's get_authorizer
method is called but ignores its return value.  This function can
return an error, so check for it and return it if that ever occurs.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ceph: messenger: rework prepare_connect_authorizer() · b1c6b980

Authored by Alex Elder on May 16, 2012

Change prepare_connect_authorizer() so it returns without dropping
the connection mutex if the connection has no get_authorizer method.

Use the symbolic CEPH_AUTH_UNKNOWN instead of 0 when assigning
authorization protocols.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ceph: messenger: check prepare_write_connect() result · 5a0f8fdd

Authored by Alex Elder on May 16, 2012

prepare_write_connect() can return an error, but only one of its
callers checks for it.  All the rest are in functions that already
return errors, so it should be fine to return the error if one
gets returned.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ceph: don't set WRITE_PENDING too early · e10c758e

Authored by Alex Elder on May 16, 2012

prepare_write_connect() prepares a connect message, then sets
WRITE_PENDING on the connection.  Then *after* this, it calls
prepare_connect_authorizer(), which updates the content of the
connection buffer already queued for sending.  It's also possible it
will result in prepare_write_connect() returning -EAGAIN despite the
WRITE_PENDING bit getting set.

Fix this by preparing the connect authorizer first, setting the
WRITE_PENDING bit only after that is done.

Partially addresses http://tracker.newdream.net/issues/2424
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
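The ordering described in this commit can be illustrated with a small userspace sketch. It is not the kernel code: `struct conn`, its fields, and the `WRITE_PENDING` bit value are hypothetical stand-ins, and the authorizer step is reduced to a function that either succeeds or returns -EAGAIN. The point is only that every step that can fail runs before the pending bit is set.

```c
#include <assert.h>
#include <errno.h>

#define WRITE_PENDING (1UL << 0)	/* hypothetical bit value */

/* Simplified stand-in for struct ceph_connection. */
struct conn {
	unsigned long state;
	int authorizer_ready;	/* 0 makes the authorizer step fail */
	int connect_queued;	/* 1 once the connect message is queued */
};

/* Stand-in for prepare_connect_authorizer(); may fail with -EAGAIN. */
static int prepare_connect_authorizer(struct conn *con)
{
	return con->authorizer_ready ? 0 : -EAGAIN;
}

static int prepare_write_connect(struct conn *con)
{
	int ret;

	assert(con);

	/* Run every step that can fail before anything is advertised. */
	ret = prepare_connect_authorizer(con);
	if (ret)
		return ret;

	con->connect_queued = 1;
	con->state |= WRITE_PENDING;	/* set only once the buffer is final */
	return 0;
}
```

With this ordering, a caller that sees -EAGAIN can rely on the pending bit still being clear.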

ceph: drop msgr argument from prepare_write_connect() · e825a66d

Authored by Alex Elder on May 16, 2012

In all cases, the value passed as the msgr argument to
prepare_write_connect() is just con->msgr.  Just get the msgr
value from the ceph connection and drop the unneeded argument.

The only msgr passed to prepare_write_banner() is also therefore
just the one from con->msgr, so change that function to drop the
msgr argument as well.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ceph: messenger: send banner in process_connect() · 41b90c00

Authored by Alex Elder on May 16, 2012

prepare_write_connect() has an argument indicating whether a banner
should be sent out before sending out a connection message.  It's
only ever set in one of its callers, so move the code that arranges
to send the banner into that caller and drop the "include_banner"
argument from prepare_write_connect().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ceph: messenger: reset connection kvec caller · 84fb3adf

Authored by Alex Elder on May 16, 2012

Reset a connection's kvec fields in the caller rather than in
prepare_write_connect().   This ends up repeating a few lines of
code but it's improving the separation between distinct operations
on the connection, which we can take advantage of later.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: don't reset kvec in prepare_write_banner() · d329156f

Authored by Alex Elder on May 16, 2012

Move the kvec reset for a connection out of prepare_write_banner and
into its only caller.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

15 May, 2012 · 3 commits

ceph: messenger: change read_partial() to take "end" arg · fd51653f

Authored by Alex Elder on May 10, 2012

Make the second argument to read_partial() be the ending input byte
position rather than the beginning offset it now represents.  This
amounts to moving the addition "to + size" into the caller.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
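A minimal userspace sketch of a read_partial() that takes an ending byte position, assuming the connection tracks how far into the current input it has read (called `in_base_pos` here). The socket is replaced by a hypothetical in-memory `fake_stream` that hands out at most `max_chunk` bytes per call, and `read_two()` is an illustrative caller that does the "to + size" addition itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory replacement for a socket. */
struct fake_stream {
	const unsigned char *data;
	size_t len, pos, max_chunk;
};

/* Copy up to "want" bytes, never more than max_chunk; 0 means no data. */
static int fake_recv(struct fake_stream *s, void *buf, size_t want)
{
	size_t avail = s->len - s->pos;
	size_t n = want < avail ? want : avail;

	if (n > s->max_chunk)
		n = s->max_chunk;
	memcpy(buf, s->data + s->pos, n);
	s->pos += n;
	return (int)n;
}

struct conn {
	struct fake_stream *in;
	int in_base_pos;	/* bytes of the current input consumed so far */
};

/* Read until in_base_pos reaches "end"; the object occupies the last
 * "size" bytes before "end". Returns 1 when complete, <= 0 otherwise. */
static int read_partial(struct conn *con, int end, int size, void *object)
{
	while (con->in_base_pos < end) {
		int left = end - con->in_base_pos;
		int have = size - left;
		int ret;

		assert(have >= 0);
		ret = fake_recv(con->in, (char *)object + have, (size_t)left);
		if (ret <= 0)
			return ret;
		con->in_base_pos += ret;
	}
	return 1;
}

/* The caller now owns the running "end" position. */
static int read_two(struct conn *con, void *a, int a_len, void *b, int b_len)
{
	int end = a_len;
	int ret = read_partial(con, end, a_len, a);

	if (ret <= 0)
		return ret;
	end += b_len;
	return read_partial(con, end, b_len, b);
}
```

Because progress lives in `in_base_pos`, calling `read_two()` again after a short read resumes where the previous call stopped.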

ceph: messenger: update "to" in read_partial() caller · e6cee71f

Authored by Alex Elder on May 10, 2012

read_partial() always increases whatever "to" value is supplied by
adding the requested size to it, and that's the only thing it does
with that pointed-to value.

Do that pointer advance in the caller (and then only when the
updated value will be subsequently used), and change the "to"
parameter to be an in-only and non-pointer value.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ceph: messenger: use read_partial() in read_partial_message() · 57dac9d1

Authored by Alex Elder on May 10, 2012

There are two blocks of code in read_partial_message()--those that
read the header and footer of the message--that can be replaced by a
call to read_partial().  Do that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

22 Mar, 2012 · 23 commits

libceph: isolate kmap() call in write_partial_msg_pages() · 8d63e318

Authored by Alex Elder on Mar 07, 2012

In write_partial_msg_pages(), every case now does an identical call
to kmap(page).  Instead, just call it once inside the CRC-computing
block where it's needed.  Move the definition of kaddr inside that
block, and make it a (char *) to ensure portable pointer arithmetic.

We still don't kunmap() it until after the sendpage() call, in case
that also ends up needing to use the mapping.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

libceph: rename "page_shift" variable to something sensible · 9bd19663

Authored by Alex Elder on Mar 07, 2012

In write_partial_msg_pages() there is a local variable used to
track the starting offset within a bio segment to use.  Its name,
"page_shift", defies the Linux convention of using that name for
log-base-2(page size).

Since it's only used in the bio case rename it "bio_offset".  Use it
along with the page_pos field to compute the memory offset when
computing CRC's in that function.  This makes the bio case match the
others more closely.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

libceph: get rid of zero_page_address · 0cdf9e60

Authored by Alex Elder on Mar 07, 2012

There's not a lot of benefit to zero_page_address, which basically
holds a mapping of the zero page through the life of the messenger
module.  Even with our own mapping, the sendpage interface where
it's used may need to kmap() it again.  It's almost certain to
be in low memory anyway.

So stop treating the zero page specially in write_partial_msg_pages()
and just get rid of zero_page_address entirely.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

libceph: only call kernel_sendpage() via helper · e36b13cc

Authored by Alex Elder on Mar 07, 2012

Make ceph_tcp_sendpage() be the only place kernel_sendpage() is
used, by using this helper in write_partial_msg_pages().
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

libceph: use kernel_sendpage() for sending zeroes · 31739139

Authored by Alex Elder on Mar 07, 2012

If a message queued for send gets revoked, zeroes are sent over the
wire instead of any unsent data.  This is done by constructing a
message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg().

Since we are already working with a page in this case we can use
the sendpage interface instead.  Create a new ceph_tcp_sendpage()
helper that sets up flags to match the way ceph_tcp_sendmsg()
does now.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

libceph: fix inverted crc option logic · 37675b0f

Authored by Alex Elder on Mar 07, 2012

CRC's are computed for all messages between ceph entities. The CRC
computation for the data portion of a message can optionally be
disabled using the "nocrc" (common) ceph option. The default is
for CRC computation for the data portion to be enabled.

Unfortunately, the code that implements this feature interprets the
feature flag wrong, meaning that by default the CRC's have *not*
been computed (or checked) for the data portion of messages unless
the "nocrc" option was supplied.

Fix this, in write_partial_msg_pages() and read_partial_message().
Also change the flag variable in write_partial_msg_pages() to be
"no_datacrc" to match the usage elsewhere in the file.

This fixes http://tracker.newdream.net/issues/2064
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
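The inversion is easy to picture with a negative option flag. The sketch below is only an illustration: `OPT_NOCRC` and both helper names are hypothetical, not identifiers from the messenger code. It contrasts a test that enables the data CRC when "nocrc" is set with one that enables it when "nocrc" is clear.

```c
#include <assert.h>
#include <stdbool.h>

#define OPT_NOCRC (1u << 0)	/* hypothetical flag bit for "nocrc" */

/* Inverted: computes the data CRC only when "nocrc" was requested. */
static bool do_datacrc_inverted(unsigned int flags)
{
	return (flags & OPT_NOCRC) != 0;
}

/* Intended: computes the data CRC unless "nocrc" was requested. */
static bool do_datacrc(unsigned int flags)
{
	return (flags & OPT_NOCRC) == 0;
}
```

Naming the local variable after the option it mirrors ("no_datacrc") makes this kind of sign error easier to spot in review.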

libceph: some simple changes · 84495f49

Authored by Alex Elder on Feb 15, 2012

Nothing too big here.
    - define the size of the buffer used for consuming ignored
      incoming data using a symbolic constant
    - simplify the condition determining whether to unmap the page
      in write_partial_msg_pages(): do it for crc but not if the
      page is the zero page
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: small refactor in write_partial_kvec() · f42299e6

Authored by Alex Elder on Feb 15, 2012

Make a small change in the code that counts down kvecs consumed by
a ceph_tcp_sendmsg() call.  Same functionality, just blocked out
a little differently.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: do crc calculations outside loop · fe3ad593

Authored by Alex Elder on Feb 15, 2012

Move blocks of code out of loops in read_partial_message_section()
and read_partial_message().  They were only getting called at
the end of the last iteration of the loop anyway.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: separate CRC calculation from byte swapping · a9a0c51a

Authored by Alex Elder on Feb 15, 2012

Calculate CRC in a separate step from rearranging the byte order
of the result, to improve clarity and readability.

Use offsetof() to determine the number of bytes to include in the
CRC calculation.

In read_partial_message(), switch which value gets byte-swapped,
since the just-computed CRC is already likely to be in a register.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
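The two-step pattern can be sketched in portable C. Everything here is an assumption made for illustration: `struct msg_header` is a hypothetical layout, `crc32c()` is a plain bitwise CRC-32C written out so the example is standalone (the kernel helper's seed convention may differ), and `cpu_to_le32()` is a userspace stand-in for the kernel macro of the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli polynomial, reflected form). */
static uint32_t crc32c(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Return v laid out in little-endian byte order on any host. */
static uint32_t cpu_to_le32(uint32_t v)
{
	unsigned char b[4] = {
		(unsigned char)(v & 0xff), (unsigned char)((v >> 8) & 0xff),
		(unsigned char)((v >> 16) & 0xff), (unsigned char)((v >> 24) & 0xff),
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Hypothetical header: every field before "crc" is covered by the CRC. */
struct msg_header {
	uint64_t seq;
	uint16_t type;
	uint16_t reserved;
	uint32_t crc;		/* stored little-endian */
};

static void header_set_crc(struct msg_header *hdr)
{
	/* Step 1: compute over the bytes that precede the crc field. */
	uint32_t crc = crc32c(hdr, offsetof(struct msg_header, crc));

	/* Step 2: convert byte order as a separate, visible operation. */
	hdr->crc = cpu_to_le32(crc);
}

static int header_crc_ok(const struct msg_header *hdr)
{
	uint32_t crc = crc32c(hdr, offsetof(struct msg_header, crc));

	/* Swap the freshly computed value rather than the stored one. */
	return cpu_to_le32(crc) == hdr->crc;
}
```

Using offsetof() ties the covered length to the struct definition, so adding a field before `crc` cannot silently leave it out of the checksum.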

libceph: use "do" in CRC-related Boolean variables · bca064d2

Authored by Alex Elder on Feb 15, 2012

Change the name (and type) of a few CRC-related Boolean local
variables so they contain the word "do", to distinguish their purpose
from variables used for holding an actual CRC value.

Note that in the process of doing this I identified a fairly serious
logic error in write_partial_msg_pages(): the value of "do_crc"
assigned appears to be the opposite of what it should be. No
attempt to fix this is made here; this change preserves the
erroneous behavior. The problem I found is documented here:
http://tracker.newdream.net/issues/2064
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: a few small changes · d3002b97

Authored by Alex Elder on Feb 14, 2012

This gathers a number of very minor changes:
    - use %hu when formatting a socket address's address family
    - null out the ceph_msgr_wq pointer after the queue has been
      destroyed
    - drop a needless cast in ceph_write_space()
    - add a WARN() call in ceph_state_change() in the event an
      unrecognized socket state is encountered
    - rearrange the logic in ceph_con_get() and ceph_con_put() so
      that:
        - the reference counts are only atomically read once
        - the values displayed via dout() calls are known to
          be meaningful at the time they are formatted
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: make ceph_tcp_connect() return int · 41617d0c

Authored by Alex Elder on Feb 14, 2012

There is no real need for ceph_tcp_connect() to return the socket
pointer it creates, since it already assigns it to con->sock, which
is visible to the caller.  Instead, have it return an error code,
which tidies things up a bit.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: encapsulate some messenger cleanup code · 6173d1f0

Authored by Alex Elder on Feb 14, 2012

Define a helper function to perform various cleanup operations.  Use
it both in the exit routine and in the init routine in the event of
an error.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: make ceph_msgr_wq private · e0f43c94

Authored by Alex Elder on Feb 14, 2012

The messenger workqueue has no need to be public.  So give it static
scope.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: encapsulate connection kvec operations · 859eb799

Authored by Alex Elder on Feb 14, 2012

Encapsulate the operation of adding a new chunk of data to the next
open slot in a ceph_connection's out_kvec array.  Also add a "reset"
operation to make subsequent add operations start at the beginning
of the array again.

Use these routines throughout, avoiding duplicate code and ensuring
all calls are handled consistently.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
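A rough userspace sketch of what such "add" and "reset" helpers could look like. The struct layout, field names, array size, and function names below are illustrative assumptions rather than the definitions used in the messenger; `struct kvec` is declared locally because the kernel type is not available in userspace.

```c
#include <assert.h>
#include <stddef.h>

#define OUT_KVEC_MAX 8		/* hypothetical array size */

/* Local stand-in for the kernel's struct kvec. */
struct kvec {
	void *iov_base;
	size_t iov_len;
};

/* Simplified stand-in for the outgoing side of a connection. */
struct conn {
	struct kvec out_kvec[OUT_KVEC_MAX];
	int out_kvec_left;		/* slots currently in use */
	size_t out_kvec_bytes;		/* total bytes queued */
	struct kvec *out_kvec_cur;	/* next kvec to send from */
};

/* Make subsequent adds start at the beginning of the array again. */
static void con_out_kvec_reset(struct conn *con)
{
	con->out_kvec_left = 0;
	con->out_kvec_bytes = 0;
	con->out_kvec_cur = &con->out_kvec[0];
}

/* Append one chunk of data to the next open slot. */
static void con_out_kvec_add(struct conn *con, size_t size, void *data)
{
	int index = con->out_kvec_left;

	assert(index < OUT_KVEC_MAX);
	con->out_kvec[index].iov_base = data;
	con->out_kvec[index].iov_len = size;
	con->out_kvec_left++;
	con->out_kvec_bytes += size;
}
```

Funnelling every append through one helper keeps the slot count and byte total in step, which is the consistency the commit message is after.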

libceph: move prepare_write_banner() · 963be4d7

Authored by Alex Elder on Feb 14, 2012

One of the arguments to prepare_write_connect() indicates whether it
is being called immediately after a call to prepare_write_banner().
Move the prepare_write_banner() call inside prepare_write_connect(),
and reinterpret (and rename) the "after_banner" argument so it
indicates that prepare_write_connect() should *make* the call
rather than should know it has already been made.

This was split out from the next patch to highlight this change in
logic.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: eliminate some abusive casts · 99f0f3b2

Authored by Alex Elder on Jan 23, 2012

This fixes some spots where a type cast to (void *) was used as
a universal type hiding mechanism.  Instead, properly cast the
type to the intended target type.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: eliminate some needless casts · bd406145

Authored by Alex Elder on Jan 23, 2012

This eliminates type casts in some places where they are not
required.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: kill addr_str_lock spinlock; use atomic instead · f64a9317

Authored by Alex Elder on Jan 23, 2012

A spinlock is used to protect a value used for selecting an array
index for a string used for formatting a socket address for human
consumption.  The index is reset to 0 if it ever reaches the maximum
index value.

Instead, use an ever-increasing atomic variable as a sequence
number, and compute the array index by masking off all but the
sequence number's lowest bits.  Make the number of entries in the
array a power of two to allow the use of such a mask (to avoid jumps
in the index value when the sequence number wraps).

The length of these strings is somewhat arbitrarily set at 60 bytes.
The worst-case length of a string produced is 54 bytes, for an IPv6
address that can't be shortened, e.g.:
    [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767
Change it so we arbitrarily use 64 bytes instead; if nothing else
it will make the array of these line up better in hex dumps.

Rename a few things to reinforce the distinction between the number
of strings in the array and the length of individual strings.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
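The lock-free slot selection described above can be sketched in userspace with C11 atomics standing in for the kernel's atomic_t. The macro names, the choice of 32 slots, and `format_addr()` are assumptions for illustration; the commit message only states that the count is a power of two and that each string is 64 bytes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define ADDR_STR_COUNT_LOG	5	/* hypothetical: 2^5 = 32 slots */
#define ADDR_STR_COUNT		(1u << ADDR_STR_COUNT_LOG)
#define ADDR_STR_COUNT_MASK	(ADDR_STR_COUNT - 1)
#define ADDR_STR_LEN		64	/* length of each string buffer */

static char addr_str[ADDR_STR_COUNT][ADDR_STR_LEN];
static atomic_uint addr_str_seq;	/* ever-increasing sequence number */

/* Pick the next slot without a lock: bump the sequence number and
 * keep only its low bits. */
static unsigned int next_addr_str_index(void)
{
	return atomic_fetch_add(&addr_str_seq, 1) & ADDR_STR_COUNT_MASK;
}

/* Format "ip:port" into the next slot and return that slot. */
static const char *format_addr(const char *ip, unsigned int port)
{
	char *s = addr_str[next_addr_str_index()];

	assert(ip);
	snprintf(s, ADDR_STR_LEN, "%s:%u", ip, port);
	return s;
}
```

Because the slot count is a power of two, it divides the range of the unsigned counter evenly, so the index sequence continues without a jump when the counter wraps back to zero.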

ceph: make use of "else" where appropriate · a5bc3129

Authored by Alex Elder on Jan 23, 2012

Rearrange ceph_tcp_connect() a bit, making use of "else" rather than
re-testing a value with consecutive "if" statements.  Don't record a
connection's socket pointer unless the connect operation is
successful.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: use a shared zero page rather than one per messenger · 57666519

Authored by Alex Elder on Jan 23, 2012

Each messenger allocates a page to be used when writing zeroes
out in the event of error or other abnormal condition.  Instead,
use the kernel ZERO_PAGE() for that purpose.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

net/ceph: Only clear SOCK_NOSPACE when there is sufficient space in the socket buffer · 182fac26

Authored by Jim Schutt on Feb 29, 2012

The Ceph messenger would sometimes queue multiple work items to write
data to a socket when the socket buffer was full.

Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the
same way that net/core/stream.c:sk_stream_write_space() does, i.e.,
clearing it only when sufficient space is available in the socket buffer.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@dreamhost.com>

01 Nov, 2011 · 1 commit

net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules · bc3b2d7f

Authored by Paul Gortmaker on Jul 15, 2011

These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

26 Oct, 2011 · 3 commits

ceph: use kernel DNS resolver · ee3b56f2

Authored by Noah Watkins on Sep 23, 2011

Change ceph_parse_ips to take either names given as
IP addresses or standard hostnames (e.g. localhost).
The DNS lookup is done using the dns_resolver facility
similar to its use in AFS, NFS, and CIFS.

This patch defines CONFIG_CEPH_LIB_USE_DNS_RESOLVER
that controls if this feature is on or off.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: warn on msg allocation failures · f0ed1b7c

Authored by Sage Weil on Aug 09, 2011

Any non-masked msg allocation failure should generate a warning and stack
trace to the console.  All of these need to eventually be replaced by
safe preallocation or msgpools.
Signed-off-by: Sage Weil <sage@newdream.net>

libceph: don't complain on msgpool alloc failures · b61c2763

Authored by Sage Weil on Aug 09, 2011

The pool allocation failures are masked by the pool; there is no need to
spam the console about them.  (That's the whole point of having the pool
in the first place.)

Mark msg allocations whose failure is safely handled as such.
Signed-off-by: Sage Weil <sage@newdream.net>

17 Sep, 2011 · 1 commit

libceph: initialize ack_stamp to avoid unnecessary connection reset · c0d5f9db

Authored by Jim Schutt on Sep 16, 2011

Commit 4cf9d544 recorded when an outgoing ceph message was ACKed,
in order to avoid unnecessary connection resets when an OSD is busy.

However, ack_stamp is uninitialized, so there is a window between
when the message is sent and when it is ACKed in which handle_timeout()
interprets the uninitialized value as an expired timeout, and resets
the connection unnecessarily.

Close the window by initializing ack_stamp.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>

27 Jul, 2011 · 1 commit

libceph: don't time out osd requests that haven't been received · 4cf9d544

Authored by Sage Weil on Jul 26, 2011

Keep track of when an outgoing message is ACKed (i.e., the server fully
received it and, presumably, queued it for processing). Time out OSD
requests only if it's been too long since they've been received.

This prevents timeouts and connection thrashing when the OSDs are simply
busy and are throttling the requests they read off the network.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

openanolis / cloud-kernel · synced successfully about 1 year ago
