- 14 1月, 2014 2 次提交
-
-
由 Ilya Dryomov 提交于
The check that makes sure that we have enough memory allocated to read in the entire header of the message in question is currently busted. It compares front_len of the incoming message with iov_len field of ceph_msg::front structure, which is used primarily to indicate the amount of data already read in, and not the size of the allocated buffer. Under certain conditions (e.g. a short read from a socket followed by that socket's shutdown and owning ceph_connection reset) this results in a warning similar to [85688.975866] libceph: get_reply front 198 > preallocated 122 (4#0) and, through another bug, leads to forever hung tasks and forced reboots. Fix this by comparing front_len with front_alloc_len field of struct ceph_msg, which stores the actual size of the buffer. Fixes: http://tracker.ceph.com/issues/5425Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Rename front_max field of struct ceph_msg to front_alloc_len to make its purpose more clear. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 01 1月, 2014 3 次提交
-
-
由 Ilya Dryomov 提交于
Similar to userspace, don't bail with "parse_ips bad ip ..." if the specified port is port 0, instead use port CEPH_MON_PORT (6789, the default monitor port). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
This updates ceph_features.h so that it has all feature bits defined in ceph.git. In the interim since the last update, ceph.git crossed the "32 feature bits" point, and, the addition of the 33rd bit wasn't handled correctly. The work-around is squashed into this commit and reflects ceph.git commit 053659d05e0349053ef703b414f44965f368b9f0. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
In preparation for ceph_features.h update, change all features fields from unsigned int/u32 to u64. (ceph.git has ~40 feature bits at this point.) Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 10 8月, 2013 1 次提交
-
-
由 Tejun Heo 提交于
dbf2576e ("workqueue: make all workqueues non-reentrant") made WQ_NON_REENTRANT no-op and the flag is going away. Remove its usages. This patch doesn't introduce any behavior changes. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NSage Weil <sage@inktank.com> Cc: ceph-devel@vger.kernel.org
-
- 25 7月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
Several call sites use the hardcoded following condition : sk_stream_wspace(sk) >= sk_stream_min_wspace(sk) Lets use a helper because TCP_NOTSENT_LOWAT support will change this condition for TCP sockets. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 5月, 2013 2 次提交
-
-
由 Alex Elder 提交于
Create a slab cache to manage ceph_msg_data structure allocation. This is part of: http://tracker.ceph.com/issues/3926Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Create a slab cache to manage ceph_msg structure allocation. This is part of: http://tracker.ceph.com/issues/3926Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
- 02 5月, 2013 31 次提交
-
-
由 Alex Elder 提交于
This patch makes four small changes in the ceph messenger. While getting copyup functionality working I found two bugs in the messenger. Existing paths through the code did not trigger these problems, but they're fixed here: - In ceph_msg_data_pagelist_cursor_init(), the cursor's last_piece field was being checked against the length supplied. This was OK until this commit: ccba6d98 libceph: implement multiple data items in a message That commit changed the cursor init routines to allow lengths to be supplied that exceeded the size of the current data item. Because of this, we have to use the assigned cursor resid field rather than the provided length in determining whether the cursor points to the last piece of a data item. - In ceph_msg_data_add_pages(), a BUG_ON() was erroneously catching attempts to add page data to a message if the message already had data assigned to it. That was OK until that same commit, at which point it was fine for messages to have multiple data items. It slipped through because that BUG_ON() call was present twice in that function. (You can never be too careful.) In addition two other minor things are changed: - In ceph_msg_data_cursor_init(), the local variable "data" was getting assigned twice. - In ceph_msg_data_advance(), it was assumed that the type-specific advance routine would set new_piece to true after it advanced past the last piece. That may have been fine, but since we check for that case we might as well set it explicitly in ceph_msg_data_advance(). This resolves: http://tracker.ceph.com/issues/4762Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Change the names of the functions that put data on a pagelist to reflect that we're adding to whatever's already there rather than just setting it to the one thing. Currently only one data item is ever added to a message, but that's about to change. This resolves: http://tracker.ceph.com/issues/2770Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
This patch adds support to the messenger for more than one data item in its data list. A message data cursor has two more fields to support this: - a count of the number of bytes left to be consumed across all data items in the list, "total_resid" - a pointer to the head of the list (for validation only) The cursor initialization routine has been split into two parts: the outer one, which initializes the cursor for traversing the entire list of data items; and the inner one, which initializes the cursor to start processing a single data item. When a message cursor is first initialized, the outer initialization routine sets total_resid to the length provided. The data pointer is initialized to the first data item on the list. From there, the inner initialization routine finishes by setting up to process the data item the cursor points to. Advancing the cursor consumes bytes in total_resid. If the resid field reaches zero, it means the current data item is fully consumed. If total_resid indicates there is more data, the cursor is advanced to point to the next data item, and then the inner initialization routine prepares for using that. (A check is made at this point to make sure we don't wrap around the front of the list.) The type-specific init routines are modified so they can be given a length that's larger than what the data item can support. The resid field is initialized to the smaller of the provided length and the length of the entire data item. When total_resid reaches zero, we're done. This resolves: http://tracker.ceph.com/issues/3761Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
In place of the message data pointer, use a list head which links through message data items. For now we only support a single entry on that list. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Rather than having a ceph message data item point to the cursor it's associated with, have the cursor point to a data item. This will allow a message cursor to be used for more than one data item. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
A message will only be processing a single data item at a time, so there's no need for each data item to have its own cursor. Move the cursor embedded in the message data structure into the message itself. To minimize the impact, keep the data->cursor field, but make it be a pointer to the cursor in the message. Move the definition of ceph_msg_data above ceph_msg_data_cursor so the cursor can point to the data without a forward definition rather than vice-versa. This and the upcoming patches are part of: http://tracker.ceph.com/issues/3761Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
The bio is the only data item type that doesn't record its full length. Fix that. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
We know the length of our message buffers. If we get a message that's too long, just dump it and ignore it. If skip was set then con->in_msg won't be valid, so be careful not to dereference a null pointer in the process. This resolves: http://tracker.ceph.com/issues/4664Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
This patch: 15a0d7b libceph: record message data length did not enclose some bio-specific code inside CONFIG_BLOCK as it should have. Fix that. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
In prepare_message_data(), the length used to initialize the cursor is taken from the header of the message provided. I'm working toward not using the header data length field to determine length in outbound messages, and this is a step in that direction. For inbound messages this will be set to be the actual number of bytes that are arriving (which may be less than the total size of the data buffer available). This resolves: http://tracker.ceph.com/issues/4589Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Keep track of the length of the data portion for a message in a separate field in the ceph_msg structure. This information has been maintained in wire byte order in the message header, but that's going to change soon. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
When a cursor for a page array data message is initialized it needs to determine the initial value for cursor->last_piece. Currently it just checks if length is less than a page, but that's not correct. The data in the first page in the array will be offset by a page offset based on the alignment recorded for the data. (All pages thereafter will be aligned at the base of the page, so there's no need to account for this except for the first page.) Because this was wrong, there was a case where the length of a piece would be calculated as all of the residual bytes in the message and that plus the page offset could exceed the length of a page. So fix this case. Make sure the sum won't wrap. This resolves a third issue described in: http://tracker.ceph.com/issues/4598Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Alex Elder 提交于
Currently ceph_msg_data_pages_advance() allows the page offset value to be PAGE_SIZE, apparently assuming ceph_msg_data_pages_next() will treat it as 0. But that doesn't happen, and the result led to a helpful assertion failure. Change ceph_msg_data_pages_advance() to truncate the offset to 0 before returning if it reaches PAGE_SIZE. Make a few other minor adjustments in this area (comments and a better assertion) while modifying it. This resolves a second issue described in: http://tracker.ceph.com/issues/4598Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Alex Elder 提交于
It's OK for the result of a read to come back with fewer bytes than were requested. So don't trigger a BUG() in that case when initializing the data cursor. This resolves the first problem described in: http://tracker.ceph.com/issues/4598Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Alex Elder 提交于
Begin the transition from a single message data item to a list of them by replacing the "data" structure in a message with a pointer to a ceph_msg_data structure. A null pointer will indicate the message has no data; replace the use of ceph_msg_has_data() with a simple check for a null pointer. Create functions ceph_msg_data_create() and ceph_msg_data_destroy() to dynamically allocate and free a data item structure of a given type. When a message has its data item "set," allocate one of these to hold the data description, and free it when the last reference to the message is dropped. This partially resolves: http://tracker.ceph.com/issues/4429Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
The *_msg_pos_next() functions do little more than call ceph_msg_data_advance(). Replace those wrapper functions with a simple call to ceph_msg_data_advance(). This cleanup is related to: http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
In write_partial_message_data() we aggregate the crc for the data portion of the message as each new piece of the data item is encountered. Because it was computed *before* sending the data, if an attempt to send a new piece resulted in 0 bytes being sent, the crc crc across that piece would erroneously get computed again and added to the aggregate result. This would occasionally happen in the evnet of a connection failure. The crc value isn't really needed until the complete value is known after sending all data, so there's no need to compute it before sending. So don't calculate the crc for a piece until *after* we know at least one byte of it has been sent. That will avoid this problem. This resolves: http://tracker.ceph.com/issues/4450Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Alex Elder 提交于
The only remaining field in the ceph_msg_pos structure is did_page_crc. In the new cursor model of things that flag (or something like it) belongs in the cursor. Define a new field "need_crc" in the cursor (which applies to all types of data) and initialize it to true whenever a cursor is initialized. In write_partial_message_data(), the data CRC still will be computed as before, but it will check the cursor->need_crc field to determine whether it's needed. Any time the cursor is advanced to a new piece of a data item, need_crc will be set, and this will cause the crc for that entire piece to be accumulated into the data crc. In write_partial_message_data() the intermediate crc value is now held in a local variable so it doesn't have to be byte-swapped so many times. In read_partial_msg_data() we do something similar (but mainly for consistency there). With that, the ceph_msg_pos structure can go away, and it no longer needs to be passed as an argument to prepare_message_data(). This cleanup is related to: http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
All but one of the fields in the ceph_msg_pos structure are now never used (only assigned), so get rid of them. This allows several small blocks of code to go away. This is cleanup of old code related to: http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Use the "resid" field of a cursor rather than finding when the message data position has moved up to meet the data length to determine when all data has been sent or received in write_partial_message_data() and read_partial_msg_data(). This is cleanup of old code related to: http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
It turns out that only one of the data item types is ever used at any one time in a single message (currently). - A page array is used by the osd client (on behalf of the file system) and by rbd. Only one osd op (and therefore at most one data item) is ever used at a time by rbd. And the only time the file system sends two, the second op contains no data. - A bio is only used by the rbd client (and again, only one data item per message) - A page list is used by the file system and by rbd for outgoing data, but only one op (and one data item) at a time. We can therefore collapse all three of our data item fields into a single field "data", and depend on the messenger code to properly handle it based on its type. This allows us to eliminate quite a bit of duplicated code. This is related to: http://tracker.ceph.com/issues/4429Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Now that read_partial_message_pages() and read_partial_message_bio() are literally identical functions we can factor them out. They're pretty simple as well, so just move their relevant content into read_partial_msg_data(). This is and previous patches together resolve: http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
There is handling in write_partial_message_data() for the case where only the length of--and no other information about--the data to be sent has been specified. It uses the zero page as the source of data to send in this case. This case doesn't occur. All message senders set up a page array, pagelist, or bio describing the data to be sent. So eliminate the block of code that handles this (but check and issue a warning for now, just in case it happens for some reason). This resolves: http://tracker.ceph.com/issues/4426Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
The cursor code for a page array selects the right page, page offset, and length to use for a ceph_tcp_recvpage() call, so we can use it to replace a block in read_partial_message_pages(). This partially resolves: http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
The bio_iter and bio_seg fields in a message are no longer used, we use the cursor instead. So get rid of them and the functions that operate on them them. This is related to: http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
Replace the use of the information in con->in_msg_pos for incoming bio data. The old in_msg_pos and the new cursor mechanism do basically the same thing, just slightly differently. The main functional difference is that in_msg_pos keeps track of the length of the complete bio list, and assumed it was fully consumed when that many bytes had been transferred. The cursor does not assume a length, it simply consumes all bytes in the bio list. Because the only user of bio data is the rbd client, and because the length of a bio list provided by rbd client always matches the number of bytes in the list, both ways of tracking length are equivalent. In addition, for in_msg_pos the initial bio vector is selected as the initial value of the bio->bi_idx, while the cursor assumes this is zero. Again, the rbd client always passes 0 as the initial index so the effect is the same. Other than that, they basically match: in_msg_pos cursor ---------- ------ bio_iter bio bio_seg vec_index page_pos page_offset The in_msg_pos field is initialized by a call to init_bio_iter(). The bio cursor is initialized by ceph_msg_data_cursor_init(). Both now happen in the same spot, in prepare_message_data(). The in_msg_pos field is advanced by a call to in_msg_pos_next(), which updates page_pos and calls iter_bio_next() to move to the next bio vector, or to the next bio in the list. The cursor is advanced by ceph_msg_data_advance(). That isn't currently happening so add a call to that in in_msg_pos_next(). Finally, the next piece of data to use for a read is determined by a bunch of lines in read_partial_message_bio(). Those can be replaced by an equivalent ceph_msg_data_bio_next() call. This partially resolves: http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
All of the data types can use this, not just the page array. Until now, only the bio type doesn't have it available, and only the initiator of the request (the rbd client) is able to supply the length of the full request without re-scanning the bio list. Change the cursor init routines so the length is supplied based on the message header "data_len" field, and use that length to intiialize the "resid" field of the cursor. In addition, change the way "last_piece" is defined so it is based on the residual number of bytes in the original request. This is necessary (at least for bio messages) because it is possible for a read request to succeed without consuming all of the space available in the data buffer. This resolves: http://tracker.ceph.com/issues/4427Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
The value passed for "pages" in read_partial_message_pages() is always the pages pointer from the incoming message, which can be derived inside that function. So just get rid of the parameter. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Alex Elder 提交于
When the last reference to a ceph message is dropped, ceph_msg_last_put() is called to clean things up. For "normal" messages (allocated via ceph_msg_new() rather than being allocated from a memory pool) it's sufficient to just release resources. But for a mempool-allocated message we actually have to re-initialize the data fields in the message back to initial state so they're ready to go in the event the message gets reused. Some of this was already done; this fleshes it out so it's done more completely. This resolves: http://tracker.ceph.com/issues/4540Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
由 Sage Weil 提交于
We maintain a counter of failed auth attempts to allow us to retry once before failing. However, if the second attempt succeeds, the flag isn't cleared, which makes us think auth failed again later when the connection resets for other reasons (like a socket error). This is one part of the sorry sequence of events in bug http://tracker.ceph.com/issues/4282Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
This is an old protocol extension that allows the client and server to avoid resending old messages after a reconnect (following a socket error). Instead, the exchange their sequence numbers during the handshake. This avoids sending a bunch of useless data over the socket. It has been supported in the server code since v0.22 (Sep 2010). Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-