1. 08 2月, 2014 1 次提交
  2. 26 1月, 2014 1 次提交
    • I
      libceph: add ceph_kv{malloc,free}() and switch to them · eeb0bed5
      Ilya Dryomov 提交于
      Encapsulate kmalloc vs vmalloc memory allocation and freeing logic into
      two helpers, ceph_kvmalloc() and ceph_kvfree(), and switch to them.
      
      ceph_kvmalloc() kmalloc()'s a maximum of 8 pages, anything bigger is
      vmalloc()'ed with __GFP_HIGHMEM set.  This changes the existing
      behaviour:
      
      - for buffers (ceph_buffer_new()), from trying to kmalloc() everything
        and using vmalloc() just as a fallback
      
      - for messages (ceph_msg_new()), from going to vmalloc() for anything
        bigger than a page
      
      - for messages (ceph_msg_new()), from disallowing vmalloc() to use high
        memory
      Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      eeb0bed5
  3. 14 1月, 2014 2 次提交
  4. 01 1月, 2014 3 次提交
  5. 24 11月, 2013 1 次提交
    • K
      ceph: Convert to immutable biovecs · f38a5181
      Kent Overstreet 提交于
      Now that we've got a mechanism for immutable biovecs -
      bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
      respect it instead of using the bvec array directly.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Sage Weil <sage@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      f38a5181
  6. 10 8月, 2013 1 次提交
  7. 25 7月, 2013 1 次提交
  8. 03 5月, 2013 2 次提交
  9. 02 5月, 2013 28 次提交
    • A
      libceph: fix two messenger bugs · a51b272e
      Alex Elder 提交于
      This patch makes four small changes in the ceph messenger.
      
      While getting copyup functionality working I found two bugs in the
      messenger.  Existing paths through the code did not trigger these
      problems, but they're fixed here:
          - In ceph_msg_data_pagelist_cursor_init(), the cursor's
            last_piece field was being checked against the length
            supplied.  This was OK until this commit: ccba6d98 libceph:
            implement multiple data items in a message That commit changed
            the cursor init routines to allow lengths to be supplied that
            exceeded the size of the current data item. Because of this,
            we have to use the assigned cursor resid field rather than the
            provided length in determining whether the cursor points to
            the last piece of a data item.
          - In ceph_msg_data_add_pages(), a BUG_ON() was erroneously
            catching attempts to add page data to a message if the message
            already had data assigned to it. That was OK until that same
            commit, at which point it was fine for messages to have
            multiple data items. It slipped through because that BUG_ON()
            call was present twice in that function. (You can never be too
            careful.)
      
      In addition two other minor things are changed:
          - In ceph_msg_data_cursor_init(), the local variable "data" was
            getting assigned twice.
          - In ceph_msg_data_advance(), it was assumed that the
            type-specific advance routine would set new_piece to true
            after it advanced past the last piece. That may have been
            fine, but since we check for that case we might as well set it
            explicitly in ceph_msg_data_advance().
      
      This resolves:
          http://tracker.ceph.com/issues/4762Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      a51b272e
    • A
      libceph: add, don't set data for a message · 90af3602
      Alex Elder 提交于
      Change the names of the functions that put data on a pagelist to
      reflect that we're adding to whatever's already there rather than
      just setting it to the one thing.  Currently only one data item is
      ever added to a message, but that's about to change.
      
      This resolves:
          http://tracker.ceph.com/issues/2770Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      90af3602
    • A
      libceph: implement multiple data items in a message · ca8b3a69
      Alex Elder 提交于
      This patch adds support to the messenger for more than one data item
      in its data list.
      
      A message data cursor has two more fields to support this:
          - a count of the number of bytes left to be consumed across
            all data items in the list, "total_resid"
          - a pointer to the head of the list (for validation only)
      
      The cursor initialization routine has been split into two parts: the
      outer one, which initializes the cursor for traversing the entire
      list of data items; and the inner one, which initializes the cursor
      to start processing a single data item.
      
      When a message cursor is first initialized, the outer initialization
      routine sets total_resid to the length provided.  The data pointer
      is initialized to the first data item on the list.  From there, the
      inner initialization routine finishes by setting up to process the
      data item the cursor points to.
      
      Advancing the cursor consumes bytes in total_resid.  If the resid
      field reaches zero, it means the current data item is fully
      consumed.  If total_resid indicates there is more data, the cursor
      is advanced to point to the next data item, and then the inner
      initialization routine prepares for using that.  (A check is made at
      this point to make sure we don't wrap around the front of the list.)
      
      The type-specific init routines are modified so they can be given a
      length that's larger than what the data item can support.  The resid
      field is initialized to the smaller of the provided length and the
      length of the entire data item.
      
      When total_resid reaches zero, we're done.
      
      This resolves:
          http://tracker.ceph.com/issues/3761Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      ca8b3a69
    • A
      libceph: replace message data pointer with list · 5240d9f9
      Alex Elder 提交于
      In place of the message data pointer, use a list head which links
      through message data items.  For now we only support a single entry
      on that list.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      5240d9f9
    • A
      libceph: have cursor point to data · 8ae4f4f5
      Alex Elder 提交于
      Rather than having a ceph message data item point to the cursor it's
      associated with, have the cursor point to a data item.  This will
      allow a message cursor to be used for more than one data item.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      8ae4f4f5
    • A
      libceph: move cursor into message · 36153ec9
      Alex Elder 提交于
      A message will only be processing a single data item at a time, so
      there's no need for each data item to have its own cursor.
      
      Move the cursor embedded in the message data structure into the
      message itself.  To minimize the impact, keep the data->cursor
      field, but make it be a pointer to the cursor in the message.
      
      Move the definition of ceph_msg_data above ceph_msg_data_cursor so
      the cursor can point to the data without a forward definition rather
      than vice-versa.
      
      This and the upcoming patches are part of:
          http://tracker.ceph.com/issues/3761Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      36153ec9
    • A
      libceph: record bio length · c851c495
      Alex Elder 提交于
      The bio is the only data item type that doesn't record its full
      length.  Fix that.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      c851c495
    • A
      libceph: skip message if too big to receive · f759ebb9
      Alex Elder 提交于
      We know the length of our message buffers.  If we get a message
      that's too long, just dump it and ignore it.  If skip was set
      then con->in_msg won't be valid, so be careful not to dereference
      a null pointer in the process.
      
      This resolves:
          http://tracker.ceph.com/issues/4664Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      f759ebb9
    • A
      libceph: fix possible CONFIG_BLOCK build problem · ea96571f
      Alex Elder 提交于
      This patch:
          15a0d7b libceph: record message data length
      did not enclose some bio-specific code inside CONFIG_BLOCK as
      it should have.  Fix that.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      ea96571f
    • A
      libceph: provide data length when preparing message · 98fa5dd8
      Alex Elder 提交于
      In prepare_message_data(), the length used to initialize the cursor
      is taken from the header of the message provided.  I'm working
      toward not using the header data length field to determine length in
      outbound messages, and this is a step in that direction.  For
      inbound messages this will be set to be the actual number of bytes
      that are arriving (which may be less than the total size of the data
      buffer available).
      
      This resolves:
          http://tracker.ceph.com/issues/4589Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      98fa5dd8
    • A
      libceph: record message data length · a1930804
      Alex Elder 提交于
      Keep track of the length of the data portion for a message in a
      separate field in the ceph_msg structure.  This information has
      been maintained in wire byte order in the message header, but
      that's going to change soon.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      a1930804
    • A
      libceph: account for alignment in pages cursor · 56fc5659
      Alex Elder 提交于
      When a cursor for a page array data message is initialized it needs
      to determine the initial value for cursor->last_piece.  Currently it
      just checks if length is less than a page, but that's not correct.
      The data in the first page in the array will be offset by a page
      offset based on the alignment recorded for the data.  (All pages
      thereafter will be aligned at the base of the page, so there's
      no need to account for this except for the first page.)
      
      Because this was wrong, there was a case where the length of a piece
      would be calculated as all of the residual bytes in the message and
      that plus the page offset could exceed the length of a page.
      
      So fix this case.  Make sure the sum won't wrap.
      
      This resolves a third issue described in:
          http://tracker.ceph.com/issues/4598Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      56fc5659
    • A
      libceph: page offset must be less than page size · 5df521b1
      Alex Elder 提交于
      Currently ceph_msg_data_pages_advance() allows the page offset value
      to be PAGE_SIZE, apparently assuming ceph_msg_data_pages_next() will
      treat it as 0.  But that doesn't happen, and the result led to a
      helpful assertion failure.
      
      Change ceph_msg_data_pages_advance() to truncate the offset to 0
      before returning if it reaches PAGE_SIZE.
      
      Make a few other minor adjustments in this area (comments and a
      better assertion) while modifying it.
      
      This resolves a second issue described in:
          http://tracker.ceph.com/issues/4598Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      5df521b1
    • A
      libceph: fix broken data length assertions · 1190bf06
      Alex Elder 提交于
      It's OK for the result of a read to come back with fewer bytes than
      were requested.  So don't trigger a BUG() in that case when
      initializing the data cursor.
      
      This resolves the first problem described in:
          http://tracker.ceph.com/issues/4598Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      1190bf06
    • A
      libceph: make message data be a pointer · 6644ed7b
      Alex Elder 提交于
      Begin the transition from a single message data item to a list of
      them by replacing the "data" structure in a message with a pointer
      to a ceph_msg_data structure.
      
      A null pointer will indicate the message has no data; replace the
      use of ceph_msg_has_data() with a simple check for a null pointer.
      
      Create functions ceph_msg_data_create() and ceph_msg_data_destroy()
      to dynamically allocate and free a data item structure of a given type.
      
      When a message has its data item "set," allocate one of these to
      hold the data description, and free it when the last reference to
      the message is dropped.
      
      This partially resolves:
          http://tracker.ceph.com/issues/4429Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      6644ed7b
    • A
      libceph: use only ceph_msg_data_advance() · 8ea299bc
      Alex Elder 提交于
      The *_msg_pos_next() functions do little more than call
      ceph_msg_data_advance().  Replace those wrapper functions with
      a simple call to ceph_msg_data_advance().
      
      This cleanup is related to:
          http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      8ea299bc
    • A
      libceph: don't add to crc unless data sent · 143334ff
      Alex Elder 提交于
      In write_partial_message_data() we aggregate the crc for the data
      portion of the message as each new piece of the data item is
      encountered.  Because it was computed *before* sending the data, if
      an attempt to send a new piece resulted in 0 bytes being sent, the
      crc crc across that piece would erroneously get computed again and
      added to the aggregate result.  This would occasionally happen in
      the evnet of a connection failure.
      
      The crc value isn't really needed until the complete value is known
      after sending all data, so there's no need to compute it before
      sending.
      
      So don't calculate the crc for a piece until *after* we know at
      least one byte of it has been sent.  That will avoid this problem.
      
      This resolves:
          http://tracker.ceph.com/issues/4450Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      143334ff
    • A
      libceph: kill last of ceph_msg_pos · f5db90bc
      Alex Elder 提交于
      The only remaining field in the ceph_msg_pos structure is
      did_page_crc.  In the new cursor model of things that flag (or
      something like it) belongs in the cursor.
      
      Define a new field "need_crc" in the cursor (which applies to all
      types of data) and initialize it to true whenever a cursor is
      initialized.
      
      In write_partial_message_data(), the data CRC still will be computed
      as before, but it will check the cursor->need_crc field to determine
      whether it's needed.  Any time the cursor is advanced to a new piece
      of a data item, need_crc will be set, and this will cause the crc
      for that entire piece to be accumulated into the data crc.
      
      In write_partial_message_data() the intermediate crc value is now
      held in a local variable so it doesn't have to be byte-swapped so
      many times.  In read_partial_msg_data() we do something similar
      (but mainly for consistency there).
      
      With that, the ceph_msg_pos structure can go away,  and it no longer
      needs to be passed as an argument to prepare_message_data().
      
      This cleanup is related to:
          http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      f5db90bc
    • A
      libceph: kill most of ceph_msg_pos · 859a35d5
      Alex Elder 提交于
      All but one of the fields in the ceph_msg_pos structure are now
      never used (only assigned), so get rid of them.  This allows
      several small blocks of code to go away.
      
      This is cleanup of old code related to:
          http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      859a35d5
    • A
      libceph: use cursor resid for loop condition · 643c68a4
      Alex Elder 提交于
      Use the "resid" field of a cursor rather than finding when the
      message data position has moved up to meet the data length to
      determine when all data has been sent or received in
      write_partial_message_data() and read_partial_msg_data().
      
      This is cleanup of old code related to:
          http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      643c68a4
    • A
      libceph: collapse all data items into one · 4c59b4a2
      Alex Elder 提交于
      It turns out that only one of the data item types is ever used at
      any one time in a single message (currently).
          - A page array is used by the osd client (on behalf of the file
            system) and by rbd.  Only one osd op (and therefore at most
            one data item) is ever used at a time by rbd.  And the only
            time the file system sends two, the second op contains no
            data.
          - A bio is only used by the rbd client (and again, only one
            data item per message)
          - A page list is used by the file system and by rbd for outgoing
            data, but only one op (and one data item) at a time.
      
      We can therefore collapse all three of our data item fields into a
      single field "data", and depend on the messenger code to properly
      handle it based on its type.
      
      This allows us to eliminate quite a bit of duplicated code.
      
      This is related to:
          http://tracker.ceph.com/issues/4429Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      4c59b4a2
    • A
      libceph: get rid of read helpers · 686be208
      Alex Elder 提交于
      Now that read_partial_message_pages() and read_partial_message_bio()
      are literally identical functions we can factor them out.  They're
      pretty simple as well, so just move their relevant content into
      read_partial_msg_data().
      
      This is and previous patches together resolve:
          http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      686be208
    • A
      libceph: no outbound zero data · 61fcdc97
      Alex Elder 提交于
      There is handling in write_partial_message_data() for the case where
      only the length of--and no other information about--the data to be
      sent has been specified.  It uses the zero page as the source of
      data to send in this case.
      
      This case doesn't occur.  All message senders set up a page array,
      pagelist, or bio describing the data to be sent.  So eliminate the
      block of code that handles this (but check and issue a warning for
      now, just in case it happens for some reason).
      
      This resolves:
          http://tracker.ceph.com/issues/4426Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      61fcdc97
    • A
      libceph: use cursor for inbound data pages · 878efabd
      Alex Elder 提交于
      The cursor code for a page array selects the right page, page
      offset, and length to use for a ceph_tcp_recvpage() call, so
      we can use it to replace a block in read_partial_message_pages().
      
      This partially resolves:
          http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      878efabd
    • A
      libceph: kill ceph message bio_iter, bio_seg · 6518be47
      Alex Elder 提交于
      The bio_iter and bio_seg fields in a message are no longer used, we
      use the cursor instead.  So get rid of them and the functions that
      operate on them them.
      
      This is related to:
          http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      6518be47
    • A
      libceph: use cursor for bio reads · 463207aa
      Alex Elder 提交于
      Replace the use of the information in con->in_msg_pos for incoming
      bio data.  The old in_msg_pos and the new cursor mechanism do
      basically the same thing, just slightly differently.
      
      The main functional difference is that in_msg_pos keeps track of the
      length of the complete bio list, and assumed it was fully consumed
      when that many bytes had been transferred.  The cursor does not assume
      a length, it simply consumes all bytes in the bio list.  Because the
      only user of bio data is the rbd client, and because the length of a
      bio list provided by rbd client always matches the number of bytes
      in the list, both ways of tracking length are equivalent.
      
      In addition, for in_msg_pos the initial bio vector is selected as
      the initial value of the bio->bi_idx, while the cursor assumes this
      is zero.  Again, the rbd client always passes 0 as the initial index
      so the effect is the same.
      
      Other than that, they basically match:
          in_msg_pos      cursor
          ----------      ------
          bio_iter        bio
          bio_seg         vec_index
          page_pos        page_offset
      
      The in_msg_pos field is initialized by a call to init_bio_iter().
      The bio cursor is initialized by ceph_msg_data_cursor_init().
      Both now happen in the same spot, in prepare_message_data().
      
      The in_msg_pos field is advanced by a call to in_msg_pos_next(),
      which updates page_pos and calls iter_bio_next() to move to the next
      bio vector, or to the next bio in the list.  The cursor is advanced
      by ceph_msg_data_advance().  That isn't currently happening so
      add a call to that in in_msg_pos_next().
      
      Finally, the next piece of data to use for a read is determined
      by a bunch of lines in read_partial_message_bio().  Those can be
      replaced by an equivalent ceph_msg_data_bio_next() call.
      
      This partially resolves:
          http://tracker.ceph.com/issues/4428Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      463207aa
    • A
      libceph: record residual bytes for all message data types · 25aff7c5
      Alex Elder 提交于
      All of the data types can use this, not just the page array.  Until
      now, only the bio type doesn't have it available, and only the
      initiator of the request (the rbd client) is able to supply the
      length of the full request without re-scanning the bio list.  Change
      the cursor init routines so the length is supplied based on the
      message header "data_len" field, and use that length to intiialize
      the "resid" field of the cursor.
      
      In addition, change the way "last_piece" is defined so it is based
      on the residual number of bytes in the original request.  This is
      necessary (at least for bio messages) because it is possible for
      a read request to succeed without consuming all of the space
      available in the data buffer.
      
      This resolves:
          http://tracker.ceph.com/issues/4427Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      25aff7c5
    • A
      libceph: drop pages parameter · 28a89dde
      Alex Elder 提交于
      The value passed for "pages" in read_partial_message_pages() is
      always the pages pointer from the incoming message, which can be
      derived inside that function.  So just get rid of the parameter.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      28a89dde