1. 19 Feb 2013 (4 commits)
  2. 18 Jan 2013 (9 commits)
    • rbd: kill ceph_osd_req_op->flags · 2b5fc648
      Committed by Alex Elder
      The flags field of struct ceph_osd_req_op is never used, so just get
      rid of it.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: pass num_op with ops · ae7ca4a3
      Committed by Alex Elder
      Both ceph_osdc_alloc_request() and ceph_osdc_build_request() are
      provided an array of ceph osd request operations.  Rather than just
      passing the number of operations in the array, the caller is
      required to append an additional zeroed operation structure to signal
      the end of the array.
      
      All callers know the number of operations at the time these
      functions are called, so drop the silly zero entry and supply that
      number directly.  As a result, get_num_ops() is no longer needed.
      This also means that ceph_osdc_alloc_request() never uses its ops
      argument, so that can be dropped.
      
      Also rbd_create_rw_ops() no longer needs to add one to reserve room
      for the additional op.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
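The difference between the two calling conventions in "libceph: pass num_op with ops" can be sketched roughly as follows. This is an illustration only: `example_op`, `count_ops_sentinel()` and `sum_opcodes()` are hypothetical names, not the kernel's structures or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an osd request op; only the opcode matters here. */
struct example_op {
	unsigned short op;	/* 0 marks the end of the array in the old scheme */
};

/* Old convention: walk the array until the zeroed terminator is found. */
size_t count_ops_sentinel(const struct example_op *ops)
{
	size_t n = 0;

	while (ops[n].op != 0)
		n++;
	return n;
}

/* New convention: the caller states the count, so no terminator is needed. */
unsigned int sum_opcodes(const struct example_op *ops, size_t num_ops)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < num_ops; i++)
		sum += ops[i].op;
	return sum;
}
```

With an explicit count the array needs no extra slot, which is why the caller no longer has to reserve room for one additional op.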
    • libceph: don't set pages or bio in ceph_osdc_alloc_request() · 54a54007
      Committed by Alex Elder
      Only one of the two callers of ceph_osdc_alloc_request() provides
      page or bio data for its payload.  And essentially all that function
      was doing with those arguments was assigning them to fields in the
      osd request structure.
      
      Simplify ceph_osdc_alloc_request() by having the caller take care of
      making those assignments.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: don't set flags in ceph_osdc_alloc_request() · d178a9e7
      Committed by Alex Elder
      The only thing ceph_osdc_alloc_request() really does with the
      flags value it is passed is assign it to the newly-created
      osd request structure.  Do that in the caller instead.
      
      Both callers subsequently call ceph_osdc_build_request(), so have
      that function (instead of ceph_osdc_alloc_request()) issue a warning
      if a request comes through with neither the read nor write flags set.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: drop osdc from ceph_calc_raw_layout() · e75b45cf
      Committed by Alex Elder
      The osdc parameter to ceph_calc_raw_layout() is not used, so get rid
      of it.  Consequently, the corresponding parameter in calc_layout()
      becomes unused, so get rid of that as well.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: drop snapid in ceph_calc_raw_layout() · 4d6b250b
      Committed by Alex Elder
      A snapshot id must be provided to ceph_calc_raw_layout() even though
      it is not needed at all for calculating the layout.
      
      Where the snapshot id *is* needed is when building the request
      message for an osd operation.
      
      Drop the snapid parameter from ceph_calc_raw_layout() and pass
      that value instead in ceph_osdc_build_request().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: pass length to ceph_osdc_build_request() · 0120be3c
      Committed by Alex Elder
      The len argument to ceph_osdc_build_request() is set up to be
      passed by address, but that function never updates its value
      so there's no need to do this.  Tighten up the interface by
      passing the length directly.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: always allow trail in osd request · c885837f
      Committed by Alex Elder
      An osd request structure contains an optional trail portion, which
      if present will contain data to be passed in the payload portion of
      the message containing the request.  The trail field is a
      ceph_pagelist pointer, and if null it indicates there is no trail.
      
      A ceph_pagelist structure contains a length field, and it can
      legitimately hold the value 0.  Make use of this to change the
      interpretation of the "trail" of an osd request so that every osd
      request has trailing data, though it might have length 0.
      
      This means we change the r_trail field in a ceph_osd_request
      structure from a pointer to a structure that is always initialized.
      
      Note that in ceph_osdc_start_request(), the trail pointer (or now
      address of that structure) is assigned to a ceph message's trail
      field.  Here's why that's still OK (looking at net/ceph/messenger.c):
          - What would have resulted in a null pointer previously will now
            refer to a 0-length page list.  That message trail pointer
            is used in two functions, write_partial_msg_pages() and
            out_msg_pos_next().
          - In write_partial_msg_pages(), a null page list pointer is
            handled the same as a message with 0-length trail, and both
            result in an "in_trail" variable set to false.  The trail
            pointer is only used if in_trail is true.
          - The only other place the message trail pointer is used is
            out_msg_pos_next().  That function is only called by
            write_partial_msg_pages() and only touches the trail pointer
            if the in_trail value it is passed is true.
      Therefore a null ceph_msg->trail pointer is equivalent to a non-null
      pointer referring to a 0-length page list structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
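The equivalence argued in "libceph: always allow trail in osd request" (a null trail pointer behaves the same as a zero-length page list) can be sketched like this. The structures are reduced, hypothetical stand-ins, and `in_trail()` only models the check the commit message describes; it is not the code in net/ceph/messenger.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a page list: only the length is kept. */
struct example_pagelist {
	size_t length;
};

/* Old shape: the trail is optional, and NULL means "no trail". */
struct example_req_old {
	struct example_pagelist *trail;
};

/* New shape: the trail is embedded and always initialized; 0 means empty. */
struct example_req_new {
	struct example_pagelist trail;
};

/* Models the described check: NULL and zero-length both yield false. */
bool in_trail(const struct example_pagelist *trail)
{
	return trail != NULL && trail->length > 0;
}
```

Under this model, passing the address of an embedded, zero-length trail gives the same result as passing NULL did before, so callers that only act when `in_trail()` is true are unaffected.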
    • rbd: drop oid parameters from ceph_osdc_build_request() · af77f26c
      Committed by Alex Elder
      The last two parameters to ceph_osdc_build_request() describe the
      object id, but the values passed always come from the osd request
      structure whose address is also provided.  Get rid of those last
      two parameters.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
  3. 02 Oct 2012 (1 commit)
  4. 17 May 2012 (1 commit)
  5. 12 Nov 2011 (1 commit)
  6. 23 Mar 2011 (1 commit)
  7. 22 Mar 2011 (1 commit)
    • libceph: fix osd request queuing on osdmap updates · 6f6c7006
      Committed by Sage Weil
      If we send a request to osd A, and the request's pg remaps to osd B and
      then back to A in quick succession, we need to resend the request to A. The
      old code was only calling kick_requests after processing all incremental
      maps in a message, so it was very possible to not resend a request that
      needed to be resent.  This would make the osd eventually time out (at least
      with the current default of osd timeouts enabled).
      
      The correct approach is to scan requests on every map incremental.  This
      patch refactors the kick code in a few ways:
       - all requests are either on req_lru (in flight), req_unsent (ready to
         send), or req_notarget (currently map to no up osd)
       - mapping is always done by map_request (previously map_osds)
       - if the mapping changes, we requeue.  requests are resent only after all
         map incrementals are processed.
       - some osd reset code is moved out of kick_requests into a separate
         function
       - the "kick this osd" functionality is moved to kick_osd_requests, as it
         is unrelated to scanning for request->pg->osd mapping changes
      Signed-off-by: Sage Weil <sage@newdream.net>
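The A -> B -> A case from "libceph: fix osd request queuing on osdmap updates" can be illustrated with a toy model. Everything here is hypothetical: `targets[i]` stands for the osd a request's pg maps to after incremental map `i`, and neither function exists in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Old approach (as described): look only at the mapping after the last incremental. */
int needs_resend_final_only(int sent_to, const int *targets, size_t n)
{
	return n > 0 && targets[n - 1] != sent_to;
}

/* Approach from the commit message: check the mapping after every incremental. */
int needs_resend_every_incremental(int sent_to, const int *targets, size_t n)
{
	int current = sent_to;
	int requeue = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (targets[i] != current) {
			current = targets[i];
			requeue = 1;	/* mapping changed: requeue the request */
		}
	}
	return requeue;	/* the resend itself happens once, after all incrementals */
}
```

With osd A as 0 and osd B as 1, a request sent to A followed by incrementals mapping to B and then back to A is missed by the first function and caught by the second.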
  8. 10 Nov 2010 (1 commit)
    • ceph: make page alignment explicit in osd interface · b7495fc2
      Committed by Sage Weil
      We used to infer alignment of IOs within a page based on the file offset,
      which assumed they matched.  This broke with direct IO that was not aligned
      to pages (e.g., 512-byte aligned IO).  We were also trusting the alignment
      specified in the OSD reply, which could have been adjusted by the server.
      
      Explicitly specify the page alignment when setting up OSD IO requests.
      Signed-off-by: Sage Weil <sage@newdream.net>
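Why inferring alignment from the file offset breaks for unaligned direct IO can be seen in a small sketch. The 4096-byte page size and both helper names are assumptions made for illustration. Deriving the explicit alignment from the buffer address is one plausible way to "explicitly specify" it, and is not necessarily what the patch does.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/* Old inference: assume the data's in-page offset matches the file offset's. */
unsigned int alignment_from_file_offset(uint64_t file_off)
{
	return (unsigned int)(file_off & (EXAMPLE_PAGE_SIZE - 1));
}

/* Explicit alignment: take it from where the data actually sits in memory. */
unsigned int alignment_from_buffer(uintptr_t buf_addr)
{
	return (unsigned int)(buf_addr & (EXAMPLE_PAGE_SIZE - 1));
}
```

For a page-aligned file offset such as 8192 and a user buffer that starts 512 bytes into a page, the first function reports 0 while the data really begins at offset 512, which is the mismatch described above.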
  9. 21 Oct 2010 (4 commits)
  10. 12 May 2010 (1 commit)
  11. 06 May 2010 (1 commit)
    • ceph: don't use writeback_control in writepages completion · 54ad023b
      Committed by Sage Weil
      The ->writepages writeback_control is no longer valid in the writepages
      completion.  We were touching it solely to adjust pages_skipped when there
      was a writeback error (EIO, ENOSPC, EPERM due to bad osd credentials),
      causing an oops in the writeback code shortly thereafter.  Updating
      pages_skipped on error isn't correct anyway, so let's just rip out this
      (clearly broken) code that passes the wbc to the completion.
      Signed-off-by: Sage Weil <sage@newdream.net>
  12. 23 Mar 2010 (1 commit)
  13. 05 Mar 2010 (1 commit)
    • ceph: reset osd after relevant messages timed out · 422d2cb8
      Committed by Yehuda Sadeh
      This simplifies the process of timing out messages.  We
      keep an LRU of the messages currently in flight.  If a
      timeout has passed, we reset the osd connection so that
      messages will be retransmitted.  This is a failsafe in case
      we hit some sort of problem sending out a message to the OSD.
      Normally, we'll get notification via an updated osdmap if
      there are problems.
      
      If a request is older than the keepalive timeout, send a
      keepalive to ensure we detect any breaks in the TCP connection.
      Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
      Signed-off-by: Sage Weil <sage@newdream.net>
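The two thresholds described in "ceph: reset osd after relevant messages timed out" can be sketched as a single decision on a request's age. The enum, the function name and the assumption that the reset timeout is the longer of the two are all hypothetical. The real code works on an LRU of in-flight requests rather than on one request at a time.

```c
#include <assert.h>

enum example_action {
	EXAMPLE_NONE,		/* request is still young: do nothing */
	EXAMPLE_KEEPALIVE,	/* probe the TCP connection */
	EXAMPLE_RESET_OSD,	/* reset the osd connection so messages are resent */
};

/* Decide what to do for a request sent at `sent_at`, evaluated at time `now`. */
enum example_action check_request_age(unsigned long now,
				      unsigned long sent_at,
				      unsigned long keepalive_timeout,
				      unsigned long reset_timeout)
{
	unsigned long age = now - sent_at;

	if (age > reset_timeout)
		return EXAMPLE_RESET_OSD;
	if (age > keepalive_timeout)
		return EXAMPLE_KEEPALIVE;
	return EXAMPLE_NONE;
}
```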
  14. 02 Mar 2010 (1 commit)
  15. 12 Feb 2010 (1 commit)
  16. 26 Jan 2010 (1 commit)
  17. 15 Jan 2010 (1 commit)
  18. 24 Dec 2009 (1 commit)
    • ceph: control access to page vector for incoming data · 350b1c32
      Committed by Sage Weil
      When we issue an OSD read, we specify a vector of pages that the data is to
      be read into.  The request may be sent multiple times, to multiple OSDs, if
      the osdmap changes, which means we can get more than one reply.
      
      Only read data into the page vector if the reply is coming from the
      OSD we last sent the request to.  Keep track of which connection is using
      the vector by taking a reference.  If another connection was already
      using the vector before and a new reply comes in on the right connection,
      revoke the pages from the other connection.
      Signed-off-by: Sage Weil <sage@newdream.net>
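The rule in "ceph: control access to page vector for incoming data" (only the OSD we last sent to may fill the pages, and a stale user is revoked) can be sketched with a toy request record. The structure and function are hypothetical. The real code tracks the owning connection by taking a reference rather than storing an osd number.

```c
#include <assert.h>

/* Hypothetical request record: osds are identified by plain integers here. */
struct example_request {
	int last_sent_osd;	/* osd the request was most recently sent to */
	int pages_owner;	/* osd whose connection currently uses the pages, -1 if none */
};

/* Returns 1 if a reply from `from_osd` may read data into the page vector. */
int accept_reply(struct example_request *req, int from_osd)
{
	if (from_osd != req->last_sent_osd)
		return 0;	/* stale reply: leave the pages alone */

	/* Right connection: take over the pages, revoking any previous user. */
	req->pages_owner = from_osd;
	return 1;
}
```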
  19. 22 Dec 2009 (1 commit)
  20. 08 Dec 2009 (1 commit)
  21. 19 Nov 2009 (1 commit)
    • ceph: negotiate authentication protocol; implement AUTH_NONE protocol · 4e7a5dcd
      Committed by Sage Weil
      When we open a monitor session, we send an initial AUTH message listing
      the auth protocols we support, our entity name, and (possibly) a previously
      assigned global_id.  The monitor chooses a protocol and responds with an
      initial message.
      
      Initially implement AUTH_NONE, a dummy protocol that provides no security,
      but works within the new framework.  It generates 'authorizers' that are
      used when connecting to (mds, osd) services that simply state our entity
      name and global_id.
      
      This is a wire protocol change.
      Signed-off-by: Sage Weil <sage@newdream.net>
  22. 13 Nov 2009 (1 commit)
  23. 07 Oct 2009 (1 commit)
    • ceph: OSD client · f24e9980
      Committed by Sage Weil
      The OSD client is responsible for reading and writing data from/to the
      object storage pool.  This includes determining where objects are
      stored in the cluster, and ensuring that requests are retried or
      redirected in the event of a node failure or data migration.
      
      If an OSD does not respond before a timeout expires, keepalive
      messages are sent across the lossless, ordered communications channel
      to ensure that any break in the TCP connection is discovered.  If the session
      does reset, a reconnection is attempted and affected requests are
      resent (by the message transport layer).
      Signed-off-by: Sage Weil <sage@newdream.net>