  1. 06 Jul, 2012: 1 commit
  2. 22 Jun, 2012: 1 commit
  3. 06 Jun, 2012: 4 commits
    • libceph: make ceph_con_revoke_message() a msg op · 8921d114
      Committed by Alex Elder
      ceph_con_revoke_message() is passed both a message and a ceph
      connection.  A ceph_msg allocated for incoming messages on a
      connection always has a pointer to that connection, so there's no
      need to provide the connection when revoking such a message.
      
      Note that the existing logic does not preclude the message supplied
      being a null/bogus message pointer.  The only user of this interface
      is the OSD client, and the only value an osd client passes is a
      request's r_reply field.  That is always non-null (except briefly in
      an error path in ceph_osdc_alloc_request(), and that drops the
      only reference so the request won't ever have a reply to revoke).
      So we can safely assume the passed-in message is non-null, but add a
      BUG_ON() to make it very obvious we are imposing this restriction.
      
      Rename the function ceph_msg_revoke_incoming() to reflect that it is
      really an operation on an incoming message.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
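      A minimal userspace sketch of the resulting interface, assuming simplified stand-in types: `struct conn`, `struct msg`, and `msg_revoke_incoming()` are hypothetical, not the libceph definitions. Because an incoming message records its connection, the revoke helper takes only the message; `assert()` stands in for `BUG_ON()`, and "revoking" is modeled as detaching the message and skipping the rest of its payload.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for ceph_connection / ceph_msg. */
struct conn;

struct msg {
	struct conn *con;    /* connection this message belongs to */
};

struct conn {
	struct msg *in_msg;  /* message currently being received, if any */
	int in_skip;         /* nonzero: discard the rest of the payload */
};

/* Revoke an incoming message; only the message needs to be passed. */
static void msg_revoke_incoming(struct msg *msg)
{
	struct conn *con;

	assert(msg != NULL);         /* stands in for BUG_ON(msg == NULL) */
	con = msg->con;
	if (con == NULL)
		return;              /* not held by any connection */
	if (con->in_msg == msg) {
		con->in_msg = NULL;  /* stop reading into this message */
		con->in_skip = 1;    /* and drop its remaining data */
	}
}
```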
    • libceph: make ceph_con_revoke() a msg operation · 6740a845
      Committed by Alex Elder
      ceph_con_revoke() is passed both a message and a ceph connection.
      Now that any message associated with a connection holds a pointer
      to that connection, there's no need to provide the connection when
      revoking a message.
      
      This has the added benefit of precluding the possibility of
      providing the wrong connection pointer.  If the message's connection
      pointer is null, it is not being tracked by any connection, so
      revoking it is a no-op.  This is supported as a convenience for
      upper layers, so they can revoke a message that is not actually
      "in flight."
      
      Rename the function ceph_msg_revoke() to reflect that it is really
      an operation on a message, not a connection.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
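      The same idea for outgoing messages, as a hedged sketch with hypothetical types (`struct conn`, `struct msg`, `msg_revoke()`; a singly linked list replaces the kernel's list_head queues). A message whose connection pointer is NULL is not tracked by any connection, so revoking it returns immediately.

```c
#include <assert.h>
#include <stddef.h>

struct conn;

/* Hypothetical simplified message, linked into its connection's queue. */
struct msg {
	struct conn *con;  /* NULL when no connection holds the message */
	struct msg *next;  /* next message in the connection's queue */
};

struct conn {
	struct msg *out_queue;  /* singly linked list of pending messages */
};

/* Revoke a message; a message with no connection is simply ignored. */
static void msg_revoke(struct msg *msg)
{
	struct msg **pp;

	if (msg->con == NULL)
		return;  /* not "in flight": nothing to do */

	for (pp = &msg->con->out_queue; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == msg) {
			*pp = msg->next;  /* unlink from the queue */
			break;
		}
	}
	msg->next = NULL;
	msg->con = NULL;
}
```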
    • libceph: have messages point to their connection · 38941f80
      Committed by Alex Elder
      When a ceph message is queued for sending it is placed on a list of
      pending messages (ceph_connection->out_queue).  When they are
      actually sent over the wire, they are moved from that list to
      another (ceph_connection->out_sent).  When acknowledgement for the
      message is received, it is removed from the sent messages list.
      
      During that entire time the message is "in the possession" of a
      single ceph connection.  Keep track of that connection in the
      message.  This will be used in the next patch (and is a helpful
      bit of information for debugging anyway).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
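      A simplified model of the lifecycle described above, assuming hypothetical types and a tiny FIFO helper in place of the kernel's out_queue/out_sent list_heads, and assuming acknowledgements arrive in send order. The message's `con` pointer is set when it is queued, survives the move to the sent list, and is cleared once the message is acknowledged.

```c
#include <assert.h>
#include <stddef.h>

struct conn;

struct msg {
	struct conn *con;  /* owning connection while queued or sent */
	struct msg *next;
};

/* Hypothetical FIFO helper, not the kernel's list_head. */
struct fifo {
	struct msg *head, *tail;
};

struct conn {
	struct fifo out_queue;  /* waiting to be written */
	struct fifo out_sent;   /* written, awaiting acknowledgement */
};

static void fifo_push(struct fifo *f, struct msg *m)
{
	m->next = NULL;
	if (f->tail)
		f->tail->next = m;
	else
		f->head = m;
	f->tail = m;
}

static struct msg *fifo_pop(struct fifo *f)
{
	struct msg *m = f->head;

	if (m) {
		f->head = m->next;
		if (f->head == NULL)
			f->tail = NULL;
		m->next = NULL;
	}
	return m;
}

/* Queue a message: from now on it is "in the possession" of con. */
static void con_send(struct conn *con, struct msg *m)
{
	m->con = con;
	fifo_push(&con->out_queue, m);
}

/* Write the oldest pending message: move it to the sent list. */
static struct msg *con_write_one(struct conn *con)
{
	struct msg *m = fifo_pop(&con->out_queue);

	if (m)
		fifo_push(&con->out_sent, m);
	return m;
}

/* The peer acknowledged the oldest sent message: release it. */
static struct msg *con_ack_one(struct conn *con)
{
	struct msg *m = fifo_pop(&con->out_sent);

	if (m)
		m->con = NULL;
	return m;
}
```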
    • libceph: fully initialize connection in con_init() · 1bfd89f4
      Committed by Alex Elder
      Move the initialization of a ceph connection's private pointer,
      operations vector pointer, and peer name information into
      ceph_con_init().  Rearrange the arguments so the connection pointer
      is first.  Hide the byte-swapping of the peer entity number inside
      ceph_con_init().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
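      A sketch of the reshaped initializer under stated assumptions: `struct conn`, `struct con_ops`, `struct entity_name`, and `con_init()` are hypothetical, and the real ceph_con_init() signature may differ. It shows the connection pointer first, all identity fields set in one place, and the host-to-little-endian conversion of the peer number done inside the function so callers never see it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; not the libceph structures. */
struct con_ops {
	void (*dispatch)(void *priv);
};

struct entity_name {
	uint8_t type;
	uint8_t num_le[8];  /* peer number, stored little-endian */
};

struct conn {
	void *priv;
	const struct con_ops *ops;
	struct entity_name peer_name;
};

/* Connection pointer first; the caller passes the number in host order. */
static void con_init(struct conn *con, void *priv,
		     const struct con_ops *ops, uint8_t type, uint64_t num)
{
	int i;

	memset(con, 0, sizeof(*con));
	con->priv = priv;
	con->ops = ops;
	con->peer_name.type = type;
	for (i = 0; i < 8; i++)  /* byte-swapping hidden in here */
		con->peer_name.num_le[i] = (uint8_t)(num >> (8 * i));
}
```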
  4. 01 Jun, 2012: 5 commits
    • libceph: start tracking connection socket state · ce2c8903
      Committed by Alex Elder
      Start explicitly keeping track of the state of a ceph connection's
      socket, separate from the state of the connection itself.  Create
      placeholder functions to encapsulate the state transitions.
      
          --------
          | NEW* |  transient initial state
          --------
              | con_sock_state_init()
              v
          ----------
          | CLOSED |  initialized, but no socket (and no
          ----------  TCP connection)
           ^      \
           |       \ con_sock_state_connecting()
           |        ----------------------
           |                              \
           + con_sock_state_closed()       \
           |\                               \
           | \                               \
           |  -----------                     \
           |  | CLOSING |  socket event;       \
           |  -----------  await close          \
           |       ^                            |
           |       |                            |
           |       + con_sock_state_closing()   |
           |      / \                           |
           |     /   ---------------            |
           |    /                   \           v
           |   /                    --------------
           |  /    -----------------| CONNECTING |  socket created, TCP
           |  |   /                 --------------  connect initiated
           |  |   | con_sock_state_connected()
           |  |   v
          -------------
          | CONNECTED |  TCP connection established
          -------------
      
      Make the socket state an atomic variable, reinforcing that it's a
      distinct transition with no possible "intermediate/both" states.
      This is almost certainly overkill at this point, though the
      transitions into CONNECTED and CLOSING state do get called via
      socket callback (the rest of the transitions occur with the
      connection mutex held).  We can back out the atomicity later.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
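      The transition helpers can be sketched with C11 atomics. This is a hypothetical userspace model (names, enum values, and the bitmask of permitted predecessor states are assumptions, with the edges read from the diagram above): each helper atomically swaps in the new state and reports, rather than prevents, a transition from an unexpected state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Socket states from the diagram (values are illustrative). */
enum sock_state {
	SOCK_NEW = 1,
	SOCK_CLOSED,
	SOCK_CONNECTING,
	SOCK_CONNECTED,
	SOCK_CLOSING,
};

#define STATE_BIT(s) (1u << (s))

struct conn {
	atomic_int sock_state;
};

/* Atomically store `to`; report if the previous state was not allowed. */
static int sock_transition(struct conn *con, enum sock_state to,
			   unsigned int allowed_from)
{
	int old = atomic_exchange(&con->sock_state, (int)to);

	if (!(allowed_from & STATE_BIT(old))) {
		fprintf(stderr, "bad sock transition %d -> %d\n", old, (int)to);
		return 0;
	}
	return 1;
}

static int sock_state_init(struct conn *c)
{
	return sock_transition(c, SOCK_CLOSED, STATE_BIT(SOCK_NEW));
}

static int sock_state_connecting(struct conn *c)
{
	return sock_transition(c, SOCK_CONNECTING, STATE_BIT(SOCK_CLOSED));
}

static int sock_state_connected(struct conn *c)
{
	return sock_transition(c, SOCK_CONNECTED, STATE_BIT(SOCK_CONNECTING));
}

static int sock_state_closing(struct conn *c)
{
	return sock_transition(c, SOCK_CLOSING,
			       STATE_BIT(SOCK_CONNECTING) | STATE_BIT(SOCK_CONNECTED));
}

static int sock_state_closed(struct conn *c)
{
	return sock_transition(c, SOCK_CLOSED,
			       STATE_BIT(SOCK_CONNECTED) | STATE_BIT(SOCK_CLOSING));
}
```

      Because the exchange always stores the new value, a bad transition still changes the state; the helper only makes the anomaly visible, in the spirit of a WARN_ON().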
    • libceph: start separating connection flags from state · 928443cd
      Committed by Alex Elder
      A ceph_connection holds a mixture of connection state (as in "state
      machine" state) and connection flags in a single "state" field.  To
      make the distinction more clear, define a new "flags" field and use
      it rather than the "state" field to hold Boolean flag values.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
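      A minimal sketch of the split, with hypothetical names: the state field holds exactly one state-machine value, while independent Boolean conditions live as bits in a separate flags word. The flag names echo the keepalive and BACKOFF bits mentioned elsewhere in this log but are illustrative only; kernel code typically uses atomic bit operations such as set_bit() and test_bit(), and the plain operations below are a single-threaded simplification.

```c
#include <assert.h>

/* Hypothetical names: one state value plus independent flag bits. */
enum con_state { CON_STATE_CLOSED, CON_STATE_OPEN };

enum con_flag {
	CON_FLAG_KEEPALIVE_PENDING,
	CON_FLAG_BACKOFF,
};

struct conn {
	enum con_state state;  /* exactly one value at a time */
	unsigned long flags;   /* any combination of con_flag bits */
};

static void set_flag(struct conn *c, enum con_flag f)
{
	c->flags |= 1UL << f;
}

static void clear_flag(struct conn *c, enum con_flag f)
{
	c->flags &= ~(1UL << f);
}

static int test_flag(const struct conn *c, enum con_flag f)
{
	return (int)((c->flags >> f) & 1UL);
}
```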
    • libceph: embed ceph messenger structure in ceph_client · 15d9882c
      Committed by Alex Elder
      A ceph client has a pointer to a ceph messenger structure in it.
      There is always exactly one ceph messenger for a ceph client, so
      there is no need to allocate it separately from the ceph client
      structure.
      
      Switch the ceph_client structure to embed its ceph_messenger
      structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
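      A hedged sketch of what embedding buys, using hypothetical `struct client` and `struct messenger` types: one allocation and one free cover both objects, and because the messenger sits at a fixed offset inside the client, code holding only the messenger pointer can recover the client with the usual `container_of()` idiom (defined locally here, since this is userspace C).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplified structures. */
struct messenger {
	unsigned int nonce;  /* placeholder member */
};

struct client {
	int id;
	struct messenger msgr;  /* embedded: shares the client's lifetime */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void messenger_init(struct messenger *m, unsigned int nonce)
{
	m->nonce = nonce;
}

static struct client *client_create(int id)
{
	struct client *c = calloc(1, sizeof(*c));  /* one allocation for both */

	if (c) {
		c->id = id;
		messenger_init(&c->msgr, (unsigned int)id);
	}
	return c;
}

static void client_destroy(struct client *c)
{
	free(c);  /* no separate messenger to free */
}

static struct client *msgr_to_client(struct messenger *m)
{
	return container_of(m, struct client, msgr);
}
```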
    • libceph: kill bad_proto ceph connection op · 6384bb8b
      Committed by Alex Elder
      No code sets a bad_proto method in its ceph connection operations
      vector, so just get rid of it.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
    • libceph: eliminate connection state "DEAD" · e5e372da
      Committed by Alex Elder
      The ceph connection state "DEAD" is never set and is therefore not
      needed.  Eliminate it.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
  5. 17 May, 2012: 2 commits
  6. 22 Mar, 2012: 3 commits
  7. 26 Oct, 2011: 1 commit
  8. 15 Sep, 2011: 1 commit
    • Remove unneeded version.h includes from include/ · e81b1516
      Committed by Jesper Juhl
      It was pointed out by 'make versioncheck' that some includes of
      linux/version.h are not needed in include/.
      This patch removes them.
      
      When I last posted the patch, the ceph bit was ACK'ed by Sage Weil, so
      I've added that below.
      
      The pwc-ioctl change generated quite a bit of discussion about V4L version
      numbers in general, but as far as I can tell, no consensus was reached on
      what the long term solution should be, so in the meantime I think we
      could start by just removing the unneeded include, which is why I'm
      resending the patch with that hunk still included.
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Acked-by: Sage Weil <sage@newdream.net>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  9. 27 Jul, 2011: 1 commit
  10. 05 Mar, 2011: 2 commits
    • libceph: fix msgr keepalive flag · e76661d0
      Committed by Sage Weil
      There was some broken keepalive code using a dead variable.  Shift to using
      the proper bit flag.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • libceph: fix msgr backoff · 60bf8bf8
      Committed by Sage Weil
      With commit f363e45f we replaced a bunch of hacky workqueue mutual
      exclusion logic with the WQ_NON_REENTRANT flag.  One piece of fallout is
      that the exponential backoff breaks in certain cases:
      
       * con_work attempts to connect.
       * we get an immediate failure, and the socket state change handler queues
         immediate work.
       * con_work calls con_fault, we decide to back off, but can't queue delayed
         work.
      
      In this case, we add a BACKOFF bit to make con_work reschedule delayed work
      next time it runs (which should be immediately).
      Signed-off-by: Sage Weil <sage@newdream.net>
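      The BACKOFF fix can be modeled in a few lines. This is a single-threaded, hypothetical sketch (`struct conn`, `queue_con_delay()`, and the doubling delay are assumptions; `con_fault()` and `con_work()` only mirror the names in the message): queuing fails while the work item is already pending, so the fault path records BACKOFF and the next run of the work reschedules itself with the delay.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a work item that can be queued at most once. */
struct conn {
	bool queued;                  /* work item already pending? */
	unsigned long delay;          /* delay of the pending work */
	bool backoff;                 /* BACKOFF: reschedule with delay */
	unsigned long backoff_delay;  /* current backoff interval */
};

/* Like queueing delayed work: fails if the item is already queued. */
static bool queue_con_delay(struct conn *con, unsigned long delay)
{
	if (con->queued)
		return false;
	con->queued = true;
	con->delay = delay;
	return true;
}

/* Fault handling: try to back off; if we can't queue, remember it. */
static void con_fault(struct conn *con)
{
	con->backoff_delay = con->backoff_delay ? con->backoff_delay * 2 : 1;
	if (!queue_con_delay(con, con->backoff_delay))
		con->backoff = true;
}

/* The worker runs; the pending item is consumed first. */
static void con_work(struct conn *con)
{
	con->queued = false;
	if (con->backoff) {
		con->backoff = false;
		queue_con_delay(con, con->backoff_delay);  /* delayed retry */
		return;
	}
	/* ... normal connect / read / write processing would go here ... */
}
```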
  11. 13 Jan, 2011: 1 commit
    • net/ceph: make ceph_msgr_wq non-reentrant · f363e45f
      Committed by Tejun Heo
      The ceph messenger code does a rather complex dance around the
      multithreaded workqueue to make sure the same work item isn't executed
      concurrently
      on different CPUs.  This restriction can be provided by workqueue with
      WQ_NON_REENTRANT.
      
      Make ceph_msgr_wq non-reentrant workqueue with the default concurrency
      level and remove the QUEUED/BUSY logic.
      
      * This removes backoff handling in con_work() but it couldn't reliably
        block execution of con_work() to begin with - queue_con() can be
        called after the work started but before BUSY is set.  It seems that
        it was an optimization for a rather cold path and can be safely
        removed.
      
      * The number of concurrent work items is bound by the number of
        connections and connections are independent from each other.  With
        the default concurrency level, different connections will be
        executed independently.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Sage Weil <sage@newdream.net>
      Cc: ceph-devel@vger.kernel.org
      Signed-off-by: Sage Weil <sage@newdream.net>
  12. 10 Nov, 2010: 1 commit
    • ceph: explicitly specify page alignment in network messages · c5c6b19d
      Committed by Sage Weil
      The alignment used for reading data into or out of pages used to be taken
      from the data_off field in the message header.  This only worked as long
      as the page alignment matched the object offset, breaking direct io to
      non-page aligned offsets.
      
      Instead, explicitly specify the page alignment next to the page vector
      in the ceph_msg struct, and use that instead of the message header (which
      probably shouldn't be trusted).  The alloc_msg callback is responsible for
      filling in this field properly when it sets up the page vector.
      Signed-off-by: Sage Weil <sage@newdream.net>
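      The arithmetic behind an explicit page alignment, as a sketch: assuming a 4096-byte page and a hypothetical `page_alignment` giving the offset of the first data byte within the first page (and smaller than a page), byte `pos` of the payload lands in page `(page_alignment + pos) / PAGE_SIZE` at offset `(page_alignment + pos) % PAGE_SIZE`. The helper names are illustrative, not libceph functions.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL  /* assumed page size for illustration */

/* Where does byte `pos` of the payload land in the page vector? */
static void data_pos_to_page(unsigned long page_alignment, unsigned long pos,
			     unsigned long *page, unsigned long *off)
{
	unsigned long byte = page_alignment + pos;

	*page = byte / DEMO_PAGE_SIZE;
	*off = byte % DEMO_PAGE_SIZE;
}

/* Pages spanned by `len` bytes starting `page_alignment` into page 0. */
static unsigned long pages_needed(unsigned long page_alignment,
				  unsigned long len)
{
	return (page_alignment + len + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}
```

      With an alignment of 512, an 8192-byte read spans three pages rather than two, which is exactly what goes wrong if the alignment is silently assumed to be zero.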
  13. 21 Oct, 2010: 2 commits
    • ceph: factor out libceph from Ceph file system · 3d14c5d2
      Committed by Yehuda Sadeh
      This factors out protocol and low-level storage parts of ceph into a
      separate libceph module living in net/ceph and include/linux/ceph.  This
      is mostly a matter of moving files around.  However, a few key pieces
      of the interface change as well:
      
       - ceph_client becomes ceph_fs_client and ceph_client, where the latter
         captures the mon and osd clients, and the fs_client gets the mds client
         and file system specific pieces.
       - Mount option parsing and debugfs setup are correspondingly broken into
         two pieces.
       - The mon client gets a generic handler callback for otherwise unknown
         messages (mds map, in this case).
       - The basic supported/required feature bits can be expanded (and are by
         ceph_fs_client).
      
      No functional change, aside from some subtle error handling cases that got
      cleaned up in the refactoring process.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: messenger and osdc changes for rbd · 68b4476b
      Committed by Yehuda Sadeh
      Allow the messenger to send/receive data in a bio.  This is added
      so that we wouldn't need to copy the data into pages or some other buffer
      when doing IO for an rbd block device.
      
      We can now have trailing variable sized data for osd
      ops.  Also osd ops encoding is more modular.
      Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
      Signed-off-by: Sage Weil <sage@newdream.net>
  14. 30 May, 2010: 1 commit
  15. 18 May, 2010: 5 commits
  16. 12 May, 2010: 1 commit
  17. 23 Mar, 2010: 1 commit
    • ceph: avoid reopening osd connections when address hasn't changed · 87b315a5
      Committed by Sage Weil
      We get a fault callback on _every_ tcp connection fault.  Normally, we
      want to reopen the connection when that happens.  If the address we have
      is bad, however, and connection attempts always result in a connection
      refused or similar error, explicitly closing and reopening the msgr
      connection just prevents the messenger's backoff logic from kicking in.
      The result can be a console full of
      
      [ 3974.417106] ceph: osd11 10.3.14.138:6800 connection failed
      [ 3974.423295] ceph: osd11 10.3.14.138:6800 connection failed
      [ 3974.429709] ceph: osd11 10.3.14.138:6800 connection failed
      
      Instead, if we get a fault, and have outstanding requests, but the osd
      address hasn't changed and the connection never successfully connected in
      the first place, do nothing to the osd connection.  The messenger layer
      will back off and retry periodically, because we never connected and thus
      the lossy bit is not set.
      
      In addition, touch each request's r_stamp so that handle_timeout can tell the
      request is still alive and kicking.
      Signed-off-by: Sage Weil <sage@newdream.net>
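      The fault-handling decision, reduced to a hypothetical helper (`struct osd_session` and `osd_fault_should_reopen()` are not libceph names, and the no-outstanding-requests case is not modeled): reopen only if the address changed or the connection had succeeded before; otherwise leave the connection to the messenger's backoff and refresh each request's r_stamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_REQS 8

/* Hypothetical summary of one OSD session (not the libceph structs). */
struct osd_session {
	char addr[32];                    /* address currently in use */
	bool ever_connected;              /* did any connect succeed? */
	size_t nr_reqs;                   /* outstanding requests */
	unsigned long r_stamp[MAX_REQS];  /* per-request "still alive" time */
};

/*
 * Called on a connection fault while requests are outstanding.
 * Returns true if the caller should close and reopen the connection.
 */
static bool osd_fault_should_reopen(struct osd_session *s,
				    const char *map_addr, unsigned long now)
{
	size_t i;

	if (strcmp(s->addr, map_addr) == 0 && !s->ever_connected) {
		/* Leave the connection alone so the messenger backs off;
		 * touch r_stamp so timeouts still see live requests. */
		for (i = 0; i < s->nr_reqs; i++)
			s->r_stamp[i] = now;
		return false;
	}
	return true;  /* address changed, or it had connected before */
}
```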
  18. 02 Mar, 2010: 1 commit
  19. 11 Feb, 2010: 1 commit
  20. 26 Jan, 2010: 3 commits
  21. 24 Dec, 2009: 2 commits
    • ceph: support ceph_pagelist for message payload · 58bb3b37
      Committed by Sage Weil
      The ceph_pagelist is a simple list of whole pages, strung together via
      their lru list_head.  It facilitates encoding to a "buffer" of unknown
      size.  Allow its use in place of the ceph_msg page vector.
      
      This will be used to fix the huge buffer preallocation woes of MDS
      reconnection.
      Signed-off-by: Sage Weil <sage@newdream.net>
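      A userspace sketch of the data structure, assuming a 4096-byte page and hypothetical names (`struct pagelist`, `pagelist_append()`, `pagelist_release()`); libceph strings real pages together through their lru list_head, whereas this model links malloc'd page-sized buffers. The point is that the writer never needs to know the final size: a new whole page is added whenever the tail page fills.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PL_PAGE_SIZE 4096  /* assumed page size */

/* One whole page in the list (hypothetical representation). */
struct pl_page {
	struct pl_page *next;
	unsigned char data[PL_PAGE_SIZE];
};

struct pagelist {
	struct pl_page *head, *tail;
	size_t length;  /* total bytes appended */
	size_t room;    /* free bytes left in the tail page */
};

/* Append bytes, adding whole pages as needed; no size known up front. */
static int pagelist_append(struct pagelist *pl, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len > 0) {
		size_t n;

		if (pl->room == 0) {
			struct pl_page *pg = malloc(sizeof(*pg));

			if (!pg)
				return -1;
			pg->next = NULL;
			if (pl->tail)
				pl->tail->next = pg;
			else
				pl->head = pg;
			pl->tail = pg;
			pl->room = PL_PAGE_SIZE;
		}
		n = len < pl->room ? len : pl->room;
		memcpy(pl->tail->data + (PL_PAGE_SIZE - pl->room), p, n);
		pl->room -= n;
		pl->length += n;
		p += n;
		len -= n;
	}
	return 0;
}

static void pagelist_release(struct pagelist *pl)
{
	while (pl->head) {
		struct pl_page *next = pl->head->next;

		free(pl->head);
		pl->head = next;
	}
	pl->tail = NULL;
	pl->length = pl->room = 0;
}
```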
    • ceph: control access to page vector for incoming data · 350b1c32
      Committed by Sage Weil
      When we issue an OSD read, we specify a vector of pages that the data is to
      be read into.  The request may be sent multiple times, to multiple OSDs, if
      the osdmap changes, which means we can get more than one reply.
      
      Only read data into the page vector if the reply is coming from the
      OSD we last sent the request to.  Keep track of which connection is using
      the vector by taking a reference.  If another connection was already
      using the vector before and a new reply comes in on the right connection,
      revoke the pages from the other connection.
      Signed-off-by: Sage Weil <sage@newdream.net>
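      The ownership rule can be sketched as follows, with hypothetical types and a plain pointer in place of the connection reference the message describes (reference counting is omitted): a reply may fill the pages only if it arrives on the connection the request was last sent to, and claiming the pages revokes them from any other connection still reading into them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct conn {
	int id;
	void *reading_into;  /* page vector this connection is filling */
};

/* Hypothetical OSD read request (not the libceph struct). */
struct read_req {
	struct conn *sent_to;      /* connection we last sent it on */
	struct conn *pages_owner;  /* connection holding the page vector */
	char pages[64];            /* stand-in for the page vector */
};

/* A reply arrived on `con`: may its data be read into req->pages? */
static bool claim_pages_for_reply(struct read_req *req, struct conn *con)
{
	if (con != req->sent_to)
		return false;  /* stale reply from an OSD we no longer use */

	if (req->pages_owner && req->pages_owner != con)
		req->pages_owner->reading_into = NULL;  /* revoke from it */

	req->pages_owner = con;
	con->reading_into = req->pages;
	return true;
}
```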