1. 08 7月, 2014 6 次提交
  2. 12 6月, 2014 1 次提交
  3. 05 4月, 2014 2 次提交
  4. 03 4月, 2014 2 次提交
  5. 14 2月, 2014 1 次提交
  6. 08 2月, 2014 2 次提交
  7. 04 2月, 2014 1 次提交
  8. 28 1月, 2014 7 次提交
  9. 26 1月, 2014 1 次提交
  10. 14 1月, 2014 2 次提交
  11. 14 12月, 2013 3 次提交
    • J
      libceph: resend all writes after the osdmap loses the full flag · 9a1ea2db
      Josh Durgin 提交于
      With the current full handling, there is a race between osds and
      clients getting the first map marked full. If the osd wins, it will
      return -ENOSPC to any writes, but the client may already have writes
      in flight. This results in the client getting the error and
      propagating it up the stack. For rbd, the block layer turns this into
      EIO, which can cause corruption in filesystems above it.
      
      To avoid this race, osds are being changed to drop writes that came
      from clients with an osdmap older than the last osdmap marked full.
      In order for this to work, clients must resend all writes after they
      encounter a full -> not full transition in the osdmap. osds will wait
      for an updated map instead of processing a request from a client with
      a newer map, so resent writes will not be dropped by the osd unless
      there is another not full -> full transition.
      
      This approach requires both osds and clients to be fixed to avoid the
      race. Old clients talking to osds with this fix may hang instead of
      returning EIO and potentially corrupting an fs. New clients talking to
      old osds have the same behavior as before if they encounter this race.
      
      Fixes: http://tracker.ceph.com/issues/6938Reviewed-by: NSage Weil <sage@inktank.com>
      Signed-off-by: NJosh Durgin <josh.durgin@inktank.com>
      9a1ea2db
    • J
      libceph: block I/O when PAUSE or FULL osd map flags are set · d29adb34
      Josh Durgin 提交于
      The PAUSEWR and PAUSERD flags are meant to stop the cluster from
      processing writes and reads, respectively. The FULL flag is set when
      the cluster determines that it is out of space, and will no longer
      process writes.  PAUSEWR and PAUSERD are purely client-side settings
      already implemented in userspace clients. The osd does nothing special
      with these flags.
      
      When the FULL flag is set, however, the osd responds to all writes
      with -ENOSPC. For cephfs, this makes sense, but for rbd the block
      layer translates this into EIO.  If a cluster goes from full to
      non-full quickly, a filesystem on top of rbd will not behave well,
      since some writes succeed while others get EIO.
      
      Fix this by blocking any writes when the FULL flag is set in the osd
      client. This is the same strategy used by userspace, so apply it by
      default.  A follow-on patch makes this configurable.
      
      __map_request() is called to re-target osd requests in case the
      available osds changed.  Add a paused field to a ceph_osd_request, and
      set it whenever an appropriate osd map flag is set.  Avoid queueing
      paused requests in __map_request(), but force them to be resent if
      they become unpaused.
      
      Also subscribe to the next osd map from the monitor if any of these
      flags are set, so paused requests can be unblocked as soon as
      possible.
      
      Fixes: http://tracker.ceph.com/issues/6079Reviewed-by: NSage Weil <sage@inktank.com>
      Signed-off-by: NJosh Durgin <josh.durgin@inktank.com>
      d29adb34
    • L
      ceph: Add necessary clean up if invalid reply received in handle_reply() · 37c89bde
      Li Wang 提交于
      Wake up possible waiters, invoke the call back if any, unregister the request
      Signed-off-by: NLi Wang <liwang@ubuntukylin.com>
      Signed-off-by: NYunchuan Wen <yunchuanwen@ubuntukylin.com>
      Signed-off-by: NSage Weil <sage@inktank.com>
      37c89bde
  12. 10 9月, 2013 1 次提交
  13. 28 8月, 2013 3 次提交
  14. 16 8月, 2013 1 次提交
  15. 10 8月, 2013 1 次提交
  16. 04 7月, 2013 5 次提交
    • Y
      libceph: call r_unsafe_callback when unsafe reply is received · 61c5d6bf
      Yan, Zheng 提交于
      We can't use !req->r_sent to check if OSD request is sent for the
      first time, this is because __cancel_request() zeros req->r_sent
      when OSD map changes. Rather than adding a new variable to struct
      ceph_osd_request to indicate if it's sent for the first time, We
      can call the unsafe callback only when unsafe OSD reply is received.
      If OSD's first reply is safe, just skip calling the unsafe callback.
      
      The purpose of unsafe callback is adding unsafe request to a list,
      so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
      to wait for a write(2) that hasn't returned yet. So it's OK to add
      request to the unsafe list when the first OSD reply is received.
      (ceph_sync_write() returns after receiving the first OSD reply)
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      61c5d6bf
    • Y
      libceph: fix truncate size calculation · ccca4e37
      Yan, Zheng 提交于
      check the "not truncated yet" case
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      ccca4e37
    • Y
      libceph: fix safe completion · eb845ff1
      Yan, Zheng 提交于
      handle_reply() calls complete_request() only if the first OSD reply
      has ONDISK flag.
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      eb845ff1
    • A
      libceph: print more info for short message header · 4974341e
      Alex Elder 提交于
      If an osd client response message arrives that has a front section
      that's too big for the buffer set aside to receive it, a warning
      gets reported and a new buffer is allocated.
      
      The warning says nothing about which connection had the problem.
      Add the peer type and number to what gets reported, to be a bit more
      informative.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      4974341e
    • A
      libceph: add lingering request reference when registered · 96e4dac6
      Alex Elder 提交于
      When an osd request is set to linger, the osd client holds onto the
      request so it can be re-submitted following certain osd map changes.
      The osd client holds a reference to the request until it is
      unregistered.  This is used by rbd for watch requests.
      
      Currently, the reference is taken when the request is marked with
      the linger flag.  This means that if an error occurs after that
      time but before the the request completes successfully, that
      reference is leaked.
      
      There's really no reason to take the reference until the request is
      registered in the the osd client's list of lingering requests, and
      that only happens when the lingering (watch) request completes
      successfully.
      
      So take that reference only when it gets registered following
      succesful completion, and drop it (as before) when the request
      gets unregistered.  This avoids the reference problem on error
      in rbd.
      
      Rearrange ceph_osdc_unregister_linger_request() to avoid using
      the request pointer after it may have been freed.
      
      And hold an extra reference in kick_requests() while handling
      a linger request that has not yet been registered, to ensure
      it doesn't go away.
      
      This resolves:
          http://tracker.ceph.com/issues/3859Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      96e4dac6
  17. 18 5月, 2013 1 次提交
    • A
      libceph: must hold mutex for reset_changed_osds() · 14d2f38d
      Alex Elder 提交于
      An osd client has a red-black tree describing its osds, and
      occasionally we would get crashes due to one of these trees tree
      becoming corrupt somehow.
      
      The problem turned out to be that reset_changed_osds() was being
      called without protection of the osd client request mutex.  That
      function would call __reset_osd() for any osd that had changed, and
      __reset_osd() would call __remove_osd() for any osd with no
      outstanding requests, and finally __remove_osd() would remove the
      corresponding entry from the red-black tree.  Thus, the tree was
      getting modified without having any lock protection, and was
      vulnerable to problems due to concurrent updates.
      
      This appears to be the only osd tree updating path that has this
      problem.  It can be fairly easily fixed by moving the call up
      a few lines, to just before the request mutex gets dropped
      in kick_requests().
      
      This resolves:
          http://tracker.ceph.com/issues/5043
      
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      14d2f38d