1. 29 9月, 2011 2 次提交
    • S
      libceph: fix pg_temp mapping update · 8adc8b3d
      Sage Weil 提交于
      The incremental map updates have a record for each pg_temp mapping that is
      to be add/updated (len > 0) or removed (len == 0).  The old code was
      written as if the updates were a complete enumeration; that was just wrong.
      Update the code to remove 0-length entries and drop the rbtree traversal.
      
      This avoids misdirected (and hung) requests that manifest as server
      errors like
      
      [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
      Signed-off-by: NSage Weil <sage@newdream.net>
      8adc8b3d
    • S
      libceph: fix pg_temp mapping calculation · 782e182e
      Sage Weil 提交于
      We need to apply the modulo pg_num calculation before looking up a pgid in
      the pg_temp mapping rbtree.  This fixes pg_temp mappings, and fixes
      (some) misdirected requests that result in messages like
      
      [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
      
      on the server and stall make the client block without getting a reply (at
      least until the pg_temp mapping goes way, but that can take a long long
      time).
      
      Reorder calc_pg_raw() a bit to make more sense.
      Signed-off-by: NSage Weil <sage@newdream.net>
      782e182e
  2. 17 9月, 2011 3 次提交
  3. 01 9月, 2011 1 次提交
  4. 10 8月, 2011 1 次提交
    • S
      libceph: fix msgpool · 5185352c
      Sage Weil 提交于
      There were several problems here:
      
       1- we weren't tagging allocations with the pool, so they were never
          returned to the pool.
       2- msgpool_put didn't add back to the mempool, even it were called.
       3- msgpool_release didn't clear the pool pointer, so it would have looped
          had #1 not been broken.
      
      These may or may not have been responsible for #1136 or #1381 (BUG due to
      non-empty mempool on umount).  I can't seem to trigger the crash now using
      the method I was using before.
      Signed-off-by: NSage Weil <sage@newdream.net>
      5185352c
  5. 27 7月, 2011 1 次提交
  6. 20 7月, 2011 1 次提交
    • S
      ceph: fix file mode calculation · 38be7a79
      Sage Weil 提交于
      open(2) must always include one of O_RDONLY, O_WRONLY, or O_RDWR.  No need
      for any O_APPEND special case.
      
      Passing O_WRONLY|O_RDWR is undefined according to the man page, but the
      Linux VFS interprets this as O_RDWR, so we'll do the same.
      
      This fixes open(2) with flags O_RDWR|O_APPEND, which was incorrectly being
      translated to readonly.
      Reported-by: NFyodor Ustinov <ufm@ufm.su>
      Signed-off-by: NSage Weil <sage@newdream.net>
      38be7a79
  7. 14 6月, 2011 1 次提交
  8. 08 6月, 2011 1 次提交
  9. 25 5月, 2011 2 次提交
  10. 20 5月, 2011 8 次提交
  11. 04 5月, 2011 2 次提交
  12. 07 4月, 2011 1 次提交
  13. 31 3月, 2011 1 次提交
  14. 30 3月, 2011 4 次提交
  15. 29 3月, 2011 1 次提交
  16. 27 3月, 2011 1 次提交
  17. 26 3月, 2011 1 次提交
    • S
      ceph: flush msgr_wq during mds_client shutdown · ef550f6f
      Sage Weil 提交于
      The release method for mds connections uses a backpointer to the
      mds_client, so we need to flush the workqueue of any pending work (and
      ceph_connection references) prior to freeing the mds_client.  This fixes
      an oops easily triggered under UML by
      
       while true ; do mount ... ; umount ... ; done
      
      Also fix an outdated comment: the flush in ceph_destroy_client only flushes
      OSD connections out.  This bug is basically an artifact of the ceph ->
      ceph+libceph conversion.
      Signed-off-by: NSage Weil <sage@newdream.net>
      ef550f6f
  18. 23 3月, 2011 1 次提交
  19. 22 3月, 2011 1 次提交
    • S
      libceph: fix osd request queuing on osdmap updates · 6f6c7006
      Sage Weil 提交于
      If we send a request to osd A, and the request's pg remaps to osd B and
      then back to A in quick succession, we need to resend the request to A. The
      old code was only calling kick_requests after processing all incremental
      maps in a message, so it was very possible to not resend a request that
      needed to be resent.  This would make the osd eventually time out (at least
      with the current default of osd timeouts enabled).
      
      The correct approach is to scan requests on every map incremental.  This
      patch refactors the kick code in a few ways:
       - all requests are either on req_lru (in flight), req_unsent (ready to
         send), or req_notarget (currently map to no up osd)
       - mapping always done by map_request (previous map_osds)
       - if the mapping changes, we requeue.  requests are resent only after all
         map incrementals are processed.
       - some osd reset code is moved out of kick_requests into a separate
         function
       - the "kick this osd" functionality is moved to kick_osd_requests, as it
         is unrelated to scanning for request->pg->osd mapping changes
      Signed-off-by: NSage Weil <sage@newdream.net>
      6f6c7006
  20. 16 3月, 2011 1 次提交
  21. 05 3月, 2011 3 次提交
    • S
      libceph: fix msgr standby handling · e00de341
      Sage Weil 提交于
      The standby logic used to be pretty dependent on the work requeueing
      behavior that changed when we switched to WQ_NON_REENTRANT.  It was also
      very fragile.
      
      Restructure things so that:
       - We clear WRITE_PENDING when we set STANDBY.  This ensures we will
         requeue work when we wake up later.
       - con_work backs off if STANDBY is set.  There is nothing to do if we are
         in standby.
       - clear_standby() helper is called by both con_send() and con_keepalive(),
         the two actions that can wake us up again.  Move the connect_seq++
         logic here.
      Signed-off-by: NSage Weil <sage@newdream.net>
      e00de341
    • S
      libceph: fix msgr keepalive flag · e76661d0
      Sage Weil 提交于
      There was some broken keepalive code using a dead variable.  Shift to using
      the proper bit flag.
      Signed-off-by: NSage Weil <sage@newdream.net>
      e76661d0
    • S
      libceph: fix msgr backoff · 60bf8bf8
      Sage Weil 提交于
      With commit f363e45f we replaced a bunch of hacky workqueue mutual
      exclusion logic with the WQ_NON_REENTRANT flag.  One pieces of fallout is
      that the exponential backoff breaks in certain cases:
      
       * con_work attempts to connect.
       * we get an immediate failure, and the socket state change handler queues
         immediate work.
       * con_work calls con_fault, we decide to back off, but can't queue delayed
         work.
      
      In this case, we add a BACKOFF bit to make con_work reschedule delayed work
      next time it runs (which should be immediately).
      Signed-off-by: NSage Weil <sage@newdream.net>
      60bf8bf8
  22. 04 3月, 2011 2 次提交
    • S
      libceph: retry after authorization failure · 692d20f5
      Sage Weil 提交于
      If we mark the connection CLOSED we will give up trying to reconnect to
      this server instance.  That is appropriate for things like a protocol
      version mismatch that won't change until the server is restarted, at which
      point we'll get a new addr and reconnect.  An authorization failure like
      this is probably due to the server not properly rotating it's secret keys,
      however, and should be treated as transient so that the normal backoff and
      retry behavior kicks in.
      Signed-off-by: NSage Weil <sage@newdream.net>
      692d20f5
    • S
      libceph: fix handling of short returns from get_user_pages · 38815b78
      Sage Weil 提交于
      get_user_pages() can return fewer pages than we ask for.  We were returning
      a bogus pointer/error code in that case.  Instead, loop until we get all
      the pages we want or get an error we can return to the caller.
      Signed-off-by: NSage Weil <sage@newdream.net>
      38815b78