1. 04 May, 2011 2 commits
  2. 07 April, 2011 1 commit
  3. 31 March, 2011 1 commit
  4. 30 March, 2011 4 commits
  5. 29 March, 2011 1 commit
  6. 27 March, 2011 1 commit
  7. 26 March, 2011 1 commit
    • ceph: flush msgr_wq during mds_client shutdown · ef550f6f
      Committed by Sage Weil
      The release method for mds connections uses a backpointer to the
      mds_client, so we need to flush the workqueue of any pending work (and
      ceph_connection references) prior to freeing the mds_client.  This fixes
      an oops easily triggered under UML by
      
       while true ; do mount ... ; umount ... ; done
      
      Also fix an outdated comment: the flush in ceph_destroy_client only flushes
      OSD connections out.  This bug is basically an artifact of the ceph ->
      ceph+libceph conversion.
      Signed-off-by: Sage Weil <sage@newdream.net>
  8. 23 March, 2011 1 commit
  9. 22 March, 2011 1 commit
    • libceph: fix osd request queuing on osdmap updates · 6f6c7006
      Committed by Sage Weil
      If we send a request to osd A, and the request's pg remaps to osd B and
      then back to A in quick succession, we need to resend the request to A. The
      old code was only calling kick_requests after processing all incremental
      maps in a message, so it was very possible to not resend a request that
      needed to be resent.  This would make the osd eventually time out (at least
      with the current default of osd timeouts enabled).
      
      The correct approach is to scan requests on every map incremental.  This
      patch refactors the kick code in a few ways:
       - all requests are either on req_lru (in flight), req_unsent (ready to
         send), or req_notarget (currently map to no up osd)
       - mapping is always done by map_request (previously map_osds)
       - if the mapping changes, we requeue.  requests are resent only after all
         map incrementals are processed.
       - some osd reset code is moved out of kick_requests into a separate
         function
       - the "kick this osd" functionality is moved to kick_osd_requests, as it
         is unrelated to scanning for request->pg->osd mapping changes
      Signed-off-by: Sage Weil <sage@newdream.net>
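The A to B to A case described in this commit can be illustrated with a small userspace model. This is only a sketch in Python, not the libceph code: the dict-based maps and the function names `scan_after_all` and `scan_per_incremental` are hypothetical and exist only for illustration.

```python
def scan_after_all(osdmap, incrementals, pg):
    """Modelled old behaviour: compare the mapping once, after every
    incremental in the message has been applied."""
    before = osdmap[pg]
    current = before
    for inc in incrementals:
        current = inc.get(pg, current)
    return current != before


def scan_per_incremental(osdmap, incrementals, pg):
    """Modelled new behaviour: re-map after each incremental and remember
    whether the target ever changed. The resend itself would still
    happen once, after all incrementals are processed."""
    current = osdmap[pg]
    requeue = False
    for inc in incrementals:
        target = inc.get(pg, current)
        if target != current:
            requeue = True
            current = target
    return requeue


# The pg moves from osd.A to osd.B and back to osd.A within one message.
osdmap = {"pg1": "osd.A"}
incrementals = [{"pg1": "osd.B"}, {"pg1": "osd.A"}]
missed = scan_after_all(osdmap, incrementals, "pg1")
caught = scan_per_incremental(osdmap, incrementals, "pg1")
```

In this model the end-of-message scan sees the same target before and after and concludes nothing changed, while the per-incremental scan flags the request for requeueing.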
  10. 16 March, 2011 1 commit
  11. 05 March, 2011 3 commits
    • libceph: fix msgr standby handling · e00de341
      Committed by Sage Weil
      The standby logic used to be pretty dependent on the work requeueing
      behavior that changed when we switched to WQ_NON_REENTRANT.  It was also
      very fragile.
      
      Restructure things so that:
       - We clear WRITE_PENDING when we set STANDBY.  This ensures we will
         requeue work when we wake up later.
       - con_work backs off if STANDBY is set.  There is nothing to do if we are
         in standby.
       - clear_standby() helper is called by both con_send() and con_keepalive(),
         the two actions that can wake us up again.  Move the connect_seq++
         logic here.
      Signed-off-by: Sage Weil <sage@newdream.net>
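The three rules listed in this commit can be modelled as a tiny state machine. The following is a hedged Python sketch, not the messenger implementation: the `Connection` class, the set-based `state`, the `work_queued` counter, and the `_kick_writer` helper are assumptions made for illustration. Only the flag names and the `con_send`, `con_keepalive`, `con_work`, and `clear_standby` names come from the commit message.

```python
class Connection:
    def __init__(self):
        self.state = set()      # models the connection's bit flags
        self.connect_seq = 0
        self.work_queued = 0    # how many times work was queued

    def queue_work(self):
        self.work_queued += 1

    def enter_standby(self):
        # Clearing WRITE_PENDING here guarantees the next wakeup requeues work.
        self.state.add("STANDBY")
        self.state.discard("WRITE_PENDING")

    def clear_standby(self):
        # Shared by both wakeup paths; connect_seq is bumped only when
        # we were actually in standby.
        if "STANDBY" in self.state:
            self.state.discard("STANDBY")
            self.connect_seq += 1

    def _kick_writer(self):
        # Assumed queueing rule: queue work only on the 0 -> 1 transition.
        if "WRITE_PENDING" not in self.state:
            self.state.add("WRITE_PENDING")
            self.queue_work()

    def con_send(self):
        self.clear_standby()
        self._kick_writer()

    def con_keepalive(self):
        self.clear_standby()
        self.state.add("KEEPALIVE_PENDING")
        self._kick_writer()

    def con_work(self):
        if "STANDBY" in self.state:
            return "idle"       # nothing to do while in standby
        return "ran"


con = Connection()
con.state.add("WRITE_PENDING")
con.enter_standby()
idle_result = con.con_work()
con.con_send()
```

If `enter_standby` left WRITE_PENDING set, `con_send` in this model would never queue work again, which is the kind of fragility the restructuring is meant to remove.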
    • libceph: fix msgr keepalive flag · e76661d0
      Committed by Sage Weil
      There was some broken keepalive code using a dead variable.  Shift to using
      the proper bit flag.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • libceph: fix msgr backoff · 60bf8bf8
      Committed by Sage Weil
      With commit f363e45f we replaced a bunch of hacky workqueue mutual
      exclusion logic with the WQ_NON_REENTRANT flag.  One piece of fallout is
      that the exponential backoff breaks in certain cases:
      
       * con_work attempts to connect.
       * we get an immediate failure, and the socket state change handler queues
         immediate work.
       * con_work calls con_fault, we decide to back off, but can't queue delayed
         work.
      
      In this case, we add a BACKOFF bit to make con_work reschedule delayed work
      next time it runs (which should be immediately).
      Signed-off-by: Sage Weil <sage@newdream.net>
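The race described in this commit, and the role of the BACKOFF bit, can be traced with a minimal Python model. This is a sketch under stated assumptions rather than the kernel code: the `Conn` class, the `pending` slot standing in for a single queued work item, and the delay values are all hypothetical. The model assumes that queueing fails when the work item is already queued.

```python
MAX_DELAY = 64  # hypothetical cap on the backoff delay


class Conn:
    def __init__(self):
        self.flags = set()
        self.delay = 0
        self.pending = None   # delay of the queued work item, or None

    def queue_delayed(self, delay):
        # Model: a work item that is already queued cannot be queued again.
        if self.pending is not None:
            return False
        self.pending = delay
        return True

    def con_fault(self):
        # Exponential backoff: start at 1, then double up to MAX_DELAY.
        self.delay = 1 if self.delay == 0 else min(self.delay * 2, MAX_DELAY)
        if not self.queue_delayed(self.delay):
            # Immediate work is already queued; ask con_work to
            # reschedule with the delay on its next run.
            self.flags.add("BACKOFF")

    def con_work(self):
        self.pending = None   # the work item is now running
        if "BACKOFF" in self.flags:
            self.flags.discard("BACKOFF")
            self.queue_delayed(self.delay)
            return "backoff"
        return "connect"


c = Conn()
first = c.con_work()      # attempts to connect
c.queue_delayed(0)        # socket state change handler queues immediate work
c.con_fault()             # cannot queue delayed work, so BACKOFF is set
second = c.con_work()     # runs immediately and reschedules with the delay
```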
  12. 04 March, 2011 2 commits
    • libceph: retry after authorization failure · 692d20f5
      Committed by Sage Weil
      If we mark the connection CLOSED we will give up trying to reconnect to
      this server instance.  That is appropriate for things like a protocol
      version mismatch that won't change until the server is restarted, at which
      point we'll get a new addr and reconnect.  An authorization failure like
      this is probably due to the server not properly rotating its secret keys,
      however, and should be treated as transient so that the normal backoff and
      retry behavior kicks in.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • libceph: fix handling of short returns from get_user_pages · 38815b78
      Committed by Sage Weil
      get_user_pages() can return fewer pages than we ask for.  We were returning
      a bogus pointer/error code in that case.  Instead, loop until we get all
      the pages we want or get an error we can return to the caller.
      Signed-off-by: Sage Weil <sage@newdream.net>
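The "loop until we get all the pages or an error" pattern can be sketched in Python as follows. This is an illustration only: `get_all_pages`, the `get_pages` callback, and `short_get_pages` are hypothetical stand-ins, not the kernel's `get_user_pages()` interface. Treating a zero-progress return as an error is an assumption added here so that the loop cannot spin forever.

```python
import errno


def get_all_pages(get_pages, first, want):
    """Call get_pages repeatedly until `want` pages are collected.

    get_pages(first, count) may return fewer pages than requested, or
    raise OSError, which is passed straight through to the caller.
    """
    pages = []
    while len(pages) < want:
        got = get_pages(first + len(pages), want - len(pages))
        if not got:
            # Assumption for this sketch: no progress counts as an error.
            raise OSError(errno.EFAULT, "no progress while pinning pages")
        pages.extend(got)
    return pages


def short_get_pages(first, count):
    """Fake backend that never hands out more than two pages per call."""
    return list(range(first, first + min(count, 2)))


pages = get_all_pages(short_get_pages, 10, 5)
```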
  13. 26 January, 2011 2 commits
    • libceph: fix socket write error handling · 42961d23
      Committed by Sage Weil
      Pass errors from writing to the socket up the stack.  If we get -EAGAIN,
      return 0 from the helper to simplify the callers' checks.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • libceph: fix socket read error handling · 98bdb0aa
      Committed by Sage Weil
      If we get EAGAIN when trying to read from the socket, it is not an error.
      Return 0 from the helper in this case to simplify the error handling cases
      in the caller (indirectly, try_read).
      
      Fix try_read to pass any error to its caller (con_work) instead of almost
      always returning 0.  This lets us respond to things like socket
      disconnects.
      Signed-off-by: Sage Weil <sage@newdream.net>
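The convention used by both socket commits (the helper maps EAGAIN to 0, while every other error travels up to the caller) can be sketched with ordinary non-blocking sockets in Python. This is a userspace analogy, not the kernel helpers: `tcp_recv` and this `try_read` are hypothetical functions, and negative errno values are used only to mimic the kernel's return style. Note that in this model a return of 0 can also mean the peer closed the connection.

```python
import errno
import socket


def tcp_recv(sock, buf):
    """Return bytes read, 0 if the read would block, or -errno on error."""
    try:
        return sock.recv_into(buf)
    except BlockingIOError:
        return 0            # EAGAIN is not an error
    except OSError as e:
        return -e.errno     # real errors are reported, not swallowed


def try_read(sock, chunks):
    """Drain what is available; pass any error on to the caller."""
    buf = bytearray(4096)
    while True:
        ret = tcp_recv(sock, buf)
        if ret <= 0:
            return ret      # 0: nothing more for now, < 0: error
        chunks.append(bytes(buf[:ret]))


a, b = socket.socketpair()
a.setblocking(False)
chunks = []
idle = try_read(a, chunks)      # no data yet
b.sendall(b"ping")
drained = try_read(a, chunks)   # reads "ping", then would block
a.close()
failed = try_read(a, chunks)    # reading a closed socket is an error
b.close()
```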
  14. 13 January, 2011 3 commits
    • net/ceph: make ceph_msgr_wq non-reentrant · f363e45f
      Committed by Tejun Heo
      The ceph messenger code does a rather complex dance around the
      multithreaded workqueue to make sure the same work item isn't executed
      concurrently on different CPUs.  The workqueue itself can provide this
      restriction with WQ_NON_REENTRANT.
      
      Make ceph_msgr_wq a non-reentrant workqueue with the default concurrency
      level and remove the QUEUED/BUSY logic.
      
      * This removes backoff handling in con_work() but it couldn't reliably
        block execution of con_work() to begin with - queue_con() can be
        called after the work started but before BUSY is set.  It seems that
        it was an optimization for a rather cold path and can be safely
        removed.
      
      * The number of concurrent work items is bounded by the number of
        connections and connections are independent of each other.  With
        the default concurrency level, different connections will be
        executed independently.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Sage Weil <sage@newdream.net>
      Cc: ceph-devel@vger.kernel.org
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: Always free allocated memory in osdmap_decode() · b0aee351
      Committed by Jesper Juhl
      Always free memory allocated to 'pi' in
      net/ceph/osdmap.c::osdmap_decode().
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: add dir_layout to inode · 6c0f3af7
      Committed by Sage Weil
      Add a ceph_dir_layout to the inode, and calculate dentry hash values based
      on the parent directory's specified dir_hash function.  This is needed
      because the old default Linux dcache hash function is extremely weak and
      leads to a poor distribution of files among dir fragments.
      Signed-off-by: Sage Weil <sage@newdream.net>
  15. 18 December, 2010 2 commits
  16. 14 December, 2010 1 commit
  17. 28 November, 2010 1 commit
  18. 23 November, 2010 1 commit
  19. 22 November, 2010 1 commit
  20. 10 November, 2010 3 commits
    • ceph: explicitly specify page alignment in network messages · c5c6b19d
      Committed by Sage Weil
      The alignment used for reading data into or out of pages used to be taken
      from the data_off field in the message header.  This only worked as long
      as the page alignment matched the object offset, breaking direct io to
      non-page aligned offsets.
      
      Instead, explicitly specify the page alignment next to the page vector
      in the ceph_msg struct, and use that instead of the message header (which
      probably shouldn't be trusted).  The alloc_msg callback is responsible for
      filling in this field properly when it sets up the page vector.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: make page alignment explicit in osd interface · b7495fc2
      Committed by Sage Weil
      We used to infer alignment of IOs within a page based on the file offset,
      which assumed they matched.  This broke with direct IO that was not aligned
      to pages (e.g., 512-byte aligned IO).  We were also trusting the alignment
      specified in the OSD reply, which could have been adjusted by the server.
      
      Explicitly specify the page alignment when setting up OSD IO requests.
      Signed-off-by: Sage Weil <sage@newdream.net>
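A short arithmetic sketch shows why inferring the in-page alignment from the file offset breaks for direct IO. The numbers are invented for illustration, the 4096-byte page size is an assumption, and `page_align` and `pages_spanned` are hypothetical helpers rather than functions from the ceph sources.

```python
PAGE_SIZE = 4096  # assumed page size for this example


def page_align(addr_or_offset):
    """Offset of an address (or file offset) within its page."""
    return addr_or_offset & (PAGE_SIZE - 1)


def pages_spanned(align, length):
    """Number of pages touched by `length` bytes starting `align` bytes
    into the first page."""
    return (align + length + PAGE_SIZE - 1) // PAGE_SIZE


# Direct IO: 512-byte-aligned file offset, user buffer at another alignment.
file_offset = 512
buf_addr = 0x7F0000000400      # hypothetical user buffer address
length = 3584

inferred = page_align(file_offset)   # what the old code would assume
actual = page_align(buf_addr)        # where the data really sits in the page
```

With these values the inferred alignment describes a transfer that fits in one page, while the real buffer straddles two, so a page vector set up from the file offset would not match the memory being read or written.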
    • ceph: fix comment, remove extraneous args · e98b6fed
      Committed by Sage Weil
      The offset/length arguments aren't used.
      Signed-off-by: Sage Weil <sage@newdream.net>
  21. 02 November, 2010 1 commit
    • ceph: fix small seq message skipping · df9f86fa
      Committed by Sage Weil
      If the client gets out of sync with the server message sequence number, we
      normally skip low seq messages (ones we already received).  The skip code
      was also incrementing the expected seq, such that all subsequent messages
      also appeared old and got skipped, eventually causing a timeout on the osd
      connection.  This resulted in some lagging requests and console messages
      like
      
      [233480.882885] ceph: skipping osd22 10.138.138.13:6804 seq 2016, expected 2017
      [233480.882919] ceph: skipping osd22 10.138.138.13:6804 seq 2017, expected 2018
      [233480.882963] ceph: skipping osd22 10.138.138.13:6804 seq 2018, expected 2019
      [233480.883488] ceph: skipping osd22 10.138.138.13:6804 seq 2019, expected 2020
      [233485.219558] ceph: skipping osd22 10.138.138.13:6804 seq 2020, expected 2021
      [233485.906595] ceph: skipping osd22 10.138.138.13:6804 seq 2021, expected 2022
      [233490.379536] ceph: skipping osd22 10.138.138.13:6804 seq 2022, expected 2023
      [233495.523260] ceph: skipping osd22 10.138.138.13:6804 seq 2023, expected 2024
      [233495.923194] ceph: skipping osd22 10.138.138.13:6804 seq 2024, expected 2025
      [233500.534614] ceph:  tid 6023602 timed out on osd22, will reset osd
      Reported-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sage Weil <sage@newdream.net>
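The cascade visible in the log above can be reproduced with a few lines of Python. This is a model of the described behaviour, not the messenger code: `process`, its `buggy` switch, and the use of `in_seq` as "last sequence number received" are assumptions made for the sketch.

```python
def process(in_seq, seqs, buggy):
    """Deliver or skip incoming messages.

    in_seq is the last sequence number received, so the next expected
    message is in_seq + 1. Anything lower is a duplicate and is skipped.
    """
    delivered, skipped = [], []
    for seq in seqs:
        if seq < in_seq + 1:
            skipped.append(seq)
            if buggy:
                in_seq += 1     # the bug: skipping also advanced the counter
            continue
        in_seq = seq
        delivered.append(seq)
    return delivered, skipped


incoming = [2016, 2017, 2018, 2019, 2020]
bad_delivered, bad_skipped = process(2016, incoming, buggy=True)
ok_delivered, ok_skipped = process(2016, incoming, buggy=False)
```

In the buggy mode a single stale message makes every later message look old, matching the run of "skipping ... expected ..." lines. In the fixed mode only the duplicate is dropped.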
  22. 21 October, 2010 5 commits